Summary of ttH HXSWG meetings (Oct 2014-Feb 2015)

In the following we summarise some aspects that have been pointed out during the ttH HXSWG meetings and that could be pursued within the framework of the HXSWG. This document will serve as a basis for a future working group report. It will be continuously updated by the ttH conveners, including input and feedback from the whole ttH group.

Oct 20 Signal modeling in ttH (Indico)

Impact of signal modeling on ttH searches: At present MC signal simulations are not a serious source of uncertainty in ttH searches. Nevertheless a good understanding of ttH MC systematics will become relevant in the context of Top Yukawa measurements at higher luminosity. In order to prioritize the needs for future theory developments, we urge the experimental collaborations to quantify the impact of signal modelling uncertainties on the ttH signal acceptance: does it exceed the 20% level (relevant for y_t precision)? If yes, in which ttH analysis? What are the most relevant observables?

Shape uncertainties: the tails of the pT(ttH) and pT(tt) distributions, as well as the eta(ttH), N(jets), and HT(jets) distributions, show significant shape discrepancies (20% and beyond) between NLO+PS predictions based on different matching methods, scale choices, and parton showers. Such dependencies should be significantly alleviated by NLO merging methods (FxFx in aMC@NLO/Madgraph5 and MEPS@NLO in Sherpa+OpenLoops).

Scale choices: Two conventional scale choices (a “fixed” and a “dynamic” one) are used in ttH MC simulations within ATLAS. Dynamic scale choices are preferable at large pT, and in general it might be useful to recommend (a set of?) standard scale choices and appropriate prescriptions for shape uncertainty estimates based on different scale choices.

Uncertainty estimates at NLO: methodological aspects related to theory scale choices and uncertainty estimates, especially in the context of new NLO+PS and NLO merging methods, should be discussed within the HXSWG.

Official input parameters: Standard input parameters and PDFs for the ttH signal have been requested. A list of input parameters recommended by the HXSWG can be found here. At present Mt=172.5+-2.5 GeV (and no MH value) is recommended. These recommendations might change in the future. No recommendation at present for the electroweak input scheme to be used in the top-mass/top-Yukawa relation. (For ttH also recommendations on the MC modeling of Higgs decays might be useful).

Importance of jet activity: In order to assess the possible need of theory improvements in the modelling of QCD radiation, we urge the experimental collaborations to quantify the relative importance of extra jet emissions, i.e. ttH+1,2,3-jet events, in the framework of specific ttH analyses. Such events could play an important role if jets resulting from top/Higgs decays are often out of acceptance.

Minimal prerequisites for reliable ttH modeling

  • NLO+PS precision

  • spin-correlated top decays (off-shell top decays via smearing of on-shell tops)

Recent and ongoing theory developments

  • NLO merging for ttH+0,1 jets is available in Madgraph5_aMC@NLO and Sherpa (in combination with OpenLoops or with code by Dawson, Reina, and Wackeroth). Stefan Hoeche and collaborators have offered Sherpa support to ATLAS and CMS.

  • weak corrections are available at parton level in Madgraph5_aMC@NLO; the extension to full EW corrections is ongoing. Relevant for the boosted regime (-8% correction)

Future theory developments (in tentative order of priority)

  • NLO top/Higgs decays

  • ttH signal/background interferences

Nov 3 Backgrounds and uncertainties in experimental ttH, H-->bb searches (Indico)

General considerations: tt+jets MC modeling (especially tt+b-jets) is a dominant source of uncertainty in ttH(bb) analyses at 7 and 8 TeV. Run 1 analyses are either based on LO ME+PS or inclusive NLO+PS MC, and in both cases the formal accuracy of tt+jets final states is only LO. On the one hand, tt+jets MC uncertainties should be reduced by means of state-of-the-art NLO simulations. On the other hand, given the significant impact of MC uncertainties even at NLO, their estimate requires a transparent and theoretically motivated methodology. The issue of MC uncertainties is intimately connected to the methodology employed in the experimental analysis (jet-flavour categorisation, top-pT reweighting, other data-driven procedures, ...) and to the subtle interplay between various levels of MC simulation (matrix elements, shower, ...). In this context, as a starting point, it is highly desirable to identify and understand all essential aspects (theoretical and experimental) that are relevant for MC uncertainty estimates in ATLAS/CMS analyses, and to document them in a precise and transparent language that could facilitate the exchange between theory and experiment.

In the following we propose a first synthesis of the tt+jets MC uncertainty issues that emerged from the meeting. This also includes a detailed description of top-pT reweighting (in ATLAS) and other information collected after the meeting.

tt+jets categorisation for Monte Carlo uncertainty (MCU) estimate: tt+jets MC samples are split into a certain number of independent subsamples (tt+light-flavour, ttb, ttbb, ttc,...) that are defined in terms of the numbers of b- and c-jets (Nb,Nc) and/or the total number of jets (Nj). Top-decay products are typically not considered in this categorisation, and ATLAS/CMS employ different subsamples and different definitions of Nb,Nc,Nj (see the descriptions for what was used in Run 1 here). The various subsamples can be obtained from a single inclusive tt+jets generator or using dedicated generators for certain subsamples (e.g. for tt+b-jets). It is highly desirable that both experiments adopt a common categorisation approach, based on a proposal from the theory community. This requires a precise definition of:

  • Nb, Nc, Nj: which simulation level (MEs, shower, hadronisation, detector)? Which definition of flavour jets? What are the relevant pT-thresholds and cuts?
  • a definition (in terms of Nb, Nc, Nj) of the most appropriate subsamples that require independent MCUs

This standard definition should be as simple as possible and should allow for a consistent assessment of MCUs (ideally with a clean separation of perturbative/non-perturbative effects). It should also facilitate comparisons among the various MC tools on the market.
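
As an illustration only (the pT threshold, eta cut, and category labels below are hypothetical placeholders, not an agreed ATLAS/CMS convention), a particle-level categorisation along these lines might look as follows:

```python
# Illustrative tt+jets event categorisation by heavy-flavour jet content.
# Thresholds and category labels are hypothetical placeholders, not an
# agreed ATLAS/CMS convention.

def count_flavour_jets(jets, flavour, pt_min=25.0, eta_max=2.5):
    """Count particle-level jets of a given flavour passing kinematic cuts."""
    return sum(1 for j in jets
               if j["flavour"] == flavour
               and j["pt"] > pt_min
               and abs(j["eta"]) < eta_max)

def categorise(jets):
    """Assign a tt+jets event to a subsample based on (Nb, Nc).

    'jets' are the additional particle-level jets, i.e. jets from top
    decays are assumed to have been removed beforehand.
    """
    nb = count_flavour_jets(jets, "b")
    nc = count_flavour_jets(jets, "c")
    if nb >= 2:
        return "ttbb"
    if nb == 1:
        return "ttb"
    if nc >= 1:
        return "ttc"
    return "tt+light"
```

A common definition of this kind would make the pT-threshold and flavour-labelling choices explicit and identical in both experiments.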

Treatment of MCUs in experimental fits: Normalisation and shape variations for each tt+jets subsample are represented in terms of independent nuisance parameters that are fitted to data together with the signal strength. Each theory uncertainty enters the fit as a prior distribution for the related nuisance parameter, and various MCUs (like the normalisation of tt+light-jets) are strongly reduced when MC predictions are fitted to data. Typically tt+HF subsamples feature the largest post-fit uncertainties. Moreover, due to the limited shape separation between the small ttH(bb) signal and the large tt+HF background, the fit tends to constrain only their combination, which is dominated by tt+HF, while the signal component remains poorly constrained.
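
As a toy illustration of how such a fit behaves (a single bin with one Gaussian-constrained nuisance; all numbers and the crude grid minimisation are invented for this sketch, not the actual ATLAS/CMS fit machinery):

```python
# Toy profile-likelihood fit: signal strength mu plus one background
# nuisance parameter theta with a Gaussian (50%) prior, in a single bin.
# All numbers are invented for illustration; real ttH fits use many bins
# and many nuisance parameters.
import math

def nll(mu, theta, n_obs=120, s_nom=10.0, b_nom=100.0, sigma_b=0.5):
    """Negative log-likelihood: Poisson term plus Gaussian prior on theta."""
    expected = mu * s_nom + b_nom * (1.0 + sigma_b * theta)
    if expected <= 0:
        return float("inf")
    return (expected - n_obs * math.log(expected)) + 0.5 * theta ** 2

# Crude grid scan in place of a real minimiser (illustration only).
best = min(
    (nll(mu, th), mu, th)
    for mu in [0.05 * i for i in range(81)]       # mu in [0, 4]
    for th in [0.05 * j for j in range(-40, 41)]  # theta in [-2, 2]
)
_, mu_hat, theta_hat = best
```

In this one-bin toy the likelihood only constrains the combination mu*s_nom + b_nom*sigma_b*theta; it is the Gaussian prior that pulls theta back to zero, mirroring the signal/background degeneracy described above.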

MCU estimates in CMS, using inclusive LO ME+PS tt+jets sample (Madgraph):

  • normalisation and uncertainty of total ttbar+X cross section from NNLO
  • ad-hoc 50% rate unc. for ttbb, ttb, ttcc subsamples (uncorrelated)
  • factor-2 renormalisation- and factorisation-scale uncertainties for subsamples with different parton multiplicity (uncorrelated): weights of events originating from tt+n-parton matrix elements are varied as alphaS^n(Q) at LO, keeping the total tt+X rate fixed => impacts the shape of the Nj distribution; the scalings are simultaneously applied in the shower to adjust for variations in the amount of ISR/FSR
  • DATA/MC reweighting of top-pT (impacts shape of leading-jet and lepton pT): a top-pT dependent correction factor K(pT) is introduced, such that MC(x)=MC*x*K(pT) yields agreement with data at x=1 for the inclusive top-pT distribution. The nuisance parameter x is varied in the range [0,2]. This induces a 20% correction and MCU in the boosted-top regime.
  • No additional merging-scale variations are applied
  • tt+c-jets contributions: for tt+c-jets (20% of the background in the signal region) a dedicated NLO simulation would be desirable (not yet available)
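
The top-pT reweighting above can be sketched as follows. The linear interpolation between the nominal MC (x=0) and the fully data-corrected spectrum (x=1) is our assumed reading of how the [0,2] nuisance range is implemented, and K(pT) is a hypothetical stand-in shape, not the CMS measurement:

```python
# Sketch of the data/MC top-pT reweighting with a nuisance parameter x.
# The linear interpolation between nominal MC (x = 0) and the fully
# data-corrected spectrum (x = 1) is our assumed implementation of the
# [0, 2] nuisance range; K(pT) is a hypothetical stand-in for the
# measured correction factor, not the CMS measurement.

def correction_factor(pt):
    """Hypothetical data/MC correction K(pT): ~20% softening at high pT."""
    return max(0.8, 1.0 - 0.0005 * pt)  # placeholder shape only

def event_weight(pt, x=1.0):
    """Per-event weight for nuisance value x (x=0: nominal, x=1: data-like)."""
    return 1.0 + x * (correction_factor(pt) - 1.0)
```

With such a parametrisation, varying x over [0,2] spans no correction, the nominal data-driven correction, and a doubled correction, which is what drives the quoted ~20% MCU in the boosted-top regime.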

Dominant sources of MCU in CMS: ttbb rate, top-pT reweighting, ttb and ttcc rates, MC statistics

MCU estimates in ATLAS, using NLO+PS (Powheg+Pythia), ME+PS (Madgraph) and S-MC@NLO ttbb (Sherpa+OpenLoops) samples:

  • normalisation and uncertainty of total ttbar+X cross section from NNLO
  • ad-hoc uncorrelated 50% uncertainties for inclusive tt+b and tt+c cross sections
  • DATA/MC reweighting of inclusive distributions in ttbar-pT (yields correct Njet distribution) and top-pT (to correct other shapes) is applied to all tt+jets subsamples (including tt+HF). In this context, MC is compared to unfolded data, which involve a significant dependence on Pythia (and even on the employed tool) and on the related uncertainties. See more details below.
  • tt+b-jets MC predictions and uncertainties are obtained by reweighting the inclusive NLO+PS sample with a dedicated S-MC@NLO ttbb sample in the 4F scheme; in this context, various tt+b-jets subsamples (see slides) that allow for a consistent matching of the two samples are used; an independent and differential reweighting is applied to each subsample; MCUs are taken from variations of PDFs, ren/fact scales (factor-2 and kinematical), and shower parameters in the S-MC@NLO ttbb sample.
  • the employed tt+b-jets categorisation is based on the number of reconstructed b-jets at particle level (MC truth, after hadronisation). It involves pT-thresholds for B-hadrons and b-jets. Consistent matching is ensured by removing b-jets from UE and top-decay showering.
  • comparisons of NLO+PS, ME+PS and S-MC@NLO ttbb are used as a sanity check: S-MC@NLO features an excess in subsamples with “merged HF jets” (more b-hadrons in a jet). Here one should keep in mind that, in the inclusive NLO+PS and ME+PS simulations, b-quarks in tt+b-jet subsamples originate mostly from the shower (unless a small merging scale is used).
  • All comparisons in the ATLAS talk involve reweighted Powheg/Madgraph+Pythia predictions, while Sherpa+OpenLoops is not reweighted: it is a first-principles NLO MC prediction. Top/ttbar-pT reweighting significantly improves the agreement with the S-MC@NLO ttbb prediction.
  • tt+c-jets contributions: for tt+c-jets (20% of the background in the signal region) a dedicated NLO simulation would be desirable (not yet available)

Dominant sources of uncertainties in ATLAS: ttbb rate, top- and ttbar-pT reweighting, ttcc rate (MC statistics is also an issue)

Top reweighting and related systematics. To compensate for the mismodeling of the top and ttbar pT distributions, MC simulations are reweighted with a pT-dependent correction factor derived from data. The reweighting is applied at the level of the unfolded top-pT distribution(s), which are derived from “data” using a migration matrix obtained from “pseudo-data”. In the following, as an illustration of top reweighting (and related systematics), we sketch the approach employed by ATLAS. While ATLAS performs a double-differential reweighting in top- and ttbar-pT, here we consider only top-pT. The nominal tt+jets MC sample, generated with Powheg+Pythia, is passed through detector simulation and is used to determine a reconstructed top-pT distribution (pseudo-data). The relation between the top-pT distribution in pseudo-data and the corresponding distribution at MC-truth level is encoded in the migration matrix. More precisely, MC truth corresponds to the top-pT in showered (or non-showered?) parton-level ttbar events within the Powheg+Pythia simulation.

The migration matrix is supposed to describe pT-distortions resulting from detector smearing and acceptance cuts, and is also sensitive to QCD radiation effects due to the different QCD-radiation dependence of the top-pT at MC-truth and reconstruction level. The reconstructed top-pT is obtained from a kinematic likelihood fit on the events, where the jets/lepton/missing ET are fitted to the ttbar hypothesis and the different permutations of jets are checked. Events with low likelihood are cut to remove non-ttbar background. For the events passing the cut, the permutation with the highest likelihood is taken and the hadronic and leptonic top-pT are extracted. The reconstructed top-pT is typically more sensitive to QCD radiation (and the related uncertainty) than the MC-truth top-pT.

Finally, using the Powheg+Pythia-based migration matrix, the top-pT distribution reconstructed from real data is converted into an unfolded top-pT distribution. The latter is used to reweight the Powheg+Pythia pT-distribution at MC-truth parton level by a factor rw(x_i,pT)=f(x_i,pT)/MC(pT), where MC(pT) is the MC prediction, while f(x_i,pT) denotes the unfolded distribution. The variables x_i=(x_1,x_2,...) parametrise the dependence of the migration matrix on the various relevant uncertainties, and x_i=0 corresponds to the nominal prediction.
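
In binned form, the factor rw(pT)=f(pT)/MC(pT) is simply a bin-by-bin ratio applied to truth-level events; the bin contents and edges in this sketch are invented illustration values, not ATLAS data:

```python
import bisect

# Bin-by-bin reweighting factor rw(pT) = f(pT)/MC(pT), where f is the
# unfolded data distribution and MC(pT) the truth-level prediction.
# All bin contents and edges below are invented illustration values.

def reweighting_factors(unfolded, mc):
    """Ratio of unfolded data to MC truth, bin by bin."""
    return [f / m if m > 0 else 1.0 for f, m in zip(unfolded, mc)]

def reweight(mc_events, factors, bin_edges):
    """Apply rw(pT) to truth-level (pT, weight) pairs."""
    out = []
    for pt, w in mc_events:
        i = bisect.bisect_right(bin_edges, pt) - 1
        i = min(max(i, 0), len(factors) - 1)  # clamp under/overflow
        out.append((pt, w * factors[i]))
    return out
```

Each x_i variation would correspond to a different unfolded input and hence a different list of factors, leaving MC(pT) in the denominator fixed.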

Each independent experimental uncertainty (b-tagging, jet energy scale, etc.) is described by a corresponding x_i variation and a related variation of rw(x_i,pT). In principle, all sources of uncertainty could be propagated to the full simulation (MC + x_i-dependent reweighting + detector simulation + x_i-dependent top reconstruction) in a correlated way, such that x_i variations tend to cancel in the reconstructed top-pT distribution, and the latter always agrees with data within statistical uncertainties (note that the final ttH(bb) fit is based on reconstruction level!). However, since top reweighting is based on 7 TeV data (and the related detector calibration + uncertainties), x_i variations in top reweighting and top reconstruction (at 8 TeV) cannot be correlated. In practice, the nominal reconstruction without x_i variations is always employed. This tends to overestimate x_i uncertainties.

The MC-generator uncertainty is encoded in a modified reweighting rw’(x_i,pT)=f’(x_i,pT)/MC(pT), where the unfolded data f’(x_i,pT) are based on a migration matrix obtained from an alternative generator (MC@NLO). This reweighting is defined for (and applied to) the default MC prediction, MC(pT). The uncertainty associated with the rw-rw’ difference is not correlated, since the alternative generator is never used for the simulation.

The ISR/FSR systematic is evaluated in a completely different way. Pseudo-data generated with two MC simulations with ISR/FSR variations (up/down) are unfolded with the nominal migration matrix, and the relative effect with respect to the central MC prediction is used as a systematic. Such ISR/FSR variations shift the unfolded top-pT (ttbar-pT) distribution by about 5% (15%). These variations are not correlated with the corresponding ISR/FSR uncertainties of the tt+jets MC sample (for which only the nominal Pythia settings are used). Thus they do not cancel when the reweighted sample is passed through detector simulation and the tops are reconstructed.

The reweighting of the inclusive top-pT distribution (and related uncertainties) is applied to all tt+n-jet subsamples in a fully correlated way. In particular, tt+HF final states are also reweighted with the same top-pT correction factor. This procedure is supported by the observation (in ATLAS MC studies) that, in tt+b-jets subsamples, reweighted Powheg+Pythia and Madgraph+Pythia predictions for the top/ttbar-pT distributions are in better agreement with S-MC@NLO ttbb than non-reweighted ones.

Electroweak contributions to ttbb: it was pointed out that pp->ttbb might receive significant tree-level EW contributions of order alpha^2*alphaS^2. This should be checked.

Nov 10 Theory perspectives on tt+jets and tt+HF production (Indico)

(1) Overview

The meeting focused on theoretical tools for the calculation of top-antitop pair production with light and heavy-flavor jets (here generically denoted by tt+jets and tt+HF), in particular b-quark jets (here denoted by tt+b jets). Both tt+jets and tt+b jets represent the most limiting backgrounds to the detection of a Higgs boson produced in association with a top-antitop pair when the Higgs boson decays into a bottom-antibottom pair (i.e. ttH, H->bb).

We had three talks from the three collaborations that have most recently studied these processes and addressed the issue of defining the nature and size of the theoretical systematic uncertainty intrinsic to the tools used in the calculation and to the observables chosen for the comparison with data.

The talks/speakers and main focus of each talk were:

  • PowHel (speaker: Zoltan Trocsanyi) : tt+2b jets (NLO+PS with mb=0)
  • Sherpa+Openloops (speaker: Frank Siegert): tt+jets (MEPS@NLO merging of 0j,1j,2j) and tt+>=1b jets (NLO+PS with mb>0)
  • Madgraph5_aMC@NLO (speaker: Rikkert Frederix): tt+jets ( FxFx and UNLOPS NLO merging)
All three collaborations (PowHel, Sherpa+OpenLoops, and Madgraph5_aMC@NLO) calculate the production of tt+jets or tt+b jets by interfacing the exact NLO QCD matrix elements for the corresponding parton-level subprocesses with a parton-shower Monte Carlo (PSMC). In PowHel, the matching with the PS is based on the POWHEG method, while Madgraph5_aMC@NLO and Sherpa+OpenLoops use (different implementations of) the MC@NLO matching method. In the Sherpa implementation, which is referred to as S-MC@NLO, NLO matrix elements are matched to the Sherpa parton shower, while Powheg and MG5_aMC@NLO simulations are based on Pythia or Herwig.

Various sources of theoretical uncertainties are present at the level of matrix elements (MEs), parton showers (PS), and in the procedures used to match and merge these two ingredients. Uncertainties from missing higher-order QCD corrections in the MEs are typically assessed through variations of the renormalisation and factorisation scales. Besides the usual factor-two variations of such scales, for multi-particle and multi-scale processes also different choices of dynamical scale should be considered. Additional uncertainties due to the PDFs and to possible approximations in the MEs (e.g. setting mb=0) also need to be included.
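
As a concrete example of the factor-two prescription, one standard convention (an assumption here, not a prescription stated above) is a 7-point variation of (muR, muF) with the envelope taken as the uncertainty; the per-variation cross sections below are invented numbers:

```python
# 7-point renormalisation/factorisation scale variation: (muR, muF) varied
# by factors of 2 around the central scale, dropping the two opposite
# (2, 0.5)-type combinations; the envelope is taken as the uncertainty.
# The per-variation cross sections below are invented illustration values.

SEVEN_POINT = [(1, 1), (2, 1), (0.5, 1), (1, 2), (1, 0.5), (2, 2), (0.5, 0.5)]

def scale_envelope(xsec):
    """Return (central, lower, upper) from a {(kR, kF): sigma} mapping."""
    central = xsec[(1, 1)]
    values = [xsec[k] for k in SEVEN_POINT]
    return central, min(values), max(values)

# Example with an asymmetric scale dependence (invented numbers).
example = {(1, 1): 10.0, (2, 1): 8.9, (0.5, 1): 11.8, (1, 2): 9.7,
           (1, 0.5): 10.4, (2, 2): 8.5, (0.5, 0.5): 12.3}
central, low, high = scale_envelope(example)
```

For multi-scale processes the same envelope would additionally be evaluated for different choices of the dynamical central scale.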

Moreover, the fact that the shower generates extra light and heavy jets leads to non-trivial issues when Monte Carlo simulations are applied to specific event categories based on the multiplicity of light- and heavy-flavour jets. In particular, depending on the event category, the choice of generation cuts and possible restrictions at ME level can lead to an insufficient level of Monte Carlo inclusiveness, thereby hampering consistent comparisons among different simulations and against data.

The situation becomes more complicated after matching with a PSMC and multi-jet merging. New uncertainties related to the choice of the shower starting scale and the merging scale are involved. In the context of matching/merging, one should also keep in mind that, for technical reasons, renormalisation-scale variations cannot at present be applied to jets that are emitted by the parton shower. Uncertainties related to jet emissions are thus typically underestimated on the parton-shower side, while the merging approach provides, at the same time, a more reliable description and a more conservative uncertainty estimate for jet emissions.

(2) tt+jets

Sherpa+Openloops and Madgraph5_aMC@NLO have presented NLO results for tt+jets merging that include, respectively, up to 2 jets and 1 jet at NLO. These simulations are based on pp -> tt+0j, 1j, ... NLO matrix elements matched to a PSMC and consistently merged in a way that avoids double counting.

Multi-jet merging is essential for an accurate description of tt+jets and for a more reliable assessment of the theoretical systematic uncertainty. It was noted, however, that in current measurements the comparison is often made with inclusive NLO+PS samples, where ME information stops at the level of tt+1j LO MEs, while state-of-the-art NLO merging can push the theoretical accuracy up to tt+2j NLO and tt+3j LO MEs.

A careful assessment of the PSMC uncertainty is still missing, and comparisons are often made with prescriptions that seem to work but are motivated neither by careful studies nor by well-defined theoretical arguments (e.g. "Powheg with 'hdamp' describes the data best", or similar).

Multi-jet merging at NLO is expected to lead to a drastic reduction of theoretical uncertainties related to the choice of the resummation scale (shower starting scale). Merged simulations depend also on a merging scale: a jet-resolution measure, which is used to separate regions of phase space that are populated by NLO-accurate MEs and by the PS, respectively. The lower the merging scale, the more phase space associated with jet emissions is described in terms of NLO-accurate MEs. Ideally, one would choose the merging scale such that objects which are resolved as separate jets in the experimental analysis are always simulated with NLO accuracy. However, generic theoretical considerations suggest that a too low merging scale can lead to uncontrolled logarithmic effects, which are formally beyond the PS accuracy but can become numerically important. A precise quantitative analysis of this issue is not yet available for tt+jets. Such studies would be very valuable in order to converge towards a well-motivated prescription for the choice of the merging scale and related uncertainty estimates.

This is an area that needs improvement and it is very important that now different tools (Sherpa+Openloops and Madgraph5_aMC@NLO) are available and can be used for comparison. Recently, OpenLoops MEs became available at

(3) tt+b jets, more specifically tt+2b jets

Powhel and Openloops+Sherpa presented their studies of tt+b-jets. Madgraph5_aMC@NLO did not show specific results, but discussed the kinds of issues that need to be addressed in order to consistently implement these kinds of processes in a PSMC matched with NLO MEs and achieve a more reliable control of the theoretical systematics.

From a theoretical point of view, the problems encountered in the study of tt+b jets are very important and very challenging. They are being addressed for the case of b jets at the moment, and they could be even more serious for c jets (both very important for the study of ttH as we learned from the experimental talks presented in this working group on Nov. 3rd). The understanding of the b-jet case will be therefore extremely beneficial to the description of a whole family of hadronic processes (not only tt+HF) that involves b-jets and c-jets.

Tools like Powhel, Openloops+Sherpa, Madgraph5_aMC@NLO, and any others that will become available in the future are the state-of-the-art tools needed for such studies. Thanks to these tools, interesting aspects of processes like tt+b-jets are emerging and the theoretical description of these processes is improving fast. However, it is still difficult at the moment to fully assess the theoretical systematic uncertainty intrinsic to these predictions.

In particular, we heard in these talks that the following issues need to be clarified/improved:

  • (3.a) scale uncertainty: As already known from parton-level studies, the choice of a dynamical scale for both the renormalization and factorization scales leads to a better perturbative behavior of the NLO cross section (both the total xs and distributions). Powhel and Sherpa+OpenLoops adopt different dynamical-scale choices. The Powhel choice is based on HT and aims at keeping the scale always hard and larger than mt, while Sherpa+OpenLoops uses a CKKW-inspired prescription for the renormalisation scale, such that the scale for b-quark emissions is adapted to their respective pT, while the factorisation and resummation scales are kept harder.

    Both collaborations (Powhel and Sherpa+Openloops) estimate the residual error from scale dependence to be roughly in the 25-35% range, which is quite sizable for an NLO QCD calculation. This scale uncertainty is dominated by renormalisation-scale variations, and remains at a similar level when NLO MEs are matched to the parton shower.

  • (3.b) PDF uncertainty: The Powhel collaboration estimates an error of 10% from PDFs, based on a study of the CTEQ, MSTW, and NNPDF sets.
  • (3.c) PSMC uncertainty: By matching with Pythia-6 and Pythia-8, the Powhel collaboration estimates a PSMC uncertainty of 10%. Studies of this kind should be expanded, to understand the origin of the systematic, to compare with Herwig++, and to compare under different conditions (cuts, etc.). Also the systematic uncertainty related to the choice of the so-called “hdamp” parameter, which plays a similar role as the shower starting scale in the MC@NLO method, should be investigated.
  • (3.d) effect of t-quark decays: Both collaborations can include the decay of the final-state top quarks, accounting (approximately) for spin-correlation effects. As confirmed by the Powhel collaboration, spin correlations can have a substantial effect and should always be included.
  • (3.e) massless vs massive b quarks: The Sherpa+Openloops calculation uses the 4FNS (4-flavour number scheme), where b-quarks are massive and not present in the proton: they arise from g->bb splittings.

    Since collinear g->bb singularities are regularised by the finite b-mass, this approach makes it possible to cover the entire b-quark phase space at the level of NLO MEs, resulting in a fully inclusive description of final states with tt+>=1 b-jets, i.e. including also tt+1b-jet. At NLO+PS level, for ttbb final states with two hard b-jets, the two b-jets usually arise from the ttbb MEs. However, there are also configurations where one b-jet arises from a (rather) collinear bbbar pair in the MEs and the second one is generated through a (rather) collinear g->bb splitting by the PS. The Sherpa+Openloops studies emphasize that the effect of such “double g->bb splittings” can be quite sizable, especially for m_bb>100 GeV, and needs more systematic attention. In the 4FNS the first g->bb splitting is entirely described by NLO MEs, while the second one can only be simulated at the level of accuracy of the PS. Given its potentially high impact in the signal region, this mechanism and the related uncertainties should be studied in more detail.

    The Powhel ttbb calculation is performed in the 5FNS, where mb=0. The presence of a g->bb collinear singularity at m_bb->0 requires appropriate generation cuts that restrict NLO MEs to a phase-space region with sufficiently hard and well-separated b-quarks. In particular, it requires an explicit or implicit generation cut on m_bb. In principle, as long as this cut is very low, one would expect little impact on physical observables characterised by large m_bb. However, contributions of the type of the above-mentioned double-splitting mechanism should occur also in a 5F NLO+PS ttbb calculation, and such contributions should be strongly sensitive to the choice of the m_bb generation cut (they are formally singular in the limit of vanishing cut). The numerical impact of these issues remains to be investigated. In any case, within the 5FNS, the natural solution to this problem is provided by multi-jet merging, where the singular ME description of collinear bbbar pairs is replaced by the regular PS description below a certain merging scale. This automatically requires the merging of tt+jets MEs with different b-quark and light-jet multiplicity.

    In fact, 5FNS NLO simulations based on multi-jet merging for tt+0,1,2 jets provide a natural alternative (to the 4FNS) for a complete description of tt+b-jet final states with one or more b-jets (and of course also for tt+light-jets). Singularities at ME level are avoided by the presence of a merging cut, and double-counting issues between matrix elements and parton shower are also automatically avoided. There are thus two internally consistent formalisms for the NLO simulation of tt+>=1b-jets, based on the 5FNS (NLO tt+jets merging) and the 4FNS (NLO+PS ttbb). The open questions are: How do they compare, and which is the best description of tt+b-jets: 5FNS or 4FNS? Is there a fully consistent prescription to combine a 4FNS simulation of tt+b-jets with a 5FNS (NLO merged) simulation for the rest of the tt+jets phase space without double counting? Is there a “hybrid approach” that makes it possible to avoid the drawbacks of the 4FNS (no resummation of initial-state g->bb splittings) and of the 5FNS (no mb effects) in a consistent way? How easily can this be implemented in an NLO PSMC?

(4) Preliminary recommendations to the experiments and to theory

It is difficult at the moment to provide the experiments with a recommendation that satisfactorily matches the complexity of experimental analyses. New NLO-accurate theory predictions are emerging for the different populations of events used in the analyses, which are differentiated on the basis of Nj, Nb, and even Nc (notice that theory has not provided results for tt+c jets yet). However, these new results need to be scrutinized in more detail in order to establish a solid basis for a recommendation. Still, a preliminary recommendation can go as follows:

  • (4.a) We recommend that the experiments acquire experience with the generation of tt+jets samples, using and comparing the various NLO multi-jet merging techniques and tools on the market, i.e. FxFx/UNLOPS with MG5_aMC@NLO and MEPS@NLO with Sherpa+OpenLoops, possibly in close collaboration with their main authors. Such simulations are fully inclusive and can be used both for light- and heavy-flavour final states. Given the high technical complexity of tt+2jets at NLO, in a first phase we recommend restricting these investigations to tt+0,1-jet merging.
  • (4.b) For an improved description of tt+b-jets final states, we recommend that, using the above tools, the experiments acquire experience with the simulation of ttbb in the 4FNS. Such simulations can be applied to the analysis of tt+b-jet subsamples that involve one or more b-jets in addition to the two b-jets that arise from top decays.
  • (4.c) We recommend that the authors of the relevant tools agree on a standard setup (input parameters, choices and variations of the various technical scales, observables) to be used for a consistent comparison of the different tools and methods. To this end we will circulate a detailed proposal in early January. Based on this setup, we strongly encourage the theory collaborations to produce benchmark results that should be presented and discussed within the HXSWG and will serve as a technical reference for simulations within the experiments.
  • (4.d) We encourage theory and experiment to present and document MC studies in a transparent way, i.e. providing the full list of (default and user-defined) parameter choices and the considered variations. Based on quantitative studies, it is important to converge towards a satisfactory understanding of the uncertainties related to the NLO matching and merging procedures (resummation and merging scales) and to arrive at a global and widely accepted prescription for the choice of the related scales and for their variation. Understanding the intrinsic accuracy of the different methods/tools is a central goal that should provide, in the mid-term, the basis for an HXSWG recommendation for physics simulations.
  • (4.e) It is crucial to explore the issues explained in point (3) above and to understand how to consistently provide results for Nb=1 and Nb=2. The same will apply to Nc=1 and Nc=2. The problem of jets containing a bb pair from g->bb splitting is very important, and ongoing studies will put it on firmer ground.
We would like these goals to be achieved in the context of this working group, which will offer the natural ground for coordination and discussion.

Nov 24 Backgrounds and uncertainties in ttH, H-->gamma gamma (Indico)

(1) Overview

The production of a Higgs boson with a top-antitop quark pair followed by the decay of the Higgs boson into two photons (here denoted as ttH, H->2 photons) can have a cleaner signature than the analogous production followed by the Higgs boson decaying into a bottom-antibottom quark pair (ttH, H->bb). Although this channel is statistically limited, its relevance will increase in Run II, where it will play a crucial role in constraining the top-quark Yukawa coupling.

Experimentally this channel is studied using signatures that involve two hard photons and a variable number of leptons and jets, together with some missing energy, depending on the decays of the top and antitop quarks produced with the Higgs boson. Irreducible backgrounds in all these searches are processes in which hard photons are produced with a top-antitop pair or a single top, i.e. processes like t+photon, t+2 photons, top+antitop+photon (tt+photon), and top+antitop+2 photons (tt+2 photons), as well as processes involving a combination of photons and jets. Thanks to state-of-the-art theory tools, these background processes can be described at NLO QCD, including matching to a parton-shower Monte Carlo (PSMC). This level of precision can benefit the separation of signal from background, and is essential for the understanding of a series of kinematic distributions that may not always involve a clear peak like the two-photon invariant-mass distribution. However, the Run 1 analyses at CMS and ATLAS have not yet used these state-of-the-art tools; this will be a significant addition to these analyses in Run 2.

In this meeting we had two experimental talks (one from ATLAS and one from CMS) that summarized ongoing analyses of ttH, H->2 photons, and one theory talk (by the Powhel collaboration) that discussed the nature of theoretical systematic uncertainties particularly on the prediction of tt+2 photons.

(2) Experimental summary

The ATLAS analysis is mainly aimed at minimizing the tt+photons continuum; as a result, the latter does not give a major contribution to the systematic uncertainty, and one can circumvent the need for MC simulations of the tt+diphoton continuum by exploiting a side-band approach, as in the case of the inclusive Higgs -> diphotons analysis. On the other hand, the ATLAS analysis receives a significant contamination from other (non ttH) Higgs channels. Heavy-flavour contributions to inclusive H+jets production represent the dominant source of systematic uncertainty.

In the case of the CMS analysis the situation is different. This analysis aims at the minimisation of the contamination from other Higgs channels, and the role of the tt+diphoton continuum is more important. In particular, the CMS analysis will require reliable MC predictions for tt+1,2 photons and photons+jets. It was pointed out that such background simulations would benefit also other analyses, i.e. not only ttH.

(3) Theory summary

The Powhel collaboration has interfaced the NLO QCD calculations of ttH(H->2 photons), tt+photon, and tt+2 photons with parton showers (PS) using the Powheg-Box environment. The code is not publicly available through the Powheg-Box distribution, but files of events for comparison with experimental measurements have been produced and can be obtained from the Powhel collaboration itself.

A key issue in implementing a process with hard photons is how to define the photon isolation in such a way that it is infrared safe and at the same time as close as possible to the experimental definition of photon isolation. The photon isolation criterion reflects the need to limit the hadronic activity around a photon in order to select photons from the original hard scattering process and not from the subsequent jet activity (electromagnetic quark/antiquark radiation). Since the isolation procedure tends to cut into the phase space of partons surrounding a photon, it may interfere with the cancellation of infrared divergences due to soft or collinear partons that appear in both the virtual corrections and the real radiation at a given QCD perturbative order. Two solutions have been proposed over time: a "fixed cone" isolation, which limits the parton activity in a fixed-size cone around a photon and reabsorbs the parton infrared dynamics in that phase-space region into fragmentation functions, and a "smooth cone" isolation, which avoids the use of fragmentation functions by defining an isolation cone that continuously shrinks to the smallest necessary size to allow for all the infrared radiation to be properly included. Theoretically, the "smooth cone" isolation allows for a "cleaner" prediction in the sense that it does not involve the non-perturbative component intrinsic to the fragmentation functions. Experimentally, however, the "smooth cone" isolation cannot be implemented because it would imply an infinite resolution, and the "fixed cone" isolation is preferred.
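To make the smooth-cone criterion concrete, the following Python sketch checks the condition that the hadronic transverse energy inside every sub-cone of radius r <= R0 stays below a threshold that vanishes smoothly as r -> 0. This is a generic illustration of the criterion, not the Powhel implementation; the function name, the (et, dR) event representation, and the default parameters are our own choices.

```python
import math

def frixione_isolated(photon_et, hadrons, R0=0.4, eps=1.0, n=1.0):
    """Smooth-cone photon isolation sketch (illustrative, not Powhel's code).

    hadrons: list of (et, dR) pairs, with dR the distance of each hadron
    or parton from the photon in the (eta, phi) plane.  The photon is
    isolated if, for every radius r <= R0, the hadronic transverse energy
    inside r stays below the smoothly vanishing threshold
    eps * photon_et * ((1 - cos r) / (1 - cos R0))**n.
    """
    # Only hadrons inside the outer cone matter; sort by distance so the
    # cumulative sum reproduces the energy profile as a function of r.
    inside = sorted((dr, et) for et, dr in hadrons if dr < R0)
    et_sum = 0.0
    for dr, et in inside:
        et_sum += et
        # The profile is a step function, so checking the condition at
        # each hadron's radius is sufficient.
        chi = eps * photon_et * ((1.0 - math.cos(dr)) / (1.0 - math.cos(R0))) ** n
        if et_sum > chi:
            return False
    return True
```

Note that collinear energy (dR -> 0) always fails the test, since the threshold vanishes there; this is what removes the need for fragmentation functions.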

At the level of NLO QCD fixed order calculations, it is standard to compare the implementations of both isolation prescriptions and determine how much they affect the theoretical predictions. In the case of tt+photon and tt+2 photons the two prescriptions seem to give very compatible results.

However, when the NLO QCD fixed order calculation of these processes is inserted in a PSMC and interfaced with PS, one needs to apply an isolation prescription both at the level of generation cuts as well as on the final products of the showering and hadronization processes, and that can only coincide with the experimental fixed-cone isolation prescription (with some hadronic leakage to be taken from the specifics of the experimental analyses).

The Powhel collaboration has therefore implemented two levels of isolation cuts: i) a first level of isolation cuts as part of the generation cuts of the real-emission processes, at the LHE level, which can be implemented using either a fixed- or a smooth-cone prescription; ii) a second level of isolation cuts applied at the level of final-state hadrons, i.e. after running PS and hadronization on the LHE events, which are implemented in strict accordance with the corresponding experimental cuts.

They observe that:

i) using the dynamical scale H_T/2, the scale dependence of the tt+2-photon cross section is reduced from 30% at LO to 15% at NLO; ii) it is possible to obtain a realistic description of isolated photons without fragmentation functions. To this end one can apply, at the level of generation cuts, a loose photon isolation based on the smooth-cone prescription. Physical results after full parton showering turn out to be (a) independent of the technical parameters associated with such a smooth isolation and (b) quantitatively consistent with the case where generation cuts are based on a loose fixed-cone isolation including fragmentation contributions.
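For readers less familiar with these conventions, the sketch below illustrates a possible H_T/2 dynamical scale and the standard 7-point (muR, muF) variation used to quote scale uncertainties like the 30% -> 15% numbers above. Both functions are generic textbook conventions, not the specific Powhel setup; in particular, other H_T definitions (e.g. pT-only sums) are also in use.

```python
import math

def ht_over_two(particles):
    """Dynamical scale mu = H_T/2, with H_T the scalar sum of the
    transverse masses m_T = sqrt(m^2 + pT^2) of the final-state
    particles, given here as (mass, pT) pairs.  Illustrative convention."""
    return 0.5 * sum(math.sqrt(m * m + pt * pt) for m, pt in particles)

def seven_point_envelope(sigma):
    """Standard 7-point scale-variation envelope: the cross section is
    recomputed with (muR, muF) rescaled by factors (kR, kF) in
    {1/2, 1, 2}^2, omitting the antipodal pairs (1/2, 2) and (2, 1/2).
    sigma(kR, kF) is a user-supplied cross-section function; returns the
    central value and the downward/upward shifts of the envelope."""
    points = [(1, 1), (2, 2), (0.5, 0.5), (2, 1), (0.5, 1), (1, 2), (1, 0.5)]
    values = [sigma(kr, kf) for kr, kf in points]
    central = values[0]
    return central, min(values) - central, max(values) - central
```

The quoted LO and NLO uncertainties correspond to the width of such an envelope relative to the central value.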

(4) Recommendations:

  • Public tools. It is desirable to complement the Powhel samples with simulations based on publicly available tools such as Madgraph5_aMC@NLO, public Powheg-box, Sherpa+OpenLoops (or Sherpa+other one-loop providers).

  • Merging multi-photon emissions. Similarly to jets, photons can be “produced” either at matrix-element level or through QED emissions in the parton shower (if the parton shower includes QED effects). In particular, final states with two photons can arise from matrix elements with n=0,1,2 photons in combination with (2-n) photon emissions from the parton shower. This leads to possible double-counting problems that should be addressed with an approach analogous to multi-jet merging. We recommend to investigate the relevance of this issue in the context of tt+2-photon simulations.

  • Photons from top decays. In addition to the need of merging photon emissions that originate from matrix elements and from the parton shower, one should also consider photon emissions that arise from the top-decay products. In principle one should thus consider an inclusive sample with tt+0,1,2 photons, where photon emissions are consistently merged with the parton shower independently of top decays, and then one should allow for additional photon emissions from top decays. In this respect we recommend to assess the importance (and the related theoretical uncertainty) of background contributions arising from photons that are emitted from top decays. This should be done by taking into account realistic photon isolation requirements (with respect to jets and leptons) as in the ATLAS and CMS analyses.

  • Theory systematics. All sources of theory systematics in NLO+PS simulations of signal and background should be assessed in detail (including PS effects, hadronization effects, etc.) in the environments of ATLAS/CMS analyses. In this context one should assess the relative benefits of Monte Carlo predictions versus data-driven determinations of the tt+photons background.

Dec 1 Backgrounds and uncertainties in ttH, H-->multileptons (Indico)

(1) Overview of Decay Channels

The ttH(H->multilepton) channel targets ttH production with at least two leptons in the final state, where the Higgs boson decays to WW, ZZ, or tautau; mostly the WW decay mode is studied. Several final-state signatures are considered that select at least one lepton from the decaying ttbar system and a second lepton from the Higgs decay. These signatures include:

  • 2 same-sign leptons + b-jets
  • 3 leptons (with no resonant Z->ll)+ b-jets
  • 4 leptons (other than H->ZZ->4l – no resonant Z->ll)+ b-jets

Depending on the analysis, one or two b-tags are required. The public CMS paper is available at arXiv:1408.1682.

(2) Background Estimation and Uncertainties

The background contamination and the related uncertainty play a central role in the multilepton analyses. Background estimates are obtained from both MC-driven and data-driven methods. Run 1 analyses were largely based on MC simulations with LO multi-jet merging (without spin correlations). Backgrounds that are estimated from MC include the ttV (V=W or Z/gamma*) and diboson (WZ in 3l and ZZ in 4l) background processes. The diboson backgrounds are estimated using MC and normalized to data in a control region where the Z-veto is inverted and no b-jet tags are required. Backgrounds arising from fake leptons or mistagged b-jets are obtained using a data-driven method, where MC simulations provide shape information for the extrapolation to the signal region. The largest contribution to these fake-lepton backgrounds is ttbar production.

For backgrounds estimated from MC, the uncertainties on the cross sections and PDFs are taken into account. The MC modelling uncertainties are estimated via variations of the normalization and factorization scales, as well as of the threshold between ME and PS. These uncertainties are considerable in both experiments. Furthermore, both experiments estimate the uncertainties due to MC modelling in a similar manner.

ttW and ttZ/gamma* are, besides tt+fake leptons, the dominant sources of background, and they play a critical role for the ttH sensitivity. The ttW normalisation is one of the leading sources of systematic uncertainty, and it will not be possible to reduce this uncertainty with data in the mid-term future. A precise description of ttW+jets is crucial, since ttW contributes to the signal region only in combination with extra jets. In particular, it is important to include up to 2 extra jets since, for di-lepton signatures, ttWjj has the same signature as ttH(H->WW) with W->jets. At present, merging-scale variations represent a large source of ttW+jets uncertainty in the signal region. Due to its very small cross section, the impact of the irreducible ttVV background turns out to be negligible.

In the ATLAS talk it was pointed out that off-shell effects in ttV production and decay can be important. In particular, tt+gamma*(gamma*->ll) can deliver a significant background contribution (in the 2- and 3-lepton channels) and its modeling, down to very small photon virtuality, is nontrivial. Top decays could also be a significant source of such gamma*->ll contributions.

It was also pointed out that, when only a single b-tag is required, tV processes with a single top can also deliver a significant background contribution. In this context, possible double counting at NLO between processes like ttV, tVV, tV etc. should be avoided in a systematic way.

For diboson production, a fit to data is made in a control region with no b-jets in order to estimate the VV rate. However, the VV background remains difficult to control, since in the signal region VV is produced with at least two b-jets (VV+bb). So far, for the extrapolation from the control to the signal region and for the estimate of the related MC uncertainty, various generators have been used and compared (for example Sherpa explicitly including HF and Powheg+Pythia6 excluding HF). They give compatible results. Nevertheless, this extrapolation remains one of the limitations of the analysis. Typical uncertainties on the additional HF contribution are about 30%.

The leading source of VV background is ZW production. Besides genuine ZW+b-jets production, the ZW+HF background receives a significant (~50%) contribution from ZW+light/charm jets with mistags. This calls for an inclusive simulation of ZW+jets production. In any case, the overall impact of the ZW background in the signal region is subdominant.

Fake leptons are estimated using data in a control region that has the same sources of fakes as the signal region (e.g. jet misidentification, non-prompt leptons from B-hadron decays). Uncertainties on the fake-rate normalization are about 40 (60)% for inclusive (at least 2 b-tag) events. Further shape uncertainties are also derived.
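A common data-driven technique of this kind is the fake-factor extrapolation, in which events passing a loose but failing the tight lepton selection are weighted into the signal region. The sketch below is a generic illustration under that assumption, not the specific ATLAS or CMS implementation; the event representation and the fake-rate function are hypothetical.

```python
def fake_factor_estimate(loose_not_tight_events, fake_rate):
    """Generic fake-factor extrapolation sketch (illustrative only).

    Each event passing the loose-but-not-tight selection is weighted by
    f/(1-f), where f = fake_rate(event) is the per-lepton probability,
    measured in a fake-enriched control region, for a fake lepton
    passing the loose selection to also pass the tight one.  The sum of
    weights estimates the fake-lepton yield in the tight signal region.
    """
    return sum(fake_rate(ev) / (1.0 - fake_rate(ev))
               for ev in loose_not_tight_events)
```

The quoted 40-60% normalization uncertainties would then correspond to the systematic uncertainty on the measured fake rate f and on its extrapolation between control and signal regions.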

(3) Theory Tools

At NLO, scale uncertainties for ttV(V=W,Z,gamma) are around 10%, and PDF uncertainties around 8%. Predictions for ttV at NLO+PS accuracy, based on the Powheg method, have been available for a while, both in the form of publications and event samples by the Powhel collaboration. Besides NLO+PS, the tools presented at this meeting, Madgraph5_aMC@NLO and Sherpa+OpenLoops, support also the simulation of ttV+0,1 jets with multi-jet merging at NLO. These tools differ both in the employed parton showers (Pythia/Herwig and Sherpa, respectively) and in the NLO merging methods (FxFx/UNLOPS and MEPS@NLO, respectively). Simulations of VV+0,1,2 jets with NLO merging are also possible. Both tools support top-quark, W, and Z decays at LO accuracy including spin correlations. In the case of ttZ, off-shell effects (Z/gamma*->leptons) can be included via simulation of the full tt+2-leptons process.

Aspects that should be further developed and discussed include: the possibility to apply NLO merging in the presence of massive b-quarks (for VV+b-jets); precise quantitative prescriptions to assess uncertainties related to the merging scale, the parton-shower starting scale (resummation scale), and other MC-related uncertainties.

(4) Recommendations for Run-II:

  • Backgrounds from ttV and VV+bb are difficult to constrain from data directly, and MC estimates and uncertainties are heavily used. The HXSWG should converge towards a recommendation for the proper treatment of all relevant sources of uncertainty, especially for what concerns the technical scales (merging, resummation) that are relevant for the new NLO merging techniques. Such recommendations should be applied in a uniform way between experiments. The availability of such a recommendation would be highly valuable for all ttH signal and background simulations and more generally for any MC simulation.

  • For ttV+jets and VV+jets simulations the usage of the new tools based on NLO merging is recommended. To start with, up to one jet should be included at NLO. At the same time we urge the experiments to quantify the need for multi-jet simulations with more than one jet. Clear indications, from the experimental side, on the kinematic observables that require accurate shape uncertainty estimates would also be very useful.

  • Spin correlations should always be included in any decay. In the case of the ttZ background, the presence of a Z-veto in the experimental analyses calls for an off-shell treatment, i.e. for a full simulation of tt+dilepton production, including off-shell Z/gamma contributions.

  • The quantitative importance of tV backgrounds with a single top quark in the ATLAS and CMS analyses should be assessed in more detail.

Jan 12 ttH Combination: Systematics and correlations (Indico)

This meeting was mostly informational, focused on describing the technical details of how the systematics and correlations among systematics are handled in the ttH combinations executed at CMS and ATLAS. No recommendations or controversial points were identified.

Important future considerations include planning for the eventual LHC combination of ttH results, as well as the incorporation of ttH results in the full combination of all Higgs results at the LHC. The details of these plans were not known at the time of the meeting. We note that combined results, and their ultimate relevance to the Higgs couplings to fermions and bosons, also fall under the domain of WG2 of the HXSWG (Higgs properties).

Jan 26 Signal modeling in tHq (Indico)

Feb 2 Backgrounds and uncertainties in tHq (Indico)

Feb 6 Common meeting on tt+b-jet backgrounds to ttH(bb) (Indico)

1) Discussion on harmonization between ATLAS and CMS for 4FS modelling for tt+bb

→ Both ATLAS and CMS are looking into using 4FS MC as the nominal modelling for tt+bb

To first order, it does not matter whether this is implemented as a re-weighting of, or a replacement for, the inclusive samples used in the nominal modelling scheme. We cannot move forward on discussing systematics without this first point: if one collaboration does not use the 4FS, there is no way to make a recommendation on the correlation of 4FS systematics.

2) Function for smooth transition between tt+0b and tt+>=1b. We are in favour of having this (given the current state of the analyses). There is a danger of constraining the tt+bb normalization uncertainty using the regions with 0 additional reconstructed b-jets. Whether we apply a particle-jet pT cut of 15 or 20 GeV will have almost no impact; applying 15 GeV vs 0 GeV, however, will impact the result in certain fit configurations, especially with correlated tt+bb systematics. The less the CR is used in the fit - as CMS is already doing - the less this will matter, and this is the safest approach. If the fit model moves to only 4 b-tags, this becomes a non-issue.

During the last public meeting it was pointed out that other sources of TH systematics are much bigger. The leading one is the 4F-vs-4F comparison. It was also pointed out (from the ATLAS side) that the reweighting approach could be plagued by a sizable "residual 5F systematics" (corresponding to variations of the nominal 5F sample before reweighting). However, this residual 5F uncertainty is still dominated by MC statistics and might turn out to be irrelevant.

3) Systematics discussion

3a) 4F vs 4F comparison: Do the differences make sense? Regarding MG5 scale changes, we can follow up on the experiments' side to check how things change relative to what was used in the YR4, and hopefully something meaningful will come out.

TO-DO:
  • Recommendation for validation of the new resummation scale HT/2 in MG5: compare against YR4 results using the same settings.
  • Ask the MG5 authors for gridpack production for both ATLAS and CMS.
  • Keep in mind: YR4 settings are not recommended for physics analysis.

3b) 4F systematics: correlation scheme discussion.

→ There are no theoretical differences in 4F calculations between the different tt+>=1b sub-categories. The sub-categories only exist because of experimental cuts/selection on jets. Since we can model down to 0 GeV or close-by b-quark emission, these theory modelling systematics are smooth between the sub-categories (tt+b vs tt+2b, for example), and for this reason the systematics should be correlated in the 4F case.
→ Keep in mind: 5F and 4F simulations are both dominated by g->bb splittings, which supports correlated systematics. Note also that merged 5F tt+jets simulations are dominated by g->bb splittings.
→ One should include all relevant uncertainties related to shower g->bb splittings.

-- StefanoPozzorini - 27 Oct 2014

Topic revision: r13 - 2017-05-15 - StefanG