HH multilepton (HIG-21-YYY) Review Twiki for the Run2 analysis
Higgs Extended Conveners (Rainer and Luca) comments on v3 of AN-2020/032 (21 December 2020)
Dear Ram Krishna and HH multilepton group,
thank you very much for sending this very accomplished analysis note for this complex analysis. I enclose a few comments in the following.
Best regards & a nice holiday season
Rainer
<b><span>==================</span></b>
Comments to AN-2020/032v3
<b><span>==================</span></b>
COLOR SCHEME USED IN THE TWIKI:
Black = Higgs Convener’s comments
Blue = Response to Higgs Conveners comments
Red = Andrew Brinkerhoff's comments
Green = Response to Andrew’s comments
Orange = Item on the to-do list.
L 394: there is no such thing as a "two-prong tau decay" because of charge conservation. This must be some kind of jargon... at least explain (maybe one track is lost?)
Yes, this decay mode is not physical, but it is used in the HPS algorithm in order to recover the inefficiency in identifying and reconstructing "three-prong tau decays" when a charged track is lost. We replaced the word "decay" with "reconstruction" to clarify this. All taus considered in our analysis are required to be reconstructed in either the one-prong or the three-prong decay mode (i.e. all taus reconstructed in the two-prong mode are discarded, cf. Table 5).
Sec. 5.1 ff: should triggers not be discussed separately for each channel?
The list of triggers and the trigger section in general was directly inherited from the Run-2 Legacy ttH Multi-Lepton analysis, HIG-19-008 (Sec-5.1, AN_2019_111_V14.pdf) for practical reasons (similar lepton and tau multiplicities). We have added this citation in the text, and listed the channels for each set of triggers more explicitly. The overall trigger strategy is similar enough between channels that we feel it can be dealt with all together.
L 533ff: how do you categorize signal events with one W->tau_h nu decay?
We categorize events by the number of reconstructed "leptons" (electrons and muons) and hadronic taus, not explicitly by the Higgs or W decay mode. Thus we count the sum of all three HH decay modes (HH->4W/4tau/2W2tau) as "signal" in all our categories. In the 1 lepton + 3 tau channel, signal events where one tau comes from a W decay (i.e. 2W2tau) make up 20% of the signal events.
L 722ff: there is probably not a one-to-one association of physical decay modes to these categories. Is it practical to compute the trigger efficiency per analysis category instead of per decay mode?
The data/MC SF for trigger efficiencies are applied as function of the multiplicity of reconstructed leptons and taus. The motivation is that the choice of triggers is based on the lepton and tau multiplicity (cf. Section 5).
In our opinion, applying the trigger efficiency SF as a function of lepton and tau multiplicity has the advantage that it accounts for the correlations of the trigger efficiency uncertainties between analysis channels (e.g. the 3l, 3l+1tau, and 4l channels use the same triggers, so the uncertainty on the trigger efficiency SF is 100% correlated between them). There is also a practical reason: the trigger efficiencies for each set of triggers used in the analysis have been measured by other groups (e.g. the Tau POG, the H->tautau DESY group, and the ttH multilepton group). Our choice of triggers for the different channels, and of applying the trigger efficiency SF as a function of lepton and tau multiplicity, takes into account how these SF were measured by those groups.
Revisit. In particular, this does not address potential variation in trigger efficiency SFs for low-mass vs. high-mass signal, and whether the measurement-region kinematics match the signal region. (AWB)
The description of the data/MC SF for trigger efficiencies in AN-2020/032 v3 was indeed a bit misleading. There is in fact a rather close relation between the analysis channel and the trigger SF.
More specifically, the trigger SF are applied separately for the channels: <br/>
A) 2lss and 2l+2tau (for which the same combination of single- and double-lepton triggers is used) <br/>
B) 3l, 3l+1tau, and 4l (for which the same combination of single-, double-, and triple-lepton triggers is used) <br/>
C) 1l+3tau (for which a combination of single-lepton and lepton+tau cross-triggers is used) <br/>
D) 0l+4tau (for which the double-tau trigger is used)
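The channel-to-trigger-SF grouping A)-D) above amounts to a simple lookup. The sketch below is purely illustrative (hypothetical names, not the analysis code) and only encodes the correlation logic described in the response:

```python
# Hypothetical mapping of analysis channel -> trigger-SF group, following
# the grouping A)-D) stated above. Not the actual analysis code.
TRIGGER_SF_GROUP = {
    "2lss":    "A",  # single- + double-lepton triggers
    "2l+2tau": "A",
    "3l":      "B",  # single- + double- + triple-lepton triggers
    "3l+1tau": "B",
    "4l":      "B",
    "1l+3tau": "C",  # single-lepton + lepton+tau cross-triggers
    "0l+4tau": "D",  # double-tau trigger
}

def sf_fully_correlated(channel_a: str, channel_b: str) -> bool:
    """Channels sharing a trigger set share one (100% correlated) SF nuisance."""
    return TRIGGER_SF_GROUP[channel_a] == TRIGGER_SF_GROUP[channel_b]
```

For example, `sf_fully_correlated("3l", "4l")` is true, reflecting the 100% correlation of the trigger SF uncertainty quoted above.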
Sec 6.5: is this an approved standard procedure?
Yes. This method of fitting the visible mu+tau_h mass in Z->tautau events in data to correct the tau_h energy scale is a standard approved procedure, used by the Tau POG and the SM H->tautau subgroup since the Run-1 SM and MSSM H->tautau analyses. We have added a corresponding reference to the AN.
L 1102-1104: some numbers are missing here
Done.
Are V+jet backgrounds negligible?
The Z+jets and W+jets backgrounds contribute to any of our seven analysis channels when at least one reconstructed lepton or tau is a fake (all channels) or the charge of at least one lepton is mismeasured (Z+jets in the 2lss channel). Therefore, all background contributions from Z+jets and W+jets production are covered by the data-driven background estimation procedures described in Sections 7.1 and 7.2 of the AN. There is a small additional contribution from Z+jets in the 3l channel and from W+jets in the 2lss channel where an FSR photon is reconstructed as an electron. This contribution is small (cf. the event yield tables on pages 25 and 27 of AN-2020/032 v3) and is taken from the MC simulation.
Tab. 15: it would be convenient to see an ordered table of the impacts these uncertainties have on the sensitivity (or the limits)
Added reference to the impact plots in the table. We will add numbers for the uncertainty groups when we have the new version of our datacards (O(3 days)).
L 1169ff: can you clarify, do you train one BDT per category, and for the resonant analysis one set of BDTs per mass point? Or if not, how do you avoid it... parametric BDT? It is not really obvious e.g. from L 1180 or L 1199.
We train three BDTs per category:
1. BDT for Resonant Spin-0 (X->HH) Signal hypothesis: Parametrized by "Spin-0 particle X mass" (ranging from [250, 260, 270, 280, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 1000] in GeV) leading to a total of 18 BDT Output nodes. The BDT is trained (for any given X mass) on the signal decaying into all 3 decay modes (4W, 2W2Tau, 4Tau) along with the most dominant backgrounds for that category.
2. BDT for Resonant Spin-2 (X->HH) Signal hypothesis: Parametrized by "Spin-2 particle X mass" (ranging from [250, 260, 270, 280, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 1000] in GeV) leading to a total of 18 BDT Output nodes. The BDT is trained (for any given X mass) on the signal decaying into all 3 decay modes (4W, 2W2Tau, 4Tau) along with the most dominant backgrounds for that category.
3. BDT for Non-Resonant HH Signal hypothesis: Parametrized by the 12 "EFT Benchmark scenarios" + "Standard Model scenario" (ranging from ["BM1", "BM2", "BM3", "BM4", "BM5", "BM6", "BM7", "BM8", "BM9", "BM10", "BM11", "BM12", "SM"]) leading to a total of 13 BDT Output nodes. The BDT is trained (for any given scenario) on the signal decaying into all 3 decay modes (4W, 2W2Tau, 4Tau) along with the most dominant backgrounds for that category.
The crucial difference between the parameter "Spin-0/2 particle X mass" (used in the resonant BDTs) and "EFT Benchmark scenario" (used in the non-resonant BDT) is that the former is implemented in the BDT as an integer-valued variable (taking only one value from the list of input signal X masses, [250, 260, 270, 280, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 1000], for a fixed signal hypothesis), while the latter is implemented in the BDT as a "one-hot-encoded" variable (i.e. as a 13-component vector in which, for EFT benchmark "X" or "BMX", only the Xth component is set to 1 and all other components are set to 0).
When you say "13-component vector", is this just equivalent to 13 bools, one for each benchmark? Also, I thought the EFT benchmarks were implemented as weights. So does a single MC signal event enter the BDT training multiple times, each time with a different weight and a different EFT benchmark bool? If so, this should be clarified both in this reply and in the AN. (AWB)
Yes the "13-component vector" is essentially equivalent to 13 bools such that for a given benchmark X, only the Xth bool is set to true (and all the rest are False).
Yes, all signal MC events are reweighted to a given EFT benchmark when the BDT trains for that particular Benchmark scenario => a signal event enters into the parametrized BDT multiple times (each time with a different weight and a different EFT benchmark bool). *This will be made clear in the AN as well.*
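The two parametrizations and the per-benchmark event replication described above can be sketched as follows. This is a minimal illustration with hypothetical names; the actual BDT training framework is not shown:

```python
import numpy as np

# Illustrative sketch (not the analysis code) of the two BDT parametrizations.
MASS_POINTS = [250, 260, 270, 280, 300, 350, 400, 450, 500, 550,
               600, 650, 700, 750, 800, 850, 900, 1000]
BENCHMARKS = [f"BM{i}" for i in range(1, 13)] + ["SM"]

def resonant_features(kinematics, mass):
    # Integer-valued parameter: the X mass is appended as one extra input.
    return np.append(kinematics, float(mass))

def nonresonant_features(kinematics, benchmark):
    # One-hot encoding: a 13-component vector with a single 1 at the
    # position of the chosen benchmark (all other components are 0).
    onehot = np.zeros(len(BENCHMARKS))
    onehot[BENCHMARKS.index(benchmark)] = 1.0
    return np.concatenate([kinematics, onehot])

def replicate_event(kinematics, weight_per_benchmark):
    # A signal MC event enters the training once per benchmark, each time
    # with the benchmark-specific reweighting weight and one-hot vector.
    return [(nonresonant_features(kinematics, bm), weight_per_benchmark[bm])
            for bm in BENCHMARKS]
```

Here `replicate_event` corresponds to the statement above that each signal event enters the parametrized BDT multiple times, each time with a different weight and a different benchmark bool.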
L 1169ff: do you introduce rate parameters for each background source which you fit together with the signal strength?
We have tested this for WZ and ZZ (these are the main BGs estimated from MC). All other backgrounds are either much smaller or covered by our data-driven fake background estimation. A presentation with the preliminary results is attached. The rates are compatible with 1 and the limits only change marginally. See also here. By default we will include the CRs in the fit, but we will use the theory XS and uncertainties (with the option to switch to rate parameters if needed).
L 1208: don't you lose shape information by this rebinning which could help separating S and B and also confirm the validity of the BG model?
Some loss in shape information is indeed inevitable in channels with low background statistics. This is the price to pay in order to ensure that each bin is well populated with background (quantified by the binError/binContent < 30% condition that we describe in L1211 of the AN). The issue is that the autostats feature of the combine tool is biased in case bins are completely devoid of a relevant background due to a downward statistical fluctuation (in this case, the statistical uncertainty on the background in this bin is zero, even if the autostats feature is used).
We have checked the effect of the loss of shape information on the sensitivity of the analysis (by computing expected limits as a function of the number of bins) and find that the loss in sensitivity is acceptable. Regarding the validity of the background model, there are always a few BG-dominated bins to validate the overall BG behavior. In addition, we have control distributions for all input variables as well as for the BDT outputs in our control regions, and we will provide signal region plots of our BDT input variables. The preFit/postFit shapes only show what enters the fit; we can additionally add unrebinned BDT distributions to check the data/MC agreement if needed.
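The rebinning criterion (binError/binContent < 30% in each merged bin) could be sketched as below. This is a hedged illustration of the idea only; the actual implementation in the analysis framework may differ, e.g. in the direction of merging:

```python
import math

# Illustrative sketch of merging adjacent histogram bins until each merged
# bin satisfies binError/binContent < max_rel_err. Not the analysis code.
def rebin_edges(contents, errors, max_rel_err=0.30):
    """Return bin-edge indices; errors are added in quadrature when merging."""
    edges = [0]
    c = e2 = 0.0
    for i, (ci, ei) in enumerate(zip(contents, errors)):
        c += ci
        e2 += ei * ei
        if c > 0 and math.sqrt(e2) / c < max_rel_err:
            edges.append(i + 1)  # merged bin is well populated: close it
            c = e2 = 0.0
    # fold any leftover low-statistics bins into the last kept bin
    if c > 0 or e2 > 0:
        edges[-1] = len(contents)
    return edges
```

For example, a run of sparsely populated bins between two well-populated ones would be merged into a single bin, which is the loss of shape information discussed above.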
L 1211: do you take the MC statistics of the backgrounds rigorously into account, e.g. with the autostats feature of combine?
Suggestion: Yes. We added this to the AN referencing the combine feature.
Fig. 19: what data are fitted here... is it all the CR and the non-blinded part of the BDT distribution?
The goodness-of-fit (GoF) plots show the quality of the fit of all bins in the datacards for the signal regions of the 7 channels (7 plots) and for the combination of all bins of all channels (8th plot).
We will add 2 additional GoF plots for the WZ and ZZ control regions to the next version of the AN.
The HH signal is left freely floating when making the GoF plots. We expect that the GoF value does not change, regardless of whether HH signal is present in the data or not.
Fig. 19: can you do a CR-only fit (without unblinding the SR)? What is the result of the rate parameters for the backgrounds?
We have fitted the rate of WZ (ZZ) background in the WZ (ZZ) control region. The value of the WZ (ZZ) cross section obtained from the CR fit is 1.026 +/- 0.0993 (1.06 +/- 0.105) times the SM expectation, i.e. the WZ and ZZ production rates obtained from the CR fit are compatible with the SM expectation within uncertainties. The GoF plots for the WZ and ZZ control regions will be included in the next version of the AN.
WZ/ZZ Control Region plots are available here:
CRUpdate.pdf
Fig. 20-21: how do you correct for the BR's in all your different categories?
Since we always look at the combination of different HH signals, we only look at the inclusive HH cross section, considering all relevant signals in each channel. The y-axis is always the inclusive H(125)H(125) cross section, assuming SM Higgs branching fractions.
Fig. 20-21:do you understand why 3l_0tau gives mostly the best sensitivity?
Next to 2lss, 3l_0tau is the channel with the highest signal acceptance. Since our analysis is limited more by low signal counts than by high backgrounds, it is no surprise that the channels containing the most signal (and background) perform comparatively well. Judging from the signal-to-background ratio, we expect the best limits for 1l+3tau (low signal but extremely low background), 2lss, and 3l_0tau (higher background but also much higher signal). A follow-up question would be why 2lss does not perform as well as 3l_0tau. This can be traced back to differences in BDT performance, but also to the dominant backgrounds in those channels: 3l_0tau has WZ as its most important background, while 2lss is fake dominated, and the fake background has an overall larger uncertainty.
Fig. 22 right: why is there no 3l_0tau curve here?
The plot does contain the 3l_0tau curve, only that it is labelled "3l" in the legend; it is mostly hidden behind the 1l+3tau curve except in the interference region. We will check the AN for consistent naming in the plots and the text.
Fig. 22 right: it would be good to use a consistent color scheme for the categories across all plots
Good point. The color scheme used here originates in the inference framework used for the combination, but we will try to change that for the next iteration of plots.
Fig. 22 right: do you understand qualitatively why the shapes are so different across the categories?
As can be seen in Figures 20-21, the sensitivity of each channel varies greatly across the mHH spectrum. This is also reflected in the non-resonant case. Extremely low/high kl scenarios correspond to regions where the triangle diagram is dominant, i.e. events in the low-mHH region contribute the most there, so channels like 3l+1tau and 4l_0tau perform better than in the interference region, where the box diagram (and thus a broad mHH contribution extending to high mHH) contributes more and channels like 3l_0tau give the best sensitivity. Scenarios with negative kappa_lambda have a positive interference between the box and triangle diagrams and thus, depending on the sensitivity in the box/triangle region of phase space, a different performance than those with positive kappa_lambda. How large this difference is depends on the relative performance in the box/triangle phase space region (high/low mHH). E.g. if the performance as a function of mHH is flat, with comparatively good performance at low mHH (box-like) as for 3l+1tau, the difference between high and low kappa_lambda (i.e. box + triangle and box - triangle) should be smaller, and that is what we see.
Fig. 25: you might consider adding a "Box" bin (with only the top quark box diagram)
We saw no need to do this, since the box (i.e. kappa_lambda == 0) is also included in the NLO kappa_lambda scan, so there is no physics need for it. Including the SM, however, makes sense as a cross-check of NLO vs. LO performance; since we trained on LO, these two should, except for XS differences, be mostly close. Implementing the box would be some work, since we are using the parametrized BDT here and always evaluate the corresponding node. It would be technically simple but take some time with little gain. A faster alternative would be to evaluate the box with the BDT output corresponding to the kinematically closest BM scenario (BM4), or we could add in the text that the result for BM4 is closest to the box case (but it is not the same, see: https://arxiv.org/pdf/1507.02245.pdf).
Fig. 26-28: one cannot really appreciate the data/MC agreement in absence of the systematic uncertainty bands, can you add them? At face value, there seem to be quite some mismodelings, e.g. in Fig. 26 right and Fig. 27 middle row left
Systematic band has been added to the figures. All the plots in Fig. 27 and 28 from AN version-3 are now split by years of data taking, so in the latest AN version they are named as Fig. 27 to 29 and Fig. 30 to 31.
Should re-check figure numbers with next AN version pushed to CADI. (AWB)
No cut on jet multiplicity is used in the event selection for the WZ control region. The plots in Fig. 27 to 31 can be divided into the following three categories, depending on the number of jets in the event. <br/>
A) Plots of kinematic variables not related to jets (e.g. m(3l)) are filled for all selected events. <br/>
B) Plots of kinematic variables related to one or more jets (e.g. deltaR(lepton, jets)) are filled for events with >=1 jets. <br/>
C) The m(3l+2j) plot is filled for events with >=2 jets. For events with >2 jets, the jet pair with mass closest to the W-boson mass is chosen for the m(3l+2j) plot.
The plots for the 2016 data show satisfactory Data/MC agreement. However, the plots for the 2017 and 2018 data show some Data/MC disagreement. We discuss this in detail in the following for the 2017 dataset; the same arguments, with somewhat less severity, apply to the 2018 dataset.
In the 2017 dataset, the numbers of events with "0 jets", "1 jet", "2 jets" and "3 jets" are 2200, 1650, 750 and 300 (top row, middle plot in Fig. 27 in the latest AN version), whereas the Data/MC deficits for "<2 jets", "2 jets" and "3 jets" events are <10%, 15% and 40%, respectively. Although the plots from categories 'B' and 'C' above show some tension in the Data/MC agreement, ">1 jets" events contribute less than 25% of the WZ control region, which is dominated by "<2 jets" events. So the overall Data/MC disagreement in the WZ control region is <15%.
The mismodelling of ">=2 jets" events is still to be investigated.
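As a quick arithmetic cross-check of the numbers quoted above (illustrative only; the <10% deficit is used as an upper estimate for the "<2 jets" bins):

```python
# 2017 WZ-CR yields per jet multiplicity and Data/MC deficits quoted above.
yields  = {"0j": 2200, "1j": 1650, "2j": 750, "3j": 300}
deficit = {"0j": 0.10, "1j": 0.10, "2j": 0.15, "3j": 0.40}  # upper estimates

total = sum(yields.values())                                 # 4900 events
frac_ge2j = (yields["2j"] + yields["3j"]) / total            # ~0.21, i.e. <25%
overall = sum(yields[k] * deficit[k] for k in yields) / total  # ~0.13, i.e. <15%
```

The yield-weighted average deficit indeed comes out below 15%, consistent with the statement above.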
Fig. 34: quite some trends in muon eta
%Green% The plots shown in the AN are a bit misleading, since they are on a log scale, summed over all three years, and summed over all muons in the event. We have investigated this in more detail here:
MuonUpdate.pdf. We do not see big problems with the modeling of the muon eta: the only difference we see is in the b-inverted CR, where the BG distribution is actually symmetric while the observed data are not. This would suggest problems with the muon reconstruction, which we should then also see in other plots and CRs. As an additional comment, the muon eta does not enter our analysis directly, e.g. as a BDT input variable, and thus does not enter the analysis result itself. The opening angles dR_ll and dR_ltau, which do enter the BDT, seem to be modeled well enough, so a possible problem should have a negligible influence on the analysis result. We have added plots for the leading/subleading electron/muon as well as for the sub-subleading lepton to the AN, replacing the current plots summed over all muons/electrons.
Fig. 94, 300 GeV: why is the signal peak so wide?
The signal is more difficult to classify at lower masses, where it is more similar to the background; at higher masses it is easier to classify because the decay products are boosted.
L 1660: what is the purpose of this "oversampling"... is it supposed to act like a weight?
(We believe you refer to L1560, not L1660) Oversampling is done so that the background statistics seen by the BDTs for a fixed signal hypothesis (i.e. fixed "signal HH mass" for the resonant BDT and "EFT BenchMark Scenario" for the non-resonant BDT) remains constant. This protects against statistical fluctuations in the decision boundaries (and hence raw BDT outputs) due to random migration of background events which would otherwise have been caused if we assigned "signal HH mass" ("EFT BenchMark Scenario") to each background event randomly for the resonant (non-resonant) BDT training.
It seems he's not clear on the definition of "oversampling", so we should clarify. Also, we absolutely have to explain how we guard against over-training, when we're sending every background MC event into the BDT training 18 times (in the resonant case), as if they were distinct events! (AWB)
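The difference between randomly assigning a hypothesis to each background event and the oversampling described above can be sketched as follows (illustrative only; the actual training framework is not shown):

```python
import random

# Illustrative contrast between the two options discussed above.
MASSES = [250, 260, 270, 280, 300, 350, 400, 450, 500, 550,
          600, 650, 700, 750, 800, 850, 900, 1000]

def random_assignment(n_bkg, rng):
    # Each background event gets ONE random mass hypothesis:
    # the per-mass background statistics fluctuate event by event.
    counts = {m: 0 for m in MASSES}
    for _ in range(n_bkg):
        counts[rng.choice(MASSES)] += 1
    return counts

def oversampling(n_bkg):
    # Each background event enters the training once PER mass hypothesis:
    # the background statistics seen for every fixed hypothesis are constant.
    return {m: n_bkg for m in MASSES}
```

In the oversampled case every mass node sees the full background sample, which is the protection against fluctuating decision boundaries described above (at the price of reusing each event 18 times, hence the over-training concern raised in red).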
Fig. 114: do you understand why parameters 2 and 4 are constrained? There are also quite a few one-sided impacts
We also noticed this; it seems we are overestimating our fake BG uncertainties. We have an overall normalization uncertainty (CMS_multilepton_fakes) that we have now agreed to set to 30%; for this version of the plot it was still 50% (the more conservative estimate used in parts of ttH multilepton; 30% is more realistic, as also used in ttH multilepton). Also, an additional uncertainty for 3l+1tau and 1l+3tau is not yet implemented. Since all these fake BG uncertainties influence each other, we will have to take another look with the next iteration of limits.
Fig. 131ff: these prefit plots are hard to read because of the hatching style
Noted down for the next iteration of these plots.
---++ Higgs PAG Conveners (Maria and Nick) comments on v3 of AN-2020/032 (12 January 2021)
<b><span>==================</span></b>
Comments to AN-2020/032v3
<b><span>==================</span></b>
- To focus the review: can you clarify for us how the techniques and methods (excluding the signal extraction) overlap with HIG-19-008 ? We understand the framework is the same: can you highlight the changes, if any, in object / background modelling etc ?
The method for estimating the fakes and the flips (in the 2lss channel) background are identical to the methods used in the ttH multilepton+tau analysis (HIG-19-008).
The main difference is that cuts on the ttH lepton MVA have been relaxed in this analysis, in order to increase the lepton efficiency.
The relaxed ttH lepton MVA cut then required us to remeasure the lepton efficiencies, the lepton fake rates, and the electron charge misidentification rate.
In addition, the jet->tau fake rates used in this analysis have been measured in Z->mumu+jets events, whereas the jet->tau fake rates used in the ttH multilepton+tau analysis were measured in tt->emu+jets events. The motivation for measuring the jet->tau fake rates in Z->mumu+jets events is that most of the jet->tau fake background in this analysis is due to light-quark and gluon jets, while in the ttH multilepton+tau analysis there were also jet->tau fakes from bottom-quark jets.
-In particular, since this is yet another analysis that uses the fake rate method to model the taus, and you only consider variations in eta and pt, please discuss why the more complicated modellings used in Htautau are not necessary. Note that so far virtually all tau analysis unblinded with full Run2 data have had to do changes to the FR approach after unblinding. Let's avoid that :). If the method is the one used and proven to work for the tau channels in HIG-19-008, just point us to the appropriate comparisons/checks done to validate for this phase space.
The fake rate method developed for the SM H->tautau analysis (HIG-15-007) needs to use a more complicated procedure for the jet->tau fake background estimation because the event statistics is much higher, which makes small non-closure effects more relevant. The event statistics in the tau channels in the ttH multilepton+tau analysis (HIG-19-008) is lower by 2-3 orders of magnitude and for the lower event statistics it is sufficient to use the simpler procedures for estimating the jet->tau fake background that were used during Run 1. The event statistics of the jet->tau fake background in the HH->multilepton analysis is smaller than in the ttH multilepton+tau analysis.
What we have done to validate the simpler procedure for estimating the jet->tau fake background that is used in the ttH multilepton+tau analysis is: <br/>
1) run MC with tight tau selection applied to get an estimate of the jet->tau fake background in the signal region <br/>
2) run MC with loose ("fakeable") tau selection applied and weight these events by the fake rates <br/>
The difference between 1 and 2 is taken as "MC closure"-related systematic uncertainty and is described in L1231 in AN-2019/111 v14. Please let us know if this answers your question.
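The closure test in points 1) and 2) can be sketched as below. This is a minimal sketch assuming the standard f/(1-f) fake-factor weighting of loose-but-not-tight objects; function names are hypothetical and the actual implementation is the one in AN-2019/111:

```python
# Hedged sketch of the jet->tau fake-background closure test. Hypothetical
# names; assumes the standard fake-factor weighting f/(1-f).
def fake_weight(f):
    """Weight applied to an object passing the loose but failing the tight selection."""
    return f / (1.0 - f)

def predicted_fakes(loose_not_tight_events, fake_rate):
    """2): run MC with the loose ("fakeable") selection, weight by the fake rates."""
    return sum(fake_weight(fake_rate(ev)) for ev in loose_not_tight_events)

def observed_fakes(tight_fake_events):
    """1): count MC events passing the tight selection where the tau is a jet fake."""
    return float(len(tight_fake_events))

def mc_closure_uncertainty(observed, predicted):
    """Relative non-closure, taken as the 'MC closure' systematic uncertainty."""
    return abs(observed - predicted) / observed
```

The relative difference between the direct MC estimate 1) and the fake-rate-weighted estimate 2) is then assigned as the "MC closure" systematic, as described in L1231 of AN-2019/111 v14.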
- The analysis uses a particularly sophisticated MVA strategy (making use of multiple output nodes), and we think there is too little information about the data:MC agreement for the inputs to the MVA and the output of the MVA. The signal extraction section in particular is lacking in that it doesn't show clearly what is included in the likelihood - e.g. you mention that event yields in dedicated control regions are included, but those yields are not shown in this section (we assume they are the yields from the distributions shown in the control region data:MC figures), and it's not clear which control regions are directly used to constrain which backgrounds and how this enters the likelihood. We think it would therefore be useful to show the following:
1. Data:MC distributions for the inputs to the MVAs. You have the distributions in signal and background, but there is nothing (for the signal selections at least; you do show some for control regions) showing the data - partially blinded if there are single variables which are very sensitive, such as M_HH, though it will be important to show that in the end, since part of the selling point of this analysis is the sensitivity to particular EFT scenarios from acceptance at low M_HH.
Torben, Tobias, Ram will make plots
2. For appendix K - you show the prefit plots for a couple of mass points in the resonant case (for spin-0 and spin-2) and SM signal, but can you also show them for a few benchmark scenarios in the EFT and also a few for different points in the klambda, kt, Cv … parameter space? For the kt,Cv… the background won’t change (right?) but the signal distribution will.
Torben will add plots for EFT benchmarks once new datacards are ready
3. Some examples of shape uncertainties on the backgrounds (also for the specific benchmarks being shown) for the dominant sources so that it's easier to see how the likelihood depends on the systematics.
Laurits will make plots for the 5 most important shape systematics in the 2lss and 3l channels
4. Some list of the parameters allowed to float in the analysis aside from the nuisances shown in the impacts and their pre(post) fit results excluding the signal from the fit to an Asimov dataset (to see the constraints). It could be those are included in the impacts and we missed them.
- Regarding the MVA, aside from showing the distributions (1D) in data and MC as requested above, how can we be really sure of the modelling of the background in the tails of the distributions? In the tails, the control regions start to run out of statistics. Did you check things like correlation coefficients between 2 variables in data and MC backgrounds or anything else to convince yourself that everything the BDT will use is sufficiently well modeled?
The goodness-of-fit (GoF) plots (Fig. 19 in AN-2020/032 v3) are meant to quantify the level of data/MC agreement. We interpret the fact that the GoF value for the data is within the bulk of the toys distribution as evidence that there is no issue in the modelling of the BDT output distribution. We did not make plots of the 2d correlations for data and MC. Do the GoF plots answer your question, or do you still want to see the 2d correlation plots? In case you prefer to see the 2d correlation plots, we propose to make them for the 2lss and 3l channels, as these are the channels with the highest event statistics (providing about 3000 events per channel in the full Run 2 data). There would be about 50 2d correlation plots per channel.
- Minor plotting note: please modify the macros so that when there is no data you do not get a “-1” in the ratio, it is very misleading to look at the ratios as they are now. See for instance figure 63 middle, and make sure error bars are appropriate for 0 entries (not using sqrt{n} ).
We will fix the ratio plots when we include the plots into the PAS/paper. We prefer to keep the current style of these plots for the AN, since the AN is only circulated within CMS.
- On readability & presentation of the signal extraction method: there are many channels, so it is difficult to find a synthetic figure to document the signal extraction. However it is very odd to not have at least some of the figures you fit in the main body of the AN (one needs to go to the very end of the backup to see the distributions, the background proportion, etc). Furthermore, we assume you will eventually want to approve some of the plots. The main point to be fixed is what is discussed above regarding showing the performance and robustness of the method, but after that is established, a few cleaned up figures should be included in the main body and discussed there. Style wise, they also need to be reworked to be made more readable, right now they are really very difficult to interpret.
Ram will move plots from appendix to the main part of the AN and write the corresponding text. Once the new datacards are ready, Torben will make plots for non-resonant HH with SM kinematics for the full Run 2 dataset (by hadd'ing the 2016+2017+2018 datacards).
-Finally, below are some specific (although minor) comments on the AN text:
Ram will polish the text in the AN and add responses below
1. In the abstract it looks like there is an error and it should read “HH->WWWW, WWtt, tttt”
The latest AN draft has this error removed.
2. L590 - From the table its clear given the names of the variables that the invariant mass is from a pair of loose leptons, but could you explicitly add it to the sentence here for clarity?
Added it to the latest AN draft.
3. L681 - Sorry if we missed this but what is defined as a Z-boson candidate here? Is it also any 2 loose leptons with opposite sign, same flavour etc? Would be good to be specific as it is in L590
Tobias will add this in the AN
4. L887-889 - It's not clear what is being discussed here, since if the electron or muon is mis-identified as a tau_h, wouldn't it then end up in a different signal region, which could then change the relative acceptance between the different categories (if this is different in MC and data, it could make the signal model incorrect)? Presumably this is in any case very unlikely - do you have an estimate for how often this happens?
The purpose of L887-889 is to make two points: <br/>
A) The e->tau and mu->tau fakes are considered irreducible backgrounds, i.e. they are modeled using the MC simulation and are not included in the "fakes" background that is obtained from data <br/>
B) The e->tau and mu->tau fakes background is small <br/>
We have rephrased the text in Section 7 of the AN to make these two points more clear. The text will be included in the next version of the AN. Please let us know in case the rephrased text is still not clear.
We find it a bit difficult to quantify what "small" means, because the contribution of e->tau and mu->tau fakes depends on the channel and also on the type of signal or background. To give you an idea of this contribution, we find that in 2-3% of HH->WWtautau events reconstructed in the 1l+3tau channel, one of the 3 reconstructed taus is due to an e->tau or to a mu->tau fake. We also find that the level of e->tau and mu->tau fakes is about one order of magnitude smaller than the level of jet->tau fakes in HH->WWtautau events reconstructed in the 1l+3tau channel.
5. L1063 - These uncertainties seem to be just a propagation of the JEC (ok, and the JES too), so isn't this double counting the JEC uncertainties described in L1013? We assume these are not split into the same sources, as you mention the total uncertainty. Why don't you just propagate the same 11 sources to the MET and then use the same set of JEC nuisance parameters?
Need an explanation.
6. L1133-1136 - are these fits shown somewhere? We possibly missed them.
Siddhesh will provide these
7. L1175 - this just means that the BDT response depends on the values of the resonant mass / EFT scenario right?
Yes
8. L1196 - You aren't extrapolating the limits, right? But really you are modifying the contributions of the different signal processes (box, interference, ...) and then re-deriving the limits at the new point. Can you clarify this?
Yes. We are modifying the contributions of the three (in the case of VBF, six) selected signal samples. The original physics model was described here: pres
The newest version of the model can be found in the inference FW here: gitlab