Full status report comments


1 - Check leading and subleading AK8 eta, phi distributions for pre- and post-HEM eras of 2018 (ratio would be the best)

We find no noticeable change in the above-mentioned distributions before and after applying the HEM veto to 2018 MC events in CR. We have also checked them for SR and no large deviations have been observed.

2 - Check effect of prefiring weights on jet eta distributions

No noticeable trend is found in the above distributions. We have also checked them for SR and no large deviations have been observed.

3 - Check how many hadronic W's are left in ttbar after deltaR cut.


| | Total tt-bar events | W matched to sub-lead ak8 |
| no dR cut | | |
| dR>0.8 cut applied to sub-lead ak8 jets [rel. eff.] | 457.1 [62.1%] | 136.8 [57.7%] |

The event yields listed in the table are obtained by weighting and summing the MC samples for the different eras. We find that the dR cut rejects ~42% of the boosted W-jets matched to the sub-lead ak8 jet.
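As a quick arithmetic check of the quoted rejection, the fraction removed by the cut is one minus the relative efficiency. The sketch below uses only the two relative efficiencies given in the table; the variable names are chosen here for illustration.

```python
# Relative efficiencies of the dR > 0.8 cut, as quoted in the table.
rel_eff_total = 0.621      # all tt-bar events
rel_eff_w_matched = 0.577  # events with a W matched to the sub-lead ak8 jet

# Fraction of events removed by the cut in each category.
rejection_total = 1.0 - rel_eff_total
rejection_w_matched = 1.0 - rel_eff_w_matched

print(f"rejection (all tt-bar): {rejection_total:.1%}")
print(f"rejection (W-matched):  {rejection_w_matched:.1%}")
```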

4 - Why is the deltaR cut so inefficient for the signal? Does your signal contain all Z decays? Assuming you are applying the cut only with respect to the subleading jet, it seems the probability should be just the BF to b's + some factor for mistagged c's for one Z boson, which is still quite short of 50% (?)

| T5 1700 sample | Total signal yield | Signal yield where the Z is matched to lead ak8 jet | Signal yield where the Z is matched to sub-lead ak8 jet |
| No dR cut | | | |
| dR>0.8 cut on sub-lead ak8 jet | | | |
| Relative efficiency | | | |
The signal yield with the dR cut that was presented at the full status report was misleading; we have updated it in the table above. The relative efficiency of the cut for signal is ~72%. For the second part of the question, we have verified that almost all signal events contain a Z (~97% and ~89% matching for the lead and sub-lead ak8 jet, respectively). The Z->bb branching fraction is ~20%, and another ~10% comes from Z decays to other quarks.

5 - Taking the sqrt(N_sbd) gives only a very rough estimate of the uncertainty, why not simply propagate the uncertainty on the line parameters from the fit? E.g. can be done by sampling the space of allowed lines, plotting the distribution of resulting yields, and taking 1 sigma.

[Figures: two pairs of panels, Data and MC]

For data Nbkg = 321.1 +/- 17.9

For MC Nbkg = 372.7 +/- 19.7

The uncertainty on the background yield in the SR estimated from the toy exercise is consistent with, though somewhat higher than, our previous estimate of √NSB = 13.0 (MC) and 12.3 (data). Nevertheless, we are going to take the uncertainty from the toy exercise.
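The toy exercise suggested in the question can be sketched as follows: draw line parameters from the fitted covariance, integrate each sampled line over the signal window, and take the spread of the resulting yields. This is a minimal illustration and not the analysis code. The fit values, covariance, mass window and bin width below are hypothetical placeholders, and the function names are invented for this example.

```python
import math
import random
import statistics

def sr_yield(a, b, m_lo, m_hi, bin_width):
    """Integrate the line a + b*m over [m_lo, m_hi], in units of events."""
    return (a * (m_hi - m_lo) + 0.5 * b * (m_hi**2 - m_lo**2)) / bin_width

def sample_line_params(a, b, var_a, var_b, cov_ab, rng):
    """Draw (a, b) from a bivariate Gaussian using a 2x2 Cholesky factor."""
    l11 = math.sqrt(var_a)
    l21 = cov_ab / l11
    l22 = math.sqrt(var_b - l21**2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return a + l11 * z1, b + l21 * z1 + l22 * z2

# Hypothetical sideband-fit result (NOT the values from this analysis).
A_HAT, B_HAT = 60.0, -0.25                  # intercept, slope
VAR_A, VAR_B, COV_AB = 16.0, 4.0e-4, -0.072  # covariance matrix entries
M_LO, M_HI, BIN_WIDTH = 70.0, 100.0, 5.0     # signal window and bin width (GeV)

rng = random.Random(12345)
toys = []
for _ in range(20000):
    a, b = sample_line_params(A_HAT, B_HAT, VAR_A, VAR_B, COV_AB, rng)
    toys.append(sr_yield(a, b, M_LO, M_HI, BIN_WIDTH))

n_bkg = sr_yield(A_HAT, B_HAT, M_LO, M_HI, BIN_WIDTH)
sigma_toys = statistics.stdev(toys)  # 1 sigma spread of the toy yields

# Cross-check: linear error propagation with the same covariance.
ja = (M_HI - M_LO) / BIN_WIDTH
jb = 0.5 * (M_HI**2 - M_LO**2) / BIN_WIDTH
sigma_lin = math.sqrt(ja * ja * VAR_A + jb * jb * VAR_B + 2.0 * ja * jb * COV_AB)

print(f"Nbkg = {n_bkg:.1f} +/- {sigma_toys:.1f} (toys), {sigma_lin:.1f} (linear)")
```

Because the yield is linear in the line parameters, the toy spread and the analytic propagation should agree; the toy approach becomes more useful once the background model is no longer linear in its parameters.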

6 - Redo MC validation region plots with full background composition for the respective VR, not just the single process of interest

[Figure panels: Z(vv)+jets topology, 1-lepton topology]

These MC plots have been updated to include the backgrounds that were previously missing for 2016, 2017 and 2018.

7 - Try loosening the selection in the validation regions to get better statistics at high MET, e.g. can remove b-veto, dPhi cuts and track veto in photon and dilepton samples; also, could consider removing dPhi and track veto in 1L control sample?

Currently looking at the nTuple level in order to see how far we can loosen these cuts.

8 - Write a list of signal systematics that will be applied for final interpretations

The following list is adopted from the boosted Higgs paper (work in progress):

MC statistics: The signal MC sample statistical uncertainty is 2-4%.

Luminosity: The recommended uncertainty is 2.5% (2016), 2.3% (2017) and 2.5% (2018).

*Isolated track veto: A flat uncertainty of 2% is assigned to the signal samples to account for possible data-MC differences. Need to check.

*Trigger efficiency: The effect of uncertainty on the signal yield is about 2%. Need to check.

*Pileup reweighting: The sensitivity to the pileup (PU) distribution was studied for various benchmark signal models by comparing events with NPV < 20 (low PU) vs. NPV > 20 (high PU). Need to check.

*ISR: An ISR correction is derived from tt-bar events, with a selection requiring two leptons (electrons or muons) and two b-tagged jets, implying that any other jets in the event arise from ISR. The correction factors are 1.000, 0.920, 0.821, 0.715, 0.662, 0.561, 0.511 for NISR = 0, 1, 2, 3, 4, 5, 6+. The corrections are applied to the simulated signal jet samples with an additional normalization factor, typically 1.15 (depending on the signal model), to ensure the overall cross section of the sample remains constant. The systematic uncertainty in these corrections is chosen to be half of the deviation from unity for each correction factor. The effect on the yield ranges from 0.01%, with the largest effect at high Emiss. Need to check.

*Scales: The uncertainty is calculated using an envelope of weights from varying the renormalization and factorization scales, μR and μF, by factors of 2.0 and 0.5. The effect on the yield is less than 0.1%. Need to check.

Jet Energy Corrections: The jet energy corrections (JECs) are varied using the pT- and η-dependent jet energy scale uncertainties from the official database. These variations are propagated into the various jet-dependent variables, including HT, Emiss, ΔΦ(Emiss, ji). The overall effect is less than 1%. ( SUSRun2Legacy)

Jet Energy Resolution: The jet momenta in MC samples are smeared to match the jet energy resolution in data. The smearing factors are varied according to the uncertainties on the jet energy resolution measurements. These variations are propagated into the various jet-dependent variables, including HT, Emiss, ΔΦ(Emiss, ji). The overall effect ranges from 0.04% to 0.2% across the 2016-2018 datasets. ( JER)

*PDFs: The PDF4LHC prescription for the uncertainty on the total cross section is included as ±1σ bands in the results plots. No additional uncertainty is considered for the uncertainty in the acceptance due to PDFs, as per SUSY group recommendation. Need to check.
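The ISR item in the list above can be illustrated with a short sketch that maps the number of ISR jets to a nominal weight and its up/down variations, taking half of the deviation from unity as the uncertainty. The correction factors and the typical normalization of 1.15 are the ones quoted in the text; the function name and the choice to reuse the same normalization for the varied weights are simplifying assumptions made here.

```python
# Correction factors quoted in the text for N_ISR = 0, 1, 2, 3, 4, 5, 6+.
ISR_FACTORS = [1.000, 0.920, 0.821, 0.715, 0.662, 0.561, 0.511]

# Typical normalization quoted in the text; it depends on the signal model.
NORM = 1.15

def isr_weight(n_isr, norm=NORM):
    """Return (nominal, up, down) ISR weights for an event with n_isr ISR jets.

    Assumption for this sketch: the same normalization factor is applied to
    the nominal and the varied weights.
    """
    if n_isr < 0:
        raise ValueError("n_isr must be non-negative")
    factor = ISR_FACTORS[min(n_isr, len(ISR_FACTORS) - 1)]
    # Systematic uncertainty: half of the deviation from unity.
    unc = 0.5 * abs(1.0 - factor)
    return norm * factor, norm * (factor + unc), norm * (factor - unc)

for n in (0, 2, 8):
    nominal, up, down = isr_weight(n)
    print(f"N_ISR = {n}: nominal = {nominal:.3f}, up = {up:.3f}, down = {down:.3f}")
```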

9 - Have you performed a signal injection test to check for bias?

The full sig+bkg fit was used for the previous choice of selection criteria. We are currently updating this part with the new choice of jet mass SR and SB. The previous study can be found here:


10 - Measure ttbar in single lepton control region after removing deltaR cut

[Figure panels: Data, MC]

We observe the real W peak in the lead-jet AK8 softdrop mass distributions for ttbar samples in both data and MC. The distributions can be modeled with a Gaussian signal PDF plus a first-order Chebyshev polynomial background PDF.

NSig(MC) = 502.6 +/- 10.3 (fit)

NSig(Data) = 336.7 +/- 8.4 (fit)
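The shape of the model described above (Gaussian signal plus first-order Chebyshev background) can be sketched as an expected event density in the jet mass. This is only an illustration of the functional form, not the fit used in the analysis: the mass range, peak position, width, background yield and slope are hypothetical, and only the MC signal yield of 502.6 is taken from the text.

```python
import math

def gaussian_pdf(m, mean, sigma):
    """Unit-normalized Gaussian in the mass m."""
    z = (m - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def cheb1_pdf(m, c1, m_lo, m_hi):
    """First-order Chebyshev shape 1 + c1*T1(x), normalized on [m_lo, m_hi].

    The mass is mapped linearly to x in [-1, 1]; |c1| <= 1 keeps it positive.
    """
    x = (2.0 * m - (m_lo + m_hi)) / (m_hi - m_lo)
    return (1.0 + c1 * x) / (m_hi - m_lo)

def expected_density(m, n_sig, mean, sigma, n_bkg, c1, m_lo, m_hi):
    """Expected events per GeV: signal Gaussian plus linear background."""
    return n_sig * gaussian_pdf(m, mean, sigma) + n_bkg * cheb1_pdf(m, c1, m_lo, m_hi)

# Hypothetical parameters, except N_SIG which is the MC value quoted in the text.
N_SIG, MEAN, SIGMA = 502.6, 82.0, 8.0
N_BKG, C1 = 1000.0, -0.3
M_LO, M_HI = 50.0, 150.0

# Midpoint-rule integral over the range as a normalization check.
steps = 2000
width = (M_HI - M_LO) / steps
total = sum(
    expected_density(M_LO + (i + 0.5) * width, N_SIG, MEAN, SIGMA, N_BKG, C1, M_LO, M_HI)
    for i in range(steps)
) * width

print(f"integrated yield = {total:.1f}, N_sig + N_bkg = {N_SIG + N_BKG:.1f}")
```

The Gaussian here is normalized over the full real line rather than over the fit range, so the integrated yield falls slightly below N_sig + N_bkg when the tails extend outside the window.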

AN comments


***General comments:

- check that you are consistent in the text about your cuts, sometimes it says HT>500 (e.g. trigger section), then it says 400 GeV in the Hadronic baseline, similarly for other cuts; e.g. cutflow says AK8 cut is 300 GeV. Mass window is listed up to 200 in one place, but 140 later.

- in plot axis labels and captions, when plotting jets always specify exactly what is being plotted, e.g. is it one entry per event for the leading-pT jet in all plots?

- lots of plots are out of order with respect to the text, which makes it hard to read.

- you have two sidebands, and call them both "sideband" most of the time, making it very difficult to follow the text when one is first getting familiar with the analysis. Please have distinct labels and define them explicitly early on.

- you focus a lot on Z, but your analysis is likely also sensitive to W bosons. It would be useful to the reader to discuss this in the text even if you only show interpretations with Z-bosons.

***Line-by-line comments

section 5.1:

- BadChargedHadronFilter should not be used esp. for events with high pT jets, see MET twiki for more info:


section 5:

table 10: add rare backgrounds

line 145: you say you are using CHS for your AK8 jets. The recommendation for pruned mass is to use PUPPI jets. Is it different for softdrop mass?

line 295:

- you should explain why the veto is w.r.t. subleading jet. Is it because the leading is usually ISR in the background?

- also, what signal topologies are you referring to here? what two jets are widely separated? Please expand on this, plots based on MC would also be helpful.

line 300: Probably very few of your events contain two vector bosons. What do you mean with this sentence: "...the selection of at least two AK8 jets ensures that most of the final state events contain two Z-bosons."

figure 3: labels on the top row should be improved. The caption is also confusing. Do the left and right mean different things for the top and bottom rows? Just based on the plots in the top row, it looks like the top right is the subleading jet and the top left is the leading jet. If the caption is indeed correct why does vetoing b-jets make the pt spectrum softer? Also this doesn't seem to be baseline without pt requirements; in both plots there are no jets below 200 GeV. It might be instructive for people that aren't familiar with these tools to plot the jet mass down to 10 GeV or something.

line 312: "In particular, since most SM processes do not produce hadronically decaying bosons, the jet masses will typically be below our baseline selection of 50 GeV, even if they have large pT ." This statement is not true. W+jets and Z+jets have reasonably large cross sections and they produce a lot of hadronic decays. However, in a high-MET phase space, it is rare for MET and a hadronic Z to be present, you either have to fake MET or fake a Z-tag.

cutflow table (table 10): can you show this for each year separately?

figure 5: isn't this identical to figure 2?

section 5.3: A discussion about why you are not applying any substructure cut would be useful. People in the know will find this strange. It's definitely worth a few paragraphs and a plot to help the review go smoother.

Figure 6: first plot (without fits) could appear earlier in figure 3.

line 338: Figure 6 doesn’t show anything about ETmiss dependence.

line 342: "The figure [6] also shows that the precision of the mass fit at high EmissT is lower in the simulation because of limited statistics." -- it does? this plot just shows two different mass ranges... Do you mean figure 7?

line 349: In the 1L regions, is the lepton cleaned from the AK8 jet collection? It is not immediately obvious which choice is correct, possibly comparing the jet mass distributions in tt MC between 0L and 1L with either option could tell you which one is more similar to the SR.

About the validation regions, you need some supplemental information to justify that these validation regions are representative of the signal regions. Also, some basic data/MC comparisons for events in the sideband in the signal region and in the validation region are necessary to make sure that there aren't any egregious data quality issues.

Figure 7 is not discussed anywhere.

Section 6: figure 8 is missing.

figure 9: can you show the individual fits for different gluino masses? You say in the text that the mean doesn't vary, but can you show the plot? What do you do with the BW parameters, are they fixed to PDG values? Why do we care about this?

lines 394-396: the bias shown is compatible with 0; it does not show that there is a 2% bias. This applies to both MC and data.

figure 10: the uncertainties here don't seem to correspond to the uncertainties in the black points. Maybe this fit didn't converge?

figure 11: did you fit these over the full range or just to the sideband? Did you draw toys from a fitted distribution or from sampling the MC directly? Is the bias on the total background predicted?

Section 6.2: Show some basic distributions to prove that these samples really are representative of your signal region events. For example, you have a statement like this: "The validation regions are enriched in the main background processes", but no plot to back this up.

How do you handle contamination from ttbar in the di-lepton samples and from non-prompt photons in the single photon sample? These are events that have no corresponding contribution to the signal region. What are the levels of contamination and do they depend on MET?

Figure 13: Is this just for GJets and DYJets? Do you include backgrounds (like ttbar and QCD)? Can you plot the top panel on log scale? I would check that you really understand the high MET tails in the validation regions, i.e. I would compare data to MC and look for anomalous features.

Figure 14: the legend in the bottom panel says "sys on pred" for the points. Is that correct?

Figure 15: There is no fit here? Can you show the curve and then show the residuals in the bottom panel instead of data/MC? The legend in the bottom panel says "Data stat error" for the points - is that correct?

Table 12: what is going on with the last two columns?

Figure 16 is empty.

Figure 20 is empty.

-Add a description of the likelihood model to the results section

-- UttiyaSarkar - 2019-10-16

Topic revision: r1 - 2019-10-16 - UttiyaSarkar