Review of B2G -AN17-247
Documentation
Color code for answers to reviewer questions:
- Green -- we agree, changes to analysis/documentation implemented.
- Lime -- we agree, but the item hasn't been done yet. (Open item.)
- Red -- we disagree, changes to analysis/documentation are not implemented.
- Teal -- we agree, but we don't think any change to analysis/documentation is needed.
- Blue -- authors/ARC/conveners need to discuss. (Open item.)
AN-2017-247 (Di-leptonic)
Documentation
Explicit green-lights from experts
| Category | Name | Status |
| Conveners | | Not done. |
| PPD | | Not done. |
| GEN | | Not done. |
| TRIG | | Not done. |
| EGM | | Not done. |
| MUO | | Not done. |
| TAU | | Not done. |
| JME | | Not done. |
| BTV | | Not done. |
| TRK | | Not done. |
| STAT | | Not done. |
Group Review
Comments from Devdatta (12/18/17) on AN-2017-247 V4
Is the ANv4 updated with the responses? I do not see Table 8 in the new AN.
Yes, the analysis note is updated with the responses. Table 7 shows the same
information as Table 8 (below in the response), except that the test results correspond to statistical
errors only and therefore present the most pessimistic outcome. This is now explicitly stated in
the Table 7 caption. We can, of course, include Table 8 in the next version of the AN if desired.
On using ST vs mass, it should be driven by your sensitivity, evaluated using the full stat+syst errors. Did you check that?
The sensitivity study, the results of which are shown in Figure 17, already includes full stat+syst errors.
I did not understand the response
"Our approach is to include the background region together with the signal region in the limit derivation procedure. This way, the background control region helps to constrain both the background normalization and shape."
--> Do you mean to say that both regions are signal regions, or are you saying that you will use the DR > 2 region to constrain your data/MC in the signal region?
The expected limits are obtained using six signal and three background categories.
The six signal categories are [Boosted (DR<1), Non-boosted (1<DR<2)] x [mm, ee, em]. The three
background categories are [DR>2] x [mm, ee, em]. All 9 categories are used in the fit, the results of
which are shown in Section 10. Including the background categories helps to constrain data/MC in the signal region.
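The category layout described above can be sketched as follows. This is only an illustration: the label strings, the function name and the treatment of events exactly on a boundary are assumptions, not taken from the analysis code.

```python
from itertools import product

# Illustrative labels only; the analysis code may use different names.
CHANNELS = ["mm", "ee", "em"]
SIGNAL_REGIONS = ["boosted", "nonboosted"]  # DR < 1 and 1 < DR < 2
BACKGROUND_REGIONS = ["background"]         # DR > 2


def region_for(dr):
    """Map a DR value to its region label (boundary handling is an assumption)."""
    if dr < 1.0:
        return "boosted"
    if dr < 2.0:
        return "nonboosted"
    return "background"


signal_categories = list(product(SIGNAL_REGIONS, CHANNELS))
background_categories = list(product(BACKGROUND_REGIONS, CHANNELS))
all_categories = signal_categories + background_categories

for region, channel in all_categories:
    print(region, channel)
```

The loop lists the six signal and three background categories that enter the simultaneous fit.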
Comments from Jim (12/03/17) on AN-2017-247 V3
The object section (Section 4) generally requires more detailed information in order to be reviewed quickly by the object experts. Some examples:
Section 4.1: Reference is made to the general muon ID twiki but no reference is made to the SF twiki or to the specific SF file version numbers
https://twiki.cern.ch/twiki/bin/view/CMS/MuonReferenceEffsRun2
https://twiki.cern.ch/twiki/bin/view/CMS/MuonWorkInProgressAndPagResults
https://gaperrin.web.cern.ch/gaperrin/tnp/TnP2016/2016Data_Moriond2017_6_12_16/JSON/RunBCDEF/EfficienciesAndSF_BCDEF.root
https://gaperrin.web.cern.ch/gaperrin/tnp/TnP2016/2016Data_Moriond2017_6_12_16/JSON/RunGH/EfficienciesAndSF_GH.root
Section 4.2:
L176 The electron SF are updated on occasion. The specific SF used should be identified or defined.
Section 4.3:
L185 Jet ID definitions also change so perhaps reference the specific twiki revision used (https://twiki.cern.ch/twiki/bin/view/CMS/JetID13TeVRun2016?rev=9)
L198 I would also give the specific global tag or db file names for the JEC
Section 4.4
Please give the b-tag SF file name (CSVv2_Moriond17_B_H.csv)
L217. Which b-tag SF method listed here was used https://twiki.cern.ch/twiki/bin/viewauth/CMS/BTagSFMethods? Please provide some details.
Done. Detailed information has been added for all object ID and SFs to the AN v4.
Comments from Devdatta (12/04/17) on AN-2017-247 V3
- On the background strategy, like we discussed in the meeting, I am still slightly worried about the closure. For, e.g. 18 (upper left and right) shows some discrepancy, which could be because of the DY component, or single top component? Same can be seen in your numbers in Table 8. Also, can you confirm that the ttjets/ DY ratio is the same for DR < 2 and DR > 2 regions?
The test results shown in Table 8 are obtained using histograms with statistical errors only.
If we include both statistical and systematic errors, the numbers improve considerably and indicate that
the deviations are covered by systematic uncertainties.
Here is the TTjets/DY/Stop composition (in %) of the background in the signal (DR<2) and background
(DR>2) regions.
| | mumu | ee | emu |
| DR < 2 | 90.4 / 6.8 / 2.7 | 91.4 / 5.2 / 1.4 | 95.8 / 0.2 / 3.7 |
| DR > 2 | 80.6 / 12.6 / 6.1 | 79.7 / 13.3 / 6.3 | 92.0 / 0.5 / 6.9 |
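Since the question concerns the ttjets/DY ratio, it can be read off the quoted percentages directly. The sketch below uses only the numbers in the table; the dictionary layout and function name are illustrative.

```python
# Composition in % as (TTjets, DY, single top), copied from the table.
composition = {
    "DR < 2": {
        "mumu": (90.4, 6.8, 2.7),
        "ee": (91.4, 5.2, 1.4),
        "emu": (95.8, 0.2, 3.7),
    },
    "DR > 2": {
        "mumu": (80.6, 12.6, 6.1),
        "ee": (79.7, 13.3, 6.3),
        "emu": (92.0, 0.5, 6.9),
    },
}


def tt_over_dy(region, channel):
    """Return the TTjets/DY ratio for one region and channel."""
    tt, dy, _single_top = composition[region][channel]
    return tt / dy


for channel in ("mumu", "ee", "emu"):
    low = tt_over_dy("DR < 2", channel)
    high = tt_over_dy("DR > 2", channel)
    print(f"{channel}: TT/DY = {low:.1f} (DR < 2), {high:.1f} (DR > 2)")
```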
- Fig 8 (upper left) is lepton and not jet?
Fixed.
- Line 322: "We use ST variable to extract limits on heavy tt resonance production cross section. "
If the mass variable performs as well, why choose ST? ST has more dependence on the top pt reweighting than the mass.
Data-MC agreement improves with top pt reweighting for the mass variable too.
However, it's true the reweighting benefits ST more than it benefits the mass variable.
We are open to using the latter in the fits.
- Fig 17, 21: Why does the Z' of 10% width lose sensitivity faster than the other models at high masses?
The faster loss in sensitivity observed for the Z' of 10% width can be explained by the lineshape of the
resonance mass shown below. The Z' of 1% width retains its resonance structure for all mass values up to 5 TeV. For
the Z' of 30% width, the loss of resonant structure already occurs at e.g. 4 TeV, where the lineshape is similar to the
5 TeV one. For the Z' of 10% width, by contrast, there is a quantitatively large change from 4 TeV to 5 TeV.
- On the statistical treatment, we discussed in the meeting the possibility to constrain the background normalization using the DR > 2 region. Did you try this out? This may improve your background prediction and make you less reliant on the Monte Carlo.
Our approach is to include the background region together with the signal region in the limit derivation procedure.
This way, the background control region helps to constrain both the background normalization and shape.
Comments at B2G RES meeting 11/24/17 (talk)
How are three ttbar MC samples combined? Is care taken to correctly normalize each mass region?
We use the three ttbar POWHEG samples listed on p5 of the presentation: 1) Inclusive TT, 2) TT_Mtt_700to1000, 3) TT_Mtt_1000toInf.
To cover the Mttbar < 700 GeV region, we use events from sample 1) passing the Mttbar < 700 GeV cut at the parton level. The cross-section used to normalize
this sample is 831.8 - 80.5 - 21.3 = 730 pb. To cover the 700 GeV < Mttbar < 1000 GeV region, we use the combination of sample 2) and events from sample
1) passing the 700 GeV < Mttbar < 1000 GeV cut at the parton level. The cross-section used to normalize
this mass region is 80.5 pb. To cover the Mttbar > 1000 GeV region, we use the combination of sample 3) and events from sample 1) passing
the Mttbar > 1000 GeV cut at the parton level. The cross-section used to normalize
this mass region is 21.3 pb.
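The normalization described above can be sketched as below. The cross-sections are the ones quoted in the response. The luminosity argument, the event counts and the assumption that all events pooled in a mass bin share one weight (cross-section times luminosity divided by the pooled number of generated events) are illustrative and not taken from the analysis code.

```python
# Cross-sections in pb, as quoted in the response.
SIGMA_INCLUSIVE = 831.8
SIGMA_700_1000 = 80.5
SIGMA_1000_INF = 21.3


def bin_cross_sections():
    """Cross-section assigned to each parton-level Mttbar bin."""
    low = SIGMA_INCLUSIVE - SIGMA_700_1000 - SIGMA_1000_INF
    return {
        "0to700": low,
        "700to1000": SIGMA_700_1000,
        "1000toInf": SIGMA_1000_INF,
    }


def mass_bin(mtt_gev):
    """Assign an event to a bin from its parton-level Mttbar in GeV."""
    if mtt_gev < 700.0:
        return "0to700"
    if mtt_gev < 1000.0:
        return "700to1000"
    return "1000toInf"


def event_weight(mtt_gev, lumi_pb, n_generated):
    """Weight for one MC event.

    n_generated maps each bin to the number of generated events pooled
    in that bin (inclusive-sample events in the bin plus the dedicated
    sample, where one exists). Hypothetical input, supplied by the caller.
    """
    name = mass_bin(mtt_gev)
    return bin_cross_sections()[name] * lumi_pb / n_generated[name]
```

With this pooling, the weighted event count in each mass bin equals the bin cross-section times the luminosity, regardless of how many events each sample contributes.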
Show distributions of pt^rel to illustrate choice of pt^rel>15 GeV cuts on leptons
The plots below show the pt^rel distributions for the leading and subleading leptons in the three channels. With the pt^rel>10 GeV
cut, we see a slight excess of data over background at the low pt^rel end, indicating a small QCD contamination. Therefore pt^rel>15 GeV
seems a safer choice against the QCD background.
pt^rel > 10 GeV
| | mumu | ee | emu |
| Leading lepton | (plot) | (plot) | (plot) |
| Subleading lepton | (plot) | (plot) | (plot) |
pt^rel > 15 GeV
| | mumu | ee | emu |
| Leading lepton | (plot) | (plot) | (plot) |
| Subleading lepton | (plot) | (plot) | (plot) |
Including original Control Regions in the note.
Done. AN-17-247 v3 now includes the original background control regions, CR1 and CR2, described in the Appendices.
Is top pt reweighting necessary?
Our observation is that top pt reweighting does have a sizable impact on data/MC agreement. This is discussed in the dedicated Section 5 of the AN.
We have also performed statistical tests on the distributions before and after reweighting. The results of these tests, which support
this observation, are shown below.
Perform tests with additional relative normalization factor between background and signal region in the statistical interpretation.
Done. The results of the test are shown below. The expected limits are stable and the nuisance behavior with Asimov data does not reveal any special features. We will, of course, have to see how the data behave after unblinding.
No additional relative normalization factor between signal and background regions.
With additional relative normalization factor (revnorm) between signal and background regions.
Comments from Annapaola (On AN-17-047_v0)
l 61-69: The recipe from TOP PAG is valid only for top pt < . Do you apply the reweighting beyond this point? If so, how do you do it? From the text I assume that the reweighting is applied to nominal distributions and used for this search. However, I could not find plots comparing the behaviour of MC vs data with and without this correction. Please, can you add this information to the text and explain why you want to apply it for your search, instead of considering an uncertainty on it?
Yes, the recipe is valid for top pt < 800 GeV. We apply the reweighting
above this value too. However, less than 0.5% of ttbar events have top pt > 800 GeV.
The top pt reweighting is our default, and it is now described in the dedicated Section 5,
which shows distributions with and without the top pt reweighting applied. Our observation is
that the reweighting improves data/MC agreement.
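For orientation, a top pt reweighting of this kind is commonly written as an exponential scale factor per top quark, combined into an event weight through a geometric mean. The sketch below assumes that form. The parameter values are assumptions, not quoted anywhere in this document, and should be checked against the TOP PAG recipe actually used. In line with the response, no cut-off is applied at 800 GeV.

```python
import math

# Assumed parameters of the exponential scale factor (not from this document).
A = 0.0615
B = -0.0005  # per GeV


def top_pt_sf(pt_gev):
    """Scale factor for a single top quark with generator-level pt in GeV."""
    return math.exp(A + B * pt_gev)


def event_weight(pt_top_gev, pt_antitop_gev):
    """Event weight as the geometric mean of the two per-quark scale factors.

    The same formula is used above 800 GeV, i.e. there is no cut-off.
    """
    return math.sqrt(top_pt_sf(pt_top_gev) * top_pt_sf(pt_antitop_gev))
```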
l 81-82: in which mass steps are the signal samples produced?
For the Z' signal with 1% and 10% width, as well as for the KK gluon signal, samples are produced in the mass range from 500 GeV to 5 TeV in 500 GeV steps.
There are also additional samples at the 750 GeV and 1250 GeV mass points. For the 30% width Z', samples are produced at mass points of 1 TeV, 2 TeV, 4 TeV and 5 TeV.
The signal MC samples are listed in Table 3 of the AN.
l 102: “at least one reconstructed good primary vertex”
Done.
Figure 2: Please, can you show the distribution before reweighting is applied for comparison?
The distributions below show nPV in the preselection sample for the emu channel before and after PU reweighting. The ee and emu channels
show a similar trend: the reweighting improves data/MC agreement.
| No PU reweighting | With PU reweighting, using sigma(MB)=69.2 mb |
| (plot) | (plot) |
l155-156: How was the 2D cut optimised?
The 2D cut is chosen such that the sample is free of QCD background. Going
below pt^rel=15 GeV lets QCD events into our preselection sample.
l159: In which way this uncertainty is assigned? Does it come from POG recommendations?
Muon trigger and ID uncertainties of 0.5% and 1% are taken from the MUON POG twiki.
We don't apply the "standard" isolation requirement but rather use a 2D requirement: deltaR(l,jet)>0.4 or
ptrel>15 GeV. To this 2D requirement we assign an uncertainty of 1% per lepton, which is larger than the
typical 0.5% recommended for the "standard" isolation requirement.
l 195: please, can you explicitly add the list of cuts employed? This will help out with the review
Done. Section 4 lists the preselection cuts in a more concise manner.
l 253: It is not clear from the text what the reason is for applying such a criterion to assign the event to one category or the other. Please, can you elaborate more on this?
Events with three leptons of mixed flavor, mme (mee), can pass the selection of both the mm (ee) and em channels.
To avoid double counting such events, we have to assign them to only one channel.
We choose to assign them to the em channel, since this channel has a larger branching fraction
than the mm or ee channels separately.
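The assignment rule can be sketched as follows. The flavor codes and function name are illustrative, and the sketch only covers the overlap removal, not the rest of the channel selection.

```python
def assign_channel(flavors):
    """Assign an event to exactly one dilepton channel.

    flavors is a list of selected lepton flavors, "m" for a muon and
    "e" for an electron (hypothetical encoding). Mixed-flavor events,
    including three-lepton mme and mee events, go to the em channel.
    """
    n_mu = flavors.count("m")
    n_e = flavors.count("e")
    if n_mu + n_e < 2:
        return None  # not a dilepton candidate
    if n_mu >= 1 and n_e >= 1:
        return "em"
    return "mm" if n_mu >= 2 else "ee"
```

Because every event is mapped to a single label, an mme or mee event cannot enter two channels at once.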
l 270-276: Please, can you expand this section a bit more? For instance, how exactly is the Mass variable calculated?
Done. The Mass variable is described more explicitly in Section 5.
Section 4.1: Can you, please, explain better how the agreement between data and MC in these CRs can assure us that we have good modelling of backgrounds? The phase space is actually quite different wrt signal region.
We have redefined our background CR. Now we start from the same preselection for signal and background
but employ sumDeltaR variable to define signal-enriched and background-control regions. This is described in
Sections 6 and 7 of the updated AN.
l 303: Where do these numbers come from? Please, can you add a reference or an explanation?
The 16% and 15% uncertainties on the cross-sections for single top and diboson production
are based on CMS measurements of these processes. The corresponding references have been
added to the description in Section 8.
Are you using muon and electron trigger, ID and isolation uncertainties provided by POGs?
Muon trigger and ID uncertainties of 0.5% and 1% are taken from the
MUON POG twiki.
The electron trigger HLT_DoubleEle33_CaloIdL efficiency SF is derived in this analysis (Section of the AN), and the associated systematic
uncertainty of 2% (1% per electron leg) is extracted based on the observed statistical errors and on the SF variation vs. deltaR(l, jet).
The electron ID uncertainty of 1% is larger than the statistical errors on the SFs provided by the EGamma POG.
l251: what do you mean by “taking RMS deviations in the acceptance and physics observables?”
The PDF uncertainty procedure uses the +/- RMS (68% CL envelope) of the 100 MC replica weights
as the up and down shifts, per the recommendation in the PPD talk by J. Bendavid.
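A minimal sketch of such an RMS-based shift for a single yield or histogram bin is given below. Taking the RMS deviation about the nominal value, rather than about the replica mean, is an assumption, as are the function and argument names.

```python
import math


def pdf_rms_shift(nominal, replicas):
    """Return (up, down) values from the RMS spread of the PDF replicas.

    nominal is the yield with the central PDF; replicas holds the same
    yield recomputed with each replica weight (100 entries in the
    procedure described in the response).
    """
    rms = math.sqrt(sum((r - nominal) ** 2 for r in replicas) / len(replicas))
    return nominal + rms, nominal - rms
```

Applied bin by bin, this gives the up and down templates for the PDF nuisance.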
l 376: As above, I do not understand if the reweighting is applied as default. Also, SFs are defined by the POG only in a certain pt range, what do you do beyond?
The text now explicitly states (in Sections 5 and 8 of the AN) that the reweighting is the default. As explained above, less than 0.5% of events
fall in the region above pT=800 GeV. We apply the reweighting to these events too.
l 407: In the event selection section it was not clear that you were defining Delta R as the sum of two delta R, please, can you elaborate on this in the text? Also, I think there is a typo here, because the two addends are identical.
This is fixed now. sumDeltaR is defined as the sum of two deltaRs: the minimum deltaR between the leading lepton and its closest jet plus the minimum deltaR between the subleading lepton and its closest jet. The variable is described in Section 6.
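The definition can be sketched as follows, representing each lepton and jet as an (eta, phi) pair. The function names and the tuple representation are illustrative, not taken from the analysis code.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def sum_delta_r(leading, subleading, jets):
    """Sum of each lepton's deltaR to its closest jet.

    leading and subleading are (eta, phi) tuples for the two leptons;
    jets is a non-empty list of (eta, phi) tuples.
    """
    def min_dr(lepton):
        return min(delta_r(lepton[0], lepton[1], j[0], j[1]) for j in jets)

    return min_dr(leading) + min_dr(subleading)
```

Note that the closest jet is found separately for each lepton, so both leptons may be matched to the same jet.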
Now that you have decided which is the best variable for the final statistical analysis, how do you plan to exploit the other variables for the selection?
In the updated version of the AN, we apply a cut on the sumDeltaR variable and study the sensitivity of the ST and mass variables.
A general comment: as discussed during the presentations of the analysis at the RES meetings, it would be good to think about a data-driven estimation of backgrounds; you can think about using your CRs and additional regions to perform a simultaneous fit, for instance.
We have now redefined the background control region. We also use the background CR in a simultaneous fit together with the signal region.
This is described in Section 9 of the updated AN.
-- IaIashvili - 2017-11-21