- Review comments and responses for SMP-16-019
- Helen Heath 7 June 2018
- Isabel Josa 9 May 2017 on v15 of PAS and additional material
- Public additional material
- Kenneth Long, May 5 on v13 of PAS
- Isabel Josa 4 May 2017 on v13 of PAS
- Approval talk requests, April 27
- Gabriella Pasztor, April 11 on v10 of PAS
- Aram Apyan 10 Apr 2016
- Isabel Josa 10 Apr 2016
- Darien Wood 4 Apr 2017 on v9 of PAS
- Gabriella Pasztor 4 Apr 2017 on v9 of PAS
- Isabel Josa 3 Apr 2017 on v9 of PAS
- Gabriella Pasztor 3 Apr 2017
- Pietro Vischia for Stat. Com. 14 March 2017
- Gabriella Pasztor 6 March 2017
- Isabel Josa Mutuberria 5 March 2017 part II, AN-17-002
- Aram Apyan 3 March 2017 PAS v7
- Isabel Josa Mutuberria 2 March 2017 part II, AN-17-002
- Isabel Josa Mutuberria 2 March 2017, AN-17-002
- Isabel Josa Mutuberria 28 Feb 2017, PAS v.7
- Darien Wood, 27 Feb 2017, PAS v.7
- Pre-approval remarks, 8 Feb 2017
- Senka Djuric and Chia Ming Kuo, 24 Jan 2017 on PAS v2, AN2017_002 v4
- Senka Djuric, 22 Jan 2017 on PAS v2, AN2017_002 v4
- Chia Ming Kuo, 20 Jan 2017, on AN-2016/331 v4
- Chia Ming Kuo, 20 Jan 2017, on AN-2017/002 v3
- Senka Djuric, 17 Jan 2017, on PAS V1, AN-2016/331 v4, AN-2017/002 v3
Review comments and responses for SMP-16-019
Response deemed complete and sufficient
Work in progress
Follow-up comments
ToDo
Helen Heath 7 June 2018
Title I'd suggest "of Z boson" -> "for Z boson" (as in first line of abstract)
Changed
L9 "associate" -> "associated"
Changed
L22 "to the ZZ" -> "to ZZ"
Changed
L26 "in a single" -> "at a single"
Done
L27 "instrumental for"-> "instrumental in"
Done
L30 "13 TeV data, which are"-> "13 TeV data set, which is"
"which are being collected in Run II of LHC" we are not supposed to refer to RunII in a paper without defining it. Could you just omit this?
otherwise replace Run II with a description of what it means and insert "the" before LHC
removed "which is being collected in Run II of LHC."
Remove commented part
L115 I would assume a "pair" is two "matching leptons" so using "pair" is confusing when you go on to say they can be different flavors. Why not just say
"require the presence of two loosely isolated leptons."
or does "pair" imply opposite charge in which case say that.
Changed
L118 "request" -> "requirement"
Changed
move comma from before "and" to before "for the 13 TeV"
Changed
line after L146 "for the inclusion" -> "for inclusion"
Changed
"we require each lepton track to have the ratio between the impact parameter computed in three dimensions, with respect to the primary vertex, and its uncertainty to be less than 4"
I 'm having difficulty understanding this sentence. Does it mean
"we require that the ratio of impact parameter for the track to the uncertainty on the impact parameter is less than 4."
You refer to longitudinal and transverse impact parameters elsewhere without definition so it's not clear to me that you need to say "computed in three dimensions" here, and it does make the sentence hard to read. If 3D is necessary perhaps split into two sentences
"in order to suppress electrons from .. hadrons, we place a requirement on the impact parameter computed in three dimensions. We require that the ratio..."
Changed. Not so beautiful.
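As an illustration of the requirement being discussed, a minimal Python sketch of the 3D impact-parameter significance cut; the lepton attributes used here (ip3d, ip3d_error) are hypothetical placeholders, not names from the analysis code:

```python
def passes_sip3d(lepton, max_significance=4.0):
    """Select leptons whose 3D impact-parameter significance is below the cut.

    lepton.ip3d and lepton.ip3d_error are hypothetical attributes holding the
    impact parameter computed in three dimensions with respect to the primary
    vertex, and its uncertainty.
    """
    return abs(lepton.ip3d) / lepton.ip3d_error < max_significance
```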
L204 remove "down"
Changed
L215 "the simulation" You describe quite a lot of simulations. I think this might need to be a bit more specific.
We described the MC samples used in Section 3. We think it's fine like this.
L221 "systematic uncertainty source "-> "source of systematic uncertainty"
Changed
L224 "The details are illustrated in the following". I don't think you need this but if you feel it's necessary change "illustrated in the following" to something like "described in the rest of this section"
Removed.
L226 "its effect" -> "it"
Changed
L227 "on the final" -> "in the final"
Changed
L230 "uncertainties on" -> "uncertainties in"
Changed
L233 "PU" isn't defined here suggest define it on L111 when it's first used and then use afterwards - particularly as you use it in equation 1. Alternatively replace PU here with "pileup"
replaced
L235 "uncertainty contribution" -> "contribution to the uncertainty"
Changed
L260 "uncertainties on" -> "uncertainties in"
Changed
L261 "uncertainty on" -> "uncertainty in"
Changed
L264 insert space before "channels"
Changed
L269 -> "and both the transverse momentum and pseudorapidity of the pt-leading jet" and a similar change on L270
It would "inrefere" with the pt definition. We prefer to keep it as it is.
L302 What does "that" refer to? Suggest "The theoretical uncertainties also include the uncertainties in the PDF and alpha_S" (note "uncertainty in" not "uncertainty on" as in pub com guidelines)
Changed
L336 -> " for a center-of-mass energy of 8 (13) TeV"
Changed
L339 "pseudorapidity separation" -> "separation in pseudorapidity"
Changed
L348 no hyphen in "key distributions"
Changed
Isabel Josa 9 May 2017 on v15 of PAS and additional material
Do not use any LaTeX-defined macro in the abstract; change \Lumi to plain LaTeX commands.
Fixed.
L 232 The systematic uncertainty for the trigger efficiency are
evaluated --> is evaluated ??
Fixed.
Figure 5 (caption) (subleading jet distributions).
Fixed.
It is written ... for Njets >= 1. Shouldn't it be for Njets >= 2 ??
Fixed.
Last sentence. It is written ... the pT-leading jet transverse momentum and pseudorapidity respectively.
Shouldn't it be ... the pT-subleading jet ...
Fixed.
I think PAS documents do not include an Acknowledgment section.
- Captions for plots showing pseudorapidity distributions. In fact they
are showing the absolute value of the pseudorapidity of the jets and the
absolute value of the difference in pseudorapidity.
Can you update the caption accordingly (reflecting they are absolute
values).
Fixed.
Table with the 9 events with BDT > 0.92: Do we need two decimal figures in the table, 365.83, 91.36, 101.11... ?
I think it would be ok with just one 365.8, 91.4, 101.1... except for
the BDT score (0.97 ...) obviously.
Changed.
pTrel., jets and pTrel., had plots, can you move the axis title a bit
lower, now it is just touching the axis values label at 0.9 ? Just to
make it perfect, if not, it is ok.
Fixed
Do we need the comma in the axis title ? p_T^rel., jets vs p_T^rel. jets
? idem for had.
We changed the notation to the one used in the PAS.
Simulation plot effS vs effB. Do we need a Preliminary label in there ?
I don't know about simulation plots.
It is suggested in the PubComm guidelines for figures. It has been added.
Public additional material
Distribution of the reconstructed jet multiplicity. Points represent the data, shaded histograms the Monte Carlo predictions and background estimate, and the hatched band the systematic uncertainty in the prediction. The reducible background is obtained with a data-driven method.
Distribution of the reconstructed pT-leading jet transverse momentum. Points represent the data, shaded histograms the Monte Carlo predictions and background estimate, and the hatched band the systematic uncertainty in the prediction. The reducible background is obtained with a data-driven method.
Distribution of the reconstructed pT-leading jet pseudorapidity. Points represent the data, shaded histograms the Monte Carlo predictions and background estimate, and the hatched band the systematic uncertainty in the prediction. The reducible background is obtained with a data-driven method.
Distribution of the reconstructed pT-subleading jet transverse momentum. Points represent the data, shaded histograms the Monte Carlo predictions and background estimate, and the hatched band the systematic uncertainty in the prediction. The reducible background is obtained with a data-driven method.
Distribution of the reconstructed pT-subleading jet pseudorapidity. Points represent the data, shaded histograms the Monte Carlo predictions and background estimate, and the hatched band the systematic uncertainty in the prediction. The reducible background is obtained with a data-driven method.
Distribution of the reconstructed invariant mass of the two pT-leading jets. Points represent the data, shaded histograms the Monte Carlo predictions and background estimate, and the hatched band the systematic uncertainty in the prediction. The reducible background is obtained with a data-driven method.
Distribution of the reconstructed pseudorapidity separation between the two pT-leading jets. Points represent the data, shaded histograms the Monte Carlo predictions and background estimate, and the hatched band the systematic uncertainty in the prediction. The reducible background is obtained with a data-driven method.
The following figure displays a real proton-proton collision event at 13 TeV in the CMS detector in which two high-energy electrons (light blue lines), two high-energy muons (red lines), and two high-energy hadronic jets (dark green cones) are observed. The presence of two opposite-sign same-flavour lepton pairs with mass close to the Z mass, of two hadronic jets in opposite hemispheres of the detector with a large pseudorapidity separation, as well as the absence of hadronic activity in the central region of the detector, are indicative of the electroweak production of two Z bosons and two jets.
Selected kinematic properties of signal-like events with BDT score > 0.9 observed in the data.
Distribution of the Zeppenfeld variable of the leading Z boson for events passing the ZZjj selection, which requires mjj > 100 GeV. Points represent the data, filled histograms the expected signal and background contributions.
Distribution of the Zeppenfeld variable of the subleading Z boson for events passing the ZZjj selection, which requires mjj > 100 GeV. Points represent the data, filled histograms the expected signal and background contributions.
Distribution of the event balance observable for events passing the ZZjj selection, which requires mjj > 100 GeV. Points represent the data, filled histograms the expected signal and background contributions.
Distribution of the ratio between the pT of the dijet system and the scalar sum of the tagging jets' pT for events passing the ZZjj selection, which requires mjj > 100 GeV. Points represent the data, filled histograms the expected signal and background contributions.
Signal versus background efficiency curves of the boosted decision tree (BDT) and matrix element likelihood (MELA) classifiers for separating the electroweak from the QCD-induced production of the ZZjj final state. The efficiency of a cut-based selection on the dijet mass and dijet pseudorapidity separation is also shown.
Kenneth Long, May 5 on v13 of PAS
I checked and indeed all the samples use Pythia v8.2
Fixed.
I don't see any reference for POWHEG. It should be:
ZZ, WZ and W+W- production, including Gamma/Z interference, singly
resonant contributions and interference for identical leptons, T. Melia, P.
Nason, R. Rontsch, G. Zanderighi, JHEP 1111 (2011) 078, arXiv:1107.5051
P. Nason, JHEP 0411 (2004) 040, hep-ph/0409146 [paper]
S. Frixione, P. Nason and C. Oleari, JHEP 0711 (2007) 070, arXiv:0709.2092
[paper]
S. Alioli, P. Nason, C. Oleari and E. Re, JHEP 1006 (2010) 043,
arXiv:1002.2581 [paper]
Fixed.
Line 23 and 24: pair —> pairs seems more correct to me.
Fixed.
Line 25: in associations —> in association
Fixed.
Line 101: I don't see any comment about how you verified that MG and
Phantom are in good agreement and I think it would be useful.
Line 173: An algorithm “based on…”
Line 194: I think a minute rewording here would make it more clear.
Maybe just include a statement like “If a pairing exists that gives Z candidates with m_{ll} in [60, 120] GeV, the event is accepted” before
saying how the Z1 and Z2 are chosen.
Line 208: I don’t understand why “mostly” is used here. I thought the
definition of irreducible backgrounds was having 4 prompt leptons.
228-229: “are functions” or “is a function”
Fixed.
234-235: “on the cross section” and “affecting the differential cross
section” seems redundant to me
***242-243: I don’t think “following the PDF4LHC prescription” is an
accurate statement. Don’t they come from NNPDF3.0 only?
Fixed.
264: It seems more natural to me to use the “+” sign than to write the word “plus”
Changed.
282: Maybe mention that you use final state leptons and prompt
photons. Also could be worth explicitly using the word “dressed”
Isabel Josa 4 May 2017 on v13 of PAS
- New Pythia8 label in plots misspelled (Pyhtia instead of Pythia)
Fixed.
- Table 1 (syst. uncert.) Do we need the % symbol in the two items that
do not apply in the normalized distributions (trigger and lumi) ?
- Luminosity uncertainty is not mentioned in the text about syst.
uncert. for ZZ+jets (maybe I overlooked it). It is in the Table and it
is mentioned in the syst. uncert. that apply to the VBS search. I think
it should be also in the text.
L 273 ... estimated backgrounds and the syetamatic uncertanites on the
the predicition....
Fixed.
Figure 1, caption.- Gray is US English, is it intended ?
Yes we have been using American English.
predictiomn --> prediction
Fixed.
Thinking on the publications. PubGuidelines recommend stat and syst
without periods. Could you please, check?.
Fixed.
References:
CMS Collaboration Collaboration in [38] and [45]
Fixed
[45] appears as a Technical Report whereas other PASes appear as Physics
Analysis Summary.
Fixed
Remove CERN, Geneva and month for PASes (I know you get it like this
from CDS).
Fixed.
Give only first page in the references (PubGuidelines says so). Please,
check.
Done
Remove the no. in [15], [16], [37],
Done
Extra space in [55] CMS-NOTE-2011-005 ;
Fixed
Include the the journal series letter in the journal name, not in the
volume (PubGuidelines says so), for instance, in [1]
Nucl. Phys. B164 (1980) 445-483 (with bold B164) --> Nucl. Phys. B 164
(1980) 445 (only 164 in bold)-
Fixed
Please, remove the following commented lines in the Abstract (found with
the "magnifying glass" in CADI, superuseful !!!): % The total cross section ...
Fixed
Approval talk requests, April 27
Check that MC aMC@NLO Madgraph references are correct
The reference is correct.
Figure 8: please check that stat uncertainty on the data points is Poisson (root code from stat com: https://twiki.cern.ch/twiki/bin/view/CMS/PoissonErrorBars)
The error bars are Poisson, as recommended by the Stat Com. This is more evident on the linear-scale plot.
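For reference, a minimal Python sketch (not the Stat Com ROOT macro linked above) of the Garwood central Poisson intervals commonly used for such error bars; the bin contents are toy numbers:

```python
import numpy as np
from scipy.stats import chi2

def poisson_interval(n, cl=0.6827):
    """Central (Garwood) Poisson interval for an observed count n."""
    alpha = 1.0 - cl
    lo = 0.0 if n == 0 else 0.5 * chi2.ppf(alpha / 2.0, 2 * n)
    hi = 0.5 * chi2.ppf(1.0 - alpha / 2.0, 2 * (n + 1))
    return lo, hi

# Asymmetric error bars for toy bin contents of a data histogram.
counts = np.array([0, 3, 12, 99])
errors = [(n - poisson_interval(n)[0], poisson_interval(n)[1] - n) for n in counts]
print(errors)
```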
L102: please explain how triboson enter here. This is basically ZZV→4l+2j process.
Yes, triboson processes refer to ZZV, i.e. the production of two Z bosons and a third weak boson that decays hadronically. We added a phrase to clarify.
lumi uncertainty update: 2.5%
Corrected.
- slide 36: plot on the right. What is the message of this plot? Not sure if we want to include it. Authors should explain the message that they want to convey by this plot as it has several readings and can trigger questions on the choice of strategy.
The message of the plot is that the BDT adopted for the analysis has been trained optimally, i.e. no information is lost through the choice of BDT observables or a poor training. The comparison to a simple cut-and-count approach (what was used in previous searches for VBS) highlights the potential of deploying an MVA, in this channel in particular. Moreover, the MELA result is the work of a PhD student and it would be nice to publish her results at least as additional material.
L132: make a statement that the analysis selection is consistent with ZZ inclusive and not identical, to avoid confusion from outside that CMS has two analyses with identical selection and data but observes different numbers of events in data.
We followed the suggestion received during the meeting and rephrased "identical to" -> "similar to".
L376: aQGC limits are not derived using CLs. Please correct, you can use the description from SMP-14-014 for example.
The PAS text has been corrected.
Please add VBFNLO reference for unitarity bound results
Name of the tool and citation added.
- differential distributions: remove Phantom from label and add Pythia8.
L59 : of of → of
Corrected.
- s17: detector level plots on jet multiplicity will be nice to be added to the PAS, they give the feeling of the background level and composition as function of nJets. (suggestion not
made during the meeting due to lack of time)
Added
- s25: why not putting the |η|<2.4 jet multiplicity plot only in the public twiki ? what do we learn more by showing both |η|<4.7 and the |η|<2.4 ? Discussing only the |η|<4.7 case will simplify PAS, e.g., also x-axis label in Fig 5 PAS-v11 (suggestion not made during the meeting due to lack of time)
As this is one of the principal measurements of this work, we prefer to keep it in. We think it's important to show the results in the phase space where the full PF algorithm is fully accessible for the jets.
Gabriella Pasztor, April 11 on v10 of PAS
abstract line 13: wouldn't it be more useful to give the signal strength here? That seems to show the agreement with SM in one number and also that is the actual parameter we measure.
You definitely have a point, and we could move to quote the signal strength. The current focus/narrative is that we see a first hint of this process (the significance) and that it is truly a sub-femtobarn process which is probed here (the cross section) for the first time.
line 38: add which generators
We'd prefer to keep this part of the introduction short, without too much detail. The main reason is that we'd have to mention not just the ME generators (POWHEG and MadGraph5_aMC@NLO), but also the parton shower. For the sake of completeness, one should then also mention the ggZZ prediction from MCFM and the electroweak prediction.
line 66: add pT range so that it does not clash with the next sentence
Added.
* Systematics due to neglecting the interference term (~1% total rate, ~10% EW rate).
Do I understand correctly that you assume negligible systematics from neglecting the interference contribution as the relevant jet distributions are more similar to QCD than to EW production thus after the BDT fit, they will not contribute to the signal in any significant way?
(I am asking this because the interference is one of the main systematics for example in Wjj and Zjj analysis while completely missing here. Admittedly those are more precise analyses. )
We do not assign a dedicated systematic uncertainty for the interference, because it is concentrated in the background-like region, where it contributes <2% (Fig. 3c). Compared to the uncertainty of the QCD normalization in this region, e.g. the 10% scale uncertainty, this is negligible. Regarding the role of the interference in the Zjj analysis, we are not sure that it is a main systematic there either. In Fig. 42 it is reported that the correlation between the signal strength and the interference systematic, i.e. the impact, is -0.18. This is much smaller than the background normalization or the other theory uncertainties. It can also be seen that the fit itself is not really sensitive to this nuisance parameter, that is, it has a small pull and it does not constrain the (large) 50% uncertainty. So it seems that the situation in Zjj is similar to this analysis.
line 125: along the beam axis. [This is a dz cut!]
Corrected
line 315: refers to "the ZZ selection described in Section 4". In Section 4, line 195 defines "ZZ selection" before the removal of multiple candidates. This is confusing, especially as you wrote in your reply that ZZjj uses the ambiguity removal (as it should). I suspect the definition was for the benefit of line 211. Could you clean this up? Easiest would be to remove the sentence from line 195.
We have moved the definition of the ZZ selection to the second-to-last sentence in this section.
line 219: thanks for adding the background estimate. It would however be nice to have this together with number of total selected events which I can not find in the note. Sorry if I overlooked it.
*** line 234: How large is the unfolding uncertainty? Not given in text, neither in table 1.
The uncertainty in the unfolding is estimated by changing the MC used to build the response matrix, as written in line 234. This uncertainty corresponds to the "MC choice" entry in the systematics table.
I had a few questions on unfolding that were never answered (e.g. stability of results wrt choice of unfolding method, unfolding parameter).
Changing the unfolding method does not change the results significantly, at most about 1%. This uncertainty is well covered by the MC-choice uncertainty. The number of iterations for the D'Agostini method is chosen to be 4, for the reasons given in the AN at line 336. Basically, we first checked the convergence of the chi2 between the unfolded distributions obtained with an increasing number of iterations, and then chose a number of iterations that gives a chi2 less than 1/sqrt(2) and is at the same time large enough (at least 4) to avoid biasing the unfolded results towards the simulation used to construct the response matrix.
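A minimal, self-contained sketch of the iterative (D'Agostini) unfolding and of the chi2-based choice of the number of iterations described above; the response matrix, prior and data below are toy inputs, not the analysis ones:

```python
import numpy as np

def dagostini_unfold(data, response, prior, n_iter):
    """Iterative (D'Agostini) unfolding.

    data     : observed reco-level counts, shape (n_reco,)
    response : response[i, j] = P(reco bin i | true bin j); each column sums
               to the selection efficiency of that true bin
    prior    : starting truth-level spectrum, e.g. the MC truth
    """
    truth = np.asarray(prior, dtype=float).copy()
    eff = response.sum(axis=0)
    for _ in range(n_iter):
        folded = response @ truth                      # expected reco spectrum
        safe = np.where(folded > 0, folded, 1.0)
        m = response * truth / safe[:, None]           # P(true j | reco i)
        truth = (m.T @ data) / np.where(eff > 0, eff, 1.0)
    return truth

def chi2_between(a, b):
    """Simple chi2 between two successive unfolded spectra."""
    denom = np.where(a > 0, a, 1.0)
    return np.sum((a - b) ** 2 / denom)

# Toy inputs (3 true bins, 3 reco bins).
response = np.array([[0.8, 0.1, 0.0],
                     [0.2, 0.8, 0.2],
                     [0.0, 0.1, 0.8]])
prior = np.array([100.0, 80.0, 50.0])
data = response @ np.array([120.0, 70.0, 60.0])

# Stop once successive unfolded results agree (chi2 below 1/sqrt(2)).
previous = dagostini_unfold(data, response, prior, 1)
for n_iter in range(2, 10):
    current = dagostini_unfold(data, response, prior, n_iter)
    if chi2_between(previous, current) < 1.0 / np.sqrt(2.0):
        break
    previous = current
print(n_iter, current)
```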
What defines the order of systematic in table 1? Not the size, not grouping similar sources together. it seems somewhat random. Please use a logical ordering. Preferably same in text and table.
We have now ordered the systematic uncertainties by type: physics objects, event, unfolding, and theory.
line 256 states "variations corresponding to each source are given below" however the following paragraph does not list all sources only some selected (large) ones. Leptons, pileup, reducible background. are not given. I understand why you do not want to use the same table as for the diff xsection but then maybe you can add a separate one. In any case the text is misleading as it is.
We have added the missing systematics to the text.
For my education: In your reply you said that the Phantom sample is the nominal one in ZZ+jets while MadGraph is the nominal for the VBS analysis. Is there a physics reason for that or simply a historical thing?
The reason is somewhat historical. The PHANTOM sample covers the entire GEN phase space and does not allow us to train the BDT. The MG sample was made specifically for this analysis and has much higher statistics.
I am still curious to know the contribution of MCFM and PHANTOM to the theory total predictions for the section analysis.
Tables 5 and 6 of the AN report the yields for each MC sample together with the observed ones, inclusively and per jet multiplicity, respectively.
Table 3: is the lumi uncertainty correct for >=3 jets? Seems larger than 2.6% Maybe just rounding?
Yes it's the rounding.
Can we add the theory predictions to the table?
It would still be useful to give the measured and predicted (njet inclusive) fiducial cross-sections somewhere as well (referring to the ZZ section note) as most cross-sections are normalised to 1. Is there a reason not to quote this number?
We added the predicted fiducial cross section per jet multiplicity in table 3.
Fig 7 caption, line 2: isn't it 100 < mjj< 400 ?
No, the plot text is correct. The nVBS selection is a strict subset of the ZZjj selection, which requires mjj>100 GeV.
Typos, etc:
line 23: Z boson pairs OR a Z boson pair
Fixed
line 48: are silicon pixel and strip tracking detectors
Fixed
line 122-3: missing spaces before "(" , 3 places
Corrected
line 332: os -> of
Fixed
line 386: The more recent Monte Carlo and parton-shower predictions -> The more recent Monte Carlo ME calculations and parton shower models adopted in this analysis show
Fixed
Ref [43] remove one of the "collaboration"s
Fixed
Aram Apyan 10 Apr 2016
Figure 7 and Table 11 in the PAS show the pre-fit plots and yields, respectively. Looks like the QCD ZZjj background normalization is pulled down by 1 sigma in the fit. Could you please provide the following:
a) The pulls of the nuisance parameters
The pull distributions for the pre- and post-fit are shown in FIG. 50 of AN-17-002.
b) The corresponding post-fit plots for the figure 7.
The post-fit for the full ZZjj selection:
Showing the pre-fit plots/yields can be bit confusing for the reader as the expected S+B events is 117 and the observed data events is 99 while we obtain a mu-value of 1.39. Instead, showing the post-fit versions in the PAS would be preferable.
Thank you for this suggestion, we think showing a post-fit plot could indeed be a comprehension aid for the reader. However, it is not obvious how this can be accommodated in the current logic/line of reasoning in the PAS:
FIG. 7 (a) shows the nVBS control region and together with the full ZZjj selection in (b) it allows the reader to:
- Convince herself that the BDT selects the VBS-like signal region
- Show that the QCD background shape is in good agreement between data and MC
Now, the fact that we have mu>1 is of course due to the upward fluctuation in the most signal-like bin of the BDT, which is evident from Fig. 7 (b). We could eventually add a sentence on this after L351.
Isabel Josa 10 Apr 2016
Isn't there any reduction in the systematic uncertainty from JES in the
normalized distributions (right column of Table1) ?
The JES uncertainty can only change the pT values of the jets, so in the jet multiplicity distribution the pT variation can only move events from one bin to another, leaving the total normalization identical. For the same reason the JER uncertainty does not change the normalized distribution.
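A toy Python illustration of this migration argument: scaling all jet pT values moves events between jet-multiplicity bins but leaves the total number of selected events, and hence the normalization, unchanged. The 30 GeV threshold and the pT model below are arbitrary placeholders, not analysis values:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy events: each entry is the array of jet pT values (GeV) in one event.
events = [rng.exponential(40.0, size=rng.integers(0, 5)) for _ in range(10000)]

def njets_counts(events, jes_shift=1.0, pt_cut=30.0, nbins=4):
    """Histogram of per-event jet multiplicity above pt_cut after scaling all jet pT."""
    njets = [int(np.sum(jes_shift * pts > pt_cut)) for pts in events]
    return np.bincount(np.clip(njets, 0, nbins - 1), minlength=nbins)

nominal = njets_counts(events)
shifted = njets_counts(events, jes_shift=1.05)   # +5% JES-like variation
assert nominal.sum() == shifted.sum()            # total yield (normalization) unchanged
print(nominal)                                   # per-bin contents differ ...
print(shifted)                                   # ... only through bin-to-bin migration
```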
Comment: AN (L 70-73) state that MadGraph5 aMCatNLO is chosen as the
reference MC (vs POWHEG) because the latter does not contain events with
2 jets at matrix element, the MadGraph5 aMCatNLO sample is expected to
describe better the variables related to jets.
However, looking at the results there are no significant differences between the descriptions of the two MCs. Maybe it is worth commenting on this explicitly in the discussion.
Working on it
Question: PAS does not include results in the "wide" region, but just
for me to understand it. Why the cross sections in the tight region (the
ones quoted in the PAS) are larger than those in the wide region ? Can
you remind me the definition of the wide region ? Thanks.
The "wide" fiducial cross section only requires the two Z to be on the mass window 60-120 GeV. The differential cross sections for the wide fiducial region are shown in pb while for the tight fiducial region are in fb.
Typo: Please, check L 92. There are some words left over from the
previous version.
To be corrected
Darien Wood 4 Apr 2017 on v9 of PAS
Type B comments/questions:
1. For the differential measurements, something I miss is a report of
the number of events used in the sample. Since normalized distributions
are presented, this information is not apparent to the reader. The
unfolded distributions are indeed the final result, but it is
interesting to know, for example, how many ZZ+>=3 jet events are
selected. If the plots are included with the reco-level comparison, that
would address this. Otherwise, I think this information should be given
somewhere else.
We suggest providing this information in the form of a RECO-level plot as supplementary material. Including it in the PAS would require some editorial effort, and we will not be able to include such plots in the combined 8+13 TeV paper because they have not been approved for 8 TeV.
2. I am confused by the statement about the gg->ZZ calculation on line
92: "The gg->ZZ process is calculated to O(a_s^2), where a_s is the
strong coupling constant, while the other contributing processes are
calculated to O(a_s^4); this higher-order correction is included because
the effect is known to be large." This makes it sound like the gg->ZZ is
only calculated to O(a_s^2), but that is only leading order for this
process, and the text (and reference) claim an NLO calculation.
The loop-induced ggZZ contribution is scaled to the NLO prediction. This sentence has been removed because it is redundant with the previous sentence.
3. Lin 110: "performed at leading order using MADGRAPH5_AMC@NLO ". The
name of the generator ("@NLO") and the order of the calculation (LO) are
different, but maybe I am taking the name too literally.
The sentence is correct. Since the merger of the MadGraph and AMC@NLO codes, this is the official name of the generator, as recommended by the PubComm guidelines. It is indeed confusing.
4. Line 225: "The uncertainty due to the jet energy resolution (JER) is
5.5%". The table says 1.2-5.5%, and it make sense that this varies with
jet multiplicity.
Fixed
5. Line 324: Can you define Rp_T^{hard}? It seems that all of the other
variable are defined in the text.
We added the explicit definition.
Type A comments:
Line 4: suggest "the mechanism of electroweak symmetry breaking (EWSB)."
Fixed
Line 15: suggest "that both radiate vector bosons, which then interact."
Fixed
Line 16: add commas or parentheses around p_T
Fixed
Line 34: "The results on" seems superfluous. Suggest, "The dependence of
the cross section...is measured and compared to the predictions from
recent Monte Carlo event generators."
Fixed
Line 35: "two p_T-leading jets' properties" -> "properties of the two p_T-leading jets"
Fixed
Line 300: incorrect formatting of units in "8 TeV ". Use macro.
Fixed
Line 307: "p value" -> "p-value"
Fixed
Gabriella Pasztor 4 Apr 2017 on v9 of PAS
I looked again at your trigger description in the PAS draft (starting at line 124) and it seems really out of date for the full 2016 data set.
For example HLT_Ele17_Ele12_CaloIdL_TrackIdL_IsoVL_DZ_v and HLT_Mu8_TrkIsoVVL_Ele17_CaloIdL_TrackIdL_IsoVL_v, that you describe as main dielectron and muon+electron triggers were prescaled in the 2nd half of the year.
We use the same list of triggers as ZZ inclusive and the HZZ4l analysis. Both include the triggers you point out. In fact the trigger description in this PAS is identical to that of SMP-16-017 (ZZ inclusive). We do not remove trigger paths for the later data. In the end the trigger efficiency was evaluated in a tag-and-probe study carried out by the Higgs analysis and documented in HIG-16-041.
I am not sure what you are using as it is not clear in the AN either, so it would be great if you could just point us to the trigger list in your code.
The trigger list in the VBS AN (Tab. 2) is up to date and in sync with what is used in HZZ4l.
Isabel Josa 3 Apr 2017 on v9 of PAS
L 280-281 in the PAS (v9) explain how the normalized distributions are
obtained:
"All the distributions of the corrected number of events are then
divided by the bin width and normalized to one."
This means that although the Y axis is always labelled as 1/sigma_fid x
dsigma/d(relevant variable) the sigma_fid used in each case is
different: sigma_fid for current Fig.1 is the sum of the values in Table
3, sigma_fid for Fig. 3 is 8.0+3.0+1.3 fb and sigma_fid for Figs. 2 and
4 is 3.0+1.3 fb, right ?
Please, explain clearly in each of the figure captions what is the
sigma_fid used in that particular distribution.
We always normalize the distribution to unity, although because of possible overflow entries the visible bins may not sum exactly to one. The sigma_fid always corresponds to the integral of the distribution under study. We added a clarification in each caption explaining what sigma_fid is in every plot.
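A minimal sketch of this normalization, assuming the corrected (unfolded) per-bin event counts are already available; the bin edges and contents are illustrative only:

```python
import numpy as np

def normalized_differential(counts, bin_edges):
    """Return (1/sigma) dsigma/dX from corrected per-bin event counts.

    The distribution is divided by the bin width and normalized so that its
    integral, i.e. the sigma_fid of the response above, equals one.
    """
    widths = np.diff(bin_edges)
    sigma = counts.sum()          # integral of the distribution under study
    return counts / (widths * sigma)

# Illustrative spectrum with uneven binning (toy numbers).
counts = np.array([120.0, 60.0, 25.0, 8.0])
edges = np.array([30.0, 60.0, 100.0, 200.0, 500.0])
print(normalized_differential(counts, edges))
```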
Shouldn't it be interesting to include the ZZ+jets production cross
sections as a function of the inclusive jet multiplicity, at least
starting in 1 jet, i.e. sigma(ZZ+>=1jet), sigma(ZZ+>=2jets) ? you are
already giving +>=3jets. Either as a table or as a plot. This is usual
information included in W+jets and Z+jets analysis (you may want to
check SMP-16-005 or SMP-16-015 analysis).
The measurement of the inclusive cross section in terms of jet multiplicity is a different measurement from the exclusive measurement we present in SMP-16-019. For sure it is interesting, but it means that we would need to do a completely new measurement, as it is based on different unfolding matrices. Since there is nothing wrong with presenting the ZZ+jets differential cross section in terms of exclusive bins, I really prefer that we present the results in terms of exclusive bins. Also, with a view to the combined paper, for the 8 TeV data it is almost impossible to add such measurements.
L 96-98 Are background samples generated at LO ? NLO ?
They are NLO samples. We added this detail in the text.
L 220 "Depending on the jet multiplicity" can probably be dropped.
We’d prefer to leave it, as it is the first time we mention that the uncertainty ranges refer to the variations in the Njets bins.
L 241 Reference for the PDF4LHC prescription?
Reference added
L 271-272 MADGRAPH (qqbar -> ZZ), MCFM (gg -> ZZ) and PHANTOM (qqbar->
ZZ + 2 jets) for 8 TeV dataset and --> remove it, it refers to 8 TeV
analysis.
Fixed
L 273-274 for 13 TeV dataset. No longer needed.
Fixed
L 286-288 The systematic uncertainties in each bin are assessed from the
variations of the nominal cross section by repeating the full analysis
with each source of uncertainty varied.
Maybe you can explain what systematic uncert. are reduced in the
normalized distribution wrt absolute cross sections (those in Table 3)
because of cancellation in the ratio. I guess syst. values of uncert. in
Table 1 refer to diff. cross sections in Table 3.
We included in the systematics table the values both for the absolute cross sections and for the normalized ones. We also explained that only shape variations are included in the case of the normalized distributions.
L 291 "for 13 TeV data sets of samples". No longer needed.
Fixed
L 296-298 "Distributions taking into account the variations of MC predictions are scaled to the corresponding default distribution and are not normalized to the unity."
Removed.
Does it mean that you are only considering shape variations in the MC
predictions ? What do you mean by "are not normalized to unity" ?
The text was poorly worded. We consider both the shape and yield variations for each uncertainty in the case of the differential distributions, and only the shape variations in the case of the normalized ones.
How do you handle overflow bins ? I mean, in the case of the pT(jet)
distributions there may be jets with pT > 500, 300 GeV for the leading &
sub-leading jets and in the case of jet rapidity distributions there are
jets with abs(eta)>4.5 (you cut at 4.7). Do you still normalize the
distribution to 1 or divide it by the sum of cross section values in
Table 3 (40.4 fb) ?
They are always normalized to unity. We modified the captions in order to clarify this.
About Figure 1 (differential Njets distributions). Rechecking again the
preapproval comments, I can read that there was an explicit request to
present them in terms of absolute cross sections, not normalized ones,
to compare them with the predictions. One can retrieve the experimental
absolute distribution multiplying by the sum of the values given in
Table 3, but the expectation from the MC is not given. Can you please,
include the absolute cross sections back in the PAS ? In fact, both,
absolute and normalized differential cross section can be presented. You
can present again your arguments at the approval session, but the
information should be ready and presented.
Now the PAS contains both absolute and normalized plots.
L 299-307 Discussion of the results.
Some part of the discussion that was in the previous version (L 335-347
in PAS v8) has been dropped and the current paragraph addresses mostly
differences wrt the 8 TeV analysis. Some of the comments in the removed
paragraph are still valid in the present analysis and should be brought
back. I would suggest that you discuss first what you observe in the
current analysis and then mention differences wrt SMP-15-012, if
relevant (and if both sets of results are indeed comparable).
We kept part of the 8 TeV discussion and rearranged the paragraph.
Figure caption for Figure 2, second line. "for 13 TeV" no longer needed.
Removed
Figure captions for Figures 1 to 4 have a smaller font size than for the
rest of the Figures. Unify them (personal preference, to the font size
used in Figs. 5 to 7).
Corrected
L 347 Do you have the fiducial cross section value ready?
The numbers have been added to the PAS.
L 348 Do you have the SM cross section ready?
The numbers have been added to the PAS.
Derivation of limits for aQGC. Information is very reduced in this part.
I would suggest to include few sentences concerning:fitting procedure,
is it the same maximum-likelihood fit you used for the EW signal
extraction ?, something about the interpolation of the aQGC (L 874-877
in the AN). Are the limits dominated by the statistical or systematic
uncertainties ? mention it in the text. How do you derive the unitarity
limit (it is explained in the AN)?
We added more details on the statistical modelling and the aQGC interpolation to the PAS. The statistical method (test statistic, maximum-likelihood fit) is identical to the one used for the EW signal strength extraction. The limits are entirely dominated by the statistical uncertainty. The unitarity limit is given by the energy at which unitarity is violated when the coupling strengths are set equal to the respective observed limits, as described in L891-893 of the AN.
Figure 7 does not look exactly the same as the other VBS plots. Check
legend, line thickness etc. and redraw the axis.
The styles of the plots are now in much better agreement.
Typos:
L 120 pielup --> pileup
Fixed
L 199 high-pT isolated ... T should be in roman.
Fixed here and throughout the PAS.
L 218 it has found --> it has been found ?
Fixed
L 220 compositions --> composition
Fixed
L 224 ... the the ...
Fixed
L 222, L 224, L 234 ...uncertainty on ... --> uncertainty in ...
Fixed
L 262 Z bosons are than --> ... are then ... ?
Fixed
L 359 Figure 7 (left) shows ... (left) not needed.
Fixed
L 361 coupling --> couplings?
Fixed
References.-
[17] CMS Collaboration Collaboration, --> CMS Collaboration,
Fixed
[21] Collaboration Collaboration, --> CMS Collaboration,
Fixed
[36] To be completed.
Fixed
[42] To be completed.
Fixed
Gabriella Pasztor 3 Apr 2017
Abstract:
line 6: cross section as a function of the .
corrected
line 8: remove "the distributions of" - those or also diff sections
corrected
line 13: wouldn't it be more useful to give the signal strength here?
Main text:
line 23: Z boson pairs OR a Z boson pair
Fixed
line 35: corrections
Fixed
line 34-38: "The results on . are measured and compared." seems a strange construction
Anyway, the sentence is very long and would probably be better cut in half.
Fixed
line 38: add which generators
line 40: two jets with large mjj and dyjj
The VBS analysis uses basically the same cuts as the ZZ+jets analysis, since it is a fit on the shape and not a cut-and-count measurement.
line 48: are silicon pixel and strip tracking detectors
Fixed
line 66: 1.3$-$2.0
Fixed
line 66: add pT range as done in the next sentence
To be fixed.
line 70: from close to the nominal interaction point? Do we need "close to"?
Done.
CMS detector description:
Maybe this is the standard text but I find it curious that some numbers are only given for barrel.
Eg. lines 55-57 have a very restricted phasespace where resolution is given.
Similarly lines 69-71, give info only on the barrel though the full barrel+endcap range is used in the paper.
It is not clear why the HCAL eta-phi granularity is more important than the ECALs.
line 86: "and for comparison . processes." This half sentence is hard to read. missing a verb?
Fixed
It would be useful to add the most important EW and QCD diagrams that lead to ZZjj.
Adding only the VBS diagrams would create an imbalance between the two parts of the analysis (diff. cross sections, VBS), so one would have to also add some ZZ diagrams. For a VBS-only paper we'd definitely want to add these diagrams.
line 106: 1% to the EW yield?
It's 1% of the total ZZjj yield, as stated in the text. Detailed numbers and distributions of the interference are shown in Fig. 2 of the VBS AN.
line 115: introduce PDF abbreviation
Added
line 128-130: not clear which triggers are referred here, especially for the single leptons.
We updated the pT thresholds and the description.
line 132: within the ZZ search region??? -> selected by the four-lepton analysis criteria?
Fixed.
Event selection: mention lepton momentum and efficiency corrections
We added the data-driven momentum, resolution and efficiency corrections.
line 184: e-e separation requirement is only ~one calorimeter crystal
Is the efficiency well understood for such close objects?
This technical cut has been part of the HZZ and ZZ inclusive event selection since Run I. It is mostly intended to remove spurious duplicate objects ("ghost cleaning"). From a physics point of view, it is entirely irrelevant for this analysis, as the bulk of Z bosons have pT < 200 GeV. See also the responses to Isabel's question on the high-pT muon ID from March 2nd.
line 187: mass closest to the nominal Z boson mass of 91.2 GeV is denoted Z1
Fixed
line 190: ZZ selection defined here. It appears for the EW ZZjj analysis later. Do I understand that for the ZZjj analysis, multiple ZZ combinations can enter the plots? Why was this choice made? It would be useful to explicitly state that these ZZ candidates will be used for the EW analysis without resolving the ambiguity.
line 193: Add rate of ambiguity as it is the relevant number for ZZjj
There seems to be a misunderstanding. The events used in the VBS analysis are exactly the same as in ZZ inclusive and the diff. cross sections. This means we apply the same ZZ arbitration based on lepton pT. There is no reason why the rate of ambiguous events in ZZjj should be different from the ZZ inclusive one (which we quote as 0.3% in the text).
Background estimation:
I would add the size of the estimated backgrounds
Added
line 217: just to be sure. the ~98% trigger efficiency is taken properly into account in the cross-section and this 2% uncertainty is on top of that.
The trigger efficiency is automatically taken into account in the unfolding procedure together with all the other efficiency sources. The 2% uncertainty is taken into account with all the other uncertainties.
line 220: remove "Depending on the jet multiplicity"
Removed
line 223: what is included in the 0.1-1.2%?
The statistical uncertainty of the background MC samples.
line 224: Add JES uncertainty size
Added
line 226: is lepton ID uncertainty rely parametrised as a function of jet multiplicity?
They are not parametrized as a function of the jet multiplicity, but they do depend on the event kinematics, which vary with the emission of extra radiation.
line 238: For my education, why mZ and not 2mZ is the default scale? Is it MCFM or CMS choice?
It was the MCFM choice. Anyway, this part has been removed since it was something measured for 8 TeV, and it is now taken into account by the theoretical uncertainty on the MC. As for MCFM as the generator of the gg->ZZ loop-induced production, the central scale in MCFM is actually dynamic and equal to m4l/2. The scale was optimized by the HIG PAG.
Table 1 caption: so all uncertainties except trigger and lumi depend on the jet multiplicity?
Yes, since for both the luminosity and the trigger we have only global uncertainties. The others always have at least a small dependence on the jet multiplicity (changes in isolation, kinematics, etc.).
line 249: Why are the theory uncertainties so larger for the EW ZZjj analysis?
As the values for the few sources mentioned here are so different from the ones in table 1, I suggest to have an extra column and list all uncertainties also for the ZZjj analysis in table 1.
The numbers quoted for the VBS search are based on the maximum deviation on the BDT spectrum. This is naturally more sensitive to the uncertainties than the per-jet-bin inclusive numbers reported for ZZ+jets. This is also one of the reasons why we would like to keep the VBS figures separate from the numbers on the fiducial cross sections. The other reason is that Tab. 1 will contain also the numbers for the 8 TeV results for the paper, and then we’d be mixing 8/13 TeV in addition to ZZ+jets and VBS numbers.
line 257: for consistency: delta_eta_jj
Fixed.
line 261: lepton momenta
Fixed
line 261: so here the ambiguity is resolved for the 4l pairing as well?
Yes, the same selection used on data/RECO is used at the GEN level.
line 268: and the reconstruction-level
Fixed
line 272: remove 8 TeV stuff
Fixed
line 273: this is MadGraph5 _aMC@NLO as described in section 3
I suggest to search for mad graph in the pdf as several different names are used at present (see e.g. line 230)
It would be good to use the same typeset everywhere and be consistent with the text and the figure labels
Fixed throughout the PAS.
In line 100 phantom appears here as the alternative for EW Zjj, however here phantom is the nominal MC for all ZZjj. This seems contradictory.
The Phantom sample is the nominal one in ZZ+jets. MadGraph is the nominal for the VBS analysis.
I would be curious to know the contribution of MCFM and PHANTOM to the theory total predictions
Table 3: is the lumi uncertainty correct for >=3 jets? Seems larger than 2.6% Maybe just rounding?
Can we add the theory predictions to the table?
line 280: divided by the bin width. for the jet multiplicity this does not make sense, as the bin width is 1 for the first 3 bins and then undefined for the last.
Maybe this sentence should go after the jet multiplicity discussion?
The bin-width part has been moved to the captions of the plots where it is actually applied.
line 281: It would be useful to give the measured and predicted fiducial cross-sections somewhere as everything is normalised to 1 so the absolute (dis)agreement can not be deduced from the plots.
For now we have added the distributions not normalized to one.
line 284: Figures 3 and 4
Fixed.
line 290: ones for both the . POWHEG predictions are also reported.
Fixed.
line 292: uncertainties on the MC
Fixed.
line 300: 8 TeV typeset
comma after [17]
the 13 TeV predictions, using newer version of the Monte Carlo ME calculation and parton shower, show
(or similar but need rephrasing along these lines)
Fixed and rephrased.
line 304: not sure the disagreement here is significant. Hard to read the plot but even the first bin seems to agree within 1 sigma. The deficit of 1-jet events is the only significant one here. A p-value of 15% is not really significant.
line 307: also the eta distributions show a small slope (deficit at large eta) which might explain the Njets disagreement being larger for the |eta|<4.7 region.
Fig 1: for the ratios, please zoom in as much as possible. The labels take up too much space as they are now and the values are not easy to read. Why are the powheg uncertainties so small (0?) wrt Madgraph?
The uncertainties shown are from the matrix-element calculation, as stated in the caption.
In general the labels on the plots have too small fonts and are difficult to read.
Will fix for approval plots.
Captions figs 3-4, line2: Missing space after full stop
Fixed
line 321: to exploit
Fixed.
line 324: define event balance
Added definition.
Figure 6: do I understand correctly that the left plot shows events that are a subsample of the events on the right?
Fig 2 caption: isn't it 100 < mjj< 400 ?
Added the cut on mjj>100GeV
line 346: what about the ambiguity resolution for 4l pairing? See earlier comment.
See response to first questions.
lines 347-8: missing values
Fixed.
line 348: missing "fb"
Fixed.
line 345: MVA -> BDT
Fixed.
line 359: mZZ vs plot axis title m4l: use the same
Fixed.
Fig 7: would the fT0-2 couplings result in a similar shape that is shown here for fT8-9? I'd be interested to see their prediction as it is not shown in the AN either.
The “shape” of the different aQGC operators is almost identical. In the end the limit is entirely driven by the last/overflow bin. We do not show the predictions of the T0-2 operators, but we do show the yield parametrization in the last/overflow bin for these operators in Fig. 52 of the AN. The yield predictions corresponding to the limits we set for each operator are identical, i.e. all limits correspond to the same yield increase.
Figs 5-7: add tics to top and right borders
Done.
Can we add a figure that shows these new and also previous bounds on these anomalous couplings? Such plots are always great for presentations.
We agree that these plots are great, but we believe this type of summary plot is done by the SMP PAG. We could add it, but we don’t recall seeing such a plot in a PAS.
line 372: The newer versions of the Monte Carlo ME calculation and the parton shower . (or similar)
Fixed
Ref [6] ATLAS and CMS Collaborations
Fixed
Ref [12] update
Fixed
Ref [36], [42] incomplete
Fixed
Ref [41] too many collaboration
Fixed
Ref [46] add Journal reference of this arxiv: Proceedings of the PHYSTAT 2011 Workshop, CERN, Geneva, Switzerland, January 2011, CERN-2011-006, pp 313-318"
Added.
Ref [53] seems to have some issues with the note number, should be CMS-NOTE-2011-005 ; ATL-PHYS-PUB-2011-11
Fixed.
Refs: please, use same format for all PAS, e.g.. compare style of [38] and [44]
Formatting is homogenised
Pietro Vischia for Stat. Com. 14 March 2017
- Unfolding procedure: could you please add a statement in the PAS specifying the number of iterations you chose for the unfolding procedure, and how did you choose it? (also, minor: in the systematics section you quote the unfolding procedure before you actually introduce it)
- Agreement between stuff: in figures (e.g. fig 1) and text (e.g. the paragraphs of L336, L351), it would be good if you quoted p-values from a goodness-of-fit test. This in particular since you venture into ranking the agreement of different generators. I suggest that for each generator you add an inset text in the ratio plot, with a quoted p-value (for example from a chi2 or KS test).
Added the p-value obtained with a chi2 test on the ratio.
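A minimal sketch of such a goodness-of-fit p-value, using a simple chi2 with uncorrelated bin uncertainties; the numbers are toy values and the actual figures may treat correlations differently:

```python
import numpy as np
from scipy.stats import chi2

def chi2_pvalue(measured, predicted, uncertainty):
    """p-value of a simple chi2 comparison between measurement and prediction."""
    chisq = np.sum(((measured - predicted) / uncertainty) ** 2)
    return chi2.sf(chisq, df=len(measured))

# Toy normalized differential cross section vs a generator prediction.
measured = np.array([0.52, 0.30, 0.13, 0.05])
predicted = np.array([0.50, 0.32, 0.12, 0.06])
uncertainty = np.array([0.04, 0.03, 0.02, 0.01])
print(chi2_pvalue(measured, predicted, uncertainty))
```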
- Table8 and relative text: are the CIs two-sided? From the text and table it is not clear to me whether you set separately upper and lower limits, or if you are computing a two-sided interval. Perhaps you could make it more explicit in the text and table.
We made it clearer by always referring to lower and upper confidence levels.
- Typo in the AN: L667 of AN-2017-002: "RCO" ---> "ROC"
Fixed.
- The hyperparameters optimization study that you performed looks very nice: I think it is a pity that it is not apparent in the PAS: perhaps you could add an additional sentence?
We have added a sentence in this sense.
- The study you do at L689 of the aforementioned AN: you take out one variable, you retrain, you take out another one (leaving out also the first one), etc, (i.e. you train with N variables, with N-1, with N-2...), is that correct? This is a good approximation, but actually the full procedure should be to try out all the various possibilities (N variables, N times N-1 variables, etc). I don't think you need to actually implement it, just to be clear. It is just a suggestion for next time
The procedure you describe is exactly what we did, i.e. we retrained the BDT N times, each time dropping a different one of the N variables. The AN text was indeed misleading and has been modified to make the procedure clearer.
- Overtraining check: in the aforementioned AN, in L698, I find an overtraining check, but as in the PAS the comparison between distributions is done by eye. It would actually be very advisable, particularly for the overtraining check, to estimate the amount of overtraining by quoting a p-value from a GoF test. Traditionally TMVA uses Kolmogorov-Smirnov, but since you seem to have high statistics you could even use a chi2 test.
Thank you for these suggestions.
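A minimal sketch of the suggested quantitative overtraining check, comparing BDT scores on training and test samples with a two-sample Kolmogorov-Smirnov test (a chi2 test could be used instead at high statistics); the scores below are random placeholders, not outputs of the analysis BDT:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Placeholder BDT scores; in practice these come from the trained classifier
# evaluated on the training and on the statistically independent test sample.
scores_train = rng.normal(0.6, 0.2, size=5000)
scores_test = rng.normal(0.6, 0.2, size=5000)

result = ks_2samp(scores_train, scores_test)
# A very small p-value would indicate that the two distributions differ,
# i.e. a sign of overtraining.
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
```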
Gabriella Pasztor 6 March 2017
Table 2:
You use a large number of triggers, some of which were prescaled or disabled in the 2nd half of 2016 data taking.
The information in Table 2 is not (yet) correct for these prescale values.
Also the L1 seeds are not properly listed. For example the dielectron triggers are seeded by an or of multiple single and diEM L1 seeds.
Are the changing prescale values taken into account when calculating the data luminosity and the trigger efficiency?
Table 3:
Zg sample missing
Thanks for pointing it out. Now fixed.
The madgraph sample listed as qq->ZZ->4l according to line 69 also contains "gg->ZZqq" "with 0 or 1 jet". What do you refer to here? Something like gg -> qq qq -> q ZZ q which has 2 jets?
Yes exactly.
Sec 3.
line 91: please give the criteria or point to a suitable reference:
https://twiki.cern.ch/twiki/bin/view/CMS/JetID13TeVRun2016
if these fractions are also coming from loose jet id recommendations
Added
line 108: Ref 16: needs a twiki address
Also Ref 14 is not complete.
Tables 5,6: contributions from WZZ, ZZZ?
We removed these samples since they have an overlap with other signal samples.
Table 6: Presumably the last 3 columns belong to >2 jets (not including = 2 jets).
Missing numbers for total irreducible.
Wrong sums in several places in same line, e.g. 0.32+0.39 = 0.49 (1jet, 4mu ch)
There was an error in copying the values. It is now corrected.
line 120: Hm. jet pT distributions (fig 5) show a different slope.
With the full statistics there is no slope any more. There is good agreement between data and MC.
Fig 6,7: contribution from ttWW mentioned in line 126 but not shown here
Removed, since the sample is not present for Moriond. Its contribution is negligible. The mention in the AN has been removed.
line 142: How is the trigger efficiency measured in data?
It is evaluated with a tag-and-probe technique. This is written in SMP-16-017, but we could also add this information here.
line 144: is this the latest lumi error? Should it be 2.5%?
That version of the AN presents results with half of the statistics, and the uncertainty was 6.8%. It is now updated to 2.6%.
line 158: why the lepton efficiency is used to estimate the QCD scale uncertainty on the ZZ cross-section? Which lepton efficiency is varied here?
Sorry for the mistake. It is actually the cross section of each lepton and not the efficiency per lepton. Now corrected.
line 171: Which systematics are not propagated to the unfolding?
All the systematics that have global values and do not depend on the variables we are studying, for example the luminosity.
line 177: What do you mean by this sentence? Only uncertainty on the shape is considered but none on the normalisation? This sounds strange but I probably misunderstood something...
The difference in normalization is also taken into account. Sorry for the mistake; it has been corrected.
sec 6: No mention how correlations are treated.
To be added.
line 213: Sources are then added in quadrature?
Sources are added in quadrature.
Figs 9 - 16: I would make a better use of the y scale (i.e. decrease the ymax) wherever possible to have the distribution zoomed in as much as possible. An extreme example is Fig 15, top left plot, where the distribution uses less than a third of the y axis range.
I am not particular fan of differential distributions with only two bins. Could we rebin with the full dataset?
All the plots with two bins now have a finer binning thanks to the higher statistics.
Why do we show only shapes? Having the non-normalised results have more information in my opinion?
We think that the information on the normalization can be taken from the measurement of the inclusive cross section. This is also what is done in previous/other analyses, such as ZZ inclusive (SMP-17-017).
Fig 15, top left figure, bin 4: is the error bar correct? seems to be missing actually...
Fig 16, bottom left, bin 3: same question.
We have been investigating this.
Fig 17, right" highest data point missing
To be corrected
Fig 21: Interesting that the data - MC relation changes with unfolding. Was it checked why?
The main features of the unfolding are to correct for efficiency and resolution. For this reason it is expected to see some differences in the data/MC relation after the unfolding.
Fig 39: white means no entry? Treated as 0?
What negative values mean? Low stat weighted MC, I guess? Treated as 0? Binning and MC stat does not seem to be well matched here.
Yes, white means no entry, and such bins are treated as 0. The negative bins are indeed due to MC weights and are treated as 0 as well. The leakage of low statistics far from the diagonal is due to the high resolution of the invariant mass of the four leptons; because of that it is very unlikely to fill some bins even with the high statistics of our samples. However, since the values inside those bins are almost 0, they basically do not count in the unfolding procedure. Finally, the binning is chosen to fit the statistics of the data.
Fig 77: I do not see why you claim SVD (k=2) is biased but Bayesian not. There is no qualitative difference between Fig 77 and the corresponding plots with Bayesian unfolding. Am I missing the point?
No, your comment is right. After some changes and an update of all the plots, those variables apparently are no longer explicitly biased for SVD (k=2). Plots of variables with a more evident bias are now shown, where it is evident that the unfolded shapes are biased towards the "true value".
To see whether there is a bias or not, should we check whether the result changes if we assume different slopes especially for distributions where data and MC does not seem to match so well, e.g. jet pT.
It could be done, but it would be very tricky and difficult to understand whether the distribution is biased or not in this way. On the other hand, the extreme case where a flat reco distribution is unfolded with a response matrix built from a non-flat variable could prove that the result does not get biased towards the distribution used to build the unfolding matrix.
Sec B3, could you show the difference in the prediction of the true and the reconstructed variables for the two different signal MC samples (Madgraph+MCFM+Phantom and Powheg+MCFM+Phantom) by simply overlaying the distributions? It would be useful to know how different they are to understand how meaningful this test is and whether it covers effects that might hide in the data.
To be done
Fig 78 and on: the lower panel should have a y scale better adapted to the observed values, so that we see if the observed differences are compatible with the stat error. Do the ratios have errors? They are surely not visible, but it may just be that they are really small wrt the scale.
Fig 124: highest data points missing