Review comments and responses for SMP-16-019

Response deemed complete and sufficient

Work in progress

Follow-up comments

ToDo

Helen Heath 7 June 2018

Title I'd suggest "of Z boson" -> "for Z boson" (as in first line of abstract)

Changed


L9 "associate" -> "associated"

Changed


L22 "to the ZZ" -> "to ZZ"

Changed


L26 "in a single" -> "at a single"

Done


L27 "instrumental for"-> "instrumental in"

Done


L30 "13 TeV data, which are"-> "13 TeV data set, which is"


"which are being collected in Run II of LHC" we are not supposed to refer to RunII in a paper without defining it. Could you just omit this?
otherwise replace Run II with a description of what it means and insert "the" before LHC

removed "which is being collected in Run II of LHC."

Remove commented part


L115 I would assume a "pair" is two "matching leptons" so using "pair" is confusing when you go on to say they can be different flavors. Why not just say
"require the presence of two loosely isolated leptons."
or does "pair" imply opposite charge in which case say that.

Changed


L118 "request" -> "requirement"

Changed


move comma from before "and" to before "for the 13 TeV"

Changed


line after L146 "for the inclusion" -> "for inclusion"

Changed


"we require each lepton track to have the ratio between the impact parameter computed in three dimensions, with respect to the primary vertex, and its uncertainty to be less than 4"
I'm having difficulty understanding this sentence. Does it mean
"we require that the ratio of impact parameter for the track to the uncertainty on the impact parameter is less than 4."
You refer to longitudinal and transverse impact parameters elsewhere without definition so it's not clear to me that you need to say "computed in three dimensions" here and it does make the sentence hard to read. If 3D is necessary perhaps split into two sentences
"in order to suppress electrons from .. hadrons, we place a requirement on the impact parameter computed in three dimensions. We require that the ratio..."

Changed. Not so beautiful.


L204 remove "down"

Changed


L215 "the simulation" You describe quite a lot of simulations. I think this might need to be a bit more specific.

We described the MCs used in the 3. We think it's fine like this.


L221 "systematic uncertainty source "-> "source of systematic uncertainty"

Changed


L224 "The details are illustrated in the following". I don't think you need this but if you feel it's necessary change "illustrated in the following" to something like "described in the rest of this section"

Removed.


L226 "its effect" -> "it"

Changed


L227 "on the final" -> "in the final"

Changed


L230 "uncertainties on" -> "uncertainties in"

Changed


L233 "PU" isn't defined here suggest define it on L111 when it's first used and then use afterwards - particularly as you use it in equation 1. Alternatively replace PU here with "pileup"

replaced


L235 "uncertainty contribution" -> "contribution to the uncertainty"

Changed


L260 "uncertainties on" -> "uncertainties in"

Changed


L261 "uncertainty on" -> "uncertainty in"

Changed


L264 insert space before "channels"

Changed


L269 -> "and both the transverse momentum and pseudorapidity of the pt-leading jet" and a similar change on L270

It would interfere with the pt definition. We prefer to keep it as it is.


L302 What does "that" refer to? Suggest "The theoretical uncertainties also include the uncertainties in the PDF and alpha_S" (note "uncertainty in" not "uncertainty on" as in pub com guidelines)

Changed


L336 -> " for a center-of-mass energy of 8 (13) TeV"

Changed


L339 "pseudorapidity separation" -> "separation in pseudorapidity"

Changed


L348 no hyphen in "key distributions"

Changed

Isabel Josa 9 May 2017 on v15 of PAS and additional material

Do not use any LaTeX-defined script in the abstract; change \Lumi to plain LaTeX commands.

Fixed.

L 232 The systematic uncertainty for the trigger efficiency are evaluated --> is evaluated ??

Fixed.

Figure 5 (caption) (subleading jet distributions).

Fixed.

It is written ... for Njets >= 1. Shouldn´t it be for Njets >=2 ??

Fixed.

Last sentence. it is written ... the pT-leading jet transverse momentum and pseudorapidity respectively.

Shouldn´t it be ... the pT-subleading jet ...

Fixed.

I think PAS documents do not include an Acknowledgment section.

- Captions for plots showing pseudorapidity distributions. In fact they are showing the absolute value of the pseudorapidity of the jets and the absolute value of the difference in pseudorapidity.

Can you update the caption accordingly (reflecting they are absolute values).

Fixed.

Table with the 9 events with BDT > 0.92: Do we need two decimal figures in the table, 365.83, 91.36, 101.11... ? I think it would be ok with just one 365.8, 91.4, 101.1... except for the BDT score (0.97 ...) obviously.

Changed.

pTrel., jets and pTrel., had plots, can you move the axis title a bit lower, now it is just touching the axis values label at 0.9 ? Just to make it perfect, if not, it is ok.

Fixed

Do we need the comma in the axis title ? p_T^rel., jets vs p_T^rel. jets ? idem for had.

We changed notation to the one used in the PAS: it's now $R(p_{T}^{hard})$ and $R(p_{T}^{jets})$.

Simulation plot effS vs effB. Do we need a Preliminary label in there ? I don´t know about simulation plots.

It is suggested in the PubComm guidelines for figures. It has been added.

Public additional material

Distribution of reconstructed multiplicity of jets with $|\eta^{\mathrm{jet}}|<4.7$. Points represent data, shaded histograms represent Monte Carlo predictions and the background estimate, while the hatched band on them represents the systematic uncertainty on the prediction. The reducible background is obtained with a data-driven method.

nJets_Central_All_mad_SR.png

Distribution of reconstructed $p_{T}$-leading jet transverse momentum. Points represent data, shaded histograms represent Monte Carlo predictions and the background estimate, while the hatched band on them represents the systematic uncertainty on the prediction. The reducible background is obtained with a data-driven method.

PtJet1_All_mad_SR.png

Distribution of reconstructed $p_{T}$-leading jet pseudorapidity. Points represent data, shaded histograms represent Monte Carlo predictions and the background estimate, while the hatched band on them represents the systematic uncertainty on the prediction. The reducible background is obtained with a data-driven method.

EtaJet1_All_mad_SR.png

Distribution of reconstructed $p_{T}$-subleading jet transverse momentum. Points represent data, shaded histograms represent Monte Carlo predictions and the background estimate, while the hatched band on them represents the systematic uncertainty on the prediction. The reducible background is obtained with a data-driven method.

PtJet2_All_mad_SR.png

Distribution of reconstructed $p_{T}$-subleading jet pseudorapidity. Points represent data, shaded histograms represent Monte Carlo predictions and the background estimate, while the hatched band on them represents the systematic uncertainty on the prediction. The reducible background is obtained with a data-driven method.

EtaJet2_All_mad_SR.png

Distribution of reconstructed invariant mass of the two $p_{T}$-leading jets. Points represent data, shaded histograms represent Monte Carlo predictions and the background estimate, while the hatched band on them represents the systematic uncertainty on the prediction. The reducible background is obtained with a data-driven method.

Mjj_All_mad_SR.png

Distribution of reconstructed pseudorapidity separation between the two $p_{T}$-leading jets. Points represent data, shaded histograms represent Monte Carlo predictions and the background estimate, while the hatched band on them represents the systematic uncertainty on the prediction. The reducible background is obtained with a data-driven method.

Deta_All_mad_SR.png

The following figure displays a real proton-proton collision event at 13 TeV in the CMS detector in which two high-energy electrons (light blue lines), two high-energy muons (red lines), and two high-energy hadronic jets (dark green cones) are observed. The presence of two opposite-sign same-flavour lepton pairs with mass close to the Z mass, of two hadronic jets in opposite hemispheres of the detector with a large pseudorapidity separation, as well as the absence of hadronic activity in the central region of the detector, are indicative of the electroweak production of two Z bosons and two jets.

VBS_event_display.png

Selected kinematic properties of signal-like events with BDT score > 0.9 observed in the data.

signal_enriched_kinematics.pdf

Distribution of the Zeppenfeld variable of the leading Z boson, $\eta^*_{Z_{1}}=\eta_{Z_{1}} - (\eta_{jet 1} + \eta_{jet 2})/2$, for events passing the ZZjj selection, which requires $m_{jj}$ > 100 GeV. Points represent the data, filled histograms the expected signal and background contributions.

data_mc_BLS_Z1_zepp_all.pdf

Distribution of the Zeppenfeld variable of the subleading Z boson, $\eta^*_{Z_{2}}=\eta_{Z_{2}} - (\eta_{jet 1} + \eta_{jet 2})/2$, for events passing the ZZjj selection, which requires $m_{jj}$ > 100 GeV. Points represent the data, filled histograms the expected signal and background contributions.

data_mc_BLS_Z2_zepp_all.pdf
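The Zeppenfeld variable quoted in the two captions above can be written out as a short calculation. This is only an illustrative sketch: the function name and the example pseudorapidity values are hypothetical and are not taken from the analysis code or data.

```python
def zeppenfeld(eta_z, eta_jet1, eta_jet2):
    """Return eta* = eta_Z - (eta_jet1 + eta_jet2) / 2, as in the captions."""
    return eta_z - 0.5 * (eta_jet1 + eta_jet2)


# Hypothetical event: a Z boson between two tagging jets in opposite hemispheres.
eta_star_z1 = zeppenfeld(eta_z=0.4, eta_jet1=2.8, eta_jet2=-2.2)
print(f"eta*_Z1 = {eta_star_z1:.2f}")
```

A Z boson sitting midway between the two tagging jets in pseudorapidity gives a value of zero.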

Distribution of the event balance observable for events passing the ZZjj selection, which requires $m_{jj}$ > 100 GeV. Points represent the data, filled histograms the expected signal and background contributions.

data_mc_BLS_delta_rel_all.pdf

Distribution of the ratio between the $p_{T}$ of the dijet system and the scalar sum of the tagging jets’ $p_{T}$ for events passing the ZZjj selection, which requires $m_{jj}$ > 100 GeV. Points represent the data, filled histograms the expected signal and background contributions.

data_mc_BLS_rel_pt_hard_all.pdf
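Read literally, the caption above describes the magnitude of the vector sum of the two tagging-jet transverse momenta divided by their scalar sum. A minimal sketch under that reading follows; the function name and the input values are hypothetical, and the analysis code may define the observable differently.

```python
import math


def dijet_pt_ratio(pt1, phi1, pt2, phi2):
    """pT of the dijet system divided by the scalar sum of the two jet pT values."""
    px = pt1 * math.cos(phi1) + pt2 * math.cos(phi2)
    py = pt1 * math.sin(phi1) + pt2 * math.sin(phi2)
    return math.hypot(px, py) / (pt1 + pt2)


# Hypothetical jets: nearly back-to-back in azimuth, so the ratio is small.
print(dijet_pt_ratio(pt1=120.0, phi1=0.1, pt2=90.0, phi2=3.0))
```

The ratio lies between 0 (jets balanced in the transverse plane) and 1 (jets collinear in azimuth).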

Signal versus background efficiency curves of the boosted decision tree (BDT) and matrix element likelihood (MELA) classifiers for separating the electroweak from the QCD-induced production of the $\ell\ell\ell'\ell'jj$ final state. The efficiency of a cut-based selection on the dijet mass and dijet pseudorapidity separation is also shown.

ROCcurve_MELA.pdf
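For readers unfamiliar with this kind of plot, a signal versus background efficiency curve can be built by scanning a threshold on the classifier score. The sketch below uses made-up scores and a hypothetical function name; it is not the BDT or MELA output of the analysis.

```python
def efficiency_curve(signal_scores, background_scores, thresholds):
    """Return (background efficiency, signal efficiency) for each score threshold."""
    points = []
    for threshold in thresholds:
        eff_s = sum(s > threshold for s in signal_scores) / len(signal_scores)
        eff_b = sum(b > threshold for b in background_scores) / len(background_scores)
        points.append((eff_b, eff_s))
    return points


# Made-up classifier scores, for illustration only.
signal = [0.9, 0.8, 0.4, 0.95]
background = [0.1, 0.5, 0.3, 0.85]
for eff_b, eff_s in efficiency_curve(signal, background, [0.0, 0.45, 0.9]):
    print(f"eff_B = {eff_b:.2f}, eff_S = {eff_s:.2f}")
```

A cut-based selection corresponds to a single point in this plane, which is how it can be compared with the full curves.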

Kenneth Long, May 5 on v13 of PAS

I checked and indeed all the samples use Pythia v8.2

Fixed.

I don't see any reference for POWHEG. It should be:

ZZ, WZ and W+W- production, including Gamma/Z interference, singly resonant contributions and interference for identical leptons, T. Melia, P. Nason, R. Rontsch, G. Zanderighi, JHEP 1111 (2011) 078, arXiv:1107.5051 P. Nason, JHEP 0411 (2004) 040, hep-ph/0409146 [paper] S. Frixione, P. Nason and C. Oleari, JHEP 0711 (2007) 070, arXiv:0709.2092 [paper] S. Alioli, P. Nason, C. Oleari and E. Re, JHEP 1006 (2010) 043, arXiv:1002.2581 [paper]

Fixed.

Line 23 and 24: pair —> pairs seems more correct to me.

Fixed.

Line 25: in associations —> in association

Fixed.

Line 101: I don't see any comment about how you verified that MG and Phantom are in good agreement and I think it would be useful.

Line 173: An algorithm “based on…”

Line 194: I think a minute rewording here would make it more clear. Maybe just include a statement like “If a pairing that gives Z candidates with m_{ll} in [60, 120] GeV, the event is accepted” before saying how the Z1 and Z2 are chosen.

Line 208: I don’t understand why “mostly” is used here. I thought the definition of irreducible backgrounds was having 4 prompt leptons.

228-229: “are functions” or “is a function”

Fixed.

234-235: “on the cross section” and “affecting the differential cross section” seems redundant to me

***242-243: I don’t think “following the PDF4LHC prescription” is an accurate statement. Don’t they come from NNPDF3.0 only?

Fixed.

264: It seems more natural to me to use the “+” sign than to write the word “plus”

Changed.

282: Maybe mention that you use final state leptons and prompt photons. Also could be worth explicitly using the word “dressed”

Isabel Josa 4 May 2017 on v13 of PAS

- New Pythia8 label in plots misspelled (Pyhtia instead of Pythia)

Fixed.

- Table 1 (syst. uncert.) Do we need the % symbol in the two items that do not apply in the normalized distributions (trigger and lumi) ?

- Luminosity uncertainty is not mentioned in the text about syst. uncert. for ZZ+jets (maybe I overlooked it). It is in the Table and it is mentioned in the syst. uncert. that apply to the VBS search. I think it should be also in the text.

L 273 ... estimated backgrounds and the syetamatic uncertanites on the the predicition....

Fixed.

Figure 1, caption.- Gray is US English, is it intended ?

Yes we have been using American English.

predictiomn --> prediction

Fixed.

Thinking on the publications. PubGuidelines recommend stat and syst without periods. Could you please check?

Fixed.

References:

CMS Collaboration Collaboration in [38] and [45]

Fixed

[45] appears as a Technical Report whereas other PASes appear as Physics Analysis Summary.

Fixed

Remove CERN, Geneva and month for PASes (I know you get it like this from CDS).

Fixed.

Give only first page in the references (PubGuidelines says so). Please, check.

Done

Remove the no. in [15], [16], [37],

Done

Extra space in [55] CMS-NOTE-2011-005 ;

Fixed

Include the journal series letter in the journal name, not in the volume (PubGuidelines says so), for instance, in [1]

Nucl. Phys. B164 (1980) 445-483 (with bold B164) --> Nucl. Phys. B 164 (1980) 445 (only 164 in bold)-

Fixed

Please, remove the following commented lines in the Abstract (found with the "magnifying glass" in CADI, superuseful !!!): % The total cross section ...

Fixed

Approval talk requests, April 27

Check that MC aMC@NLO Madgraph references are correct

The reference is correct.

Figure 8: please check that stat uncertainty on the data points is Poisson (root code from stat com: https://twiki.cern.ch/twiki/bin/view/CMS/PoissonErrorBars)

The error bars are Poisson, as recommended by the Stat Com. This is more evident on the linear-scale plot.
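As an illustration of what asymmetric Poisson error bars mean for low-count bins, the sketch below computes a central (Garwood-type) 68.27% interval for an observed count using only the standard library. It is an assumption that this matches the construction in the linked recipe; the function names are hypothetical and this is not the ROOT code referred to in the comment.

```python
import math


def poisson_cdf(n, mu):
    """P(X <= n) for X ~ Poisson(mu)."""
    if n < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def _mu_for_cdf(k, target, hi):
    """Bisection for mu with poisson_cdf(k, mu) == target (CDF decreases with mu)."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def garwood_interval(n, cl=0.6827):
    """Central interval (lower, upper) on the Poisson mean for an observed count n."""
    alpha = 1.0 - cl
    hi = n + 10.0 * math.sqrt(n + 1.0) + 10.0  # generous upper search bound
    lower = 0.0 if n == 0 else _mu_for_cdf(n - 1, 1.0 - alpha / 2.0, hi)
    upper = _mu_for_cdf(n, alpha / 2.0, hi)
    return lower, upper


for count in (0, 1, 5):
    low, up = garwood_interval(count)
    print(f"n = {count}: -{count - low:.2f} / +{up - count:.2f}")
```

Unlike symmetric sqrt(n) bars, these intervals give a non-zero upper error for empty bins.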

L102: please explain how triboson enter here. This is basically ZZV→4l+2j process.

Yes, triboson processes refer to ZZV, i.e. the production of two Z bosons and a third weak boson that decays hadronically. We added a phrase to clarify.

lumi uncertainty update: 2.5%

Corrected.

- slide 36: plot on the right. What is the message of this plot? Not sure if we want to include it. Authors should explain the message that they want to convey by this plot as it has several readings and can trigger questions on the choice of strategy.

The message of the plot is that the BDT adopted for the analysis has been trained optimally, i.e. no information is lost through the choice of BDT observables or poor training. The comparison to a simple cut-and-count approach (what was used in previous searches for VBS) highlights the potential of deploying an MVA, in this channel in particular. Moreover, the MELA result is the work of a PhD student and it would be nice to publish her results, at least as additional material.

L132: make a statement that the analysis selection is consistent with, but not identical to, that of ZZ inclusive. This avoids confusion from outside that CMS has two analyses with identical selection and data but observing different numbers of events in data.

We followed the suggestion received during the meeting and changed "identical to" to "similar to".

L376: aQGC limits are not derived using CLs. Please correct, you can use the description from SMP-14-014 for example.

The PAS text has been corrected.

Please add VBFNLO reference for unitarity bound results

Name of the tool and citation added.

- differential distributions: remove Phantom from label and add Pythia8.

L59 : of of → of

Corrected.

- s17: detector level plots on jet multiplicity will be nice to be added to the PAS, they give the feeling of the background level and composition as function of nJets. (suggestion not made during the meeting due to lack of time)

Added

- s25: why not putting the |η|<2.4 jet multiplicity plot only in the public twiki ? what do we learn more by showing both |η|<4.7 and the |η|<2.4 ? Discussing only the |η|<4.7 case will simplify PAS, e.g., also x-axis label in Fig 5 PAS-v11 (suggestion not made during the meeting due to lack of time)

As it is one of the principal measurements of this work, we prefer to keep it in. We think it’s important to show the results in the phase space where the full PF algorithm is fully accessible for the jets.

Gabriella Pasztor, April 11 on v10 of PAS

abstract line 13: wouldn't it be more useful to give the signal strength here? That seems to show the agreement with SM in one number and also that is the actual parameter we measure.

You definitely have a point, and we could move to quote the signal strength. The current focus/narrative is that we see a first hint of this process (the significance) and that it is truly a sub-femtobarn process which is probed here (the cross section) for the first time.

line 38: add which generators

We'd prefer to keep this part of the introduction short, without too much detail. The main reason is that we'd have to mention not just the ME generators (POWHEG and MadGraph_aMC@NLO), but also the parton shower. For the sake of completeness, one should then also mention the ggZZ prediction from MCFM and the electroweak prediction.

line 66: add pT range so that it does not clash with the next sentence

Added.

* Systematics due to neglecting the interference term (~1% total rate, ~10% EW rate).
Do I understand correctly that you assume negligible systematics from neglecting the interference contribution as the relevant jet distributions are more similar to QCD than to EW production thus after the BDT fit, they will not contribute to the signal in any significant way?
(I am asking this because the interference is one of the main systematics for example in Wjj and Zjj analysis while completely missing here. Admittedly those are more precise analyses. )

We do not assign a dedicated systematic on the interference, because it is concentrated in the background-like region and there it contributes <2% (FIG 3. c). Compared to the uncertainty of the QCD normalisation in this region, e.g. the 10% scale uncertainty, this is negligible. Regarding the role of the interference in the Zjj analysis, we are not sure that it is a main systematic there either. In FIG 42 it is reported that the correlation between the signal strength and the interference systematic, i.e. the impact, is -0.18. This is much smaller than the background normalisation or the other theory uncertainties. It can also be seen that the fit itself is not really sensitive to this nuisance parameter, that is, it has a small pull and it does not constrain the (large) 50% uncertainty. So it seems that the situation in Zjj is similar to this analysis.

line 125: along the beam axis. [This is a dz cut!]

Corrected

line 315: refers to "the ZZ selection described in Section 4". In Section 4, line 195 defines "ZZ selection" before the removal of multiple candidates. This is confusing, especially as you wrote in your reply that ZZjj uses the ambiguity removal (as it should). I suspect the definition was for the benefit of line 211. Could you clean this up? Easiest would be to remove the sentence from line 195.

We have moved the definition of the ZZ selection to the second-to-last sentence in this section.

line 219: thanks for adding the background estimate. It would however be nice to have this together with number of total selected events which I can not find in the note. Sorry if I overlooked it.

*** line 234: How large is the unfolding uncertainty? Not given in text, neither in table 1.

The uncertainty on the unfolding is estimated by changing the MC used to build the response matrix, as written in line 234. This uncertainty corresponds to the MC choice entry in the systematics table.

I had a few questions on unfolding that were never answered (e.g. stability of results wrt choice of unfolding method, unfolding parameter).

Changing the unfolding method does not change the results significantly, at most by about 1%. This uncertainty is well covered by the MC choice uncertainty. The number of iterations for the D'Agostini method is chosen to be 4, and the reasons are given in the AN, line 336. Basically, the convergence of the chi2 between the unfolded distributions obtained with increasing numbers of iterations was checked first; then a number of iterations was chosen that gives a chi2 less than 1/sqrt(2) and at the same time greater than 4, to avoid biasing the unfolded results towards the simulation used to construct the response matrix.
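For orientation, the iterative (D'Agostini) unfolding mentioned above can be sketched in a few lines. This is a generic textbook-style illustration with a made-up two-bin response matrix and a hypothetical function name; it is not the unfolding code used in the analysis, and the chi2-based choice of the number of iterations described above is not implemented here.

```python
def unfold_iterative(measured, response, prior, n_iterations=4):
    """Iterative Bayesian unfolding; response[j][i] = P(reco bin j | true bin i)."""
    n_reco = len(measured)
    n_true = len(prior)
    # Probability that an event in true bin i is reconstructed in any reco bin.
    efficiency = [sum(response[j][i] for j in range(n_reco)) for i in range(n_true)]
    estimate = list(prior)
    for _ in range(n_iterations):
        total = sum(estimate)
        p_true = [x / total for x in estimate]
        # Expected reco-bin probabilities for the current truth estimate.
        folded = [sum(response[j][i] * p_true[i] for i in range(n_true))
                  for j in range(n_reco)]
        updated = []
        for i in range(n_true):
            acc = 0.0
            for j in range(n_reco):
                if folded[j] > 0.0:
                    # Bayes' theorem: P(true i | reco j) times the observed reco yield.
                    acc += response[j][i] * p_true[i] / folded[j] * measured[j]
            updated.append(acc / efficiency[i] if efficiency[i] > 0.0 else 0.0)
        estimate = updated
    return estimate


# Made-up inputs: two bins with some migration between them.
response = [[0.8, 0.3],
            [0.2, 0.7]]
print(unfold_iterative(measured=[50.0, 50.0], response=response, prior=[1.0, 1.0]))
```

Fewer iterations keep the result closer to the prior, while more iterations amplify statistical fluctuations, which is why the number of iterations acts as a regularisation parameter.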



What defines the order of systematic in table 1? Not the size, not grouping similar sources together. it seems somewhat random. Please use a logical ordering. Preferably same in text and table.

We have now ordered the systematics by type: physics objects, event, unfolding, and theory.

line 256 states "variations corresponding to each source are given below" however the following paragraph does not list all sources only some selected (large) ones. Leptons, pileup, reducible background. are not given. I understand why you do not want to use the same table as for the diff xsection but then maybe you can add a separate one. In any case the text is misleading as it is.

We have added the missing systematics to the text.

For my education: In your reply you said that the Phantom sample is the nominal one in ZZ+jets while MadGraph is the nominal for the VBS analysis. Is there a physics reason for that or simply a historical thing?

The reason is somewhat historical. The PHANTOM sample covers the entire GEN phase-space and does not allow training the BDT. The MG sample was made specifically for this analysis and has much higher statistics.

I am still curious to know the contribution of MCFM and PHANTOM to the theory total predictions for the section analysis.

Table 5 and 6 of the AN report the yield for each MC sample together with the observed, inclusively and per jet multiplicity respectively.



Table 3: is the lumi uncertainty correct for >=3 jets? Seems larger than 2.6% Maybe just rounding?

Yes it's the rounding.


Can we add the theory predictions to the table?

It would still be useful to give the measured and predicted (njet inclusive) fiducial cross-sections somewhere as well (referring to the ZZ section note) as most cross-sections are normalised to 1. Is there a reason not to quote this number?

We added the predicted fiducial cross section per jet multiplicity in table 3.

Fig 7 caption, line 2: isn't it 100 < mjj< 400 ?

No, the plot text is correct. The nVBS selection is a strict subset of the ZZjj selection, which requires mjj>100 GeV.

Typos, etc:

line 23: Z boson pairs OR a Z boson pair

Fixed

line 48: are silicon pixel and strip tracking detectors

Fixed

line 122-3: missing spaces before "(" , 3 places

Corrected

line 332: os -> of

Fixed

line 386: The more recent Monte Carlo and parton-shower predictions -> The more recent Monte Carlo ME calculations and parton shower models adopted in this analysis show

Fixed

Ref [43] remove one of the "collaboration"s

Fixed

Aram Apyan 10 Apr 2016

Figure 7 and Table 11 in the PAS show the pre-fit plots and yields, respectively. Looks like the QCD ZZjj background normalization is pulled down by 1 sigma in the fit. Could you please provide the following:

a) The pulls of the nuisance parameters

The pull distributions for the pre- and post-fit are shown in FIG. 50 of AN-17-002.

b) The corresponding post-fit plots for the figure 7.

The post-fit for the full ZZjj selection:

Showing the pre-fit plots/yields can be bit confusing for the reader as the expected S+B events is 117 and the observed data events is 99 while we obtain a mu-value of 1.39. Instead, showing the post-fit versions in the PAS would be preferable.

Thank you for this suggestion, we think showing a post-fit plot could indeed be a comprehension aid for the reader. However, it is not obvious how this can be accommodated in the current logic/line of reasoning in the PAS:

FIG. 7 (a) shows the nVBS control region and together with the full ZZjj selection in (b) it allows the reader to:

  • Convince herself that the BDT selects the VBS-like signal region
  • Show that the QCD background shape is in good agreement between data and MC

Now, the fact that we have mu>1 is of course due to the upward fluctuation in the most signal-like bin of the BDT, which is of course evident from FIG. 7 (b). We could eventually add a sentence on this after L351.

Isabel Josa 10 Apr 2016

Isn´t there any reduction in the systematic uncertainty from JES in the
normalized distributions (right column of Table1) ?

The JES uncertainty can change only the value of the pT of the jets, so in the jet multiplicity distribution the pT variation can only move jets from one bin to another, leaving the total normalization identical. For the same reason the JER uncertainty doesn't change in the normalized distribution.
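The argument above can be illustrated with a toy example: scaling the jet pT up or down moves events between jet-multiplicity bins but leaves the total number of events unchanged. The jet pT values, the 30 GeV threshold, and the 5% scale shift below are hypothetical choices for illustration, not the values used in the analysis.

```python
def jet_multiplicity(events, pt_threshold=30.0, scale=1.0):
    """Count events with 0, 1, 2 and >=3 jets above threshold after scaling jet pT."""
    counts = [0, 0, 0, 0]
    for jet_pts in events:
        n_jets = sum(pt * scale > pt_threshold for pt in jet_pts)
        counts[min(n_jets, 3)] += 1
    return counts


# Hypothetical jet pT values (GeV) for five events.
events = [[45.0, 31.0], [29.0], [], [60.0, 33.0, 30.5, 12.0], [28.0, 27.0]]

for label, scale in (("nominal", 1.0), ("JES up", 1.05), ("JES down", 0.95)):
    counts = jet_multiplicity(events, scale=scale)
    print(f"{label}: {counts}, total = {sum(counts)}")
```

Each bin content changes with the scale, but the sum over all bins stays equal to the number of selected events.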

Comment: AN (L 70-73) state that MadGraph5 aMCatNLO is chosen as the
reference MC (vs POWHEG) because the latter does not contain events with
2 jets at matrix element, the MadGraph5 aMCatNLO sample is expected to
describe better the variables related to jets.

However, looking at the results there are no significant differences
between the descriptions of the two MCs. Maybe it is worth commenting on it
explicitly in the discussion.

Working on it


Question: PAS does not include results in the "wide" region, but just
for me to understand it. Why the cross sections in the tight region (the
ones quoted in the PAS) are larger than those in the wide region ? Can
you remind me the definition of the wide region ? Thanks.

The "wide" fiducial region only requires the two Z bosons to be in the mass window 60-120 GeV. The differential cross sections for the wide fiducial region are shown in pb, while those for the tight fiducial region are in fb.


Typo: Please, check L 92. There are some words left over from the
previous version.

To be corrected

Darien Wood 4 Apr 2017 on v9 of PAS

Type B comments/questions:

1. For the differential measurements, something I miss is a report of the number of events used in the sample. Since normalized distributions are presented, this information is not apparent to the reader. The unfolded distributions are indeed the final result, but it is interesting to know, for example, how many ZZ+>=3 jet events are selected. If the plots are included with the reco-level comparison, that would address this. Otherwise, I think this information should be given somewhere else.

We suggest providing this information in the form of a RECO-level plot as supplementary material. Including it in the PAS would require some editorial effort, and we will not be able to include these plots in the combined 8+13 TeV paper because they have not been approved for 8 TeV.

2. I am confused by the statement about the gg->ZZ calculation on line 92: "The gg->ZZ process is calculated to O(a_s^2), where a_s is the strong coupling constant, while the other contributing processes are calculated to O(a_s^4); this higher-order correction is included because the effect is known to be large." This makes it sound like the gg->ZZ is only calculated to O(a_s^2), but that is only leading order for this process, and the text (and reference) claim an NLO calculation.

The loop-induced ggZZ contribution is scaled to the NLO prediction. This sentence has been removed because it is redundant with the previous sentence.

3. Line 110: "performed at leading order using MADGRAPH5_AMC@NLO ". The name of the generator ("@NLO") and the order of the calculation (LO) are different, but maybe I am taking the name too literally.

The sentence is correct. Since the merger of the MadGraph and AMC@NLO codes, this is the official name of the generator, as recommended by the PubComm guidelines. It is indeed confusing.

4. Line 225: "The uncertainty due to the jet energy resolution (JER) is 5.5%". The table says 1.2-5.5%, and it makes sense that this varies with jet multiplicity.

Fixed

5. Line 324: Can you define Rp_T^{hard}? It seems that all of the other variables are defined in the text.

We added the explicit definition.

Type A comments:

Line 4: suggest "the mechanism of electroweak symmetry breaking (EWSB)."

Fixed

Line 15: suggest "that both radiate vector bosons, which then interact."

Fixed

Line 16: add commas or parentheses around p_T

Fixed

Line 34: "The results on" seems superfluous. Suggest, "The dependence of the cross section...is measured and compared to the predictions from recent Monte Carlo event generators."

Fixed

Line 35: "two p_T-leading jets' properties" -> "properties of the two p_T-leading jets"

Fixed

Line 300: incorrect formatting of units in "8 TeV ". Use macro.

Fixed

Line 307: "p value" -> "p-value"

Fixed

Gabriella Pasztor 4 Apr 2017 on v9 of PAS

I looked again at your trigger description in the PAS draft (starting at line 124) and it seems really out of date for the full 2016 data set.

For example HLT_Ele17_Ele12_CaloIdL_TrackIdL_IsoVL_DZ_v and HLT_Mu8_TrkIsoVVL_Ele17_CaloIdL_TrackIdL_IsoVL_v, that you describe as main dielectron and muon+electron triggers were prescaled in the 2nd half of the year.

We use the same list of triggers as ZZ inclusive and the HZZ4l analysis. Both include the triggers you point out. In fact the trigger description in this PAS is identical to that of SMP-16-017 (ZZ inclusive). We do not remove trigger paths for the later data. In the end the trigger efficiency was evaluated in a tag-and-probe study carried out by the Higgs analysis and documented in HIG-16-041.

I am not sure what you are using as it is not clear in the AN either, so it would be great if you could just point us to the trigger list in your code. The trigger list in the VBS AN (Tab. 2) is up to date and in sync with what is used in HZZ4l.

Isabel Josa 3 Apr 2017 on v9 of PAS

L 280-281 in the PAS (v9) explain how the normalized distributions are obtained:

"All the distributions of the corrected number of events are then divided by the bin width and normalized to one."

This means that although the Y axis is always labelled as 1/sigma_fid x dsigma/d(relevant variable) the sigma_fid used in each case is different: sigma_fid for current Fig.1 is the sum of the values in Table 3, sigma_fid for Fig. 3 is 8.0+3.0+1.3 fb and sigma_fid for Figs. 2 and 4 is 3.0+1.3 fb, right ?

Please, explain clearly in each of the figure captions what is the sigma_fid used in that particular distribution.

We always normalize the distribution to unity, and since we can have overflow entries this is not always true. The sigma_fid always corresponds to the integral of the distribution under study. We added a clarification in the captions explaining what sigma_fid is in every plot.
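One way to read the normalization described above is sketched below: each bin content is divided by its bin width and by the integral of the distribution, and if that integral includes overflow entries, the visible bins integrate to less than one. The handling of the overflow, the function name, and the numbers are assumptions made for illustration only.

```python
def normalized_density(counts, edges, overflow=0.0):
    """Divide each bin by its width and by the integral (visible bins plus overflow)."""
    total = sum(counts) + overflow
    return [count / (total * (high - low))
            for count, low, high in zip(counts, edges[:-1], edges[1:])]


# Made-up unfolded yields and variable-width bin edges.
counts = [10.0, 30.0, 60.0]
edges = [0.0, 10.0, 30.0, 100.0]
widths = [high - low for low, high in zip(edges[:-1], edges[1:])]

for overflow in (0.0, 25.0):
    density = normalized_density(counts, edges, overflow=overflow)
    visible_integral = sum(d * w for d, w in zip(density, widths))
    print(f"overflow = {overflow}: visible integral = {visible_integral:.2f}")
```

With no overflow the visible integral is one; with overflow entries included in the normalization it falls below one.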

Wouldn't it be interesting to include the ZZ+jets production cross sections as a function of the inclusive jet multiplicity, at least starting at 1 jet, i.e. sigma(ZZ+>=1jet), sigma(ZZ+>=2jets)? You are already giving +>=3jets. Either as a table or as a plot. This is usual information included in W+jets and Z+jets analyses (you may want to check the SMP-16-005 or SMP-16-015 analysis).

The measurement of the inclusive cross section in terms of jet multiplicity is a different measurement from the exclusive measurement we present in SMP-16-019. It is certainly interesting, but it would require a completely new measurement, as it is based on different unfolding matrices. Since there is nothing wrong with presenting the ZZ+jets differential cross section in exclusive bins, I really prefer that we present the results in exclusive bins. Also, with a view to the paper, it is almost impossible to add such measurements for the 8 TeV data.

L 96-98 Are background samples generated at LO ? NLO ?

They are NLO samples. We added this detail in the text.

L 220 "Depending on the jet multiplicity" can probably be dropped.

We’d prefer to leave it, as it is the first time we mention that the uncertainty ranges refer to the variations in the Njets bins.

L 241 Reference for the PDF4LHC prescription?

Reference added

L 271-272 MADGRAPH (qqbar -> ZZ), MCFM (gg -> ZZ) and PHANTOM (qqbar-> ZZ + 2 jets) for 8 TeV dataset and --> remove it, it refers to 8 TeV analysis.

Fixed

L 273-274 for 13 TeV dataset. No longer needed.

Fixed

L 286-288 The systematic uncertainties in each bin are assessed from the variations of the nominal cross section by repeating the full analysis with each source of uncertainty varied.

Maybe you can explain what systematic uncert. are reduced in the normalized distribution wrt absolute cross sections (those in Table 3) because of cancellation in the ratio. I guess syst. values of uncert. in Table 1 refer to diff. cross sections in Table 3.

We included in the systematics table the values for both the absolute and the normalized cross sections. We also explained that only shape variations are included in the case of the normalized distributions.

L 291 "for 13 TeV data sets of samples". No longer needed.

Fixed

L 296-298 "Distributions taking into account the variations of MC predictions are scaled to the corresponding default distribution and are not normalized to the unity."

Removed.

Does it mean that you are only considering shape variations in the MC predictions? What do you mean by "are not normalized to unity"?

The text was poorly worded. For each uncertainty we consider both the shape and the yield variations in the case of the absolute differential distributions, and only the shape variations in the case of the normalized ones.
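
As a rough sketch of the distinction drawn in this response (not the analysis code), the per-bin shift can be computed either on the raw yields or after scaling each histogram to unit integral. The function names and all numbers are hypothetical placeholders.

```python
def absolute_shift(nominal, varied):
    """Per-bin shift for the absolute cross section: shape and yield both count."""
    return [v - n for n, v in zip(nominal, varied)]

def shape_only_shift(nominal, varied):
    """Per-bin shift for the normalized distribution: each histogram is first
    scaled to unit integral, so a pure yield change cancels."""
    n_tot, v_tot = sum(nominal), sum(varied)
    return [v / v_tot - n / n_tot for n, v in zip(nominal, varied)]

nominal = [20.0, 12.0, 6.0, 2.0]       # placeholder yields per Njets bin
scaled = [1.1 * x for x in nominal]    # pure 10% yield variation
tilted = [19.0, 12.0, 6.5, 2.5]        # same total yield, different shape

abs_scaled = absolute_shift(nominal, scaled)
shape_scaled = shape_only_shift(nominal, scaled)
shape_tilted = shape_only_shift(nominal, tilted)
```

A pure yield variation shifts every bin of the absolute distribution but leaves the normalized one unchanged, while a tilt at fixed total yield shows up in both.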

How do you handle overflow bins ? I mean, in the case of the pT(jet) distributions there may be jets with pT > 500, 300 GeV for the leading & sub-leading jets and in the case of jet rapidity distributions there are jets with abs(eta)>4.5 (you cut at 4.7). Do you still normalize the distribution to 1 or divide it by the sum of cross section values in Table 3 (40.4 fb) ?

The distributions are always normalized to unity. We modified the caption to clarify this.

About Figure 1 (differential Njets distributions). Rechecking again the preapproval comments, I can read that there was an explicit request to present them in terms of absolute cross sections, not normalized ones, to compare them with the predictions. One can retrieve the experimental absolute distribution multiplying by the sum of the values given in Table 3, but the expectation from the MC is not given. Can you please, include the absolute cross sections back in the PAS ? In fact, both, absolute and normalized differential cross section can be presented. You can present again your arguments at the approval session, but the information should be ready and presented.

The PAS now contains both the absolute and the normalized plots.

L 299-307 Discussion of the results.

Some part of the discussion that was in the previous version (L 335-347 in PAS v8) has been dropped and the current paragraph addresses mostly differences wrt the 8 TeV analysis. Some of the comments in the removed paragraph are still valid in the present analysis and should be brought back. I would suggest that you discuss first what you observe in the current analysis and then mention differences wrt SMP-15-012, if relevant (and if both sets of results are indeed comparable).

We brought back part of the 8 TeV description and rearranged the paragraph.

Figure caption for Figure 2, second line. "for 13 TeV" no longer needed.

Removed

Figure captions for Figures 1 to 4 have a smaller font size than for the rest of the Figures. Unify them (personal preference, to the font size used in Figs. 5 to 7).

Corrected

L 347 Do you have the fiducial cross section value ready?

The numbers have been added to the PAS.

L 348 Do you have the SM cross section ready?

The numbers have been added to the PAS.

Derivation of limits for aQGC. Information is very reduced in this part. I would suggest including a few sentences concerning: the fitting procedure (is it the same maximum-likelihood fit you used for the EW signal extraction?) and the interpolation of the aQGC (L 874-877 in the AN). Are the limits dominated by the statistical or systematic uncertainties? Mention it in the text. How do you derive the unitarity limit (it is explained in the AN)?

We added more details on the statistical modelling and the aQGC interpolation to the PAS. The statistical method (test statistic, maximum-likelihood fit) is identical to that of the EW signal strength extraction. The limits are entirely dominated by the statistical uncertainty. The unitarity limit is given by the energy at which unitarity is violated when the coupling strengths are set equal to the respective observed limits, as described in L891-3 of the AN.

Figure 7 does not look exactly the same as the other VBS plots. Check legend, line thickness etc. and redraw the axis.

The styles of the plots are now in much better agreement.

Typos:

L 120 pielup --> pileup

Fixed

L 199 high-pT isolated ... T should be in roman.

Fixed here and throughout the PAS.

L 218 it has found --> it has been found ?

Fixed

L 220 compositions --> composition

Fixed

L 224 ... the the ...

Fixed

L 222, L 224, L 234 ...uncertainty on ... --> uncertainty in ...

Fixed

L 262 Z bosons are than --> ... are then ... ?

Fixed

L 359 Figure 7 (left) shows ... (left) not needed.

Fixed

L 361 coupling --> couplings?

Fixed

References.-

[17] CMS Collaboration Collaboration, --> CMS Collaboration,

Fixed

[21] Collaboration Collaboration, --> CMS Collaboration,

Fixed

[36] To be completed.

Fixed

[42] To be completed.

Fixed

Gabriella Pasztor 3 Apr 2017

Abstract:
line 6: cross section as a function of the .

corrected


line 8: remove "the distributions of" - those or also diff sections

corrected


line 13: wouldn't it be more useful to give the signal strength here?

Main text:
line 23: Z boson pairs OR a Z boson pair

Fixed


line 35: corrections

Fixed


line 34-38: "The results on . are measured and compared." seems a strange construction
Anyway, the sentence is very long and would probably be better cut in half.

Fixed


line 38: add which generators


line 40: two jets with large mjj and dyjj

The VBS analysis uses basically the same cuts as the ZZ+jets analysis, since it is a fit to the shape and not a cut-and-count measurement.

line 48: are silicon pixel and strip tracking detectors

Fixed


line 66: 1.3$-$2.0

Fixed


line 66: add pT range as done in the next sentence

To be fixed.


line 70: from close to the nominal interaction point? Do we need "close to"?

Done.



CMS detector description:
Maybe this is the standard text but I find it curious that some numbers are only given for barrel.
Eg. lines 55-57 have a very restricted phasespace where resolution is given.
Similarly lines 69-71, give info only on the barrel though the full barrel+endcap range is used in the paper.
It is not clear why the HCAL eta-phi granularity is more important than the ECALs.

line 86: "and for comparison . processes." This half sentence is hard to read. missing a verb?

Fixed



It would be useful to add the most important EW and QCD diagrams that lead to ZZjj.

Adding only the VBS diagrams would create an imbalance between the two parts of the analysis (diff. cross sections, VBS), so one would have to also add some ZZ diagrams. For a VBS-only paper we'd definitely want to add these diagrams.



line 106: 1% to the EW yield?

It's 1% of the total ZZjj yield, as stated in the text. Detailed numbers and distributions of the interference are shown in Fig. 2 of the VBS AN.


line 115: introduce PDF abbreviation

Added


line 128-130: not clear which triggers are referred here, especially for the single leptons.

We updated the pT thresholds and the description.


line 132: within the ZZ search region??? -> selected by the four-lepton analysis criteria?

Fixed.

Event selection: mention lepton momentum and efficiency corrections

We added the data-driven momentum, resolution and efficiency corrections.



line 184: e-e separation requirement is only ~one calorimeter crystal
Is the efficiency well understood for such close objects?

This technical cut is part of the HZZ and ZZ inclusive event selection, since run I. It is mostly intended to remove spurious duplicate objects (“ghost cleaning”). From a physics POV, it is entirely irrelevant for this analysis, as the bulk of Z bosons has pT<200 GeV. See also responses to Isabel’s question on the high-pT muon ID from March 2nd.


line 187: mass closest to the nominal Z boson mass of 91.2 GeV is denoted Z1

Fixed


line 190: ZZ selection defined here. It appears for the EW ZZjj analysis later. Do I understand that for the ZZjj analysis, multiple ZZ combinations can enter the plots? Why was this choice made? It would be useful to explicitly state that these ZZ candidates will be used for the EW analysis without resolving the ambiguity.
line 193: Add rate of ambiguity as it is the relevant number for ZZjj

There seems to be a misunderstanding. The events used in the VBS analysis are exactly the same as in ZZ inclusive and the diff. cross sections. This means we apply the same ZZ arbitration based on lepton pT. There is no reason why the rate of ambiguous events in ZZjj should be different from the ZZ inclusive one (which we quote as 0.3% in the text).


Background estimation:
I would add the size of the estimated backgrounds

Added



line 217: just to be sure. the ~98% trigger efficiency is taken properly into account in the cross-section and this 2% uncertainty is on top of that.

The trigger efficiency is automatically taken into account in the unfolding procedure together with all the other efficiency sources. The 2% uncertainty is taken into account with all the other uncertainties.

line 220: remove "Depending on the jet multiplicity"

Removed


line 223: what is included in the 0.1-1.2%?

The statistical background of the MC samples


line 224: Add JES uncertainty size

Added


line 226: is the lepton ID uncertainty really parametrised as a function of jet multiplicity?


They are not parametrized as a function of the jet multiplicity, but they do depend on the event kinematics, which vary with the emission of extra radiation.

line 238: For my education, why mZ and not 2mZ is the default scale? Is it MCFM or CMS choice?

It was the MCFM choice. In any case, this part has been removed, since it was something measured for 8 TeV, and the same effect is now taken into account by the theoretical uncertainty on the MC. As for MCFM as the generator of the loop-induced ggZZ production, the central scale in MCFM is actually dynamic and equal to m4l/2. The scale was optimized by the HIG PAG.


Table 1 caption: so all uncertainties except trigger and lumi depend on the jet multiplicity?

Yes, since for both lumi and trigger we have only global uncertainties. The others always have at least a small dependence on the jet multiplicity (changes in isolation, kinematics, etc.).



line 249: Why are the theory uncertainties so much larger for the EW ZZjj analysis?
As the values for the few sources mentioned here are so different from the ones in table 1, I suggest to have an extra column and list all uncertainties also for the ZZjj analysis in table 1.

The numbers quoted for the VBS search are based on the maximum deviation on the BDT spectrum. This is naturally more sensitive to the uncertainties than the per-jet-bin inclusive numbers reported for ZZ+jets. This is also one of the reasons why we would like to keep the VBS figures separate from the numbers on the fiducial cross sections. The other reason is that Tab. 1 will contain also the numbers for the 8 TeV results for the paper, and then we’d be mixing 8/13 TeV in addition to ZZ+jets and VBS numbers.

line 257: for consistency: delta_eta_jj

Fixed.

line 261: lepton momenta

Fixed

line 261: so here the ambiguity is resolved for the 4l pairing as well?

Yes, the same selection used on data/RECO is used at the GEN level.

line 268: and the reconstruction-level

Fixed

line 272: remove 8 TeV stuff

Fixed

line 273: this is MadGraph5 _aMC@NLO as described in section 3
I suggest to search for mad graph in the pdf as several different names are used at present (see e.g. line 230)
It would be good to use the same typeset everywhere and be consistent with the text and the figure labels

Fixed throughout the PAS.

In line 100 phantom appears here as the alternative for EW Zjj, however here phantom is the nominal MC for all ZZjj. This seems contradictory.

The Phantom sample is the nominal one in ZZ+jets. MadGraph is the nominal for the VBS analysis.

I would be curious to know the contribution of MCFM and PHANTOM to the theory total predictions

Table 3: is the lumi uncertainty correct for >=3 jets? Seems larger than 2.6% Maybe just rounding?
Can we add the theory predictions to the table?

line 280: divided by the bin width. for the jet multiplicity this does not make sense, as the bin width is 1 for the first 3 bins and then undefined for the last.
Maybe this sentence should go after the jet multiplicity discussion?

The bin width part has been moved to the captions of the plots where it is actually applied.

line 281: It would be useful to give the measured and predicted fiducial cross-sections somewhere as everything is normalised to 1 so the absolute (dis)agreement can not be deduced from the plots.

For now we added the distribution not normalized to 1.

line 284: Figures 3 and 4

Fixed.

line 290: ones for both the . POWHEG predictions are also reported.

Fixed.

line 292: uncertainties on the MC

Fixed.

line 300: 8 TeV typeset
comma after [17]
the 13 TeV predictions, using newer version of the Monte Carlo ME calculation and parton shower, show
(or similar but need rephrasing along these lines)

Fixed and rephrased.

line 304: not sure the disagreement here is significant. Hard to read the plot but even the first bin seems to agree within 1 sigma. Less 1 jet events is the only significant here. P-value 15% is not really significant.

line 307: also the eta distributions show a small slope (deficit at large eta) which might explain the Njets disagreement being larger for the |eta|<4.7 region.

Fig 1: for the ratios, please zoom in as much as possible. The labels take up too much space as they are now and the values are not easy to read. Why are the powheg uncertainties so small (0?) wrt Madgraph?

The uncertainties shown are from the matrix-element calculation, as stated in the caption.

In general the labels on the plots have too small fonts and difficult to read.

Will fix for approval plots.

Captions figs 3-4, line2: Missing space after full stop

Fixed

line 321: to exploit

Fixed.

line 324: define event balance

Added definition.

Figure 6: do I understand correctly that the left plot shows events that are a subsample of the events on the right?

Fig 2 caption: isn't it 100 < mjj< 400 ?

Added the cut on mjj>100GeV

line 346: what about the ambiguity resolution for 4l pairing? See earlier comment.

See response to first questions.

lines 347-8: missing values

Fixed.

line 348: missing "fb"

Fixed.

line 345: MVA -> BDT

Fixed.

line 359: mZZ vs plot axis title m4l: use the same

Fixed.

Fig 7: would the fT0-2 couplings result in a similar shape that is shown here for fT8-9? I'd be interested to see their prediction as it is not shown in the AN either.

The “shape” of the different aQGC operators is almost identical. In the end the limit is entirely driven by the last/overflow bin. We do not show the predictions for T0-2, but we do show the yield parameterization in the last/overflow bin for these operators in Fig. 52 of the AN. The yield predictions corresponding to the limits we set for each operator are identical, i.e. all limits correspond to the same yield increase.

Figs 5-7: add tics to top and right borders

Done.

Can we add a figure that shows these new and also previous bounds on these anomalous couplings? Such plots are always great for presentations.

We agree that these plots are great, but we believe this type of summary plot is made by the SMP PAG. We could add it, but we don’t recall seeing such a plot in a PAS.

line 372: The newer versions of the Monte Carlo ME calculation and the parton shower . (or similar)

Fixed

Ref [6] ATLAS and CMS Collaborations

Fixed

Ref [12] update

Fixed

Ref [36], [42] incomplete

Fixed

Ref [41] too many collaboration

Fixed

Ref [46] add Journal reference of this arxiv: Proceedings of the PHYSTAT 2011 Workshop, CERN, Geneva, Switzerland, January 2011, CERN-2011-006, pp 313-318

Added.

Ref [53] seems to have some issues with the note number, should be CMS-NOTE-2011-005 ; ATL-PHYS-PUB-2011-11

Fixed.

Refs: please, use same format for all PAS, e.g.. compare style of [38] and [44]

Formatting is homogenised

Pietro Vischia for Stat. Com. 14 March 2017

- Unfolding procedure: could you please add a statement in the PAS specifying the number of iterations you chose for the unfolding procedure, and how did you choose it? (also, minor: in the systematics section you quote the unfolding procedure before you actually introduce it)

- Agreement between stuff: in figures (e.g. fig 1) and text (e.g. the paragraphs of L336, L351), it would be good if you quoted p-values from a goodness-of-fit test. This in particular since you venture into ranking the agreement of different generators. I suggest that for each generator you add an inset text in the ratio plot, with a quoted p-value (for example from a chi2 or KS test).

Added the p-value obtained with a chi2 test on the ratio.
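
For illustration, a chi2-based p-value of this kind could be computed along the following lines. This is a minimal sketch under stated simplifications: bins are treated as uncorrelated, the uncertainty on the prediction is neglected, and one degree of freedom is subtracted for the normalization. The function names and all numbers are placeholders, not the measured values.

```python
import math

def chi2_value(data, data_err, prediction):
    """Chi2 between measured and predicted bins, assuming uncorrelated bins
    and neglecting the prediction uncertainty (simplifications of this sketch)."""
    return sum(((d - p) / e) ** 2 for d, e, p in zip(data, data_err, prediction))

def chi2_pvalue(chi2, ndof):
    """Upper-tail probability Q(ndof/2, chi2/2) for integer ndof >= 1, built from
    the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    y = chi2 / 2.0
    if ndof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < ndof / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Placeholder normalized Njets distribution, not the measured values.
data = [0.60, 0.25, 0.10, 0.05]
err = [0.03, 0.02, 0.015, 0.01]
pred = [0.63, 0.24, 0.09, 0.04]
chi2 = chi2_value(data, err, pred)
# Assumption: the normalization constraint removes one degree of freedom.
pval = chi2_pvalue(chi2, ndof=len(data) - 1)
```

A full treatment would use the covariance matrix of the unfolded bins rather than the diagonal uncertainties assumed here.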

- Table8 and relative text: are the CIs two-sided? From the text and table it is not clear to me whether you set separately upper and lower limits, or if you are computing a two-sided interval. Perhaps you could make it more explicit in the text and table.

We made it clearer by always referring to lower and upper confidence levels.

- Typo in the AN: L667 of AN-2017-002: "RCO" ---> "ROC"

Fixed.

- The hyperparameters optimization study that you performed looks very nice: I think it is a pity that it is not apparent in the PAS: perhaps you could add an additional sentence?

We have added a sentence in this sense.

- The study you do at L689 of the aforementioned AN: you take out one variable, you retrain, you take out another one (leaving out also the first one), etc. (i.e. you train with N variables, with N-1, with N-2...), is that correct? This is a good approximation, but actually the full procedure should be to try out all the various possibilities (N variables, N times N-1 variables, etc). I don't think you need to actually implement it, just to be clear. It is just a suggestion for next time.

The procedure you describe is exactly what we did, i.e. we retrained the BDT with N variables N times, each time dropping a different of the N variables. The AN text was indeed misleading and has been modified to make the procedure clearer.
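
The difference between the two procedures discussed in this exchange can be sketched as follows. The variable list is hypothetical (the actual BDT inputs are documented in the AN), and the function names are illustrative only.

```python
def leave_one_out(variables):
    """N trainings with N-1 variables, each dropping a different variable."""
    return [tuple(v for v in variables if v != dropped) for dropped in variables]

def nested_removal(variables):
    """Sequential removal: N, N-1, N-2, ... variables, always dropping the last."""
    return [tuple(variables[:n]) for n in range(len(variables), 0, -1)]

# Hypothetical input list, for illustration only.
variables = ["mjj", "delta_eta_jj", "m4l", "pt_j1"]
loo = leave_one_out(variables)       # 4 subsets of 3 variables each
nested = nested_removal(variables)   # subsets of size 4, 3, 2, 1
```

The leave-one-out scheme probes the importance of each variable separately, whereas the nested scheme depends on the order in which variables are removed.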

- Overtraining check: in the aforementioned AN, in L698, I find an overtraining check, but as in the PAS the comparison between distributions is done by eye. It would actually be very advisable, particularly for the overtraining check, to estimate the amount of overtraining by quoting a p-value from a GoF test. Traditionally TMVA uses Kolmogorov-Smirnov, but since you seem to have high statistics you could even use a chi2 test.

Thank you for these suggestions.
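
A minimal sketch of the suggested overtraining check, assuming unbinned and unweighted BDT scores for the training and test samples: it computes the two-sample Kolmogorov-Smirnov distance and an asymptotic p-value with a common small-sample correction. The scores are placeholders, and this is not the TMVA implementation.

```python
import math

def ks_statistic(a, b):
    """Largest distance between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic Kolmogorov p-value with a small-sample correction."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.05:   # the series converges slowly here; p is 1 to high accuracy
        return 1.0
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(max(s, 0.0), 1.0)

# Placeholder BDT scores for the training and test samples.
train = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
test = [0.12, 0.22, 0.43, 0.57, 0.68, 0.88]
d = ks_statistic(train, test)
p = ks_pvalue(d, len(train), len(test))
```

A small p-value would indicate that the training and test score distributions differ, i.e. possible overtraining. Weighted MC events would need an effective-entries treatment that this sketch does not include.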

Gabriella Pasztor 6 March 2017

Table 2:
You use a large number of triggers, some of which were prescaled or disabled in the 2nd half of 2016 data taking.
The information in Table 2 is not (yet) correct for these prescale values.
Also the L1 seeds are not properly listed. For example the dielectron triggers are seeded by an or of multiple single and diEM L1 seeds.
Are the changing prescale values taken into account when calculating the data luminosity and the trigger efficiency?

Table 3:
Zg sample missing

Thanks for pointing it out. Now fixed.


The madgraph sample listed as qq->ZZ->4l according to line 69 also contains "gg->ZZqq" "with 0 or 1 jet". What do you refer to here? Something like gg -> qq qq -> q ZZ q which has 2 jets?

Yes exactly.


Sec 3.
line 91: please give the criteria or point to a suitable reference:
https://twiki.cern.ch/twiki/bin/view/CMS/JetID13TeVRun2016
if these fractions are also coming from loose jet id recommendations

Added

line 108: Ref 16: needs a twiki address
Also Ref 14 is not complete.

Tables 5,6: contributions from WZZ, ZZZ?

We removed these samples since they have an overlap with other signal samples.

Table 6: Presumably the last 3 columns belong to >2 jets (not including = 2 jets).
Missing numbers for total irreducible.
Wrong sums in several places in same line, e.g. 0.32+0.39 = 0.49 (1jet, 4mu ch)

This was an error in copying the values. It is now corrected.


line 120: Hm. jet pT distributions (fig 5) show a different slope.

With the full statistics there is no longer a slope. There is good agreement between data and MC.


Fig 6,7: contribution from ttWW mentioned in line 126 but not shown here

Removed, since the sample is not available for Moriond. Its contribution is negligible. The mention in the AN has been removed.


line 142: How is the trigger efficiency measured in data?

It is evaluated with a tag-and-probe technique. This is described in SMP-16-17, but we could add this information here as well.


line 144: is this the latest lumi error? Should it be 2.5%?

That version of the AN presented results with half of the statistics, and the uncertainty was 6.8%. It is now updated to 2.6%.


line 158: why the lepton efficiency is used to estimate the QCD scale uncertainty on the ZZ cross-section? Which lepton efficiency is varied here?

Sorry for the mistake. It is actually the cross section per lepton, not the efficiency per lepton. Now corrected.


line 171: Which systematics are not propagated to the unfolding?

All the systematics that have a global value and do not depend on the variables we are studying, for example the luminosity.

line 177: What do you mean by this sentence? Only uncertainty on the shape is considered but none on the normalisation? This sounds strange but I probably misunderstood something...

The difference in normalization is also taken into account. Sorry for the mistake. It has been corrected.

sec 6: No mention how correlations are treated.

To be added.


line 213: Sources are then added in quadrature?

Sources are added in quadrature.
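
As a small worked example of the quadrature sum, assuming uncorrelated sources: the 2% trigger and 2.6% luminosity values are the ones quoted elsewhere in these responses, while the third entry is a placeholder added only to make the example non-trivial.

```python
import math

def add_in_quadrature(uncertainties):
    """Total uncertainty for uncorrelated sources: sqrt of the sum of squares."""
    return math.sqrt(sum(u * u for u in uncertainties))

# Relative uncertainties in percent.
sources = {"trigger": 2.0, "luminosity": 2.6, "other (placeholder)": 3.0}
total = add_in_quadrature(sources.values())   # sqrt(4 + 6.76 + 9)
```

Correlated sources would need the corresponding covariance terms, which this sketch does not include.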


Figs 9 - 16: I would make a better use of the y scale (i.e. decrease the ymax) wherever possible to have the distribution zoomed in as much as possible. An extreme example is the Fig 15 top left plot, where the distribution uses less than a 3rd of the y axis range.

I am not particular fan of differential distributions with only two bins. Could we rebin with the full dataset?

All the plots that had 2 bins now have a finer binning, thanks to the higher statistics.


Why do we show only shapes? Having the non-normalised results have more information in my opinion?

We think that the information on the normalization can be taken from the measurement of the inclusive cross section. This is also what is done in previous/other analyses, such as the ZZ inclusive one (SMP-17-017).

Fig 15, top left figure, bin 4: is the error bar correct? seems to be missing actually...
Fig 16, bottom left, bin 3: same question.

We are investigating this.

Fig 17, right: highest data point missing

To be corrected

Fig 21: Interesting that the data - MC relation changes with unfolding. Was it checked why?

The main purpose of the unfolding is to correct for efficiency and resolution. For this reason, some differences in the data/MC relation are expected after the unfolding.

Fig 39: white means no entry? Treated as 0?
What negative values mean? Low stat weighted MC, I guess? Treated as 0? Binning and MC stat does not seem to be well matched here.

Yes, white means no entry, and such bins are treated as 0. The negative bins are indeed due to MC weights and are treated as 0 as well. The low statistics far from the diagonal are due to the high resolution of the four-lepton invariant mass: because of it, some bins are very unlikely to be filled even with the high statistics of our samples. However, since the values in those bins are almost 0, they essentially do not count in the unfolding procedure. Finally, the binning is chosen to match the statistics of the data.


Fig 77: I do not see why you claim SVD (k=2) is biased but Bayesian not. There is no qualitative difference between Fig 77 and the corresponding plots with Bayesian unfolding. Am I missing the point?

No, your comment is right. After some changes and an update of all the plots, those variables are apparently no longer explicitly biased for SVD (k=2). Plots of variables with a more evident bias are now shown. It is now evident that the unfolded shapes are biased towards the "true value".


To see whether there is a bias or not, should we check whether the result changes if we assume different slopes especially for distributions where data and MC does not seem to match so well, e.g. jet pT.

This could be done, but it would be very tricky, and it would be difficult to tell in this way whether the distribution is biased or not. On the other hand, the extreme case in which a flat reco distribution is unfolded with a response matrix built from a non-flat variable could prove that the result does not get biased towards the distribution used to build the unfolding matrix.


Sec B3, could you show the difference in the prediction of the true and the reconstructed variables for the two different signal MC samples (Madgraph+MCFM+Phantom and Powheg+MCFM+Phantom) by simply overlaying the distributions? It would be useful to know how different they are to understand how meaningful this test is and whether it covers effects that might hide in the data.

To be done


Fig 78 and on, the lower panel should have a y scale better adapted to the observed values, so that we see if the observed differences are compatible with the stat error. Do the ratios have errors? They are surely not visible but it may just be that they are really small wrt the scale.

Fig 124: highest data points missing

Now corrected


Figs 134, 135 + similar: Why the very asymmetric contribution from JER and JES?

Now corrected. The uncertainties are no longer asymmetric.

Isabel Josa Mutuberria 5 March 2017 part II, AN-17-002

Did you perform any closure tests of the performance of the BDT to separate QCD and EWK ZZ+2jets components ? I mean:

- Generate QCD+EWK pseudo-experiments according to MC distributions in Fig. 43 and extract the QCD and EWK fractions to test there is no bias.

- Moreover, fit the QCD only pseudo-experiments with the full BDT and check that no fake EWK contribution is extracted. Experimental distribution in the nVBS control region (Fig. 43 top) has also some events in the high BDT region.

The suggested studies have been done and are documented in the AN. The statistical model is unbiased under both the S+B and B-only hypotheses.

Aram Apyan 3 March 2017 PAS v7

Line 88-89, you say that the powheg sample is not expected to describe the event since it doesn’t contain events with 2 jets from the hard process. At the same time, line 102 says that the nlo sample with 0 and 1 jet is used. Are you going to switch to the 0, 1, and 2 jet amc@nlo sample?

The sentence in lines 88-89 refers to the POWHEG sample used as a reference for 8 TeV, while line 102 describes the amc@nlo sample used for 13 TeV, so we are not sure we understand the question. If the question is whether we want to use amc@nlo also for 8 TeV, we cannot, since that sample exists only for 13 TeV.

Line 94. I wouldn't mention GG2ZZ.

Since we are reporting an update of some 8 TeV results already published by CMS, we reported the differences with respect to the previous results at 8 TeV. GG2ZZ was used in the old published 8 TeV results. However, we decided to remove it, since it is misleading and would require more explanation, as well as a description of all the other differences with respect to the previous result.

Line 122-123. How was the interference calculated? In what fiducial volume is it 1%? This should be mentioned.

Line 188-189. Do you apply any pileup jet identification?

The results in the PAS v7 use the pileup jet ID with the medium working point. However, we decided to drop the PU jet ID from the analysis for several reasons. First of all, it is a highly inefficient ID for forward jets: for jets with 3 < η < 5 it is expected to reduce signal and background equally by 10%. This translates to a loss of 20% for the ZZjj selection. Also, it is poorly studied in data, i.e. no efficiency measurements have been performed on 2016 data. The last such measurement is from run I, and showed data/MC scale factors of around 5%. Also, the ID is known to perform less well in run II from the MC studies. Furthermore, we expect that the data/MC agreement will be worse compared to the run I situation. After all, this ID is based on a BDT that relies heavily on tracking information (number of tracks in jets etc), which is known to be poorly modelled due to the HIP effect in data.
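
One way to read the quoted 20% figure is as the compounded per-jet loss on the two tagging jets. The sketch assumes both jets are forward and that the per-jet efficiencies are independent; neither assumption is stated in the response.

```python
per_jet_loss = 0.10   # quoted reduction per forward jet
n_tag_jets = 2        # assumption: both ZZjj tagging jets are forward

# Probability that both jets pass the ID, assuming independent efficiencies.
survival = (1.0 - per_jet_loss) ** n_tag_jets
selection_loss = 1.0 - survival   # about 0.19, close to the quoted 20%
```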

Line 253-254. Why is the unfolding uncertainty so large for the 13 TeV (6%)? As you describe the unfolding is done for the migration and efficiency. However you measure the efficiencies in data and differences in efficiencies between the two MC samples (for example isolation) shouldn’t be propagated as a systematics uncertainty in the measurement.

The 6% uncertainty in lines 253-254 was the uncertainty on the luminosity. The systematic uncertainty on the efficiency is propagated only for MadGraph5_aMC@NLO, by varying the efficiency scale factors within their uncertainties. We are not completely sure we understand the question.

Line 255-263 and table 1. Unless I am missing something we only measure the fiducial cross sections. Why does the theoretical uncertainty in the acceptance enter the measurement then?

It refers to the efficiency. Now fixed.

Table 1. What is exactly included in the “MC choice” uncertainty? Is this the unfolding uncertainty?

Yes, it is the systematic uncertainty on the unfolding. It is computed by comparing the unfolded data distributions obtained with the Madgraph + MCFM + Phantom and the Powheg + MCFM + Phantom response matrices and taking the difference between the two results.
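
A minimal sketch of this comparison, taking the per-bin relative difference between the two unfolded results as the uncertainty. The function name and the numbers are placeholders, and expressing the difference relative to the nominal result is an assumption of the sketch.

```python
def mc_choice_uncertainty(unfolded_nominal, unfolded_alternative):
    """Per-bin relative difference between the two unfolded results."""
    return [abs(a - n) / n
            for n, a in zip(unfolded_nominal, unfolded_alternative)]

# Placeholder unfolded yields per Njets bin for the two response matrices.
nominal = [20.0, 12.0, 6.0, 2.0]        # e.g. Madgraph + MCFM + Phantom
alternative = [20.4, 11.7, 6.3, 2.1]    # e.g. Powheg + MCFM + Phantom
rel = mc_choice_uncertainty(nominal, alternative)
```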

Line 325. Do you vary the renormalization and factorization scales independently. This should be made clear in the pas.

Line 289-290. The theoretical prediction for the 13 TeV should be given

We have been waiting for the NNLO value from Grazzini. In the meantime we added the prediction from MC samples, as was done in SMP-16-017.

Table 3-4. The luminosity uncertainty is constrained to 6.3%. Which background prediction constrains this uncertainty from the 6.8%?

Line 308-309. As I understand you are using your measurements when normalizing and these will be identical to SMP-16-017 after the full synchronization.

Couple of style comment:

Line 7-9, The scattering in nature doesn't violate the universality. You should say that the amplitudes for these processes will violate universality.

Thanks for pointing it out. Now corrected.

Line 9-12. The sentence is bit awkward as it reads that the discovery suggested that the unitarity will be restored.

It has been changed.

Line 403. Perhaps one or two additional sentences are needed to introduce the anomalous coupling parameters.

Isabel Josa Mutuberria 2 March 2017 part II, AN-17-002

At some point (L 664-665) it is written ... hadronic observables related to the third jet veto or the production angles ... Do you veto the presence of a third jet ?

We do not include any jet vetoes or observables related to hadronic activity. The AN reads: "Given the overall small improvement and considering the appreciable modelling uncertainties introduced by the hadronic observables related to the third jet veto or the production angles, these observables are not considered in the final BDT."

Why don't you use the Madgraph@NLO sample also for the MVA training/testing ? It should provide a better description of the input variables to the BDT.

We totally agree and are currently exploring this. These samples only became available quite late - after all they were first produced in the Moriond campaign - and we have just processed them on RECO.

Did you compare the performance of the two Madgraph simulations (LO and NLO) to reproduce basic ZZjj kinematic distributions (before VBS selection, i.e. no mjj cut)?

This comparison is shown in FIG 8 and 9.

Uncertainties.-

784-785 Uncertainties arising from the trigger as well as lepton reconstruction and selection efficiencies range between 5% and 12%, depending on the final state.

Do these 5%-12% include the 2% of trigger efficiencies mentioned in L 781 ?

These are the preliminary ICHEP numbers and include the trigger uncertainties. We have updated them and will list them separately from the lepton efficiency uncertainties and per final state.

Is it 5% for dimuons, 12% for dielectrons ? the other way around ?

It was 5% for Muons, but see above comment.

Figs. 47 and 48. Variations in the QCD background and in the EWK signal look quite different. Do you understand why there is a kind of loss of events at low BDT score for the EWK signal ??

We are unsure if we understand the question. The low BDT score region for the signal suffers from poor statistics, also visible in the large error bars of ~5%. We want to note that these plots will be replaced by plots that show both the distributions and the ratio plots in the next version of the AN.

Fiducial cross section measurement.-

Can you include in the AN the part from AN-2016-331 relevant for this section ?

We can certainly recall the fiducial phase space definition here.

Don't you make use of the just-fitted signal strength to calculate the fiducial cross section ?

One could certainly convert the underlying signal strength of the significance determination into a cross section measurement. The reason for defining the cut-based selection is to provide a straightforward definition of the phase space from which we extract the measurement, so that, say, theorists can use the resulting numbers without needing to know the details of an MVA. Also, we currently do not cut on the BDT distribution but fit the entire spectrum of all events in the ZZjj selection to extract the signal significance.

How does this fiducial cross section translate into a signal strength for that region ?

Please see above argument. In the end signal significance and fiducial cross section are meant to respond to two different questions.

Isabel Josa Mutuberria 2 March 2017, AN-17-002

I understand you have still to move to a new MADGRAPH_aMC@NLO (FxFx) for the QCD ZZjj process, right ?

Yes, we will update to the new MC with the next version of the AN and the full data statistics.

It is written (L 229): We compare the kinematics for the LO MadGraph, the NLO 0,1 jet, and the newly requested NLO 0,1,2 jet sample. What is the difference between the existing NLO and the new NLO samples ? Why is the first one only 0,1 jet ?

Historically, the 0,1 jet sample was the first pp→4l MC sample produced with MadGraph5_aMCatNLO. Because it merges the 0 and 1 jet multiplicities at NLO, it has up to 2 jets from the Matrix Element. It includes not just the process with on-shell Z bosons, but also the off-shell and low-mass photon processes, with no cuts on lepton pT at generator level. It is used in the inclusive cross-section measurement. Because it is fully inclusive in the leptons, it has a low acceptance of ~20%. For the VBS analysis it is practically useless, as only a few thousand events pass the ZZjj selection. This is certainly not sufficient to train an MVA or create a reliable MC template. The new 0,1,2 jet sample is tailored to the VBS analysis in several aspects. Most importantly, it has much higher MC statistics in the ZZjj phase space (~200k events versus a few thousand in the 0,1 jet sample). It is also NLO up to 2 jets, i.e. the third jet emission is from the Matrix Element. While we do not use observables relating to any third jet in the analysis (e.g. no central third jet veto), it is a welcome improvement in the modelling accuracy.

*Differences between the red (existing sample) and the black (new sample) in the plots in fig. 8 and 9, are they only statistical ? the first bins in the m4l distribution looks different. The mjj as well, and most important, the BDT score looks different.*

Fig. 8 and 9 are done with a small private production of the 0,1,2 jet sample, which suffers from low statistics (also visible in the small bin-by-bin jumps e.g. in the lower part of the dEtajj distribution). Also, the 0,1 jet sample has low statistics (few thousand events, see previous question). Because the 0,1,2 jet sample is NLO up to 2 jets, we expect it to provide a better prediction and this is why we use it as the nominal sample. Finally, the official 0,1,2 jet sample will of course have much higher MC statistics.

*And the sample for the ggZZ loop-induced background process will be also updated, right ?*

Yes, the existing ggZZ sample was found to have a faulty parton shower configuration. A new and fixed sample has been centrally produced and will be used for the updated PAS/AN.

*Choice of parton shower and underlying event tune (L 221). You compare Madgraph+Pythia vs Madgraph(same events)+Herwig. From the comparison the implementation in the reference MC is validated. Do you assign any systematic uncertainty for the modelling of the UE event and the parton shower at these low rapidities ? Maybe one could extract an estimation using the two BDTs (Madgraph+Pythia vs Madgraph+Herwig) and quoting the diff. in the cross section as a syst. uncert. ? It would also be useful to use several UE modellings ? or at least to change the UE parameters within their uncertainties.*

We do not assign an additional uncertainty, given the very small differences between the Herwig- and Pythia-showered samples. Also, our default MC is the Pythia one, which predicts a slightly lower yield in the signal region than the Herwig sample and is thus conservative. One has to keep in mind that the low-BDT-score part of the EW signal template does not actually matter in the signal extraction, because it is much smaller than the QCD contribution. Also, one has to keep in mind that the Herwig UE model is not tuned in any way, while a lot of effort went into tuning the official Pythia model (for all of CMS). Finally, we want to point out that there is no recommendation on how to disentangle the UE/parton-shower effects and that this is the first time that such an effect is studied for a VBS analysis, i.e. no such study was performed for ssWW. In the end we conclude that the effect of the choice of UE/PS in this analysis (which exploits a fully-reconstructed leptonic final state and does not rely on observables related to the overall hadronic activity in the event) is negligible compared to other theory uncertainties like scale uncertainties.

*Does the GBDT multivariate classifier (electron identification) correspond exactly to the MVA identification described in SMP-16-017 ?*

The lepton selection is exactly the same as that of SMP-16-017. In fact it is the same as in the HZZ4l analysis.

*Curiosity: Electron efficiencies seem to decrease with pT, mainly in the endcap. Do you know why ??*

This is indeed the case and is traced to the SIP requirement. The effect is modelled in MC and confirmed by a dedicated T&P study of the SIP efficiency in data. Because it is an overall small effect for electrons outside the typical pT-range of this analysis and because we want to stay synchronized with the HZZ4l analysis, we did not revisit this selection for Moriond.

*Is your muon identification tight working point the same as for the ZZ+jets diff. analysis (and SMP-16-017)? Efficiencies shown in Fig. 20 do not necessarily apply here, as here the two Z bosons are not coming from a heavy resonance. Do you have a similar plot but for muons from a ZZ+2jets sample ?*

The muon selection is identical to SMP-16-017 and the high-mass ZZ resonance search. Fig. 20 is indeed a poor illustration here. Because we use the exact same objects as SMP-16-017, which in turn uses the same objects as the high-mass ZZ→4l resonance search, we kept the figure even though it is irrelevant for the muons used in this VBS analysis. To illustrate this point, please find below a plot showing the leading Z boson pT from the EW signal (at generator level). It is clear that the bulk of our Z bosons will have a pT below 200 GeV, well outside the range relevant for the high-pT muon ID to take effect.

*I understand that no Jet Pileup id is applied,right ? How do you control PU jets ?*

We decided to drop the PU jet ID from the analysis for several reasons. First of all, it is a highly inefficient ID for forward jets: for jets with 3 < η < 5 it is expected to reduce signal and background equally by 10%. This translates to a loss of 20% for the ZZjj selection. Also, it is poorly studied in data, i.e. no efficiency measurements have been performed on 2016 data. The last such measurement is from run I, and showed data/MC scale factors of around 5%. Also, the ID is known from MC studies to be less performant in run II. Furthermore, we expect that the data/MC agreement will be worse compared to the run I situation. After all, this ID is based on a BDT that relies heavily on tracking information (number of tracks in jets etc.), which is known to be poorly modelled due to the HIP effect in data. We also checked the impact of the ID on the analysis sensitivity, which is reduced by about 0.2 standard deviations when using the ID. Finally, we do not observe any issues from not using the ID in our nVBS control region.
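
The step from a 10% per-jet loss to roughly 20% for the dijet selection can be checked with a short calculation. This is only a sketch: it assumes that both tagging jets fall in the forward region and that the ID decisions on the two jets are independent, and the function name is hypothetical.

```python
def dijet_loss(per_jet_loss, n_jets=2):
    """Fraction of events lost when every one of n_jets must pass an ID
    that rejects a fraction per_jet_loss of jets, assuming independence."""
    per_jet_efficiency = 1.0 - per_jet_loss
    return 1.0 - per_jet_efficiency ** n_jets


# 10% loss per forward jet, two tagging jets required.
loss = dijet_loss(0.10)  # 1 - 0.9**2, i.e. about 0.19
```

Under these assumptions the loss is about 19%, consistent with the 20% quoted above.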

*JEC, I guess you are aware that there are some issues with the JEC in the forward region. To be followed.*

We are aware of the issue. It is worth pointing out that the data/MC agree within the admittedly quite large JEC uncertainties. We of course propagate this uncertainty to the BDT and consider it in the statistical analysis. Should the JETMET group release a refined version of the JEC, e.g. after Moriond, we will certainly use it for the paper. Our strategy is thus the same as the one of the HIG PAG.

*ZZ event arbitration. How large is the difference in the event yields with your procedure to select the Z candidates wrt the ZZ+jets differential analysis ?*

We are not sure we understand the question: The event selection is identical to SMP-16-017 and the ZZ+jets differential cross section measurement part of this paper. In fact the object, Z, and ZZ selections are all identical between the two papers/three analysis foci.

*Fake rate determination.- Not sure I understand the method ... trying hard …*

Please let us know if we can provide any further explanations, e.g. via a chat. Ultimately, the methodology is unchanged and identical to the HZZ4l background estimation since run I.

Fig. 1 top-left, is it the right one ? there are very few entries.

We are unsure which figure you refer to - surely not Fig. 21, which has ample statistics? Can you please clarify?

*How do you finally compute the fake rate from histograms in Fig. 21 and 22 ?*

The fake ratio is, loosely speaking, obtained by dividing the top-row histograms by the bottom-row histograms of Fig. 21. The only refinements are that we only use bins within ±7 GeV of the nominal Z mass (to suppress FSR/bremsstrahlung), that we correct for the small WZ contamination, and that we bin the fake ratio in η/pT. Details are described in the text. Fig. 22 is only meant to illustrate the different composition of the FSR/bremsstrahlung component in the right-hand-side tail (which we cut away anyway). The overall procedure is identical to what is done in SMP-16-017 and the HZZ4l analysis. In fact, the same framework is also used to derive the Z+X estimate in the HZZ4l analysis.
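
A minimal sketch of the ratio described above, for a single η/pT bin, could look as follows. It is not the framework code: the function names, the event format, the treatment of the top-row histogram as "passing" and the bottom-row histogram as "all" candidates, and the subtraction of the WZ expectation from both numerator and denominator are assumptions made for illustration.

```python
Z_MASS = 91.19      # GeV, approximate nominal Z boson mass
MASS_WINDOW = 7.0   # GeV, half-width of the window quoted in the text


def in_z_window(mll):
    """True if the dilepton mass lies within the window around the Z mass."""
    return abs(mll - Z_MASS) < MASS_WINDOW


def fake_ratio(events, wz_pass=0.0, wz_all=0.0):
    """Fake ratio for one (eta, pT) bin.

    events  : iterable of (mll, passes_selection) pairs (hypothetical format)
    wz_pass : expected WZ yield among passing candidates (assumed from MC)
    wz_all  : expected WZ yield among all candidates (assumed from MC)
    Returns None if the corrected denominator is not positive.
    """
    n_all = 0
    n_pass = 0
    for mll, passes_selection in events:
        if not in_z_window(mll):
            continue  # outside the Z window: dropped to suppress FSR/bremsstrahlung
        n_all += 1
        if passes_selection:
            n_pass += 1
    denominator = n_all - wz_all
    if denominator <= 0:
        return None
    return (n_pass - wz_pass) / denominator


# Made-up candidates: four fall inside the window, one of them passes.
toy_events = [(90.0, True), (92.0, False), (91.0, False),
              (95.0, False), (110.0, True), (70.0, False)]
ratio_no_wz = fake_ratio(toy_events)
ratio_wz_corrected = fake_ratio(toy_events, wz_pass=0.5, wz_all=0.5)
```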

*Agreement data-MC is not great in any of the plots in Fig. 21 and 22 (maybe except Fig. 21 bottom right), not even in the peak.*

This is well-known from run I and the early run II multi-lepton analyses. We stress that these plots are for illustration only and that the method is entirely data-driven.

*Are conditions for Fig. 26 and 27 exactly the same ? I mean, is it the same quantity but plotted as a function of a different variable ??*

Yes, the same selection is plotted. The interest of Fig. 26 is of course to show that the fake ratios are stable w.r.t. pile up.

*Fake rate application.- Can you please, include here also Section 4.3 from [13]?*

We will add Sec. 4.3 to the AN.

*Can you please provide some control plots just before the VBS selection (i.e. ZZ+jets) where we can see the level of data-MC agreement?. m4l, pt, eta of the jets, mjj.*

Are you requesting the analogous data/MC plots of Fig 28 and 29, but for the ZZjj or VBS selection ? If yes, we can attempt to make such plots, but we fear that they will be of limited use, because the dominant source of background is DY+jets, which suffers from low MC statistics already in the ZZ selection, with large MC fluctuations well visible. Requiring the presence of two jets will leave us very few, if any, MC events - after all we are looking at a Z+4jet topology (2 jets needed to fake two leptons that make an on-shell Z candidate and two jets of mjj>100 GeV to make it into the ZZjj baseline selection).

Isabel Josa Mutuberria 28 Feb 2017, PAS v.7

L 69 ... gg -> ZZqq (tree-level only) processes are produced at next-to-leading-order (NLO) for 0 and 1 jets with MadGraph5 aMCatNLO ...

This means NLO for up to three jets, right ?

This means that we can have at most 2 jets from the matrix element. So it is NLO for 0 jets, NLO for 1 jet and LO for 2 jets. A third jet would therefore come from the parton shower.

BTW, the PAS says (L86) ... and gg --> ZZqq (tree-level only) processes are produced at leading-order (LO) for 0, 1 and 2 jets with MADGRAPH 5.1 [22],

0, and 1 or 0, 1 and 2 ??

For the 8 TeV dataset, MadGraph is LO for 0 jets, LO for 1 jet and LO for 2 jets. So it is 0, 1 and 2.

Numbers in Table 6 do not sum up to the numbers in Table 5. Probably not very relevant now as they will be updated with the full lumi, but please, check.

Both this table and the one with the full statistics will be double-checked.

Number of entries in Fig. 4 (delta phi_ll) should be the same as the number of events in one of the two top plots, right ? It doesn't seem to be the case, please, check.

The bins are also divided by their width, which is why they do not appear to give the same integral.
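
To make the bin-width point concrete, here is a small sketch with made-up counts and bin edges (nothing below is taken from Fig. 4). Once the bin contents are divided by the bin width, the number of entries is recovered only by summing content times width, not by summing the plotted bin heights.

```python
def density(counts, edges):
    """Bin contents divided by bin width, as plotted in a width-normalized histogram."""
    return [c / (hi - lo) for c, lo, hi in zip(counts, edges[:-1], edges[1:])]


def integral(densities, edges):
    """Sum of bin height times bin width, which recovers the number of entries."""
    return sum(d * (hi - lo) for d, lo, hi in zip(densities, edges[:-1], edges[1:]))


# Hypothetical histogram with variable bin widths; 20 entries in total.
counts = [10, 6, 4]
edges = [0.0, 0.5, 1.5, 3.0]

heights = density(counts, edges)
naive_sum = sum(heights)          # sum of plotted heights, not the entry count
total = integral(heights, edges)  # recovers the 20 entries
```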

What is the selection efficiency ?? Can you include a table, with the eff. for the several jet multiplicities ?

To be added.

Table 8 seems to be missing. Table 7 is for 13 TeV, right ?

We are currently blind

Even if for the PAS it is ok to present the syst. uncert. as a range (covering the several jet multiplicities), can you include in the AN a detailed table with syst. uncert. for the different jet multiplicities ?

To be added.

How do you calculate the trigger efficiency (data and MC)? Maybe I should reread SMP-16-017 for it ?

The number has been extracted with the tag-and-probe method. Yes, it is described in SMP-16-017.

L 144 and ff. Does the 1% uncert. (from PDF) in the acceptance apply both to the wide and tight regions defined in L 185 ? I understand only cross sections in the Tight region are given in the PAS, right ?

Yes, only the tight region is given in the PAS. In the wide region the uncertainty should be different; it will be estimated with the new MC samples, which have all the weights needed to estimate it properly. But again, only for the AN.

L 155 The irreducible background uncertainty varies between 15% and 19% ... Where do these numbers come from ? (SMP-16-017?)

The number has been measured in the ZZ+jets framework. In any case, we will be fully synchronized with SMP-16-017, so these numbers will be identical in the two analyses.

Can you expand a bit more why you estimate the uncert. in the QCD scale in the MC simulations by varying the lepton efficiency (L 158-159) ?

Yes it will be expanded.

Unfolding systematic associated to the generator. An alternative signal model is used for that purpose, but only as far as the qqbar-->ZZ process is concerned (MadGraph5_aMCatNLO vs POWHEG). Aren't there any alternative generators for the other subprocesses (gg-->ZZ, qqbar-->ZZ+2jets)? Are the cross sections so small that changes in the modelling can be safely neglected ?

In the past there was gg2ZZ, but it is not used anymore. The current sample is the only one we have now for that process.

Systematic uncertainties discussed in Appendix B of the note. Systematic uncert. quoted in this section are much larger than values quoted in Table 7. Do they refer to the same effects ?

Yes, but in the appendix all the systematic uncertainties are split by channel, while Table 7 gives the final estimates for 4l. These are obtained from the combination of the 3 channels, so the final values are smaller.

Results.-

Let me jump to the PAS.

Why do we report (Table 3 and 4) inclusive ZZ--> ll l'l' cross sections in this paper ? They should be in SMP-16-017 (at least for 13 TeV).

Because those numbers are used to normalize the distributions. We also include the single-channel results for 13 TeV in order to be symmetric with respect to 8 TeV.

About the jet multiplicity distributions.

The AN reads:

L 216 The jet multiplicity distribution is also normalized to the inclusive cross section measured in the analysis [22].

[22] is the ZZ inclusive analysis at 13 TeV with 2015 data. Why normalize to that cross section and not to the one you are measuring ?

They should be fully consistent (are they?) and the 2015 result has a higher stat. uncert.

But in the PAS I can read:

L 292 After the unfolding, the exclusive cross sections for ... are extracted and the measured values ... With no normalization to previous measurements. What are you finally doing ??

We used the 2015 value at the beginning since it was the only value already published.

We then moved to what is written in the PAS. The AN must be corrected. Sorry for the inconvenience.

8 TeV results include comparison with Madgraph (for qqbar --> ZZ and gg --> ZZqq). Don't you intend to include a similar comparison for the 13 TeV data ?

For 13 TeV the only available samples are POWHEG and MadGraph5_aMCatNLO.

Fig. 1 in the PAS, does the data/theory comparison draw the same conclusions at 8 and 13 TeV ? Data/MC is clearly above 1 for njets >=3 at 13 TeV, but not so at 8 TeV.

We have been waiting for the results with the full statistics to draw conclusions for 13 TeV. In particular, for the last bin in the jet multiplicity we want to check whether it is just a statistical fluctuation.

Fig. 4 in the PAS (pT distributions). Do I want to see a slope effect in those plots (not so clear in the second leading jet pT, Fig. 5)?

I don't fully understand the question

Darien Wood, 27 Feb 2017, PAS v.7

General comments:

In the "Signal and Background simulation" section, there is no mention of how you generate samples for the anomalous couplings.

Thank you for pointing this out. It will be fixed.

There are large uncertainties described for the renormalization and factorization scales, but these do not appear in Table 1 as far as I can tell.

As written in line 165, this uncertainty is less than 0.1% for the differential cross-section measurements, which is why it is not included in Table 1. Of course it will be added if requested.

A predicted fiducial xsec is given for 8 TeV, but not for 13 TeV.

We have been waiting for the NNLO value from Grazzini at 13 TeV. In the meantime we can add the value extracted from MC samples, namely POWHEG and MCFM, as was done in SMP-16-017.


Am I correct to assume that total fiducial cross section in Table 4 will be identical to that in SMP-16-017 once the samples are consistent?

Results from the two frameworks will be identical.

I am not commenting now on any discussion of the 13 TeV results, because the features may change when the data set is completed.

The figure captions for Fig. 4 and Fig. 5 say "the invariant mass of the p_T-[sub]leading jet transverse momentum". I think this is supposed to be just the "transverse momentum of the ..."

Done.

The section on aQGCs is very brief. Could something more be added to explain what aQGCs are being tested? I don't think it would be too much to list the EFT terms in the Lagrangian corresponding to T0, T1, T2, T8, T9. Or at least to say which four particles are participating in the coupling.

We will expand the aQGC section in the next version

Specific comments:

line 5: "allows to measure" -> "allows the measurement of”

Done

line 6: "made up through" -> “via"

Done

line 12: "in details" -> "in detail”

Done

line 44: "T8 and T9" : should also mention T0, T1, T2.

line 94: "MCFM 6.7 instead of GG2ZZ " - why mention GG2ZZ if it is not being used?

Since we are reporting an update of some 8 TeV results already published by CMS, we reported the differences with respect to the previous results at 8 TeV. GG2ZZ was used in the old published 8 TeV results. However, we decided to remove it, since it is misleading and would need more explanation and a description of all the other differences with respect to the previous result.

line 147 "physic objects [sic]" is jargon. Why not just list the objects (electrons, muon, …)?

Done

line 147-148: "...those of Ref. [43], with updated and minor changes to accommodate the on-shell Z boson requirement." Wouldn't it be simpler to say just "those of Ref. [11]"?

Yes it would be simpler. It will be changed

line 243: "increase" -> "increases"

Corrected

line 287: "can are" -> "are"

Corrected

line 290: can you also give the prediction for 13 TeV?

See the answer above.

line 316: "for any source" -> "with each source of uncertainty varied"?

Changed

line 329-335: It is not clear if this refers to the 8 TeV or 13 TeV result, and in any case most of this is repeated in the following paragraph. I assume all of this will be cleared up when the full 13 TeV results are available.

It refers to the 8 TeV plots. Yes, it will be changed once the results with the full statistics are available.

Fig. 7: The binning looks too fine. It would be easier for the reader to compare data and prediction with coarser binning.

We have chosen this binning as it is the binning used in the statistical analysis, i.e. binned likelihood fit. This could be discussed.

line 383: I think "reducible" and "irreducible" should be swapped in this sentence.

Thank you for pointing this out. It will be fixed.

Table 7: in caption, add "(events)". Do you really intend to quote uncertainties on the data counts? If these are direct counts, they should be quoted without any uncertainty.

We will remove the data uncertainties.

Fig 8: The inset is hard to see - maybe two full-size panels instead? Also, the aQGC predictions should be shown on the linear plot as well as the zoomed log plot.

We are working on new plot proposals, also taking into account previous comments received from conveners.

Ref 12: Is this supposed to be the result from SMP-16-017? If so, the reference should probably be to the (forthcoming) PAS for now.

Yes, it is the result of SMP-16-017. We will put in the reference to the PAS.

Ref 44 and 49: Incomplete references. What is supposed to go here? Upcoming PAS's?

The first one is a published PAS and the second an article. The BibTeX code has been taken from INSPIRE and CDS, but it looks like there is some issue. It will be checked and corrected.

Pre-approval remarks, 8 Feb 2017

please demonstrate the synchronization with ZZ inclusive cross section measurements. Please demonstrate that with your analysis setup you derive the same fiducial cross section (per leptonic channel and sum).

The frameworks for ZZ and ZZ+jets have two differences that are well understood and will not be present for the rereco. The first is the different PU weight and the second a different version of the lepton calibrations. We have produced the number of expected events for the Powheg sample with the same PU weight and this is the result:

The difference is about 0.6% and, considering the difference in the calibration and the stochastic smearing, the two frameworks are in good agreement. As far as the data are concerned, we are 100% in sync.

The following values are the fiducial cross-sections per final state still made with different PU weight.

ZZ+jets:

4mu 9.66 -0.87 +0.93 (stat.) -0.29 +0.30 (syst.) +- 0.62 (lumi.)

4e 10.24 -1.18 +1.27 (stat.) -0.90 +1.08 (syst.) +- 0.62 (lumi.)

2e2mu 20.22 -1.46 +1.53 (stat.) -1.13 +1.27 (syst.) +- 1.25 (lumi.)

4l 39.63 -2.05 +2.12 (stat.) -1.84 +1.94 (syst.) +- 2.49 (lumi.)

ZZ:

4e 10.3 -1.2 +1.3 (stat) -0.9 +1.2 (syst ) +- 0.6 (lumi)
2e2mu 20.5 -1.5 +1.5 (stat) -1.1 +1.3 (syst) +- 1.3 (lumi)
4mu 9.9 -0.9 +0.9 (stat) -0.3 +0.4 (syst) +- 0.6 (lumi)
4l 40.4 -2.1 +2.1 (stat) -1.8 +2.0 (syst) +- 2.5 (lumi)

please update the AN and PAS

Done

table 4 in PAS: please check uncertainties in fiducial cross sections per lepton channel and the uncertainty sum

The table shown at pre-approval was wrong and the result of a copy-paste error. The actual table has been updated in the PAS.

please add theory uncertainty in the differential cross section plots

To be done

“MC choice” uncertainty source in table on slide 13. Please check where (which bins and which distributions) this uncertainty is large.

Leading jet pT 4th bin: 7.8%. Subleading jet pT 3rd bin: 8.2%. The uncertainty in those bins is high because the MG5 scale depends on the jet pT: POWHEG uses μR = μF = m4l, independent of the jets, whereas MG5 uses μR = μF = 0.5 * scalar sum of lepton & jet pT.

A study of the difference between MG5 and POWHEG is available here: https://indico.cern.ch/event/467424/contributions/1985652/attachments/1202446/1750621/slides_20151208.pdf
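
The two scale choices quoted above can be compared on a toy event. The sketch below is illustrative only: the four-vectors are made up, the function names are hypothetical, and it simply evaluates m4l and half the scalar pT sum of leptons and jets, not the generators' actual internal scale settings.

```python
import math


def invariant_mass(particles):
    """Invariant mass of a list of (px, py, pz, E) four-vectors."""
    px = sum(p[0] for p in particles)
    py = sum(p[1] for p in particles)
    pz = sum(p[2] for p in particles)
    energy = sum(p[3] for p in particles)
    m2 = energy * energy - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))


def transverse_momentum(p):
    """Transverse momentum of a (px, py, pz, E) four-vector."""
    return math.hypot(p[0], p[1])


def scale_m4l(leptons):
    """Scale equal to the four-lepton invariant mass (independent of the jets)."""
    return invariant_mass(leptons)


def scale_half_ht(leptons, jets):
    """Scale equal to 0.5 * scalar sum of lepton and jet pT."""
    return 0.5 * sum(transverse_momentum(p) for p in leptons + jets)


# Toy massless leptons in GeV (made-up values).
leptons = [(50.0, 0.0, 0.0, 50.0), (-50.0, 0.0, 0.0, 50.0),
           (0.0, 40.0, 0.0, 40.0), (0.0, -40.0, 0.0, 40.0)]
soft_jet = (30.0, 0.0, 0.0, 30.0)
hard_jet = (300.0, 0.0, 0.0, 300.0)

mu_m4l = scale_m4l(leptons)                       # same for both jet choices
mu_ht_soft = scale_half_ht(leptons, [soft_jet])   # grows with jet pT
mu_ht_hard = scale_half_ht(leptons, [hard_jet])
```

In this toy event the m4l-based scale is unchanged when the jet gets harder, while the pT-sum-based scale more than doubles, which is the qualitative behaviour the answer above refers to.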

UE/PS uncertainty: to be discussed further among the conveners

We document in AN-17-002 Fig. 7 a result for the VBS signal obtained by passing LHE events from Madgraph through Herwig. The MVA signal template is essentially unchanged.

Fig 1 and 2: cross section should be absolutely normalized for the jet multiplicity distributions. Top panels should be in log scale to emphasize the tail of the distribution since linear scale is present in the ratio panels. Adjust y-axis of Data/MC as appropriate. The 13 TeV uses -0.2 to 4.8 which seems gigantic. Use for both sqrt{s} the same range.

Done for 13 TeV. 8 TeV plots to be updated with same range and in log scale.

Fig 3 left: m_{jj} and DEta_{jj} for 8 TeV have very large uncertainties and are normalized shape comparisons with just 2 bins — essentially 1 degree of freedom. While there is nothing wrong with the plots, they don’t bring any information and we can’t conclude anything. Please suppress the 8 TeV plots. For 13 TeV, in case the added statistics allow finer binning and small uncertainties we can decide upon them during the PHYS-APP. Same comment for pt(j2) and eta(j2) of Fig 5.

Done

Post Pre-approval questions:

PU uncertainty (inelastic cross section, i.e. the amount of pileup) does not seem to be included. Can you please include this uncertainty source?

To be done

Fig 4 top right: please check if the Data/MC of the last bin that is very large is coming from MC bin with very low entries. How many MC events you have for pt(j1) >= 300 GeV?

In MC we have:

2e2m 2.42985

4e 1.29037

4m 1.28843

In data:

2e2m 8.96615

4e 10.0589

4m 1.82118
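
For convenience, the per-channel numbers quoted above can be summed and compared with a few lines of arithmetic. Only the six yields are taken from the response; the dictionary layout and variable names are ours, and no uncertainties are attached since none are quoted.

```python
# Yields for pt(j1) >= 300 GeV, as quoted in the response above.
mc = {"2e2m": 2.42985, "4e": 1.29037, "4m": 1.28843}
data = {"2e2m": 8.96615, "4e": 10.0589, "4m": 1.82118}

mc_total = sum(mc.values())      # about 5.0
data_total = sum(data.values())  # about 20.8
ratio = data_total / mc_total    # about 4.2

# Per-channel data/MC ratios, to see which channel drives the excess.
per_channel_ratio = {channel: data[channel] / mc[channel] for channel in mc}
```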

Editorial comments:

abstract: as it is, it implies that both 8 and 13 TeV data are used for all the mentioned studies; it should be clarified what has been done with 8 and what with 13 TeV. For example, the EWK VBS is not part of the 8 TeV analysis, yet this is implied by the current abstract.

We believe the current abstract is totally clear on what measurement is performed at which energy. Quoting the first sentence: "This paper reports measurements of differential cross sections for the production of two Z bosons in association with jets in pp collisions at sqrt(s) = 8 TeV and sqrt(s) = 13 TeV, and a search for the electroweak production of two Z bosons in association with two jets at 13 TeV."

L8: do you mean partonic center of mass energy ? clarify

We mean the center of mass energy of the scattering process. The text has been modified to make this clear.

Fig 4 : please use log scale for the top panel of the pt(j1) plots, linear scale if available in the ratio

To be done

Fig 6 right panel: can you try to use log scale for y-axis ?

Changing to log scale does indeed provide a better picture of the signal in the tail of the distribution. However, the data-driven Z+X background has little statistics in the tail, resulting in an unpleasant bump. For now we keep the version with the linear scale.

Fig 6: can you add the cuts in the plots ? these plots differ from what has been presented before by a cut on the mjj>100 GeV, and is easy to add this on the plot. It should be also described in the figure’s caption. Same comment for Fig 7

We added the cuts in the plots and in the captions. We note that the plots and the selections did not change.

Fig 7: the plot as is right now brings the focus at 200-300 GeV, while the interest is on what is going on at high m(4l). Please try either to use log scale or to adapt variable bin width in which the bin-contents are divided by the bin-width (density i.e., Events/GeV) such as to preserve the shape of the distribution. The m(4l) has detector resolution ~1 GeV and the plot uses a bin width that is 100 times coarser.

We will propose a different way to present the plot. The challenge is that we want this plot to show both the good data-MC agreement in the 2mZ region (which is relevant for the EW signal strength measurement) and the overflow bin for the aQGC limit. Regarding the binning, we will experiment with other/variable sizes as you suggest, keeping in mind that the expected statistics are small, prohibiting a binning of the order of the m4l resolution of ~1%.

Add a new table for the measured and predicted cross section as a function of jet multiplicity for 8 and 13 TeV to replace the numbers listed in the end of page 8 and the beginning of page 9.

Done

Please add few more distributions with VBS cuts in the AN, right now there is just one, BDT score.

We will add data v MC plots for the relevant kinematics. We'd like to point out that appendix A of the AN features MC signal v qqZZ (dominant background) comparisons for six common VBS observables.

Senka Djuric and Chia Ming Kuo, 24 Jan 2017 on PAS v2, AN2017_002 v4

add VBS to summary unc table (table 1 in PAS)

We will add the impact of the systematic uncertainties on the VBS fiducial cross section measurement to Tab. 1 of the PAS. We think the uncertainties relevant for the VBS signal strength should be kept separate from those of the fid. cross section measurement for clarity.

show table with signal, bkg and data yields for each jet bin in ZZjj

Done

table 4 in PAS: please list fiducial cross sections per lepton channel

Done

report the differences wrt ZZ inclusive analysis

Now ZZ and ZZ+jets are in synch. The only difference we found is that ZZ+jets uses

  • GluGluToHiggs0MContinToZZTo2e2mu
  • GluGluToHiggs0MContinToZZTo4mu
  • GluGluToHiggs0MContinToZZTo4e
instead of
  • GluGluToContinToZZTo2e2mu
  • GluGluToContinToZZTo4mu
  • GluGluToContinToZZTo4e
The first set includes the Higgs boson and the interference and gives 30% more events than the latter. The ZZ+jets yields, using the latter samples, are in agreement with ZZ inclusive.

We found out that the samples with the Higgs and the interference have problems. The interference should be destructive, while it is the opposite. One hypothesis is that the samples use a non-SM Higgs boson. Until we have the correct samples we will use only the continuum samples.

add theory uncertainty in differential cross section plots

To be added

*there is a large discrepancy between data and MC for >=3 jets in 13 TeV. We were wondering if this is just a stat fluctuation or if something is wrong in the cross section calculation. Can you please double check?*

After some corrections this discrepancy is reduced. At reco level and after background subtraction there are 22.3 events in that bin for the data and 13.9381 for the Monte Carlo. The ratio is 1.59761 and the distance between data and MC is ~1.63 sigma. The unfolding enhances this difference a little: after the unfolding there are 41.96 events in the data and 21.91 in the MC at gen level. The ratio is ~1.91 and the distance is 1.85 sigma.

*syst uncertainty in exclusive cross section for 13 TeV for >=3 jet bin should be ~30%. Can you please double check?

At 8 TeV, where the systematic uncertainty is ~30%, the statistics for >=3 jets are very low and small changes due to systematic variations produce big differences in the yield. E.g. the 2e2mu channel has no events at reco level and its systematic uncertainty is very high. At 13 TeV we have much more statistics, and actually more events than expected from MC.

Moreover, at 13 TeV the JEC uncertainty is lower than at 8 TeV. The systematic uncertainty is now at the level of 10%.

please add aQGC expected limits and distribution showing aQGC signal

Done.

QCD ZZjj normalization for VBS signal significance. Please provide numbers and corresponding scale factor with all syst and stat unc.

There is no explicit QCD normalisation factor in the signal strength analysis, only for the fiducial cross section measurement. For the signal strength we fit the BDT distribution for all events, including the QCD-enriched region, which will constrain the QCD yield.

please add JER uncertainty on VBS signal shape

The JER uncertainty has been added to the signal and all background processes. The AN has been updated accordingly.

We checked one more analysis, double differential inclusive jet cross-section measurement. They did include the systematic uncertainty due to UE/PS. We will double check with SMP convenor how other SMP analyses take this into account. In your answer to our question on UE/PS, can you please elaborate a bit on why you think it’s negligible ? In particular, please answer its impact on the VBS analysis as well.

For the VBS analysis: All of the MC used in the analysis includes at least two outgoing partons at the Matrix Element level. We furthermore do not rely on any variables explicitly sensitive to the emission of a third jet in the MVA, i.e. there is no jet veto. Regarding the QCD background, we will use an FxFx-merged NLO sample with up to outgoing partons. The FxFx scheme is currently only implemented in Pythia, and is not available in Herwig, making a comparison unfeasible. The VBS signal is simulated at LO in MadGraph and the impact of the parton showering and underlying event description has been evaluated by showering the MadGraph LHE files with Herwig. A comparison of this sample and the default Pythia-showered sample has been added to the AN. The difference between the two showers in the signal region is a few percent, much smaller than the scale variations already used in the analysis.

Editorial comments to PAS

please change the process names in the legend of Fig. 6, 7 and 8.

Fixed.

the physics motivation in the introduction section seems to be too slim for a paper. in addition, the mention of previous VBS results is missing in the second paragraph of the introduction. We suggest to work on this part a bit more for the paper.

We have improved the introduction which is part of the new v4 of the paper.

for paper, we cannot cite the CMS PAS so please remove reference 4

We will put a placeholder "paper in preparation" to indicate that we intend to reference the inclusive ZZ paper, which will be published before this analysis.

(optional) can you please add the fraction of VBS contribution plots as the bottom panels in Fig. 6 ? This may bring in additional information to let the readers know where the VBS component starts becoming significant.

Senka Djuric, 22 Jan 2017 on PAS v2, AN2017_002 v4

table 4 caption: eta^{e} -> eta^{l} Comment PP: Formatting and content of 13 TeV lepton cell in tab. 2 needs fixing as well.

Fixed.

*L278: to make sure there is no confusion between MG and MG5_aMC@NLO: "In the second one the Powheg sample (qq → ZZ) is used instead of the MadGraph one. The MadGraph set is the reference set, while the Powheg one is used for comparison purposes and to extract the systematic uncertainty due to the Monte Carlo generator choice. " --> "In the second one the Powheg sample (qq → ZZ) is used instead of the MadGraph (MADGRAPH5_AMC@NLO) one for 8(13)TeV. The MadGraph (MADGRAPH5_AMC@NLO) set is the reference set for 8(13)TeV, while the Powheg one is used for comparison purposes and to extract the systematic uncertainty due to the Monte Carlo generator choice."*

Fixed.

"MADGRAPH5_AMC@NLO" is not consistently used, somewhere it is called "Madgraph_MC@NLO".

Fixed

*typos:L336: of two two Z bosons -> of two Z bosons *

Fixed.

table 5: QCD ZZjj is MC based, and uncertainty is dominated by JES. Correct?

The uncertainty is in fact dominated by the theory scale dependence, which is of order 20% (15% for NLO sample), see also Fig. 44 of the AN. The JES uncertainty is about 5-10%, see Fig. 46.

comments on AN2017_002 v4: 6.2 Experimental Uncertainties. I assume that the stat uncertainty on the data-driven irreducible bkg (QCD ZZjj) yield from the control region is included here as an exp uncertainty?

Just to be sure: The normalisation of the irreducible QCD background is only done in the measurement of the EW fiducial cross section. No such normalisation is done in the EW signal significance nor for the aQGC limits. In the EW fiducial cross section measurement we include the statistics as a systematic uncertainty.

Chia Ming Kuo, 20 Jan 2017, on AN-2016/331 v4

My main concerns to ZZjj analysis were sent by Senka already.

  • the synchronization with inclusive ZZ->4l analysis
  • not so great data/MC agreement in Fig.1, Fig. 2, and Fig. 4 in AN.
  • the systematic uncertainty due to UE/PS

Here I just add some minor comments to AN-16-331.

Section 2.2 Can you please add the Feynman diagrams of the signal samples used in this analysis ?

Done.

Section 3 I cannot find a table to summarize the expected yields for signal and background samples. Can you please add it to AN ?

Done.

L149-150 : the PDFs (CT10 and MSTW08) are not recommended by PDF4LHC anymore. This is not a show stopper for pre-approval, but you need to update to the newest PDFs recommended by PDF4LHC.

Yes, our numbers are currently the same as in SMP-16-017, so we updated the documentation accordingly.

Chia Ming Kuo, 20 Jan 2017, on AN-2017/002 v3

Please find my comments on top of what Senka already sent to AN-17-002 v3.

First of all, as Senka already pointed out, we need the AN/PAS to be in a good shape so that when ARC and we review the documents, we have good ideas on what you do in this analysis. In particular, we need to ensure a good quality of document before we can hand the review of this analysis to ARC.

We modified the AN and PAS in several aspects to improve the clarity of the writing. To summarise the major changes:

  • The usage of the MVA and method of signal extraction is described with more detail in AN and PAS
  • A new chapter has been created in the AN to describe the measurements and statistical procedures
  • Each of the three measurements is explained in more detail in the AN
  • The order of presentation of the different measurements, in particular the fiducial cross section in the VBS selection, has been modified for clarity.
  • Addition of data/MC plots for the MVA input distributions in the ZZjj selection and MC plots for the VBS observables in the VBS cut-based selection
  • An introduction and summary are added to the AN

1. You already topped up the full 2016 dataset. However, I do not see the trigger paths changed for the later part of data. For example, the lowest unprescaled IsoMu trigger was IsoMu24 for the later part of data.

Thank you for pointing this out. We will check if this change is considered by the wildcards in the trigger names and if not we will add these new paths.

2. Can you please add a few Feynman diagrams of your signal samples to AN ?

We have added Feynman diagrams for the signal and background in the introduction.

3. Section 3.1.4 Do you try to derive the electron energy scale correction by yourself ? Is there any problem with the corrections provided by EGM ?

We follow the official EGM recommendations. The plot was more relevant when the calibrations for the re-reco weren't released yet and the energy scale exhibited large variations. It was a check performed to evaluate the quality of the re-reco data. We will remove the plot.

4. Section 3.2 Recently there were reports on bad muons in re-reco. Are you affected by it ?

We do not expect these muons to pass the selection due to the ghost removal. We will follow the official recommendation once it is released.

5. L527-528 : What is the best value ? Do you mean the results with MVA-based signal extraction ?

The 0.2 difference refers to the best cut-based selection using m_jj and dEtajj. One could cut a bit tighter, but at the cost of reduced statistics. Because the cut-based selection is only used to measure the fiducial cross section, we keep the cut a bit looser so as not to lose too much statistics. We will change the formulation and add supplementary material on this point. With regard to the comparison cut-based versus MVA, the numbers to use are ~1.6 sigma for the optimal cut-based selection versus 2.2 sigma from the MVA (these numbers differ from those quoted in the AN as they only consider the dominant qqZZ QCD background and no systematics).

6. L650-651 the PDFs (CT10 and MSTW08) are not recommended by PDF4LHC anymore. This is not a show stopper for pre-approval, but you need to update to the newest PDFs recommended by PDF4LHC.

Thanks for spotting this. We are actually using the CMS default PDF sets NNPDF30_lo_as_0130 and NNPDF30_nlo_nf_5_pdfas (as stated in the MC samples section) for all MC, and the evaluation of the uncertainties is also done with these sets. References to the other PDFs have been removed.

Senka Djuric, 17 Jan 2017, on PAS V1, AN-2016/331 v4, AN-2017/002 v3

Link to paper : http://cms.cern.ch/iCMS/analysisadmin/get?analysis=SMP-16-019-pas-v1.pdf

Link to ZZ+Jets AN: http://cms.cern.ch/iCMS/jsp/openfile.jsp?tp=draft&files=AN2016_331_v4.pdf

Link to VBS AN: http://cms.cern.ch/iCMS/jsp/openfile.jsp?tp=draft&files=AN2017_002_v3.pdf

I have reviewed PAS v1 and AN. Please find my comments/question below.

comments on PAS v1:

MC samples description: it is not made clear that aMC@NLO_MG5 is used to model qq->ZZ for ZZjj differential distributions for 13TeV. It is only mentioned in part of the text where you describe samples for VBS.

The aMC@NLO_MG5 sample was missing in the paper indeed. We’ll add it in the next version.

L120: missing average NPU value for 8 and 13tev data

It will be added in the next version of the draft paper.

L112-115 refers to 13tev analysis but this is not made clear in the text


Yes, it is not straightforward. It will be fixed.

L215 says 0.1-8.8% and in the table it is 1-8%.


Thank you for spotting this. The numbers are now consistent.

L241: pt and eta refers to leptons ("l") and not electrons ("e")

Fixed in the new version.

L231-242: I would suggest to also put the fiducial PS cuts in the table

It will be added in the next version of the draft paper.

fiducial PS definition:

it would be good to explicitly add mll>4GeV cut in the text. This cut is used in selection and also in MC signal gen level cuts. This information is important for our theory colleagues.

Thank you for spotting this, it is going to be fixed.

I understand that the difference in lepton eta and pt cuts in fiducial PS definition between 8 and 13TeV is motivated with synchronization with inclusive ZZ analysis. Since in this paper both 8 and 13 TeV are going to be presented side by side some reasoning for different fiducial PS definition should be given in text.


We agree and will propose a formulation in the next draft

table 3: can you please add comparison with cross sections derived per channel (as in table 2)


We can add them.

L247-248: missing comparison with expected value for 13tev

That number is still missing for 13 TeV at NNLO. We are asking if and when that number will be available. In the meantime we can put the MCFM value.

some information should be added about the lepton definition used for unfolding: dressed, born,...

We don't unfold variables that depend on lepton 4-momenta. On the other hand, perhaps some more information on the gen jet side could be added.

Yes you are right. No lepton definition needed, only jet. Also please explain in the text why a cone of different size is used at 8 vs 13 TeV. Global CMS decision...

table 1: it would be nice to see uncertainties for VBS part listed in this table as well


We will add these uncertainties to the table.

I suspect that there is a typo in syst uncertainties listed in exclusive cross section results for 13TeV. For >=2 jets syst is ~0.3 and the cross section value is 2.75. So the uncertainty is ~11%. In comparison uncertainty for >=2 jets at 8TeV is ~33%. Table 1 is saying that basically all uncertainties are larger or similar at 13TeV. Including JES which is dominant source. Can you please check the uncertainties listed for 13TeV?

Yes, we found a couple of bugs in the printout of the analysis output; the new version of the paper will have the correct numbers. Moreover, the uncertainty on the reducible background is going to become lower, because we are now following an approach more consistent with the inclusive ZZ analysis in estimating it.

sync with inclusive ZZ 13 TeV data: I know from Nate that the synchronization is in progress. My understanding is that you are in sync within few % if you use the same MC. Correct? So the difference is mainly coming from difference in predicted yield between Powheg and aMC@NLO_MG5fxfx?

The two analyses are meant to be consistent in signal definition, selection, etc. In the version of the documentation that we circulated, there could not be perfect agreement in MC yields due to minor differences in the cuts, but the main difference should come from the MC choice.

Ok, can you please provide information that can back up the fact the MC choice is the main source of differences? Perhaps listing yields from Powheg and aMC@NLO_MG5 in comparison with data, and inclusive distribution using Powheg (perhaps fig 1 in AN and Njets distribution)?

Figure 1 and figure 2: I am looking at 0jet bin and comparing total and stat uncertainty between 8 and 13 TeV and comparing with exclusive cross section uncertainties. Differential cross sections are normalized to the same area, so some uncertainties (like lumi and part of systematic uncertainty) cancel out wrt exclusive cross section measurements. Stat uncertainty of exclusive cross section for 8 and 13 TeV are about the same, 7% and 7.4%. Systematic uncertainty is larger for 13 TeV than 8 TeV, 5.2% vs 3.6%. But I don't see how this difference can explain the large difference between stat and total uncertainty in the 0jet bin for differential distribution for 13 TeV. Can you please explain? Are you sure that you are not including lumi uncertainty in 13TeV differential distributions?

Good catch! The handling of the lumi uncertainty in the 13 TeV code is buggy there.

All differential figures:

For 8 TeV there are th uncertainties (PDF+scale) plotted in the data/MC ratio separately from experimental uncertainties. Is the same planned for 13TeV results but they are not shown yet on the plots? *It would be nice to have the same style for plotting aMC@NLO_MG5 on 8 and 13TeV distributions. *


All 13 TeV plots will be consistent with the 8 TeV ones.

Figure 4 pt_jet1: for 13TeV Powheg+MCFM+Phantom and aMC@NLO_MG5+MCFM+Phantom give very similar shapes. For 8TeV the two are very different. Do you understand why?

At 8 TeV Powheg+MCFM+Phantom uses Pythia 6 while aMC@NLO_MG5+MCFM+Phantom uses Pythia 8, and high jet multiplicity is described differently.

Ok, please include this in PAS text.

figure 8: as written in label aQGC signal is missing from the plot

See the answer on similar comment on figure 8 in VBS part

L255-260: this is description of response matrix for 8TeV I assume? For 13 TeV you use aMC@NLO_MG5 for nominal response matrix I think?

yes, the aMC@NLO_MG5 is used for 13 TeV and the information will be added in the next version.

uncertainties : both Ming and I agree that you should check the effect on exclusive cross sections, differential distributions and EWK results, from uncertainty on hadronization and UE modeling. We have seen sizable effect in the case of WW analysis and VBF higgs analyses. This analysis is different of course, so the effect will not be the same.


For the ZZ+jets inclusive analysis, there are two different points that need to be addressed separately: the effect on the unfolding (which translates into the exclusive x-section measurements) and the MC+PS comparison with unfolded data. For the first one, the effect of the PS is negligible, if any, because the unfolding should make the result insensitive to it (and thus also the exclusive x-sections). We illustrate this in Appendix B (e.g., fig 66), where we show that even in the extreme case where a flat reco distribution is unfolded by a response matrix built with a non-flat variable, the result does not get biased towards the distribution used to build the unfolding matrix. In any case, assessing the impact of using another PS, e.g. Herwig, rather than Pythia is impractical, because ZZ+jets made with MadGraph (or with MadGraph_MC@NLO) cannot be interfaced with Herwig (nobody did the matching). It could be done for Powheg, but that is not what we use for the differential distributions, and it would imply a massive central production of that sample (perhaps for little gain). For the unfolded data - MC comparison, we could present a comparison made with Powheg + Herwig. However, we would prefer that this does not become "a must" to get the analysis pre-approved (or even approved), because it is just a comparison and it is not critical for the good quality of the results.

EWK search related questions

in general the documentation on EWK part has to be improved. It is true that many parts of the analysis were presented in the SMP-VV meeting but the AN and PAS have to be a complete documentation of the analysis.

it is not clear from the AN and PAS what is the procedure of deriving EWK signal significance and EWK fiducial cross section. For EWK signal significance I understand from your SMP-VV slides that you derive the BDT template shape for ZZ QCD and ZZ EWK from MC. And then you fit the templates on BDT data distribution. Correct? Can you please add the expected and post fit yields in the documentation?


Yes, the signal extraction is done via a template fit. The fit is performed on all ZZjj events (no further cut-based selection) and the distribution that is fitted is the MVA spectrum. The expected shapes of the signal and reducible backgrounds are taken from MC. The motivation of fitting the whole spectrum and considering all events is that we can constrain the yield of the QCD background and we do not lose any information by cutting away potentially interesting events - the MVA provides the stratification in signal purity. The pre-fit yields are those reported in the yield table (“ZZjj baseline”) and we will add a post-fit table to the AN, which will of course stay empty/blind for now. The fiducial cross section measurement is carried out in the cut-based selection "VBS" - this and the control region plot of the MVA is indeed the only place where the cut-based selection is used at all.

Yes, I understand and agree with the argumentation of using the full ZZjj selection and no cut on MVA value for signal significance measurement. When you do this MVA fit with the templates you take into account the shape and normalization uncertainties listed in 6.1 and 6.2 of AN2017-002. I understand that once the new aMC@NLO_MG5 fxfx QCD samples are done this part will be redone. So the end idea is that nominal QCD template is aMC@NLO_MG5 fxfx shape normalized to expected NLO cross section from aMC@NLO_MG5 fxfx MC. Signal template is LO MG/Phantom normalized to cross section from LO MG/Phantom. Each template has its shape and normalization uncertainty and is allowed to float within uncertainties. And this is how the fit on data is (going to be) done. And then you use the postfit to derive the signal significance. Correct? Is this done with combine? If yes please provide the usual "impact" plot after unblinding.
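
To illustrate the idea of fitting the full MVA spectrum while letting the QCD normalisation be constrained by the background-dominated bins, here is a minimal sketch of a two-template binned likelihood fit. It is not the analysis code: the templates, the binning, the grid minimisation and the names `nll` and `fit` are all hypothetical, and no systematic uncertainties are included.

```python
import math

def nll(data, sig, bkg, mu, kappa):
    """Poisson negative log-likelihood (constant terms dropped) for binned templates."""
    total = 0.0
    for n, s, b in zip(data, sig, bkg):
        lam = mu * s + kappa * b  # expected yield in this bin
        total += lam - n * math.log(lam)
    return total

def fit(data, sig, bkg, mu_values, kappa_values):
    """Brute-force grid minimisation; returns (nll_min, mu_hat, kappa_hat)."""
    return min((nll(data, sig, bkg, mu, k), mu, k)
               for mu in mu_values for k in kappa_values)

# Hypothetical 4-bin templates, ordered by increasing MVA score.
sig = [0.5, 1.0, 2.0, 4.0]    # EW signal template
bkg = [50.0, 20.0, 8.0, 2.0]  # QCD background template
data = [s + b for s, b in zip(sig, bkg)]  # Asimov data set (signal + background)

mu_grid = [i / 20 for i in range(0, 61)]        # signal strength 0 ... 3
kappa_grid = [i / 100 for i in range(50, 151)]  # QCD normalisation 0.5 ... 1.5

# Free fit and background-only fit (mu fixed to 0, kappa still floating).
nll_hat, mu_hat, kappa_hat = fit(data, sig, bkg, mu_grid, kappa_grid)
nll_bkg_only, _, kappa_bkg_only = fit(data, sig, bkg, [0.0], kappa_grid)

# Expected significance from the profile likelihood ratio.
z_expected = math.sqrt(2.0 * (nll_bkg_only - nll_hat))
print(mu_hat, kappa_hat, kappa_bkg_only, z_expected)
```

In this toy the low-score bins fix the QCD normalisation, so the background-only fit cannot absorb the excess in the high-score bins without degrading the agreement elsewhere.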

For EWK fiducial cross section you say in PAS that you do: "QCD normalization is determined by simultaneously fitting the VBS region and the background-enriched complement of the VBS region". What is the definition of "background-enriched complement of the VBS region" and how is the fit done? Is it some shape fit or cut-and-count (just total yield, single bin) fit?


The complement of the VBS region is meant to express the following: of the events that are in the ZZjj baseline selection, we only consider those that fail the VBS cuts. The goal of the fit in the not-VBS region is to fix the normalization of the dominant QCD background to the data (for the not-VBS region). Based on this normalization we extrapolate the QCD contribution to the VBS signal region and subtract it in order to get the EW component. The fit is a single-bin fit of all processes that contribute, i.e. including the other backgrounds.

Ok, so there is one bin for events passing ZZjj but failing VBS cuts ("complement VBS region"). And then you normalize the QCD ZZ to yield(data-other backgrounds). Then you use this normalization factor to scale the QCD ZZ contribution in the VBS region. I assume the normalization factor includes the stat uncertainty of this "complement VBS region". Can you please provide the yields and normalization factor value including uncertainties? This is the control region so you are free to unblind there.
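
The single-bin normalisation and extrapolation discussed above can be written out as simple arithmetic. The sketch below is a simplified closed-form version (the actual fit includes all contributing processes); the function names and all yields are hypothetical placeholders, and only the Poisson uncertainty of the observed control-region yield is propagated.

```python
import math

def qcd_scale_factor(n_data_cr, n_other_cr, n_qcd_mc_cr):
    """QCD ZZjj normalisation factor from the not-VBS (control) region."""
    sf = (n_data_cr - n_other_cr) / n_qcd_mc_cr
    # Statistical uncertainty from the observed control-region yield only.
    sf_stat = math.sqrt(n_data_cr) / n_qcd_mc_cr
    return sf, sf_stat

def ew_yield(n_data_sr, n_other_sr, n_qcd_mc_sr, sf):
    """EW yield in the VBS region after subtracting the rescaled QCD and other backgrounds."""
    return n_data_sr - n_other_sr - sf * n_qcd_mc_sr

# Hypothetical yields, for illustration only.
sf, sf_stat = qcd_scale_factor(n_data_cr=100.0, n_other_cr=10.0, n_qcd_mc_cr=75.0)
n_ew = ew_yield(n_data_sr=20.0, n_other_sr=2.0, n_qcd_mc_sr=10.0, sf=sf)
print(sf, sf_stat, n_ew)
```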

I guess you could also measure EWK fiducial cross section with performing a fit on BDT, the same procedure as for EWK signal significance determination. Is there some obvious reason why one would not want to do that?


One could certainly convert the underlying signal strength of the significance determination into a cross section measurement. The reason for defining the cut-based selection is to provide a straightforward definition of the phase space from which we extract the measurement, such that, say, theoreticians can use the resulting numbers without needing to know the details of an MVA. Also, we currently do not cut on the BDT distribution but we fit the entire spectrum of all events in the ZZjj selection to extract the signal significance.

PAS is missing the information about interference between QCD and EWK and how this is treated.


This is indeed missing and we will add a sentence along the lines of "The interference between the electroweak and QCD diagrams is positive and contributes less than 1% to the total yield and is neglected." in L108 of the PAS. The details of the interference study are in the AN L82-93, in particular, Fig. 1.

Yes I have seen the info in AN it is just missing in the PAS.

MVA control region. PAS is missing the definition of "control region". In the AN it is defined as "events that satisfy mjj > 100GeV and (mjj < 400 GeV or |∆ηjj| < 2.5)". So events that have mjj >400 GeV and |∆ηjj| > 2.5 are not a part of the "control region". From your last SMP-VV [1] report you show plots for "control region" defined as "only events that do not satisfy |Δηjj| > 2.4 and mjj > 400 GeV ". So the definition between PAS and the SMP-VV talk is different, but the plots look the same to me. Can you please check and clarify? Perhaps I am just misunderstanding. I think we also briefly discussed this CR definition in last VV meeting. [1] https://indico.cern.ch/event/595254/contributions/2405700/attachments/1390258/2117505/20161213_SMP-v6.pdf


The 2.5 cut value on dEtajj mentioned in the PAS/AN is wrong. All plots and numbers are and have always been done with a cut value of 2.4. We have fixed this in the paper and AN. Previous (false) response: The definition of the control region is as is written in the AN and PAS, i.e. mjj > 100GeV and (mjj < 400 GeV or |∆ηjj| < 2.5 ). The text on the slides you refer to is indeed wrong, it should say “|∆ηjj| < 2.5” and not “|∆ηjj| < 2.4”, and the plots are indeed identical. As it says on S13 of [1], the control region is defined as those events that do not pass the cut-based selection.
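
For clarity, the region definitions with the corrected 2.4 cut value can be spelled out as boolean selections. This is a sketch only: the function names are hypothetical, and it assumes that the ZZjj baseline dijet requirement is mjj > 100 GeV and that the cut-based VBS selection is mjj > 400 GeV and |∆ηjj| > 2.4, as implied by the text above.

```python
def passes_zzjj_baseline(mjj):
    """Dijet requirement of the baseline selection (assumed: mjj > 100 GeV)."""
    return mjj > 100.0

def passes_vbs(mjj, deta_jj):
    """Cut-based VBS selection (assumed: mjj > 400 GeV and |deta_jj| > 2.4)."""
    return mjj > 400.0 and abs(deta_jj) > 2.4

def in_control_region(mjj, deta_jj):
    """Control region as written in the AN, with the corrected 2.4 cut value."""
    return mjj > 100.0 and (mjj < 400.0 or abs(deta_jj) < 2.4)

# Away from the exact cut boundaries, the control region coincides with
# "passes the baseline selection but fails the VBS selection".
for mjj in (50.0, 150.0, 399.0, 401.0, 800.0):
    for deta in (-5.0, -1.0, 0.5, 2.0, 3.0):
        expected = passes_zzjj_baseline(mjj) and not passes_vbs(mjj, deta)
        assert in_control_region(mjj, deta) == expected
```

Events lying exactly on a threshold (mjj = 400 GeV or |∆ηjj| = 2.4) would fall in neither region with these strict inequalities.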

uncertainties for BDT: do you also include the effect from JER? I do not see it mentioned.


We will add the JER uncertainty. The production of the new trees that include them is running.

aQGC part is missing. Please report expected limits for now since the analysis is blinded.


The MC issues have been solved and we are working on getting the templates in the format needed for the limit tool.

figures 6,7,8:

can you please check the labels? fig 6 a) and fig 7 a), and fig 6 b) and fig 7 b) should be done with same cuts according to the figure labels. But just by eye they do not seem to have the same yields for the same processes.


Fig 6: the caption is indeed wrong and should read: "Distribution of the dijet invariant mass (left) and pseudorapidity separation (right) for events passing the ZZjj selection." The event selections in the two plots are identical; the yields in the legend are without overflow, which gives rise to the impression of a difference. We will remove these numbers. Fig 7: The caption is correct. The left plot shows events in the inverted VBS selection while the right shows all events in the ZZjj selection. In that sense the yields of Fig. 7b can only be compared to the yields in Fig 6.

figure 8 looks like ZZ inclusive selection not ZZjj selection. I would expect that the best limits on aQGC can be derived from VBS selection.


Figure 8 is indeed misleading, it was meant as an illustration only. We intend to show the ZZjj selection (for the reason you mention). We will have to decide on an appropriate binning for this m4l plot. We will also evaluate if performing the limit extraction in the tighter VBS selection yields better results.

typos: L8: the on differential -> the differential


Fixed.

L22: is instrumental for the for the search -> is instrumental for the search


Fixed.

Sascha S. (analyzer meeting) January 17

Comments are paraphrased.

The tagging jet selection is not clear in the AN/PAS. Do we consider any jet combination in the ZZjj and VBS selections?

The VBS analysis is only performed on events with at least two selected jets. The selected jets are ordered in pT and the leading/subleading jets are then taken as the tagging jets. The dijet requirements of the ZZjj baseline selection and the VBS cut-based selection are then tested only on the tagging jet pair, ignoring any other jets in the event. We will seek to make this clearer in the next version of the AN/PAS.
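
The tagging jet choice can be sketched as follows. This is an illustration, not the analysis code: the `Jet` type and function names are hypothetical, and the dijet mass uses the massless-jet approximation.

```python
import math
from typing import List, NamedTuple, Optional, Tuple

class Jet(NamedTuple):
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle in radians

def tagging_jets(selected_jets: List[Jet]) -> Optional[Tuple[Jet, Jet]]:
    """Return the two highest-pT selected jets, or None if fewer than two jets."""
    if len(selected_jets) < 2:
        return None
    ordered = sorted(selected_jets, key=lambda j: j.pt, reverse=True)
    return ordered[0], ordered[1]

def dijet_observables(j1: Jet, j2: Jet) -> Tuple[float, float]:
    """Return (mjj, |deta_jj|) of the tagging pair, treating the jets as massless."""
    deta = j1.eta - j2.eta
    dphi = j1.phi - j2.phi
    mjj = math.sqrt(2.0 * j1.pt * j2.pt * (math.cosh(deta) - math.cos(dphi)))
    return mjj, abs(deta)

# Hypothetical event with three selected jets; the softest one is ignored.
jets = [Jet(40.0, 0.3, 1.0), Jet(100.0, 2.0, 0.0), Jet(100.5, -2.0, math.pi)]
lead, sublead = tagging_jets(jets)
mjj, deta_jj = dijet_observables(lead, sublead)
print(lead, sublead, mjj, deta_jj)
```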

The Zeppenfeld variable is not defined in the PAS. There is no link/unclear link to table 11 of the AN.

The Zeppenfeld variable of boson i is the eta^*_{Zi}, i.e. the variable shown in Fig. 28 middle-left. We will add the definition of the Zeppenfeld variable and a reference to the Zeppenfeld paper to the PAS. In the AN I will add the name "Zeppenfeld" to table 11 and 13.

How do the VBS observable distributions change when going from the ZZjj to the cut-based VBS selection?

We will add these distributions to the appendix. The information is already partly included in Fig. 32.

Can a tighter cut-based selection achieve the same/similar significance as the MVA shape analysis?

Let me start by saying that the cut-based selection and significance numbers are only a cross-check to the MVA result. They also illustrate the advantage of using the MVA spectrum for the signal extraction. Also, I want to clarify that we do not cut on the MVA and we fit the whole distribution of all events in the ZZjj baseline selection. In that sense there is no MVA selection, only the distribution that we fit. The cut-based VBS selection is completely independent of the MVA analysis.
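
As a reminder of what eta^*_{Zi} means, the sketch below uses the definition commonly found in the VBS literature: the pseudorapidity of the boson relative to the average pseudorapidity of the two tagging jets. This is an assumption, since the exact definition in the AN is not reproduced here; some analyses additionally divide by |∆ηjj|. The function name is hypothetical.

```python
def zeppenfeld(eta_z, eta_j1, eta_j2):
    """eta* of a Z boson: its pseudorapidity relative to the tagging-jet midpoint."""
    return eta_z - 0.5 * (eta_j1 + eta_j2)

# A boson sitting exactly midway between the tagging jets has eta* = 0.
print(zeppenfeld(eta_z=-0.5, eta_j1=2.0, eta_j2=-3.0))
```

Small values of |eta*| indicate that the boson is produced centrally between the two tagging jets, which is the topology expected for VBS.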

That said, we evaluated the best cut-based signal significance using the signal and background yields.

These plots show the number of events passing a selection on m_jj>X and dEta_jj >Y indicated by the axis labels X,Y. Using these distributions and the asymptotic significance Eq. 1 of the AN, we obtain the plot below:

It is clear that there is a fairly large plateau of the highest significance around m_jj>500GeV and |dEta_jj| > 5 and that the best significance is around 1.55. Our cut-based VBS selection is only mildly suboptimal by 0.2 sigma. Of course these significances are without systematics and ignoring the other backgrounds, notably ggZZ. The corresponding significance extracted from the MVA shape is 2.3, i.e. an appreciable improvement. As stated previously, the interest of the cut-based selection is mostly to illustrate the data/MC agreement in the yield table and to serve as a control region for the MVA (by inverting the selection). This brings me to the last point raised yesterday: the expected yields in a tighter VBS selection would be really low, around 3-4 events (as one can see from the above plot).
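
The two-dimensional cut scan can be sketched as below. Eq. 1 of the AN is not reproduced here, so the sketch assumes the standard asymptotic (Asimov) formula Z = sqrt(2((s+b) ln(1+s/b) - s)); the event lists, cut grids and function names are hypothetical toy inputs, not analysis yields.

```python
import math

def asimov_significance(s, b):
    """Asymptotic significance for s expected signal and b expected background events."""
    if s <= 0.0 or b <= 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

def yield_above(events, mjj_cut, deta_cut):
    """Sum of weights of events with mjj > mjj_cut and |deta_jj| > deta_cut."""
    return sum(w for mjj, deta, w in events if mjj > mjj_cut and abs(deta) > deta_cut)

def scan(signal, background, mjj_cuts, deta_cuts):
    """Return (best significance, mjj cut, deta cut) over the grid of thresholds."""
    best = (0.0, None, None)
    for x in mjj_cuts:
        for y in deta_cuts:
            z = asimov_significance(yield_above(signal, x, y),
                                    yield_above(background, x, y))
            if z > best[0]:
                best = (z, x, y)
    return best

# Toy events as (mjj in GeV, deta_jj, weight).
signal = [(600.0, 5.0, 1.0), (300.0, 1.0, 1.0)]
background = [(200.0, 1.0, 10.0), (600.0, 5.0, 1.0), (450.0, 3.0, 5.0)]
best_z, best_mjj, best_deta = scan(signal, background,
                                   mjj_cuts=[100.0, 400.0, 500.0],
                                   deta_cuts=[0.0, 2.4, 4.0])
print(best_z, best_mjj, best_deta)
```

As in the text, such a scan ignores systematic uncertainties, so it only indicates where the plateau of the purely statistical significance lies.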

I will modify the AN to be more explicit about the difference between the cut-based and MVA-based signal extraction and highlight the difference between the two. For the PAS I will add details to the signal extraction via the MVA. I believe all of this (the signal extraction) is basically in the AN/PAS already, but it needs to be rephrased/made clearer.

-- PhilippPigard - 2017-01-18
