Q&A for VBS Semileptonic

Color code for answers
Green led - Comment is acknowledged and answered
Orange led - Authors are working on answering the comment
Red led - Comment requires further work to be addressed or needs attention from the internal reviewer regarding a specific issue
Blue led - We do not agree with the comment and arguments are given

Paper draft

22/02/2021 Comments from ARC to Paper v2 link

General points:

  • The paper lacks any discussion of systematic uncertainties. Do you
plan to add these? If so, where?

  • I presume "Results" and "Summary" will be updated and expanded after
unblinding.

  • The "All MC" label that appears in Figures 3 and 4 needs to be
relabeled or removed. I do not see this at all in Fig. 3, and in Fig. 4 this seems to refer to the uncertainty that appears in the ratio plot.

  • In the figures with stacked histograms (4, 5, 6), it is very helpful
if the order of the legend matches the order of the stacking (e.g. on Fig 4, start with data, then top, W_jets, non-prompt, VBF_V+Vgamma, DY, VV+VVV, VBS)

Specific suggestions:

  • Title: It might be useful to specify that the production is of a pair of
vector bosons, e.g. "Search for pairs of vector bosons produced by vector boson scattering in the semileptonic.."

  • Abstract: "The data sample corresponds to the full Run-II CMS dataset of
proton-proton collisions at 13 TeV corresponding to 137 fb-1". To avoid the repetition of "correspond", this could be reworded. "The search uses the full Run-II CMS dataset..."

  • Abstract: "Events are selected requiring one lepton (electron or muon),
two jets with large pseudorapidity separation and dijet mass, separated in two categories: either the hadronically decaying W/Z boson is reconstructed as one large-radius jet, or it is identified as a pair of jets with dijet mass close to W/Z mass". Shouldn't you also mention the requirement on missing transverse momentum? I suggest splitting this into two sentences: "Events are selected requiring one lepton (electron or muon), moderate missing transverse momentum, two jets with large pseudorapidity separation and dijet mass, and an additional jet system consistent with the hadronic decay of a W/Z boson. Events are separated into two categories: either the hadronically decaying W/Z boson is reconstructed as one large-radius jet, or it is identified as a pair of jets with dijet mass close to W/Z mass".

  • Line 3: "Standard Model (SM) of Fundamental Interactions" -> "standard
model (SM) of fundamental interactions"

  • Line 7: "its unitarity is granted" -> "the violation of its unitarity is
avoided"

  • Line 9: "other" -> "other diagrams"

  • Line 17: "It is therefore of mandatory importance" sounds awkward. Maybe
"It is therefore compelling"?

  • Figure 1: Strictly speaking, the q and q' labels are not consistent. The
q and q' in the W+ decay do not need to be the same as the q and q' in the incoming and outgoing quark lines. You could use q'' and q''' in the decay, or, since it is an example, you could replace the incoming q's with u's and the outgoing (q')'s with d's, and leave the q and q' in the W decay.

  • Line 31. "will give rise" -> "gives rise" Line 31-32: "it will
disintegrate into two jets" -> "it is resolved into two jets"

  • Line 35 "heavy quarks" -> "top quarks" (If I understand correctly, you
are only referring to top quarks here, and not b's and c's.)

  • Line 36: "on both cases" -> "in both cases"

  • Line 41-43: I interpret this to say that Wgamma is generated with
MadGraph and Zgamma is generated with POWHEG. Is this correct? If so, why are different generators used for these similar processes?

  • Line 53: "This analysis target" -> "The target of this analysis"

  • Line 53: "signal observation significance" -> "signal significance"

  • Line 57: "models, that" -> "models that"

  • Line 59-60: "The signal significance extraction is performed in
sub-regions of the phase space, where the signal-over-background ratio is more favourable." Are you just talking about the signal regions here, or something else? Why are these "sub-regions" and not just "regions"? I would not hesitate to use the terms "signal region" and "control region" since these appear in the figure and elsewhere in the text.

  • Lines 61-63: suggest moving "(usually called tag jets)" to directly
follow "two jets originating from incoming partons"

  • Line 64-65: "smoking gun" is jargon, and probably an overstatement in
this instance. Maybe "which are a strong indication of top quark decays". Also, this is part of a very long sentence that is difficult to parse. You could consider breaking it into several sentences.

  • Line 67: "First of all," -> "First,"

  • Line 68: "them all" -> "them"

  • Line 69: "The leptons are reconstructed requiring to have" -> "The
reconstructed leptons are required to have"

  • Line 71: "jets are considered if p_T is greater than" => "The jets are
considered if they have a p_T greater than"

  • Line 75-76: "If the previous condition is not fulfilled while at least
four AK4 jets are found the event is classified..." -> "If the previous condition is not fulfilled and at least four AK4 jets are found, the event is"

  • Line 79: add parentheses around "the average between m_W and m_Z"

  • Line 80: "within" -> "between"

  • Line 79-83: I could not follow this long sentence all of the way
through. Somewhere around "namely" I got lost. Please consider restructuring this.

  • Line 83: "rquired" -> "required"

  • Line 85: "m_V far from the W/Z resonance range..." Do you really mean
"far from" or just "not within"? If I understand correctly, there is no gap in m_V between the signal region and the control region.

  • Line 89: I don't think "phase spaces" is the best term to use here,
because this usually refers to kinematic properties and not to lepton flavor. I think it would be more clear to say "Finally, all of the signal and control regions are split according to.."

  • Line 91: "sophisticated" -> "complex" (sophistication usually implies
some sort of willful complexity)

  • Line 97, add comma after "Fig. 3", "distribution" -> "distributions"

  • Figure 3: It is good to vary the line type as well as a color to
distinguish the different histograms. The label on the ordinate is missing. Also, why are they normalized to 0.08 instead of to unity?

  • Line 107-112: Can this sentence be simplified? e.g. "The QCD multijet
background, which may enter the signal region with non-prompt leptons, is estimated in a fully data-driven way by measuring in [XXX] the probability for a loosely defined reconstructed lepton ..." Note the placement of commas and the substitution of "which" for "that". The [XXX] says "in dedicated phase space" but I do not think it will be clear to the reader what this means.

  • Line 113 "top enriched phase space" -> "top enriched control region" ?
If it is the top CR that you are referring to, it is more precise to use this name that is already defined.

Preapproval talk comments slides

  • [Darien] s27 SF 6 W+jets categories, why using WpT to bin and scale ? Green led W pT was known not to agree with data. It is an independent variable still correlated with the W+jets kinematics of interest. A leptonic observable is less sensitive to QCD effects than jet-related variables. Corrects for normalisation.

  • [Guillelmo] s8 do you apply any special selection for electrons ? Green led The related issues were fixed with proper fake rate estimates, no special selections needed

  • [Guillelmo] s8 how do you exclude Ak4 jets pointing to ak8 ? Green led Ak4 collection cleaned with DR>0.8 requirement wrt ak8
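A minimal sketch of this kind of cleaning, assuming jets are represented as hypothetical (eta, phi) tuples (not the analysis code):

```python
from math import hypot, pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance DeltaR = sqrt(deta^2 + dphi^2), with dphi wrapped to [-pi, pi]."""
    dphi = (phi1 - phi2 + pi) % (2 * pi) - pi
    return hypot(eta1 - eta2, dphi)

def clean_ak4(ak4_jets, ak8_jets, dr_min=0.8):
    """Keep only AK4 jets separated by DeltaR > dr_min from every AK8 jet."""
    return [j for j in ak4_jets
            if all(delta_r(j[0], j[1], f[0], f[1]) > dr_min for f in ak8_jets)]
```

For example, an AK4 jet at (0.0, 0.0) is dropped when an AK8 jet sits at (0.1, 0.1), while one at (2.0, 1.0) survives the DR > 0.8 requirement.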

  • [Guillelmo] s11 did you study the eff of correct pairing ? Green led shown in s57-58 : measured the matching eff with partons. Determined in a fiducial region where 4 partons are assigned to 4 jets.

  • [Paolo] what is the eff of fiducial well matched 4 jets Green led Checked offline: 42%

  • [Guillelmo] s15 Do you have the data/MC comparison for these distributions? (CRs and SRs?) Green led A yes many in backup

  • [Guillelmo] How is the number of jets data/MC ? Green led not in backup but not bad in CRs, all in the AN

  • [Guillelmo] s18 What do these plots mean? Green led Each event is given an S/B weight to indicate if they are more signal-like or background-like

  • [Guillelmo] It would be good to show the rest of distributions for high deta_jj values, they seem to be more background-like, which is counterintuitive Orange led We will check the value

  • [Guillelmo] s27 Better to show DNN distributions weighting by the bin width Green led We will do it.

  • [Guillelmo] s27 How did you chose the variable to correct the MC? Green led We use a variable correlated to the W pt, we prefer

  • [Guillelmo] s28 Uncorrelate muon and electron nonprompt rate uncertainty Green led OK

  • [Guillelmo] QCD scale only shape variations ? Green led Yes

  • [Guillelmo] s39 why are not either prefit or postfit =1 ? Green led on 38 prefit is the W+jets SFs (as from s23)

  • [Guillelmo] start with prefit SFs or without and see if you converge to the same values Green led OK

  • [Darien] s10 Did you consider the control region that has bjets and is off-shell? Could be useful for single top, for example Green led Plots in the top off-shell control region added in ANv9

  • [Philip] s18-19 did not really understand the plots how does high detajj get bkg like at high values ? Green led it is true and interesting

  • [Philip] is the y-axis ordered in some way ? Green led A ordered in decreasing importance

  • [Philip] QGL is important ? Green led A quite important: adding QGL improved exp significance from 4.2 to 4.6

  • [Darien] inv mass of diboson systems ? Green led next step/publication with BSM studies

  • [Paolo] s33 prefit vs postfit expected significance follows opposite trends for resolved and boosted categories, why ? Green led It depends on the different postfit background normalizations, not clear

  • [Paolo] s39 Why are SFs so different in 2016 w.r.t. 2017/8? Green led A data/MC distributions are significantly different, visible on s53

  • [Paolo] As your leading uncertainty is signal QCD, would it be possible to produce a NLO signal ? Green led A Difficult. Maybe we could try to evaluate some NLO effects.

  • [Paolo] Can you show a table with the data, signal, and background yields? combining all channels? maybe most significant bins? Green led
Added in ANv9

  • [Paolo] What was the procedure to decide the DNN input variables? Green led A. Started from a larger number of them, checking the importance, and reducing the number of variables without any degradation.

  • [Yacine] How do you check the overtraining in the DNN? Green led A Checked with a Kolmogorov test between test and training samples.
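A sketch of such an overtraining check, using SciPy's two-sample Kolmogorov-Smirnov test; the beta-distributed scores below are stand-ins for the real DNN outputs, not the analysis samples:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# Stand-in classifier scores evaluated on the training and on an
# independent test sample (hypothetical distributions).
train_scores = rng.beta(2.0, 5.0, size=5000)
test_scores = rng.beta(2.0, 5.0, size=5000)

# A small KS statistic (and a non-tiny p-value) gives no indication
# that the classifier has memorized the training sample.
stat, p_value = ks_2samp(train_scores, test_scores)
```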

  • [Guillelmo] Besides quoting the signal strength, you should quote a cross section. Either a looser or a tight selection, this is something you may Green led yes. We will quote the cross section with the generator level cuts.

  • [Paolo] s41 how will you treat interference contributions in the fit ? Orange led Since the sample is still in production, for the moment it has been neglected. Probably we will treat it as an additional uncertainty on the signal, given its limited importance. We'll decide whether the uncertainty will be normalisation-only or will also have a shape component, depending on the outcome of the event generation.

  • [Paolo] s5 with MJJ>100 in the inclusive signal definition you might have triboson processes contributing. Is there any relevant contribution from tribosons after the event selection ? Green led if on shell VVV is 0.1% events then VVV offshell is <<< 0.1%

Analysis Note

13/01/2021 Changelog for analysis AN v7 link

Collected answers to the questions before the freeze for preapproval in the ANv7. Synchronized with content shown in the preapproval talk.

  • Added W+jets categories closure-test factors comparison plot.
  • Added W+jets post-fit and pre-fit scale factors comparison.
  • Added preliminary EWK+QCD combined signal strength measurement.
  • Removed Pythia to Herwig factor from signal uncertainties. An alternative parton shower for signal will be evaluated once the sample is ready.
  • Split the QGL morphing uncertainty into its components.

Comments and discussion on ANv6 link

  • Have you considered how you can make a combined EW+QCD cross section measurement? This has always been well-received by theorists and shouldn't be a complicated addition (you can use a simplified version of the fit, with both the EW and QCD components treated as signal)

Green led The EWK+QCD measurement resulted in an expected total signal strength of $1^{+0.230}_{-0.199}$. The result and likelihood scan will be added to ANv7.

  • Also a related question, does the PythiaToHerwig line in Fig 74 and 75 have the flattening using the nominal DNN distribution from Pythia? I think it would be great if you could show at least one DNN distribution before the flattening procedure is applied

Green led Yes, Figs. 74 and 75 have the flattening included. The transformation can be seen just as a binning choice. We can show a DNN distribution with a regular binning.

  • Do you have an explanation on why the PythiaToHerwig uncertainty is one-sided?

Green led At the moment we don't have a Herwig signal sample ready to use, but it has been requested. In order to approximately model the difference between the two parton showers we have applied a reweighting based on the number-of-jets observable, referring to a gen-level study performed for the same-sign fully leptonic channel by A. Ballestrero et al., "Precise predictions for same-sign W-boson scattering at the LHC" (http://dx.doi.org/10.1140/epjc/s10052-018-6136-y). 1) It is a proxy, since the paper covered only same-sign WW. 2) It is one-sided since it is an alternative sample, as was formerly done in CMS with Pythia vs Herwig, where the nuisance was set as one-sided (discussions date back to Run 1). 3) We have requested the official Herwig CMSSW sample.

  • Similar question, do you know why the QGL-morph uncertainty is over-constrained?
Green led Only 1 parameter per year was used in ANv6 for the QGL morphing uncertainty. A better approach is to uncorrelate the different components used in the morphing: gluon and quark, low-eta and high-eta functions. Therefore the QGL uncertainty has been split into 4 components, correlated for 2017-2018 and uncorrelated for 2016, and the over-constraint is reduced. Only the uncertainty on the gluon low-eta morphing is still constrained, because the initial uncertainty inserted in the fit is quite conservative, and moreover the effect of this uncertainty is strongly correlated with other shape nuisances.

  • Follow-up to question "...Can you show in the MC that the shape of the W pt is expected to be about the same...". You're right that the assumption is only that the corrections are the same, not that the shape is the same necessarily. And thanks for pointing me to sec 7.3, I guess I didn't fully understand this before, this is definitely a useful check. Could you plot just the corrections vs. each other so the trends are easier to see?

Green led The prefit W+jets categories normalization SFs for the far-from-signal and near-signal regions are plotted for the resolved and boosted categories for all the years. The plots, including the total uncertainty band, can be found here and will be added to ANv7.

  • I still think that if the shapes are about the same in the signal region, that gives some confidence that the corrections should be the same. Can you just plot the W pt distribution in the signal and in the background region together + their ratio? It would be interesting to see in any case.

Green led The plots are available here. The WlepPt distribution for the W+jets background in the signal and W+jets control regions is compared, and the two are found to be compatible. Please remember that the data-driven technique does NOT rely on the two shapes being compatible, since the simulation is used to translate scale factors determined in the control region to the signal region. The reliability of the method is confirmed by a closure test built by splitting the control region into two parts and verifying that corrections calculated in one of the two work well in the other. The two regions are defined as far-from-signal and near-signal.

  • Follow-up to question: "There is always the possibility of kinks when you bin a continuous distribution like this...". The point is that you derive corrections in relatively large bins, e.g., the right plots in Fig. 56. If you apply them to the left plot in Fig. 56, do you see any visible kinks in the distributions? Could there be any issues if so?

Green led The effect of the rescaling of the W+jets categories on the full WlepPt distribution is shown in Fig. 64, page 86, for all years and categories. The first row shows the distributions before the correction, the next row after the rescaling. No large kinks are observed.
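A per-bin rescaling of this kind can be sketched as below; the bin edges and scale-factor values are hypothetical placeholders, not the ones derived in the AN:

```python
import numpy as np

# Hypothetical leptonic-W pT bin edges (GeV) and per-bin scale factors.
bin_edges = np.array([0.0, 100.0, 200.0, 400.0, np.inf])
scale_factors = np.array([1.05, 0.97, 0.90, 0.85])

def rescale(w_pt, weight):
    """Multiply an event weight by the scale factor of the W-pT bin it falls in."""
    idx = np.searchsorted(bin_edges, w_pt, side="right") - 1
    return weight * scale_factors[idx]
```

Because the factors are piecewise constant, any kink the rescaling could introduce would appear at a bin edge, which is what a check like the one in Fig. 64 looks for.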

  • Follow-up to question "It's true that the statistical procedures rely on knowledge of the signal shape...". I mostly agree with you that LO is the best solution we have, and it's not valuable to do a big scan of predictions at LO unless we're sure that the generators are all configured properly. The one point I disagree on is the statement about how the signal uncertainties are accounted for in the fit. Signal uncertainties are special in that they play next to no role in the distribution of the test statistic used to measure the significance. You can confirm this by running your fit with/without any theory uncertainties on the signal. To that extent, uncertainties in the signal modeling may be better treated as alternative models. I'd like to see a test of the robustness of the signal strength and significance using a different model of the signal as the expectation and the pseudo data. You could do this using your Herwig alternate sample, for example.

Orange led As soon as we have the alternative sample, we can do it.

03/01/2021 Changelog for analysis AN v6 link

  • Split the parton shower uncertainty into FSR and ISR components
  • Normalized theory uncertainties on top and W+jets samples in order to disentangle the effect of the rateParameters. More details in Sec.8.5.50
  • Added W+jets categories closure-test also for 2016 and 2017. Fig 63
  • Added correlation matrices of DNN input variables. Fig.43
  • Added likelihood scan with breakdown of uncertainties. Fig.80
  • Added W+jets categories post-norm normalizations plot. Sec.9.5
  • Added comparison between QGL morphing and reweighting. Fig.103

Comments and discussion on ANv5 link

  • Can you contact the object contacts of electron, muon, and JetMET to review your usage of objects in the analysis? You can find the contact
info here (please put us in cc): https://twiki.cern.ch/twiki/bin/view/CMS/TWikiSMP

Green led Done

  • Can you also upload your combine cards to a public area and notify Pietro Vischia (with us in cc) to take a look?

Green led The repository for the datacards has been created: https://gitlab.cern.ch/cms-hcg/cadi/SMP-20-013. The latest version of them with the fit checks has been uploaded and P. Vischia has been notified.

  • What is the logic behind the binning of the DNN distribution? (paper Fig. 4, and AN Fig. 46 and 71) You talk about shaping it to a flat distribution in the AN line 266. Is this procedure applied in the boosted case? if so why is the signal distribution not flat? Did you do the rebinning after the flattening?

Green led The DNN output is transformed to make the signal distribution flat (separately for each year) in both categories. In the boosted category, given the low number of events in the tail of the DNN distribution, we rebin after the flattening.
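Such a flattening can be sketched as a monotone empirical-CDF transform of the signal scores; the beta-distributed scores below are stand-ins, not the analysis DNN:

```python
import numpy as np

def flatten_transform(signal_scores):
    """Return a monotone map sending the signal score distribution to ~uniform [0, 1]."""
    sorted_scores = np.sort(np.asarray(signal_scores))
    def transform(x):
        # Empirical CDF of the signal evaluated at x.
        return np.searchsorted(sorted_scores, x, side="right") / len(sorted_scores)
    return transform

rng = np.random.default_rng(0)
signal = rng.beta(5.0, 2.0, size=10000)   # stand-in for signal DNN outputs
transform = flatten_transform(signal)
flat = transform(signal)                  # approximately uniform on [0, 1]
```

Since the map is monotone, applying the same transform to data and backgrounds preserves their ordering in the discriminant; it acts effectively as a binning choice.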

  • We think the switch to fitting the leptonic W pt is a good one, since it's a more physical correction than the jet binning before, and it's nice to use an independent variable for the correction. Still, some validation of this procedure is needed. Can you show in the MC that the shape of the W pt is expected to be about the same in your signal and control regions? Do you apply any additional uncertainty for this extrapolation?

Green led The simultaneous fitting of the signal region and of the W+jets bins ensures that Combine automatically calculates the appropriate corrections in the signal region, depending on the W+jets cross-section in the W+jets bins. This means that the W pT shape is not required to be the same in the signal region and in the control regions, because what is relevant is that the correction calculated by the fit is the same. This is verified by the inner/outer closure test (Section 7.3 of the analysis note), where a fraction of the control region is assumed to be a signal one for the sake of the test, while another subset is used as control region. We will add the trends of the correction factors in the two regions to the new version of the AN. Since the signal region is blind, clearly the correction factors there will be accessible only after unblinding.

  • How is the binning of the w-lepton-pt chosen?

Green led Since the trend in the data/MC discrepancy for WlepPt is quite smooth, a small number of regularly spaced bins has been chosen. The binning in the resolved and boosted categories is different, to account for the different distributions of the observable.

  • There is always the possibility of kinks when you bin a continuous distribution like this. Do you see them when binning more finely? Did you consider an analytic function for the correction, or forcing some smoothing across bins?

Green led Since the bins of the analysis are rather large, we do not expect to suffer from any kinks. As a matter of fact, the corrections are effective as they are (the data/MC ratio shows good agreement), and smaller bins would suffer from low statistics in the high-energy tails.

  • You had some really nice plots of the corrections across channels in AN-19-239_v4_appendix that are no more. Can you remake those for the new approach?

Green led Added in v6

  • It's really hard to follow the change to the nonprompt background estimate, because the previous version of the note had very little documentation, and you don't really refer to what changed here. Can you add a description of the changes (I think it was based on the trigger you use in the fake rate derivation region)? Also add plots from the regions where you derive the fake rates and where you apply them.

Green led We have derived the fake rates with exactly the same trigger as in the analysis, with the method described in AN-2010/261. The trigger that was used before was not suitable for the lepton WP we are using. The fake rate has been applied in a closure region orthogonal to the analysis, and plots are shown in Sec. 5.3.1 of ANv5, Figs. 40 and 41.

  • Can you add a few more plots of the scan of the likelihood with individual uncertainties frozen? It's kind of hard to tell what the leading uncertainties are with all the rate parameters in the impacts. Similarly, is there a way to show the impacts without the rate parameters that still has physical meaning? It's hard to reconcile the fact that they are always leading in the impacts with the likelihood scan that shows they are not so important.

Green led More likelihood scans with different groups of uncertainties frozen have been added in ANv6. The uncertainty has been broken down into normalization, theory, experimental and statistical components. Moreover, the final overall impact plot added in ANv6 separates the rateParameter impacts from the others to improve readability.

  • Did you make any comparison of the central scale factors and your approach for the QGL distribution? Ideally this would also be signed off by JetMET

Green led We have compared the morphing with the official correction for the signal sample in 2016. Results added in ANv6. The QGL morphing method has been presented in a JetMET talk. Comments: 1. JetMET is OK with our morphing procedure, and welcomes the possibility of having it available for other analyses. 2. Clearly the morphing needs to be calculated for each parton shower program in a consistent way, which is what is done in the analysis. 3. JetMET sees as a plus the fact that the morphing has been done for all the years, whereas the official corrections exist only for 2016.

  • Is the QGL included as a new variable in the DNN? Sorry, I don't find this clearly stated anywhere

Green led Yes, it is included. The detailed list of the variables used as inputs for the DNNs can be found in Table 11, page 59 of ANv5.

  • Fig 85 and 86: How are you defining a quark and gluon jet? This is a really nasty definition itself, and surely there should be some uncertainty associated with the predicted fractions and distributions from Pythia and the specific tunes. Do you show the results for the two tunes separately at least?

Green led The parton flavour associated with the jet is taken from the official recipe, using the Jet_partonFlavour branch in the NanoAOD. Effects of the parton shower uncertainties are considered on all observables, as well as QGL variations. More plots, including 2016 and the different tunes, added in ANv6.

  • I guess you want to submit this to PRL? It would be good to give your justification to Boaz ASAP so we can confirm whether or not this will be supported by pubcomm. My preference would be to have the longer PLB format so we can describe the analysis more clearly.

Blue led The authors do not have a strong preference towards PRL or PLB. Given the promising expected result for the first observation in this channel, a short paper on PRL can be interesting, but also a longer description of the analysis in PLB format is perfectly fine. Given the longer format, some more time would be needed for the typesetting of the paper draft, but we don't think that this can be a delaying factor for the preapproval of the analysis.

  • It's true that the statistical procedures rely on knowledge of the signal shape. But, from a physics perspective, it's entirely reasonable to ask just how sensitive you are to the knowledge of the signal shape. This can be done by, for example, testing the signal strength and its significance with one model acting as the signal prediction and another acting as the data. For example, the ATLAS Z VBF paper does this with three models: https://arxiv.org/pdf/2006.15458.pdf. In CMS, we've done this with multiple generators in the past, or in the WZ+WW VBS analysis, comparing the impact of applying or not the EW corrections to the signal. I think such a study in this analysis would be very valuable, given that the shape of the signal prediction is quite leveraged by the DNN. One can very reasonably ask whether your observed significance of WV VBS production would be robust if the true WV VBS process is not fully described by the WV VBS MadGraph LO simulation. I suggest thinking of ways to test this. A few ideas: (1) calculate WZ or WW at NLO QCD with POWHEG, and apply these corrections to your signal; (2) apply the WW or WZ EW corrections in mjj to your signal; (3) try another parton shower and apply the (gen-level) corrections to your signal. Note that we see such questions from ATLAS reviewers very often. Doing this study now will make your work more convincing

Green led Any effects that modify the signal shape and are clearly identified as sources of uncertainty in the data analysis are accounted for as shape systematic uncertainties in the fit performed by Combine, which we think gives solidity to the result of the analysis. The precision with which the modelling of the signal is known theoretically is of course a source of concern in general. In this particular case, the generation used is the most precise existing: NLO EW and QCD corrections have never been calculated for the semi-leptonic final state, and according to theorists (we have contacted our theorist colleague M. Pellen after your request to ask for advice) their calculation would deserve a paper on its own. Therefore, any approximate exercises using corrections calculated for fully-leptonic final states would not add to the precision of the analysis, because there is no way to assess how realistic such an approximation would be. Choosing to use an alternative LO generator, like Sherpa, exposes us to two risks: the first is that in CMS there is not enough experience in the use of such a generator, which may lead to an incorrect use of it; the second, even more relevant, is that the ATLAS Collaboration clearly showed that, for the VBS case in particular, it is the generator furthest from the data. In more general terms, the comparison studies performed in final states like the EW production of Z + 2 jets apply to a case where the generation is known much better and the statistical uncertainty of the analysis is far smaller (5 sigma is largely achieved with the data available in 2016), two conditions that are not met in this case.

  • For the fit with rate parameters, you should not have the same set of uncertainties as without, because any uncertainties that are purely normalization should cancel out. It's not really clear at the moment if this is the case.
Ans: Since the convolution of a flat prior with a Gaussian prior is a flat prior, effects that change the normalization on top of a flat prior (a.k.a. rateParam) have no effect. This, which is a purely didactical test, was already studied in Run 1 with what was then called the lnU prior, and at the beginning of Run 2 in the WW analysis. I think I follow that the normalization uncertainties on backgrounds will not add additional uncertainties w.r.t. the rate parameters. But does this mean that I can't really look at the impact plots and tell how much the uncertainties impact the result? It would be really helpful to provide an explanation that I can follow on a physics level here that helps me understand the results.

Green led The overall normalization effect of the QCD scale and parton shower uncertainties has been removed from samples in which rateParameters are included, in order to disentangle the normalization effect. The normalization has been implemented so as to have the same event yields for nominal and varied shapes, summing all the channels in which the sample's rateParameter is included.

08/12/2020 Changelog for analysis AN v5 link

  • New non-prompt estimation and fixed trigger efficiency : corrects for discrepancy in electron MC. Closure test for non-prompt estimation added in the AN (Sec.5.3).
  • Brand new binning for data-driven W+jets estimation using leptonic W pt. (Sec.7.2)
  • Closure test for the data-driven estimation (Sec.7.3) and added plots showing the effect of the W+jets subcategories rescaling on prefit distributions (Sec.7.4).
  • Updated fit results and impact plots (Sec.9)

General questions and discussion on V4 of the AN

  • Relevant to AN v4: Follow up to previous discussion: Have you considered any study of dividing the W+jets samples based on gen-level jet pt and eta rather than reco level?

Green led No significant differences have been observed when splitting the W+jets sample into RECO-based or GEN-based categories

More details: the GenJet collection has been analyzed to extract the highest-mjj GenJet pair. The DeltaEta of the pair and the pT of the second selected GenJet have been used to split the W+jets sample using the same categorization as the data-driven method on reco objects. We have observed very little difference with respect to the categorization done on reco-level jets, as shown in the second plot, where the bins correspond to reco-level categories.
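Extracting the highest-mjj pair can be sketched as follows, in the massless-jet approximation; the (pt, eta, phi) tuples are hypothetical stand-ins for GenJets:

```python
from itertools import combinations
import math

def mjj(j1, j2):
    """Dijet invariant mass for two massless jets given as (pt, eta, phi)."""
    pt1, eta1, phi1 = j1
    pt2, eta2, phi2 = j2
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))

def highest_mjj_pair(jets):
    """Return the pair of jets with the largest dijet invariant mass."""
    return max(combinations(jets, 2), key=lambda pair: mjj(*pair))
```

With placeholder jets at wide eta separation the pair with the largest rapidity gap typically wins, which is why the DeltaEta of the selected pair is a natural splitting variable.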

Deltaeta VBS jets in resolved ele W+jets CR

Deltaeta VBS/ trailing VBS pt bins for data-drive estimation in resolved ele W+jets CR

Follow-up by Kenneth: Have you performed the full expected signal+CR fit with this setup? I'm not very surprised that the categorization doesn't change dramatically, but the behaviour of the fit should change. For one, in the RECO binning most uncertainties that affect the normalization cancel out. For the GEN binning, you should still have RECO uncertainties and only the GEN uncertainties cancel out. I would also think that a more granular binning of the etajj/ptj distribution you fit makes sense in the GEN case, so the fit has a bit more tension and the uncertainties are more meaningful. If you then look at the post-fit results, you should be able to separately look at the pulls for the RECO uncertainties and the shift in the GEN distributions, which at least to some extent tells you whether the discrepancies are from modeling or from detector/reco issues.

Green led The question is superseded by the new binning chosen for the analysis. Since the binning at GEN level has no effect for our analysis, W+jets being a background with a data-driven component, we keep the reco-level binning. In the new implementation of the analysis (see AN v5) the binning is performed in bins of the reconstructed W transverse momentum, motivated both by experimental evidence that this variable is not well described at the current accuracy of the MC generators, and by theory calculations shown in GEN meetings.

  • For the fit with rate parameters, you should not have the same set of uncertainties as without, because any uncertainties that are purely normalization should cancel out. It’s not really clear at the moment if this is the case.

Green led Since the convolution of a flat prior with a Gaussian prior is a flat prior, effects that change the normalization on top of a flat prior (a.k.a. rateParam) have no effect. This effect, which is a purely didactic test, was already studied in Run 1 with what was then called the lnU prior, and at the beginning of Run 2 in the WW analysis.

  • If you are including the normalization component of uncertainties on the data-corrected backgrounds, why?

Green led Until now the normalization component of the uncertainties on the data-corrected backgrounds has been included. Although it is tidier to keep the shape and normalization separate, it has been shown (and it is mathematically equivalent) that keeping both the shape and normalization effects gives the same result. This topic was first discussed in Run 1, in the context of the H→WW search and the top/WW control regions, where lnU was used since rateParams had not yet been implemented.

  • Fig: 8. In general you have good consistency across bins and across channels, which is good to see. But there is a consistent offset in electron vs. muon scale factor, where the electron scale factor is always ~10-20% larger. This makes me think that you’re missing something in the electron efficiency estimation. Could it be the trigger efficiency, for example? If the corrections for the electron and muon channels can be unified, this would give more confidence that the data/MC corrections are related to the jet modeling and reconstruction only

Green led The data/MC disagreement in the electron case has been understood and fixed by means of a more careful calculation of the fake rate. In AN v5 the discrepancy is no longer present.

  • Appendix A.5: I don’t find this study exceptionally convincing. If you just move the mu for the expected signal, the fit can find perfect agreement by just pulling the mu, and I would never expect the background to get pulled. I think you need to make a study that introduces some shape variations, perhaps drawing toys per bin from background+signal+unc. Then compare the normalization per bin to the normalization from the toy and see if there is any bias that is evident over many toys.

Blue led We do not understand the rationale behind the request. As in any other CMS analysis, the likelihood ratio is built on knowledge of the signal shape, and the bias tests are meant to show that the analysis is able to distinguish the background from the signal. An artificial signal built as a mixture of signal and background will by construction show a bias in the result, since it is constructed to partially behave like the background (a simple RooFit exercise can show this). The background variations within the post-fit uncertainties, on the other hand, are accounted for by Combine.

03/06/2020 Comments from Conveners on v4 of the AN-19-239

*Link to AN appendix with details about fit tests and answers to questions*

  • Table 2: for single electron, in principle you could have used HLT_Ele32_WPTight_L1DoubleEG (with just a precaution concerning the L1 filter) and lower the offline pT threshold to 35 GeV (maybe it wasn't possible in the Latinos framework?) Do you know how much acceptance (and sensitivity) you are giving up with an offline cut of 38 GeV? Looking at final distributions (e.g. Fig. 46), the signal yield in the 2017 electron channel seems significantly lower than in 2016. By the way, also for single muon in 2017 you could use HLT_IsoMu24 OR HLT_IsoMu27 (HLT_IsoMu24 was only pre-scaled for a short time, ~3/fb). Clearly this is pointless if you cut at 30 GeV offline. But have you considered lowering the muon pT threshold to 27 GeV for all years? Would it help?

Green led The detailed answer can be found in Appendix B of the AN (see link above). A test has been performed looking at the 2017 MC without trigger efficiencies. Lowering the electron pT threshold from 38 to 35 GeV would increase the signal yield by ~5%, but it would also increase the W+jets background by 5%. Lowering the muon pT threshold by 3 GeV (the test was done from 33 GeV to 30 GeV, since pT > 30 GeV was the preselection of the ntuple skim) would increase the signal by 8%, but the W+jets background would increase by 10%. In our opinion, given that including the prescaled muon trigger or the electron trigger with two L1 seeds would require additional technical work as well as a re-derivation of the lepton ID/isolation scale factors, the gain in signal efficiency relative to the W+jets background does not justify changing the thresholds.
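As a rough illustration of this trade-off, one can compare s/√b before and after the threshold change using the relative variations quoted above. The absolute yields below are hypothetical placeholders, not analysis numbers:

```python
import math

# Toy comparison of the quoted threshold changes (+5% signal, +5% W+jets
# for electrons; +8% signal, +10% W+jets for muons) in a naive s/sqrt(b)
# metric. The baseline yields s0, b0 are hypothetical.
def significance(s, b):
    return s / math.sqrt(b)

s0, b0 = 100.0, 10000.0  # hypothetical baseline signal and W+jets yields
for name, ds, db in [("ele 38->35 GeV", 1.05, 1.05),
                     ("mu 33->30 GeV", 1.08, 1.10)]:
    gain = significance(s0 * ds, b0 * db) / significance(s0, b0)
    print(f"{name}: relative s/sqrt(b) change = {gain:.3f}")
```

In this naive metric both changes give only a percent-level gain, consistent with the conclusion that the additional technical work is not justified.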

  • Table 4: the leptonic decays include taus, right? (It seems so from the cross sections)
Green led Yes, tau decays are included.

  • Table 5: the 2016 DYToLL-M50 sample seems to be missing (I only see the DYToTauTau one). Is it just an oversight, or is there a reason for not including it?
Green led The sample list is now updated in AN v5.

  • L 242: do you know how much signal efficiency you lose by removing this 2.5-3.0 region? Could you recover it with a tighter PU jet ID (only in |eta|=2.5-3.0, also for pT>50 GeV)?
Green led L242 needs to be updated: already in the current plots (v4) the tight PU jet ID has been used to exclude events with tagged jets falling in that eta region, instead of removing them completely. Around 7% of the signal in the resolved category, and 5% in the boosted category, is removed by the tight jet PU ID in the "horns" region. This cut was applied a posteriori, but it is now included as a preselection for the jet tagging in v5 of the AN to gain back some efficiency.

  • Figs. 6-21: what uncertainties are included in the error band (~20% everywhere)?
Green led All the nuisances included in the analysis so far (listed in Chapter 8) are included in these plots.

  • Figs. 6-21: is the nonprompt background from data in these plots? Is it clear why it is so much smaller in 2016 than in 2017-18 in the electron channel?
Green led The non-prompt background is estimated in a data-driven way using the fakeable-object technique. The non-prompt estimate has been updated and recomputed correctly in v5 of the AN.

  • Sec. 7.3.1: do you use the non-closures in this test to evaluate some additional uncertainties on the W+jets background? The overall normalization in each subregion seems very good after the fit. But the jet pT distributions still show some larger discrepancies, which can have an impact on DNN. Can you add the DNN distributions in the W+jets CRs after the correction? (You have them in some presentations.)

Green led Updated post-fit plots of the DNN in the control regions will be available for AN v5 as soon as possible.

  • Figs. 40-44: I guess these will be updated with the new binning of L 534-539. Also, please add plots of the DNN spectrum before and after the correction.

Green led OK - superseded by v5.

  • L 621: what is the initial value of these uncertainties, before the fit?
Green led The effect of the QCD scales on the W+jets samples is ~20% in both the signal and control regions.

  • Fig. 45: clarify in the caption that these are DNN distributions (x axes have no titles and the labels are very small to read)

Green led OK

  • L 687-688: fix the references

Green led OK

  • Table 17: do you know what drives the lower sensitivity in 2017? The main differences I see are (1) the higher electron pT thresholds, (2) the exclusion of jets with |eta|=2.5-3.0, and (3) the larger nonprompt background compared to 2016 (also true in 2018). Points (1) and (2) could be improved in principle (see comments above).

Green led The higher lepton pT thresholds account for a ~5% signal yield loss, and the tighter jet PU ID for a further ~7% signal loss. The non-prompt contribution should be correct (the 2016 one needs to be updated). These factors should explain the difference in performance.

  • Fig. 49 (impact plot): should we ignore this, and just look at Fig. 50?

Green led OK

  • Fig. 50: all the parameters related to the W+jets normalization have similar impacts. Maybe it would be clearer to have a version of this impact plot where all these parameters are grouped into a single systematic category, to see the cumulative effect of the W+jets normalization procedure. This detailed version could be left here or moved to an appendix.
Green led Likelihood profile plots with and without these systematics have been produced. They will also be added for v5.

  • Figs. 51-53: very nice! Just a couple of comments: - as I said before, it would be useful to add the post-fit DNN distributions in the W+jets CR, either here or (better) in Sec. 7.3.1, to check the W+jets agreement also in the high DNN region; - what uncertainties are included here? In L 717 you say they are from the fit. But why are they so much smaller than the ~20% you show in the pre-fit plots? Are they really constrained this much?

Green led All the pre-fit shapes, including those in the control regions, have been added in AN v5. We will provide post-fit plots for dedicated distributions at the next presentation.

28/04/2020, K. Long private email, on this version of the AN

Green led It will be done.

  • For example Higgs Boson → Higgs boson, Standard Model → standard model, Vector Boson Scatter → vector boson scattering, CMS collaboration → CMS Collaboration
Green led done.

  • There are also a handful of typos
Green led fixed

  • If you’re going to have the “Section X describes” text you should probably describe all the sections
Green led part removed

  • Ln 80: This isn’t a complete sentence and I’m not sure what it’s meant to say
Green led fixed

  • Why are you using NanoAOD v5 for 2016 and 2017 but v6 for 2018? I assume the intention is to move fully to v6 (and possibly v7 when it’s available)?
Green led Yes, now v7 in the latest AN

  • Ln 103: The table ref is broken
Green led fixed

  • Table 4: Does “in production” mean you are using private MC for the time being?
Green led For the time being we are using the 2018 signal.

  • Ln 110: I don’t know whether v51 is a typo or a reference to your internal post processed samples
Green led fixed

  • Ln 115: Add the pileup profile
Green led sentence removed

  • Ln 149: I don’t find the details in section 5
Green led It will be described briefly, but the correction is a standard one dating back to 2015.

  • Ln 190: Booted → Boosted
Green led fixed

  • Ln 202: So you would rather have a pair of jets with mjj = 85 GeV than with mjj = 90 GeV or 80 GeV? Why not just collect (mjj-mZ, mjj - mW) for each pair of jets in the event? I expect the ambiguity is very small but the latter definition is more logical
Green led The impact on the analysis is negligible. We studied this in the past and decided to go with the algorithmically simplest solution.
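For reference, the alternative pairing criterion suggested in the comment (pick the jet pair whose dijet mass is closest to either the W or the Z mass) can be sketched as follows. The function name and inputs are hypothetical, not the analysis code:

```python
# Hypothetical sketch of the suggested V-candidate choice: minimise the
# distance of the dijet mass to either the W or the Z mass.
M_W, M_Z = 80.4, 91.2  # GeV

def pick_v_candidate(dijet_masses):
    """dijet_masses: {(i, j): mjj} for all jet pairs in the event."""
    return min(dijet_masses,
               key=lambda pair: min(abs(dijet_masses[pair] - M_W),
                                    abs(dijet_masses[pair] - M_Z)))

masses = {(0, 1): 85.0, (0, 2): 120.0, (1, 2): 60.0}
print(pick_v_candidate(masses))  # (0, 1)
```

With mjj = 85 GeV the pair is 4.6 GeV from mW, closer than any other candidate is to either mass, which matches the intuition that the ambiguity between the two definitions is small.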

  • Ln 199, and more: try to be consistent with notation. V should be in roman. GeV should always be in Roman (e.g., Ln 206, 207)
Green led Fixed. These questions will be consistently addressed when writing the paper draft for publication

  • Ln 201, 204: tag jets, b jets
Green led done

  • Ln 211: In principle the normalization of everything is derived by the fit. If you mean that the normalization is unconstrained in the fit, please say that
Green led done

  • Ln 216: betweeh → between
Green led done

  • Ln 230: I’m not really sure what you mean by “instabilities”
Green led change to "effects"

  • Section 4.2: Please provide a minimal definition of your lepton selections rather than pointing 100% to another AN. It seems the muon ID is basically just the POG tight CB ID, but the electron ID is quite a bit different. Can you also justify why you go with this ID vs. a POG ID?
Green led We can add a few lines, but, as done in previous documentation, it is more useful for the reader to refer to something already published/used, so that we don't all spend time re-checking things that have already been checked.

  • Can you also make a table summarizing all the cuts in your signal regions?

Green led OK

  • Ln 276-277: I think mV (rather than M_W) would be more clear here, unless I’m missing something)
Green led agreed

  • Ln 290, 305: Again I’m confused. Do you really mean W? Can’t it be a W or Z, so mV?
Green led agreed

  • Ln 294: Reference the figures explicitly rather than “in the next pages”
Green led fixed

  • Ln 296-297: This is not really convincing. How can a 15% contribution from W+jets be the source of up to 40% discrepancies in lepton eta/pt and ptj, for example?
Green led We have improved (next iteration of the AN) the purity of the top control region. The discrepancies decrease, as expected.

  • In many cases, the data/MC agreement is really not good, and specific comments on these distributions should be made. I spent some time going through the plots in the VBF H mumu analysis (AN-19-205) and the Higgs invisible VBF analysis (AN-19-243) to try to understand if the disagreement you see is consistent with their results. The VBF Hmm analysis seems to generally look a bit better; they use the NLO DY sample for the Z+2j background. The VBF Hinv analysis uses the same LO samples in their control regions and sees similar behaviour for mjj and etajj, but with more aggressive MET cuts.
Green led The total number of jets entering the analysis is larger in this case with respect to the final states of those analyses, so the comparison between analyses is not expected to show consistency. A data-driven technique is implemented to cure these discrepancies, whether they arise in the LO or NLO simulation. The level of agreement in the control regions, on the other hand, is similar (beware of the range of the ratio plots and the number of bins shown).

  • More investigation is needed, also with the other analysis studying this process in the SMP-VV group. I understand you are still waiting on some samples to finish to use NLO W+jets, but you could at least make some test plots for some years using the jet-binned samples at NLO.
Green led The NLO samples will be used as soon as the processing is over.

  • Ln 324: Again, you can leave the details to the reference, but at least some words on the technique are needed. I would also suggest using “Nonprompt” in the plots rather than “Fake”
Green led Done in the text. The plots will be modified when updated

  • Ln 327: Comprehends → comprises
Green led done

  • Section 6: What framework do you use for the DNN? TMVA, Keras? It’s useful to say.
Green led Added

  • How have you decided on your network architecture? How long does training take and on what machine?
Green led We have observed that, in general, large well-regularized models perform better than shallow networks with fewer parameters. We therefore started with a medium-sized model of around 4 layers with 50 nodes each and tried to increase its size: if the performance improved, the model was made larger; if overtraining occurred, the model size was decreased. A procedure for automatic optimization of the network hyperparameters is under development.

  • Ln 389-392: What is the fractional split used to form the test/train datasets?
Green led It is 80% training and 20% validation.
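A minimal sketch of such a split (variable names hypothetical, not the analysis code):

```python
import numpy as np

# Sketch of an 80%/20% training/validation split: shuffle the event
# indices once with a fixed seed, then slice.
rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility
n_events = 1000                       # hypothetical dataset size
indices = rng.permutation(n_events)
n_train = int(0.8 * n_events)
train_idx, valid_idx = indices[:n_train], indices[n_train:]
print(len(train_idx), len(valid_idx))  # 800 200
```

Shuffling before slicing avoids any ordering bias (e.g. events grouped by sample or run) leaking into the validation set.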

  • Ln 410 and 411: < and > signs have some issues
Green led Fixed

  • Fig 20: The caption has some spurious text.
Green led Fixed

  • Fig 20: Can you describe more clearly in the caption why it is a clear overtrain?
Green led The loss on the validation sample is flat or even slightly increasing, while the loss on the training sample keeps decreasing: the network is learning features specific to the training sample.
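The criterion described above can be sketched as a simple check on the two loss curves. The numbers and the helper function are toy illustrations, not the analysis code:

```python
# Toy illustration of the overtraining criterion: the training loss keeps
# falling while the validation loss plateaus or rises.
train_loss = [0.70, 0.55, 0.45, 0.38, 0.33, 0.29]  # keeps decreasing
valid_loss = [0.71, 0.58, 0.52, 0.51, 0.52, 0.54]  # flattens, then rises

def is_overtraining(train, valid, patience=2):
    """Flag overtraining when the validation loss has not improved for
    `patience` epochs while the training loss is still decreasing."""
    best = min(valid[:-patience])
    stalled = all(v >= best for v in valid[-patience:])
    still_learning = train[-1] < train[-patience - 1]
    return stalled and still_learning

print(is_overtraining(train_loss, valid_loss))  # True
```

The same logic underlies standard early-stopping schemes: stop (or shrink the model) once the validation loss stalls while the training loss keeps improving.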

  • Table 11: I don’t find the definition of quite a few of these variables
Green led done

  • Ln 505: Is the mW here computed with lepton+MET? It’s really mass and not transverse mass?
Green led The mass, computed from the jet(s) ("W_had"), not the transverse mass.

  • Ln 509: Don’t you fit the rate parameters for the W simultaneously with the signal region?
Green led Not in this closure and validation test, otherwise we would unblind ourselves. This is a closure test of the method.

  • Fig. 27 (Fig 29): Are the right (top right) plots the distribution you actually fit? Why not use more reco bins than GEN?
Green led Yes, they are reco bins. The more bins, the more degrees of freedom the system has. The proposed number of bins is enough to correct for the discrepancies observed in the control regions (the data-driven method for W+jets, LO or NLO) without spoiling the analysis performance.

  • Section 8: Please show the distributions you use in the fit for some illustrative processes (signal, major background) for the leading backgrounds, especially for the JES and theory uncertainties

Green led OK

  • Ln 611-616: If you don’t have final implementations, you should at least add a dummy uncertainty that is likely to capture the effect
Green led OK. The PU reweighting effect is expected to be small, since the effect on leptons is negligible (already taken into account by the scale factors) and the effect on jets is covered by the existing nuisances.

  • Ln 636: “Initial fit on data” What does this mean? Isn’t there one simultaneous fit?
Green led There are two kinds of Asimov toys that can be performed: the "data Asimov" (a.k.a. post-fit Asimov) and the "MC Asimov" (a.k.a. pre-fit Asimov). The difference is that the nuisances can either be pre-fitted and their best estimated values used in the Asimov toy, or the nominal nuisance values can be taken. The difference between the two approaches is small if the analysis contains mostly lnN priors and only a few rateParams (flat priors) with negligible effects. The difference can become visible if the major nuisances are rateParams, which are supposed to be fitted in the final fit. Using the "data Asimov" is then more reasonable, being closer to the description of the background distributions we expect. In Run 1 we used to normalize some of the backgrounds by hand to take this effect into account. In Run 2, with all the new tools available in Combine, this is no longer needed and the scaling can be performed, in an even more correct way, on top of the final datacards.

  • Section 9.2: Where is the distribution of the DNN score over the full 0-1 range?
Green led It will be added in the next iteration of the AN.

  • Please show more information from the fit, especially impacts of the uncertainties and the postfit corrections (rate params for W normalization)
Green led All impact plots will be provided in the next iteration of the AN

-- Davide Valsecchi - 2020-07-20

Topic revision: r41 - 2021-02-23 - DavideValsecchi