PAS EXO 09-013 - Feedback to ARC comments

On this page you can find answers to the questions that arose during the approval procedure, along with feedback on the Analysis Review Committee (ARC) comments.

General comments

T.Dorigo

  • The current draft is too thick. This is not a comment about the number of pages (15) which is however too large, but rather on the prose, and on the demand it makes on the reader. I suggest to rewrite and shorten significantly sections 2, 3, and 5.
Done to the best of our ability. We might need further ARC suggestions to shorten the text more.

  • Some parts are obscure. It requires some back-and-forth to understand that you compute the MHT with Et>30 GeV jets, but then count only Et>50 GeV ones, and select up to two of them. Other similar examples exist. Some rationalization of the text is in order.
The full selection procedure is now summarized in L77-105.

  • The ARCs believe we should rethink the list of figures that go in this document. We found that some of the variables used in the selection, which are not shown, should be made, before a proper choice is possible. You have gone through this process but we haven't, so maybe providing us with more information would help.
Many extra plots were shown privately to the ARC. A few of them have been added to the Analysis Note.

  • Throughout the text one sees predictions for 100/pb, 200/pb, limits for one or the other, significances as a function of luminosity. We should stick to a single number.
Done

  • In general, plots have too many bins, and too many different distributions are shown together. This causes confusion.
All distributions have been revised after the first PAS version

A.Meyer

  • Figures are not publication quality (compare old PAS for example), and some of them are very hard to read. You should make more use of histograms (instead of markers), avoid shaded boxes, add (a), (b) labels and so on.
Done

Physics Issues

T.Dorigo

  • Please explain in more detail what you do to get the PDF errors. Do you actually test the different eigenvalues ? We believe you do, and we would thus like to see the independent variations of the acceptance caused by each. This is an ARC request.
Clarified during private discussions. The individual PDF acceptance variations for the 22 eigenvectors have not been put in the Analysis Note, but are available in one of our logbook pages.

  • The procedure by which the limits are calculated needs to be explained in more detail. Perhaps here if you provide us with a full list of numbers, for a specific case (say delta=4, 100/pb) as an example, and a flow chart of your calculations, you will make us more confident that what you do is correct and meaningful. At zeroth order we trust you, but our job is to mistrust you, so please convince us that what you do is correct.

The procedure has been explained to ARC and detailed in the Analysis Note

Discovery reach:

    • a likelihood for the S+B hypothesis has been chosen, in which the numbers of signal and background events, N_S and N_B, are Poisson distributed and the uncertainty on the mean N_B is Gaussian, with sigma_B including statistical and systematic contributions
    • a significance is extracted from this likelihood using the Profile Likelihood method. The "correspondence" detailed in the cited paper by Cousins, also mentioned in our previous AN, is used to perform the calculation (the significance is easily obtained from N_S, N_B and sigma_B)
    • that significance is reported as a function of MD
    • the significance as a function of MD is fitted with an exponential
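
The significance extraction described in this list can be sketched as follows. The sketch assumes one common form of such a correspondence: the background estimate N_B +/- sigma_B is mapped onto an equivalent "on/off" counting problem (tau = N_B/sigma_B^2, N_off = N_B^2/sigma_B^2), for which the profile likelihood significance has a closed form (the Li-Ma expression). This is an assumption about which formula is meant, not a transcription of the analysis code; the function names and input numbers are hypothetical.

```python
import math


def on_off_equivalent(n_b, sigma_b):
    """Map a background estimate n_b +/- sigma_b onto an on/off problem.

    Returns (tau, n_off) such that n_off / tau = n_b and
    sqrt(n_off) / tau = sigma_b.
    """
    tau = n_b / sigma_b ** 2
    n_off = n_b ** 2 / sigma_b ** 2
    return tau, n_off


def profile_likelihood_significance(n_s, n_b, sigma_b):
    """Li-Ma style significance for n_s signal events over n_b +/- sigma_b."""
    n_on = n_s + n_b  # expected count in the signal region under S+B
    tau, n_off = on_off_equivalent(n_b, sigma_b)
    n_tot = n_on + n_off
    term_on = n_on * math.log(n_on * (1.0 + tau) / n_tot)
    term_off = n_off * math.log(n_off * (1.0 + tau) / (tau * n_tot))
    # Guard against tiny negative values from rounding when n_s is zero.
    return math.sqrt(max(0.0, 2.0 * (term_on + term_off)))


# Hypothetical inputs, for illustration only (not analysis numbers).
z = profile_likelihood_significance(n_s=30.0, n_b=100.0, sigma_b=10.0)
print(f"Z = {z:.2f}")
```

In the limit of a very small sigma_B the expression reduces to the familiar sqrt(2((S+B) ln(1+S/B) - S)), and a larger sigma_B lowers the significance for fixed N_S and N_B.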

Exclusion limits:

    • an exponential extrapolation of the signal acceptance as a function of MD is performed
    • nuisance parameters due to systematics on the signal are introduced by assuming a Gaussian prior and convolving it with the likelihood
    • with the Profile Likelihood approach, the NLL is scanned as a function of the parameter of interest, MD in this case. This has been performed with RooStatsCms
    • from the NLL as a function of N_S, the value for 95% C.L. is derived
    • this has been repeated for many (~100) lumi points. The uncertainties on signal and background are scaled with lumi for systematic errors and with sqrt(lumi) for statistical errors
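
The luminosity scaling in the last item can be sketched as follows. The sketch assumes that the quoted errors are absolute uncertainties on event yields at a reference luminosity; the function names, the reference values and the quadrature combination are hypothetical choices made for illustration.

```python
import math


def scale_uncertainties(stat_ref, syst_ref, lumi_ref, lumi):
    """Scale absolute uncertainties on an event yield from lumi_ref to lumi."""
    ratio = lumi / lumi_ref
    stat = stat_ref * math.sqrt(ratio)  # statistical part grows like sqrt(lumi)
    syst = syst_ref * ratio             # systematic part grows like lumi
    return stat, syst


def total_uncertainty(stat, syst):
    """Combine the two parts in quadrature (an assumption of this sketch)."""
    return math.hypot(stat, syst)


# Hypothetical reference values at 200/pb, scanned over a few lumi points.
for lumi in (50.0, 100.0, 200.0, 400.0):
    stat, syst = scale_uncertainties(2.0, 3.0, 200.0, lumi)
    print(lumi, round(stat, 2), round(syst, 2), round(total_uncertainty(stat, syst), 2))
```

With this scaling the relative systematic uncertainty stays constant, while the relative statistical uncertainty falls like 1/sqrt(lumi).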

  • We believe your list of backgrounds should include diboson production. In a recent analysis by CDF, the signal of dibosons yielding missing energy and two jets was extracted, for the first time above 5-sigma with a boson going to jets. In the CDF analysis the top background, once missing Et and two jets are requested, is way smaller than the diboson signal. We request you to run some MC sample of WW, WZ, ZZ, to ascertain it is not a concern. Our estimate is of the order of the present ttbar contribution.
Done

Cross-sections are:

WW+jet: 44.8 pb; ZZ+jet: 7.1 pb; WZ+jet: 17 pb

Results of our analysis are the following:

| Cut, threshold | WW | ZZ | WZ_incl |
| 00_ALL, 0 | 8960.4 (100.0%) (151531 MC) | 1420.0 (100.0%) (47825 MC) | 3520.0 (100.0%) (197080 MC) |
| 09_MHTs, 250 | 20.3 (0.2%) (344 MC) | 5.4 (0.4%) (182 MC) | 10.6 (0.3%) (596 MC) |
| 11_JEMF1, 0.9 | 15.7 (77.0%) (265 MC) | 4.5 (83.0%) (151 MC) | 9.0 (84.7%) (505 MC) |
| 12_JEMF2, 0.1 | 14.8 (94.3%) (250 MC) | 4.4 (98.7%) (149 MC) | 8.7 (96.4%) (487 MC) |
| 12_TIVs, 0.1 | 3.1 (20.8%) (52 MC) | 3.1 (70.5%) (105 MC) | 4.7 (54.4%) (265 MC) |
| 14_Jet1_pt, 200 | 2.4 (78.8%) (41 MC) | 2.6 (83.8%) (88 MC) | 3.9 (81.5%) (216 MC) |
| 15_Jet1_eta, 1.7 | 2.2 (92.7%) (38 MC) | 2.3 (89.8%) (79 MC) | 3.3 (86.1%) (186 MC) |
| 16_Jets_n, 3 | 1.8 (81.6%) (31 MC) | 1.8 (78.5%) (62 MC) | 2.8 (82.8%) (154 MC) |
| 17_DPhi_Jet1_MHT, 2.8 | 1.8 (96.8%) (30 MC) | 1.8 (96.8%) (60 MC) | 2.6 (93.5%) (144 MC) |
| 18_DPhi_Jet2_MHT, 0.5 | 1.7 (93.3%) (28 MC) | 1.7 (96.7%) (58 MC) | 2.4 (95.1%) (137 MC) |

Therefore, dibosons can contribute around 2% of the total background (comparable to the top contribution, as expected). No visible effect on the sensitivity reach is expected, so at this stage we propose to neglect them.
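
As a cross-check of the table arithmetic, the yields in the last row can be reproduced from the raw MC counts: each yield is the normalized number of events before cuts (row 00_ALL) multiplied by the fraction of MC events surviving all cuts. The sketch below uses only numbers from the table; the variable and function names are hypothetical.

```python
# For each sample: (normalized events before cuts,
#                   MC events before cuts, MC events after all cuts).
samples = {
    "WW": (8960.4, 151531, 28),
    "ZZ": (1420.0, 47825, 58),
    "WZ_incl": (3520.0, 197080, 137),
}


def expected_yield(n_norm, n_mc_total, n_mc_pass):
    """Normalized yield after cuts = normalization x MC selection efficiency."""
    return n_norm * n_mc_pass / n_mc_total


yields = {name: expected_yield(*values) for name, values in samples.items()}
total = sum(yields.values())

for name, value in yields.items():
    print(f"{name}: {value:.1f} events")
print(f"total diboson: {total:.1f} events")
```

The three yields round to 1.7, 1.7 and 2.4 events, matching the 18_DPhi_Jet2_MHT row, for a total of about 5.8 diboson events.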

  • We acknowledge that you do not wish to run cosmics data to nail the cosmic-ray backgrounds. We believe this would be a great improvement of the present analysis, but we understand it is time consuming. However, we believe you should at the very least provide some more quantitative estimate, if not of the backgrounds, at least of the signal loss by the cuts you apply to reduce cosmic-ray backgrounds and beam events.

The following actions have been undertaken, and detailed in the PAS

In order to suppress the cosmic background, at least one vertex coming from the interaction point and at least two tracks with pT>5 GeV inside the leading jet cone were requested. The effects of machine-induced background and cosmic rays have not been included, but we are confident that the devised set of cuts reduces these effects to a negligible level; in particular, the beam halo is expected to be removed by the lower cut on the jet electromagnetic fraction. As an additional check, the number of tracks within DR<0.5 of the jet axis has been considered for a sub-sample of signal events. When the signal is also required to have at least one primary vertex and two or more jet tracks with pT>5 GeV, less than 1% of events are rejected. This cross-check shows that the proposed procedure selects jets coming from genuine collision events, while objects that are displaced from the interaction point (such as cosmic rays) do not contaminate the sample.

H.Flacher

  • Trigger: I do not share your confidence that the HLT_jet110 will remain unprescaled throughout the first physics run. Did you check the next higher threshold HLT_jet180 which is just below your offline requirement?
Done. The reduction in acceptance is larger for the background (about 80%) than for the signal (about 40%), but most of this gain is compensated by a smaller efficiency of the MHT cuts on the background, so that the final numbers and S/sqrt(B) are pretty much the same. The HLT_Jet110+ path is thus considered an optimal choice. Detailed numbers are available.

  • I understand that you select up to two jets, with jet1 > 200 GeV and jet2 > 50 GeV. Is there an upper cut on the pt of the 2nd jet?
No: the only requirement is pT>50 GeV on all the jets.

  • As an additional jet is assumed to come from gluon radiation from the first jet, does it have to be in the same hemisphere as jet1? I understand that the dphi(MHT,j1) and dphi(MHT,j2) cuts somehow address this but would be interested in seeing a plot of dphi(j1,j2) and possibly a correlation plot between pt_j1 and MHT or pt_j1 and pt_j2. I believe if the difference between pt_j1 and MHT is large enough one can have a second jet with pt>100 GeV pointing in the same hemisphere as MHT.
A pT spectrum of the second jet after the full selection was provided during the discussion. Compared with the spectrum of the leading jet, it shows that events with 2 jets after the full selection are about 1/4 of the total. From the transverse angle between MHT and jet 2 after the selection (also provided during the discussion), we can see that in the signal only about 20% of these events seem to have a secondary jet opposite to the MHT; therefore, a di-jet configuration could occur in about 5% of signal events (1/4 x 20%).

  • for the MHT calculation you use calibrated jets above 30 GeV. This seems fairly low to me, especially for the first physics run. Did you verify with the JetMET group that such low jet pt are meaningful for calibrated calo jets?
We used a 20 GeV threshold as defined by the relevant POG, then increased it to 30 GeV to avoid fake jets due to noise.

  • lines 217 -219 if the two sets of cuts are uncorrelated as you write, shouldn't the total efficiency be equal to the product of the two individual efficiencies?
Yes, some typos there

A.Meyer

  • L64-65, statement that there is no striking dependence on delta or M_D, therefore one set of cuts: this does not seem like a convincing argument at all, given that (a) supporting plots are missing (jet ET?), and (b) the signal efficiency varies by a factor of 2 (!) between different parameters (Tab. 4).
Correct, rephrased

  • L135: I would guess that leptons being outside the tracker acceptance is a much bigger effect than being "too soft". If so, should add that.
Correct, rephrased

  • L161-163: unclear - did you use error PDF's? Probably yes, as in the old analysis, but not clear from the text.
Rephrased

  • L232-237: I would strongly suggest to remove the paragraph. Firstly it appears to be "work in progress", secondly you argue in the text before that b-tagging is not suitable because of the "early data" environment, and thirdly you don't really need it.
Done

  • L297 and following: I find Ref. [23] insufficient to understand which statistical methods have been used. Did you use a self-written program to do the calculation? Have you verified the result against another method?
See the description of the statistical procedure above (Discovery reach and Exclusion limits).

Style Issues

T.Dorigo

  • Abstract: "for the LHC pp collider" -> "of the LHC pp collider".
Done

  • Table 1: Move it two paragraphs down.
Done

  • Line 1: why not starting with Section 1 ? It looks funny to have the introduction start at line 8.
Done

  • Line 2: "a single jet": this is the kind of specification which is bound to confuse. Why "a single jet" if you later choose to have up to two jets ?
On L26, we now explain: "plus possible less energetic jets due to initial or final state radiation"

  • Line 27: "initial-final state radiation": unclear what you mean by joining the two with a hyphen. Suggest "initial and final state radiation".
"initial or final state radiation"

  • Line 28: undetected needs no hyphen.
Done

  • Line 30: "The standard model process Z+jets" -> "process of Z+jets production".
Done

  • Line 33: "and subtracted": why ? It does not need to be subtracted, estimating it is enough.
Done

  • Line 34: "if the lepton e,mu,tau is not reconstructed" -> suggest just "if the charged lepton is not reconstructed".
Done

  • Line 34-36: this sentence needs a comma before the last "and", as proper English in a list. If you want to keep single top and top-pair together, then "produced), and top pair and single-top quark production". Otherwise "produced), top pair, and single-top quark production". Note the commas.
Done

  • Line 37-46: why making a big deal with Pt0 ? Just say what jets enter the calculations.
Not done, as it could be misunderstood as MHT being defined only for pT>30 GeV

  • Another error is to start a discussion of jet reconstruction (R, eta cuts, JES, etc) within a paragraph meant to discuss MHT. Please separate the two issues, by starting with a description of what you call "jets" and what are the energies you quote, before you define secondary variables.
Done, moved to L62

  • Line 47-51: "the distribution" -> of the MHT defined as you do. I understand and concur about the reason for the MHT in place of MEt, but this paragraph is frankly expendable, to reduce the burden on the reader. Just say what you use and define it.
Done

  • Line 71: "after slicing" -> "in adjoining slices of pthat".
"in slices of pthat lower cut", they are not consecutive

  • Line 80: again a switch to the past tense, after two sentences in the present tense. I will omit further pointing out of the inconsistent text, but will request that it is all fixed and tidy before I sign off a PAS which gets frozen.

Done

  • Line 88: "unprescaled" is jargon. We do not need to specify that we believe the trigger will stay what it is. Please remove the sentence; just mention the rate.
Done

  • Line 106: Why applying a cut on the third jet in an event if you later discard three-jet events ? Please change the sentence and make it relevant to the analysis. Please remove "around 60%" and substitute the actual number.
Rephrased

  • Figure 1 caption: you list the MHT cut as if it was a selection on jets. Please rewrite. Suggest "Number of jets .... after the MHT>250 GeV selection, for jets with ...".
Done

  • Line 122: and leading ... what ? "and leading jet". As for the second jet, this is simply not true: the distribution of DF(MEt,J2) is flat for the signal. Please remove the specification within brackets.
Done

  • Please rebin figure 2 by at least a factor two, error bars are large and hide the information rather than providing more of it.
Done, as for the other plots

  • Line 129: you cannot find 0.2 events. You find N and after a normalization you estimate 0.2. Please rewrite the sentence.
Done

  • line 135: Actually, most of the not-found leptons are not soft but at high rapidity. Please rewrite the sentence.
Done

  • Line 141: you mean 200 /pb, not 100. Please be very careful, we do not want to find such typos in the final version of the PAS.
Done

  • Line 144-146: what is a "valid vertex" ? Also, "inside leading jet cone" -> "the leading".
Done

  • Table 3: numbers with FIVE or more digits require a comma every third digit. Table 4: same as above.
Done

  • Line 147: we would like to have numbers for the acceptance of the cosmic-removal cuts. Maybe not for the PAS, but please provide them.
We think that the "1%" figure is enough

  • Line 160: "with next orders computations" -> this is not too informative. Of course we expect it to improve. Please remove the specification.
Done

  • Line 177: can be assume -> assumed.
Done

  • Line 181: data sample -> samples. Since you say here that what follows has been already explained in the previous PAS, why not reducing the whole section ?
Not done. We need suggestions about what can actually be removed without losing important information

  • Line 202: the hypothesis ... has a high purity ? Change the verb or the noun.
Rephrased

  • Line 214: 0.5 muons found -> cannot find fractions of events. "Estimated" is more correct.
Done

  • Line 219: please explain these numbers, which do not make sense to us.
Rephrased

  • Line 222-231: it is very good to estimate the power of our early data to pinpoint the rate of ttbar and single top, however I wonder if this is too much detail for a PAS.
Not removed

  • Line 232-237: this needs to be removed. We do not need to quote "preliminary analyses" here, nor is there any need to quote their results.
Done (but specified in the Analysis Note)

  • Line 248: why quoting a number belonging to the PDG here ? simply quoting the PDG will suffice, and the text becomes lighter.
Do you mean something like "With Br(tau -> nu nu mu) from Ref.[19], the method described produces"? For the moment we would prefer to spell it out

  • "demonstrated to be constant" -> "has been demonstrated to be constant".
Rephrased

  • Line 259: "it has measured to be " -> "it was measured to be".
Rephrased

  • Line 291: this number cannot be reproduced using your numbers. Please check it.
Done, typo

  • Line 309: has exploited -> has been exploited.
Rephrased

  • Line 315: why reference to 100/pb ? Use 200 as elsewhere.
Done

  • Line 327: "has been tuned" -> "have been tuned".
Rephrased

  • Line 333: given the uncertainties, a decimal place on the number of inverse picobarns (10.5/pb) is excessive. Similarly I suggest 3.1 (2.3) TeV in the line above.
Done

  • Line 344: add the page number of this reference. Same, on line 346, 348.
Done

  • Line 359: Who is J.A. ? Put a correct name here.
Done

  • Appendix A: nice for an internal note, but not needed here. It looks apologetic. Please remove it and add it to the note.
Why apologetic? In the main text we mention the results of the estimation, in the appendix we put details about how the estimation has been performed

H.Flacher

  • Table 5: What has the lumi to do with these systematic studies? Drop it from caption
Done

  • For the data driven BG estimations, explain what the error labelled (MC) means.
Done

  • line 21: author is the CMS collaboration, say e.g.: in a previous study \cite{} it was found that...
Done

  • Table 3: eff for QCD, second row is not correct, 10^-6 instead of 10^-4?
No, it is a percentage: 288/143,007,728 x 100
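
The arithmetic behind this answer can be checked directly. The two event counts are those quoted above; the variable names are hypothetical.

```python
n_pass = 288             # QCD events surviving the cut
n_total = 143_007_728    # QCD events before the cut

efficiency = n_pass / n_total            # plain fraction, about 2e-6
efficiency_percent = 100.0 * efficiency  # same number as a percentage, about 2e-4

print(f"fraction:   {efficiency:.2e}")
print(f"percentage: {efficiency_percent:.2e} %")
```

The fraction is of order 10^-6, while the same number expressed as a percentage is of order 10^-4, which is what the table quotes.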

  • Table 5 : drop lumi sentence
Done

  • wording of the appendix could be improved, also Table 8 has too many ")" in last column. Is it really 10^-2 in 2nd row?
Yes, checked

A.Meyer

  • Table 1: Note should not start with a table, move e.g. to bottom of page.
Done

  • L6-7: remove "from mostly the same authors" (but keep "[1]") - does not make much sense in a public CMS document.
Done

  • L18 and Table 1: you use the D0 Run I result in the table (monojets), but the text only refers to Run II; should be made consistent.
Done

  • tagged -> identified
Done

  • L91-93, sentence about hard gluon radiation and using jets above 50 GeV: this reads totally out of context here; maybe move to later in the text, where the actual N_jet cut is discussed?
Rephrased

  • L93: Fig. 1(left) -> Fig. 1 (a) (many other places... won't mention them all)
Done

  • Caption Fig. 1, "A veto against three or more jet events reject the hadronic objects...": sentence makes semantically no sense, please reformulate (an event veto does not reject any "objects").
Done

  • L113: rejecting TIV < 0.1 -> rejecting events with tracks fulfilling TIV < 0.1
Done

  • A global terminology comment: you freely mix the terms "QCD" (which is a theory, not a background!), "multi-jet" and "multijet". Should use one term and consistently.
"QCD backgrounds" or "multi-jet events" used and specified

  • L134: prompt -> directly produced
Done

  • Total transverse energy -> The missing H_T distribution
"MHT"

  • L145-146, additional cuts to reject cosmics: at this point in the paper, the entire selection was already presented and summarized. You cannot logically introduce additional cuts here, so this part has to be moved more to the front.
Done

  • Tables 3, 4, 6: Tables need substantial improvement, compare to previous PAS.
    • There is way too much detail.
    • The number of significant digits is often meaningless (143007728.0 QCD events and things like that).
    • The left column (Kinematic cuts, Jet multiplicity etc.) should explicitly list the cuts (as in the old PAS, after commenting on it...), because otherwise the reader has to look through pages of text to find the details.
Done

  • Caption Tab. 5: "Relative effect" on what? Total number of signal events presumably, but this has to be spelled out.
Done

  • Caption Fig. 3: please avoid to call anything "data" here, because after all nothing is "data" at this point.
Done

  • L279, "some boosted top originating large missing energy tail from W": convoluted, please untangle.
Done

  • L314, "first days of physics runs": I had to smile, I thought we try to avoid such risky statements these days! Suggest to write "with the first physics run" instead.
Done

  • L325: was found to be -> is expected to be (because I could not find any proof that MHT is more robust in the end, it is "only" an expectation, even if well-founded)
Done

  • L327: has been tuned -> have been chosen (you state earlier that no sophisticated optimization took place)
Done

  • References have many mistakes, just a selection:
    • [8] misses the journal (if existing)
Not existing
    • [11] misses the journal (if existing)
Not existing
    • [12] is a (soon-to-be) CMS internal documentation, should be replaced or removed
Done
    • [13] spell out author's last name
Done
    • [16] misses the journal (if existing)
    • [17] is a (soon-to-be) CMS internal documentation, should be replaced or removed
Done

  • intended as conservative -> an upper bound
Done

Some remarks after Approval meeting

General comments

A.Meyer

  • Figure 1: letter size on the vertical axis is way too small, on the horizontal axis barely acceptable. Histograms are still kind of hard to make out on bw printout. Figure 2: same comment on letter sizes as in Figure 1.
Label and title increased. Not much freedom in histo style, due to many different processes to be displayed

  • Also, it is a very long stretch to have this under the heading of "data-driven background estimation". Your assumption about how the efficiencies are related is of course purely MC-based, and it includes fairly tricky issues like the efficiency of the ILV to hadronic tau decays. I don't know about the opinion of the other ARC members, but in my opinion this has to be rephrased carefully, (a) clarifying the assumptions that enter and (b) making clear that at this point the W(tau nu) estimate is mostly MC based, with added consistency checks.
Rephrased as "The same region designed for invisible background estimation can be used to measure the W(\tau\nu)+jets contribution in the signal region. The Monte Carlo simulations demonstrate that the muon efficiency for the W(\tau\nu)+jets process is reproduced by the value determined above from the W(\mu\nu)+jets sample, scaled by the efficiency of the TIV cut. Therefore, the simplest approach is to rescale the N(W(\tau\nu)+{\rm jets})^{Contr} by muon reconstruction and isolation efficiencies measured in the control region." To evaluate the Wtau contribution in the signal region, the data-driven estimate obtained in the previous section is now used. The final discovery and sensitivity limits are unchanged.

Physics Issues

H.Flacher

  • In particular I did not see (apologies if I missed it) how the systematic variations of Tab 5 would affect the backgrounds and how stable the data driven background estimation is when applying these systematic variations. I think that for signal alone these variations are not very instructive.
Since the relevant backgrounds have been measured from data, effects resulting in a fluctuation of the absolute number of background events have not been considered. On the other hand, our method can be affected by: systematics due to PDFs in the W/Z cross-section ratios (mentioned); the assumption of constant muon reconstruction efficiency and effects due to the jet energy scale (the latter now included, about 2%); all the systematics on the We/Wtau and Wmu/Wtau ratios in the signal region, about 13% and 16% (now mentioned). We also verified that the number of events in the Wmu control region scales under systematic effects in the same way as the number of Z invisible events in the signal region.

  • Table 5: I'm not sure what one can learn from these numbers without seeing the corresponding numbers for the background samples.
See previous answer

  • Also, it would have been nice to see a cross-check of the QCD background with the MadGraph sample.
Done. We performed a cross-check for the MadGraph 100<HT<250, 250<HT<500, 500<HT<1000, HT>1000 GeV samples. In the first two bins no events survive after the MHT cut, in the last bin no events survive after the azimuthal angle cuts. Detailed numbers here.

A.Meyer

  • L218-219: you quote a 0.1% (!!!) systematic uncertainty on the determination of the muon reconstruction and isolation efficiency. I think this needs to be reworked or rephrased.
That is just the error from fitting our efficiency trend with a constant in the signal region. The contribution of other systematics is now added and amounts to 2%.

T.Dorigo

  • This 300% systs is in addition to the stat error ? 1 has an upper limit of about 3...
Yes, syst and stat limits were added separately

Style Issues

H.Flacher

  • 93: what is "cone lower cut"?
Replaced with "cone lower bound" (0.02)

A.Meyer

  • Table 1: In a Letter-style publication, this would be dropped. It is sufficient to state the best limit in the text. Table 2: Again, in a typical Letter this would be removed. Just state typical numbers in the text.
Done

  • Page 5: These tables are huge, and certainly would be substantially reduced in a Letter. E.g., one could merge the tables into one; show only one signal and give the efficiency range in the text; merge some of the backgrounds; don't always quote absolute numbers and incremental efficiency. This is all fine for the AN, but too much detail for a Letter.
Partial efficiencies have been removed everywhere except for the final total signal efficiency. We believe it is important to quote all those background sources and the different ADD points, since they are mentioned several times in the text

  • Table 5 could also be removed, there is little information in these %-level fluctuations from point to point. Not to mention that the 3-digit precision on some of these uncertainty numbers is excessive...
Done (but the precision in MC is of a few % here, so we believe 3 digits make sense). Rephrased as "Their relative shift from the value with no systematic effect depends on the ADD point, ranging from 10\% to 16\%. The value of instantaneous luminosity, that can be assumed to have a \pm 10\% uncertainty, has been incorporated".

  • L227: for invisible background -> for Z to invisible background
"invisible background" defined in L182

  • References: It is common style to not give titles for publications in journals. If you choose to keep all the full titles, you will have to thoroughly clean them up: x-section -> cross section; use proper math for all the symbols like sqrt(s), ttbar and so on.
Partially done. The TDR automatic compiling machinery is not very flexible, especially with BibTeX, so the titles have been left there. The title spelling is that found on CADI or arXiv

-- LeonardoBenucci - 29 Jun 2009

Topic revision: r5 - 2009-07-10 - LeonardoBenucci