# Search for new physics with disappearing tracks

## Presentations

Only notable roundtable presentations are listed here.

| Date | Meeting | Topic | Links |
|---|---|---|---|
| 19 Oct 2018 | LL EXO WG | Initial lepton background estimates for 2017, optimizing search region for Phase I pixel upgrade | Agenda, slides |
| 30 Nov 2018 | roundtable | Settled on signal region binning | Agenda, slides |
| 14 Dec 2018 | LL EXO WG | Initial fake background estimates for 2017, initial limits | Agenda, slides |
| 18 Jan 2019 | LL EXO WG | Background closure, updates | Agenda, slides |
| 1 Mar 2019 | LL EXO WG | Fake background method update, signal corrections/systematics | Agenda, slides |
| 22 Mar 2019 | roundtable | Fake background method update, clarification of technical needs | Agenda, slides |
| 29 Mar 2019 | LL EXO WG | Full status update, extension of signal to 1000, 1100 GeV charginos | Agenda, slides |
| 21 May 2019 | EXO general | Extrapolation from 2017 to high pileup (postponed) | Agenda, slides |
| 24 May 2019 | LL EXO WG | First 2018 plots | Agenda, slides |
| 31 May 2019 | LL EXO WG | Pre-approval | Agenda, slides |
| 7 June 2019 | LL EXO WG | Followup to pre-approval | Agenda, slides |
| 19 July 2019 | LL EXO WG | 2018 ABC background estimates | Agenda, slides |
| 30 Aug 2019 | LL EXO WG | 2018 background estimates | Agenda, slides |
| 13 Sep 2019 | LL EXO WG | HEM 15/16 mitigation with MET phi veto | Agenda, slides |
| 3 Dec 2019 | EXO general meeting | Approval | Agenda, slides |

NOTE: Questions are in red (unanswered), green (answered), or purple (in progress); answers are in blue.

## Convener comments from approval HN December 4

(1) Estimation of lepton backgrounds

The method takes the count of the number of events (N) in dedicated single-lepton control regions and multiplies N by certain probabilities for the lepton to fake a disappearing track. The factor "N" drives the overall normalization of the background; N is drawn from events that pass single-lepton triggers and contain a lepton that passes certain selections. These requirements select events with leptons of a certain purity. However, leptons that fail these criteria (and hence do not get counted in N) could still fake a disappearing track, for example leptons from b decays.

Is the factor N underestimated? Do the systematics cover this potential bias? For instance,

- What is the percentage of probes used for the estimation of Pveto which pass the single lepton triggers used for the normalization?

We have looked at this for the muon Pveto, where we find 94% of probes are matched to the IsoMu primitive. For the electron Pveto, we find 84% of probes are matched to the electron HLT primitive. This does imply a slight underestimation of the factor N (however, it is within the existing lepton background systematics, which are 10-25%).

- For the events that do not pass the trigger, could the Pveto be higher than the value computed for leptons with no trigger requirement? This would further accentuate a potential bias.

If it is, it is already taken into account by our method. We made no trigger requirement on the probes, so Pveto was measured with a mixture of events that would and would not pass the trigger requirement, thus accounting for any potential bias.

- Please explain/point to the closure tests that were performed to test this method

Closure tests were performed by running over all the MC samples listed in AN Tables 6 and 7 to obtain the normalization N referred to in your question, and the "observation" to test closure. Pveto is necessarily determined from DY MC. The selections applied on these are described in AN Section 6.1.6. The results of the closure test are presented in AN Table 32. These are statistically dominated by the high-statistics ttbar MC samples (225 fb-1), which naturally include leptons from b decays, so the results in Table 32 include any potential effect from such leptons. We have checked the parentage of the leptons, and no contribution from leptons from b's is observed.
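The closure logic above can be sketched minimally as follows (a toy illustration with invented counts; these are not the numbers in AN Table 32):

```python
import math

def closure_test(n_control, p_disappear, n_observed):
    """Compare a predicted disappearing-track yield with a direct MC count.

    n_control:   events in the single-lepton control region (the factor N)
    p_disappear: probability for such a lepton to yield a disappearing track
    n_observed:  events counted directly in the MC search region
    Returns (predicted, pull), using a crude Poisson error on the observation.
    """
    predicted = n_control * p_disappear
    sigma = math.sqrt(max(n_observed, 1.0))  # Poisson uncertainty on the observation
    pull = (n_observed - predicted) / sigma
    return predicted, pull

# Invented numbers, for illustration only:
pred, pull = closure_test(1.2e6, 1.0e-5, 13)
print(f"predicted = {pred:.1f}, observed = 13, pull = {pull:+.2f}")
```

A pull consistent with zero across the nlayers bins is what "closure" means here; the real test is performed per category in the AN.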

(2) Estimation of fake track backgrounds

The method depends on a fake rate that is computed without taking into account any possible correlation with the isotrack trigger. The probability of a spurious track to fake a disappearing track in events passing the isotrack trigger may be different compared to this fake rate.

It turns out that it is not completely true that the fake rate is computed without taking into account any correlation with the isotrack trigger. The selection applied to the sample from which Pfake is computed does not explicitly include (or exclude) events that may have fired the IsoTrk50 trigger and we find that this does occur at some level. We find that 0.4% of the events in the data used to compute Pfake actually exclusively fire IsoTrk50.

What is the contribution of background events passing the isotrack trigger exclusively (i.e. events picked up only by the isotrack trigger)?

We have answered this question for Pfake above (it is 0.4%). For the events in the data used to compute the normalization N, 0.3% exclusively fire IsoTrk50.

Show that :

(a) Either the fake rate for events passing the isotrack trigger is consistent with the fake rate for other MET triggers or

Given that Pfake is actually measured with a mixture of events that have fired the IsoTrk50 trigger and those that have not, we can compute Pfake separately for each of these exclusive categories. For those that do not fire IsoTrk50, we have Pfake_noIsoTrk50 = 461/16107872 = 2.9E-4. For the events that do fire IsoTrk50 we find Pfake_IsoTrk50 = 19/64928 = 2.9E-4, which is coincidentally identical, but certainly consistent. (Note: these numbers are what is called Pfake_raw in the AN; they are produced without the dxy cut applied, which has an efficiency of ~0.1, to preserve statistics.) If we do the same analysis in MC, we get similar agreement (1.1E-5 for Pfake_noIsoTrk50 and 1.2E-5 for Pfake_IsoTrk50).

(b) The contribution of background events passing the isotrack trigger exclusively is small enough that differences in fake rate get covered by the systematic uncertainties.

There does not seem to be a difference in the fake rate; but even if it were substantially different from the ~10^-4 number above (which is itself an overestimate), since this category contributes at the 10^-3 level, even an order-of-magnitude increase would not lead to an observable effect.
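To make the order-of-magnitude argument explicit, a quick sketch (the 0.4% fraction is from above; the tenfold inflation is purely hypothetical):

```python
# Fraction of the Pfake sample that fires IsoTrk50 exclusively (quoted above).
f_exclusive = 0.004

def inflated_rate_ratio(inflation):
    """Ratio of the combined fake rate to the nominal one, if the fake rate in
    the IsoTrk50-exclusive category were `inflation` times larger."""
    return (1.0 - f_exclusive) + f_exclusive * inflation

# Even a hypothetical order-of-magnitude difference shifts the combined rate
# by only a few percent, well within the assigned systematic uncertainties.
print(f"{inflated_rate_ratio(10.0):.3f}")  # 1.036
```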

Some things that could be tried/checked :

- For isotrack and non-isotrack triggers, check the consistency of the distributions of the number of hits of tracks passing the selection and the lepton veto at some level of selection. The cuts on MET, jets, etc. could be relaxed to increase statistics if need be. If the relative number of fake tracks coming from the isotrack trigger were larger, one would expect to see an enhancement at low numbers of hits.

See below for a plot comparing the nValidHits for tracks matched to an IsoTrk50 primitive, before and after the lepton veto is applied.

Abstract : Remove the AMSB abbreviation since it is not used elsewhere in the abstract

Removed.

Abstract : What does the "(2018 only)" in the fourth line from bottom refer to ? Aren't the higgsino limits 2017+2018 ?

This was true in a previous version, and mistakenly not removed when 2017 was added for higgsinos. This is now removed.

Lines 3-4 : Large Hadron Collider (LHC) experiments --> CERN LHC experiments

Changed.

Lines 47-49 : The track resolutions mentioned here are presumably for well reconstructed tracks (with a significant number of hits). Since the focus of this analysis is on tracks with few hits, it may be worthwhile clarifying that the quoted resolution values apply to well reconstructed tracks (unless you are certain they also hold for 4,5 hit tracks)

This section is taken from the PubComm recommendations, so probably it is best to keep this. The citation does discuss inefficiencies being mainly caused by hadrons undergoing nuclear interactions in the tracker material, so the reference is relevant to this point.

Lines 84-85 : drop "In the absence of a contrary observation"

Dropped.

Line 127 : Also add that there is an nhits requirement on the track at the trigger level.

Lines 149-150 : If a track passes through dead, inactive region of a tracker layer, then does it still get counted as a missing hit ?

In general, a dead/inactive tracker channel is not counted as a missing hit -- this was incorrectly stated in the approval. What was correctly stated is that in the pixels, the explicit requirement of >=4 pixel hits was meant to combat a larger-than-expected number of dead/inactive channels. The paper text is clarified to these points.

Line 282 : The phrasing here is confusing. The requirement of 0.1 < |d0| < 0.5 is imposed on tracks in the Z control region, and so there should be no overlap with the signal region. The text makes it sound as though you are trying to avoid the inclusion of the signal region with this requirement. Please change the text to clarify this point.

Rephrased.

Line 308 : Fix 1 -- 11% (two dashes)

Fixed.

Figures : The signal model (wino, higgsino) should be mentioned in the plots. The chargino-to-neutralino branching ratio is mentioned for the wino limits but not for the higgsino ones.

The branching ratios are now mentioned in the captions, and the production modes and neutralino mixing are in the plot legends in both cases.

Supplementary material / HepData:

Provide a list of all the plots, tables, etc. that you wish to provide as supplementary material. Here are some suggestions :

Provide information (signal yields, acceptance, etc.) separately for final states involving one and two charginos. This would be helpful for theorists looking to recast these results.

Provide a cuts flow table

Provide reconstruction, selection efficiencies for signal tracks with 4,5,>=6 hits

Event displays are always helpful when presenting searches with unconventional signatures such as this one. Please consider making displays of interesting/relevant events public

These suggestions are all good, and we plan to include them in HepData, along with the previous paper's information. The HepData content itself is all currently in progress, and presumably this can continue in parallel with CWR. For an enumerated list:

- signal yields and acceptance*efficiency (TH2s), separately for NLayers 4/5/6+ and 2017/2018AB/2018CD
- cross sections, separately for wino/higgsino cases and for chargino-chargino and chargino-neutralino production
- 4 signal cut flows, which will include the suggested selection efficiencies for signal tracks:
  - M = 700 GeV, ctau = 100 cm, separately for wino/higgsino
  - M = 300 GeV, ctau = 10 cm, separately for wino/higgsino
- upper limits (observed, expected):
  - versus mass: table versions of Figure 1 in the paper (4 tables for wino)
  - versus mass: table of the higgsino case version of Figure 1 (4 tables for 4 lifetimes)
  - TH2s of Figures 2 and 3
- SLHA and geant particle files for signal generation, as well as the Pythia8 generator block
- the contents of Table 24 in the analysis note, which is the raw event counts going into paper equation 1 and resulting in paper table 1
- at least two event displays: one for a simple jet+MET+track event, and one that is more interesting, e.g. containing a lepton
- tables detailing each of the object definitions, both signal and control region selections
- tables detailing the breakdown of the background estimation, e.g. AN tables 29-31 and 35

## ARC review

### Followup comments from Kevin Stenson HN November 20

1. Regarding not needing to veto tracks in the HEM region, can you help me understand. I'm not sure what you mean by "The only real hadronic background is charged pions (from taus) to pi0's". Charged pions do not decay to pi0s. It is true that many tau decays include pi0s as well as charged pions. So for events in which the tau decay includes pi0s in addition to a charged pion, I can see why there would be a large ECAL contribution. I see the plot showing that WJetsToLNu events have most of their energy in the ECAL. Can you say exactly how that plot is made? Are you sure that the energy in Delta R < 0.5 is not excluding the energy associated with the charged particle? Looking in the PDG, I see that the tau branching fraction to a single charged hadron with no pi0 is 11.5%. This is a big fraction. Any idea why this wouldn't give a large contribution with small ECAL energy? Can you say approximately what luminosity the figure in the twiki corresponds to?

Charged pions can decay to pi zeros (rarely), but this is not what we meant. We were referring to quark exchange with matter (charged pion + nucleon -> pi0 + nucleon), which is the main way the disappearing track signature arises from a charged hadron. The decay of taus from W’s is the dominant source of isolated charged pions that pass our selection, which is why we made the plot with them.

2. You indicate that the AN makes it clear which items are combined in 2018 and which are split into pre/post HEM problem. But I still have questions:

- Tables 23 and 24 show Pveto information for combined 2018, but Tables 28-30 show Pveto split in 2018 AB and CD.
- Figures 27-29 seem to show Poffline results integrated in 2018, but Tables 28-30 show Poffline values that differ between 2018 AB and CD.
- Figures 30-32 show results for obtaining Ptrigger that are combined for 2018, but Tables 28-30 show different Ptrigger results for 2018 AB and CD. Maybe the trigger efficiency curves are the same but the single-lepton sample they are applied to is different. Or maybe the trigger efficiency curves are different and the single-lepton sample is the same. Or maybe both are different. I don't think it is possible for me to figure it out from the AN.
- For the calculation of Pfake in 2018, do you use the same transfer factor for AB and CD and simply use a different P_fake^raw?

Tables 23/24 were intended to be split between 2018 AB and CD to show the effect of the HEM veto, but this was missed in the last edit; they are now split between the run periods in AN v10.

Figures 27-32 show results pertaining to conditional probabilities evaluated without the HEM veto (it is applied as the last conditional probability), and thus there is no reason to split these between 2018 AB and CD, so they are plotted integrated over 2018.

As to the question on Pfake, the answer is no. In 2018 Pfake is treated completely separately and independently between AB and CD, including recalculating the transfer factor separately between them.

3. Regarding what are now Tables 1 and 3, thank you for reducing the number of significant figures. I think that even more can be removed. The guidelines indeed state that at most 2 sig figs for uncertainties. This means that 1 sig fig is also acceptable. In Table 3, for example, we have a handful of events and providing an estimate of the background (and uncertainty) to a hundredth of an event is just silly. I would round every number in Table 3 to the nearest tenth of an event. In Table 1, I would also suggest using just one sig fig for the uncertainty.

Table 1 also benefits from regularizing the style (i.e. adding \times 10^{-2} in a few places), which we propose to use -- see paper v5. Table 3 is rounded to the nearest tenth, except for the 6+ layer fake tracks, which have values at the hundredths place, so one significant figure is used there. Also note "Fake Tracks" in this table is fixed to "Spurious Tracks". Further improvement can/will come through the CWR process.

4. L48-49: As I mentioned, the z positions of the forward disks are wrong. Clearly the reference is incorrect. It seems the reference you have used is a conference proceeding. I also don't think this is appropriate. Did someone suggest using it? I don't think for simple factual information you need to have a reference (as long as the facts are correct). I have attached a figure from the Pixel Phase 1 TDR (CERN-LHCC-2012-016) showing the planned pixel layout to indicate the correct scale. I don't know what the correct numbers are. I just know that what you have written is terribly wrong.

We have reverted to the official tracker description recommended by PubComm; we can ask them for an official recommendation (and reference) to describe the Phase 1 tracker and implement that with other changes that can/will come through the CWR process.

### Comments from Juan Alcaraz HN November 15

Abstract: 1) our luminosity precision does not go below 2%, so we should write 101 /fb, and not 101.2 /fb; 2) it will be a bit puzzling for the reader to see that the reach is poorer with full statistics than using 2017+2018 (876 GeV vs 880 GeV). Probably we should not quote the 2017+2018 number in the abstract, but just the combination.

The lumi is changed throughout. It is decided that the Run 2 combination is the result, so we will not include 2017+2018 as you suggest.

L24: we are not going to provide limits for gluinos, so I am wondering whether we should quote the limits from ATLAS in this channel. The paper is going to fully focus on charginos.

Removed.

L70: ... 0.2-10000 cm "proper decay length" ?

L80: PDF4LHC reference 28 refers to the new generation of PDF sets, so I am wondering whether it is the appropriate reference for our case, since we are still using old PDF sets.

This citation is believed to be the recommended one, and the cross sections are taken from the LHC SUSY cross section working group.

L82: we should talk about "limitations", rather than "mis-modeling"

This is now simply stated as a "correction".

L84-85: it is not the "chi1 production" which is similar to Z production, but the production of the "chi1 chi1" system (which comes dominantly from a virtual Z too).

Fixed.

L88-89: we cannot know a priori that the culprit is the modeling of ISR in PYTHIA, because we are correcting to data; it is an assumption.

This mention is removed.

L93-94: I do not think that we reweight to match the vertex multiplicity. We reweight according to the instantaneous luminosity spectrum and the estimated inelastic cross section for minimum bias events. The latter required some tuning/studies that also include comparisons with the number of vertices, right, but it is not the only factor in the game: one cannot naively translate tracker inefficiencies into true physical particle multiplicities in the event.

Fixed.

L115: use the reference to the Particle Flow paper itself for the ptMiss definition?

PubComm recommends here the MET performance paper; we have fixed the reference to that here.

L163-164: The definition of the primary vertex is more complicated: it also includes MET as additional pT in the sum. There is a standard sentence for papers, which you should probably use. It is not just PF particles.

The language is updated to PubComm recommendations.

L168: there are no decays of "virtual photons" -> "from standard model processes involving the exchange of neutral electroweak bosons" ?

This would leave out the W; instead this is now "...W or Z bosons, or from virtual photons" for a more clear sentence.

L169: not sure that the lack of lepton identification is typically due to a "failure" of the PF algorithm -> identification inefficiency ?

Reworded for other comments, "failure" is removed.

L176: there are no gaps of coverage in the muon system from a global point of view in CMS; we can miss 1-2 chambers and the identification algorithms can decide that the muon does not pass the (tight) cuts; here we should talk about regions with identification efficiency <100% (actually the efficiency is still large in these regions).

"gap" --> "incomplete detector coverage"

L180: say "both" in the sentence (it is not the OR of the eta and phi regions, but the AND).

Reworded for clarity.

L183: "remove" (drop the "s")

Fixed.

L247-248: one has to explain - at least minimally - how the single-lepton control region is defined. Otherwise the whole thing is unclear, because there is also no MC to judge what is going on...

A brief definition is added here.

L268-274 and Equation : after having read this logic several times, I am wondering whether an initial and simple explanation of why we do things this way would be useful. There are two types of efficiencies that we have to estimate on data: 1) those related to the identification of the disappearing track itself (Pveto); 2) those related to the event as a whole (Poffline*Ptrigger*PHEM). 1) can be determined using tag-and-probe methods; 2) requires the definition of a looser event sample (single-lepton). An explanation along these lines at the beginning of the charged lepton subsection would give a nice impression to the reader, and would avoid any feeling of the type "this is too complicated". Also I think that, the way we do things, we could combine Poffline*Ptrigger*PHEM into just one factor, even if the lack of statistics for some lepton samples and nlayers would probably complicate the explanation in any case.

These probabilities need to be conditioned on the ones before them; otherwise the background is not estimated correctly. For example, the offline and online/trigger MET requirements are highly correlated, with P(trigger | all lepton events) being much lower than P(trigger | lepton events passing offline MET cuts). That is why this is done this way, and the current language tries to be specific on this point. We do agree, however, that specifying many different factors is distracting/confusing, and we do not provide P(offline), P(trigger), or P(HEM) in the paper.
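The importance of conditioning can be illustrated with a toy calculation (all numbers invented; this is not the analysis code):

```python
# Toy illustration of why the probabilities must be chained conditionally.
# Each event is (passes_offline_met, passes_trigger); the offline and trigger
# MET requirements are made highly correlated, as in the real analysis.
events = ([(True, True)] * 80 + [(True, False)] * 20
          + [(False, True)] * 5 + [(False, False)] * 895)

n = len(events)
p_offline = sum(o for o, t in events) / n
p_trigger_uncond = sum(t for o, t in events) / n
p_trigger_given_offline = sum(o and t for o, t in events) / sum(o for o, t in events)

# Correct chained estimate vs the naive unconditioned product:
chained = p_offline * p_trigger_given_offline   # equals the true joint fraction
naive = p_offline * p_trigger_uncond            # badly underestimates it
print(f"{chained:.4f} {naive:.4f}")  # 0.0800 0.0085
```

The chained product reproduces the true joint fraction (80/1000) by construction, while the unconditioned product is off by an order of magnitude in this toy.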

L275-277: Should this paragraph not be moved earlier in the section? The closure test is an important "a priori" test before proceeding with this logic...

This paragraph was removed as suggested by Kevin elsewhere, as no other information about it is given in the paper.

L279-280: Not sure this statement is true, at least based on first principles. The Kalman filter procedure is sequential/directional in terms of layers, so if you have a wrong assignment in one layer, the risk of losing the following hits in subsequent layers is larger. I.e., it is not expected to really be a "random" pattern of lost hits... This kind of tendency of losses is the reason why we have several iterations in tracking, including outside-in to recover potential in-out losses.

It would be more fair to say "the combination of tracker layers with hits assigned to them is largely random", which is now the wording. It was not the intention to say that tracker hits overall are randomly assigned to a fake track, but instead that, once you have a fake track, which layers had assigned hits is random. As you point out, once you miss one hit this affects the next one; however, in that example the occurrence of the first missing hit is random.

L294-297: the reader will be asking why the Gaussian is fit starting at 0.1 cm but the sideband only starts at 0.05 cm. One would expect the same numbers. I know that there are some systematics taking into account possible variations in the sideband definition, so there is no practical problem with the choice, but I am afraid that the question will probably be raised and I do not see any credible justification for having two different choices.

The text is amended to read "to avoid including as much of the search region as possible while still successfully performing the fit". In the systematics section, the test of the 'second assumption' shows that we accurately predict the signal region peak regardless of staying farther away from it.

L372-373: " This issue was not existing in the 2018 data-taking period" ?

"was resolved"

Caption of Table 3: drop the comment on "gluino masses in the case of strong production"

Dropped.

L397-398: Is a recasting of the 2015-2016 analyses for the Higgsino case really excluded as an option ?

The time and resources needed for 2015-2016 sample generation aren't available. As for recasting, we have not considered this.

L399: Are all systematic uncertainties really uncorrelated between years? There should be something remnant which is correlated, for instance some theoretical uncertainties (ISR) ?

", with the exception of signal cross section uncertainties" added. The ISR corrections are calculated independently for each period so are treated as uncorrelated.

Figure 1, legend of plots: would it not be better to quote "c\tau_\chi = ... cm" instead of "\tau_\chi..."? Proper decay lengths are rounder numbers than the time in ns, and "c" looks better on the left of the equals sign.

Changed.

Figure 1, caption: we should at least write that the details of the AMSB model under consideration are given in the text.

Figure 3: to be done (obviously)

In progress.

### Comments from Kevin Stenson HN November 5

Physics questions:

1. Regarding the HEM issue, should you not reject disappearing tracks that point to the HEM region? That is, shouldn't the HEM region be part of the fiducial region that is excluded (for the appropriate run period, of course)? I don't see this mentioned anywhere. But it would seem that the Ecalo<10 GeV cut would not be very effective if the track is pointing to a dead calorimeter.

The only real hadronic background is charged pions (from taus) to pi0's, for which the ECAL contribution to ECalo is much more important than the HCAL. This is seen in WJets (HT > 100 GeV) MC for example below, where disappearing track candidates have >90% of their ECalo from the ECAL in all selected events. Since no track fiducial requirement is deemed necessary for this issue, it was not mentioned anywhere.

WJetsToLNu (HT > 100 GeV):

2. Regarding the lepton background estimates and HEM, I have some questions. In the paper, it seems that you calculate Pveto, Poffline, Ptrigger, and PHEM separately for the two 2018 data sets. Is that true? Is everything done in parallel? For example, is the trigger efficiency vs ptmiss calculated separately for the two 2018 data sets? If so, then the AN needs to be updated, as most of the lepton background estimation seems to combine all of 2018 together. I just want to know what was done. I would also like to know the rationale behind it. I could easily imagine taking all of the 2018 data together to estimate the lepton background and simply applying a scale factor to account for the data lost by the HEM veto. Since the lepton background is from single-lepton control regions, it should not be affected by the HEM issue. One way to figure this out would be to measure the lepton background using events from all of 2018 but with the ones affected by the HEM veto weighted by a factor of (2pi-1)/2pi = 0.84 to account for the loss of events from the HEM veto. That way you don't suffer as much from small statistics in your estimations.

The rationale behind splitting the 2018 dataset was that the selection treats the two periods differently, so we feel they should be reported separately. Only the background predictions are done separately between 2018 AB and CD, and the AN presents exactly what was done in all cases. In cases throughout the AN that show plots for 2018 inclusively, the HEM 15/16 failure will not be relevant and it was neither necessary nor feasible to re-produce everything.
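For reference, the azimuthal weight suggested in the question is straightforward to evaluate (a quick check of the quoted 0.84, not something used in the analysis):

```python
import math

# Fraction of azimuth surviving a veto of ~1 rad in phi (the HEM 15/16 region),
# as implied by the (2*pi - 1)/(2*pi) factor suggested in the question.
hem_phi_width = 1.0  # radians; assumption read off from the quoted factor
weight = (2 * math.pi - hem_phi_width) / (2 * math.pi)
print(f"{weight:.2f}")  # 0.84
```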

3. Why will we not be using the old 2015-2016 data for the higgsino exclusions? Is it because of the lack of MC?

It is because of the lack of MC, due to the lack of time and disk space to produce them.

Paper comments (some general and main issues):

Several places: Data does not have units of fb^-1. So you can't say things like "XX fb^-1 of data". You need the more convoluted term "data corresponding to an integrated luminosity of XX fb^-1".

The wording has been updated in several places throughout the paper.

Several places: Please follow what the pub guidelines say about Run 1 and Run 2 (note they use Arabic, not Roman numerals): Do not use the term "Run 1" or "Run 2" in the title or abstract of a CMS physics paper because it is LHC jargon whose meaning will be lost to subsequent generations and which has no significance outside our field. It is OK to introduce the terms in the body of a paper, so long as it is clearly defined which data sets are implied.

The abstract now refers to the "full 13 TeV data set", and "Run 2" is expressly defined in the introduction. "II" is changed to "2".

Several places: Do not refer to an inner tracker. There is no outer tracker so there cannot be an inner tracker.

Removed.

Several places: Remove hyphens after "mis" and remove hyphens in "inner-most" and "outer-most".

Fixed.

Several places: I prefer "first, second, third" to "firstly, secondly, thirdly". I looked it up and both are correct so if you prefer the endings with "ly", I'm not going to object any more. Just letting you know my preference.

Noted, but the authors prefer the adverb.

Title: Flesh it out. At a minimum, include "proton-proton collisions" and "sqrt(s)=13 TeV". Also, people don't like the term "new physics". You can use "physics beyond the standard model" or simply say that you search for disappearing tracks.

Changed to "Search for disappearing tracks in proton-proton collisions at $\sqrt{s} = 13\TeV$". The previous paper's title included "as a signature of new long-lived particles", and this title should remain different.

L2-7: You haven't introduced the CMS detector yet (that is Section 2), so I'm not sure it is a good idea to be discussing the tracker, muon detectors, and calorimeters here. You could shorten the first paragraph to: "Many beyond...on the order of 1 m. If the decay products of this track are undetected, it can produce a "disappearing track" signature." Once that is done, you may want to combine the first and second paragraphs.

Done; however, it is important for the introduction to introduce how the signature is defined. Also added at the end: "This signature is identified as an isolated track leaving no hits in tracker detectors and having little associated energy in calorimeters after the point of disappearance."

Introduction and Section 4: In L156-166 you start to discuss spurious tracks as background. In L167-174 you start talking about leptons as background. However, the signal signature has not been well defined yet so this seems to come a bit early. For example, in L159-161 you say spurious tracks don't have outer hits or energy deposits so could mimic disappearing tracks. But this is not really defined until L193-197. Also, in L172 there is mention of small calorimeter energy. The only information we have before Section 4 is L4-7 and L13-15. I don't think this is enough. I would suggest that the introduction be more fleshed out to give an overview of the disappearing track analysis. Maybe after the L18-19 sentence you can provide this information. Namely: a disappearing track is defined as one that does not reach the end of the silicon tracker, does not deposit energy in the calorimeters, and is isolated from other tracks in the event. You could also talk here about the main backgrounds if you wanted. That is, since the isolation requirement removes events in jets, the remaining isolated tracks are generally from leptons or fakes, both of which require some breakdown in the reconstruction algorithm. Lastly, one might consider saying that we use production in association with an ISR jet.

See the previous comment which introduces these concepts in L6 now, and after that again is added "As the standard model does not produce this signature, its non-BSM backgrounds require some breakdown in particle or track reconstruction algorithms and are very rare."

Introduction L18-34: I think you need to define wino-like and higgsino-like. Actually, in L9-10, you may need to define chargino and neutralino. You could give more of a primer to SUSY around L9. Some information that is relevant is: New particles (sparticles) are partners to SM particles; additional SM and SUSY Higgs particles (charged and neutral) are present; sparticles with the same quantum numbers can mix so the W and charged Higgs partners form a generic chargino spectrum where wino-like indicates properties similar to a W and higgsino indicates properties similar to a charged Higgs; production can be direct (through electroweak process) or through gluino production (produced in the strong interaction). Could also motivate by mentioning the LSP could be the source of dark matter.

Since this is a signature-driven letter, it is not vital that the charginos/neutralinos are mass eigenstates of gauge boson superpartners. But to better introduce what wino-/higgsino-like means, lines 24-27 now have "purely wino-like" --> "purely wino-like mixing of the LSP with gauge boson superpartner eigenstates", with similar changes elsewhere.

L65-69: Does any of this matter? We use our own chargino lifetime, we vary the mass of the chargino from 100 to 1100 in 100 GeV steps, the neutralino is <=0.2 GeV lighter, and the other masses are much heavier so they decouple. I think this is the only information that matters. I don't see what ISAJET does that is relevant. I also don't see why we need to say what tan(beta) and sign(mu) are.

This information is still important for any reinterpretation, as without it it is not clear what the upper limits represent. The most important ISAJET input is the chargino-neutralino mass splitting corresponding to the "correct" AMSB lifetime; while it is true that we vary the lifetime freely, this is a question we are certain to receive.

L81-83: First, the Pythia is not "mismodeling" the ISR, in the sense that it gets something wrong. It is simply incomplete being a LO generator. Second, since we are not spelling out the two-stage process we use to correct Pythia, I think the current text is a bit confusing. I would suggest something like "Scale factors are applied to the Pythia simulation as a function of the \pt of the electroweakino pair to match the data. In the absence..." You would also need to remove the text in L88-89 of "with most...in Pythia". Actually, looking again, I think that L81-89 could be written a little more clearly. You could remind the reader why ISR is important in the analysis. Then say that the ISR determines the pT spectrum of the recoiling system. Then say that we measure this in Z events. Then say that we assume the electroweakino pair production is the same so we reweight electroweakino pair pT to match the data Z pt. You don't need to mention scale factors. Scale factors sound strange to me as I tend to think of them as single (global) numbers. Finally, I think you need to rephrase what +80-100% means. Maybe just say that the reweighting value varies from 0.2--3.0 (or whatever it is).

At this point in the text the trigger strategy has not yet been introduced (nor even defined), so it is difficult to motivate why the ISR is important without a larger reorganization. Instead this is now called a correction, and the +80-100% is now described as an increase in signal yields.
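The reweighting procedure under discussion — deriving per-bin weights from the data-to-simulation ratio of the Z pT spectrum and applying them as a function of the electroweakino pair pT — can be sketched as below. All names, binning, and toy spectra here are illustrative assumptions, not the analysis code:

```python
import numpy as np

def derive_isr_weights(data_pt, mc_pt, bins):
    """Per-bin data/MC ratio of the shape-normalized Z pT spectrum."""
    data_hist, _ = np.histogram(data_pt, bins=bins, density=True)
    mc_hist, _ = np.histogram(mc_pt, bins=bins, density=True)
    safe_mc = np.where(mc_hist > 0, mc_hist, 1.0)
    # weight defaults to 1 where the simulation has no events
    return np.where(mc_hist > 0, data_hist / safe_mc, 1.0)

def apply_isr_weights(pair_pt, weights, bins):
    """Assign each signal event the weight of its pair-pT bin (overflow -> last bin)."""
    idx = np.clip(np.digitize(pair_pt, bins) - 1, 0, len(weights) - 1)
    return weights[idx]

rng = np.random.default_rng(42)
bins = np.linspace(0.0, 300.0, 31)           # 30 bins of 10 GeV (illustrative)
z_pt_data = rng.exponential(60.0, 100_000)   # toy "data": harder recoil spectrum
z_pt_mc = rng.exponential(50.0, 100_000)     # toy LO simulation: softer spectrum
w_bins = derive_isr_weights(z_pt_data, z_pt_mc, bins)

sig_pt = rng.exponential(50.0, 10_000)       # toy electroweakino pair pT
event_w = apply_isr_weights(sig_pt, w_bins, bins)
```

Because the toy "data" spectrum is harder than the toy simulation, the weights rise with pT and the weighted signal spectrum shifts toward higher pT — the mechanism behind the signal yield increases discussed in this exchange.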

Section 4: The paragraph L96-107 describes how photons, electrons, muons, charged hadrons, and neutral hadrons are obtained. The paragraph L108-112 describes how jets are obtained and the next paragraph defines ptmiss. I think you need to have a sentence or two about tau_h reconstruction. Here is a sentence from TOP-18-005: "Hadronic \tau lepton decays are reconstructed with the hadron-plus-strips (HPS) algorithm [https://inspirehep.net/record/1693412], which starts from the reconstructed jets." Maybe you can add this at L112.

Single-lepton control regions are used in several places. But there is not a good definition of these. In L224-226 there is mention of some events collected with a single-electron or single-muon trigger and then the additional requirements on one lepton. In L232 it is written that the tau sample uses the same electron and muon selections but it is not clear that the sample is the same. Then, in L248 (and through L267), there is reference to "single-lepton control regions" but it is not clear what these are (or how they relate to what was used in L224-226). I would suggest that you define the single-lepton control regions explicitly somewhere. You could add a paragraph in Section 5 before Section 5.1 as they are used by both Section 5.1 and Section 5.2.

This is now briefly defined inline at L248, where it is first genuinely used, rather than at the beginning of the tag-and-probe selection. At L232 the sample in question is the tag-and-probe selection, and defining the term there would give the impression that it is just a single-lepton selection, so we did not define it there.

L245-261: You may want to consider defining the modified version of ptmissnomu. In fact, I think you just need to change the mu to \ell in the ptmissnomu symbol and define it as such. This would be used in L253, L259, and L261. On a related note, should L262-267 be using the modified version as well? It seems like it should and if so, this should be mentioned. Also L313.

The typesetting of that symbol was distracting, so we avoided using it. In a comment below you have suggested removing "after modifying ptmissnomu in this way", but that phrase was used precisely so that the symbol was not necessary. Also, a "no \ell" superscript would suggest a version of ptmiss from which leptons are globally excluded, which is not the case.

Tables 2 and 4: I would suggest using one less significant figure for the values and uncertainties.

The PubComm guidelines suggest at most two significant figures, with the decimal places of the central values matching those of the uncertainties; the tables are made to match that.

L164-166 and L284-299: In lines 164-166 you define d0 and z0 as the transverse and longitudinal impact parameters. First, if z0 is not used elsewhere, it should not be defined. Second, it is not entirely common for an impact parameter to be signed, so having the absolute value is a little odd. It may be better to not define d0 at this point. Then, in L284-299 you can explicitly define d0 as the signed impact parameter and remind the reader that the signal requirement is |d0|<0.02 cm. I also think that L284-299 can be explained better. Here are some issues that could be improved:

1. In L289 you mention a "sideband" selection that is the inverse of the signal requirement. This would mean d0>0.02cm. But that isn't really what is done. I guess the sideband is eventually defined in L296 but that is a bit late. I suggest not mentioning the sideband region until you are ready to define it.
2. You say that spurious tracks are Gaussian+constant for data and simulation. But you only know a track is spurious for simulation. So you would need to say that the simulation shows that spurious tracks follow a Gaussian+constant distribution and the data are consistent with that.
3. You could explain better why the d0<0.1 cm region is not included in the fit. Presumably because of the potential for real tracks to be present.
4. Other than the d0 requirement, you don't say how the tracks are selected. Do they have all of the disappearing track requirements (including missing outer hits and Ecalo<10 GeV)?

0) The z0 definition has been removed. 1) The sideband is now defined more clearly. 2-3) This section has been rewritten in response to multiple comments. 4) The sideband d0 requirement is the only change from the signal criteria, so missing outer hits >= 3 and Ecalo < 10 GeV are still required.
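The Gaussian-plus-constant fit and sideband extrapolation discussed here can be sketched as below; the toy d0 spectrum, binning, and fit ranges are illustrative assumptions, not the analysis code:

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss_plus_const(d0, norm, sigma, const):
    # Gaussian centered at d0 = 0 plus a constant offset
    return norm * np.exp(-0.5 * (d0 / sigma) ** 2) + const

rng = np.random.default_rng(7)
# toy spurious-track d0: a broad Gaussian biased toward the beamline
# plus a flat component (all numbers are illustrative)
d0 = np.concatenate([rng.normal(0.0, 0.15, 5_000),
                     rng.uniform(-0.5, 0.5, 5_000)])

counts, edges = np.histogram(d0, bins=100, range=(-0.5, 0.5))
centers = 0.5 * (edges[:-1] + edges[1:])

# fit only the sideband |d0| > 0.1 cm, where prompt tracks from real
# leptons would not contribute to a real sample
sideband = np.abs(centers) > 0.1
popt, _ = curve_fit(gauss_plus_const, centers[sideband], counts[sideband],
                    p0=(100.0, 0.1, 40.0))

# extrapolate the fitted shape into the signal window |d0| < 0.02 cm
signal_bins = np.abs(centers) < 0.02
predicted = gauss_plus_const(centers[signal_bins], *popt).sum()
observed = counts[signal_bins].sum()
```

In this toy the fitted tail extrapolates into the signal window in good agreement with the generated counts, which is the closure one would want from such a sideband method.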

L354-358: I think you may need to expand a bit on how these uncertainties are calculated.

A brief description has been added.

Abstract: "produce the" to "produce a"

Okay.

L42-44: Note that the 1440 pixel modules is not correct for the Phase 1 detector. Suggest combining the first two sentences and removing any reference to the number of modules.

Fixed to the PubDetector twiki recommendations.

L45: The z positions must be wrong. Please check. It could be 32, 39, and 48 cm. I'm not sure.

The positions are as given in the reference cited.

L61: Suggest changing "during" to "from" (twice)

Changed.

L64: I don't recall seeing "NNPDF3.0LO" before. Is that standard notation? Or should it be "LO NNPDF3.0"?

A missing space has been added: "NNPDF3.0 LO".

L65: Make sure you have defined the term "sparticle" if you continue to use it here.

Now "supersymmetric particle" in both instances.

L68: Change "positive, $\mu > 0$." to "positive $(\mu>0)$."

Changed.

L68: Change hyphen to minus sign (put everything inside )

The hyphen was intentional, as this is the "chargino-neutralino" mass difference.

L69-71: Suggest rewriting as "While the mass difference typically determines the chargino lifetime, we simply vary the chargino lifetime from 6.67 ps to 333 ns (0.2--10\,000 cm) in logarithmic steps to encompass a broad range of models."

"Freely" now "simply".

L72: Comma after "LSP case"

L72-76: Suggest swapping the order of the last two sentences in this paragraph (and rewriting once that is done).

Done.

L80-81: I'm not sure I understand this. For the wino-like case, with chi^0_1 chi^+_1 and chi^+_1 chi^-_1, I think the ratio should be 2:1 as you can have 0+ and +0 (total of two states) compared to +- (total of one state). But for the higgsino-like case, I would assume it would be 4:1 as you can have 1+ +1 2+ +2 (total of four states) compared to +- (total of one state). Am I missing something? I think in order to make everything clear, you may need to provide the information separately for the higgsino and wino scenarios.

This text had not been updated for the higgsino case, and you are right that the wino ratio does not apply there. Separate mention is made for both scenarios now; for higgsinos the ratio is in fact 7:2.

L108: Suggest changing "these reconstructed particles using the infrared and collinear safe" to "the PF candidates using the". We don't generally make a big deal any more about being infrared and collinear safe, especially in your case where jets are not very important.

This language was taken from the PubComm detector twiki.

L115: Missing reference

Fixed.

L115: Add either "vector" or "value" or "result" after "The \ptmissvec"

L116 and elsewhere: I note that the vertical position of "miss" in \ptmiss is much lower than the position of "miss, no mu". Try to see if you can get these to match better. I like the look of \ptmiss better.

Changed the metNoMu command to the form of \ptmiss.

L120-122: Suggest rewriting sentence to make it more clear. Perhaps "The L1 triggers require \ptmiss above a threshold that varied during the data taking period and with the instantaneous luminosity. The HLT selects events based on both \ptmiss and \ptmissnomu."

Rewritten.

L125: Could remove "to be"

Removed.

L128: Suggest removing QCD as it has not been defined. Also, suggest some additional explanation for what multijet background is and how it can create ptmiss. Perhaps "A large background exists from multijet events in which a jet energy may be undermeasured, leading to an apparent \ptmiss for the event. To reject these events, the difference in..."

Suggestion taken.

L129: Can't use "leading" without definition. Suggest changing to "highest-\pt"

Changed.

L131-138: Suggest simplifying a bit: "In 2018, a $40^\circ$ section of one end of the hadronic endcap calorimeter (HEM) was unpowered. The 2018 data are separated into two samples with 2018A (2018B) containing events without (with) this problem, corresponding to an integrated luminosity of 21.1 (38.6) \fbinv. Events from 2018B period are rejected if the \ptmissvec vector points to the dead region; that is, if ${-}1.6<\phi(\ptmissnomu)<{-}0.6$. This requirement, referred to as the "HEM veto", rejects 36\% of the 2018B events, and leads to a signal efficiency reduction of 16\% for this data period, as expected from geometry and verified in simulation." I think these numbers are more relevant (but should be checked by you). If you think your numbers are more appropriate, please explain. Also, since we are free to define our abbreviations, I think "HEM veto" is fine. There is no need for "HEM 15/16 veto". People will not know what to make of the "15/16".

The 2018B rejection is 31%, but otherwise the suggestion is taken and applied. Ideally "HEM" would not be needed at all, but P(HEM veto) is used later in the text, so "HEM veto" is now used.

L138-140: Suggest rewriting as "The selection requirements applied to this point define the "basic selection", with the resulting sample expected to be dominated by W->lnu events."

Rewritten.

L142: Change "They" to "The selected tracks"

Changed.

L144: Change "Selected tracks" to "The tracks"

Changed.

L145: I think you want Delta R (track,jet), not Delta R (track,jets) here

Yes; we also now write "separated from _all_ jets".

L147-156: Suggest rewriting so that you first go through the definition of the three variables and then state the cuts. It may also be useful to say that the track reconstruction allows for missing hits to improve the reconstruction efficiency and as long as the entire tracker is used, this produces a low fake rate. But since you will be requiring missing hits, this would generally allow for a much higher fake rate. So you need tighter cuts on missing inner and middle hits to compensate.

This section has been rewritten.

L163-164: I don't think this description of the primary vertex is entirely accurate. In addition to the PF particles, the ptmiss associated with the vertex is counted. You can see the description here: https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/PubDetector

Rewritten to be more in line with recommendations.

L168-9: Not sure what you mean by this sentence. For a lepton to disappear, I think there has to be a failure of the track reconstruction (so that there are missing outer hits). Suggest changing to "failure in the track reconstruction to find all hits on the track." or something like that.

Changed.

L172-3: I think the source of small calorimeter deposits is different for electrons, muons, and hadrons. For electrons and hadrons, nonfunctional or noisy calorimeter channels is needed. For muons, since they usually don't shower, they will produce little energy. Need to be clear here.

Specified for electrons or taus.

L173: Can remove "source of"

Removed.

L179: Remove "In the data,"

Rewritten.

L179-182: I can't tell if the eta region is for 2017 or 2018 or both. Please rewrite.

Rewritten.

L183: Not clear if "These vetoes" are just the pixel vetoes or the lepton and fiducial vetoes as well. Perhaps "The lepton and fiducial vetoes remove approximately..."

Now "These last vetoes" since the 20% is only relevant for the unpowered layers vetoes.

L192: Suggest changing "approximately" to "an additional"

Changed.

L196-7: Suggest dropping this part of the sentence. For a pT>55 GeV track, the track parameters are not going to change much so there is nothing really gained by specifying this.

Dropped.

L202-209: I think this section can be rewritten to be much clearer. Also, it can be combined with L198-201 to give a better idea of the tracker geometry and what sorts of tracks are present. Also, note that your track lengths in L201 are for eta=0, so this should be specified. The way I would write L198-201 is the following:

1. Describe the tracker geometry. Could say that in the barrel region (|eta|<1), there are four pixel layers followed by 10 strip layers.
2. Give the three signal categories and use the tracker to define the tracks. That is, nlayers=4 corresponds to a track just passing through the pixel detector, which has a radius of approximately 20 cm. The other categories correspond to tracks that pass through just the first tracker layer at about 25 cm or at least the second tracker layer at about 30 cm.
3. Point out that because of overlaps and the fact that the inner two strip tracker layers each have two sensor planes glued together, there can be more than one hit per layer.
4. Mention that the previous version of the analysis required nhits>=7. I would not say that the new nlayers>=6 is very similar to the nhits>=7. I don't think it is necessary and I don't know what you mean by it. Do you mean the track length is similar? Do you mean the signal-to-background is similar?

Finally, I don't know what the last phrase (the nlayers=4 category ensures that central tracks traverse the entire pixel detector) is for. This was a paragraph about a comparison to the previous analysis but this phrase doesn't seem to have anything to do with that. If you take my suggestions for how to rewrite these two paragraphs, I'm not sure it is needed. Or maybe I am missing something.

For now the last paragraph is mostly dropped, and the listed lengths are now stipulated to be for eta=0. NHits is never used, so it is not vital for the reader to understand its differences from NLayers. The point of this paragraph is now condensed into: "The previous CMS search for disappearing tracks~\cite{Sirunyan:2018ldc} required at least seven hits associated to selected tracks; the sensitivity of that requirement to new particle lifetimes is comparable to that of the $\nlayers \geq 6$ category in this search."

L215: Don't use the term series as the ordering that you do is not the natural ordering. Could say "Several conditions must be met in this scenario".

Changed.

L215-216: "track must still be left in the tracker" doesn't make sense. It sounds like a kid left behind in the shopping mall. How about "a track must be reconstructed". I think since the track is reconstructed it is more correct to say "with no candidate lepton identified with the track" instead of "but no candidate lepton is reconstructed". Finally, I think it is better to simply write "E_calo^DeltaR<0.5 of less than 10 GeV is found" Again, the verb left is weird. Also, it doesn't seem necessary to specify that this is in the calorimeters since the variable has "calo" in the name and has already been defined. If you didn't use the variable, then you could mention the calorimeter again (but you would have to write out the cone with Delta R < 0.5).

Fixed the wording.

L219-220: Suggest changing "must survive the veto on phi(\ptmissnomu)" to "must pass the HEM veto".

Done.

L220: Suggest changing "this scenario" to "charged leptons"

Okay.

L221: Suggest changing "each of these" to "each of these requirements in the order given"

Changed.

L230-232: This could be written better to make it more explicit what you are looking for. For example: "To study the tau_h, we use Z->\tau_h \tau events in which the \tau decays via \tau\to e\nu\nu or \tau\to \mu\nu\nu and the electron or muon serves as the tag lepton."

Rewritten.

L235: Suggest "In addition" instead of "Secondly" as there is no "First"

Okay.

Done.

L239: Comma before "respectively"

L240-241: Suggest removing "that may contaminate the measurement of P_veto with spurious tracks". I don't think it is just spurious tracks that are the problem (in the sense of spurious tracks that you use elsewhere in the paper). It can be real tracks that are not muons, electrons, or hadrons from a tau.

Removed.

L248: Need to have a definition of what these single-lepton control regions are.

L253: I think you can remove "after modifying ptmissnomu in this way"

It needs to be clear that the cuts are applied to the modified observables, and the "^\text{modified}" variables were too clunky to include here only once.

L255-6: I think you should remove "explicitly". I don't know what an implicit identification would even mean.

Removed.

Table 2: Need parentheses in 2018A, nlayers=5, electron and 2018B, nlayers=4, electron.

L275-277: Suggest removing this sentence. Given that this was only done for 2017, only includes ttbar, and has lower statistics than the data, I don't think it is particularly persuasive. Also, one would need to define much more clearly what a closure test is. Also, if you choose to keep it, you need to change "\sigma" to "standard deviations"

Removed.

L281: Remove "then"

Done.

L304: Change "This" to "This calculation"

Changed.

L305-307: Suggest removing this sentence for the same reason as L275-277.

Removed.

L319: Remove "As mentioned above"

Removed.

L319: You cannot use the term "statistics" in this way. You can write "the available data ... do not provide enough events to measure..." See https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/PubGuidelines#Word_usage_and_jargon

Fixed.

L317: Need to change "electron estimate" and "tau_h estimate" to something more clear. Like "estimate of the background contribution from electrons and tau_h"

Fixed.

L322: Need to clarify this. What is meant by "binned estimates"

Re-written.

L322-3: you seem to have "estimate...are..."

Fixed.

L323: Again, "statistics" cannot be used like this.

Fixed.

L325: I think you can say the range is 1--10% or 1--11%

Done.

L332: I think you can change "205%" to "200%" No need for 3 sig figs on an uncertainty. Also, such a large discrepancy may lead people to believe there is a serious disagreement between the Zee and Zmumu samples. You may want to state the level of disagreement in terms of standard deviations. I still don't understand why you didn't combine the Zee and Zmumu to get the central value but I guess it is too late now.

Done.

L335: It may not be clear what "signal-like tracks" means. Perhaps you can add "(d_0<0.02cm)" after "signal-like". Actually, I think this paragraph can probably be written with half as many words.

Done.

L339-344: I think this paragraph can also be reduced by a factor of 2 and perhaps combined with the previous paragraph.

Done.

L345: Suggest being consistent between either "flat" or "constant". I think "constant" is better.

Suggestion taken, using constant.

L350: "depending on the chargino"

Fixed.

L357: You can write ${<}0.3\%$ to get the spacing correct.

Done.

L359: Suggest changing "to account for potential mis-modeling of the track" to "in the track".

Suggestion taken.

L361-363: Suggest rewriting as "The lepton veto efficiency for signal events depends on the simulation of detector noise, which may produce muon detector or calorimeter hits that result in a lepton candidate and thereby reject the track."

Suggestion taken.

L363: Change "difference" to "differences"

Done.

L366: Change "A data-to-simulation difference of 0.02--0.1\% is found" to "Data-to-simulation differences of up to 0.1\% are found" or "Data-to-simulation differences of less than 0.1\% are found"

Done.

L368-373: Should be shortened and jargon (like trigger primitives) removed. Also, the entry in Table 3 should be clearer. Does this affect both the signal trigger and the single-lepton triggers? If so, wouldn't this cancel out?

Jargon removed and table improved. The effect was not included in simulation so it is added as a correction, and the uncertainty in the correction is what's included here. This has been made more clear in the text.

L374-379: I think this needs better explaining. One certainly needs to mention that there are triggers that don't require any tracks.

Specified that it is the track leg of the trigger requirement in question.

Table 3 caption: There is mention of gluinos and strong production. Should that be removed?

Removed.

Table 3 caption: What does the 3rd sentence mean?

Removed.

Table 3 caption: Change "between" to "over"

Done.

Table 3 and 4: Capitalize the first letter of each entry, like in Table 1.

Done.

L384: Suggest "The expected number of background events and the observed number of events are shown in Table 4 for each event category and each data taking period."

Suggestion taken.

L385: Make part of the previous paragraph.

Done.

Table 4: Change "AB" to "A" and "CD" to "B"

Done.

L401: Change "several" to "four"

Done.

L406-407: Maybe compare with the previous CMS result and/or the ATLAS result. Maybe mention that this result supersedes the previous CMS result.

Figure 1: Why are you only showing the 2017+2018 results here? Shouldn't this be the combined result? Will there be a figure similar to Figure 1 for the higgsino?

### ARC action items from September 10 meeting

Use a sample of high-pt taus in Z->ee and Z->mumu to compare tracking in data and MC.

This has been done for Z->ee and Z->mumu. The selection is the same as the fake estimate selections, but with the tau veto removed (see Table 22). A second channel is defined that also removes the missing inner and middle hits cuts, analogous to the "muon control region" of Sections 8.2.2 and 8.2.3. Using these instead for the hits systematics, with DY MC, binned in nLayers (= 4, = 5, >= 6; a dash indicates one value used for all three bins):

| Systematic | Muon ctrl region, nLayers = 4 | = 5 | >= 6 | Z(ee, mumu)+tau ctrl region, nLayers = 4 | = 5 | >= 6 |
|---|---|---|---|---|---|---|
| inner hits | - | 0.02% | - | 4.2% | 1.6% | 0.09% |
| middle hits | - | 5.2% | - | 5.0% | 1.5% | 3.9% |
| outer hits (avg over mass/ctau) | 3e-4% | 2e-4% | 0.30% | 0.02% | 0.007% | 5.1% |

The 0% values are due to no events failing the missing inner hits cuts in either data or DY MC. We would propose using the larger of the two methods (muons/taus) in each category and systematic as a conservative measure.

Make clear in the documentation that our treatment of the ISR assumes DY-like production and the systematic really just relates to MC statistics.

This consideration is explicitly mentioned in the AN.

Remake Figure 46 separated into different trigger epochs and with appropriate lumi-weighting.

Done. See AN v8 Section 8.2.4 for plots and details.

Extend ISR weights to higher pT(mumu).

Completed for 2018, 2017 in progress.

### ARC action items from July 25 meeting

Combine the nine dxy sideband regions in the fake estimate into one larger sideband.

Done. The fake estimates change slightly, but all are within 0.5 standard deviations (statistical) of the previous estimates. The AN has been updated accordingly.

Compare the pileup distributions in ZtoMuMu, ZtoEE, and BasicSelection events. If there is a big difference, try reweighting and see how much it changes the estimate.

[Plots: nPV ratios to BasicSelection]

See the above plots. Using the ratios as weights to the fake selection (ZtoMuMu/EE + DisTrk (no d0 cut)), the overall weights applied to P_fake^raw would be:

| | ZtoMuMu | ZtoEE |
|---|---|---|
| nLayers = 4 | 0.994 ± 0.064 | 1.01 ± 0.34 |
| nLayers = 5 | 1.013 ± 0.088 | 1.0 ± 1.4 |
| nLayers >= 6 | 1.0 ± 0.21 | 1.02 ± 0.83 |

Despite the differences visible in the plots above, these average weights are very consistent with one, i.e. the estimate does not depend on the pileup difference.
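The consistency check described here — folding the nPV ratio into the fake-track control sample and verifying that the average per-event weight is compatible with one — amounts to a simple weighted mean. A minimal sketch with purely hypothetical inputs:

```python
import numpy as np

def average_pileup_weight(npv_events, npv_ratio):
    # per-event weight = nPV ratio evaluated at each event's nPV;
    # a mean compatible with one means the fake estimate is insensitive
    # to the residual pileup difference between the control regions
    w = npv_ratio[np.clip(npv_events, 0, len(npv_ratio) - 1)]
    return w.mean(), w.std(ddof=1) / np.sqrt(len(w))

rng = np.random.default_rng(3)
npv_ratio = np.linspace(0.9, 1.1, 80)   # hypothetical CR-to-BasicSelection nPV ratio
npv_events = rng.poisson(32, 5_000)     # hypothetical nPV values in the fake control region
mean, err = average_pileup_weight(npv_events, npv_ratio)
```

Even with a visibly sloped ratio, a narrow nPV distribution samples only a small range of weights, so the mean stays close to one — the behavior reported in the response above.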

If possible, find the justification for using dxy with respect to the origin for the track isolation pileup subtraction.

After closer examination of the code, we confirm dxy is not calculated with respect to the origin for the track isolation, i.e. the text in the previous version of the AN was incorrect. The updated AN contains the correct description of the track isolation calculation on lines 383--390.

Suggestion: compare the nominal Gaussian fit to a flat line for NLayers5, as there's a concern the bias towards the PV changes as nLayers increases.

[Plots: ZtoMuMu NLayers5 d0 fits, Gaussian and constant (pol0)]

With a flat line, the 5-layer fake estimate is 0.81 events compared to the nominal 1.00.

### On-going 2018 estimate updates (last updated July 9)


• Produce skimmed ntuples with CRAB
• MET
• EGamma
• SingleMuon
• Tau
• Create fiducial maps
  • Muon
  • Electron (D in progress)
• Run channels without fiducial track selections
• basicSelection
• ZtoEE (D in progress)
• ZtoMuMu
• Trigger efficiencies with muons
• Background estimates and systematics (ABC complete)
  • Electron
  • Muon
  • Tau
  • Fake
  • ZtoMuMu
  • ZtoEE
  • Fetch RAW and re-reco lepton P(veto) passing events
• Signal corrections
  • Pileup
  • ISR weights
  • Missing middle/outer hits (requires fiducial maps)
  • Trigger scale factors
• Signal Systematics
• Expected upper limits
• Unblind observation

## Questions from ARC

### Email questions from Kevin Stenson August 27

AN Table 22 question. I do not understand the answer. It seems like you are agreeing with me. Maybe I don't understand Table 22. Let's just take the muon case for now. You want to measure the probability that a muon passes the lepton veto. You basically just measure the fraction of probe tracks that pass the lepton veto. According to Table 22, you measure the fraction of probe tracks that pass the criteria minDeltaR_track,muon > 0.15 and missing outer hits > 2. This is P_veto. However, the actual veto that you apply in the signal region is the 5 requirements in Tables 20-21. So, why don't you measure P_veto for muons by measuring the fraction of probe tracks that pass the criteria of all 5 requirements in Tables 20-21?

It seems you may indeed be misreading Table 22, which we agree is probably misleading. We apply all of the signal selection criteria (including those for the leptons you were asking about) except for the criteria listed in Table 22, which are inverted to allow the calculation of the conditional probability for the specific lepton whose veto probability is being calculated. The AN has been clarified to reflect this.
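The conditional structure described in this reply — apply the full signal selection except the criteria in question, then take the fraction of probe tracks passing the inverted criteria — can be sketched as simple mask arithmetic. All arrays and rates here are hypothetical:

```python
import numpy as np

def veto_probability(passes_other_cuts, passes_inverted_cuts):
    # P_veto = fraction of probe tracks passing the inverted criteria,
    # conditional on all other signal-selection criteria being applied
    return passes_inverted_cuts[passes_other_cuts].mean()

rng = np.random.default_rng(0)
n_probes = 100_000
# hypothetical per-track booleans for the two groups of criteria
passes_other = rng.random(n_probes) < 0.4
passes_inverted = rng.random(n_probes) < 0.001
p_veto = veto_probability(passes_other, passes_inverted)
```

The point of the clarification above is that the denominator is not all probe tracks but only those already passing every other signal criterion, which is what makes the result a conditional probability.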

Figure 19 and Tables 29-31 questions. If Figure 19 and Tables 29-31 both utilize all probe tracks then I would expect them to give the same result. So I'm not sure how that affects anything. In principle, the same sign subtraction could have an effect. Hopefully once the tables I asked for are made I will be able to understand if that is the reason.

Section 6.1.5 question. I'm not sure this makes sense. You say that you only use ttbar because that contributes the most. But the analysis includes a tag-and-probe assuming Z production. So, while Z->ll may not contribute to the overall background, it will certainly contribute a great deal to the measurement of P_veto. I think you should use all of the MC samples you have if you are trying to imitate the data.

An MC closure test is not intended to mimic the data, but to provide confidence that a data-driven background estimation method is sound. The method involves calculating the probability of identifying an isolated lepton, and ttbar was selected as the largest available sample of isolated leptons with which to demonstrate that the method closes. We are running over additional samples, but are confident the method will close in these as well, because after the signal selection the tracks in other samples are very similar to those selected in ttbar.

Section 8.2.2 question. You write that the chargino behaves like a muon and so muons should be used as proxy for measuring the efficiency. It is true that the chargino does not undergo hadronic interactions so it is more like a muon than a hadron. However, the chargino of interest for this analysis does NOT get reconstructed as a muon in the muon system. So it is unlike a muon in that sense. The track reconstruction explicitly uses information from the muon detectors to find all possible muon tracks and to associated as many hits as possible with the track. This is possible for muons but NOT for charginos that decay before the muon system. Therefore, using muons may overestimate the hit efficiency. If you found the same result for hadrons, then I would not be concerned. Or, if you just used muon tracks that were not found by the special muon-finding steps, then I would be happy. I know you have already used this but for reference, this shows the overall tracking efficiency for muons with and without the muon-specific steps: https://cds.cern.ch/record/2666648/files/DP2019_004.pdf As you can see there is a significant reduction in efficiency when the muon-specific steps are removed and a much larger disagreement between data and MC. I have no idea how this translates (or not) to the hit finding efficiency.

The difference in hit-requirement efficiencies between data and simulation lies in the simulation's handling of non-functional channels. This difference will be the same for simulated muons and charginos. Such differences can only occur once a track has been reconstructed in the first place, an issue covered by the systematic we take from DP2019_004. So while the issues you mention do affect tracking, they do not affect the discrepancy in non-functional channels handled by the systematic in question.

Section 8.2.5 question. This answer misses the point of my comment. Your estimate of the systematic uncertainty is simply a measure of MC statistics. It does not address the question of the systematic uncertainty associated with the method itself.

As discussed in the first ARC meeting, this is a very common procedure used by many analyses. For example, the SUSY diphoton+MET search (SUS-17-011) uses the sum-pt of photon/electron pairs as a handle on the hadronic activity. The MET performance itself is measured (e.g. JME-17-001) using Z events; from their PAS: "A well-measured Z/γ boson provides a unique event axis and a precise momentum scale."

Finally, I see in the answer for the ARC meeting that the calculation of dxy for the track isolation may not have been done in the intended fashion. Can you specify what information you have on what was done? Is there any way to see the dxy distribution as calculated? Is there any way at this point to change to use dxy relative to the primary vertex (like dxy < 0.02 cm as is done for the signal tracks)? I see that this cut has a pretty big effect for the short lifetime case as shown in Figure 16.

See the above answer from the ARC action items.

### Comments from Giacomo Sguazzoni HN August 13

Comments are for the paper (v0 2019/05/16) with references to the AN (v7) when applicable.

L 15-16: "Decaying to a weakly-interacting, stable neutralino and 16 unreconstructed pion, a chargino decay often leaves a disappearing track in the AMSB model." --> it seems the disappearing track is left in the model!

Reworded slightly.

l 25: exclude --> *excludes*

Fixed.

l 31-32: the interpretation ... are --> the interpretation ... *is*

Fixed.

The CMS detector: As already noted, a more detailed description of the tracker is needed. Here the geometry of the tracker is not described at all, while this is important to clarify the concept of 'measurement layer' and their numbers. I'm wondering if a picture of the tracker layout would be appropriate in this paper.

A sentence has been added to specify the position of each layer of the Phase 1 upgrade. L 186-190 should clarify "nLayers". We likely have room for another figure, but we can discuss what would be best to include.

l 75: blank missing between chi^0_1 and 'mass'

Fixed.

l 87: these correction factors are huge (350%); with these correction factors involved, how can you trust the simulation? Moreover the description in the paper is misleading, I think. You give the impression that the 350% factor derives from the ISR mismodelling in the Z->mumu events. But, reading the AN section 7.4, I understand that the big factor derives from AN Fig. 38, i.e. Madgraph vs. Pythia for Z->mumu, which you need since your signal is simulated with Pythia. I think this point has to be better explained in the paper, because the reader could be surprised to learn that we are 350% off in Z->mumu simulation. See below also the discussion on the systematic uncertainty associated with this correction.

A brief comment has been added, saying that most of this value is due to the ISR modeling in Pythia.

l 112: I think PF as an acronym of particle flow has never been defined

Particle flow is first mentioned on L 95 and the "(PF)" is now defined there.

l 133-155: concept is clear but the description is complicated and I think there is room for improvement; in case they are missing, 'hits' are counted as 'layers' (what about using a different nomenclature? e.g. 'layer measurement'); but what about not missing hits, the one on which you apply the cut? what about the 4 hits you require? This is relevant with respect to overlaps. Clarify (you may need to introduce a discussion on overlaps here and in the tracker description).

The actual quantity nLayers isn't used until L 186-190, so the distinction wasn't made clear before that point. Up to there we make frequent reference to "layers", but as physical layers of the detector itself in which hits can exist.

l 184-185: track coordinates? Do you mean track parameters?

That is a better phrasing and we use that now.

l 253-255: The phrasing could be improved.

This section has been rewritten.

l 262-271 the d0 fit description could be improved (need to check the AN to better understand what's going on). In particular:

l 263: when you say 'we first fit the observed d0 of selected tracks to a Gaussian distribution in the range 0.1 cm < |d0| < 1.0 cm', you mean that you fit d0 excluding, from the fit, the range -0.1 cm < d0 < 0.1 cm. Is that correct? If this is the case, I think the way you describe the fit is misleading.

This section requires some rewriting, as in the review we've moved to a single sideband instead of the nine. This rewording makes the fit range and process clearer.

l 265: |d0 ==> |d0|

Fixed.

l 272: 'independent': is that true? The transfer factors derive from the same fit, i.e. the same Gaussian. How could N_est^i,fake be independent?

It would be more appropriate to say that P_fake^raw,i is independent, correct. However with the use of one larger sideband this sentence is no longer relevant.
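Mechanically, the sideband method can be sketched as follows. This is a toy illustration with assumed windows (sideband 0.1 < |d0| < 1.0 cm, signal window |d0| < 0.02 cm) and generated data, not the analysis code:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

rng = np.random.default_rng(0)
d0 = rng.normal(0.0, 0.3, 5000)  # toy fake-track d0 in cm, width 0.3 cm (assumed)

# Histogram the sideband only and fit a zero-mean Gaussian to it
sideband = d0[(np.abs(d0) > 0.1) & (np.abs(d0) < 1.0)]
counts, edges = np.histogram(sideband, bins=40, range=(-1.0, 1.0))
centers = 0.5 * (edges[:-1] + edges[1:])
mask = np.abs(centers) > 0.1  # fit only outside the excluded core

gauss = lambda x, amp, sig: amp * np.exp(-0.5 * (x / sig) ** 2)
(amp, sig), _ = curve_fit(gauss, centers[mask], counts[mask], p0=[counts.max(), 0.3])
sig = abs(sig)  # the width's sign is degenerate in the fit

# Transfer factor: fitted fraction in the signal window over the sideband
sig_frac = norm.cdf(0.02, 0, sig) - norm.cdf(-0.02, 0, sig)
sb_frac = 2.0 * (norm.cdf(1.0, 0, sig) - norm.cdf(0.1, 0, sig))
transfer = sig_frac / sb_frac

n_est = transfer * len(sideband)  # estimated fakes with |d0| < 0.02 cm
```

With a single sideband the estimate is just the sideband count times this one fitted transfer factor, which is the change described above.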

l 326: among the systematic uncertainties, the one associated with the ISR modelling is the largest; nevertheless is it sufficient? You just consider 1 sigma of statistical fluctuation (according to the AN, Section 8.2.5), a quantity that, in principle, you can reduce by increasing the statistics. Is there no systematic associated with the reweighting method itself? I think it is needed given the large correction factors you end up with (350%). A possibility (indeed extreme but for sure conservative) would be to evaluate the efficiency change with and without correction factors. Which would be the systematic in this case?

The difference with/without the correction would be extremely large (~70% for some example points) and is not relevant, since differences between Pythia and Madgraph are well known. Without the Madgraph/Pythia correction, we do not have a comparison of data/Pythia to make the correction -- an entirely Pythia-based SM background campaign would be required. In the limit of infinite statistics in the data/Madgraph/Pythia samples, we would essentially have a perfect tune for Z ISR and there would be no uncertainty. Another possibility we are pursuing is to generate a sample of our signal in Madgraph, the expectation being an ISR distribution equal to the 10M DY+jets events we generated to form the Madgraph/Pythia weights.

### Questions from Joe Pastika HN August 7

L254: "bad charged hadron filter" is listed as "not recommended" on the JetMET twiki. Is there a reason this is still included in your filter list?

The recommendation was changed between our analysis of 2017 and 2018 data, and it wasn't feasible to re-process 2017 for this; all of the MET filters listed remove only 2% of the /MET/ dataset, so it is a small issue. In the next AN version "(2017 only)" is added here.

L268: Could you use a different symbol in the text/tables for when you use dxy/dz referenced from (0,0,0) (maybe dz_000 or something else reasonable) to differentiate it clearly from the measurement w.r.t. the beamspot?

We now use $d_{z}^{0}$ and $d_{xy}^{0}$ where relevant.

L311: What effect does the choice of "2 sigma" on the inefficiency of tracks have on the analysis? How is "sigma" defined here? Is it an uncertainty or is it related to the standard deviation of the distribution?

The sigma here is the standard deviation of the sample mean of the veto inefficiency for each flavor. This procedure is taken as a conservative measure so the value of 2 isn't rigorously optimized. For an example of the effect, in a sample of 900 GeV, ctau = 100 cm charginos in the NLayers6plus category, 745 tracks are selected with the <2sigma requirement and 730 tracks are selected with <1sigma required instead.
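For concreteness, a minimal sketch of this sigma, the standard deviation of the sample mean, using hypothetical inefficiency values:

```python
import math

# Hypothetical per-measurement veto inefficiencies for one lepton flavor
ineff = [0.8e-5, 1.1e-5, 0.9e-5, 1.3e-5, 1.0e-5]

n = len(ineff)
mean = sum(ineff) / n
# Standard deviation of the sample mean (standard error of the mean)
sample_var = sum((x - mean) ** 2 for x in ineff) / (n - 1)
sigma = math.sqrt(sample_var / n)

# The requirement in question keeps values within 2 sigma of the mean
upper = mean + 2 * sigma
```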

L322: What is the signal efficiency for your benchmark points for the fiducial cuts?

The fiducial cuts all together remove roughly 20% of signal tracks.

L334: Can you help me understand what the jet pT > 110 GeV cut is achieving? Do you have the fraction of signal and background events passing this cut? It's hard to tell how effective it is from the 2D plots in Figure 11.

Figure 11 will be extended to jet pt > 30 GeV on the y-axis to better show the issue the text mentions. The pt > 110 GeV cut is used to be consistent with the online requirement of MET from an ISR jet. The efficiency of this cut is 84.6% in data and 87.2% for the signal sample shown.

L375: Does rho include neutral or charged + neutral average energy?

Rho is from the "fixedGridRhoFastjetCentralCalo" collection, which uses all PF candidates within |eta| < 2.5.

Figure 16: Is the difference from one in the first bin labeled "total" from acceptance?

This was normalized incorrectly and will be corrected to one in the next AN version.

L458: Can you add plots (at least a few examples) to the AN of the Z->ll mass distributions in the OS and SS categories used for the T&P method?

Plots similar to Figure 19 are added to the AN for electrons.

L490: I don't understand how this test shows that P_veto does not depend on track pT. How significant is the KS test for distributions with so few events?

This question was asked at the pre-approval (see below in the responses). Investigating, we found that the estimate as presented is statistically consistent with several other hypotheses of P_veto's pt-dependence, for example a linear dependence. In short, we do not have the statistics either to determine a pt-dependence or for any potential dependence to affect the estimate.
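As a toy illustration (not analysis data) of how weak a KS test is at these sample sizes, compare two small samples drawn from genuinely different pt shapes:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
n = 20  # roughly the scale of events available

# One sample flat in pt, one with a linearly rising pt spectrum
flat = rng.uniform(55.0, 500.0, n)
sloped = 55.0 + (500.0 - 55.0) * np.sqrt(rng.uniform(0.0, 1.0, n))

stat, pvalue = ks_2samp(flat, sloped)
# Even a real difference in shape is often not rejected at 5% with n ~ 20
```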

L523: Can you say a few more words about how the trigger efficiency is calculated?

This is simply the efficiency to pass the signal trigger requirement. As in Section 7.3 we require lepton pt > 55 GeV, so that the efficiency is measured on the IsoTrk50 track leg plateau. An additional sentence to this effect has been added to the AN here.

Table 29-31: Are the uncertainties statistical only?

Statistical only. A comment has been added to these captions.

L590: What level of signal track contamination in the ee/mumu CR would be required before it would affect the background estimate significantly?

To have signal contamination here, there would need to be sources of ee/mumu pairs (a Z or otherwise) in the signal, which does not occur in any of our samples; we have 0% contamination now. Even if signal did contain Z->ee/mumu candidates, a track would need |dxy| >= 0.05 cm to contaminate the fake-track estimate. The efficiency of the |dxy| < 0.02 cm cut for one sample (900 GeV, 100 cm, NLayers6plus) is 99.8% -- so the contamination would be even less than 0.2% times the transfer factor.

L646: Do I understand correctly that the cross check here is to simply take the ratio of integrals of the sideband vs. signal region instead of using the fit to determine this ratio?

Not quite. The cross check takes the normal estimate from the sideband (count/integrate the events and scale by the fitted transfer factor), and compares its estimation of the events in the peak to the actual observation in the peak.
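Numerically the cross-check amounts to the following (all numbers here are toy values, purely illustrative):

```python
import math

n_sideband = 40         # events counted in the sideband (toy)
transfer_factor = 0.12  # from the fitted Gaussian (toy)
n_estimated = n_sideband * transfer_factor

n_observed = 5          # events actually observed in the peak (toy)
# Closure holds if the pull is small compared to 1
pull = (n_observed - n_estimated) / math.sqrt(n_observed)
```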

L742: Did you compare the trigger efficiency calculated with other reference triggers than SingleMuon?

From a pre-approval question we measured the trigger efficiency using the SingleElectron dataset as well. It was very similar, although electrons introduce hit efficiency effects like conversions so we do not use it in the analysis.

L837: Is there a good argument why the source of differences in the electron case really is applicable to the muon and tau cases?

The online/offline MET isn't expected to depend strongly on the nLayers of a mis-reconstructed lepton, since they've failed to be reconstructed, and so naively one should expect P_offline and P_trigger to be the same for all flavors. But there is a small difference for electrons, and we assume muons/taus have a similar difference. The statistical uncertainties for the muon/tau estimates are already 100% or more, so even a much larger systematic would not make a difference.

### Some followup questions from Kevin Stenson HN July 24

Table 19: More clarification on dxy and sigma of the track is needed. If dxy is truly with respect to the origin, that is a terrible idea. The beamspot is significantly displaced from the origin (by several mm). So dxy should be measured with respect to either the beamspot or the primary vertex. Regarding sigma, I guess you are saying that it only includes the calculated uncertainty on the track parameters. Can you provide a plot of dxy and sigma for the tracks. Preferably for all tracks. These are applied in the signal selection, so why not here?

See the above answer from the ARC action items.

Table 22: I don't understand your response. Regarding Table 22, my specific questions are: for electrons, why is there no veto on minDeltaR_track,muon > 0.15 or minDeltaR_track,had_tau > 0.15 or DeltaR_track,jet > 0.5; for muons, why is there no veto on minDeltaR_track,electron > 0.15 or E_calo < 10 GeV or minDeltaR_track,had_tau > 0.15 or DeltaR_track,jet > 0.5; for taus, why is there no veto on minDeltaR_track,electron > 0.15 or minDeltaR_track,muon > 0.15?

See the above answer from the ARC action items.

Figure 19 and Tables 29-31: I'm still not sure I understand. Is Figure 19 the plot for all probe tracks, regardless of whether there is a matching tag? If it is just all probe tracks that survive the tag-and-probe requirements, then the fact that they are all plotted and that you use all combinations should mean we get the same answer. On the other hand, it could be the same-sign subtraction is the cause of the difference. In addition to providing the four numbers that go into Equation 6, can you also provide the integrals of the blue and red points in the three plots of Figure 19? I'm hoping that N_T&P and N^veto_T&P will be similar to the integrals of the blue and red points, respectively.

See: above.

L514-518 and Tables 29-31: So my understanding is that when you write "tau background" you are really intending to identify the sum of taus and single hadronic track backgrounds. I think this approach is fine but there may still be some issues with the implementation. The measurement of P_veto is going to be dominated by taus as it is measured from Z events and has the same-sign background subtracted. If you are trying to apply P_veto to the sum of taus and single hadronic track background it seems like it is necessary to show that P_veto is the same for taus and single hadronic tracks. For P_offline, the single-tau control sample will clearly be a mix of taus and single hadronic tracks as the selection is relatively loose. This is probably good. However, it will also include fake tracks as there is no same-sign subtraction in this measurement. So I think there is still the possibility that you are including fake tracks here. I guess the fact that there are basically no 4 or 5 layer tracks suggests that the fake track contribution is negligible.

We agree. The tau control region is not sufficient to study the composition of real taus versus single hadronic tracks, so there is no feasible study of fake contamination here. Even if, for example, the contamination were 50-100%, the estimates would still be statistically consistent with what we have now.

Figure 25 and Tables 29 and 31: The Figure 25 caption seems to suggest (to me) that the plots show the projection of the events in the upper-right of the red lines. In actuality, the plots show the full distribution. I would suggest changing the plots to only include the events from the upper-right of the red lines in Figures 20-22.

Instead the caption has been changed to be more accurate.

Section 6.1.5: Your response keeps mentioning that P_veto is small and therefore other samples are not useful. It may be true that other samples will not contribute when you select the signal region and compare to the background estimate. However, I would think the non-ttbar background will contribute to the measurement of P_offline and P_trigger, since these simply come from single-lepton samples. So, again, if you want this to mimic the data, I think you need to include all of the background MC samples.

See: above.

Section 6.2: Thanks for the info. I think you are correct that fake tracks are the main contributors to the Gaussian and flat portion of the dxy plot. I don't think there is any bias in the final fit that is done. However, the pattern recognition does have a bias for finding tracks that originate from the beamline. So that could be the reason. I am concerned about one statement. You write that you label a track as fake if it is not "matched to any hard interaction truth particle". Can you clarify what you mean? I am worried that you only check the tracks coming from the "hard interaction" rather than all the truth tracks (including from those from pileup). I think it would be wrong to classify pileup tracks as fake tracks.

Technically this means there is no "packedGenParticles" (e.g. status=1) object with pt > 10 GeV within deltaR < 0.1 of the selected track. This designation is not used for any measurement in the analysis; whatever the source of fake tracks, one would need to treat them in data as we've done -- we just do not separate them by source.
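A minimal sketch of this labeling criterion, with a hypothetical flat record layout in place of the actual MINIAOD objects:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    # Wrap delta-phi into (-pi, pi] before combining with delta-eta
    dphi = math.atan2(math.sin(phi1 - phi2), math.cos(phi1 - phi2))
    return math.hypot(eta1 - eta2, dphi)

def is_fake(track, gen_particles, pt_min=10.0, dr_max=0.1):
    """Label a track fake if no status-1 gen particle with
    pt > pt_min lies within deltaR < dr_max of it."""
    trk_eta, trk_phi = track
    return not any(
        pt > pt_min and delta_r(eta, phi, trk_eta, trk_phi) < dr_max
        for pt, eta, phi in gen_particles
    )

gens = [(45.0, 0.30, 1.20), (12.0, -1.10, -2.80)]  # (pt, eta, phi), toy values
matched = is_fake((0.32, 1.18), gens)    # close to the first gen particle
unmatched = is_fake((2.00, 0.00), gens)  # far from both
```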

Section 8.2.2: It is good to know that the hit efficiencies seem to be accurate. However, you also write that charginos behave like muons and so using muons is the correct way to evaluate the hit efficiency. You also write that "the reconstruction is done only with the tracker information". As I wrote earlier, for real muons, there are two additional tracking iterations that use muon detector information to improve the track reconstruction. This won't be the case for your charginos of interest because they decay before reaching the muon stations. So that is why I worry that muons are not a good substitute for your charginos.

See: above.

Section 8.2.5: This response does not really address the heart of the question. Suppose you did have infinite statistics in data and MC. Would we then be comfortable quoting no systematic uncertainty? Are we 100% sure that taking the Z pT and using that to reweight the pT spectrum of the electroweak-ino pair gives the correct distribution? Has anyone looked at applying this procedure to diboson production or ttbar production to confirm that it works?

See: above.

Section 8.2.11: Are you saying that for your 700 GeV chargino signal, the increase in trigger efficiency when going from HLT_MET120 to HLT_MET120 || HLT_MET105_IsoTrk50 is only 1%? I'm not sure this is relevant to my concern though. Suppose you have a signal that produces PFMET of 125 GeV. When you measure the efficiency for nlayers=4,5,6 tracks, you will get the same efficiency because the MC includes a HLT_PFMET120_PFMHT120_IDTight trigger in it. So, you say great, there is no systematic because the efficiencies are the same. However, in some fraction of the data this trigger path is disabled and so the efficiency would be quite different for nlayers=4 tracks (which won't get triggered) and nlayers=6+ tracks which will get triggered by HLT_MET105_IsoTrk50. So, my suggestion was to do the same study but only include triggers that are never disabled or prescaled. Basically, you are currently using HLT_PFMET120_PFMHT120_IDTight || HLT_PFMETNoMu120_PFMHTNoMu120_IDTight || HLT_MET105_IsoTrack50 to make the calculation. My suggestion is to use HLT_PFMET140_PFMHT140_IDTight || HLT_PFMETNoMu140_PFMHTNoMu140_IDTight || HLT_MET105_IsoTrack50. Alternatively, you could correctly weight the MC to account for the different luminosity periods when each trigger is active.

The increase in signal acceptance*efficiency is only 1%; the increase in trigger efficiency is as shown in Figure 44.

What we've done with the trigger efficiency scale factors is to average the data efficiency over the entire data set, which includes the histories of every path. As you suggest, we could alternatively measure a different scale factor for every data period and then take a lumi-weighted average over these periods -- but the two methods are equivalent. The loss of efficiency in 2017B is included in the data efficiency and thus in the scale factor on the signal efficiency.
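The equivalence of the two procedures is just a lumi-weighted average; a sketch with assumed period luminosities and efficiencies:

```python
# (luminosity in /fb, measured data efficiency) per period -- assumed numbers
periods = [
    (4.8, 0.62),   # a period with one trigger path disabled
    (36.7, 0.81),  # the remaining periods
]

total_lumi = sum(lumi for lumi, _ in periods)
# The lumi-weighted average equals the efficiency measured on the full
# dataset at once, since each period enters in proportion to its luminosity
eff_data = sum(lumi * eff for lumi, eff in periods) / total_lumi

eff_mc = 0.85  # simulated efficiency (assumed)
scale_factor = eff_data / eff_mc
```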

### First set of comments from Kevin Stenson HN July 14

Table 5: There seems to be a third column of percentages but there is no column heading or description of what this is in the caption or text. Please clarify.

The \pm was missing; fixed.

L208-209, Table 9: You mention the single tau trigger was prescaled. This appears to result in less than 6 fb^-1 of luminosity. Is there no other single tau trigger with less of a prescale that could be used? What is the situation for 2018 data?

Yes, Table 1 details it as 5.75/fb. The only other single tau triggers have much higher thresholds or additional requirements that are unsuitable. The prescale was higher in 2018.

L243-4: Are there eta restrictions on the jets you use? |eta|<5 or |eta|<3 or |eta|<2.4 or |eta|<2.1 or something else? In Table 17 there is a cut of |eta|<2.4 but I'm not clear if this applies to all jets used in the analysis.

Yes, |eta| < 4.5 overall; this is added to the AN. Different requirements are as listed, e.g. in Table 17. The 10 GeV has also been fixed to 30 GeV; MINIAOD contains only jets with pt > 10 GeV, but the analysis considers only those with pt > 30 GeV.

L260: I have a general idea of how electrons and muons are reconstructed but not so much with the taus. I seem to think there is some flexibility in tau_h reconstruction. Can you add some information about how the taus are reconstructed? I seem to recall that one can select (with some ambiguity) tau+ decays to pi+ or pi+ pi0 or pi+ 2pi0 or pi+ pi- pi+. Do you consider all tau_h decays or just one-prong decays? It wouldn't hurt to add a few sentences about muon and electron reconstruction as well (or at least some references).

As stated we use the POG recommended decay mode reconstruction with light flavor rejection which does target multiple tau_h decays. This selection is only used to normalize the tau_h background estimate and must be inclusive to all tau_h decays. A brief reference has been added for the PF lepton reconstruction description.

L260-4: Have you checked the efficiency for selecting the correct primary vertex in signal events? One could imagine selecting the vertex based on the origin of the isolated track and/or ISR jet.

We use the standard PV recommendation using the highest sum-pt^2 vertex (L 260-264); we have no need of a specialized vertexing requirement. Figure 26 for example demonstrates that signal tracks are well-associated with the PV already.

Table 17: I'm a little confused. I get that there must be at least one jet which simultaneously has pT>110 GeV and |eta|<2.4 and passing tight ID with lepton veto. But when you measure max |Delta phi_jet,jet|, do each of the jets in the comparison need to pass those cuts as well? If so, then this cut only applies in events where there are two or more jets with pT>110 GeV. That doesn't seem right.

See above; jets are considered if pt>30 and |eta|<4.5. So |Delta phi_jet,jet| would apply only if there are two or more jets, the minimal case being a ~110 GeV and a second ~30 GeV jet. A clarifying sentence has been added.

L332-334: It is claimed that a jet pT cut of >110 GeV removes a lot of background and not signal. The plots in Figure 11 don't seem to back this up. It seems like about the same percentage of signal and background events are removed by the cut. Can you quantify the effect of this cut (background rejection and signal efficiency)?

Figure 11 has been updated (now Figure 13 in AN v8) to show jet pt >30 GeV instead of >55 GeV, as the issue is seen at lower pt. The efficiency of the >110 GeV cut is 84.6% in data and 87.2% for the signal sample shown.

L338-341: Would be nice to show the signal and background distributions for these two variables so we can evaluate for ourselves the effectiveness of the cut. Also, it would be helpful to report the background rejection and signal efficiency for these cuts.

Added to the AN as Figure 14 beginning with version 10.

Table 18: Are there no standard Tracking POG quality requirements on the tracks? I think they still have "loose" and "highPurity" requirements based on an MVA. Do you require either of these?

These exist but we do not use them; the standard quality flags do not make requirements on the hit pattern for example.

Table 19: You need to define what sigma is in the last line. Is it the beamspot width, is it the primary vertex transverse position uncertainty, is it the uncertainty on the track position at the distance of closest approach to the beamline or primary vertex, or some combination of these?

Both dxy and sigma here refer to the dxy measurement with respect to the origin. This is made more clear in the AN.

L365-367: You write that "all" muons and electrons are used. I would like to have a more complete description of this. It may help if you add some text around L260 describing muon and electron reconstruction. For muons, Table 13 defines tight and loose ID. Is it simply the OR of these two that you use? Or do you include tracker muons? I think there is also a soft muon ID. Are these included? What about standalone muons? For electrons, Table 12 defines tight and loose ID. Is it simply the OR of these two categories that you use? Or do you loosen up the requirements further? If so, what are the requirements?

Text describing this around line 365 has been added to explain that "all" means all available in MINIAOD, which has a minimal set of slimming requirements which are now provided in the text.

L368-370 and Table 20: Would be nice to see plots of Delta R to see why 0.15 is chosen. I would have expected a smaller value for muons and a larger value for electrons.

We will produce N-1 plots to show this.

L363-370: Just to be clear, there are no requirements on the pT of the leptons? So, if there is a 4 GeV muon within Delta R of 0.15 of a 100 GeV track, then you reject the track?

As above these are all those available in MINIAOD, which has a very minimal set of slimming requirements. For example muons passing the PF ID have no pt requirement whatsoever. If such a muon as you write is near our track, yes we reject it.

L389-91 and Figure 14: It would be good to plot Figure 14 with finer bins (at least between 0 and 10 GeV) to back up this statement.

The statistics are very limited and the intent is to show the separation between 0-10 GeV and above 10 GeV. The text now reads "remove almost all background", which Figure 14 supports.

Figure 16: Why is the first bin at ~30% rather than 100%?

Fixed.

AN Table 22: Why don't you apply the full set of vetos for all lepton flavors? Is it an attempt to increase statistics? Can you perform the test with all vetos applied to see if the results are consistent?

All three sets of requirements are applied in the signal region. In measuring each flavor's P(veto), it's necessary to retain the veto against other flavors to maintain purity of the flavor under study. For example when studying muons, one would still require ECalo < 10 GeV. So this is tighter than what you suggest, to achieve better purity of each flavor.

L456-460: Your assumption here is that the same-sign sample has the same production rate as the background. Have you verified this? You could verify it with a high statistics sample of dilepton events (or come up with a scaling factor if it is not exactly true). Also, in L457-458 you list three sources of background: DY, non-DY, fake tracks. I don't see how a same-sign sample can be used to estimate the DY background? I would suggest calling DY part of the signal. For di-electron and di-muon events, you also have the possibility of using the sidebands around the Z mass to estimate the background. You could check that this gives consistent results.

This method was suggested by you in the review of EXO-16-044. The language of the AN regarding continuum DY has been updated for clarity as you suggest. It is not relevant to the measurement of P(veto) to estimate the non-Z backgrounds in our tag-and-probe samples, so we do not -- the purpose of the same-sign subtraction is to increase the purity of the lepton flavor under study.

L468-9: It would be helpful to provide a table giving the values for the 4 numbers in Equation 6 for each of the 9 cases (3 leptons * 3 nlayer bins). I would like to get an idea of the signal-to-background ratio. I may also want to calculate the effect of subtracting the background versus ignoring it.

In making this table we found some bugs in the P(veto) script. Firstly, the N_{SS T&P} term was not being subtracted in the denominator of Equation 6; for nlayers>=6 this is an extremely trivial issue, but it is relevant in the newer, shorter categories. Secondly, in cases where the numerator of Equation 6 is negative, the N_{SS T&P}^{veto} subtraction was ignored, when instead the numerator should be taken as 0 +1.1 -0. This slightly changes some estimates.
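A sketch of the corrected bookkeeping for Equation 6, with illustrative counts; the +1.1 stands in for the one-sided Poisson upper uncertainty assigned to a null yield:

```python
def p_veto(n_tp, n_ss_tp, n_tp_veto, n_ss_tp_veto):
    """Same-sign-subtracted veto probability.

    Returns (value, upper_error); the upper error is filled only for
    the null-yield case, other uncertainties being handled elsewhere."""
    numerator = n_tp_veto - n_ss_tp_veto
    denominator = n_tp - n_ss_tp  # SS subtraction also in the denominator
    if numerator < 0:
        # A negative subtracted yield is taken as 0 +1.1 -0
        return 0.0, 1.1
    return numerator / denominator, None

value, err_up = p_veto(50000, 400, 1, 2)  # negative numerator -> 0 +1.1 -0
value2, _ = p_veto(50000, 400, 3, 1)      # ordinary case
```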

L472-5: Are these recHits from the tracker or the calorimeter? I don't really understand what you are describing here. Are the electron seeds from the ECAL or pixel detector?

Throughout the documentation, recHits refers to calorimeter recHits. Electron seeds are ECAL superclusters matched to seed tracks in or close to the pixels -- so electron seeds are from both. We've reworded this section for increased clarity.

L482: It looks like the probe tracks already have a pT cut > 30 GeV. So going down to 20 GeV is just being extra safe. Is that right?

Correct.

L478-487: It is not clear to me. Is this re-reconstruction needed for the signal region or not? Was it done for the signal region?

It is not needed for the signal region and is not done, as track pt > 55 GeV is above the threshold of 50 GeV.

Figure 19 and Tables 29-31. If I try to integrate the plots in Figure 19, I would estimate that the integral of red/blue is roughly 10^-6, 10^-7, and 10^-4, for electrons, muons, and taus, respectively. I would expect this to be approximately equal to P_veto. But in Tables 29-31, I find P_veto numbers of 10^-5, 10^-6, and 10^-3 for electrons, muons, and taus for nlayers>=6. So roughly a factor of 10 off. Can you explain this?

Recall in the review of EXO-16-044 you recommended we utilize all possible tag-and-probe pairs in every event, on top of performing the same-sign subtraction. Figure 19 shows all probe tracks in all events; often there are multiple probes that can be chosen as the tag-and-probe combination. As you say these figures are related to the value of P(veto), but are not precisely equal.

Figures 19, 21, 25, and 40 and page 72: I would suggest removing footnote #17 on page 72 and adding that information into the captions for figures 19, 21, 25, and 40. You should also add a similar explanation to the caption of Table 31 indicating that N_ctrl is scaled to the signal region luminosity.

We have added the mention to Table 31's caption, but the authors feel the luminosity label is sufficient in those figures.

Figure 22: Would be good to show the results for nlayers=4 and nlayers=5 (unless there are no entries in which case you should note that in the caption), similar to the way you show the results for tau for nlayers=5 even though it is not used.

There are 1 and 2 events for nlayers=4 and =5, respectively; thus these plots were found unhelpful. A comment has been added to the caption of Figure 22 to mention this.

L514-518 and Tables 29-31: I note that P_offline for electrons and muons is very similar, around 80%, while for taus it is much lower, around 20%. Do you understand the difference and do you think it is OK for the method? I can imagine two effects that could cause this. First, it could be that since the pion from the tau decay does not carry all of the tau momentum, the tau candidates from W decays will have a lower pT than the muon or electron candidates from W decays. So when the tau pT gets added to pTmiss, it will get shifted less than when the electron pT gets added and so more will fail the Ecalo cut. Based on Figure 25, this seems to be true and is probably innocuous.

But I think there is more to it. Comparing Figures 20 and 21, the modified pTmiss for the tau case has a large contribution at the bottom left corner that is not present in the electron (or muon) case. It seems that the electrons and muons are very consistent with the topology of a W recoiling from an ISR jet so delta phi is ~pi. However, for the tau case, there seem to be many events where the "tau" is part of the leading jet. I guess that since there is a Delta R cut of 0.5, the "tau" must differ from the jet a bit in eta. Given this evidence and the fact that we know that the tau purity is much worse than electron and muon purity, it seems likely that many of the events in Figure 21 do not contain taus. I would guess the events are multijet QCD events with either an isolated track by chance or a fake track.

So, my hypothesis is that the single tau control region has a large contamination of non-tau events. You use the same sample for measuring P_offline and then multiply by P_offline as part of estimating the tau background. So we could consider this estimate as being the tau+single hadronic track+fake track contribution. But there are two problems with that. First, P_veto is measured on a much purer sample of taus as it uses Z decays and subtracts the same-sign contribution. So P_veto is really measuring taus, not the sum of tau+single hadronic track+fake track. And P_veto may not be the same if the other contributions were included. Second, you have a separate measurement of the fake track contribution so you would be double counting. Please let me know what you think.

Our interpretation of the lower modified-MET distribution for taus is the same as yours, and we too feel it is innocuous.

We intentionally capture the "single hadronic track" component as part of the tau estimate. These do contribute as a background and are included as part of our "tau background"; the analyzers feel that calling this the "tau and single hadronic track background" would be a distraction for the reader, beyond the one mention in L454-455. We capture this contribution in two ways: firstly, as other reviewers have noticed, our hadronic tau ID is fairly loose, and secondly, we remove the requirement deltaR(track, jet) > 0.5. Thus the tau P(veto) considers all of these contributions, and you see in Figure 21 that some of them have a lower probability to pass the deltaPhi(jet, MET) requirement, which is then included in P(offline). For clarity we have added a reminder of the deltaR(track, jet) cut removal to L455 in the AN.

Lastly the "fake track" contribution will not survive the same-sign subtraction, whereas the "single hadronic track" contribution will. So we are not concerned about double-counting the fake tracks.

Section 6.1.3: I think you need to be a little more clear here. I want to confirm that I understand. First, the figures mention HLT efficiency but I think this is really the full trigger efficiency (L1+HLT). Do you agree? Second, the trigger efficiencies shown in Figure 23 and 24 are the actual results from the L1+HLT that was run and the x-axis refers to the actual pTmiss,nomu of the event. That is, the x-axis is not the modified pTmiss,nomu where the electron pT is added back in. Is that correct? Then, the x-axis of Figure 25 shows the modified pTmiss,nomu with the electron or tau pT added back in. Is that correct? Try to make the text and figures a bit clearer.

Figures 23 and 24 now are labeled as just "trigger efficiency", and the caption for Figure 25 has been made more clear.

Figure 25 and Tables 29 and 31: Figure 25 seems to show that the electron distribution is shifted higher than the tau distribution. Therefore, once you convolute this distribution with the trigger efficiency, I would expect the electron trigger efficiency to be higher than the tau trigger efficiency. However, the opposite is true. If I naively take the trigger efficiency as a step function which is 0% for pTmiss<200 GeV and 100% for pTmiss>200 GeV, I think I get about 30% for electrons and 13% for the tau, compared to 46% and 52%. Can you check the results and if correct, try to explain what I am missing?

Your numbers are correct if you integrate Figure 25 across the entire MET range. However P(trigger) is a conditional probability after P(offline) has already required metNoMu > 120 GeV, so that must be applied.
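
The effect of this conditioning can be illustrated with a toy sketch (the spectrum and turn-on curve below are invented for illustration, not the measured ones):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy modified-MET spectrum and a toy sigmoid trigger turn-on; both are
# placeholders, not the analysis's measured distributions.
met = rng.exponential(scale=80.0, size=100_000)
turn_on = 1.0 / (1.0 + np.exp(-(met - 200.0) / 25.0))
fired = rng.random(met.size) < turn_on

# Integrating the turn-on over the full spectrum vs. conditioning on the
# offline metNoMu > 120 GeV requirement already imposed by P(offline):
eff_all = fired.mean()            # naive integral over the whole spectrum
eff_cond = fired[met > 120.0].mean()  # conditional probability, as in the AN
# eff_cond > eff_all, since the offline cut removes the low-MET tail
```

The conditional efficiency is necessarily larger, which resolves the apparent discrepancy between the naive step-function estimate and Tables 29 and 31.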

Table 31: How did you determine the uncertainties for nlayers=4? The upper uncertainty of 0 for N^l_ctrl and the upper uncertainty of 0.0058 for the estimate seem too small. Actually, nlayers = 4 and nlayers = 5 for both muons and taus (Tables 30 and 31) have yields that are too small to assume Gaussian uncertainties. You should use Poisson uncertainties. You can ask the statistics committee for better advice, but I think more correct uncertainties would be:

N = 0: +1.15 -0
N = 1: +1.36 -0.62
N = 2: +1.52 -0.86

This comes from the prescription on page 32 of http://pdg.lbl.gov/2019/reviews/rpp2018-rev-statistics.pdf but again, the statistics committee may have another prescription. Note that I think this results in an estimate for the tau background for nlayers=4 of 0 +1.9 -0.0 rather than 0 +0.0058 -0.

This has been corrected to "0_{-0}^{+8.2}" using Poisson errors; the +1.15 must be multiplied by the tau trigger prescale. This is also corrected in the total lepton and total background statistical uncertainties. Only the table values needed correction; the upper limits already used the correct values.
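
For reference, small-count Poisson intervals can be computed by bisection without any statistics library. The sketch below implements the Garwood central-interval convention; note this differs slightly from the shortest-interval prescription quoted above (e.g. it gives +1.84 rather than +1.15 for N = 0):

```python
from math import exp

def poisson_cdf(n, mu):
    """P(X <= n) for X ~ Poisson(mu), summed term by term."""
    term = total = exp(-mu)
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def garwood_interval(n, cl=0.6827):
    """Central confidence interval [lo, hi] for a Poisson mean given an
    observed count n, found by bisection (the CDF decreases in mu)."""
    tail = (1.0 - cl) / 2.0

    def solve(target, k, lo=0.0, hi=100.0):
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if poisson_cdf(k, mid) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lower = 0.0 if n == 0 else solve(1.0 - tail, n - 1)
    upper = solve(tail, n)
    return lower, upper

# For a prescaled control sample, the interval scales linearly with the
# prescale, as in the tau estimate above.
lo0, hi0 = garwood_interval(0)   # n = 0 -> (0.0, ~1.84)
```

Whichever convention is adopted, the key point stands: for yields of 0-2 events, Gaussian errors badly misstate the uncertainty.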

Section 6.1.5: There needs to be more information. Do you calculate P_veto, P_offline, and P_trigger for each of the leptons using simulated samples following the same recipe as for data? If so, what simulated samples? Do you just use Z->ll for P_veto and W->lnu for the others? Does the single lepton control region come from just W->lnu events? Or do you include all the background samples in Tables 7-8? I think using all of the samples from Tables 7-8 for every calculation would make the most sense.

With the modifications given, we calculate the background estimates in precisely the same way as in data; the AN has been clarified on this point. For the lepton closure we use only the ttbar samples, because the small P(veto) means the other samples do not contribute meaningfully to the statistics of this study.

Section 6.1.5: It seems like even with the relaxed selection criteria, you are still quite lacking in MC statistics for this test. You mention that you only include ttbar events. Is this just the ttbar semileptonic sample? Although this may be the largest single sample, it seems like you could also include other samples. Most importantly would be the W->lnu and Z-> invisible (including the HT-binned samples) as these seem to be the largest source of background in Figures 14 and 15. Is there some reason you didn't include these? If not, I suggest you go ahead and do this.

All three ttbar samples are used, but the dileptonic ttbar sample contributes the most. Keep in mind that P(veto) uses a tag-and-probe selection, so samples such as W->lnu and Z->invisible will not contribute significantly. The Z->ll sample should contribute, but those samples are considerably smaller than the ttbar samples.

Figure 26: Are all three results properly normalized to 41.5 fb^-1? If so, it seems like we should be including 3 layer tracks in the signal region because I can exclude a ct=10cm with this plot alone (observe 350 with a prediction of 125), which you can't do with the whole analysis.

Yes, the results are properly normalized, but there is no observation in Fig. 26: all the entries are MC. If we were to include data, we would expect the fake contribution to the 3-layer tracks to be orders of magnitude larger, making any exclusion difficult without a dedicated analysis.

Figure 27: Would be good to add the nlayers=4 result as is done in Figure 28. Should also say what simulated samples are included here.

All samples are used, and the caption is updated. The cleaning cuts are only relevant for the nlayers=3 category, which is only used in an appendix (after the edits suggested below). Showing this for nlayers=4 would provide the same information as Figure 31.

Section 6.2: This needs to be cleaned up and explained better. Here are some specific comments/suggestions - I think that L566-569, Figures 26-28, and L602-615 can all be removed. It seems like they have nothing to do with the analysis that is done. They just lead to confusion. If you want to move this material to Appendix C, that is fine. But don't clutter up this section.

These have been moved to Appendix C.

- I'm confused by the transfer factor. I assume that the fit in Figure 29 is actually a Gaussian + flat line. Is that correct? What is your hypothesis about what is contained in the Gaussian area and what is contained in the flat line area? I would have assumed that Gaussian contribution indicates real tracks (since they peak at d0=0) and the flat line contribution indicates fake tracks. But this doesn't seem to match your hypothesis. Can you say exactly what the fit in Eq. 14 is doing? In L629-630 you quote a single transfer factor for each Z mode. Shouldn't there be a different transfer factor for each of the 9 sideband regions?

a) Correct, the fit is a Gaussian + constant. The AN has been clarified.

b) Our hypothesis is that this is a bias in the track-fitting algorithm, where short tracks with very few hits have the importance of the primary vertex inflated, drawing tracks closer to the PV. Figure 27 shows this also occurs in SM background MC, and in those MC samples none of the tracks are near any hard-interaction truth particle; this is precisely our definition of "fake" in MC truth.

c) The purpose of the transfer factor is only to normalize the sideband rates to the signal region. We must describe this normalization in a way that does not depend on observing the signal region count, because in nlayers=5, >=6 the statistics do not allow for that. That is what the fit in Eq. 14 does.

d) L629-630 quotes the transfer factor for the baseline sideband (0.05, 0.10) cm, so only one. The authors felt that Table 35 was large enough already, but we now provide an additional table listing the P^raw_fake and transfer factors.
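
As a sketch of this normalization (with entirely hypothetical parameter values standing in for the Gaussian + constant fit results of Eq. 14), the transfer factor is the ratio of the fitted model's integral over the signal region |d0| < 0.02 cm to its integral over the sideband 0.05 < |d0| < 0.10 cm:

```python
from math import erf, sqrt, pi

# Hypothetical fit parameters for f(d0) = A * exp(-d0^2 / (2 sigma^2)) + C;
# illustrative numbers, not the AN's fitted values.
A, SIGMA, C = 100.0, 0.03, 20.0

def model_integral(lo, hi, a=A, sigma=SIGMA, c=C):
    """Integral of the Gaussian + constant model over one side [lo, hi] (cm)."""
    gauss = a * sigma * sqrt(pi / 2.0) * (
        erf(hi / (sigma * sqrt(2.0))) - erf(lo / (sigma * sqrt(2.0))))
    return gauss + c * (hi - lo)

# Transfer factor: signal region |d0| < 0.02 cm over sideband 0.05-0.10 cm.
tf = model_integral(0.0, 0.02) / model_integral(0.05, 0.10)

# The signal-region estimate is then tf times the observed sideband count.
n_sideband = 15          # hypothetical count
estimate = tf * n_sideband
```

The fit thus supplies only the shape needed to extrapolate from sideband to signal region; no signal-region count enters.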

- What best describes your assumption of the fake track rate as a function of d0. Is it uniform (flat), Gaussian, Gaussian+flat, or something else?

Gaussian + flat.

- I don't see the advantage of having 9 different sideband regions. Simply take the sum of events from 0.05-0.5 and multiply by the overall transfer factor. This should minimize the statistical uncertainty. In fact, I would suggest combining the Z->mumu and Z->ee samples as well. Also, remember to use the correct Poisson uncertainties (as discussed for Table 31) when you only have a handful of events. If you somehow think it is a good idea to have 18 different measurements instead of 1 and you are using a transfer factor with an uncertainty, make sure to properly account for the fact that this uncertainty is correlated for different bins.

As discussed in the first ARC meeting, we've used a single larger sideband. The fit uncertainty is employed as a nuisance parameter 100% correlated between bins.

- L635-638: As mentioned above, I would suggest combining the Z->mumu and Z->ee results to get the final estimate, seeing as you are statistics limited. You can still use the difference between the two as a systematic uncertainty (but see below).

Doing as you suggest is acceptable but would not change the estimate very much. This was discussed in the first ARC meeting and will be revisited.

- L640-645: It is obvious that Z->mumu and Z->ee events are quite similar. They have the same production mechanism, they are selected by single lepton triggers, etc. So, it is not much of a test to show that they give the same result. On the other hand, your signal region requires large missing ET, a high pT jet, and a high pT isolated track that is neither a muon or electron. One might worry that the fake track rate depends on the amount of hadronic activity in an event, which is likely higher in the signal region than in Z events. One might also worry that the fake track rate depends on pileup, and the signal trigger/selection may be more susceptible to pileup than the single lepton trigger/selection. Ideally, I would suggest that you perform the same measurement on a QCD dominated region (like requiring a dijet or quadjet trigger or just high HT). You can require pTmiss,no mu < 100 GeV to ensure no signal contamination. If this is not possible, then you could consider taking what you have and either reweighting the pileup and HT distribution to match the signal region or checking that the fake rate is independent of these quantities.

See the above response (from "ARC action items from July 25 meeting"). Small differences exist between pileup in these different samples, but reweighting for those differences does not change the estimate.

- L649-653: I don't understand how these numbers are consistent with Figure 29. In Figure 29 (left) it seems there are about 9 events with |dxy|<0.02cm and about 15 with 0.05<|dxy|<0.10cm to be compared with 32 events and 68 events. There is a similar discrepancy for electrons. I guess the plots have been scaled for some reason as the entries are not integers. Please fix the plots and verify the results are consistent.

The scaling of the plots is now fixed, and agrees with the text.

- Figure 29: Why do you not fit the region |dxy|<0.1cm? If you fit out to |dxy|=1.0cm, please show the entire range in the plots. It would be nice to see the results for nlayers=5 and 6 as well so we can evaluate the extent to which a fit may or may not be possible and whether the shape is consistent with nlayers=4.

The AN has been updated to correctly reflect the fit extending to |dxy| < 0.5 cm, the range of the plots. If the d0 peak actually contained real tracks, it would peak more narrowly than observed in the sidebands; for this reason |dxy| < 0.1 cm is excluded from the fit, and the count of nlayers=4 tracks in the signal region is checked against the fit prediction, with which it agrees.

Shown below are the nlayers=5 d0 distributions, with the fit from nlayers=4 overlaid. The nlayers>=6 samples have one (three) events in ZtoMuMu (ZtoEE), so no fit is possible.

[Figures: ZtoMuMu NLayers5 and ZtoEE NLayers5 d0 distributions]

Section 6.2.2: In Table 36, it would be enlightening to show the same results as in Table 35. That is, I am curious as to how P_fake compares between data and MC. Are these results normalized to 41 fb^-1? If so, then it seems like the MC predicts about 1/5 as many fake tracks as data. It is hard to be confident that the MC tells us anything if that is so.

We felt that Table 35 was distractingly large, and the low MC statistics make things worse for Table 36. Since we are moving to a single sideband, Table 35 is less relevant as well. Table 36 is normalized to 41/fb. The value of Table 36 is the test of closure of the fake-estimate method, rather than the absolute rate of fake tracks in simulation; the latter has always been an issue, which is why the data-driven estimate is a must.

Section 6.2.2: Your hypothesis is that the fake track rate is independent of selection so you can use the Z data to estimate the fake track rate in your signal region. I have suggested that you could also measure the fake track rate in QCD events to verify this. You can also check the effect in MC. I guess in Section 6.2.2 you apply the same criteria to MC as you do for data (selecting Z events). However, if your hypothesis is true, then you should also get the same fake rate if you use any MC sample. What happens if you use all the samples in Section 3.3 but remove the Table 33 and 34 requirements so you are using all events? If P_fake changes significantly, this is cause for concern. If not, then that is good. In either case, it still may not prove anything if the MC is really predicting 1/5 the amount of fake tracks.

As above, the absolute rate of fake tracks in simulation is not well trusted, so the factor-of-5 difference with respect to data does not concern us. Certainly one could use additional MC samples and change the selection, but this then deviates from the treatment in data and in principle is no longer the same closure test. If a third selection/sample were used, the closure test in MC would also need to use it.

Figure 31: Would be good to have a plot for nlayers=5 as well.

We will produce this plot.

Figure 35: Please include the ratio of the two since this provides the scale factors that are used. It may be better to simply include the region of 50-300 GeV on a linear scale.

We will produce this plot. The 2018 data processing was prioritized over this.

L752-754: While this signal yield reduction is interesting, just as interesting would be the change after all cuts are applied (with nlayers>=4). Can you provide this as well?

We will produce this. The 2018 data processing was prioritized over this.

L760-762: How sure are we that the Z can be used to measure ISR difference for the signal model? I generally agree with the statement that both recoil off ISR but it would be nice if this could be confirmed somehow. Does the pT distribution for a 100 GeV chargino look similar to a Z in Pythia8? Does the ISR reweighting work for ttbar events or diboson events?

See the above answer from the Email questions from Kevin Stenson August 27.

L772-773: Please expand on "is applied to the simulated signal samples". Do you reweight the events using the ISR jet in the event or the net momentum of the produced SUSY particles or something else.

The vector sum pt of the gen-level electroweak-ino pair is used to evaluate the weights. The AN is clarified.
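
A minimal sketch of this weighting (the binning and ratio values below are hypothetical, not the measured correction):

```python
import numpy as np

def ewkino_pair_pt(px1, py1, px2, py2):
    """Vector-sum pT of the two generator-level electroweak-inos."""
    return np.hypot(px1 + px2, py1 + py2)

# Hypothetical weight histogram: bin edges in GeV and a data/MC-style
# ratio per bin (illustrative values only).
EDGES = np.array([0.0, 50.0, 100.0, 200.0, 400.0, np.inf])
WEIGHTS = np.array([1.3, 1.1, 1.0, 0.9, 0.8])

def isr_weight(pair_pt):
    """Look up the per-event ISR weight for a given gen-level pair pT."""
    return WEIGHTS[np.searchsorted(EDGES, pair_pt, side="right") - 1]
```

Each signal event is reweighted by `isr_weight(ewkino_pair_pt(...))`; the weight depends on the electroweak-ino system's recoil, not on any reconstructed quantity.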

Section 8.2.2: Please expand on this. I am very surprised that this is such a small effect. Given the problems encountered with the Phase 1 pixel detector (problems with timing levels 1 and 3, way more noise than expected in layer 1 causing high thresholds, DC-DC converter failures, etc.) I would have expected big differences between data and simulation on quantities requiring hits in the pixel detector. I know the tracking reconstruction was changed at HLT and offline to keep track reconstruction efficiency high but this doesn't remove the problem of missing pixel hits. So please expand on how you measure these uncertainties. Do you just use tracks with pT>55 GeV that are associated with a muon? One problem with using muons to evaluate tracking efficiency is that there are special track reconstruction techniques developed to recover muons missed by the standard tracking. These tend to use wider windows to discover silicon hits and so may not reflect the track reconstruction of "standard" charged particles. You could perhaps remove the electron and tau vetos to see what you get in those cases.

The AN describes the process correctly. The global tag used for signal was produced well after 2017 data-taking was completed, and has updated hit efficiencies. Furthermore this efficiency is very high, which sets the scale of this value: for missing middle hits, the inefficiency differs by 4.5% between data and MC, but it is the efficiency that differs by only 0.02%.

Before the chargino decays, signal tracks are muon-like and would be treated the same way. The muon control region is still a track selection (pT > 55 GeV, MET > 120 GeV, jet pT > 110 GeV), and the reconstruction is done only with the tracker information. The electron/tau vetoes need to remain so that this sample is dominated by muon tracks and is comparable to the signal tracks before they decay.

Section 8.2.5: This seems like an underestimate of the systematic uncertainty. If you had infinite data and MC statistics, your systematic uncertainty would be 0. As mentioned above, this doesn't address whether measuring the ISR using Z->mumu decays translates exactly into the ISR for the signal process. The paper mentions this is up to a 350% correction, so it is a big effect. I am very worried that the systematic uncertainty does not cover all that we don't know. I note that Figure 37 shows results with pT and pTmiss. Why did you use pT? Perhaps pTmiss could also be used as a systematic check.

Most of this uncertainty comes from the data/MC correction at lower sum-pT, where we do not have infinite data statistics. Moreover these statistical uncertainties are largest where our signal populates least, which lowers this systematic uncertainty. The pTmiss is a useful cross-check which was requested by the conveners, but the sum-pT of a dimuon system has much better resolution than pTmiss; it is a very common tool for characterizing the hadronic recoil in many analyses.

Figure 43: I would suggest including the same comparison vs pT from the document you reference. This shows that for pT>55 GeV, the differences are similar as for lower pT.

Section 8.2.11: Please explain this better. My understanding is that various prescales were in place. In the original measurement of the trigger efficiency in Section 7.3, you rely on being above the track requirement plateau to measure the trigger efficiency versus pTmiss in data (which has all of the various prescales naturally included) and compare to MC (which just has an OR of all trigger paths, I think). This is the main reason why the data efficiency is lower than the MC efficiency. Is this correct? Now, in this section, you are measuring the trigger efficiency solely with MC, which is an OR of all trigger paths. So if the trigger path with the track requirement fails, the MC might still find that an MHT only trigger will fire, while in data the MHT-only trigger may be prescaled. Isn't this a problem? Can you perhaps repeat the exercise using only triggers in MC that were never prescaled (as the opposite extreme to assuming there was never any prescale)? Also, why would you average over the chargino lifetimes? Shouldn't this systematic uncertainty depend very strongly on chargino lifetime?

Several triggers were disabled in portions of 2017, so not precisely "prescaled", but we understand the point. The main difference between data and MC in Section 7.3 is that this history is not in the simulation; this operational history is averaged over in data and applied to the simulation with these weights.

For this section, consider a simple worst-case scenario: the HLT_MET105_IsoTrk50 path, which was disabled for 2017B (10% of the luminosity), and a 100% enabled path HLT_MET120. In 2017B conditions the efficiency is solely that of HLT_MET120, and one over-estimates the trigger efficiency by the difference in efficiency between HLT_MET120 and the OR of the two. Figure 44 and the systematic in Table 45 show this difference to be very small (~1%), and it would only apply to the 10% of the data in 2017B, so it is well covered by this systematic. Ignoring the IsoTrk50 path, the triggers dominating the MET turn-on from Table 9 are HLT_PFMET(noMu)120_PFMHT(noMu)120_IDTight, which is very similar to this simple worst-case example.
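
The worst-case arithmetic can be written out explicitly (all efficiency values illustrative):

```python
# One path disabled for a fraction f_B of the luminosity (2017B, ~10%),
# while the simulation assumes the full OR of paths everywhere.
f_B = 0.10           # fraction of luminosity with HLT_MET105_IsoTrk50 disabled
eff_met120 = 0.90    # illustrative efficiency of the always-on path
eff_or = 0.91        # illustrative efficiency of the OR (~1% higher)

# Luminosity-weighted efficiency actually seen in data vs. the MC assumption:
eff_data = f_B * eff_met120 + (1.0 - f_B) * eff_or
eff_mc = eff_or

# The overestimate is f_B * (eff_or - eff_met120): the ~1% path difference
# diluted by the 10% luminosity fraction, i.e. ~0.1% in this example.
overestimate = eff_mc - eff_data
```

This is why the ~1% OR-vs-single-path difference, applied to only 10% of the data, stays well inside the assigned systematic.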

Lastly, if the charginos are reconstructed at all, they would be reconstructed as muons and will not be included in metNoMu. The only way they can contribute to metNoMu is by affecting the recoil of the ISR jet, which is why we average over chargino lifetime but measure this systematic separately for each chargino mass.

Section 8.2.11: Per my discussion of pixel issues in 2017. It is relatively easy to get 5 pixel hits with only 4 pixel layers as in order to make a hermetic cylindrical detector with flat sensors, you need to have overlaps. These overlaps are largest in the first layer, which is where there were significant issues with the Phase 1 pixel detector. So I am concerned that if the MC is optimistic about layer 1 hits, then relying on the MC may not be wise. Maybe you can check the following. Take good tracks (not muons but large number of hits with pT>50 GeV). Check the fraction of tracks that have two layer 1 pixel hits compared to one layer 1 pixel hits between MC and data. Or, more generally, the average number of pixel hits. If they differ, then you could see how many times you would go from 5 pixel hits to 4 pixel hits in data vs MC and use this difference as another estimate of the difference in trigger efficiency.

We do not have the entire hitPattern contents histogrammed, so this is not quickly answerable as suggested. However, in the electron control region (pT > 55 GeV) with nlayers=4, 11% of tracks have more than 4 pixel hits, whereas signal ranges from 8-12%. This comparison, however, would only describe the offline association of hits to tracks, which is known to be better than the online association; so one would still need to examine the trigger fires, as Section 8.2.11 does, to get a clear view of the difference in trigger efficiencies.

Here are also some brief comments on the paper:

Whenever you have a range, it should be written in regular (not math) mode with a double hyphen in LaTeX and no spaces, i.e. "1--2". Done correctly in L44; incorrect in L186, L285, L321, L326, L327, and L331.

Fixed.

In Section 2, I think it would be good to give more information about the tracker, especially the Phase 1 pixel detector. It is pretty important to know that we expect particles to pass through 4 pixel layers.

Lines 28-31 should establish the extra categories that are possible thanks to the upgrade. A sentence listing the positions of the layers/disks has been added to Section 2.

Should mention the difference between number of hits and number of layers with a hit.

Lines 185-187 have been expanded to mention this.

L60-67: At the end you talk about physics quantities like tan beta, mu, and the chargino-neutralino mass difference. In principle, I believe the lifetime is set by the masses (mainly mass difference) of the chargino and neutralino. I think you need to be clear that the lifetimes are changed arbitrarily and also give the mass difference (could just say 0.2 GeV).

Reworded to mention this more clearly. Typically the mass values are included in the HEPData entry because they vary, and space in a letter is too limited to include a full table.

L113: pTmiss and pTmiss no mu should be vectors

Fixed.

L128: Should say why |eta|<2.1 is used.

L131: need to specify the eta and pT requirements on the jets, perhaps in L103-107.

The pT requirement is now specified. The jet eta requirement is |eta| < 4.5, which for tracks with |eta| < 2.1 includes all of them, so it is left out.

L157: Should describe hadronic tau reconstruction. Could be at the end of L91-102 where electrons, muons, and charged hadron reconstruction is described.

The PubComm does not give a recommendation for this, and typically hadronically decaying taus are included in the mention of charged hadron reconstruction.

L168,L177: Given that your special procedure removes 4% of the signal tracks, it is natural to wonder what fraction of the signal tracks are removed by the requirements of L159-168.

L179: Commas after "Firstly" and "Secondly"

Fixed.

L190: Should make it clear that leptons here refers to electrons, muons, and taus.

Okay.

L194, 196, 222, 231: The ordering of P_offline and P_trigger in L194,196 is different than in L222,231. Better to be consistent.

Fixed in L193-197.

L204: I think you mean "excepting" rather than "expecting"

Fixed.

L214: I don't think you need the subscript "invmass" given that you define it that way in L213.

Removed.

L222: Change "condition" to "conditional"

Fixed.

L227: p_T^l should be a vector

Vectorized.

L234-238: This will need to be expanded to make it clear

Reworded somewhat to improve clarity.

L247: I don't think it is useful to mention a closure test with 2% of the data. I mean a 2% test may reveal something that is horribly wrong but it is not going to convince anyone that you know what you are doing.

Removed.

L339 and Table 3 caption: Suggest changing "signal yields" to "signal efficiencies"

Changed.

Table 4: I guess to match the text it should be "spurious tracks" instead of "fake tracks"

Changed.

Lots of the references need minor fixing. The main problems are:

- volume letter needs to go with title and not volume number: refs 2, 8, 26, 27, 30, 39, 40, 41
- only the first page number should be given: refs 2, 19, 30
- no issue number should be given: refs 8, 13, 31
- PDG should use the BibTeX entry given here: https://twiki.cern.ch/twiki/bin/view/CMS/Internal/PubGuidelines
- ref 40 needs help

### Questions from Juan Alcaraz (July 5)

Regarding the Z (or ewkino pair) recoil correction, are you really performing the following two steps for the signal: 1) reweight from Pythia8 to MG as a function of the recoil pT; 2) reweight the resulting signal MC again according to the data/MC observed recoil spectrum in Z->mumu events? Also, let me ask again (you probably answered this at the pre-approval meeting, but I forgot): was the data/MC discrepancy at low dimuon pT just due to the lack of MC dimuon events at low invariant mass? (This should be irrelevant given the ISR jet cut used in the analysis, but just to understand.)

This is correct, we apply both weights. In the ARC review of EXO-16-044 (of which Kevin Stenson was chair and will recall), it was noted that applying only the data/(MG MC) correction would correct only the MG distribution to that seen in data. As our signal is generated in Pythia, we first need to correct Pythia's distribution to MG's; otherwise the data/MC correction is not applicable.

Yes, the discrepancy at low dimuon pT is driven by the Drell-Yan samples; in 2017 the available samples were M > 5 GeV, and M > 10 GeV in 2018. And yes, this is irrelevant given the ISR cut.
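
The two-step correction amounts to multiplying two ratio weights per event; in this sketch both ratio functions are hypothetical stand-ins for the histogram ratios actually used:

```python
def w_pythia_to_mg(recoil_pt):
    """Hypothetical MadGraph/Pythia ratio vs. recoil pT. Pythia undershoots
    at high recoil, so the weight grows and saturates at high recoil
    (cf. the weights of 3-4 above ~250 GeV mentioned in this review)."""
    return 1.0 + 3.0 * min(recoil_pt, 400.0) / 400.0

def w_mg_to_data(recoil_pt):
    """Hypothetical data/MG correction measured in Z->mumu (flat here)."""
    return 0.95

def recoil_weight(recoil_pt):
    """Total per-event weight: Pythia -> MG first, then MG -> data."""
    return w_pythia_to_mg(recoil_pt) * w_mg_to_data(recoil_pt)
```

The ordering matters conceptually: the data/MG ratio only corrects an MG-like spectrum, so a Pythia-generated sample must first be brought to the MG shape before that ratio applies.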

If one of the trigger paths has a tighter cut (5 hits) than the offline cut, why did you not redefine the offline cuts and require >= 5 hits when ONLY that trigger path fired? I do not see any right to assume that we can count on an extra efficiency that does not really exist, even if it is small. Am I missing anything?

There is a non-zero probability that a track has multiple hits associated to it in the same pixel (9% in one signal sample for example). This allows tracks with only 4 layers to have >=5 hits and fire the IsoTrk50 leg.

As you suspect, the addition of this trigger has a small effect on the efficiency for the nlayers = 4 bin, as shown in the left plot below (of course for the nlayers >= 6 bin the effect is much larger, as shown in the right plot).

On a tangent matter: when can we expect to have any kind of 2018 results? Despite the suggestion from the EXO conveners I am a bit uncomfortable with considering this step a trivial top-up operation in an analysis like this one. We know by experience that each new year can give rise to new features and significantly change the rate of pathological background events that we have to consider...

The section above will for now provide immediate updates on 2018 results. See also this recent update for 2018 ABC.

## Pre-approval

Some questions and comments were given during the Aug 30 and Sep 13 EXO long-lived working group meetings.

Could you measure the trigger efficiency using electrons instead of muons for 2018, as was requested for 2017?

See plots below. The small differences have a negligible impact on the analysis.

[Figures: trigger efficiency measured with electrons, 2017 and 2018]

A general comment/question regarding the HEM 15/16 mitigation of just vetoing MET phi in the affected range.

We agree that vetoing MET phi is a bit aggressive, but we have not found a jet-based veto that adequately mitigates the issue. More to the point, this range of phi is not completely instrumented for MET reconstruction, and we will never be able to completely trust these events from a trigger standpoint. We see no evidence of lower lepton reconstruction efficiencies or higher fake-track rates in the HEM 15/16 area, so we are confident the only issue is in the normalization of the fake estimate. We propose to use the MET phi veto for our result and provide the unvetoed background estimates in an appendix of our note.

### Additional pre-approval followup Ivan Mikulec HN June 21

In your answer to (4) the plots show only the correction related to Fig. 37 in the AN. First, it is a bit surprising that there is a residual MC overprediction at high pT (right plot on the twiki) and the effect on recoil (left and middle figure) is marginal. Second, and more importantly, in the Pythia/MG part of the correction which is in Fig. 38 of the AN, it seems that Pythia does not generate enough high recoil events, so the resulting weight on high recoil signal (>~250 GeV) seems completely saturated. Do you have convincing arguments that this is not an issue?

The residual MC overprediction is an artifact of the fact that we reweight the background MC evaluating the weights as a function of the GEN-level electroweak-ino pair pT in our AMSB signal, not as a function of the reconstructed dimuon pT that is plotted.

For Pythia/MG, you are correct that Pythia does not generate enough high-recoil events. This is a well-known feature of Pythia, one of the main reasons why MadGraph was developed, and why such a correction is necessary to correctly describe the AMSB hypothesis. Yes, it does result in weights of 3--4 for events above ~250 GeV, but this is a necessary correction, so we do not see it as an issue.

We find the first paragraph of the answer to (5) confusing. If most of the signal events are in the plateau, why not cut away the turn-on in the selection? Anyway, according to Fig. 36 in the AN, quite some part of the signal is in the turn-on. If this is the case, we find it hard to believe that you can be confident about your efficiency in the middle of the steep turn-on to a relative uncertainty of the order of 0.5%. We still think that a check with different datasets might provide some handle on the related systematics (position and slope of the turn-on). We hope that the ARC can pay attention to this issue. We are fine with the second paragraph of the answer.

Since MET from ISR is only needed for the trigger strategy, as a search for the disappearing-tracks signature we wish to keep as much acceptance as possible. The small uncertainties you mention are the statistical uncertainties of the efficiency measurements in data and SM background MC, and are small because those samples are very large. In version 7 of the AN we added Section 8.2.11 and Figure 44, which introduce a signal systematic for the shorter (==4, ==5 layers) track categories due to the turn-on region. Only about 10% of the signal is on the turn-on, so even a 10% uncertainty in the turn-on region results in only a ~1% yield systematic; this new AN section gives a 1.1% and 0.5% systematic for ==4 and ==5 layers respectively. In the next version of the AN we will combine all of the trigger signal systematics into one section to make it easier to read.
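The dilution argument can be sketched numerically; the 10% figures are the ones quoted above, and the function itself is an illustrative sketch, not analysis code:

```python
def turnon_yield_systematic(frac_on_turnon, turnon_uncertainty):
    """Relative yield systematic when only a fraction of the signal sits
    on the trigger turn-on: events on the plateau are unaffected, so the
    turn-on uncertainty is diluted by that fraction."""
    return frac_on_turnon * turnon_uncertainty

# ~10% of signal on the turn-on with a 10% turn-on uncertainty
# gives a ~1% total yield systematic, as quoted above.
syst = turnon_yield_systematic(0.10, 0.10)
print(syst)  # 0.010000000000000002
```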

Also, as requested, we measured the trigger efficiency in data using electrons instead of muons; see the plots below. One can take the ratio of these efficiencies and apply it as a weight (as a function of MET) to derive another systematic on the signal yields. This would give a 2.7-3.2% downwards systematic across the NLayers categories, using 700 GeV, 100 cm charginos as an example. The analyzers feel, however, that this is not appropriate to use: the chargino signature is muon-like in the tracker, and electrons introduce hit-pattern effects from conversions and bremsstrahlung that would not affect the signal.
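The cross-check described above can be sketched as follows; the efficiency curves and signal MET values below are hypothetical placeholders, not the measured ones:

```python
import numpy as np

# Hypothetical trigger efficiencies measured with muon and electron
# proxies, binned in MET (GeV); stand-ins for the measured curves.
met_bins = np.array([120.0, 150.0, 200.0, 300.0, 500.0])
eff_muon = np.array([0.55, 0.75, 0.92, 0.99, 1.00])
eff_elec = np.array([0.52, 0.72, 0.91, 0.99, 1.00])

# Reweight signal events by eff_elec/eff_muon as a function of MET, then
# compare the weighted to the unweighted yield to size the systematic.
signal_met = np.array([130.0, 160.0, 210.0, 250.0, 320.0, 600.0])
weights = np.interp(signal_met, met_bins, eff_elec / eff_muon)
rel_systematic = 1.0 - weights.sum() / len(signal_met)
```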

### Questions from pre-approval EXO June 1

Thanks a lot for a comprehensive preapproval presentation. Overall the analysis is in good shape. Here is the list of comments/questions that came up during the preapproval.

(1) The MET+IsoTrk trigger requires at least 5 hits on the isolated track whereas the analysis starts with short tracks with 4 hits. Please show the trigger turn-on curves for the signal for the different bins of number of tracker layers considered in the analysis, and compare with the turn-on you get with the single-muon events. It would also be good to see the turn-on curves separately for the MET+IsoTrk trigger and the other MET(NoMu) triggers.

See below for several plots, which will be added to the analysis note. Some of these are relevant for (5) below.

(2) The uncertainty on the P(veto) estimate is set to ~10-15%. However, we cannot verify this in the closure test due to lack of statistical power. Please demonstrate that the uncertainty on P(veto) is sufficient. Also, assess the impact of this uncertainty on the analysis. Do the results change significantly on inflating this uncertainty ?

We studied the ratio of track pt after to before applying the lepton veto, and found that the statistics were too poor to determine any dependence of P(veto) on track pt; all pt-binned values are consistent with the average over all pt. We nevertheless attempted to fit the pt-binned ratios with a linear function, and found in one case that a linear dependence could increase one background by 17.4%; in all other cases the result was a decrease in the backgrounds or no change at all. The fit uncertainties were very large. Even under the worst-case assumption of a ±17.4% uncertainty on all lepton background estimates, there was no discernible change in our upper limits.

As such we find that the pt-average value of P(veto) used for the estimates is consistent with any possible pt-dependence in the statistically limited data we have available.
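This consistency check can be sketched as below; the binned ratios and uncertainties are hypothetical placeholders standing in for the statistically limited values, not our measured numbers:

```python
import numpy as np

# Hypothetical pt-binned P(veto) ratios (after/before the lepton veto)
# with large uncertainties; none of these numbers are the measured ones.
pt_centers = np.array([60.0, 80.0, 100.0, 150.0, 250.0])
ratios = np.array([1.05, 0.92, 1.10, 0.95, 1.02])
errors = np.array([0.20, 0.25, 0.30, 0.40, 0.50])

# Weighted linear fit ratio(pt) = a*pt + b; np.polyfit takes 1/sigma weights.
a, b = np.polyfit(pt_centers, ratios, 1, w=1.0 / errors)

# Flat hypothesis: the error-weighted average over all pt bins.
w2 = 1.0 / errors**2
flat = np.sum(w2 * ratios) / np.sum(w2)

# If the linear fit barely improves the chi2 over the flat average, the
# pt-averaged P(veto) is consistent with any possible pt dependence.
chi2_linear = np.sum(w2 * (ratios - (a * pt_centers + b)) ** 2)
chi2_flat = np.sum(w2 * (ratios - flat) ** 2)
```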

(3) Slide 23 : The 1.9% uncertainty can be dropped. Please rebin the track d0 distribution when showing the Gaussian fit.

The 1.9% has been dropped and the plots rebinned in the AN.

(4) Slide 28 : Show the data/MC comparison of the recoil (METNoMu) distribution before and after the corrections.

We attach those plots here. Showing the distributions after the corrections is not completely trivial, because we apply the ISR weights as a function of the sum-pT of the electroweak-ino pair, of which there is none in SM background MC. We felt the most correct procedure was to correct only Drell-Yan, which has a clear gen-level muon pair at which to evaluate the weights. As such, the post-correction plots will not agree by construction, as one might otherwise expect.

(5) Please reassess the systematic uncertainty on the trigger efficiency - it seems to be too small. One possibility would be to compare the trigger turn-ons between single-muon and single-electron events. Also make sure that any potential systematic due to the fact that the number of hits requirement in the IsoTrk trigger is tighter w.r.t. offline selection is also taken into account.

Taking a closer look at the values involved, we believe they are correct despite being small. This is because the bulk of the selected signal events have larger MET, well onto the efficiency plateau where the scale factors and their uncertainties are smallest; see the plot below for one signal sample. In that plot, for example, just above our offline requirement at ~122 GeV the data efficiency is 5.680 ± 0.026, a relative 0.46% error; only a very small fraction of the accepted events receive that scale of systematic, so the total systematic is very small.

For the tightness of the IsoTrk50 leg w.r.t. the 4- and 5-layer categories, we do see a difference in the turn-on, since the IsoTrk50 path has the lowest MET threshold and reduced efficiency. Unfortunately we cannot measure the trigger efficiency in muon data and SM background MC, as 4- and 5-layer muons are too rare to provide reasonable statistics. It is not precisely the correct thing to do, but without a data measurement in 4 and 5 layers the best option we see is to take a very conservative systematic based on the difference between the trigger efficiencies in signal for the 4/5-layer categories and the 6+-layer category; this would cover any data/simulation difference in those samples even if it were as large as the difference between nLayers categories, which is quite conservative. This would result in an average signal systematic of 1.1% (4 layers) and 0.5% (5 layers).

(6) Currently, a veto is applied on candidate tracks overlapping with any reconstructed leptons. Please quote the signal efficiency for this veto. It would be good to ensure that we do not spuriously lose events in data by matching tracks to some mismeasured objects that are not simulated well. Please check the impact on P(veto) of requiring at least some loose selection (e.g. requiring the muon to be a PF muon or a loose muon).

This suggests, quite correctly, that we use a lepton selection that will have some scale factor between simulation and data for its efficiency. We measure this scale factor relative to the scale factors published by the Muon and EGamma POGs for the loosest available approved selections: loose muons and “veto ID” electrons. As we measure our scale factor relative to the loose SFs, the product of the two is the relevant factor.

We propose using these measured scale factors as a systematic on the signal yields. In the 4- and 5-layer bins these would be at most -6% and -3% systematics respectively, with the rest well below -1%. Compared to the ISR weight systematic of roughly 9%, these are largely inconsequential.

(7) For the strong production switch to the NNLO cross sections.

We have switched to the NNLO(approx)+NNLL cross sections listed here.

Concerning the next steps, as discussed at the preapproval: once you address the comments on the P(veto) uncertainty and the impact of some loose ID on the leptons used for the veto, i.e. (2) and (6), we can proceed to the unblinding of the 2017 data.

### Juliette Alimena (v0) May 22

Thank you for the paper draft v0. I find it to be fairly complete (besides the 2018 data, which of course we know about). I have just a few very minor comments (which should not be taken as requirements for preapproval, although having them implemented by the time of the ARC review would be great).

I understand that you want to target PLB, and so have restricted the number of figures, but I think it’s a little unfortunate that the only figures currently in the draft are limit plots. Maybe the easiest thing to do is to consider what figures you might also want to be made public in supplementary material. Think of what figures it would be nice to have when presenting this search at a conference. You might consider: Feynman diagrams, a sketch and/or event display of the signal in the detector (figure 3 or 4 in the AN?), 1 or 2D histograms of key variables (perhaps Figure 1 in the AN?), etc.

Figure 4 of the AN is a fairly classic one to present when discussing this analysis, and unlike a Feynman diagram of AMSB it is signature-driven. As you've pointed out, we have possibly limited ourselves too much in our first draft, and we will consider what to add to it and what supplementary material is best.

The last 2 sentences of the paragraph starting at L60 and the one starting at L68 are nearly identical (see L64-67 and L72-75). Please consider writing the information once, but making clear for which signals it is applicable.

Some sentences in this section have been rewritten for increased clarity and without repetition.

You need to define “PF” as an abbreviation for “particle-flow” when it is first mentioned on L95.

The acronym is now included/defined on line 95.

Figure 1 is not mentioned in the text.

Although it could be nice to mention the new 2017+2018 results in the Summary section, I think this section should again mention the full Run 2 integrated luminosity and quote your results in that case.

A short paragraph has been added to the summary to re-mention the Run II combination and its mass exclusions. I've left in the 2017+2018 results, however, as they help clarify what is new about this publication.

References 10, 21, 37: You have mistakenly written “Collaboration Collaboration”.

Fixed.

References 27, 39, 40: We list only the first page number, not the range (see the style guide).

Fixed.

Reference 34: “sqrt{s}” mistakenly used parentheses instead of curly braces

Fixed.

Reference 37 seems unfinished.

Fixed.

## Object reviews


• jets
• MET
• electrons (high ET)
• muons
• taus
• StatComm questionnaire
• combine datacards check

### Anshul Kapoor (Electron object review)

You use the Ele35 trigger and your minimum pT cut for electrons is also 35 GeV. Isn't that cutting too close? Since you have a compound procedure for applying trigger scale factors, I am not sure if I can judge whether this choice of 35 GeV is trigger-safe. Do let me know if I am misunderstanding how these electrons are used in this analysis.

We are a bit close to the electron trigger turn-on, yes, but our signal selection is based only on MET+track paths. The scale factors you mention do not concern electrons; we instead use muons as proxies for the track, with pt > 55 GeV to be on the track-leg plateau. When we measure only the MET/track legs of our main HLT_MET105_IsoTrk50_v* path for TSG studies, we again use an orthogonal method with muons as proxies for the track.

In the background estimation, where we do use electrons, we find no pt dependence in any measured quantity, so any shaping of the low-pt distribution due to this turn-on will not affect the estimate. As long as the electrons we select are of high purity -- we use a tag-and-probe technique with opposite-sign subtraction to ensure this -- it does not matter to us if we miss a small number of electrons close to 35 GeV.

Few unrelated typos I noticed while reading the AN,

Abstract: calorimter -> calorimeter

Line 1077: entireity -> entirety

Line 893 and 978 (two places): betweeen -> between

And thank you for the careful reading for typos; when the AN is unfrozen after pre-approval we will correct these.

### Benjamin Radburn-Smith (Muon object review)

I have gone through your AN2018_311_v6. I cannot see any issues from the muons side and therefore give a green light from us. However, for my interest, I cannot see why you choose muons with pT>96 GeV for your TnP muon reconstruction inefficiency study (Table 16). Could you please explain this pT choice?

"pt > 96 GeV" in Table 16 is an unfortunate typo, and was also noticed by another object review. The real value is "pt > 29 GeV", the same value used throughout for muon-related selections. Sorry for the confusion!

### Klaas Padeken (Tau object review)

Hi, this is a very nice analysis and a thorough handling of the tau veto. I would just point out one thing of which you are probably aware, that requiring the tight isolation with cutbased and MVA discriminators does not have the highest efficiency. But since you are vetoing on taus, this means a higher background contamination. I also see that this allows the background studies, which use the tau trigger to use the same selection, which is needed. So as I said I just wanted to point this out.

But I did find one piece of information missing in the AN. When you use the leptons as a veto, which pt threshold is used? There is an implicit cut if you assume that the missing signal track is part of the tau, but it would be great to document this explicitly.

One other question, not tau-related and coming from my pixel perspective: do you see the effects of the stuck pixel TBMs and the dead DCDCs?

In the lepton veto we do not explicitly make any pt or ID requirements; we reject tracks that are near any PF lepton. However, there is indeed an implicit cut for the object to exist in MINIAOD in the first place. For taus we use "slimmedTaus", which requires pt > 18 GeV and passing "decayModeFindingNewDMs". We typically think in terms of "any lepton", but we see that we can mention the PAT slimming cuts explicitly in the AN, and will add this.

On the efficiency of our hadronic tau ID: an imperfect efficiency actually works in our favor, as it includes the contamination you mention. We apply the electron and muon vetoes when studying the tau background, so we are not concerned about cross-contamination from the backgrounds we do present. This non-tau contamination could still fake our signal in the same way as real hadronic taus; our P(veto) estimate includes this type of event, as does the tau control region we use to normalize the overall rate. We simply make no distinction about the tau purity and call it all the tau background.

From the pixel perspective, we certainly lose candidate tracks to stuck TBMs and dead DCDCs given our very tight hit-pattern requirements, but these holes should appear fairly randomly and in small numbers compared to the whole. For example, we see no dependence of efficiency on pileup, which could arise if a TBM stuck early in a fill biased us toward data taken before it became stuck. We have not examined the offline track occupancy on a fill-by-fill basis for our selection to look for stuck TBMs being cleared, as that would just repeat the work DQM does. We also correct the missing-hits distributions of our signal samples to the data, which would account for this effect in the simulation.

### Chad Freer (MET object review)

This analysis looks very nice. I have a question regarding Figure 25. Can you explain the discrepancy between the total yields from the SingleElectron and the SingleMuon? The SingleElectron emulates the same selection by removing the electron from the MET calculation, so it is good that the distributions look similar, but I don't understand the difference in yields considering the trigger efficiency is high in both datasets. Is this coming from the EcaloDR<0.5 selection?

The largest difference between the single-electron and single-muon selections in the search is due to the triggers available in 2017; for electrons we require pt > 35 GeV and for muons pt > 29 GeV, as per Tables 15 and 16 of the leptonic sections (we now see a typo in Table 16: it is 29 GeV, not 96 GeV!). There are other differences in efficiency from ID, isolation, and more, but the pt requirement is the largest.

Also, since you are using Type-1 corrected MET, can you quote the version of the JECs that you use?

For the JECs, we get them from the event setup in data using global tag 94X_dataRun2_ReReco_EOY17_v6, and in simulation using global tag 94X_mc2017_realistic_v15, with the AK4PFchs payload. According to CondDB this should retrieve the tag "JetCorrectorParametersCollection_Fall17_17Nov2017BCDEF_V6_DATA_AK4PFchs" for data and "JetCorrectorParametersCollection_Fall17_17Nov2017_V8_MC_AK4PFchs" for simulation.

### Eirini Tziaferi (Jet object review)

As the EXO Jet Object contact I have read through your document AN2018_311_v6. Things look OK as concerns jets; however, since I could not find this information in the AN, I would like to ask what JEC and which resolution scale factors you used.

In 2017 data we use the global tag 94X_dataRun2_ReReco_EOY17_v6, which retrieves from the event setup:

• JetCorrectorParametersCollection_Fall17_17Nov2017BCDEF_V6_DATA_AK4PFchs

We just use "slimmedJets" from MINIAOD without additional manipulation. In 94X simulation we use the global tag 94X_mc2017_realistic_v15, which retrieves:

• "AK4PFchs" -- JetCorrectorParametersCollection_Fall17_17Nov2017_V8_MC_AK4PFchs
• "AK4PFchs_pt" -- JR_Fall17_25nsV1_1_MC_PtResolution_AK4PFchs
• "AK4PFchs" -- JR_Fall17_25nsV1_1_MC_SF_AK4PFchs

## Questions from subgroup conveners

### Juliette Alimena email comments on AN v2, 19 Feb 2019

L51: The symbol \ptmiss is not defined.

We've now defined this at its first mention.

Including the PF reference at L147, and the anti-kt and fastjet user manual references at L195, would make your paper preparation that much easier. Also “PF” has not been defined on L147.

These references have been added, and PF is defined at its first mention.

Table 9 caption: You write that these are the triggers used for the background estimation and systematic uncertainty evaluation, but aren’t at least the MET triggers also used to collect the data in your signal region?

You are right, and the caption has been made more clear in stating which data sets are used for which purposes.

L313, L328, L650: missing figure references

These lines have been modified to avoid the missing references.

Could you gain sensitivity if you re-optimized any of your selections? I imagine the last time they were tuned was for the previous version of the analysis. For example, if I look at Figure 14: could you bin more finely in E_calo and cut out more (W to lnu) background? Or if not on this variable, are there any other likely candidate cuts that can be retuned?

As our background is data-driven, Figure 14 is mostly informational rather than a tool for optimization; its poor statistics make it fairly difficult to use for that purpose. Optimizing the Ecalo cut in data does not look promising: it primarily removes electrons, is already very strict at 10 GeV, and changing it by a few GeV would mostly cut into pileup calorimeter deposits rather than real backgrounds. There is also the practical issue that changing the Ecalo cut would change our fiducial maps, which would have a large impact on our selection and the workflow of the analysis. To answer the more open-ended question of how we can increase our overall sensitivity, we look first at our large fake-track estimate in the shorter nLayers bins, especially considering that ATLAS's 2015-16 results are currently still better at low lifetimes than even our 2015-17 combined result. We have found a series of well-motivated cuts that increase the purity of fake tracks in our 3-layer transfer-factor control regions and lead to a much more reasonable estimate of this background. We plan to update you on the details on Friday.

Done.

You have some sort of typo in Section 7.2.1 (after L682) that makes most of the section unreadable.

There was an errant "\$" causing this, and that's now fixed.

Section 7.2.9: You say you don’t yet have the track reconstruction efficiency study from the tracking POG for 2017. Would the study presented in https://indico.cern.ch/event/768528/timetable/?view=standard#2-tracking-in-run2-performance meet your needs? Hopefully it will soon be approved by the Tracking POG.

This is precisely what we need. For now we have added the relevant plot, but since it is not yet in CDS we have only a footnote to this presentation. Eventually we will promote that to a CDS reference.

Figures 42-45 inclusive are not described in the text.

A brief description of these figures has been added to the text.

All the appendixes except B.1 are empty? Appendix D would be particularly interesting to see.

They were indeed empty, but have now been given text.

I’m trying to compile what you still have left to do on the analysis (before unblinding). Is my list below complete?

• finish some systematic uncertainties
• implement trigger scale factors (L615)

The trigger scale factors are actually implemented in the signal yield; it seems the "XX%" placeholders were mistaken for systematic uncertainties -- these numbers on L615 are now updated. The remaining analysis tasks are to investigate possible optimizations, as mentioned above, and to finish the signal systematics.

Please add what you told me in your answer about the Ecalo selection to the note, namely that it primarily removes electrons and that optimizing this criterion to increase the search sensitivity would be ineffective because it would primarily be cutting into pileup calo deposits rather than real backgrounds. Adding this to the note will help anyone else who reads it and has the same question that I did.

We've added an additional comment towards the end of Section 4.3 to clarify this.

### Steven Lowette email comments on AN v2, 23 Feb 2019

On another topic:
I'd like to understand how much time you think it would take to add 2018 data, keeping in mind this can happen during the review that we have started now (in particular since you are so data-driven). I don't see an obvious showstopper from reading the analysis note, but I remember there were technicalities involved, though forgot the details. Did you already request 2018 signal MC? As you know, the MT2 analysis is moving forward rapidly with a paper that will have full 2018 data. It's a different interpretation, but it's fishing in the same pond, and referees (and maybe internal in CMS too) may ask to compare.
To have arguments for or against adding 2018, it would also be necessary to understand the comparison with the latest ATLAS result.

As Yuri mentioned on Friday, it would probably take on the order of 6 months to complete the analysis of the 2018 data. We have already requested 2018 signal MC samples, although production has not yet begun even for the 2017 samples. The technical issues affecting our timeline are a few:
• We use a custom data format which essentially saves the generalTracks and computes the ECalo for our tracks from AOD.
• There is a first step to the analysis, calculating the fiducial maps for our selection, which adds another round of jobs over the single-lepton datasets.
• We clean our lepton background estimates by pulling the small number of selected events from the RAW data tier. For 2017 data this took longer because the RAW datasets were not hosted anywhere we could submit jobs.
As for the comparison to ATLAS's latest public result, the updated expected exclusions shown on Friday compare favorably to ATLAS's published 2015-16 results. With just 2017 data we very slightly beat them at ctau = 1 cm, and as before we are much more sensitive at higher lifetimes than that. The combination with the 2015-16 results improves this slightly as well. So currently, above 1 cm we are better.

L67: nitpicking detail, but it reads as if just an extra layer was added, while the whole detector was replaced with all layers in different positions.

A good point of detail, though; this is now noted in the text.

Fig 5, caption: "with the dependence ... as described in the text" -> where is this described?

The caption now correctly points to Section 6.3.

Fig 5: the turnon gets fully efficient in the plateau; so is there no inefficiency from the track leg vs some offline selection? Or is this factored out and described elsewhere?

The track leg's efficiency is not 100%, which has been a known issue for some time. But Figure 5, and the corrections applied to signal described in Section 6.3, are calculated after requiring a track with pt > 55 GeV and the OR of all the triggers. So any loss in efficiency due to the track leg has already been taken into account by the selection.

Figs 9, 10: what's the difference between the open and closed circles? I didn't see that described.

There are actually no closed circles; if you look closely, the colors of some bins are simply close to the color of the circles. I've made this easier to see by making the circles green, which should stand out more.

L273: actually, this cannot be seen. I guess the figure has the majority of them already cut out.

At the time we only had this plot with the jet requirements already applied; the plot is now correct and the caption has been changed to reflect that. The edge you saw at jet pt 110 GeV from that requirement is no longer present.

Tab 17: what are the "lepton veto requirements"?

The JetMET-recommended jet ID is referred to as being tight with lepton vetoes, or "TightLepVeto". Over the tight ID it additionally cuts on the muon energy fraction and the charged EM energy fraction.

L298: is this primary vertex always matched to the main track of interest? It would be good to make this explicit.

It is. L298, however, describes a slightly different requirement on the other tracks included in the track isolation sum, so we feel a more appropriate place to clarify this is Table 18; "(w.r.t. primary vertex)" has been added to the two vertex requirements there.

L301: "We show several plots" -> where?

This paragraph referenced several figures we removed; the text is now cleaned up.

Tab 19: wouldn't it be useful to keep good tracks from b decays, where dxy>5sigma?

Since this selection applies to tracks included in the isolation energy sum, it would indeed ignore nearby displaced tracks from heavy-flavor decays. However, we do need some method of mitigating pileup, and this method actually includes slightly more tracks than the standard PFIsolation methods, since those match vertices by reference instead of by dxy. What should remove tracks too close to heavy-flavor decays is another cut: the deltaR between the candidate track and the nearest jet.

L313: broken \ref

The reference has been removed.

L328: Figs 15 and ?? -> 14 and 15

Fixed.

L337-338: you speak of "very few layers" but actually the biggest effect is for the long lifetimes. How does that add up?

We are not sure we understand the comment about the "biggest effect"; perhaps the version of the AN you had for this question was missing a legend for Figure 16. Figure 16 does show that for the 10 cm samples there are drops in efficiency at the pt, numberOfValidPixelHits, and track-isolation requirements, and these are indeed because not all of the shorter tracks are successfully reconstructed. We can remove the word "very", as this is a relative description.

Fig 16: the 1jet pT>110 cut would be more logical much earlier in the chain, when you apply the other jet cuts.

This was an issue in our implementation of the MC-smeared pt cut, when replacing the regular non-smeared jet.pt() cut. This has been fixed now and the ">= jets with smearedPt > 110" cut is in the correct place.

L350, Sec 5: do you expect no background from charged hadrons with little calo deposit because of dead ECAL cells etc?

As we veto tracks near dead ECAL cells (see Figure 8) we do not expect this background.

Fig 19: why does the control sample in muons have much more events than electrons; because of MetNoMu?

The principal reason is the trigger thresholds available in 2017: due to these we must require pt > 35 GeV for electrons but only pt > 29 GeV for muons. The muon selection efficiency was also slightly higher in 2015-16, and this additional difference increases the disparity in Figure 19.

Fig 19: why is the rejection so much stronger for the muons than for the electrons?

Recall that the rejection comes from vetoing any quality of lepton, so the electron/muon definitions are extremely loose. This selection efficiency was higher for muons than for electrons in 2015-16, and we see the same in 2017 data.

Fig 22: I suggest you add the red boundaries also here.

Fig 25: can you add the muon and tau projections too (the right hand side plot I mean, the left one is not needed).

This figure now shows just the three lepton flavors' projections, instead of the illustrative concept.

L463: "N^l_ctrl": maybe this question should be obvious, but I tried to wrap my head around it and couldn't figure it out: it seems to me that you are missing in the estimate the leptons that did not pass the lepton iso/ID of the control sample selection, but that do lead to a track passing the selection. If it would be included but I missed it, then I could imagine the P_veto to be different for these leptons, indeed the veto to work worse and more background to pass?

From an operative perspective of how to accomplish that: without a reconstructed (however poor quality) lepton, we cannot infer the presence of a lepton in data to make any real distinction. The only place such a distinction can be made is in SM background MC, which we do in the closure test.
From the perspective of overall strategy, the leptons you refer to should be present, in a rate proportional to the total rate of single-lepton events and with probabilities to pass the search requirements equal to the probabilities listed on line 460. So the estimate in equation 11 (line 466) is designed to estimate exactly these events, which are really and truly the background to our search; N^l_ctrl itself is not really a background, since its tracks come from well-reconstructed leptons. Put another way, N^l_ctrl is only needed to normalize the background estimate to the expected number of single-lepton events.
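The purely normalizing role of the control-region count can be seen in a minimal sketch of this kind of estimate; the function and all numbers are illustrative placeholders, not the values or exact form of equation 11 in the AN:

```python
def lepton_background_estimate(n_ctrl, p_veto, p_offline, p_trigger):
    """Sketch of a lepton background estimate: the single-lepton
    control-region count n_ctrl sets the overall normalization, while the
    measured per-lepton probabilities (to survive the veto, pass the
    offline selection, and fire the trigger) carry the shape."""
    return n_ctrl * p_veto * p_offline * p_trigger

# Placeholder values for illustration only.
est = lepton_background_estimate(n_ctrl=120000, p_veto=2.0e-4,
                                 p_offline=0.5, p_trigger=0.9)
print(est)  # 10.8
```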

L511-513: even after reading, I still don't understand why this dxy sideband is needed. Maybe you can demonstrate how the non-displaced selection is not working?

We discussed this at length in our update on Friday. We were previously using the BasicSelection (the MET dataset) to derive a systematic on the fake estimate, so we needed to avoid using the well-vertexed tracks, since those would be our signal region. As we now avoid that selection due to potential signal contamination, we instead need to avoid the peaking element in the 3-layer track sample at low fabs(dxy). To the question of why we cannot just use the number of Z+track events while applying the fabs(dxy) cut: our answer, which we have now confirmed, is that this would give precisely the same result but with very poor statistical uncertainty. In short, using the 3-layer track samples greatly improves our statistical uncertainty while giving the same answer, and in order to use the 3-layer track sample we must use the dxy sideband to avoid this tracking issue, which is only present at 3 layers.

Tab 34: Is it a typo that the Z->ee P_fake for n=4 is a factor 10 off wrt the Z->mumu and the basic one? Isn't the point that these are expected to be the same?

This was indeed a typo. The Z->ee value for n=4 should have been 3.0 * 10^-5. This is now fixed, but also changed since our fake estimate method has changed.

Tab 34: what are the MC truth values for these P_fake values - or do you lack statistics to estimate them?

We had not previously performed a closure test of the fake estimate in MC, but now that one signal region is for 4-layer tracks, the statistics are sufficient to do so. We have added a section to the AN detailing this, and provide the MC-truth P_fake values you suggest.

Tab 35: averaging the 3 estimates seems a bit random. The "basic" category is the one that really matters for the analysis, if I understood correctly, so why not take that one as your real estimate, and use the other two for validation and systematics? Or if you want to avoid the statistical uncertainty, use Z->ellell with the basic as validation.

As mentioned on Friday, the basic selection is now very dangerous to use, since 3-layer and 4-layer tracks are not so different and this presents a risk of signal contamination. Also, with our updated fake estimate method the systematic is now much lower, and we no longer need the average value to obtain a reasonable systematic.

Tab 35: also here, for information: what are the truth MC values?

See the above answer; these values are now in the AN.

Fig 28: can you give the same backgrounds the same colors, and order them the same between the plots?

Fixed.

L579: "excellent agreement": rather "better" or "good agreement"

Changed to "good".

Tab 37: I'm surprised you don't also apply the pT>110GeV requirement, so that you are closer to your analysis phase space when you estimate the correction factors. The corrections are big, and I'm worried the inclusive phase space you consider may miss a dependence on the further analysis selections. Can you check whether adding the cut makes a difference?

If we applied the jet requirements here, that would bias the MET distribution and we would not measure the trigger efficiencies as successfully. You are right that with the jet requirement our samples would populate a different region of Figure 30, but what we need in Table 37 is the overall efficiency of the trigger requirement, in order to derive scale factors well into the turn-on curve.

Fig 33: this correction is huge and it leaves me a little uneasy (but maybe I shouldn't be?). Was it not an option to have your signal generated with an extra jet in MadGraph?

Our choice of Pythia8 is mostly historical, but we also have not validated anything in MadGraph. As for the corrections, their size does not actually depend on the application (or not) of a gen-level ISR filter in our signal samples. The corrections are due to an apparently sizable difference in the hadronic recoil between Pythia8 and MadGraph within the same tune (CP5). Applying a gen-level ISR filter would only increase our signal efficiency/statistics relative to what is generated in the samples, but those filtered events would still populate the same higher-recoil phase space where the Py8/MG disagreement appears to be larger. In the end we feel the gen-level ISR filter is not necessary, as we are content with the statistics already selected from these signal samples.

L647: the 28% is smaller than the stat uncertainty. Does the stat uncertainty still go separate as well?

With our updated fake estimate method presented on Friday, these values all decrease. To answer your question: the statistical and systematic uncertainties are kept separate and treated as completely uncorrelated in the limit datacards.

L650: ?? -> 26

Fixed.

L652: "100%": this demonstrates the randomness of this systematic. You can always change the dxy range until you run out of statistics, and take a big hit in the systematics. I think this 100% creates a fake "we're conservative" feeling, while I'd argue it's not a real systematic: all the values in Fig 26 for same color are compatible with one another, so why assign an uncertainty at all? I think what does require a thought-through systematic, on the other hand, is the use of the transfer factor measured in ==3 tracker layer tracks. There's an assumption that goes in here that the measured value is applicable in your phase space, but there is no validation or systematic to deal with that.

Since our fake estimate has changed and involves different considerations, this 100% is reduced. Also, rather than progressively narrowing the sideband, which as you point out decreases statistics, we now do the opposite and progressively widen it to test for this dependence. We have also decided to move away from using the ==3 layer tracks, as the presence of the central bias peak makes comparison with ==4 layers very difficult. Using ==4 layer tracks for the transfer factor for ==5 and >=6 layers does not numerically change the estimate, and improves this particular plot.
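The sideband procedure amounts to the following (a simplified sketch with hypothetical names, not the AN's exact implementation):

```python
# Simplified sketch of the dxy-sideband fake-track estimate (hypothetical names).
def fake_estimate(n_sideband, transfer_factor):
    # n_sideband: Z+track events with the track in the |dxy| sideband.
    # transfer_factor: ratio of signal-window to sideband track counts,
    # measured in the ==4 layer sample and applied to ==5 and >=6 layers.
    return n_sideband * transfer_factor
```

Widening the sideband changes n_sideband and the transfer factor in compensating ways, which is why the dependence test described above is meaningful.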

L671: my memory tells me we expect on average about 3GeV deposit for a muon traversing the calorimeter, so it may not always be completely negligible. But ok, smaller than the electron case, agreed.

This is something to investigate in the future, but for now we exclude it, as it would be a very small uncertainty.

p67, bottom: math mode mess

Fixed.

L718-719: how much? Please mention here.

These values had not yet been calculated. They are now quoted in the text: below 1.2% for the ≥6 layer bin and below 0.003% in the =4 and =5 layer bins.

L723: I didn't really understand the procedure. What is this 1 sigma?

This is 1 sigma of the statistical uncertainties, i.e. the error bars in Figure 30. The text has been changed to state this more clearly.

L731: that must be a huge factor, and I don't think it makes sense to take the change as a systematic. The pythia description of the ISR is just not good at such high pT, so why base a systematic on it? I'm sure it's overly conservative.

The AN was actually incorrect here. Rather than removing the weights, the systematic is taken by fluctuating the weights shown in Figures 32 and 33 up and down by their statistical uncertainty and comparing the change in signal yields. The uncertainty is therefore not on applying or not applying the correction, but on the sample sizes used to derive the weights. This ends up as a roughly 7% uncertainty.
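As a sketch, the fluctuation procedure is the following (hypothetical arrays standing in for the binned weights of Figures 32 and 33):

```python
import numpy as np

# Sketch of the weight-fluctuation systematic: shift each weight up/down
# by its statistical uncertainty and compare the summed signal yields.
def weight_fluctuation_systematic(weights, weight_errs):
    nominal = np.sum(weights)
    up = np.sum(weights + weight_errs)    # all weights shifted up by 1 sigma
    down = np.sum(weights - weight_errs)  # all weights shifted down by 1 sigma
    return max(abs(up - nominal), abs(nominal - down)) / nominal
```

The returned relative shift is what would be quoted as the (roughly 7%) systematic in this scheme.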

Table 45: the n=4 region has a large number of expected background events, with a large systematic. In a simple S/sigma(B) sensitivity metric, you get S/sqrt(B+DB^2) = S/sqrt(191+6400). Thus, the systematic completely dominates the sensitivity. In general, underlined by my comments above, I have the impression it's hard to do better than guesstimating that systematic - but the sensitivity at low displacement crucially depends on it (shown also in Fig. 42). It's an uncomfortable situation. Better would be if the background could be significantly further suppressed. And if I look at Fig 29, it seems to me there is an obvious kinematic variable you can use further: the track pT. Although I'm not sure how that looks like for fake tracks (can you add a plot?). Can you further suppress the background this way, or in another way, and put similar or better sensitivity on more solid ground for small displacements?

As mentioned on Friday, we have also noticed this and have made a successful effort to reduce the overall background, so with the updated estimate this concern is largely addressed. On the possibility of using track pT as a discriminating variable: this would certainly be effective for some chargino masses, but we consider a wide mass range in which the track kinematics may not benefit from such a cut in some regions. We also try to keep the search quite general, and make an effort to remind readers that the AMSB limits we show are provided only as a benchmark and not as a specific motivation for analysis optimization.
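For reference, the sensitivity metric quoted in the question can be evaluated directly (a small arithmetic check; S is left as a free parameter):

```python
import math

# S / sqrt(B + dB^2): the background systematic dB added in quadrature
# with the Poisson variance of B.
def simple_significance(s, b, db):
    return s / math.sqrt(b + db**2)

# With B = 191 and dB = 80 (so dB^2 = 6400, as in the question), the
# denominator is sqrt(6591) ~ 81.2, i.e. dominated by the systematic term.
```

This makes the conveners' point explicit: with dB^2 = 6400 against B = 191, shrinking the systematic matters far more than shrinking the statistical fluctuation of B.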

-- BrianFrancis - 2019-03-07

Topic revision: r94 - 2019-12-11 - BrianFrancis