---+ Review of TOP-21-014


(Analysts: Please do not use .png format for plots on the twiki-- they do not show up on Safari browsers.)

Color code for answers to reviewer questions:

  • Green -- we agree, changes to analysis/documentation implemented.
  • Lime -- we agree, but the item hasn't been done yet. (Open item.)
  • Red -- we disagree, changes to analysis/documentation are not implemented.
  • Teal -- we agree, but we don't think any change to analysis/documentation is needed.
  • Blue -- authors/ARC/conveners need to discuss. (Open item.)

Explicit green-lights from experts

Category Name Status
Conveners   DONE
Combine   Done

Comments following Final Reading

Follow up comments from Kevin

Every analysis is supposed to have a HEPData entry, with a BibTeX entry in the paper and a reference to it in the introduction in the form of a sentence like: "Tabulated results are provided in the HEPData record for this analysis~\cite{hepdata}." I think this sentence belongs at the very end of the introduction. More information can be found by searching for “HEPData” in https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/Publications

The sentence has been added to the Introduction, although the HEPData reference still needs to be identified. The end of Section 6 or the Summary seems like a better place for the sentence, if it must go in the paper.

Whenever you use an operator like =, >, < that only has an argument on the right side, it is a unary operator and the spacing is changed so that the operator and the value on the right are right next to each other. The proper LaTeX way is: {>} 900\GeV. You can search https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/PubGuidelines for “unary” to see more. This shows up in the last line of the abstract, L76, Table 2 caption, Table 2 first column, L240, Figure 3 x-axis.

Changed in the text (except for L7 where Tom had requested {\approx}\,6). It's not clear that the Guidelines apply to table entries and figures; the latter don't use LaTeX formatting. For Table 2 we feel that column one looks better with the extra spaces (also with "750 -- 900").

L3: I don’t think “LHC energies” is correct because it is really dependent on both the energy and the colliding species. I would suggest removing the word “energies” or changing the sentence to “The production of \ttbar from proton-proton (pp) collisions at the LHC comes from gluon fusion process about 90% of the time, with \qqbar annihilation making up the rest.”


L73-75: This is a new sentence which says that AK4 jets are “removed” if they are within DeltaR<0.8 of an AK8 jet. What does this mean? Does this mean the AK4 jet is removed from the AK8 jet (I hope not)? Does this mean the AK4 jet is discarded from the event (it cannot be considered as a b-tag for the W-tag events or as any of the jets in a resolved event)? Please make it clear to me (and ideally in the paper).

Changed to: "Any AK4 jet with $\Delta R = \sqrt{\smash[b]{(\Delta\eta)^2+(\Delta\phi)^2}} <0.8$ from the closest AK8 jet, where $\phi$ is the azimuthal angle, is discarded from the event."
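For reference, the discard rule in the sentence above can be sketched in a few lines of Python. This is only an illustration of the stated condition, not the analysis code: the dictionary-based jet representation and the function names (delta_phi, delta_r, clean_ak4_jets) are assumptions made for this example.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt((Delta eta)^2 + (Delta phi)^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def clean_ak4_jets(ak4_jets, ak8_jets, max_dr=0.8):
    """Discard any AK4 jet with Delta R < max_dr from its closest AK8 jet.

    Jets are hypothetical dicts with "eta" and "phi" keys.
    """
    kept = []
    for j4 in ak4_jets:
        closest = min(
            (delta_r(j4["eta"], j4["phi"], j8["eta"], j8["phi"]) for j8 in ak8_jets),
            default=math.inf,  # no AK8 jet in the event: nothing to overlap with
        )
        if closest >= max_dr:
            kept.append(j4)
    return kept
```

The wrap in delta_phi matters for jets on either side of phi = +-pi, which would otherwise look far apart.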

L128-131: I still think there is some ambiguity here. What do you do if an event has more than one t tag or more than one W tag? Do you discard the event? Do you try multiple combinations? Do you give up and look for AK4 jets? I think you need a sentence here explaining.

Added: "Events with more than one \PQt or \PW tag are discarded. "

Table 1 caption: Change “MC simulations” to “MC simulation” or “MC simulation contributions”

The plural form seems more appropriate to us since there are multiple rows - one for each simulation.

Table 1: Change “124” to “120”

Done. Assuming this is a significant digits issue.
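If the request is indeed about significant digits, as assumed in the reply above, the rounding can be checked with a small helper. The function name round_sig and the default of two significant digits are assumptions for this sketch.

```python
import math


def round_sig(value, digits=2):
    """Round value to the given number of significant digits."""
    if value == 0:
        return value
    exponent = math.floor(math.log10(abs(value)))
    # Negative ndigits rounds to tens, hundreds, ... as needed.
    return round(value, digits - 1 - exponent)
```

Under this assumption, round_sig(124) gives 120, matching the requested table entry.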

L240-2: I think removing the equations defining rpos is a good idea. I think that the main reason to fit for A_C directly is because that is the parameter we are interested in and the returned uncertainties will be more accurate than trying to propagate them. My proposed sentence here would be “Instead, we fit for r_neg and A_C^fid directly to ensure that the uncertainty on A_C^fid is correctly estimated.”


Table 2: I think this looks pretty good. I think that the last column should be “Theory” rather than “A_C (theory)”. In the first column, there is a problem with a space before “GeV”. I think you can write this as “\GeVns” rather than “\GeV” to fix this. I suggest a bit more horizontal space between the “Total” column and the “Theory” column. You can add more horizontal space by just adding a new (blank) column between them. I would generally reduce the amount of vertical space, especially between the 750-900 and >900 lines.

Changed to "Theory" and "\GeVns" and added the extra space before the Theory column. We see extra spaces in column one (750 -- 900, > 900) as an allowed esthetic improvement over the main body spacing rules.

L255: Remove “also”


L258: You define \sigma in the Figure 4 caption as it is used in the figure. I don’t think you need to define \sigma in the text as it is not used in the text anywhere. I suggest removing “$(\sigma)$”

Leave in place as this was added at Tom's request.

Figure 3: The y-axis label is too close to the y-axis values.


Figure 4: I realize that the x-axis is currently in %. I would suggest either changing the label to “Impact on A_c (%)” or dividing the x-axis values by 100. In any case, I think the font size of “Impact on A_C” should be increased so it is as large as the values.


Figure 4 caption: You write “dominant” but you include contributions that can’t even be seen. Are there any systematic uncertainties you don’t include? If not, then please remove “dominant” (and also from L258).

Dominant removed. The only ones that are not negligible are the MC stats, and that is mentioned as not being included.

Figure 4 caption: I think that “inclusive” is referring to the M_ttbar>750 GeV. If so, then it is redundant. If not, then you will confuse this reader at least. I suggest removing “inclusive”. In fact, it may be useful to replace “inclusive” with “full phase space” as that piece of information is less prominent (a reader has to distinguish between A_C and A_C^fid).

"inclusive " replaced with "full phase space".

L280: Change “The result” to “This result”

"The" seems stronger to us. ("This" has the implication that there might have been other results.)

Follow up comments from Tom

72 Is Francesca's question about AK4 and AK8 jets really answered with this sentence? Could you write: "The same PF candidates are used to build large-radius (AK8) jets using a distance parameter of 0.8 ..." This makes it clear that the particles in AK4 and AK8 jets are not exclusive to one or the other.


73 "The AK4 jets" don't start a sentence with an acronym. See Guidelines.


118, 119 You have two consecutive sentences that start "All samples are", which is repetitious. You could rewrite the second sentence as: "The NNPDF 3.0 (2016) and NNPDF 3.1 (2017 and 2018) parton distribution functions (PDFs) [47] are used for all samples and include ..."


128 I think it is useful to mention that you have already qualitatively discussed these three topologies. So write: "separated into the three topologies discussed earlier based on"


141 Is it useful to say "determined from simulation for each topology" ? That is probably described in Ref. [21], but it would show the reader that the top quark mass used is not just one number and depends on the event topology.

"procedure described in Ref. [21]." this is how we directly refer to a reference.


158 I think you have misunderstood the integrated luminosity uncertainty. The CMS detector twiki suggests writing: "while the overall uncertainty for the 2016--2018 period is 1.6\%" So this is not an additional uncertainty. It is the total uncertainty in the summed integrated luminosity for the three years. Unfortunately, the twiki page doesn't give a reference for where this number comes from. But other CMS publications don't have a reference for the 1.6% number either. So I would just use the suggested wording exactly.

Is this what you mean: "Additionally, uncertainties in the integrated luminosity vary per year: 2.5, 2.3, and 1.2\% for 2018~\cite{CMS-PAS-LUM-18-002}, 2017~\cite{CMS-PAS-LUM-17-004}, and 2016~\cite{CMS-LUM-17-003}, respectively, and include both correlated and uncorrelated components across the three years, while the overall uncertainty for the 2016--2018 period is 1.6\%."

167 Would it be clearer to write: "The uncertainty associated with the possible misidentification of the sign of the lepton electric charge is negligible." ?


229 If you don't hyphenate "higher p_T" in the previous line, you shouldn't hyphenate "highest p_T" here. Greg Landsberg believes you should never hyphenate this type of phrase.


236 I worry that a referee is going to ask questions about our fitting for r_neg and A_c^fid the way it is written now. How about writing something like: "One way to measure A_C^fid, the top quark charge asymmetry in the fiducial phase space, is to fit for r_pos and r_neg, the signal strengths that scale the contribution of events with Delta |y| > 0 and < 0, respectively, and then use Eq. (1) to determine A_C^fid. Instead, we fit for r_neg and A_C^fid directly, which insures the proper handling of the correlations between different sources of uncertainty."

Changed to: Combinations of subsets of these channels are also possible and allow us to obtain results for the two mass regions separately. In all cases, the unfolding performs a multi-dimensional maximum likelihood fit of the simulation to the observed data and returns two measured parameters. One way to measure \ACfid, the top quark charge asymmetry in the fiducial phase space, is to fit for \rpos and \rneg, the signal strengths that scale the contribution of events with $\deltay>0$ and < 0, respectively, and then use Eq.~\ref{eq:AC} to determine \ACfid. Instead, we fit for \rneg and \ACfid directly, which insures the proper handling of the correlations between different sources of uncertainty.

Table 2 caption: "compared with the theoretical prediction from MC", since you have "theory" in the table heading, I think it is good to have a similar word in the figure caption description. "are also shown. All values are in percent." It's always good to make this clear in the caption.


Table 2: Capitalize "Measured".


Fig. 3: Increase the size of the numbers on the x-axis labels. They are way too small and you have lots of room. I wouldn't mind if you increased the size of the x- and y-axis labels also. They are bordering on too small, and again, you have lots of space. Make them easy to read. Write "Measured A_C^fid" and "Predicted A_C^fid" in the left legend. This is your "money plot" that will be shown at conferences. You want to make everything very clear and easy to read.


Fig. 4 caption: "uncertainties in the" is the correct wording.


Tom Ferguson

5 Put a space after the \approx sign. The Guidelines say: "The ${\approx}$ macro will put in the correct spacing". But sometimes LaTeX removes this space to make the line length what it wants. However, it's clear from the Guidelines that we do want a space here. So in this case, you should force a space by putting in a "\,".

The Guidelines seem clear although they don't clarify what the "correct spacing" is. Could check with Guidelines author.

12 "The SM value of A_c is expected to be about 1% for LHC" reads better


70 "(called AK4 jets)." Putting "AK4" next to the 0.4 is confusing. "pileup per particle identification algorithm": no need to capitalize the name. You also never use PUPPI in the paper again, so no need to define the acronym.

The source reference has both forms ("pileup..." and "Pileup..."), but this is fine as long as CMS is consistent.

103 "satisfy either the condition" makes what you mean clearer.


127 "have no t or W tag" reads better.


135 "on the smallest value of the X^2 variable" it's not the smallest variable, it's the smallest value of a variable.


166 Give the standard references for the JEC and JER.


Table 1 caption: We consider "leading" to be jargon. Write: "lepton and the highest p_T jet in", "both the statistical and systematic"


239 "next-to-NLO (NNLO)" you have this term enough times to warrant its own acronym.


254 "at NNLO"


Fig. 3 caption: "including NNLO QCD"


Fig. 4 caption: "The +-1 standard deviation (sigma)" where "sigma" should be the Greek letter sigma, "uncertainties in the"

Done. $\sigma$

275 "leptonically decaying" no hyphen, see Guidelines.


Comments during Final Reading

Kevin Stenson

KS1) Section 2, third paragraph: the top tagging method is well described in the quoted paper (Z’ search). Maybe a sentence could be added saying “Additional information on the tagging can be found in...”

Action: authors agree - but careful about W tagger.

Added: "Specialized techniques use AK8 jets and jet substructure information~\cite{CMS-PAS-JME-18-002}, including ``soft-drop clustering''~\cite{Larkoski:2014wba} and ``N-subjettiness''~\cite{Thaler:2010tr}, to identify the hadronic decay of boosted top quarks, following the techniques detailed in~\cite{Sirunyan_2019}. Two exclusive categories are considered: hadronically decaying top quarks ($\ttag$) in which the three partons are merged into a single AK8 jet, and hadronically decaying \PW bosons ($\Wtag$) in which the two partons from the \PW boson are merged into a single AK8 jet, but the bottom quark is reconstructed as a separate AK4 jet."

KS2) Table 1, caption: the sentence explaining the difference between muons and electrons should be added in the paper, not be just in the caption. Can be e.g. added in the first place the table is mentioned. Authors: agree


KS3) Line 182, top pt reweighting: first time that it was mentioned? Authors: this was added in the part of the corrections. Kevin: Agreed.

No change

KS4) Line 248: Mention that the efficiency is taken from MC. Authors: the efficiency and acceptance are joined together, and part of the former also includes the SFs, so it is corrected by data. It might be better to avoid this. Kevin: Agreed.

No change

Francesca Cavallari

FC1) The flow of how the AK8 and AK4 jets are clustered is not clear, in particular whether there is some separation. Authors: all particles are clustered in AK4 and AK8 simultaneously. Action: authors will add a sentence clarifying what is done to disambiguate cases when an AK4 and an AK8 jet are in the same top quark.

Added and reworded to define Delta R first: "The large-radius (AK8) jets are built using a distance parameter of 0.8 and the Pileup Per Particle Identification (PUPPI) algorithm~\cite{Bertolini:2014bba,CMS:2020ebo}. AK4 jets are removed if $\Delta R = \sqrt{\smash[b]{(\Delta\eta)^2+(\Delta\phi)^2}} <0.8$ from the closest AK8 jet, where $\phi$ is the azimuthal angle. The total jet \ptvec is given by the sum of the \ptvec of its constituents. If a lepton is found within $\Delta R < 0.4$ of an AK4 jet or $< 0.8$ of an AK8 jet, its four-momentum is subtracted from that jet~\cite{Sirunyan_2019}."
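The last sentence of the added text (lepton four-momentum subtraction) can be illustrated with the following Python sketch. It assumes objects given as (pt, eta, phi, mass) tuples and uses hypothetical function names; the cone size is passed explicitly (0.4 for AK4 jets, 0.8 for AK8 jets, as quoted above). It is not the analysis implementation.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Delta R with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def p4_from_pt_eta_phi_m(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to a Cartesian four-vector (px, py, pz, E)."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (px, py, pz, energy)


def subtract_lepton(jet, lepton, cone):
    """Return the jet four-vector, with the lepton removed if it lies within the cone.

    jet and lepton are hypothetical (pt, eta, phi, mass) tuples.
    """
    jet_p4 = p4_from_pt_eta_phi_m(*jet)
    if delta_r(jet[1], jet[2], lepton[1], lepton[2]) >= cone:
        return jet_p4  # lepton is outside the jet cone: leave the jet untouched
    lep_p4 = p4_from_pt_eta_phi_m(*lepton)
    return tuple(j - l for j, l in zip(jet_p4, lep_p4))
```

For example, subtract_lepton(jet, lepton, cone=0.4) would be the call for an AK4 jet under these assumptions.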

FC2) Line 136 “The difference between the... and the true top quark mass” You say that you use the top quark mass for the hypothesis: what do you use for the top quark mass as central value? Authors: we removed "true" because it is loaded. Tom/Francesca: it does not fully represent the average, but does represent the Monte Carlo truth. Action: better to add a sentence, like “reconstructed top quark mass determined for each category in simulation with the procedure in Ref.[21]”


FC3) Charge mis-id: there is no uncertainty on the charge asymmetry. Authors : the mis-id is very small, since it integrates over the rapidity. Action: add a sentence in the reconstruction that the charge is measured as... , and in the systematics say that it is negligible.

Added after the systematics of reco, HLT and ID: "The uncertainty associated with the sign of the electric charge of the leptons is negligible."

We could not find any top paper that describes the charge measurement or takes the charge mis-id into account. For instance, TOP-12-038, the single top quark charge asymmetry, found the mis-id to be negligible at 7-8 TeV, and charge and mis-id were not even mentioned in the 13 TeV paper. The asymmetry in the single top is measured more precisely than ours. We prefer to just mention it in the systematics and not try to delve into the measurement of the charge any further in Section 2.

Brian Winer

BW1) Lines 53-54: there is no reference about the sum of the uncertainties mentioned here - is there a way to add it? And if not, add a description. Authors: there is no reference anywhere, this is what is usually done by CMS. Action: check whether there is a new reference.

Checked recent papers and everybody seems to be doing the same with no reference.

BW2) Lines 188-189: the categories are first mentioned here, and they are actually explained afterwards, around lines 211-213. Propose to add a couple of words so that the reader does not have to scroll backwards and then forwards. Authors: we can add “that will be described below” or something in this spirit. Action: add “that will be described below” or something in this spirit.


BW3) Lines 228-229: the fit does not return rneg and rpos, but then it says it returns rneg and AC. Authors: this is a new approach that rewrites the likelihood to include in the fit the error taking into account correlations. Action: rewrite to clarify better what is done.

Changed to say we fit for rneg and Ac directly, and actually removed the equations, as they were adding unnecessary complexity and can be worked out by anybody (they are just arithmetic from the Ac definition).

"Combinations of subsets of these channels are also possible and allow us to obtain results for the two mass regions separately. In all cases, the unfolding performs a multi-dimensional maximum likelihood fit of the simulation to the observed data and returns two measured parameters: \rneg, the signal strength that scales the contribution of events with $\deltay<0$, and \ACfid, the top quark charge asymmetry measured in the fiducial phase space. By fitting for \ACfid directly, we insure the proper handling of correlations between different sources of uncertainty."
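For readers of this log, the arithmetic referred to above can be written out as follows. This is a sketch under assumed notation, not the equations that were removed from the paper: $N^{\text{MC}}_{+}$ and $N^{\text{MC}}_{-}$ are hypothetical symbols for the predicted signal yields with $\Delta|y|>0$ and $<0$, and $N_{\pm}$ are the corresponding fitted yields.

```latex
% Assumed notation: N^{MC}_{+/-} are predicted signal yields, N_{+/-} the fitted ones.
\begin{align}
  N_{+} &= r_{\text{pos}}\, N^{\text{MC}}_{+}, \qquad
  N_{-} = r_{\text{neg}}\, N^{\text{MC}}_{-}, \\
  A_C^{\text{fid}} &= \frac{N_{+} - N_{-}}{N_{+} + N_{-}}
  \;\Longrightarrow\;
  N_{+} = N_{-}\,\frac{1 + A_C^{\text{fid}}}{1 - A_C^{\text{fid}}}, \\
  r_{\text{pos}} &= r_{\text{neg}}\,
  \frac{N^{\text{MC}}_{-}}{N^{\text{MC}}_{+}}\,
  \frac{1 + A_C^{\text{fid}}}{1 - A_C^{\text{fid}}}.
\end{align}
```

With this substitution the likelihood depends on $r_{\text{neg}}$ and $A_C^{\text{fid}}$ only, so the fit returns the uncertainty in $A_C^{\text{fid}}$ directly rather than through error propagation.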

The corresponding description for the full phase space has been modified accordingly: "The unfolded charge asymmetry at parton level \AC is obtained after correcting for the product of the acceptance determined at generator level times the event selection efficiency ($\acc$). Specifically, the number of unfolded signal events in each channel is divided by the corresponding $\acc$ to correct from the fiducial phase space of that channel to the full phase space, which is common to all 12 channels. The uncertainty in the acceptance arising from theoretical sources in the \ttbar generation is several orders of magnitude smaller than the dominant systematic uncertainty and therefore neglected."

BW4) 243-244 How about moving this summary statement later, after you have directed the reader to the table with results? You say that AC is consistent, but you don’t show the numbers. The idea is that the reader looks at the numbers and sees that these are consistent. Authors: this was not clear - it’s fine to invert the order of the sentences. Action: invert the order of the sentences.

Changed to “Table~\ref{tab:results} and \figrefs{fig:fiducial} (left) summarize the \ACfid…

Tom Ferguson

Comments are in the introduction, will be followed up during the reading of that part. Authors: can we go back to AC with capital C everywhere? It is consistent with all previous results. Consensus: yes!


Comments on tables and figures Figure 1

- Remove l+jets from all figures, there is no other physics channel.


- try and increase the font size on the x axis.


- remove the / 1 in the jets and /2 from DeltaY


- Data/MC separate better the number in the bottom two plots


- Not necessarily all points need to be in the bottom panel if they are off, but authors’ preference is to keep them if possible.


- Check the Events/45 GeV number, as well as the Events/0.1, doesn’t look right.


Table 1 - too many blank lines. Proposal: delete one blank line between the Total and the Data


- remember to put the comment on the leptons in the text.


- remove “the” in the systematic components.


- Comment about the electrons and the years: should we explain the order? Authors would prefer not.

No change

- Spell out DY → Drell-Yan? We were trying to be consistent - Action: keep it this way.

No change

Figure 2

- Whatever is done for Fig 1 and applies here should be repeated.


- Here you put pre- and post-fit which is a bit jargon. Add it in the caption : before (prefit) and after(post-fit) in the caption; is already there.

No change

- DeltaY / 2 : should go here as well; better to remove it from Figure 1 as well, since it’s not useful.

No change

Table 2 - Add small blank lines so they don’t run into each other (small vertical space) Use a smaller font for all +- separate uncertainty.


- Ac - Theory: spread apart a bit value and uncertainty.


- All % can be removed? Proposal: add a multicolumn that includes “AC(%)” and below “measured stat syst.... theory”. If it doesn’t work, remove it and put it in the caption.


Figure 3

- Dashes are different between 750-900 and 750 - 900


- Predicted shows a band -> is this standard? Results is in the table, does it matter?

No change

- Move measured and predicted bottom, so they look the same. - Why keep a fixed range? A. it was a CWR comment Action: re-adapt it, by adding space on top so the legend looks better, i.e. like the current left plot.


- Drop the caps in the legend data points

Not sure what this refers to

- Subscript c here seems capital, should be lowercase? A. It is harder to see if lowercase. Also it was centrally decided to keep it everywhere capitalised for consistency.

No change

- Do we want % in the figure? Consensus is that it is better this way.

No change

- Lumi and energies are small, should be increased a bit


Figure 4:

- Lumi and energies are small and bolded


- Number on the left should be removed

Done

- W+jets has spaces around the + ( W + jets); (mu+jets) and (e+jets) don’t have them. Add spaces.


- impact on AC , C is very off.


- grey bars go beyond the box, fix if possible.


Title No change

No change

Abstract No change - It is not clear which number is given in the abstract. Should we assume it’s full phase space? Yes because it’s the only one specified so far. - We usually say whether it adds statistical and systematic components. Should we do it? Consensus is that it’s obvious and no action is needed.

No change

Introduction - Lines 1-3: qqbar annihilation appears and it is a bit strange. Maybe one wants to mention that this is when a qqbar pair is involved in the collision. Suggestion: one could mention that there are two production mechanisms, gg and qqbar, and say the first dominates. Action: the authors will make a different proposal and explain it in a more straightforward way.

- Line 10: “broader” -> what is the purpose of this description? Authors: this is setting up the next sentence and equation 1. Action: none.

- Lines 5-10 Suggestion: when talking about the high mass region, specify that those would be sensitive to AFB, so that it does not have a prominent role that might be misleading. At the beginning start talking about gluon fusion, then add qq, and say we are still sensitive to qq annihilation. Action: Authors will propose how to clarify that all asymmetry should come from that.

Changed to: "The vast majority of top quarks produced at hadron colliders are from \ttbar pairs that originate from a $\cPg\ttbar$ vertex via the strong interaction, where g is a gluon~\cite{Czakon:2013goa,Catani:2019hip}. At the LHC energies, about 90\% of the \ttbar production originates from gluon fusion, while the rest is from \qqbar annihilation. At leading order, the standard model (SM) predicts that \ttbar production from \qqbar annihilation is forward-backward symmetric. However, higher-order SM effects result in a small (${\approx}6.6\% $) positive forward-backward asymmetry \AFB, such that the top quark (antiquark) is preferentially emitted in the direction of the incoming quark (antiquark)~\cite{Czakon_2015}. There is no asymmetry in the gluon fusion \ttbar production that dominates at the LHC, but because valence quarks carry, on average, larger momentum than antiquarks (from the sea), the rapidity distribution of top quarks at the LHC is expected to be broader than that of top antiquarks~\cite{Czakon_2018,Czakon_2016}. The \ttbar charge asymmetry is defined as \begin{equation} \AC = \frac{N(\deltay>0) - N(\deltay<0)}{N(\deltay>0) + N(\deltay<0)}, \label{eq:AC} \end{equation} where $\deltay = |y_\cPqt|-|y_\cPaqt|$ is the difference between the absolute value of the top quark and antiquark rapidities and $N$ is the number of events. The value of \AC is expected to be about 1\% in the SM for LHC center-of-mass energies~\cite{Czakon_2018}."
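As a cross-check of the definition in Eq. (1) quoted above, the asymmetry can be computed from a list of rapidity pairs as in the Python sketch below. The function names and the (y_top, y_antitop) tuple layout are assumptions for this example; detector effects, backgrounds and the unfolding are deliberately ignored.

```python
def delta_abs_y(y_top, y_antitop):
    """Delta|y| = |y_t| - |y_tbar|."""
    return abs(y_top) - abs(y_antitop)


def charge_asymmetry(n_pos, n_neg):
    """A_C = (N(Delta|y| > 0) - N(Delta|y| < 0)) / (N(Delta|y| > 0) + N(Delta|y| < 0))."""
    total = n_pos + n_neg
    if total == 0:
        raise ValueError("no events with nonzero Delta|y|")
    return (n_pos - n_neg) / total


def asymmetry_from_events(events):
    """events is a hypothetical iterable of (y_top, y_antitop) rapidity pairs."""
    dys = [delta_abs_y(y_t, y_tbar) for y_t, y_tbar in events]
    n_pos = sum(1 for dy in dys if dy > 0)
    n_neg = sum(1 for dy in dys if dy < 0)
    return charge_asymmetry(n_pos, n_neg)
```

For instance, 505 events with positive and 495 with negative Delta|y| correspond to an asymmetry of 1%, the order of magnitude quoted for the SM expectation.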

- Line 29 uniquely restrict: No other measurements have restricted this phase space? Authors: This is their wording. None of the individual papers does it the same way.

No change

- Line 31 this paper → this Letter


- Line 31 there was some controversy on “first measurement”: it’s the first at 13 TeV, there was a point on whether the “and” includes that.

No change

- Line 39 selections → selection requirements


- Line 41 multijet backgrounds → QCD multijet backgrounds


Summary - Line 262 In the rest you used “highly Lorentz-boosted jets”; here, “highly boosted top quarks”. Consensus that it’s fine.

No change

- Line 261 Past tense is used in 261 and then you go back to present. Action: change to present.


- Line 269 QCD : no acronyms in the summary usually → Consensus that it’s fine like this.

No change

- Line 274 very high → highly


- L278 remove space before the “.”


- Line 279 Conclusions should have a stronger statement on the importance of this measurement, e.g. that this shows that top quarks can be measured precisely in highly boosted topologies... Remove the “opening a new era” part, since it was already opened with the measurement. - Suggestion to swap second and third paragraph. Leave the last sentence.


Comments for Final Reading

Kevin Stenson

Type B

Title: “single-lepton channel” could refer to either a single quark or the ttbar pair. Perhaps “Measurement of the ttbar charge asymmetry in events with highly Lorentz-boosted top quarks and a single lepton at 13 TeV” or “Measurement of the ttbar charge asymmetry in highly Lorentz-boosted top quarks at 13 TeV with each event containing one lepton and multiple jets”

Changed to "Measurement of the ttbar charge asymmetry in events with highly Lorentz-boosted top quarks and a single lepton at 13 TeV"

Abstract: I think the last sentence should be removed. The word differential is never mentioned again, the only distributions I see are simply positive and negative Delta Y and it is stretching things to suggest two bins constitutes a differential measurement. Finally, there is nothing mentioned about it in the summary. If you want to include it, then you definitely need to specify the x and y axes of the differential distribution.

Changed to "The result is also presented for two invariant mass ranges..."

L3-L9 and L29-L35: Is there some reason to spend so much time discussing the results for qqbar annihilation and A_FB measurements at the Tevatron? If the idea is that measuring A_FB and A_c probes the same new physics models, then this needs to be made clear. Is it true that new physics should produce deviations in both A_FB and A_c? If not, then I do not see the point of spending so much text on A_FB and the Tevatron. One can simply remove all of L29-L35 and shorten L3-L9 by not mentioning qqbar annihilation or A_FB.

Shortened L3-L9 by removing the sentence "A fundamental difference between \ttbar production in the Tevatron proton-antiproton collisions and the LHC proton-proton ($\Pp\Pp$) collisions is that the former is dominated by \qqbar annihilation and the latter by gluon fusion". Also removed L29-L35 and merged the text into one paragraph.

Section 2: I note that Ref 28 is mentioned for some of the jet requirements. I wonder whether the following sentences from that publication also apply to this analysis. If so, I suggest adding them. The PF candidates are clustered into jets using the FASTJET software package [39]. Charged hadrons that are not associated with the PV in the event are excluded from the jet clustering procedure via charged hadron subtraction (CHS) [36].

Ref 28 is the published ttbar resonance search. The Ac measurement is based on that paper but we added 2017 and 2018 data. In L 51 Ref 28 is used to reference the "dedicated jet and lepton selections". Moved the reference after that to make it clear. It refers to the description in paragraph 2 of Section 4 of Ref 28.

L82: Why do you need Ref 28 for this sentence?

This sentence refers again to our specialized treatment of leptons close to jets, namely, that the 4-momentum of the lepton is removed from the jet. This is the same paragraph we mentioned before.

Section 2: I also note that in Ref 28, there is much more text devoted to explaining the t tagging algorithm and the inputs. Is it correct that the t tagging in this paper is the same as in Ref 28? If so, perhaps you can explicitly direct the reader to Ref 28 for more information. In Ref 28, there are nearly two full pages (Section 3), which get reduced to about 3/4 of a page in this paper (last two paragraphs of Section 2).

We use the same top tagger but the W tagger is a new addition. We were asked to add the primary references for CMS top and W tagging rather than Ref. 28. We could also add Ref 28 if you prefer. Maybe after "Specialized techniques?"

L95,96: I am surprised to see AK4 and AK8 jets all the way out to |eta|<2.5. Have other analyses used jets out this far? I thought most of the time only jets with |eta|<2.4 were used. I guess for 2017-18 with the new pixel detector extending coverage, this might be OK but I am not sure about 2016 data.

This is a typo; our |eta| goes up to 2.4.

L113,L114,L117: Need to clarify exactly what is meant by “all AK4 jets” and “nearest AK4 jets” and “one of the AK4 jets". Is this all AK4 jets in the event? All AK4 jets associated with the primary vertex? I see the “nearest AK4 jets” has a pT and eta requirement but what about the “all AK4 jets”? You may want to define this AK4 jets collection around L77-79.

For the 2D cut it is the closest AK4 jet to the lepton (AK4 jets are all from the same vertex only). The b tag can be any AK4 jet in the event (irrespective of distance to lepton). Changed to: "To reduce the background from QCD multijet events, we apply a two-dimensional (2D) selection that requires leptons to satisfy the condition $\DRmin > 0.4$ or $\ptrellj > 25\GeV$, where \DRmin is the angular separation between the lepton and the closest AK4 jet, and \ptrellj is the transverse momentum of the lepton with respect to the axis of the nearest AK4 jet~\cite{Sirunyan_2019}."
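The 2D selection described in the reply above can be sketched in Python as follows. The sketch assumes leptons and jets given as (pt, eta, phi) tuples in GeV and radians, treats the lepton as massless in the sense that only three-momenta are used, and takes the convention that an event with no AK4 jets passes; these choices and all function names are assumptions for illustration.

```python
import math


def three_vector(pt, eta, phi):
    """Cartesian three-momentum from (pt, eta, phi)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))


def delta_r(eta1, phi1, eta2, phi2):
    """Delta R with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def pt_rel(lepton, jet):
    """Lepton momentum component transverse to the jet axis: |p_lep x p_jet| / |p_jet|."""
    lx, ly, lz = three_vector(*lepton)
    jx, jy, jz = three_vector(*jet)
    cross = (ly * jz - lz * jy, lz * jx - lx * jz, lx * jy - ly * jx)
    return math.sqrt(sum(c * c for c in cross)) / math.sqrt(jx * jx + jy * jy + jz * jz)


def passes_2d_selection(lepton, ak4_jets, min_dr=0.4, min_pt_rel=25.0):
    """Require Delta R_min > min_dr OR pT_rel > min_pt_rel w.r.t. the nearest AK4 jet."""
    if not ak4_jets:
        return True  # assumption: with no AK4 jet there is nothing to be close to
    nearest = min(ak4_jets, key=lambda j: delta_r(lepton[1], lepton[2], j[1], j[2]))
    dr_min = delta_r(lepton[1], lepton[2], nearest[1], nearest[2])
    return dr_min > min_dr or pt_rel(lepton, nearest) > min_pt_rel
```

The logical OR is the point of the 2D cut: a lepton close to a jet is still accepted if it carries a large momentum transverse to the jet axis.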

L115-116: Why the difference between muon and electron channels here? Why not ptmiss+pTe > 150 GeV? You may need to add this information to the paper. You could add another sentence like “The larger value of the e+jets \ptmiss requirement efficiently reduces the larger QCD multijet background in this channel and obviates the need for a separate requirement on ptmiss+ptl.” This is the information I got from Ref 28. However, now when I look at the final result, I see that the electron channel has a negligible QCD multijet, while the muon channel has a substantial QCD multijet background. Is this cut responsible for the much lower e+jets yield compared to the mu+jets?

Added the sentence and yes, this cut is responsible for the negligible QCD background in the electron channel. The background in the muon channels ends up being larger but it can be modelled, which was not the case in the electron channel.

Section 4: What happens if a single AK8 jet passes both the t and W tagging requirements? Or are these mutually exclusive?

Line 89 states that they are "two exclusive categories"

What happens in events with two AK8 jets, one that passes the t tag and one that passes the W tag? Please make sure the paper has information that the reader can use to answer these questions. These questions occurred to me while reading L130-134. Most of the questions are answered by the text in L142-145. I suggest moving L142-145 to the beginning of Section 4 (perhaps after the first sentence of Section 4).

Done. Small rewording to make it clear what the final signal candidate sample selection is.

Section 4: I think the reader would appreciate knowing what fraction of the events (either total or signal) are in each topology. This could be inserted around L147. L146-147: This sentence indicates that 70% of the boosted topology candidates have the correct jet assignments. To be clear, the 70% only applies to events where there is a t tagged AK8 jet? This does not seem to be very impressive. This just means that you have selected the correct AK8 jet to go with a top quark and the correct AK4 jet to go with the lepton to make the other top quark. I think it would be better to give the fractions for each of the 3 topologies (or the average value for all events). Specifying just the value for one of the three topologies makes the reader question what the other efficiencies are.

The % depends on the mass but is mostly resolved. The main reason for having the boosted topology is to improve the correct assignments. It is debatable that ~70-80% is not impressive, it is actually worse for the resolved, one item in our list of possible improvements for the next iteration. We can remove the sentence about the 70% but prefer not to give numbers. We could say that the majority of the events is still resolved if needed, but we do not want to get into quantitative details.

L146: Is this chi^2 variable the same as in L139? It needs to be made clear. In L139 the chi^2 variable is just mentioned as a way of arbitrating but then it is used in L146. Maybe it would be better to have a standalone sentence describing the variable. L138-140 could be rewritten as:

We did not get your suggestion, but changed to the following: For each event, one $\ttbar$ hypothesis is selected as the one with the smallest \chisq variable that minimizes the difference between the reconstructed $\tl$ and $\thad$ masses and the true top quark mass determined from simulation~\cite{Sirunyan_2019}. Because background processes typically result in large values of \chisq, only events with \chisq$<30$ are retained. Finally, our signal candidate sample is defined as those events with \ttbar invariant mass greater than $750\GeV$.
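
The selection described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the \chisq is taken to be a sum of two Gaussian pulls (one for the leptonic and one for the hadronic top quark candidate), the reference means and widths below are placeholders rather than the values derived from simulation, and the names are hypothetical.

```python
# Placeholder reference values in GeV; the real ones come from simulation.
REFERENCE = {"mean_lep": 173.0, "sigma_lep": 20.0, "mean_had": 173.0, "sigma_had": 15.0}

def chi2(m_lep, m_had, ref=REFERENCE):
    """Assumed two-term chi-square comparing reconstructed masses with reference values."""
    pull_lep = (m_lep - ref["mean_lep"]) / ref["sigma_lep"]
    pull_had = (m_had - ref["mean_had"]) / ref["sigma_had"]
    return pull_lep ** 2 + pull_had ** 2

def select_hypothesis(hypotheses, ref=REFERENCE, chi2_max=30.0, mtt_min=750.0):
    """Pick the ttbar hypothesis with the smallest chi2, then apply the chi2 and M_tt cuts.

    Each hypothesis is a dict with keys "m_lep", "m_had" and "m_tt" (GeV).
    Returns the selected hypothesis, or None if the event is rejected.
    """
    if not hypotheses:
        return None
    best = min(hypotheses, key=lambda h: chi2(h["m_lep"], h["m_had"], ref))
    if chi2(best["m_lep"], best["m_had"], ref) >= chi2_max:
        return None  # background-like event
    if best["m_tt"] <= mtt_min:
        return None  # outside the signal candidate sample
    return best
```

Keeping the choice of hypothesis separate from the two cuts mirrors the order in the text: the smallest \chisq is chosen first, and the event is then kept or rejected.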

Figure 1: It seems like a semi-log plot of the invariant mass would be more informative.


Table 1: I am surprised that the e+jets is only 1/4 the size of the mu+jets. Why is that? Is it the ptmiss cut? I think the reader is owed an explanation.

It is that but also, and principally, the higher lepton pT cut (80 vs 55 GeV) and the leading jet pT (185 vs 150 GeV). Added to the caption: The higher pT thresholds on the lepton and leading jet in the electron channel result in significantly reduced signal acceptance compared to the muon channel.

Figure 2: For the top plots, the “pre-fit” and “post-fit” text interferes with the mass range above. Suggest moving this text down for all 4 plots. Figure 2: For the top plot mass range, the first “GeV” is unnecessary and should be removed. Figure 2: You should change “ele+jets” to “e+jets” to match the rest of the paper. Figure 2: The ratio seems to carry more information than the actual distribution. I think it would be helpful to make the ratio plot slightly larger vertically and to zoom in the y-scale to something like 0.7-1.3.


Figure 3: There seems to be a lot of white space. I suggest changing the y-axis range to be -0.05 to +0.05. Also, you should make the data points larger and the error bars thicker to enhance visibility. It is not like there is any clutter.


L280-282: I do not understand the logic of this sentence. Why is it that having more reactions come from valence quarks means the measurement is especially sensitive to BSM physics?

The majority of the BSM models that predict a larger than SM Ac are qqbar initiated and the PDF for valence quarks increases (compared to gluons) at larger x. This is a restatement of what we said in L17.

L282-284: I do not think this sentence is necessary or helpful. It actually dilutes the message of the current measurement being useful.

We could go either way. Right now the measurement is very statistically limited and this comment tries to be forward-looking to the next iteration.

Type A

Abstract: You are allowed to change “CERN LHC” to “LHC” if you like. I am pretty sure everyone reading a top quark paper will know where the LHC is.


L2-3: I do not like “in the form of”. Suggest a first sentence of “The vast majority of top quarks produced in hadron colliders are from \ttbar pairs that originate…”

Changed to "In hadron collisions, top quarks are produced dominantly in pairs..".

L6: Remove “the” in “that the \ttbar”


L60: Remove “, respectively”. First, there is nothing in the sentence to pair the terms with. If you go further back and try to pair the terms with the previous sentences, the ordering does not match. So, just remove the term “, respectively” and assume the reader is smart enough to match the terms or else change the end of the sentence to “and “resolved”, for the high, intermediate, and low \pt regions, respectively.”

Done. (Latter suggestion.)

L122: Change “All the samples” to “All samples” or “All of the samples”

Changed to "All samples".

L128: I do not like starting a sentence or section title with a symbol. How about “Reconstruction of \ttbar events”?

Currently "of the top quark pair events" but the suggestion is better...A top quark pair could be tt.

L131: “considered as candidates”


L165: Change “reweighed” to “reweighted”


L180: I think you need something like “sets” before “[53]”. There needs to be a noun of some sort.

Done although "PDFs" (and not "PDF sets") appears earlier in the sentence.

L181: Change “in [58]” to “in Ref. [58]”


L188-191: I assume that this comes about from the difference of the top quark pT distribution between data and MC. If so, you need to explain the problem. I also think you need to specify whether the central value is obtained with or without the correction.

Not clear. The authors should address this.

Table 1: Please use normal size font for the table. There are tricks to reduce spacing in tables if necessary. For example, the spacing between columns can be greatly reduced.

The font size has been increased.

Table 1: I would suggest the top row have centered labels rather than right justified.


Figure 2 caption: Can remove “As can be observed,”

Done...but not the only "as can be observed".

L244-5: Writing “which together summarize” means that one needs both the table and figure in order to summarize the values. In fact, they have the same information. Therefore, just write “which summarize”.


Table 2: Please use normal size font for the table. There are tricks to reduce spacing in tables if necessary. For example, the spacing between columns can be greatly reduced.


Table 2: I think one can reduce the amount of vertical space in this table.

Done. And the font size has been increased

L248-9: I think we need to mention that this comes from MC. Perhaps add “, both obtained from MC simulation” to the end of the sentence.

Does "both" refer to acceptance and efficiency? One would think that the efficiency also comes from simulation (and alpha epsilon may be determined in one go). Leave for the authors?

L268: Change “and result” to “and can result”. After all, you do not specifically require the leptons be non-isolated and you do include events where every parton corresponds to its own AK4 jet.


L322: Fix the title so ttbar is correctly typeset

Done. Used CMS \ttbar alias.

L328-9: Fix the title

Done... $\Pp\PAp\rightarrow\ttbar$.

L339: Remove the end page number


Ref 13 and Ref 19 are duplicates. Consolidate.

Done: Kept the first. The reference appears twice in the list of BSM models/particles.

L418: Fix the last author (should be Emanuele Re, abbreviated as E. Re)


L446: Remove the “no. 9” (remove the “number” entry in bibTex)


Tom Ferguson

Type A Comments

Title "channel in proton-proton collisions at" Without this, the "13 TeV" reads strangely. Not clear what it refers to.

LE: Currently has "pp collisions". Leave for the FR.

Abst. "top quark pair (tt) events" this is our standard wording to define "tt".

The \ttbar symbol should be familiar to everyone reading the paper. If it must be defined in the standard way, then it would be necessary to write out "top quark-antiquark pair events" in the Abstract and Summary by the Guidelines (fewer than 2x appearances). And \qqbar would also need to be explained in the Introduction.

"top quark charge asymmetry of" reads better


How about writing it as: "(6.9 ^+ 6.5 _- 6.9) x 10^-3" this is much easier to parse. I don't mind one zero after the decimal point, but two is hard to read and then you have to individually look at each of the uncertainties and make sure they have two but not three zeroes after the decimal point. Writing it in scientific notation makes everything much clearer. It's not called scientific notation for nothing.

Somewhat done. The authors prefer (0.69 ^+0.65 _-0.69)% for consistency with previous publications.
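
For reference, the decimal, scientific, and percent notations discussed in this comment denote the same central value and uncertainties; written out in plain LaTeX (without the CMS macros), the equivalence is:

```latex
\[
A_c = 0.0069^{+0.0065}_{-0.0069}
    = \left(6.9^{+6.5}_{-6.9}\right) \times 10^{-3}
    = \left(0.69^{+0.65}_{-0.69}\right)\%
\]
```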

"in quantum chromodynamic perturbation" we define "QCD" the first time it is used. See Guidelines. Note: no "s" in "chromodynamic" since it is an adjective here.

Done. The Guidelines discourage the defining of symbols in the Abstract where the symbol is not used two or more times (in the Abstract).

"Differential distributions of the charge asymmetry for two tt invariant mass ranges, 750--900 and > 900 GeV," this is our standard way of writing ranges and of only putting the units on the second value in a list. Use an en dash for "750--900"


2 "primarily as top quark pairs (tt) that" be consistent with the abstract. Also, "tt pairs" could be misinterpreted as "tt tt".

Changed to "top quark-antiquark pairs (\ttbar)" but \ttbar (without definition) seems more natural here (and in the Abstract and Summary).

4 "production from proton-antiproton collisions at the Tevatron and proton-proton collisions at the LHC is" reads better.


6 "predicts that tt production from qq annihilation is" reads better

Done. \ttbar. Note that the \qqbar symbol is not defined.

7 Put a small space after the "\approx" sign.

This would seem to be contrary to the Guidelines.

14 "top quark and antiquark" you don't have to repeat "top".


15 "rapidities, (comma) and" for clarity.

A comma seems to introduce an unnecessary pause.

16 "SM at the LHC" just for clarity. Should you give the center-of-mass energy or is the asymmetry fairly independent of this?

Add "at LHC center-of-mass energies" but the authors should comment.

30 The phrase "by the SM at the time" seems to imply that the predictions changed. I don't think you mean this. I would delete "at the time" not needed.

Agree but this paragraph is now gone?!

37 I don't think you need the second "A_c =" it's clear without it. Again, what about writing these values as "(5 + 7 (stat) - 6 (syst)) x 10^-3" ? This is easier to read and understand.

Dropped the second A_c but the extraction of the 10^-3 seems less appealing in this in-line form?

41 "This paper presents the first measurement of the tt charge asymmetry from pp collisions at sqrt(s) = 13 TeV" reads better

Changed to "This paper presents the first measurement of the \ttbar charge asymmetry that used pp collision data at...". The authors should clarify what makes the measurement a "first" one. Is it the AND of 13 TeV and high-mass optimization, or the OR?

44 "with one W boson decaying leptonically"


45 "and the other hadronically" this is shorter and reads better.


49 "allow us" since "selections" is plural.


64 "calorimeter, (comma) and"

Done... This does appear on the detector description twiki page without the serial comma.

79 "Large-radius (hyphen) jets (AK8) are built using a distance parameter of 0.8 and the pileup per particle identification (PUPPI)"

Done but left as "Pileup per Particle Identification".

81 "or < 0.8 of" you don't need the second "Delta R"


87 "including soft-drop" hyphen, delete "the"


88 "and "N-subjettiness" delete "the"


89 "in which the jets coming from the fragmentation of the three partons from the top quark decay are" for clarity

Leave in the post-CWR form. The sentence is very long and one would like to believe that the "three partons" are understood at this point.

91 "in which the jets due to the hadronization of the two partons from the W boson decay are", "but the fragmenting bottom quark is"

See previous comment. Note that the resulting sentence would have both "hadronization" and "fragmenting" in it, and also be very long.

94 "DEEPJET [34]" delete the second "algorithm"

Removed the second "algorithm" by writing "(DEEPJET [44])".

95 "applied to each AK4 jet j with" "j" should be in italics. "The t and W tagging" you don't need two "tagging"s.

Done. ($j$)

102 "single-electron" hyphen


108 "two jets j_1 and j_2 with" you need to define the symbols

Done: "two jets, $j_1$ and $j_2$, with".

116 "from W + jets" delete "the"

The definite article seems appropriate here.

117 "jets must be" reads better


118 "samples of the"

"samples for the" reads better to me (LS).

133 "or t_h" you don't need the second "the", "a t nor W tag" you don't need the second "a" or the first "tag".

The repeated "the", "tag", and "a" don't take much space and "the t_l or t_h" seems too terse.

134 "and t_h"

See previous comment.

135 "b-tagged jet" hyphen, see Guidelines.


137 "to equal the" why use "match" when you mean "equal"

Changed to "constraining...to the W boson mass"?

142 "into the three topologies mentioned earlier" should reference your earlier discussion.

This sentence is no longer present.

147 "comparison of several distributions between the data and the MC predictions" I think "MC" is better than "SM".

"MC" is better than "SM" in this situation (a MC has a specific order whereas SM could mean all orders). But the suggestion is not clear on what is being compared and if the "distributions are within a histogram or the Delta|y|, M_tt,.. distributions. Write as "Figure 1 shows comparison between data and MC simulation for kinematic distributions based on events in the candidate sample.

148 "in the candidate sample." (period) I don't like "our". Also, the current sentence is too long and clumsy. Break it up. "The boosted nature of the top quarks becomes evident in these events in which the M_tt range extends to multi-TeV values, two and three AK4 jets from the collimated top quark decay products are reconstructed, and the leptons are closer to the nearest jet axis than the jet size."


152 "between the distributions from data and the MC predictions is"

Changed to "between data and prediction" but feel that "distributions" idea has been made and a simple judgment is appropriate for the last sentence.

154 "normalization and shape" you don't need the second "the"

The second "the" stresses that "shape" is a distinct concept from "normalization" (although uncertainties often fall in both categories).

160 "[46], (comma) respectively,"


161 "across the three years."


162 "There is an additional 1.6% systematic uncertainty in the combined" reads better.

"for" in the sense of adding in quadrature.

163 "normalization and shape of the MC distributions"


168 "high-level" hyphen


169 "(Reco), (comma) and"

Serial comma added but "(reco)".

171 "A constant systematic uncertainty in the efficiency" "flat" is considered jargon.

Use "uniform" instead.

172 "QCD background is used, which"

Not sure what the full suggestion is but the use of ", which" is questionable. Change to ", and".

175 "in t and W tagging" you don't need both "tagging"s


178 "incorrectly identify (mistag)" "mistag" is considered jargon and must be defined.


Fig. 1 caption: "Comparison between the distributions from data (points) and the MC predictions (colored histograms) for signal candidate events in the", "(described in Section 6): Delta |y|", "The vertical bars on the points show the statistical uncertainty in the data. The shaded bands represent the total uncertainty in the MC predictions (described in Section 5). The lower panels give the ratio of the data to the sum of the MC predictions."

Changed the first sentence to "Comparison between data and MC simulation for...". The legend explains the symbols, so it does not seem necessary to repeat the information in the caption.

179 "MC simulations"


180 "The uncertainty from the choice of PDF is estimated by taking the difference between using versions 3.0 and 3.1 of the NNPDF sets, according"


181 "in Ref. [58]." See Guidelines on how you refer to references.


186 "Uncertainties related to the modeling of the initial-"


187 "(ISR and FSR) in the parton shower are determined by varying" so you don't have "taken into account" in two consecutive sentences.


191 "without the correction to the simulated samples used to make the MC top quark p_T distribution agree with data [59]." you need to say what the "correction" is.

The sentence explains the correction?!

198 "to the finite MC" I don't like "limited". It's unlimited - you could make as large a sample as you want, but it would still be finite.

If we don't want to admit to MC sample size limitations in a public paper, maybe just "due to the MC sample size"?

219 "The total likelihood is given" it's not a "result"


222 "advantage that the background contributions are" shorter


Table 1 caption: "The signal event yields in data and the MC predictions after the likelihood fit for each" Delete "used in the analysis" obvious. "2016, 2017, and 2018, and two tt invariant mass ranges)." Put the years in numerical order. Delete "for events that pass the signal selection" not needed if you say "signal" at the beginning. "The uncertainties in the MC predictions include both the statistical and systematic components."

The authors feel strongly about the backward ordering of the years. Agree that the selection qualification is no longer necessary. The last suggestion does improve on the text.

232 "Taking various combinations of the channels gives separate results for the two invariant mass ranges."

Not sure that this improves on the text. But maybe I am missing the significance of "combinations of various channels". Statistics?

234 "to the observed"


240 Delete "and" we don't add this in a bullet list.


242 "given tt invariant mass region."

Done. "given \ttbar invariant mass region".

Fig. 2 caption: "The signal event yields for Delta|y| < 0 and > 0 from data (points) and the MC predictions (colored histograms) before (left) and after (right) the likelihood fits for each of the analysis channels. The upper and lower plots show the yields for 750 < M_tt < 900 GeV and M_tt > 900 GeV, respectively. The vertical bars on the points represent the statistical uncertainties in the data, and the shaded bands give the combined MC statistical and systematic uncertainties. The lower panels display the ratio of the data yields to the sum of the MC predictions." Delete the last sentence. You already say this in the text, and this is not something you put in a figure caption.

Mostly agree but keep the "comparison" lead to parallel the Fig. 1 caption.

243 Delete "found to be" not needed


245 "summarize the A_c^fid and A_c values for the complete signal sample and for two tt invariant mass regions, along with their"

"together" is not appropriate and the last phrase of the sentence seems to be a run-on. But "full signal" or "combined signal" sounds better than "complete signal".

246 "uncertainties. The theoretical predictions, which include next-to-NLO QCD and NLO EW corrections from Ref. [4] and are obtained by setting the observed quantities to their expected values ("Asimov data"), are also given, along with their uncertainties."

Done. Maybe should define NNLO?

Table 2 caption: "charge asymmetry values", "fiducial phase space A_c^fid (top)", "full phase space A_c (bottom) are shown for the total sample and the two M_tt invariant mass ranges, along with the corresponding SM predictions A_c (theory) and their uncertainties. The statistical (Stat) and systematic (Syst) uncertainties in the data, the MC statistical uncertainty (MC stat), and the total uncertainty in the measured values (Total) are also shown."

Don't follow the first sentence in the suggestion?

249 "determined at generator" so you don't have two "measured"s in the sentence.


252 Delete "In this case," not needed.

Done...Change the last phrase to ", which is common to all 12 Channels".

Eq. (4) Put a comma after the equation.


253 "where N_gen is the number of generator events in each Delta |y| region, and alpha epsilon^pos and alpha epsilon^neg are the corresponding acceptance times efficiency values. This formula allows us", "strength" (singular)


255 Delete "and its uncertainty" not needed and doesn't read well. All values come with their uncertainty, you don't have to say this.


256 "than the dominant systematic" the uncertainty in the acceptance is also a systematic uncertainty, so you must delineate what you mean.


Fig. 3 caption: "Measured A_c^fid (left) and A_c (right) values (points) for the complete signal sample and for two M_tt ranges, combining the muon and electron samples. The inner tick marks on the points indicate the statistical uncertainty in the data, and the outer tick marks the total uncertainty. The theoretical predictions, including next-to-NLO QCD and NLO EW corrections from Ref. [4], are shown by the bands, with the height of a band representing the uncertainty in the prediction." Delete the last sentence. This should be in the text but not in the figure caption.

"complete" and "combining flavor channels" mean the same thing? Agree that observations about histograms should generally go in the main body and not the caption.

Fig. 4 caption: "The +/- 1 standard deviation (sigma) impacts of the dominant nuisance parameters", "uncertainties in the A_c" you haven't used "inclusive" previously, don't start now. "The MC statistical" never start a sentence with an acronym. See Guidelines.


261 "Figure 4 shows the +/- 1 standard deviation impacts of the dominant systematic uncertainties in the A_c measurement for the complete signal sample."

Done but used "full signal sample".

270 "fit and then extrapolated from the fiducial to the full phase space."

This seems a bit detailed for the Summary.

271 "events with tt invariant masses satisfying M_tt > 750 GeV, (comma) corrected to the full phase space, (comma) is 0.0069 +0.0065 -0.0069, where the uncertainty includes both the statistical and systematic components."


276 "top quark" no hyphen. But you just said the first half of this sentence on line 264. There is no reason to repeat it. So write: "This is the first measurement to use a binned maximum likelihood unfolding technique to measure A_c directly at the parton level and in the full phase space."


278 "It is also the result that"

Assuming you meant "It is also the first result that" but this is re-raising the "first measurement" question.

279 "for the hadronically and leptonically decaying top quarks" delete the second "the" and the hyphen, and make "quarks" plural.


280 "at both the trigger and offline stages."


284 "for the upcoming LHC run and the future HL-LHC." We don't use terms like "Run 3" without defining them first. See Guidelines.

Agree that "Run 3" is a problem but any reference to time (upcoming, future) will eventually not be accurate. Use "current" for Run 3. Maybe the last sentence could be deleted?

322 The second "t" in "tt" should be in roman font.


328 Only the first word of the title should be capitalized. It should say "ppbar --> ttbar" using overlines.


335 The ttbar should be in roman font.


339 Delete "-242"


References 16 and 20 are identical. Same for 13 and 19.

Also Kevin Stinson's comment. Done.

349, 358 Should be "CDF Collaboration"


352 Should be "D0 Collaboration"

No longer referenced?

364 Should be "Collaborations"

Done. (Use author field and drop the collaboration field to get the plural form.)

389 Only capitalize the first word in the title.


402 "drop" lower case.


432 Put a space between "13 TeV"


439 "41.9 fb^-1"


446 Delete ", no. 9,"


Type B Comments

136 Shouldn't you say why the b tagging information is not used? Reads strangely without a reason.

The presence of a b-tag in each event is required to suppress W background (line 117). The information of which jet is or is not b-tagged is not used in the event reconstruction. This is the same as in all previous Z' to ttbar boosted papers; we never entered into detail of why or why not and prefer to do the same here.

146 Why give the percentage for the boosted topology and not the others? Also, why not give the percentage of signal candidate events in each topology?

The percentage depends on the mass very strongly. We added this because it was requested but given all the comments received we have now removed it as we do not want to go into the detail that would be required to give a meaningful quantitative answer.

147 You need to define "Other" before Fig. 1 is shown. Don't use "Others".

Added to the caption of Fig. 1 and 2: ``Other'' corresponds to the combined contribution of ST, DY and QCD multijet.

Fig. 1: Put the "Data" at the top of the legend list. That is our standard position. We typically read the legend down in one column and then down in the second column. So the order should be:

Data Other tt MC tot. unc. W + jets

did put Data on top

put the order as it appears

Notice that you should show the shaded band in the legend. You should stack the histograms in the order they are given in the legend from top to bottom. So they should be stacked with tt on top, then W + jets, then Other, not Others.

shaded band now shown

Increase the font size of "ell + jets" and move it below the legends in some of the white space. Put a space on each side of the + sign to match the figure caption. Same for "W + jets".

Increased the font size. Left the position unchanged, as I think it looks better there. Added a space on each side of the +.

You have too much white space in many of the plots. You could make the legend just one column in these plots to reduce this amount of white space.

Prefer to leave it as it is as this does not work well for all plots

The top left figure needs the "ell + jets" label.


The y-axis label should give the bin width: e.g., "Events / 250 GeV" with a space on each side of the "/". See Guidelines.


Put a space on each side of the "/" in "Data / MC"


The x-axis numerals and labels should have a much bigger font size. The font size of the top left figure should match the other figures.


Decrease the y-axis range on the Data / MC plots so that the points and errors bars are more visible. Make the plot larger vertically so that the y-axis numbers don't overlap with each other.


205 "The index j runs over the total number of reconstructed signal events N_reco." You need to define the symbol.

Actually, it is the number of bins, not the number of events. That is why they are both set to 2. Changed to: The index $i$ runs over the number of bins at generator level (\Ngen), and the index $j$ runs over the number of bins at reconstruction level (\Nreco).
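
To make the role of the two indices concrete, here is a minimal Python sketch of how expected reconstruction-level yields could be built from generator-level bins. It assumes the usual maximum likelihood unfolding form, in which each generator-level bin is scaled by its own signal strength and folded through the simulated migration between bins; the function name and all numbers are hypothetical and are not taken from the analysis.

```python
def expected_reco_yields(signal, strengths, background):
    """Expected yield in each reconstruction-level bin j.

    signal[i][j]  : simulated signal events generated in bin i and reconstructed in bin j
    strengths[i]  : signal strength scaling generator-level bin i
    background[j] : expected background in reconstruction-level bin j
    """
    n_gen = len(signal)
    n_reco = len(background)
    return [
        background[j] + sum(strengths[i] * signal[i][j] for i in range(n_gen))
        for j in range(n_reco)
    ]

# Hypothetical numbers; index 0 is delta|y| < 0 and index 1 is delta|y| > 0.
signal = [[80.0, 20.0], [15.0, 85.0]]
background = [10.0, 10.0]
nominal = expected_reco_yields(signal, [1.0, 1.0], background)
shifted = expected_reco_yields(signal, [1.1, 0.9], background)
```

With two bins at each level, the off-diagonal entries of `signal` are the events whose sign of delta|y| is reconstructed incorrectly.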

208 Delete "In this analysis," not needed. What other analysis could you be describing?!

The description is meant to be generic and we prefer to leave it to make it clear to the reader that we choose to use just two bins, which is our choice and a novel method which then allows us to fit for Ac directly.

218 Move this sentence up to line 200 where "channels" is first used.

Here again the description is meant to be generic and we prefer to leave it as is and only define the channels of our analysis at the end of the generic description of the method.

Table 1: The numbers are too small to read. Increase their size until there is only a few spaces between each number and the table fills the page.


I don't like your notation of "mu_2018" hard to read and very clunky. Rewrite as "mu (2018)" etc.

Changed.

I would also put them in the order 2016, 2017, and 2018, not the reverse.

We have consistently used the order muon 2018-2017-2016, electron 2018-2017-2016 because it corresponds to the importance of the channels in the result. We prefer to leave the order as it is, since it is used consistently throughout the paper.

I would put a horizontal line before "Total" and leave a blank line between "Total" and "Data".

Done.

234 I think you need to define the signal strengths. They are unitless numbers that are usually a measured quantity divided by a predicted quantity. Is this the case here? What is the predicted quantity? Be clear.

It was defined later, but we moved it here: returns two signal strengths, \rpos and \rneg, that scale the contribution of the events with $\deltay>0$ and $<0$, respectively.
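
To illustrate how two signal strengths translate into an asymmetry, here is a minimal Python sketch. It assumes the standard definition A_c = (N+ - N-) / (N+ + N-) and assumes that each fitted signal strength simply rescales the generator-level yield in its delta|y| region; it is not taken from Eq. (4) of the paper, and the function names and numbers are hypothetical.

```python
def charge_asymmetry(n_pos, n_neg):
    """A_c = (N+ - N-) / (N+ + N-) for yields with delta|y| > 0 and < 0."""
    total = n_pos + n_neg
    if total <= 0:
        raise ValueError("total yield must be positive")
    return (n_pos - n_neg) / total

def fitted_asymmetry(n_gen_pos, n_gen_neg, r_pos, r_neg):
    """Asymmetry after scaling each generator-level yield by its signal strength."""
    return charge_asymmetry(r_pos * n_gen_pos, r_neg * n_gen_neg)

# Hypothetical generator-level yields and signal strengths.
a_nominal = fitted_asymmetry(1007.0, 993.0, 1.0, 1.0)
a_shifted = fitted_asymmetry(1000.0, 1000.0, 1.02, 0.98)
```

With both signal strengths equal to one, the result reduces to the asymmetry of the simulated sample; any difference between the two strengths shifts the asymmetry away from that value.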

Fig. 2: Use most of the comments suggested for Fig. 1. "Pre-fit" and "Post-fit".

Done.

Use "e" instead of "ele".

Done.

Put the years in numerical order.

We prefer to leave the order as it is.

The x-axis labels are incredibly hard to read and messy. Why not use the symbols - and +, or "A" and "B" to designate the two regions, and explain what they mean in the caption.

We prefer to leave this as is, since we have reached it after several iterations. Although crowded, it is easy to understand.

Table 2: The numbers are different sizes and the smaller ones are too small. Make all the numbers have the same font size by increasing the smaller ones. You have room to make the table wider and lots of blank space between some of the columns. Use this. Capitalize "Stat" and "Syst". Put a space after every + and - sign. Put a space before the +/- signs for the A_c (theory) values. Write "A_c^fid in the fiducial phase space". Delete the parentheses around "750 - 900". Think about writing all the numbers in scientific notation. I don't think this would take up much more space since you would be eliminating "0.00" and replacing it with "x 10^-3". You can remove one of the blank lines after each row of numbers and after the A_c labels, and at least 2 blank lines between the fiducial and full phase space sections. The table takes up too much space. You are trying for PLB, which does have a page limit.

Done, except for the scientific notation, which we have left as it is for now.

Fig. 3: Decrease the y-axis range. You have too much white space. Increase the font size of the legend. You have plenty of space for it. Use "750--900" for the x-axis labels. Remove "GeV" from the labels and put it after the "M_tt" in parentheses, like a normal unit. Put a space after the ">" signs in the x-axis labels.


261 This is the first mention of Fig. 4, so it cannot be shown on the previous page. Move it down below this line.

We prefer not to introduce forced locations as the publisher will take care of formatting in any case.

Fig. 4: "Integrated luminosity" first word capitalized, "Electron reco" and "Muon reco" second word lower case.


Francesca Cavallari

line 45 either a muon or electron → either a muon or an electron

line 49 Dedicated jet and lepton selections at the trigger and offline levels allows → Dedicated jet and lepton selections at the trigger and offline levels allow (remove the s)

I think that the discussion from line 49 to 60 could be moved in section 2 between lines 86 and 87.

We think it is better to leave it in the introduction as it explains what is special about our high-boost topology in general terms but does not get into the specific object identification we do at CMS, which is what we then describe in Section 2.

I don’t understand how you make sure that there is no double counting of the objects in the AK4 and AK8 jets. Can you clarify exactly how the jet clustering procedure flow works for this analysis (what algorithm you run first and what you use to run the second one?)

We use the standard CMS jet reconstruction based on PF which ensures no overlap.

line 138-140 Finally, one tt hypothesis is selected for each event as the one with the smallest chi2 variable that minimizes the difference between the reconstructed tl and th masses and the true top quark mass determined from simulation [28]. I don’t understand what is meant by true top mass determined from simulation and also eq 6.1 and its explanation in ref [28] is not very clear. I don’t understand if it is the event by event mass from reconstructed and MC truth matched objects in simulated signal events that you are using in the chi2, or it is the average mass and resolution for reconstructed and MC truth matched objects in simulated signal events. But then how do you compute this quantity for the background processes and on data? can you please clarify?

Offline objects matched to MC truth information in ttbar MC are used to determine the "true" top mass. Then, of course, in data and all our MC samples, we compare the reconstructed mass with that "true" value (which is a Gaussian with a mean and a sigma) to calculate the chisq. You can find all details in the appendix of our AN.
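
A minimal sketch of the selection described in this answer, assuming a two-term chisq built from the leptonic and hadronic top candidate masses. The names `Hypothesis`, `chi2` and `best_hypothesis`, and the Gaussian means and widths, are illustrative placeholders and not the values from the AN; only the chisq < 30 requirement is taken from the paper text quoted elsewhere on this page.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    m_lep: float  # reconstructed leptonic top candidate mass (GeV)
    m_had: float  # reconstructed hadronic top candidate mass (GeV)


# Hypothetical Gaussian parameters (mean, sigma) of the "true" top mass,
# as would be obtained from MC-truth-matched objects in ttbar simulation.
MEAN_LEP, SIGMA_LEP = 175.0, 20.0
MEAN_HAD, SIGMA_HAD = 177.0, 16.0


def chi2(h: Hypothesis) -> float:
    """Sum of squared, resolution-weighted mass differences."""
    return (((h.m_lep - MEAN_LEP) / SIGMA_LEP) ** 2
            + ((h.m_had - MEAN_HAD) / SIGMA_HAD) ** 2)


def best_hypothesis(hyps, max_chi2=30.0):
    """Return the hypothesis with the smallest chi2, or None if the
    event has no hypothesis or the best one fails the threshold."""
    if not hyps:
        return None
    best = min(hyps, key=chi2)
    return best if chi2(best) < max_chi2 else None
```

The same function is applied unchanged to data and to every simulated sample, since only reconstructed masses enter the comparison.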

143-144 please write boosted in quotes or write explicitly the boosted event category, same for the other two categories.

Figure 1: why do you show Delta y here? Show it later.

We show here representative distributions for the entire sample and show Deltay because it is our main variable. Later we show it in each of the individual channels, but this is the place where we show it for the entire sample

185 for ttbar; for the ttbar process

188-191 Finally, an uncertainty in the correction to the top quark pT in simulated tt samples, which depends on the generator-level top quark transverse momentum, is evaluated as a one-sided variation computed from the difference between the top quark pT distribution with and without the correction [59]. I think this is not explained well. Of course the top pt depends on the generator level top pt… and the explanation of how you compute this correction is not clear enough.

We added further explanation in the previous section and maybe this is clearer now

196 say to which physics quantity you apply the simultaneous binned maximum likelihood fit to data, perhaps anticipate the lines 228 and following, where you present the figure 2 at least the part of the figures before the fit renormalization, at the beginning of the section.

This is a general introduction to the method and all the details are given later. It is not trivial to state the quantity here, as we fit for the Ac and the negative side of the Deltay after rewriting the likelihood. The reader really needs to follow closely to understand what is done.

227 I think that since the ST sample is mentioned rarely in the paper and it is not a well known acronym it would be better for the reader to spell it out everywhere.

We prefer to leave it as is, we also do not spell out DY and eventually they are combined in any case

I think that you should introduce before somewhere in the chapter about the object reconstruction a sentence or two about the charge measurement for the muons and electrons, and its associated charge misidentification error. The error should also be included in the systematics. These are not typical pt of the leptons, so are the errors on the charge misidentification well known from the MUO and EGM POGs or did you recompute them especially for this analysis ? Normally for the charge asymmetry measurements in other papers (W boson for example) there is a dilution factor due to the charge misidentification which allows to compute the true charge asymmetry, why is this not done in this case?

Nothing special was done as is standard in the top group for other measurements that use the sign of the lepton (see PhysRevD.100.072002 for example)
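
For reference, the dilution the reviewer mentions can be sketched as follows. This is an illustration of the standard relation only, not something this analysis applies: it assumes a single charge misidentification probability `w` that is the same for both lepton charges, and the function names are hypothetical.

```python
def diluted_asymmetry(a_true: float, w: float) -> float:
    """Asymmetry observed when each lepton charge is flipped with
    probability w: both N+ - N- and A scale by (1 - 2w)."""
    return (1.0 - 2.0 * w) * a_true


def corrected_asymmetry(a_meas: float, w: float) -> float:
    """Undo the dilution; only meaningful for 0 <= w < 0.5."""
    if not 0.0 <= w < 0.5:
        raise ValueError("misidentification probability must be in [0, 0.5)")
    return a_meas / (1.0 - 2.0 * w)
```

With a small `w` the correction factor is close to one, so its effect on the asymmetry is correspondingly small.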

fig 2 caption text: As can be observed, these uncertainties are reduced significantly after the likelihood fit, and the agreement between data and simulation is improved. this text should be removed from the caption, the caption should only say what is shown in the figure, any comment should be in the text. Just delete this comment from here because it is already in the body of the paper.


I think that in the conclusions a sentence about the interpretation of the results could be added.

Not sure what you mean by interpretation

Brian Lee Winer

Type A: Line Comment

130 I think you could drop “leg of the” and the sentence would read perfectly fine.

Leave in place to reinforce "leg" concept.

130,132, 135 In the first two lines you use hyphens “t-tagged jet”, “W-tagged jet”, but in the last line there is no hyphen “b tagged jet”. I hate the “Hyphen Wars” that get fought over this point, but this jumped out at me. If the CMS Style Guide says to do it this way, so be it.

The Guidelines do give "b-tagged" jet as an example (of correct hyphenation). Line 135 was using the \btagged newcommand defined by the authors, and this does not fit here. The journal will have its own style guide.

Type B: Line Comment

59-60 In the text preceding this you describe the high Pt, low Pt, and then intermediate range. The sentence on these lines uses the words “ ‘boosted’, ‘semi-resolved’, and ‘resolved’, respectively.” While I think it is clear, the order of the description does not line up with the order of the descriptive words.

We have been asked to change the order back and forth 3 times. We agree the names do not match with the description, but others thought that the names were self-evident and that the order of non-boost was more important. Left for now, to be decided by the PubComm.

106-108 I understand you are trying to be brief but combining the muon and electron selection via parenthetical statements seems cumbersome. Why not just split out the selection into different sentences?

Indeed, this is not uncommon for a letter to save space and also to make it easy to compare the cuts in the two channels. Left it as is.

138 I think the start of this sentence could be reworded to make it read more smoothly. What about, “Finally, the tt hypothesis selected is the one with the…” or “Finally, we select the tt hypothesis with the…”

This sentence was reworded already.

161-162 Seems like this statement regarding an added uncertainty deserves a reference or further explanation as to why you are adding it.

Do we have a proper reference to add?

Figure 1 Any consideration to show Mtt on a log scale?


196-197 When I read this first sentence, I felt left in the dark. I had to sit there and think about what all the bins and categories were. (I’m not even sure I came up with the right answer.) I think it would be helpful to the reader to be a bit more explicit about what all the bins are. (Note: You helped me out on the next page on lines 219 – 220, but it was a little frustrating reading to that point and not really knowing.)

This is a general introduction to the method and all the details are given later. It is not so trivial to start explaining what is particular to our analysis earlier without failing to explain the general idea first.

225-231, Table 1 I am trying to reconcile some statements in this paragraph along with Table 1. The second sentence says you take the backgrounds from simulation. But you also point out the fit constrains uncertainties. I assume the fit can move the normalization of the background through nuisance parameters. Is that right? If so, I assume that Table 1 reflects those shifts. Also, maybe in the Table 1 caption it is worth explicitly mentioning in the last sentence that uncertainties are those constrained by the fit (assuming I have it right).

You are correct about what we do; added "and their normalization allowed to change during the likelihood fit" to the text. The table caption says that the values are after the likelihood fit and we do not think more explanation is needed.

232-233 First sentence – I am confused why this statement is here. The rest of the paragraph seems to be talking about extracting A_c

Maybe this is because the figure and the table are in between this text and the previous paragraph. The format has changed now and they are together. The previous paragraph talked about the result when combining the 12 channels and this one explains how to get the Ac for sub-channels

Equation 3 Since r_pos and r_neg come from the fit, why not write down an expression for A_c^fid in terms of these quantities?

The fit gives Ac and r_neg directly, that is why r_pos needs to be written in terms of Ac and r_neg. This method allows us to obtain Ac and its errors from the fit without having to do an additional error propagation (and having to deal with all the correlations between sources).
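
A minimal sketch of this reparameterization, under the assumption that r_pos and r_neg are signal strengths that scale the generator-level yields with positive and negative Deltay. The exact form of Eq. 3 is not reproduced on this page, so the function names and arguments (`n_pos_gen`, `n_neg_gen`) are illustrative only.

```python
def r_pos_from_ac(ac: float, r_neg: float,
                  n_pos_gen: float, n_neg_gen: float) -> float:
    """Solve Ac = (r_pos*N+ - r_neg*N-) / (r_pos*N+ + r_neg*N-) for r_pos,
    so that Ac and r_neg can be the free parameters of the fit."""
    return r_neg * (n_neg_gen / n_pos_gen) * (1.0 + ac) / (1.0 - ac)


def ac_from_strengths(r_pos: float, r_neg: float,
                      n_pos_gen: float, n_neg_gen: float) -> float:
    """Asymmetry implied by a pair of signal strengths (inverse relation)."""
    n_pos = r_pos * n_pos_gen
    n_neg = r_neg * n_neg_gen
    return (n_pos - n_neg) / (n_pos + n_neg)
```

Because Ac is itself a fit parameter in this form, its uncertainty comes straight from the likelihood, including the correlations between nuisance parameters.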

Figure 2 For the top two plots, the words “pre-fit” and “post-fit” get very close to “Mtt”

This has been fixed.

243-244 How about moving this summary statement later after you have directed the reader to the table with results?

We prefer to leave the formatting to the journal and to latex and not introduce forced breaks.

Equation 4 Similar question to above. I guess you must have a reason for doing it this way.

We do fit for Ac directly, that is why we need to do this

Figure 4 Do we need to tell the reader what we mean by “impact” or do we believe this is well known?

We think this is known by now in a specialized HEP journal like PLB

261 Perhaps a more descriptive statement about Figure 4. Instead of just “ranking” something like “ranked impact parameters”

Changed to "Figure 4 shows the $\pm 68\%$ confidence level ($1 \sigma$) impacts of the nuisance parameters corresponding to the systematic uncertainties for the inclusive $\AC$ measurement for $\mttbar > 750\GeV$ ranked in order of decreasing importance"

General Comment You spend a lot of time showing the data broken down by years, yet you don’t talk about the consistency of A_c between the years. Are they consistent?

Even though we did run the years separately, this gives results with very large errors, as the uncertainties are constrained when combining the channels. So they are consistent, but it makes no sense to show the individual channels.

Very nice result and paper! I enjoyed reading it.

More from Sijin Qian

In general

(1) Throughout the paper (including the Abstract and Figure’s captions and axis labels, etc.), about the mass region “750 < Mttbar < 900 GeV”, there are three expressions,

(a) in the Abstract (the last line) and Table 2’s caption (the 3rd-4th lines), as “(750, 900) GeV”;

(b) L221 and Fig.2’s caption (the 3rd line), as “750 < Mttbar < 900 GeV”;

(c) in the header column of Table 2, as “(750 – 900)”

(d) Fig.3’s horizontal axis label, as “[750,900] GeV”

I’m not sure whether they can be expressed consistently (similar as the option (c), but using the short hyphens similar as L99, etc.), e.g.

(A) in the Abstract (the last line) and Table 2’s caption (the 3rd-4th lines) can be shortened from (two places)

“(750, 900) GeV and > 900 GeV” → “750-900 and > 900 GeV”

This has been done in the Abstract.

(B1) L221: “and two mass regions (750 < Mttbar < 900 GeV and Mttbar > 900 GeV).” → “and two Mttbar regions (750-900 and > 900 GeV).”

(B2) Fig.2’s caption: (the 2nd-3rd line)

“The plots in the upper row correspond to 750 < Mttbar < 900 GeV, and the plots in the lower row to Mttbar > 900 GeV.” →

“The plots correspond to Mttbar = 750-900 GeV in the upper row, and > 900 GeV in the lower row.”

This would be shorter.

(C) Table 2: (two places, in the header column)

“(750 – 900)” → “(750-900)”

The expanded form for the range may improve readability.

(D) Figure 3: (horizontal axis label of each plot; also, the unit “GeV” may should be put after the variable “Mttbar”)

">750 GeV | [750,900] GeV | >900 GeV Mttbar " → “>750 | 750-900 | >900 Mttbar (GeV)”

The way this is done seems to be equivalent.

(2) Throughout the paper (including the Abstract, etc.), to follow the good examples on L276 and L296, etc., a hyphen may should be added between the “top quark” in the term “top quark charge asymmetry”, i.e.

in the Abstract (the 6th and 9th lines), L196, L258 and L269: (five places)

“top quark charge asymmetry” → “top-quark charge asymmetry”

"top quark" is never hyphenated per the Guidelines.

Page 0, in the Abstract

(3) The 10th line, the “QCD” may should be explained, but since it has not been used again in the Abstract, thus can be simply spelled out, i.e.

“in QCD perturbation theory with” → “in quantum chromodynamics perturbation theory with” or “in perturbation theory of quantum chromodynamics with”

This is now "in quantum chromodynamic perturbation theory".

Pages 1-10

(4) L14-15, L80, L145, L162, L181-182, L184 and L190. These lines may be shortened from

(a) L14-15: (to follow the good examples on L209-210, etc.)

“of the top quark and top antiquark rapidities and …” → “of the top quark and antiquark rapidities and …”


(b) L80, L181-182 and L184: (four places, as the “PUPPI”, “muR”, “muF” and “hdamp” have not been used afterward in whole paper) L80: “(PUPPI) algorithm [37, 38].” → “algorithm [37, 38].”

L181-182: “Renormalization (muR) and factorization (muF) scales at …” → “Renormalization and factorization scales at …”

The symbols appear in Fig. 4.

L184: “element and parton shower matching scale (hdamp) regulates …” → “element and parton shower matching scale regulates …”

The symbol appears in Fig. 4.

(c) L145: (as the “Mttbar” has been introduced on L42)

“Only events with chi2 < 30 and ttbar invariant mass greater than 750 GeV are retained for” →

“Only events with chi2 < 30 and Mttbar > 750 GeV are retained for”

This is shorter.

(d) L162: (to follow the good examples on L99, etc.)

“added uncertainty of 1.6% for the combined 2016, 2017 and 2018 integrated luminosity.” → "added uncertainty of 1.6% for the combined 2016-2018 integrated luminosity.

This is shorter.

(e) L190: (to follow the good examples on L189 and L191, etc.)

“top quark transverse momentum,” → “top quark pT,”

Slight preference for not using \pt as a replacement for "transverse momentum" unless used with numerical values.

(5) L173, L186-187 and Fig.4. The terms “JER”, “JES”, “ISR” and “FSR” have been used for only one time each in Fig.4 (the vertical axis labels) in whole paper; therefore, they may not have to be introduced on L173 and L187, i.e.

(a) L173: “jet energy corrections (JEC) and resolution (JER) are …” → “jet energy corrections and resolution are …”

These symbols appear in Fig. 4.

(b) L186-187: “to the initial- and final-state radiation (ISR and FSR) modeling …” → “to the initial- and final-state radiation modeling …”

These symbols appear in Fig. 4.

(c) Fig.4’s vertical axis labels: (then they should be simply spelled out on four lines, since there seem sufficient room in the column to accommodate these full names) "2 FSR … 4 JEC … 16 ISR … 19 JER " →

"2 Final-state radiation … 4 Jet energy correction … 16 Initial-state radiation … 19 Jet energy resolution

The authors feel that the table reads better with the acronyms.

(6) L188, the “Q2” has neither been explained and nor been used elsewhere in this paper yet. It may should at least be explained briefly here, i.e.

“at the scale Q2 for the ttbar samples.” → “at the scale Q2 for the ttbar samples, where Q is …”

Q^2 is on the same footing as sqrt(s) - no need to explain?

(7) L219 and L251, to be consistent with all other CMS papers, each equation index in text should be put into a pair of brackets, i.e.

L219: “of the individual likelihoods from Eq. 2,” → “of the individual likelihoods from Eq.(2),”

Agree - this is specified in the Guidelines and should be corrected...With a space.

L251: “as defined in Eq. 1 common” → “as defined in Eq.(1) common”

Agree - this is specified in the Guidelines and should be corrected...With a space.

(8) Table 1, the letter size of the content in Table is a little too small (especially the subscripts) now. As the width of the Table can be extended outward for ~1cm on each of both left and right margins, readers must appreciate it if the letter size in Table 1 can be enlarged to be similar as in Table 2.

The font size has been increased.

(9) These are parts of three previous comments for v17, i.e.

(8) Figs.1-2 (now also including Table 2 in v19)

(a) In the captions, to be distinguishable from the “top” and “bottom” quarks, the position indicators may be better to be changed from

“top” → “upper”


“bottom” → “lower”,

Your ANSWER: Agree - per the Guidelines change to “upper” and “lower” to avoid any confusion with top and bottom quarks.

This could still be changed for the Table 2 caption.

CWR comments

  • Paper draft as of July 1st, with all comments received during CWR implemented as indicated below. The only comments that the LE is still working on are the Type A comments from Sijin Qian. All other Type A and Type B have been addressed. TOP-21-014_temp.pdf

from Greg Landsberg


  • General: throughout the paper you misuse the term (Lorentz) boosted in two different ways. First, you refer to the sample as "boosted events". This is a misnomer, as the event can't be "boosted"; individual objects are. Second, you refer to the tt¯ pair as boosted, which is simply wrong from the physics point of view. The tt pairs at such large mass are produced essentially at rest, so there is nothing "boosted" about them; the only things that are boosted in your analyses are the individual top quarks and their decay products. This needs to be corrected throughout the paper [please see detailed comments below].


* Title: ... asymmetry with highly Lorentz-boosted top quarks in the single-lepton channel at $\sqrt{s} = 13$ TeV.


* Abstract, LL1-2: The measurement of the charge asymmetry in tt events with highly Lorentz-boosted top quarks decaying to a single lepton and jets is presented.


* L19: ... measuring AC in tt¯ events with highly Lorentz-boosted top quarks will lead ...


* LL45-46: ... optimizes the reconstruction of events with tt¯ invariant mass above 750 GeV, resulting in highly boosted top quarks. We target ...


* LL88-93: you really should expand on the t and W tagging: namely provide the variable used to infer the substructure (N-subjettiness, I assume), selection on it, and the jet mass windows, along with the grooming algorithm.

We have added description and references

  • L94: decays of b hadrons [the only "B hadron" is the B meson; "b hadrons" are many that contain a b quark].


* L100: of up to 138 fb−1 [as the electron channel has less!].


* LL120-121: please, provide the full versions of \MGvATNLO and \PYTHIA used in the analysis and specify the PDF sets used in generation. The CP5 tune was only used for 2017-2018; so add the CUETP8M1 tune as well.


* LL132-133: why no b tagging information is used in the resolved case? It surely would reduce the combinatorics! In fact, it's not clear from the paper at all, where the b tagging information is actually used, as you do not specify anything about b tagging for the other two categories. Please, clarify in the paper.

It says that no b-tagging information is used in "this process" referring to the jet assignment to the leptonic or the hadronic top. b-tagging is indeed used as stated in line 116. So this means that all events have at least one b-tag but which jet is b-tagged is not used in the ttbar event reconstruction. In any case, we added a sentence in this paragraph to make it clearer.

* LL134-135: minimizes the difference between the reconstructed tl and th masses and the true top quark mass, weighted by the corresponding uncertainties. [I assume you meant that you minimize ΔM/σM , which is not what's written in the sentence, as "within the uncertainties" doesn't mean weighting by the uncertainty.]

This was changed and a reference was added following comments from others.

* LL135-136: you said that you pick the combination with the smallest χ2. That, by itself, does not reject the background, unless you require this combination to have the χ2 below a certain threshold. If so, you need to specify this threshold.

This information was added.

* L142: ... the boosted nature of top quarks in the events becomes evident ...


* L143: the number of AK4 jets you show in the last plot carries little information about the top quark boost; it's the number of AK8 jets that matters. Moreover, contrary to what you state on this line, there are basically no events with two reconstructed jets; so you need to synchronize the plot and the text.

We believe this plot is meaningful because typically without our special reconstruction, ttbar samples do not have events with 2 or 3 jets, they have the 4 jets from the ttbar + radiation. So even though we have more 4 jets than 2 or 3, the fact that we have some events with 2 and 3 jets shows that our two-body reconstruction is working as intended. Also, showing AK8 number of jets is not meaningful as it is just zero or one.

    • Figures 1-2: It's really odd that you separate the W+jets background and lump all others in the "Others" [which should be typeset as "Other" in the legends], whereas it's evident from Table 1 that W+jets background is smaller than the single top quark one in all categories. You should either show the ST background as a green histogram and move W+jets to "Other", or show both ST and W+jets as histograms and separate them from the rest (DY + QCD + Diboson), which will become "Other".

This separation and grouping was used in previous papers that use this selection and we prefer to keep it as is.

Section 5, general: I see no uncertainty due to the L1 trigger prefiring in 2016-2017. Why was it not introduced? Given how energetic your events are, this must be a sizable effect. This needs to be added to the analysis and the paper. I also see no uncertainty due to the lepton momentum scale and efficiency - please discuss these as well.

We now spell out all sources of lepton uncertainties and define the names as they are used in Fig. 4.

- LL153-154: swap the order of 2016, 2017, and 2018 to follow the chronological order; the uncertainty in 2016 is 1.2\%, not 2.5\%. Specify the overall uncertainty taking into account the partial correlation: "... across the three years, resulting in the overall uncertainty of 1.6\%."

The luminosity values were fixed and references were added. 2018 muons is our most important channel and 2016 electrons the least one. We have consistently used the order muon 2018-2016, electron 2018-2016 and prefer to leave it like that.

- LL158-159: there is no such thing as "minimum bias cross section"; please use the correct term, which is "total inelastic cross section"; give reference to our corresponding paper to justify the central value and the 4.6\% uncertainty used.

Fixed, added https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=find%20eprint%201802.02613

- LL192-194: you have not mentioned the unfolding anywhere preceding this claim ["This unfolding approach ..."] Everything discussed so far was the basics of the maximum likelihood technique and does not imply the unfolding. The presence of the unfolding only becomes clear from Eq. (2), so you should move the sentence in question after L209.


- Table 1: reorder the columns to follow the chronological order: 2016, 2017, 2018 (per lepton).

See our comment before, prefer to leave as is

- Figure 2 caption, penultimate sentence: how can one see from the figure that the uncertainties are reduced significantly as a result of the fit, given that you only show the post-fit results? So, you should remove the sentence from the caption, and if you want to say this in the text, please do so properly and explain what makes you claim that the uncertainties are reduced considerably; certainly, Fig. 2 by itself doesn't carry this information.

Fig. 2 has pre-fit (left) and post-fit (right). You can see that the systematic band gets reduced and the agreement between data and MC gets better.

- L232: the fiducial phase space has never been defined; please do so properly in the event selection section and also explain how you define it at the generator level.

It is made clear now that it is the final candidate selection. The fiducial definition includes our ttbar event reconstruction (there is no way to find the ttbar mass without the Chisq reconstruction which is done with reconstructed quantities). That is why we unfolded to full phase space, to allow for comparison with theory.

- Figure 3: I do not understand what the left and right panes show. From the caption it appears that the left pane shows the expected (under the SM assumption) and observed values, while the right one shows the comparison with theoretical predictions. However, the legends in both panes are the same, which makes me think that the left plot shows the results for the fiducial phase space, while the right one - for the full phase space. Which is correct? You really need to make this clear in both the caption and the legends. If the first interpretation is correct, that "Predicted AC " in the left plot should really be "Expected 68\% confidence level" or similar [but then I really see no point in this plot at all].

The caption and the plot have been changed to make it clear what is plotted.

- Figure 4: the figure is far from the presentation quality, and you would be well-advised to remove it from the paper and move it to the supplementary material on the Twiki. If you want to keep it, you should replace "Impact on signal strength" with "Impact on AC" and change the scale on the axis accordingly [not only signal strength has not been defined, but it's not really clear what "signal" is in this case!]. Then, the category labels should be adjusted to match the paper and CMS Style: "Top quark pT", "t tag" "b tag", "Integrated luminosity", "t and W mistag", "Electron reconstr.", "Muon reconstr."; some of the variables have not been defined at all, hdamp, 2D (e + jets), 2D (μ + jets), Electron HLT, Muon HLT. The 1σ impact should be "68\% confidence level impact", as σ has never been defined. Finally, it appears that the "Top pT" entry has both positive and negative variation pulls in the same direction - how could it be given that this is a one-sided uncertainty, so there is only one pull?!

Changed to impact in Ac, all variables are now defined in the systematic section, top pT is one sided (spelled out in systematic section and added reference)

- L253: ... charge asymmetry in tt events with highly boosted top quarks in ...


- LL255-256: ... was optimized for top quarks produced with high Lorentz boosts ...





LL2-3: is performed using data collected in proton-proton collisions at $\sqrt{s} = 13$ TeV with the CMS detector at the CERN LHC and corresponding to an integrated luminosity of up to 138 fb−1.

Changed as suggested. The different integrated luminosity for the electron channel is a detail, and this is explained in the text.

L10: next-to-leading order;

Leave hyphenated (next-to-leading-order) as this is modifying "electroweak corrections".


L4: in the Fermilab Tevatron proton-antiproton collisions and the CERN LHC proton-proton (pp) collisions;

Changed as suggested (but (\Pp\Pp)).

L12: antiquarks;

Yes - no hyphen per the Guidelines.

Eq. (1) and many other places in the paper: suggest using "Ac [small Roman c] as the notation for the asymmetry, as "c" stands for charge, and therefore should not be either capitalized or italicized;

Okay - no conflict with charm quark symbol.

L15: antiquark;

Removed hyphen.

LL15-16: beyond-the-SM (BSM);

Added hyphens.

L17 of the order of;

Change to the simpler "about".

L23: delete "and model that predicts the existence of heavy" - just continue the list of new particles;

Change as suggested although this requires separate references.

LL24-25: new spin-0 or spin-1 particles;

One could have both spin states or just one or the other. Leave as "and".

LL44-45: in this Letter is the first one that uses pp collision data;

But \Pp\Pp collision data.

L47: (W→ℓν, where ℓ is a muon or electron) and the other;

Agree, a definition of "\ell" is needed.

L50: results in the lepton appearing as;


L51: Dedicated jet and lepton selections at the;

Yes, replace "cleaning" with "selections".

L62: referred to as "boosted", ``semiresolved", and ``resolved".

Agree, the suggested order is more natural and the category labels look better uncapitalized.

The CMS detector and event reconstruction:

L81: p⃗ T [Roman subscript; in two places];

This had been flagged earlier.

L94: multi-classification \textsc{DeepJet} algorithm;

Yes, if there is no \DEEPJET newcommand.

Collider data and simulated samples:

L99: collected with the CMS detector in 2016--2018 and corresponding;

Leave the explicit "Run N" references for the PubComm.

L119: next-to-leading order (NLO) \POWHEG v2 [38] generator.

next-to-leading-order (modifies "POWHEG generator").

Event reconstruction:

L135: true top quark mass;

Add "quark".

LL138-139: W-tagged jets: boosted contains events ... no W tag; semiresolved contains ... no t tag; and resolved contains;

Yes - consistent with earlier change.

L142: in the candidate sample;

Agree: our -> the is more neutral.

Fig. 1 caption, L1: data and the SM predictions for the; L2: (discussed in Section 6) [don't tell a reader what to do!]; L3: (upper left); (upper right); L4: (lower left); (lower right). L6: (discussed in Section 5). LL5-6: remove the last sentence of the caption, as you said this on LL144-145 already.

"upper" and "lower" changes made earlier; "described in Section" although this is not too different from "see Section"; agree that the idea of the last sentence has already been expressed.

Systematic uncertainties:

L150: and the integrated luminosity normalization.

Don't think "integrated" is needed here. Whether the luminosity normalization is affected for each lumi bundle or for the integral, the result is the same.

L166: The uncertainties in the tagging scale factors are parameterized as functions of the jet pT

Agree - the plural form is more appropriate here.

L176: into account for the tt¯ signal and;

"signal" should be understood here.

Unfolded results:

LL188-189: Barlow--Beeston ``lite" approach [42].

Stay with B-B-lite. This can be found online, so the use as a handle trumps slang considerations.

L199: The index i runs over the bins at the generator level;

Agree - it is better to avoid the word "truth".

L203: The response matrix;

Agree - capitalize the start of each sentence in the list (but okay to start with symbols).

L213: (750<Mtt¯<900 GeV and Mtt¯>900 GeV).

Changed for consistency with similar range descriptions.

Eqs. (3, 4) and L227: replace the subscript "truth" [jargon] with "gen";


Table 1 caption, L2: and 3 years: 2016, 2017, and 2018, separated into two mass regions) for;

Authors prefer latest-to-earliest counting of years. Leave for PubComm.

Table 1 body, first subheader: 750<Mtt¯<900 GeV;

Already changed.

Fig. 2 caption, L1: data and the SM predictions; L2: The plots in the upper row; L3: (750<Mtt¯<900 GeV and the plots in the lower row; L6: remove the last sentence; already stated in the text.

Some of these suggestions were implemented earlier. Agree with the others.

LL228-229: move "and" from L229 to L228 [i.e., no line break];

Let TeX handle the line breaks.

LL232-233: within the uncertainty;

Add "the".

L236: (``Asimov data").

Add double quotes.

Table 2 body, header line: Stat, Syst, MC stat, Total; subheads: AC in the fiducial/full phase space;

"stat" and "systematic" but otherwise implement suggestions.

Add "the" in both rows.

L248: at next-to-NNLO (NNLO);

next-to-NLO... But not used again in the paper.


LL253-255: in proton-proton collisions at s√=13

Spell out "proton-proton" in Summary.

TeV has been presented, based on data collected by the CMS experiment at the LHC, corresponding to an integrated luminosity of 138 fb−1. The selection;

Very similar form adopted.

from Sijin Qian

In general

  • (1) Throughout the paper (including in Figure and Table captions and in Tables, etc.), to be consistent with some good examples in this paper (e.g. Figs.1-2's legends and plot labels, etc.) and to avoid some bad examples (e.g. L119-120 and Fig.3's caption where the "Z + jets" and "e + jets" have been cut into two lines, etc.), the spaces before and after the symbols "+" in the expressions of "xxx + yyy" may be better to be removed, e.g. L116-117: (three places), "in the mu + jets (e + jets) channel. To suppress the contribution from the W + jets background," --> "in the mu+jets (e+jets) channel. To suppress the contribution from the W+jets background,"

* Other places where also need to be changed by the similar way are L100, L102-103 (two places), L105-106 (three places), L119-120 (three places), Fig.1's caption (the 2nd line), L173, L212 (two places), L216, Table 1 (in the caption, the 2nd line, two places; in the headercolumn (the 3rd row in each sub-Table), two places), Fig.3's caption (the 1st two lines, two places), and L256, etc.

The X+jets combinations are specified by newcommands within the main body, so LaTeX has some freedom to adjust spacing and we would prefer not to interfere with this. It may be that the Final Reading will require some changes and the journal will undoubtedly have their own style guidelines.

* (2) Throughout the paper, the terms "diboson", "JER", "ISR" and "FSR" have been used for only one time on L216 and in Fig.4 (the vertical axis labels) in whole paper; therefore, they may not have to be introduced on L122, L165 and L177, respectively, i.e.

(a) L121-122: "Vector boson pair (diboson) events are simulated with PYTHIA8." --> "Vector boson pair events are simulated with PYTHIA8."

Since "diboson" is a familiar term and it appears in one instance (aside from the definition), we would prefer to keep this.

(b) L164-165: (together with the item (6c) below to remove the never-used "JES")

"in the jet energy scale (JES) and resolution (JER) are ..." --> "in the jet energy scale and resolution are ..."

(c) L177: "to the initial- and final-state radiation (ISR and FSR) modeling ..." --> "to the initial- and final-state radiation modeling ..."

(d) L216 and Fig.4 (the vertical axis labels), they may be simply spelled out, i.e.

(i) L216: "The diboson background yield" --> "The yield of vector boson pair background"

(ii) in Fig.4, the vertical axis labels,

"2 FSR . . . 16 ISR . . . 19 JER" -->

"2 final-state radiation . . . 16 Initial-state radiation . . . 19 jet energy resolution"

For acronyms appearing in figures we feel these should be defined even if it is only a one-time use. For those appearing one or fewer times in the text we tend to agree that the Guidelines should be followed.

  • Page 0, in the Abstract (3) The 1st and 3rd-4th lines, the "ttbar" and "pp" should be explained at their 1st appearances in the Abstract on the 1st and 3rd lines, i.e.

(a) the 1st line:"for highly boosted top quark pairs decaying ..." -->"for highly boosted top quark pairs (ttbar) decaying ..."

(b) the 3rd line: (for "pp", but since it has not been used again in the Abstract, so can be simply spelled out)"of data collected in pp collisions at ..." -->"of data collected in proton-proton collisions at ..."

(c) the 4th line: (then can be shortened by using the "ttbar") "for top quark-antiquark pairs produced with large" -->"for ttbar pairs produced with large"

In order to avoid explaining the meaning of \ttbar and \qqbar at the beginning of the main body, we would prefer to assume these are well known symbol combinations.

  • Pages 1-7:(4) L4-5 (L8-9 and L44) and L82-83, the "ppbar", "qqbar" and "DeltaR" may should be explained at their 1st appearances in text on L4-5 and L82, i.e.

(a) L4-5:"the Tevatron ppbar collisions and the LHC pp collisions is that the former is dominated by qqbar annihilation ..." -->"the Tevatron proton-antiproton (ppbar) collisions and the LHC pp collisions is that the former is dominated by quark-antiquark (qqbar) annihilation ..."

We feel that the multiple acronym explanations would adversely affect the readability of the Introduction and that "pp", "ppbar", "ttbar" are either well known or understandable from the text. "DeltaR" should be defined.

(b) L82-83: (also, as the numerical value of the angle phi has been implicitly shown here and L112, etc., and an angle can be measured in either the radians or degrees, therefore, the unit of phi may should be specified) "within DeltaR = sqrt(...phi ...) < 0.4 of an AK4 jet, where phi is the azimuthal angle," -->"within an angular distance DeltaR = sqrt(...phi ...) < 0.4 of an AK4 jet, where phi is the azimuthal angle in radians,"

The CMS Guidelines explicitly state that the use of "radians" is understood in this context (and suggest that the information should NOT be provided).
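
As an illustration of the angular distance discussed in item (b), the following is a minimal Python sketch of DeltaR with phi in radians. The function names are hypothetical and are not taken from the analysis code. The only nontrivial step is wrapping the azimuthal difference so that angles near +pi and -pi are treated as close.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference in radians, wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance DeltaR = sqrt(deta^2 + dphi^2), with phi in radians."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# Illustrative check of a "within DeltaR < 0.4 of an AK4 jet" style condition
# (the eta/phi values below are made up for the example).
lepton_near_jet = delta_r(0.10, 3.10, 0.05, -3.10) < 0.4
```

Without the wrap in `delta_phi`, the two objects in the example would appear to be separated by about 6.2 in phi rather than by about 0.08.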

(c) L8-9 and L44: (then can be shortened accordingly)

(i) L8-9:"such that the top quark (antiquark) is preferentially emitted in the direction of the incoming quark (antiquark) [3]." --> "such that the t (tbar) is preferentially emitted in the direction of the incoming q (qbar) [3]."

We think that writing out the names avoids any confusion that might result from sym(symbar).

(ii) L44:"is the first one that uses proton-proton collision ..." -->"is the first one that uses pp collision ..."

We prefer the indirect explanations of the particle symbols in the current text.

* L12-62, I wonder whether a Feynman diagram should be added for the process in this analysis similar as most of other CMS papers.

A couple of diagrams might have been appreciated but this is something the ARC should have requested prior to CWR.

* (6) L15, L24-25, L27, L37-39, L66-67, L77, L81, L84, L108, L113-114, L131-132, L140, L153, L164, L174, L180 and L224. These lines may be shortened from

* (a) L15: (as the "SM" has been introduced on L6) "be modified by beyond the standard model" -->"be modified by beyond the SM"

Agree that BSM should be defined and the definition could rely on the SM acronym.

* (b) L24-25:"These models introduce new spin-0 and spin-1 particles in ..." -->"These models introduce new spin-0 and -1 particles in ..."

Prefer to spell out "spin-1" to avoid the possible misinterpretation of spin -1 (i.e., a negative total spin value).

* (c) L27, L66-67, L76-77, L164 and L174: (eight places, as the "EFT","ECAL", "HCAL", "PV", "JES" and "ME-PS" have not been used afterward in whole paper) L27:"an effective field theory (EFT) approach in which" --> an effective field theory approach in which"

(Guidelines) We agree except in cases where the acronym appears in figure legends,

L66-67:"calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL)," --> "calorimeter, and a brass and scintillator hadron calorimeter,"


L76-77:"The primary vertex (PV) is taken to be ..." -->"The primary vertex is taken to be ..."

Agree - PV is not needed as it is not used subsequently.

L164:"Uncertainties in the jet energy scale (JES) and resolution" -->"Uncertainties in the jet energy scale and resolution"

Since "JER" and "JEC" appear in Fig. 4 we feel that there is some motivation for the definition.

L174:"The matrix element and parton shower (ME-PS) matching ..." -->"The matrix element and parton shower matching ..."

Agree: ME-PS does not serve much purpose if it is not appearing in Fig. 4 or anywhere else in the paper.

* (d) L37-39: (one of two duplicated "7 and 8 TeV" and of two "AC =" may be saved) "have combined their inclusive and differential measurements of AC at two center-of-mass energies (7 and 8 TeV), obtaining AC = 0.005 ± 0.007 (stat) ± 0.006 (syst) and AC = 0.0055 ± 0.0023 (stat) ± 0.0025 (syst) at 7 and 8 TeV," -->"have combined their inclusive and differential measurements of AC and obtained AC = 0.005 ± 0.007 (stat) ± 0.006 (syst) and 0.0055 ± 0.0023 (stat) ± 0.0025 (syst) at two center-of-mass energies (7 and 8 TeV),"

Agree - the first instance of "7 and 8 TeV" is not needed.

* (e) L81 and L84: (two places, to follow the good examples on many other CMS papers, the "pT sum of ..." can be used to shorten the "sum of the pT of ..."; also, to be consistent in this paper, the font of subscript "T" in "pT" should be changed)

We agree with the ptvec rendering point and will use \ptvec for this. The expressions seem equivalent to us.

L81:"The total jet pT(italic) is given by the sum of the pT(italic) of its constituents." -->"The total jet pT(non-italic) is given by the pT(non-italic) sum of its constituents."

Agree that standard aliases (i.e., \ptvec) should be used.

L84:"as the negative vector sum of the transverse momenta of all the PF" -->"as the negative vector pT sum of all the PF"

Slight preference to spell out "transverse momenta" in this instance.

* (f) L108:"at least two jets with pTj1 > 150 GeV (pTj1 > 185 GeV)," -->"at least two jets with pTj1 > 150 (185) GeV,"

Agree that the compactness of the suggestion helps here although the sentence remains complicated because of the dual threading over muons and electrons and j1 and j2.

* (g) L113-114:"pT,rel(l, j) is the transverse momentum of the lepton with respect ..." -->"pT,rel(l, j) is the pT of the lepton with respect ..."

Slight preference to spell out "transverse momentum" in this context.

    • (h) L131-132: (two "the"s may be removed)

"to either the tl or the th. For events with no t tag and no W tag, all possible assignments of AK4 jets are considered for both the tl and the th." -->"to either the tl or th. For events with no t tag and no W tag, all possible assignments of AK4 jets are considered for both the tl and th."

The current form reads better to us even if "the" appears several times.

    • (i) L140:"events with a ttbar invariant mass (Mttbar) greater than 750 GeV are" -->"events with a ttbar invariant mass Mttbar > 750 GeV are"

Since this is the first appearance of Mttbar, we would prefer to avoid the compacting.

    • (j) L153: (two places; also, the Refs. should be given for the 3 uncertainties) "2.5%, 2.3%, and 2.5% for 2018, 2017, and 2016" -->"2.5, 2.3, and 2.5% for 2018, 2017, and 2016 [xx-yy]"

Agree - a single % suffices and a reference should be given, although it looks like references have been included since this comment was made.

    • (k) L180: "depends on the generator-level top quark transverse momentum," -->"depends on the generator-level top quark pT,"

Slight preference to spell out "transverse momentum" in this context.

    • (l) L224:"corresponding to the Delta|y| > 0 and Delta|y| < 0 regions," -->"corresponding to the Delta|y| > 0 and < 0 regions,"

Agree - the shortening is beneficial.

  • (7) L34, I'm not sure whether the "collaborations" should start with a capital letter to be consistent in this paper (e.g. L37 and in the References Sections, etc.), i.e."from the two collaborations, CDF and D0," -->"from the two Collaborations, CDF and D0,"

The CMS Guidelines support capitalizing "collaborations" in this context. The sentence has been slightly modified to remove the emphasis on "two collaborations" and to make the connection with CDF and D0 clearer.

  • (8) Figs.1-2

* (a) In the captions, to be distinguishable from the "top" and "bottom" quarks, the position indicators may be better to be changed from "top" -->"upper" and "bottom" -->"lower", i.e.

Agree - per the Guidelines change to "upper" and "lower" to avoid any confusion with top and bottom quarks.

(i) Fig.1: (the 3rd-4th lines, four places), "Delta|y| (top left), reconstructed Mtt (top right), distance between the lepton and the closest AK4 jet DeltaRmin(l, j) (bottom left), and the number of AK4 jets (bottom right)." -->"Delta|y| (upper left), reconstructed Mtt (upper right), distance between the lepton and the closest AK4 jet DeltaRmin(l, j) (lower left), and the number of AK4 jets (lower right)."

Agree - follow the advice of the Guidelines to avoid possible confusion with top and bottom quarks.

(ii) Fig.2: (the 2nd-3rd lines, two places; also, an extra unit "GeV" in the inequality on the 2nd line may be removed)

"The plots in the top row correspond to 750 GeV < Mttbar < 900 GeV, and the plots in the bottom row to Mttbar > 900 GeV." -->

"The plots in the upper row correspond to 750 < Mttbar < 900 GeV, and the plots in the lower row to Mttbar > 900 GeV."


* (b) In the caption of Fig.1, the line above the last line may be shortened from "statistical uncertainty and the systematic uncertainty (...)." --> "statistical and systematic uncertainties (...)."

The "MC" preceding "statistical" makes this suggestion problematic.

* (c) In the upper two plots of Fig.2, the plot labels at the upper-left corner of each plot under the "CMS", similar as the item (a(ii)) above, an extra unit "GeV" in the inequality may be removed, i.e."750 GeV < Mttbar < 900 GeV" -->"750 < Mttbar < 900 GeV"


* (9) Table 1, in the row below the header row, the same as the item (8c) above, an extra unit "GeV" in the inequality may be removed, i.e."750 GeV < Mttbar < 900 GeV" -->"750 < Mttbar < 900 GeV"


* (10) L236, a Reference may should be given to the "Asimov data", i.e."to their expected values (Asimov data)." -->"to their expected values (Asimov data [xx])."

Agree IF there is an appropriate reference. Simply pointing to a book on statistics is not very helpful.

  • (11) Table 2, in the header column, the extra spaces inside the brackets in the header row (before the "GeV") and the 2nd row in each sub-Table (before and after the dash symbol) may be removed; also, to be consistent in this paper, e.g. L100, etc., the dashes can be replaced by hyphens, i.e.

"Mttbar ( GeV)

> 750

(750 --- 900)

> 900

> 750

(750 --- 900)

> 900 " -->

"Mttbar (GeV)

> 750


> 900

> 750


> 900 "

The extra space before "GeV" should be removed but it seems okay to overrule text formatting rules for the numerical ranges within a table.

Pages 9-10

(12) Fig.3's caption and L248-249, the "EW" and "NNLO" should be explained at their 1st appearances in text on L248-249; however, since they have been used for only one time each on the 3rd line of Fig.3's caption, thus not have to be introduced, and can be spelled out in both Fig.3's caption and L248-249, i.e.

(a) L248-249: (and the "NLO" introduced on L119 can be used here)

"at NNLO QCD and NLO EW corrections from Ref.[4]." --> "at Next-to-NLO QCD and NLO electroweak corrections from Ref.[4]."


(b) Fig.3's caption: (the 3rd line, they also should be spelled out) - "including NNLO QCD and NLO EW corrections from" --> "including Next-to-NLO QCD and NLO electroweak corrections from"


(13) Fig.4

(a) In the vertical axis labels,

(i) on the 4th line, the term of "JEC" has not been used anywhere else in this paper yet, so should be spelled out;

(ii) on the 22nd line, the 1st letter of each line should be capital and on the 24th and 26th lines, the 2nd words should be in the lower case, i.e.

"4 JEC . . . 22 luminosity ... 24 Electron Reco ... 26 Muon Reco " -->

"4 Jet energy correction . . . 22 Luminosity ... 24 Electron reco ... 26 Muon reco "


(b) In the caption,

(i) the 2nd-3rd lines, the colors of blue and red are mentioned, but can not be displayed in black-white; the problem can be solved by using the darkness, i.e. (also, it may be clearer if a word of "horizontal" is added before the "bars" on the 3rd line)

"The blue and red bars show ..." --> "The blue light and red dark horizontal bars show ..."

Assume you mean "light blue" and "dark red" but these are also colors so the use is ambiguous. Since the legend explains the bar convention, the B&W concern may not apply. However, "red" should come before "blue" to pair with up and down.

(ii) the last line, the "PDF" and "HLT" used in the vertical axis labels should be explained since they have not been used anywhere else in this paper yet, i.e.

"MC statistical uncertainties are omitted here." -->

"MC statistical uncertainties are omitted here. In the vertical axis labels, the "PDF" is parton distribution function and the "HLT" is high level trigger."

We have now defined these terms earlier in the text.

(14) L253-254 and L259-261, in the Summary Section, to be consistent with all other CMS papers, the "ttbar", "pp", "Mttbar", "NNLO", "NLO" and "SM" should be explained (or spelled out if it would not be used again in this Section) at their 1st appearances in this Section on these lines, since some readers may only read the Summary Section instead of whole paper, i.e.

(a) L253-254: (for "ttbar" and "pp")

"top quark-antiquark pairs in pp collisions at ..." --> "top quark-antiquark pairs (ttbar) in proton-proton collisions at ..."

As with the Introduction we would prefer not to expand the sentences to explain the particle symbols. Leave for the PubComm.

(b) L259-261: (for the "Mttbar", "NNLO", "NLO" and "SM")

"ttbar events with Mttbar > 750 GeV corrected to ... corresponding theoretical prediction at NNLO in perturbation theory with NLO electroweak corrections from Ref.[4] is 0.0094 + .... Good agreement between the data and the SM predic-" -->

"ttbar events with the invariant mass Mttbar > 750 GeV corrected to ... corresponding theoretical prediction at next-to-next-to-leading order in perturbation theory with next-to-leading order electroweak corrections from Ref.[4] is 0.0094 + .... Good agreement between the data and the standard model predic-"

Agree, since this appears in the Summary.

(15) Between L262-263, an Acknowledgment Section should be added according to the recommendation from the CMS Collaboration Board regarding to the Ukraine-Russia war.

The latest guidance is for the post-FR group to add the Acknowledgments.

from ELTE

Type A

  • line 40: perhaps not necessary to explain what "syst" and "stat" means.


  • line 59 low end of the pT spectrum

Added the missing "spectrum".

  • line 189 is "lite" a slang here ("light" - but then its a jargon) or a typo (like)? Comes from combine github but seems to appear differently in other papers: Barlow–Beeston light method or simply Barlow–Beeston method

Since "Barlow--Beeston-lite" appears online (as does "B-B light" and "B-B method" ) we have left this in place. "lite" serves as a handle in this instance so escapes the slang criticism.

Type B

  • The title and abstract uses the term "charge asymmetry" but it only becomes clear in line 12 that this is a sort of rapidity asymmetry between tbar and t. Could the title and/or abstract be clarified a bit more to avoid confusion?
Previous publications on the topic also have ttbar charge asymmetry in the title. As the target journal is HEP-specific, we think this is OK.

  • line 5: At leading order...: this sentence refers to Tevatron or LHC?
It refers to the ppbar production, no matter where it happens. It is later stated that the gg production has no asymmetry (at any order).

  • line 7: However...: this sentence presumably refers to Tevatron. Make clear.
Same comment as before, it refers to the ppbar production no matter where it happens.

  • line 38: Add reference for ATLAS-CMS combination
Added https://link.springer.com/article/10.1007/JHEP04(2018)033

  • line 75 if the PF muons have low efficiency at high pT, and for this reason you do not use PF muons but outside-in muons, then a) why doesn't CMS modify the PF algorithm for muons to become more efficient using your method, b) why do we discuss this internal technical issue in the paper? At the moment it sounds odd to mention that the analysis is based on PF but then immediately later criticize PF implicitly.
This is the standard procedure from CMS for high pT muons. We did not do anything special for the analysis but just used the muon POG recommendation for high pT muons, as referenced in Ref. 26.

  • Maybe you meant “We also use muons…”?
We only use muons that are not PF muons for this analysis.

  • line 80 remaining PF candidates... : this would be a good place to mention how pileup is treated, since pileup contributes to objects that are not charged particles.
We have added more information, as also requested by others.

  • line 96 can any references/explanations be added to t tagging?
We have added more information, as also requested by others.

  • line 111 2D selection: does that mean that you make 2 cuts (as mentioned), or you make a cut on one variable and the cut depends on the other variable (that is not explained/mentioned well then). As this sentence was understood differently by colleagues in the group, it might need clarification. E.g. leptons rejected if dR…<0.4 and pT…<25 GeV.
We believe as stated it should be clear that the condition is a logical OR. This definition was used before and there is a reference in case someone wants to look deeper into the cut.
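
To make the logical OR explicit, here is a minimal Python sketch of a 2D lepton-jet selection. The function name is hypothetical, and the default thresholds (0.4 and 25 GeV) are taken from the reviewer's example above rather than from the paper, so they should be read as placeholders.

```python
def passes_2d_selection(delta_r_min, pt_rel, dr_cut=0.4, pt_rel_cut=25.0):
    """Return True if the lepton survives the 2D selection.

    The lepton is kept if it is well separated from the closest jet OR has a
    large transverse momentum relative to that jet's axis. It is rejected
    only when BOTH quantities fall below their thresholds, which removes one
    corner of the (DeltaR_min, pT_rel) plane.

    The default thresholds are placeholders, not confirmed analysis values.
    """
    return delta_r_min > dr_cut or pt_rel > pt_rel_cut
```

Equivalently, a lepton is rejected if `delta_r_min <= dr_cut and pt_rel <= pt_rel_cut`, which is the form the reviewer suggested.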

  • line 115 it is not clear where all the values of all these cuts come from (how they are optimized)
The event selection follows our prior publication with 2016 data only and is given as a reference. We added the reference again at the end of the paragraph.

  • line 135 phrased like this, it may be a bit misleading, since it may seem like you want to minimize the difference between t_l and t_h (which is not the case). Suggestion: minimizes the deviation of the reconstructed tl (and th) masses from the true top mass.
We believe the sentence is clear as it states that we minimize the DIFFERENCE OF this and that WITH the comparison.

  • line 135 Not clear what "within uncertainties" mean. If you minimize a quantity by selecting the best fitting candidate, then you either do it as a significance (ie relative differences divided by the uncertainty), or you ignore the uncertainty; but none of these cases is described well by "within uncertainties". It would be nice to clarify this in the text.
The true top mass refers to the value we obtain from our MC after we do the ttbar event reconstruction. It is fitted with a Gaussian that has a given RMS and that information is taken into consideration by the likelihood minimization. This was also done in our previous published paper and all details are in the given reference.
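
The idea of choosing the jet assignment whose reconstructed masses lie closest to the expected values, weighted by the Gaussian widths, can be sketched as follows. This is only an assumed chi-square form for illustration: the function names, the two-term structure, and all numerical values are hypothetical, and the actual definition is the one in the reference cited in the paper.

```python
def top_mass_chi2(m_lep, m_had, mean_lep, sigma_lep, mean_had, sigma_had):
    """Chi-square-like distance of the reconstructed leptonic and hadronic
    top masses from their expected values, in units of the Gaussian widths."""
    return (((m_lep - mean_lep) / sigma_lep) ** 2
            + ((m_had - mean_had) / sigma_had) ** 2)


def best_assignment(candidates, mean_lep, sigma_lep, mean_had, sigma_had):
    """Pick the (m_lep, m_had) candidate with the smallest chi-square."""
    return min(
        candidates,
        key=lambda c: top_mass_chi2(c[0], c[1], mean_lep, sigma_lep,
                                    mean_had, sigma_had),
    )


# Placeholder masses (GeV) for three hypothetical jet assignments; the means
# and widths stand in for values that would come from a Gaussian fit to MC.
candidates = [(150.0, 210.0), (175.0, 180.0), (120.0, 172.0)]
best = best_assignment(candidates, mean_lep=173.0, sigma_lep=20.0,
                       mean_had=173.0, sigma_had=15.0)
```

Dividing by the widths is what "within uncertainties" amounts to in this sketch: a deviation counts for less where the mass resolution is poorer.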

  • line 144 does the 'jet size' mean the jet radius parameter here?
Not all jet algorithms are based on a cone and it is common to refer to the extension of the jet energy deposits as jet size.

  • Fig. 1: why are there only 2 bins on the top left plot, in spite of having a lot of statistics?
These are the two bins we use to measure the asymmetry. A finer segmentation adds no information for the asymmetry measurement, as it is defined just from the numbers of events with positive and negative Delta|y| = |y_top| - |y_antitop|.
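
For readers less familiar with the observable, a minimal Python sketch of the two-bin counting definition of the charge asymmetry is given below. The function name is hypothetical and the counts are made up. The measurement in the paper is obtained from a likelihood unfolding rather than from this direct counting, so the sketch only illustrates why two bins suffice.

```python
import math


def charge_asymmetry(n_pos, n_neg):
    """Return (A_C, statistical uncertainty) from the event counts with
    Delta|y| > 0 (n_pos) and Delta|y| < 0 (n_neg)."""
    total = n_pos + n_neg
    if total <= 0:
        raise ValueError("need at least one event")
    a_c = (n_pos - n_neg) / total
    # Binomial uncertainty on the asymmetry for a fixed total count
    sigma = math.sqrt((1.0 - a_c * a_c) / total)
    return a_c, sigma


# Made-up counts, for illustration only
a_c, sigma = charge_asymmetry(510, 490)
```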

  • line 152 how is this 30% and 5% determined?
These are standard for these types of measurements in the top group. In any case, they are priors that are reduced significantly with the likelihood unfolding. We have added a reference.

  • line 153: Lumi uncertainty in 2016 is 1.2%, not 2.5 %. Was the correct value used in the final results?
Yes, fixed in the text.

  • Table 2: the measured fiducial AC increases with mass; the full phase space AC however is non-monotonic (69, 243, 37 e-4 respectively). How is that possible?
This was discussed during the review at quite some length. There is no reason for the AC to fall in between the two mass regions when combining. To better understand this, one needs to look at the pulls for the individual channels and the combination. When all 16 channels are run together, the likelihood chooses central values that can be different than the choices when running individual channels.

from ROME

  • One general comment is that Sections 2 and 4 contain "event reconstruction" in the title and related to this, we find that the description of the event reconstruction and selection is spread between section 2,3 and 4. Perhaps a more compact way of organizing the text could be useful.
    • This section describes our ttbar event reconstruction, added ttbar to the title.

Type A

  • Abstract: Double space before "Differential distributions for..."

TeX (or LaTeX) is controlling the spacing in the text.

  • L9: "Since the gg initial state is symmetric, the SM does not predict an asymmetry..." instead of "There is no asymmetry..."

The SM could not predict an asymmetry for a symmetric initial state. We prefer the current phrasing.

  • Eq.1: N, the number of events, is not defined

Added a phrase equating N with the "number of events".

  • L85: Correction are applied to the jets to improve the energy scale and resolution; the ptmiss is modified to take them into accounts accordingly

We assume you meant "into account" but still prefer the current phrasing.

  • L123: bunch crossings

Yes, crossingS.

  • L140: neither a t tag or a W tag

Changed as suggested (but "neither..nor").

  • L207: N(dmu) represents the priors for the nuisance parameters. Normalisation uncertainties are assigned to a log-normal distribution; all other nuisances have a normal distribution prior.

Changed along the lines suggested.
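
For orientation, the prior structure described in this comment can be written in one common convention. The symbols below are assumptions introduced for illustration, and the paper's own notation may differ: $\theta_k$ is a nuisance parameter with a unit Gaussian prior, $\delta$ is the relative size of a normalization uncertainty, and $N_{\text{nom}}$ is the nominal yield.

```latex
\begin{align}
  \pi(\theta_k) &\propto \exp\!\left(-\tfrac{1}{2}\,\theta_k^{2}\right), \\
  N_{\text{pred}}(\theta_k) &= N_{\text{nom}}\,\kappa^{\theta_k},
  \qquad \kappa = 1 + \delta .
\end{align}
```

In this form the yield itself is log-normally distributed, so it stays positive for any value of $\theta_k$, which is the usual motivation for treating normalization uncertainties this way.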

  • Figure 2: increase space between the x-axis and its labels.


  • L262: Good agreement between this measurement and the most precise SM predictions is thus observed.

Changed but "the measurement" and "most precise SM prediction".

Type B

  • Abstract: Too many NNLO: suggest to simplify as most precise available calculations in perturbation theory (QCD and EWK). You will specify later which is the precision of the calculations
    • This is how the authors from the prediction request we refer to this specific measurement. Left it as it was as the NNLO refer to pQCD first and EW corrections second and various combinations of those two exist.

  • Introduction: We believe that a Feynman diagram may help the reader to understand better all the discussion without necessity to go back to the references
    • We are at the maximum allowed figures for this short letter and believe that this being a regular SM ttbar production does not warrant the space, as we would need to remove one of the other figures. We could add it as supplemental material if needed.

  • L16: "The value of Ac is expected to be on the order of 1% in the SM." seems to be in contradiction with the sentence in L9.
    • The Ac is only present in the qqbar production and not the gg production. Both Tevatron and LHC have contributions from both qqbar and gg production. Within the SM, the AC at the LHC is 1%. We think the text as stated is correct.

  • LL21-29: do the BSM models provide the expected magnitude of the asymmetry? if so, why not mention it to state up front what is the required experimental precision.
    • Unfortunately, they do not. In this paper, the focus is to show that one can measure the AC in this very boosted regime, something that has not been done before. We do not expect to be sensitive to any BSM until the much larger statistics in Run 3 and HL-LHC, where one would do an EFT-type of analysis. But this was not the focus for this publication.

  • L29: if you are not putting any limit on dimension six operators in this paper (however it would be nice to have this… ) it will be good to say that putting direct limits on dimension six operators is beyond the scope of this paper but can be done on the basis of the presented results. Also some numerical examples of which kind of discrepancy one may expect in the asymmetry could be useful. E.g., one can say that XXX would produce a large asymmetry at the level of YYY.
    • See response above. As we are providing the unfolded results, any theorist can try to see how their model fares with our data. We do not think we need to directly state it.

  • L54: the meaning of "topological requirements" here may not be clear to the reader. Consider providing more information instead of the reference alone.
    • Topological cut was historically used in top publications for any type of 2D cut that rejects a corner of phase space, which is what this OR does.

  • L79: in the AN you mention that you use PUPPI jets for t and b tagging. You should then provide a description and the reference for the PUPPI algorithm.
    • Done

  • L88-89: We would explicitly mention the soft drop algorithm and the N-subjettiness variable before the corresponding references.
    • Done

  • L132: why is the b-tagging info not used at this level of the analysis?
    • It says that no b-tagging information is used in "this process" referring to the jet assignment to the leptonic or the hadronic top. b-tagging is indeed used as stated in line 116. So this means that all events have at least one b-tag but which jet is b-tagged is not used in the ttbar event reconstruction. In any case, we added a sentence in this paragraph to make it clearer.

  • L136: We find the description of the background rejection criterion a bit vague. There is a threshold on the chi^2 value to remove background events from the analysis? In this case you could report it.
    • It is 30, but giving the value with no explanation did not seem right to us. The entire selection is based on our published result with 2016 data, which was given as a reference.

  • L140: it would be good at this point to have an idea of the fraction of events going into each category for the events with Mtt>750 GeV
    • The main contribution is from the resolved and boosted, but this also depends on the mass bin and the lepton flavor. These are not categories we fit separately and we had the information in an earlier version of the paper, but we could not show it with post-fit errors and it was decided to remove it as it was very confusing. You can find the values (pre-fit) in the analysis note.

  • L184: A statement is missing here about the impact of the systematic uncertainties on the analysis results. You could add here a forward reference to Fig. 4, which provides the plot of the impact of several nuisance parameters.
    • Prefer to leave it as is as these are the pre-fit uncertainties and Fig 4 are the post-fit and cannot be referred to without first explaining the likelihood unfolding, which is what is done in Sec 6.

  • Results: how do the results compare to previous results at lower energies? Has the asymmetry increased due to the enhancement of qqbar->ttbar subprocess due to the selection on invariant mass? A quantitative comparison to previous measurements and an explanation of what this new measurement brings is missing in the Summary or Results section.
    • We added this sentence to the Summary after line 262 as a separate paragraph: This novel measurement of the top-quark charge asymmetry is the first one to use 13 TeV data and a binned maximum likelihood unfolding technique to measure Ac directly at parton level in the full phase space. In addition, it is the first result that focuses exclusively on the very high Lorentz-boost regime, using dedicated reconstruction techniques for the hadronically and the leptonically-decaying top quark at both trigger and offline level. Ac is especially sensitive to BSM processes in this highly boosted phase space since the relative contribution of valence quarks increases at high momentum transfer. The result demonstrates that top quark properties can be precisely measured in the highly boosted topology, opening a new era of exploration for Run 3 and HL-LHC.

  • L212: was it demonstrated that also splitting according to the presence of a t tag or a W tag gives any improvement in terms of constraining the nuisance parameters? One sentence in this direction may be added in the text.
    • The splitting improves the correct assignment of the jets to the hadronic and leptonic jet, and we have added those values to the paper. Please note that the channels are defined irrespective of the topology. But we did clearly see that combining channels, in particular the low and high mass, significantly constrains the main uncertainty, which is the top-tagging uncertainty.

  • Fig.2 caption: We would not say that uncertainties are reduced much after the fit, but it can be stated that even before the fit a good agreement with the data is observed (indeed as the S/B is very large in all channels, we would not expect a large reduction of the nuisance uncertainties)
    • We believe the uncertainties have been reduced a lot: if you look at the table, the total uncertainty for the MC prediction is less than 8%, while it was 15-20% pre-fit.

  • L239: how the impact of the nuisance parameters is treated in the efficiency to correct to full space (eg lepton reconstruction, scale and resolution, etc)? Better to clarify this aspect
    • The correction to full phase space is done at the MC generator level and is only affected by theory uncertainties on ttbar which are many orders of magnitude smaller than any detector-related uncertainties

  • Fig. 4: We think a table with groups of nuisances will be easier to understand than the impact plot: eg one group could be. lepton trigger, efficiency and ID, etc etc
    • We had a table but removed it due to lack of space for a letter. We could add it to the supplemental material.

from DESY

  • General Comments:In several places, there seems to be a bit of imprecision concerning the terms “boosted events”,  “boosted ttbar pairs”,  and “boosted t quarks”.  This may affect title, abstract, main body and conclusions. According to our judgment, as a consequence of the M_ttbar >750 GeV >> m_t mass cut, the individual top quarks are indeed highly boosted, but the ttbar pairs not necessarily so. Accordingly, we have made a few suggestions for corrections below whenever the concept of “boosted ttbar pairs” is mentioned, starting with the abstract. We leave it to the judgment of the authors whether the term “boosted events” in the title and elsewhere in the paper is vague enough to be appropriate in this sense.  
You are correct and this was done

  • Partially related to the issue above, please add comments early on clarifying  why a cut on M_ttbar > 750 GeV is used instead of cuts on the jets directly. In this context, it could be extremely interesting to include a figure that shows the categorization of jets as a function of pt_top (or - for that matter - M_ttbar), for the categorization of “boosted”, “semi-resolved”, and “resolved”.
We essentially did what CMS and ATLAS have done before, which is to measure the Ac in mttbar bins. This might be historic but it was the best way to compare.

  • In the introduction and throughout, it  would be good to have more references to existing CMS results, physics objects, systematics, etc..
these were added

  • We could not find the significance of the measurement of the charge asymmetry in the paper, thus it would be nicer to have that also.
A paragraph was added to the summary section.

  • In the results and summary sections, we suggest to add more comments on how BSM models closely related to these results might be constrained by them. In the introduction section, BSM is already introduced and takes quite a  large part of the section. However, we could not find any BSM related results, and the measurements are compared only with SM prediction in the end. 
In this paper, the focus is to show that one can measure the AC in this very boosted regime, something that has not been done before. We do not expect to be sensitive to any BSM until the much larger statistics in Run 3 and HL-LHC, where one would do an EFT-type of analysis. But this was not the focus for this publication.

Type A

  • L. 6: remove “the” before “ttbar production”


  • L. 10: that -> which

Left as "that" as in a restrictive clause (as opposed to NR ", which") but not entirely confident either.

  • L. 10: Propose full stop after “LHC”. And then: “However, because …

Prefer to keep the current phrasing so as not to break the flow of the paragraph.

  • L. 128 and throughout: T tagged jet -> t-tagged jet , W tagged jet -> W-tagged jet

Yes, changed to be consistent with CMS guidance on hyphenation.

  • L. 148: background simulated -> simulated background

Since the signal sample is also simulated and there is no data-driven background, changed to "in both signal and background samples".

  • L. 182: for -> from

Left unchanged, although not completely confident as to what this refers to.

  • L. 194 than a -> than those obtained with a

Changed as suggested.

  • L. 222: rephrase “the unfolding performs a” for better English, e.g. “a multi-dimensional maximum likelihood fit of the simulation to observed data is performed to obtain to signal strengths,

Although "unfolding performs" could be criticized on the grounds that this personifies "unfolding", we have left this unchanged to keep some emphasis on "unfolding".

  • L. 233: ‘Fig. 3’ -> ‘Figure 3’

We are strictly following the Guidelines, which use the full form (Figure) only for the very first word in a sentence.

  • L. 247: ‘Fig. 3’ -> ‘Figure 3’

We are strictly following the Guidelines, which use the full form (Figure) only for the very first word in a sentence.

  • Summary: mixture of present participle, past tense, present -> not consistent in overall sentences in Summary section

The tenses in the Summary seem natural even if the mixture is not in accord with the Guidelines. Left unchanged for now.

  • Table 2: Propose to use normal (small) fonts in the column headers: “statistical” “systematic” “MC-statistical” “total”

Agree, although for spacing reasons it may be necessary to abbreviate some of these terms.

Type B

  • Abstract: ‘highly boosted top quark pairs’ -> ‘highly boosted top quarks’ (see general comment above)
    • Done

  • Abstract: ‘next-to-next-to-leading order in perturbation theory’ -> ‘next-to-next-to-leading order in QCD perturbation theory’ (EWK is mentioned separately)
    • fixed

  • Abstract: It would be nice to have the numeric result for the charge asymmetry quoted in the abstract.
    • fixed

  • L. 16: We suggest removing this sentence since this is explained in detail and with references in the next paragraph.
    • fixed

  • L. 17: Please add a reference for the SM expectation of A_C.
    • DONE

  • L. 17 (and paragraph lines 30-43): We suggest to present existing experimental results already here. Starting with ATLAS and CMS results. Increase level of detail for CMS (and ATLAS). Balance to amount of text spent on Tevatron measurements against that on LHC. The story about AFB at the Tevatron is history.
    • We have gone back and forth with this in previous iterations and settled on this one because we prefer to talk about ATLAS and CMS results just before we explain, starting in line 44, what is different for our paper. We will try to add a bit more detail of ATLAS and CMS. Will check how we are doing with space.

  • L. 19: ‘highly boosted ttbar events’ -> ‘highly boosted t quarks’ (see general comment above)
    • fixed

  • L. 45: ‘highly Lorentz-boosted ttbar events’ -> ‘highly Lorentz-boosted t quarks’ (see general comment above)
    • fixed

  • L. 46: ‘invariant mass above 750 GeV’ -> ‘invariant ttbar mass above 750 GeV’
    • fixed

  • L. 46: Please explain here why a cut on Mtt was chosen rather than pt, and what the implications on the phase space as a function of pt are. This is very relevant because the jet categorization is a function of pt rather than Mtt.
    • We were following the previous published ATLAS and CMS results to be able to compare with them when it came to the precision obtained with our selection.

  • L. 48: If “q” includes the flavour information as it does when you quoted ‘qqbar annihilation’ elsewhere in the paper, we suggest to change “qbar” of ‘W -> qqbar’ in this line into “q’bar” to clarify that quark and anti-quark are not the same flavour.
    • fixed

  • L. 51: It is not obvious what the ‘Dedicated jet and lepton cleaning’ is. Additional comments on this seem useful for readers who are not familiar with the details.
    • We have the same level of detail that was used in prior letters using this selection. We refer readers to the 2016 publication that uses the same selection for all the details. We are short on space to start adding a lot of details on the selection which has not changed and decided to focus on the unfolding which is new.

  • L. 54: ‘The behavior of the top’ -> ‘The topology of the top’ (this is about topology in the lab frame, not about behavior in the top rest frame)
    • fixed

  • L. 62: If ‘as “Boosted”, “Semi-resolved”, and “Resolved”’ is referring to the previous three sentences, then ‘as “Boosted”, “Resolved”, and “Semi-resolved”, respectively’ (same order) makes it more clear.
    • Right, done

  • L. 111: Is it really important to point out that this is a 2D cut? Even though the “or” in line 112 signifies it clearly, it is actually not straightforwardly clear whether one or both of the criteria are sufficient. Suggest to modify “to satisfy the condition” into “to satisfy at least one of the following two conditions” or similar.
    • The exact same wording was used in prior publications and as you say, it is clear as it is and we prefer to leave it for consistency

  • L. 132: Why not use b-tag information for resolved?
    • Line 116 states that a b-tag is required for all events. Line 133 indicates only that the information of which jet is b-tagged is not used in the jet assignments for the event reconstruction. In any case, as several people commented on the same, we have added explicitly here that all events have a b-tag but the flavor of the jets is not used in the jet assignment.

  • L. 133: Please add information about the efficiency for getting the ttbar hypothesis right.
    • Added to line 141: "Only events with Chisq<30 and ttbar invariant mass greater than 750 GeV are retained for further study. With these conditions, 70% of the events in the Boosted topology have the correct jet assignments as measured in simulation."

  • L. 135: What is the “true top mass”, i.e. which value is used?
    • It is the one given by our MC after we do the ttbar event reconstruction. It varies for the leptonic and hadronic side and also depends on the topology for the hadronic side. We do not think it is meaningful to quote, and it was not quoted before in any of the publications that use this reconstruction.

  • L. 137: It is not clear how exactly the ‘candidate sample’ is defined. Please add a clear definition (list the cuts?) at some point before the first quote in this line.
    • The cuts are in section 3, and then there is the chisq jet assignment and the mass cut (we added details as requested, see previous question). We will also add ttbar to the title in section 4 to make it clearer that is the event reconstruction for our candidate sample

  • L. 138: It is not totally clear to what extent the W tag and the t tag are mutually exclusive. Are we vetoing jets that have both a t tag and a b tag ?
    • Line 90 says top tag and W tag are two exclusive categories

  • Figure1: The scale of the top left figure is zero suppressed. Please let it start from zero, such that the green background contribution can be seen properly.
    • Fixed

  • Figure 1: Why are only two bins used for Delta |y|?
    • These are the two bins we use to measure the asymmetry. Finer segmentation adds no information for the asymmetry measurement, as it is defined only by whether Delta|y| = |y_top| - |y_antitop| is positive or negative.
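The two-bin argument in the reply above can be illustrated with a minimal sketch. It assumes the usual counting form of the asymmetry, A_C = (N+ - N-)/(N+ + N-), with N+ and N- the event counts at positive and negative Delta|y|; the function name and the toy rapidity values are hypothetical, and this is not the likelihood unfolding used in the analysis.

```python
def charge_asymmetry(abs_y_top, abs_y_antitop):
    """Counting asymmetry from per-event rapidities of the top quark and antiquark."""
    n_pos = 0
    n_neg = 0
    for y_t, y_tbar in zip(abs_y_top, abs_y_antitop):
        delta = abs(y_t) - abs(y_tbar)  # Delta|y| for this event
        if delta > 0:
            n_pos += 1
        elif delta < 0:
            n_neg += 1
        # Assumption: events with Delta|y| exactly zero carry no sign and are skipped.
    total = n_pos + n_neg
    if total == 0:
        raise ValueError("no events with nonzero Delta|y|")
    return (n_pos - n_neg) / total


# Toy input (made-up values): three events with Delta|y| > 0, one with Delta|y| < 0.
toy_ac = charge_asymmetry([1.0, 0.5, 2.0, 0.1], [0.5, 1.0, 1.0, 0.0])
```

Only the sign of Delta|y| enters, so any finer binning within the positive or negative side would give the same value.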

  • Figure 1: It looks like this figure does not have enough information. E.g., why are the plots only for AK4 jet? If t tagged jet reconstructed from AK8 jet and AK4 jet is only for leptonic t candidates (L. 128-129), then what about results from hadronic t candidates? And, why are only ‘events with two and three jets’ mentioned in text while the figure contains also more than 3 jets?
    • AK4 jets are only assigned to the leptonic top when there is a top-tag as you say, but we have the majority of the events in the resolved category and there you have the typical 4 AK4 jets. The additional jets from ttbar+jets production are also ak4 jets. We believe this plot is meaningful because typically without our special reconstruction, ttbar boosted samples do not have events with 2 or 3 jets, they have the 4 jets from the ttbar + radiation. So even though we have more 4 jets than 2 or 3, the fact that we have some events with 2 and 3 jets shows that our two-body reconstruction is working as intended. Also, showing AK8 number of jets is not meaningful as it is just zero or one.
We added the comment "events with two and three AK4 jets originating from the collimated top quark decay products are reconstructed" to hopefully make it clearer what we mean.

  • L. 152: 5% seems fairly aggressive given that this is a high-pt phase space, the uncertainty of the ttbar cross section is certainly larger than that for the inclusive ttbar cross section.
    • These values were used for other analyses in the same topology and we have added a reference. Keep in mind it is just the normalization, a lot other very important uncertainties are added to the ttbar MC.

  • L. 153: The luminosity for 2016 is known to 1.2%. Please include the citations for the luminosity (2016: paper LUM-17-003, 2017: PAS LUM-17-004, 2018: PAS LUM-18-002)
    • DONE and added the references

  • L. 187: Please add a definition of the ‘bins’ mentioned in this line. Bins in which quantities? How many?
    • Bins are defined in line 199. We prefer to leave it as is, as it is important to separate simulated and reconstructed quantities.

  • L. 194: The “channel” here also means e+jets and mu+jets channel? Maybe move paragraph L 210-220 to before the presentation of the likelihood.
    • We believe we have been very careful to use different names for topologies (boosted, semi-resolved, resolved), lepton flavor (electron and muon), years (2016-2018) and mass bins, and "channels" refers to each one of the 12 possible combinations.

  • L. 211: To be more clear, please replace ‘the individual likelihoods’ by ‘the individual likelihoods, Eq.(2)’.
    • Done

  • L. 213: It is interesting that the two mass regions which are defined by (750, 900) GeV and above 900 GeV have similar statistics in the result plots. Thus, if this was a criterion why the mass bins were chosen accordingly, then maybe it would be nice to state this.
    • Maybe it was in previous papers, we just chose it to be able to compare the size of the uncertainty with previous publications, as it is statistically dominated.

  • Figure 2: Please add labels “pre-fit” for the left figures and “post-fit” for the right figures.
    • DONE

  • L. 230: Please define ‘the fiducial phase space’ at truth level. This is important for theorists to be able to compare their predictions to our results. Usually, this phase is similar to, but not identical to, the reconstructed phase space, thus it needs to be explicitly quoted separately.
    • Unfortunately, this is not easy because it includes our ttbar event reconstruction (there is no way to find the ttbar mass without the Chisq reconstruction, which is done with reconstructed quantities). That is why we unfolded to full phase space, to allow for comparison with theory.

  • Figure 3: It is clear that the left figure is for the fiducial region since there is comment in the caption, otherwise the right figure is not. We suggest that ‘the theoretical prediction, including … (right)’ is replaced by ‘the theoretical prediction in the full phase space, including … (right)’.
    • We had a more detailed key but the fonts were too small. We thought of adding "full" but decided against it, if you check the paper, we consistently refer to "fid" for the fiducial AC but just AC for the full.

  • Figure3: Please add more information on how the theory uncertainty in the left figure is obtained. Also, how can it be that the data uncertainties on the right and left seem to be the same. The theory uncertainty on the left should propagate as an additional extrapolation uncertainty to the data on the right.
    • The fiducial measurement is what we obtain with Asimov data and the uncertainties are the ones we get from there. The statistical uncertainty only includes the data statistics. The pure theoretical uncertainty on the correction from fiducial to full is orders of magnitude smaller than the uncertainties from the detector modelling and not visible in the plot.

  • L. 244: Please change ‘full phase space’ into ‘full phase space, as defined in Eq. (1)’.
    • Done

  • Figure 4: There are several comments concerning this figure. First, it is not clear to us whether “Impact on Signal Strength” is really the right x-axis title. Second, the uncertainties, especially from modeling, are surprisingly large. Since this is an asymmetry, one would expect that most modeling uncertainties cancel in the ratio. However, it appears as if modeling uncertainties are by far the dominant uncertainty. Third, we suggest to add the total uncertainty, and the constraints for this figure. Then please expand Section 5 to provide a quantitative explanation for the uncertainty. Lastly, why does ‘Top pT’ have a one-sided impact. Also, is there something to cite for its reweighing? If so, please cite it.
    • You are right, with a ratio one would expect that the errors mostly cancel, but as you can see in the AN appendix D, the error on Ac will be large when sqrt(error-positive - error-negative)^2 is large, i.e. irrespective of the individual sizes of the positive and the negative error, what affects the uncertainty on AC is the difference. The other problem is that we have literally hundreds of uncertainties for different sources like pdf, lumi, JEC, JER, etc. We have gone back and forth a lot on this figure and settled on combining different sources to give the reader an idea of the main sources. It is not perfect, but we do not have space to describe all the sources. We might add the full impact plot to the supplemental material. Top-pt, as explained in lines 180-181, is one sided, as prescribed by the top group.
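A toy numerical check of the statement that only the difference between the shifts of the positive and negative bins matters. The yields and relative shifts are made-up numbers and the function names are hypothetical; the sketch assumes the counting form A_C = (N+ - N-)/(N+ + N-) and is not taken from the AN.

```python
def asymmetry(n_pos, n_neg):
    """Counting asymmetry from the yields at positive and negative Delta|y|."""
    return (n_pos - n_neg) / (n_pos + n_neg)


def shifted_asymmetry(n_pos, n_neg, rel_shift_pos, rel_shift_neg):
    """Asymmetry after scaling each yield by its own relative systematic shift."""
    return asymmetry(n_pos * (1.0 + rel_shift_pos), n_neg * (1.0 + rel_shift_neg))


# Made-up yields giving a nominal asymmetry of 2%.
nominal = asymmetry(5100.0, 4900.0)

# A 10% shift common to both bins cancels in the ratio.
common = shifted_asymmetry(5100.0, 4900.0, 0.10, 0.10)

# Shifts of 10% and 8% differ by only 2%, yet move the asymmetry by about 0.009.
different = shifted_asymmetry(5100.0, 4900.0, 0.10, 0.08)
```

In this toy case a 2% mismatch between the two bins changes A_C by roughly half of its nominal value, which is why sources with a large common normalization effect can still matter for the asymmetry.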

from MIT

Type A

  • line 4: "the LHC pp collisions" --> "the LHC proton-proton collisions"

This would be conventional - to spell out "proton-proton (pp)" in the first instance but leave \ppbar (for the Tevatron) and \ttbar subject to the same logic, which would make for a very long sentence.

  • line 44: proton-proton collision --> pp collision (you will now have defined in line 4)

See previous comment.

  • line 55: decaying ... decays (avoid repetitions)

The sentence is concerned with a sequential decay, so some repetition is necessary.

  • line 27: (EFT) label is introduced but not used in the text

Leave it on the grounds that acronyms do not necessarily need to be limited to subsequent use in the text. They could be used to reinforce the understanding of the Reader.

Type B

  • General: How did you treat the resolved, boosted, semi resolved analysis categories?
    • The 3 topologies differ on their event reconstruction. The main reason for having them is that the correct jet assignment for the boosted events improves when the information of the top and W tag are taken into consideration. But then the analysis is done in the 12 categories defined by year, lepton flavor and the 2 mass bins but with the topologies (boosted, semi-resolved, resolved) combined. There is not enough statistics in the semi-resolved to keep them separate and combining them also allows us to constrain on the main systematics which is the top-pT.

  • Figure1 and Table1 seem to combine the resolved, boosted, semi resolved in an unique category, or are you using them as separate region?
    • See response above. Combining them allows us to control the top pt, which is one of the main systematics

  • Would be nice to have a quantitative estimate on each category's strength.
    • See response above. The main improvement from the different topologies is that the correct jet assignment increases substantially. This information is now added to the paper.

  • Abstract:"with the standard model prediction at next-to-next-to leading order in perturbation theory with ..."--> specify that this is QCD i.e."next-to-next-to leading order QCD prediction with ..."
    • DONE

  • Line 18:"we expect that measuring AC in a sample of highly boosted tt events will lead to a more stringent probe of quantum chromodynamic (QCD) predictions and higher sensitivity to BSM physics processes that might alter the charge asymmetry[7]."
    • This is what it says now?

  • The results in table2 still have large stat and syst uncertainties as previous 7,8 TeV measurements you reported in line 38 where the non boosted approach is used.
    • You cannot just compare the error because the AC is about half as large at 13 TeV as it was at 7-8 TeV. In any case, it is dominated by statistics, so that cannot be changed. But having a better jet assignment compared to using a more traditional ttbar reconstruction, we are going in the right direction.

  • Line 80-86: there is no mentions jet ID and pileup jetID
    • We have added references as requested by others

  • Line 88-96: which working point do you use for t-tag, W-tag and b-tag. You can add some representative efficiency and mistag rate
    • done

  • Line 118: single top production: worth specifying t- and tW production.
    • We do not separate them in our analysis but we use them in the correct proportions.

  • Line 119: is this powheg "v2" ?
    • They are just Powheg, not v2.

  • Line 120: madgraph5_amc@NLO is used at LO or QCD/Vjets ? specify it
    • Done: used at NLO, to generate QCD and VJets.

  • Line 122: PDF set is the same for 2016 and 2017-2018, mention which set you use.
    • Done

  • Line 118-125: mention the cross section normalization of these samples ?
    • These are post-fit and we do not typically state post-fit values for background samples, especially in a properties measurement like this one.

  • Figure 2 (left):consider removing the pre-fit yields plots.
    • We like to show how the uncertainties are reduced and a plot like this one is also included in the latest ATLAS paper. We prefer to leave it.

  • Maybe some plots can be added showing the yield and agreement for the different categories boosted, resolved, semi resolved.
    • The 3 topologies differ on their event reconstruction. The main reason for having them is that the correct jet assignment for the boosted events improves when the information of the top and W tag are taken into consideration. But then the analysis is done in the 12 categories defined by year, lepton flavor and the 2 mass bins but with the topologies (boosted, semi-resolved, resolved) combined. There is not enough statistics in the semi-resolved to keep them separate and combining them also allows us to constrain on the main systematics which is the top pT. Unfortunately, this means that we do not have post-fit plots for the 3 topologies and we did not want to show pre-fit plots. We went back and forth with this in previous versions of the paper and decided against it as it was too confusing for a short letter.

  • It can be useful to the reader to reproduce your analysis and understand the add on value of the categories.
    • We have made public the unfolded results to the full phase space, so any theorist or other experiments can compare with it. The topologies essentially improve the correct jet assignment (values added to the paper) but then are not kept separately as the statistics get really small, rendering the fit meaningless.

  • Figure 3:The vertical bars represent the statistical and total uncertainties: do you mean total systematics uncertainty.
    • No, the inner bar is statistical and the other bar is the quadratic sum of the data statistics and the other two sources: systematics and MC statistics. We believe this is standard procedure.
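For concreteness, the quadratic sum described in the reply above can be written as a short sketch. The function name and the input values are hypothetical; only the combination rule (statistical, systematic and MC-statistical components added in quadrature) comes from the reply.

```python
import math


def total_uncertainty(stat, syst, mc_stat):
    """Outer error bar: data-statistical, systematic and MC-statistical parts in quadrature."""
    return math.sqrt(stat**2 + syst**2 + mc_stat**2)


# Made-up components chosen so that the sum is easy to verify by hand.
outer_bar = total_uncertainty(3.0, 4.0, 12.0)
inner_bar = 3.0  # the inner bar is the data-statistical part alone
```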

  • Figure 4:General: this is a very technical plot for a paper
    • Agreed, but it is also standard for papers that do likelihood minimization and likelihood unfolding. It is a standard plot from Combine and also included in the latest ATLAS paper. It is the most compact way of showing the effect of the systematics post-fit. We would not have space to describe all the information in this plot in words.

  • JER: fit seems to touch some boundary, did it converge ?
    • Not sure what you mean, both variations are to one side only, you need to look carefully to see it.

  • PDF: why is it so asymmetric ?
    • This is the envelope of the 100+ pdf variations, which were combined here as requested by the ARC during approval. It is not a single uncertainty.
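A sketch of why an envelope over many variations is generally asymmetric. It assumes the envelope is taken as the largest upward and largest downward deviation from the nominal value across all variations; the exact construction used in the analysis is not spelled out here, and the function name and numbers are hypothetical.

```python
def envelope(nominal, variations):
    """Return (up, down): the largest positive and largest negative deviation from nominal."""
    deviations = [value - nominal for value in variations]
    up = max(deviations, default=0.0)
    down = min(deviations, default=0.0)
    # Clamp so that 'up' is never negative and 'down' is never positive.
    return max(up, 0.0), min(down, 0.0)


# Made-up variations: most pull upward, so the upward side of the envelope is larger.
up, down = envelope(1.00, [1.05, 0.99, 1.02, 1.03])
```

Because the two sides are set by different members of the variation set, nothing forces them to have the same size.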

  • TOP pt is one sided: is this expected ?
    • Yes, it is stated in line 180-181, standard top group procedure.

  • W-tag is missing
    • It is not important enough to show up. We only showed the top 26 (as ATLAS did).

  • all the uncertainties are correlated among the processes and among the years: i.e. the luminosity and the JES should have some uncorrelated part among the years 2016, 2017, 2018 but I see only one nuisance.
    • This is because in this plot we are only showing the envelope (otherwise, the plot goes on for many pages). After a lot of discussion, this is what was suggested to us during ARC review.

  • spell the W-tag, t-tag and b-tag like in the text without the hyphen
    • Will see what the LE says and then we'll be consistent

  • label JEC in figure 4 while JES in line 164
    • Thanks! fixed in text

from Santander

Type A

  • L9-L12: This sentence is too long, isn’t it?

The sentence is on the long side but it is not difficult to follow and breaking it into two sentences would tend to disrupt the flow of the paragraph.

  • L46-48: We don’t think so many commas are needed and I feel a connection word is missing when explaining the two types of W boson decays. Try something like “We target the single-lepton channel in which both top quarks decay as t->bW, where one W boson decays leptonically (W->lv) and the other decays hadronically (W->qqbar).” or “We target the single-lepton channel in which both top quarks decay as t->bW, with one W boson decaying leptonically (W->lv) and the other decaying hadronically (W->qqbar).”

We prefer the compactness of the current text even if this involves a number of commas.

  • L48: Since in L100 events are already split into electron and muon channels, maybe it could be good to state here that these channels are defined, and then optimized separately.

We don't think the Introduction is the right place to discuss optimizing channels separately.

  • L61-62: no needed comma after the word “analysis” -> “All three topologies are considered in this analysis and are referred to as “Boosted”, “Semi-resolved”, and “Resolved”.”


  • L65: Within the solenoid volume are -> Within the solenoid volume there are

We stay with the standard text (https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/PubDetector).

  • L189: I would put the [42] after “approach”


  • Figure 1: Put the same font size in y-x axes of the plots (y’s is much bigger)


  • In general for all the plots: It should be CMS preliminary, preliminary is missing.

According to the CMS publication rules, we shouldn't put "preliminary" in the paper; that is for the PAS.

Type B

  • Abstract:“Differential distributions for two invariant mass ranges are also presented” -> Please specify which ones (these in L213)
    • Done

  • L74: and muons
    • We do not use PF for muons, we use the recommended high pT reco for muons which is not based on PF.

  • L113: Is pT,rel(l,j) the ratio between the lepton and the nearest jet, or the projection of the lepton pT in the jet axis?
    • It is the traditional definition, as stated in line 114, the pT of the lepton with respect to the nearest AK4 jet axis.
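A minimal sketch of one common reading of this definition: the component of the lepton momentum transverse to the jet axis, computed as |p_lepton x p_jet| / |p_jet|. This formula is an assumption about what "traditional" means in the reply, and the function name and three-vector inputs are hypothetical.

```python
import math


def pt_rel(lepton_p, jet_p):
    """Lepton momentum component perpendicular to the jet axis (inputs are (px, py, pz))."""
    lx, ly, lz = lepton_p
    jx, jy, jz = jet_p
    # Cross product of the lepton momentum with the jet momentum.
    cx = ly * jz - lz * jy
    cy = lz * jx - lx * jz
    cz = lx * jy - ly * jx
    jet_mag = math.sqrt(jx * jx + jy * jy + jz * jz)
    if jet_mag == 0.0:
        raise ValueError("jet momentum must be nonzero")
    return math.sqrt(cx * cx + cy * cy + cz * cz) / jet_mag


# Jet along z; the lepton has 3 units of momentum perpendicular to it.
example = pt_rel((3.0, 0.0, 4.0), (0.0, 0.0, 10.0))
```

Under this reading the quantity is a projection relative to the jet axis, not a ratio of the lepton and jet transverse momenta, and it vanishes for a lepton exactly collinear with the jet.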

  • L116: Why is there no cut with MET + lepton pt in the electron channel? Is the p^{e}_T missed from the expression? It would be a good idea to rephrase this sentence to state more clearly the cuts applied on electron and muon channels separately.
    • There is a much higher MET cut on the electron channel. The cuts were optimized in a prior publication, we have added it as reference.

  • Section 3: You mention (line 137) that the signal candidate sample is divided into "boosted", "semi-resolved" and "resolved" categories, and the binned likelihood uses this categorization, but what was the actual split of the candidate samples between these categories? The figures shown integrate all three
    • The 3 topologies differ on their event reconstruction. The main reason for having them is that the correct jet assignment for the boosted events improves when the information of the top and W tag are taken into consideration. But then the analysis is done in the 12 categories defined by year, lepton flavor and the 2 mass bins but with the topologies (boosted, semi-resolved, resolved) combined. There is not enough statistics in the semi-resolved to keep them separate and combining them also allows us to constrain on the main systematics which is the top-pT.

  • Figure 1: since isolation is not required for leptons at trigger nor offline selection, what causes the "bump" around DeltaR(l,j)=0,5?
    • If isolation requirements were made, there would be no event below 0.5. There is some loss of efficiency when the lepton is totally parallel to the jet axis, and that is why there is a loss of events between 0 and 0.5 in DeltaR.

  • L166: Which jet? If it is the AK8 jet, please specify.
    • It refers to the AK4 jet for the btag and the AK8 jet for the W and top tag. We had AK4 and AK8 superscripts but were asked to remove them by the top publication chair when he reviewed the paper before CWR.

  • Section 6: Formula 2: how is the response matrix Aij used in the likelihood actually derived?
    • We count the number of events reconstructed in a given Delta|Y| bin that come from simulated events from that bin for the diagonals, and crossing from events generated in a bin and reconstructed in another one for the off-diagonal terms.
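The counting described in the reply above can be sketched as follows. Each simulated event is represented by a hypothetical pair (generated bin, reconstructed bin) in Delta|y|. Normalizing each column by the number of events generated in that bin is an assumption of this sketch, and it ignores events that are generated but fail the selection, which a full response matrix would also have to account for.

```python
def response_matrix(events, n_bins=2):
    """Build counts[i][j] and column-normalized fractions for reco bin i, generated bin j."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for gen_bin, reco_bin in events:
        counts[reco_bin][gen_bin] += 1  # diagonal if reco_bin == gen_bin, else migration
    gen_totals = [sum(counts[i][j] for i in range(n_bins)) for j in range(n_bins)]
    fractions = [
        [counts[i][j] / gen_totals[j] if gen_totals[j] else 0.0 for j in range(n_bins)]
        for i in range(n_bins)
    ]
    return counts, fractions


# Toy sample: bin 0 is Delta|y| < 0, bin 1 is Delta|y| > 0 (made-up numbers).
toy_events = [(0, 0)] * 8 + [(0, 1)] * 2 + [(1, 1)] * 9 + [(1, 0)] * 1
toy_counts, toy_fractions = response_matrix(toy_events)
```

In the toy sample, 80% of the events generated in bin 0 are reconstructed there and 20% migrate to bin 1, which gives the first column of the normalized matrix.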

  • Table 2: the caption should perhaps not say "in the fiducial phase space" since the table shows both the fiducial and the full phase space charge asymmetry. Also, we think that in this table is the first time that we see > 750 GeV for the mass range. Is it the inclusive region containing (750-900) and >900 regions? It would be useful to mention it in the text.
    • We say in line 140 that we only keep events with M>750 for further study. You are right that the caption does not mention full phase space for the right plot; we changed it as follows: Measured unfolded charge asymmetry in the fiducial phase space (top) and the full phase space (bottom) shown for individual channels compared with the SM predictions. Results are shown for events with Mttbar > 750 GeV and for two invariant mass ranges, (750, 900) GeV and > 900 GeV.

  • I find surprising the effect of the acceptance*efficiency correction to go from fiducial to full phase space Ac in the case of high mass. You claim that this correction has essentially negligible uncertainty, but it is a large effect and in the case of high mass it goes in the "wrong" direction than that expected from the theory (and from the other two mass cases).
    • This correction is done at generator level and only includes theoretical uncertainties for the ttbar samples. It is indeed completely negligible. However, you need to keep in mind that the prediction in the case of the fiducial phase space is our measurement using Asimov data, so it is affected by all the detector uncertainties, which move around trying to accommodate the Asimov data. The prediction for the full phase space, by contrast, is the theoretical prediction provided by the authors, who ran it for us. No detector simulation is involved.

  • Fig. 3: Should the caption explicitly mention fiducial (left) and full (right) phase space? The distinction can only be inferred by the labels in the plot. Also: could you make the vertical axis go from -0.4 to 0.4? Nothing is interesting outside this range. Same than for Table 2. Maybe it would be good to comment that first bin includes 2 and 3.
    • Changed the caption to: The measured AC values in different mass regions, combining the μ + jets and e + jets channels, compared with the corresponding predictions. (Left) Measured Ac^fid compared with the prediction in the fiducial region obtained by fitting Asimov data. (Right) Measured Ac in the full phase space compared to the theoretical prediction, including next-to-NLO QCD and NLO EW corrections from Ref. [4]. The vertical bars represent the statistical and total uncertainties. We also reduced the vertical scale.

  • Figure 4: why does changing the top Pt up and down have the same sign effect on the signal strength? Is it within the MC statistical uncertainties?
    • The top pt variation is single sided as stated in line 180-181, standard top group procedure.

from Statistics Committee

  • Abstract - Measured value of the charge asymmetry should be provided. The two mass ranges mentioned in the last line should be given.
    • DONE

  • L135 - What is the value of the "true top mass" ?
    • It is the one given by our MC after we do the ttbar event reconstruction. We fit it with a Gaussian and use the mean and RMS in the likelihood.

  • L159 - Reference is needed for the cross section value and its uncertainty.
    • DONE

  • L212 - It needs to be mentioned that the 12 channels include the "Delta y" variable also. "bin" is a better choice than "channel" in this case. "bin" is used in L227.
    • The 12 channels include 3 years, 2 lepton flavors and 2 mass bins, not the two bins of the Delta|y|. The 12 channels are defined in line 210.
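
The channel counting in this answer can be sketched in a few lines. This is only an illustration: the label strings and the function name `fit_channels` are hypothetical, and only the 3 x 2 x 2 structure is taken from the reply.

```python
from itertools import product

# Labels are illustrative; only the counts (3 years, 2 flavors, 2 mass bins)
# come from the authors' reply.
YEARS = ("2016", "2017", "2018")
LEPTONS = ("mu", "e")
MASS_BINS = ("750-900", ">900")  # GeV


def fit_channels():
    """Return one label per (year, lepton flavor, mass bin) combination."""
    return [f"{year}_{lep}_{mass}" for year, lep, mass in product(YEARS, LEPTONS, MASS_BINS)]


print(len(fit_channels()))
```

The sign of Delta|y| does not enter the channel label: each of these channels carries its own positive and negative Delta|y| yields.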

  • The range of Delta y and the binning in this variable is important in this analysis. The values should be clearly mentioned at the beginning of Section 6.
    • The asymmetry only depends on the Delta|y| to be positive or negative. No cut is applied and finer segmentation is not needed as it does not add any information.
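
As a minimal sketch of why only the sign of Delta|y| matters, the asymmetry can be written as a function of two event counts. The function name and the numbers in the usage line are hypothetical, not taken from the analysis.

```python
def charge_asymmetry(n_pos, n_neg):
    """A_C = (N+ - N-) / (N+ + N-), with N+ (N-) the yield at Delta|y| > 0 (< 0)."""
    total = n_pos + n_neg
    if total <= 0:
        raise ValueError("total yield must be positive")
    return (n_pos - n_neg) / total


# Illustrative yields only.
print(charge_asymmetry(505.0, 495.0))
```

Any finer binning in Delta|y| would be summed back into these two counts before the ratio is taken.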

  • L192 - What exactly is the "unfolding approach" ? It needs to be clarified how the unfolding is being done. Whatever has been written till L192 does not describe unfolding.
    • We believe Eq 2 and the rest of section 6 is the most concise and precise way of explaining what we do. We have not found a good reference, as we do something quite unique: we redefine the signal strength to give us Ac directly, with no need for further error propagation.
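
Eq. 2 and Eq. 3 of the paper are not reproduced on this page, so the following is only a guess at the kind of algebra involved. It assumes the fitted yields are r_pos * N+(MC) and r_neg * N-(MC), and that the fit parameters are (Ac, r_neg), as suggested by the reply further down that r_neg is extracted along with Ac. All names are hypothetical.

```python
def r_pos_from_ac(ac, r_neg, n_pos_mc, n_neg_mc):
    """Express the positive-side signal strength through Ac and r_neg.

    Follows from Ac = (r_pos*N+ - r_neg*N-) / (r_pos*N+ + r_neg*N-),
    solved for r_pos. This is an assumed parametrization, not the paper's.
    """
    if not -1.0 < ac < 1.0:
        raise ValueError("Ac must lie strictly between -1 and 1")
    return r_neg * n_neg_mc * (1.0 + ac) / (n_pos_mc * (1.0 - ac))


# Illustrative numbers: recover Ac from the scaled yields.
r_pos = r_pos_from_ac(0.02, 1.1, 510.0, 490.0)
n_pos, n_neg = r_pos * 510.0, 1.1 * 490.0
print((n_pos - n_neg) / (n_pos + n_neg))
```

With such a substitution the fit returns Ac and its uncertainty as a parameter of the likelihood, which is the property the reply describes.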

  • Table 1 caption - Which 12 channels are you talking about here ? It is not clear what is the difference in specification for the numbers in columns 2, 3, 4 and columns 5, 6, 7. It is not mentioned in the caption nor mentioned in the text where Table 1 is described (L213).
    • The 12 channels include 3 years, 2 lepton flavors and 2 mass bins as defined in line 210 and used as labels in the figure and the table.

  • L235 - "...theoretical values..." - These are the "expected" values of A^fid_C isn't it ?
    • These values are not provided by the authors. They are calculated by us with the same machinery we use to unfold the data but using Asimov data instead of collider data. That is what the sentence "obtained when setting the observed quantities to their expected values (Asimov data)" is trying to convey.

  • Fig. 3 - What exactly is being plotted here ? There is a discrepancy between the variable in the legend and the variable in the y-axis label in the figure on the left. It is mentioned in the text, but the caption should also mention it clearly.
    • Changed the vertical label and expanded the caption as follows: The measured AC values in different mass regions, combining the μ + jets and e + jets channels, compared with the corresponding predictions. (Left) Measured Ac^fid compared with the prediction in the fiducial region obtained by fitting Asimov data. (Right) Measured Ac in the full phase space compared to the theoretical prediction, including next-to-NLO QCD and NLO EW corrections from Ref. [4]. The vertical bars represent the statistical and total uncertainties.

from Milano Bicocca - Institutional review

Type A

  • abstract: We recommend putting the result of the measurement already in the abstract. You write that you used a binned maximum likelihood fit to correct for detector and acceptance effects, we wonder whether it would make more sense to explicitly say that you used the unfolding technique

We have added the value as requested by others. We think that unfolding is jargon and prefer "corrected for detector and acceptance effects". (LS: Also, "unfolding" seems to be too detailed to appear in the Introduction.)

  • line 26: “... and dedicated loop” is jargon. You should rephrase it.

We feel that "loop" is broadly understood and would defer this to the PubComm.

  • line 40-41: labeled “stat” and labeled “syst”. In lines 38-39 you put (stat) and (syst) we would maintain coherence in notations.


  • line 44: the first one -> the first

"first one" sounds slightly better even if the "one" is strictly unnecessary.

  • line 247: In lines 233-234 you already anticipated what Table 2 and Fig 3 show.

Agree - see next comment.

  • we suggest adding “also”, i.e. Table 2 and Fig 3 ALSO show.

Yes, "also show" provides some needed context.

Type B

  • line 26-28: what is the expected size of EFT and BSM effects? 1% SM contribution to AC is mentioned in line 17; does it include EFT? We suggest adding some words and numbers to guide the reader.
    • The 1% is the predicted SM Ac for the LHC, where gg production dominates. In this paper, the focus is to show that one can measure the AC in this very boosted regime, something that has not been done before. We do not expect to be sensitive to any BSM until the much larger statistics in Run 3 and HL-LHC, where one would do an EFT-type of analysis. But this was not the focus for this publication.

  • line 38-39: references to ATLAS and CMS measurements are missing
    • OK

  • line 46-48: there is no mention of the reasons for targeting these specific channels.
    • It is mentioned in line 18 that the relative contribution from qqbar (which has the Ac) increases at high momentum transfer.

  • line 60-62: it would be helpful for the reader to know how the three categories play, i.e. which is the expected relative size (percentage)
    • The percentage in each category depends strongly on the mass and the lepton flavor. We do not think it adds anything to the paper to include these values. Keep in mind the main reason for having them is that the correct jet assignment for the boosted events improves when the information of the top and W tag are taken into consideration. But then the analysis is done in the 12 categories defined by year, lepton flavor and the 2 mass bins but with the topologies (boosted, semi-resolved, resolved) combined. There is not enough statistics in the semi-resolved to keep them separate and combining them also allows us to constrain the main systematic which is the top-pT

  • line 107: PTe> 80 GeV: we suppose that this value (which is considerably below the single electron trigger threshold) was “optimized”, and resulted not to bias the measurement. Please consider adding a comment to address it.
    • Our electron trigger is an OR of two triggers, one with 50 GeV and a jet, and one with 115 GeV. We are not below the electron threshold.

  • line 110: you state “no isolation requirements” but in line 113 you state that you require “DeltaRmin(l,j) > 0.4, which seems to be an isolation requirement.
    • Isolation typically refers to energy deposits or tracks close to the leptons. In that sense, our 2D cut is not an isolation requirement. The statement "no isolation" tries to emphasize that we do not have any "Iso" term (energy or track) in either the trigger or offline.

  • Table 2, line > 750 for “AC in fiducial phase space” you report the value of 0.0022, but looking at the values 0.0039 for (750-900) and 0.0118 for > 900, I expected the value for > 750 to be somewhere in between the two, and not actually lower. Do you have an explanation for why it’s lower? Maybe we just misunderstand the table.
    • This was discussed during the review at quite some length. There is no reason for the AC to fall in between the two mass regions when combining. To better understand this, one needs to look at the pulls for the individual channels and the combination. When all 16 channels are run together, the likelihood chooses central values that can differ from those chosen when running individual channels.

from Albert De Roeck

  • line 21: introduction. Years back the corresponding measurements at the Tevatron created some excitement, but this has been essentially quietened down since a few years. At the LHC such a measurement is definitely more challenging. Our sensitivity, as a QCD check, is still limited, so the direct interest now is if we see significant deviations, perhaps due to BSM. Can we give the reader a yardstick how large such possible BSM effects could be, say in favourable scenarios which lead to a significant effect, and are still allowed by the data in [19]? Eg if the best we could expect would be like 0.1% effect, there is no real chance to have presently any sensitivity for that, so I would hope these predictions allow for larger effects to be possible...
    • The focus of this paper is to show that one can measure the AC in this very boosted regime, something that has not been done before. We do not expect to be sensitive to any BSM until the much larger statistics in Run 3 and HL-LHC, where one would do an EFT-type of analysis. But this was not the focus for this publication.

  • line 27 "explained" looks to be the wrong word here. "interpreted"?
    • fixed.

  • line 95: b-tagging: do we use a particular working point? and if so what is it?
    • we use the tight working point, this is now mentioned.

  • line 119: POWHEG is NLO so this generator naturally has a charge asymmetry for ttbar production of the right magnitude included, I assume? Could be mentioned.
    • Indeed, the SM asymmetry is assumed for all processes in the simulation. We do not think adding the information specifically adds any crucial information and as we are tight on space, left it as is.

  • line 121: did all years use the CP5 tune? In most analyses, we report a different tune used for the 2016 simulated data.
    • Yes, all three years use the CP5 tune. Details can be found in the AN2021_069_v17.

  • What about the choice of PDFs for these generators? We don't seem to say anything on that.
    • We added the following information: Line 125: All the samples are generated with the NNPDF 3.x parton distribution functions (PDFs) [REF NNPDF collaboration, Parton distributions for the LHC run II, JHEP 04 (2015) 040, arXiv:1410.8849, inSpire (https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=find%20EPRINT%20arXiv%3A1410.8849)]. Line 177: The PDFs from the NNPDF3.x set [repeat ref] are used to evaluate the systematic uncertainty in the choice of PDF, according to the procedure described in ref. [J. Butterworth et al., PDF4LHC recommendations for LHC run II, J. Phys. G 43 (2016) 023001, arXiv:1510.03865, inSpire (https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=find%20EPRINT%20arXiv%3A1510.03865)].

  • line 133: is there a specific reason no b-tagging is used? Does it lead to a bias or to important efficiency losses. Or is it not needed to suppress background here? In fact do we at the end use any b-taging at all?
    • We do require at least one b-tag in the event selection. Please see Line 116, that states that a b-tag is required for all events. Line 133 indicates that the information of which jet is b-tagged is not used in the jet assignments for the event reconstruction, but all events have a btagged jet. In any case, we added a sentence in this paragraph to make it clearer.

  • line 135: how is the mass reconstructed of the t_l? This channel has an escaping neutrino (MET)...
    • We use the usual W mass constraint. We added a sentence and a reference.
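
For readers unfamiliar with the W mass constraint, a minimal sketch of the standard quadratic solution for the neutrino longitudinal momentum is given below. It assumes a massless lepton and neutrino, identifies the neutrino transverse momentum with the missing transverse momentum, and uses an approximate W mass of 80.4 GeV as default. The function name and the handling of the complex case are assumptions; the prescription actually used in the analysis is the one in the reference added to the paper.

```python
import math


def neutrino_pz(lep_px, lep_py, lep_pz, met_px, met_py, m_w=80.4):
    """Solve (p_lep + p_nu)^2 = m_w^2 for the neutrino pz (momenta in GeV).

    Returns both roots of the quadratic. If the discriminant is negative,
    the real part is returned twice (one common convention, assumed here).
    """
    lep_pt2 = lep_px**2 + lep_py**2
    lep_e2 = lep_pt2 + lep_pz**2          # massless lepton: E^2 = |p|^2
    met2 = met_px**2 + met_py**2
    mu = 0.5 * m_w**2 + lep_px * met_px + lep_py * met_py
    disc = mu**2 * lep_pz**2 - lep_pt2 * (lep_e2 * met2 - mu**2)
    centre = mu * lep_pz / lep_pt2
    if disc < 0.0:
        return (centre, centre)
    root = math.sqrt(disc) / lep_pt2
    return (centre - root, centre + root)


# Illustrative kinematics only.
print(neutrino_pz(30.0, 10.0, 20.0, -15.0, 25.0))
```

The two roots give two t_l candidates per event; how the ambiguity is resolved is not stated on this page.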

  • line 136: This is very qualitative: is this a fixed cut or category dependent? what is the typical size of the cut?
    • We have added the information on the cut and also the percentage of correct jet assignments as requested by others

  • section 5 is descriptive but not many hard numbers or methods are explained/detailed. We often say we vary something but not by how much, and what effect it has (however I guess the latter one can see the essence at the end in Fig4...)
    • We are constrained in space and believe that Fig. 4, even though only useful for experts, is the most concise summary of the effect of the main uncertainties. We might add a table with the systematic uncertainties as supplemental material. We had it, but we were over the allotted space.

  • line 173: are combinations with opposite variations of the scales excluded? That is at least the recommendation. (to NOT use these).. And this seems to be an important systematic, as seen in Fig 4, so we better not overestimate it..
    • We confirm that we use only the variations with both scales up, both down, or one varied while the other is kept nominal.
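
The set of scale variations described here can be enumerated explicitly. The factors 0.5 and 2 are the conventional choice and are an assumption, since the reply does not quote them; the function name is hypothetical.

```python
from itertools import product


def scale_variations(factors=(0.5, 1.0, 2.0)):
    """Return the (muR, muF) factor pairs kept for the scale uncertainty.

    Drops the nominal point and the two points where the scales move in
    opposite directions. The factor values are assumed, not from the reply.
    """
    lo, hi = min(factors), max(factors)
    return [
        (mu_r, mu_f)
        for mu_r, mu_f in product(factors, repeat=2)
        if (mu_r, mu_f) != (1.0, 1.0) and {mu_r, mu_f} != {lo, hi}
    ]


for pair in scale_variations():
    print(pair)
```

This leaves six variations, matching the reviewer's recommendation to exclude the opposite-direction combinations.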

  • Section 6: unfolding results. Do we have any systematic on the (choice of the) response matrix? Is that not missing?
    • You probably refer to a systematic from the choice of generator. Keep in mind that we only have two bins (positive and negative delta y) and that the matrix is very diagonal. We did closure studies evaluating changes in the simulation and essentially no effect is seen. This and other typical considerations like purity are not an issue in our analysis, given that we only have two very broad bins.
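
To illustrate why a two-bin, nearly diagonal response is benign, here is a toy model with a symmetric migration fraction f between the two Delta|y| bins and equal efficiencies. This is not the analysis method, which unfolds inside the likelihood fit; it is a plain 2x2 matrix inversion with made-up numbers, and all names are hypothetical.

```python
def asymmetry(n_pos, n_neg):
    """(N+ - N-) / (N+ + N-)."""
    return (n_pos - n_neg) / (n_pos + n_neg)


def fold(n_pos_true, n_neg_true, f):
    """Apply a symmetric 2x2 migration matrix [[1-f, f], [f, 1-f]]."""
    return ((1.0 - f) * n_pos_true + f * n_neg_true,
            f * n_pos_true + (1.0 - f) * n_neg_true)


def unfold(n_pos_reco, n_neg_reco, f):
    """Invert the same matrix analytically (requires f != 0.5)."""
    det = 1.0 - 2.0 * f
    return (((1.0 - f) * n_pos_reco - f * n_neg_reco) / det,
            ((1.0 - f) * n_neg_reco - f * n_pos_reco) / det)


# Toy numbers: a 5% migration dilutes the asymmetry by a factor (1 - 2f).
reco = fold(520.0, 480.0, 0.05)
print(asymmetry(520.0, 480.0), asymmetry(*reco))
```

In this toy the reconstructed asymmetry is (1 - 2f) times the true one, so a small f gives a correction close to unity that depends only weakly on the modelling.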

  • line 217: VV background: can we give an indication in the text how small it is? (we looked at it as we mentioned the simulated processes we use, before)
    • It is negligible and is therefore added to "other".

  • Table 2: the uncertainties given on the theory values of A_C in the fiducial region are as large or larger than value itself. (this is not the case for the full phase space results). Are these numbers correct, and if so what is the explanation for that?
    • The values are correct. To better understand why they are so large, keep in mind that these values are not provided by the authors. They are calculated by us with the same machinery we use to unfold the data but using Asimov data instead of collider data. That is what the sentence "obtained when setting the observed quantities to their expected values (Asimov data)" is trying to convey. The theoretical prediction in the fiducial phase space is affected by all the detector uncertainties and the MC statistics, just like our measurement. The prediction for the full phase space is the theoretical prediction provided by the authors, who ran it for us. No detector simulation or MC statistics is involved, resulting in a much more precise value.

  • line 245: this information maybe be correct, but how did you determine the uncertainty on the acceptance?
    • This correction is done at generator level and only includes theoretical uncertainties for the ttbar samples. The uncertainty is completely negligible.

  • How did the result in this paper improve over our earlier results ? The uncertainty on the measurement is still very large... Can we comment to that in the conclusion?
    • Added this sentence to the end of the summary section: This novel measurement of the top-quark charge asymmetry is the first one to use 13TeV data and a binned maximum likelihood unfolding technique to measure Ac directly at parton level in the full phase space. In addition, it is the first result that focuses exclusively on the very high Lorentz-boost regime, using dedicated reconstruction techniques for the hadronically and the leptonically-decaying top quark at both trigger and offline level. Ac is especially sensitive to BSM processes in this highly boosted phase space since the relative contribution of valence quarks increases at high momentum transfer. The result demonstrates that top quark properties can be precisely measured in the highly boosted topology, opening a new era of exploration for Run 3 and HL-LHC.

from Anna Benecke, Joscha Knolle, Jan van der Linden

  • Line 95: Please mention the deepJet tagger by name and also cite the btv paper BTV-16-002 and corresponding b-tagging efficiencies and light mistagging efficiencies at the working point you are using
    • Done

  • Line 168ff: please also cite the BTV DP note CMS-DP-2018-058
    • Done

  • Line 105: Given that you only use 133/fb in the electron channel, shouldn’t the integrated luminosity be quoted as “133−138/fb” rather than as “138/fb” (in the abstract and on the plots)?
    • We have changed the abstract, changes to the plots would be unnecessary as these are electrons and muons combined.

  • Line 153: The correct per-year luminosity uncertainties are 2.5, 2.3, and 1.2% for 2018, 2017, and 2016. Since your AN lists the correct uncertainty components, this is probably just a typo, so please correct it. We suggest also to quote the uncertainty value of 1.6% for the combined 2016-2018 integrated luminosity.
    • Fixed and also added the references to the lumi papers as requested by others

  • Line 78 You state that charged hadrons associated to PU vertices are removed, but I guess that for AK8 jets you are using PUPPI which would require additional information and the citation of the two PUPPI related papers stated on the PubDetector Twiki. Could you please add this information if you are using PUPPI?
    • Added

  • Line 81/82 you are stating that you remove the lepton energy from AK4 jets that overlap with the lepton. Are you also applying some cleaning criterion on AK8 jets? Like rejecting AK8 jets that overlap? Could you please add the information if that is the case?
    • Yes, we are. This information is now added.

  • Line 88 you mentioned that you use substructure information, including the mass of the leading three subjets [32] and the number of subjets [33] [..]. But I guess you are using the soft drop mass and nsubjettiness. The soft drop mass is not exactly the mass of the leading three subjets and also nsubjettiness is not exactly the number of subjets. There are some good explanations of the soft drop and nsubjettiness on the PubDetector Twiki. In addition, you should cite JME-18-002, where the derivation of the correction factors is explained.
    • Done

More approval questions

hdamp study request from Jan and Matteo

Dear authors,

since one of the pre-approval comments that was to be addressed during ARC phase originally comes from us (TMP L3), we would like to re-iterate on the point of the statistical fluctuations on the hdamp nuisance parameter, and the strong constraints. We share the conclusion you have drawn during ARC review that indeed it seems like a statistical effect, and we are happy to see that you are in the process of performing studies that could shed light on the issue.

When going through the analysis Twiki, we discovered that your current solution is to add an additional nuisance parameter to cover the effect. If we understood the Twiki correctly, what you are doing is equivalent to increasing the nominal "bin-by-bin" uncertainties, as the size of that additional nuisance parameter seems to be not coupled to the value of the nuisance parameter for hdamp.

While this procedure seems like an intuitive way to solve the issue, we (and others) have shown in the past, that unfortunately, it does not provide a solution to the underlying problem (also discussed and detailed in [1]).

The only viable solutions to attacking the problem at its source are: a) estimate the effect with toys b) include the effect directly in the likelihood through additional degrees of freedom that are coupled to the hdamp nuisance parameter value. Such an approach was taken (in approximation) in TOP-20-008 for example.

The method you are currently applying cannot serve as a correct estimate of the effect. We had suggested using toys since it is conceptually easier to do and just requires computing resources. We are very happy to discuss this in one of our upcoming meetings.

Best, Jan and Matteo

[1] https://indico.cern.ch/event/761804/contributions/3160985/attachments/1733339/2802398/Defranchis_template_constraints.pdf

    • The details of the study are attached; the issue seems not to be caused by statistics. Attachment: hdamp-study-May10.pdf (hdamp study presentation, May 10th, 2022)

Comments from Orso on v5 of the PAS, May 13th, 2022

  • Abstract:“At next-to-leading order in perturbation theory with next-to-leading order electroweak corrections “ The next-to-leading order reads redundant. Maybe next-to-leading order in electroweak perturbation theory?

  • L19 the asymmetry -> the charge asymmetry
    • Done

  • L21 please remove the second mention of axigluons
    • Done

  • L27 is added to the SM lagrangian [27] ->doesn’t explain why they should be there. Suggest for lines 25-27: -> “deviations from the SM prediction can also be explained through an Effective Field theory approach in which new physics contributions are described via a fixed set of dimension-six operators added to the SM lagrangian[27].”
    • Done

  • L33 “somewhat” -> remove somewhat , leave only “larger”
    • Done

  • L47-49 “Hadronically decaying top quarks yield decay products with angular distances between partons that are smaller than the jet clustering distance parameter and are thus reconstructed as a single jet” -> here you mention all hadronically decaying top quarks, also you refer to the AK8, but then for the resolved you refer to the AK4 instead. I would suggest: “Hadronically decaying top quarks at the high-end of the pt spectrum yield decay products with angular distances between partons that can be smaller than the jet clustering distance parameter, and are thus reconstructed as a single, large cone jet.”
    • Done

  • L123 b-tagging information -> In order not to start with “b-tagging” one could do “No b-tagging information on individual… is used…”
    • Done

  • L130 reading again, I find myself asking again the same question about disambiguation, so I really think it would be clearer if you mention “one W-tag and no t-tag”.
    • Done

  • L131: looking through the paper, I think this is the first time Mtt is introduced in the text, if so it should be defined explicitly as it is such an important variable: “...only events with an invariant mass of the top quark-antiquark pair Mtt larger than 750 GeV are retained for further study.”
    • Done

  • L168: I suggest to just mention the procedure and a reference, not the specific function which seems too technical and prompts further questions.
    • Done

  • L205: “the contribution … are taken” -> “the contributions … are taken”
    • Done

  • L220: “(within uncertainty)” -> use commas: “, within uncertainty,”
    • Done

  • L222: and in the figure on the left of Fig 3 -> Fig 3
    • Done

Comments from Orso on v13 of the paper draft, May 1st 2022

Type B : main physics points and content

  • L218-221: I think here it would be a good place to add a statement about the findings of the study about which systematics causes the total fit result not to be in the middle of the two. A couple of sentences on what syst.causes that would be useful, and it is what we discussed
    • We discussed with Andy and we feel that adding this statement is not the best idea. We indeed see that hdamp by itself has this effect, but we have not done a systematic study to see if a combination of other systematics, or even some other individual systematic, would also have this effect. We think that adding this statement would bring the focus onto the central value of the measured Ac, which is not the right thing to do given the large uncertainties. The fit is redone in 3 different mass regions and there is nothing physical that forces them to fall in any particular order.

  • L223: until Atlas publishes theirs, I think this is the first measurement at LHC, correct? So one should state that.
    • Changed "the first CMS" to "the first" measurement

  • L231: is the final number 0.0069 or 0.0068 ? please check!
    • Fixed, very sorry, we seem to have a problem with different people committing changes at the same time.

  • Figure 4: top pT prefit has two sides, but both are pulled left, is there an easy way to explain reason? In the AN you mention it’s one sided, but if I had to guess maybe you symmetrized it and one of the cases is not so likely. Looking through section 5 I seem not to be able to find the top pT description, if so, a brief explanation (order a couple of lines like for other systematics) should be added.
    • It is indeed one sided. It can vary between no correction and the parametrized correction. Added to the paper in the systematics description: Finally, the top quark transverse momentum in simulated $\ttbar$ samples is corrected by a reweighting function given by $e^{0.0615 - 0.0005\,\pt}$, where $\pt$ is the transverse momentum of the top at generator level. A one-sided variation given by the difference between the top $\pt$ distribution with and without correction is applied as systematic uncertainty.
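
A minimal sketch of the reweighting function quoted above and of the one-sided variation follows. The function names are hypothetical, and pT is assumed to be in GeV. How the weights of the top quark and antiquark are combined into a per-event weight is not stated here, so only the per-top factor is shown.

```python
import math


def top_pt_weight(pt_gen):
    """Per-top weight exp(0.0615 - 0.0005 * pT), pT at generator level (GeV assumed)."""
    return math.exp(0.0615 - 0.0005 * pt_gen)


def one_sided_variation(pt_gen):
    """Return (nominal, varied) weights: with the correction and without it."""
    return top_pt_weight(pt_gen), 1.0


for pt in (0.0, 123.0, 500.0, 1000.0):
    print(pt, one_sided_variation(pt))
```

The weight crosses unity at pT = 123 GeV and falls below one above that, so in the boosted regime the correction lowers the simulated yield, and the variation spans the range between that and no correction.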

  • Editorial aspects: the CCLE should have another iteration on the paper before greenlight for CWR. Please also consider the Type-A comments you find below.

  • Figure 4: very happy to see this figure converging! Some of the systematics names should still be extended, to make sure it is readable: All instances of “xsec” should be “cross section”; WJets → W+Jets; the “DY” acronym is not defined and should be (line 107 could be a good place?); QCD → QCD multijet; Muon 2D → 2D selection?; TTbar → top quark-antiquark cross section (if \ttbar doesn’t work); Muon reco → reconstruction.

    • Done
  • Figure 4: caption, at the end: "even though they dominate." → I think this is not necessary.
    • Removed

  • Figure 1: please mention in the caption where the normalization is taken from. Are those normalized to luminosity?
    • The caption says "after the likelihood normalization". Maybe this is not clear? Should we say "after a binned maximum likelihood fit on the data as detailed in [unfolded results]"?

Type A: editorial / minor text changes

  • L4 “LHC pp collider,” → remove the comma
    • Done

  • L8 I think guidelines say that anti-top → top anti-quark everywhere: top quark (anti-quark) is preferentially…I found it close-by in several other places, but please check through the paper.
    • Done

  • L10: here you use antiquarks, somewhere anti-quarks. Please check the CMS PubComm guidelines and use it consistently, or check with CCLE.
    • Done

  • L11 anti-tops → top anti-quarks
    • Done

  • L13 top and anti-top → top quark and anti-quark
    • Done

  • L24: through interference terms only? Not also by dedicated loops? Seems a very specific statement for such a broad array of models
    • fixed

  • L25: I think the acronym EFT is not used anymore in the text, if so it is not necessary to define it.
    • Done

  • L86: t-tagging, b-tagging etc. : please check the conventions about top and W tagging:https://twiki.cern.ch/twiki/bin/view/CMS/Internal/PubGuidelines. I think it should be implemented as: - top quark tagging instead of top tagging is encouraged, so in case change all instances - t-tag and t tagging are ok if explained at the first instance, but please use the Pennames for t tagging as well, not just the $t-tagging$ version
    • Done

  • L90: (2018, 2017 and 2016) → add a comma: (2018, 2017, and 2016)
    • Done

  • L92 and following: if you want to use pT with the mu and e apex, please make sure to define it in the first instance. Also in line 92 it refers to the pt of the muon at trigger level, later on at reconstruction level. I think it would be better to just use pT and everytime qualify which particle it refers to, but it’s ok if you prefer to keep it like this
    • I think the text explains which particle and what level it is implemented at

  • L102: I don’t think DR min or PTrel are explained here, right? I think a couple of sentences should be needed. Also , make sure pTrel uses the roman font for “T” and “rel”. I also think rel would be better at the apex, or after a comma, e.g. "pT,rel " or "pTrel"
    • added explanation and fixed pTrel

  • L106 tt should not be italic, please check the latex
    • fixed

  • L106 and following single top → single top quark
    • Done

  • L107 : here it would be a good place to define e.g. Z/gamma+Jets = DY or similar, if you intend to use it later, see further comments on figures and tables
    • Done

  • L109 “Dibosons “ is jargon: "Vector boson pair (Diboson) "
    • Done

  • L116 and following: I suggest to use pennames for t_h and t_l, thus avoiding to italicize them all
    • fixed

  • L126-127: I think Semi-resolved contains events with one W-tag and no t-tag, correct? I think this should be clarified, as you do for the Resolved case.
    • We defined before that Wtag and toptag are two exclusive categories and think it is not necessary to repeat it.

  • L130 “to multi-TeV region” → maybe “to the multi-TeV region”
    • Done

  • L132: “Good agreement is observed…” → “Good agreement between prediction and data is observed…”
    • Done

  • L136 “simulated sample” → “simulated samples”
    • Done

  • L138 I think you mean “integrated” luminosity of data, correct? Or here you want to mention that the pileup profile is reweighted to match the one in data? I feel this should be clearly split in two sentences to avoid confusion.
    • This sentence is reworded.

  • L142 “All other sources affect both normalization and shape.” → I think pileup does affect also shapes, so this is another reason to split the two sentences above to this end.
    • fixed

  • L148 “applied in addition” → “is applied” , in addition is redundant and can make people wonder if you mean something specific.
    • Done

  • L153-154 add space between “Factorization” and the parenthesis “(muF)”, same for “Renormalization(muR)”
    • Done

  • L158 (ISR & FSR) → (ISR and FSR)
    • Done

  • L161: I think besides the top pT, cross section uncertainties are also not mentioned here.
    • Added at the beginning that we have 30% for backgrounds and 5% for signal

  • L163: “... is obtained using a binned maximum likelihood fit on the data to extract the signal strength modifiers through a simultaneous fit to all bins and categories…” → suggest: “is obtained by performing a simultaneous binned maximum likelihood fit to data in all bins and categories…"
    • Done

  • L166 “to a single nuisance parameter for each process, this is known” → I think not accurate, as BB-lite does consist in wrapping together stat. uncertainties for each bin. I would suggest to rephrase simply as “Statistical uncertainties due to the limited MC sample size are treated separately in each bin with the Barlow-Beeston approach [37].” without going into details of how they are split per each process.
    • Done

  • L172: one line number is missing between L172 and Eq. 2, probably because of latex issues, please fix.
    • fixed

  • L190: (2018, 2017 and 2016) → add a comma: (2018, 2017, and 2016)
    • Done

  • L195: here you should have the same notation for Single Top, QCD and Z+jets you defined in the Section 3. Please make sure to unify it and use a consistent nomenclature! Example of different instances: Z+Jets, DYJets, DY → should all be e.g. “DY”, if you define it in Section 3 QCD, Multijet QCD, multijet QCD, QCD multijet → should all be e.g. “QCD multijet” SingleTop, single top → could be “single top quark”
    • fixed

  • Table 1 caption: ”(2018, 2017 and 2016)” → add a comma: “(2018, 2017, and 2016)”
    • fixed
  • Table 1: same comment as line 195, plus please change DATA → Data, TOTAL → Total
    • fixed
  • Table 1, significant digits: please check the pubcomm guidelines on the suggested number of significant digits: https://twiki.cern.ch/twiki/bin/view/CMS/Internal/PubGuidelines#Number_of_significant_digits_for
    • fixed; now has 2 significant digits

  • L198: some line numbers are missing between L198 and Eq. 3, probably because of latex issues, please fix.
    • fixed

  • L198-through equation 3:“Sub-combinations” → suggest: “Combinations of sub-sets of these channels”
    • “multidimensional” → ”multi-dimensional”
    • “which allows us to obtain from the likelihood fit directly.” → I think it would be better to end with “directly:” rather than a full stop (“directly.”), so to better introduce Eq. 3.
      • done
  • L203 “Acfid which gives“ → “Acfid is“
    • fixed
  • L203 “Acfid is the number of the charge asymmetry Acfid” → the second one is not necessary, since here you are giving the symbols in the context of the formula, unless you want to separate the measured value from the observable, in which case you should change the first one to “Acfid,meas”, but I think it would be an unnecessary complication.
    • fixed
  • L206 “Table. 2” → remove the dot: “Table 2”
    • fixed
  • L218 “Table. 2” → remove the dot: “Table 2”
    • fixed

  • L208: some line numbers are missing between L208 and Eq. 4, probably because of latex issues, please fix.
    • fixed

  • L208-through equation 4:
    • “To do this” → suggest “To achieve this”
    • “In this case, ” → suggest “In this case, the relation:”
      • fixed

  • L209 “from the likelihood fit directly the signal strength” → we already had defined the signal strength, so you can leave only AC full - it seems out of place that out of all quantities you decide to redefine just rneg.
    • We are not redefining it here but stating that r_neg is one of the signal strengths we extract along with Ac, so it is part of our results.
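
The interplay between Ac and r_neg can be illustrated with a short sketch. It assumes that the asymmetry is built from the scaled signal yields in the two Delta|y| bins, r_pos*N_pos and r_neg*N_neg, with N_pos and N_neg the MC-predicted yields. The exact form of Eq. 3 and Eq. 4 in the paper is not reproduced here, and all function names and numbers are hypothetical.

```python
def asymmetry(r_pos, r_neg, n_pos, n_neg):
    """Charge asymmetry from scaled yields in the two Delta|y| bins."""
    pos = r_pos * n_pos
    neg = r_neg * n_neg
    return (pos - neg) / (pos + neg)


def r_pos_from_ac(ac, r_neg, n_pos, n_neg):
    """Invert the definition: r_pos is fixed once Ac and r_neg are the fit parameters."""
    return r_neg * (n_neg / n_pos) * (1.0 + ac) / (1.0 - ac)


# Hypothetical MC-predicted signal yields, for illustration only.
n_pos, n_neg = 5100.0, 4900.0
r_pos = r_pos_from_ac(ac=0.01, r_neg=0.95, n_pos=n_pos, n_neg=n_neg)
print(asymmetry(r_pos, 0.95, n_pos, n_neg))
```

With this parametrization the fit returns Ac and its uncertainty directly, while r_neg carries the overall normalization.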

  • L212-214 maybe this should be in the Systematic section?
    • fixed
  • L216 “Figure.4” → remove the dot: “Figure 4”
    • fixed

Approval Questions March 11, 2022

    • See presentation about some of these questions
ARC-Authors_Meeting_April_6.pdf: ARC-Authors meeting slides April 6, 2022

  • 1. Why the lumi nuisance is pulling by 2 sigma?
    • If you look at the pre-fit yields table (page 19 of the Approval talk) you will see that all channels have a data/MC ratio of about 0.9-0.95, while 2017 muons has 1.07 and 2017 electrons has 1.0. It is the need to bring the 2017 MC up that causes the close to 2 sigma upward pull on the 2017 luminosity. The 2016 and 2018 luminosities are pulled slightly towards the lower side (see appendix G1 for the complete list of pulls). We realized that our implementation of the 2D SF nuisance was not as intended. We now allow it to vary separately from the lepton ID/Reco, and that solves the pull on the 2017 luminosity. The resulting impact is shown below * Ac_750_full-new.pdf: Impact after separating the 2D SF nuisances

  • 2. Please consider removing luminosity as a nuisance, since we cannot use the same data to measure a cross section and also constrain the LHC machine performance, which is what lumi is. The suspicion is that the lumi nuisance serves as a wildcard being shifted while in truth it is other nuisances that need shifting.
    • we did as requested and the impacts can be seen below. Keep in mind it is only the 2017 luminosity that is pulled up, which is needed because of the difference in the data/MC normalization seen in our sample for that year compared to 2016 and 2018. Not allowing the luminosity nuisance to accommodate the need of the different years causes other nuisances to move away from nominal to try to accommodate the data, as can be seen with single top x-sec (the largest background) and some of the data MC statistics.

* Ac_750_full-fixedlumi.pdf: Impacts when fixing luminosity to the nominal value as a test

  • 3. p21, on the hdamp systematics - how did you ensure the MC statistics is not an issue? (A toy study was suggested before entering ARC review.) What will happen if the major systematics are symmetrized?
    • The effect of hdamp was addressed with an alternative to toys during the ARC review (see point 4 under Comments from ARC/Authors meeting Feb 15, 2022). Appendix F shows the nominal and up/down variations for each channel for hdamp with the statistical error on each entry. There are a few bins where the differences between the up/down variations are not statistically significant. We ran with all hdamp systematics symmetrized in all channels and the result is that hdamp is even more constrained. We think the main effect we see from hdamp is statistical and are adding an additional nuisance the size of the stat component of hdamp. This is running now.

  • 4. For the hdamp systematics, check if the option of re-weighting the central samples still available, which could give better MC statistics power.
    • This does not appear to be possible

  • 5. Please fix data error bars for the m(tt) distribution on p17.
    • Done

Comments from ARC

Comments from ARC/Authors meeting Feb 15, 2022

General comments for the analysis

  • 1. Add electron trigger vs pT plot for each year
    • These have been added to Appendix A

  • 2.1 Regarding contribution of tW in ttbar, ask Single top conveners for latest guidelines. Do we need to use alternate samples as variation?
    • Communicated with Matthias Komm. The guidance is that only analyses that have Wt as a signal need to include the different diagram removal/diagram subtraction schemes as modelling uncertainty.

  • 2.2 Do a fit with tW+ttbar combined to see the result
    • We tried this for one channel, 2018 muons in the (750,900) mass bin, and the measured value of the asymmetry did not change. About half of the single top contribution comes from Wt, which accounts for 3% of the sample. Given these findings and the recommendation, we will not add a varied sample to the Wt uncertainty.

  • 3. boosted ttbar samples- 2017 ntuples already exist- start by adding that. Look into making ntuples for 2016 and 2018
    • We have added the boosted samples for all 3 years and the MC statistical error is reduced by 40% and the total error by 10%. More importantly, the MC statistics is no longer the source with the largest impact. AN and paper were updated.

  • 4. Make up/down with nominal hdamp plots including MC stat uncertainty
    • These have been added to a new Appendix F. It can be seen that: 1.- The uncertainties are strongly not symmetric (selectively for some rather than other years/distributions). 2.- For 2016 and 2018, many bins do exhibit a one-sided behavior. 3.- The shape effects are very different across years, even in the same category. 4.- In one case, the two uncertainties even flip (2016, e+jets, > 900 GeV). However, none of these features can be attributed to statistical fluctuations. After discussion with the ARC, the proposal was to rerun the Impacts using Asimov data but treat hdamp across the 3 years as uncorrelated. Both the central value and the total error remain unchanged compared to the nominal result where hdamp is treated as correlated between the years. We conclude that the constraint seen in hdamp does not affect the final result, and no further concern about the effect of hdamp on the result remains.

* Ac_750_full.pdf * Ac_750_900_full.pdf * Ac_900_full.pdf * r_1.pdf * r_2.pdf
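
A per-bin comparison of a varied template with the nominal one, including the MC statistical uncertainty, can be sketched as follows. This is only an illustration, assuming the varied and nominal samples are statistically independent so that their errors add in quadrature; the yields, the errors, and the threshold of 2 are hypothetical and are not taken from Appendix F.

```python
import math


def variation_pulls(nominal, varied, nominal_err, varied_err):
    """Per-bin (varied - nominal) in units of the combined MC statistical error."""
    pulls = []
    for nom, var, e_nom, e_var in zip(nominal, varied, nominal_err, varied_err):
        sigma = math.hypot(e_nom, e_var)  # independent samples: add in quadrature
        pulls.append((var - nom) / sigma)
    return pulls


# Hypothetical yields in the two Delta|y| bins, for illustration only.
nominal, nominal_err = [1000.0, 1040.0], [10.0, 10.0]
hdamp_up, hdamp_up_err = [1015.0, 1100.0], [20.0, 20.0]

for i, pull in enumerate(variation_pulls(nominal, hdamp_up, nominal_err, hdamp_up_err)):
    flag = "significant" if abs(pull) > 2.0 else "compatible with a fluctuation"
    print(f"bin {i}: pull = {pull:+.2f} ({flag})")
```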

  • 5. Add all data with no Mttbar cut for an inclusive measurement? How many events are there for M<750? Plot |DeltaY| for each channel/region to see.
    • We measured Ac in the low mass region (Mttbar<750) for 2018 muons, which is about 45% of the total sample (the electron channel only contributes about 5% of the events because of the much higher kinematic cuts). This means that we could obtain a result with statistical uncertainty comparable to M>750GeV. However, compared to the ATLAS preliminary result, the statistical error would be a factor of 4 larger, which renders our result not competitive in this low mass region. The mass region below 750GeV was beyond the scope of this analysis from the beginning, and we prefer to stick to the baseline result for this publication.

  • 6. How to include Wilson coefficients for EFT interpretations
    • We looked into this. Including Wilson coefficients would require defining a MG process and UFO model, generating the Wilson coefficients, then running the MadGraph card with Pythia8, and finally using Rivet to create the needed reweighted observables. This is not something that we can complete in the given timescale, especially as we would have to run this privately and we are pressed for time to finish the analysis before key personpower is lost.

For the paper:

  • 1. Figure 3: All statistical error should be summed up together but included in the plot. Meaningful names should be used for the systematics.
    • We produced plots with better names and both pre- and post-fit values for the paper

  • In the introduction and in the summary, it should be written that this is the first measurement published with 13 TeV data, and I think also the first one with substructure techniques to reconstruct top quarks - maybe you can cross-check this, it seems the case to me though.
    • There is an 8TeV ATLAS paper (PLB 756 (2016) 52-71) that also includes exclusively masses above 750GeV and uses a fat jet for the hadronic top but isolated leptons. There is a preliminary ATLAS result at 13TeV (ATLAS-CONF-2019-026) that includes both resolved and boosted topologies and also uses only isolated leptons. We will stress that this is the first measurement at 13TeV truly optimized for the very high boosted regime not only on the reconstruction of the hadronic top but also on the reconstruction of the leptonic top (no explicit isolation and jet/lepton cleaning starting at trigger level)

  • I feel like the information on the table of systematics could be improved, especially the "uncertainty" column for shape uncertainties. It is not possible of course to convey info on how much it affects the result easily in this table, but maybe you can put the typical range of variation in the pre-fit distribution to give an idea of how large the uncertainty is, e.g. jes : 1-4%, electron id 1-2%, pileup < 1% etc.

  • Also, one should if possible add the info on per-year correlation. If the information is trivial, e.g. always correlated/ always decorrelated but one or two, it can go in the text as well.
    • We state in the text that the luminosity is partially correlated and mention other partial correlations as appropriate (some sources of b-tagging). We feel that going into much more detail is beyond the scope of a letter, and all the information will be in HEPData.

comments from Alberto Orso Maria Iorio on AN_v10 (Feb 10, 2022)

  • Comment 2) Page 7, L178 and Table 4 : you are studying boosted channels, but you are not using high Mtt samples TT_Mtt700to1000 or TT_Mtt1000toInf. Is this not affecting your MC statistics? Please show the MC statistics contribution to the uncertainty in your final distributions for Delta|y|. Also follow up on this in the systematic section - see also comment 10)
    • We added them and we now quote MC stats separately.

  • Comment 3) Page 12, L220-221: you are vetoing events with 1 jet in the HEM damaged sector. Two follow up questions: 3.1) Could you please show a jet-phi and MET-phi distribution in 2018 to make sure this is ok post correction - or point to this distribution if it’s already somewhere? In general the HEM issue should enhance QCD jets, so I’d expect if you check at one of the later stages of selection this should fix the problem, but should at least be verified. 3.2) If there’s still some effect, have you considered extending the veto to electrons as well?
    • We are actually already vetoing on the electrons. The figures below show MET_phi and Jet_phi after applying Veto for events passing our preselection. You can see that there is no "bump" in the -2 to -1 phi region as can be seen in AN2018_298 from B2G-19-004
      MET_phi.png Ak4_j1_phi.png

  • Comment 4). Page 13, Eq. 2: is there a reason to prefer the 2D cut to the MiniIso? I think there might be some studies in B2G-17-017, in case, could you point us to them?
    • The two isolation cuts were compared in AN-2014-035 Appendix C. Quoting from that AN, the mini-isolation has a higher selection efficiency for boosted ttbar final states, but worse rejection of QCD background than the 2D-cut. We stuck with the 2D because the level of QCD is too high with the mini-isolation cut.

  • Comment 5) Your trigger for electrons is expected to lose efficiency for high-pt electrons. Did you check the efficiency of your trigger? It is usually recommended to use photon triggers in this case. In case there is some reason (e.g. from B2G-17-017), please link the appropriate studies.
    • We have actually measured the efficiency in data and MC and extracted a SF. If you look at Fig 24, 26 and 28 of Appendix A1 you can see that the efficiency in data is very close to 100% even for high electron pT. This is due to the OR with the high pT single electron trigger and none of the triggers having lepton isolation.

  • Comment 6) Page 14: 6.1) Can you please clarify a bit more in the text if you are doing the jet-lepton cleaning for all the PF candidates or just for the ones selected? 6.2) Can you please also specify if you have any AK4/AK8 disambiguation before the top-assignment part?
    • The lepton/jet cleaning is done at the preselection level and is done to every lepton (signal and veto). Only AK4 jets that are at DeltaR>0.8 from the AK8 jet are considered for the jet assignments (See Sec 4.2, line 427)

  • Comment 7) Page 15: My understanding is that you don’t have any strict requirement on the tagging of the jet inside of the hadronic top (although you don’t explicitly veto it ), is that correct? My guess is that always asking for a second b-jet would kill a lot of signal and not help significantly against the background, but could you please elaborate also in the text a bit about this?
    • You are correct about our treatment of the b-tagging requirements. We have not studied the effect of requiring a second b-tag but as our reducible background after one b-tag is very small, we think the gain of asking for a second one would not outweigh the loss of efficiency for the signal

  • Comment 8) Page 16: 8) Please clarify the following in the selection: 8.1) Veto electrons / muons are always also “main selection” electrons? 8.2) Jets: in the spirit of comment 6.2) if you don’t have an AK4/AK8 disambiguation at any stage, the leading / second to leading jet can overlap with the AK8, correct? I guess it’s fine, it’s a design choice, but it’s for my understanding of things.
    • 8.1) Veto leptons are defined in Section 3.2.1 and 3.2.2. The only difference with signal leptons is that they have a lower pT - this is to reject more dilepton events. 8.2) AK4 and AK8 have to be at DR>0.8 apart to be considered.

  • Comment 9) Page 23, Tables 8-9: 9.1) You mention that the errors shown include stat. and MC uncertainty. Could you please clarify the entity of both? I am particularly concerned about the number of ttbar. 9.2) Also you mention you have a 30% uncertainty on WJets, single top, and QCD, but the uncertainty here for single-top is around 10%. Please clarify and eventually modify the label accordingly, if this is not what is shown.
    • Table has been remade and a mistake in the script was found. Keep in mind that the numbers in the table do not enter the Combine analysis at all, they are just used for the AN as information on the sample composition.

  • Comment 10) 10.1) Page 25, L463: It is not clear how you treat MC Statistical uncertainties, and in the paper you mention you use the standard Barlow-Beeston-lite method, I assume as implemented in combine. Please add it in the note as well - and of course sorry in case I missed it!
    • Added as an extra item in the list of systematics considered and also to the unfolding section.

  • 10.2) From figures 161-222 (pages 143-178) in appendix H it seems like the MC stat uncertainties are by far the largest component, especially in data, but in Figure 3 of the paper they are absent. Can you please clarify if the latter is just done like this to display the physics component of the uncertainty? This is particularly important in view of my comment 2, because this is a point where the addition of the exclusive samples at Mtt > 700 might really help.
    • We followed what ATLAS did in their conference proceeding and excluded the MC stat errors from this plot (even though they are very important) to just show the ranking of systematic errors. We will make sure to make it clear in the paper that the MC stat errors are dominant but not shown in the plot.

  • Comment 11) I am not sure anybody has ever considered this, but the tW channel has interference at NLO with the TT. This means that the same new physics contributions might apply to both. While from the theory standpoint I am not sure people have considered it in CMS, one could consider having the measurement done not only on TT, as is done, but considering the sum of TT and single-top tW as a signal. While this might not be in the scope of this paper, at least I think this background requires a bit of care. Therefore I would suggest to: 11.1) Check in the single top component how much is tW and how much the other parts. I think the tW will be the largest part.
    • Wt is 48% of the total single top sample in our phase space.

  • 11.3) Comment about the feasibility of repeating the analysis by summing together tW and ttbar.
    • We tried this for 2018 muons (750,900) and the measured result for the Ac did not change (the Wt events are 3% of the total signal)

  • Comment 12) The part on the unfolding might benefit from a bit clearer definition, my main point of doubt being about how you define the fiducial phase space. My understanding is that the fiducial is taken as the reconstructed phase space. Please confirm the following and improve the description accordingly:
    • Yes, the fiducial phase space is defined based on reconstructed quantities, added this to the AN

  • 12.1) Page 29, L625, and Eq 7: here you always take the top quark at parton level, correct? You never consider any particle-level quantities.
    • Yes, we always correct back to parton level (see line 621).

  • 12.2) mu_i is the number of parton level top quarks in the full phase space or in the fiducial phase space? Please clarify.
    • Yes, mu is the number of expected reconstructed events in the fiducial phase space, as predicted by MC. Added a clarification.

  • 12.3) L679 here Ntruth you mean the number of events in the corresponding bin of reconstructed Delta_y, i.e. fiducial phase space == defined by cuts at detector level?
    • Ntruth is defined at reconstructed level but includes information from both generator-level bins. Changed the description in the note and added "gen" to the places where generator information is used to hopefully make it easier to follow.

  • 12.4) Page 33, figures 10-11 : what is the variable on the x-axis, Delta|y|? This is made a bit confusing by the “>/<” symbols, which refer to the region I guess.
    • The x-axis is actually the individual variations of the different theoretical uncertainties. We have fixed the plots

comments from Helena Malbouisson on AN_v10

  • 1 - Did the STAT group review the analysis? Were there any recommendations on the handling of the statistical uncertainty of the A_C?
    • We filled the questionnaire but did not hear back. The Combine contact approved the Data cards and the treatment of errors within Combine.

  • 2 - Trigger efficiency curves ==> is it OK to cut the muons on 55 GeV when the trigger threshold is 50 GeV? Is 55 GeV already in the efficiency plateau? I did not find the efficiency curves for the single muon trigger in the analysis note. Would you please include it or point it to me in case I have missed it?
    • We are using the centrally provided SF and did not include the plot but only the reference to the muon POG twiki (Ref. 24). You can see the plots for 2018 at
https://indico.cern.ch/event/801419/contributions/3330788/attachments/1801407/2938479/190225_wonjun_SingleMuon_TriggerEff_SF_For2018.pdf The SF is essentially 1 for pT>52GeV and 55GeV is certainly in the plateau.

  • 3 - Figure 1: There seems to be a trend in the chi2 distribution for the 2016 data ==> is it acceptable/understood? Also, the uncertainties seem a bit large. Since there are both systematics and statistical uncertainties, it is hard to say from the plot which is the dominant. Would you please specify?
    • These plots are pre fit and this looks mostly like a normalization issue. Because they are pre-fit, the dominant uncertainties are the flat scale uncertainties on the MC yields.

  • 4 - Figure 2: The quality of the fit is not very good for the leptonic tops mass (chi2/ndf is 124/7). Is it understood why? Does it have any impact on the signal event selection?
    • We use the same range for all fits and actually the individual years (shown in the appendix) have better chisq/dof values. In any case, the chisq is just a method to assign the jets to the leptonic or the hadronic side and the outcome of the jet assignment is quite insensitive to the values used in the chisq definition. Please note that we use values for sigma that are quite large compared to the error of the Gaussian fit (See Line 904 in Appendix B).
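
The statement that the jet assignment is insensitive to the values used in the chisq definition can be illustrated with a toy example. The two-term form below (leptonic and hadronic top mass, each compared to a reference mean and width) is an assumption made for illustration; the definition actually used is given in the AN, and all numbers here are hypothetical.

```python
def chi2(m_lep, m_had, mean_lep, sigma_lep, mean_had, sigma_had):
    """Two-term chi2 comparing reconstructed top masses to reference values."""
    return ((m_lep - mean_lep) / sigma_lep) ** 2 + ((m_had - mean_had) / sigma_had) ** 2


def best_assignment(candidates, **reference):
    """Pick the (m_lep, m_had) hypothesis with the smallest chi2."""
    return min(candidates, key=lambda c: chi2(c[0], c[1], **reference))


# Hypothetical reference values in GeV, not the values used in the analysis.
reference = dict(mean_lep=175.0, sigma_lep=20.0, mean_had=180.0, sigma_had=15.0)
# Each candidate is one way of assigning jets to the leptonic and hadronic top.
candidates = [(250.0, 120.0), (178.0, 185.0), (140.0, 230.0)]

print(best_assignment(candidates, **reference))
# Same candidates with wider (hypothetical) resolutions.
print(best_assignment(candidates, **{**reference, "sigma_lep": 30.0, "sigma_had": 25.0}))
```

In this toy case the same hypothesis is selected for both sets of widths, since only the relative ordering of the candidates matters.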

  • 5 - I wonder if the top quark mass uncertainty should be included in the systematics? Is the chi2 variation included in the systematics? I did not see it listed in section 5.
    • If by top mass uncertainty you mean using MC generated with different masses, then we do not do that. Refer to our previous response about the effect of the chosen values in the chisq. It would not be meaningful to repeat the analysis with different values of parameters in the Chisq definition as it would not result in different jet assignments.

  • 6 - l. 697: what is the uncertainty on A_C^{full}? How did you conclude the uncertainties studied are smaller than the systematics of the A_C^{full} ? Would you please mention it in the text?
    • Fig 10 and 11 show, in a vertical scale that corresponds to the error on AC(full), the variations from the different theory error sources. The horizontal axis was mislabeled but each point in the plot represents one variation for PDF, q^2, FSR/ISR and hdamp. The plot has been fixed in the AN. All these variations are insignificant compared to the uncertainty on Ac from all the other sources.

  • 7 - Figure 14 and similars: I can't say what are some of the nuisance parameters listed in the plot. It's the same in the paper. Shouldn't the names be more suggestive?
    • We remade the plot adding the MC statistical error and using more meaningful names for the uncertainties.

  • 8 - Maybe it is not the scope of this analysis, but did you also consider doing the differential measurement of the A_C?
    • Actually, we do have one inclusive for Mass > 750 and then two differential bins [750,900] and > 900GeV

Comments from Conveners

comments from Jan and Matteo on AN (Nov 23)

  • The impact of MC stat is very large, as indicated by the ranking of the BB parameters and the strong constraint on hdamp. Please consider using the boosted samples produced in the context of the energy asymmetry analysis (KIT). These include both the nominal model and the hdamp variations
    • We determined that the constraint observed in the hdamp samples was not due to low statistics in the samples. We also determined that the effect of the hdamp constraint on the final results is negligible. The samples were privately produced and it is taking forever to run on them. Also, only partial samples are available and the 2017 ones were deleted.

  • Out of curiosity, what would the final uncertainty be if you merge the Mtt bins? I understand that from physics considerations it is advantageous to keep them separate, but due to the MC stat issue described above, it could turn out to be the opposite
    • The MC statistics error enters combine for the signal samples only. The main problem comes from the migration in the signal sample between mass bins (750-900) and >900GeV, which we were taking into account in all three results we were presenting (Mttbar > 750, (750-900) and > 900). We have now changed how the combine fit is set up and we are able to get much better results for the MC stats. We fit in one mass bin at a time, corresponding to the 3 results: Mttbar > 750, (750-900) and > 900. Events generated with a mass outside the reconstructed mass bin under consideration are taken into account as underflows or overflows. However, Combine is not forced to try to fit a handful of events that have a reconstructed mass outside the mass bin where we are measuring the Ac.

  • Given the very strong constraint on hdamp, and therefore the unreliable estimate of this effect, the impact of the MC stat uncertainty of the hdamp samples should be studied. The easiest way would be to use a toy-template procedure as in TOP-17-001, as documented in the corresponding AN
    • We were able to determine that the constraint is not caused by low statistics in the samples and that it does not affect the final result.

  • Top pt re-weighting: for clarity, please mention that you are applying the data/powheg+pythia weights
    • DONE

comments from Jan and Matteo (Oct 12th)

  • In your selection, can a jet be both t-tagged and W-tagged? Or do you first check for t-tags and subsequently for W-tags?
    • The conditions for an AK8 jet to be t-tagged or W-tagged are exclusive of each other and a jet cannot fulfill both (typo on SD mass fixed). We have added more information in the selection section to hopefully make this clearer.

  • For clarification: for leptons you use an ID not requiring isolation (as it should be). Both are cut based; are there more powerful IDs around that you could use in case the analysis would profit (e.g. the ttbar reconstruction)?
    • We had compared with the MVA cut a while ago (AN2015_107_v9 attached) and it was found that the cut based ID was better for optimizing the background rejection in the boosted ttbar phase space.

  • (303): MET: not too important now, but please don’t forget to change it to “missing transverse momentum” for the paper
    • Done

  • (315): HEM: is the PU distribution affected by this? Do you have control plots showing that? (This might affect more analyses, so please feel free to just refer to a check here)
    • We haven't checked the PU distribution, but the jet eta and phi have been studied. Can be seen in B2G-19-004 (AN2019_298).

  • Section 4 in general: having in mind the upcoming review process, it would be good if you could motivate the choices, categories, and strategy etc a bit better right in the beginning; right now, it is hard to follow. Some comments below are related to that
    • Keep in mind we re-use the event selection that was published in 10.1007/jhep04(2019)031. We added references to the AN that details all the optimizations in object ID and event selection that were done prior to that publication.

  • L 348: How do the AK8 jets and the AK4 selection relate? When do you pick which combinations?
    • We have changed the description of the event selection and categorization and hope this is clearer now. Only AK4 jets that are at dR>0.8 from a top or W tag are considered for jet assignment.

  • L357: Lepton 2D cuts. You claim that the QCD background is reduced strongly. Do you have control plots (in general a few more control plots would help)
    • We include plots only after the event reconstruction that includes the final selection on the chisq and also the categorization into the 3 topologies. These are shown starting in Sec 4.2 and also in appendix C. We have also added the plots of the two Delta|y| bins.

  • Fig.1: could you include more details in the relevant range (around 30, where you cut)?
    • DONE, now the figure shows the 0 to 30 range, which contains the events that pass our final selection.

  • All figures: keep in mind that for the paper version of all plots you will need to increase the legend sizes (not needed for AN imho). The rest already looks good in terms of sizes etc. I would also recommend choosing different colours. A great way to do this is using https://colorbrewer2.org/ .
    • DONE. Now the colors are ttbar(l+jets) in red, ttbar(others) in darker red, W+jets in green and others in blue

  • L398: defining these categories (even if you don’t use them explicitly) in the beginning of Section 4 would help the flow of the whole section
    • We have moved them to the end of the event selection (Sec 4.1) before the kinematic reconstruction.

  • L401: in the boosted jet category, is it strictly necessary to require 0 W-tagged jets? Maybe you can check (for signal and main backgrounds) how many events have 1 t-tagged jet and one or more W-tagged jet. Maybe it’s totally negligible, but you may gain some events by being inclusive on the number of W-tagged jets in this category.
    • The top and W tag conditions are exclusive of each other (the softdrop mass covers different ranges). We cannot have both at the same time. We also do not expect more than 1 hadronic top in our sample, and it would be either boosted (top) or semi-resolved (W), not both. We would not expect to gain anything except some all-hadronic candidates, which we do not want in our sample.

  • Section 5: Also here, starting by stating that you are measuring in the full and fiducial phase space defined by XYZ would improve clarity
    • We have now stated this at the beginning of the unfolding section and then added separate subsections for each of the corrections. We have also added a section for how we implement the likelihood unfolding using Combine.

  • L442: you say that the priors for the nuisance parameters follow a log-normal distribution. This is correct for normalisation parameters, but in all other cases I believe one should use a normal distribution. Is there any specific reason for this choice
    • There was a typo and is now fixed in the AN. We indeed use log-normal for the normalizations only and normal for the other uncertainties.
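
The reason for using a log-normal constraint for normalizations can be seen in a minimal sketch. It assumes the usual convention that a log-normal nuisance with parameter kappa scales the yield as kappa**theta, where theta is the nuisance value in units of sigma; the linear scaling is shown only for comparison, and the yield of 100 events is hypothetical. The 30% figure matches the background normalization uncertainty quoted above.

```python
def lognormal_yield(nominal, kappa, theta):
    """Log-normal normalization nuisance: the yield scales as kappa**theta."""
    return nominal * kappa ** theta


def gaussian_yield(nominal, rel_unc, theta):
    """Linear (Gaussian) scaling of the yield, shown only for comparison."""
    return nominal * (1.0 + rel_unc * theta)


# A 30% normalization uncertainty corresponds to kappa = 1.30.
nominal, kappa = 100.0, 1.30
for theta in (-4.0, -1.0, 0.0, 1.0):
    print(theta,
          lognormal_yield(nominal, kappa, theta),
          gaussian_yield(nominal, kappa - 1.0, theta))
```

For a large downward pull the linear scaling gives a negative yield while the log-normal one stays positive, which is why it suits normalization parameters.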

  • L448: in the formula, where do you account for correlations between nuisance parameters (e.g. from years)?
    • All correlations across the years and channels are taken into account in the combine data cards. We added information about the Combine implementation in a subsection of the unfolding section to make this clearer.

  • About the previous questions, have you had your data cards reviewed yet?
    • Yes, we got the green light on Jan 27

  • L455: In the next presentation in the TMP meeting, I think we should have a discussion about visible and full phase space again, and how exactly the extrapolation etc. is done.
    • yes, that will be helpful

  • Section 5.1: it is not quite clear why this is necessary. In the end you don’t have a choice w.r.t. ‘forward’ versus ‘backward’ in terms of binning, and also the binning is very coarse. This is also reflected in the high and continuous purity and stability, and the fact that you can easily do it without regularisation for the forward-backward unfolding. It is not clear from this section though, what happens to migrations between the mttbar bins, and how they relate. This would be the more important question and should be described.
    • You are correct, we were not taking the possible migrations between the mass bins into account. We are doing this now by starting from 8 bins at generator level instead of 4 (we split the mass in two bins). The AN has been updated accordingly.

  • Figure 7: it looks to me like purity and stability are almost the same in every bin. I am not saying this is wrong, but it would be great if you could double-check this result
    • We have double checked this and it is correct.
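
A small sketch shows why purity and stability can come out almost equal with a coarse binning. It assumes one common convention (purity is the fraction of events reconstructed in a bin that were also generated there, stability is the fraction of events generated in a bin that are also reconstructed there), which may differ from the one used in the AN; the migration matrix is hypothetical.

```python
def purity_and_stability(counts):
    """counts[g][r] = events generated in bin g and reconstructed in bin r."""
    n = len(counts)
    purity, stability = [], []
    for i in range(n):
        reco_total = sum(counts[g][i] for g in range(n))  # all events reconstructed in bin i
        gen_total = sum(counts[i][r] for r in range(n))   # all events generated in bin i
        purity.append(counts[i][i] / reco_total)
        stability.append(counts[i][i] / gen_total)
    return purity, stability


# Hypothetical two-bin (Delta|y| < 0, Delta|y| > 0) migration matrix.
counts = [[900.0, 100.0],
          [110.0, 890.0]]
purity, stability = purity_and_stability(counts)
print(purity, stability)
```

When the migrations in the two directions are of similar size, the row and column sums nearly coincide and the two quantities agree closely in every bin.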

  • Table 13: when you say “Shape” you mean that the normalisation component is removed? I would advise against doing this in the visible phase space: the uncertainty should not have a normalisation effect in the full phase space, but due to acceptance effect you will have a genuine normalisation effect in your visible phase space
    • FIXED (it was a typo): all nuisance parameters (except x-sec) have both shape and normalization components.

  • In the same table, hdamp is classified as a “Normalisation” uncertainty. Is it a mistake in the table?
    • FIXED, yes it was a mistake

  • Do I understand correctly that the JES are still not split in this version of the fit? What else is missing?
    • The JES uncertainties are based on the different sources. There are 10 of them, so it is taking a while, but they should be done soon. The JER correction is ready. Both were implemented for version 11 of the AN.

  • Which distributions exactly are used as input to Combine? These should be in the main body of the AN
    • the delta|y| is used as an input, added in Sec 4.3

  • It looks to me like the MC stat uncertainty is quite large in your fit. This is clear from the impact plots, and most likely is also responsible for the strong constraint on hdamp. Would it be possible to a) see the effect of hdamp on your fit distributions (nominal vs up vs down), including the MC statistical uncertainty, and b) re-bin some of the distributions in order to reduce the effect?
    • hdamp up/down plots are shown below for muons and electrons. We have only two bins, so rebinning is not an option.

  • 2018 muon Ttbar (negative \Delta|y|):

  • 2018 muon Ttbar (positive \Delta|y|):

  • 2018 electron Ttbar (negative \Delta|y|):

  • 2018 electron Ttbar (positive \Delta|y|):

  • Why is the effect of the top pt re-weighting two-sided? With one-sided uncertainties it’s common practice to set the down (or, equivalently, up) variation equal to the nominal. In this way the impact will be one sided, and one side of the uncertainty will be unconstrained
    • FIXED, we had symmetrized the down variation with respect to the nominal to get an up variation, but have changed to what you suggest.
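The two treatments discussed above can be sketched as follows. This is only an illustration on toy bin contents; the function names and numbers are hypothetical and do not come from the analysis code.

```python
def mirrored_templates(nominal, varied):
    """Two-sided treatment: mirror the single variation about the nominal."""
    mirrored = [2.0 * n - v for n, v in zip(nominal, varied)]
    return list(varied), mirrored


def one_sided_templates(nominal, varied):
    """One-sided treatment: the other side is set equal to the nominal."""
    return list(varied), list(nominal)


# Toy two-bin templates (hypothetical yields).
nominal = [100.0, 120.0]
reweighting_off = [90.0, 126.0]  # the single available variation

print(mirrored_templates(nominal, reweighting_off))
print(one_sided_templates(nominal, reweighting_off))
```

With the one-sided choice the second template carries no shift, so the fit can only move the nuisance parameter in one direction away from the nominal.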

comments from Robert Schoefbeck

  • L73 Ref to the theory predictions?
    • added

  • L104 13 TeV I hope ...
    • fixed

  • Sec 4.3.2. / 4.3.3 - Please explain if/how the measurement regions for the top and W SF differ from the analysis selection and whether or not it is a concern.
    • The description of the control regions used for the mistag measurement is available in Appendix C. The selection is the same as for the signal region except for an inverted cut on the leptonic term of the chisq discriminator and a veto on b-tagged jets. We believe that the regions are close enough in phase space to the signal region to be relevant and, at the same time, exclusive of it because of the inverted chisq cut and the btag veto. We have added some text in the systematics section to make it easier to follow. This method has been used before and we are confident that it is of no concern.

  • Sec. 4.4 I assume the 2017 Met EE fix is applied?
    • yes our analysis selection uses jets with pT>50 GeV and in range [-2.5,2.5] eta. So this is taken care of.

  • L338 What motivates the dR>1.2 cut (as opposed to 0.8)?
    • this is a typo, now fixed to 0.8

  • L402 please fix
    • fixed

  • Unfolding: Are weight-based inefficiencies accounted for in TUnfold? Please see AN-19-227 Sec. 10 for a detailed explanation. (Please note that a perfect closure when unfolding the MC to itself is NOT a check if the weight-based inefficiencies related to reco level objects are incorrectly applied at the parton level where the closure is checked; please clarify what is done in Fig. 6 in this respect)
    • We are following the procedure used in AN-17-130 and believe it is taking correct care of the weight-based inefficiencies related to reco level objects as we never cut any MC event, the weights are just carried along and might be very small but all events are kept.

  • Please compute the condition number of the unfolding matrix (see Statcom twiki).
    • Done. As recommended by the Statistics Committee, we use the singular values instead of the Condition() method. The condition number is ~10 for all the response matrices, which, together with the very small tau values, allows us to proceed without regularization. We updated the AN.
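As an illustration of the singular-value approach, the condition number is the ratio of the largest to the smallest singular value of the response matrix. The sketch below uses NumPy on a toy 2x2 matrix; the matrix values are hypothetical and are not the analysis response matrices.

```python
import numpy as np


def condition_number(response):
    """Ratio of largest to smallest singular value of the response matrix."""
    singular_values = np.linalg.svd(np.asarray(response, dtype=float),
                                    compute_uv=False)
    smallest = singular_values.min()
    if smallest == 0.0:
        return float("inf")  # singular matrix: unfolding is ill-posed
    return float(singular_values.max() / smallest)


# Toy response matrix (hypothetical): 10% migration between two bins.
toy_response = [[0.9, 0.1],
                [0.1, 0.9]]
print(condition_number(toy_response))
```

A value of order 10 or below is commonly taken to indicate that regularization is not needed, which is consistent with the answer above.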

  • Figure 4 - please add statistical error bars.
    • done

  • Is Fig. 4d from 2017? The lumi value says otherwise.

    • Please remember that the 2017 electron channel does not include Run 2B and therefore has less integrated luminosity. It just happens to be very close to the 2016 luminosity, but not exactly the same. We have added a note in the section to remind the readers of this and hopefully avoid confusion.

  • Please indicate in the caption which row is which. In Fig. 4b,d the stability is asymmetrical, but in opposite ways. Why is it so? Is it significant? Is it numerically relevant for the result?
    • We have improved the caption. We do not know why the up/down variations are not symmetric in the last bins but we have very few events there and also, given the statistical errors, the impact is likely not significant.

  • L471 please fix
    • done

  • L480 Numbers should be justified, i.e., based on previous measurements and, maybe, inflated with extrapolation uncertainties to your measurement regions. Alternatively, can you consider measuring the background normalization in-situ by letting the nuisance float?
    • We use the same prescription as used in AN-2017-130 and assign a 30% rate uncertainty to all MC-derived backgrounds (all in our case, including the ttbar dilepton and all hadronic). We have added a reference to the note. We are not fitting for the background rates in this analysis.

  • L485 Please update to current recommendations with partial correlations.
    • done and AN updated

  • L496, 503. Afaik the Muon SF are partially correlated, e.g. the statistical component of the SF uncertainty is uncorrelated but other sources are not. Please specify whether this is so for the high pt muon ID. Please provide pointers to the exact SFs you are using.

  • We either need to see the impacts for the missing uncertainties or a study that shows they are negligible. Otherwise, we can't know whether the strategy is feasible.
    • We show here the impacts for the entire sample, and separately for boosted, semi-resolved and resolved. As you can see, top pT dominates in all, but top tag is important in the boosted one, where you expect it to be, and not in the resolved.

  • Impacts for all combined:

  • Impacts for resolved:

  • Impacts for boosted:

  • Impacts for semi-resolved:

  • L567, Fig. 9 As said during the talk, the constraint on top pt should be understood. For example, if there is a mismodelling of the acceptance in one of the categories, it could easily lead to a bias/constraint because your measurement regions roughly correspond to top-pt. I think a dedicated study of the effect of top-pt reweighting is needed. It looks like your pre-fit top_rew uncertainty is already substantially different between e+jets and mu+jets. This should be explained, I see no reason for that. Looking at the top pt shapes in the backup (thank you for adding these), I do not find the pulls so surprising. Can you show a reweighted top pt spectrum and e.g. also bin the reweighted shapes in the analysis categories?

    • We have plotted the top pT for the leptonic and the hadronic top in the muon and the electron channels in 2018 to compare data with the reweighted MC and its uncertainty, which is given by the symmetrized difference between the MC ttbar pT with and without the pT reweighting correction. As can be seen, the description and the uncertainty appear adequate in our signal selection (chisq<30 and Mttbar>900 GeV). However, at high top pT there is a trend: the correction is smaller than what the data prefer.

  • Hadronic top pT for the 2018 muon signal sample:

  • Leptonic Top pT for the 2018 muon signal sample:

  • Hadronic Top pT for the 2018 electron signal sample:

  • Leptonic Top pT for the 2018 electron signal sample:

    • We show below a comparison between the data and MC with and without the top pT reweighting in the 3 regimes: resolved, boosted and semi-resolved. As you can see, the correction helps in all cases and the uncertainty covers the difference between the corrected MC and the data. Keep in mind that in the combined sample the main contribution comes from the resolved regime, then the boosted, with a small contribution from the semi-resolved.

  • ttbar pT Boosted:

  • ttbar pT Resolved:

  • ttbar pT Semi-resolved:

  • I suspect hdamp could provide a substantial uncertainty, will be interesting to add it. Once the missing systematics are there, it will be good to show the systematic correlation plot to learn more about the important players.
    • It does not seem to be very important as you can see in the file below that shows the main systematics for the ttbar signal. We also include the correlation plots
* Systematics for signal

  • hdamp and toppT systematics:

  • correlations:

  • Fig. 71p How is it possible that toptagUp has zero effect (Does it mean the plot is dominated by dileptonic top)?
    • Our candidate sample is dominated by the resolved top sample. We do consider ttbar dilepton and all-hadronic as backgrounds.

  • Fig. 72p Does the size of the variations make sense with the top tag SF uncertainties? Please comment on that in the main body of the AN.
    • We believe the answer is the same as for the previous question: our candidate sample is dominated by the resolved top sample.
  • Please comment on ultra-legacy usage for later / paper
    • We have been approved to stay with the current samples for this publication, will move to ultra-legacy and extend to lower Mttbar regions later

  • before Eq.1 - a short discussion of the BSM models predicting modifications of Ac should be added.

    • Done

Object Review

MC concerns

comment from Enrique Palencia Cortezón:

  • I suggest that for 2016 ttbar and single top, you use the CP5 samples.
    • we switched to the CP5 samples for 2016 and updated the AN accordingly

Trigger concerns

comment from Charis and Nicolò:

  • In order to ease our review and the book-keeping of all the analyses reviews, we would ask you to fill out the questionnaire in the TOP trigger TWiki, in particular listing the trigger paths you are considering, the scale factors you are applying, and the relevant AN where we can find the details: https://twiki.cern.ch/twiki/bin/view/CMS/TopTriggerTriggerScaleFactors
    • Done

Follow up from questionnaire:

  • For the muon triggers/trigger SFs we agree that the strategy is fine since you are following the recommendations and using the appropriate centrally provided SFs.

  • For the electron cross trigger SF derivation we are a bit confused by the approach you are following, since you are referring to it as 'Tag-and-Probe'. What we would like to understand is : - Are you indeed implementing a T&P approach? If this is the case we would like to understand which is the tag/probe/passing probe selection and what peak do you reconstruct in eμ events?
    • In our method that uses the ttbar e mu channel, the tag is the muon and the probe is the electron. Both leptons pass the tight cuts we use in our candidate sample selection. We do not look at the peak in the sense of the Z to dilepton tag & probe method, but we show with the plots in the appendix that the resulting sample is a very pure ttbar e, mu dilepton sample, which means that the "probe" electron should pass the electron trigger.

  • Could you be using the orthogonal dataset approach? We understand that you are using an orthogonal, to your analysis region, set of events (eμ di-lepton events) and a reference trigger path (HLTMu50) to determine the trigger efficiency. Is this the case? If so, what dataset are you using to determine the trigger efficiency, SingleMuon or SingleElectron?
    • We use the SingleMuon dataset and the muon trigger to measure the electron trigger efficiency. We changed the description in the AN to orthogonal sample to make it clearer and we added the information that we use the Single Muon dataset.

  • Also, taking a look at the trigger efficiency and SF plots we observe that there are many bins with very large uncertainties.
    • Is this a result of low stats? Have you tried further optimizing the SF binning?
    • For events that fall in the empty bins what is the SF and corresponding uncertainty that you apply?
    • Do these large electron trigger SF uncertainties have a large/ high ranking impact in your final fit?
      • We received this comment from the eGamma POG, and they suggested to merge bins. This was done and is documented in the appendix
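The orthogonal-sample efficiency and the bin merging discussed above can be sketched as follows. This is an illustrative sketch only: it assumes the denominator is the number of offline-selected events collected with the reference (muon) trigger and the numerator is the subset that also fires the electron trigger. The merging rule, threshold and counts are hypothetical, not the procedure documented in the appendix.

```python
def merge_sparse_bins(n_pass, n_total, min_total):
    """Merge adjacent bins (left to right) until each has >= min_total events.

    Returns a list of (passed, total) pairs for the merged bins. Any
    leftover events at the end are folded into the last merged bin.
    """
    merged = []
    acc_pass = acc_total = 0
    for passed, total in zip(n_pass, n_total):
        acc_pass += passed
        acc_total += total
        if acc_total >= min_total:
            merged.append((acc_pass, acc_total))
            acc_pass = acc_total = 0
    if acc_total > 0:
        if merged:
            last_pass, last_total = merged[-1]
            merged[-1] = (last_pass + acc_pass, last_total + acc_total)
        else:
            merged.append((acc_pass, acc_total))
    return merged


def efficiencies(merged):
    """Trigger efficiency per merged bin: passed / total."""
    return [passed / total for passed, total in merged]


# Toy counts per electron pT bin (hypothetical): sparse high-pT bins.
n_total = [200, 150, 12, 5, 3]   # events selected with the reference trigger
n_pass = [180, 140, 10, 5, 2]    # ... that also fire the electron trigger
merged = merge_sparse_bins(n_pass, n_total, min_total=20)
print(merged, efficiencies(merged))
```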

JET/MET concerns

comments from Ashley and Mikael

  • You need to fill out the survey on jet/MET use before we can begin the review [1], to save yourself time please check that what you did is consistent with the recommendations for JER [2] and JEC [3] and explain if there is any variation, as that is what I will ask about.
    • Filled out the survey and we confirm that we have no departures from the recommendations

[1] https://twiki.cern.ch/twiki/bin/viewauth/CMS/TopJetMETSurvey

[2] https://twiki.cern.ch/twiki/bin/view/CMS/JetResolution#JER_Scaling_factors_and_Uncertai

[3] https://twiki.cern.ch/twiki/bin/view/CMS/JECDataMC

  • L 184 : “pileup-hadrons” -> “pileup candidate” hadrons to be consistent with previous description on L 180 (same comment for L 254)
    • made the correction and updated AN

  • L 263 : Please explicitly state the JER version used as you did for JEC on L 261
    • Summer16_25nsV1*(2016v3), Fall17_17Nov2017_V3_* (2017), Autumn18_V7_* (2018). AN has been updated

BTV concerns

questions/comments from Jan and Denise:

  • Please make sure that you also apply nJets/HT-based corrections to your phase-space (omitting b-tagging requirements) as explained in our TWiki for the discriminator reshaping method [1]. Side note: If you are using UL samples, that effect might be negligible, we have not investigated this with UL. In the case of the Re-reco samples, these corrections have to be applied
    • Yes, we are applying the 2D correction (nJets/HT) to all events that pass our event selection and reconstruction as instructed. This has been documented in an appendix.

  • When using dedicated tagging SFs for your AK8 jets, remove the AK4 subjets of these jets from the collection of jets that is considered for deriving your AK4 b-tagging SFs.
    • Yes, we don't take into account the AK4 subjets for deriving AK4 b-tagging SF's.

  • In l. 194, you say you are using the btag shape reweighting method. The description of your systematic uncertainties in l. 370 then does not match your chosen reweighting method. For this method, there are in total 8 uncertainties that are applied to jets of all flavors. The recommended correlation scheme is detailed here [2].
    • Yes, we are applying the btag shape reweighting method with all 8 uncertainties (cferr1, cferr2, hf, hfstats1, hfstats2, jes, lf, lfstats1, lfstats2) and taking into account the correlation. The AN has been updated with more details

  • As soon as you obtained your results, we would like to have a look at the impact plot as well, to make sure that no unexpected behavior is observed for the b tagging nuisance parameters. Please consider filling in the TOP BTV questionnaire by then as well [3].
    • We added an appendix with all the details and filled out the questionnaire in [3]

[1] https://twiki.cern.ch/twiki/bin/view/CMS/TopBTV#Common_mistakes_when_deriving_b

[2] https://twiki.cern.ch/twiki/bin/view/CMS/BTagShapeCalibration#Correlation_across_years_2016_20

[3] https://twiki.cern.ch/twiki/bin/view/CMS/TopBTV#Analysis_review

EGamma concerns

questions comments from Alessia and Mohsen

  • Can you confirm that the ID that you are using is indeed the recommended Fall17v2?
    • Yes, we are using the recommended Fall17v2 ID

  • Can you comment on the procedure that you use to remove the isolation cut from the ID definition?
    • Isolation can be applied separately from the other ID cuts and we do not apply any isolation requirement (track or calo-based) offline. We also use a trigger that does not include any isolation requirement. However, we realized that the SF is not centrally provided without isolation and remeasured it ourselves. AN was updated with the study and the SF were shown at the EGamma POG meeting on July 2nd.

  • Are you applying the reconstruction scale factors? You only mention the trigger and ID scale factors, but you should also apply the reco ones as specified here : https://twiki.cern.ch/twiki/bin/viewauth/CMS/EgammaIDRecipesRun2
    • yes, we are applying the reconstruction scale factor. AN has been updated to make this clear.

  • About the electron trigger SFs, I see that you derive them yourselves, but they should be approved by the EGM POG first. In this case, if not already done, you should present the SFs in the Egamma Reco/Comm/HLT meeting and get green light from them.
    • We presented our results on June 11th and they were approved. They did comment though that we need to measure the ID SF without isolation ourselves, which we did and presented at the July 2nd meeting. They have been documented in the AN.

MUON concerns

  • you are using the high Pt ID, but this selection does not use the Particle-Flow algorithm. From this twiki page https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideMuonIdRun2 you can read: "The High-pT selection does not use the Particle-Flow algorithm. Please consider this option ONLY if you do not use the Particle-Flow event description in your analysis. If you do, start from the Loose (or Tight) ID and then consider possible addition (or removal) of further quality cuts." It is also true that your pT range is not that high, so using PF might be fine in your case, but we just would like to know whether you considered this or not.
    • We use Particle-Flow for all objects except for the muon. This has been made clear in the object ID section of the AN

  • It seems to us that you are not applying any ISO SFs. Since you don't have a standard ISO cut, the official ISO SFs are not suitable for your analysis and you should compute these SFs yourselves, then write this to the MUO POG.
    • Our 2D cut is really a topological cut and not an isolation cut. It has been studied with 2016 data using a ttbar dilepton (e, mu) sample and the SF was consistent with 1. See AN-2015-107, Section 5.2. However, we ended up adding a 15% uncertainty as requested (see below).

  • Is there a reason why you are not considering reconstruction SFs? Are they negligible?
    • The reconstruction SFs are negligible but we are applying them. The AN has been updated.

  • Are you applying any additional uncertainty to cover the phase space extrapolation (Zs-to-ttbar) in the SF computation? You can apply 0.5% per muon on the ISO component, following results in http://cms.cern.ch/iCMS/jsp/openfile.jsp?tp=draft&files=AN2018_210_v4.pdf.
    • We looked at the note you suggest and the 0.5% for muons is specifically for the extrapolation of the isolation component of the SF. Because we do not apply any isolation cut on our muon, we think this is not applicable. Also, we are applying an extra 15% flat uncertainty to the 2D cut (our "isolation"), which is already very conservative.

Comments from Federica, Sergio, George, and Clara

  • About the SF error bars, they are unphysical and need to be correctly computed. “Error propagation” is quite vague and we want to make sure exactly how you compute them. Our recommendation is to use the TEfficiency class in ROOT https://root.cern.ch/doc/master/classTEfficiency.html, where the correct methods for these cases are implemented.
    • I have been using TGraphAsymmErrors. According to https://twiki.cern.ch/twiki/bin/view/CMS/DataMCComparison, TGraphAsymmErrors is a valid option to use. I have also checked the errors by hand and they are almost the same as what I get through TGraphAsymmErrors. Do you still think I need to switch to TEfficiency? Regarding the negative errors: they are what I am getting, and from what I can see from the definition in TEfficiency, I don't think the negative errors would be solved. How do you think I should handle this?
    • follow up comment - Thanks for the checks! It is fine but we think you should modify the errors so they don't go below zero.
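For illustration, a Clopper-Pearson interval (one of the interval options offered by TEfficiency) stays inside [0, 1] by construction, so efficiency error bars built from it cannot extend to unphysical values. The sketch below is a pure-Python version for small counts only, solved by bisection; it is not the code used in the analysis, and the 68.3% confidence level is an assumed choice.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); intended for small n only."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))


def _solve_decreasing(func, target):
    """Bisection for func(p) = target, where func decreases with p on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(k, n, cl=0.683):
    """Central Clopper-Pearson interval for k passing events out of n."""
    alpha = 1.0 - cl
    # Lower edge: P(X >= k | p) = alpha/2, i.e. cdf(k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0)
    # Upper edge: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2.0)
    return lower, upper


# Toy low-statistics bin (hypothetical): 2 of 3 events pass the 2D cut.
low, high = clopper_pearson(2, 3)
print(low, 2 / 3, high)
```

Propagating such asymmetric efficiency intervals to the data/MC ratio still needs care, but the inputs themselves are bounded.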

  • You claim that the SF is negligible as the values are compatible with one but that is not true in all cases, see: dRmin_mu2_jet 2017 (fig 52 c), range 0.3, 0.4 by eye: MC = 0.55 with negligible uncertainty data = 0.70 \pm 0.05 ,SF = 1.27 \pm 0.10 (reported in fig 54 c)
    • Not compatible with unity. What we meant is that it is very close. Do you propose using an overall 10% error on this? I feel that might be too much
    • follow up comment- Even if you are not going to apply SF, we think an uncertainty should be added any way. You have very large error bars in the first bin so maybe the best strategy is to merge bins, but an extra uncertainty should be applied.
    • We ended up applying a flat 15% uncertainty to cover the first bin.

  • Is there a reason for the worse data/MC agreement in 2017 (fig. 50)?
    • I have double checked this. There are no errors; 2016 and 2018 do appear to be slightly better.

  • 4) Our suggestion for the mass plots, for example 51 a) b) was to change the width of the bins so they are more readable. On the other hand, the p_Trel plots which were good in AN-21-069_v4, now have very coarse binning (entries/25GeV, see fig 51 c) d) for example)
    • I have fixed this. AN is now updated.

  • It looks like data is systematically below MC in 2017 and this does not reflect into the efficiencies and SF (Figure 28 and 29 c-d). Could you check this? Maybe it is due to the fact that the range is very large. It could be useful to plot the efficiencies and SF with a better granularity in dR, maybe also add plots as a function of pT_rel.
    • The plots in Figure 28 and 29 are the efficiency of the 2D cut on data and MC plotted separately and not the ratio of Data/MC. The pT_rel cuts were not included initially as effectively, the 2D-cut can be seen as a cut on pT_rel, for events with deltaR<0.4, which means no cuts are applied for deltaR<0.4. Also added the pT_rel plots now.

  • Fig 25,26 and 27 e and f -> are those identical??
    • Yes, added both mu+jets plots by mistake, now fixed.

  • Fig 15,26 and 27 a and b -> not optimal binning of the mass plots
    • fixed.

  • Fig 28: data point for dR>0.4 are not reported right? Maybe you can add a line to the AN saying it explicitly and adding the reason why.
    • yes and done

  • Fig 29: error bands are unphysical because they go to negative values.
    • yes, I calculated the errors through error propagation. Should I just remove them?

  • Fig 29: not easy to read, could you plot an horizontal line at 1?
    • done

  • How are jets defined for the deltaR(mu, jet) measurement.
    • at least 2 AK4 jets are selected with pT>50 GeV and |eta|<2.4.

Comments on January 25th:

  • Figure 44: Uncertainties have been reduced a lot since the last version of the note. Can you explain why? These uncertainties are obtained by propagation of stat. only uncertainty, right? In the past there were negative values, and we agreed on correcting this, as it was unphysical, but now they are much smaller than expected. As a consequence the SF is not compatible with 1 for the first bin (in all three years) anymore, so looking at these plots now we would indeed recommend to apply the SF. But please clarify the previous points first.
    • If you remember, the first and the second bin had negative values. These values have been calculated using the following: https://twiki.cern.ch/twiki/bin/view/CMS/DataMCComparison. They do end up having negative values. I modified the uncertainty on the first and the second bin to be truncated before it goes below 0 (I made both the positive and negative errors the same). So the uncertainties for the first two bins are smaller and the middle two bins are the same. The SF itself hasn't changed. Do you suggest I show the actual positive uncertainty on it (which does make it compatible with 1)? I thought I should treat both +/- uncertainties the same way. I have updated the plots to show the errors as is, on the positive side.

  • In the systematic section it is not stated that an uncertainty is applied to cover for the 2D data/MC differences. The 5% value for this uncertainty is not well motivated by plots in section A.3. You also mentioned in the past that in previous versions of the analysis a 5% was applied, can you point us to the studies that motivated that 5%?
    • This hasn't been implemented yet, but will be. We are rerunning the fit with some changes. I have updated the AN to include this in the systematics section. The paper was B2G-17-017. Since it was applied as a flat uncertainty, I think it got overlooked in the PAS as it doesn't matter in the fit. But if you look at the twiki for that analysis: https://twiki.cern.ch/twiki/bin/viewauth/CMS/B2G17017Review#Muon_Review_AN1 (last question in the muon section): comment: Lines 484-488: Do you apply the additional systematic uncertainties of 1% (ID) and 0.5% (trigger) recommended by the MUO POG [3]? answer: Yes, we apply the recommended muon POG ID and trigger uncertainties as 1 sigma shape systematics. They are fit to their final values during the MLE process. We will double them in light of the fact that we don't use the High-pt muon ID and the loose trk relative isolation. So that comes to ~3%. I had suggested applying a flat 5% to be conservative but have changed to 15% following your recommendation.

CWR comments to the paper (TOP-21-XXX-paper-vxx)

comments from Jan and Matteo paper draft (23rd Nov)

  • Intro: when explaining the difference in tt production between the Tevatron and the LHC, I would mention that qq annihilation is also relevant at the LHC for high Mtt, which is the topology you are looking at. Also, this is one of the reason why you would expect higher sensitivity to Ac in the boosted regime, not only in terms of BSM, but also in QCD. This should be better highlighted.
    • We added a sentence "Since the relative contribution of valence quarks increases at high momentum transfer~\cite{PDF4LHC}, we expect that measuring A_C in a highly boosted ttbar sample will lead to a more stringent probe of quantum chromodynamics (QCD) predictions and higher sensitivity to BSM physics processes that might alter the asymmetry~\cite{arXiv:1109.3710}."

  • When discussing BSM contributions, add references and optionally even mention some models explicitly, e.g. W/Z’, colour triplets. I see now. you mention some models later, but it’s probably better already here. Then you also avoid switching from a theory paragraph to describing the Tevatron measurements, to BSM theory, to introducing your measurement.
    • We moved this up and also added more references.

  • L. 66: I think soft drop and n-subjettiness can be mentioned explicitly here.
    • As we do not mention these two variables again later, we think that the wordy description we have now is enough and the variables themselves do not need to be introduced by name.

  • L. 69: It is not clear which b-tagger are you using in the resolved regime from the text here. Please mention DeepJet+refs explicitly
    • Added E. Bols et al 2020 JINST 15 P12012, let us know if this is the one you had in mind.

  • L. 86ff will need references
    • Done

  • When explaining the categories, please mention that the t and W tags are exclusive, as you do in the AN (unless we missed it in the paper)
    • Added this information when we first introduced top and W tag around line 65.

  • L. 121ff: please also mention the modelling of statistical MC uncertainties (opt. with BB-lite), in particular since they play a major role.
    • Done

  • Fig1: All labels, legends etc must be increased in size. The simple rule is: A capital letter in the figure should be at least as big as a small letter in the caption. I would also propose to change the colours, in particular the light green
    • We are working with the LE on the figures for the paper

  • Eq.2 missing definition of “k”. If it refers to the channel, you can add it in the previous line “For each channel k in our analysis”
    • This was changed after we redefined the channels

  • L142: I would use the word “priors” rather than “constraints”. I don’t find it wrong from the language point of view, but “constraint” is usually intended as the post-fit constraint
    • Changed

  • Please fix Eq 4 (and its reference in one of the following lines)
    • Fixed

  • I am not sure that L173-174 are relevant. Would it make any difference to set them to different values? As long as no prior is associated to them (i.e. they are POI) this shouldn’t be the case
    • We are indeed setting them as POI and the value will not matter for the unblinded results. With the data blinded, if we were not to set the values the errors would not make sense.

  • For the luminosity, I understand from the AN that you use the correct correlations between the years. It would be good to mention around line 180 that the quoted numbers are not uncorrelated (and maybe provide an estimate of the correlation). You can refer to other Run2 papers in case
    • Done

  • Table 3: I would put the first row at the end, and explicitly write “combined” instead of > 750 GeV. Otherwise it would give the wrong impression that it refers to an inclusive measurement of Ac in the kinematic region Mtt > 750 GeV
    • This has changed because of our re-definition of the channels

Things that are missing in the paper draft

  • Some control distribution to appreciate data/MC agreement
    • We added mass and Delta|y| plots

  • The pre-fit and/or post-fit plots of the distributions used in the fit
    • We added both pre and post-fit plots for Delta|y| in each channel to the paper

  • The impact of the various systematic uncertainties on the final result
    • This was added after unblinding

Comments to the pre-approval talk (AN-2021-069)

comments after Pre-approval (Nov 25)

  • Please ensure that all of the object reviews are fully satisfied before unblinding
    • only stat did not respond but Combine approved the Data cards and error implementation

  • Please improve the MC statistics if possible, e.g. explore the boosted samples
    • We are running on the 2017 boosted samples and will see how much they help.

  • W+jet components are not too small on page 9 but nearly invisible on page 10, while the selections should be the same. Please check/verify.
    • The difference between page 9 and 10 is that we applied b-tagging, which significantly reduces the W+jets background.

  • Demonstrate/cross check that the asymmetry of the background does not cause any issue; at least check with MC
    • Theoretically, we do not expect any asymmetry on the backgrounds. We cannot check this at generator level because we need to define the leptonic and the hadronic top with our reconstructions for the background samples that have no true ttbar decay. We did that and the measured asymmetry for the main backgrounds (W+jets and single top) is zero. The main background is actually ttbar dileptons for which a true asymmetry exists. But we do include ttbar dilepton events that pass our selection and reconstruction as signal, i.e. they are not part of the background which indeed has zero Ac.
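As a reminder of what such a check evaluates, the sketch below computes a raw counting asymmetry from the numbers of events with positive and negative Delta|y|, with a binomial statistical uncertainty. It assumes unweighted event counts and uses toy numbers; it is not the unfolded measurement, which is extracted from the fit.

```python
from math import sqrt


def charge_asymmetry(n_pos, n_neg):
    """Raw counting asymmetry A_C = (N+ - N-) / (N+ + N-).

    n_pos, n_neg: unweighted event counts with Delta|y| > 0 and < 0.
    Returns (A_C, binomial statistical uncertainty).
    """
    total = n_pos + n_neg
    if total <= 0:
        raise ValueError("need at least one event")
    ac = (n_pos - n_neg) / total
    err = sqrt((1.0 - ac * ac) / total)
    return ac, err


# Toy background sample (hypothetical counts).
ac, err = charge_asymmetry(5050, 4950)
print(f"A_C = {ac:.4f} +- {err:.4f}")
```

A background asymmetry is compatible with zero when |A_C| is small compared to this uncertainty.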

Topic attachments
I Attachment History Action Size Date Who Comment
PDFpdf ARC-Authors_Meeting_April_6.pdf r1 manage 4027.2 K 2022-05-10 - 22:36 CeciliaGerber ARC-Authors meeting slides April 6, 2022
PDFpdf ARC-Authors_Meeting_Feb_15.pdf r1 manage 2329.9 K 2022-05-10 - 22:35 CeciliaGerber ARC-Authors Meeting Slides Feb 15, 2022
PDFpdf Ac_750_full-approvalVersion.pdf r1 manage 48.5 K 2022-04-05 - 16:13 CeciliaGerber Impacts for M750 shown at approval talk
PDFpdf Ac_750_full-fixedlumi.pdf r1 manage 47.3 K 2022-03-14 - 22:32 CeciliaGerber Impacts when fixing luminosity to the nominal value as a test
PDFpdf Ac_750_full-new.pdf r1 manage 49.2 K 2022-03-18 - 18:15 CeciliaGerber Impact after separating the 2D SF nuisances
PDFpdf Ac_750_full.pdf r3 r2 r1 manage 49.2 K 2022-03-18 - 18:14 CeciliaGerber Impact after separating the 2D SF nuisances
PDFpdf Ac_900_full.pdf r1 manage 44.6 K 2022-02-24 - 18:48 HugoAlbertoBecerrilGonzalez  
PNGpng Ak4_j1_phi.png r1 manage 22.6 K 2022-02-11 - 22:44 TitasRoy  
PNGpng MET_phi.png r1 manage 22.6 K 2022-02-11 - 22:45 TitasRoy  
PNGpng SystVariation_Ttbar_1hdamp_elec-1.png r1 manage 40.9 K 2021-11-03 - 17:59 HugoAlbertoBecerrilGonzalez  
PDFpdf SystVariation_Ttbar_1hdamp_elec.pdf r1 manage 14.3 K 2021-10-20 - 19:45 HugoAlbertoBecerrilGonzalez  
PNGpng SystVariation_Ttbar_1hdamp_muon-1.png r1 manage 42.7 K 2021-11-03 - 17:59 HugoAlbertoBecerrilGonzalez  
PDFpdf SystVariation_Ttbar_1hdamp_muon.pdf r1 manage 14.4 K 2021-10-20 - 19:45 HugoAlbertoBecerrilGonzalez  
PNGpng SystVariation_Ttbar_2hdamp_elec-1.png r1 manage 41.1 K 2021-11-03 - 17:59 HugoAlbertoBecerrilGonzalez  
PDFpdf SystVariation_Ttbar_2hdamp_elec.pdf r1 manage 14.3 K 2021-10-20 - 19:45 HugoAlbertoBecerrilGonzalez  
PNGpng SystVariation_Ttbar_2hdamp_muon-1.png r1 manage 43.0 K 2021-11-03 - 17:59 HugoAlbertoBecerrilGonzalez  
PDFpdf SystVariation_Ttbar_2hdamp_muon.pdf r1 manage 14.4 K 2021-10-20 - 19:45 HugoAlbertoBecerrilGonzalez  
PNGpng correlations.png r1 manage 1229.3 K 2021-07-11 - 23:06 CeciliaGerber hdamp and toppT systematics
PDFpdf hdamp-study-May10.pdf r2 r1 manage 1409.9 K 2022-05-12 - 22:42 HugoAlbertoBecerrilGonzalez  
Unknown file formatpptx hdamp-study-May10.pptx r1 manage 2194.2 K 2022-05-12 - 22:40 HugoAlbertoBecerrilGonzalez  
PDFpdf hdamp-study-May12.pdf r1 manage 1410.4 K 2022-05-12 - 22:43 HugoAlbertoBecerrilGonzalez  
PNGpng impacts_all.png r1 manage 244.5 K 2021-07-12 - 19:01 TitasRoy  
PNGpng impacts_boosted.png r1 manage 242.9 K 2021-07-12 - 19:17 TitasRoy  
PNGpng impacts_resolved.png r1 manage 244.6 K 2021-07-12 - 19:17 TitasRoy  
PNGpng impacts_semiresolved.png r1 manage 251.3 K 2021-07-12 - 19:08 TitasRoy  
PNGpng pT-had-electron-2018.png r1 manage 65.7 K 2021-07-09 - 01:18 CeciliaGerber Hadronic Top pT for the 2018 electron signal sample
PNGpng pT-had-muon-2018.png r1 manage 64.3 K 2021-07-09 - 01:14 CeciliaGerber Hadronic top pT in the 2018 muon signal sample
PNGpng pT-lep-electron-2018.png r1 manage 60.9 K 2021-07-09 - 01:19 CeciliaGerber Leptonic Top pT for the 2018 electron signal sample
PNGpng pT-lep-muon-2018.png r1 manage 64.4 K 2021-07-09 - 01:16 CeciliaGerber Leptonic Top pT for 2018 muons signal sample
PDFpdf r_1.pdf r1 manage 39.1 K 2022-02-24 - 18:48 HugoAlbertoBecerrilGonzalez  
PDFpdf r_2.pdf r1 manage 38.9 K 2022-02-24 - 18:48 HugoAlbertoBecerrilGonzalez  
JPEGjpg systematics.jpg r1 manage 832.8 K 2021-07-11 - 22:52 CeciliaGerber files
JPEGjpeg systemtics2.jpeg r1 manage 263.3 K 2021-07-11 - 23:06 CeciliaGerber hdamp and toppT systematics
PNGpng ttbarpTBoosted.png r1 manage 140.4 K 2021-07-11 - 23:17 CeciliaGerber ttbar pT
PNGpng ttbarpTResolved.png r1 manage 256.0 K 2021-07-11 - 23:17 CeciliaGerber ttbar pT
PNGpng ttbarpTSemiresolved.png r1 manage 181.3 K 2021-07-11 - 23:18 CeciliaGerber ttbar pT
Topic revision: r171 - 2022-07-30 - CeciliaGerber