Review of AN-17-226


Pre-OR comments

Comments from B2G meeting

QCD closure shows some disagreement, could look at for each QCD MC HT bin

  • Here is the QCD agreement in the HT 1000, 1500, and 2000 bins for Higgs pt, eta, and Mthb. There does not seem to be a consistent story here. There appears to be a slight deviation in the 1500 bin, but the 1000 and 2000 bins are quite close. Looking at the full distribution, the agreement is at the ~12% level with ~9% uncertainty. Overall I think there is no significant evidence for a bias, but to be conservative, we will include the 12% "non-closure" as an additional systematic.


pth1000.png etah1000.png mthb11000.png


pth1500.png etah1500.png mthb11500.png


pth2000.png etah2000.png mthb12000.png

b inverted region seems to have a higher QCD Higgs pass to fail ratio than the b pass region. We would expect this to be the opposite

  • After studying this in detail, this seems to be due to the disambiguation procedure. Previously, because of the large overlap, the order of the Higgs and b anti-tagging in the SB5 region was chosen randomly so as not to introduce a bias based on the pt ordering. However, because the b is of AK4 origin and has a lower pt requirement, it is more likely to match multiple jets. In the case that the b mistag selection was attempted on the higher pt jet, the Higgs mistag selection was very likely to fail because of the AK4-AK8 deltaR requirement. This still leads to a consistent background closure test, but it is the reason for the TF value change in the b0 region. There is a more natural way to disambiguate in the SB5 region while still avoiding an ordering bias.

  • To fix this, we use a new procedure which, regardless of object/selection, chooses the candidate randomly in the case of multiple tags. The b is then always identified only after AK8 tagging. This removes the pt ordering bias while keeping an ordering more analogous to the signal region. The change to the analysis and agreement is very small, but this directly addresses the issue with the different TF value in the b0 region and signal region.
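
A minimal sketch of the random-choice disambiguation described above. The function names, the list-of-labels representation of jets, and the `is_separated` callback (standing in for the AK4-AK8 deltaR requirement) are illustrative assumptions, not the analysis code.

```python
import random


def pick_candidate(tagged, rng):
    """Return one tagged candidate chosen uniformly at random, or None.

    Picking randomly instead of taking the highest-pt jet avoids a
    pt-ordering bias when several jets pass the same selection.
    """
    if not tagged:
        return None
    return rng.choice(tagged)


def select_ak8_then_b(ak8_tagged, ak4_tagged, is_separated, rng):
    """Pick the AK8 candidate first, then the b among the remaining AK4 jets.

    is_separated(ak4, ak8) is a hypothetical stand-in for the AK4-AK8
    deltaR requirement; the b is only identified after the AK8 choice.
    """
    ak8 = pick_candidate(ak8_tagged, rng)
    if ak8 is None:
        return None, None
    allowed = [jet for jet in ak4_tagged if is_separated(jet, ak8)]
    return ak8, pick_candidate(allowed, rng)
```

Passing a seeded `random.Random` instance keeps the choice reproducible between reruns of the same events.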

TF in signal region TFSR.png TF in b0 region TFSB.png

Comments from Jim

L110 Please include a reference for the mass correction

  • Done. I think 17-001 makes the most sense here

L124: Higgs tagging has priority over t-tagging. Does this take into account the different mass corrections used for t-tagged jets and H-tagged jets? (If there was a jet with L2L3 corrected mSD=136 GeV and Wmass corrected mSD = 134, could this jet be both t-tagged and H-tagged?)

  • An event can only record one tHb candidate. We try to tag all considered AK8 jets with both H and t and then check for overlap. If a jet is tagged as both a top and a Higgs, it is considered a Higgs, so a jet can never be considered both. This ends up not having a huge effect because tagging a Higgs is much more restrictive than tagging a top with our selection.
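
The priority rule above can be sketched as follows. This is only an illustration assuming the two tag decisions are already available as booleans; `classify_ak8_jet` is a hypothetical name.

```python
def classify_ak8_jet(passes_h_tag, passes_t_tag):
    """Assign a single label to an AK8 jet, with the Higgs tag taking priority.

    A jet passing both tags is labelled "H", so no jet is ever counted
    as both a Higgs and a top candidate.
    """
    if passes_h_tag:
        return "H"
    if passes_t_tag:
        return "t"
    return None
```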

Section 3.5 B-tagging: More information on which SF are used and how they are applied is needed. Which specific .csv file is used (CSVv2_Moriond17_B_H.csv)? Are you using subjet_CSVv2_Moriond17_B_H.csv for the subjets? For the double b-tag SF please reference a specific version of the twiki or include the numbers in the note.

  • We use CSVv2_Moriond17_B_H.csv (added to documentation). Double-b info has been added as well. No b tag SF is used on the subjets; instead we use the inclusive top tagging SF.

Section 4: Overall I would say this section is a bit hard to understand and could benefit from some rewriting. I think defining your tagging and anti-tagging definitions and supplementing Table 3 with the following would also help: SR: 1 t-tag 1 H-tag 1 b-tag SB1 1 anti-t-tag 1 anti-H-tag 1 b-tag SB2 1 anti-t-tag 1 H-tag 1 b-tag SB3 1 t-tag 1 anti-H-tag 1 b-tag etc.

  • We have changed the table in the documentation -- This is much better as the regions are all used multiple times.

L134-137: Here you describe SB1 but you don’t use “SB1” in the text. I think this would help clarify the method.

  • Here we describe the H inverted selection to describe the concept of the background estimate, but this is used in multiple regions so I think it makes sense to have this separate from the definitions.

L140: “Higgs pT spectrum”. What does this mean exactly? The Higgs candidate jet pT spectrum?

  • Yes, this is reconstructed Higgs candidate pt. Changed in text

L141: What is the “full Higgs selection” ?

  • All cuts required for the Higgs tag as if an event were to be put in the final plot. So everything included in the Higgs column in the row labeled SB2 of table 3 (which is the same as the signal region).

L138-143: To make sure I understand, the TF numerator is the H-tagged jet’s pT spectrum from SB2 and the denominator is the inverted H-tagged jet's pT spectrum from SB1?

  • Yes. In theory this is very similar to a "Higgs tagging rate" approach, except that explicitly inverting the selections leads to less signal contamination, unfortunately at the cost of being more difficult to explain.
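
As a rough illustration of the transfer factor just described, the sketch below forms the bin-by-bin ratio of the SB2 (H-tagged) to SB1 (H-inverted) Higgs candidate pt counts and uses the raw bin content, with no fit, to weight H-inverted events. The function names, the binning, and the treatment of empty denominator bins are assumptions for the example only.

```python
def transfer_factor(num_counts, den_counts):
    """Bin-by-bin ratio of H-tagged (SB2) to H-inverted (SB1) pt counts.

    Bins with an empty denominator get a TF of 0.0 (an assumption made
    here to keep the sketch simple).
    """
    return [n / d if d > 0 else 0.0 for n, d in zip(num_counts, den_counts)]


def find_bin(edges, value):
    """Return the index of the bin containing value, or None if out of range."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def predicted_yield(inverted_pts, edges, tf):
    """Sum the TF weights of H-inverted events to predict the H-tagged yield."""
    total = 0.0
    for pt in inverted_pts:
        i = find_bin(edges, pt)
        if i is not None:
            total += tf[i]
    return total
```

For example, with pt bin edges `[300, 500, 800]`, numerator counts `[10, 4]` and denominator counts `[100, 80]`, each H-inverted event with pt between 300 and 500 enters the prediction with weight 0.1.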

L144: SB3 is “a region with all selections applied except requiring the Higgs inverted selection in data”. - I don’t think it is clear from the text what “all selections applied” means. Could you specify in the text? (it is clear in the table)

  • clarified that these are the top and b selections

L154: There are 3 sidebands but 2 contamination numbers listed here.

  • I am confused here -- there are three numbers; the third is defined in the following sentence.

L162: Is tt also subtracted for the closure sidebands? It would be good to state it explicitly.

  • Yes, the note has been updated

L215: Why make this assumption?

  • We studied this using gen-matching and it is overwhelmingly b origin (>95%), so it makes sense for simplicity. Clarified in note.

L211-223: Do you take into account the subjet b-tag scale factors and uncertainties?

  • We use the inclusive top tag SF

Overall: I didn't see any mention of optimization. How did you choose the loose b-tag WP, t-tag WP etc.?

  • Optimization for us is a slow and involved process, so we do it approximately. We optimize by starting at the loose points and shifting to the next tighter WP for each handle, then calculating the expected limit and discovery potential. If it improves, we move to the next tighter point. The main reason it is so complicated is the process of extracting a "close enough" background approximation. For many of the regions the QCD MC does not have sufficient statistics to give a background estimate (one that does not have an unphysically large systematic uncertainty), so we produce approximate data-driven methods when possible. The procedure unambiguously suggests a tight H tag WP and looser t and b working points. However, we do not think the description should be included in the note because of the complexity and the inability to keep it up to date with analysis changes. Perhaps keeping the same background shape and scaling the QCD with MC normalization would have made more sense.
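
The stepwise scan described above could look roughly like the sketch below. It assumes a single figure of merit where larger is better; `figure_of_merit` is a hypothetical callable standing in for the expected limit and discovery potential calculation, and the handle names are placeholders.

```python
def greedy_wp_scan(handles, figure_of_merit):
    """Start from the loosest WPs and tighten one handle at a time.

    handles: dict mapping a handle name to its WPs ordered loose to tight.
    figure_of_merit: callable taking {handle: wp} and returning a float,
        larger meaning better (hypothetical stand-in for the real metric).
    A tighter WP is kept only if it improves the figure of merit.
    """
    index = {name: 0 for name in handles}

    def choice(idx):
        return {name: handles[name][i] for name, i in idx.items()}

    best = figure_of_merit(choice(index))
    improved = True
    while improved:
        improved = False
        for name in handles:
            if index[name] + 1 >= len(handles[name]):
                continue  # already at the tightest WP for this handle
            trial = dict(index)
            trial[name] += 1
            score = figure_of_merit(choice(trial))
            if score > best:
                index, best, improved = trial, score, True
    return choice(index), best
```

A greedy scan like this evaluates far fewer points than a full grid, which matters when each evaluation needs its own background approximation, at the cost of possibly missing a better combination.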

Followup Followup from Devdatta

The AN is lacking key details from previous rounds of questions. Please include all responses in the TWiki also in the AN.

Also please improve the figures: axes titles, legends etc are very difficult to read for most.

  • In progress

Please change B'-> B and T' -> T.

  • In progress

One of my main concerns is still the top pt reweighting. The improvement it brings is still not clear from Fig 7. The shape change is negligible. How does it transform to the shape change for the reconstructed W', T and B quark mass distributions? Is there a strong concern that not including the top pt reweighting will result in significant degradation of the search sensitivity?

  • The largest shape bias should be in top pt. Here is a comparison before and after reweighting. After reweighting shows an overall better agreement


  • There is a very small effect on the limits as we see with the ptreweight-on and ptreweight-off limits here.



Fig 7: Signal lines not visible.

  • In progress

Sec 2: Please describe the theory parameters sL and cot(theta2).

  • Done

LL38-44: What motivates the choice of the BRs of the W' and the VLQs? What motivates the choice of the widths? Does the narrow W' sample work well to describe a signal of width 14%? What are the typical resolutions of the reconstructed signals?

  • BR to VLQ is from the theory paper listed. Width is designed for cases where the tB/bT dominates. For now we are using narrow for all signals, but there are options to change this assumption if needed.

What is the jet selection eta?

  • 2.4, added to note

LL79: "HLT PFHT475 or HLT AK8PFJet260 trigger" do you apply a simultaneous OR? What other cuts go into the turn-on plot?

  • Yes, we use the OR for the prescaled triggers just like for the unprescaled versions used in the full selection. What is a simultaneous OR? However, we were worried that the prescaled HT trigger could have the same HT bug as the full trigger (potentially biasing the ratio). This was checked by only requiring the prescaled jet trigger, but there was no discernible effect. The only other cuts are requiring at least three AK4 jets and the preselection of two AK8 jets with pt>200, in order to roughly put the selection into the kinematic regime of the analysis.

Fig 1: What is the trigger turn-on as a function of M(tHb)? Are you on the turn-on for the full mass range?

  • To study this without the full analysis assignment of flavors, I take the mass of the three leading AK8 jets for the first attachment. The second is the same plot except in the approximate minimum pt range for boosted objects (pt>400 for leading, >300 for subleading, >200 for tertiary). For a boosted analysis we have nearly full efficiency.



Table 2: Why does the signal efficiency drop suddenly for M(W') = 1500 GeV and M(B'/T'?) = 1300 GeV?

  • Optimistically, this will only leave 200GeV for the jet coming from the W' - which will largely fail the kinematic selection. The high and low VLQ mass are meant to be near the edge of where the boosted signal region starts to fail kinematically.

Ll130-131: "The sensitivity of the selections used in the analysis have been studied both in the context of expected limit and W' discovery potential." --> Can you put the optimization results in the AN?

  • Optimization for us is a slow and involved process, so we do it approximately. We optimize by starting at the loose points and shifting to the next tighter WP for each handle, then calculating the expected limit and discovery potential. If it improves, we move to the next tighter point. The main reason it is so complicated is the process of extracting a "close enough" background approximation. For many of the regions the QCD MC does not have sufficient statistics to give a background estimate (one that does not have an unphysically large systematic uncertainty), so we produce approximate data-driven methods when possible. The procedure unambiguously suggests a tight H tag WP and looser t and b working points. However, we do not think the description should be included in the note because of the complexity and the inability to keep it up to date with analysis changes. Perhaps keeping the same background shape and scaling the QCD with MC normalization would have made more sense.

Figs 6, 10: Sorry but it is just not clear how the tH and bH candidates are built. Once you have an event with b, t and H jets, you make the bH and tH pairs, and plot them as the B and T quarks? Since you cannot simultaneously have both hypotheses, you apply a weight of 0.5 to each? Please include the details of the algorithm in the AN.

  • We make bH and tH for every event regardless of whether the sample was generated with T or B and plot them summed together using the W'->Tb and W'->tB BRs. We do not use them for limit setting in any way, so there is no issue. Hypothetically, if these were used for limit setting (say in the next iteration), then we would probably create a chi2 matching and have one (most likely) VLQ per event, but for the current sensitivity range this does not really make sense.

Also, about the MtH resolution, the biggest difference between the T and B quark signals is the "tagged as b matched to t" (Fig 16). Do you understand why this is so large? How are the matching performed? What does un-matched/ matched mean in the figures on the TWiki, ie. what is unmatched in the pair of objects? Please include details in the AN.

  • Matching is performed with deltaR from the generator truth particles (matched within jet radius/2). The top is matched to abs(pdg)=6, and the bottom is matched to abs(pdg)=5, but the mother particle also needs to be either the W' or the B', so there should not be any confusion. Matched means all three were correctly matched, and everything else is unmatched. Again, this cannot have an effect on the limits and is purely for display and kinematic interest. Plots are added to the note.
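
A minimal sketch of the deltaR matching described above, assuming jets and generator particles are simple dictionaries with `eta` and `phi` keys. The helper names are illustrative, and the pdg ID and mother-particle requirements are taken as already applied when choosing the generator particle.

```python
import math


def delta_phi(phi1, phi2):
    """Return phi1 - phi2 wrapped into the interval (-pi, pi]."""
    d = phi1 - phi2
    while d > math.pi:
        d -= 2.0 * math.pi
    while d <= -math.pi:
        d += 2.0 * math.pi
    return d


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def is_matched(jet, gen, jet_radius):
    """A jet is matched if the gen particle lies within half the jet radius."""
    return delta_r(jet["eta"], jet["phi"], gen["eta"], gen["phi"]) < jet_radius / 2.0


def event_is_matched(pairs):
    """pairs: (jet, gen, jet_radius) for the t, H, and b candidates.

    The event counts as matched only if all three objects are matched;
    everything else is unmatched.
    """
    return all(is_matched(jet, gen, radius) for jet, gen, radius in pairs)
```

With this convention the threshold is 0.4 for an AK8 jet (radius 0.8) and 0.2 for an AK4 jet (radius 0.4).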

Table 3: Can you add one more column with remarks on how each of the "SB" regions are used? Like, SF extraction, bkg validation, etc.

  • Done -- merged with Jim's comment

For the background estimation validation, do you have the MC-only results in the signal region? The TF for the signal region (Fig 8) and the validation region (Fig 3) have some significant differences. It would be useful to check what we have for QCD MC before unblinding.

  • Done -- added non-closure uncertainty based on results

Also, how much is the signal contamination in SB4?

  • This is what is shown in the b0 control plots

Table 5: Please cite the systematic uncertainty sources (was asked before). Which uncertainties are applied to the signal/ ttbar/ QCD?

  • I am not sure what citing the uncertainties in the table means -- shouldn't the citation be in the text?
  • Added a column for the process that the uncertainty is applied to.

Sec 5: Add references for all systematics.

Ll204-210: What about the top jet mass scale and resolution?

  • The top mass scale/resolution is not used/measured, but rather it is included in the overall scale factor.

LL215: "The scale factor and uncertainty is applied in the assumption that b tagged jets are of true-b origin." --> this needs to be checked, particularly for ttjets. I believe the flavour figure in the TWiki is from a signal

  • Correct, this is from signal -- the same is applied to ttbar as an approximation. We have added the mistagging rate uncertainty to both

Fig 15: What are low, central, and high VLQ masses? Please state the numbers from Table 2.

  • Reference to table 2 has been added. Listing all masses was quite verbose

Once again, please include all replies in the AN as well as in the TWiki.

Followup from Devdatta

top-tagging: What is the 'b0' control region, in ref to Table 2?...Is it SB4?

  • Yes, this was in reference to SB4.

Since you do not have the plot without the top pT reweighting and a test statistic to compare to, it is difficult to estimate the improvement. This needs to be finalized before moving to OR.

  • In the high pt region, ttbar reweighting primarily reduces the overall normalization. Because the pull shows a slight deficit in the top mass range, dropping ttbar reweighting can only worsen the agreement. I have explicitly rerun the analysis with the reweighting taken out to illustrate this. Note: previous versions of this plot had slightly off top mass corrections (this did not affect any other plots), but the conclusions are the same.





About Table 4: Actually it is the equivalent of table 4 that would be useful to have at the end of Sec 3, based on MC yields. Also, please include the signal selection efficiencies which are more useful--your signal cross sections are unknown.

  • The content of table 4 I think still makes sense after background estimation, with the accompanying figure for Sec 3 being Fig. 2. However we have moved it from limits to the end of section 4 -- we would like to keep the yields for now as the final product should have trustworthy cross sections. The efficiency table has been moved to sec.3

Please include the SB4-derived H jet mass spectrum and the mean (128 GeV) in the AN.

  • Done

Why does the spectrum sharply start at 100 GeV? Do you apply the Msd > 105 GeV cut but not the upper cut?

  • The sdmass cut is applied to this selection. This is the correlation between AK8 jet mass and softdrop mass.

Fig 6: The question is, do you use each event, with b, t and H jets, to build a T->tH and B->bH candidate? i.e., are the events interpreted as both W'->Tb and Bt?

  • Each event is only counted once. In the case of signal we process separate samples that restrict the decay to either Tb or Bt. These are then weighted and summed

>>>> All signal distributions are the sum of W'->Tb and W'->Bt assuming an even BR
It is not clear what is summed and how. "even BR" == "equal BR"?

  • Right. The theory paper has the total estimate for the BR of the sum of the two, and the assumption that they are equal seems to make the most sense

This is related to the question I asked about Fig 12. For the confusion matrix, I think we need to look at the accuracy of the case where both H and t are correctly tagged. However, this is just kinematics, so as long as you pick the three jets correctly together, the resolution should not be so different. It should not matter if you mistag the top as a Higgs.

  • The confusion matrix shows that the correct assignment is quite likely, but as you mentioned the optimization may prefer higher efficiency over perfect assignment because we only use Wprime mass. The alternate strategy was also considered (2d limits in the W',VLQ space) but the actual improvement was small in the region that we set limits -- largely at this range the signal VLQ peak is close to the bulk of the distribution.

Also, on Fig 6, I meant, why are the signal distributions for MbH and MtH so different?

  • We have looked into this, considering that the most likely explanation is the higher confusion probability for the bH. We split up the B' and T' samples into cases where the Mth and Mbh are gen matched correctly and incorrectly. The fact that the correct configuration gives a narrow VLQ mass peak (and the incorrect one leads to a Sudakov peak) leads me to believe that the higher probability of incorrect matches is indeed to blame here. There is likely an intelligent way to disentangle the options, but because they lead to the same trijet W' it would not have any effect on the current limit setting strategy.









About the ttbar fraction in the SBs1-3: is this fraction for the numerators now 7.3% (line 151)? What changed here? is the ttbar fraction this low for all the sideband regions?

  • This is due to the preselection issue detailed in the change log

On another point, you mentioned earlier that you use double-b tagger to improve your background rejection. However, it is known that the double-b tagger is less effective against non-QCD backgrounds. Since your signal region has about 36% ttbar, did you check that using double-b actually improves your limits?

  • Subjet b tagging is close but suboptimal when optimized on sig/sqrt(bkg).

Could you please propagate all the responses to the AN. Some of these are quite relevant for the OR.


Comments from Annapaola

Introduction: As you suggest this section should be quite expanded, describing the theory behind the search, including which are the decay modes of the VLQ under considerations and the assumptions done

l 14: Which tune is used?

  • CUETP8M2T4 -- added to text

l 17: Here you said the range of mass production, but do not indicate which step is used. Please, can you include this info?

  • 500GeV -- added to text

l 18: What is meant for “central VLQ mass range”?

  • Clarified in text. There are three VLQ mass ranges generated per W' point. This is the central mass point

l 18-21: in general here a description of the model used for signal production is missing, including assumptions done if any. For instance why do you consider 50% of VLQ->qH?

  • This description was on line 20 -- perhaps it should be more prominent? Yes, we consider 50% for the VLQ->qH.

ll 25-26: is this the recommended correction? Are you applying top-pt reweighting as your baseline? If so, why? Can you show the effect of applying vs not applying this correction to your analysis?

  • We do apply the standard correction, and the agreement looks reasonable in the control region (see attachment). Technically the recommendation for top pt reweighting is to "do it yourself", so we would still be interested in finding a ttbar rich region to test this in.


Section Event reconstruction: Description of object definition is in general quite synthetic. Please, can you add additional information such as for instance, if you are applying a Jet ID, and if so which criteria are you using? Are you applying any lepton veto? If not, did you consider to apply it for avoiding possible overlap with a future search in the semileptonic channel?

  • Good catch, it seems that there is not even a dedicated jet section -- We do use jetID loose. This section has been expanded
  • The machinery to exclude leptons is in place and was turned on for the first iteration of the analysis. However for the current version we have left it out because we do not have the operating point for the potential semileptonic search

l 53: this sentence seems a bit misspelled, maybe you can clarify it better.

  • Yes, that did not make much sense. The sentence has been rewritten -- Basically the X axis is the sum of the three jet pts used (AK8+AK8+AK4) such that it is natural to define a minimum cutoff given analysis level objects

  • Looking towards the future, we have decided to switch to calculating this with the full AK4 sum. The issue being a complication in which of the leading jets to choose as the AK4. Previously this was the tertiary jet, but such is not always the case for our full selection.

  • Unrelated -- Upon reading closer, this description was slightly out of date. The jet pt trigger used was actually AK8PFJET450, as would make more sense. Also the prescaled trigger was an OR of jetpt and jetht (details in text).

l 53-57: a description of how the trigger efficiency curve is calculated is missing in the text, please add a description. From the Figure caption I understand you compute it using the same primary dataset as the analysis but with reference to PFHT475 trigger. I am not sure if this may imply a loss of “inefficiency” due to the overlap between PFHT475 and Jet450.

  • The text has been expanded to include the information in the figure.
  • The fact that Jet450 and HT triggers have some non-overlapping region was indeed a worry. However it is an extremely small phase space where an event has a summed jet pt < 475 but still has a single jet with pt > 450.

B tagging section: can you, please, expand a bit this section explaining why do you use the Loose working point, which is exactly the value of the threshold used and if you are applying or not b tagging SFs and, if so, which set of SFs?

  • This section has been expanded. The loose operating point was optimized based on discovery potential and expected limit (they agree). We are using the b tagging SF for b jets and the latest iteration from the ReReco Twiki.

l 121: I may have missed it, but it looks like SB3 is not described, or there is not direct link between description and label used to identify the CR.

  • The label is explained in table 3 and referred to as the "Higgs inverted" selection in the text. Table 3 shows that the double b tag is dropped and the softdrop mass is tightly inverted on the Higgs candidate. Perhaps there is some more prominent way to provide this information?

l 125: “by using is repeated twice”

  • Done

l 125: why do you consider 30% as “conservative”?

  • This explanation has been expanded. Basically the QCD MC jet mass distribution is used to predict the mean jet mass in data. The 30% is twice the rms of the distribution, which given the observations from data is quite conservative, but we agree that this is a subjective argument. We have changed the procedure to get this from data instead, in which case we will use the rms alone for a first estimate such that the uncertainty scales with the width of the jet mass distribution. To emphasize, however, this is an extremely small effect.

l 130- 133: Please, can you adjust the wording to make this sentence clearer?

  • Done

l 135: why do you use 100%? Moreover, what is the amount of ttbar in this region?

  • The uncertainty on the ttbar subtraction here is somewhat arbitrary because it relies on inverted selections that do not have dedicated scale factors. However, the effect of the subtraction given any reasonable scale factor is small. We therefore use the difference between the subtracted and unsubtracted shapes -- perhaps this is a better way to explain it (clarified in text).

l 136-139: Please, clarify which region is SB4/5/6/7

  • Sorry if I misunderstood, but isn't this summarized in table 3?

Figure 3 is missing labels on X axis

  • Done

Figure 4 on: Legend is missing Also, please, provide same plots for the other CRs

  • Legends have been added, but these plots can not be made for other CRs as they require a carefully designed orthogonal set of regions to estimate the QCD component. However we can add raw data/ttbar distributions to the Appendix if it is helpful.

Figure 7: same comment as before + “estimation” and “signal”

  • Assuming this is in reference to X axis labels, this is done.
  • However I do not understand what is meant by " "+ “estimation” and “signal” ". Perhaps this can be clarified?

Figure 7: do you verify that the method works applying it to inverted b-tagging regions, but TFs are quite different between your closure and your main regions. Please, can you comment more on the validity of the check?

  • This has been further explained in the text. The fact that the TF is different between the closure region and the main region shows an augmented b flavor fraction in the pre-Higgs tagged QCD (which is expected). This closure test demonstrates the ability to invert selections on the top candidate without biasing the Higgs tagging method which is really the primary assumption that goes into the background estimate.

l156: a reference to top-tagging SFs is missing

  • Done. The only reference is the twiki; it has been added.

l 160: where did these uncertainties on ttbar cross section come from? More over, they vary with what?

l 204: can you, please, clarify what do you mean by “the shapes are normalised to nominal distribution”?

  • Ie the rate component of the shape uncertainty is zero. IIRC at some point this was the B2G recommendation, but perhaps not anymore?

l 208: why is this uncertainty not applied to signal as well?

  • We have calculated the signal Q2 and are treating it as an additional uncertainty in the signal cross section in the limit plot. However, is the uncertainty in Q2 for a heavy s-channel resonance quite conservative using the standard procedure?

l 222: how uncertainties are displayed? Are they summed up in quadrature?

  • Yes, they are summed in quadrature. In the case of shape uncertainties, the bin-by-bin upward and downward deviations are summed allowing for an asymmetric uncertainty region.
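
One way to picture the quadrature sum with asymmetric bands is the sketch below. It assumes each source provides an up-shifted and a down-shifted histogram, and that every deviation is added to the upward or downward band according to its sign; that sign convention and the function name are assumptions, not necessarily what the analysis code does.

```python
import math


def total_uncertainty(nominal, variations):
    """Combine shape variations bin by bin into asymmetric error bands.

    nominal: list of bin contents.
    variations: list of (up_hist, down_hist) pairs, one per source.
    Returns (err_up, err_down), each a list with one entry per bin.
    """
    nbins = len(nominal)
    up_sq = [0.0] * nbins
    down_sq = [0.0] * nbins
    for up_hist, down_hist in variations:
        for i in range(nbins):
            for shifted in (up_hist[i], down_hist[i]):
                deviation = shifted - nominal[i]
                if deviation > 0:
                    up_sq[i] += deviation ** 2
                else:
                    down_sq[i] += deviation ** 2
    return [math.sqrt(x) for x in up_sq], [math.sqrt(x) for x in down_sq]
```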

I do not see any mention to uncertainties due to limited MC statistics. Are you considering it?

  • Yes, this has been added to the text.

Also, how do you consider the uncertainty on top-pt reweighting?

  • We use the difference with the unweighted distribution as the up variation, and two times the weighting as the down variation.
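
The variations above can be sketched as follows, under two labelled assumptions: that the per-event weight has the commonly used exponential form sqrt(exp(a + b*pt_top) * exp(a + b*pt_antitop)), with `a` and `b` left as inputs because the reply does not quote them, and that "two times the weighting" means applying the weight twice (the weight squared).

```python
import math


def top_pt_weight(pt_top, pt_antitop, a, b):
    """Event weight from the generator-level top and antitop pt.

    The exponential form and the parameters a, b are assumptions for this
    sketch; use the values from the recommendation actually applied.
    """
    sf_top = math.exp(a + b * pt_top)
    sf_antitop = math.exp(a + b * pt_antitop)
    return math.sqrt(sf_top * sf_antitop)


def reweight_variations(weight):
    """Return the up, nominal, and down event weights.

    up: no reweighting (unweighted distribution).
    down: weight applied twice, i.e. squared (assumed interpretation).
    """
    return {"up": 1.0, "nominal": weight, "down": weight * weight}
```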

In general, please, can you compile a list at the beginning of the note about what is missing and planned to be updated in the next version/s of the note? This is useful for us to have a clear idea of what is missing and not ask trivial questions.

  • Done

l 23: Do I understand correctly that you are summing up the two contributions, fixing their BR?

  • Yes, this is an assumption in the analysis. We have the total BR for Wp->tB + Bt from theory, but I suppose the fraction of tB and Bt could be varied. I am not sure of the realm of validity in the theory for these to be largely different, but this could be displayed in a 2D plot for instance.

Comments from Devdatta

Title: using T and B are more standard nowadays for VLQs

Table 1: Does the ttjets sample in Mtt bin have as decent a statistics at the high mass as the large inclusive sample? The all-hadronic Z'->tt uses the following /TT_TuneCUETP8M2T4_13TeV-powheg-pythia8 /RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM which has about 78M events

  • We were originally using this set but switched to the Mtt sample to increase statistics. The Mtt sample has ~30% higher statistics in the high Mtt range and also processes faster.

Ll21: is q in qH light, third, or all generations?

  • 3rd gen ie T->tH + B->bH

Ll25: The top pT distribution is not recommended unless otherwise justified. What is the justification in this analysis? In general I am a bit wary since the top pT weights stop well short of the pT range used in these kinds of analyses. Also, please give a reference.

  • For now this is assumed to correct the distribution. We see good agreement in the b0 control region in the top mass spectrum (see attachment). Likely this alone has the statistics to nail the ttbar fraction down, but probably we can find a better region. To-do item.


Ll32: "For the two AK8 jets, the DR" --> "The DR between the two AK8 jets"? Can you give a reference to "reduce jet-shape correlation", even though it may be mentioned later in the text (sorry, I couldn't find it)?

  • Ie two close AK8 jets compete against which particle is assigned to which jet and can affect the jet shape in a way that is correlated with tagging variables ie Nsubjettiness/softdrop mass

Fig 1: Which sample is used and what are the selections for this turn-on? None of your jet pT cuts are higher than 475 GeV, so why do you choose this trigger? There are jet triggers with pT > 300 GeV thresholds that would do fine. Also, can you plot a low-mass signal on this curve to show how much we lose?

  • This is Ht, which is much much lower than the summed pt of all jets -- it is chosen such that we can assume events that pass the Ht800/900 trigger will pass the Ht475 trigger as well. The Ht distribution for our lowest mass sample is dependent on the W' mass (1500GeV), so most is still in the plateau -- although the turn on here is a known inefficiency in performing a boosted analysis, and if there is interest in a resolved version then this region could be explored. I have added the ht distribution to the plot

  • Also, as mentioned in the updates, the prescaled trigger here was an OR of jetht and jetpt

Ll88: Why was double-b tagging chosen? Did you study the subjet b-tagging?

  • The original version of the analysis involved subjet b tagging, and was switched to double b tagging because of the improved efficiency at the same background rejection

Ll97-98: Why was the H tag prioritised over the top tag? Using MC truth, what is the rate of correct matches to the top and Higgs? Also, in Fig 2, are the distributions for the top and Higgs before or after the disambiguation?

  • The H-tagging criterion is much tighter than the top criterion, so it is much more likely that a Higgs can also be classified as a top than vice versa. Also, because we rely on the H-tagging rate, the method is probabilistically correct if the Higgs classification always takes precedence. Fig 2 in the previous version contained a slightly tighter disambiguation; we have updated the plot to use the same scheme.

  • The gen-matched confusion matrix is largely diagonal as can be seen in the following plots -- added this to the note

Bpconfusion.png Tpconfusion.png
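
The precedence rule can be illustrated with a minimal sketch. The function names and the boolean tagger inputs are hypothetical, and the actual tagger definitions are those in the note. The point is only that a jet passing both taggers is always counted as a Higgs candidate, and that a gen-matched confusion matrix is a count of (truth, assigned) pairs.

```python
from collections import Counter


def classify(passes_higgs_tag, passes_top_tag):
    """Assign a label to a jet; the Higgs tag takes precedence over the top tag."""
    if passes_higgs_tag:
        return "H"
    if passes_top_tag:
        return "t"
    return "none"


def confusion_matrix(jets):
    """Count (truth label, assigned label) pairs.

    jets: iterable of (truth, passes_higgs_tag, passes_top_tag) tuples (hypothetical layout).
    """
    return Counter((truth, classify(h, t)) for truth, h, t in jets)
```

A largely diagonal matrix means most counts sit in the ("H", "H") and ("t", "t") entries.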

Before Sec 4: What are the final event selections? Please put an event yields/ efficiency table (Move Table 4 here and add signal efficiencies).

  • Table 4 requires explanation of the background estimation to make sense, so it should not be before section 4.
  • Efficiency added to appendix.

Ll122: "in this selection" --> SB3?

  • Right, updated in text

Ll123: In which region is the "mean" Higgs mass ~130 GeV? Can you show the distribution? Why not replace M(H jet) by 125 - M(H jet)? See paper B2G-16-026 for instance.

  • This is modifying the jet 4 vector energy using the jet mass correlation with the softdrop mass cut. I assume you mean using the reduced mass? This would likely have a very small effect on the limit; we would like to study it, but right now there are likely better ways to optimize. This has been changed to taking the value from the data sideband (mean 128). The other option is to sample the mass randomly from the distribution (as in Z'->ttbar), but the effect is so small that it is hard to motivate a more sophisticated approach.


Table 2: 3rd column header "B" --> "b jet"

  • Done

Table 2: Why were the mass criteria used to invert top/ Higgs tagging, instead of others like b/double-b requirement? If you use the double-b for the Higgs instead of the mass, you would have a much larger statistics over the entire pT ranges, instead of what you now get in Figs 3, 7.

  • The top b inversion would lead to a top-Higgs b-flavor correlation and the procedure would probably not close. It is important to note that the inversion is in the mass but also drops the double-b requirement, which is important for the statistics of the measurement.

Fig 3, 7: How do you interpolate between the sparse points?

  • Currently there is no fitting here -- the bin content is used. We have the infrastructure to fit, so this might be an improvement in the future
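
As an illustration of using the bin content directly, the sketch below returns the value of the bin containing a given pt, with no interpolation or fit. The edges, contents, and function name are hypothetical, and clamping out-of-range values to the first or last bin is an assumption made for the example.

```python
from bisect import bisect_right


def binned_rate(pt, edges, contents):
    """Return the content of the pt bin containing `pt`, without interpolation.

    edges:    ascending bin edges (len(contents) + 1 values).
    contents: one measured rate per bin.
    Values outside the binned range are clamped to the nearest bin (assumption).
    """
    i = bisect_right(edges, pt) - 1
    i = max(0, min(i, len(contents) - 1))
    return contents[i]
```

With this approach the rate is a step function of pt, so sparse bins translate directly into statistical fluctuations of the estimate; a fit would smooth these at the cost of a functional-form assumption.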

Fig 4-6: What do the error bars contain?

  • All background uncertainties considered

Fig 6: A discussion on the signal building should be in Sec 3. How do you build a signal hypothesis from the selected events? Do you consider both the W'->Tb and the W'->Bt types?

  • We have added information here. I'm not sure what you mean by "build a signal hypothesis", but I assume this means how we reconstruct the W' invariant mass. All signal distributions are the sum of W'->Tb and W'->Bt assuming an even BR
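
For reference, an invariant mass of this kind is obtained by summing the four-vectors of the candidate jets. This is a minimal sketch, assuming jets are given as (pt, eta, phi, mass); the helper names are hypothetical, and which jets enter the sum (for example the top, Higgs, and b candidates) is an assumption based on the decay chain discussed here.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(jets):
    """Invariant mass of the summed four-vectors of `jets`.

    jets: iterable of (pt, eta, phi, mass) tuples.
    """
    vectors = [four_vector(*jet) for jet in jets]
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors
```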

Fig 6: What are the signal normalizations? Why is the shape disagreement more for the Mbh hypothesis?

  • Explained further in the captions. Signal is normalized to the luminosity of the dataset. The shape disagreement here is small and seems to be covered by the systematic uncertainties.
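
Normalizing simulation to the luminosity of the dataset conventionally means weighting each generated event by the cross section times the integrated luminosity, divided by the number of generated events. The sketch below shows that weight together with an equal-BR sum of two signal templates. All names are hypothetical, and applying a factor of 0.5 to each channel is an assumption about how the even BR enters.

```python
def lumi_weight(cross_section_pb, lumi_inv_pb, n_generated):
    """Per-event weight that scales a simulated sample to the dataset luminosity."""
    return cross_section_pb * lumi_inv_pb / n_generated


def combine_equal_br(hist_tb, hist_bt, weight_tb, weight_bt):
    """Sum two binned signal templates, each luminosity-weighted and given BR = 0.5."""
    return [0.5 * (weight_tb * a + weight_bt * b) for a, b in zip(hist_tb, hist_bt)]
```

The inputs would be the raw (unweighted) event counts per bin of each channel; the numbers used to exercise the functions are placeholders, not values from the analysis.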

Figs 4-6: The ttbar contribution (red) is quite large at ~35% apparently (Table 4). This should get even larger in the signal region. So, which regions do the fractions on ll129-130 refer to?

  • This is performed in all regions to the right of the equation in L139, L121 (original draft) whereas the figures are from the left. These regions have inverted selections that tend to amplify the QCD contribution greatly

Before Sec 5: Please add the blinded distributions and final event yields after background estimation here (Figs 11-13)

  • Done

Ll180: Why are all b jets assumed to be from B hadrons, instead of just looking up the jet hadron flavour?

  • We do this because signal is very b flavour rich, and the QCD is all data driven. See below for the b candidate jet parton flavour from signal in our full selection -- given that this is largely of b origin there is not much worry about the mistagging rate fraction in this analysis


Ll208-212: There's the ttbar cross section uncertainty, which contains Q2 and PDF uncertainties. Also, there's the top pT reweighting uncertainty? Does the 100% uncertainty on the ttbar in these lines refer to only the sidebands?

  • I don't see a reference to 100% uncertainty in these lines. I assume you mean the component subtracted from QCD? In that case yes, it is only the sidebands. In the signal region we use the full suite of ttbar uncertainties instead of taking the conservative uncertainty for the subtracted regions. Indeed the cross section includes q2 and pdf (listed as the recommendation, see below), and we also apply dedicated q2 and pdf uncertainties. I am not sure how much these uncertainties overlap, given that one is on the overall cross section derived from the Top++v2.0 program and the other is a product of the MadGraph generation.

Fig 12: Why does the B->bH signal hypothesis have a poorer resolution than the T->tH?

  • This is a good question and we are still looking into it. The generation plots look believable. The confusion matrix (as shown earlier in the twiki) is slightly less diagonal in the B' case, which could explain this.

Fig 14: Which signal hypothesis is used? Would you be setting cross section limits in the 2D plane of the W'-T and W'-B masses?

  • This is W' to either T' or B' assuming equal BR. There are a few ways to show the full limit spectrum and we need to think about the way that makes the most sense. The issue with 2D (MWp,MVLQ) is that we generate a T',B' mass band and not the full phase space, but likely this still makes the most sense. Currently we include three limits in the latest version. There are other possibilities, such as modifying the T',B' BR, widths etc

-- KevinNash - 2017-08-31

Topic revision: r45 - 2018-01-03 - KevinNash