Difference: SingleTopPolarization (106 vs. 107)

Revision 107 2014-08-08 - JoosepPata

Single Top Polarization analysis

From Jeannine, August 8, 2014

1) I've only noticed several issues/problems with the QCD modeling:
a) Looking at Figure 27, it makes no sense to have one QCD template and one template for all non-QCD processes. There is quite some separation power for DY, EW V production, and top. So I suggest performing a 4-template fit, letting the SM processes float within their usual widths. Please also report the scale factors for the SM processes.
b) Looking at Figure 28, in particular at BDT_antiQCD in the region above the cut value, the contamination from non-QCD processes is far too high. So we basically have no idea how to extrapolate from the cut region, as there is certainly a sizable uncertainty on the contamination. How can we trust the QCD estimation in the BDT_antiQCD>cut region given this contamination issue? Furthermore, how can we trust any QCD shape in the region BDT_antiQCD>cut and, even worse, the correlations between variables, as needed for the selection BDT? How can we trust BDT_W,tt for QCD?
c) Concerning Tables 7-10, the number that really matters is the number of QCD events (plus uncertainty) after applying the BDT_antiQCD cut. How does this number change when the non-QCD contamination is altered? How does it change when the QCD MC (isolated region) is used?
d) How can we trust the cosTheta* shape of QCD at all? Could it peak at -1? How can we exclude that?
Just thinking out loud: would it help to use QCD MC (isolated, anti-iso selection) in the 2j0t region with different cuts on BDT_antiQCD (e.g. 0 to 0.6 in steps of 0.1)? Furthermore, could we learn something from ttbar all-hadronic events (a jet mimics a lepton), for example by doing the same check as for QCD MC?
The W+jets modeling was already carefully addressed in the PAS, and the new studies will certainly add more knowledge about the W+jets mismodeling, so I have no comments on this right now.
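
The dependence asked about in 1c) could be scanned along the following lines. This is only a minimal sketch: it assumes the QCD yield is estimated as data minus scaled non-QCD MC in a control region and then multiplied by a transfer efficiency for the BDT_antiQCD cut, which may not match the procedure actually used in the analysis. The function names and all input numbers are hypothetical placeholders, not values from the note.

```python
def qcd_yield(n_data, n_nonqcd_mc, contamination_scale, cut_efficiency):
    """QCD estimate after the cut: (data - scale * non-QCD MC) * efficiency.

    Assumed estimation scheme for illustration only; negative
    subtracted yields are clipped to zero.
    """
    n_qcd = n_data - contamination_scale * n_nonqcd_mc
    return max(n_qcd, 0.0) * cut_efficiency


def scan_contamination(n_data, n_nonqcd_mc, cut_efficiency, scales):
    """Return {scale: QCD yield} for each non-QCD scale factor tried."""
    return {
        s: qcd_yield(n_data, n_nonqcd_mc, s, cut_efficiency) for s in scales
    }


# Hypothetical placeholder inputs (NOT taken from Tables 7-10).
N_DATA = 10000.0
N_NONQCD = 4000.0
EFF = 0.1

yields = scan_contamination(N_DATA, N_NONQCD, EFF, [0.5, 1.0, 1.5])
for scale, n in yields.items():
    print(f"non-QCD scale {scale:.1f}: QCD yield after cut = {n:.1f}")
```

The spread of the resulting yields over a plausible range of contamination scales would give a first idea of the systematic uncertainty from the non-QCD subtraction.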

2) One comment on the BDT trainings (Figures 5 and 11): both BDTs seem to be overtrained. For BDT_antiQCD this is true for signal (KS test < 5%); for BDT_W,tt it is true for background (KS test = 0). Is it feasible to find a BDT setting that does not overtrain?
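
For reference, an overtraining check of this kind compares the BDT output distribution on the training sample with that on an independent test sample. The sketch below computes an unbinned two-sample Kolmogorov-Smirnov statistic and an asymptotic p-value in plain Python. It is an illustration under assumptions: the KS values quoted in the note were presumably obtained with a binned test inside the training framework, which can give different numbers, and the sample lists here are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both CDFs past all entries equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value from the Kolmogorov distribution."""
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return 1.0  # series converges slowly here; p is ~1 anyway
    s = sum(
        2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return min(max(s, 0.0), 1.0)


# Hypothetical BDT scores for the training and test samples.
train_scores = [i / 50.0 - 1.0 for i in range(100)]
test_scores = [i / 50.0 - 0.9 for i in range(100)]

d = ks_statistic(train_scores, test_scores)
p = ks_pvalue(d, len(train_scores), len(test_scores))
print(f"KS distance = {d:.3f}, p-value = {p:.3f}")
```

A small p-value (for example below 5%) indicates that the training and test distributions differ, which is the usual symptom of overtraining.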

3) Looking at Figure 9, it seems that the BDT_W,tt output has a small peak in the signal region. That is something one would like to avoid. Which background (tW, QCD, W+jets, ttbar) causes this peak?
In Fig. 9, only ttbar and W+jets are included in the background. The templates for all subcomponents will be plotted. In general, this "second peak" was discussed some time ago; the reason seems to be that for some events the BDT is unable to separate them from signal, and the gradient boosting does not reweight those trees down by a large enough factor. The style (hatching) of Fig. 9 will also be changed.

4) Figures 24 and 25 (BDT_W,tt in the 2j0t and the 3j2t regions) look ok, as the observed deviation is covered by the systematics. It would be nice to have, at least in the appendix, the data-MC comparison for all BDT_W,tt input variables. In principle, the correlations among the most important input variables, and between them and BDT_W,tt, also have to be checked: are they the same for data and MC (see the suggestion from Andrea: check the correlation between MT and BDT)?
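
A correlation check of this kind could be sketched as follows: compute the (weighted) linear correlation coefficient between two variables separately in data and in MC, and compare the two values. This is a minimal illustration, assuming per-event values are available as plain lists; the variable names and the toy numbers are hypothetical, and a linear coefficient does not capture non-linear dependence.

```python
import math


def weighted_corr(x, y, w=None):
    """Weighted Pearson correlation coefficient of x and y."""
    if w is None:
        w = [1.0] * len(x)  # unweighted case, e.g. data
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)


# Hypothetical toy events: transverse mass and BDT output.
data_mt = [40.0, 55.0, 62.0, 70.0, 85.0]
data_bdt = [-0.2, 0.1, 0.0, 0.3, 0.5]
mc_mt = [42.0, 50.0, 65.0, 72.0, 80.0]
mc_bdt = [-0.1, 0.0, 0.1, 0.2, 0.4]
mc_weights = [1.1, 0.9, 1.0, 1.2, 0.8]  # per-event MC weights

rho_data = weighted_corr(data_mt, data_bdt)
rho_mc = weighted_corr(mc_mt, mc_bdt, mc_weights)
print(f"rho(data) = {rho_data:.3f}, rho(MC) = {rho_mc:.3f}")
```

Repeating this for each pair of input variables, and for each input variable against the BDT output, gives a table of coefficients that can be compared between data and MC in every control region.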

5) Fitting:
a) The W+jets template is a bit spiky. What subset causes the spikes? Can we safely ignore this part (e.g. Wc+1p) without introducing a kinematic bias? The current smoothing studies are a good idea, I think.
b) The single top scale factor for mu is 1.22. How does this compare to the published single top cross section measurement at 8 TeV? Is it consistent?
Joosep will plot the subcomponent templates; however, it's mostly an issue of the nominal MC becoming depleted also in W+2,3, for which we have no excellent approach.

6) Correlation of BDT_W,tt and cosTheta*:
Looking at Figures 48 and 49, it is clear that a cut on BDT_W,tt results in ttbar and W+jets shapes that look more single-top-like. I think many variables used in BDT_W,tt are correlated with cosTheta*, hence the correlation with cosTheta* is even stronger for the BDT output. As long as the correlation between BDT_W,tt and cosTheta* (and preferably also the correlation of all variables entering the BDT with cosTheta*) is the same in data as predicted, this is ok. However, this assumption has to be carefully checked in different control regions. I suggest extending the MTW-BDT correlation study suggested today by Andrea to cosTheta*, the BDT_W,tt output and its input variables, and also to different control regions. Furthermore, I suggest showing data-MC plots for cosTheta* in the 2j0t and 3j2t regions for different BDT_W,tt cut values (do we always get reasonable data-MC agreement?).
Joosep will add additional plots with cut points.
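
One way to quantify the requested cut-point comparison is a per-bin chi-square between the data and MC cosTheta* histograms for each BDT_W,tt cut value. The sketch below is an assumption about how such a scan could be organized, not the analysis code: events are hypothetical (bdt, cos_theta, weight) tuples, the MC is assumed to be already normalized, and only the statistical variance of the expectation is used, with no systematic uncertainties.

```python
def histogram(values, weights, n_bins, lo, hi):
    """Fixed-width weighted histogram; the upper edge is included."""
    counts = [0.0] * n_bins
    width = (hi - lo) / n_bins
    for v, w in zip(values, weights):
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += w
    return counts


def chi2_per_bin(data_counts, mc_counts):
    """Pearson chi2 per bin, using the MC expectation as the variance.

    Bins without MC expectation are skipped.
    """
    terms = [
        (d - m) ** 2 / m for d, m in zip(data_counts, mc_counts) if m > 0
    ]
    return sum(terms) / len(terms) if terms else float("nan")


def scan_cuts(data, mc, cuts, n_bins=4):
    """data, mc: lists of (bdt, cos_theta, weight). Returns {cut: chi2/bin}."""
    result = {}
    for cut in cuts:
        sel_d = [(c, w) for b, c, w in data if b > cut]
        sel_m = [(c, w) for b, c, w in mc if b > cut]
        h_d = histogram([c for c, _ in sel_d], [w for _, w in sel_d],
                        n_bins, -1.0, 1.0)
        h_m = histogram([c for c, _ in sel_m], [w for _, w in sel_m],
                        n_bins, -1.0, 1.0)
        result[cut] = chi2_per_bin(h_d, h_m)
    return result


# Hypothetical toy events: (BDT output, cosTheta*, weight).
toy_mc = [(0.1, -0.8, 1.0), (0.3, -0.2, 1.0), (0.5, 0.4, 1.0), (0.7, 0.9, 1.0)]
toy_data = [(0.1, -0.7, 1.0), (0.3, -0.1, 1.0), (0.5, 0.3, 1.0), (0.7, 0.8, 1.0)]
print(scan_cuts(toy_data, toy_mc, cuts=[0.0, 0.2, 0.4]))
```

A value that grows steadily with the cut would point to a mismodeled correlation between BDT_W,tt and cosTheta*.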

7) CompHEP study and Neyman construction:
It seems that the difference between Powheg and CompHEP SM is, for some distributions, larger than the difference between the anomalous-coupling samples. How is the Neyman construction done? Does it use CompHEP SM or Powheg for the unfolding? Is the use of Powheg in the migration matrix the reason why there is a bias for the SM case, although the pull distributions are all fine for the SM case?

Talks in CMS meetings
