Tests of the new 2020 BDT

The most recent versions of the analysis ntuples were merged and the new BDT algorithm was applied to them. The results can be compared to the 2015/16 analysis data with the 2016 BDT.

BDT 2020

This BDT uses the following list of discriminating variables:

variableTable.png

No pileup variable was used in this BDT.

Running

When the addClassBDT_2020_groupVersion.cpp macro is run, it reports a missing branch in the input ntuples:

Error in <TTree::SetBranchStatus>: unknown branch -> closeTrkDOCA_T0134217728_LooSiHi1Pt05_f2dc2
Error in <TTree::SetBranchAddress>: unknown branch -> closeTrkDOCA_T0134217728_LooSiHi1Pt05_f2dc2

This branch should be one of the variables entering the BDT, so it needs to be clarified whether its absence affects the results.

SB data comparison

Number of entries in SB, Original ntuple: 2452402
Number of entries in SB, New ntuple: 2454693
Difference Orig - New: -2291

That is, the new ntuple contains 2291 more events than the original one. Analysis preselections are applied to both.

The following figures show the invariant mass distribution in the top three 18% signal efficiency bins.

Corresponding bin edges:

2016 BDT ... {0.2455,0.3312,0.4163,1}

2020 BDT ... {0.2774,0.3662,0.4418,1}
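
As an illustration of how these edges partition the events, here is a minimal sketch (not the analysis code) that maps a BDT score to a bin number. The function name `bdt_bin`, the half-open bin convention and the return value -1 for scores outside the edges are assumptions made for this example; bins are numbered 1 to 3 as in the figure captions.

```python
from bisect import bisect_right

# Bin edges quoted above: three bins, each covering 18% signal efficiency.
EDGES_2016 = [0.2455, 0.3312, 0.4163, 1.0]
EDGES_2020 = [0.2774, 0.3662, 0.4418, 1.0]


def bdt_bin(score, edges):
    """Return the 1-based bin number for a BDT score, or -1 if out of range.

    Assumed convention: a bin covers [lower edge, upper edge), except that
    the last bin also includes its upper edge.
    """
    if score < edges[0] or score > edges[-1]:
        return -1
    if score == edges[-1]:
        return len(edges) - 1
    return bisect_right(edges, score)
```

The same score can land in different bins for the two BDTs, for example a score of 0.43 is in bin 3 with the 2016 edges but in bin 2 with the 2020 edges.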

BDT bin 1.png
Mass distribution in BDT bin 1
BDT bin 2.png
Mass distribution in BDT bin 2
BDT bin 3.png
Mass distribution in BDT bin 3

Data SB blinded fits in top three bins

First, the fitter was validated against the shape parameters obtained from data SB fits in all four bins, as presented in the 2015/2016 internal note. The model used in both cases is a 1st-order Chebychev polynomial plus an exponential. The Chebychev slope is constrained to depend linearly on the BDT, and the exponential constant is constrained to be the same in all BDT bins.
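
The two constraints can be pictured as a mapping from a few shared parameters to the per-bin shape parameters. The sketch below only illustrates that idea and is not the fitter itself: the function name, the parameter names (`slope_offset`, `slope_gradient`, `exp_const`) and the choice of one representative BDT value per bin are assumptions, since the text does not say which BDT value represents a bin.

```python
def bin_shape_parameters(bdt_values, slope_offset, slope_gradient, exp_const):
    """Map shared fit parameters to per-bin background shape parameters.

    bdt_values: one representative BDT value per bin (assumed input).
    Returns one (chebychev_slope, exponential_constant) pair per bin:
    the slope is linear in the BDT value, the exponential constant is
    identical in every bin.
    """
    return [(slope_offset + slope_gradient * x, exp_const) for x in bdt_values]
```

With this parametrisation the fit has two free parameters for all Chebychev slopes and one for all exponential constants, regardless of the number of bins.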


validationSummary.png
Fitter validation against 15/16 bkg shape parameters in data SB

Next, this fitter was used to perform the same type of fit in the top three bins only, comparing the 2016 and 2020 BDTs:

2016 BDT
massPlot_BDT1ETA1_OldBDT_lin_slope_const_expConst_3bin.png
2016 BDT bin 1
massPlot_BDT2ETA1_OldBDT_lin_slope_const_expConst_3bin.png
2016 BDT bin 2
massPlot_BDT3ETA1_OldBDT_lin_slope_const_expConst_3bin.png
2016 BDT bin 3
fitLog_2016BDT.txt

2020 BDT
massPlot_BDT1ETA1_NewBDT_lin_slope_const_expConst_3bin.png
2020 BDT bin 1
massPlot_BDT2ETA1_NewBDT_lin_slope_const_expConst_3bin.png
2020 BDT bin 2
massPlot_BDT3ETA1_NewBDT_lin_slope_const_expConst_3bin.png
2020 BDT bin 3
fitLog_2020BDT.txt

Background yields comparison:

Combinatorial
| BDT bin | 2016 BDT | 2020 BDT |
| Bin 1 | 1.0950e+03 +/- 7.29e+01 | 4.4930e+02 +/- 5.53e+01 |
| Bin 2 | 1.6882e+02 +/- 3.04e+01 | 6.8667e+01 +/- 1.65e+01 |
| Bin 3 | 2.2281e+01 +/- 9.91e+00 | 2.1687e+01 +/- 1.07e+01 |

SSSV
| BDT bin | 2016 BDT | 2020 BDT |
| Bin 1 | 2.1811e+02 +/- 4.79e+01 | 1.8530e+02 +/- 3.90e+01 |
| Bin 2 | 1.3447e+02 +/- 2.32e+01 | 8.4999e+01 +/- 1.37e+01 |
| Bin 3 | 3.4279e+01 +/- 8.52e+00 | 2.1437e+01 +/- 8.08e+00 |
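
To read the yield comparison more easily, one could compute the 2020/2016 ratio per bin. The sketch below does this with the numbers quoted above. It assumes the two yields are uncorrelated when propagating the uncertainty, which is only a crude approximation here because both fits run on largely the same data; the function name `ratio_with_error` is hypothetical.

```python
import math

# (value, uncertainty) pairs copied from the tables above: (2016 BDT, 2020 BDT)
COMBINATORIAL = {
    1: ((1095.0, 72.9), (449.30, 55.3)),
    2: ((168.82, 30.4), (68.667, 16.5)),
    3: ((22.281, 9.91), (21.687, 10.7)),
}
SSSV = {
    1: ((218.11, 47.9), (185.30, 39.0)),
    2: ((134.47, 23.2), (84.999, 13.7)),
    3: ((34.279, 8.52), (21.437, 8.08)),
}


def ratio_with_error(old, new):
    """Return new/old and its uncertainty, treating both as uncorrelated."""
    (v_old, e_old), (v_new, e_new) = old, new
    ratio = v_new / v_old
    error = ratio * math.sqrt((e_old / v_old) ** 2 + (e_new / v_new) ** 2)
    return ratio, error


for name, table in (("Combinatorial", COMBINATORIAL), ("SSSV", SSSV)):
    for bin_number, (old, new) in table.items():
        ratio, error = ratio_with_error(old, new)
        print(f"{name} bin {bin_number}: 2020/2016 = {ratio:.2f} +/- {error:.2f}")
```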

Unblinding - re-applying preselection cuts on loose ntuples

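After re-applying the preselection cuts on the loose ntuples, the mass spectrum looks unexpected: entries are missing in the unblinded region, and some entries have negative mass.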

weirdBlinding.png
Unblinded region - missing entries
negativeEntries.png
negative entries

Checking Bs MC

The following figures compare the mass distributions for the 2016 BDT applied to the old derivation and the 2020 BDT applied to the new derivation.

BDT bin 1 MC.png
bin1
BDT bin 2 MC.png
bin2
BDT bin 3 MC.png
bin3

Bin 0 lower edge

The lower edge of BDT bin 0 (72 % signal efficiency) was found by sorting the MC events by ascending BDT value and accumulating their weights (CombWeights branch) until the ratio (accumulated weights)/(total sum of weights) reached 0.28, i.e. until a fraction 0.72 of the weighted signal events passes that BDT cut. The result is:

Crossed 0.72 signal efficiency point at BDT value: 0.164033
Previous entry has BDT: 0.164031
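
The procedure described above amounts to finding a weighted quantile. A minimal sketch, assuming the BDT values and weights are available as plain Python sequences (the function name and interface are hypothetical, not the macro that produced the output above):

```python
def bdt_cut_for_efficiency(bdt_values, weights, efficiency):
    """Return the BDT value at which the accumulated weight fraction,
    counted from the lowest BDT value upwards, first reaches 1 - efficiency.
    """
    if len(bdt_values) != len(weights):
        raise ValueError("bdt_values and weights must have the same length")
    total = sum(weights)
    target = (1.0 - efficiency) * total
    running = 0.0
    # Sort events by ascending BDT value and accumulate their weights.
    for bdt, weight in sorted(zip(bdt_values, weights)):
        running += weight
        if running >= target:
            return bdt
    raise ValueError("efficiency target not reached")
```

Calling it with `efficiency=0.72` corresponds to the 0.28 threshold used for bin 0; the other edges follow from 0.54, 0.36 and 0.18.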

An attempt was made to validate, with the same approach, the other 2020 BDT bin edges found earlier by Aidan:

What was found: 18 % eff ... 0.439777, 36 % eff ... 0.363047, 54 % eff ... 0.274418, 72 % eff ... 0.164033

What Aidan found: 18 % eff ... 0.4418, 36 % eff ... 0.3662, 54 % eff ... 0.2774

UPDATE 25.9.20:

The disagreement observed in the bin edges was caused by an incomplete calculation of the weights. The "CombWeights" branch contains only the QLC*DDW weights; it must be further multiplied by PVWeight, Muon{1,2}_trigger_sf and Muon{1,2}_reco_eff_sf. With the full weight, the bin edges agree with those found by Aidan, and the edge of bin 0 is (hopefully correctly this time) obtained as well:

18 % eff ... 0.441817, 36 % eff ... 0.366231, 54 % eff ... 0.277443, 72 % eff ... 0.167089
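
The corrected weight is simply the product of all the branches named above. A minimal sketch, assuming each event is represented as a dictionary from branch name to value (that representation and the function name are assumptions for illustration; the branch names are the ones quoted above with `Muon{1,2}` expanded):

```python
from math import prod

WEIGHT_BRANCHES = (
    "CombWeights",        # contains only QLC * DDW
    "PVWeight",
    "Muon1_trigger_sf",
    "Muon2_trigger_sf",
    "Muon1_reco_eff_sf",
    "Muon2_reco_eff_sf",
)


def full_event_weight(event):
    """Multiply all weight branches of one event (dict: branch name -> value)."""
    return prod(event[name] for name in WEIGHT_BRANCHES)
```

Using these full weights instead of CombWeights alone when accumulating the sorted events is what changes the resulting bin edges.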

2016 BDT values of events in 2020 BDT bins and vice versa


BDT_16vs20_allBins.png
Signal MC

Regarding the two MC derivations:

2016 derivation: 166218 events in total, 120682 in the top 4 BDT bins, of which 113728 match an event from the 2020 derivation.

2020 derivation: 166752 events in total, 118016 in the top 4 BDT bins, of which 110886 match an event from the 2016 derivation.

There are 156694 events shared between the two derivations.
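
Counting shared events is a set intersection on some per-event identifier. The sketch below assumes a hashable key such as (run number, event number); the text does not say which key was actually used, and both function names are hypothetical.

```python
def match_events(events_a, events_b):
    """Return the set of event identifiers present in both derivations.

    Each identifier is assumed to be a hashable key, for example
    (run_number, event_number).
    """
    return set(events_a) & set(events_b)


def matched_fraction(selected, other):
    """Fraction of the `selected` events that also appear in `other`."""
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected & set(other)) / len(selected)
```

With the counts quoted above, the matched fraction of the events in the top 4 BDT bins is 113728/120682, about 0.942, for the 2016 derivation and 110886/118016, about 0.940, for the 2020 derivation.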

Taking into account only the events in common, the current bin edges still correspond to approximately 18% signal efficiency per bin (the largest deviation is 19.3% in bin 3 of the 2016 BDT):

Common events - efficiency
| BDT bin | 2016 BDT | 2020 BDT |
| bin 0 | 0.178217 | 0.180164 |
| bin 1 | 0.182015 | 0.179978 |
| bin 2 | 0.184372 | 0.179547 |
| bin 3 | 0.19332 | 0.180383 |
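
A per-bin efficiency of this kind can be computed as the weighted fraction of events falling between consecutive bin edges. The sketch below is one possible way to do it, not the code that produced the table: whether the table uses weights, and the half-open bin convention, are assumptions, and the function name is hypothetical.

```python
def bin_efficiencies(bdt_values, weights, edges):
    """Weighted fraction of all events in each [edges[i], edges[i+1]) bin.

    The denominator is the sum of weights of all events passed in,
    including those below the lowest edge.
    """
    total = sum(weights)
    sums = [0.0] * (len(edges) - 1)
    for bdt, weight in zip(bdt_values, weights):
        for i in range(len(sums)):
            if edges[i] <= bdt < edges[i + 1]:
                sums[i] += weight
                break
    return [s / total for s in sums]
```

For the 2020 BDT the edges could be taken from the update above, i.e. `[0.167089, 0.277443, 0.366231, 0.441817, 1.0]`, with only the events common to both derivations passed in.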

Full Fit on 2016/2020 BDT

| Feature | 2016 analysis fitter | New fitter |
| comb bkg (Chebychev) | yes | yes |
| sssv bkg (exponential) | yes | yes |
| Bs (double Gaussian) | yes | yes |
| Bd (double Gaussian) | yes | yes |
| Peaking bkg (double Gaussian) | yes | yes |
| Peaking bkg constraint | yes | yes |
| Smearing parameters | yes | no |
| Relative efficiency in bins | yes | no |
| BDT mean + constraint | no | yes |

Validation of the new fitter

In terms of Bs/Bd yields:

| Fit | N Bs | N Bd |
| new fitter | 80.83 +/- 21.0 | -10.96 +/- 19.1 |
| 15/16 result | 80 +/- 22 | -12 +/- 20 |

massPlot_BDT1ETA1_2016Deriv_fullFitValidation_SM-initialization.png
Fitter validation BDT bin 0
massPlot_BDT2ETA1_2016Deriv_fullFitValidation_SM-initialization.png
Fitter validation BDT bin 1
massPlot_BDT3ETA1_2016Deriv_fullFitValidation_SM-initialization.png
Fitter validation BDT bin 2
massPlot_BDT4ETA1_2016Deriv_fullFitValidation_SM-initialization.png
Fitter validation BDT bin 3

The model fitted by the new fitter has its background parameters initialized from the individual fits and its signal normalizations initialized to the SM expectations: 91 Bs and 10 Bd.

-- OndrejKovanda - 2020-08-31

Topic revision: r11 - 2020-10-16 - OndrejKovanda
 