BsPlot
Usage of sPlot for the analysis and the control channel
Introduction
In this new page derived from the home page of Ioannis Xiotidis the strategy to have the first pass of the Bs effective lifetime is going to be described and explained. A set of plots is going to be presented with explanations and comments, in the attempt to describe as detailed as possible the procedure, that is described in the previous page.
Ntuple information
For this study the 2015/2016 Bmumu analysis ntuples were used because of the good understanding of the properties. Another benefit is that the data are unblinded, allowing to proceed with the usage of the sPlot technique without biasing ourselves.
Bs mumu ntuples (after skimming)
Type |
Period |
Stream |
DSID(MC)/Year(Data) |
Events |
Candidates |
MC |
N/A |
N/A |
300426 |
534475 |
534572 |
Data |
D |
Main |
2015 |
41084 |
41110 |
Data |
E |
Main |
2015 |
394939 |
395216 |
Data |
F |
Main |
2015 |
263809 |
263986 |
Data |
G |
Main |
2015 |
623371 |
623769 |
Data |
H |
Main |
2015 |
207176 |
207306 |
Data |
J |
Main |
2015 |
1075294 |
1076043 |
Data |
A |
Main |
2016 |
328447 |
328692 |
Data |
B |
Main |
2016 |
861654 |
862232 |
Data |
C |
Main |
2016 |
1359363 |
1360316 |
Data |
D |
Main |
2016 |
2137295 |
2138732 |
Data |
E |
Main |
2016 |
764986 |
765508 |
Data |
F |
Main |
2016 |
1441737 |
1442713 |
Data |
G |
Main |
2016 |
1548242 |
1549332 |
Data |
I |
Main |
2016 |
2903764 |
2905863 |
Data |
K |
Main |
2016 |
1292971 |
1293961 |
Data |
L |
Main |
2016 |
3208134 |
3210773 |
Data |
D |
Delayed |
2016 |
342602 |
343080 |
Data |
E |
Delayed |
2016 |
182085 |
182440 |
Data |
F |
Delayed |
2016 |
336941 |
337623 |
Data |
G |
Delayed |
2016 |
368415 |
369147 |
Data |
I |
Delayed |
2016 |
547113 |
548338 |
Data |
K |
Delayed |
2016 |
229967 |
230497 |
Data |
L |
Delayed |
2016 |
553802 |
555296 |
A preselection is applied in those ntuples according to the one that is described in the internal note of the Bmumu analysis on the partial Run 2 dataset. The preselection code can be found in the git repository that has been set up for this analysis
B+ ntuples (after skimming)
TBA
The preselection for the B+ has been identified and set up, however cross checks against the cutflow in the internal note lead to some discrepancies that are currently investigated so this is has become an on-going task
Mass fits on Bs mass distribution
The next step after identifying the ntuples was to setup a class that is taking advantage of the
RooFit framework and the knowledge of the partial run 2 data sets and fits the fitting region of the analysis [4766, 5966]
MeV with the following model:
PDF_full = NCheb * Chebychev + NExpo * Exponential + NSignal * Gaussian
the models where chosen in order to describe specific components of the fitting region:
- Chebychev: Combinatorial background
- Exponential: Same sign same vertex (SSSV) background
- Gaussian: Signal
For those fits the BDT from the 2015/2016 analysis was used as it was and a cut was applied as shown also in the previous page. In order to cross check the fitter initially the highest BDT bin from the 2015/2016 analysis was used (0.416-1.00) and only the background was fitted, afterwards the cut has been loosen according the the study for looking at the proper time to 0.3-1.0
Mass fits
In this section we are showing the mass fits obtained by using the developed class. The cross checks were performed either against the internal note or in agreement with a fitter code that Alex wrote. The following regions were used for the mass fits:
B meson |
Range name |
Range [MeV] |
Bs |
LowerSB |
4766 - 5166 |
Bs |
UpperSB |
5526 - 5966 |
Bs |
Signal Region |
5166 - 5526 |
Bs mass fits
The following two fits were performed on 2015/2016 data and validated to provide the expected result.
BDT bin 3 from the 2015/2016 analysis, SB only fit, pulls and residuals are wrong since they are only from the upper SB. FitLogBDTBin3MUCMassSBOnly |
|
For the first fit we compared with the fit in the internal note for the second fit we had to compare with a macro that was written by Alex and compared with the class result in order to make sure that we are doing consistent things. An issue that was found with this method was the the too generous choice of limits on the parameters, increases significantly the step for the Minuit leading the fitter to miss the signal and flatten completely the gaussian in the signal region
Mass fit PDF model systematic studies
In this section a first study on the systematic effect of changing the bakground model and the range of fit in the invariant mass fit is being performed. Changing the background model and the range could lead to a better background subtraction from the sPlot technique. For this reason the following configuration has been introduced:
* For those fits the fitter did not manage to find an improvement and therefore returned an error code. The attempt to recover those fits is by applying a second follow up fit with the result obtained from the first one, however the fitters still fail, with a sensible result though.
From the subtraction obtained with sPlot the signal distributions where extracted and summarized in the following plot for the different background configurations.
Signal distribution take with sPlot for different background configurations |
Another performed in order to see the strength of sPlot for background subtraction was to plot the distribution of the weights with the mass and with proper time. In this way we will be able to verify that the sPlot procedure is a more elaborate procedure rather than the basline by hand subtraction that someone can do.
The final step of this initial study is to fit the lifetime distribution with out chi2 fitter to estimate the effect of the model selection in the invariant mass fit on the lifetime
Model |
Range |
Fitted Value |
Plot |
Full model |
[4766,5966] |
1.08 +0.5 -0.28 |
Chi2 scans for lifetime fit with different backgrounds |
Cheb0 full model |
[4766,5966] |
1.17 +0.7 -0.36 |
1st order cheb |
[5200,5966] |
0.99 +0.33 -0.21 |
0th order cheb |
[5200,5966] |
1.15 +0.66 -0.34 |
Exponential |
[5200,5966] |
1.17 +0.69 -0.35 |
Mass fit residuals and pulls (Signal yield)
While checking the fitter for the lifetime, we observed that when include also the sPlot in the procedure there is some extra stuff that seem not to behave correctly. By a closer look it was seen that the mass fitter many times was mis-behaving or even failing, therefore the pulls and residuals of the signal yield has been computed:
Signal yield residuals and pulls with no constraint on the fit parameters |
Signal yield residuals and pulls with constrained the mean and sigma of gaussian, fit starting point are the values obtained from a signal only fit on the singal MC |
Signal only fit on signal MC |
there are two main comments by looking at the plots from above. The main comment is that if we don't use any constraint we see that plenty of fits are failing leading to a loss of ~50% of our sample. The second thing to look is the the residuals plot in both scenarios. The residuals don't look good at all since they are flat. Discussions with Alex need to happen to see what we are missing.
Mass fits on B+ mass distribution
TBA
sPlot for Bs
Next step after fitting the mass range was to associate the mass fit with the sPlot technique. In sPlot what is required is a fit with known models to an independent variable (mass) in order to extract the distribution of the required control variable (effective lifetime). The two variables need to be uncorrelated in order to have a correct extraction of the control variable distributions as well as the error estimation.
Fit configuration for sPlot:
Variable |
Range |
Binning (for visual inspection) |
Units |
Inv. Mass |
[4766,5966] |
30 |
MeV |
Prop. Time |
[0,12] |
8 |
psec |
BDT |
[0.3,1.0] |
NA |
NA |
Mass Fit starting configuration:
Parameter Name |
Starting Value |
End Value |
aCheb |
-0.05 |
-0.133577 +/- 0.315038 |
ExpSlope |
500 |
123.771 +/- 26.7356 |
meanGaus |
5366 |
5344.15 +/- 28.2744 |
sigmaGaus |
160 |
70.2014 +/- 30.9718 |
chebEvents |
200 |
332.438 +/- 79.8380 |
expoEvents |
100 |
257.655 +/- 60.9702 |
gausEvents |
60 |
49.8445 +/- 23.7094 |
For this fit we define the ranges above as signal region and SB.
|
|
sPlot for SSSV background events |
sPlot for Combinatorial background events |
sPlot for Bd
TBA
MC distribution fit to sPlot
In this section the fitting method for extracting the effective lifetime value is going to be discussed. In addition the cross checks to identify potential problems are going to be shown, with potential fixes.
Main concept
In our lifetime measurement as described in the previous twiki page, what we are going to fit our data with are MC template histograms parameterise with the following event per event weight. This method was preferred as a starting point, because it avoids parameterising the detector and selection effect on our data sample with a response functional form. The event per event weight used is: w(tau) = exp(-t/tauMC)-exp(-t/tau), where tauMC: is the value that the MC was generated and tau is fit parameter.
TauMC identification
An important part of our method is to identify what the tauMC value is our MC samples. To identify the value we performed an exponential extended fit on the full MC truth proper time sample far away from the pick of the distribution. The result of that fit that can be show below yield to a tauMC value of 1.511 ps, which is the value used to calculate the event per event weight for our MC template
Truth proper time fit on MC sample without BDT cut |
the main reason why we are fitting this region is because we know that lower proper time values are affected by selection criteria and detector effects, therefore by taking advantage of the large statistics we move far away from that region and still be able to obtain a stable fit that effectively looks unbiased.
Weighted distribution
An example of the MC template histograms parameterised with tau can be seen in the following plot. In the plot 24bins in the measured ct have been selected and a BDT cut of 0.3 on the MC sample.
MC templates for different tau values |
Fit Estimator
In this section the fit estimator used for the lifetime extraction is described. Initially in our studies we decided to fit our MC templates with the sPlot extracted data by using a Chi2 goodness-of-fit estimator. Essentially, what we are doing is for every MC template we calculate the sum of the Chi2 of each bin and then identify the template with the lowest values, which provides the value of the effective lifetime. To estimate the error with this procedure we rely on the statistics theory where is shows that the values of the chi2 distribution at Dchi2 = 1 are the lower and upper statistical errors. For the calculation of the chi2 a cut on the lower statistics bins has been applied since the impact on the calculation is huge and mostly unphysical. Therefore only bins with more than 10 events on data were used in the calculation.
Chi2 scan for several lifetime values in a given range around the minimum |
for reproducing the plot from above the following configuration was used:
Variable |
Value |
Range |
BDT |
0.3 |
[0.3,1.0] |
sPlotEntries |
10 |
[10,inf] |
Lifetime |
(Step) 0.03 |
[0.5,2.5] |
sPlotBins |
24 |
NA |
MassFitBins |
30 |
[4766,5966] |
Note: For the next studies the interfacing of the likelihood function with the standard tool of minimization (Minuit) is going to used. In this way the minimum identification is speed up and the error estimation performed in a more standard way.
Conclusions on first pass
With the first analysis pass we were able to extract the lifetime value with the corresponding uncertainty. However, detailed studies need to be performed on the Residuals and the Pulls for the fit estimator in order to identify whether our method is somehow biased or not. The first result from our dataset is tau = 1.00 [-0.2,+0.3] ps, however as noted again many changes need to be applied since we have to study in detail pulls and residuls.
In this section we are going to describe the toy MC studies on the fit method in order to identify any potential source of bias. Initially the identification of how we are planning to generate statistically independent toy samples had to be determined. Afterwards two methods to study the method have been identified. The first one is to study initially only the signal MC where only the fit estimator is being put in place and try to understand if there is any bias directly correlated to the method. The second one is a follow up after we agree that we are understanding in depth the fit estimator to include in the validation the sPlot method. Finally it has to be mentioned that both of the methods are going to be applied also for the control channel.
Bootstrap method
For generating toy MC the bootstrap method has been agreed to be used. The main concept is to use the full signal MC that includes ~70k events and with the use of a poisson distribution and a uniform extract independent datasets according to the expected number of events from the signal. Since our dataset is unblinded, by fitting the mass distribution we identified that with a BDT cut 0.3 O(50) events are expected in the signal region. For this reason the seed of the poisson distribution is 50, and then the events are selected by multiplying the value of a uniform distribution between 0 and 1 with the total number of events in the data set (70k). In this way we are able to extract as many independent datasets as we want, since the number of events required is very low in compare to the total number that are include in the full MC sample. To check that the toy generation has no bias from the generation procedure, the following graph has been produced. Where no dependence can be seen between the toy number and the mean value of the dataset.
proper time mean value from toy generation, signal only |
Fitting the Toy samples
For fitting the toy sample two methods are put in place. The first one is the standard Chi2 scan as it was described above. The second one is use of Minuit, where we are interfacing our likelihood with minuit and with minos extract the value of the errors. The reason why the second method has been put into place is because of it's increase in performance for generating and fitting the toy samples. Initially a validation has been performed to identify whether Minuit is doing exactly what we expect as with the scan method:
Example toy sample for values quoted below, showing the invariant mass distribution and proper time |
Signal MC distribution used for the fit |
The result of fitting the lifetime with different tau values (scan) is being showed here where the minimum value is being found.
|
Chi2 fits with the scan method and minuit per event ratio and difference |
Pulls/Residuals for toys
In this section what I am going to show is the results of fitting a set of toys and see how the lifetime varies for different datasets. Two stages are being implemented to be 100% that what is done is unbiased and we are having a good understanding of the whole procedure. Initially what we are looking at is generating a toys from the signal MC only. What I mean is that we are going to generate directly the proper time distribution according to the expected number of events from the signal fit on data (~50 events), parameterise with the same weight (described above) the MC distribution, scale one to each other and fit the histograms. The second technique involves also the use of the sPlot technique. If no clear mistake is going to be found in the signal only fits then we are going to generate bootstrap samples with a blend of the bbmumuX MC for the background and the signal MC according to the expected events from the signal fit on data.
Pulls/Residuals Singal only (Chi2)
Configuration:
#Toys |
BDTCut |
tBins |
FitRange |
Fitter |
Minimizer |
Plots |
1000 |
0.3 |
6 |
[0,12] |
Chi2 |
Minuit |
Chi2 fitter Residuals and Pulls on signal MC only |
1000 |
0.3 |
8 |
[0,12] |
Chi2 |
Minuit |
Chi2 fitter Residuals and Pulls on signal MC only |
1000 |
0.3 |
12 |
[0,12] |
Chi2 |
Minuit |
Chi2 fitter Residuals and Pulls on signal MC only |
By looking at the above plots there are a couple of outcomes that can be extracted. First of all we see that our procedure has a very small bias in terms of identifying the minimum, since we see that the mean value of the residual and pull plot is very close to 0. Some, more information to be extracted is that the Std. Dev of the residuals is O(0.2) which is compatible with the error we are observing on data, giving as the sensitivity of our method. Finally, two more things can be seen from the residuals and pulls, one of them is that the value of the Std.Dev. from the Pulls is different than 0 leading on an underestimation of the error in our method. The second fact is that there is clearly a bin dependence with the selection of tha proper time bins. To lift those issues, we decided to change the fitter and move from a chi2 fitter to a multinomial fitter, since we know that the chi2 fitter is an approximation of large statistics, which here is not the case.
Pulls/Residuals Signal only (Multinomial)
As stated above the Chi2 fitter seems to have a significant issue with the calculation of the errors (underestimation), that is why a decision was made to move into a multinomial fitter, by implementing an -2LogLikelihood fitter. The same studies with the same configuration were run, and cross checked so to see whether we can reach a better performance.
#Toys |
BDTCut |
tBins |
FitRange |
Fitter |
Minimizer |
Plots |
1000 |
0.3 |
6 |
[0,12] |
Multinomial |
Minuit |
Multinomial fitter Residuals and Pulls on signal MC only |
1000 |
0.3 |
8 |
[0,12] |
Multinomial |
Minuit |
Multinomial fitter Residuals and Pulls on signal MC only |
1000 |
0.3 |
12 |
[0,12] |
Multinomial |
Minuit |
Multinomial fitter Residuals and Pulls on signal MC only |
1000 |
0.3 |
24 |
[0,12] |
Multinomial |
Minuit |
Multinomial fitter Residuals and Pulls on signal MC only |
1000 |
0.3 |
42 |
[0,12] |
Multinomial |
Minuit |
Multinomial fitter Residuals and Pulls on signal MC only |
From the plots it can be seen that the multinomial fitter with the same binning as the chi2 in proper time has a Std. Dev. value in the pulls = 1.00. Meaning that the error calculation is being performed correctly. Additionally, we see that the performance of the fitter is almost constant in terms of the binning. Therefore, we decided to try and increase the bin content to large values and see if there is any correlation. From the two extra plots (24,42) it can be seen that there is no dependance on the error calculation with the binning, neither for the mean of the residual distribution. Also, the performance in terms of resolution on the lifetime value seems to be the same as the chi2, which is what it was expected as well.
Pulls/Residuals full dataset (Chi2)
The next step is to integrate in the toy procedure the sPlot technique and check the impact it has on our residuals and pulls. Even if the Chi2 did not provide sufficient performance with the signal only we will still going to run the pulls and residuals on it for different configurations and see how it performs with sPlot included. So what we are using is the bb->mmX MC for the bkg and the signalMC for the signal. However, there is a flow in this method that we need to validate, and I will describe it in the Assumptions section a bit lower.
#Toys |
BDTCut |
tBins |
Exp. Signal Events |
Exp. Bkg Events |
FitRange |
Fitter |
Minimizer |
Plots |
1000 |
0.3 |
6 |
50 |
590 |
[0,12] |
Chi2 |
Minuit2 |
Chi2 fitter Residuals and Pulls on full data set with sPlot and 6 ct bins |
1000 |
0.3 |
8 |
50 |
590 |
[0,12] |
Chi2 |
Minuit2 |
Chi2 fitter Residuals and Pulls on full data set with sPlot and 8 ct bins |
1000 |
0.3 |
8 |
50 |
295 |
[0,12] |
Chi2 |
Minuit2 |
1000 |
0.3 |
8 |
50 |
148 |
[0,12] |
Chi2 |
Minuit2 |
1000 |
0.3 |
12 |
50 |
590 |
[0,12] |
Chi2 |
Minuit2 |
Chi2 fitter Residuals and Pulls on full data set with sPlot and 12 ct bins |
Additionally we created a set of plots to show how the mean, std. dev. of pulls and residuals evolves with the different number of expected background events
Evolution plot for Residual Mean with the errors |
Evolution plot for Residual Std. Dev. with the errors |
Evolution plot for Pulls Mean with the errors |
Evolution for Pulls Std. Dev. with the errors |
There is a very nice feature that has been observed by looking at the pulls and residuals. If we check the number of underflow bins in the pulls we see that they decrease when decreasing the expected number of background events. This leads to the a preliminary outcome that most likely the mass fit on the independent variable is not perfroming as expected, leading to fits that take more background inside than expected. For this reason a study is being made where the stabilization of the mass fit is being attempted initially by fixing the sigma and the mean of the gaussian component to the values observed from fitting the signal MC. This study is being performed with the multinomial an shown below.
Pulls/Residuals full dataset (Multinomial - no changed statistics)
#Toys |
BDTCut |
tBins |
Exp. Signal Events |
FitRange |
Fitter |
Minimizer |
Constraint in signal mass model |
Plots |
1000 |
0.3 |
6 |
590 |
[0,12] |
Multinomial |
Minuit2 |
No |
Pulls and residuals for 6 ct bins with no constraint and full bkg |
1000 |
0.3 |
8 |
590 |
[0,12] |
Multinomial |
Minuit2 |
No |
Pulls and residuals for 8 ct bins with full, half, quarter of the expected background, with the sequence starting from top and no constraint on mass singal model |
1000 |
0.3 |
8 |
295 |
[0,12] |
Multinomial |
Minuit2 |
No |
1000 |
0.3 |
8 |
178 |
[0,12] |
Multinomial |
Minuit2 |
No |
1000 |
0.3 |
12 |
590 |
[0,12] |
Multinomial |
Minuit2 |
No |
Pulls and residuals for 12 ct bins with no constraint and full bkg |
1000 |
0.3 |
6 |
590 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
Pulls and residuals for 6 bins with full background and keeping the mean and sigma of signal model in mass constant |
1000 |
0.3 |
8 |
590 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
Pulls and residuals for 8 bins with full background and keeping the mean and sigma of signal model in mass constant and different bkg |
1000 |
0.3 |
8 |
295 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
1000 |
0.3 |
8 |
178 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
1000 |
0.3 |
12 |
590 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
Pulls and residuals for 12 bins with full background and keeping the mean and sigma of signal model in mass constant |
Pulls/Residuals full dataset (Multinomial)
The next step is to integrate the toy procedure with the new fitter we identified and this is the Multinomial likelihood. However, there is a small issue with using the multinomial out of the box as we did for the signal only toys. The main reason for this is because the multinomial requires the entries of the sPlot histogram which in our scenario is an event per event weighted histogram. This leads to a discrepancy between the full toy scenario and the signal only scenario. A solution has been identified to use instead of the direct entries of the bin the square of the sum of the weights in the bin divided by the sum of the squared weights. This solution remains an approximation and that's why it has to been cross checked if we can use it right away. So for the multinomial the following plots have been observed:
#Toys |
BDTCut |
tBins |
Exp. Signal Events |
FitRange |
Fitter |
Minimizer |
Constraint in signal mass model |
Plots |
1000 |
0.3 |
6 |
590 |
[0,12] |
Multinomial |
Minuit2 |
No |
Pulls and residuals for 6 ct bins with no constraint and full bkg |
1000 |
0.3 |
8 |
590 |
[0,12] |
Multinomial |
Minuit2 |
No |
Pulls and residuals for 8 ct bins with full, half, quarter of the expected background, with the sequence starting from top and no constraint on mass singal model |
1000 |
0.3 |
8 |
295 |
[0,12] |
Multinomial |
Minuit2 |
No |
1000 |
0.3 |
8 |
178 |
[0,12] |
Multinomial |
Minuit2 |
No |
1000 |
0.3 |
12 |
590 |
[0,12] |
Multinomial |
Minuit2 |
No |
Pulls and residuals for 12 ct bins with no constraint and full bkg |
1000 |
0.3 |
6 |
590 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
Pulls and residuals for 6 bins with full background and keeping the mean and sigma of signal model in mass constant |
1000 |
0.3 |
8 |
590 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
Pulls and residuals for 8 bins with full background and keeping the mean and sigma of signal model in mass constant and different bkg |
1000 |
0.3 |
8 |
295 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
1000 |
0.3 |
8 |
178 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
1000 |
0.3 |
12 |
590 |
[0,12] |
Multinomial |
Minuit2 |
Yes |
Pulls and residuals for 12 bins with full background and keeping the mean and sigma of signal model in mass constant |
Conclusion on toy studies
It has been seen that non of our likelihood parametrizations produced a sufficient acceptable result. This leads us to change our fitter approach. All approached indicate either a bias on the fitter (mean and pull distribution shifted) or a issue with the error estimation (Pull distribution sigma = 1 not symmetric pulls). Neither stabilization the mass fitter nor by reducing the background (increase the separation of sPlot) lead to an acceptable result. For this reason we started implementing a numerical calculation of the likelihood with the use of toy generation. This will lead to a CPU intensive method but is the final fort to parameterize a weighted histogram. For that reason a new twiki page needs to be create to enhance all the stuff in regard of the
NumericalLikelihood studies.
--
IoannisXiotidis - 2020-05-19