HIN-18-008 LambdaC review (preapproval stage)

Pre-approval comments

Hi Analyzers,
please find below the additional comments raised during the meeting today. Please address these, together with the ones sent separately on the AN, ASAP. You will not be pre-approved (and cannot start working with the ARC) until these two sets of homework are addressed satisfactorily.
Also, the majority of these answers will require you to add extra documentation (figures, explanations, etc.) in the AN. Do not just answer on the TWiki.

Ping us/the conveners if something in the list of questions is not clear.
Cheers,

TC.

======================================
General:

1.
All figures have to have the proper title: CMS Simulation Preliminary (simulation), CMS Preliminary (data). You should also use the standard style, because it seems your axis titles, CMS logo position, etc. are not the standard ones (https://twiki.cern.ch/twiki/bin/view/CMS/Internal/FigGuidelines)
2.
Fill the Statistics Committee questionnaire ASAP

The comments are pasted at the bottom of this TWiki.




S6: this is NOT the standard event selection. The standard one has 3 towers in PbPb, and 1 or 0 in pp. Clarify what was actually used.

for pp: abs(PVz)<15&&pBeamScrapingFilter&&pPAprimaryVertexFilter

for PbPb: pclusterCompatibilityFilter&&pprimaryVertexFilter&&phfCoincFilter3&&abs(PVz)<15

HIN-16-001 and HIN-16-016 are using the same event selection.
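
For reference, a minimal sketch of applying these selection strings in a ROOT macro; the file and tree names are illustrative, and the filter decisions are assumed to be stored as branches of the event tree:

#include <TFile.h>
#include <TTree.h>
#include <TCut.h>
#include <cstdio>

void applyEventSelection() {
   // Illustrative input; the filter flags are assumed to be branches.
   TFile *f = TFile::Open("ntuple_pp.root");
   TTree *t = (TTree *)f->Get("events");
   // pp and PbPb selections exactly as quoted above
   TCut ppSel   = "abs(PVz)<15 && pBeamScrapingFilter && pPAprimaryVertexFilter";
   TCut pbpbSel = "pclusterCompatibilityFilter && pprimaryVertexFilter && phfCoincFilter3 && abs(PVz)<15"; // applied to the PbPb tree in the same way
   printf("pp events passing selection: %lld\n", t->GetEntries(ppSel));
}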


S6:
-- How much data do you add with the extra centrality triggers? This info is missing from the AN; please add it
-- For PbPb peripheral: do you use PbPb or pp reco?

For the extra centrality triggers, the lumi is 110.159 mub^{-1}.

The number of events from these extra centrality triggers (applying the event selection and removing the overlap with the MB trigger, but not requiring centrality > 30%) is 238773787.

We use the PbPb reco with the PV refit.

S7:
-- I don't understand the y and eta selection... they are surely NOT both for the lambda_c (y is for the lambda_c, eta maybe for the daughter tracks?)
-- pthat=0 is the same as MB. Why list pthat=0?
-- Is there a bias from pthat=4 for the lambdaC’s used in analysis?

Both the y and the eta cuts are indeed applied to the Lambda_c. As an example, take the fragment file for the Lambda_c -> pKpi non-resonant subchannel: https://github.com/ruixiao491/MC_gen_fragment/blob/master/lambdaC_pt4_request/python/lambdaCpkpi.py

All the information about the fragment files is collected here.

The aim of the table on slide 7 is to list all the information on the MC samples; there is no specific reason for listing pthat=0.

There is no bias from pthat=4. We plotted pthat vs pt using the official MC sample with pthat=0 and pt>4.

_2018-03-16_1.44.06.png


S8:
-- this detailed description should be added in the AN too.
-- sideband very close to the peak. Check that you are outside the 3 sigma region. Also, make sure the SB region is defined based on the data resolution
-- What do you do with MC extrapolation to lower pT? How is extrapolation done? (answer has to be added in the AN too)

We have already checked that the sideband is outside the 3 sigma region.

The differential cross section should have the same shape as the gen-level pt spectrum. Taking 4<pt<5 GeV as an example, the weight for 4-5 GeV should be derived the same way as for 6-8 GeV. So we first extrapolate to lower pt from MC, then apply the optimized cuts to get the differential cross section, and then redo the TMVA.


S9:
-- Why does signal significance decrease when signal efficiency increases?
-- What was optimized for the cuts? Significance, efficiency, purity?

The cuts become looser when the signal efficiency increases, which means that both the number of signal and the number of background events become larger. The green line on slide 9 is the signal significance. When the signal efficiency is 0.5250, the significance reaches its maximum. Above a signal efficiency of 0.5250 the significance decreases, because as the cuts loosen, the background grows much faster than the signal.

The optimal cut values are defined as the ones maximizing the statistical significance $s/\sqrt{s+b}$.


S10:
-- the track eta is what you fixed in the end, or what came out of the TMVA?! Sounds bizarre to have 1.2 everywhere out of the TMVA -- clarify also in the AN
-- cuts for 0-30% and 0-100% are identical. How come?
-- where do these cuts live on the ROC plot? (requested also in context of documentation in the AN)

The eta cuts on the tracks are not from the TMVA. On pre-approval slide 9 I listed the training variables for the TMVA. The table on slide 10 gives the optimized cuts that we use for this analysis. The majority of the selection-variable values come from the TMVA (all except the eta cut on the 3 tracks).

The cuts for 0-30% and 0-100% are indeed identical. The N_coll for 0-30% is very large, which means that 0-30% makes up the main part of the 0-100% sample; this drives the optimized cuts for 0-100% to coincide with those for 0-30%.

The optimal cut values are defined as the ones maximizing the statistical significance $s/\sqrt{s+b}$.

pp10_20_weight_ptcut_significance.png

The above plot shows the significance vs the signal efficiency; the green line is the significance. From this plot we can see that there is a peak. The plot also states: "the maximum s/\sqrt{s+b} is 9.7993 when cutting at 0.5250". There is an xml file which includes the values of the cut variables for each specific signal efficiency.
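
To make the working-point choice explicit, here is a minimal sketch of such a scan as a ROOT-style macro; the efficiency points and the signal/background yields are hypothetical placeholders (in practice they come from the weighted trees at each TMVA working point):

#include <cmath>
#include <cstdio>

void significanceScan() {
   const int nPoints = 5;
   // Hypothetical yields at a few signal-efficiency working points,
   // chosen so that the figure of merit peaks near 0.525 as in the analysis.
   double effSig[nPoints] = {0.400, 0.450, 0.500, 0.525, 0.600};
   double s[nPoints] = {700., 790., 880., 925., 1060.};
   double b[nPoints] = {6300., 7000., 7700., 8000., 14000.};
   double bestSig = 0., bestEff = 0.;
   for (int i = 0; i < nPoints; i++) {
      double sig = s[i] / std::sqrt(s[i] + b[i]); // s/sqrt(s+b), the figure of merit used here
      if (sig > bestSig) { bestSig = sig; bestEff = effSig[i]; }
   }
   printf("maximum s/sqrt(s+b) = %.4f at signal efficiency %.4f\n", bestSig, bestEff);
}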


S11: What type of swapping is included in the swapped component? You have 3 particles...

The swapped component consists of Lambda_c candidates with an incorrect mass assignment from exchanging the proton and the pion. Even though we have 3 particles, the charge of the kaon differs from both the proton and the pion, which means that only the proton and the pion can receive each other's mass assignment.

S12: are these binned or unbinned fits? (missing info also in the AN)

The invariant-mass fits are binned fits.


S13: 40% difference between 0-100% and 30-100% ... this doesn't look right
-- 2x different efficiency*acc between pp and PbPb
-- need to separate acceptance from efficiency, for both pp and PbPb
-- clarify how the Ncoll weighting is actually done (event-by-event or in some other way). Include a plot with these weight factors in AN. Cross-check efficiency with and without centrality reweighting.

Slide 13 shows the acceptance times efficiency. Because the selection cuts differ between 0-100% and 30-100%, and the cuts for 0-100% are much tighter than the cuts for 30-100% (for 10<pt<20 GeV), it is expected that there is a 40% difference between 0-100% and 30-100%.

Also, for pp and PbPb we use different MC. The multiplicity effects differ between the pp MC and the PbPb MC, and the selection cuts for PbPb are tighter than those for pp. This is why there is a 2x difference in acceptance * efficiency between pp and PbPb.

There is a difference in centrality between the PbPb MC and the PbPb data, so we first normalize the MC centrality distribution to the PbPb data centrality distribution event by event. Since the number of signal events in data is proportional to N_coll, and there is at least one Lambda_c in each PbPb MC event, we also weight by N_coll. The weight factor is: weight = (data centrality distribution / PbPb MC centrality distribution) * N_coll.
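
A minimal sketch of how such a per-event weight could be computed; hCentData and hCentMC are assumed to be the unit-normalized centrality distributions in data and PbPb MC, and ncoll() is a hypothetical placeholder for the Glauber N_coll lookup (all names illustrative):

#include <TH1D.h>

// Hypothetical placeholder: a real analysis reads N_coll from the Glauber tables.
double ncoll(int centBin) { return 1.0; }

// weight = (data centrality distribution / MC centrality distribution) * Ncoll,
// evaluated at the centrality of each MC event.
double centralityWeight(const TH1D *hCentData, const TH1D *hCentMC, double cent) {
   int bin = hCentData->FindFixBin(cent);
   double mc = hCentMC->GetBinContent(bin);
   if (mc <= 0.) return 0.; // no MC events in this bin: weight undefined
   return hCentData->GetBinContent(bin) / mc * ncoll(bin);
}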

accptance_pp_TMVAcuts_official_pt4_TMVAcuts.gif accptance_PbPb30_100_TMVAcuts_official_pt4_TMVAcuts.gif accptance_PbPb0_30_TMVAcuts_official_pt4_TMVAcuts.gif accptance_PbPb0_100_TMVAcuts_official_pt4_TMVAcuts.gif

Acceptance for pp (1st plot), 30-100% (2nd), 0-30% (3rd), and 0-100% (4th).


S14:
-- still not clear how you calculated the 14% in the tracking efficiency
-- Selection efficiency enormous for peripheral PbPb and pp. Why so much larger than for central?

The tracking-efficiency uncertainty for pp on slide 14 is 11%; I think what you are asking about is the 14% tracking-efficiency uncertainty on slide 15. For slide 14: the single-track efficiency uncertainty is 4% in pp, and there are 3 tracks in our analysis, so the tracking uncertainty is 1-0.96^3 = 11%. For slide 15: the single-track efficiency uncertainty is 5% in PbPb, and there are 3 tracks in our analysis, so the tracking uncertainty is 1-0.95^3 = 14%.
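
For concreteness, the arithmetic behind these two numbers, treating the single-track uncertainty as fully correlated among the three daughter tracks (our reading of the calculation quoted above):

1 - (1 - 0.04)^3 = 1 - 0.96^3 = 0.115 ≈ 11%  (pp)
1 - (1 - 0.05)^3 = 1 - 0.95^3 = 0.143 ≈ 14%  (PbPb)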

S15: 'pt shape uncertainty' seems fishy to have such a big difference between the 0-100%, and 0-30%, and the 30-100% ...

First, the pt shape uncertainty depends on the bin width of the histogram; that is why the uncertainty for PbPb centrality 30-100%, 8<pt<10 GeV is smaller than that for PbPb centrality 30-100%, 10<pt<20 GeV. Next, we obtain the Lambda_c pt spectrum via m_T scaling of the measured D0 spectrum, which means the pt spectrum is the 0-100% centrality one. When we compare it with PYTHIA in the 3 centrality bins, we therefore get differences. Also, we can compare the systematic uncertainty for 10-20 GeV between centrality 0-30% and 0-100%: it is 6.9% for 0-30% and 7.1% for 0-100%. These two values are consistent with each other, because N_coll for centrality 0-30% is very large and 0-30% makes up the majority of the 0-100% data. This is the best we can do, since we do not have any model for the Lambda_c that reaches up to 10-20 GeV.


S20:
-- ... what does 'loose cuts' mean?
-- how can you combine all these uncertainties?

Loose cuts means cuts looser than the optimized cuts that we use for our analysis. Ideally it would be no cuts at all, but for the Lambda_c the signal/background ratio is very small, so the loosest cuts are the ones at which we can still just see the signal.

All the cut-efficiency uncertainties are summed in quadrature to give the total systematic uncertainty due to the cut efficiency. The table with the systematic uncertainty for each specific variable and the total systematic uncertainty due to the cut efficiency is on slide 22.
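
For concreteness, a minimal sketch of this quadrature sum as a ROOT-style macro; the per-variable uncertainties below are placeholders, not the values from slide 22:

#include <cmath>
#include <cstdio>

void totalCutSystematic() {
   // Hypothetical per-variable cut-efficiency uncertainties (relative)
   double u[] = {0.03, 0.05, 0.02};
   double sum2 = 0.;
   for (double x : u) sum2 += x * x; // quadrature sum
   printf("total cut-efficiency systematic: %.3f\n", std::sqrt(sum2));
}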


S21-22:
-- how do you choose the interval of variations? --> add also in the AN
-- what are the other selections you apply when you make these figures? (You have to cut on variables that are not strongly correlated with this variable.)
-- should follow what was done in the B AN, and not the D AN (and get the OK from the Spectra PInG!)
----- Should be refined taking into account correlation between different points.
----- Start with plotting statistical uncertainties on the ratio plots of the systematics. Quote difference wrt cut value you are using.
----- Need to have soft cut on other variables when testing a specific one. Cut on variables that are not strongly correlated with the one you are studying.

For the systematic uncertainty associated with the cut efficiency:

1) We calculate the uncertainty for each cut value, taking into account the correlation between the varied cuts and the nominal cuts.

The formula is on slide 9 of this

2) Because the uncertainty for each cut value is bigger than the fluctuation of the points, all points are strongly correlated.

3) We plot the double ratio and its uncertainty for each cut value.

4) We fit the plot with a pol1, constraining the double ratio to be 1 when the varied cut value equals the nominal cut value.

5) From the fit, we extrapolate the double ratio to the point of no cut. (The "no cut" value for alpha is 0.2.)

6) The difference between the double ratio at no cut and 1 is the systematic uncertainty for this specific cut variable.

We use the pp 10-20 GeV cut scan as an example; a sketch of steps 3)-6) is given below.
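
A minimal sketch of steps 3)-6) as a ROOT macro, assuming gDouble is the TGraphErrors of double ratios vs the varied cut value (name illustrative); the pol1 is reparametrized so that it equals 1 at the nominal cut by construction:

#include <TGraphErrors.h>
#include <TF1.h>
#include <cmath>
#include <cstdio>

void doubleRatioFit(TGraphErrors *gDouble, double nominalCut, double noCutValue) {
   // pol1 with the constraint f(nominalCut) = 1 built in:
   // f(x) = 1 + [0] * (x - nominalCut)
   TF1 *f = new TF1("f", "1 + [0]*(x - [1])", 0., 1.);
   f->FixParameter(1, nominalCut);
   gDouble->Fit(f, "Q");
   // Extrapolate to the "no cut" point (e.g. alpha = 0.2) and take
   // the departure from unity as the systematic uncertainty.
   double syst = std::fabs(f->Eval(noCutValue) - 1.);
   printf("systematic uncertainty from this variable: %.3f\n", syst);
}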

pp_10_20_dls_double_ratio_fix3widths.gif pp_10_20_Dchi2cl_double_ratio.gif pp_10_20_alpha_0315_double_ratio.gif


S23: you have to add additional material in the AN, to show the mT spectrum you used to get these weights: step by step, what you do and how you do it

A: About the mT scaling: the invariant yield vs m_T for both the D0 and the Lambda_c should lie on the same curve, so we can use the D0 spectrum to get the Lambda_c spectrum.

1) m^2(Lambda_c) + p_T^2(Lambda_c) = m^2(D^0) + p_T^2(D^0)

From the formula above we get:

p_T(Lambda_c) = sqrt( m^2(D^0) + p_T^2(D^0) - m^2(Lambda_c) )

2) Applying this mapping, we obtain the invariant yield vs p_T for the Lambda_c.
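
A minimal sketch of the pT mapping in step 1) as a ROOT-style macro, with the PDG masses hard-coded for illustration:

#include <cmath>
#include <cstdio>

// Map a D0 pT onto the Lambda_c pT with the same transverse mass:
// m^2(Lambda_c) + pT^2(Lambda_c) = m^2(D0) + pT^2(D0)
double ptLambdaC(double ptD0) {
   const double mD0 = 1.86484; // GeV, PDG D0 mass
   const double mLc = 2.28646; // GeV, PDG Lambda_c+ mass
   double arg = mD0 * mD0 + ptD0 * ptD0 - mLc * mLc;
   return arg > 0. ? std::sqrt(arg) : 0.; // below threshold there is no physical solution
}

void mtScaling() {
   // e.g. a D0 point at pT = 5 GeV maps to a Lambda_c pT of about 4.8 GeV
   printf("pT(Lambda_c) for pT(D0) = 5 GeV: %.3f GeV\n", ptLambdaC(5.0));
}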


S24: find hard to believe that pythia is so wrong, that it gets the shape, but it screws up by >10 the pp lambda_c spectra ...

For the PYTHIA prediction of the cross section, we had taken the pp MB cross section to be the one before the filter and calculated lumi = number of events / (filter efficiency * pp MB cross section). This turned out to be wrong: the PYTHIA MB cross section is not equal to the inelastic cross section. So now we calculate the lumi as lumi = number of events / cross section after the filter. As a result, the data/PYTHIA cross-section ratio becomes 10 times smaller than before. Because the lumi cancels in the Lambda_c/D0 ratio, this does not affect the PYTHIA prediction for the Lambda_c/D0 ratio.

combine_allcentrality_invariant_yield_Taa_centrality_reweight_new.gif

The above plot is the updated cross section for pp, PbPb (3 centralities) and the PYTHIA prediction.

combine_pp_PbPb_lambdaC_D0_ratio_centrality_weight_0227.gif

The above plot is the Lambda_c/D0 ratio for pp and for PbPb with centrality 0-100%.

S24:
-- your total (0-100%) PbPb x-sec is < than in 30-100% ! …
-- PYTHIA is very very wrong (would’ve expected some shape difference, while the ratio seems rather flat vs pT). Check for problems/bugs with the normalization.
------ (for Approval): for the LHCb result for D+ and D-, a comparison with PYTHIA (LHCb tune) shows pretty good compatibility in the forward region. Should check if there is something equivalent
-- The TAA+uncertainty values in the table are not updated! The latest values (with asymmetric uncertainties) can be found on https://twiki.cern.ch/twiki/bin/view/CMS/GlauberTables

The cross section on slide 24 is already normalized by Taa. (It is the Raa suppression that makes the total (0-100%) PbPb cross section smaller than the 30-100% one.)

The PYTHIA bug has already been explained in the previous comment.

We have already used the updated Taa values and uncertainties, and the updated Raa plot is as follows:

combine_allcentrality_Raa_withcentrality_reweight.gif



S25:
-- 10% syst. uncert. on Taa from S24. But the open boxes here seem much bigger than 10%. Check for bugs in how you propagated the uncertainties

-- if they are compatible on this slide, they should be at the same value on S26 (when divided by a fixed pp point) --> you have to be careful with the interpretation and claims of ‘consistency’



*** Additional questions from Emilien (StatComm representative; you will get them again when you fill the questionnaire, here just in case you want to get to them faster):
- pol4: is that really needed? Looks like the bkg is rather smooth and monotonic. Same with the alternative functions
- check how the widths compare between the two Gaussians, from 1 bin to the other? also, the relative strength of the 2 Gaussians?
- binned or unbinned fits? need to do unbinned fits.
- goodness of fit? (eg when looking at alternative fits and claiming they look ok)
- fitting syst: are the sig and bkg variations done at the same time?

Comments from Emilien


Dear Rui,

Thank you for filling the Statistics Committee questionnaire. I have a
few questions:
- 3.4: what is the CutsSA method of TMVA?
- 3.6: I don't believe the data-MC comparison of the MVA input
distributions is available in the AN, please add it.
- 3.8: what do you call highly correlated variables? Also, I thought
(3.7) that you did not study correlations between input variables? It
could be interesting to see the correlation matrix for the input
variables, for signal and background separately (not necessarily for all
bins -- just 1 bin would be enough).
- General questions about the MVA: how are the signal and background
normalised? Why isn't proton pt used for PbPb, is this related to your
answer to question 3.8? What is the preselection for the signal and
background training samples?
- 4.3 I'm confused. In the AN it looks like you're using functional
forms, not templates? Also, are you using binned fits? I would consider
moving to unbinned fits, or at least having finer bins. Also, if using
ROOT, make sure that you use option "I" (use the integral of the
function in each bin).
- 4.7 are the results of this toy study available somewhere?

Thank you,
Émilien


-3.4 For the TMVA, rectangular cuts are chosen as the classification method (CutsSA is the rectangular-cut classifier with the cut values optimized via simulated annealing).

-3.6: I added the data-MC input distributions in the Appendix section.

-3.8 I will think about your suggestion and look at the correlation matrix of the input variables for 1 bin. The reason that we did not use the proton pt as a training variable for PbPb is that we tried to add it as a training variable, and it cut away a lot of the signal and background, causing a big fluctuation in the mass plots while not enhancing the significance significantly. So we did not include it as a training variable for PbPb. We normalize the background with weight = number of events in data / number of events used in the training. We normalize the signal, for both pp and PbPb, with weight = (differential cross section) * 2 * L * BR * (delta pt) / (number of gen-level candidates in MC). The formula is hard to write in an email; the normalization is described on slide 8 of the pre-approval slides. (https://indico.cern.ch/event/706612/contributions/2908930/attachments/1609902/2556852/Lambda_C_pre_approval.pdf)
And the preselection cuts for the signal and background samples are the following:
1) the absolute value of the rapidity of the Lambda_c is smaller than 1;
2) the decay length significance is bigger than 1 for pp and bigger than 2 for PbPb;
3) the vertex probability is > 0.05;
4) the alpha angle is < 0.2;
5) the absolute value of the pseudorapidity of the 3 tracks is < 1.2;
6) the pt of the 3 tracks is > 1 GeV for PbPb and > 0.7 GeV for pp.
The initial cuts for the training samples are on slide 9 of the pre-approval slides.
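
For reference, a minimal sketch of booking this method in TMVA (ROOT 6, DataLoader interface); the variable names, trees, weights, and the shortened preselection string are illustrative stand-ins for the actual ones listed above and on slide 9:

#include <TFile.h>
#include <TTree.h>
#include <TCut.h>
#include <TMVA/Factory.h>
#include <TMVA/DataLoader.h>
#include <TMVA/Types.h>

void trainCutsSA(TTree *sigTree, TTree *bkgTree, double sigWeight, double bkgWeight) {
   TFile *out = TFile::Open("TMVA_LambdaC.root", "RECREATE");
   TMVA::Factory factory("TMVAClassification", out, "!V:AnalysisType=Classification");
   TMVA::DataLoader loader("dataset");
   // Illustrative training variables (cf. pre-approval slide 9)
   loader.AddVariable("dls", 'F');     // decay length significance
   loader.AddVariable("vtxProb", 'F'); // vertex probability
   loader.AddVariable("alpha", 'F');   // pointing angle
   // Per-tree weights implementing the signal/background normalization described above
   loader.AddSignalTree(sigTree, sigWeight);
   loader.AddBackgroundTree(bkgTree, bkgWeight);
   // Shortened stand-in for the preselection listed above
   TCut presel = "abs(y)<1 && vtxProb>0.05 && alpha<0.2";
   loader.PrepareTrainingAndTestTree(presel, "SplitMode=Random:NormMode=None");
   // Rectangular cuts optimized with simulated annealing ("CutsSA")
   factory.BookMethod(&loader, TMVA::Types::kCuts, "CutsSA", "!H:!V:FitMethod=SA:EffSel");
   factory.TrainAllMethods();
   factory.TestAllMethods();
   factory.EvaluateAllMethods();
   out->Close();
}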

-4.3 I misunderstood the meaning of the question when I filled in the questionnaire: we use functions with parameters to fit the mass plots. Yes, we used binned fits. I will consider your unbinned-fit suggestion. Thank you for the reminder about the option "I".
Also, one thing about 4.3: I want to give the details. Indeed, we use a function with parameters, but we first fit the MC gen-level signal (which is the true signal) with a double Gaussian to get the two widths of the double Gaussian and the ratio of the yields of the two Gaussians. When we fit the data, we fix the 2 widths and the yield ratio of the double Gaussian and keep the mean as the only floating parameter. So the data only determines the magnitude of the signal (the number of signal candidates in each pt bin) and the mean of the double Gaussian. Also, the number of counts in each histogram bin is large and there are 3-4 bins under the signal shape, so the effect of an unbinned fit should be very small. The details of the signal extraction are also in the pre-approval slides, on slide 11; the mass plots are on slide 12. I know that for pp 5<pt<6 GeV the number of bins under the signal shape is smaller than for the others; that is because I rebinned only this pt range, since I saw a large fluctuation there. I will make the bin width for this pt range the same as for the other pt bins. In summary, I think the unbinned fit should have a very small effect on this signal extraction.

For 4.3, we want to change the fit option "I" to "L". The reason for this change is the following:
We generated 400 random numbers from a Gaussian distribution and tried to fit with the default fit option, the "I" option, and the "L" option. Testing the ratio of the yield from the fit to the yield of the histogram, we find that when fitting with "L" the ratio is 1, which is the best among these 3 fitting options. I paste the code for this test:

#include <iostream>
#include <TF1.h>
#include <TH1F.h>

void testfit_option(){
   // Generate 400 Gaussian-distributed entries.
   TF1 *f_gaus = new TF1("f_gaus", "gaus", -5, 5);
   f_gaus->SetParameters(1, 0, 1);
   TH1F *h1 = new TH1F("h1", "h1", 50, -5, 5);
   h1->Sumw2();
   for (int i = 0; i < 400; i++) {
      h1->Fill(f_gaus->GetRandom());
   }

   std::cout << " initial histogram yield " << h1->Integral() << std::endl;
   h1->SetLineColor(9);
   h1->Draw("e");

   // Fit back with a Gaussian; swap the option between "" (default chi2),
   // "I" (use the integral of the function in each bin) and "L" (log-likelihood)
   // to compare the fitted yield with the histogram yield.
   TF1 *f_fit = new TF1("f_fit", "gaus", -5, 5);
   h1->Fit("f_fit", "I", "", -5, 5);
   std::cout << " fitted yield " << f_fit->Integral(-5, 5) / h1->GetBinWidth(1) << std::endl;
}

> -3.6: I added the data-MC input distribution in section Appendix.

OK, this will be in the next version of the AN right? I don't see it in v3.

> -3.8 I will think about your suggestion and see the correlation matrix for the input variables for 1 bin. [...] The training sample initial cuts are on slide 9 in pre-approval slides.

Thank you, all this would be useful to document in the analysis note.

I will include the result for question 4.7 for one bin from the Purdue email; the plot is attached, sorry for the inconvenience. From this plot we can see that the mean value is 0 and the width is about 1, with some fluctuation.

OK, this is not a toy study, this is a plot of the pulls. I believe a toy study would be useful to check the statistical uncertainties and validate the fitting procedure, at least in one bin.

> Also one thing about 4.3. I want to talk about the details about this. [...] In summary, I think that the unbinned fit should have a very small effect on this signal extraction.

It is not good to choose a different bin width depending on the analysis bin... This introduces possible bias. Also, why isn't the width left free in the fit, to account for a possible mismodelling of the resolution in simulation? Are the fits to simulation and the corresponding fitted parameters available somewhere? Lastly, it would be good to better motivate your choice of signal and background functions (eg comparing the chi2 of the nominal fit with other choices of signal and background functions). For instance, I don't see why you would need a 4th order polynomial to fit what looks like a very smooth background in PbPb.

> For 4.3, we want to change the fit option "I" to "L". [...] we find that fitting with "L", the ratio is 1 and it is the best among these 3 fitting options.

You can combine the I and L options, and I believe you should do it in your analysis fits. In your test the situation is different from what you have in the analysis: empty bins bias the chi2 fit, which is not the case in your analysis fits (so the L option makes a bigger difference in the test than in the analysis), and the bins are narrow compared to the Gaussian width, meaning the function varies little within each bin (so the I option makes a smaller difference in your test than in the analysis).
Also, repeating this toy study starting from the actual function fitted in the analysis, and repeating it with several toy datasets, would allow you to address question 4.7 :)

For the 4.3 floating-width-parameter question:

Because there are too many parameters in the fit function, we could not float both the mean and the 2 widths of the double Gaussian in the fit. Also, HIN-16-001 and HIN-16-007 use the same signal extraction: both analyses float only the mean in this fit, fixing the 2 widths and the yield ratio of the double Gaussian. We also check the goodness of this fit.

For the PbPb fit function problem:

Previously the fit range was 2.1-2.45, and there is a slight curvature close to 2.1 and 2.45, so at that time we used pol4. We just tested the 2.2-2.4 fit range, where we can indeed use pol2. Thank you for your suggestion; we will change the fit function to pol2, the same as the fit function for pp.

We take pp 10-20 GeV as an example.

1) We first take the number of signal and background events from the fit to the pp 10-20 GeV data (call the signal number a and the background number b).

2) We use a uniform random number to decide whether to sample signal or background.

3) If the uniform random number is smaller than a/(a+b), we generate the random number from the signal fit function (double Gaussian) from data; otherwise we generate it from the background fit function (pol2) from data.

4) The total number of generated random numbers equals the total foreground (signal plus background) in data in the fit range (2.2, 2.4).

5) After generating the random numbers, we fit the signal with the double Gaussian and the background with the pol2, and record the signal yield from the fit.

6) We repeat the above steps many times and plot the distribution of the recorded signal yields.

The code for this fit goodness check is here.
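
Since the linked macro may not be accessible, here is a minimal sketch of the toy loop described in steps 1)-6) as a ROOT macro; the background yield, shapes, and initial parameters are simplified placeholders (single-Gaussian signal instead of the double Gaussian, illustrative pol2):

#include <TF1.h>
#include <TH1D.h>
#include <TMath.h>
#include <TRandom3.h>
#include <cstdio>

void toyGoodnessCheck() {
   const int nToys = 500;    // illustrative; the actual study used a few thousand toys
   const double a = 959.;    // signal yield from the pp 10-20 GeV data fit
   const double b = 20000.;  // hypothetical background yield in (2.2, 2.4)
   TF1 fSig("fSig", "gaus", 2.2, 2.4);
   fSig.SetParameters(1., 2.2865, 0.01);  // mean near the Lambda_c mass
   TF1 fBkg("fBkg", "pol2", 2.2, 2.4);
   fBkg.SetParameters(1., 0., 0.);        // flat background, for illustration
   TF1 fTot("fTot", "gaus(0) + pol2(3)", 2.2, 2.4);
   TH1D hYield("hYield", "recorded signal yield", 100, a - 500., a + 500.);
   TRandom3 rng(0);
   for (int t = 0; t < nToys; t++) {
      TH1D hToy("hToy", "toy mass", 100, 2.2, 2.4);
      hToy.SetDirectory(nullptr); // avoid name clashes between toys
      int nTot = (int)(a + b);
      for (int i = 0; i < nTot; i++) {
         // steps 2)-3): sample signal with probability a/(a+b), background otherwise
         double m = (rng.Rndm() < a / (a + b)) ? fSig.GetRandom() : fBkg.GetRandom();
         hToy.Fill(m);
      }
      fTot.SetParameters(80., 2.2865, 0.01, 200., 0., 0.);
      hToy.Fit(&fTot, "QIL"); // combined "I" + "L" options, as suggested
      // signal yield = Gaussian integral / bin width (step 5)
      hYield.Fill(fTot.GetParameter(0) * fTot.GetParameter(2)
                  * TMath::Sqrt(2. * TMath::Pi()) / hToy.GetBinWidth(1));
   }
   printf("toy mean signal yield: %.1f (input %.0f)\n", hYield.GetMean(), a);
}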

I paste one of the fit plots and the distribution of the recorded signal yields after fitting.

histogram_fit_goodnesscheck.gif

The above plot is the distribution of the recorded signal yield after the fit. The mean is 952.5, and the signal number we initially set is 959, so the result is consistent. (The number of entries is 3876 rather than 5000 because the histograms are fitted automatically and some bad fits do not find the signal; the signal yield of a bad fit is < 0, so I remove these bad fits.)

Thank you for your suggestion; we will combine "I" and "L" together.

histogram_fit_toy_MC_plot.gif

The above plot is one example of such a fit. The foreground is generated from random numbers.

First, about the signal extraction.

I checked with Jing and found that I had misunderstood the double-Gaussian fit function. In the previous results we fixed the 2 widths of the double Gaussian. Now we add a parameter [5] to accommodate the difference between MC and data, as follows:

1) for signal:

TF1 *ff2=new TF1("ff2","0.007*[0]*([3]*TMath::Gaus(x,[1],[2]*(1+[5]))/(sqrt(2*3.14159)*[2]*(1+[5]))+(1-[3])*TMath::Gaus(x,[1],[4]*(1+[5]))/(sqrt(2*3.14159)*[4]*(1+[5])))",2.1,2.45);

We fix the 2 widths [2], [4] and the ratio of the 2 Gaussian yields [3], and we let the mean [1] and [5] float.

The mass plot of pp 10-20 GeV/c with the new fit function is shown below.

10_20_pp_officialTMVA_whole_2gaus_pol2_weighted_ptcuts_eventselection_withLoption_change_fit_function.gif

In this toy study, we use the previous fit function to generate the sample distribution.

Steps 1)-6) are the same as in the toy study above, with one difference in the inputs: the signal yield a taken from the fit to the pp 10-20 GeV data is 954 in this procedure.

In the fit step, we use the new fit function above (so this time the signal fit function includes the floating parameter [5]). We constrain parameter [5] to the range [-0.5, 0.5] and parameter [1] to the range [2.27, 2.3].

signal_number_fit_function_0313.gif

The reason for the cut "float_width < 0.49" is that in some fits the parameter [5] hits the limit of the constraint. The above plot is the recorded "signal_number" distribution; the mean of this distribution shown in the statistics box is 968.6.

Then we fit this recorded "signal_number" distribution with a Gaussian function:

The mean of this fit is 966.2 and the sigma is 208.7.

The following is the recorded "fitting-error" distribution:

error_distribution_0313.gif

The mean value of this "fitting-error" distribution is 203.1, which is close to the width of the recorded signal-number distribution.

The initial value that we set is 954 and the mean value is 966, so there is a bias of about 1.5% in this fit. However, compared to the systematic uncertainty of this analysis, this bias is negligible.
