-- CristinaAnaMantillaSuarez - 2015-02-23
Projects
Summer Student Project: Measurement of mt using leptonic observables
We are interested in how the top quark mass %$m_t$% is related to the kinematics of the leptons in the final state. The idea is that in this analysis the measurement will not be affected by the jet energy scale (JES) uncertainty.
This is the theory article on which the idea is based:
http://inspirehep.net/record/1305642
Slides:
https://indico.cern.ch/event/301787/session/3/contribution/17/material/slides/0.pdf
- It presents a procedure for the determination of %$m_t$% based on leptonic observables in dilepton %$t\bar{t}$% events.
- The distributions are constructed from the leptons only, requiring the b-jets [anti-kT, R=0.5] to be within the detector (i.e. integrating over the b-jet kinematics). This gives minimal sensitivity to the modeling of both perturbative and non-perturbative QCD effects.
- We shall be working with the top quark pole mass (defined as the real part of the pole in the top-quark propagator).
- It employs kinematic distributions of the leptons; we are interested in their shapes.
- Working with full distributions is cumbersome, but the information on the top mass that their shapes encode can be effectively captured by the Mellin moments of the corresponding distributions.
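As a minimal sketch of these moments, assume that the Mellin moment of order N of an observable O is its normalized N-th power moment %$\langle O^N \rangle$%. Under that assumption, the moments can be computed from weighted event values or, approximately, from a binned distribution using the bin centres. The helper names below are hypothetical and are not part of runDileptonUnfolding.py.

```python
from typing import Sequence


def mellin_moment(values: Sequence[float], weights: Sequence[float], n: int) -> float:
    """Weighted N-th moment <O^N> = sum_i w_i * O_i**N / sum_i w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("the sum of the weights is zero")
    return sum(w * v ** n for v, w in zip(values, weights)) / total


def mellin_moment_binned(edges: Sequence[float], contents: Sequence[float], n: int) -> float:
    """Approximate <O^N> from a histogram by evaluating O**N at the bin centres."""
    if len(edges) != len(contents) + 1:
        raise ValueError("expected len(edges) == len(contents) + 1")
    centres = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    return mellin_moment(centres, contents, n)


if __name__ == "__main__":
    # Toy lepton pT values in GeV with unit weights (illustrative numbers only)
    pts = [30.0, 45.0, 60.0, 75.0]
    for order in (1, 2, 3):
        print(order, mellin_moment(pts, [1.0] * len(pts), order))
```

The binned version carries a bin-width approximation that becomes more important for higher N, so computing the moments from unbinned (or finely binned) inputs is the safer choice.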
ABOUT UNFOLDING
Unfolding can be defined as correcting data for detector effects. Real-world particle detectors have a finite resolution and detection inefficiencies, so any measurement in experimental high energy physics is contaminated by stochastic smearing. Recording such distorted collision events instead of the true events is called smearing or folding of the data, and it often broadens the physical spectra measured by the LHC experiments.
Unfolding then refers to using the smeared observations to infer the true physical distribution of the events, i.e. estimating the particle-level distribution of some physical quantity of interest from observations smeared by an imperfect measurement device.
There are three main reasons why unfolding is desirable:
- To publish an estimate of the true physical distribution of events.
- To compare measurements of 2 different experiments with different experimental resolutions
- To compare unfolded histograms with theoretical predictions
In our case, we apply an unfolding technique because we want to compare our results with the theoretical predictions of our reference theory paper. The measured distributions are distorted from the true underlying distributions by the limited acceptance of our detector and by bin-to-bin migrations due to the finite resolution of the variables. We perform our unfolding using the TUnfold package, which is based on a matrix inversion via a least-squares fit with Tikhonov regularisation. It is similar (but not identical) to the SVD method. Some documentation can be found here:
http://www.desy.de/~sschmitt/tunfoldv16docu.html
http://www.desy.de/~sschmitt/TUnfold/tunfold_manual_v17.3.pdf
We will also try to follow the TOP group's general recommendations on unfolding for top signatures:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/TopUnfolding
The unfolding code we use is based on the code snippets on this page:
https://twiki.cern.ch/twiki/bin/view/CMS/TopUnfoldingExampleCodes
For more information about the TUnfold algorithm, have a look at:
The class TUnfold can be used to unfold measured data spectra and obtain the underlying "true" distribution. The unfolding method is summarized here:
The measured spectrum %$\vec{y}$% can be expressed by the true spectrum %$\vec{x}$% multiplied by a smearing matrix %$S$%. This matrix accounts for the migration of events from one bin into another due to resolution effects, as well as for the different acceptances of the different bins:
%$\vec{y} = S\,\vec{x}$%
By performing a regularized inversion of the matrix %$S$%, TUnfold gives an estimate of the true spectrum %$\vec{x}$% that accounts for the effects mentioned above. TUnfold can also take care of the proper subtraction of background contributions, with proper handling of the uncertainties on the background estimate. We use TUnfoldSys, which provides methods for systematic error propagation and for unfolding with background subtraction.
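Below is a minimal pure-Python sketch of how a smearing matrix like S can be filled from simulated (generated, reconstructed) pairs and then used to fold a true spectrum. It assumes that events failing the selection are passed with a reconstructed value of None. Such events enter only the denominator, so S also includes the acceptance. The function names are hypothetical. In the actual analysis, the migration matrix is a ROOT histogram handed to TUnfoldSys.

```python
from bisect import bisect_right
from typing import Iterable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def find_bin(edges: Sequence[float], x: float) -> Optional[int]:
    """Index of the bin [edges[k], edges[k+1]) containing x, or None if outside."""
    if x < edges[0] or x >= edges[-1]:
        return None
    return bisect_right(edges, x) - 1


def smearing_matrix(events: Iterable[Tuple[float, Optional[float]]],
                    gen_edges: Sequence[float], rec_edges: Sequence[float]) -> Matrix:
    """S[i][j] = fraction of events generated in bin j that are reconstructed in bin i.

    rec_value is None for events failing the selection: they count in the
    denominator only, so the columns of S also include the acceptance.
    """
    n_gen, n_rec = len(gen_edges) - 1, len(rec_edges) - 1
    counts = [[0.0] * n_gen for _ in range(n_rec)]
    gen_totals = [0.0] * n_gen
    for gen_value, rec_value in events:
        j = find_bin(gen_edges, gen_value)
        if j is None:
            continue  # generated outside the unfolded range
        gen_totals[j] += 1.0
        i = None if rec_value is None else find_bin(rec_edges, rec_value)
        if i is not None:
            counts[i][j] += 1.0
    return [[counts[i][j] / gen_totals[j] if gen_totals[j] > 0 else 0.0
             for j in range(n_gen)] for i in range(n_rec)]


def fold(s: Matrix, x: Sequence[float]) -> List[float]:
    """Expected measured spectrum y = S x."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in s]
```

Folding a known true spectrum and comparing it with the reconstructed-level MC histogram is a quick consistency check of the matrix before inverting it.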
TUnfoldSys uses a regularization parameter %$\tau$%, which sets the strength of the regularization; it will be roughly of order 1e-4. We use this suggested value. In principle, %$\tau$% can instead be determined by unfolding with many different values (e.g. between 1e-3 and 1e-7) and choosing the one that minimizes the average global correlation coefficient returned by tunfold.GetRhoAvg().
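To illustrate what the regularization and the GetRhoAvg()-based scan do, here is a simplified pure-Python stand-in. It is not the TUnfold API. It minimizes %$(\vec{y}-S\vec{x})^T V^{-1} (\vec{y}-S\vec{x}) + \tau^2 \vec{x}^T L^T L \vec{x}$% under the following assumptions:

- a Poisson covariance V = diag(y),
- a regularization matrix L that defaults to the identity (regularization of the size),
- no area constraint and no background subtraction.

The average global correlation is the mean of %$\rho_i = \sqrt{1 - 1/(V_{ii}(V^{-1})_{ii})}$% over the unfolded covariance. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (fine for a few dozen bins)."""
    n = len(a)
    aug = [[float(v) for v in row] + unit for row, unit in zip(a, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def unfold_tikhonov(s: Matrix, y: Sequence[float], tau: float,
                    reg: Optional[Matrix] = None) -> Tuple[List[float], Matrix]:
    """Minimise (y - S x)^T V^-1 (y - S x) + tau^2 (L x)^T (L x) with V = diag(y).

    Assumes every measured bin y_i > 0. Returns (x_hat, covariance of x_hat).
    """
    n_gen = len(s[0])
    reg = reg if reg is not None else identity(n_gen)  # size regularisation
    v_y = [[float(y[i]) if i == j else 0.0 for j in range(len(y))] for i in range(len(y))]
    st_vinv = matmul(transpose(s), inverse(v_y))
    ltl = matmul(transpose(reg), reg)
    lhs = [[a + tau ** 2 * b for a, b in zip(ra, rb)]
           for ra, rb in zip(matmul(st_vinv, s), ltl)]
    m = matmul(inverse(lhs), st_vinv)  # x_hat = M y
    x_hat = [sum(m_ij * y_j for m_ij, y_j in zip(row, y)) for row in m]
    cov_x = matmul(matmul(m, v_y), transpose(m))  # statistical covariance of x_hat
    return x_hat, cov_x


def rho_avg(cov: Matrix) -> float:
    """Average global correlation: mean of sqrt(1 - 1/(V_ii * (V^-1)_ii))."""
    inv = inverse(cov)
    rhos = [math.sqrt(max(0.0, 1.0 - 1.0 / (cov[i][i] * inv[i][i])))
            for i in range(len(cov))]
    return sum(rhos) / len(rhos)


def scan_tau(s: Matrix, y: Sequence[float], taus: Sequence[float]) -> float:
    """Return the tau (from the given list) with the smallest average global correlation."""
    return min(taus, key=lambda t: rho_avg(unfold_tikhonov(s, y, t)[1]))
```

For a square, invertible S and %$\tau = 0$%, this reduces to the plain inversion %$S^{-1}\vec{y}$%. Increasing %$\tau$% pulls the solution towards zero (for size regularization) and damps bin-to-bin oscillations, at the cost of a bias.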
UPDATES:
- NEW WORKING DIRECTORY: /afs/cern.ch/work/c/cmantill/public/CMSSW_5_3_22/src/UserCode/TopMassSecVtx/
22 July, 2015: Unfolding
I obtained the stability, purity and efficiency of the binning as a closure test.
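Here is a minimal sketch of these three quantities for the case of identical generator- and reconstruction-level binning. It uses definitions that are common in top-quark differential measurements; these are an assumption, since the page does not spell them out:

- Purity is the fraction of events reconstructed in bin i that were also generated in bin i.
- Stability is the fraction of selected events generated in bin i that are also reconstructed in bin i.
- Efficiency is the fraction of events generated in bin i that pass the selection.

A reconstructed value of None marks an event that fails the selection. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, List, Optional, Sequence, Tuple


def find_bin(edges: Sequence[float], x: Optional[float]) -> Optional[int]:
    """Bin index for x in [edges[0], edges[-1]), None if outside or missing."""
    if x is None or x < edges[0] or x >= edges[-1]:
        return None
    return bisect_right(edges, x) - 1


def _ratio(num: Sequence[int], den: Sequence[int]) -> List[float]:
    return [n / d if d else 0.0 for n, d in zip(num, den)]


def purity_stability_efficiency(events: Iterable[Tuple[float, Optional[float]]],
                                edges: Sequence[float]):
    """Return (purity, stability, efficiency) per bin for identical gen/rec binning."""
    n = len(edges) - 1
    same_bin = [0] * n    # generated and reconstructed in the same bin
    n_rec = [0] * n       # reconstructed in bin i
    n_gen_sel = [0] * n   # generated in bin j and passing the selection
    n_gen = [0] * n       # generated in bin j
    for gen_value, rec_value in events:
        i = find_bin(edges, rec_value)
        j = find_bin(edges, gen_value)
        if i is not None:
            n_rec[i] += 1
        if j is None:
            continue
        n_gen[j] += 1
        if rec_value is not None:
            n_gen_sel[j] += 1
            if i == j:
                same_bin[j] += 1
    return _ratio(same_bin, n_rec), _ratio(same_bin, n_gen_sel), _ratio(n_gen_sel, n_gen)
```

With a reconstructed binning that is finer than the generated one (as with bins_rec below), the reconstructed values would first be mapped onto the generator-level bins before computing purity and stability.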
Now I am having problems defining a correct binning for the unfolding, since the first and last bins also contain the underflow and overflow, respectively. So I am doing several tests of the binning, using just one of the variables:
In the end I compare the unfolded distributions obtained for each of the cases. I also compare the purity, stability and efficiency, and also the
Here I describe how I choose the bins:
The purpose of using quantiles is to choose the bins in such a way that all bins in the histograms contain the same number of events. This increases the stability of the method. Depending on the case, flattening the truth spectrum after selection might be better than flattening it before selection.
- First, I obtained the quantiles for the generated and reconstructed distributions (19 and 38, respectively) and defined the binning with them, e.g.:
bins_gen = [22.72,25.45,28.18,30.92,33.67,36.42,39.17,42.01,44.89,47.77,50.94,55.06,59.19,63.63,68.15,74.76,83.59,95,115.65]
len (bins_gen) = 19
bins_rec = [21.36,22.73,24.09,25.45,26.82,28.18,29.55,30.92,32.29,33.67,35.04,36.41,37.79,39.17,40.57,42.01,43.45,44.89,46.33,47.77,49.21,50.93,53,55.06,57.12,59.18,61.37,63.63,65.89,68.15,70.73,74.76,78.74,83.59,88.75,95,102.85,115.65,138.46]
len (bins_rec) = 39
The bold bins are the extra bins at reconstructed level.
Choosing the binning like this without any modification can cause the underflow to end up in the first bin and the overflow in the last bin.
- Now, with an extra lower edge (20) and upper edge (250) added to bins_rec:
bins_gen = [22.72,25.45,28.18,30.92,33.67,36.42,39.17,42.01,44.89,47.77,50.94,55.06,59.19,63.63,68.15,74.76,83.59,95,115.65]
bins_rec = [20,21.36,22.73,24.09,25.45,26.82,28.18,29.55,30.92,32.29,33.67,35.04,36.41,37.79,39.17,40.57,42.01,43.45,44.89,46.33,47.77,49.21,50.93,53,55.06,57.12,59.18,61.37,63.63,65.89,68.15,70.73,74.76,78.74,83.59,88.75,95,102.85,115.65,138.46,250]
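A small sketch of this modification follows. It first counts how many entries fall outside the quantile-based edges; these are the entries that would otherwise end up in the first or last bin. It then adds an outer lower and upper edge, such as the 20 and 250 used for bins_rec. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def outside_counts(values: Sequence[float], edges: Sequence[float]) -> Tuple[int, int]:
    """Entries below the first edge (underflow) and at/above the last edge (overflow)."""
    underflow = sum(1 for v in values if v < edges[0])
    overflow = sum(1 for v in values if v >= edges[-1])
    return underflow, overflow


def pad_edges(edges: Sequence[float], low: float, high: float) -> List[float]:
    """Add an outer lower and upper edge so that the tails get bins of their own."""
    if not (low < edges[0] and high > edges[-1]):
        raise ValueError("padding edges must lie outside the current range")
    return [low] + list(edges) + [high]


if __name__ == "__main__":
    # Toy values and truncated edges, for illustration only
    values = [20.5, 22.0, 60.0, 140.0]
    edges = [21.36, 22.73, 138.46]
    print(outside_counts(values, edges), outside_counts(values, pad_edges(edges, 20.0, 250.0)))
```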
For ptpos:
For all the mass samples:
16 July, 2015: Calculating Mellin moments for each of the mass samples
The Mellin moments
13 July, 2015: Working with mass samples
From here on we will work with mass samples corresponding to %$m_t$% = [166.5, 169.5, 171.5, 173.5, 175.5, 178.5] GeV.
The mass samples are located in
/store/cmst3/group/top/summer2015/treedir_bbbcb36/ttbar/mass_scan/
For ttbar they are named MC8TeV_TTJets_MSDecays_*.root, where * is the mass. In addition, since tW/tbarW is the main background, I will also use:
/store/cmst3/group/top/summer2015/bbbcb36/mass_scan/MC8TeV_SingleTbar_tW_*.root
/store/cmst3/group/top/summer2015/bbbcb36/mass_scan/MC8TeV_SingleT_tW_*.root
09 July, 2015: Rebinning distributions according to quantiles
To check the binning array I was using, I computed the quantiles separately and then rounded those numbers a bit. This gives the same bin size over a long range and somewhat bigger bins towards the end. Then I hardcoded them by adding that array to my code. In the end I got this:
In some cases there are events in the 0 pT bin. These could be events with only one lepton.
I checked the event selection, and there is indeed a cut requiring abs(EvCat) to be either 11*11, 11*13, or 13*13. However, some events in the MC samples still populate the 0 pT bin, so we impose the condition:
if not isData:
    # skip MC events in which one of the generator-level leptons is missing (pT = 0)
    if tree.GenLpPt == 0 or tree.GenLmPt == 0: continue
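Here is a minimal sketch combining both requirements into one hypothetical selection function. It assumes that EvCat is the product of the PDG IDs of the two leptons (11 = electron, 13 = muon), which is what the values 11*11, 11*13 and 13*13 suggest. The branch names EvCat, GenLpPt and GenLmPt are taken from the condition above. A SimpleNamespace stands in for a tree entry.

```python
from types import SimpleNamespace

# Allowed |EvCat| values, assuming EvCat is the product of the two lepton PDG IDs
CHANNELS = {11 * 11: "ee", 11 * 13: "emu", 13 * 13: "mumu"}


def select_event(tree, is_data: bool):
    """Return the dilepton channel of the event, or None if it is rejected."""
    channel = CHANNELS.get(abs(tree.EvCat))
    if channel is None:
        return None
    # In MC, skip events where a generator-level lepton is missing (pT == 0)
    if not is_data and (tree.GenLpPt == 0 or tree.GenLmPt == 0):
        return None
    return channel


if __name__ == "__main__":
    # Stand-in for one entry of the ROOT tree (hypothetical values)
    event = SimpleNamespace(EvCat=-11 * 13, GenLpPt=42.0, GenLmPt=0.0)
    print(select_event(event, is_data=False), select_event(event, is_data=True))
```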
Now the plots look like this:
08 July, 2015: Rebinning distributions according to quantiles
I found a bug in the way I was calculating the quantiles; now the distributions look like this for ptpos:
All the bins except the first and the last look ok (they have more or less the same number of entries), but those two are wrong. I added one more quantile by adding 1.0 to my array.
07 July, 2015: Rebinning distributions according to quantiles
We have started to study the unfolding procedure.
We are going to define the binning scheme for our histograms, such that we get flat distributions to unfold.
This is the idea:
The ROOT function TH1::GetQuantiles returns the values of x that divide the distribution into the quantiles you define.
You can use it to redefine the binning of the distribution, so that you know the statistics will be distributed according to your predefined quantiles.
The procedure is:
- Run once with the histograms defined with a large range and equal binning, and get the quantiles for [0.1, 0.2, 0.3, ..., 0.9, 1]
- Run again with the histograms (and error matrix) defined according to the quantiles found, i.e. [0, x_{0.1}, x_{0.2}, ..., x_{0.9}, x_{1}, max]
- Unfold using the new binning definition
- At first I wrote code that obtained the quantiles from each of the MC and data samples and rebinned the corresponding histograms according to each sample's own quantiles.
- But what was wanted was to get the quantiles from one of the DataMUEG files and fix the binning of all the plots using them. I did this, but then I got too many events in the last bin.
I extended the bin range up to 200 without changing the binning. As a result, there is now one big bin from 100 to 200 that contains all the events in that range.
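The quantile step can be sketched without ROOT as follows. TH1::GetQuantiles works on an already-filled histogram. In contrast, this hypothetical helper takes the unbinned values and interpolates linearly between sorted values. The second function fills the resulting bins, keeping the maximum in the last bin, to check that they are flat.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Edges [x_min, x_{1/n}, ..., x_{(n-1)/n}, x_max] giving ~equally populated bins.

    Quantiles use linear interpolation between sorted values.
    """
    if n_bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    data = sorted(values)
    edges = [data[0]]
    for k in range(1, n_bins):
        pos = k / n_bins * (len(data) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(data) - 1)
        edges.append(data[lo] + (pos - lo) * (data[hi] - data[lo]))
    edges.append(data[-1])
    return edges


def counts_per_bin(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Fill the bins; the maximum value is kept in the last bin instead of overflow."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            counts[min(bisect_right(edges, v) - 1, len(counts) - 1)] += 1
    return counts
```

If many entries share the same value (e.g. a spike at 0), neighbouring quantiles can coincide and produce zero-width bins. In that case the edges need rounding or merging by hand.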
Note: Github useful tips
wget -q -O - --no-check-certificate https://raw.github.com/stiegerb/TopMassSecVtx/master/TAGS.txt | sh
git clone git@github.com:stiegerb/TopMassSecVtx.git UserCode/TopMassSecVtx
git status
cp ~/.gitconfig ~/.gitconfig.orig
cp /afs/cern.ch/user/s/stiegerb/public/forCarlotta/.gitconfig ~/
git df
git br
git remote -v
git remote add cmantill git@github.com:cmantill/TopMassSecVtx.git
git remote -v
git checkout -b mtdilepton
git add scripts/runDileptonUnfolding.py
git add scripts/utils.py
git commit -m'Cristinas first commit'
git l
git push cmantill mtdilepton
history | grep git > githistory
Note:
Getting a CERN Kerberos ticket on my laptop: I connected my computer to lxplus using OpenAFS. In order to access your CERN AFS account you'll need to obtain an AFS token from the CERN server. I had already installed the Kerberos packages, and I followed the instructions from these sites:
http://linux.web.cern.ch/linux/docs/kerberos-access.shtml
https://gist.github.com/KFubuki/10728230
- Here I also found some very useful CERN hacks:
https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToWorkInCmsEnv
After installing and setting it up, you should create a ticket and log on:
kinit username@CERN.CH
aklog
You can also test your access with:
klist
ls /afs/cern.ch/
ls /afs/cern.ch/user/c/cmantill/
Now you can work directly from your computer.
Note:
Requesting a workspace on lxplus: locally at CERN, the personal working area on CERN's LXPLUS cluster isn't big enough to handle the output files, so you'll need to write them to a larger-capacity area.
To ask for an "AFS workspace" (up to 100 GB, backed up), log in to the CERN account web page and go to "List Services", choose "AFS Workspaces" and then "Settings".
There you can ask for a workspace in AFS, as well as extend the quota of your backed-up home (up to 10 GB). Please note the different AFS path to your workspace: /afs/cern.ch/work/initial/username, where initial is the first letter of your username, i.e. the workspace is not located under your home directory.
Workspace path:
/afs/cern.ch/work/c/cmantill
30 June, 2015: Generating first plots
It turns out I was submitting 0 jobs before because the input directory should be this one:
input directory with the files: /store/cmst3/group/top/summer2015/treedir_bbbcb36/ttbar
I got the plots for the other four distributions. I modified scripts/runDileptonUnfolding.py and got a new version. Here I attach some plots:
The notation I use in my code is the following:
- ptpos - Pt of the positive lepton
- ptll - Pt of the charged lepton pair
- mll - Invariant mass of the charged lepton pair
- EposEm - Energy sum of the 2 leptons
- ptposptm - Pt sum of the 2 leptons
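Here is a minimal sketch of how the five observables can be computed from the (pT, eta, phi, mass) of the positive and negative lepton. The input format and function names are assumptions; the actual trees store their own branches. At these energies the lepton mass can be set to zero.

```python
import math
from typing import Dict, Tuple

Lepton = Tuple[float, float, float, float]  # (pt [GeV], eta, phi, mass [GeV])


def to_cartesian(pt: float, eta: float, phi: float, mass: float = 0.0):
    """Convert (pt, eta, phi, m) to (E, px, py, pz)."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return math.sqrt(px ** 2 + py ** 2 + pz ** 2 + mass ** 2), px, py, pz


def dilepton_observables(lep_pos: Lepton, lep_neg: Lepton) -> Dict[str, float]:
    """The five observables listed above, from the positive and negative lepton."""
    e1, px1, py1, pz1 = to_cartesian(*lep_pos)
    e2, px2, py2, pz2 = to_cartesian(*lep_neg)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    return {
        "ptpos": lep_pos[0],                      # pT of the positive lepton
        "ptll": math.hypot(px, py),               # pT of the lepton pair
        "mll": math.sqrt(max(0.0, e ** 2 - px ** 2 - py ** 2 - pz ** 2)),
        "EposEm": e,                              # energy sum of the two leptons
        "ptposptm": lep_pos[0] + lep_neg[0],      # scalar pT sum of the two leptons
    }
```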
29 June, 2015: Reading the twiki page and starting to work
The twiki page:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/CMGTopStudents2015
Working directory: /afs/cern.ch/user/c/cmantill/private/top/CMSSW_5_3_22/src/UserCode/TopMassSecVtx
...Processing all MC8TeV and Data8TeV
...
>>> Produced xsec weights and wrote to cache (.xsecweights.pck)
The normalization is computed as a weight
%$w = \sigma / N_{gen}$%
where %$\sigma$% is the theoretical cross section of a process and %$N_{gen}$% is the number of generated events. The integrated luminosity %$L$% over a certain time interval is the ratio of the number of events detected in that interval, %$N$%, to the interaction cross section %$\sigma$%, i.e. %$L = N / \sigma$%. With this definition, the number of expected events after acquiring a given integrated luminosity %$L$% is given by
%$N_{exp} = L \, \sigma \, N_{sel} / N_{gen} = L \, w \, N_{sel}$%
where %$N_{sel}$% is the number of selected events in the analysis.
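Here is a short sketch of the normalization arithmetic. It assumes the standard definitions w = sigma/N_gen and N_exp = L * sigma * N_sel / N_gen, with sigma in pb and L in pb^-1 so that the units cancel. The numbers in the example are illustrative only, and the function names are hypothetical.

```python
def xsec_weight(xsec_pb: float, n_generated: int) -> float:
    """Per-event normalisation weight w = sigma / N_gen (in pb)."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return xsec_pb / n_generated


def expected_events(lumi_inv_pb: float, xsec_pb: float, n_generated: int,
                    n_selected: float) -> float:
    """N_exp = L * sigma * N_sel / N_gen = L * w * N_sel (L in pb^-1, sigma in pb)."""
    return lumi_inv_pb * xsec_weight(xsec_pb, n_generated) * n_selected


if __name__ == "__main__":
    # Illustrative numbers only: 10 pb process, 1000 generated, 50 selected, 100 pb^-1
    print(expected_events(100.0, 10.0, 1000, 50))
```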
- Creating the reconstructed and generator level distributions:
python scripts/runDileptonUnfolding.py -i /store/cmst3/group/top/summer2015/treedir_bbbcb36/ -o unfoldResults/ --jobs 8
input directory with the files: /store/cmst3/group/top/summer2015/treedir_bbbcb36/
--------------------------------------------------------------------------------
Creating ROOT file with migration matrices, data and background distributions from /store/cmst3/group/top/summer2015/treedir_bbbcb36/singlet/
Discarded 0 files duplicated in cmsLs output
Submitting jobs in 8 threads
Histograms saved in unfoldResults/Data8TeV_SingleElectron2012C.root
...
--------------------------------------------------------------------------------
I obtained the distributions at generated and reconstructed level for %$p_T(l^{+})$% (ptpos).
I can subtract the background from the data, and unfold the result by running
python scripts/runDileptonUnfolding.py -r unfoldResults/plots/plotter.root -v ptpos -o unfoldResults
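The background subtraction step can be sketched bin by bin as follows. The sketch assumes uncorrelated statistical uncertainties added in quadrature and an optional scale factor for the background prediction. It is a simplified illustration with hypothetical names. TUnfoldSys itself subtracts the background as part of the unfolding and can also propagate a correlated normalization uncertainty, which this sketch does not do.

```python
import math
from typing import List, Sequence, Tuple


def subtract_background(data: Sequence[float], data_err: Sequence[float],
                        bkg: Sequence[float], bkg_err: Sequence[float],
                        scale: float = 1.0) -> Tuple[List[float], List[float]]:
    """Bin-by-bin data - scale * bkg, with uncorrelated errors added in quadrature."""
    if not (len(data) == len(data_err) == len(bkg) == len(bkg_err)):
        raise ValueError("all inputs must have the same number of bins")
    values = [d - scale * b for d, b in zip(data, bkg)]
    errors = [math.hypot(de, scale * be) for de, be in zip(data_err, bkg_err)]
    return values, errors
```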
Note: useful EOS commands:
eoscms ls -l /eos/cms/store/cmst3/group/top/summer2015
TFile::Open("root://eoscms.cern.ch//eos/cms/store/cmst3/group/top/summer2015/ttbar/treedir_bbbcb36/ttbar/MC8TeV_ZZ.root")