CMS DAS 2015 @ BARI : Muon exercise
%COMPLETE5%
Contacts
Overview
This exercise has the goal to familiarize you with the basic muon observables and the muon identification and isolation in
CMS.
It starts with a presentation describing the muon reconstruction and identification in
CMS and then some interactive exercises are being performed to demonstrate the details.
The first interactive part of this exercise is to learn to use muons, muon discriminators and muon isolation and to be able to choose the best recipe depending on the analysis you want to perform. Then there is a short exercise on the muon mumentum calibration and scale corrections that demonstrates how the muon scale can be improved by applying them.
Prerequisites
- ROOT.
- Python knowledge is helpful but python is easy enough so that you can learn it on the fly.
Miscellaneous Notes
The color scheme of the Exercise is the following:
- Shell commands will be embedded in grey box, e.g.
cmsrel CMSSW_5_3_23
- Output and screen printouts will be embedded in green box, e.g.
"Selection:detIso_0.175 Signal (%):0.941713788604 background (%):0.177871148459"
- General python code will be embedded in red box, e.g.
self.declareHisto(nameOfHisto,nbins,min,max,x axis title)
- Quizzes and important messages will be highlighted in green
Introduction
CMS aims to search for new physics/particles at the TeV scale and to perform Standard Model precision measurements. Many of the decay products of the particles produces at LHC are leptonic.
Muons have a clean signature at hadron colliders. Therefore one of the main tasks of the
CMS detector is to identify and measure them precisely.
Slides:
ToDo: upload the slides here
Analysis Framework and Tools
The framework available uses FWLite in python to build a modular and interactive analysis running in default
PAT tuples created from a Double Muon triggered sample.
FWLite in python loads all CMSSW objects in python and you can access them exactly as you would in C++.
Creating the Working Area
The commands below setup an area that includes the necessary software used for the exercise by downloading them from the
GitHub
repository.
No working GitHub account should be needed to download these programs.
Anyhow (as this is useful to collaborate to the
CMS software in general) we put here links to relevant to the setup of suh an account:
1
2
.
PLEASE DO NOT COPY AND PASTE ALL THE COMMANDS IN ONE GO BUT INSTEAD EXECUTE THEM LINE BY LINE.
ssh -Y cmsdas01.ba.infn.it
cd YOUR_WORKING_AREA
export SCRAM_ARCH='slc6_amd64_gcc472'
cmsrel CMSSW_5_3_23
cd CMSSW_5_3_23/src
cmsenv
git init
git remote add origin https://github.com/pietverwilligen/Schools.git
git fetch origin
git checkout Students
mv CMSDAS_2015_Bari/* .
rm -rf CMSDAS_2015_Bari
scram b -j 32
cd CMSDAS/Muons/run
If git just doesn't work use this alternative commands to download the package (cancel the CMSSW_5_3_23 directory first):
ssh -Y cmsdas01.ba.infn.it
cd YOUR_WORKING_AREA
export SCRAM_ARCH='slc6_amd64_gcc472'
cmsrel CMSSW_5_3_23
cd CMSSW_5_3_23/src
cmsenv
cp -r /data/shared/Short_Exercise_Muon/CMSDAS_2015_Bari/* .
scram b -j 32
cd CMSDAS/Muons/run
Part 0 : The main analyzer
The main analyzer, containing the functions used in the exercise and the event loop, is located in:
CMSSW_5_3_23/src/CMSDAS/Muons/python/Analyzer.py
The input file is defined at line 150 of the analyzer:
'/data/shared/Short_Exercise_Muon/input2.root'
It contains 144583 DATA events recorded with a dimuon trigger.
The main analyzer runs, by default, over all the events of the file.
It loops on the events and splits the dimuon pairs in two mutually exclusive samples:
- One sample containing opposite sign (OS) di-muons with an invariant mass cut around the Z peak and with the first muon well isolated and identified [signal selection]
if mu1.isTightMuon(self.vertex) and \ # muon 1 is Tight
mu1.pt()>20 and mu2.pt()>20 and \ # muon 1 pT > 20 GeV and muon 2 pT > 20 GeV
mu1.chargedHadronIso()<0.15 and \ # muon 1 is tracker isolated
mu1.charge()+mu2.charge() ==0 and \ # muon 1 and muon 2 have opposite sign
(mu1.p4()+mu2.p4()).M()>80 and (mu1.p4()+mu2.p4()).M()<120. : # dimuon mass > 80 GeV and < 120 GeV
self.signal=self.signal+1 # if the dimuon passes the cuts is classified as signal candidate
- The other sample containing same sign (SS) di-muons with the first muon well anti-isolated [background selection]
if mu1.pt()>20 and mu2.pt()>20 \ # muon 1 pT > 20 GeV and muon 2 pT > 20 GeV
and mu1.chargedHadronIso()>5.0 \ # muon 1 is tracker anti-isolated (or not isolated)
and (mu1.p4()+mu2.p4()).M()<80 \
and mu1.charge()+mu2.charge() !=0 : # muon 1 and muon 2 have same sign
self.background=self.background+1 # if the dimuon passes the cuts is classified as background candidate
Part I: Basic Muon Variables
The aim of this part of the excercise is to get you familiar with the basic kinematics variables of the muons and of the di-muon system.
As the study is performed on real data selected by a double muon trigger during the past LHC run, this gives you a feeling of how the collected data looks like (for the signal and background selections).
Plot some basic muon variables
We will use
studyKin.py
to plot and study some basic muon observables. Inside the analyzer we have created some abstract functions that you need to specify in
studyKin.py
. For example to fill plots after the signal/background selection but before any other cut (see Part II of this exercise) we have created
signPlotBeforeCut
and
backPlotBeforeCut
. Open
CMSSW_5_3_23/src//CMSDAS/Muons/python/Analyzer.py
and go to line 58 and 100 to see the code:
if self.signPlotBeforeCut is not None:
self.signPlotBeforeCut(mu2.pt(), mu2.eta(), mu2.phi(), mu2.charge(), (mu1.p4()+mu2.p4()).M(), (mu1.p4()+mu2.p4()).Pt())
...
if self.backPlotBeforeCut is not None:
self.backPlotBeforeCut(mu2.pt(), mu2.eta(), mu2.phi(), mu2.charge(), (mu1.p4()+mu2.p4()).M(), (mu1.p4()+mu2.p4()).Pt())
As arguments of these two functions we have given the pt, eta, phi and charge of the muon under study (the second loosely identified muon) and the invariant mass and pt of the dimuon pair. Now go to
CMSSW_5_3_23/src/CMSDAS/Muons/run/studyKin.py
open it and study the code. You see that plots are declared, such as:
signMuPtBeforeCut = ROOT.TH1F("signMuPtBeforeCut","",100, 0,100)
and that the function (to fill the plot with a particular variable) has been defined as:
def fillSignalPlotsBeforeCut(mu2Pt, mu2Eta, mu2Phi, mu2Charge, dimuMass, dimuPt):
signMuPtBeforeCut.Fill(mu2Pt)
which as arguments take exactly the arguments of the abstract function
signPlotBeforeCut
in
Analyzer.py
that fills exactly this plot. Note that this newly defined function is then assigned to the abstract function:
analyzer.signPlotBeforeCut = fillSignalPlotsBeforeCut
The code is similar for plotting a muon variable in the case of background events.
To run the script you can exit the editor (or open another shell) and type:
ipython studyKin.py
It will take a bit less than 3 minutes to run over the full statistics. You will make a plot showing the pt distribution of both the signal sample and the background sample. You can do the same for both the muons obtaining two plots like these:
Quizzes on basic muon variables
1. Why is there no information for the region 0-20 GeV?
Now change
studyKin.py
such that you plot also the eta and phi distribution of the two muons and the mass of the dimuon system.
2. What can you say about the signal sample, what physics process dominates there? Why are there almost no signal events at higher pt (50-100GeV)? Are the dimuon resonances mainly boosted or not?
Note that you can give as argument of the abstract function (
signPlotBeforeCut
in
Analyzer.py
) all available PAT Muon variables that are kept for the muons in the ROOT file. Keep in mind that if your implementation of this function (inside
studyKin.py
) uses any PAT Muon variable, then you have to make sure it is implemented as a variable of this
signPlotBeforeCut
function too. Those two functions need to have the same argument list.
3. BONUS QUESTION: Based on what just said, you can try to plot some of the indetification or isolation variables using the same approach shown in this section.
All the quantities are available in the RECO or PAT muon data formats. As a suggestion, try to plot the PF isolation variable as
defined below for the signal sample before cuts.
You could easily do that by adding the whole mu2 object into
signPlotBeforeCut
and
backPlotBeforeCut
and defining the isolation at plotting level.
What do you see? Do you think that the isolation discriminates well between signal and background? Do you think that you would already have sifficent information on hot to set cuts on this variable?
Part II : Muon ID & Isolation
This section is intended to show you how different identificaction variables, such as the quality of the muon track and its proximity to the primary vertex of an interaction, can be used to further enhance signal sample selection and reject reconstructed muons from other sources.
Along the excercise a large use of different muon identification criteria will be used, they were optimised for different scenarios and to satisfy different "use cases".
Before applying them, have a quick look at
SWGuideMuonID to get familiar with what they do.
Performance of muon ID
Open the
CMSSW_5_3_23/src/CMSDAS/Muons/run/studyID.py
file and have a look at it .
The
studyID.py
analyzer allows you to define a custom selection and to see its effect (i.e. the efficiency) on the signal and background sample.
With this information we can define a figure of merit to evaluate the performance of different selection requirements.
All you need to do is to add a function for each selection you want to have, by cloning the example function which requires the Soft ID. Please note that boolean variables are defined in the PAT muon data format for each set of selections (loose, soft and tight ID's) as explained in
SWGuideMuonID:
def softID(muon,vertex):
return muon.isSoftMuon(vertex)
The main analyzer assumes the function will take as input the muon and its associated vertex (to compute the impact parameters). You can then write any code you want inside the function.
To apply the selection through the associated function you need to add a line as this in
studyID.py
:
analyzer.addSelection('softMuonSelection',softID,20,ROOT.kRed)
where the first argument (
'softMuonSelection'
) is the name you choose for the ID, the second argument (
softID
) is the name of your newly created function,
the third argument defined the marker style (20 = dots) which will be used in the plot and the last argument is the marker color in ROOT style (ie
ROOT.kRed
,
ROOT.kGreen
,
ROOT.kMagenta
etc etc)
To add this line ensures that the selection is applied and you see the corresponding result on the plots.
To run the script you can exit the editor (or open another shell) and type:
ipython studyID.py
it will take a bit less than 3 minutes to run over the full statistics.
The command will print on the screen the efficiency for signal and background and will add a point in the plot. The print out will look like:
Selection:softMuonSelection Signal (%):0.927794693345 background (%):0.358543417367
implying that the selection under test (Soft ID) is 92.7% efficient for signal and preserves only 35.8% of the background.
You should then obtain a plot similar to this
By defining additional functions you can evaluate the performance of the different selection requirements, as requested below.
Part II Quizzes on Performance of muon ID
1. Using the techniques explained above, compare the Soft, Loose and Tight muon ID defined in SWGuideMuonID.
These IDs are already implemented in PAT as boolean variables so you dont need to apply all the cuts, you can directly use the functions:
def tightID(muon,vertex):
return muon.isTightMuon(vertex)
def looseID(muon,vertex):
return muon.isLooseMuon()
Remember to add the selection to the analyzer
analyzer.addSelection('tightMuonSelection',tightID,20,ROOT.kBlack)
analyzer.addSelection('looseMuonSelection',looseID,20,ROOT.kBlue)
You should obtain a plot similar to this (in which red is soft, blue is loose and black is tight)
What can you conclude?
Performance of muon isolation
We want now to evaluate the performance of the relative muon isolation,
defined as the vectorial sum of the particles contained in a cone of a given size built around the muon, divided by the muon pT.
In principle we could use the same approach as for the muon ID but since the isolation is a continuous parameter
and we need to define several working points (i.e. cuts) is better to use another combination of python functions, defined in a smarter way.
First of all we need to define a function that returns the isolation value. The macro
studyID.py
already contains the definition of the detector-based isolation:
def detIso(muon,vertex):
return ( muon.isolationR03().sumPt + muon.isolationR03().emEt + muon.isolationR03().hadEt )/muon.pt()
We then need to create a function which runs multiple selections, as of the example also available in
studyID.py
:
analyzer.addMultipleSelection('detIso',detIso,20,ROOT.kGreen,20,0,0.5)
This function is similar to
addSelection
defined above but it accepts 3 new parameters at the end.
These correspond to the number of isolation working points (20 in the example), evenly defined in the given range ([0;0.5] in the example),
for which the isolation performance will be evaluated. The curve which will be obtained in this way is called a
ROC curve
.
If you run the script with the already defined detector based isolation you get an output of the form: ...
Selection:detIso_0.175 Signal (%):0.941713788604 background (%):0.177871148459
Selection:detIso_0.25 Signal (%):0.975641583297 background (%):0.198879551821
Selection:detIso_0.35 Signal (%):0.991300565463 background (%):0.210084033613
Selection:detIso_0.025 Signal (%):0.347107438017 background (%):0.0532212885154
Selection:detIso_0.275 Signal (%):0.981296215746 background (%):0.201680672269
...
You can see that the function
addMultipleSelection
prints a line for each working point (i.e. cut value) defined as above.
For instance
Selection:detIso_0.175 Signal (%):0.941713788604 background (%):0.177871148459
means that
the detector based isolation with an upper cut of 0.175 is 94.2% efficient for signal and preserves only 17.8% of the background.
You should obtain a plot similar to this (after a zoom on the x axis)
What can you conclude? How the isolation cuts reflects on the points?
By defining additional functions you can evaluate the performance of other isolation requirements, as requested in the quizzes below.
Part II Quizzes on Performance of muon isolation
2. Study the isolation options.
For the detector isolation you can access the track, ECAL and HCAL components in a cone of DeltaR=0.3 as following (specific methods are available to access the activity in a cone of radius of 0.3 or 0.4 in the eta-phi plane around a reconstructed muon):
muon.isolationR03().sumPt # ->track component
muon.isolationR03().emEt # -> ECAL component
muon.isolationR03().hadEt # ->HCAL component
Perform the comparisons of the relative isolation for:
- detector based track+ECAL+HCAL
- PF chargedHadrons+photons+neutral Hadrons
You can access the PF isolation as described in the SWGuideMuonID for a cone of 0.4 [default!].
It is worth knowing that these informations are also directly accessible in PAT with:
muon.chargedHadronIso() # ->charged component
muon.photonIso() # -> photon component
muon.neutralHadronIso() # ->neutralHadron component
muon.puChargedHadronIso() # ->pileup chargedHadron component: BEWARE if you want to use this, you must define the deltaBeta correction
and don't forget to add the corresponding addMultipleSelection line:
analyzer.addMultipleSelection('pfIso',pfIso,20,ROOT.kBlue,20,0,0.5)
You should obtain a plot similar to this (after a zoom on the x axis, blue is the Particle Flow isolation)
What can you conclude?
If you have time you could also try these combinations:
- detector based tracks vs. PF charged Hadrons
- PF chargedHadrons+photons+neutral Hadrons vs. DeltaBeta corrections
3. Assume that you want to search for a new particle that decays into muons.
The expected signal in a given luminosity is 10 events and the expected background is 500 events.
Given that the significance in absence of systematic errors is given by s/sqrt(b), find a combination of ID and isolation to maximize significance.
To calculate it you need to multiply your signal efficiency (%) and background rejection(%) by the number of expected events.
To make your life easier there is already a function that you can add at the very end of studyID.py
(or type in the ipython prompt after the event loop is finished)
analyzer.calculateSignificance(10,500)
which is defined in the main analyzer Analyzer.py
The command prints the significance for each of the defined selections.
BONUS QUESTIONS :
How much more luminosity you need to get 5 sigma?
Try to plot at least the PF relative isolation with/without
DeltaBeta corrections for the signal and background samples in the same way explained in the Part I.
HINT: define an helper function in the
Analyzer.py
script to return the isolation variables:
def pfIsoComp(self,mu1):
return (mu1.chargedHadronIso()+mu1.photonIso()+mu1.neutralHadronIso())/mu1.pt()
def pfIsoDBComp(self,mu1):
return (mu1.chargedHadronIso()+max(0,mu1.photonIso()+mu1.neutralHadronIso()-0.5*mu1.puChargedHadronIso()))/mu1.pt()
and update the
signPlotBeforeCut
and
backPlotBeforeCut
. You should obtain a plot like this:
What can you conclude?
The
addSelection
functions work on the
studyKin.py
script, if you have time you could even try to check the impact of the PF relative isolation
DeltaBeta corrected + Tight ID on the muon kinematical variables for both the signal and background events (such as the impact on the pT or the invariant mass distributions before and after the cut).
Part III : Muon scale corrections
Now the next part is dedicated to study and understand the effect of scale corrections.
Check the
studyScale.py
macro.
It declares two histograms and the method fillPlots fills the histograms given the corrected and uncorrected
TLorentzVectors of the muon pair.
The histograms are filled for golden di-muons. The correction is performed using the
MuscleFit package. The current defined histograms are filling the mass difference between the di-muon and the nominal PDG Z boson mass.
You can run the code (it will take a bit less than 3 minutes) with ipython:
ipython studyScale.py
Then in the python prompt check the mean of the distributions using
zDiffU.GetMean()
zDiffC.GetMean()
It is also possible to plot the two histograms overlayed on the same canvas with different colors and to print their mean automatically:
czDiff = ROOT.TCanvas("czDiff","czDiff",600,600)
legend=ROOT.TLegend(0.6,0.6,0.95,0.85)
legend.SetFillColor(0)
legend.SetLineColor(0)
legend.SetTextSize(0.02);
zDiffU.SetLineColor(1) # UNCORRECTED muons in BLACK
zDiffU.SetMarkerColor(1) # UNCORRECTED muons in BLACK
zDiffU.GetXaxis().SetTitle("m_{#mu#mu} - m_{#mu#mu}^{PDG} [GeV/c^{2}]")
width = zDiffU.GetBinWidth(1)
zDiffU.GetYaxis().SetTitle("Entries/" + str(round(width,2)) + " GeV/c^{2}")
zDiffU.GetXaxis().SetTitleOffset(1.7);
zDiffU.GetYaxis().SetTitleOffset(2.0);
zDiffU.GetYaxis().SetTitleSize(0.04);
zDiffU.GetXaxis().SetTitleSize(0.04);
zDiffU.Draw("histe1")
zDiffC.SetLineColor(2) # CORRECTED muons in RED
zDiffC.SetMarkerColor(2) # CORRECTED muons in RED
zDiffC.Draw("histsamee1")
legend.AddEntry(zDiffU, "Uncorrected, Mean: "+str(round(zDiffU.GetMean(),2))+"GeV/c^{2}", "pl")
legend.AddEntry(zDiffC, "Corrected, Mean: "+str(round(zDiffC.GetMean(),2))+"GeV/c^{2}", "pl")
legend.Draw("same")
czDiff.Update()
You should then obtain a plot similar to this
How does the mean of the histogram changes? Based on this, do you consider the muon correction as an improvement or not? Why?
Part III Quizzes
4. Is the mean of the histogram obtained with the corrected muons exactly at the nominal Z mass? Where you expecting it? Why?
5. Compare the width of the Z mass reconstructed with the uncorrected muon momenta with the width of the Z mass of the corrected muon momenta. What can you conclude about the momentum resolution of the muons?
-- PietVerwilligen - 2015-01-13