Link to the slides we'll go over as motivation for this exercise and the AN2012_422:
Newsbox |
---|
This Twiki page presents one of the long exercises prepared for 2015 CMS Data Analysis School, at LPC, FNAL, Jan. 12-16, 2015 and at Bari, Italy, Jan. 19-23, 2015 |
We assume the attendees to be familiar with basic C++ and Python and to have performed the pre-exercises: https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookExercisesCMSDataAnalysisSchool#PreExercises. |
This extended exercise is intended to familiarize you with two dimuon analysis efforts in CMS: Drell-Yan-to-dimuon and Z'-to-dimuon.
Briefly, the Drell-Yan signature is a very rich source for discoveries in the Standard Model. Precise measurements of the Drell-Yan distribution is an important part of doing SM and BSM physics at the LHC.
Briefly, many new models of physics predict a new narrow resonance, here referred to generically as Z', that decays into a pair of oppositely-charged muons. We look for a bump from Z' in the dimuon invariant mass spectrum using a shape-based search, looking for the difference in between the exponentially falling background and the resonance peak that would show up should a Z' exist.
In this exercise, you'll work on the material of the analysis to produce the dimuon mass spectra for data and MC.
In doing this exercise you can learn (among other things):
Introductory slides are here and the paper (insert links)
Since we are using miniAODs, https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2015.
Log into "cluster905.knu.ac.kr" and Set up cmssw
ssh cluster905.knu.ac.kr cmsrel CMSSW_7_4_7_patch2 cd CMSSW_7_4_7_patch2/src cmsenv git cms-merge-topic ikrav:egm_id_747_v2 git clone https://github.com/jshlee/cmsdas scram b -j6 cd cmsdas/ZprimeAnalyser/test cmsRun runAnalysis.py
To run on data use, runAnalysisRD.py!
/cmsdas/scratch/jlee/ZprimeToMuMu_M-5000_TuneCUETP8M1_13TeV-pythia8/RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9-v1/MINIAODSIM /cmsdas/scratch/jlee/DYJetsToLL_M-50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9-v3/MINIAODSIM /cmsdas/scratch/jlee/TT_TuneCUETP8M1_13TeV-powheg-pythia8/RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9-v2/MINIAODSIM /cmsdas/scratch/jlee/ZZ_TuneCUETP8M1_13TeV-pythia8/RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9-v3/MINIAODSIM /cmsdas/scratch/jlee/WW_TuneCUETP8M1_13TeV-pythia8/RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9-v1/MINIAODSIM /cmsdas/scratch/jlee/WZ_TuneCUETP8M1_13TeV-pythia8/RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9-v1/MINIAODSIM /cmsdas/scratch/jlee/WJetsToLNu_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9-v1/MINIAODSIM /cmsdas/scratch/jlee/SingleMuon/Run2015B-PromptReco-v1/MINIAOD /cmsdas/scratch/jlee/DoubleMuon/Run2015B-PromptReco-v1/MINIAOD
root -l out_tree.root TBrowser b
Look at the content of this root output file. Let's look just at some basic variables:
How many events does it contain? Show... Hide 49980
How many muons per event? (check Mu_nbMuon) Show... Hide 151440
What is their muon pT spectrum? (Mu_ptBestTrack) Show... Hide 871.9 GeV
The c++ analysis code is in
cmsdas/ZprimeAnalyser/plugins/ZprimeAnalyser.cc
You don't need to know these macros in detail, just try to run them. If you are interested, you can look at them further in your spare time
Use the provided miniAOD as input and run ZprimeAnalyser. This ZprimeAnalyser runs the full selection for 2mu and saves histograms and reduced ROOT Trees in output files in ROOT format.
The full selection is:
In CMS, the muon reconstruction software uses tracks reconstructed in the inner silicon tracker matched to tracks reconstructed in the outer muon system to produce the global muon track, giving an estimate for the muon charge, momentum, and production vertex. In addition, several different strategies for including information from the muon system have been developed for high-pT muon reconstruction and momentum assignment. For the results in this analysis, we use the “Tune P” ("cocktail") algorithm, which chooses, on a muon-by-muon basis, between the results of a few such algorithms using the tail probability of the χ2/d.o.f. of the muon track fits.
These cuts are implememted in file ZprimeAnalyser.cc, i.e. from line L229-L249 a method for the muon selection is defined: https://github.com/jshlee/cmsdas/blob/master/ZprimeAnalyser/plugins/ZprimeAnalyser.cc#L229-L249
Since we are looking for two muons which come from a single event (DY, Z, Z'), it makes sense that both muons come from the same vertex. We apply a vertex cut that calculates a probability that the two muons are from the same vertex. Moreover a cosmic muon can have the same signal as two muons coming from a single DY event. We can reject most of the cosmic background by excluding muons which are back-to-back. We only want to keep one dimuon candidate per event, and we want to keep the dimuon with the highest pt.
In this exercise we will look at results running the previous executable on /ZprimeToMuMu_M-5000_TuneCUETP8M1_13TeV-pythia8/RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9-v1/MINIAODSIM
root -l out_tree.root TBrowser b
Reminder of the selection steps:
Inside the out_tree.root file, it contains the following plots “n-1” plots, i.e. the displayed quantity is excluded from the cuts. The value of the cut is indicated by a vertical dashed line.
You will find plots like these:
Number of silicon tracker layers containing hits | Number of hits in the pixel detector | Number of DT and CSC segments and RPC hits in the muon chambers | Number of tracker-muon segment matches per muon |
Relative pT error δ(pT)/pT | Longitudinal impact parameter | Tracker relative isolation |
In this file you will see the reconstructed and generated invariant mass spectra. The low mass tail is due to FSR and PDF effects.
Many distributions at generator level will also be produced (Pt, Eta, Phi...).
Exercise: Try to plot the same distributions at reco level.
Exercise: Try to fill gen distributions before the reco-gen matching. Do you see any differences?
The invariant mass resolution is evaluated from simulation, using the signal sample. The reconstructed invariant mass of each dimuon passing our selection criteria is compared to its true mass. The resolution is extracted by fitting the core of the distribution with a Gaussian.
In python try
import ROOT rootfilename = "out_tree.root" tt = ROOT.TFile(rootfilename) tree = tt.tree.Get("tree") hist = ROOT.TH1F("massResolution", "massResolution", 100, -0.5, 0.5) tree.Project("massResolution","(reco_diMu_m - gen_diMu_m)/gen_diMu_m","reco_diMu_m > 4000 && reco_diMu_m < 6000") hist.Draw()
Exercise: How much is the resolution on the Z'mass peak?
Exercise: Try to excract the invariant mass resolutions as a function of invariant mass.
Exercise: Make many bins in masses (0<M<250, 250<M<750, 750<M<1250, 1250<M1750, 1750<M<1250, 2000<M<4000 and 4000<M<6000), then plot the corresponding mass resolution, fit it Gaussian. After plot a relation between Mass resolution and the mass bins (as a hint, please refer to the right plot).
The number of Monte Carlo (MC) events is arbitrary, usually picked so that the statistical error on the quantity one wishes to explore is minimized, but there is always the trade-off between this goal and the reality of computing power and storage. (The more events, the more time it takes to simulate them, and the more time it takes to run analysis code on them.) When only looking at the distributions from just one particular physics process, e.g. Drell-Yan Z/γ^{*} to dimuons, the overall normalization of the histograms is arbitrary; when comparing/overlaying the distributions from different processes, e.g. including the additional background from ttbar into dimuons, the relative normalization between the two matters. And if one wants to compare these distributions to those obtained in data, one sometimes has to use the absolute normalization given the integrated luminosity of the data sample. Above we've determined the latter; here we have a quick exercise on calculating the MC weights.
Each physics process has a cross section σ, expressed in units of barn; often this comes with metric prefixes, such as millibarn (mb) or picobarn (pb). (These cross-section values are from e.g. PYTHIA, or from our theorist friends who know how to use properly higher-order codes to calculate them.) For a given integrated luminosity L (equivalent to the amount of data collected, in e.g. pb^{-1}), the number of events generated from that process is given by N = σL.
Weights question 1: If we have an arbitrary number of events N_{MC} from the MC sample, what weight w should the histograms for this process be scaled by if we're going to eventually use it in a data/MC comparison?
Hint... Hide What if you have a histogram containing all N_{MC} events? You want it to represent N = σL events instead.
Answer... Hide w = σL / N_{MC }
Be aware that there may be a filter efficiency for some processes.
Note that the N_{MC} that matters is the original number of events you pushed through simulation; beware if you lose a batch job somewhere!
One can also think in terms of the partial weight, which does not include the integrated luminosity. It is sometimes useful to calculate the weight in two steps like this so that the scale of the integrated luminosity can be included later in the analysis chain, important when e.g. one loses the output of a few jobs and ends up not running on the full data sample.
Calculate the partial weights for the MC signal and background histograms we have provided (assume integrated luminosity = 1 for now). We will add the data later.
The datasets used, the cross-sections and filter efficiencies used to simulate a given MC dataset, the number of events in any dataset (that can be found using the CMS DBS) are collected in this table.
Sample | Cross-section [pb] |
ZprimeToMuMu_M-5000_TuneCUETP8M1_13TeV-pythia8 | 5.48E-5 |
TT_TuneCUETP8M1_13TeV-powheg-pythia8 | 831.8 |
DYJetsToLL_M-50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8 | 6025.2 |
ST_tW_top_5f_inclusiveDecays_13TeV-powheg-pythia8_TuneCUETP8M1 | 35.6 |
ST_tW_antitop_5f_inclusiveDecays_13TeV-powheg-pythia8_TuneCUETP8M1 | 35.6 |
WJetsToLNu_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8 | 61526.7 |
WW_TuneCUETP8M1_13TeV-pythia8 | 118.7 |
WZ_TuneCUETP8M1_13TeV-pythia8 | 65.9 |
ZZ_TuneCUETP8M1_13TeV-pythia8 | 31.8 |
Scale the histograms with the weight you calculated. You can use histogram->Scale(weight);
After a physical event happens every step towards its recognition is potentially ineffective - we may fail to "see" it. Generally the steps taken by the event toward us are "acceptance", "trigger", "reconstruction", "selection". What is important is the end/combined result of the "efficiency" but it is much easier to understand what is going on if one can look at a "single" step at a time i.e. "parametrize" the overall efficiency. Very often we are free to "parametrize" the steps as long as we use the parametrization consistently. At the end we need to correct the observed number of events by the efficiencies (1/eff).
"Acceptance" is the part of events that we regard as "detectable". You may think of it as a "geometrical efficiency" (the detector coverage is limited) and "kinematic efficiency" (not all "kinematic" properties of a given particle are reliably detectable by the detector). Given the DY (or Z', if we get lucky) production of a dimuon, how often do both the muons even have a chance of being reconstructed? I.e., what is the value of the fraction
acceptance = (Number of dimuons where both muons are in the nominal detector geometry) / (Number of dimuons)
CMS covers almost all of the solid angle around the interaction point. This is given approximately by the edges of the muon chambers, which extend down to roughly 10° from the beamline; this corresponds to a pseudorapidity of |eta| < 2.4. On the other hand, we have to apply a minimum cut on muon transverse momentum to ensure that charged particle makes it into the muon system and can in principle be reconstructed as a Global muon. This cut is normally determined by the Global muon reconstruction efficiency turn-on and is equal to roughly 6 GeV.
Keep in mind that the acceptance rises as a function of mass because at lower masses the dimuon is boosted more with respect to the lab frame and has an increased chance to be boosted along the z-axis, giving lower acceptance; at higher masses, the dimuon is produced closer to rest, so the lab frame is closer to the dimuon rest frame and the decays are more central.
Then "trigger" efficiency is the efficiency to have the event recorded for analysis by the trigger system. This is the "on-line" efficiency and it is unrecoverable - if something is lost there it is lost forever. Since in the offline analysis we are dealing with events that were triggered, one has to take into account the online transverse momentum threshold and the trigger efficiency turn-on. The pt
thresholds are different for each trigger.
"Reconstruction" efficiency is the efficiency to reconstruct the event (i.e. all of its components).
"Selection" efficiency is the efficiency to actually pick up the event by the event selection
There are two common ways to correct for efficiency once measured:
Another related point is that one should carefully think about what is the actual variable(s) used in the efficiency parametrization. When the event is born there is the "true" variable, say pt. Very often a model you want to compare to does not include the whole physics of the process - typically no QED corrections. This means if you want to compare to it you need the "true" pt before final-state-radiation (FSR) but what reaches the detector is not the "true" pt - it is the "post-FSR" pt. Then because of the resolution you don't measure the "post-FSR" pt but the "RECO" pt.
Your analysis technique should be such that to correct properly for these effects and estimate the precision of that. If you think about it you'll find out that there is no way to exactly separate "Acceptance" and "RECO" efficiency - you will always have border/resolution effects, however "Acceptance" times "RECO" efficiency can be defined such as to always be <=1.
Then, all the efficiencies could be estimated by the "post-FSR" variables from MC, all the data/MC efficiency corrections - by the "RECO" variables (of course) and systematic effects studied on this ground.
Maybe the most important aspect of an "efficiency" is the denominator, i.e. efficiency with respect to what. By construction every efficiency is relative! To get the efficiency right one should have a very good estimate (understanding) of what is in the denominator.
Edit ZprimeAnalyser.cc to find out the ε * acc (efficiency*acceptance = number of selected events/ N_MC) of our selection. You can use this number and scale the histograms with a weiight calculated as w' = σLε * acc / Nentries
For the signal sample that you run the value printed is Accxeff = 0.880562.
Now we will compare distributions of invariant mass in data and simulation. The distribution from each simulated sample is scaled by weights derived from the information in the Table reported above and using the value of the integrated luminosity .
The entire summed distribution is scaled to the data using the ratio of the observed to predicted numbers of events in the region of 60 < Mμμ < 120 GeV around the Z peak in the sample of events passing a prescaled muon trigger.
A THStack object will plot two or more histograms on the same plot, but adds up the entries so that all histograms can be seen instead of overlaying the histograms. Show all the MC histograms on a single THStack object.
THStack *hs = new THStack("hs", NULL); hs->Add(histo1); hs->Add(histo2); hs->Draw(); |
Try to produce the final invariant mass spectra considering all the background MC samples.
Create the comparison of the observed opposite-sign dimuon invariant mass spectrum with that expected from standard-model processes. All contributions to the expected spectrum are estimated using simulations. The observed and the expected spectra are in good agreemen. The signal invariant mass for mZ'= 5 TeV is also superimposed.
“Other prompt leptons” includes the contributions from Z → ττ, the diboson production WW, WZ, and ZZ, and single top tW.
The plot obtained is compared to the one reported in the AN2012_422 where the contributions to the expected spectrum are estimated using simulations, except for the contribution from multi-jets and W+jets, which are evaluated from the data.
The Tag&Probe method utilizes the properties of a well known resonance (e.g. Z, J/Psi) to extract a sample of muon probes suitable for efficiency estimation. All the efficiencies extracted with Tag and Probe method are per muon.
In the particular case of the Drell-Yan analysis the intention is to skim an unbiased sample of Z candidates. The tag muon is required to pass a set of selection criteria designed to identify the required particle type. The other (candidate) muon is then used to probe a property which is under investigation.
Extend ZprimeAnalyser.cc to select:
Add in 3 parameters (nbTP, nbTF and Eff.):
output eff= 96%
************************************** * Tag & Probe Eff at Z pole is 0.959 * **************************************
Exercise: try to write a Macro to plot the following:
The dominant irreducible background to our signal comes from the Drell–Yan dimuon production, while the largest background following the Drell–Yan production comes from tt decays.
All background components are estimated using simulations except for the background from jets, which is estimated from data. In order to demonstrate that the Monte Carlo simulation describes the reducible backgrounds well, we also derive them from data as a cross check. We use the following methods:
Behind the t-tbar background, the other main background for Z' is general QCD. Most QCD is removed with an efficient isolation cut, but there is still a rate at which jets fake muons. Monte Carlo does not model well the events that eventually pass this cut, so we are forced to rely mostly on data for the fake rate.
For ttbar->dilepton decays, same-sign muons would not be a good method for measuring the background since ttbar wouldn't decay into two same-sign leptons. However, the probability is twice as high for ttbar to emu as for ttbar to mumu. So if the ttbar->emu decays can be measured, the ttbar->mumu decays can be estimated.
If we assume that MC accurately represents data, then the ratio of ttbar->emu and ttbar->mumu decays should be the same in MC and data. This means the ratio (MC_mumu/MC_emu) = (data_mumu/data_emu). We can measure the MC mumu and emu background distributions and the data emu background distribution. From this we can estimate the data_mumu background on a bin-by-bin basis.
For emu method the 2012 data sample "MuEG 2012 datasets", which is rich with both muons and electrons, were used. The "HLT_Mu22_Photon22_CaloIdL" trigger was used. Then the first reconstructed object is a Muon passing high pt muon ID (2012), while the second object is an Electron passing HEEP V4.1 (2012) "TableHEEPID.pdf".
Extending ZprimeAnalyser.cc, plot the invariant mass spectra of the emu events selected from background MC sample and then superimpose to the previous spectra the emu invariant mass distribution from Data.
Exercise: now try to adopt this Macro to also plot the cumulative plot for both data & MCs.
the observed and expected upper limits on the ratio Rσ of the production cross section times branching fraction of a Z′ boson relative to that for a Z boson. The observed and expected limits agree within the uncertainties. The figure also shows the predicted cross section times branching fraction ratios for the benchmark Z′SSM [50] and Z′ψ [51] models. https://twiki.cern.ch/twiki/bin/viewauth/CMS/LimitsToolDocumentation
For the muon channel alone, the 95% CL lower limits on the mass of a Z′ resonance are 2770 GeV for Z′SSM and 2430 GeV for Z′ψ.
The μ+μ− invariant mass spectrum has been found to be in agreement with expectations from the standard model. The analysis excludes, at 95% confidence level, a Sequential Standard Model Z′SSM resonance lighter than 2770 GeV and a superstring-inspired Z′ψ lighter than 2430 GeV.