PAT Exercise 07: FWLite and MINIAOD Tutorial
Goal of this exercise
FWLite is a good tool to make brief studies and helps alongside code development. The first block of exercises makes you familiar with ad-hoc access of edm-files from the ipython shell. And it also shows how to discover functionality of physics candidate objects.
For LHC Run2, starting Mid 2015,
MINIAOD is the new official CMS data format for most analyses. The second block of exercises contains two simple analysis tasks in order to demonstrate how physics objects look and how they are accessed.
Note:
This web course is part of the
PAT Tutorial, which takes regularly place at cern and in other places. When following the PAT Tutorial the answers of questions marked in
RED should be filled into the exercise form that has been introduced at the beginning of the tutorial. Also the solutions to the
Exercises should be filled into the form. The exercises are marked in three colours, indicating whether this exercise is basic (obligatory), continuative (recommended) or optional (free). The colour coding is summarized in the table below:
Color Code |
Explanation |
|
Basic exercise, which is obligatory for the PAT Tutorial. |
|
Continuative exercise, which is recommended for the PAT Tutorial to deepen what has been learned. |
|
Optional exercise, which shows interesting applications of what has been learned. |
Basic exercises (

) are obliged and the solutions to the exercises should be filled into the exercise form during the PAT Tutorial.
Setting up of the environment
First of all connect to
lxplus
or
cmslpc
and go to some work directory. You can choose any directory, provided that you have enough space. You need ~100 MB of free disk space for this exercise. We recommend you to use your
~/scratch0 space. In case you don't have this (or do not even know what it is) check your quota typing
fs lq
and follow
this link
. If you don't have enough space, you may instead use the temporary space (
/tmp/your_user_name), but be aware that this is lost once you log out of lxplus (or within something like a day). We will expect in the following that you have such a
~/scratch0 directory.
ssh lxplus.cern.ch
[ ... enter password ... ]
Create a directory and local release area for this exercise (to avoid interference with code from the other exercises) and build CMSSW.
cd scratch0/
mkdir exercise07
cd exercise07
# build CMSSW plus additional packages
cmsrel CMSSW_7_4_1_patch4
cd CMSSW_7_4_1_patch4/src
cmsenv
# no special packages needed for this exercise
scram b -j 9
For all of the exercise, a bit of code is needed to setup all the variables. You can copy/paste it for the codebox below or download from
github
. The file should be named
fwlite_boilerplate.py
and must be put into your working directory for the exercises.
# import ROOT in batch mode
import ROOT
ROOT.gROOT.SetBatch(True) # (on lxplus, the X-connection is opened anyways)
# load FWLite C++ libraries
ROOT.gSystem.Load("libFWCoreFWLite.so");
ROOT.gSystem.Load("libDataFormatsFWLite.so");
ROOT.AutoLibraryLoader.enable()
# load FWlite python libraries
from DataFormats.FWLite import Handle, Events
# superimportant shortcuts!!
DeltaR = ROOT.Math.VectorUtil.DeltaR
DeltaPhi = ROOT.Math.VectorUtil.DeltaPhi
DeltaR2 = lambda a, b: DeltaR(a.p4(), b.p4()) # for reco::Candidates
DeltaPhi2 = lambda a, b: DeltaPhi(a.p4(), b.p4()) # for reco::Candidates
# work-around for a bug in root: currently "+" does not work on a LorenzVector type
def addLVs(a, b):
"""add two Lorenz-vectors. a and b should be of the same type"""
LV = type(a)
return LV(a.x()+b.x(), a.y()+b.y(), a.z()+b.z(), a.t() + b.t())
Exercises on FWLite
Exercise 7 a) Fun with the ipython shell
A lot of browsing and searching for member variables can be done in the ipython shell. Start the shell and load the basic fwlite names:
ipython -i fwlite_boilerplate.py
Now paste this bit of code into the shell to open the root file we want to look at.
events = Events("root://eoscms//eos/cms/store/relval/CMSSW_7_4_1/RelValTTbar_13/MINIAODSIM/PU25ns_MCRUN2_74_V9_gensim71X-v1/00000/72C84BA7-F9EC-E411-B875-002618FDA210.root")
# start loop to initialize more data members
for evt in events:
break
# 'events' itself now points to the first event
You may now look at the help text for the events variable:
help(events)
. Or you can go ahead and load the muon collection in order to answer the first question:
# checkout muon member functions
hndl_muons = Handle('vector<pat::Muon>')
events.getByLabel('slimmedMuons', hndl_muons)
muons = hndl_muons.product()
leading_muon = muons[0]
dir(leading_muon) # prepare for a huge output
Questions:
- (1) What's the name of the function to tell if a muon is a global muon? Is the leading muon a global muon?
- (2) What's the energy fraction of photons in the leading jet? And what is the fraction from charged hadrons?
Solutions:
- (1)
isGlobalMuon()
, yes
- (2) photon energy fraction: 0.21, charged hadron energy fraction: 0.75, code:
hndl_jets = Handle('vector<pat::Jet>')
events.getByLabel('slimmedJets', hndl_jets)
leading_jet = hndl_jets.product()[0]
filter(lambda s: 'EnergyFraction' in s, dir(leading_jet)) # gives you the names of the members that contain 'EnergyFraction'
leading_jet.photonEnergyFraction()
leading_jet.chargedHadronEnergyFraction()
Exercise 7 b): Some quick plots
In this exercise, your task is to produce some basic distributions with the given ttbar sample. It can be done in the ipython shell, but it's probably the best to have the shell open in order to try out commands while editing a script that should perform the job. The code from the lecture might be a good starting point for the tasks, save it in your working directory and run it with python or ipython:
from fwlite_boilerplate import *
# there's a bug in root that prevents you from adding four vectors
# hence, for the second exercise you will need this function
def addLVs(a, b):
LV = type(a)
return LV(a.x()+b.x(), a.y()+b.y(), a.z()+b.z(), a.t() + b.t())
events = Events("root://eoscms//eos/cms/store/relval/CMSSW_7_4_1/RelValTTbar_13/MINIAODSIM/PU25ns_MCRUN2_74_V9_gensim71X-v1/00000/72C84BA7-F9EC-E411-B875-002618FDA210.root")
hndl_muons = Handle('vector<pat::Muon>')
hist_muon_pt = ROOT.TH1D('muon_pt', ';leading muon: P_{T};events', 50, 0., 500.)
for evt in events:
evt.getByLabel('slimmedMuons', hndl_muons)
muons = hndl_muons.product()
if muons.size():
hist_muon_pt.Fill(muons[0].pt())
hist_muon_pt.SaveAs('leading_muon_pt.root')
Questions:
- (1) Plot the pt of all muons. What's the mean of the distribution?
- (2) Plot the invariant mass of the leading two muons. Do you see the Z-peak? Why (not)? What about this sample
?
- (3) This is a slightly harder one: Plot the DeltaR between the leading muon and closest jet in the eta-phi-plane. Where's the maximum?
Solutions:
- (1) mean (if bins unchanged): 8.361, code:
# change this:
if muons.size():
hist_muon_pt.Fill(muons[0].pt())
# to this:
for mu in muons:
hist_muon_pt.Fill(mu.pt())
- (2) no, you can't see a peak for the Z-boson, since the main sample here is a ttbar sample. Also bear in mind, that the slimmedMuons are the bare reconstructed particle flow muons without any quality criteria applied. But for the second sample you should see a nice peak, as this is a Z to mu mu sample. Here's the code:
# change this:
if muons.size():
hist_muon_pt.Fill(muons[0].pt())
# to this:
if muons.size() >= 2:
combined_LV = addLVs(muons[0].p4(), muons[1].p4())
hist_muon_pt.Fill(combined_LV.mass())
- (3) the maximum is in the first bin, close to 0. There are reasons for this: 1.) The muons are not isolated, but belong to the jet (as often the case for b-jets). 2.) By default, any muon is clustered into a jet. Therefore, no matter how well isolated, any muon can be found as well in the jet collection. Of course, in a real analysis, those jets would be suppressed by jet-id requirements. code:
# you need a handle for jets and a histogram with a different upper bound for x:
hndl_jets = Handle('vector<pat::Jet>')
h_dr_mu_jet = ROOT.TH1D(
"mu_vs_jet",
";#DeltaR(leading #mu, closest j); events",
50, 0., 2.5
)
# then change this:
if muons.size():
hist_muon_pt.Fill(muons[0].pt())
# to this:
if muons.size():
evt.getByLabel('slimmedJets', hndl_jets)
jets = hndl_jets.product()
leading_muon = muons[0]
closest_jet = min(jets, key=lambda j: DeltaR2(j, leading_muon))
h_dr_mu_jet.Fill(DeltaR2(closest_jet, leading_muon))
Exercises on MINIAOD
Accessing some MINIAOD objects is fairly easy. In this exercise we walk through two physics-related cases. Both should help you to get a feeling for what could be done as a small study within fwlite in python while practicing MINIAOD objects access. The general reference for MINIAOD is
WorkBookMiniAOD2015.
Exercise 7 c): Running a top-projection to clean jets
In the lecture, you saw a 2D-plot of
pt(leading muon) / pt(closest jet)
vs.
DeltaR(leading muon, closest jet)
. Many entries were found around (0, 1), hinting that there are many jets that consist of a muon alone. Your first task is to reproduce this plot. Here's a code-skeleton to get you started:
from fwlite_boilerplate import *
events = Events("root://eoscms//eos/cms/store/relval/CMSSW_7_4_1/RelValTTbar_13/MINIAODSIM/PU50ns_MCRUN2_74_V8_gensim_740pre7-v1/00000/7EC72BA9-44EC-E411-9DF3-0025905A60B0.root")
# define Handles here
h_mu_vs_jets = ROOT.TH2D(
"mu_vs_jets",
";#DeltaR(leading #mu, closest j); p_{T,#mu} / p_{T,j}",
50, 0., 2.5,
50, 0., 2.,
)
for evt in events:
# get collections here
muons =
if not muons.size():
continue
leading_mu = muons[0]
jets =
closest_jet =
h_mu_vs_jets.Fill( ... )
f = ROOT.TFile("plot_mu_vs_jets.root", "RECREATE")
h_mu_vs_jets.Write()
f.Close()
Under
WorkBookMiniAOD2015#Advanced_topics_re_clustering_ev (almost at the bottom of the page), you will find an example cmsRun configuration for event interpretation (EI, the second twisty box). Try to understand what the code does. Verify, that it uses top-projections to remove first pileup, then muons, then electrons from the particle-flow candidate and finally reruns the jet-creation. The new collection of pat-jets is called
patJetsAK4PFCHS
, which you need to specify later in your fwlite-script. Copy the config and run it (make sure to run over all events in the file, not just 10 events). Then modify your code to produce an new plot for the new jet collection.
Questions:
- (1) Could you reproduce the plot shown in the lecture?
- (2) Is the accumulation around (0, 1) gone with the new jet collection? What are these other muons, that are still at low deltaR?
Solutions:
from fwlite_boilerplate import *
events = Events("root://eoscms//eos/cms/store/relval/CMSSW_7_4_1/RelValTTbar_13/MINIAODSIM/PU50ns_MCRUN2_74_V8_gensim_740pre7-v1/00000/7EC72BA9-44EC-E411-9DF3-0025905A60B0.root")
hndl_muons = Handle("vector<pat::Muon>")
hndl_jets = Handle("vector<pat::Jet>")
h_mu_vs_jets = ROOT.TH2D(
"mu_vs_jets",
";#DeltaR(leading #mu, closest j); p_{T,#mu} / p_{T,j}",
50, 0., 2.5,
50, 0., 2.,
)
for evt in events:
evt.getByLabel("slimmedMuons", hndl_muons)
evt.getByLabel("slimmedJets", hndl_jets)
muons = hndl_muons.product()
if not muons.size():
continue
leading_mu = muons[0]
jets = hndl_jets.product()
closest_jet = min(jets, key=lambda j: DeltaR2(j, leading_mu))
h_mu_vs_jets.Fill(
DeltaR2(leading_mu, closest_jet),
leading_mu.pt() / closest_jet.pt()
)
f = ROOT.TFile("plot_mu_vs_jets_solution1.root", "RECREATE")
h_mu_vs_jets.Write()
f.Close()
- (2) Yes, the accumulation at (0, 1) should be not be there anymore with the new jet-collection. The other muons at low deltaR are probably muons in b-jets, as there is a large number of b-jets in ttbar events. code:
from fwlite_boilerplate import *
events = Events("test.root")
hndl_muons = Handle("vector<pat::Muon>")
hndl_jets = Handle("vector<pat::Jet>")
hndl_new_jets = Handle("vector<pat::Jet>")
h_mu_vs_jets = ROOT.TH2D(
"mu_vs_jets",
";#DeltaR(leading #mu, closest j); p_{T,#mu} / p_{T,j}",
50, 0., 2.5,
50, 0., 2.,
)
h_mu_vs_new_jets = ROOT.TH2D(
"mu_vs_new_jets",
";#DeltaR(leading #mu, closest j); p_{T,#mu} / p_{T,j}",
50, 0., 2.5,
50, 0., 2.,
)
for evt in events:
evt.getByLabel("slimmedMuons", hndl_muons)
evt.getByLabel("slimmedJets", hndl_jets)
evt.getByLabel("patJetsAK4PFCHS", hndl_new_jets)
muons = hndl_muons.product()
if not muons.size():
continue
leading_mu = muons[0]
jets = hndl_jets.product()
new_jets = hndl_new_jets.product()
closest_jet = min(jets, key=lambda j: DeltaR2(j, leading_mu))
closest_new_jet = min(new_jets, key=lambda j: DeltaR2(j, leading_mu))
h_mu_vs_jets.Fill(
DeltaR2(leading_mu, closest_jet),
leading_mu.pt() / closest_jet.pt()
)
h_mu_vs_new_jets.Fill(
DeltaR2(leading_mu, closest_new_jet),
leading_mu.pt() / closest_new_jet.pt()
)
f = ROOT.TFile("plot_mu_vs_jets_solution2.root", "RECREATE")
h_mu_vs_jets.Write()
h_mu_vs_new_jets.Write()
f.Close()
Exercise 7 d): Hot new tools for LHC Run 2: Simple W-tagging
In this exercise you'll define a very simple W-tagger and measure its performance within the ttbar sample. Of course, for a real W-tagger, one would need to do background studies on QCD events, but there is already enough background from accidentally overlapping jets in the ttbar sample.
In order to prepare for the for the performance measurement, we first define signal to be the jets that are matched to a generator W-boson, and background to be all unmatched jets. The first task is to simply count the number of signal and background jets. For the next tasks you need to access the substructure information of jets. Have a look at the jets-section in the MINIAOD workbook,
WorkBookMiniAOD2015#Jets, where you can find a twisty box with example code at the bottom of the section. You will find the names of the user-floats that you need for this exercise. After that, please have a look at the code skeleton below, and try to complete it in order to answer the first question:
from fwlite_boilerplate import *
events = Events("test.root")
#define handles here
for evt in events:
# get collections here
gen_particles =
ak8jets =
if not ak8jets.size():
continue
# make a list of generator W particles using this criterion:
# (abs(gen_particle.pdgId()) == 24 and 20 < gen_particle.status() < 30)
gen_ws = list( ... )
# this is a list of lists:
# [[jet1, w1], [jet2, w2], ...]
matched_pairs = list(
(j, w)
for j in ak8jets
for w in gen_ws
if DeltaR2(w, j) < 0.2
)
# this code would produce the same output and is maybe closer to what you've seen before:
# matched_pairs = []
# for j in ak8jets:
# for w in gen_ws:
# if DeltaR2(w, j) < 0.2:
# j_w_pair = (j, w)
# matched_pairs.append(j_w_pair)
matched_ak8jets = list( ... )
unmatched_ak8jets = list( ... )
print 'n_matched:', ...
print 'n_unmatched:', ...
Questions:
- (1) How many signal and background jet's do you count?
- (2) Make a plot of the N-subjettiness ratio tau2/tau1 for both, signal and background. Do you think it has a good separation power?
- (3) Write a W-tagger function according to the example in WorkBookMiniAOD2015#Jets. It should take a jet as an argument and return a bool. Evaluate your tagger: What are the efficiency and mis-tag rates with the given signal and background definitions?
- (4) BONUS: Improve your tagger. You could plot all jet-masses and see which one would perform best. What are your achievements?
The ttbar file you are using has only 3800 events. More events would help, if you want to investigate your tagger in more detail. Here is a
dataset
, that has around 50000 events. You can add all the files as list in the constructor of
Events
.
Solutions:
- (1) signal: 278, background: 596, code:
from fwlite_boilerplate import *
events = Events("test.root")
hndl_gp = Handle("vector<reco::GenParticle>")
hndl_jets = Handle("vector<pat::Jet>")
n_matched = 0
n_unmatched = 0
for evt in events:
evt.getByLabel("prunedGenParticles", hndl_gp)
evt.getByLabel("slimmedJetsAK8", hndl_jets)
gen_particles = hndl_gp.product()
ak8jets = hndl_jets.product()
if not ak8jets.size():
continue
gen_ws = list(
p
for p in gen_particles
if abs(p.pdgId()) == 24 and 20 < p.status() < 30
)
matched_pairs = list(
(j, w)
for j in ak8jets
for w in gen_ws
if DeltaR2(w, j) < 0.2
)
matched_ak8jets = list(
j
for j, w in matched_pairs
)
unmatched_ak8jets = list(
j
for j in ak8jets
if j not in matched_ak8jets
)
n_matched += len(matched_ak8jets)
n_unmatched += len(unmatched_ak8jets)
print n_matched
print n_unmatched
- (2) The shapes of signal and background are a distinct. But since the background is taken from ttbar, there might be a good fraction of combinatorial background of fat jets that are made up from two smaller jets. If you'd take a QCD sample, the background would be pushed to larger values of tau2/tau1. code:
from fwlite_boilerplate import *
events = Events("test.root")
hndl_gp = Handle("vector<reco::GenParticle>")
hndl_jets = Handle("vector<pat::Jet>")
h_sig_taus = ROOT.TH1D(
"sig_taus",
";tau2 / tau1;number of jets",
50, 0., 5.0
)
h_bkg_taus = ROOT.TH1D(
"bkg_taus",
";tau2 / tau1;number of jets",
50, 0., 5.0
)
for evt in events:
evt.getByLabel("prunedGenParticles", hndl_gp)
evt.getByLabel("slimmedJetsAK8", hndl_jets)
gen_particles = hndl_gp.product()
ak8jets = hndl_jets.product()
if not ak8jets.size():
continue
gen_ws = list(
p
for p in gen_particles
if abs(p.pdgId()) == 24 and 20 < p.status() < 30
)
matched_pairs = list(
(j, w)
for j in ak8jets
for w in gen_ws
if DeltaR2(w, j) < 0.2
)
matched_ak8jets = list(
j
for j, w in matched_pairs
)
unmatched_ak8jets = list(
j
for j in ak8jets
if j not in matched_ak8jets
)
for j in matched_ak8jets:
h_sig_taus.Fill(
j.userFloat("NjettinessAK8:tau2")
/ j.userFloat("NjettinessAK8:tau1")
)
for j in unmatched_ak8jets:
h_bkg_taus.Fill(
j.userFloat("NjettinessAK8:tau2")
/ j.userFloat("NjettinessAK8:tau1")
)
f = ROOT.TFile("plot_w_tagging_check_nsubjettiness.root", "RECREATE")
h_sig_taus.Write()
h_bkg_taus.Write()
f.Close()
- (3) with tau2/tau1 < 0.6 and softdrop mass > 50., the efficiency is 0.65 and the mis-tag rate 0.45. code:
from fwlite_boilerplate import *
def my_w_tagger(jet):
return (
jet.userFloat("NjettinessAK8:tau2")
/ jet.userFloat("NjettinessAK8:tau1") < 0.6
and jet.userFloat("ak8PFJetsCHSSoftDropMass") > 50.0
)
events = Events("test.root")
hndl_gp = Handle("vector<reco::GenParticle>")
hndl_jets = Handle("vector<pat::Jet>")
n_matched = 0.
n_unmatched = 0.
n_matched_tagged = 0.
n_unmatched_tagged = 0.
for evt in events:
evt.getByLabel("prunedGenParticles", hndl_gp)
evt.getByLabel("slimmedJetsAK8", hndl_jets)
gen_particles = hndl_gp.product()
ak8jets = hndl_jets.product()
if not ak8jets.size():
continue
gen_ws = list(
p
for p in gen_particles
if abs(p.pdgId()) == 24 and 20 < p.status() < 30
)
matched_pairs = list(
(j, w)
for j in ak8jets
for w in gen_ws
if DeltaR2(w, j) < 0.2
)
matched_ak8jets = list(
j
for j, w in matched_pairs
)
unmatched_ak8jets = list(
j
for j in ak8jets
if j not in matched_ak8jets
)
n_matched += len(matched_ak8jets)
n_unmatched += len(unmatched_ak8jets)
n_matched_tagged += sum(1. for j in matched_ak8jets if my_w_tagger(j))
n_unmatched_tagged += sum(1. for j in unmatched_ak8jets if my_w_tagger(j))
eff = n_matched_tagged / n_matched
mis = n_unmatched_tagged / n_unmatched
print 'eff:', eff
print 'mis:', mis
--
HeinerTholen - 2015-06-29