PAT Exercise 07: FWLite and MINIAOD Tutorial

Goal of this exercise

FWLite is a good tool to make brief studies and helps alongside code development. The first block of exercises makes you familiar with ad-hoc access of edm-files from the ipython shell. And it also shows how to discover functionality of physics candidate objects.

For LHC Run2, starting Mid 2015, MINIAOD is the new official CMS data format for most analyses. The second block of exercises contains two simple analysis tasks in order to demonstrate how physics objects look and how they are accessed.

ALERT! Note:

This web course is part of the PAT Tutorial, which takes regularly place at cern and in other places. When following the PAT Tutorial the answers of questions marked in RED should be filled into the exercise form that has been introduced at the beginning of the tutorial. Also the solutions to the Exercises should be filled into the form. The exercises are marked in three colours, indicating whether this exercise is basic (obligatory), continuative (recommended) or optional (free). The colour coding is summarized in the table below:

Color Code Explanation
red Basic exercise, which is obligatory for the PAT Tutorial.
yellow Continuative exercise, which is recommended for the PAT Tutorial to deepen what has been learned.
green Optional exercise, which shows interesting applications of what has been learned.

Basic exercises ( red ) are obliged and the solutions to the exercises should be filled into the exercise form during the PAT Tutorial.

Setting up of the environment

First of all connect to lxplus or cmslpc and go to some work directory. You can choose any directory, provided that you have enough space. You need ~100 MB of free disk space for this exercise. We recommend you to use your ~/scratch0 space. In case you don't have this (or do not even know what it is) check your quota typing fs lq and follow this link. If you don't have enough space, you may instead use the temporary space ( /tmp/your_user_name), but be aware that this is lost once you log out of lxplus (or within something like a day). We will expect in the following that you have such a ~/scratch0 directory.

ssh lxplus.cern.ch
[ ... enter password ... ]

Create a directory and local release area for this exercise (to avoid interference with code from the other exercises) and build CMSSW.

cd scratch0/
mkdir exercise07
cd exercise07

# build CMSSW plus additional packages
cmsrel CMSSW_7_4_1_patch4
cd CMSSW_7_4_1_patch4/src 
cmsenv
# no special packages needed for this exercise
scram b -j 9

For all of the exercise, a bit of code is needed to setup all the variables. You can copy/paste it for the codebox below or download from github. The file should be named fwlite_boilerplate.py and must be put into your working directory for the exercises.

# import ROOT in batch mode
import ROOT
ROOT.gROOT.SetBatch(True)  # (on lxplus, the X-connection is opened anyways) 
 
# load FWLite C++ libraries
ROOT.gSystem.Load("libFWCoreFWLite.so");
ROOT.gSystem.Load("libDataFormatsFWLite.so");
ROOT.AutoLibraryLoader.enable()
 
# load FWlite python libraries
from DataFormats.FWLite import Handle, Events
 
# superimportant shortcuts!!
DeltaR = ROOT.Math.VectorUtil.DeltaR
DeltaPhi = ROOT.Math.VectorUtil.DeltaPhi
DeltaR2 = lambda a, b: DeltaR(a.p4(), b.p4())  # for reco::Candidates
DeltaPhi2 = lambda a, b: DeltaPhi(a.p4(), b.p4())  # for reco::Candidates

# work-around for a bug in root: currently "+" does not work on a LorenzVector type
def addLVs(a, b):
    """add two Lorenz-vectors. a and b should be of the same type"""
    LV = type(a)
    return LV(a.x()+b.x(), a.y()+b.y(), a.z()+b.z(), a.t() + b.t())

Exercises on FWLite

red Exercise 7 a) Fun with the ipython shell

A lot of browsing and searching for member variables can be done in the ipython shell. Start the shell and load the basic fwlite names:

ipython -i fwlite_boilerplate.py

Now paste this bit of code into the shell to open the root file we want to look at.

events = Events("root://eoscms//eos/cms/store/relval/CMSSW_7_4_1/RelValTTbar_13/MINIAODSIM/PU25ns_MCRUN2_74_V9_gensim71X-v1/00000/72C84BA7-F9EC-E411-B875-002618FDA210.root")

# start loop to initialize more data members 
for evt in events:
    break
# 'events' itself now points to the first event

You may now look at the help text for the events variable: help(events). Or you can go ahead and load the muon collection in order to answer the first question:

# checkout muon member functions
hndl_muons = Handle('vector<pat::Muon>')
events.getByLabel('slimmedMuons', hndl_muons)
muons = hndl_muons.product()
leading_muon = muons[0]
dir(leading_muon)  # prepare for a huge output

Question Questions:

  • (1) What's the name of the function to tell if a muon is a global muon? Is the leading muon a global muon?
  • (2) What's the energy fraction of photons in the leading jet? And what is the fraction from charged hadrons?

Solutions:
  • (1) isGlobalMuon(), yes
  • (2) photon energy fraction: 0.21, charged hadron energy fraction: 0.75, code:
hndl_jets = Handle('vector<pat::Jet>')
events.getByLabel('slimmedJets', hndl_jets)
leading_jet = hndl_jets.product()[0]
filter(lambda s: 'EnergyFraction' in s, dir(leading_jet))  # gives you the names of the  members that contain 'EnergyFraction'
leading_jet.photonEnergyFraction()
leading_jet.chargedHadronEnergyFraction()

yellow Exercise 7 b): Some quick plots

In this exercise, your task is to produce some basic distributions with the given ttbar sample. It can be done in the ipython shell, but it's probably the best to have the shell open in order to try out commands while editing a script that should perform the job. The code from the lecture might be a good starting point for the tasks, save it in your working directory and run it with python or ipython:

from fwlite_boilerplate import *

# there's a bug in root that prevents you from adding four vectors
# hence, for the second exercise you will need this function
def addLVs(a, b):
    LV = type(a)
    return LV(a.x()+b.x(), a.y()+b.y(), a.z()+b.z(), a.t() + b.t())

events = Events("root://eoscms//eos/cms/store/relval/CMSSW_7_4_1/RelValTTbar_13/MINIAODSIM/PU25ns_MCRUN2_74_V9_gensim71X-v1/00000/72C84BA7-F9EC-E411-B875-002618FDA210.root")
hndl_muons = Handle('vector<pat::Muon>')
hist_muon_pt = ROOT.TH1D('muon_pt', ';leading muon: P_{T};events', 50, 0., 500.)

for evt in events:
    evt.getByLabel('slimmedMuons', hndl_muons)
    muons = hndl_muons.product()
    if muons.size():
        hist_muon_pt.Fill(muons[0].pt())

hist_muon_pt.SaveAs('leading_muon_pt.root')

Question Questions:

  • (1) Plot the pt of all muons. What's the mean of the distribution?
  • (2) Plot the invariant mass of the leading two muons. Do you see the Z-peak? Why (not)? What about this sample?
  • (3) This is a slightly harder one: Plot the DeltaR between the leading muon and closest jet in the eta-phi-plane. Where's the maximum?

Solutions:
  • (1) mean (if bins unchanged): 8.361, code:
# change this:
    if muons.size():
        hist_muon_pt.Fill(muons[0].pt())

# to this:
    for mu in muons:
        hist_muon_pt.Fill(mu.pt())

  • (2) no, you can't see a peak for the Z-boson, since the main sample here is a ttbar sample. Also bear in mind, that the slimmedMuons are the bare reconstructed particle flow muons without any quality criteria applied. But for the second sample you should see a nice peak, as this is a Z to mu mu sample. Here's the code:
# change this:
    if muons.size():
        hist_muon_pt.Fill(muons[0].pt())

# to this:
    if muons.size() >= 2:
        combined_LV = addLVs(muons[0].p4(), muons[1].p4())
        hist_muon_pt.Fill(combined_LV.mass())
  • (3) the maximum is in the first bin, close to 0. There are reasons for this: 1.) The muons are not isolated, but belong to the jet (as often the case for b-jets). 2.) By default, any muon is clustered into a jet. Therefore, no matter how well isolated, any muon can be found as well in the jet collection. Of course, in a real analysis, those jets would be suppressed by jet-id requirements. code:
# you need a handle for jets and a histogram with a different upper bound for x:
hndl_jets = Handle('vector<pat::Jet>')
h_dr_mu_jet = ROOT.TH1D(
    "mu_vs_jet",
    ";#DeltaR(leading #mu, closest j); events",
    50, 0., 2.5
)

# then change this:
    if muons.size():
        hist_muon_pt.Fill(muons[0].pt())

# to this:
    if muons.size():
        evt.getByLabel('slimmedJets', hndl_jets)
        jets = hndl_jets.product()
        leading_muon = muons[0]
        closest_jet = min(jets, key=lambda j: DeltaR2(j, leading_muon))
        h_dr_mu_jet.Fill(DeltaR2(closest_jet, leading_muon))

Exercises on MINIAOD

Accessing some MINIAOD objects is fairly easy. In this exercise we walk through two physics-related cases. Both should help you to get a feeling for what could be done as a small study within fwlite in python while practicing MINIAOD objects access. The general reference for MINIAOD is WorkBookMiniAOD2015.

red Exercise 7 c): Running a top-projection to clean jets

In the lecture, you saw a 2D-plot of pt(leading muon) / pt(closest jet) vs. DeltaR(leading muon, closest jet). Many entries were found around (0, 1), hinting that there are many jets that consist of a muon alone. Your first task is to reproduce this plot. Here's a code-skeleton to get you started:

from fwlite_boilerplate import *

events = Events("root://eoscms//eos/cms/store/relval/CMSSW_7_4_1/RelValTTbar_13/MINIAODSIM/PU50ns_MCRUN2_74_V8_gensim_740pre7-v1/00000/7EC72BA9-44EC-E411-9DF3-0025905A60B0.root")

# define Handles here

h_mu_vs_jets = ROOT.TH2D(
    "mu_vs_jets",
    ";#DeltaR(leading #mu, closest j); p_{T,#mu} / p_{T,j}",
    50, 0., 2.5,
    50, 0., 2.,
)

for evt in events:
    # get collections here

    muons =
    if not muons.size():
        continue
    leading_mu = muons[0]
    jets =

    closest_jet = 

    h_mu_vs_jets.Fill( ... )

f = ROOT.TFile("plot_mu_vs_jets.root", "RECREATE")
h_mu_vs_jets.Write()
f.Close()

Under WorkBookMiniAOD2015#Advanced_topics_re_clustering_ev (almost at the bottom of the page), you will find an example cmsRun configuration for event interpretation (EI, the second twisty box). Try to understand what the code does. Verify, that it uses top-projections to remove first pileup, then muons, then electrons from the particle-flow candidate and finally reruns the jet-creation. The new collection of pat-jets is called patJetsAK4PFCHS, which you need to specify later in your fwlite-script. Copy the config and run it (make sure to run over all events in the file, not just 10 events). Then modify your code to produce an new plot for the new jet collection.

Question Questions:

  • (1) Could you reproduce the plot shown in the lecture?
  • (2) Is the accumulation around (0, 1) gone with the new jet collection? What are these other muons, that are still at low deltaR?

Solutions:
  • (1) code:
from fwlite_boilerplate import *

events = Events("root://eoscms//eos/cms/store/relval/CMSSW_7_4_1/RelValTTbar_13/MINIAODSIM/PU50ns_MCRUN2_74_V8_gensim_740pre7-v1/00000/7EC72BA9-44EC-E411-9DF3-0025905A60B0.root")

hndl_muons = Handle("vector<pat::Muon>")
hndl_jets = Handle("vector<pat::Jet>")

h_mu_vs_jets = ROOT.TH2D(
    "mu_vs_jets",
    ";#DeltaR(leading #mu, closest j); p_{T,#mu} / p_{T,j}",
    50, 0., 2.5,
    50, 0., 2.,
)

for evt in events:
    evt.getByLabel("slimmedMuons", hndl_muons)
    evt.getByLabel("slimmedJets", hndl_jets)
    muons = hndl_muons.product()
    if not muons.size():
        continue
    leading_mu = muons[0]
    jets = hndl_jets.product()

    closest_jet = min(jets, key=lambda j: DeltaR2(j, leading_mu))

    h_mu_vs_jets.Fill(
        DeltaR2(leading_mu, closest_jet),
        leading_mu.pt() / closest_jet.pt()
    )

f = ROOT.TFile("plot_mu_vs_jets_solution1.root", "RECREATE")
h_mu_vs_jets.Write()
f.Close()
  • (2) Yes, the accumulation at (0, 1) should be not be there anymore with the new jet-collection. The other muons at low deltaR are probably muons in b-jets, as there is a large number of b-jets in ttbar events. code:
from fwlite_boilerplate import *

events = Events("test.root")

hndl_muons = Handle("vector<pat::Muon>")
hndl_jets = Handle("vector<pat::Jet>")
hndl_new_jets = Handle("vector<pat::Jet>")

h_mu_vs_jets = ROOT.TH2D(
    "mu_vs_jets",
    ";#DeltaR(leading #mu, closest j); p_{T,#mu} / p_{T,j}",
    50, 0., 2.5,
    50, 0., 2.,
)
h_mu_vs_new_jets = ROOT.TH2D(
    "mu_vs_new_jets",
    ";#DeltaR(leading #mu, closest j); p_{T,#mu} / p_{T,j}",
    50, 0., 2.5,
    50, 0., 2.,
)

for evt in events:
    evt.getByLabel("slimmedMuons", hndl_muons)
    evt.getByLabel("slimmedJets", hndl_jets)
    evt.getByLabel("patJetsAK4PFCHS", hndl_new_jets)
    muons = hndl_muons.product()
    if not muons.size():
        continue
    leading_mu = muons[0]
    jets = hndl_jets.product()
    new_jets = hndl_new_jets.product()

    closest_jet = min(jets, key=lambda j: DeltaR2(j, leading_mu))
    closest_new_jet = min(new_jets, key=lambda j: DeltaR2(j, leading_mu))

    h_mu_vs_jets.Fill(
        DeltaR2(leading_mu, closest_jet),
        leading_mu.pt() / closest_jet.pt()
    )
    h_mu_vs_new_jets.Fill(
        DeltaR2(leading_mu, closest_new_jet),
        leading_mu.pt() / closest_new_jet.pt()
    )

f = ROOT.TFile("plot_mu_vs_jets_solution2.root", "RECREATE")
h_mu_vs_jets.Write()
h_mu_vs_new_jets.Write()
f.Close()

green Exercise 7 d): Hot new tools for LHC Run 2: Simple W-tagging

In this exercise you'll define a very simple W-tagger and measure its performance within the ttbar sample. Of course, for a real W-tagger, one would need to do background studies on QCD events, but there is already enough background from accidentally overlapping jets in the ttbar sample.

In order to prepare for the for the performance measurement, we first define signal to be the jets that are matched to a generator W-boson, and background to be all unmatched jets. The first task is to simply count the number of signal and background jets. For the next tasks you need to access the substructure information of jets. Have a look at the jets-section in the MINIAOD workbook, WorkBookMiniAOD2015#Jets, where you can find a twisty box with example code at the bottom of the section. You will find the names of the user-floats that you need for this exercise. After that, please have a look at the code skeleton below, and try to complete it in order to answer the first question:

from fwlite_boilerplate import *

events = Events("test.root")

#define handles here

for evt in events:
    # get collections here
    gen_particles =
    ak8jets =
    if not ak8jets.size():
        continue

    # make a list of generator W particles using this criterion:
    # (abs(gen_particle.pdgId()) == 24 and 20 < gen_particle.status() < 30)
    gen_ws = list( ... )

    # this is a list of lists:
    # [[jet1, w1], [jet2, w2], ...]
    matched_pairs = list(
        (j, w)
        for j in ak8jets
        for w in gen_ws
        if DeltaR2(w, j) < 0.2
    )
    # this code would produce the same output and is maybe closer to what you've seen before:
    # matched_pairs = []
    # for j in ak8jets:
    #     for w in gen_ws:
    #         if DeltaR2(w, j) < 0.2:
    #             j_w_pair = (j, w)
    #             matched_pairs.append(j_w_pair)

    matched_ak8jets = list( ... )

    unmatched_ak8jets = list( ... )

print 'n_matched:', ...
print 'n_unmatched:', ...

Question Questions:

  • (1) How many signal and background jet's do you count?
  • (2) Make a plot of the N-subjettiness ratio tau2/tau1 for both, signal and background. Do you think it has a good separation power?
  • (3) Write a W-tagger function according to the example in WorkBookMiniAOD2015#Jets. It should take a jet as an argument and return a bool. Evaluate your tagger: What are the efficiency and mis-tag rates with the given signal and background definitions?
  • (4) BONUS: Improve your tagger. You could plot all jet-masses and see which one would perform best. What are your achievements?

The ttbar file you are using has only 3800 events. More events would help, if you want to investigate your tagger in more detail. Here is a dataset, that has around 50000 events. You can add all the files as list in the constructor of Events.

Solutions:
  • (1) signal: 278, background: 596, code:
from fwlite_boilerplate import *

events = Events("test.root")

hndl_gp = Handle("vector<reco::GenParticle>")
hndl_jets = Handle("vector<pat::Jet>")

n_matched = 0
n_unmatched = 0

for evt in events:
    evt.getByLabel("prunedGenParticles", hndl_gp)
    evt.getByLabel("slimmedJetsAK8", hndl_jets)
    gen_particles = hndl_gp.product()
    ak8jets = hndl_jets.product()
    if not ak8jets.size():
        continue

    gen_ws = list(
        p
        for p in gen_particles
        if abs(p.pdgId()) == 24 and 20 < p.status() < 30
    )

    matched_pairs = list(
        (j, w)
        for j in ak8jets
        for w in gen_ws
        if DeltaR2(w, j) < 0.2
    )

    matched_ak8jets = list(
        j
        for j, w in matched_pairs
    )

    unmatched_ak8jets = list(
        j
        for j in ak8jets
        if j not in matched_ak8jets
    )

    n_matched += len(matched_ak8jets)
    n_unmatched += len(unmatched_ak8jets)

print n_matched 
print n_unmatched

  • (2) The shapes of signal and background are a distinct. But since the background is taken from ttbar, there might be a good fraction of combinatorial background of fat jets that are made up from two smaller jets. If you'd take a QCD sample, the background would be pushed to larger values of tau2/tau1. code:
from fwlite_boilerplate import *

events = Events("test.root")

hndl_gp = Handle("vector<reco::GenParticle>")
hndl_jets = Handle("vector<pat::Jet>")

h_sig_taus = ROOT.TH1D(
    "sig_taus",
    ";tau2 / tau1;number of jets",
    50, 0., 5.0
)
h_bkg_taus = ROOT.TH1D(
    "bkg_taus",
    ";tau2 / tau1;number of jets",
    50, 0., 5.0
)

for evt in events:
    evt.getByLabel("prunedGenParticles", hndl_gp)
    evt.getByLabel("slimmedJetsAK8", hndl_jets)
    gen_particles = hndl_gp.product()
    ak8jets = hndl_jets.product()
    if not ak8jets.size():
        continue

    gen_ws = list(
        p
        for p in gen_particles
        if abs(p.pdgId()) == 24 and 20 < p.status() < 30
    )

    matched_pairs = list(
        (j, w)
        for j in ak8jets
        for w in gen_ws
        if DeltaR2(w, j) < 0.2
    )

    matched_ak8jets = list(
        j
        for j, w in matched_pairs
    )

    unmatched_ak8jets = list(
        j
        for j in ak8jets
        if j not in matched_ak8jets
    )

    for j in matched_ak8jets:
        h_sig_taus.Fill(
            j.userFloat("NjettinessAK8:tau2")
            / j.userFloat("NjettinessAK8:tau1")
        )

    for j in unmatched_ak8jets:
        h_bkg_taus.Fill(
            j.userFloat("NjettinessAK8:tau2")
            / j.userFloat("NjettinessAK8:tau1")
        )

f = ROOT.TFile("plot_w_tagging_check_nsubjettiness.root", "RECREATE")
h_sig_taus.Write()
h_bkg_taus.Write()
f.Close()

  • (3) with tau2/tau1 < 0.6 and softdrop mass > 50., the efficiency is 0.65 and the mis-tag rate 0.45. code:
from fwlite_boilerplate import *

def my_w_tagger(jet):
    return (
        jet.userFloat("NjettinessAK8:tau2")
        / jet.userFloat("NjettinessAK8:tau1") < 0.6
        and jet.userFloat("ak8PFJetsCHSSoftDropMass") > 50.0
    )

events = Events("test.root")

hndl_gp = Handle("vector<reco::GenParticle>")
hndl_jets = Handle("vector<pat::Jet>")

n_matched = 0.
n_unmatched = 0.
n_matched_tagged = 0.
n_unmatched_tagged = 0.

for evt in events:
    evt.getByLabel("prunedGenParticles", hndl_gp)
    evt.getByLabel("slimmedJetsAK8", hndl_jets)
    gen_particles = hndl_gp.product()
    ak8jets = hndl_jets.product()
    if not ak8jets.size():
        continue

    gen_ws = list(
        p
        for p in gen_particles
        if abs(p.pdgId()) == 24 and 20 < p.status() < 30
    )

    matched_pairs = list(
        (j, w)
        for j in ak8jets
        for w in gen_ws
        if DeltaR2(w, j) < 0.2
    )

    matched_ak8jets = list(
        j
        for j, w in matched_pairs
    )

    unmatched_ak8jets = list(
        j
        for j in ak8jets
        if j not in matched_ak8jets
    )

    n_matched += len(matched_ak8jets)
    n_unmatched += len(unmatched_ak8jets)

    n_matched_tagged += sum(1. for j in matched_ak8jets if my_w_tagger(j))
    n_unmatched_tagged += sum(1. for j in unmatched_ak8jets if my_w_tagger(j))

eff = n_matched_tagged / n_matched
mis = n_unmatched_tagged / n_unmatched

print 'eff:', eff
print 'mis:', mis

-- HeinerTholen - 2015-06-29

Edit | Attach | Watch | Print version | History: r10 < r9 < r8 < r7 < r6 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r10 - 2015-07-01 - HeinerTholen
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback