JUMPSHOT 2019: MET scale and resolution measurement


The Goal of This Exercise

  • Understand fundamentals of MET and its reconstruction
  • Know how to access the MET in the CMSSW
  • Understand the concept and techniques to measure MET scale and resolution
  • Have the set of tools to do MET resolution measurements

Introduction to MET

Introduction slides: link

Missing energy refers to the energy carried by weakly interacting particles such as neutrinos that is not detected by the detector, and results into energy or momentum inbalance due to conservation of energy and momentum. In hadron colliders the total initial energy cannot be determined because of the varying energy carried by partons, however, the initial momentum should be zero. Thus non-zero final momentum indicates the existence of missing energy. The term Missing transverse energy (MET) denotes the missing energy inferred from the imbalance of transverse momentum.

Precise measurements of MET is crucial in standard model measurements that involve neutrinos in the final states and beyond standard model searches that predicts new weakly interacting particles. Yet MET is sensitive to nearly all types of detector resolution, mismeasurements, and inefficiencies. The contribution from other events in the same bunch crossing is also non-negligible. The measurement of MET resolution is thus very important in interpreting the MET measurement.

Measure MET resolution using recoil variable

Generally resolutions are measured (in a Monte-Carlo driven method) by comparing simulated detector response with true (generator level) value. Here we introduce a meathod using Z events, and compare a so-called recoil variable with Z boson \( p_{T} \).

Defining \( \bm{u} \) and \( \bm{q}_{T} \)

For these exercises, you need

Miscellaneous Notes

The color scheme of the Exercise is as follows:

  • Commands will be embedded in a grey box:
    cmsRun testRun_cfg.py
  • Output and screen printouts will be embedded in a green box:
    Begin processing the 3506th record. Run 1, Event 1270974, LumiSection 6358 at 16-Dec-2015 21:45:08.969 CST 

Getting started

Prerequisites

For these exercises, you need to bring your laptop registered to FNAL network link, LPC account link, required to have a valid grid certificate link.

Computing Environment at FNAL LPC

This exercise uses the LPC Cluster cmslpc-sl6.fnal.gov. After logging into cmslpc-sl6.fnal.gov, you need to set up the CMS environment by sourcing one of two files depending on the shell that you are using.

source /cvmfs/cms.cern.ch/cmsset_default.(c)sh

CMSSW Environment

You need to move to a directory in which you would like to practice this exercise.

mkdir -p ~/your/work/dir
cd ~/your/work/dir

Please replace ~/your/work/dir with a path to the directory of your choice.

Then, you can check out a CMSSW release. This exercise uses CMSSW_10_2_5.

cmsrel CMSSW_10_2_5

Move to the src directory and enter the CMSSW runtime environment:

cd CMSSW_10_2_5/src
cmsenv

The dataset used for our measurement

The dataset we will be using is "ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18MiniAOD-SNB_102X_upgrade2018_realistic_v15-v2/MINIAODSIM", you can check the files contained in this dataset by typing:

dasgoclient -query="file dataset=/ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18MiniAOD-SNB_102X_upgrade2018_realistic_v15-v2/MINIAODSIM"

You will see a list of files printed. The next step is to check the collections stored in the file. Copy one filename from what just printed out, and type:

edmDumpEventContent filename.root

Replace filename.root with the correct filename from previous step. Then from the output look for the following lines:

vector<pat::MET>                      "slimmedMETs"               ""                "PAT"     
vector<pat::MET>                      "slimmedMETsPuppi"          ""                "PAT"     
vector<pat::Muon>                     "slimmedMuons"              ""                "PAT"

Where slimmedMETs is the standard MET to use in MINIAOD format data, calculated using Particle Flow algorithm plus charged hadron substraction. slimmedMETs is similar but using PUPPI algorithm on top of Particle Flow algorithm to better mitigate pileup. We also need the muon collection slimmedMuons here to recontruct the Z bosons.

You can find the detailed class reference to pat::MET and pat::Muon .

Creating a output TTree

For easier playing around with the data, we would want to create some flat root ntuples with only variables that we'll use. Let's define a tree structure as following and save into a met_tree.py:

import ROOT
import numpy as np

        #for assessing pileup
npv             = n.zeros(1,dtype=int)
rhoall   = n.zeros(1,dtype=float)

        #met information
met     = n.zeros(1,dtype=float)
genmet = n.zeros(1,dtype=float)
rawmet = n.zeros(1,dtype=float)
        #puppi met information
pmet    = n.zeros(1,dtype=float)
genpmet = n.zeros(1,dtype=float)
rawpmet = n.zeros(1,dtype=float)
        #for ZMM sample; met resolution:
qt              = n.zeros(1,dtype=float) # i.e. Z pt
q_eta   = n.zeros(1,dtype=float)
u_pll_pf = n.zeros(1,dtype=float) #u parallel
u_prp_pf = n.zeros(1,dtype=float) #u perpendicular
u_pll_puppi = n.zeros(1,dtype=float) #similar but for puppi mets
u_prp_puppi = n.zeros(1,dtype=float)

def newEventTree():
        t = ROOT.TTree("events","events")
        t.Branch("npv", npv, 'npv/I')
        t.Branch("rhoall", rhoall, 'rhoall/D')
        t.Branch("met", met, 'met/D')
        t.Branch("genmet", genmet, 'genmet/D')
        t.Branch("rawmet", rawmet, 'rawmet/D')
        t.Branch("pmet", pmet, 'pmet/D')
        t.Branch("genpmet", genpmet, 'genpmet/D')
        t.Branch("rawpmet", rawpmet, 'rawpmet/D')
        t.Branch("qt"             , qt            , 'qt/D')
        t.Branch("q_eta"           , q_eta         , 'q_eta/D')
        t.Branch("u_pll_pf"     , u_pll_pf      , 'u_pll_pf/D')
        t.Branch("u_prp_pf"     , u_prp_pf      , 'u_prp_pf/D')
        t.Branch("u_pll_puppi" , u_pll_puppi , 'u_pll_puppi/D')
        t.Branch("u_prp_puppi" , u_prp_puppi , 'u_prp_puppi/D')
        return t

def setBranchAddresses(t):
        t.SetBranchAddress("npv"                    , npv                    )
        t.SetBranchAddress("rhoall"                 , rhoall                 )
        t.SetBranchAddress("met"                    , met                    )
        t.SetBranchAddress("genmet"                 , genmet                 )
        t.SetBranchAddress("rawmet"                 , rawmet                 )
        t.SetBranchAddress("pmet"                   , pmet                   )
        t.SetBranchAddress("genpmet"                , genpmet                )
        t.SetBranchAddress("rawpmet"                , rawpmet                )
        t.SetBranchAddress("qt"                     , qt                     )
        t.SetBranchAddress("q_eta"                  , q_eta                  )
        t.SetBranchAddress("u_pll_pf"               , u_pll_pf               )
        t.SetBranchAddress("u_prp_pf"               , u_prp_pf               )
        t.SetBranchAddress("u_pll_puppi"            , u_pll_puppi            )
        t.SetBranchAddress("u_prp_puppi"            , u_prp_puppi            )

In this file I first defined variables that are going to be linked to tree branches. Then the function newEventTree() returns a TTree with all branch set up. We are going to store information into this tree and save into an output file. The function setBranchAddresses(t) is used when you open a tree that's saved by function newEventTree() and need to set its branch addresses.

Processing MINIAOD files

In this secion I'll split code into several blocks of code and explain them on the way. In the end you should copy and paste all of them into a single file. You can name it as met_analyzer.py.

The following code loads needed modules and open files as specified in files.txt. Don't look at them for now, not interesting.

# import ROOT in batch mode
import sys
import time
oldargv = sys.argv[:]
sys.argv = [ '-b-' ]
import ROOT
ROOT.gROOT.SetBatch(True)
sys.argv = oldargv

from FWCore.ParameterSet.VarParsing import VarParsing
options = VarParsing('python')

#default options
options.inputFiles="files.txt"
options.outputFile="mets.root"
options.maxEvents=-1

#overwrite if given any command line arguments
options.parseArguments()
#in case of txt input file, read the information from txt
li_f=[]
for iF,F in enumerate(options.inputFiles):
    print F
    if F.split('.')[-1] == "txt":
        options.inputFiles.pop(iF)
        with open(F) as f:
            li_f+=f.read().splitlines()
#add a prefix:
for ifilename,filename in enumerate(li_f):
    if filename[:6]=='/store':
        li_f[ifilename]='root://cmsxrootd.fnal.gov/'+filename
options.inputFiles+=li_f
print "analyzing files:"
for F in options.inputFiles: print F

# define deltaR
from math import hypot, pi, sqrt, fabs, cos, sin
import numpy as n

from jetmet_tree import *
from functions import *

# create an oput tree.

fout = ROOT.TFile (options.outputFile,"recreate")
t   = newEventTree()

# load FWLite C++ libraries
ROOT.gSystem.Load("libFWCoreFWLite.so");
ROOT.gSystem.Load("libDataFormatsFWLite.so");
ROOT.FWLiteEnabler.enable()

#to print out the progress
def print_same_line(s):
    sys.stdout.write(s)                  # just print
    sys.stdout.flush()                    # needed for flush when using \x08
    sys.stdout.write((b'\x08' * len(s)).decode())# back n chars

In order to access the collections in the MINIAOD file, we use the Handle class and a unique tag string for each collection:

# load FWlite python libraries
from DataFormats.FWLite import Handle, Events

vertex  , vertexLabel = Handle("std::vector<reco::Vertex>") , "offlineSlimmedPrimaryVertices" #for npv
rhoall_ , rhoallLabel = Handle("double")                    , "fixedGridRhoFastjetAll"          #for rhoall
muons   , muonLabel   = Handle("std::vector<pat::Muon>")    , "slimmedMuons"                   #muon collection
mets    , metLabel    = Handle("std::vector<pat::MET>")     , "slimmedMETs"                   #pf+chs met collection
pmets   , pmetLabel   = Handle("std::vector<pat::MET>")     , "slimmedMETsPuppi"              #pf+puppi met collection

The following code is the main body. It loops through the events, selects the leading two muons and reconstruct the Z bosons. Some selections are applied.

events = Events(options)
nevents = int(events.size())
print "total events: ", events.size()

for ievent,event in enumerate(events):

        if options.maxEvents is not -1:
                if ievent > options.maxEvents: continue

        event.getByLabel(muonLabel, muons)
        event.getByLabel(metLabel, mets)
        event.getByLabel(pmetLabel, pmets)
        event.getByLabel(vertexLabel, vertex)
        event.getByLabel(rhoallLabel,rhoall_)

        npv[0]  = vertex.product().size()
        rhoall[0]        = rhoall_.product()[0]

        print_same_line(str(round(100.*ievent/nevents,2))+'%')

        #met information
        metprod   = mets.product().front()
        met[0]  = metprod.pt()
        genmet[0] = metprod.genMET().pt()
        rawmet[0] = metprod.uncorPt()

        #puppi met information
        pmetprod   = pmets.product().front()
        pmet[0] = pmetprod.pt()
        genpmet[0] = pmetprod.genMET().pt()
        rawpmet[0] = pmetprod.uncorPt()
        ###Reconstruct Z boson:
        nmuons = len(muons.product())
        Zvector = None
        if nmuons <2: continue
        muonCollection=[]
        for imu,mu in enumerate(muons.product()):
                if len(muonCollection)>1: continue
                if len(muonCollection)==0 and mu.pt() < 20: continue
                if len(muonCollection)==1 and mu.pt() < 10: continue
                if not mu.isLooseMuon(): continue
                if abs(mu.eta())>2.4: continue
                if mu.trackIso()/mu.pt()>0.10: continue #Loose working point ;
                muonCollection.append(mu.p4())
        if len(muonCollection)<2: continue
        mu1_pt[0]=muonCollection[0].pt()
        mu1_eta[0]=muonCollection[0].eta()
        mu1_phi[0]=muonCollection[0].phi()
        mu2_pt[0]=muonCollection[1].pt()
        mu2_eta[0]=muonCollection[1].eta()
        mu2_phi[0]=muonCollection[1].phi()
        Zvector=muonCollection[0]+muonCollection[1] #Sum of two muon Lerentz vectors
        h_Zmass.Fill(Zvector.M())
        if Zvector.M()<75 or Zvector.M()>105: continue
        Zmass[0]=Zvector.M()
        pfmetvector = mets.product().front().corP4()
        puppimetvector = pmets.product().front().corP4()
        pfrecoil = pfmetvector + Zvector
        puppirecoil = puppimetvector + Zvector
        qt[0] = Zvector.pt()
        q_eta[0] = Zvector.eta()
        u_pll_pf[0] = - pfrecoil.pt() * cos(Zvector.phi()-pfrecoil.phi())
        u_pll_puppi[0] = - puppirecoil.pt() * cos(Zvector.phi()-puppirecoil.phi())
        u_prp_pf[0] = pfmetvector.pt() * sin(Zvector.phi()-pfmetvector.phi())
        u_prp_puppi[0] = puppimetvector.pt() * sin(Zvector.phi()-puppimetvector.phi())
        t.Fill()
fout.Write()
fout.Close()

I probably need a lot more explanation here. Just copy and paste for now. Copy and paste all code blocks in this section into a file called met_analyzer.py and run it by:

python met_analyzer.py

Simple exercise: Take a look at the distribution variables using root command line tool:

Open the output root file, it should has the name as mets.root*

root -l mets.root

Make sure a tree with name *events is in the file:

.ls

TFile**         mets.root
 TFile*         mets.root
  KEY: TTree    events;3        events

Now make some plots for the recoil variable in parallel and perpendicular directions:

events->Draw("u_pll_pf")
events->Draw("u_prp_pf")

What do you observe? Do they tell us what we want to know about the resolution and scale?

In the parallel direction we need the divide the recoil variable by the Z boson pt to get an idea of how the scale and resolution really looks like:

events->Draw("u_pll_pf/qt")

Does this agree with you would expect?

Make resolution plots in pt and npv bins (optional)

We want to do the measurement more carefully because the met response and resolution can depend on the scale of the MET and the pileup environment. Open a new file with name plot_resolution.py and we'll go through how to deal with the binning.

At the begining set up the ROOT module and open the output file we just created:

# import ROOT in batch mode
import os,sys
oldargv = sys.argv[:]
sys.argv = [ '-b-' ]
import ROOT
from math import sqrt
from array import array
import numpy as np
setTDRStyle()
ROOT.gROOT.SetBatch(True)
sys.argv = oldargv

f_in = ROOT.TFile("mets.root","READ")
t_in = f_in.Get("events")

Then define the binning scheme that we want to use for npv and Z boson pt:

_pt = [0,10,20,25,30,40,50,60,80,100]
_npv = np.arange(20,64,4)
folder = "result"
os.system("mkdir -p "+folder)
os.system("mkdir -p "+folder+"/fit")

Next step we define a function that takes a string as event selection. For all the events that satisfy the selection requirement, it makes the "u_prp" and "u_pll/qt" plots like we did in the previous section. It then returns a dict object with keys such as "response_pf", "response_pf_error", "response_puppi", "response_puppi_error". Notice that we will perform the same measurements on both pf+CHS mets and pf+PUPPI mets, and compare their performance.

def get_met_params(rg): #given a phase space range, return response, sigma u_pll, sigma uprp, and their errors                                                                                             params={} #this dict is to be returned
    #get the responses
    #mean of upll devided by the mean of qt
    #do the same thing for both pf and puppi
    print "getting params in range: " + rg
    print "number of events in range: " + str(t_in.GetEntries(rg))
    _shape=np.arange(-200,200,5)
    shape_qt = ROOT.TH1F("shape_qt","shape_qt",len(_shape)-1,array('d',_shape))
    shape_pf = ROOT.TH1F("shape_pf","shape_pf",len(_shape)-1,array('d',_shape)) #this can be used to fill several different variables
    shape_puppi = ROOT.TH1F("shape_puppi","shape_puppi",len(_shape)-1,array('d',_shape)) #this can be used to fill several different variables
    t_in.Draw("qt>>shape_qt", rg)
    t_in.Draw("u_pll_pf>>shape_pf", rg)
    if shape_qt.GetMean()==0 or shape_pf.GetMean()==0: return None
    params["response_pf"]=abs(shape_pf.GetMean()/shape_qt.GetMean())
    params["response_pf_error"]=sqrt((shape_pf.GetMeanError()/shape_pf.GetMean())**2+(shape_qt.GetMeanError()/shape_qt.GetMean())**2) * params["response_pf"]
    t_in.Draw("u_pll_puppi>>shape_puppi",rg)
    if shape_puppi.GetMean()==0: return None
    params["response_puppi"]=abs(shape_puppi.GetMean()/shape_qt.GetMean())
    params["response_puppi_error"]=sqrt((shape_puppi.GetMeanError()/shape_puppi.GetMean())**2+(shape_qt.GetMeanError()/shape_qt.GetMean())**2) * params["response_puppi"]
    plot_hists([shape_qt,shape_pf,shape_puppi], legend_title_list=["q_{T}", "u_{||}(PF)","u_{||}(PUPPI)"], x_title="p_{T}(GeV)", y_title="Events/5GeV", plot_name="fit/qt_upll"+rg,text_description=rg)    t_in.Draw("u_pll_pf>>shape_pf",rg)
    if shape_pf.GetStdDev()<=0: return None
    params["sigma_pll_pf"]=shape_pf.GetStdDev()
    params["sigma_pll_pf_error"]=shape_pf.GetStdDevError()
    params["resolution_pll_pf"]=params["sigma_pll_pf"]/params["response_pf"]
    params["resolution_pll_pf_error"]=sqrt((params["sigma_pll_pf_error"]/params["sigma_pll_pf"])**2+(params["response_pf_error"]/params["response_pf"])**2)*params["resolution_pll_pf"]
    t_in.Draw("u_pll_puppi>>shape_puppi",rg)
    if shape_puppi.GetStdDev()<=0: return None
    params["sigma_pll_puppi"]=shape_puppi.GetStdDev()
    params["sigma_pll_puppi_error"]=shape_puppi.GetStdDevError()
    params["resolution_pll_puppi"]=params["sigma_pll_puppi"]/params["response_puppi"]                                                                                                                      params["resolution_pll_puppi_error"]=sqrt((params["sigma_pll_puppi_error"]/params["sigma_pll_puppi"])**2+(params["response_puppi_error"]/params["response_puppi"])**2)*params["resolution_pll_puppi"]
    t_in.Draw("u_prp_pf>>shape_pf",rg)
    if shape_pf.GetStdDev()<=0: return None
    params["sigma_prp_pf"]=shape_pf.GetStdDev()
    params["sigma_prp_pf_error"]=shape_pf.GetStdDevError()
    params["resolution_prp_pf"]=params["sigma_prp_pf"]/params["response_pf"]
    params["resolution_prp_pf_error"]=sqrt((params["sigma_prp_pf_error"]/params["sigma_prp_pf"])**2+(params["response_pf_error"]/params["response_pf"])**2)*params["resolution_prp_pf"]
    t_in.Draw("u_prp_puppi>>shape_puppi",rg)
    if shape_puppi.GetStdDev()<=0: return None
    params["sigma_prp_puppi"]=shape_puppi.GetStdDev()
    params["sigma_prp_puppi_error"]=shape_puppi.GetStdDevError()
    params["resolution_prp_puppi"]=params["sigma_prp_puppi"]/params["response_puppi"]                                                                                                                      params["resolution_prp_puppi_error"]=sqrt((params["sigma_prp_puppi_error"]/params["sigma_prp_puppi"])**2+(params["response_puppi_error"]/params["response_puppi"])**2)*params["resolution_prp_puppi"]
    plot_hists([shape_pf,shape_puppi], legend_title_list=["PF met","PUPPI MET"], x_title="u_{#perp}(GeV)", y_title="Events/5GeV", plot_name="fit/uprp"+rg,text_description=rg)
    return params

We can visualize the plot in this way: create a seperate TGraph for each eta bin, with x axis being the Z pt and y axis being the response or resolution, then we can put multiple TGraph into a TMultiGraph and draw it. Fill the TGraphs with the following code:

# Create one TMultigraph for each of the following measurement
mg_metResponse_pf = ROOT.TMultiGraph()
mg_metResolution_pf = ROOT.TMultiGraph()
#Loop through eta bins
for ieta in range(len(_eta)-1):
   sel_eta="qeta>"+str(_eta[ieta])+"&&qeta<"+str(_eta[ieta+1])
   leg_eta=str(_eta[ieta])+" < #eta < " + str(_eta[ieta+1])
   li_metResponse_pf = []
   li_metResponse_pf_error = []
   li_metResolution_pf = []
   li_metResolution_pf_error = []
   li_pt_center = []
   li_pt_halfwidth = []
   #loop through Z pt bins
   for ipt in range(len(_pt)-1):
      ptlow, pthigh = _pt[ipt], _pt[ipt+1]
      sel_qt="qt>"+str(_pt[ipt])+"&&qt<"+str(_pt[ipt+1])
      li_pt_center.append(0.5*ptlow + 0.5*pthigh)
      li_pt_halfwidth.append(0.5*pthigh-0.5*ptlow)
      params=get_params(sel_eta+"&&"+sel_qt)
      li_metResponse_pf.append(params["response_pf"])
      li_metResponse_pf_error.append(params["response_pf_error"])
      li_metResolution_pf.append(params["resolution_pll_pf"])
      li_metResolution_pf_error.append(params["resolution_pll_pf_error"])
   # transfer list into TGraph:
   npoints=len(li_pt_center)
   g_metResponse_pf = ROOT.TGraphErrors(npoints, li_pt_center, li_pt_halfwidth, li_metResponse_pf, li_metResponse_error)
   g_metResolution_pf = ROOT.TGraphErrors(npoints, li_pt_center, li_pt_halfwidth, li_metResponse_pf, li_metResponse_error)
   g_metResponse_pf.SetTitle(leg_eta)
   g_metResolution_pf.SetTitle(leg_eta)
   mg_metResponse_pf.Add(g_metResponse_pf)
   mg_metResolution_pf.Add(g_metResolution_pf)

With the TMultiGraph ready, we can finally make the plots:

# Make plot for the MET response
c1 = ROOT.TCanvas("c1","c1",800,800)
mg_Response_pf.Draw("AP")
c1.BuildLegend()
c1.Print("result/met_response.png")
# Make plot for the MET resolution
c2 = ROOT.TCanvas("c2","c2",800,800)
mg_Resolution_pf.Draw("AP")
c2.BuildLegend()
c2.Print("result/met_resolution.png")

Review and further exercise

What we just plotted are the response and resolution measurement of the recoil variable in the direction parallel to the Z boson pt. But an important piece missing is the resolution of recoil variable perpendicular to the Z pt. What's also interesting is how PUPPI algorithm performs against pf+CHS algorithm when the number of primary vertices npv increases. Try play around with the code and make the following plots as exercises: 1. A plot showing the resolution of the recoil variable perpendicular to the Z pt, use the same binning scheme as above. 2. A plot showing the parallel resolution, but instead of showing different Z eta bins, show one from pf+CHS algorithm and one from PUPPI algorithm, within the same eta range. 3. Similar to above, but instead of using Z pt as x axis, use npv as x axis.

References


MET Significance .
CMS-PAS : Performance of Missing Energy Reconstruction in 13 TeV pp data .
Pileup Per Particle Identification (PUPPI).

Edit | Attach | Watch | Print version | History: r8 < r7 < r6 < r5 < r4 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r8 - 2019-09-12 - SiqiYuan
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    Sandbox All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback