“Killing MET softly” -- constituent-based pileup suppression techniques

Project description

“Invisible” particles, i.e. those that cannot be directly detected with experimental instrumentation, include Standard Model neutrinos and more exotic hypotheses related to Dark Matter, Extra Dimensions and Supersymmetry. The only experimental observable capable of shedding light on these mysterious objects is the missing transverse momentum (MET), that is, the observed momentum imbalance perpendicular to the beam axis. At present and future LHC conditions, experimental detectors have to contend with many simultaneous proton-proton collisions (pile-up) overlaid on the event of interest, vastly complicating the reconstruction of MET.

To date, various pile-up mitigation approaches have been adopted, including correcting for the average pile-up density per event in local regions and the use of precision tracking to tag objects originating from pile-up vertices. While showing great promise in the reconstruction of hadronic jets, these require some adaptation for use in MET. The student will apply constituent-level pile-up correction techniques, including SoftKiller in conjunction with particle flow reconstruction, to MET reconstruction in ATLAS and assess the resulting performance against standard benchmarks. Successful demonstrations of these techniques will be fundamental to continued effective usage of MET in the ATLAS trigger and physics analyses in the foreseeable future.

Some useful references:

  • 2012 MET paper — This reviews how we reconstruct missing ET in some detail
  • SoftKiller paper — This explains one of the approaches we’re interested in for pile-up suppression in MET.
  • PFlow paper — Ideally we would combine SoftKiller with the particle flow reconstruction, which is explained here
  • ROOT website (You probably found this already. I’d suggest trying to use pyROOT as well as C++ ROOT).
  • Linux shell tutorial -- Important for getting around your unix machine (including Mac terminal)
  • Emacs tour -- One of the two major text editors on unix, and the one I prefer. Important for editing code on remote machines.

Coding hints

A basic primer for working on the CERN computing infrastructure:

ssh -Y [yourusername]@lxplus.cern.ch # log in remotely; -Y should provide you with a display connection on Mac
setupATLAS # load the "lsetup" command for sourcing ATLAS software
lsetup root # load the root executable and libraries
You'll be given a home directory at /afs/cern.ch/user/u/username/ (where u is the first letter of your username), but this is limited in size, so the better place to work is /afs/cern.ch/work/u/username/. It may be convenient to alias the work directory to ~/work.

If you need to transfer files, you can use scp or the more powerful rsync.

CERN has started using GitLab for software version control, so once you start writing standalone scripts or even more complex code, you should start putting it in Git. I made a repository here that we can use to collect all the code that we put together.

For a few more tips, you can also refer to InstructionsForSummerStudents2016.

Ntuple format

For making a start, I produced a small sample of events with kinematic information written to a ROOT TTree. These contain a directory called "SoftK", inside which is a TTree called "ntup". You should try plotting various properties of the events to get a sense for what information is in the file. I'll probably keep updating these at:


There are several "object collections" contained in the tree:

  • Electrons
  • Muons
  • Jets of different types
    • AntiKt4TruthJets -- jets built using simulated particles from the Monte Carlo event generator as a reference
    • AntiKt4EMTopoJets -- jets built using standard ATLAS topological clusters of energy in the calorimeters
    • AntiKt4EMTopoSKJets -- as AntiKt4EMTopo but with the SoftKiller algorithm applied
    • AntiKt4EMTopoVorSKJets -- as AntiKt4EMTopoSK but with Voronoi subtraction & spreading applied to the clusters before SoftKiller
    • AntiKt4EMPFlowJets -- jets constructed using particle flow objects
  • MET collections corresponding to the different jet collections. There are several "terms" or "components" of the MET calculation
    • FinalTrk -- the total MET for the event, composed of the following parts:
      • RefEle -- the contribution of electrons to the MET
      • Muons -- the contribution of muons to the MET
      • RefJet -- the contribution of jets to the MET
      • PVSoftTrk -- the "soft term", consisting of tracks matched to the hard scatter primary vertex
For the electrons, muons and jets, we store:
  • pt: transverse momentum
  • eta: pseudorapidity
  • phi: azimuthal angle
  • e: energy
Additionally, there may be branches that indicate other information about the objects, that may be useful for selecting good quality objects. For example:
  • charge and isolation (the amount of energy/momentum in a cone around the object measured with clusters or tracks) for leptons
  • Jet Vertex Tagger, a discriminant used to separate pile-up jets from hard scatter jets
MET is a two-vector, hence we don't store a four-momentum, but instead have:
  • mpx, mpy -- the x- and y-components
  • met, phi -- the magnitude and azimuthal angle (redundant when we store mpx and mpy, but useful to plot, so we store these for convenience)
  • sumet -- the scalar sum of the transverse momenta of the particles that were used to build the MET
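
To make the redundancy between (mpx, mpy) and (met, phi) concrete, here is a minimal sketch of the conversion in both directions. The function names and the example numbers are made up for illustration; only the branch names mpx, mpy, met and phi come from the ntuple.

```python
import math

def met_from_components(mpx, mpy):
    """Return (met, phi) for a MET two-vector given its x- and y-components."""
    return math.hypot(mpx, mpy), math.atan2(mpy, mpx)

def components_from_met(met, phi):
    """Return (mpx, mpy) given the magnitude and azimuthal angle."""
    return met * math.cos(phi), met * math.sin(phi)

# Hypothetical values in GeV, not taken from the ntuple
met, phi = met_from_components(30.0, -40.0)
mpx, mpy = components_from_met(met, phi)
print(met, phi, mpx, mpy)
```

Note that atan2 returns phi in the range (-pi, pi], so check that this matches the convention of the stored phi branch before comparing the two.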
Finally, we have some event-wide information:
  • isData -- was this made from a sample of collision events, or from simulation?
  • EventNumber -- an identifier for the event within the sample
  • DSID -- the dataset ID, either a run number if this is data, or a number identifying the MC sample configuration
  • AverageIntPerXing -- the average number of interactions per bunch-crossing in ATLAS, which tells us how much pileup to expect
  • nPV -- the number of primary vertices reconstructed, telling us how much pileup the detector "actually saw" in this event

Week 1 assignments

Some questions to answer:

  • Are there objects that show spatial relationships to one another? We measure distances using deltaR = sqrt(deltaEta^2 + deltaPhi^2).
  • How well do the reconstructed jet collections match the truth jet collections in position and energy/momentum?
  • If you add up the different objects in the event, can you reproduce the MET?
  • How do the two measures of pileup correlate with each other? How do they correlate with other quantities in the event?
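
As a starting point for the first and third questions, here is a rough sketch in plain Python (the same logic carries over to pyROOT or C++). It assumes the usual convention that the MET is the negative vector sum of the transverse momenta of the objects; the function names and the input format of (pt, phi) pairs are hypothetical. The one subtlety in deltaR is that deltaPhi has to be wrapped, since phi = 3.0 and phi = -3.0 are actually close together.

```python
import math

def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """deltaR = sqrt(deltaEta^2 + deltaPhi^2) with deltaPhi wrapped."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def met_from_objects(objects):
    """Negative vector sum of a list of (pt, phi) pairs; returns (mpx, mpy)."""
    mpx = -sum(pt * math.cos(phi) for pt, phi in objects)
    mpy = -sum(pt * math.sin(phi) for pt, phi in objects)
    return mpx, mpy

# Two objects on either side of the phi = +-pi boundary are close in deltaR
print(delta_r(0.5, 3.0, 0.5, -3.0))

# Two back-to-back objects of equal pt (hypothetical values) balance exactly
print(met_from_objects([(40.0, 0.0), (40.0, math.pi)]))
```

When comparing with the stored MET, remember that the soft term is built from tracks rather than from the objects listed above, so it has to be added in separately.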

Week 2 assignments

Follow-up on week 1

  • You should try and get a sense of what different types of events "look like", so if you find some time, try plotting a few distributions like the Z pt, the jet pt, electron pt, number of jets, electrons...
  • Another thing we discussed is trying to understand how the Z kinematics and the MET are correlated, so again you could look at the same histograms that are currently defined (deltaPhi between Z and MET, ratio of transverse momenta between Z and MET) for a few different selections, e.g. high pT Z's, 0 jets...

Now that you have a clearer idea how to work with the objects in a ROOT file, the next step is learning to plot the MET performance metrics that we use to study whether we're doing a good job with MET reconstruction: Scale & resolution.

MET scale

As we've discussed, the usual way in which we determine whether the hadronic portion of the MET (especially the soft term) accurately balances the other parts (measured from leptons or photons) is to measure the projection of the MET onto some axis defined by the event. Typically this is the Z transverse momentum vector.

  • Try plotting this distribution (the dot product of MET and the unit vector in the direction of the Z boson).
  • We actually want to know what the mean value of this distribution is, and whether it is correlated with the Z pt. For this you will need a set of histograms for different Z pt values -- or you could fill a 2-dimensional histogram (TH2D).
  • How about the MET perpendicular to the Z boson? How do you expect the mean of this to behave?
  • What if we wanted to know specifically whether the soft term balanced everything else (the "hard terms")?
  • Should we expect this quantity to depend on the amount of pileup in the event? How about on the number of jets?
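
The projection and the binning in Z pt can be sketched as below. This is only an illustration in plain Python with made-up event values: the function names are hypothetical, and in practice you would fill ROOT histograms rather than Python lists. The mean-per-bin helper does by hand roughly what a profile histogram would give you.

```python
import math

def met_projections(mpx, mpy, z_px, z_py):
    """Return the MET components (parallel, perpendicular) to the Z pT direction."""
    z_pt = math.hypot(z_px, z_py)
    ux, uy = z_px / z_pt, z_py / z_pt        # unit vector along the Z pT
    parallel = mpx * ux + mpy * uy            # dot product with the unit vector
    perpendicular = -mpx * uy + mpy * ux      # component along the axis rotated by +90 degrees
    return parallel, perpendicular

def profile_mean(xs, ys, edges):
    """Mean of ys in bins of xs (ascending bin edges); None for empty bins."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for x, y in zip(xs, ys):
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                sums[i] += y
                counts[i] += 1
                break
    return [s / n if n else None for s, n in zip(sums, counts)]

# Hypothetical events: (mpx, mpy, z_px, z_py) in GeV
events = [(-3.0, 1.0, 20.0, 0.0), (-5.0, -1.0, 30.0, 0.0),
          (0.0, -8.0, 0.0, 60.0), (2.0, -12.0, 0.0, 80.0)]
z_pts = [math.hypot(e[2], e[3]) for e in events]
parallels = [met_projections(*e)[0] for e in events]
means = profile_mean(z_pts, parallels, [0.0, 50.0, 100.0])
print(means)
```

The same profile_mean helper can be reused with nPV or the number of jets on the x-axis to address the last question in the list.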

MET resolution

The other piece of information we are interested in is the precision of the MET measurement -- is it close to the actual value in "truth"? For this we measure the resolution, defined as the width of a Gaussian distribution fitted to the MET along some direction, or sometimes instead we examine the RMS (root mean squared).

  • Often we define the MET resolution as the width of the x-component and/or the y-component of the MET (subtracting off the true MET in events where we expect "real" MET from neutrinos etc). Why is this reasonable to do? In Z events, how could we define another pair of axes along which to study the resolution?
  • As with the scale variable, we want to measure the dependence of the MET resolution on some properties of the event. Which ones would be especially interesting? Should this depend on e.g. the Z pt? How about the number of jets?
  • Given a 1D histogram, you can easily retrieve the RMS (see the TH1 class reference). But we said earlier that we might use the width of a gaussian fitted to the histogram. Try making a fit. Does the gaussian distribution accurately match the MET distribution everywhere? How about in a limited range? If you manage to fit a gaussian reasonably well, does the width of the gaussian match the RMS?
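
To get a feeling for why the RMS and the Gaussian width can disagree, here is a toy sketch using only the Python standard library. It does not perform a Gaussian fit (for that you should use the ROOT fitting tools on your histogram). Instead it swaps in a simpler stand-in for the core width: the half-width of the interval around the median that contains 68.27% of the entries, which equals sigma for a pure Gaussian. The toy distribution (a narrow Gaussian core plus a wide tail) and all the numbers are assumptions chosen for illustration, not a model of the real MET distribution.

```python
import math
import random

def rms(values):
    """Root mean square deviation about the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def core_width(values, fraction=0.6827):
    """Half-width of the interval around the median containing `fraction` of the entries."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    deviations = sorted(abs(v - median) for v in values)
    return deviations[int(fraction * (len(deviations) - 1))]

# Toy sample: 90% core with width 10, 10% tail with width 40 (hypothetical numbers)
random.seed(1)
sample = [random.gauss(0.0, 10.0) if random.random() < 0.9 else random.gauss(0.0, 40.0)
          for _ in range(20000)]

sample_rms = rms(sample)
sample_core = core_width(sample)
print(sample_rms, sample_core)
```

The tails inflate the RMS well above the core width, which is the kind of difference you may see when comparing the histogram RMS to a Gaussian fitted in a limited range.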

Jet counting

It's often useful for us to count the number of jets in the event, e.g. so that we can look at MET distributions and performance metrics in events with 0, 1, 2... jets.

Jet selection

Just as a reminder, we need to reject pileup jets, so we pass the jet if:

  • |eta| > 2.4, i.e. the jet is not inside the tracker
  • pt > 60 GeV, i.e. the jet has a large pT, so is probably not pileup
  • If neither of the above is true, then require JVT > 0.59 (which has been determined to be the right value for EMTopo jets)
To complicate matters a bit, we might want to use a different JVT value for jets that are less sensitive to pileup. E.g. for EMPFlow jets, we use 0.2. We will want to figure out what is actually the best value for the different types of SoftKiller jets.
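
The selection logic above can be sketched as a small function. The function name, the dictionary keys and the example jets are hypothetical, pt is assumed to be in GeV, and the JVT threshold is left as a parameter so that different values can be tried for the different jet collections.

```python
def passes_pileup_rejection(pt, eta, jvt, jvt_cut=0.59):
    """Pass forward or high-pt jets outright; otherwise require JVT above the cut."""
    if abs(eta) > 2.4:   # outside the tracker, so JVT cannot be used
        return True
    if pt > 60.0:        # high pt (GeV assumed), so unlikely to be pileup
        return True
    return jvt > jvt_cut

# Hypothetical jets for illustration
jets = [
    {"pt": 30.0, "eta": 0.5, "jvt": 0.10},   # central, low pt, low JVT: rejected
    {"pt": 30.0, "eta": 3.0, "jvt": 0.10},   # forward: kept
    {"pt": 80.0, "eta": 0.5, "jvt": 0.00},   # high pt: kept
    {"pt": 30.0, "eta": 0.5, "jvt": 0.30},   # kept only with the looser cut
]
selected_emtopo = [j for j in jets if passes_pileup_rejection(j["pt"], j["eta"], j["jvt"])]
selected_pflow = [j for j in jets if passes_pileup_rejection(j["pt"], j["eta"], j["jvt"], jvt_cut=0.2)]
print(len(selected_emtopo), len(selected_pflow))
```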

Overlap removal

Remember that last week we discussed how the jets overlap with other objects (electrons especially). So to count jets, you will need to remove the overlaps -- as we discussed, here is a standard overlap-removal strategy used by ATLAS:

  1. If a jet is within deltaR<0.2 of an electron, remove the jet
  2. If an electron or a muon is within deltaR<0.4 of a jet (that was not removed by an electron), remove the lepton.
Try comparing the numbers of jets, electrons and muons with and without these cuts -- how big a difference is there? I will make a file with Z->mumu events as well, then you should compare the Zee and Zmumu events, and see if these are consistent.
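
The two overlap-removal steps can be sketched as follows. This is a minimal illustration rather than the official ATLAS implementation: objects are represented as hypothetical dictionaries with "eta" and "phi" keys, and the example objects are made up. The order matters, since the lepton-jet step only considers jets that survived the electron-jet step.

```python
import math

def delta_r(a, b):
    """deltaR between two objects with 'eta' and 'phi' keys, with deltaPhi wrapped."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def remove_overlaps(jets, electrons, muons):
    """Apply the two-step overlap removal; returns (jets, electrons, muons) that survive."""
    # Step 1: remove jets within deltaR < 0.2 of any electron
    kept_jets = [j for j in jets if all(delta_r(j, e) >= 0.2 for e in electrons)]
    # Step 2: remove leptons within deltaR < 0.4 of any surviving jet
    kept_electrons = [e for e in electrons if all(delta_r(e, j) >= 0.4 for j in kept_jets)]
    kept_muons = [m for m in muons if all(delta_r(m, j) >= 0.4 for j in kept_jets)]
    return kept_jets, kept_electrons, kept_muons

# Hypothetical event
electrons = [{"eta": 0.0, "phi": 0.0}]
muons = [{"eta": 1.2, "phi": 1.1}]
jets = [{"eta": 0.05, "phi": 0.05},   # overlaps the electron: removed in step 1
        {"eta": 1.0, "phi": 1.0}]     # overlaps the muon: muon removed in step 2
kept_jets, kept_electrons, kept_muons = remove_overlaps(jets, electrons, muons)
print(len(kept_jets), len(kept_electrons), len(kept_muons))
```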

Plotting in jet bins

Now that you know how to do overlap removal, try plotting the MET and some of the performance metrics in:

  1. all events
  2. events with 0 jets
  3. events with at least 1 jet
  4. events with exactly 1 jet
  5. events with at least 2 jets
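
Because these categories overlap (an event with exactly 1 jet also has at least 1 jet), each event has to be filled into every category it belongs to. A small sketch of the bookkeeping, with hypothetical category labels and made-up (number of jets, MET) values:

```python
def jet_categories(n_jets):
    """Return the labels of all (overlapping) jet-multiplicity categories for an event."""
    labels = ["all"]
    if n_jets == 0:
        labels.append("0 jets")
    if n_jets >= 1:
        labels.append(">=1 jet")
    if n_jets == 1:
        labels.append("1 jet")
    if n_jets >= 2:
        labels.append(">=2 jets")
    return labels

# Hypothetical (n_jets, met) pairs; in practice these come from your event loop
events = [(0, 12.0), (1, 25.0), (3, 40.0), (0, 8.0)]
met_by_category = {}
for n_jets, met in events:
    for label in jet_categories(n_jets):
        met_by_category.setdefault(label, []).append(met)
print({label: len(values) for label, values in met_by_category.items()})
```

In a ROOT macro the lists would be replaced by one histogram per category.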

Jet calibration and fakes

We talked about how we need to calibrate jets in order to get them to the right scale (matching to truth jets). To determine whether the scale is correct, we define the "jet response", R = reco_jet_pt / truth_jet_pt. Then we can look at the dependence of the mean and the resolution (width) of R on things like jet pt, eta and pileup.

We think that the EMTopo and EMPFlow jets are probably calibrated well, but the calibrations for the SoftKiller jets are only preliminary ones, so we should check if they are good. To do so:

  1. Match (selected) reconstructed jets to truth jets (you can use deltaR<0.3)
  2. Compute the jet response and make some histograms (1D in multiple bins, or 2D) that tell you what the response looks like, then determine how the mean and resolution vary with jet pt, jet eta and pileup (e.g. nPV)

In the process of matching reconstructed jets to truth jets, you will probably find some (selected) reconstructed jets that match to no truth jet. Because the truth jet container holds only jets from the hard scatter, we declare these to be pileup jets. What is the mean number of pileup jets per event for the different jet collection types? How does this vary with the jet pt and eta?
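
The matching, response and pileup-jet counting can be sketched as below. This is a simplified illustration: jets are hypothetical dictionaries with "pt", "eta" and "phi" keys, the example values are made up, and each reconstructed jet is simply matched to its nearest truth jet without enforcing that a truth jet is used only once.

```python
import math

def delta_r(a, b):
    """deltaR between two objects with 'eta' and 'phi' keys, with deltaPhi wrapped."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def match_to_truth(reco_jets, truth_jets, max_dr=0.3):
    """Return (matched, unmatched): (reco, truth) pairs and reco jets with no truth match."""
    matched, unmatched = [], []
    for reco in reco_jets:
        nearest = min(truth_jets, key=lambda t: delta_r(reco, t), default=None)
        if nearest is not None and delta_r(reco, nearest) < max_dr:
            matched.append((reco, nearest))
        else:
            unmatched.append(reco)   # treated as a pileup jet
    return matched, unmatched

# Hypothetical jets (pt in GeV)
truth_jets = [{"pt": 100.0, "eta": 0.0, "phi": 0.0},
              {"pt": 50.0, "eta": 1.5, "phi": 2.0}]
reco_jets = [{"pt": 95.0, "eta": 0.02, "phi": -0.01},
             {"pt": 55.0, "eta": 1.45, "phi": 2.05},
             {"pt": 25.0, "eta": -2.0, "phi": -1.0}]

matched, pileup_jets = match_to_truth(reco_jets, truth_jets)
responses = [reco["pt"] / truth["pt"] for reco, truth in matched]
print(responses, len(pileup_jets))
```

Averaging the number of unmatched jets over all events, separately for each jet collection, gives the mean number of pileup jets per event asked for above.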

Assignments for August

Running jet calibrations with official code

For this you will have to either install an analysis release locally, or work on lxplus.

Instructions for lxplus

Log on to lxplus:
ssh -Y lxplus.cern.ch
Then you will need to run some setup scripts to get the environment in place for using the analysis releases. Try running
-- if this works, you can skip the next two lines. If not, then do:
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
I would suggest that you edit the file ".bashrc" and insert these lines, so they are called automatically every time you log in.

Now you can set up various programs using the "lsetup" command, e.g. "lsetup root", "lsetup asetup". The latter is what you need.

Setting up the analysis release

An analysis release is a set of programmes and libraries of analysis code that are pre-compiled by ATLAS regularly. This saves you having to write all your code from scratch! We'll then compile some analysis code against the analysis release. First, create a new directory to work in, and cd to that. On lxplus, you can set up the analysis release like this:

lsetup asetup
asetup AnalysisBase,21.2.1
Then, you will need to get the calibration code, from git (first setting up a recent git version):
lsetup git
git clone --recursive https://:@gitlab.cern.ch:8443/khoo/JetCalibAnalysis.git
This will grab a copy of the repository here: https://gitlab.cern.ch/khoo/JetCalibAnalysis, the "recursive" argument ensuring that submodules (links to other git repositories) are also checked out. You may find the following page informative: https://twiki.cern.ch/twiki/bin/view/AtlasComputing/SoftwareTutorialAnalysisInGitReleases There is also more software info available here: https://twiki.cern.ch/twiki/bin/view/AtlasComputing/SoftwareTutorial but it's probably too much/too detailed to be really useful for you.

Now, you need to compile the code:

mkdir build
cd build
cmake ../JetCalibAnalysis -DATLAS_PACKAGE_FILTER_FILE=../JetCalibAnalysis/package_filters/package_filter_AnalysisBase.txt
make -j3
source */setup.sh
What we are doing in the cmake line is to configure the compilation from a file (CMakeLists.txt) in the analysis project (JetCalibAnalysis directory), specifying a file (package_filter_AnalysisBase.txt) that restricts the compilation to just the packages you need to run the calibration. This is because I use a different release with some extra packages to actually generate the ntuples, but with the other packages you will have enough to operate on the ntuples if I make them.

In future, if you set up the release again, you don't need to recompile. Just do (assuming you started in the directory containing JetCalibAnalysis/ and build/):

asetup AnalysisBase,21.2.1
source build/*/setup.sh
Don't forget the second step!

To run, you can now do the following:

mkdir ../run
cd ../run
cp /afs/cern.ch/work/k/khoo/public/SoftKSummer17/jetcalib.ntup.jzxw.root .
ln -s ../JetCalibAnalysis/DeriveJetScales/scripts/*.py .
python deriveJES.py AntiKt4EMPFlow -t AntiKt4EMPFlowJets --inputPattern jetcalib.ntup.jzxw.root -s "EtaJES+Plots.jes"
where of course you can substitute whichever jet collection you want for "AntiKt4EMPFlow", although so far the configuration files e.g. AntiKt4EMTopo.py don't exist for all the SK collections (they are easy to extend). The file listed above only has the standard EMTopo and EMPFlow collections for now, but I should be able to add the rest later today. When this finishes running, you will be able to look at AntiKt4EMPFlow_JES.pdf.

-- TengJianKhoo - 2017-06-19

Topic attachments

  • MET_Recreation_TJeditplusplus.C (r1, 13.5 K, 2017-06-26 12:04, TengJianKhoo) -- An updated version of the MET_Recreation.C file, with some examples of class definitions and more sophisticated looping
  • SoftKNtup_basicReader.C (r2, 7.8 K, 2017-06-21 11:56, TengJianKhoo) -- A simple compiled ROOT macro example that reads a TFile and makes a histogram of the dielectron mass
Topic revision: r8 - 2017-08-10 - TengJianKhoo