Hands-On Advanced Tutorial Session -- Jet Substructure (Summer 2014)

Introduction

This tutorial will give an introduction to jets in CMS, jet clustering algorithms, and jet substructure algorithms to be used in analyses needing to identify boosted top or W/Z/H bosons. We will use a special custom code framework developed specifically for these studies, using the fastjet plugin to allow easy access to clustering algorithms and substructure tools (including those still under development). We will spend the first day covering the basics of substructure, while the second day of the tutorial will be more related to studying open questions in the JME group involving substructure algorithms and pileup mitigation.

Software Framework Setup

The following instructions are intended for the CMSLPC SL5 nodes.

The first step is to set up a CMSSW release:

setenv SCRAM_ARCH slc5_amd64_gcc472
source /uscmst1/prod/sw/cms/cshrc prod
cmsrel CMSSW_6_2_8
cd CMSSW_6_2_8/src/
cmsenv

git clone https://github.com/violatingcp/BaconAna.git -b CMSSW_6_2_X
git clone https://github.com/nhanvtran/SubstrAna.git
git clone https://github.com/violatingcp/Dummy.git

cp ~pilot/fastjet_setup_HATS.xml $CMSSW_BASE/config/toolbox/$SCRAM_ARCH/tools/selected/fastjet.xml
scram setup fastjet
scram b -j 8

cd SubstrAna/Summer14/bin
runHats 0.8 ca 0 10 1

To set up the software on LXPlus or the Wisconsin T2 (note: the recommended method is to use the cmslpc nodes, which run faster):

setenv SCRAM_ARCH slc5_amd64_gcc472
source /uscmst1/prod/sw/cms/cshrc prod
cmsrel CMSSW_6_2_8
cd CMSSW_6_2_8/src/
cmsenv

git clone https://github.com/violatingcp/BaconAna.git -b CMSSW_6_2_X
git clone https://github.com/nhanvtran/SubstrAna.git
git clone https://github.com/violatingcp/Dummy.git

cd SubstrAna

# Modify fastjet.xml.modified to point to your local fastjet installation
# for LXPLUS: /afs/cern.ch/work/i/ishvetso/CMSSW_6_2_8/src/SubstrAna/fastjet
# for hep.wisc.edu: /afs/hep.wisc.edu/swanson/fastjet
# for other sites you will have to install fastjet following the instructions in SubstrAna/README [A]

cp fastjet.xml.modified $CMSSW_BASE/config/toolbox/$SCRAM_ARCH/tools/selected/fastjet.xml

scram setup fastjet
cd ..
scram b -j 8

cd SubstrAna/Summer14/bin
runHats 0.8 ca 0 10 1
# If you have trouble accessing the files you may have to change the file locations in the file lists as follows:
#  //cmseos:1094//eos/uscms/store/user  --> //xrootd.unl.edu//store/user

The runHats command generates trees containing jet observables (various substructure algorithm quantities). It takes five arguments. The first is the cone size and the second is the jet clustering algorithm (ak = anti-kT, ca = Cambridge-Aachen, kt = kT). The third and fourth are the first event number to process and the number of events to process, which allows you to split a dataset into sections. The last argument specifies which sample to use: 0=QCD, 1=ttbar, 2=WW. For example, runHats 0.8 ca 0 10 1 clusters Cambridge-Aachen jets with a cone size of 0.8 and processes 10 events of the ttbar sample, starting at event 0. For this tutorial, we provide various samples that have already been produced to save processing time.

Sample Locations

To use the runHats command you can create a fileList. The PUPPI/BACON N-tuples are stored in the following locations:

/eos/uscms/store/user/jdolen/PUPPI/bacon/RSGluonToTT_M-3000_Tune4C_13TeV-pythia8_Fall13dr-tsg_PU40bx50_POSTLS162_V2-v1_BACON/
/eos/uscms/store/user/ntran/PUPPI/bacon/rsgww1000_62x_PU40BX50
/eos/uscms/store/user/ntran/PUPPI/bacon/qcd10001400_62x_PU40BX50

We have already created the N-tuples to save processing time. The combined output files can be found in the following locations:

QCD Samples

SubstrAna/Summer14/bin/samples/QCD_AK4_combined.root
SubstrAna/Summer14/bin/samples/QCD_AK8_combined.root
SubstrAna/Summer14/bin/samples/QCD_CA8_combined.root
SubstrAna/Summer14/bin/samples/QCD_CA15_combined.root

RSGluon -> ttbar Samples

SubstrAna/Summer14/bin/samples/ttbar_AK4_combined.root
SubstrAna/Summer14/bin/samples/ttbar_AK8_combined.root
SubstrAna/Summer14/bin/samples/ttbar_CA8_combined.root
SubstrAna/Summer14/bin/samples/ttbar_CA15_combined.root

RSGraviton -> WW Samples

SubstrAna/Summer14/bin/samples/WW_AK4_combined.root
SubstrAna/Summer14/bin/samples/WW_AK8_combined.root
SubstrAna/Summer14/bin/samples/WW_CA8_combined.root
SubstrAna/Summer14/bin/samples/WW_CA15_combined.root

Plotting Scripts

We have produced a few simple plotting scripts to help you quickly make plots of the substructure quantities and compare different clustering algorithms and cone sizes, etc.

compareQuantities.C -- A simple script to compare 1-D distributions from two different files.

Usage:

compareQuantities(string filename1, string filename2, string var1, string var2, int nbins, float min, float max, string cut = "")

Arguments:

filename1 -- the name of the first file to draw a quantity from
filename2 -- the name of the second file to draw a quantity from (if filename2 is "", the first file is used for both quantities)
var1 -- the first quantity to draw, from filename1
var2 -- the second quantity to draw, from filename2 (or from filename1 if filename2 is "")
nbins -- the number of bins in the X-axis of the histogram
min -- the minimum value for the X-axis of the histogram
max -- the maximum value for the X-axis of the histogram
cut -- a selection string (in TTree language) that can be applied to both files

Example:

compareQuantities("out_QCD_ca_R8_0_999.root", "", "PFjetMass", "CHSjetMass", 50, 0, 1000, "PFjetPt > 400")

plot2DQuantities.C -- A script to make a two-dimensional correlation plot.

Usage:

compare2DQuantities(string filename1, string var1, string var2, int nbins, float min, float max, int nbinsY, float minY, float maxY, string cut)

Arguments:

filename1 -- the name of the file to draw the quantities from
var1 -- the first quantity to draw, on the Y-axis
var2 -- the second quantity to draw, on the X-axis
nbins -- the number of bins in the X-axis of the histogram
min -- the minimum value for the X-axis of the histogram
max -- the maximum value for the X-axis of the histogram
nbinsY -- the number of bins in the Y-axis of the histogram
minY -- the minimum value for the Y-axis of the histogram
maxY -- the maximum value for the Y-axis of the histogram
cut -- a selection string (in TTree language) that can be applied to the events

Example:

compare2DQuantities("out_QCD_ca_R8_0_999.root", "GENjetMass", "GENjetPt", 50, 0, 1000, 50, 0, 1000, "GENjetPt > 400")

compareEfficiencies.C -- A script to plot efficiency curves with different tagging requirements.

Usage:

compareEfficiencies(string filename1, string filename2, string variable, int nbins, float min, float max, string DenomCut = "", string NumCut = "")

Arguments:

filename1 -- the name of the first file to plot efficiencies for
filename2 -- the name of the second file to plot efficiencies for (if filename2 is "", the first file is used for both)
variable -- the quantity against which the efficiency is plotted (X-axis quantity; the Y-axis is the efficiency)
nbins -- the number of bins in the X-axis of the histogram
min -- the minimum value for the X-axis of the histogram
max -- the maximum value for the X-axis of the histogram
DenomCut -- a selection string (in TTree language) that is applied as a baseline pre-selection
NumCut -- a selection string (in TTree language) that defines the tagging selection. The efficiency plotted will be for this selection relative to DenomCut.

Example:

compareEfficiencies("out_0615_RS3000T_ca_R8_0_999.root", "out_0615_QCD_ca_R8_0_999.root", "PFjetPt", 20, 300, 800, "", "PFjetMass > 150 && PFjetMass < 240")

Tutorial Session I: Clustering Algorithms and Jet Grooming

Software Setup

Please follow the above instructions to get set up with the code for the tutorial from the git repository and compile.

To make sure things are up and running, try the following:

runHats 0.8 ca 0 10 1

We will first examine different jet clustering algorithms, including anti-kT, Cambridge-Aachen, and kT. We have provided you with already made N-tuples, found in the locations described above. You can make custom N-tuples with various algorithms and cone sizes with the runHats command.
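
The three algorithms differ only in the exponent p of the standard generalised-kT distance measure, d_ij = min(pTi^(2p), pTj^(2p)) * dR_ij^2 / R^2 with beam distance d_iB = pTi^(2p), where p = 1 gives kT, p = 0 gives Cambridge-Aachen, and p = -1 gives anti-kT. The sketch below only evaluates these distances for a pair of particles; it is not how fastjet implements the clustering, and the Particle struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Minimal particle description: transverse momentum, rapidity, azimuth.
struct Particle {
    double pt;
    double rapidity;
    double phi;
};

// Azimuthal difference wrapped into [-pi, pi].
double deltaPhi(double phi1, double phi2) {
    const double twoPi = 2.0 * std::acos(-1.0);
    return std::remainder(phi1 - phi2, twoPi);
}

// Squared angular distance in the rapidity-azimuth plane.
double deltaR2(const Particle& a, const Particle& b) {
    const double dy = a.rapidity - b.rapidity;
    const double dphi = deltaPhi(a.phi, b.phi);
    return dy * dy + dphi * dphi;
}

// Pairwise distance d_ij = min(pTi^(2p), pTj^(2p)) * dR_ij^2 / R^2.
// p = 1: kT, p = 0: Cambridge-Aachen, p = -1: anti-kT.
double pairDistance(const Particle& a, const Particle& b, int p, double R) {
    const double wa = std::pow(a.pt, 2 * p);
    const double wb = std::pow(b.pt, 2 * p);
    return std::min(wa, wb) * deltaR2(a, b) / (R * R);
}

// Beam distance d_iB = pTi^(2p).
double beamDistance(const Particle& a, int p) {
    return std::pow(a.pt, 2 * p);
}
```

At each clustering step the smallest of all d_ij and d_iB is selected: if it is a d_ij the two particles are merged, and if it is a d_iB particle i is declared a jet. With p = -1 the weights are largest for soft particles, so the smallest pair distances involve hard particles, and soft particles are merged into nearby hard ones first.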

For QCD jets, plot the jet mass, jet area, and number of jet constituents. You should start with the CHS collection in the N-tuples, as this is the default for CMS.

  • How does the jet mass distribution look different for signal (RSG->ttbar) compared to QCD?
  • How much do the PF reconstruction and CHS application change the jet mass distribution, compared to GEN level particles?
  • What happens when you change the cone size of the jet?

For these questions you should use the compareQuantities.C script.

  • How do the areas of CA8 jets and AK4 jets compare? What about the mass? Why is one bigger?
  • What is the difference in the jet mass distributions between signal and background? What features do you see?

Next, we will examine how the jet mass depends on jet momentum (pT). For this exercise you should use the plot2DQuantities.C script.

  • What is the expected relationship between pT and mass?
  • How does QCD compare to ttbar or WW? What thresholds do you see in the 2D plots?

Jet Grooming Techniques

Next we will examine some jet grooming techniques, which aim to remove soft and large-angle radiation from the jet.

The quantities have the prefix 'Trimmed', 'Filtered', 'Pruned' in the N-tuples, for each of the grooming algorithms.
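
To make the idea of removing soft radiation concrete, here is a simplified sketch of the final step of trimming: subjets carrying less than a fraction fcut of the ungroomed jet pT are dropped and the remaining four-vectors are summed. It assumes the subjets have already been found by reclustering the jet constituents with a small radius, which is not shown. The names FourVector and trimJet are hypothetical, and the sketch is not the configuration used in runHats.cpp.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct FourVector {
    double px, py, pz, e;
};

FourVector operator+(const FourVector& a, const FourVector& b) {
    return {a.px + b.px, a.py + b.py, a.pz + b.pz, a.e + b.e};
}

double pt(const FourVector& v) { return std::hypot(v.px, v.py); }

// Invariant mass; negative m^2 from rounding is clamped to zero.
double mass(const FourVector& v) {
    const double m2 = v.e * v.e - v.px * v.px - v.py * v.py - v.pz * v.pz;
    return std::sqrt(std::max(0.0, m2));
}

// Keep only the subjets that carry at least fcut of the ungroomed jet pT
// and return their four-vector sum (the trimmed jet).
FourVector trimJet(const std::vector<FourVector>& subjets, double fcut) {
    FourVector jet{0.0, 0.0, 0.0, 0.0};
    for (const FourVector& s : subjets) jet = jet + s;
    const double threshold = fcut * pt(jet);
    FourVector trimmed{0.0, 0.0, 0.0, 0.0};
    for (const FourVector& s : subjets)
        if (pt(s) >= threshold) trimmed = trimmed + s;
    return trimmed;
}
```

The value of fcut (for example 0.03) is an illustrative choice; the grooming parameters actually used for the N-tuples are set in runHats.cpp.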

For QCD and signal events, plot the jet mass and jet area for the different grooming algorithms.

  • How do you expect these quantities to change?
  • What happens to the groomed jet mass for signal and background?
  • Which grooming algorithm is most aggressive? Which is best for improving jet resolution?

Examine the 2-D correlations between jet mass and pT, for the groomed scenarios.

  • What happens when you plot the groomed jet mass against the number of pileup vertices, compared with the ungroomed mass?
  • Which grooming algorithm is best for pileup mitigation?

Can you do better? If time permits, try to modify runHats.cpp to change the grooming parameters and produce a new sample to compare against. Can you improve the pileup mitigation or jet resolution?

Tutorial Session II: Object (W/Z/H/Top) Tagging Algorithms

We have seen that jet mass can be an effective discriminant between signal and background events. In this section we explore some additional specialized algorithms for W and Top tagging.

W-Tagging:

Plot the basic quantities used for W-tagging, for the signal (RS->WW) and background, including mass drop, N-subjettiness, and Qjet volatility.
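
As a reminder of what N-subjettiness measures, the sketch below evaluates the usual definition tau_N = sum_k pT_k * min_j dR(axis_j, k) / (R0 * sum_k pT_k) for a set of jet constituents and N candidate subjet axes. It assumes the axes are supplied by the caller; finding them (for example by exclusive kT clustering) is not shown. The struct and function names are hypothetical and are not the ones used in the N-tuple code.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Constituent {
    double pt, rapidity, phi;
};

struct Axis {
    double rapidity, phi;
};

// Angular distance between a constituent and a subjet axis.
double deltaR(const Constituent& c, const Axis& a) {
    const double twoPi = 2.0 * std::acos(-1.0);
    const double dy = c.rapidity - a.rapidity;
    const double dphi = std::remainder(c.phi - a.phi, twoPi);
    return std::sqrt(dy * dy + dphi * dphi);
}

// tau_N for N = axes.size(); R0 is the jet cone size.
// Returns -1 if the inputs do not allow tau_N to be computed.
double nSubjettiness(const std::vector<Constituent>& parts,
                     const std::vector<Axis>& axes, double R0) {
    if (axes.empty() || R0 <= 0.0) return -1.0;
    double weighted = 0.0, sumPt = 0.0;
    for (const Constituent& c : parts) {
        double dmin = std::numeric_limits<double>::infinity();
        for (const Axis& a : axes) dmin = std::min(dmin, deltaR(c, a));
        weighted += c.pt * dmin;  // pT-weighted distance to the nearest axis
        sumPt += c.pt;
    }
    if (sumPt <= 0.0) return -1.0;
    return weighted / (sumPt * R0);
}
```

A value of tau_2 that is small relative to tau_1 indicates that the jet energy is better described by two axes than by one.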

  • Where do the distributions peak for the signal? Why do they peak in these areas?
  • Is the Qjet volatility larger for signal or background? Why?

Add a mass cut and see what happens to the other substructure distributions, comparing them between QCD and WW events.

  • Why is discrimination power lost when adding this mass cut?
  • Which variables are the most powerful in identifying boosted W jets?

Finally, make the efficiency curves as a function of jet pT and number of pileup vertices.

  • Can you explain the shape of this curve?

Top Tagging:

There are also many tools for identifying boosted top quarks decaying hadronically, resulting in a single jet. We will explore the CMS and HEP top taggers, as well as the N-subjettiness variables.

The CMS top tagger has been designed for a cone size of R = 0.8, while the HEP Top Tagger has been designed for R = 1.5. Feel free to explore other cone sizes, however!

For the CMS top tagger, plot the quantities used in the tagger, including jet mass, minimum pairwise mass, and number of subjets. Do so for signal (RS->ttbar) and QCD events. Also examine N-subjettiness and Qjets volatility.
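
The minimum pairwise mass can be sketched as the smallest invariant mass among the pairs formed from three subjets. The sketch below assumes that the three highest-pT subjets are used and that the subjets are already available as four-vectors; both the convention and the names FourVector, pairMass, and minPairwiseMass are assumptions for illustration, not taken from the tagger code.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct FourVector {
    double px, py, pz, e;
};

double pt(const FourVector& v) { return std::hypot(v.px, v.py); }

// Invariant mass of the sum of two four-vectors.
double pairMass(const FourVector& a, const FourVector& b) {
    const double e = a.e + b.e;
    const double px = a.px + b.px;
    const double py = a.py + b.py;
    const double pz = a.pz + b.pz;
    return std::sqrt(std::max(0.0, e * e - px * px - py * py - pz * pz));
}

// Smallest pairwise mass among the three highest-pT subjets.
// Returns -1 if the jet has fewer than three subjets.
double minPairwiseMass(std::vector<FourVector> subjets) {
    if (subjets.size() < 3) return -1.0;
    // Move the three leading subjets (by pT) to the front.
    std::partial_sort(subjets.begin(), subjets.begin() + 3, subjets.end(),
                      [](const FourVector& a, const FourVector& b) {
                          return pt(a) > pt(b);
                      });
    return std::min({pairMass(subjets[0], subjets[1]),
                     pairMass(subjets[0], subjets[2]),
                     pairMass(subjets[1], subjets[2])});
}
```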

  • What happens as a function of pileup?
  • What happens for different jet cone sizes?
  • How is the number of subjets different for signal and background?

Next, examine the tagging requirements for the HEP Top Tagger and CMS Top Tagger. Make efficiency curves for the two different algorithms, as a function of pT and number of primary vertices.

  • How are the algorithms optimized for different pT regions? Why does the efficiency for the CMS top tagger fall off below 400 GeV?
  • What do you expect would happen at extremely high values of pT?
  • What happens if you include a grooming algorithm before running the top taggers?

- JustinPilot - 13 Jun 2014
Topic revision: r10 - 2014-06-19 - JoshuaSwanson