Jet substructure performance at high luminosity
BOOST 2012 working group



The purpose of this working group is to study the performance of jet substructure algorithms at very high luminosity, and to investigate the use of new techniques for pile-up subtraction and suppression presented at BOOST 2012.


Below is a list of proposed projects. The goal is to establish the performance of jet substructure techniques for four different luminosity scenarios: mu = 30 (2012 LHC conditions), 60, 100, 200, and for three signal samples: dijets, boosted tops, and boosted W(lnu)H(bb).

Jet substructure performance at high luminosity

  • Jet mass response and resolution
  • Jet substructure observables
  • S/sqrt(B)

Pile-up subtraction plus grooming

  • Study the application of jet-areas pile-up subtraction during grooming
  • Compare jet substructure performance with the use of pile-up subtraction after grooming (CMS approach)
  • Consider all luminosity scenarios separately
  • Figures of merit include: mass vs. number of vertices, mass resolution, S/sqrt(B), etc.


Pile-up subtraction for jet shapes

  • Study the performance of the proposed jet-areas corrections for jet shapes in all four high luminosity scenarios
  • Focus on n-subjettiness and jet width in all four luminosity scenarios


Pile-up suppression using jet substructure

Local pile-up fluctuations within the same event can lead to fake pile-up jets that need to be tagged and rejected. Fake pile-up jets are made of a uniform distribution of particles from multiple interactions, leading to jets with anomalous structure and no high-pT core.

  • Understand the relative contribution of pile-up hard jets vs. combinatorial background (fake) jets from overlapping pile-up particles
  • Study jet substructure techniques to identify 0-core (pile-up) jets using minimum-bias only data (no signal Monte Carlo). Potentially interesting methods include:
    • ACF (Angular Correlation Function)
    • Jet width using R2 weighting
    • groomed pT fraction
    • QJets
    • n-subjettiness beta=1 vs. beta=2


Samples and analysis software

A common set of 10^8 minimum bias events, generated with Pythia8 Tune 4C, is available at the public server

Signal and background samples from previous BOOST conferences may be found here:

Organization of minBias samples

These are single event minimum bias samples (mu = 1). The events are organized as follows:

Number of events per run ...   2,000,000
Number of runs .............          50
Total number of events ..... 100,000,000

Total data size on disk is about 365 GBytes in 5000 files. A run is a single production job with a new random number generator seed. Each run produces 100 files with 20,000 single vertex events each. On the server file system, groups of 10 runs are stored in a directory under the path given above (directory names 00_09...40_49). The average file size is 75 MBytes, meaning about 3.8 kBytes/event.
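
The layout described above can be sketched as a small helper. This is only an illustration: the function name runDirectory, the constants, and the assumption that runs are numbered 0 to 49 are hypothetical and inferred from the quoted directory names 00_09...40_49. The file names inside each directory are not specified here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Numbers taken from the sample description above.
constexpr int kRuns = 50;
constexpr int kFilesPerRun = 100;
constexpr long long kEventsPerFile = 20000;
constexpr int kRunsPerDirectory = 10;

// Hypothetical helper: name of the directory ("00_09", ..., "40_49") that
// holds a given run, ASSUMING runs are numbered 0..49.
std::string runDirectory(int run) {
    assert(run >= 0 && run < kRuns);
    const int first = (run / kRunsPerDirectory) * kRunsPerDirectory;
    const int last = first + kRunsPerDirectory - 1;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02d_%02d", first, last);
    return std::string(buf);
}

// Total number of single vertex events implied by the layout.
constexpr long long totalEvents() {
    return static_cast<long long>(kRuns) * kFilesPerRun * kEventsPerFile;
}
```

Under these assumptions, run 17 would be found in directory 10_19, and the layout gives 50 x 100 x 20,000 = 100,000,000 events in total.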

Data format

The events are stored in tuples using ROOT. The ROOT tree name is MB_Py8. The tuple branches are (_FLOAT_T_ = float to optimize disk space):

   int Nentry;                  // total # entries
   int Npartons;                // # partons
   int Nparticles;              // # particles
   int ID[Nentry];              // PDG Id
   int Stat[Nentry];            // internal status word (-1,-2 for partons, 2 for particles - can be dropped)
   _FLOAT_T_ Charge[Nentry];    // charge
   _FLOAT_T_ Px[Nentry];        // momentum Px
   _FLOAT_T_ Py[Nentry];        // momentum Py
   _FLOAT_T_ Pz[Nentry];        // momentum Pz
   _FLOAT_T_ P0[Nentry];        // energy E
   _FLOAT_T_ Pm[Nentry];        // mass m
   _FLOAT_T_ Pt[Nentry];        // transverse momentum
   _FLOAT_T_ Rap[Nentry];       // rapidity y
   _FLOAT_T_ Phi[Nentry];       // azimuth phi
   _FLOAT_T_ Eta[Nentry];       // pseudo-rapidity eta
For partons, only those with Pythia8 status -21 and -23 are saved, thus preserving minimal information on the scattering process. The first Npartons entries in the ROOT tuple are these partons. The following Nparticles entries are the "stable" particles as defined by Pythia8 (based on the default cτ cut). The total number of entries is then Nentry = Npartons + Nparticles. Further reduction of disk space is possible by omitting derived kinematic quantities and reducing the actual data to something like (Px,Py,Pz,m).
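
If the tuples were reduced to (Px,Py,Pz,m), the derived branches could be recomputed on the fly. The following is a minimal sketch of that calculation, not code from the samples: the struct and function names are hypothetical, and the azimuth convention (-π, π] is an assumption, since the range used for the Phi branch is not stated here.

```cpp
#include <cmath>

// Hypothetical reduced record holding only (Px, Py, Pz, m).
struct Reduced { double px, py, pz, m; };

// Derived quantities corresponding to the branches P0, Pt, Rap, Phi, Eta.
struct Derived { double e, pt, rap, phi, eta; };

// Recomputes the derived kinematics. Valid for pT > 0 only; particles
// exactly along the beam axis have undefined eta and need special handling.
Derived derive(const Reduced& r) {
    Derived d;
    d.pt  = std::hypot(r.px, r.py);
    d.e   = std::sqrt(r.px * r.px + r.py * r.py + r.pz * r.pz + r.m * r.m);
    d.phi = std::atan2(r.py, r.px);                       // ASSUMED range (-pi, pi]
    d.rap = 0.5 * std::log((d.e + r.pz) / (d.e - r.pz));  // rapidity y
    d.eta = std::asinh(r.pz / d.pt);                      // pseudo-rapidity
    return d;
}
```

For a massless particle the rapidity and pseudo-rapidity computed this way coincide, which is a quick sanity check when validating a reduced format against the full tuples.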

ROOT codes for unpacking and analysis

The ROOT tuples on files are in the MB_Py8 tree, with the following leaves matching the data structure described above (code snippet):

fChain->SetBranchAddress("Nentry", &Nentry, &b_Nentry);
fChain->SetBranchAddress("Npartons", &Npartons, &b_Npartons);
fChain->SetBranchAddress("Nparticles", &Nparticles, &b_Nparticles);
fChain->SetBranchAddress("ID", ID, &b_ID);
fChain->SetBranchAddress("Stat", Stat, &b_Stat);
fChain->SetBranchAddress("Charge", Charge, &b_Charge);
fChain->SetBranchAddress("Px", Px, &b_Px);
fChain->SetBranchAddress("Py", Py, &b_Py);
fChain->SetBranchAddress("Pz", Pz, &b_Pz);
fChain->SetBranchAddress("P0", P0, &b_P0);
fChain->SetBranchAddress("Pm", Pm, &b_Pm);
fChain->SetBranchAddress("Pt", Pt, &b_Pt);
fChain->SetBranchAddress("Rap", Rap, &b_Rap);
fChain->SetBranchAddress("Phi", Phi, &b_Phi);
fChain->SetBranchAddress("Eta", Eta, &b_Eta);
The auto-generated code templates useful to analyze the tuples (using MB_Py8->MakeClass()) can be found in MB_Py8.h and MB_Py8.C.

In addition to the 8 TeV samples described above, samples are available for 7 TeV and 13 TeV in the same format, but with lower statistics (an estimated few 10^7 events for each center-of-mass energy). Please contact Peter L. for information on how to access these files.

Building pile-up events on the fly

Pile-up events can be constructed on the fly from the single interaction data described above. For this, an event builder is available which merges μ single vertex events, where μ follows a Poisson distribution around a central value <μ>. The code, together with a simple analysis module, is available in the archive. A recent version of ROOT and Fastjet v.3.X are needed. A brief description of the files in this compressed archive follows here (the following is a bit technical and should be considered in place of the still missing documentation of the code):
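
The Poisson draw at the heart of the event builder can be illustrated with the C++ standard library. This sketch does not use the archive's RandomEngine; the function names and the fixed seed are hypothetical choices made for reproducibility.

```cpp
#include <random>
#include <vector>

// Draws the number of interactions for each of nEvents pile-up events from a
// Poisson distribution with mean muAvg. Uses std::mt19937 with a fixed seed
// (an illustrative choice, not what the archive code does).
std::vector<int> drawVertexCounts(double muAvg, int nEvents, unsigned seed = 12345) {
    std::mt19937 gen(seed);
    std::poisson_distribution<int> poisson(muAvg);
    std::vector<int> counts;
    counts.reserve(nEvents);
    for (int i = 0; i < nEvents; ++i) counts.push_back(poisson(gen));
    return counts;
}

// Number of single vertex events consumed by a set of pile-up events.
long long eventsConsumed(const std::vector<int>& counts) {
    long long total = 0;
    for (int n : counts) total += n;
    return total;
}
```

Since each pile-up event consumes on average <μ> single vertex events, the 10^8 events of the main sample correspond to roughly 5 x 10^5 pile-up events at <μ> = 200.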

Simple main program example: anal01.C, Makefile

This main program instantiates a ROOT::TChain, adds files to this chain as given by an external text file, instantiates the MB_Py8 object and executes its Loop method. Assuming the executable generated from anal01.C is named anal01.exe, the program configuration can be controlled by a few switches:

./anal01.exe --mu=<average number of interactions/pile-up event> --flist=<name of text file containing list of root files to be processed> --nevts=<number of pile-up events to be generated>
Note that the unpacking of the options in anal01.C is rudimentary and requires the command line syntax to be followed precisely, e.g. inserting extra spaces will lead to unrecognized configurations.
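
For illustration, the strict --name=value syntax can be handled along the following lines. This is a sketch and not the code in anal01.C: the Config struct and both function names are hypothetical, and the defaults (mu = 1, nevts = -1) are simply taken from the MB_Py8::Loop signature quoted further down this page.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical holder for the three switches.
struct Config {
    int mu = 1;            // average number of interactions per pile-up event
    long long nevts = -1;  // -1: use all available single vertex events
    std::string flist;     // text file listing the ROOT files
};

// Returns true and fills 'value' if 'arg' has the exact form --<name>=<value>.
bool parseOption(const std::string& arg, const std::string& name, std::string& value) {
    const std::string prefix = "--" + name + "=";
    if (arg.compare(0, prefix.size(), prefix) != 0) return false;
    value = arg.substr(prefix.size());
    return !value.empty();
}

// Scans argv for the three switches; returns false on the first argument that
// does not match, e.g. "--mu 60" split by a space.
bool parseArgs(int argc, const char* const* argv, Config& cfg) {
    for (int i = 1; i < argc; ++i) {
        const std::string arg(argv[i]);
        std::string v;
        if (parseOption(arg, "mu", v)) cfg.mu = std::atoi(v.c_str());
        else if (parseOption(arg, "flist", v)) cfg.flist = v;
        else if (parseOption(arg, "nevts", v)) cfg.nevts = std::atoll(v.c_str());
        else return false;
    }
    return true;
}
```

Rejecting unknown arguments outright, rather than silently ignoring them, makes a mistyped switch visible before a long job starts.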

Warning: The Makefile in the archive is extremely simple and may need to be modified according to your environment!

Raw data mode and event loop: MB_Py8.h, MB_Py8.C

The raw data has the same tuple structure as discussed in the ROOT codes for unpacking and analysis section, but the code is changed with respect to what is in the MB_Py8.h and MB_Py8.C files. While the same class name MB_Py8 is used, the class has been modified/extended with respect to the ones in these files, which are generated by MB_Py8::MakeClass(). The major modifications are:

  • new interface for constructor (now expects a TChain pointer instead of a TTree pointer);
  • corresponding interface change for MB_Py8::Init method;
  • new interface Stop MB_Py8::Loop(int mu=1,Long64_t nevts=-1,const std::string& outName="") to accommodate <μ>, the maximum number of pile-up events to be generated (-1 means all single vertex events in the files loaded into the TChain are used), and the name for an output file;
  • methods added (note that ROOT::Long64_t is long long):
    • TIP MB_Py8::book() books a few default histograms ;
    • TIP MB_Py8::write(const std::string& outName="") writes out all booked histograms;
    • Stop MB_Py8::writeHist(H* hist) writes one histogram of type H (template parameter) if its number of entries is greater than 0;
    • TIP MB_Py8::analyze(Event& event) contains the event analysis implementation;
    • Stop MB_Py8::ticker(Long64_t nVtx,Long64_t nfreq,Long64_t nevts) produces an output message for an event with nVtx vertices for every nfreq pile-up events, from a total of nevts requested events;
  • new class TIP EtaRange implements a selector only accepting particles within a given ηmin < η < ηmax;

TIP indicates methods or classes which could (and should) be modified for specific user needs, while Stop indicates general methods which do not need modifications, except for the addition of a signal event! The MB_Py8::Loop method contains the principal event loop and invokes MB_Py8::analyze for each complete pile-up event.

Warning: Please check anal01.C and MB_Py8::Loop (in MB_Py8.C) for examples on how to use the code!

Event filling, container and data model: Event.h, Event.icc, Event.cxx

The principal container for storing one pile-up event is Event. It is filled for nVtx collisions by the

bool DataHandler<T>::fillEvent(T& dataSource,Event& event,long long& nPtr,int nVtx)
function (in Event.h), which is a template for a fill function accepting any object (data source T, here T=MB_Py8) containing the most important variables and arrays described above. The normal return value of this function is true. It will return false if there are not enough single interaction events left in the input stream.
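
The fill contract (merge nVtx consecutive single vertex events, return false when the input runs out) can be mimicked with a toy example. All types and names below are hypothetical stand-ins and do not reproduce DataHandler or Event. In particular, leaving nPtr untouched on failure is a choice made for this sketch and may differ from the real implementation.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins: a particle is reduced to its pT plus the index of the vertex
// it came from; a single vertex event is just a list of pT values.
struct ToyParticle { double pt; int vertex; };
struct ToyEvent { std::vector<ToyParticle> particles; int nVertices = 0; };
using ToySource = std::vector<std::vector<double>>;

// Merges nVtx consecutive single vertex events, starting at position nPtr,
// into 'event' and advances nPtr. Returns false (nPtr unchanged) if fewer
// than nVtx single vertex events remain in the source.
bool fillToyEvent(const ToySource& source, ToyEvent& event, long long& nPtr, int nVtx) {
    if (nVtx < 0 || nPtr < 0) return false;
    if (nPtr + nVtx > static_cast<long long>(source.size())) return false;
    event = ToyEvent{};
    for (int v = 0; v < nVtx; ++v) {
        for (double pt : source[static_cast<std::size_t>(nPtr + v)]) {
            event.particles.push_back({pt, v});
        }
    }
    event.nVertices = nVtx;
    nPtr += nVtx;
    return true;
}
```

A typical event loop would call such a function repeatedly with a freshly drawn nVtx and stop at the first false return.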

The Event container itself returns several important event features, including vertices (described in the Vertex class in Event.h) and a list of particles from all or selected vertices of the pile-up event. Please check Event.h for available methods (improved doxygen based documentation is in development).

Each particle in Event is described by a fastjet::PseudoJet, with a ParticleInfo object (see ParticleInfo class in Event.h), which contains the PDG code, the charge, and the vertex for the particle. Fastjet version 3.X is required for this to work!

Event.h also contains some useful structs with static functions (see code for features):

  • struct Converters: conversion of the data in MB_Py8 to fastjet::PseudoJet (with attached ParticleInfo) and std::vector<fastjet::PseudoJet> (again with ParticleInfo attached to each PseudoJet);
  • struct Features: collection of static functions checking if a particle represented by a PseudoJet and attached ParticleInfo is charged, and counting the number of charged and neutral particles in a std::vector<fastjet::PseudoJet> collection;
  • struct Utils:
    • static functions for deep copying of an individual PseudoJet, or of the PseudoJets in a collection (explicitly copies the attached ParticleInfo);
    • static functions for setting the order context (rule for ordering) for vertices represented by Vertex, and for performing vertex ordering according to this context;
  • template struct DataHandler contains the fill function fillEvent (see above) and the getActVtx(int mu) function, which returns the actual number of vertices for a pile-up event from a Poisson distribution with mean mu;

Helpers and virtual interfaces: RandomEngine.h, RandomEngine.cxx, IFinalStateSelector

Random::RandomEngine is a wrapper generating random number sequences following various distributions. It is based on ROOT::TRandom1 and can generate, besides the usual uniform, Poisson, and Gaussian, Landau (precise) and Landau (approximated by Moyal function) distributed random numbers (some of these may be slow to generate). RandomEngine follows a singleton pattern, and uses a random seed determined from the UNIX time since epoch (as returned by the time(0) system function).

Warning: RandomEngine may have to be compiled with -D_HAVE_NO_BOOST_ if the boost/math/special_functions/round.hpp is not available (meaning an older or missing boost). Note that the Makefile in the archive assumes a "modern" boost!

The pure abstract interface IFinalStateSelector allows implementations of selectors accepting or rejecting fastjet::PseudoJet typed objects in an event. The selection is performed whenever a PseudoJet is added to the event. All selectors implementing IFinalStateSelector can be handed to the Event object at construction. Example: EtaRange in MB_Py8.h.
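
The selector pattern can be sketched without FastJet as follows. Everything here is a hypothetical stand-in: ToyJet replaces fastjet::PseudoJet, and the method name accept is an assumption, so the actual IFinalStateSelector signature in the archive may well differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for fastjet::PseudoJet with only what is needed here.
struct ToyJet {
    double px, py, pz;
    double eta() const { return std::asinh(pz / std::hypot(px, py)); }  // needs pT > 0
};

// Sketch of a pure abstract selector interface.
class IToySelector {
public:
    virtual ~IToySelector() = default;
    virtual bool accept(const ToyJet& jet) const = 0;
};

// Accepts only particles with etaMin < eta < etaMax.
class ToyEtaRange : public IToySelector {
public:
    ToyEtaRange(double etaMin, double etaMax) : m_min(etaMin), m_max(etaMax) {}
    bool accept(const ToyJet& jet) const override {
        const double eta = jet.eta();
        return eta > m_min && eta < m_max;
    }
private:
    double m_min, m_max;
};

// Toy container applying the selector whenever a particle is added.
class ToyEventStore {
public:
    explicit ToyEventStore(const IToySelector* selector = nullptr) : m_selector(selector) {}
    bool add(const ToyJet& jet) {
        if (m_selector && !m_selector->accept(jet)) return false;
        m_jets.push_back(jet);
        return true;
    }
    std::size_t size() const { return m_jets.size(); }
private:
    const IToySelector* m_selector;  // not owned; may be null (accept all)
    std::vector<ToyJet> m_jets;
};
```

Applying the selection at insertion time keeps rejected particles out of the container entirely, so later loops over the event need no further acceptance checks.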

Additional production: pile-up on a grid

In addition to the full spectrum of stable particles available in the datasets described above, single interaction events have been collected into complete pile-up events for μ = 30, 60, 100 and 200 minbias events overlaid. The number of individual collisions in each of these pile-up events is Poisson distributed. The overlaid particles are projected onto grids of Δη × Δφ = 0.1 × 0.1, with -5.0 < η < 5.0 and -π < φ < π. There are two grids for each pile-up event, one filled from all stable particles and one filled only from charged particles. Presently there is no further selection of the particles entered into the grid.

The data structure of the ROOT tree MBGridEvent is as follows. The common event information is stored in

Int_t    NEntries;              // entries in arrays below (6400)
Int_t    NPartTotal;            // total number of all particles in event
Int_t    NChargedPartTotal;     // total  number of charged particles in event
Int_t    NInteractions;         // number of pp interactions in this event
The kinematic information in each grid bin is the momentum p = sqrt((Σpx)^2 + (Σpy)^2 + (Σpz)^2), the transverse momentum pT = sqrt((Σpx)^2 + (Σpy)^2), the mass m = sqrt((ΣE)^2 - p^2), the "kinematic" η = log((ΣE + Σpz)/(ΣE - Σpz)), and the "kinematic" φ = cos^-1((Σpx)/pT). Here Σ indicates the sum over all particles projected into a given grid bin.

Note that the central (η_i, φ_i) of any grid bin i are not stored in the data structure, but can be calculated from the bin index (array index) i = 0, ..., 6399 itself, as implemented in the example MBGridEvent.h and MBGridEvent.C.
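
As an illustration of such a calculation, the sketch below maps a bin index to a bin center. It rests on assumptions that must be checked against MBGridEvent.h and the GridGeometry tree: that the 6400 bins are 100 bins in η times 64 bins in φ (so Δφ = 2π/64, about 0.098), and that the index runs with φ fastest, i = iEta * 64 + iPhi. The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct GridCenter { double eta, phi; };

// ASSUMED geometry: 100 eta bins in (-5, 5) times 64 phi bins in (-pi, pi).
constexpr int kNEta = 100;
constexpr int kNPhi = 64;
constexpr double kEtaMin = -5.0;
constexpr double kEtaMax = 5.0;
const double kPi = std::acos(-1.0);

// Center of grid bin i, ASSUMING the ordering i = iEta * kNPhi + iPhi.
GridCenter binCenter(int i) {
    assert(i >= 0 && i < kNEta * kNPhi);
    const int iEta = i / kNPhi;
    const int iPhi = i % kNPhi;
    const double dEta = (kEtaMax - kEtaMin) / kNEta;
    const double dPhi = 2.0 * kPi / kNPhi;
    return { kEtaMin + (iEta + 0.5) * dEta, -kPi + (iPhi + 0.5) * dPhi };
}
```

If the actual ordering has η running fastest, only the two index lines need to be swapped accordingly.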

The data structure in the MBGridEvent tree is (for all and charged particles):

Int_t    NParticles[6400];         // number of particles in grid bin
Float_t  P[6400];                  // momentum
Float_t  Pt[6400];                 // transverse momentum
Float_t  M[6400];                  // mass
Float_t  EtaKine[6400];            // eta from particle kinematics
Float_t  PhiKine[6400];            // phi from particle kinematics
Int_t    NChargedParticles[6400];  // number of charged particles in bin
Float_t  ChargedP[6400];           // momentum of charged particles
Float_t  ChargedPt[6400];          // transverse momentum of charged particles
Float_t  ChargedM[6400];           // mass of charged particles
Float_t  ChargedEtaKine[6400];     // eta from charged particle kinematics
Float_t  ChargedPhiKine[6400];     // phi from charged particle kinematics

The overall grid description is stored in a separate tree GridGeometry, which is used to reconstruct the grid center η and φ. Details on this data structure, which is stored only once per file, can be found in MBGridEvent.h, together with some code performing the calculations. Also check MBGridEvent.C for an example on how to use this implementation. The branches of the MBGridEvent tree are:

fChain->SetBranchAddress("NEntries", &NEntries, &b_NEntries);
fChain->SetBranchAddress("NPartTotal", &NPartTotal, &b_NPartTotal);
fChain->SetBranchAddress("NChargedPartTotal", &NChargedPartTotal, &b_NChargedPartTotal);
fChain->SetBranchAddress("NInteractions", &NInteractions, &b_NInteractions);
fChain->SetBranchAddress("NParticles", NParticles, &b_NParticles);
fChain->SetBranchAddress("P", P, &b_P);
fChain->SetBranchAddress("Pt", Pt, &b_Pt);
fChain->SetBranchAddress("M", M, &b_M);
fChain->SetBranchAddress("EtaKine", EtaKine, &b_EtaKine);
fChain->SetBranchAddress("PhiKine", PhiKine, &b_PhiKine);
fChain->SetBranchAddress("NChargedParticles", NChargedParticles, &b_NChargedParticles);
fChain->SetBranchAddress("ChargedP", ChargedP, &b_ChargedP);
fChain->SetBranchAddress("ChargedPt", ChargedPt, &b_ChargedPt);
fChain->SetBranchAddress("ChargedM", ChargedM, &b_ChargedM);
fChain->SetBranchAddress("ChargedEtaKine", ChargedEtaKine, &b_ChargedEtaKine);
fChain->SetBranchAddress("ChargedPhiKine", ChargedPhiKine, &b_ChargedPhiKine);

Grid events are presently available for 7 and 13 TeV center-of-mass energies. Please contact Peter L. on how to access the files.

Analysis software

SpartyJet setup

SpartyJet can be retrieved from the SpartyJet HEPForge site.

Full instructions for your initial SpartyJet setup can be accessed here:

tar xf spartyjet-4.0.2.tar
cd spartyjet-4.0.2/
sed -i -e 's/if (m_tree) delete m_tree;/\/\/if (m_tree) delete m_tree;/g' IO/
make fastjet
cd examples_py/

Note: If you are using a Mac, you need to set: export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:$SPARTYJETDIR/lib
On some versions of Linux (e.g., slc5_amd64_gcc462), you need to set: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPARTYJETDIR/lib

Analysis setup

A testing macro is provided in the Attachments on this TWiki page. To use this macro:

  1. Download both a pileup sample file and a signal sample file
cd data/
gunzip herwig65-lhc7-ttbar2hadrons-pt0500-0600.UW.gz
  2. Download the analysis macro from this TWiki page
  3. Set the primary options inside the macro to define the output files and paths
    • The current defaults should be appropriate for most people but please check
  4. Run the analysis macro


Please upload your contributions and results to the following live indico page

-- ArielSchwartzman - 03-Aug-2012 -- PeterL - 14-Aug-2012 -- PeterL - 24-Jan-2013 -- PeterL - 28-Feb-2013

Topic revision: r10 - 2013-06-05 - PeterL