This package was designed to do H -> 4-lepton analysis and performance studies using AthenaROOTAccess (releases 13+) and/or PyAthena (release 14+).


You should have a working area set with AthenaROOTAccess if you intend to use it. In order to have PyH4l in your InstallArea, create the package:

cd $TestArea
mkdir PhysicsAnalysis/
cd PhysicsAnalysis/

Set the artemis CVS and get PyH4l (replace $USER by another name if you don't use the same user in both systems):

export CVSROOT="$"
export CVS_RSH="ssh"
cvs co PyH4l

Install PyH4l (to have it under your InstallArea and to compile the C++ utilities):

WARNING: In 14.4.0, egammaEvent has moved to Reconstruction/egamma, so if you are using an older release you have to change the requirements file

cd PyH4l/cmt
cmt config

If you want to test your installation, launch python and follow PyH4l/share/, replacing AOD.pool.root by the full path of your AOD file. An example for running in athena is given by PyH4l/share/

Running the code

Go to the share/ directory. To run the CSC analysis (in ARA), type python -h. You should also look at and to run in athena.

Rules for running

Using AthenaROOTAccess:

  • The first thing to do is create the ARAtree, which MUST be done using the ARAtree function from (only because the chain is attached to the tree)
  • Then instantiate the PyH4l object (say H) passing the tree, the name of the output file and the mode (4mu, 4e or 2e2mu).
  • Now define cuts, functions and trees using H.AddCut, H.AddFunction and H.AddTree
  • You should ALWAYS call H.initialize() before analysing events. It is also possible to define the events to analyse.
  • You can either call to analyse all the events or H.AnalyseEvent(ievt)
  • If you haven't called but used H.AnalyseEvent(ievt), call H.finalize() to get the efficiency table and write the output

Using PyAthena:

ONLY works in release 14+

  • Create a PyH4lAlg instance, passing the output file, the mode and the name of the algorithm ('PyH4lAlg' if not supplied).
  • Add cuts, functions and trees just like in ARA.
  • Register the algorithm to the main sequence, see share/

Defining the events to analyse

(only for ARA)

You can analyse only a subset of the tree, defined by a python list or a ROOT TEventList. This is extremely useful for example to load only the "real 4mu events", defined in a previous run. Before calling initialize, you can say something like H.events_to_analyse = [0,2,5,7] or H.events_to_analyse = range(100) to use only the first 100 events. See for how to load and use an eventlist.

Adding cuts

Cuts are added by calling H.AddCut(cut_class, arguments), where cut_class is either the name of the class (more common) or the class itself. The arguments are defined according to the cut. For example:

H.AddCut('TriggerCut', trig = 'EF_mu20')
H.AddCut('Preselection', pt_min = 10e3, nmin = 4)

TriggerCut and Preselection are the names of the cut classes, defined in, while trig and pt_min are arguments pertinent to those classes only. Cry C++.

Adding functions

Functions are added exactly like cuts, only now the classes should be derived from PyH4lFunction. The functions currently implemented are Matching, GetSumPt, VBF_info, JetPairInfo, MSInfo, ScaleElectronEnergy, RemoveElectronDuplicates, RemoveJetsCloseToLeptons, KeepMinDeltaR, RemoveFakes, GetTrueEnergyLoss, CopyGeneratedToReconstructed, JetSelection and GetCellsAroundMuon. By default, a function is called before any cut, unless after_cuts is set. This is a list of the (0-based) indices of the cuts after which the function should be called (ex: after_cuts = [0,2] means the function will be called after the 1st and the 3rd cut).

See PyH4l#PyH4lFunction for more info. Example:

H.AddFunction('Matching', collections = ['truth', 'muon', 'photon'])
H.functions[0].associations['photon'][1] = 'Z_FSR' # match photons with Z_FSR (default is to match with all true_photons)

Adding trees

Since there is only one PyH4lTree class, you only need to pass the arguments, such as the name (mandatory). Ex: H.AddTree(name = 'electron')

The following trees are foreseen: muon, electron, photon, jet, pair, quadruplet, track, true_muon, true_electron, true_photon, Z_FSR, true_Z and true_Higgs; they keep the normal variables used in the analysis.

Whatever starts with true_ (also Z_FSR) refers to the generated particles. If you want to have the true information linked to the reconstructed, use the matching function.

In principle any tree can be added, as long as the user defines the methods to fill it. Also, any other variable can be stored in the tree. To use non-standard stuff, make sure you understand the behaviour of the trees, described in PyH4l#PyH4lTree.

Adding plots

Plots can be produced either by calling H.AddPlots() after H.initialize() (only in ARA), or by calling share/ outfile from the command prompt after running. See Plots.

Performing an analysis at generator level

The code was conceived to do analysis with reconstructed particles, using generated particles only in functions, trees and plots. However, it is possible to pretend that the generated particles are also the reconstructed ones by calling the function CopyGeneratedToReconstructed. This allows one to use exactly the same cuts as for the reconstructed particles (when the information is available), and to form pairs and quadruplets out of generated muons and electrons. It is even possible to apply the matching for pairs and quadruplets, to compare the variables for Z bosons or b-bar pairs for instance.

An example can be found in share/, and more info is given in PyH4l#McOnly.


  • Cannot work with any reconstruction information (etcone, sumPt, lepton quality), so cuts that use those variables are not expected to work
  • For the same reason, do not add electron or muon trees, as they normally retrieve that information. All the relevant information is in the true_muon and true_electron trees
  • Since there are dozens of electrons, it is better to do this in the 4mu case. If you want to check the 4e case, do not try to match pairs and/or quadruplets before a preselection cut, otherwise a long time will be spent in this step.

Using the output of the code

Apart from printing the cut efficiencies, the code writes in the output file the events analysed (events_analysed) and the list of the events that passed each cut (both as ROOT TEventList's). In the directory RunInfo one can find the mode, information about each cut (cut_info and cut_class) and function (function_info and function_class), the list of files used (file_list) and the number of events per file (evts_per_file) as ROOT TNamed (maybe not the best choice but it works). Plots can also be produced, see Plots.

An example of the output can be found at lxplus:~blenzi/public/PyH4l_output.root.

Cut efficiencies

By default the cut flow with the relative and total efficiencies is printed after calling (or H.finalize()). You can always call H.PrintSummary() to see the table again.

The normalisation used to calculate the efficiencies is defined by H.normalisation, which is set to the number of events in the ARA tree by default. You can change the normalisation at any time by passing a list (or a ROOT TEventList) or a number. To set it to the number of events passing the first cut, set H.normalisation = H.cuts[0].evlist. Call H.PrintSummary() to see the updated efficiencies.
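The bookkeeping behind the efficiency table can be sketched in plain Python. The function below is purely illustrative (it is not part of the package): it takes the normalisation and the number of events surviving each cut, and returns the relative and total efficiencies that PrintSummary would display.

```python
def efficiency_table(normalisation, passed_per_cut):
    """Compute relative and total efficiencies for a cut flow.

    normalisation: number of events used as denominator for the total efficiency
    passed_per_cut: number of events surviving each cut, in order
    """
    rows = []
    previous = normalisation
    for i, n in enumerate(passed_per_cut):
        relative = n / previous if previous else 0.0      # w.r.t. the previous cut
        total = n / normalisation if normalisation else 0.0  # w.r.t. the normalisation
        rows.append((i, n, relative, total))
        previous = n
    return rows

# 1000 events analysed, three cuts
table = efficiency_table(1000, [800, 400, 100])
```

Changing the normalisation (e.g. to the number of events passing the first cut) only changes the denominator of the total efficiency, which is why the table can be reprinted at any time.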

Cut efficiencies from ROOT file

The script share/ can retrieve and display the table from a saved file. From the command prompt you can run: <file> <normalisation>

(or python ... if you do not have permission to execute it). If you don't give the normalisation, events_analysed will be used.

Merging saved files

If you run each AOD (or group of AODs) separately you can merge their outputs later by running from the share/ directory: <outfile> <file1> ... <fileN>

(or python ...). It merges the eventlists and file_list, and checks whether the mode, cuts and functions used to produce each file were the same. Trees are also merged, and if plots exist in at least one file they are produced. The offsets for the eventlists are defined by evts_per_file, but can be passed as an argument in python.

WARNING: your output files must follow the alphabetical order of the input files
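The offset logic based on evts_per_file can be sketched in plain Python (the function name is illustrative, not the script's actual code): each file's local event numbers are shifted by the cumulative number of events in the preceding files, which is why the file ordering matters.

```python
def merge_eventlists(lists, evts_per_file):
    """Merge per-file eventlists into one global eventlist.

    lists: one list of local event numbers per input file
    evts_per_file: number of events in each file, in the same order as lists
    """
    merged, offset = [], 0
    for local, nevents in zip(lists, evts_per_file):
        merged.extend(evt + offset for evt in local)  # shift by the cumulative offset
        offset += nevents
    return merged

# two files with 5 and 4 events respectively
merged = merge_eventlists([[0, 2], [1, 3]], [5, 4])
```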

Intersecting lists

It is possible to intersect one list from a file with all the lists from another file. The typical use case is when you apply a cut and want to see its impact if it was the first cut in the analysis. The TrueCase is a nice example: you can run the analysis without taking it into account and intersect the outputs later to get the right efficiencies. Other examples could be a trigger, or the VBFCut. From the share/ directory run: -e <evlist> <file1> <file2>

file1 is where evlist is, and file2 holds the other lists. Type -h for help.
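The intersection itself boils down to filtering every list from the second file by membership in the chosen eventlist. A minimal sketch (function and dictionary names are illustrative):

```python
def intersect_with_all(evlist, other_lists):
    """Intersect one eventlist with every list from another file,
    preserving the order of each target list."""
    selected = set(evlist)
    return {name: [evt for evt in lst if evt in selected]
            for name, lst in other_lists.items()}

# evlist from file1, per-cut eventlists from file2
out = intersect_with_all([1, 2, 3, 5],
                         {'cut0': [0, 1, 2, 3, 4], 'cut1': [2, 4, 5]})
```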

Reading the trees

Trees contain information about particles and events, and the names of the variables should be self-explanatory. The only exceptions are:

  • ev_last_cut that tells you what was the last cut applied to the event, starting from 0. If the event failed the 2nd cut, ev_last_cut = 1. If the event survived all the cuts, ev_last_cut = Ncuts.
  • last_cut that is the same thing for particles. If the event survived but the particle was eliminated (for example muons during the pre-selection), last_cut < ev_last_cut. If the particle survived as far as the event, last_cut = ev_last_cut. If it survived but the event was eliminated (you require 4 leptons but you have only 3, or you have 4 leptons but you required 2 jets), last_cut = ev_last_cut + 1.
  • index, the index of the particle in the collection. It should be ordered, unless the matching function was applied
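The ev_last_cut bookkeeping described above can be sketched as follows (a plain-Python illustration, not the package's code): the stored value is the index of the first failing cut, or Ncuts if the event survived everything.

```python
def compute_ev_last_cut(cut_results):
    """cut_results: one boolean per cut, in order.
    Returns the 0-based index of the first failing cut,
    or Ncuts if the event survived all the cuts."""
    for i, passed in enumerate(cut_results):
        if not passed:
            return i
    return len(cut_results)

failed_second = compute_ev_last_cut([True, False, True])  # failed the 2nd cut
survived_all = compute_ev_last_cut([True, True, True])    # survived all 3 cuts
```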

Apart from the trees you defined, there is an empty one called tree which is a friend of all the others (in ROOT language). It can be useful when you call Draw or Scan, but you cannot call GetEntry with this tree.

If you used the matching, particles and true particles should be aligned. So tree.Draw("", "true_muon.reco") will give you directly the correlation between the measured and the true pt.

WARNING: If you don't require true_muon.reco, you will get meaningless values. Using !muon.fake works fine for muons alone, but if you use another tree and the value is not present (imagine you have more true muons than reconstructed ones), ROOT assumes 0 for those values and plots more entries than you would expect. Compare the following if you want (notice how many values sit at 0):

tree->Draw("pair.m:true_Z.m", "true_Z.reco", "box")
tree->Draw("pair.m:true_Z.m", "!pair.fake", "box")


Plots can be produced either by setting H.AddPlots() after H.initialize() or calling share/ outfile from the command prompt after running. In both cases the plots are produced from the available trees and written under the Plots/ directory in the output file. They are named according to the following convention:


The convention is h_<particle>_<variable>_<icut>, where particle is actually the name of the tree, variable can be pt, eta, phi, ..., and icut is a number between 0 and Ncuts, where 0 means before any cut and i means after the i-th cut. By default, plots for the number of particles and events after each cut will also be produced (event_flow, muon_flow, ...). If trees for reconstructed and generated particles are present, the output will also contain plots for resolution, fakes, efficiencies and fake rates for the usual quantities.

You can add plots by issuing the command H.plots.AddPlot(**args) after H.AddPlots(). You need to pass as arguments the tree, the variable to plot and optionally the title, the binning (a list [Nbins, min, max]) and a key to identify the plot. If not supplied, the title will follow the convention variable for particle before any cut / after ith cut. The default key is (particle, variable), and the name of the histogram will be h_key_icut where an '_' is placed between each entry of the key. The binning is defined in a dictionary (H.plots.binning) that uses the variable as key (eg: 'pt') or a pair (particle, variable) if present - eg: ('jet', 'eta'). You can change it before calling

It is also possible to add plots sorting a quantity in each event. For example you might be interested in the highest-pt muon, the 2nd highest, etc. This is achieved by calling H.plots.AddPlotsWithSorting(particle, variable, N = 1, cut = None), where N is the number of plots to be produced (1 is only for the highest value, etc). So H.plots.AddPlotsWithSorting('muon', 'pt', 4) will produce 4 plots: h_muon_pt1 ... h_muon_pt4 where pt1 corresponds to the highest-pt in the event and ptN is the N-th highest (may not be present in all events).
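The sorting logic behind AddPlotsWithSorting can be sketched in plain Python (the function below is illustrative): for each event, the values are ordered and the i-th highest is assigned to the i-th plot, events with fewer than i values contributing nothing to that plot.

```python
def sorted_values_per_rank(events, n):
    """For each rank i (1 = highest), collect the i-th highest value from every
    event that has at least i values -- the content of the i-th histogram."""
    ranked = {i: [] for i in range(1, n + 1)}
    for values in events:
        ordered = sorted(values, reverse=True)
        for i in range(1, min(n, len(ordered)) + 1):
            ranked[i].append(ordered[i - 1])
    return ranked

# muon pt values (MeV) for three events; the last event has only two muons
ranked = sorted_values_per_rank(
    [[40e3, 25e3, 10e3], [30e3, 20e3, 15e3], [50e3, 5e3]], 3)
```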

WARNING: Only 1-D plots are implemented, but if there is a demand for 2 and 3-D, it can be extended rather easily.

How does it work

The idea behind the code is rather simple: for each event that is analysed, cuts are applied until the event is eliminated. Some information about the event and the particles can be dumped as output to a ROOT TFile / TTree and some functions can be applied during the analysis.

The code is located under the python/ directory. The main file and class is PyH4l, which basically executes the main loop over the events and calls methods of the other classes. Every object of the other classes is aware of the PyH4l instance through its attribute pyH4l, both for reading and modifying.

The input is a POOL file that can be accessed using Athena or an ARATree, which uses branches / collections like StacoMuonCollection, ElectronAODCollection, etc. The default collections are set when the main instance is created and can be changed before calling initialize.

One important thing to keep in mind: the code relies (moderately) on adding attributes and methods to existing classes from ROOT or Athena (hail Python!). This is done in and it is extremely useful, since it unifies methods that were not the same for different particles (such as track or etcone20) and it allows one to keep information like sumPt directly attached to the particle.

Default collections

Currently, the code supports and uses by default the following branches:

Object                      Release 13                  Release 14
muon_collection_name        StacoMuonCollection         StacoMuonCollection
electron_collection_name    ElectronAODCollection       ElectronAODCollection
track_collection_name       TrackParticleCandidate      TrackParticleCandidate
truth_collection_name       SpclMC                      SpclMC
trigger_collection_name     McEventInfo                 McEventInfo
jet_collection_name         Cone4H1TopoParticleJets     Cone4H1TopoJets
true_jet_collection_name    Cone4TruthParticleJets      Cone4TruthJets
photon_collection_name      PhotonAODCollection         PhotonAODCollection
MStruth_collection_name     MuonEntryLayerFilter        MuonEntryLayerFilter

MStruth contains the track parameters at the entrance of the muon spectrometer for generated muons. To change the jet collection one should call H.jet_collection_name = 'Cone7H1TopoParticleJets' before the initialization, and so on.

Automatic check for branches NEW

The code automatically checks which branches are present in the input files, so you should not define cuts or functions that need branches which are missing. For each branch that is expected but not there, a message is displayed, such as:

muon branch not present: StacoMuonCollection


Apart from PyH4l, there are 6 main classes:

  • PyH4lCollector: used to collect particles. There is only one collector class, which is instantiated once per "particle". Different particles are collected by different methods like Collect_muon, Collect_photon, etc
  • PyH4lCutter: the base class for cuts. Each cut should be a class derived from this class
  • PyH4lFunction: defines the functions that are applied during the analysis, meaning before any cut or after each or some cuts. Functions are classes derived from this one
  • PyH4lTree: derived from ROOT TTree, defines which information is saved from the particles
  • PyH4lPlots: a class to define the standard plots. Uses the output trees to produce them
  • PyH4lAlg: defines an algorithm so the code can run in PyAthena. Instead of the ARAtree, it uses the StoreGateSvc to load the collections and the EventSelector to get the names of the input files. The other necessary changes are incorporated in the files, mainly in and

Other files:

  • defines new attributes and methods for the Athena classes as mentioned in the introduction. It can also contain functions and classes that need Athena to exist.
  • defines new methods for the existing ROOT classes and some functions that use ROOT
  • holds any simple function, class or number that can be used by the code. It should not depend on Athena as it can be loaded only with PyROOT.

The event loop

Each event is analysed by calling H.AnalyseEvent(ievt). does nothing more than calling it for each event and then calling H.finalize(). Analysing an event means:

  • Clear the collectors (see PyH4l.ClearCollectors)
  • Call the functions before any cut
  • Apply the cuts until the event is eliminated
  • After each cut, call the functions
  • Fill the trees


A cut is something that loads the collections it needs, performs some operation on them and returns True or False. It also keeps a list of the events that passed. This is done by the method EventSurvives, which has the following default behaviour:

  • Collect the needed particles
  • Apply the cut, by calling apply_cut
  • Update the collections (see PyH4l#UpdateCollections)
  • Call the result method and keep its return value in event_survived (True or False)
  • Add the event to the eventlist if it survives
  • Return True or False (event_survived)

The default for the result method is: return True if there are still quadruplets, 2 pairs or 4 leptons, depending on whether the quadruplets or pairs had been formed before. The leptons are tested according to the mode (if mode = '2e2mu', at least 2 electrons and 2 muons are required).

Cuts have an attribute upd (True by default) that tells whether the function PyH4l#UpdateCollections should be called.


Particles are collected upon request, which can come from a cut, a function or (rarely) a tree. The collectors are identified by the name of the collection, like muon, jet, etc (see the default collection names), which also defines the collect method. When you (or any function) call Collect(ievt), the collector checks whether it has already collected this event via HasCollected(ievt), and if not it calls Collect_<name>, like Collect_muon.
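A minimal sketch of this collect-on-request mechanism (class and attribute names are illustrative, not the package's actual implementation): the collector caches the last event it processed and dispatches to the right Collect_<name> method via getattr.

```python
class Collector:
    """Sketch of a collector: collects once per event, on request."""
    def __init__(self, name):
        self.name = name
        self.last_event = None
        self.calls = 0  # counts how often the branch was actually read

    def HasCollected(self, ievt):
        return self.last_event == ievt

    def Collect(self, ievt):
        if self.HasCollected(ievt):
            return  # already done for this event, nothing to do
        getattr(self, 'Collect_' + self.name)(ievt)  # e.g. Collect_muon
        self.last_event = ievt

    def Collect_muon(self, ievt):
        self.calls += 1  # stand-in for reading the muon branch

c = Collector('muon')
c.Collect(0)
c.Collect(0)  # second request for the same event: no extra read
c.Collect(1)
```

This is why having the same branch requested by multiple cutters is harmless: only the first request per event does any work.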

The collector usually saves a list of particles in pyH4l, like pyH4l.electrons, pyH4l.photons, etc. But it can also save different things such as true_muons, Z_FSR. Since these lists might be modified by the cuts, a list called all_particles is attached to the collector to keep all the particles that were collected. This is needed in order to write the trees, and is done currently for electrons, muons, photons, jets, pairs and quadruplets.

Updating the collections

The method PyH4l.UpdateCollections is at the same time vital for the analysis and kind of complicated. The idea is to synchronize leptons, pairs and quadruplets after a cut that could change any one of the 3 is applied. If they haven't been collected, nothing is done. The method removes quadruplets which are formed by leptons or pairs that were eliminated, then removes pairs that do not form valid quadruplets and finally leptons that do not form quadruplets (and pairs).
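A simplified sketch of this synchronisation, with leptons as plain labels and pairs/quadruplets as tuples of leptons (illustrative only; the real method works on MyCompositeParticle objects and handles more cases):

```python
def update_collections(leptons, pairs, quads):
    """Drop quadruplets built from eliminated leptons, then pairs and
    leptons that no longer belong to any surviving quadruplet."""
    lset = set(leptons)
    quads = [q for q in quads if all(l in lset for l in q)]
    in_quads = {l for q in quads for l in q}
    pairs = [p for p in pairs if all(l in in_quads for l in p)]
    leptons = [l for l in leptons if l in in_quads]
    return leptons, pairs, quads

# lepton 'a' was eliminated by a cut: everything built from it cascades away
l1, p1, q1 = update_collections(['b', 'c', 'd', 'e'],
                                [('a', 'b'), ('c', 'd')],
                                [('a', 'b', 'c', 'd')])

# nothing eliminated: only the unused lepton 'e' is dropped
l2, p2, q2 = update_collections(['a', 'b', 'c', 'd', 'e'],
                                [('a', 'b'), ('c', 'd')],
                                [('a', 'b', 'c', 'd')])
```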

Adding a new collector

The default method for collecting particles is just to copy them from the input collection (PyH4lCollector.Collect_default). To add a new collector one can:

  • Add a new collection to PyH4l.collections (can be done on the fly, before initialization)
  • Add a new method Collect_X to PyH4lCollector


Functions are objects like cuts, except that they can be called many times per event and they do not return any value. They load the collections they need (defined by the argument collections) and perform some operation.

The method CallNow defines whether the function should be applied. If the list after_cuts is not set (default), CallNow returns True once per event, at the first attempt. Otherwise, after_cuts is used and True is returned after each cut in the list, identified by its 0-based index (ex: after_cuts = [0,2] means the function will be called after the 1st and the 3rd cut). You can redefine the method if necessary.
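The CallNow logic can be sketched as follows (everything except the names after_cuts and CallNow is illustrative):

```python
class Function:
    """Sketch of CallNow: without after_cuts, run once per event before any
    cut; otherwise run after each cut whose 0-based index is listed."""
    def __init__(self, after_cuts=None):
        self.after_cuts = after_cuts or []
        self.called_this_event = False

    def NewEvent(self):
        self.called_this_event = False

    def CallNow(self, icut=None):
        if not self.after_cuts:
            # icut is None before any cut: fire only once per event
            if icut is None and not self.called_this_event:
                self.called_this_event = True
                return True
            return False
        return icut in self.after_cuts

f = Function(after_cuts=[0, 2])
calls = [f.CallNow(), f.CallNow(0), f.CallNow(1), f.CallNow(2)]

g = Function()  # default: once per event, before any cut
default_calls = [g.CallNow(), g.CallNow(), g.CallNow(0)]
```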

The important method of a PyH4lFunction is function, and this is the only thing that should change from one class to another.


GetSumPt loads the track collection, then calculates and stores the sumPt for each lepton in p.sumPt. It is used by NormTrackIsoCut, which tells the function to be called after the previous cut. You can define the size of the cone and whether or not to consider particles without hits on the b-layer, for muons and electrons individually, using a list.


The matching function works for muons, electrons, photons and jets by looking for the closest true particle in deltaR, within dR < min_dR (0.1 by default). For jets, the VBF_quarks are used by default. It also works for pairs and quadruplets, by asking whether their daughters are matched and have the same parent.
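A self-contained sketch of the deltaR matching (the helper names are illustrative; the real function works on Athena particles and their eta/phi methods):

```python
import math

def delta_r(a, b):
    """deltaR between two (eta, phi) tuples, with dphi wrapped to [-pi, pi]."""
    deta = a[0] - b[0]
    dphi = (a[1] - b[1] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(deta, dphi)

def match(reco, true, min_dR=0.1):
    """For each reco (eta, phi), return the index of the closest true particle
    within min_dR, or None. Note: not unique -- two reconstructed particles
    can be matched to the same generated one, as in the default matching."""
    out = []
    for r in reco:
        best = min(range(len(true)), key=lambda i: delta_r(r, true[i]),
                   default=None)
        out.append(best if best is not None
                   and delta_r(r, true[best]) < min_dR else None)
    return out

# first reco particle matches the first true one; the second matches nothing
idx = match([(0.50, 1.00), (2.00, -1.00)], [(0.52, 1.01), (1.00, 0.00)])
```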

The default behavior is to associate, for example, H.muon_collector.all_particles with H.true_muons. The associations are defined by a dictionary and can be changed as shown in

After the matching, the reconstructed and the generated collections are aligned (H.muon_collector.all_particles[0] matches H.true_muons[0]) so that the matching is kept for the trees (this is why all_particles is used).

Warning: The default matching allows two reconstructed particles to match the same generated particle. This can lead to duplication in the true particle trees. To avoid that, set kUnique = True (requires numpy).


Trees are one of the main outputs of the code, and their presence introduced several complications. Filling the trees has to be done after applying all cuts, as we want to have updated information about sumPt, last_cut and whatever variable that is changed during the analysis. On the other hand, particles are removed by the cuts so PyH4l.muons, etc cannot be used since it does not contain everything in the end. One could think of using the original collection tt.StacoMuonCollection, but it does not keep changes in the objects. This is why a list called all_particles has to be kept by every collector, and cleared in the beginning of the event, to avoid filling the tree with the information from the previous event. This is also the reason why the matching function uses those lists by default.

By default the trees do not take collections as arguments and even if they do the collections are not loaded. There is a good reason for that: usually you don't want to waste time forming quadruplets if the event did not pass the trigger. If you really want to load the collections you passed, set load_collections to True.

Another complication is how to define which variables to keep and how to translate the information stored in Athena classes to a ROOT tree. The following scheme is adopted:

  • Each tree has 3 lists containing strings that define the information that will be stored: attributes, methods and functions. The behaviour is better described with examples:
    • If 'index' is in attributes, a branch called index will be created and will contain the values of p.index, where p is each particle that was loaded
    • If 'pt' is in methods, p.pt() will be written in a branch called pt
    • If 'xx' is in functions, xx(p) will be kept in the xx branch. The function has to be defined previously by the user and can only take one particle as argument.
  • The information is stored in C++ vectors that are cleared and filled every event
  • Speed was a main concern, so the methods to retrieve the information are defined at initialization. For both methods and attributes, lambda functions are defined on the fly (class methods were used before, but this was not working 100% and provided no gain w.r.t. lambda functions)
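The attributes/methods/functions scheme above can be sketched in plain Python (all names except attributes, methods and functions are illustrative): one accessor per branch is built once, at "initialization", and then applied to every particle.

```python
class Particle:  # stand-in for an Athena particle
    def __init__(self, index, pt):
        self.index = index
        self._pt = pt
    def pt(self):
        return self._pt

def make_accessors(attributes, methods, functions):
    """Build one accessor per branch: attributes are read with getattr,
    methods are called, user functions take the particle as only argument."""
    acc = {}
    for name in attributes:
        acc[name] = lambda p, n=name: getattr(p, n)      # p.index
    for name in methods:
        acc[name] = lambda p, n=name: getattr(p, n)()    # p.pt()
    acc.update(functions)                                # xx(p)
    return acc

acc = make_accessors(['index'], ['pt'], {'pt_gev': lambda p: p.pt() / 1e3})
p = Particle(3, 20e3)
row = {name: f(p) for name, f in acc.items()}
```

Note the `n=name` default arguments: they freeze the branch name in each lambda, which is the usual way to build such accessors in a loop.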

Leptons, pairs and quadruplets

Leptons is a class defined in to keep track of electrons and muons separately and to allow loops over both. Leptons are muons in the 4mu case, electrons in the 4e case and both in the 2e2mu case. This means that cuts that operate on leptons will, for example, ignore electrons in the 4mu case. You MUST NEVER redefine the object H.leptons during the analysis, only remove elements if you wish. Leptons.remove(p) will automatically remove the particle from H.muons or H.electrons depending on its type.

Pairs and quadruplets were just tuples in the beginning, but are now objects of the class MyCompositeParticle. The class was created to avoid the confusion that has been going on with the CompositeParticle class from Athena and to allow easy access to information like the invariant mass, which is necessary for the trees. It derives from CLHEP::LorentzVector and has all its methods (px, py, e, m) plus pt, p, charge, add (to append another particle to the soup), hlv (the sum of the LorentzVectors), and methods to loop over the particles, check whether a certain particle is among the constituents, etc.

The methods for looping, comparing and checking constituents were written because of PyH4l#UpdateCollections, so that everything can work both with tuples and MyCompositeParticle, except for trees. The cuts were not changed either.

Analysis at generator level

If the muon and the electron collections are not present, the flag mcOnly will be set to True, for generator only studies. This analysis is made possible by the function CopyGeneratedToReconstructed.

For each particle passed to the function CopyGeneratedToReconstructed, a collector with an empty branch is created (no information is loaded from the ARAtree or the StoreGateSvc). For each event, the content of true_X is copied to X_collector.collection, so the normal Collect method works, looping over the generated particles as if they came from the reconstructed object:

for name in self.particles:
    self.pyH4l[name + '_collector'].collection = self.pyH4l['true_%ss' % name]

Also, last_cut is appended to each true tree that is present and being copied, so one can monitor how far each particle survives in this pseudo-analysis. In principle there should be no difference between the two types of analysis if one restricts itself to kinematic information. The matching for muons and electrons is modified if mcOnly is set. A dummy method MatchSelf is used, that associates each electron and muon with itself.

If you use the function when the muon / electron collections are present, a warning will be displayed. It might work nevertheless, especially if this function is the first thing applied in each event.


Tests were done on my laptop (Core 2 Duo 1.8 GHz, 2 GB RAM) with one H130 AOD, using

The full analysis, without using the truth info and without dumping trees runs in ~3s per 1k events. If output trees are included it can increase to ~5s, and if the truth collection is loaded it goes up to ~1 min. Producing plots in the end takes less than 5s.

The ARA initialization takes about 20s (not included in the previous numbers) and a great fraction of the execution time is spent loading the collections. This justifies running all modes and mass points at once, which is done in for example. Dealing with the truth collection to classify the particles can take 90% of the execution time due to the large number of particles and slowness of loops in python. (This part was translated to C++, but there is no comparison yet)

Some comments about improvements:

  • Possibilities: collectors for truth particles according to mode (true_muon / true_electron), only keep all_particles and index if trees are present
  • When merging several files, producing plots can take some time. If the plots are already available they could be merged using HAdd (AddRootFiles)
  • Applying the matching function and using trees for truth electrons can be quite slow as there are many of those, and by default their origin is checked (from_Z, from_gamma, etc)
  • The deltaR function is called many times. In release 14 there is an improved version, but I am using hlv().deltaR when this is not available.
  • When using trees, the function eval used to be called for retrieving the appropriate collection. eval is rather slow and was replaced by getattr using the function reduce to deal with multiple objects, like a.b.c (see PyH4lTools.my_getattr). This function is also used by the matching function.
  • Trees use C++ vectors, that need to be cleared and filled every event, but using arrays would require storing their length and other complications.
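The reduce/getattr replacement for eval mentioned above can be sketched as follows (a reconstruction of what PyH4lTools.my_getattr presumably does, not the actual code): dotted names like 'a.b.c' are resolved attribute by attribute.

```python
from functools import reduce

def my_getattr(obj, name):
    """getattr that understands dotted names like 'a.b.c',
    as a cheaper replacement for eval."""
    return reduce(getattr, name.split('.'), obj)

class NS:  # simple namespace to demonstrate nested attributes
    pass

root = NS()
root.a = NS()
root.a.b = NS()
root.a.b.c = 42
value = my_getattr(root, 'a.b.c')
```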

Warnings / general comments

  • If the truth collection is not present, cuts / functions that require truth cannot be instantiated. If you forget that, the following error appears upon initialization: AttributeError: PyH4l instance has no attribute 'truth_collector'
  • Analysing the same event more than once has no impact since it will not enter the eventlists or trees
  • Having the same branch in multiple cutters and trying to load it more than once is harmless (in speed as well)
  • sumPt has a default value before calling the function GetSumPt (which is called before applying NormTrackIsoCut, if present).
  • last_cut is implemented for ParticleBase and MyCompositeParticle, but it is only updated for leptons, pairs and quadruplets. If one wishes to use it for all objects (photons, jets, etc) see UpdateLastCut in
  • UpdateLastCut and UpdateCollections should not be merged, since some cuts do not call the latter
  • Argument against a lepton tree (in favor of muon and electron trees): exclusive information like isem, isCombinedMuon, etc.

Default values:

  • sumPt = -9999
  • last_cut = -1
  • fake = True, reco = False


  • Constraint fit
  • Include ID info (MS info can be included with PyH4lFunction.MSInfo)
  • Include X-axis title in plots (and possibility to change it)

-- BrunoLenzi - 05 Aug 2008

Topic revision: r25 - 2011-04-07 - BrunoLenzi