Dilepton Differential Cross-Section Code
Introduction
All of the code used in the dilepton 13 TeV differential cross-section measurements (at least, all the really cool ones) can be found here:
Phase 2 (current): svn+ssh://<username>@svn.cern.ch/reps/atlasphys-top/Physics/Top/Summer17/INT_XS_dilepdiff/
Phase 1 (previous): svn+ssh://svn.cern.ch/reps/atlasphys-top/Physics/Top/Winter1516/INT_XS_dilepdiff_13TeV/
Below are some descriptions of what the code does. The section headings mimic the structure of the code directory (i.e. all the event saver related information is detailed in Top Dilepton Analysis and everything related to unfolding is detailed in, you guessed it, the Unfolding section).
Top Dilepton Analysis
The TopDileptonAnalysis package is basically just a normal AnalysisTop event saver with a few extra features. The package uses the standard overlap removal in AnalysisTop, as well as the standard object definitions (we don't overwrite the overlap removal code or the objects, just the event saver).
What information is provided?
Four-vector information is written for leptons, jets, b-jets, MET, top, tbar, ttbar, etc. In addition, the b-jet multiplicities are stored for all defined working points, as well as some dilepton-reconstruction-specific information (such as which jets were used to reconstruct the ttbar system). Some of these branches duplicate information that is always written out by AnalysisTop (e.g. AnalysisTop writes out el_ and mu_ branches, whereas TopDileptonAnalysis adds the d_lep_X branches). This duplication is a little annoying, but as it isn't currently limiting our NTuple sizes, the information is kept for sanity-check purposes. The following branches are probably of most interest to users:
tree_name = nominal / particleLevel / truth:
* d_top_ pt, eta, phi, m, e, y: four vector + information on the highest weight top quark from NuW
* d_tbar_ pt, eta, phi, m, e, y: as above but for the tbar
* d_ttbar_ pt, eta, phi, m, e, y: as above but for the ttbar
* d_nu_ pt, eta, phi, m, e, y: as above but for the neutrino
* d_nubar_ pt, eta, phi, m, e, y: as above but for the anti-neutrino
* d_Wp_ pt, eta, phi, you get the idea: Positive W boson information (p stands for plus)
* d_Wm_ pt, eta, phi, you get the idea: Negative W boson information (m stands for minus)
* d_top(tbar)_match: true/false flag indicating whether the reconstructed top (anti-top) matched the particle-level top within dR < 0.4 (pT, m, and e differences are not considered)
* d_weight_max: weight returned by NuW. As a general rule, you want to cut on this to remove poorly reconstructed, low-weight solutions
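For orientation, here is a minimal PyROOT sketch of reading these branches and cutting on d_weight_max (the file name, histogram binning, and cut value are illustrative assumptions, not analysis defaults):

import ROOT

f = ROOT.TFile.Open("user.output.root")  # hypothetical NTuple name
tree = f.Get("nominal")

h_top_pt = ROOT.TH1F("h_top_pt", ";top p_{T};Events", 50, 0.0, 500.0)
for event in tree:
    # Keep only events with a positive NuW weight; the exact cut value to
    # use is analysis-dependent (the one here is purely illustrative).
    if event.d_weight_max <= 0.0:
        continue
    h_top_pt.Fill(event.d_top_pt)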
The current version of this event saver also recalculates the JVT scale factors. You shouldn't use the AnalysisTop version of this scale factor until it has been fixed (still not done as of 2.4.25).
Neutrino Weighting
Neutrino weighting is now available as a standalone class in C++ (a Python version exists as well, but it is much slower and is therefore not recommended). You can find the necessary code in the NeutrinoWeighter.cxx file (if you want to change the top-mass / W-mass smearing, you'll need to edit the code by hand). Examples of how to instantiate the class and run the reconstruction can be found in the saveEvent function in TopDileptonEventSaver.cxx.
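For reference, the weight assigned to each hypothesis in neutrino weighting is a Gaussian comparison between the measured missing transverse momentum and the summed momenta of the two solved neutrinos. A minimal Python sketch of that formula (the resolution values are illustrative assumptions; the actual smearing and resolutions live in NeutrinoWeighter.cxx):

import math

def nuw_weight(met_x, met_y, nu_px, nu_py, nubar_px, nubar_py,
               sigma_x=10.0, sigma_y=10.0):
    # How well do the solved neutrino momenta reproduce the measured MET?
    # sigma_x/sigma_y are illustrative stand-ins for the MET resolution.
    dx = met_x - (nu_px + nubar_px)
    dy = met_y - (nu_py + nubar_py)
    return math.exp(-dx * dx / (2.0 * sigma_x ** 2)) * math.exp(-dy * dy / (2.0 * sigma_y ** 2))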
Unfolding
Most of the unfolding scripts assume that you have already produced the necessary NTuples using the TopDileptonAnalysis event saver, or some other suitably similar NTuples. You can find a list of the files that the unfolding scripts expect to exist in files.py (these are the files needed for an emu analysis at 13 TeV). Other unfolding scripts only require the output of the generate_unfolding_input script (e.g. when creating reco-level plots or migration matrices).
In general, all of the scripts rely on a Python Observable class (see observable.py), which provides all of the binning, tree projection, LaTeX/ROOT histogram labels, and metadata for single- and double-differential observables. Most scripts should run identically on single-differential and double-differential distributions without the user needing to do anything other than configure the observable class (a hypothetical sketch is given below). At the moment, all default observables are stored in unfold_config.py, but many more can be added as long as the branches exist in the TTrees.
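To give a feel for the metadata involved, here is a hypothetical sketch of such a class; the real constructor is defined in observable.py and its arguments will differ:

class Observable(object):
    # Hypothetical sketch only: see observable.py for the real class.
    def __init__(self, name, branch, bin_edges, latex_label, second_obs=None):
        self.name = name                # command-line key, e.g. "top_pt"
        self.branch = branch            # TTree branch to project, e.g. "d_top_pt"
        self.bin_edges = bin_edges      # binning used for the projection
        self.latex_label = latex_label  # axis label for plots
        self.second_obs = second_obs    # set for double-differential observables

# Illustrative binning only, not the analysis binning:
top_pt = Observable("top_pt", "d_top_pt", [0, 60, 120, 200, 300, 500, 1000], "p_{T}^{t} [GeV]")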
Generating inputs for unfolding
In order to generate the unfolding inputs, we assume you have run AnalysisTop and have some outputs for data, signal, and backgrounds. The idea is that you get all the histograms you need by projecting them from the trees. This is a very flexible way of generating inputs without relying on the grid/batch systems (though you can run the scripts on a batch system if you like). You can specify what types of output you want with arguments to the script. Running the unfolding input generation is as simple as:
> python generate_unfolding_input.py do_parton do_particle do_systematics obs: top_pt el_pt
The flags do_parton and do_particle determine whether the parton-level and particle-level plots, as well as their migration matrices, are included in the output. If you leave these out, the code will simply make reco-level output. The do_systematics flag turns the systematics on or off (currently it's not possible to define a sub-set, but a simple tweak to the argument parser could fix this). Without this flag, only the nominal is used. Finally, any arguments following obs: are the observables to be added. These must go at the end of the arguments, because I wrote a lazy arg parser (see the sketch below). Without this flag, the default five top/ttbar differential observables are used. If you want to add your own default observables, just add them to the list of default observables found in unfold_config.py.
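For what it's worth, a parser of this kind takes only a few lines; a sketch of the idea (not the actual implementation):

import sys

args = sys.argv[1:]
do_parton = "do_parton" in args
do_particle = "do_particle" in args
do_systematics = "do_systematics" in args
# Everything after "obs:" is treated as an observable name, which is why
# the observables must come last on the command line.
observables = args[args.index("obs:") + 1:] if "obs:" in args else []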
Creating reco-level plots
Creating reco-level plots is probably the quickest thing you can do once you have your NTuples. Here is an example of how one might create some default control plots with just MC stat. uncertainties:
> python generate_unfolding_input.py obs: el_pt mu_pt jet_n bjet_n dphi_ll_vs_ttbar_m
> python do_reco_plots.py obs: el_pt mu_pt jet_n bjet_n dphi_ll_vs_ttbar_m
If you now look in the Unfolding/Plots directory, you should have .pdf files for each of these observables. In the case of double-differential observables, you will have one plot per secondary differential bin (in the example above, this translates to one dphi_ll figure per bin in ttbar_m). Adding systematic uncertainties to these plots increases the execution time from seconds to minutes, but shouldn't cause too many problems (NOTE: though coded, this option is not yet tested).
Drawing migration matrices
The migration matrices are drawn automatically when you run the unfolding. However, if you would like to run the code stand-alone for some reason, you must first have run the generate_unfolding_input.py script. The migration matrices can then be constructed using:
> python do_migration_matrices.py obs: top_pt dphi_ll_vs_ttbar_m
For single-differential observables, you will get a migration matrix that has been normalised by row (i.e. the entries represent the fraction of events and sum to 1 along each row). For double-differential observables, you will get one large migration matrix, sub-divided according to the number of secondary differential bins; the code currently still normalises the rows to 1. If the colour range is not to your liking, simply disable the set_palette command at the start of the do_migration_matrices script, or edit the atlas_labels.py code where the palette is defined.
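The row normalisation itself is straightforward; a minimal sketch in plain Python (not the actual implementation):

def normalise_rows(matrix):
    # Scale each row so its entries sum to 1, i.e. each entry becomes the
    # fraction of events migrating from that truth bin into a given reco bin.
    normed = []
    for row in matrix:
        total = float(sum(row))
        normed.append([x / total if total > 0.0 else 0.0 for x in row])
    return normed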
Running the unfolding
The unfolding will only run if you have already generated the unfolding inputs (in the previous step) and set the correct path in unfold_config for where they are stored. When testing, it is advisable to turn off the PDF systematics (as these add hundreds of new systematic variations and will increase your run time to minutes, rather than seconds).
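A minimal sketch of the path setting (the variable name here is a hypothetical stand-in; check unfold_config.py for the real one):

# In unfold_config.py:
INPUT_DIR = "/path/to/unfolding/inputs"  # where generate_unfolding_input.py wrote its output

With the path set, run the unfolding like so: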
> python unfold.py
This code will unfold all of your reco spectra (be they data or MC) and store them in a ROOT file in the Plots directory. The final script you need to run takes these unfolded spectra and compares them to the relevant particle-level spectra, thus calculating your results/systematics. It is run similarly to the unfolding:
> python unfold_results.py
Generating pseudo data
Many scripts require pseudo experiments and, to avoid making these on the fly each time a script is run, we generate them separately and simply reuse them as needed. Pseudo experiments are generated with statistically independent testing and training samples (called even and odd in the code; see the sketch below). The script that does this can take quite a lot of time, so it is recommended to run only 100 experiments and only one observable at a time. The experiments are generated using:
> python generate_pseudo_experiments.py do_particle do_parton obs: dphi_ll_vs_ttbar_m
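The even/odd split can be pictured with a short PyROOT sketch (the file, tree, and branch names are assumptions, and this shows the idea rather than the actual implementation):

import ROOT

f_in = ROOT.TFile.Open("signal.root")  # hypothetical input NTuple
tree = f_in.Get("nominal")

f_out = ROOT.TFile("split.root", "RECREATE")  # CopyTree needs an open output file
# Split by event-number parity into statistically independent halves: one half
# trains the response matrix, the other is treated as pseudo data.
even = tree.CopyTree("eventNumber % 2 == 0")
odd = tree.CopyTree("eventNumber % 2 == 1")
even.Write("even")
odd.Write("odd")
f_out.Close()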
If you don't specify do_parton or do_particle, the code will return a warning.
Pull tests
There are currently two ways to run the pull tests: by using independent testing and training samples for each pseudo experiment, or by taking a single pseudo experiment and Poisson-fluctuating the bin contents. The latter is referred to as the "simple" option and is the one we recommend for the time being. Both scripts run in the same way:
> python do_pull_tests(_simple).py do_particle do_parton obs: dphi_ll_vs_ttbar_m
If you don't specify do_parton or do_particle, the code will return a warning.
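The "simple" Poisson fluctuation can be sketched as follows (illustrative, not the actual implementation):

import ROOT

rng = ROOT.TRandom3(12345)  # fixed seed for reproducibility

def poisson_fluctuate(hist):
    # Return a copy of hist with each bin content replaced by a Poisson draw
    # around the original content, mimicking an independent pseudo experiment.
    fluct = hist.Clone(hist.GetName() + "_fluct")
    for b in range(1, hist.GetNbinsX() + 1):
        fluct.SetBinContent(b, rng.Poisson(hist.GetBinContent(b)))
    return fluct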
A note on systematics
If you look in unfold_config.py you will see many lists of systematic names. Some are used for the unfolding input generation (those starting with SYSTEMATICS_) and some are used for the unfolding itself (LIST_OF_SYSTEMATICS). The reason for this difference is that some of the systematics don't require special inputs from generate_unfolding_input. For example, background cross-section shifts simply use the nominal background estimations, scaled up or down by a certain amount, so there is no need to include them when generating the unfolding inputs.
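Such a shift can be applied entirely at the unfolding stage; a minimal sketch (the 10% shift and the stand-in histogram are illustrative assumptions, not analysis values):

import ROOT

# Stand-in nominal background histogram; in practice this comes from the
# generate_unfolding_input output.
bkg_nominal = ROOT.TH1F("bkg_nominal", ";observable;Events", 10, 0.0, 100.0)

bkg_up = bkg_nominal.Clone("bkg_xsec_up")
bkg_up.Scale(1.10)    # +10% cross-section shift (illustrative)
bkg_down = bkg_nominal.Clone("bkg_xsec_down")
bkg_down.Scale(0.90)  # -10% cross-section shift (illustrative)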
Other useful scripts
Scripts used to perform various studies haven't been refactored for 2016 yet (you will notice they look a little messier and don't rely on the same helper python scripts). We are working to update them as we test the 2016 data.