Setup Instructions

Requirements

You need access to ROOT, RootCore, Python and YAML (PyYAML).

This has so far only been tested on lxplus, but it should work on any machine with access to the above.

First Time Setup

Code repository currently on svn:

https://svnweb.cern.ch/cern/wsvn/atlasinst/Institutes/Dresden/Tau/HTauTauHHFramework/

Checkout with:

svn co svn+ssh://svn.cern.ch/reps/atlasinst/Institutes/Dresden/Tau/HTauTauHHFramework HZPtautau/

You may need permission to access the repository. (Note: the code may switch to git at some point.)

One part of the setup script (when it runs ELCore/fetch_externals.sh) already uses git. So before starting, do:

setupATLAS

lsetup git

Then, run the setup script:

cd HZPtautau

source setup.sh

Compile with RootCore. Note that the second command takes a while to run:

rc find_packages

rc compile

And that should be it.

Before running, you also need to set up the PyYAML package and, if making new productions on the grid, panda:

lsetup "sft releases/LCG_87/pyyaml/3.11"

lsetup panda
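
A quick way to confirm that the environment is complete is to check for the requirements from Python. The following is only a minimal sketch, written for Python 3; the function name is made up for illustration, and it assumes that ROOT and PyYAML are importable as "ROOT" and "yaml" and that RootCore and git are reachable as the commands "rc" and "git":

```python
import importlib.util
import shutil


def missing_requirements(modules=("ROOT", "yaml"), commands=("rc", "git")):
    """Return a list describing every module or command that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(name) is None:
            missing.append("python module: " + name)
    for name in commands:
        # shutil.which returns None when the command is not on PATH
        if shutil.which(name) is None:
            missing.append("command: " + name)
    return missing


# Print one line per missing requirement; prints nothing if all are found.
for item in missing_requirements():
    print("missing ->", item)
```

If anything is reported as missing, repeat the corresponding setup step above before compiling or running.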

Setup

After the first time, you can set up the environment using the rcSetup script (which calls setupATLAS):

cd HZPtautau

source rcSetup.sh

lsetup "sft releases/LCG_87/pyyaml/3.11"

lsetup panda

Recompiling

If you need to recompile everything, set up the area as above and then do:

rc clean

rc find_packages

rc compile

If working on the C++ packages in PlottingTools, you can recompile just those:

cd PlottingTools

make

Creating ntuple Productions

The code for creating new ntuples is in HZPtautau/HtautauAnalysis. See the Root folder for the source code and the HtautauAnalysis folder for the headers. The run folder contains a Python script, run.py, which starts the analysis. For making productions you will want to run on the grid, so do "lsetup panda" first.

The best way to run the code is to use the Makefile, which has targets that call run.py. To create a particular sample, do:

make name

where name is one of the data or MC samples. See the list in the Makefile. For example,

make DYtautau

will create ntuples for all of the DYtautau MC samples.

More coming soon...

Plotting Code

The scripts for making plots from the ntuples are in HZPtautau/HtautauAnalysis/macros.

plot_quick.py is a wrapper script for QuickPlotter.py. The latter calls code from HZPtautau/PlottingTools.

QuickPlotter has an "init" function, which calls the other functions in the code; look there to see the order in which everything runs. It has functions to get the cross-section weights, configure the merging of the data files, define cuts, get the histograms, etc. At the end it runs "start_plotting", which takes the histograms for each signal/background process and creates the final THStack plots.

This part requires the ntuples to already exist. Instructions on creating ntuple productions are in the previous section. However, these are usually produced by one person and shared, so check where the latest production is stored.

Making Plots

To run the plotting:

python plot_quick.py /path/to/input/files/ --dist distname --region regionname

"--dist" sets which variable to plot, e.g. leadtaupt, deltaphi or mt0. For a full list, see the file HZPtautau/HtautauAnalysis/macros/configs/HiggsPlotConfig.yml. This sets the options for plotting each distribution, for example leadtaupt:

leadtaupt:
 dist : tau_pt[0]/1000.
 xTitle : "leading #tau p_{T}"
 bins : 25
 binsBtag : 10
 xmin : 0
 xmax : 500
 rebin : 1
 unit : GeV
 two_col_leg : True

"dist" is the expression for the variable in the ntuples; in this case [0] selects the leading tau. "leadtaupt" is the name you use when running the plotting script. You can also make combinations of variables in the ntuples; see the YAML file for examples. If the variable is not in this config file, you will have to create a new entry for it.

There is also a "cut" entry, which lets you apply a cut to one particular plot only. It needs to be in the form of a TCut string.
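
To make the meaning of the config fields more concrete, here is a rough Python sketch of how an entry could be turned into a histogram request. It is not taken from the framework: the dict is what yaml.safe_load would return for the leadtaupt entry, the function names are hypothetical, the "expr>>name(nbins,xmin,xmax)" string is the standard TTree::Draw syntax rather than necessarily what HHistGetter does, and the use of binsBtag for the b-tag category is an assumption based on its name.

```python
# The leadtaupt entry, written as the dict that yaml.safe_load would return.
leadtaupt = {
    "dist": "tau_pt[0]/1000.",
    "xTitle": "leading #tau p_{T}",
    "bins": 25,
    "binsBtag": 10,
    "xmin": 0,
    "xmax": 500,
    "rebin": 1,
    "unit": "GeV",
    "two_col_leg": True,
}


def draw_expression(entry, hist_name="h", btag=False):
    """Build a TTree::Draw-style 'expr>>name(nbins,xmin,xmax)' string."""
    # Assumption: binsBtag replaces bins when plotting the b-tag category.
    nbins = entry["binsBtag"] if btag else entry["bins"]
    return "{0}>>{1}({2},{3},{4})".format(
        entry["dist"], hist_name, nbins, entry["xmin"], entry["xmax"])


def axis_title(entry):
    """Return the x-axis title, with the unit in square brackets if given."""
    unit = entry.get("unit")
    return entry["xTitle"] + (" [{0}]".format(unit) if unit else "")


print(draw_expression(leadtaupt))
print(axis_title(leadtaupt))
```

The division by 1000. in "dist" is what converts the ntuple value from MeV to the GeV shown on the axis, which is why xmin and xmax are given in GeV.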

"--region" is setting signal/control region, b-tag/b-veto etc. Options are:

Region Name    Meaning
OS_SR          Opposite Sign Signal Region
OS_CR          Opposite Sign Control Region
SS_SR          Same Sign Signal Region
SS_CR          Same Sign Control Region

Each of these can be suffixed with "_BTAG" or "_BVETO", e.g. OS_SR_BVETO. For the inclusive category, use the name as given in the table.
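
The naming rule above gives twelve possible region names in total. As an illustration only (the helper names are hypothetical and not part of the framework), they can be enumerated and checked like this:

```python
BASE_REGIONS = ["OS_SR", "OS_CR", "SS_SR", "SS_CR"]
BTAG_SUFFIXES = ["", "_BTAG", "_BVETO"]  # "" is the inclusive category


def all_region_names():
    """Return every base region combined with every b-tag suffix."""
    return [base + suffix for base in BASE_REGIONS for suffix in BTAG_SUFFIXES]


def is_valid_region(name):
    """True if name follows the convention described in the text."""
    return name in all_region_names()


print(all_region_names())
print(is_valid_region("OS_SR_BVETO"))
```

A check like this can catch a mistyped region name before a long plotting job is launched.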

Making 2D Plots

The plotting scripts can also create 2D colour plots. For this, give the two variables you want to plot, separated by a ":", to the --dist option, e.g.:

python plot_quick.py /path/to/input/files/ --dist distname_x:distname_y --region regionname

Note that systematics are not calculated when making 2D plots.
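
For orientation, the command line used in this section could be parsed along the following lines. This is a sketch with argparse and not the actual contents of plot_quick.py: the function names are hypothetical, and treating both options as required is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse a plot_quick.py-like command line (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the plotting command line")
    parser.add_argument("input_dir", help="directory holding the ntuples")
    parser.add_argument("--dist", required=True,
                        help="distribution name, or 'x:y' for a 2D plot")
    parser.add_argument("--region", required=True,
                        help="region name, e.g. OS_SR or OS_SR_BVETO")
    return parser.parse_args(argv)


def split_dist(dist):
    """Return (x, None) for a 1D plot or (x, y) for an 'x:y' 2D plot."""
    parts = dist.split(":")
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    raise ValueError("expected 'name' or 'name_x:name_y', got %r" % dist)


args = parse_args(["/path/to/input/files/", "--dist", "leadtaupt:mt0", "--region", "OS_SR"])
print(split_dist(args.dist))
```

Splitting on ":" is enough here because the names given to --dist are config keys such as leadtaupt, not the ntuple expressions themselves.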

Useful Scripts

There are some useful bash scripts in the macros folder that make several plots at once: make_higgs_plots.sh and make_2d_plots.sh. These use the GNU parallel program to launch multiple plot_quick.py jobs at once:

https://www.gnu.org/software/parallel/

Parallel is not available on lxplus by default, but Dirk has compiled his own copy. The easiest way to use it is to make an alias in your .bashrc:

alias parallel='/afs/desy.de/user/d/dduschin/workspace/software/parallel/src/parallel'

You should also change the directory option to wherever you have the ntuples stored (in make_2d_plots.sh it is the DIR variable; in make_higgs_plots.sh it is set directly in the python command). The scripts loop over the list of variables and the list of regions that you give them; see the HISTS and REGIONS variables.

Once you have the correct directory and parallel set up, run with:

. make_higgs_plots.sh

Note that with the 2D plotting, every distribution in HISTS_X is plotted against every one in HISTS_Y. You can quickly end up with many plots, so try not to run too many at once.
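
To see how quickly the number of jobs grows, the loops in these scripts can be mimicked in Python. The real scripts are bash and use GNU parallel; the sketch below only builds the command lists, and its function names and example variable lists are chosen for illustration.

```python
import itertools


def plot_commands_1d(input_dir, hists, regions):
    """One plot_quick.py command per (distribution, region) pair."""
    return [["python", "plot_quick.py", input_dir, "--dist", h, "--region", r]
            for h, r in itertools.product(hists, regions)]


def plot_commands_2d(input_dir, hists_x, hists_y, regions):
    """One command per (x, y, region) triple; x and y are joined with ':'."""
    return [["python", "plot_quick.py", input_dir, "--dist", x + ":" + y, "--region", r]
            for x, y, r in itertools.product(hists_x, hists_y, regions)]


hists = ["leadtaupt", "deltaphi", "mt0"]
regions = ["OS_SR", "SS_SR"]
print(len(plot_commands_1d("/path/to/input/files/", hists, regions)))
print(len(plot_commands_2d("/path/to/input/files/", hists, hists, regions)))
```

With three variables and two regions this already gives 6 one-dimensional jobs but 18 two-dimensional ones, since the 2D count scales with the product of the HISTS_X and HISTS_Y lengths.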

PlottingTools

This folder contains BasePlotter.py. QuickPlotter.py calls this script at the end to make the THStack plots with "BP.plot", or the 2D plots with "BP.plotsimple2D". Usually this does not need to be changed.

The rest of PlottingTools is written in C++ and is also called by QuickPlotter.py; see the src folder. Again, these tools should not need editing. You should be able to see where each tool is called in QuickPlotter.

Two tools deal with creating the histograms: HFetcher.cxx and HHistGetter.cxx. QuickPlotter calls the getHists function in HFetcher, which is effectively a wrapper for the getHistograms function in HHistGetter. getHistograms uses the merge configuration, which maps strings for the process names (e.g. each signal model, each background, data) to lists of ints for the dataset IDs. For each file, it creates the TH1/TH2 object, retrieves which cuts to use, then makes a projection to draw the histogram. Afterwards it applies the correct scaling and merges the histograms for each process. It returns a "std::map<std::string, TH1F*>" that maps each process name to its merged histogram.
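
The scale-and-merge step can be pictured with a small Python sketch. This is not the C++ code from HHistGetter: histograms are represented as plain lists of bin contents, the dataset IDs and scale factors are invented for the example, and the function name is hypothetical.

```python
def merge_histograms(merge_config, hists_by_dsid, scale_by_dsid):
    """Scale each dataset's histogram and sum them per process.

    merge_config  : process name -> list of dataset IDs
    hists_by_dsid : dataset ID -> list of bin contents
    scale_by_dsid : dataset ID -> scale factor (1.0 if absent, e.g. for data)
    """
    merged = {}
    for process, dsids in merge_config.items():
        total = None
        for dsid in dsids:
            if dsid not in hists_by_dsid:
                continue  # no input file was found for this dataset ID
            scale = scale_by_dsid.get(dsid, 1.0)
            scaled = [scale * content for content in hists_by_dsid[dsid]]
            if total is None:
                total = scaled
            else:
                total = [a + b for a, b in zip(total, scaled)]
        if total is not None:
            merged[process] = total
    return merged


# Invented dataset IDs and numbers, for illustration only.
merge_config = {"DYtautau": [100001, 100002], "data": [0]}
hists = {100001: [1.0, 2.0], 100002: [3.0, 4.0], 0: [5.0, 6.0]}
scales = {100001: 2.0, 100002: 0.5}
print(merge_histograms(merge_config, hists, scales))
```

The returned dict plays the role of the map from process name to merged histogram described above.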

Cuts

Cuts on Ntuples

In general, cuts can be specified in QuickPlotter to be applied later on in HHistGetter. If cuts need to be added, it is best to put them into the prepareCuts function of QuickPlotter by appending to the "tCut" object. These cuts are applied to all of the data. Note that most of the cuts for the analysis are already applied when making the ntuples. Therefore, at this stage existing cuts can only be tightened, or new cuts defined.

Some examples:

tCut *= "deltaPhi > 2.7" 
tCut *= "Sum$(jet_isB == 1) >= 1"
tCut *= "mt0/1000. > 160. || MET_met/1000. > 95."
tCut *= "MET_met/1000. > 80." 
tCut *= "MET_SumET/1000. > 600. || MET_met/1000. > 60."
tCut *= "jet_pt[0]/1000. > 60"
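
It can help to see what string results from chaining cuts in this way. The sketch below is a plain-Python stand-in, not ROOT itself; it assumes that tCut is a ROOT TCut and that "*=" wraps both sides in parentheses and joins them with "*", which is the behaviour expected from TCut but should be checked against your ROOT version.

```python
class CutString:
    """Plain-Python stand-in for the string handling assumed for ROOT's TCut."""

    def __init__(self, expr=""):
        self.expr = expr

    def __imul__(self, other):
        if not other:
            return self  # multiplying by an empty cut changes nothing
        if not self.expr:
            self.expr = other  # first cut: no parentheses needed
        else:
            # Parenthesise both sides so that '||' inside a cut stays grouped
            self.expr = "(" + self.expr + ")*(" + other + ")"
        return self


tCut = CutString()
tCut *= "deltaPhi > 2.7"
tCut *= "mt0/1000. > 160. || MET_met/1000. > 95."
print(tCut.expr)
```

Because each side is parenthesised, a cut containing "||" acts as a single condition and does not leak into the cuts added before it.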

Conditional Cuts

Conditional cuts are set up using the configConditionalCuts function of QuickPlotter. This creates a vector of tuples of two strings. The first string in each tuple is a filename, or part of a filename. The second string is a TCut to apply. If a file's name contains the first string, the cut is applied to it. The cuts are then applied when creating the histograms in HHistGetter.cxx.
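
The matching rule can be written out in a few lines of Python. This is an illustration of the logic only, not the code in QuickPlotter or HHistGetter; the function name, the filenames and the pairing of patterns with cuts are all made up.

```python
def cuts_for_file(filename, conditional_cuts):
    """Return every cut whose pattern appears somewhere in the filename."""
    return [cut for pattern, cut in conditional_cuts if pattern in filename]


# Hypothetical (filename fragment, cut string) tuples.
conditional_cuts = [
    ("DYtautau", "MET_met/1000. > 80."),
    ("data", "jet_pt[0]/1000. > 60"),
]

print(cuts_for_file("/path/to/ntuples/DYtautau_sample.root", conditional_cuts))
print(cuts_for_file("/path/to/ntuples/ttbar_sample.root", conditional_cuts))
```

Since the pattern is matched against the whole string, a fragment that also appears in a directory name would match every file in that directory if full paths are used.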

Process Cuts

These work in a similar way to the conditional cuts, but are for a particular process name. The process names are those defined in configs/HiggsPlotOptions.yml. The configProcCuts function in QuickPlotter sets the vector of tuples of strings. If a process name contains the string in the first entry of a tuple, the cut defined in the second entry of that tuple is applied. This is currently used to apply the Z' cuts, as the same set of files is used multiple times with different cuts to select different signal masses.
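
As a rough picture of how one set of files can serve several signal masses, the following Python sketch combines a common cut with the cuts matched by process name. The process names, the "resonance_mass" variable and the function name are hypothetical, and joining with "&&" is just one way to combine TCut-style strings.

```python
def selection_for_process(process, base_cut, proc_cuts):
    """Combine the common cut with every cut whose key is contained in process."""
    parts = [base_cut] if base_cut else []
    parts += [cut for key, cut in proc_cuts if key in process]
    return "&&".join("(" + part + ")" for part in parts)


# Hypothetical (process name fragment, cut string) tuples.
proc_cuts = [
    ("Zprime500", "resonance_mass == 500"),
    ("Zprime1000", "resonance_mass == 1000"),
]

print(selection_for_process("Zprime500", "deltaPhi > 2.7", proc_cuts))
print(selection_for_process("Zprime1000", "deltaPhi > 2.7", proc_cuts))
```

Because the match is "contains" rather than "equals", a key such as "Zprime500" would also match a process called "Zprime5000", so the strings should be chosen so that they do not overlap.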

-- AdamBailey - 2017-04-06

Topic revision: r6 - 2017-11-07 - AdamBailey
 