CMG W mass Analysis Code

The code to run the W mass analysis

The CERN Analysis Framework is based on the so-called CMG Tools.

The git repository containing the CMG Tools can be found here: git repository. See the general twiki page for technical information about it. The current code is based on CMSSW_8_0_25 for the analysis of the Moriond17 dataset.

How to get the CMG Tools Analysis code

The analysis code for the 8 TeV analysis on the legacy re-reco runs in the CMSSW_5_3_22_patch1 release. For the time being the git branch is this private one; it will be merged later into the official cmg-cmssw repository.

Git MiniAOD release for the 8 TeV analysis (CMSSW_5_3_22_patch1): work in progress, under construction

Installation instructions:

cmsrel CMSSW_5_3_22_patch1
cd CMSSW_5_3_22_patch1/src
cmsenv

# create empty repository
git cms-init

# add the central CMG repository, and fetch it
git remote add cmg-central https://github.com/CERN-PH-CMG/cmg-cmssw.git
git fetch cmg-central

# add emanuele's CMG repository, and fetch it
git remote add cmg-emanuele git@github.com:emanueledimarco/cmg-cmssw.git
git fetch cmg-emanuele

# add your mirror (see https://twiki.cern.ch/twiki/bin/viewauth/CMS/CMGToolsGitMigration#Prerequisites )
git remote add origin git@github.com:YOUR_GITHUB_REPOSITORY/cmg-cmssw.git

# configure the sparse checkout
git config core.sparsecheckout true
cp /afs/cern.ch/work/e/emanuele/public/wmass/CMG_PAT_V5_18_from-CMSSW_5_3_14.sparse-checkout  .git/info/sparse-checkout 

# checkout the release, make a branch for it, and push it to your CMG repository
git checkout cmg-emanuele/wmass_53X
git checkout -b wmass_53X
git push -u origin wmass_53X

# Set up the LHAPDF code and the access to the MagneticField
# LHA PDF
scram setup lhapdffull

# Magnetic field 
cp /afs/cern.ch/work/e/emanuele/public/wmass/TTHAnalysis.BuildFile.xml CMGTools/TTHAnalysis/BuildFile.xml 

#compile
scram b -j 8
You can then merge or rebase the branch containing your existing developments on top of this release.

  • Compile a few files needed by python and root
cd CMGTools/TTHAnalysis/python/plotter
root.exe -b -l -q smearer.cc++ mcCorrections.cc++
root.exe -b -l -q functions.cc++
root.exe -b -l -q fakeRate.cc++
echo '.L TH1Keys.cc++' | root.exe -b -l

(the last one requires you to have gSystem->SetIncludePath("-I$ROOFITSYS/include"); in your rootlogon; if you don't have it, open root, run the SetIncludePath command and then .L TH1Keys.cc++. Also, don't worry if at the end it complains that "TH1Keys()" does not exist; that's fine)

Description step 1: from CMGTuples to flat trees with the WMass python package for the electron channel

Standard configuration

The standard configuration file to run the WMass python code and produce the trees for the electron final state is based on many analyzers in TTHAnalysis and is the following: run_wmass_ele_cfg.py

Description:

  • the cfg runs a cfg.Sequence of analyzers. The main ones are defined in CMGTools/TTHAnalysis/python/analyzers/susyCore_modules_cff.py, and the parameters of the relevant analyzers are customised in the cfg (see the schematic sketch after this list)
    • skimAnalyzer: it runs CMGTools/RootTools/python/skimAnalyzerCount.py to store in SkimReport.txt (see next section) the number of events that have been processed
    • jsonAna: it runs CMGTools/RootTools/python/JSONAnalyzer.py to apply the JSON-file skim for data (NB: json-files for each dataset are defined here: to-be-filled with the first 13 TeV data JSONS)
    • triggerAna: it runs CMGTools/RootTools/python/analyzers/triggerBitFilter.py to apply the trigger filters
    • pileUpAna: it runs CMGTools/RootTools/python/analyzers/PileUpAnalyzer.py to compute the pile-up reweighting and store the weights in the trees
    • vertexAna: it runs CMGTools/RootTools/python/analyzers/VertexAnalyzer.py to store the vertex information in the trees
    • lepAna: it applies lepton cross cleaning, muon ghost cleaning, basic and advanced electron and muon selections, energy/momentum calibration/regression/smearing, and stores the lepton information in the trees
    • jetAna: it applies jet selections, cleans the jet collections from the leptons, and can apply JECs on the fly with a different GT than the one of the miniAOD
    • metAna: it computes several METs (standard PF MET, metNoMu, metNoEle, metNoPU)
    • treeProducer: it runs CMGTools/MonoXAnalysis/python/analyzers/treeProducerDarkMatterMonoJet.py, which produces the flat tree with the final variables
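
A schematic illustration of this customisation pattern is the following. It is not copied from the actual cfg: the parameter names and values are purely indicative, so check run_wmass_ele_cfg.py and susyCore_modules_cff.py for the real ones.

# schematic sketch only, not the actual run_wmass_ele_cfg.py
from CMGTools.TTHAnalysis.analyzers.susyCore_modules_cff import *

# override some parameters of the imported analyzers (names/values are illustrative only)
lepAna.loose_muon_pt     = 10.0
lepAna.loose_electron_pt = 10.0

# the cfg then collects the (customised) analyzers into the sequence that is run,
# ending with the tree producer, e.g.
# sequence = cfg.Sequence(susyCoreSequence + [treeProducer])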

Trees description

Content of the directories:

  • The corresponding .root file of the tree can be found here: TREES_XXX/Sample/treeProducerWMassEle/treeProducerWMassEle_tree.root
  • In each sample directory there is also a lot of other stuff: the most useful item is the TREES_XXX/Sample/skimAnalyzerCount/SkimReport.txt file, which reports the number of events on which the tree production ran, useful for the event weight
  • Additional directories such as mjvars may exist, containing sets of friend trees that will be described in the next section

Weights and filters: the trees or friend trees already contain some useful event weights (a sketch of how they are combined into a per-event weight is given after this list):

  • events_ntot is the sum(genWeights) in the case of weighted samples (like madgraph ones, which have positive and negative weights) or the sum of processed events in the other cases
  • puWeight: an estimate of the pileup weight, computed with just the number of vertices after a loose selection of Z→μμ events in 2.11/fb
  • triggers: no triggers are required for MC, while the OR of the monojet triggers is required to make the data trees on the MET dataset. For the DoubleMuon / DoubleEG datasets the OR of the DoubleMu / DoubleEG triggers is required. The single bits can be required by using the (boolean) variables in the trees
  • JSON: the latest golden JSON file is used (see below)
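
A minimal sketch of how these quantities are typically combined into a per-event MC weight (the function and the numbers are illustrative, not part of the framework; the cross section comes from the mca file described later, the luminosity from the -l option):

# minimal sketch, not part of the framework: per-event MC weight from the stored quantities
def mc_event_weight(genWeight, puWeight, events_ntot, xsec_pb, lumi_fb):
    """lumi * xsec * genWeight / sum(genWeights), times the pileup weight."""
    lumi_pb = 1000.0 * lumi_fb                      # convert 1/fb to 1/pb
    return lumi_pb * xsec_pb * genWeight / events_ntot * puWeight

# e.g. a madgraph event with genWeight = -1, in 2.11/fb, for a 100 pb sample:
# w = mc_event_weight(-1.0, 1.05, 5.0e7, 100.0, 2.11)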

Contents of the trees: the main trees are flat trees with arrays of variables. They contain the main information about the event (timestamps, global event quantities like the number of reconstructed vertices, rho, etc.) and about collections of physics objects. Each "collection" has one integer per event with the size of the collection (e.g. the number of loosely identified leptons in the event, denoted by "nLepGood"), and the associated variables, always called "<collection>_<variable>" (e.g. "LepGood_pt"), are arrays of floats with length given by that counter. The main collections are (very brief description; a minimal reading example is sketched after this list):

  • LepGood: the leptons with a very loose identification attached (muons/electrons together, with pt>5/7 GeV and loose isolation applied). These should be good for 90% of the analyses. Note: electrons and muons can be distinguished by the LepGood_pdgId variable (11 or 13, with the sign encoding the reco charge)
  • LepOther: if you want all the reconstructed leptons, these are the ones failing the criteria for the LepGood ones
  • Jet: ak5 (53X version) or ak4 (80X version). They are the CHS jets with the latest JECs for 53X and for 80X on 2016 data (Sep 2016 re-reco, and prompt-reco for the following runs). Kinematic cuts are: pT>25 GeV and |eta|<2.4.
  • JetFwd: same as above, but for jets with 2.4<|eta|<4.7
  • met / tkmet / metraw: PF MET (with type-1 corrections) / track MET / raw MET (each is a dummy collection with just 1 element)
  • GenP6StatusThree: collection of the generator particles with pythia status 3 (hard-process/documentation particles). The variables "motherId" and "grandmaId" (53X) give the PDG ID of the mother and grandmother of the particle, to partially reconstruct the decay tree. The 80X version has the index of the mother, to navigate the decay tree backwards.
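
A minimal PyROOT sketch of how the flat trees can be read directly (the tree name inside the file is assumed here; check it with TFile::ls(), since it may differ between the 53X and 80X productions):

# minimal PyROOT sketch; file path and tree name ("treeProducerWMassEle") are assumed
import ROOT

f = ROOT.TFile.Open("treeProducerWMassEle_tree.root")
t = f.Get("treeProducerWMassEle")        # check the actual name with f.ls()
for i, ev in enumerate(t):
    if i >= 5:
        break
    # each collection X comes with a counter nX and array branches X_<var>
    for l in range(ev.nLepGood):
        print(ev.LepGood_pdgId[l], ev.LepGood_pt[l], ev.LepGood_eta[l])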

Tree location

Depending on the skim, the trees and friend trees range from 10 to 100 GB, and are located either on EOS or on a local disk. The public AFS dir contains the directory structure; in place of each root file there is an xxx.root.url file with the link to the real file on EOS (a small helper sketch to resolve these links is given after the table).

| skim | public AFS dir | EOS at CERN | pccmsrm29 |
| >= 1e or 1μ loose | /afs/cern.ch/work/e/emanuele/TREES/MCTrees_1LEP_80X_V1 | /eos/cms/store/cmst3/group/susy/emanuele/wmass/trees/MCTrees_1LEP_80X_V1/ | /u2/emanuele/TREES_1LEP_80X_V1 |
| >= 1e or 1μ loose | /afs/cern.ch/work/e/emanuele/TREES/TREES_1LEP_53X_V1 | /eos/cms/store/cmst3/group/susy/emanuele/wmass/trees/TREES_1LEP_53X_V1/ | /u2/emanuele/TREES_1LEP_53X_V1 |
| >= 1e or 1μ tight | /afs/cern.ch/work/e/emanuele/TREES/TREES_1LEP_53X_V2 | /eos/cms/store/cmst3/group/susy/emanuele/wmass/trees/TREES_1LEP_53X_V2/ | /u2/emanuele/TREES_1LEP_53X_V2 |
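
A small helper sketch to resolve the .root.url placeholders, under the assumption that each .url file contains a single line with the location of the real file on EOS:

# minimal sketch, assuming each xxx.root.url file holds one line with the real (EOS) location
def resolve_url(path):
    """Return the real file location if 'path' is a .url placeholder, else 'path' itself."""
    if path.endswith(".url"):
        with open(path) as f:
            return f.read().strip()
    return path

# e.g. resolve_url("Sample/treeProducerWMassEle/treeProducerWMassEle_tree.root.url")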

Configuration for 53X trees

| condition | V1 | V2 |
| skim | at least 1 lepton loosely identified (mu: tight ID and relIso<0.5; electron: MVA non-triggering ID applied) | at least 1 lepton identified (mu: tight ID and relIso<1; electron: MVA triggering ID applied) |
| electron corrections | enabled, final Run1 | enabled, final Run1 |
| muon corrections | no | no |
| JECs | final Run1 ones | final Run1 ones |

  • The data is Run2012 22Jan2013 legacy re-reco for RunA-D with electron corrections applied: regression + residual scale correction (no muon corrections applied)
  • The MC is Summer12 53X_DR MC with electron corrections enabled: regression + smearing (no muon corrections applied)
  • Golden JSON for 2012 is required


Configuration for 80X_V1 trees

| condition | value |
| skim | at least 1 lepton loosely identified (mu: tight ID and relIso<0.5; electron: HLT-safe ID applied) |
| electron corrections | enabled, Winter16 |
| muon corrections | KaMuCa V4 |
| JECs | Spring16_25nsV6 |

  • The data is Run2016 23Sep16 for RunB-G and prompt reconstruction for RunH.
  • The MC is Spring16 MC with lepton corrections enabled
  • Golden JSON for 2016 is required

How to produce the trees running the configuration files

  • for testing the cfg (it runs only on 1000 events of one component; output dir Trash)
# set test = 1 in the cfg
multiloop Trash run_wmass_ele_cfg.py -N 1000 -f

It creates a Trash directory with one tree-directory named after the component which has run (the component is chosen in the cfg file when test = 1). It runs over 1000 events.

  • for running on all the Monte Carlo samples (it runs on all selectedComponents; output dir Output)
# set test = 0 in the cfg
pybatch.py -o Output run_wmass_ele_cfg.py 'bsub -q 8nh -J Prod < batchScript.sh'

It creates an Output directory which contains several tree-directories. The number of tree-directories for each component depends on the component splitFactor, which is defined in samples_8TeV.py. At this point two scripts are available to check that all chunks terminated correctly and to hadd the root files (together with all the cut-flow counters and averages):

  • deep check the output trees, and print the command to resubmit the failed chunks
cd Output
chunkOutCheck.py * 

  • hadd the output files, and move the chunks outside the output directory, to be eventually removed
haddChunks.py -c -r Output
Before using this script, if you fear you might lack space on AFS to store the merged trees (the hadd command does not remove the chunks), you should log into a local pc at CERN, for instance ssh pccmsrmXYZ, and create symbolic links to the directories on AFS using the linkChunks.sh script (this one: linkChunks.sh). If you create a MCTrees directory on the local pc, go inside it and then use:
linkChunks.sh /afs/<Path_to_MCTrees_on_AFS>/MCTrees
Now you can safely use haddChunks.py (provided you have enough space on the local pc). Remember that before hadding files there must be no empty directories: if there are failed chunks you didn't want to resubmit, remove them.

WARNING: once you have merged all the trees, make sure that all of them are present. Samples that were not split into chunks are left untouched by haddChunks.py, so at the end only the link to the file on AFS will be present on the local pc. If this is the case, remember to copy these files from AFS to the local pc before removing all the directories on AFS.

  • if the trees are too large to stay on AFS, it is preferable to copy them to EOS and then leave only the directory structure and the links to the files in place of the root files (e.g. to make friend trees using lxbatch). To do this, first run the archival on EOS: go just outside the directory whose content you want to copy and use the following command
$CMSSW_BASE/src/CMGTools/WMass/scripts/archiveTreesOnEOS.py -t treeProducerWMassEle -T treeProducerWMassEle_tree.root  <dir_to_copy>/ /eos/cms/<PATH_TO_EOS>/<destination_dir>
This will copy all the files inside dir_to_copy into destination_dir. You will be shown which files will be copied and their destination path, and the script will ask for confirmation before proceeding.

For example, the following command

$CMSSW_BASE/src/CMGTools/WMass/scripts/archiveTreesOnEOS.py -t trees /eos/cms/store/cmst3/group/susy/emanuele/wmass/trees/TREES_1LEP_53X_V1
will copy the content of TREES_1LEP_53X_V1 inside trees.

  • then copy the structure, including the .url files, to AFS, but exclude the root files from the copy. You can choose whatever path you like on AFS (it might be your public area, so that everyone can use your trees):
rsync -av --exclude '*.root' <LOCALDIR_WITH_TREES> <username>@lxplus:<PATH_ON_AFS>
In the above command, the AFS path points to a directory inside which the structure of the local directory will be copied. It is good practice, although not necessary, to give this AFS directory the same name as the local one, just to make it easier to remember what it is.

Adding friend trees

Friend trees are a ROOT technique to add a variable to an existing tree by creating a second tree that contains just the value of that variable for each entry of the main tree. We use friend trees to add variables that are too complex to compute on the fly from the flat tree (e.g. because they require looping on the objects in an event), or that are still in development and therefore not yet in the final trees.
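
A minimal PyROOT illustration of the mechanism itself (file, tree and branch names here are hypothetical; the plotting scripts described below attach friends automatically via the -F/--FM options):

# minimal PyROOT sketch of the friend-tree mechanism; names are hypothetical
import ROOT

f = ROOT.TFile.Open("treeProducerWMassEle_tree.root")
t = f.Get("treeProducerWMassEle")                        # main tree (name assumed)
t.AddFriend("mjvars/t", "evVarFriend_SampleName.root")   # attach the friend tree
t.Draw("someFriendVariable")                             # friend branches behave like native ones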

For convenience, the method we use to create friend trees is to have small python classes that compute the values of the friend-tree variables, plus two main python scripts that take care of running those classes on all the trees, of the book-keeping, and so on.

For the final kinematic variables we use a driver script that runs both on data and on MC, macros/prepareEventVariablesFriendTree.py. Example python classes that we use for this are:

  • Kinematic variables of dilepton events, such as the delta phi between the two leading jets (in case 2 jets are present): python/tools/eventVars_wmass.py (a schematic sketch of such a module is given below)
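
A schematic sketch of such a module is the following. It is modelled on the pattern of the existing classes (a listBranches method declaring the new branches and a __call__ method returning their values per event); the class, branch and variable names here are hypothetical, so check eventVars_wmass.py for the exact interface expected by prepareEventVariablesFriendTree.py.

# schematic sketch of a friend-tree module; check eventVars_wmass.py for the real interface
import ROOT

class ExampleEventVars:
    def listBranches(self):
        # declare the friend-tree branches filled by this module (names are hypothetical)
        return ["myDPhiJJ"]
    def __call__(self, event):
        # compute the new variables from the main-tree event and return them as a dict
        dphi = -999.0
        if event.nJet >= 2:
            dphi = abs(ROOT.TVector2.Phi_mpi_pi(event.Jet_phi[0] - event.Jet_phi[1]))
        return {"myDPhiJJ": dphi}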

The corresponding directories of friend trees can be found here for the latest set of trees:

  • XXXXXXX/FRIENDS_EVENTVARS

NB: additional per-event weights have been computed during the original tree production and are stored in the main trees, such as: puWeight (weight for pile-up reweighting), LepEff_2lep (weight for the lepton preselection data/MC SF in the di-lepton final state), Eff_3lep (weight for the lepton preselection data/MC SF in the three-lepton final state). How to apply event weights is discussed in the following.

An example of how to run the prepareEventVariablesFriendTree.py script, that will list the bsub commands needed to produce the friend trees for each sample, is given here:

mkdir {path to old trees}/TREES_XXX/FRIENDS_EVENTVARS
python prepareEventVariablesFriendTree.py -q 8nh -N 25000 {path to old trees}/TREES_XXX {path to old trees}/TREES_XXX/FRIENDS_EVENTVARS
Here {path to old trees}/TREES_XXX is the global path (e.g. /afs/.../user/.../directory_with_samples) to the directory where the folders with the trees are stored, while FRIENDS_EVENTVARS is the name of the directory where the friends will be stored (N.B.: you MUST CREATE this directory; the code won't do it by itself).

You might redirect the output of the previous command into a script file to submit the jobs. This output also contains some information (like the number of chunks created for each sample) that you should remove from the script before using it. The remaining part consists of the actual job-submission commands, which look like:

bsub -q <queue> $CMSSW_BASE/src/CMGTools/WMass/macros/lxbatch_runner.sh $CMSSW_BASE/src/CMGTools/WMass/macros $CMSSW_BASE python prepareEventVariablesFriendTree.py -N 25000 -T 'mjvars'  {path to old trees}/TREES_XXX/ {path to old trees}/TREES_XXX/FRIENDS_EVENTVARS/ --vector  -d <sample_name> -c <number>
where the -c option followed by a number identifies a specific chunk. It is highly recommended to perform a local test before using the queues: to do so, select one command line and copy only the part starting from "python prepareEventVariablesFriendTree.py [...]".

If you used the rsync command to create the directory structure on AFS (see above), you can set {path to old trees}/TREES_XXX to that path. Then, after the friends are created, you can either leave them on AFS (they should not be very big files) or copy them manually to EOS.

The "-N 25000" option (without inverted commas) splits the job into 25k events/job. Same command, but using the prepareScaleFactorsFriendTree.py script, can be used to produce sfFriend trees (friend trees with scale factors to be used in MC). Both scripts can be found in the macros directory (N.B.: the command MUST be launched from inside macros).

  • To check that all the chunks ran correctly, go inside the directory containing the friend root files and run the script
scripts/friendChunkCheck.sh -z <prefix>
where prefix is evVarFriend or sfFriend. The -z option is optional but useful because it tests for the presence of zombie files.
  • To merge the chunks, run the script (from the same directory as above)
TTHAnalysis/macros/leptons/friendChunkAdd.sh <prefix>

  • To copy the friends manually to a directory on EOS you can do the following: go inside the directory where the friends are stored (the chunks will likely be there as well) and use these commands.

files=`ls | grep -v chunk`
for file in $files; do cmsStage -f $file <eos_path_to_dir_with_rootfiles>; done
cmsStage is an old command to copy files to EOS (you could use eos cp instead). The path to EOS must start with /store when using cmsStage. The -f option forces overwriting of already existing files (so be careful if you have two files with the same name).

Description step 2: from trees to yields and plots

For the final steps of the analysis, such as computing yields, making plots or filling datacards, we use the python scripts in /python/plotter/.

Computing event yields and cut flows
The script to compute the yields is called mcAnalysis.py and takes as input:

  • a text file with the list of MC and data samples, e.g. as in mca.txt
    • the first column is the name you want to give to the sample (e.g. TTW); the data sample must be called "data", and samples derived from data by applying FR or similar should have "data" in their name.
      a plus sign at the end of the name means it's a signal.
    • the second column is the name of the dataset, i.e. the directory containing the trees. You can group multiple datasets into the same sample.
    • the third column, only for MC, is the cross section in pb (including any branching ratio and filter efficiencies, but no skim efficiency)
    • the fourth column, optional, is a cut to apply
    • then, after a semicolon, you can give labels, plot styles and normalization uncertainties, data/mc corrections, fake rates, ...
  • a text file with a list of cuts to apply, e.g. as in bins/3l_tight.txt
    • the first column is a name of the cut, to put in the tables
    • the second column is the actual cut (same syntax as TTree::Draw; you can also use some extra functions defined in functions.cc)
You normally have to specify some other options on the command line:
  • the name of the tree: --tree ttHLepTreeProducerTTH (which is the default, so you normally don't need it)
  • the path to the trees (e.g. -P /afs/cern.ch/work/g/gpetrucc/TREES_270314_HADD, or better your copy on a fast local disk)
  • --s2v which allows cut files to have the variables of the objects written as if they were scalars (e.g. LepGood1_pt) while the trees have variables saved as vectors (e.g. LepGood_pt[0]) (s2v stands for scalar to vector).
  • the luminosity, in fb-1 (e.g. -l 19.6)
  • the weight to apply to MC events (e.g. -W 'puWeight*LepEff_2lep' to apply the PU re-weight and the efficiency re-weight for the first two leptons)
Options to select or exclude samples:
  • -p selects one or more processes, separated by a comma; regular expressions are used (e.g. -p 'ttH,TT[WZ],TT' to select only signal and main backgrounds)
  • --xp excludes one or more processes, separated by a comma (e.g. --xp 'data' to blind the yields)
  • --sp selects which processes are to be presented as signal; if not specified, the ones with a "+" in the samples file are the signals; (e.g. use --sp WZ in a control region targeting WZ)
  • --xf excludes one dataset, (e.g. to skip the DoubleElectron and MuEG PD's do --xf 'DoubleEle.*,MuEG.*' ) ("f" is for "files")
Options to manipulate the cut list on the fly (can specify multiple times):
  • -X pattern removes the cut whose name contains the specified pattern (e.g. -X MVA will remove the 'lepMVA' cut in the example file bins/3l_tight.txt)
  • -I pattern inverts a cut
  • -R pattern newname newcut replaces the selected cut with a new cut, giving its name and expression (e.g. -R 2b 1b 'nBJetLoose25 == 1' will replace the request of two b-jets with a request of one b-jet)
  • -A pattern newname newcut adds a new cut after the selected one (use "entry point" as pattern to add the cut at the beginning).
    Newly added cuts are not visible to option -A, so if you want to add two cuts C1 C2 after a cut C0, just do -A C0 C1 'whatever' -A C0 C2 'whatevermore' and you'll get C0 C1 C2.
  • -U pattern reads the cut list only up to the selected cut, ignoring any one following it
  • --n-minus-one will present, instead of the cut flow, the total yields after all cuts and after sets of N-1 cuts.
  • pedantic note: The options are processed in this order A U I X R
Presentation options:
  • -f to get only the final yields, not the full cut flow (this also speeds up things, of course)
  • -G to not show the efficiencies
  • -e to show the uncertainties in the final yields ("e" for "errors")
  • -u to report unweighted MC yields (useful for debugging)
Other options:
  • -j to specify how many CPUs to use; 3-4 is usually ok for normal disks or AFS, and you can go up to 8 or so with good SSD disks.
Example output:
$ python mcAnalysis.py -P /data1/emanuele/monox/TREES_040515_MET200SKIM --s2v -j 6 -l 5.0 -G   mca-Phys14.txt --s2v  sr/monojet.txt   -F mjvars/t "/data1/emanuele/monox/TREES_040515_MET200SKIM/0_eventvars_mj_v1/evVarFriend_{cname}.root" 

     CUT           M10V        Top       GJets     DYJets      WJets      ZNuNu     ALL BKG
-------------------------------------------------------------------------------------------
entry point          2042      83584       3672      17163     206013      80738     391172
2j                   1089      13293       1993      11297     134831      55227     216643
pt110                1052      11172       1870      10704     127583      52315     203645
dphi jj            892.22       6905       1392       8971     105320      44903     167494
photon veto        892.22       6905       1392       8971     105320      44903     167494
lep veto           885.29       1414     677.67     576.30      33215      44533      80417
met250             596.73     440.60     223.33     166.42      10933      17822      29586
met300             408.85     177.69     100.41      59.35       4158       8064      12561
met400             206.50      43.75      20.90      13.75     852.12       2148       3079
met500             110.87      14.99       6.88       4.20     243.72     727.21     997.00

Making plots
The script to make the plots is called mcPlots.py. It takes the same two input text files as mcAnalysis.py, plus a third file to specify the plots (e.g. see standard-candles/zjet-plots.txt)

  • the first column is the plot name, which will also be the histogram name in the output rootfile and the filename for the png or pdf images
  • the second is the expression to plot (again, you can use the extra functions in functions.cc); if you have colons in the expression, e.g. to call TMath::Hypot(x,y) you should escape them with a backslash ( TMath\:\:Hypot)
  • the third column is the binning, either as nbins,xmin,xmax or as [x0,x1,...,xn].
  • then you have options like labels (XTitle, YTitle), location of the legend (TL for top-left, TR for top-right), axis ticks (NXDiv), log scale (Logy).
    For plots with uneven binnings, you can put "Density=True" in these options to have the bin values correspond to event densities rather than event counts (i.e. so that a uniform distribution gives a flat histogram whatever the binning is)
Besides all the options of mcAnalysis.py, you usually also want to specify:
  • --print=png,pdf to produce plots in png and pdf format (otherwise they're only saved in a rootfile)
  • --pdir some/path/to/plots to specify the directory where to print the files
  • -f you normally want this option, to produce the plots only after all the cuts; otherwise, additional sets of plots will also be produced at each step of the selection.
Other useful options
  • --sP to select which plots to make from the plot file instead of making all of them
  • -o to specify the output root file (normally produced in pdir, and called as the plot file if the option is not specified)
  • --rebin rebins all the plots by this factor (or a smaller one if needed to have it divide correctly the number of bins)
  • --showRatio adds a data/mc ratio
  • --showSigShape draws also an outline of the signal, normalized to the total MC yield; the signal is also included in the stack, unless the option --noStackSig is also given
  • --showSFitShape draws an outline of the "signal"+background in which the "signal" is scaled so that the total ("signal"+background) normalization matches the data; this is useful mainly in control regions, together with --sp to define what is the "signal"
  • --plotmode can be used to produce, instead of the stacked plots, non-stacked outlines normalized to the yield ( --plotmode=nostack) or normalized to unity ( --plotmode=norm)

Application of data/sim scale factors from friend trees

As described previously, while the trees already contain the reweighting factors for the base lepton selection, additional scale factors for the simulation are computed afterwards as friend trees. The main macro that computes the scale factors is macros/prepareScaleFactorsFriendTree.py, which uses classes defined under python/tools.

The trees with the scale factors are located in the friends directory within the main directory of the trees, and can be attached with the option
--FM sf/t /full/path/to/trees/friends/sfFriend_{cname}.root
where FM means 'friend for MC only', sf/t is the name of the directory and tree within the file, and sfFriend_{cname}.root is the pattern of the file name (the framework replaces {cname} with the name of the component, i.e. of the directory).

Currently, the following scale factors are provided; they can be added to the expression passed to the -W option:

  • SF_Lep{TightLoose,Tight}: scale factors for the lepton working points. SF_LepTightLoose is intended to be applied to the 2l control regions, while SF_LepTight to the 1l control regions (both e and μ)
  • SF_BTag: ND-provided reweighting for the CSV discriminator (4 pairs of systematic variations will be added later: SF_btagRwt_{JES,LF,Stats1,Stats2}{Up,Down})
  • SF_trig1lep: scale factor for the single-lepton trigger (both e and μ are present, but these have to be used for the 2e and 1e control samples only, since the 2μ and 1μ selections use the METNoMu triggers)
  • SF_trigmetnomu: scale factor for the METNoMu trigger. To be used for the signal selection and the 2μ and 1μ control regions
  • SF_NLO: a weight to apply NLO-LO k-factors depending on the pT of the W and Z. This means that the cross sections in the MCA files, like python/plotter/monojet/mca-74X-Vm.txt, have to be the LO ones
Note that the scale factors for lepton efficiencies are appropriate for samples that have prompt leptons, not for samples with fakes.

An example for the 2μ selection would be:

$ python mcAnalysis.py monojet/mca-74X-Vm.txt -P /data1/emanuele/monox/TREES_25ns_MET200SKIM_1DEC2015 --s2v -j 8 -l 2.215 -G monojet/zmumu_twiki.txt -F mjvars/t "/data1/emanuele/monox/TREES_25ns_MET200SKIM_1DEC2015/friends/evVarFriend_{cname}.root" --FM sf/t "/data1/emanuele/monox/TREES_25ns_MET200SKIM_1DEC2015/friends/sfFriend_{cname}.root" -W 'vtxWeight*SF_trigmetnomu*SF_LepTightLoose*SF_NLO' --sp DYJetsHT

An example for the 1e selection would be:

$ python mcAnalysis.py monojet/mca-74X-Ve.txt  -P /data1/emanuele/monox/TREES_25ns_1LEPSKIM_23NOV2015 --s2v -j 8 -l 2.215  -G   monojet/wenu_twiki.txt    -F mjvars/t "/data1/emanuele/monox/TREES_25ns_1LEPSKIM_23NOV2015/friends/evVarFriend_{cname}.root"   --FM sf/t "/data1/emanuele/monox/TREES_25ns_1LEPSKIM_23NOV2015/friends/sfFriend_{cname}.root"  -W 'vtxWeight*SF_trig1lep*SF_LepTight*SF_BTag*SF_NLO'  --sp WJetsHT 

The summary of the scale factors to be applied in the different selections is the following:

| selection | trigger | leptonID | btag | xsec |
| Z→μμ | SF_trigmetnomu | SF_LepTightLoose | SF_BTag | SF_NLO |
| W→μν | SF_trigmetnomu | SF_LepTight | SF_BTag | SF_NLO |
| Z→ee | SF_trig1lep | SF_LepTightLoose | SF_BTag | SF_NLO |
| W→eν | SF_trig1lep | SF_LepTight | SF_BTag | SF_NLO |
| signal | SF_trigmetnomu | 1.0 | SF_BTag | SF_NLO |

Producing the 'prefit' plots

The script at the basis of the plotting is mcPlots.py, which inherits from mcAnalysis.py. There is a helper script, analysis.py, which puts together the correct options for each region or need. The script can be run with the option "-d" or "--dry-run", which prints the base command, so one can modify options on the fly and then run it.

These are the commands used to produce the yields and the 'prefit' plots for all channels:

signal region plots:   vbfdm/analysis.py -r SR --pdir plots/SR/ -d
signal region yields:  vbfdm/analysis.py -r SR -d

The regions are: 'ZM' (Z to muons), 'WM' (W to muons), 'ZE' (Z to electrons), 'WE' (W to electrons), 'SR' (signal region). As for the base scripts mcAnalysis.py and mcPlots.py, one can add some options. The most common ones are:

  • -U pattern reads the cut list only up to the selected cut, ignoring any one following it
  • --fullControlRegions loops over all the regions of the fit and makes plots / tables for all of them

Producing the inputs of the fit

The main inputs of the fit are the templates of the variable chosen for the fit (TH1 histograms), with alternative shapes obtained by varying the weights according to the given systematic uncertainties, and the transfer factors from the control regions to the signal region. Both can be produced by running analysis.py. Note that the latter need the former as input.

  • vbfdm/analysis.py --propSystToVar : makes the nominal template and the alternative ones obtained by propagating the systematic uncertainties. These are defined as varied weights in the files vbfdm/syst_<CR>.txt for each region (where CR = SR, ZM, etc.). The output files go into the "templates" directory unless a different one is specified with the option --pdir
  • vbfdm/analysis.py --tF : creates all the transfer factors, running over the templates previously produced (a minimal sketch of what a transfer factor is can be found at the end of this subsection). Note that which systematics to consider in the numerator and denominator depends on the analysis, and here it is hardcoded for the monojet / VBF H→invisible analysis

To make the same for the 2D case, one has just to add the option:

  • --twodim: this reads the plot file vbfdm/common_plots_2D.txt instead of vbfdm/common_plots.txt
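
A minimal PyROOT sketch of what a transfer factor is, i.e. the bin-by-bin ratio of the template of a process in the signal region to the template of the corresponding process in a control region (the file and histogram names below are hypothetical; analysis.py --tF produces the real ones):

# minimal sketch of a transfer factor; file and histogram names are hypothetical
import ROOT

f_sr = ROOT.TFile.Open("templates/templates_SR.root")
f_cr = ROOT.TFile.Open("templates/templates_ZM.root")
h_sr = f_sr.Get("ZNuNu")                   # Z->nunu template in the signal region
h_cr = f_cr.Get("ZLL")                     # Z->ll template in the dimuon control region
tf = h_sr.Clone("tf_ZNuNu_over_ZLL")
tf.Divide(h_cr)                            # TF(bin) = N_SR(bin) / N_CR(bin)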

Producing the datacards for shape analysis

The datacards for the shape analysis are produced by running the selection on the fly to build the templates for the fit, but they need the transfer factors previously produced and stored in a ROOT file. The main script is makeShapeCards.py, which has to be run with different options depending on the region (SR or any of the CRs) and on whether it is a 1D or 2D fit. The script, together with the usual mca, cuts and plots files, also reads another argument with the list of normalisation systematics assigned to each process (e.g. "vbfdm/systsEnv.txt"). The main options are:
  • --region : specifies the region, which will be treated by combine as a dedicated "channel". "SR" will be the one containing the signal process(es), i.e. the ones on which "mu" is measured
  • --processesFromCR <process1,process2,...> : specifies the list of processes, separated by commas (e.g. ZNuNu,W), that have to be constrained by control regions through transfer factors. This is only for the SR
  • --correlateProcessCR 'process_to_be_constrained_in_SR,name_of_SR,name_of_the_histo_of_TF,name_of_ROOT_file_with_TF' : connects a given process to the "signal" process in this CR. This has to be accompanied by the exclusion of the processes that make the signal in this CR, because the difference data - backgrounds_in_CR will make the signal. E.g., when running on ZM, one has to add "--xp ZLL,EWKZLL"

The helper script vbfdm/make_cards.sh helps in making the datacards for the signal region and the 4 CRs. The variable to be used in the shape analysis and its binning are hardcoded in the script. The typical way of running the script is by giving these ordered arguments:

  • output_dir: where the datacards and the ROOT files with the combine inputs are written
  • luminosity: the luminosity, in 1/fb, to which the MC yields are normalised
  • sel_step: the selection step up to which the cuts are applied; the subsequent cuts will be ignored. Note that this has to be consistent with the one used to produce the TFs
  • region: can be a single region, or "all" to make the SR and the 4 CRs in series.
Note that the correct variable has to be uncommented in the .sh file. To make the 2D-fit datacards, add an additional argument at the end:
  • twodim: this adds the rebinning function to use the final binning for the 2D->1D unrolling

Eg: vbfdm/make_cards.sh cards 24.7 vbfjets all.

Then one has to combine the datacards for the SR and the control regions into one. One can use combineCards.py, treating each region as a "channel" (but note that, since the regions are correlated, one needs at least the SR, one Z CR and one W CR). The combination command is:

combineCards.py SR=vbfdm.card.txt ZM=zmumu.card.txt ZE=zee.card.txt WM=wmunu.card.txt WE=wenu.card.txt > comb.card.txt

Producing the 'postfit' plots

Producing pre/post-fit plots from the output of combine is a good diagnostic that the fit is working correctly (e.g. by looking at the post-fit plots in the control regions). To do that, one first has to run combine with the maximum-likelihood fit method, and then run a script to make the plots with the mcPlots style. To do that:
  • combine -M MaxLikelihoodFit --saveNormalizations --saveShapes --saveWithUncertainties comb.card.txt: runs the ML fit, saves the yields after the fit, and saves the pre-fit and post-fit shapes, both for the B-only fit (signal constrained to 0) and for the S+B fit. The output is in the mlfit.root file.
The file mlfit.root contains everything, but since the inputs have been converted to RooDataHist, all the TH1Fs have an x-axis that corresponds to the observable and bin contents equal to the event density (events divided by the bin width). The script postFitPlots.py takes care of this conversion (a sketch is given after the example below), applies the standard plot style conventions, takes the data distribution and makes the data/prediction ratio plot. The data distribution is taken from the output of mcPlots.py on the desired variable, so one must have run mcPlots.py at least on that variable. The way to run it is:

python postFitPlots.py mcafile.txt plots.root varname mlfit.root region_name

E.g., for the ZM region and the "mjj_fullsel" variable: python postFitPlots.py vbfdm/mca-80X-muonCR.txt plots/ZMCR/vbfjets/plots.root mjj_fullsel mlfit.root ZM. The file "plots.root" is the one produced by the mcPlots.py script.
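
A small PyROOT sketch of the bin-width conversion mentioned above, which postFitPlots.py performs internally (the shapes_fit_b/<region>/<process> layout is the one usually produced by --saveShapes, and the region/process names here are only an example):

# minimal sketch: turn a post-fit density shape from mlfit.root back into event counts
import ROOT

f = ROOT.TFile.Open("mlfit.root")
h = f.Get("shapes_fit_b/ZM/ZLL")                 # B-only post-fit shape (names are an example)
counts = h.Clone("ZLL_counts")
for b in range(1, counts.GetNbinsX() + 1):
    w = counts.GetXaxis().GetBinWidth(b)
    counts.SetBinContent(b, counts.GetBinContent(b) * w)
    counts.SetBinError(b, counts.GetBinError(b) * w)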

Producing the limit plots

-- EmanueleDiMarco - 2015-04-28
