Running GlaNtp
Introduction
We run the GlaNtp package for the WH analysis, in order to document what I learn from Mike and Rick. Later we will change the configuration files for the ZH analysis. We start by logging in.
ssh -Y ppepc137.physics.gla.ac.uk
Then open an xterm terminal.
xterm
Setup
export GLANTP_TAG=00-00-49
source ~/scripts/setup_glantp.sh $GLANTP_TAG
Our code is in the folder with the following path (also saved as an environment variable).
echo $GLANTPDIR
ls $GLANTPDIR
Now we have access to several script/executable files created when the package GlaNtp was built.
ls /home/abuzatu/GlaNtpPackage/GlaNtpSVN$GLANTP_TAG/bin/Linux2.6-GCC_4_1/
A key notion in the GlaNtp package is that of steering files, which are text configuration files that are parsed and then used in the C++ code. One executable to test such steering files is "testSteerrv5.exe".
Workspace
cd /data/atlas13/$USER
mkdir TEST_RICKS_FRAMEWORK
export WORKSPACE=/data/atlas13/$USER/TEST_RICKS_FRAMEWORK
cd $WORKSPACE
Input files
We will work on the WH->enubb analysis with a minimal set of files, i.e. a signal, a background and a data file. Mike's files with v16 of Athena (mc11c tag p605) are located at
cd /data/atlas12/mwright/DATA/FlatTuple/V1_R1_p605/
cd /data/atlas12/mwright/MC/FlatTuple/V1_R1_p605/
We choose WH120, W+2 jets and data, making sure that the W boson decays to enu in the MC samples and picking the egamma stream for data. Back in $WORKSPACE, we create symbolic links to the three files.
ln -s /data/atlas12/mwright/MC/FlatTuple/V1_R1_p605/user.mwright.20120406142345.ATLASHbb_V1_R2_FT.id_207.WH120.12.WH120.j2327.t207.trf12.u0.2380.ANALY_TOKYO/user.mwright.020804.EXT0._00917.SMD3PDv16modifier.root test_input_WH120.root
ln -s /data/atlas12/mwright/MC/FlatTuple/V1_R1_p605/user.mwright.20120406142345.ATLASHbb_V1_R2_FT.id_207.WenuNp2.42.WenuNp2.j2297.t207.trf42.u0.4849.ANALY_GRIF-LPNHE/user.mwright.020732.EXT0._00072.SMD3PDv16modifier.root test_input_Wjets.root
ln -s /data/atlas12/mwright/DATA/FlatTuple/V1_R1_p605/user.mwright.20120408151848.DATA_periodE_EGAMMA_V1_R2_FT1.id_220.Simple_Transform.0.periodE.j2353.t220.trf0.u0.2198.ANALY_IHEP/user.mwright.020925.EXT0._00001.SMD3PDv16modifier_data.root test_input_data.root
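A mistyped grid path gives a dangling link that only fails later, inside runFlatReader, so it can be worth checking the links right away. The sketch below is one possible way to do it; the function name check_links is our own invention and is not part of GlaNtp.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GlaNtp): report the test_input_*.root
# symbolic links in a directory and flag those whose target does not exist.
check_links() {
    local dir="${1:-.}" status=0 link
    for link in "$dir"/test_input_*.root; do
        # Skip anything that is not a symbolic link, including the
        # unexpanded pattern when no file matches.
        [ -L "$link" ] || continue
        if [ -e "$link" ]; then
            echo "ok      $link"
        else
            echo "broken  $link"
            status=1
        fi
    done
    return $status
}

# In the workspace one would run: check_links "$WORKSPACE"
```

The function returns a non-zero status as soon as one link is broken, so it can also be chained with && in front of a run command.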
Template file first step: runFlatReader
We will use the signal sample to create a template file, i.e. a file that contains the histograms we want properly filled with the correct weights (scale factors times efficiencies times luminosity for MC and one for data) for each event that passes our desired WenuHbb selection.
We will need configuration files called steer files. We will copy them from Mike's folder one by one as we need them, so that we can easily learn what each of them is used for.
export TUTORIAL_WORKSPACE=/data/atlas13/mwright/TEST_RICKS_FRAMEWORK
ls $TUTORIAL_WORKSPACE
The first command we need to run is runFlatReader. It reads the flat ntuples we have symbolic links to and validates that the steer file used to read them is consistent with the .root file itself.
runFlatReader
This will tell us the syntax. It looks like this.
Usage: runFlatReader[steerfile] [[file1] [file2] ..]
So we need a steer (configuration) file and then all the .root files that belong to the same physics process (like WH120). Multiple files are allowed because we sometimes get several files from the grid, and we do not need to merge them beforehand. We copy the appropriate steer file and run.
cp $TUTORIAL_WORKSPACE/teststeerFlatReaderATLASWHSemileptonic-v16.txt .
runFlatReader teststeerFlatReaderATLASWHSemileptonic-v16.txt test_input_WH120.root
It goes on and on printing the weights. So let's turn off the debug for the weights.
emacs -nw teststeerFlatReaderATLASWHSemileptonic-v16.txt
Replace line
DebugWeight=1
with
DebugWeight=0
Run again and it will only take a few seconds.
runFlatReader teststeerFlatReaderATLASWHSemileptonic-v16.txt test_input_WH120.root
We will end up modifying the configuration files a lot during our analysis, and we do not want to do this by hand, especially since we want to automate the cycle: modify one file, run, modify the file again, run again. This is why we will use a script that Adrian created for CDF to modify a file with sed.
cp ~abuzatu/modifyFileWithSed.sh .
To change DebugWeight from 0 to 1.
./modifyFileWithSed.sh teststeerFlatReaderATLASWHSemileptonic-v16.txt DebugWeight 0 1
Let's check that the change is done correctly.
emacs -nw teststeerFlatReaderATLASWHSemileptonic-v16.txt
To change back from 1 to 0
./modifyFileWithSed.sh teststeerFlatReaderATLASWHSemileptonic-v16.txt DebugWeight 1 0
Let's check that the change is done correctly.
emacs -nw teststeerFlatReaderATLASWHSemileptonic-v16.txt
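The content of modifyFileWithSed.sh is not shown here, so the following is only a guess at what such a helper could look like, under the assumption that the values sit in the steer file as KEY=VALUE. The function name modify_steer_value is hypothetical. It also prints the changed line, which saves opening the file in emacs just to check.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for modifyFileWithSed.sh (the real script is not
# shown in this tutorial): replace KEY=OLD with KEY=NEW in a steer file.
modify_steer_value() {
    local file="$1" key="$2" old="$3" new="$4"
    # Require a non-digit or the end of the line after OLD, so that OLD=1
    # does not also match a value such as 10.
    sed -E "s/${key}=${old}([^0-9]|\$)/${key}=${new}\1/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
    # Show the resulting line(s) instead of opening the file in an editor.
    grep -n "${key}=" "$file"
}

# Example: modify_steer_value teststeerFlatReaderATLASWHSemileptonic-v16.txt DebugWeight 1 0
```

Writing to a temporary file and moving it into place avoids the differences between the in-place options of the various sed implementations.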
Also, we will run jobs locally and it may take a few minutes for them to finish. During that time we may do other things (like reading emails). It is helpful for the computer to give us a warning sound when it finishes running. So let's also copy this perl script Adrian wrote for CDF.
cp ~abuzatu/beeper.pl .
Now let's run the flat reader command again with the beeper at the end.
runFlatReader teststeerFlatReaderATLASWHSemileptonic-v16.txt test_input_WH120.root && ./beeper.pl
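Note that with && the beeper is called only when the job succeeds. If we also want to hear about failures, a small wrapper can ring the terminal bell in both cases and pass on the exit status. This is a sketch of an alternative, not what beeper.pl does (its content is not shown); the name run_and_beep is made up.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to beeper.pl: run a command, ring the terminal
# bell when it ends, and pass the exit status of the command through.
run_and_beep() {
    "$@"
    local status=$?
    printf '\a'    # ASCII BEL; audible only if the terminal allows it
    echo "finished with exit status $status: $*" >&2
    return $status
}

# Example: run_and_beep runFlatReader teststeerFlatReaderATLASWHSemileptonic-v16.txt test_input_WH120.root
```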
Template file second step: runFlatPlotter for one file
Now that we have checked that the flat reader steer file is correct for our .root file, we can run the next command, runFlatPlotter. A better name for it would be "HistogramMaker", as it fills histograms with the appropriate weights and writes them to a .root file, but does not plot anything. The stacked plots are made at a later stage, taking these .root files as input. The syntax is the same as for runFlatReader, so we will move more quickly now.
runFlatPlotter
cp $TUTORIAL_WORKSPACE/teststeerFlatPlotterATLASWHSemileptonic-v16.txt .
runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_WH120.root && ./beeper.pl
We get errors, as this new steer file needs other steer files. We check the file carefully and then copy them as well.
cp $TUTORIAL_WORKSPACE/LeptonConfigATLASttHSemiLeptonic.txt .
cp $TUTORIAL_WORKSPACE/CutWordConfigurationAtlas.txt .
cp $TUTORIAL_WORKSPACE/TreeSpecATLASWH-v16_event.txt .
cp $TUTORIAL_WORKSPACE/TreeSpecATLASWH-v16_global.txt .
cp $TUTORIAL_WORKSPACE/VariableTreeToNTPATLASWHSemiLeptonic-v16.txt .
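Instead of reading the steer file by eye, we can ask the shell which referenced files are still missing. The sketch below assumes that dependent steer files are referenced by a plain file name ending in .txt and that lines starting with # are comments; neither is confirmed here for every GlaNtp steer file. The function name missing_steer_files is ours.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the .txt files named inside a steer file that
# are not present in the current directory. Assumes that dependent steer
# files are referenced by a plain file name ending in .txt.
missing_steer_files() {
    local steer="$1" name
    grep -v '^[[:space:]]*#' "$steer" |    # drop comment lines
        grep -oE '[A-Za-z0-9_.-]+\.txt' |  # keep only the file names
        sort -u |
        while read -r name; do
            [ -e "$name" ] || echo "$name"
        done
}

# Example: missing_steer_files teststeerFlatPlotterATLASWHSemileptonic-v16.txt
```

Each name it prints is a candidate for another cp from $TUTORIAL_WORKSPACE.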
Now we run again
runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_WH120.root && ./beeper.pl
We see that it was not able to save the output .root file because the folder templates did not exist. So let's create it.
mkdir templates
We also see that it takes a few minutes to run, so let's make it possible to change the number of events we run on.
emacs -nw teststeerFlatReaderATLASWHSemileptonic-v16.txt
old: GeneralParameter int 0 NEvent=10
new: GeneralParameter int 1 NEvent=0
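These two settings are equivalent. The flag after the type says whether the statement is taken into account. With the flag at 0 the statement is ignored, so the NEvent=10 has no effect. With the flag at 1 we explicitly ask for 0 events, which means all events (a standard convention in HEP experiments).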
Now we can use the script above to change the number of events we run on. To change to 10 events.
./modifyFileWithSed.sh teststeerFlatReaderATLASWHSemileptonic-v16.txt NEvent 0 10
Now we run again
runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_WH120.root && ./beeper.pl
The beeper lets us know when the run has finished, the text output shows no errors, and there is an output file in the templates folder. Let's open it and inspect it.
cd templates
root.exe TestNameChange.root
TBrowser a
We see that we have a great many histograms, some 2D and some 1D. Since this is a tutorial, we want a minimal set of variables, namely only two: MET and METPhi. The job will run more quickly and the .root file will be easier to check.
The list of variables for which histograms are made is in the file "VariableTreeToNTPATLASWHSemiLeptonic-v16.txt". Each variable has a number from 1 onward. We keep only two variables, number them 1 and 2, and remove the rest. We choose them to be
# Variables to create templates for
ListParameter EvInfoTree:1 1 my_MET:MET_RefFinal_et_corrected/MET_RefFinal_et_corrected
ListParameter EvInfoTree:2 1 my_METPhi:MET_RefFinal_phi_corrected/MET_RefFinal_phi_corrected
Note that the names "my_MET" and "my_METPhi" are chosen by us, whereas the two names after the colon are the branch and the leaf in the .root file, so they have to be defined in the tree spec steer file.
What other files do we need to change regarding the variables? We use grep to guide us.
grep MET_RefFinal_et_corrected *.txt
This points us to the flat plotter steer file where we tell for each variable the properties of its histograms. We keep these and remove the rest.
# Use VariableTreeToNTP in FlatReader to get variables to match in flat plotter steer.
ColumnParameter ExtraPlotVariables 1 my_MET=1
ColumnParameter ExtraPlotVariables 2 my_METPhi=2
# Histogram properties
ColumnParameter SpecifyHist:my_MET 1 OnOff=1:Max=100000:NBin=10
ColumnParameter SpecifyHist:my_METPhi 2 OnOff=1:Min=-3.5:Max=3.5:NBin=10
We run again
runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_WH120.root && ./beeper.pl
We check the .root file and indeed see two 1D histograms plus one 2D histogram of the pair, each in an unweighted and a weighted version. We look at the unweighted 1D histograms, such as "InputVarWenuHbb_1_0_0". The first number, 1, refers to the first variable, which is my_MET in our reduced list. The second number, 0, means unweighted (I think). The third number, 0, means 0 jets. Since our signal has 2 jets, all the jet bins are empty except the 2-jet one, so we look at "InputVarWenuHbb_1_0_2" and "InputVarWenuHbb_2_0_2". Here we already see a problem: the histograms have entries, but with a weight of zero. The problem lies in the input ntuples.
root.exe templates/TestNameChange.root
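To keep the naming scheme straight, here is a tiny sketch that splits such a histogram name into its fields. The meaning given to the middle field follows the guess made above (0 = unweighted) and is not a documented convention; the function name decode_hist_name is made up.

```shell
#!/usr/bin/env bash
# Split a histogram name of the form PREFIX_<variable>_<weighted>_<njets>
# into its fields. The meaning of the middle field (0 = unweighted) is the
# guess made in this tutorial, not a documented convention.
decode_hist_name() {
    echo "$1" | {
        IFS=_ read -r prefix var weighted njets
        echo "prefix=$prefix variable=$var weighted=$weighted njets=$njets"
    }
}

decode_hist_name InputVarWenuHbb_1_0_2
# prints: prefix=InputVarWenuHbb variable=1 weighted=0 njets=2
```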
To see the problem with the weights, let's turn on the debug weight and run again.
./modifyFileWithSed.sh teststeerFlatReaderATLASWHSemileptonic-v16.txt DebugWeight 0 1
runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_WH120.root && ./beeper.pl
We see that most events have a scale factor of zero, which brings the total weight to zero.
evInfo_sf: 0
globalInfo_Xsect: 0.127944
globalInfo_BrFrac: 1
globalInfo_FilterEff: 0.6923
evInfo_lumiForType: 1034.95
nGenForType: 59974
weight: 0
Precomputed weight: 0
Most MC events have a sf of zero and Mike is looking into this. It seems there is a bug in the flat ntuple.
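To see why a single zero kills the event, we can redo the arithmetic by hand. The formula below is an assumption: it is the usual MC normalisation (scale factor times cross section times branching fraction times filter efficiency times luminosity, divided by the number of generated events), and has not been checked against the GlaNtp source. The function name event_weight is ours.

```shell
#!/usr/bin/env bash
# Assumed normalisation (not confirmed by the GlaNtp source):
#   weight = sf * Xsect * BrFrac * FilterEff * lumi / nGen
event_weight() {
    awk -v sf="$1" -v xsect="$2" -v br="$3" -v eff="$4" -v lumi="$5" -v ngen="$6" \
        'BEGIN { printf "%.6g\n", sf * xsect * br * eff * lumi / ngen }'
}

# Values from the WH120 debug printout: sf = 0 forces the weight to 0.
event_weight 0 0.127944 1 0.6923 1034.95 59974
# The same event with a hypothetical sf = 1 gets a small non-zero weight.
event_weight 1 0.127944 1 0.6923 1034.95 59974
```

Under this assumption, no value of the other factors can rescue an event once evInfo_sf is zero, which is consistent with the weight of 0 in the printout.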
Let's check how this looks for data events
runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_data.root && ./beeper.pl
The output is
evInfo_sf: 0
globalInfo_Xsect: 1
globalInfo_BrFrac: 1
globalInfo_FilterEff: 1
evInfo_lumiForType: 1034.95
nGenForType: 48085
weight: 0
Precomputed weight: 0
Mike suggests the following solution: set UseComputedWeight to 1 (instead of 0), for the data files only. When I do that, I indeed get filled histograms.
./modifyFileWithSed.sh teststeerFlatReaderATLASWHSemileptonic-v16.txt UseComputedWeight 0 1
runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_data.root && ./beeper.pl
Now we turn off again the weight debug.
./modifyFileWithSed.sh teststeerFlatReaderATLASWHSemileptonic-v16.txt DebugWeight 1 0
runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_WH120.root && ./beeper.pl
Now we restore the default UseComputedWeight of zero and go back to running on all events.
./modifyFileWithSed.sh teststeerFlatReaderATLASWHSemileptonic-v16.txt UseComputedWeight 1 0
./modifyFileWithSed.sh teststeerFlatReaderATLASWHSemileptonic-v16.txt NEvent 10 0
Template file third step: runFlatPlotter for all processes
At this point the steer files are set to run on all events of one file for one process, not to use the computed weight, and not to show debug statements. All output files have the same name, so we will also make the name depend on the process. We adapt another script from CDF into a small script that runs on all the processes, one at a time, with a single command.
cp ~/runTemplates.sh .
./runTemplates.sh
It shows you the instructions
Usage: ./runTemplates PROCESSes EVENTs UseComputedWeight DebugWeight
Usage: ./runTemplates WH120+Wjets+data 0 0 0
Usage: ./runTemplates WH120+Wjets+data 10 0 0
PROCESSes : List of processes we want to run on
EVENTs : Number of events to run on (0 means all)
UseComputeWeight : When 1 it will force the event weight to be one (necessary for data) and temporary for MC until we fix the bug of SF=0
DebugWeight : When 1 it will show the weights for events, good for debugging, not shown though when UseComputeWeight is 1.
So now we can run on all these templates (WH120, Wjets and data) with the caveat that we force a weight of 1 not only for the data, but also for MC events (the latter until the bug is fixed). Then we can have nice templates to use when making stacked plots. It ran on all 3 processes in 4 minutes.
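The script itself is not listed here, but the core idea, splitting the process list on "+" and looping, can be sketched as a dry run. This is an assumption about how such a script could be organised, not a copy of runTemplates.sh; a real version would presumably also set NEvent, UseComputedWeight, DebugWeight and the output name before each run. The function name plan_templates is made up.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch of the loop inside a script like
# runTemplates.sh (the real script is not shown). It splits the process
# list on "+" and prints, rather than runs, one command per process.
plan_templates() {
    local processes="$1" process
    # Replace "+" by spaces so that the for loop sees one word per process.
    for process in $(echo "$processes" | tr '+' ' '); do
        echo "runFlatPlotter teststeerFlatPlotterATLASWHSemileptonic-v16.txt test_input_${process}.root"
    done
}

plan_templates WH120+Wjets+data
```

Printing the commands first is a cheap way to check the list before launching several minutes of running.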
Stacked control plots
The command to run is
DisplayFlatStackedInput
When we run it with no arguments, it does not print a usage example but simply crashes. The syntax is the following.
DisplayFlatStackedInput tmva_file_not_needed ${inputFileDir}/ ${outputFileDir}/ ${steerFileName} ${sysFileName} ${mH}
The first argument is a TMVA file, which we do not need at the moment as it is used only for the NN output. The second is the folder where the templates are located, in our case "templates". The third is the folder where the plots are written, in our case "plots". Then come a steering file for plotting and another one for the systematics, which adds the systematic uncertainty band to the plots. The last argument is the Higgs boson mass, which is 120 in our case. Let's copy the configuration files.
cp $TUTORIAL_WORKSPACE/FlatStackInputSteer.txt .
cp $TUTORIAL_WORKSPACE/FlatSysSetAtlaswh1.txt .
Now we try to run
DisplayFlatStackedInput tmva_file_not_needed templates/ plots/ FlatStackInputSteer.txt FlatSysSetAtlaswh1.txt 120
It will crash saying that maps are empty. So we need to modify these steer files for our three processes.
emacs -nw FlatStackInputSteer.txt
ListParameter InputWjets 1 test_output_Wjets.root
ListParameter InputWH120 1 test_output_WH120.root
ListParameter InputData 1 test_output_data.root
ListParameter ProcessList 1 Wjets:WH120:Data
We also see that this steer file requires this other steer file "ATLASWHDiscrToLabel.txt", so we copy it as well.
cp $TUTORIAL_WORKSPACE/ATLASWHDiscrToLabel.txt .
In this file we specify the axis label for each histogram we want to create. We have only two variables, so after renumbering them as 1 and 2 to be consistent with the other files, the content of the file looks like this.
ListParameter DiscrToLabel:1 1 MET_RefFinal_et_corrected:E_{t}^{MET}_{1},MeV
ListParameter DiscrToLabel:2 1 MET_RefFinal_phi_corrected:#phi^{MET}_{1},rad
Let's also edit the systematics steer file to contain only these lines for our processes.
ColumnParameter Lumi 0 OnOff=1:Low=-0.034:High=0.034:Channel=1 # Process=Wjets
ColumnParameter Lumi 1 OnOff=1:Low=-0.034:High=0.034:Channel=1 # Process=WH120
ColumnParameter Lumi 2 OnOff=1:Low=-0.034:High=0.034:Channel=1 # Process=Data
Now I have made all the changes I think I should make, but I still get a long error about empty maps.
InputVarFile: tmva_file_not_needed
BaseInputDir: templates/
OFile : plots
Steer : FlatStackInputSteer.txt
Sysfile : FlatSysSetAtlaswh1.txt
Mh : 120
Debug : 0
InputVarFile: tmva_file_not_needed
BaseInputDir: templates/
OFile : plots
Steer : FlatStackInputSteer.txt
Sysfile : FlatSysSetAtlaswh1.txt
Mh : 120
Debug : 0
FlatStackInput::LoadProcessInfo_: Start
Background list Start
Map is empty
Background list End
Signal list Start
Map is empty
Signal list End
Signal Scale Start
Map is empty
Signal Scale End
Data list Start
Map is empty
Data list End
PseudoData list Start
Map is empty
PseudoData list End
Begin Printout of all Processes
Map is empty
End Printout of all Processes
Begin Printout of Process order
Map is empty
End Printout of Process order
Begin Printout of all Labels for Processses
Item Map is empty
End Printout of all Labels for Processes
Begin Printout of all Palettes
Any : 0
FlatStackInput::CreateProcessToFileMap_: Process Wjets not known. Stop reading.
FlatStackInput::Display: Failed to create process to file map
It fails because one other steer file is missing: the one about the physics processes, which tells which process is signal, which is background and which is data, what colors they should be drawn with, and in what order.
cp $TUTORIAL_WORKSPACE/FlatAtlasWHPhysicsProc1.txt .
I limited the file to only three processes. It looks like this.
ColumnParameter BackgroundList 0 Wjets=0
ColumnParameter SignalList 1 WH120=1
ColumnParameter DataList 2 Data=2
#
# ColumnParameter PseudoDataList 0 Fake=0
#
ListParameter ProcessLabels:0 1 Wjets:Wjets
ListParameter ProcessLabels:1 1 WH120:WH120
ListParameter ProcessLabels:2 1 Data:Data
#
# A list of the palettes available and defined below - labelled 0,1,2...
ColumnParameter PaletteList 1 UCSDPalette=0:PrimaryColorPalette=1
#
ColumnParameter UCSDPalette 0 Wjets=10
ColumnParameter UCSDPalette 1 WH120=2
#
ColumnParameter PrimaryColorPalette 0 Wjets=400
ColumnParameter PrimaryColorPalette 1 WH120=632
#
ColumnParameter ProcessOrder 0 Wjets=0
ColumnParameter ProcessOrder 1 WH120=1
ColumnParameter ProcessOrder 2 Data=2
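The "Process Wjets not known" error came from a mismatch between the two steer files, so a quick cross-check can save a failed run. The sketch below assumes the line layouts shown in the excerpts above (a ProcessList line in the plotting steer file, and BackgroundList, SignalList and DataList lines in the physics-process file). The function name unknown_processes is ours.

```shell
#!/usr/bin/env bash
# Hypothetical check: every process named in ProcessList of the plotting
# steer file should appear in BackgroundList, SignalList or DataList of the
# physics-process file. File layouts are assumed to match the excerpts above.
unknown_processes() {
    local plot_steer="$1" proc_steer="$2" process
    # Take the last field of the ProcessList line, e.g. Wjets:WH120:Data,
    # and put one process per line.
    awk '$1 == "ListParameter" && $2 == "ProcessList" { print $NF }' "$plot_steer" |
        tr ':' '\n' |
        while read -r process; do
            grep -Eq "^ColumnParameter +(Background|Signal|Data)List +[0-9]+ +${process}=" "$proc_steer" ||
                echo "$process"
        done
}

# Example: unknown_processes FlatStackInputSteer.txt FlatAtlasWHPhysicsProc1.txt
```

An empty output means that every process in ProcessList is declared as background, signal or data.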
Now we try to run again
DisplayFlatStackedInput tmva_file_not_needed templates/ plots/ FlatStackInputSteer.txt FlatSysSetAtlaswh1.txt 120
And it works. We get plots in the plots folder.
Limit Calculation
Using the templates we can now compute the limits. The command to use is "Fit"
Fit
We see then the syntax
Fit <histlist> <basehistdir> <sysfile> <steerfile> <mh>
We have to copy the file for "histlist"
cp $TUTORIAL_WORKSPACE/atlaswh_histlist_flat-v16.txt .
The second argument is the folder that holds the templates, so in our case "templates/"
The systematic steer file is the one we already have and have used when making the stacked plots, namely "FlatSysSetAtlaswh1.txt".
The main steer file for the fit, "steerfile", also must be copied
cp $TUTORIAL_WORKSPACE/FlatFitSteer.txt .
The mass point is the last argument and in our case is "120".
The file "atlaswh_histlist_flat-v16.txt" lists, for each process, the file, path and name of the histogram that goes into the limit calculation. For our three processes it looks like this.
-1 -------------------------------------------------------- x -1
-1 Wjets x -1
-1 -------------------------------------------------------- x -1
0 test_output_Wjets.root FlatPlotter/InputVarWenuHbb_1_0_0 0
-1 -------------------------------------------------------- x -1
-1 WH120 x -1
-1 -------------------------------------------------------- x -1
1 test_output_WH120.root FlatPlotter/InputVarWenuHbb_1_0_0 0
-1 -------------------------------------------------------- x -1
-1 Data x -1
-1 -------------------------------------------------------- x -1
2 test_output_data.root FlatPlotter/InputVarWenuHbb_1_0_0 0
-1 -------------------------------------------------------- x -1
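Before running the fit, it is easy to check that every file named in the histlist really exists in the templates folder. The sketch below assumes, from the excerpt above, that rows starting with -1 are separators or titles and that the other rows read "process number, file, histogram path, flag". The function name check_histlist is made up.

```shell
#!/usr/bin/env bash
# Hypothetical check of a histlist file: rows whose first column is -1 are
# treated as separators or titles (an assumption based on the excerpt), and
# every other row is expected to name a file inside the template directory.
check_histlist() {
    local histlist="$1" dir="$2" id file hist rest
    awk '$1 != -1 && NF >= 3' "$histlist" |
        while read -r id file hist rest; do
            if [ -e "$dir/$file" ]; then
                echo "process $id: $file ($hist)"
            else
                echo "process $id: $file is missing from $dir"
            fi
        done
}

# Example: check_histlist atlaswh_histlist_flat-v16.txt templates
```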
In the file "FlatFitSteer.txt" we also make a few changes. "ProcToFit" names the signal process. "NumberPseudoExperiments" sets the number of pseudoexperiments. "NSysTestExpected" sets the number of systematic variations for the expected limit, and "NSysTestObserved" does the same for the observed limit. "HistOutput" sets the name of the output files. We also see that this file uses a few other files. Some we already have, such as "FlatSysSetAtlaswh1.txt" and "FlatAtlasWHPhysicsProc1.txt", but some we have to copy as well.
cp $TUTORIAL_WORKSPACE/SysNamesAtlasWH1.txt .
cp $TUTORIAL_WORKSPACE/AtlasWHRealTitles.txt .
The file "SysNamesAtlasWH1.txt" looks like this, with only the luminosity systematic considered
ListParameter SysInfoToSysMap:1 1 Lumi:Lumi
# ListParameter SysInfoToSysMap:2 1 Met:Met
# ListParameter SysInfoToSysMap:3 1 nloAccep:NLOAccep
# ListParameter SysInfoToSysMap:4 1 xsec:xsec
# ListParameter SysInfoToSysMap:5 1 pdf:pdf
# ListParameter SysInfoToSysMap:6 1 Fake:fake
# ListParameter SysInfoToSysMap:7 1 btag:btag
# ListParameter SysInfoToSysMap:8 1 JES:JES
We modify the file "AtlasWHRealTitles.txt" to contain only our processes.
Process_0_0 Wjets:WenuHbb
Process_1_0 WH120:WenuHbb
Process_2_0 Data:WenuHbb
We are now ready to run
Fit atlaswh_histlist_flat-v16.txt templates/ FlatSysSetAtlaswh1.txt FlatFitSteer.txt 120
And we have an expected limit from our five pseudoexperiments! Do not worry about the values, which are wrong at this stage; the point is that the code runs.
Done fill pseudoexperiment distn with 5 entries
Bayesian s95 limit: nan +- 0
Expected limits:
120 -2 sigma: 0.312945 NSig= 840 NBkg= 4273 NData= 3559
120 -1 sigma: 0.361552 NSig= 840 NBkg= 4273 NData= 3559
120 median: 0.400759 NSig= 840 NBkg= 4273 NData= 3559
120 +1 sigma: 0.566271 NSig= 840 NBkg= 4273 NData= 3559
120 +2 sigma: 0.566271 NSig= 840 NBkg= 4273 NData= 3559
You see the median expected limit, and the plus and minus, one and two sigma variations. The observed limit should be after "Bayesian s95 limit:", but there is no answer. We will have to see how to compute an observed limit.
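If we later scan several mass points, it helps to pull the numbers out of the printout automatically. The sketch below relies only on the line format shown above. Saving the output to a file first (for example by piping Fit through tee into a file called fit.log) is our own assumption, as is the function name median_limit.

```shell
#!/usr/bin/env bash
# Pull the median expected limit for a given mass out of saved Fit output.
# The line format is taken from the printout above; the log file itself
# (for example fit.log) is our own assumption.
median_limit() {
    local log="$1" mass="$2"
    awk -v mass="$mass" '$1 == mass && $2 == "median:" { print $3 }' "$log"
}

# Example: median_limit fit.log 120
```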
Congratulations! Now you have a cut-based analysis from beginning to end in the signal region!
Latest version
Now that you have gone through the whole process step by step, you can directly run the optimized version, which has more automation, improved flat ntuples and the addition of ttbar, by copying the folder
cp -r /data/atlas13/abuzatu/TUTORIAL_WH_RUN_RICKS_FRAMEWORK .
cd TUTORIAL_WH_RUN_RICKS_FRAMEWORK
and run
./runTemplates WH120+Wjets+ttbar 0 0 0
./runTemplates data 0 1 0
./runPlots 120
./runLimits 120
Easy, right? :P
--
AdrianBuzatu - 25-Apr-2012