VHbb boosted - Xbb framework
Directory: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python
1lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2016config
2lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Zll2016Nanoconfig
1lep 2017: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2017config
Training 1lep boosted 2017: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/weights_1lep2017_boosted
Log files for the 1lep 2016 studies are in /mnt/t3nfs01/data01/shome/acalandr
1 - Running framework - List of inputs
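Run from the directory holding one sub-directory per sample; this writes one file list per sample to new_dir/, with the /eos/cms prefix removed from each path:
mkdir -p new_dir; for j in */; do j=${j%/}; [ "${j}" = new_dir ] && continue; for i in ${j}/*/*/*/*.root; do echo "${PWD}/${i}" | sed 's|/eos/cms||'; done > ./new_dir/${j}.txt; done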
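The same loop can be written as a reusable function, which makes the prefix to strip explicit and skips the output directory itself. This is a minimal sketch: the function name make_input_lists is hypothetical, and it assumes, as the loop does, that the ROOT files sit three directory levels below each sample directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make_input_lists OUTDIR [PREFIX]
# Run from the directory that holds one sub-directory per sample.
# For each sample, writes OUTDIR/<sample>.txt with one ROOT file per line,
# taken from <sample>/*/*/*/*.root, with PREFIX (default /eos/cms) removed
# from the front of the absolute path.
make_input_lists() {
    local outdir=$1 prefix=${2:-/eos/cms}
    local sample f abs
    mkdir -p "$outdir"
    for sample in */; do
        sample=${sample%/}
        [ -d "$sample" ] || continue                  # no sub-directories at all
        [ "$PWD/$sample" -ef "$outdir" ] && continue  # skip the output directory itself
        : > "$outdir/$sample.txt"                     # start from an empty list
        for f in "$sample"/*/*/*/*.root; do
            [ -e "$f" ] || continue                   # glob matched nothing
            abs="$PWD/$f"
            printf '%s\n' "${abs#"$prefix"}" >> "$outdir/$sample.txt"
        done
    done
}
```

For example, from the directory containing the sample folders, `make_input_lists ./new_dir` produces the same new_dir/<sample>.txt files as the one-liner.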
2 - Running framework - Prep Step
1) Prep step
./submit.py -T Wlv2016 -F prep-v1 -J prep -N 10
--> Check that all samples are processed:
./submit.py -T Wlv2016 -J checklogs --resubmit
PREPout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/prep_v1/
--> New method for the Prep step (USE THIS METHOD!):
./submit.py -T Zll2016Nano -J run --modules=Prep.VHbb -F prep --input PREPin --output PREPout --set='Directories.samplefiles:=<!Directories|samplefiles_split!>'
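The prep, sys and cacheplot steps share the same pattern: submit with -T/-F/-J, wait for the batch jobs, then resubmit failures with checklogs. The sketch below wraps that pattern; the function names and the DRY_RUN switch are hypothetical, and with DRY_RUN=1 the commands are only printed, so it can be tried outside Xbb/python.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the submit / checklogs pattern used in these notes.

xbb() {   # run ./submit.py with the given options, or only print it if DRY_RUN=1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo ./submit.py "$@"
    else
        ./submit.py "$@"
    fi
}

# xbb_submit TAG STEP FOLDER [EXTRA...]  ->  ./submit.py -T TAG -F FOLDER -J STEP EXTRA...
xbb_submit() {
    local tag=$1 step=$2 folder=$3
    shift 3
    xbb -T "$tag" -F "$folder" -J "$step" "$@"
}

# xbb_checklogs TAG: resubmit failed jobs (run once the batch jobs have finished)
xbb_checklogs() {
    xbb -T "$1" -J checklogs --resubmit
}
```

For example, `xbb_submit Wlv2016 prep prep-v1 -N 10` runs the prep command of step 1), and `xbb_checklogs Wlv2016` is launched separately later, since the batch jobs run asynchronously.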
3 - Running framework - Sys Step
./submit.py -T Wlv2016 -F sysnew-sys -J sysnew --addCollections Sys.sys_all_BoostedAndResolved -I
List of the systematic-uncertainty modules included in general.ini:
sys_all_BoostedAndResolved = ['Sys.TTweights','Sys.LeptonWeights','Sys.EWKweights','Sys.BTagWeights','Sys.isSignal', 'Sys.isWH', 'Sys.isData', 'Sys.HeppyStyleGen', 'Sys.FitCorr','Sys.GetTopMass','Sys.GetWTMass','Sys.DYspecialWeight','Sys.VptWeightSimFit','Sys.DoubleBTagWeightsSimFit']
The Python module for each systematic uncertainty is in myutils/.
--> Run without the Higgs module 'Sys.HiggsCandidateSystematics' to make it faster. Then, once the plots are done, I'll have to run again with the Higgs module included.
--> Using BTagWeight cMVAv2 for the 94X campaign (2016 reprocessing), csv file: cMVAv2_Moriond17_B_H.csv
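To check which modules a collection such as sys_all_BoostedAndResolved actually contains (for instance whether Sys.HiggsCandidateSystematics is in it), a small grep-based sketch can list them. The function name is hypothetical, and it assumes the collection is written on a single line as a list of single-quoted names, as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sys_modules INI KEY
# Prints the quoted module names of KEY, one per line.
# Assumes a single-line list such as: key = ['Sys.A','Sys.B', 'Sys.C']
sys_modules() {
    local ini=$1 key=$2
    grep -E "^[[:space:]]*${key}[[:space:]]*[=:]" "$ini" | head -n 1 |
        grep -oE "'[^']+'" | tr -d "'"
}
```

Usage: `sys_modules general.ini sys_all_BoostedAndResolved | grep -qxF 'Sys.HiggsCandidateSystematics' && echo "Higgs module included"`.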
SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v6/
--> Check that all samples are processed:
./submit.py -T Wlv2016 -J checklogs --resubmit
The weights in 'weightF' (in general.ini) are only available after the SYS step, because they are computed by the systematics modules listed in general.ini.
4 - Running framework - Cacheplot Step
./submit.py -T Wlv2016 -F cacheplot-v2 -J cacheplot -i
To cache one specific sample, add '-S SingleElectron' to the command.
--> Check that all samples are processed:
./submit.py -T Wlv2016 -J checklogs --resubmit
Output of the cacheplot step: tmpSamples = root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/tmp/v3/
[No need to change the cacheplot output each time, as the hash module takes care of picking the correct samples. Everything depends on plottingSamples: <!Directories|SYSout!>]
--> Plots can also be made right after the PREP step, but some weights may then be missing, because they are calculated at the SYS step (they come from the evaluation of the systematics modules).
5 - Running framework - Plot Step
Add the new variables to vhbbPlotDef.ini and the corresponding entries to plots.ini.
NB:
Cut_BOOST = (<!General|Boost_doubleb!> && <!General|DphiMET_Lep!> < 2 && <!General|NaddLep!> == 0 && V_pt > 250) in cuts.ini --> with Boost_doubleb it does not complain about btag_jetidx being wrong when no fat jet is found. For the prep step I used <!General|Boost_doubleb!>, which was more inclusive.
Additional line in plots.ini defining which variables to plot:
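Cuts such as Cut_BOOST are built from <!Section|option!> references that the framework expands from the ini files. To see what a cut looks like once expanded, the simplified sketch below resolves such references. It is not Xbb's parser (which reads several ini files and supports overrides); it assumes a single ini file with plain "key = value" lines, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified, stand-alone sketch: ONE ini file, "key = value" or "key: value" lines.

# ini_get FILE SECTION OPTION: print the value of OPTION in [SECTION]
ini_get() {
    awk -v sec="$2" -v opt="$3" '
        /^[ \t]*\[.*][ \t]*$/ {                       # section header
            s = $0; gsub(/^[ \t]*\[|][ \t]*$/, "", s)
            insec = (s == sec); next
        }
        insec && match($0, /^[ \t]*[^=: \t]+[ \t]*[=:]/) {
            key = substr($0, 1, RLENGTH - 1); gsub(/[ \t]/, "", key)
            if (key == opt) {
                val = substr($0, RLENGTH + 1)
                sub(/^[ \t]+/, "", val); sub(/[ \t]+$/, "", val)
                print val; exit
            }
        }' "$1"
}

# resolve_refs FILE STRING: replace each <!Section|option!> by its value,
# repeating so that nested references expand too (at most 20 passes)
resolve_refs() {
    local file=$1 s=$2 re='<!([^|!]+)\|([^!]+)!>' ref val n=0
    while [[ $s =~ $re ]] && (( n++ < 20 )); do
        ref=${BASH_REMATCH[0]}
        val=$(ini_get "$file" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}")
        s=${s//"$ref"/"$val"}                          # quoted: '&&' stays literal
    done
    printf '%s\n' "$s"
}
```

Usage: `resolve_refs cuts.ini '<!Cuts|Cut_BOOST!>'` prints the cut with all references that can be found in that one file expanded; references defined in other ini files are replaced by an empty string in this sketch.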
var_additionalBTAGALGOS: DeepAK8_bbVSlight,DeepAK8_bbVST
The variable definition for the plots is in vhbbPlotDef.ini.
6 - Running framework - BDT Training Step
** In plots.ini:
trainingBKG = <!Plot_general|WJet!>,<!Plot_general|DY!>,<!Plot_general|ST!>,<!Plot_general|TT!>,<!Plot_general|VV!>
trainingSig = <!Plot_general|allSIG!>
where allSIG in trainingSig contains 'WminusH','WplusH','ZH','ggZH', and trainingBKG contains all the backgrounds *except* QCD (because it's spiky).
///////
**Variables used for training:
Nominal:
FatJet_msoftdrop_nom
FatJet_pt_nom
MET_Pt
V_mt
SA5
FatJet_pt[Hbb_fjidx]/V_pt
abs(FatJet_eta[Hbb_fjidx]-V_eta)
FatJet_deepTagMD_bbvsLight[Hbb_fjidx]
1/(1+(FatJet_deepTagMD_TvsQCD[Hbb_fjidx]/FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])*(1-FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])/(1-FatJet_deepTagMD_TvsQCD[Hbb_fjidx]))
///////
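The last training variable can be simplified algebraically (our rewriting, not part of the original configuration). Writing h = FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx] and t = FatJet_deepTagMD_TvsQCD[Hbb_fjidx], and O_x for the odds of a score x, it is the H->bb-vs-QCD odds divided by the sum of the H->bb and top odds: it tends to 1 when the jet looks more like H->bb than top, and to 0 in the opposite case.

```latex
\begin{aligned}
D &= \frac{1}{1 + \dfrac{t}{h}\,\dfrac{1-h}{1-t}}
   = \frac{h\,(1-t)}{h\,(1-t) + t\,(1-h)}
   = \frac{O_h}{O_h + O_t},
\qquad O_x \equiv \frac{x}{1-x}
\end{aligned}
```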
** In training.ini:
- systematics = nominal
./submit.py -T LxplusZll -F cachetraining-v1 -J cachetraining [cachetraining step] (./submit.py -T Wlv2016 -J checklogs --resubmit to resubmit killed jobs)
./submit.py -T LxplusZll -F runtraining-v1 -J runtraining [training step]
6B - Running framework - BDT Training Step [adding systematics]
Using /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/MakeSysList.py, I get the list of UP/DOWN variations of the BDT inputs for the systematic uncertainties, to be added to training.ini
7 - Running framework - BDT Evaluation Step
./submit.py -T Zvv2017 -F eval-v1 -J eval -I -N 1
If there are failed jobs, launch:
./submit.py -T LxplusZll -F eval-v1-reprocess-failed-jobs -J eval -k -N 1
To run with the DeepAK8 weights included: ./submit.py -T Wlv2018 -J run --modules=Eval.BDT_Wlv_BOOSTFinal_wdB,VHbbCommon.DoubleBtagSF -F tmp --input MVAin --output MVAout
To run with the DeepAK8 weights + the new flavour definition (needed in 2018; in 2017 the flavour definition is already in the ntuple): ./submit.py -T Wlv2018 -J run --modules=Eval.BDT_Wlv_BOOSTFinal_wdB,VHbbCommon.DoubleBtagSF,VHbbCommon.HeppyStyleGen -F tmp --input MVAin --output MVAout
For DNN evaluation, if I have a new DNN:
./submit.py -T Wlv2017 -F log -J run --input MVAin --output MVAout --addCollections Sys.Eval
where Eval contains the DNN to be evaluated (inside the block Sys in general.ini)
8 - Running framework - Datacards
1) Cache the datacards: ./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 ('./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 -k' to resubmit the failed jobs). The output of this step goes to the /tmp directory (as for the cacheplot step).
2) Produce the datacards: ./submit.py -T LxplusZll -F rundc-v1 -J rundc
3) Merge the ROOT files: ./submit.py -T LxplusZll -F rundc-v1 -J mergedc
4) Run Combine to get the significance (see section 9)
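Since each datacard step needs the previous one to have finished (with failed cachedc jobs resubmitted via -k), it can help to generate the commands for a given config tag and folder version in one place, so the folder names stay consistent. A minimal sketch; the function name is hypothetical and it only prints the commands listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: datacard_commands TAG VERSION
# Prints the cachedc / rundc / mergedc commands in the order they must be run.
datacard_commands() {
    local tag=$1 v=$2
    echo "./submit.py -T $tag -F cachedc-$v -J cachedc --parallel=8"
    echo "./submit.py -T $tag -F rundc-$v -J rundc"
    echo "./submit.py -T $tag -F rundc-$v -J mergedc"   # same folder as rundc
}
```

Usage: `datacard_commands LxplusZll v1` reproduces steps 1) to 3); run each line only after the previous step has completed.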
9 - Running framework - CombineHarvester Statistical method
Datacards produced in Step 8.3 are in log_Wl2016_v2/run-dc/Limits/*txt
To merge all the datacards: python ../../../../scripts/combineCards.py Wlfe=vhbb_DC_TH_Wle_Wlfv11_BOOST.txt Wlfm=vhbb_DC_TH_Wlm_Wlfv11_BOOST.txt Whfe=vhbb_DC_TH_Wle_Whf_BOOST.txt Whfm=vhbb_DC_TH_Wlm_Whf_BOOST.txt tte=vhbb_DC_TH_Wle_tt_BOOST.txt ttm=vhbb_DC_TH_Wlm_tt_BOOST.txt SRe=vhbb_DC_TH_Sige_BOOST.txt SRm=vhbb_DC_TH_Sigu_BOOST.txt > vhbb_DC_TH_M125_Wlv_Boostovb.txt
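combineCards.py takes "label=datacard" pairs, so the list can be kept in a bash array and checked before merging, which avoids a silently incomplete combined card if one region's datacard is missing. A sketch using the labels and files above; the array and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Label=datacard pairs as in the combineCards.py command above.
cards=(
    Wlfe=vhbb_DC_TH_Wle_Wlfv11_BOOST.txt
    Wlfm=vhbb_DC_TH_Wlm_Wlfv11_BOOST.txt
    Whfe=vhbb_DC_TH_Wle_Whf_BOOST.txt
    Whfm=vhbb_DC_TH_Wlm_Whf_BOOST.txt
    tte=vhbb_DC_TH_Wle_tt_BOOST.txt
    ttm=vhbb_DC_TH_Wlm_tt_BOOST.txt
    SRe=vhbb_DC_TH_Sige_BOOST.txt
    SRm=vhbb_DC_TH_Sigu_BOOST.txt
)

# Hypothetical helper: check_cards LABEL=FILE... ; returns 1 if any FILE is missing
check_cards() {
    local pair missing=0
    for pair in "$@"; do
        [ -f "${pair#*=}" ] || { echo "missing datacard: ${pair#*=}" >&2; missing=1; }
    done
    return "$missing"
}
```

Usage, from the Limits/ directory: `check_cards "${cards[@]}" && python ../../../../scripts/combineCards.py "${cards[@]}" > vhbb_DC_TH_M125_Wlv_Boostovb.txt`.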
Using CombineHarvester to get the significance: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/
-> in the /script directory, the datacards are inside the Limits/ folder (with and without DeepAK8 in the BDT training for the SR)
10 - Running framework - Fit Convergence
combine -M FitDiagnostics -m 125 --robustFit=1 --stepSize=0.01 --X-rtd MINIMIZER_MaxCalls=9999999 --cminApproxPreFitTolerance=10 --saveNorm -v 3 --saveShapes --saveWithUncertainties --cminPreScan vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt
Datacards with the jet scale and resolution uncertainties removed: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/CombinedLimit/scripts/vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt
11 - Running framework - Producing prefit and postfit plots
Prefit:
./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test
Postfit:
./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test --set="Fit.FitType:=shapes_fit_s"
12 - Running framework - Running the module that adds the 'isBoosted' flag
./submit.py -T Zvv2017 -J run --modules=VHbbCommon.isBoosted --input MVAin --output MVAout
Statistical tools 1 - Asimov in SR (VHbb)
1) creation of datacards: python scripts/VHLegacy.py --Znn_fwk Xbb --Wmn_fwk Xbb --Wen_fwk Xbb --Zee_fwk Xbb --Zmm_fwk Xbb
==
2) creation of workspace: combineTool.py -M T2W -i output/${COMBFOLDERSTXS}2017/cmb/ -o "ws_stxs_fine.root" -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/.*ZH_lep_PTV_75_150_hbb:r_zhlow[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_0J_hbb:r_zhmednoj[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_GE1J_hbb:r_zhmedwithj[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_GT250_hbb:r_zhhi[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_150_250_0J_hbb:r_whmed[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_150_250_GE1J_hbb:r_whmed[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_GT250_hbb:r_whhi[1,0,5]'
==
3) best fit: combineTool.py -M MultiDimFit -d output/${COMBFOLDERSTXS}2017/cmb/ws_stxs_fine.root --setParameters r_zhlow=1,r_zhmednoj=1,r_zhmedwithj=1,r_zhhi=1,r_whmed=1,r_whhi=1 --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --saveToys --saveWorkspace -t -1 -n .STXSfine_BestFit_prefit
==
4) scan with systematics: combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 1 --job-mode script --task-name STXS_FINE_VH_scans -n .STXS.FINE.VH >jobs1.txt
5) running in batch scan for systematics: for i in $(awk '{print $4}' jobs1.txt); do sbatch --job-name=STXSfit${i/.sh/} --mem=3000M --time=0-01:30 --output=/mnt/t3nfs01/data01/shome/$USER/VHbb/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017/fit_${i/.sh/}.log ./${i} ; done
6) scan without systematics: combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 1 --freezeParameters allConstrainedNuisances --job-mode script --task-name STXS_FINE_VH_scans_frall -n .STXS.fr.all.FINE.VH >jobs2.txt
7) running in batch scan without systematics: for i in $(awk '{print $4}' jobs2.txt); do sbatch --job-name=STXSfit${i/.sh/} --mem=3000M --time=0-00:30 --output=/mnt/t3nfs01/data01/shome/$USER/VHbb/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017/fit_${i/.sh/}.log ./${i} ; done
8) plot likelihood scan:
mkdir VHbb_STXS_scans
cd VHbb_STXS_scans
mkdir results
mkdir plots
mv ../higgsCombine.STXS.FINE.VH.*.root results/
mv ../higgsCombine.STXS.fr.all.FINE.VH.*.root results/
cd results
for P in $(../../scripts/getPOIs_STXS.py STXSfine -P); do hadd -k -f scan.${P}.root higgsCombine.STXS.FINE.VH.${P}.POINTS.*.root; rm higgsCombine.STXS.FINE.VH.${P}.POINTS.*.root; done;
for P in $(../../scripts/getPOIs_STXS.py STXSfine -P); do hadd -k -f scan.${P}.fr.all.root higgsCombine.STXS.fr.all.FINE.VH.${P}.POINTS.*.root; rm higgsCombine.STXS.fr.all.FINE.VH.${P}.POINTS.*.root; done;
cd ../
INPUT="results"; OUTPUT="plots"; for P in $(../scripts/getPOIs_STXS.py STXSfine -P); do eval python ../scripts/plot1DScan.py -o scan_nominal_${P} --POI ${P} --translate ../scripts/pois.json --model STXS --json ${OUTPUT}/STXSfine.json --others \"${INPUT}/scan.${P}.fr.all.root:Freeze all:8\" --breakdown "Syst,Stat" --meta "POIs:${P}" -m ${INPUT}/scan.${P}.root --y-max 10 --no-input-label --outdir ${OUTPUT}/; done
9) final plot: python ../scripts/summaryPlot.py -i 'plots/STXSfine.json:STXS/r_whmed,r_whhi,r_zhlow,r_zhmednoj,r_zhmedwithj,r_zhhi' --vlines '1.0:LineStyle=2' --subline="41.5 fb^{-1} (13 TeV - 2017)" -o plots/summary_stxs --translate ../scripts/pois.json
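Steps 5) and 7) above differ only in the jobs file and the wall time, so they can share one function. This sketch assumes, as the awk '{print $4}' in those loops implies, that the 4th field of each line of the jobs file is the name of a generated job script; the function name and the DRY_RUN switch are hypothetical (DRY_RUN=1 prints the sbatch commands instead of submitting).

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit_scan_jobs JOBSFILE WALLTIME LOGDIR
submit_scan_jobs() {
    local jobs=$1 walltime=$2 logdir=$3 script name
    local run=()
    [ "${DRY_RUN:-0}" = 1 ] && run=(echo)        # print instead of submitting
    for script in $(awk '{print $4}' "$jobs"); do
        name=${script%.sh}                       # strip the trailing .sh
        "${run[@]}" sbatch --job-name="STXSfit${name}" --mem=3000M --time="$walltime" \
            --output="${logdir}/fit_${name}.log" "./${script}"
    done
}
```

Usage: `submit_scan_jobs jobs1.txt 0-01:30 /mnt/t3nfs01/data01/shome/$USER/VHbb/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017` for the scan with systematics, and the same with `jobs2.txt 0-00:30` for the scan without.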
Statistical tools 2 - Asimov in SR (VZbb)
1) as for VH
2a) inclusive: combineTool.py -M T2W -i output/vhbb2017_VZvers1/cmb/ -o "ws_stxs_2017_VZbb_inclusive_2704.root" -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/.*VVHF:r_vzbb[1,0,5]'
2b) STXS: combineTool.py -M T2W -i output/vhbb2017_VZvers1/cmb/ -o "ws_stxs_2017_VZbb_v5_2704_STXSbased.root" --PO verbose -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO 'map=vhbb_Z.*_5_.*/.*VVHF:r_zhmednoj[1,0,5]' --PO 'map=vhbb_Z.*_1_.*/.*VVHF:r_zhlow[1,0,5]' --PO 'map=vhbb_Z.*_9_.*/.*VVHF:r_zhmedwithj[1,0,5]' --PO 'map=vhbb_Z.*_13_.*/.*VVHF:r_zhhi[1,0,5]' --PO 'map=vhbb_Z.*_17_.*/.*VVHF:r_zhhi[1,0,5]' --PO 'map=vhbb_W.*_5_.*/.*VVHF:r_whmed[1,0,5]' --PO 'map=vhbb_W.*_13_.*/.*VVHF:r_whhi[1,0,5]' --PO 'map=vhbb_W.*_17_.*/.*VVHF:r_whhi[1,0,5]'
3) combineTool.py -M MultiDimFit -d output/vhbb2017_VZvers1/cmb/ws_stxs_2017_VZbb_inclusive_2704.root --setParameters r_vzbb=1 --redefineSignalPOIs $(./scripts/getPOIs_STXS_VZbb.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS_VZbb.py STXSfine -r) $(./scripts/getPOIs_STXS_VZbb.py STXSfine -O) --saveInactivePOI=1 --saveToys --saveWorkspace -t -1 -n .STXSfine_BestFit_prefit
4) combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS_VZbb.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS_VZbb.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS_VZbb.py STXSfine -r) $(./scripts/getPOIs_STXS_VZbb.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 3 --job-mode script --task-name STXS_FINE_VH_scans -n .STXS.FINE.VH >jobs1.txt
5) as for VH
6) combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS_VZbb.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS_VZbb.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS_VZbb.py STXSfine -r) $(./scripts/getPOIs_STXS_VZbb.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 3 --freezeParameters allConstrainedNuisances --job-mode script --task-name STXS_FINE_VH_scans_frall -n .STXS.fr.all.FINE.VH >jobs2.txt
7) as for VH
8) as for VH
--> All the _VZbb files (e.g. getPOIs_STXS_VZbb.py) are in: /work/acalandr/merging_resolvedBoosted_2018/2017_final/CMSSW_10_2_13/src/CombineHarvester/VHLegacy/scripts/
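The --PO 'map=...' options of step 2b) follow a regular pattern (channel letter, bin number, POI), so they can be generated from a small table instead of typed by hand. A sketch, with the table copied from 2b); the array and function names are hypothetical.

```bash
#!/usr/bin/env bash
# CHANNEL:BIN:POI entries, in the same order as in step 2b)
vz_stxs_map=(
    Z:5:r_zhmednoj Z:1:r_zhlow Z:9:r_zhmedwithj Z:13:r_zhhi Z:17:r_zhhi
    W:5:r_whmed W:13:r_whhi W:17:r_whhi
)

# Hypothetical helper: po_map_args PROCESS ENTRY...
# Prints "--PO" and "map=..." on separate lines, ready for mapfile.
po_map_args() {
    local proc=$1 entry ch bin poi
    shift
    for entry in "$@"; do
        IFS=: read -r ch bin poi <<< "$entry"
        printf '%s\n' --PO "map=vhbb_${ch}.*_${bin}_.*/.*${proc}:${poi}[1,0,5]"
    done
}
```

Usage: `mapfile -t po < <(po_map_args VVHF "${vz_stxs_map[@]}")`, then append `"${po[@]}"` to the combineTool.py -M T2W command of 2b) in place of the hand-written --PO map options.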
Statistical tools 3 - Unblinded fit to data in the CRs for SF determination
1) Creating the workspace on CR-only datacards/shapes: combineTool.py -M T2W -i output/${COMBFOLDERSTXS}2017/cmb_CRonly/ -o "ws_forCR.root" --PO verbose
2A) Example Fit 1lep: combineTool.py -M FitDiagnostics -d output/vhbb2017/Wln_CRonly/ws_forCR_1lep.root --there --freezeParameters r --X-rtd MINIMIZER_MaxCalls=9999999 --cminPreFit 1 --cminDefaultMinimizerTolerance 10 --verbose 5
2B) Example Fit 2lep: combineTool.py -M FitDiagnostics -d output/vhbb2017/Zll_CRonly/ws_forCR_2lep.root --there --freezeParameters r --X-rtd MINIMIZER_MaxCalls=9999999 --cminPreFit 1 --cminDefaultMinimizerTolerance 10 --verbose 5
Merging datacards for combined fit
Merging datacards:
python scripts/prepareVHbbComb.py --dir2016 vhbb2016 --dir2017 vhbb2017 --dir2018 vhbb2017 --postfix myoutput
Creating a workspace (CR-only fit):
combineTool.py -M T2W -i mycombined_dc.txt -o "ws_combined_CRonly_yeardependentSF.root" --PO verbose
Add --for-fits --no-wrappers to the workspace creation command for faster workspace creation.
Fit (number 1):
combineTool.py --X-rtd MINIMIZER_MaxCalls=9999999 --cminPreFit 1 --cminDefaultMinimizerStrategy 2 --verbose 5 -M MultiDimFit -d workspace.root |& tee file.txt
Fit (number 2), some examples:
1) TT_Znn:
combine -M MultiDimFit -d ws_combined_CRonly_nominalSF.root -v 5 --expectSignal 1 --redefineSignalPOIs SF_TT_Znn_2016,SF_TT_Znn_2017,SF_TT_Znn_2018 --freezeParameters r --algo fixed --fixedPointPOIs SF_TT_Znn_2016=0.985606,SF_TT_Znn_2017=0.985606,SF_TT_Znn_2018=0.985606 --X-rtd MINIMIZER_MaxCalls=9999999 --cminDefaultMinimizerStrategy 0 |& tee file_fit2.txt
2) TT_Wln:
combine -M MultiDimFit -d ws_combined_CRonly_nominalSF.root -v 5 --expectSignal 1 --redefineSignalPOIs SF_TT_Wln_2016,SF_TT_Wln_2017,SF_TT_Wln_2018 --freezeParameters r --algo fixed --fixedPointPOIs SF_TT_Wln_2016=0.900112,SF_TT_Wln_2017=0.900112,SF_TT_Wln_2018=0.900112 --X-rtd MINIMIZER_MaxCalls=9999999 --cminDefaultMinimizerStrategy 0 |& tee file_fit2.txt
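The two "Fit (number 2)" examples differ only in the scale-factor name and the fixed value, which is repeated for the three years. A sketch that builds the --redefineSignalPOIs and --fixedPointPOIs arguments; the function name is hypothetical, and the default years 2016/2017/2018 are the ones used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sf_fixed_point_args NAME VALUE [YEAR...]
sf_fixed_point_args() {
    local name=$1 value=$2
    shift 2
    local years=("$@") pois=() points=() y
    [ ${#years[@]} -eq 0 ] && years=(2016 2017 2018)
    for y in "${years[@]}"; do
        pois+=("SF_${name}_${y}")
        points+=("SF_${name}_${y}=${value}")
    done
    local IFS=,                                  # join array elements with commas
    printf '%s\n' "--redefineSignalPOIs ${pois[*]} --fixedPointPOIs ${points[*]}"
}
```

Usage: `combine -M MultiDimFit -d ws_combined_CRonly_nominalSF.root -v 5 --expectSignal 1 $(sf_fixed_point_args TT_Znn 0.985606) --freezeParameters r --algo fixed --X-rtd MINIMIZER_MaxCalls=9999999 --cminDefaultMinimizerStrategy 0`. The command substitution is left unquoted on purpose so that it splits into separate options.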
P-value calculator:
https://www.emathhelp.net/calculators/probability-statistics/p-value-calculator/?dist=chi&s=31.168&df=1&tail=right
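For one degree of freedom, the right-tail chi-square probability returned by the calculator has a closed form, useful as a cross-check (Phi is the standard normal CDF). If the value s is the discovery test statistic q0 from Combine, the usual asymptotic one-sided significance is Z = sqrt(q0), whose p-value is half the chi-square tail; for the 31.168 in the link, sqrt(31.168) is about 5.58.

```latex
\begin{aligned}
p &= P(\chi^2_1 > s) = P(|Z| > \sqrt{s}) = 2\left[1 - \Phi(\sqrt{s})\right] = \operatorname{erfc}\!\left(\sqrt{s/2}\right),\\
Z_{\text{disc}} &= \sqrt{q_0}, \qquad p_0 = 1 - \Phi(\sqrt{q_0}) = \tfrac{1}{2}\,P(\chi^2_1 > q_0).
\end{aligned}
```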
--
AlessandroCalandri - 2019-07-02