VHbb boosted - Xbb framework

Directory: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python

1lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2016config

2lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Zll2016Nanoconfig

1lep 2017: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2017config_boostedanalysis

Training 1lep boosted 2017: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/weights_1lep2017_boosted

1 - Running framework - List of inputs

for j in `ls -1`; do for i in `ls -1 ${j}/*/*/*/*.root`; do echo ${PWD}/${i} | sed 's#eos/cms/##g'; done > ./new_dir/${j}.txt; done
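The shell loop above can also be written in Python, which avoids parsing `ls` output. This is only a sketch: the function name `write_file_lists` is hypothetical, the four-level directory depth is copied from the glob in the one-liner, and stripping `eos/cms/` mirrors what the `sed` expression appears intended to do.

```python
from pathlib import Path


def write_file_lists(base_dir, out_dir, strip="eos/cms/"):
    """Write one <dataset>.txt per sub-directory of base_dir, listing its ROOT files."""
    base = Path(base_dir).resolve()
    out = Path(out_dir).resolve()
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for dataset in sorted(p for p in base.iterdir() if p.is_dir()):
        if dataset == out:
            continue  # do not treat the output directory as a dataset
        # Same depth as the shell glob ${j}/*/*/*/*.root
        files = sorted(dataset.glob("*/*/*/*.root"))
        # Remove the (assumed) mount prefix from each absolute path
        lines = [str(f).replace(strip, "") for f in files]
        target = out / (dataset.name + ".txt")
        target.write_text("".join(line + "\n" for line in lines))
        written[dataset.name] = lines
    return written
```

Called as `write_file_lists(".", "./new_dir")` from the directory that holds the datasets, it produces the same kind of per-dataset text files as the loop.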

2 - Running framework - Prep Step

1) Prep step
./submit.py -T Wlv2016 -F prep-v1 -J prep -N 10

--> check all samples are processed:

./submit.py -T Wlv2016 -J checklogs --resubmit
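When the same submit/check cycle is repeated for several configurations, the command lines can be assembled in a small helper. The sketch below only builds argument lists from options that appear on this page (`-T`, `-F`, `-J`, `-N`, `--resubmit`); the helper name `submit_command` is hypothetical and nothing is executed.

```python
def submit_command(tag, job, folder=None, n=None, resubmit=False):
    """Return the argv list for one ./submit.py call (nothing is run)."""
    cmd = ["./submit.py", "-T", tag]
    if folder is not None:
        cmd += ["-F", folder]
    cmd += ["-J", job]
    if n is not None:
        cmd += ["-N", str(n)]  # value passed through to -N unchanged
    if resubmit:
        cmd.append("--resubmit")
    return cmd


prep = submit_command("Wlv2016", "prep", folder="prep-v1", n=10)
check = submit_command("Wlv2016", "checklogs", resubmit=True)
print(" ".join(prep))   # ./submit.py -T Wlv2016 -F prep-v1 -J prep -N 10
print(" ".join(check))  # ./submit.py -T Wlv2016 -J checklogs --resubmit
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the Xbb/python directory.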

PREPout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/prep_v1/

--> New method for PREPStep: ./submit.py -T Zll2016Nano -J run --modules=Prep.VHbb -F prep --input PREPin --output PREPout --set='Directories.samplefiles:=<!Directories|samplefiles_split!>'

3 - Running framework - Sys Step

./submit.py -T Wlv2016 -F sysnew-sys -J sysnew --addCollections Sys.sys_all_BoostedAndResolved  -I

List of systematic uncertainties included in general.ini:

sys_all_BoostedAndResolved = ['Sys.TTweights','Sys.LeptonWeights','Sys.EWKweights','Sys.BTagWeights','Sys.isSignal', 'Sys.isWH', 'Sys.isData', 'Sys.HeppyStyleGen', 'Sys.FitCorr','Sys.GetTopMass','Sys.GetWTMass','Sys.DYspecialWeight','Sys.VptWeightSimFit','Sys.DoubleBTagWeightsSimFit'] 

The Python module for each systematic uncertainty is in myutils/.

--> run without the Higgs module 'Sys.HiggsCandidateSystematics' to make it faster. The step then has to be rerun with the Higgs module included once the plots are done.

--> using BTagWeight cMVAv2 for the 94X campaign (2016 reprocessing), csv file: cMVAv2_Moriond17_B_H.csv

SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v6/

--> check all samples are processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

The weights in 'weightF' (in general.ini) are only available after the SYS step, because they are computed by the modules listed in general.ini that implement the systematic uncertainties.

4 - Running framework - Cacheplot Step

./submit.py -T Wlv2016 -F cacheplot-v2 -J cacheplot -i 

In order to cache one specific sample, add '-S SingleElectron' to the command.

--> check all samples are processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

Output of the cache step: tmpSamples = root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/tmp/v3/ [no need to change the cacheplot output directory each time, as the hash module takes care of picking the correct samples. Everything depends on plottingSamples: <!Directories|SYSout!>]
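Why a fixed tmpSamples directory is safe can be illustrated with a hash-keyed cache: the file name is derived from everything that defines the cached sample, so a new input directory or cut gives a different file instead of reusing a stale one. The sketch below is not the Xbb hash module. The functions `cache_key` and `cached_file`, and the choice of fields entering the hash, are assumptions made for illustration.

```python
import hashlib


def cache_key(sample, input_dir, cut):
    """Hypothetical cache key: changes whenever sample, input directory or cut changes."""
    payload = "\n".join([sample, input_dir, cut]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def cached_file(tmp_dir, sample, input_dir, cut):
    """Build the cache file path for one sample inside a shared tmp directory."""
    key = cache_key(sample, input_dir, cut)
    return "{}/{}_{}.root".format(tmp_dir.rstrip("/"), sample, key)


a = cached_file("tmp/v3", "SingleElectron", "sys_v6", "V_pt > 250")
b = cached_file("tmp/v3", "SingleElectron", "sys_v9", "V_pt > 250")
print(a != b)  # True: a new SYSout maps to a new cache file in the same directory
```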

--> Plots can also be made right after the PREP step, but some of the weights will be missing because they are only calculated at the SYS step (they come from the evaluation of the modules for the systematic uncertainties).

5 - Running framework - Plot Step

Update vhbbPlotDef.ini (new variables) and add the corresponding entries to plots.ini.

NB: Cut_BOOST = (<!General|Boost_doubleb!> && <!General|DphiMET_Lep!> < 2 && <!General|NaddLep!> == 0 && V_pt > 250) in cuts.ini --> with Boost_doubleb it does not complain about btag_jetidx being wrong when no fat jet is found. For the prep step, I had <!General|Boost_doubleb!>, which was more inclusive.
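The `<!Section|key!>` references in the cut appear to be replaced by the value of `key` in `[Section]` before the cut is applied. As a rough illustration of that substitution (not the framework's own parser), the sketch below expands such references from a nested dictionary. The function name `expand` is hypothetical and the `Boost_doubleb`, `DphiMET_Lep` and `NaddLep` values are made-up placeholders.

```python
import re

REFERENCE = re.compile(r"<!([^|!]+)\|([^!]+)!>")


def expand(text, config, depth=10):
    """Replace <!Section|key!> by config[Section][key], following nested references."""
    for _ in range(depth):
        new = REFERENCE.sub(lambda m: config[m.group(1)][m.group(2)], text)
        if new == text:
            return new  # nothing left to substitute
        text = new
    raise ValueError("references nested too deeply (possible cycle)")


config = {
    "General": {
        # Placeholder definitions, NOT the real selections from general.ini
        "Boost_doubleb": "(Hbb_fjidx > -1)",
        "DphiMET_Lep": "abs(dPhiMetLep)",
        "NaddLep": "nAddLeptons",
    }
}
cut = ("(<!General|Boost_doubleb!> && <!General|DphiMET_Lep!> < 2"
       " && <!General|NaddLep!> == 0 && V_pt > 250)")
print(expand(cut, config))
```

Printing the expanded string is a quick way to check which selection a cut resolves to.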

Additional line in plots.ini to define which variable to plot:

var_additionalBTAGALGOS: DeepAK8_bbVSlight,DeepAK8_bbVST 
The variable definition for the plots is in vhbbPlotDef.ini.

6 - Running framework - BDT Training Step

** In plots.ini:

trainingBKG = <!Plot_general|WJet!>,<!Plot_general|DY!>,<!Plot_general|ST!>,<!Plot_general|TT!>,<!Plot_general|VV!>

trainingSig = <!Plot_general|allSIG!>

where allSIG in trainingSig is 'WminusH','WplusH','ZH','ggZH' and trainingBKG contains all the backgrounds *except* QCD (because it's spiky).


**Variables used for training:

Nominal: FatJet_msoftdrop_nom FatJet_pt_nom MET_Pt V_mt SA5 FatJet_pt[Hbb_fjidx]/V_pt abs(FatJet_eta[Hbb_fjidx]-V_eta) FatJet_deepTagMD_bbvsLight[Hbb_fjidx] 1/(1+(FatJet_deepTagMD_TvsQCD[Hbb_fjidx]/FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])*(1-FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])/(1-FatJet_deepTagMD_TvsQCD[Hbb_fjidx]))
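The last training input combines the two mass-decorrelated DeepAK8 scores into one Hbb-versus-top quantity. With T = TvsQCD and H = HbbvsQCD it reads 1/(1 + [T/(1-T)]/[H/(1-H)]), so it compares the odds of the two scores. A minimal sketch that transcribes the expression (the function name `hbb_vs_top` is hypothetical; as in the original expression, H = 0 and T = 1 are not handled):

```python
def hbb_vs_top(t_vs_qcd, hbb_vs_qcd):
    """Direct transcription of the training expression.

    Undefined for hbb_vs_qcd == 0 or t_vs_qcd == 1 (division by zero).
    """
    t, h = t_vs_qcd, hbb_vs_qcd
    return 1.0 / (1.0 + (t / h) * (1.0 - h) / (1.0 - t))


print(hbb_vs_top(0.5, 0.5))  # equal scores give 0.5
print(hbb_vs_top(0.1, 0.9))  # Hbb-like jet, value close to 1
print(hbb_vs_top(0.9, 0.1))  # top-like jet, value close to 0
```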


** In training.ini:

- systematics = nominal

./submit.py -T LxplusZll -F cachetraining-v1 -J cachetraining [cachetraining step] (to repeat killed jobs: ./submit.py -T Wlv2016 -J checklogs --resubmit)

./submit.py -T LxplusZll -F runtraining-v1 -J runtraining [training step]

6B - Running framework - BDT Training Step [adding systematics]

Running /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/MakeSysList.py gives the list of UP/DOWN variations of the systematic uncertainties on the BDT inputs, to be added in training.ini.
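The kind of list that goes into training.ini can be pictured as every systematic source expanded into an Up and a Down entry. The sketch below is not MakeSysList.py. The `_Up`/`_Down` suffixes, the two source names and the printed `systematics = ...` layout are assumptions for illustration; the real naming is whatever MakeSysList.py prints.

```python
def up_down_variations(sources, suffixes=("Up", "Down")):
    """Expand each systematic source into its variations, e.g. 'jer' -> 'jer_Up', 'jer_Down'."""
    return ["{}_{}".format(src, sfx) for src in sources for sfx in suffixes]


# Hypothetical source names, used only to show the expansion
variations = up_down_variations(["jer", "jesTotal"])
print("systematics = nominal " + " ".join(variations))
```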

7 - Running framework - BDT Evaluation Step

./submit.py -T LxplusZll -F eval-v1 -J eval

If there are failed jobs, launch:

./submit.py -T LxplusZll -F eval-v1-reprocess-failed-jobs -J eval -k -N 1

8 - Running framework - Datacards

1) cache the datacards: ./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 ('./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 -k' to resubmit the failed jobs). The output of this step goes to the /tmp directory (as for the cacheplot step)

2) produce datacards: ./submit.py -T LxplusZll -F rundc-v1 -J rundc

3) merge root files: ./submit.py -T LxplusZll -F rundc-v1 -J mergedc

4) run Combine to get the Significance (see section 9)

9 - Running framework - CombineHarvester Statistical method

Datacards produced in Step 8.3 are in log_Wl2016_v2/run-dc/Limits/*txt

To merge all the datacards: python ../../../../scripts/combineCards.py Wlfe=vhbb_DC_TH_Wle_Wlfv11_BOOST.txt Wlfm=vhbb_DC_TH_Wlm_Wlfv11_BOOST.txt Whfe=vhbb_DC_TH_Wle_Whf_BOOST.txt Whfm=vhbb_DC_TH_Wlm_Whf_BOOST.txt tte=vhbb_DC_TH_Wle_tt_BOOST.txt ttm=vhbb_DC_TH_Wlm_tt_BOOST.txt SRe=vhbb_DC_TH_Sige_BOOST.txt SRm=vhbb_DC_TH_Sigu_BOOST.txt > vhbb_DC_TH_M125_Wlv_Boostovb.txt
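With eight channels the command line is easy to mistype, so the `label=file` pairs can be generated from a mapping. This sketch only assembles the argument list of the command above; the helper name `combine_cards_args` is hypothetical, while the script path and file names are copied from that command.

```python
def combine_cards_args(cards, script="../../../../scripts/combineCards.py"):
    """Return the argv list for combineCards.py from an ordered {label: datacard} mapping."""
    pairs = ["{}={}".format(label, path) for label, path in cards.items()]
    return ["python", script] + pairs


cards = {
    "Wlfe": "vhbb_DC_TH_Wle_Wlfv11_BOOST.txt",
    "Wlfm": "vhbb_DC_TH_Wlm_Wlfv11_BOOST.txt",
    "Whfe": "vhbb_DC_TH_Wle_Whf_BOOST.txt",
    "Whfm": "vhbb_DC_TH_Wlm_Whf_BOOST.txt",
    "tte": "vhbb_DC_TH_Wle_tt_BOOST.txt",
    "ttm": "vhbb_DC_TH_Wlm_tt_BOOST.txt",
    "SRe": "vhbb_DC_TH_Sige_BOOST.txt",
    "SRm": "vhbb_DC_TH_Sigu_BOOST.txt",
}
args = combine_cards_args(cards)
print(" ".join(args))
```

As in the original command, the standard output of the script still has to be redirected to vhbb_DC_TH_M125_Wlv_Boostovb.txt.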

Using CombineHarvester to get the significance: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/

-> in the /script directory, the datacards are inside the Limits/ folder (with and without DeepAK8 in the BDT training for the SR)

10 - Running framework - Fit Convergence

combine -M FitDiagnostics -m 125 --robustFit=1 --stepSize=0.01 --X-rtd MINIMIZER_MaxCalls=9999999 --cminApproxPreFitTolerance=10 --saveNorm -v 3 --saveShapes --saveWithUncertainties --cminPreScan vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt

Datacards with the jet resolution and shape corrections removed: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/CombinedLimit/scripts/vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt

11 - Running framework - Producing prefit and postfit plots

Prefit: ./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test

Postfit: ./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test --set="Fit.FitType:=shapes_fit_s"

***Ongoing production V1***

- PREP STEP [DONE]: PREPout : root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/prep_v2/

- SYS STEP w/o HiggsCandidateSystematics [DONE]: SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v9/

- PLOT STEP: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src//Xbb/python/logs_Wlv2016_v2//runplot-v11/Plots/

- PLOT STEP (with DeepAK8 cuts on SR/CR to be used for the fit): /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src//Xbb/python/logs_Wlv2016_v2//runplot-v16/Plots/

////// ////// ////// //////

- SYS STEP w HiggsCandidateSystematics [still to be done]: SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v10/

////// ////// ////// //////


v1 has the full set of training variables in V11 with DeepAK8 inputs: Nominal: FatJet_msoftdrop FatJet_pt MET_Pt V_mt SA5 FatJet_pt[Hbb_fjidx]/V_pt abs(FatJet_eta[Hbb_fjidx]-V_eta) FatJet_deepTagMD_bbvsLight[Hbb_fjidx] 1/(1+(FatJet_deepTagMD_TvsQCD[Hbb_fjidx]/FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])*(1-FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])/(1-FatJet_deepTagMD_TvsQCD[Hbb_fjidx]))

v2 has the full set of training variables in V11 WITHOUT DeepAK8 inputs: Nominal: FatJet_msoftdrop FatJet_pt MET_Pt V_mt SA5 FatJet_pt[Hbb_fjidx]/V_pt abs(FatJet_eta[Hbb_fjidx]-V_eta)

////// ////// ////// //////

MVAin: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v9/

MVA out: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/mva_v6/ (BDT training in SR WITH DeepAK8 - default)

MVA out: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/mva_v3/ (BDT training in SR WITHOUT DeepAK8)

////// ////// ////// //////

- DC step: data cards are in log_Wl2016_v2/run-dc/Limits/*txt

- DC step (stat+sys): /mnt/t3nfs01/data01/shome/acalandr/Logs/eval-v6/Logs/rundc-v1/Limits/

Some useful scripts

1) to remove systematic uncertainties: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/remove_sys.py

2) to plot systematics: /work/gaperrin/VHbb2018/CMSSW_10_1_0/src/Xbb/python/plot_systematics.py - python plot_systematics.py -C Wlv2016Nanoconfig/general.ini -C Wlv2016Nanoconfig/datacards.ini -C Wlv2016Nanoconfig/plots.ini -C Wlv2016Nanoconfig/paths.ini -C Wlv2016Nanoconfig/vhbbPlotDef.ini -C Wlv2016Nanoconfig/samples_nosplit.ini

Statistical tools

I am working here: /t3home/acalandr/VHbb/boosted_2016/combineHarvester/CMSSW_10_2_13/src/CombineHarvester/CombineTools

1) creating workspace from txt: text2workspace.py datacard_1lep2016_boosted.txt -o ws_boosted.root

2) FitDiagnostics: combine -M FitDiagnostics -t -1 --expectSignal 1 -d ws_boosted_2016_1lep_29Aug_noBinByBinUnc.root --cminDefaultMinimizerStrategy 0 -v 3 --freezeParameters CMS_vhbb_scale_j_PileUpPtEC1_13TeV

This command creates: error on mu (POI), postfit plots

3) Checking the nuisance parameters (NP): first run combineTool.py -M GenerateOnly -m 125 -t -1 --expectSignal 1 --saveToys -d ws_boosted.root, then combineTool.py -M FastScan -w ws_boosted.root -d higgsCombine.Test.GenerateOnly.mH125.123456.root -f fitDiagnostics.root:fit_s, which produces nll.pdf

4) Extract error on POI: combine -M MultiDimFit -t -1 --expectSignal 1 --algo singles -m 125 -d ws_boosted.root --cminDefaultMinimizerStrategy 0

5) Doing impact plots: combineTool.py -M Impacts -d htt_tt.root -m 125 --doInitialFit --robustFit 1 --cminDefaultMinimizerStrategy 0, then combineTool.py -M Impacts -d htt_tt.root -m 125 --robustFit 1 --doFits --cminDefaultMinimizerStrategy 0

Validation on 1lep 2017

--> Framework: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2017config_boostedanalysis


1) Prep Step --> DONE!

a) Logfile: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017boosted/prep-v1

Output files: PREPout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2017boosted/Wlv/prep_v4/

b) Launch command to reprocess samples --> DONE!!


2) Sys step --> DONE!!

a) SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2017boosted/Wlv/sys_v3/

b) Launch command to reprocess samples --> DONE!!


3) Plots SR/CR --> DONE!!

a) cacheplot --> Log: cacheplot-v1

b) plot


4) BDT training + significance --> DONE!!

Cachetraining: MVAout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2017boosted/Wlv/mva_v2/

Log: cachetraining-v2


5) BDT evaluation [Saturday/Sunday] --> ONGOING

MVAout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2017boosted/Wlv/mva_v6/

Log: eval-v2

a) launch command [Sunday] --> DONE

b) launch command to reprocess samples [Sunday] --> DONE

c) plots BDT [Sunday] Log: cacheplot_v3 Log: runplot_v12


6) adding SYS to training and to evaluation --> ONGOING!!

MVAout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2017boosted/Wlv/mva_v8/

Log: eval-v8


7) creation of datacards

a) cachedc --> DONE!! Log: cachedc-v1

b) produce DC --> ONGOING!! Log: rundc-v1 [where DC are stored]

c) merge root files --> DONE!! Log: rundc-v1

Where to find DC: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017boosted/rundc-v1/Limits

d) merge DC --> DONE!!

Working here: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/

Merged datacard: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017boosted/rundc-v1/Limits/vhbb_DC_TH_M125_Wlv_Boostovb_Sept30.txt


8) stat-only results on significance --> DONE!!

cd /t3home/acalandr/VHbb/boosted_2016/combineVersionPirmin/CMSSW_8_1_0

Significance: combine -M Significance /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017boosted/rundc-v1/Limits/vhbb_DC_TH_M125_Wlv_Boostovb_Sept30.txt -t -1 --expectSignal=1 --X-rtd MINIMIZER_MaxCalls=9999999 --cminDefaultMinimizerStrategy 2 -v 5 --cminApproxPreFitTolerance=10 --X-rtd MINIMIZER_analytic --freezeParameters all


9) stat+sys results on significance --> ONGOING

cd /t3home/acalandr/VHbb/boosted_2016/combineVersionPirmin/CMSSW_8_1_0

Significance: combine -M Significance /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017boosted/rundc-v1/Limits/vhbb_DC_TH_M125_Wlv_Boostovb_Sept30.txt -t -1 --expectSignal=1 --X-rtd MINIMIZER_MaxCalls=9999999 --cminDefaultMinimizerStrategy 2 -v 5 --cminApproxPreFitTolerance=10 --X-rtd MINIMIZER_analytic

10) stat+sys results on significance --> DONE

cd /t3home/acalandr/VHbb/boosted_2016/combineHarvester/CMSSW_10_2_13/src/CombineHarvester/CombineTools

text2workspace.py /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2017boosted/rundc-v1/Limits/vhbb_DC_TH_M125_Wlv_Boostovb_Sept30.txt -o ws_boosted_2017_1lep.root

combine -M MultiDimFit -t -1 --expectSignal 1 --algo singles -m 125 -d ws_boosted_2017_1lep.root --cminDefaultMinimizerStrategy 0 -v


*Implementation of boosted analysis in resolved Xbb framework and in the VHbb Legacy CH area*

1) Changes: https://github.com/piberger/Xbb/pull/102/files

2) Pull request: https://github.com/piberger/Xbb/pull/102

3) myXbb repo: https://github.com/acalandr/Xbb

4) My Xbb area for tests: https://github.com/acalandr/Xbb/tree/boostedAnalysis/python/

5) t3 area: /t3home/acalandr/VHbb/boosted_2016/merge_resolevd_boosted/CMSSW_10_1_0/src/Xbb/

--> OVERLAP: treated by removing the *_High_ entries from the resolved list and adding the pieces to the boosted list in the datacards.ini file

*VHbb Legacy CH area - implementation of boosted analysis*

--> directory: /t3home/acalandr/VHbb/boosted_2016/merge_resolevd_boosted/CMSSW_10_1_0/src/Xbb/

Pirmin’s CH legacy analysis: https://gitlab.cern.ch/piberger/VHLegacy

--> Fork of the CH area: https://gitlab.cern.ch/cms-hcg/ch-areas/VHLegacy


Area (in t3ui04): /t3home/acalandr/VHbb/boosted_2016/merge_resolevd_boosted/CH_VHbb_Legacy/CMSSW_10_2_13/src/CombineHarvester/VHLegacy

*Stuff for qstat/batch*

alias qs="qstat -xml | tr '\n' ' ' | sed 's#<job_list[^>]*>#\n#g' | sed 's#<[^>]*>##g' | grep ' ' | column -t"


qs|grep Wlv2017 | awk '{print $1}' | tr '\n' ' ' (to select 2017 jobs)
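The same selection can be done by parsing the XML directly instead of flattening it with `sed`. This is a sketch under the assumption that `qstat -xml` emits `job_list` elements containing `JB_job_number` and `JB_name` children, as Grid Engine does; the function name `job_ids` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def job_ids(qstat_xml, name_contains):
    """Return the job numbers of all jobs whose name contains the given substring."""
    root = ET.fromstring(qstat_xml)
    ids = []
    for job in root.iter("job_list"):
        name = job.findtext("JB_name", default="")
        if name_contains in name:
            ids.append(job.findtext("JB_job_number", default=""))
    return ids


# Small hand-written sample in the shape assumed for `qstat -xml` output
sample = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <JB_name>Wlv2017_prep_1</JB_name>
    </job_list>
    <job_list state="running">
      <JB_job_number>102</JB_job_number>
      <JB_name>Wlv2016_sys_4</JB_name>
    </job_list>
  </queue_info>
</job_info>"""
print(" ".join(job_ids(sample, "Wlv2017")))  # 101
```

On the batch system the input would come from `subprocess.check_output(["qstat", "-xml"])` instead of the sample string.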

-- AlessandroCalandri - 2019-07-02

Topic revision: r30 - 2019-11-08 - AlessandroCalandri