VHbb boosted - Xbb framework

Directory: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python

1lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2016config

2lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Zll2016Nanoconfig

1 - Running framework - List of inputs

for j in `ls -1`; do for i in `ls -1 ${j}/*/*/*/*.root`; do echo ${PWD}/${i} | sed 's|eos/cms/||g'; done > ./new_dir/${j}.txt; done

(the sed strips the eos/cms/ prefix from each path before it is written to the per-sample list)
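The same list-building can be sketched in Python (directory layout and the eos/cms prefix as in the one-liner above; the output directory is assumed to exist):

```python
import glob
import os

def make_filelists(base_dir, out_dir, strip_prefix="eos/cms/"):
    """For each sample directory under base_dir, write out_dir/<sample>.txt
    listing the absolute paths of its ROOT files, with the EOS prefix removed."""
    for sample in sorted(os.listdir(base_dir)):
        # same depth as the shell glob: sample/*/*/*/*.root
        pattern = os.path.join(base_dir, sample, "*", "*", "*", "*.root")
        files = sorted(glob.glob(pattern))
        if not files:
            continue
        with open(os.path.join(out_dir, sample + ".txt"), "w") as out:
            for path in files:
                out.write(os.path.abspath(path).replace(strip_prefix, "") + "\n")
```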

2 - Running framework - Prep Step

1) Prep step
 ./submit.py -T Wlv2016 -F prep-v1 -J prep -N 10   

-> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

PREPout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/prep_v1/

3 - Running framework - Sys Step

./submit.py -T Wlv2016 -F sysnew-sys -J sysnew --addCollections Sys.sys_all_BoostedAndResolved  -I

List of systematic uncertainties included in general.ini:

sys_all_BoostedAndResolved = ['Sys.TTweights','Sys.LeptonWeights','Sys.EWKweights','Sys.BTagWeights','Sys.isSignal', 'Sys.isWH', 'Sys.isData', 'Sys.HeppyStyleGen', 'Sys.FitCorr','Sys.GetTopMass','Sys.GetWTMass','Sys.DYspecialWeight','Sys.VptWeightSimFit','Sys.DoubleBTagWeightsSimFit'] 

The Python module for each systematic uncertainty is in myutils/.

-> run without the Higgs module ('Sys.HiggsCandidateSystematics') to make it faster; the Higgs module then has to be run afterwards, once the plots are done

-> using the cMVAv2 BTagWeight for the 94X campaign (2016 reprocessing); CSV file: cMVAv2_Moriond17_B_H.csv

SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v6/

-> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

The weights in 'weightF' (in general.ini) are only available after the SYS step, because they are produced by the modules listed in general.ini that account for the systematic uncertainties.
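The weightF entries are typically per-event weight factors that are multiplied together into the final event weight; a minimal illustration of that convention (the factor values below are made up, not actual weightF content):

```python
def total_event_weight(factors):
    """Multiply individual weight factors (as listed in weightF) into a
    single per-event weight; an empty list gives the neutral weight 1.0."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total

# Hypothetical per-event factors, e.g. generator weight, b-tag SF, lepton SF:
weight = total_event_weight([1.02, 0.97, 1.01])
```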

4 - Running framework - Cacheplot Step

./submit.py -T Wlv2016 -F cacheplot-v2 -J cacheplot -i 

To cache one specific sample, add '-S SingleElectron' to the command.

-> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

Output of the cache step: tmpSamples = root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/tmp/v3/ [there is no need to change the cache output each time: the hash module takes care of picking the correct samples. Everything depends on plottingSamples: <!Directories|SYSout!>]

--> Plots can already be made after the PREP step, but some of the weights may be missing, since some weights are only computed at the SYS step (they come from evaluating the systematic-uncertainty modules).

5 - Running framework - Plot Step

Add new variables in VHbbPlotDef and register the corresponding plots in plot.ini.

NB: Cut_BOOST = (<!General|Boost_doubleb!> && <!General|DphiMET_Lep!> < 2 && <!General|NaddLep!> == 0 && V_pt > 250) in cuts.ini -> using Boost_doubleb avoids complaints about btag_jetidx being wrong when no fat jet is found. For the prep step, <!General|Boost_doubleb!> alone was used, which is more inclusive.

Additional line in plots.ini to define which variable to plot:

var_additionalBTAGALGOS: DeepAK8_bbVSlight,DeepAK8_bbVST 
The variable definition for the plots is in vhbbPlotDef.ini.

6 - Running framework - BDT Training Step

** In plots.ini:

trainingBKG = <!Plot_general|WJet!>,<!Plot_general|DY!>,<!Plot_general|ST!>,<!Plot_general|TT!>,<!Plot_general|VV!>

trainingSig = <!Plot_general|allSIG!>

where allSIG in trainingSig is 'WminusH','WplusH','ZH','ggZH', and trainingBKG contains all the backgrounds *except* QCD (excluded because its distributions are spiky).

///////

**Variables used for training:

Nominal: FatJet_msoftdrop_nom FatJet_pt_nom MET_Pt V_mt SA5 FatJet_pt[Hbb_fjidx]/V_pt abs(FatJet_eta[Hbb_fjidx]-V_eta) FatJet_deepTagMD_bbvsLight[Hbb_fjidx] 1/(1+(FatJet_deepTagMD_TvsQCD[Hbb_fjidx]/FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])*(1-FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])/(1-FatJet_deepTagMD_TvsQCD[Hbb_fjidx]))
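The last training variable is a combined DeepAK8 discriminant; written as a function it is just the expression above, reorganized for readability (argument names are illustrative):

```python
def deepak8_combined(t_vs_qcd, hbb_vs_qcd):
    """Combine the DeepAK8 TvsQCD and HbbvsQCD scores into one discriminant:
    1 / (1 + (T/H) * (1 - H) / (1 - T)).
    It approaches 1 for Hbb-like jets and 0 for top-like jets."""
    ratio = (t_vs_qcd / hbb_vs_qcd) * (1.0 - hbb_vs_qcd) / (1.0 - t_vs_qcd)
    return 1.0 / (1.0 + ratio)
```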

///////

** In training.ini:

- systematics = nominal

./submit.py -T LxplusZll -F cachetraining-v1 -J cachetraining [cachetraining step] (./submit.py -T Wlv2016 -J checklogs --resubmit to repeat killed jobs)

./submit.py -T LxplusZll -F runtraining-v1 -J runtraining [training step]

6B - Running framework - BDT Training Step [adding systematics]

Using /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/MakeSysList.py, I get the list of UP/DOWN variations for the systematic uncertainties on the BDT inputs, to be added in training.ini
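MakeSysList.py itself is not reproduced here, but the generated entries follow the usual Up/Down naming convention; a rough sketch of what such a list looks like (function and suffix scheme assumed, not the actual script):

```python
def make_sys_variations(nominal_vars, systematics):
    """For each systematic source, build the Up and Down variants of every
    affected input variable (suffix convention <var>_<sys>Up/Down assumed)."""
    variations = []
    for sys in systematics:
        for direction in ("Up", "Down"):
            variations.append([v + "_" + sys + direction for v in nominal_vars])
    return variations

# e.g. make_sys_variations(["FatJet_pt"], ["jesTotal"])
# gives [["FatJet_pt_jesTotalUp"], ["FatJet_pt_jesTotalDown"]]
```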

7 - Running framework - BDT Evaluation Step

./submit.py -T LxplusZll -F eval-v1 -J eval

If there are failed jobs, launch:

./submit.py -T LxplusZll -F eval-v1-reprocess-failed-jobs -J eval -k -N 1

8 - Running framework - Datacards

1) cache the datacards: ./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 ('./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 -k' to resubmit the failed jobs). The output of this step goes to the /tmp directory (as for the cacheplot step).

2) produce datacards: ./submit.py -T LxplusZll -F rundc-v1 -J rundc

3) merge root files: ./submit.py -T LxplusZll -F rundc-v1 -J mergedc

4) run Combine to get the Significance (see section 9)

9 - Running framework - CombineHarvester Statistical method

Datacards produced in Step 8.3 are in log_Wl2016_v2/run-dc/Limits/*txt

To merge all the datacards: python ../../../../scripts/combineCards.py Wlfe=vhbb_DC_TH_Wle_Wlfv11_BOOST.txt Wlfm=vhbb_DC_TH_Wlm_Wlfv11_BOOST.txt Whfe=vhbb_DC_TH_Wle_Whf_BOOST.txt Whfm=vhbb_DC_TH_Wlm_Whf_BOOST.txt tte=vhbb_DC_TH_Wle_tt_BOOST.txt ttm=vhbb_DC_TH_Wlm_tt_BOOST.txt SRe=vhbb_DC_TH_Sige_BOOST.txt SRm=vhbb_DC_TH_Sigu_BOOST.txt > vhbb_DC_TH_M125_Wlv_Boostovb.txt

Using CombineHarvester to get the significance: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/

-> in the /script directory, the datacards are inside the Limits/ folder (with and without DeepAK8 in the BDT training for the SR)

10 - Running framework - Fit Convergence

combine -M FitDiagnostics -m 125 --robustFit=1 --stepSize=0.01 --X-rtd MINIMIZER_MaxCalls=9999999 --cminApproxPreFitTolerance=10 --saveNorm -v 3 --saveShapes --saveWithUncertainties --cminPreScan vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt

Datacards with the jet scale and resolution systematics removed: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/CombinedLimit/scripts/ vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt

11 - Running framework - Producing prefit and postfit plots

Prefit: ./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test

Postfit: ./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test --set="Fit.FitType:=shapes_fit_s"

***Ongoing production V1***

- PREP STEP [DONE]: PREPout : root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/prep_v2/

- SYS STEP w/o HiggsCandidateSystematics [DONE]: SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v9/

- PLOT STEP: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2016_v2/runplot-v11/Plots/

- PLOT STEP (with DeepAK8 cuts on SR/CR to be used for the fit): /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/logs_Wlv2016_v2/runplot-v16/Plots/

////// ////// ////// //////

- SYS STEP w HiggsCandidateSystematics [still to be done]: SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v10/

////// ////// ////// //////

- TRAINING STEP:

v1 has the full set of training variables in V11 with DeepAK8 inputs: Nominal: FatJet_msoftdrop FatJet_pt MET_Pt V_mt SA5 FatJet_pt[Hbb_fjidx]/V_pt abs(FatJet_eta[Hbb_fjidx]-V_eta) FatJet_deepTagMD_bbvsLight[Hbb_fjidx] 1/(1+(FatJet_deepTagMD_TvsQCD[Hbb_fjidx]/FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])*(1-FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])/(1-FatJet_deepTagMD_TvsQCD[Hbb_fjidx]))

v2 has the full set of training variables in V11 WITHOUT DeepAK8 inputs: Nominal: FatJet_msoftdrop FatJet_pt MET_Pt V_mt SA5 FatJet_pt[Hbb_fjidx]/V_pt abs(FatJet_eta[Hbb_fjidx]-V_eta)

////// ////// ////// //////

MVAin: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v9/

MVA out: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/mva_v6/ (BDT training in SR WITH DeepAK8 - default)

MVA out: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/mva_v3/ (BDT training in SR WITHOUT DeepAK8)

////// ////// ////// //////

- DC step: datacards are in log_Wl2016_v2/run-dc/Limits/*txt

- DC step (stat+sys): /mnt/t3nfs01/data01/shome/acalandr/Logs/eval-v6/Logs/rundc-v1/Limits/

Some useful scripts

1) to remove systematics uncertainties: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/remove_sys.py

2) to plot systematics: /work/gaperrin/VHbb2018/CMSSW_10_1_0/src/Xbb/python/plot_systematics.py - python plot_systematics.py -C Wlv2016Nanoconfig/general.ini -C Wlv2016Nanoconfig/datacards.ini -C Wlv2016Nanoconfig/plots.ini -C Wlv2016Nanoconfig/paths.ini -C Wlv2016Nanoconfig/vhbbPlotDef.ini -C Wlv2016Nanoconfig/samples_nosplit.ini

Statistical tools

I am working here: /t3home/acalandr/VHbb/boosted_2016/combineHarvester/CMSSW_10_2_13/src/CombineHarvester/CombineTools

1) creating workspace from txt: text2workspace.py datacard_1lep2016_boosted.txt -o ws_boosted.root

2) FitDiagnostics: combine -M FitDiagnostics -t -1 --expectSignal 1 -d ws_boosted_2016_1lep_29Aug_noBinByBinUnc.root --cminDefaultMinimizerStrategy 0 -v 3 --freezeParameters CMS_vhbb_scale_j_PileUpPtEC1_13TeV

This command produces the error on mu (the POI) and the postfit plots.

3) Checking NPs: combineTool.py -M GenerateOnly -m 125 -t -1 --expectSignal 1 --saveToys -d ws_boosted.root, then combineTool.py -M FastScan -w ws_boosted.root -d higgsCombine.Test.GenerateOnly.mH125.123456.root -f fitDiagnostics.root:fit_s, which produces nll.pdf

4) Extract error on POI: combine -M MultiDimFit -t -1 --expectSignal 1 --algo singles -m 125 -d ws_boosted.root --cminDefaultMinimizerStrategy 0

5) Doing impact plots: combineTool.py -M Impacts -d htt_tt.root -m 125 --doInitialFit --robustFit 1 --cminDefaultMinimizerStrategy 0, then combineTool.py -M Impacts -d htt_tt.root -m 125 --robustFit 1 --doFits --cminDefaultMinimizerStrategy 0

-- AlessandroCalandri - 2019-07-02

Topic revision: r23 - 2019-09-17 - AlessandroCalandri
