VHbb boosted - Xbb framework

Directory: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python
1lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2016config
2lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Zll2016Nanoconfig
1lep 2017: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2017config
Training 1lep boosted 2017: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/weights_1lep2017_boosted

Log files for the 1lep 2016 studies are in /mnt/t3nfs01/data01/shome/acalandr

1 - Running framework - List of inputs

Build one input list per sample directory (the sed strips the /eos/cms/ prefix from each path):

for j in `ls -1`; do for i in `ls -1 ${j}/*/*/*/*.root`; do echo ${PWD}/${i} | sed 's/\/eos\/cms\///g'; done > ./new_dir/${j}.txt; done
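Each new_dir/<sample>.txt then contains one /store/... path per line; a hypothetical example (sample name and path invented for illustration, assuming the loop runs from a directory below /eos/cms):

/store/user/acalandr/VHbb/WminusH125/nano/0000/tree_1.root
/store/user/acalandr/VHbb/WminusH125/nano/0000/tree_2.root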

2 - Running framework - Prep Step

1) Prep step

 ./submit.py -T Wlv2016 -F prep-v1 -J prep -N 10   

--> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

PREPout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/prep_v1/

--> New method for the PREP step (USE THIS METHOD!!!): ./submit.py -T Zll2016Nano -J run --modules=Prep.VHbb -F prep --input PREPin --output PREPout --set='Directories.samplefiles:=<!Directories|samplefiles_split!>'
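The same run-style invocation should carry over to the other configs; a sketch for 1lep 2016 (assuming Wlv2016 also defines the Prep.VHbb module and the samplefiles_split key):

./submit.py -T Wlv2016 -J run --modules=Prep.VHbb -F prep-v1 --input PREPin --output PREPout --set='Directories.samplefiles:=<!Directories|samplefiles_split!>'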

3 - Running framework - Sys Step

./submit.py -T Wlv2016 -F sysnew-sys -J sysnew --addCollections Sys.sys_all_BoostedAndResolved  -I

List of systematics uncertainties included in general.ini:

sys_all_BoostedAndResolved = ['Sys.TTweights','Sys.LeptonWeights','Sys.EWKweights','Sys.BTagWeights','Sys.isSignal', 'Sys.isWH', 'Sys.isData', 'Sys.HeppyStyleGen', 'Sys.FitCorr','Sys.GetTopMass','Sys.GetWTMass','Sys.DYspecialWeight','Sys.VptWeightSimFit','Sys.DoubleBTagWeightsSimFit'] 

The Python module for each systematic uncertainty lives in myutils/.

--> Run without the Higgs module ('Sys.HiggsCandidateSystematics') to make it faster; then I'll have to rerun with the Higgs module included once the plots are done.
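A sketch of that follow-up pass with the Higgs module included (the -F log-folder name is hypothetical):

./submit.py -T Wlv2016 -F sysnew-higgs -J sysnew --addCollections Sys.HiggsCandidateSystematics -I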

--> using the cMVAv2 BTagWeight for the 94X campaign (2016 reprocessing); csv file: cMVAv2_Moriond17_B_H.csv

SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v6/

--> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

The weights in 'weightF' (in general.ini) are only available after the SYS step, because they are produced by the modules in general.ini that account for the systematic uncertainties.

4 - Running framework - Cacheplot Step

./submit.py -T Wlv2016 -F cacheplot-v2 -J cacheplot -i 

In order to cache one specific sample, add '-S SingleElectron' to the command.
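For example, caching only the SingleElectron sample with the command above:

./submit.py -T Wlv2016 -F cacheplot-v2 -J cacheplot -i -S SingleElectron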

--> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

Output of the cache step: tmpSamples = root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/tmp/v3/ [no need to change the cacheplot output each time: the hash module takes care of picking the correct samples; everything depends on plottingSamples: <!Directories|SYSout!>]

--> Plots can also be made right after the PREP step, but some of the weights may then be missing, since they are only calculated at the SYS step (they come from the evaluation of the systematic-uncertainty modules).

5 - Running framework - Plot Step

Define new variables in vhbbPlotDef.ini and add the corresponding entries in plots.ini.

NB: Cut_BOOST = (<!General|Boost_doubleb!> && <!General|DphiMET_Lep!> < 2 && <!General|NaddLep!> == 0 && V_pt > 250) in cuts.ini --> with Boost_doubleb there are no complaints about btag_jetidx being wrong when no fat jet is found. For the prep step I used <!General|Boost_doubleb!> alone, which is more inclusive.

Additional line in plots.ini to define which variable to plot:

var_additionalBTAGALGOS: DeepAK8_bbVSlight,DeepAK8_bbVST 

The variable definition for the plots is in vhbbPlotDef.ini.

6 - Running framework - BDT Training Step

** In plots.ini:

trainingBKG = <!Plot_general|WJet!>,<!Plot_general|DY!>,<!Plot_general|ST!>,<!Plot_general|TT!>,<!Plot_general|VV!>

trainingSig = <!Plot_general|allSIG!>

where allSIG in trainingSig expands to 'WminusH','WplusH','ZH','ggZH', and trainingBKG contains all the backgrounds *except* QCD (because it is spiky).

///////

**Variables used for training:

Nominal: FatJet_msoftdrop_nom, FatJet_pt_nom, MET_Pt, V_mt, SA5, FatJet_pt[Hbb_fjidx]/V_pt, abs(FatJet_eta[Hbb_fjidx]-V_eta), FatJet_deepTagMD_bbvsLight[Hbb_fjidx], 1/(1+(FatJet_deepTagMD_TvsQCD[Hbb_fjidx]/FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])*(1-FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])/(1-FatJet_deepTagMD_TvsQCD[Hbb_fjidx]))
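Note: writing H = FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx] and T = FatJet_deepTagMD_TvsQCD[Hbb_fjidx], the last input simplifies algebraically to H(1-T) / ( H(1-T) + T(1-H) ), i.e. a bb-versus-top discriminator built from the two mass-decorrelated DeepAK8 scores.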

///////

** In training.ini:

- systematics = nominal

./submit.py -T LxplusZll -F cachetraining-v1 -J cachetraining [cachetraining step] (./submit.py -T Wlv2016 -J checklogs --resubmit to resubmit killed jobs)

./submit.py -T LxplusZll -F runtraining-v1 -J runtraining [training step]

6B - Running framework - BDT Training Step [adding systematics]

Using /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/MakeSysList.py, I get the list of UP/DOWN variations of the systematic uncertainties on the BDT inputs, to be added in training.ini (see the sketch below).
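A hypothetical invocation (assuming the script prints the UP/DOWN variation list to stdout; check MakeSysList.py for its actual arguments):

cd /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python
python MakeSysList.py > syslist_training.txt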

7 - Running framework - BDT Evaluation Step

./submit.py -T Zvv2017 -F eval-v1 -J eval -I -N 1

If there are failed jobs, launch:

./submit.py -T LxplusZll -F eval-v1-reprocess-failed-jobs -J eval -k -N 1

If you want to run with DeepAK8 weights: ./submit.py -T Wlv2018 -J run --modules=Eval.BDT_Wlv_BOOSTFinal_wdB,VHbbCommon.DoubleBtagSF -F tmp --input MVAin --output MVAout

If you want to run with DeepAK8 weights + the new flavour definition (needed in 2018; in 2017 the flavour definition is already in the ntuple): ./submit.py -T Wlv2018 -J run --modules=Eval.BDT_Wlv_BOOSTFinal_wdB,VHbbCommon.DoubleBtagSF,VHbbCommon.HeppyStyleGen -F tmp --input MVAin --output MVAout
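All of these 'run' invocations follow the same generic pattern (a sketch; angle-bracket names are placeholders):

./submit.py -T <config-tag> -J run --modules=<Module1,Module2,...> -F <log-folder> --input <input-dir-key> --output <output-dir-key>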

For DNN evaluation, if I have a new DNN:

./submit.py -T Wlv2017 -F log -J run --input MVAin --output MVAout --addCollections Sys.Eval

where Eval contains the DNN to be evaluated (inside the block Sys in general.ini)

8 - Running framework - Datacards

1) cache the datacards: ./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 ('./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 -k' to resubmit the failed jobs). The output of this step goes to the tmp directory (as for the cacheplot step).

2) produce datacards: ./submit.py -T LxplusZll -F rundc-v1 -J rundc

3) merge root files: ./submit.py -T LxplusZll -F rundc-v1 -J mergedc

4) run Combine to get the Significance (see section 9)

9 - Running framework - CombineHarvester Statistical method

Datacards produced in Step 8.3 are in log_Wl2016_v2/run-dc/Limits/*txt

To merge all the datacards: python ../../../../scripts/combineCards.py Wlfe=vhbb_DC_TH_Wle_Wlfv11_BOOST.txt Wlfm=vhbb_DC_TH_Wlm_Wlfv11_BOOST.txt Whfe=vhbb_DC_TH_Wle_Whf_BOOST.txt Whfm=vhbb_DC_TH_Wlm_Whf_BOOST.txt tte=vhbb_DC_TH_Wle_tt_BOOST.txt ttm=vhbb_DC_TH_Wlm_tt_BOOST.txt SRe=vhbb_DC_TH_Sige_BOOST.txt SRm=vhbb_DC_TH_Sigu_BOOST.txt > vhbb_DC_TH_M125_Wlv_Boostovb.txt
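A quick expected significance can then be obtained directly on the merged card with combine (a sketch, using an Asimov dataset; the nominal workflow goes through CombineHarvester as described below):

combine -M Significance -m 125 -t -1 --expectSignal=1 vhbb_DC_TH_M125_Wlv_Boostovb.txt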

Using CombineHarvester to get the significance: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/

--> in the scripts/ directory, the datacards are inside the Limits/ folder (with and without DeepAK8 in the BDT training for the SR)

10 - Running framework - Fit Convergence

combine -M FitDiagnostics -m 125 --robustFit=1 --stepSize=0.01 --X-rtd MINIMIZER_MaxCalls=9999999 --cminApproxPreFitTolerance=10 --saveNorm -v 3 --saveShapes --saveWithUncertainties --cminPreScan vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt
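To inspect the nuisance pulls from the resulting fitDiagnostics file, the standard diffNuisances.py script shipped in CombinedLimit/test/ can be used (a sketch; the file name depends on the -n option passed to combine):

python diffNuisances.py fitDiagnosticsTest.root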

Datacards with the jet scale and resolution systematics removed: vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt in /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/CombinedLimit/scripts/

11 - Running framework - Producing prefit and postfit plots

Prefit: ./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test

Postfit: ./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test --set="Fit.FitType:=shapes_fit_s"

////// ////// ////// //////

12 - Running framework - Running the module that adds the 'isBoosted' flag

./submit.py -T Zvv2017 -J run --modules=VHbbCommon.isBoosted --input MVAin --output MVAout

Statistical tools 1 - Asimov in SR (VHbb)

1) creation of datacards: python scripts/VHLegacy.py --Znn_fwk Xbb --Wmn_fwk Xbb --Wen_fwk Xbb --Zee_fwk Xbb --Zmm_fwk Xbb

==

2) creation of workspace: combineTool.py -M T2W -i output/${COMBFOLDERSTXS}2017/cmb/ -o "ws_stxs_fine.root" -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/.*ZH_lep_PTV_75_150_hbb:r_zhlow[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_0J_hbb:r_zhmednoj[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_GE1J_hbb:r_zhmedwithj[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_GT250_hbb:r_zhhi[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_150_250_0J_hbb:r_whmed[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_150_250_GE1J_hbb:r_whmed[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_GT250_hbb:r_whhi[1,0,5]'

==

3) best fit: combineTool.py -M MultiDimFit -d output/${COMBFOLDERSTXS}2017/cmb/ws_stxs_fine.root --setParameters r_zhlow=1,r_zhmednoj=1,r_zhmedwithj=1,r_zhhi=1,r_whmed=1,r_whhi=1 --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --saveToys --saveWorkspace -t -1 -n .STXSfine_BestFit_prefit

==

4) scan with systematics: combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 1 --job-mode script --task-name STXS_FINE_VH_scans -n .STXS.FINE.VH >jobs1.txt

5) running in batch scan for systematics: for i in `cat jobs1.txt | awk '{print $4}'`; do sbatch --job-name=STXSfit${i/.sh/} --mem=3000M --time=0-01:30 --output=/mnt/t3nfs01/data01/shome/$USER/VHbb/CMSSW_10_1_0/src//Xbb/python/logs_Wlv2017//fit_${i/.sh/}.log ./${i} ; done

6) scan without systematics: combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 1 --freezeParameters allConstrainedNuisances --job-mode script --task-name STXS_FINE_VH_scans_frall -n .STXS.fr.all.FINE.VH >jobs2.txt

7) running in batch scan without systematics: for i in `cat jobs2.txt | awk '{print $4}'`; do sbatch --job-name=STXSfit${i/.sh/} --mem=3000M --time=0-00:30 --output=/mnt/t3nfs01/data01/shome/$USER/VHbb/CMSSW_10_1_0/src//Xbb/python/logs_Wlv2017//fit_${i/.sh/}.log ./${i} ; done
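To monitor the batch scans (Slurm, as used above) and check that all scan points finished:

squeue -u $USER
ls higgsCombine.STXS.FINE.VH.*.root | wc -l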

8) plot likelihood scan:

mkdir VHbb_STXS_scans

cd VHbb_STXS_scans

mkdir results

mkdir plots

mv ../higgsCombine.STXS.FINE.VH.*.root results/

mv ../higgsCombine.STXS.fr.all.FINE.VH.*.root results/

cd results

for P in $(../../scripts/getPOIs_STXS.py STXSfine -P); do hadd -k -f scan.${P}.root higgsCombine.STXS.FINE.VH.${P}.POINTS.*.root; rm higgsCombine.STXS.FINE.VH.${P}.POINTS.*.root; done;

for P in $(../../scripts/getPOIs_STXS.py STXSfine -P); do hadd -k -f scan.${P}.fr.all.root higgsCombine.STXS.fr.all.FINE.VH.${P}.POINTS.*.root; rm higgsCombine.STXS.fr.all.FINE.VH.${P}.POINTS.*.root; done;

cd ../

INPUT="results"; OUTPUT="plots"; for P in $(../scripts/getPOIs_STXS.py STXSfine -P); do eval python ../scripts/plot1DScan.py -o scan_nominal_${P} --POI ${P} --translate ../scripts/pois.json --model STXS --json ${OUTPUT}/STXSfine.json --others \"${INPUT}/scan.${P}.fr.all.root:Freeze all:8\" --breakdown "Syst,Stat" --meta "POIs:${P}" -m ${INPUT}/scan.${P}.root --y-max 10 --no-input-label --outdir ${OUTPUT}/; done

9) plot final: python ../scripts/summaryPlot.py -i 'plots/STXSfine.json:STXS/r_whmed,r_whhi,r_zhlow,r_zhmednoj,r_zhmedwithj,r_zhhi' --vlines '1.0:LineStyle=2' --subline="41.5 fb^{-1} (13 TeV - 2017)" -o plots/summary_stxs --translate ../scripts/pois.json

Statistical tools 2 - Asimov in SR (VZbb)

1) as VH

2a) inclusive: combineTool.py -M T2W -i output/vhbb2017_VZvers1/cmb/ -o "ws_stxs_2017_VZbb_inclusive_2704.root" -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/.*VVHF:r_vzbb[1,0,5]'

2b) STXS: combineTool.py -M T2W -i output/vhbb2017_VZvers1/cmb/ -o "ws_stxs_2017_VZbb_v5_2704_STXSbased.root" --PO verbose -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO 'map=vhbb_Z.*_5_.*/.*VVHF:r_zhmednoj[1,0,5]' --PO 'map=vhbb_Z.*_1_.*/.*VVHF:r_zhlow[1,0,5]' --PO 'map=vhbb_Z.*_9_.*/.*VVHF:r_zhmedwithj[1,0,5]' --PO 'map=vhbb_Z.*_13_.*/.*VVHF:r_zhhi[1,0,5]' --PO 'map=vhbb_Z.*_17_.*/.*VVHF:r_zhhi[1,0,5]' --PO 'map=vhbb_W.*_5_.*/.*VVHF:r_whmed[1,0,5]' --PO 'map=vhbb_W.*_13_.*/.*VVHF:r_whhi[1,0,5]' --PO 'map=vhbb_W.*_17_.*/.*VVHF:r_whhi[1,0,5]'

3) combineTool.py -M MultiDimFit -d output/vhbb2017_VZvers1/cmb/ws_stxs_2017_VZbb_inclusive_2704.root --setParameters r_vzbb=1 --redefineSignalPOIs $(./scripts/getPOIs_STXS_VZbb.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS_VZbb.py STXSfine -r) $(./scripts/getPOIs_STXS_VZbb.py STXSfine -O) --saveInactivePOI=1 --saveToys --saveWorkspace -t -1 -n .STXSfine_BestFit_prefit

4) combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS_VZbb.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS_VZbb.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS_VZbb.py STXSfine -r) $(./scripts/getPOIs_STXS_VZbb.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 3 --job-mode script --task-name STXS_FINE_VH_scans -n .STXS.FINE.VH >jobs1.txt

5) as VH

6) combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS_VZbb.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS_VZbb.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS_VZbb.py STXSfine -r) $(./scripts/getPOIs_STXS_VZbb.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 3 --freezeParameters allConstrainedNuisances --job-mode script --task-name STXS_FINE_VH_scans_frall -n .STXS.fr.all.FINE.VH >jobs2.txt

7) as VH

8) as VH

--> All the _VZbb scripts, e.g. getPOIs_STXS_VZbb.py, are in /work/acalandr/merging_resolvedBoosted_2018/2017_final/CMSSW_10_2_13/src/CombineHarvester/VHLegacy/scripts/

Statistical tool 3: unblinded fit to the data CRs for SF determination

1) Creating ws on CR-only datacards/shapes: combineTool.py -M T2W -i output/${COMBFOLDERSTXS}2017/cmb_CRonly/ -o "ws_forCR.root" --PO verbose

2A) Example Fit 1lep: combineTool.py -M FitDiagnostics -d output/vhbb2017/Wln_CRonly/ws_forCR_1lep.root --there --freezeParameters r --X-rtd MINIMIZER_MaxCalls=9999999 --cminPreFit 1 --cminDefaultMinimizerTolerance 10 --verbose 5

2B) Example Fit 2lep: combineTool.py -M FitDiagnostics -d output/vhbb2017/Zll_CRonly/ws_forCR_2lep.root --there --freezeParameters r --X-rtd MINIMIZER_MaxCalls=9999999 --cminPreFit 1 --cminDefaultMinimizerTolerance 10 --verbose 5

Merging datacards for combined fit

Merging datacards:

python scripts/prepareVHbbComb.py --dir2016 vhbb2016 --dir2017 vhbb2017 --dir2018 vhbb2017 --postfix myoutput


Creating a workspace (CR-only fit):

combineTool.py -M T2W -i mycombined_dc.txt -o "ws_combined_CRonly_yeardependentSF.root" --PO verbose

Add --for-fits --no-wrappers to the workspace-creation command for quick workspace creation, e.g.:
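combineTool.py -M T2W -i mycombined_dc.txt -o "ws_combined_CRonly_yeardependentSF.root" --PO verbose --for-fits --no-wrappers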

Fit (number 1): combineTool.py --X-rtd MINIMIZER_MaxCalls=9999999 --cminPreFit 1 --cminDefaultMinimizerStrategy 2 --verbose 5 -M MultiDimFit -d workspace.root |& tee file.txt

Fit (number 2), some examples:

1) TT_Znn: combine -M MultiDimFit -d ws_combined_CRonly_nominalSF.root -v 5 --expectSignal 1 --redefineSignalPOIs SF_TT_Znn_2016,SF_TT_Znn_2017,SF_TT_Znn_2018 --freezeParameters r --algo fixed --fixedPointPOIs SF_TT_Znn_2016=0.985606,SF_TT_Znn_2017=0.985606,SF_TT_Znn_2018=0.985606 --X-rtd MINIMIZER_MaxCalls=9999999 --cminDefaultMinimizerStrategy 0 |& tee file_fit2.txt

2) TT_Wln: combine -M MultiDimFit -d ws_combined_CRonly_nominalSF.root -v 5 --expectSignal 1 --redefineSignalPOIs SF_TT_Wln_2016,SF_TT_Wln_2017,SF_TT_Wln_2018 --freezeParameters r --algo fixed --fixedPointPOIs SF_TT_Wln_2016=0.900112,SF_TT_Wln_2017=0.900112,SF_TT_Wln_2018=0.900112 --X-rtd MINIMIZER_MaxCalls=9999999 --cminDefaultMinimizerStrategy 0 |& tee file_fit2.txt

P-value calculator: https://www.emathhelp.net/calculators/probability-statistics/p-value-calculator/?dist=chi&s=31.168&df=1&tail=right

-- AlessandroCalandri - 2019-07-02
