VHbb boosted - Xbb framework

Directory: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python

1lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2016config

2lep 2016: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Zll2016Nanoconfig

1lep 2017: 
/t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/Wlv2017config

Training 1lep boosted 2017: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/weights_1lep2017_boosted

Log files for the 1lep 2016 studies are in /mnt/t3nfs01/data01/shome/acalandr

1 - Running framework - List of inputs

for j in `ls -1`; do for i in `ls -1 ${j}/*/*/*/*.root`; do echo ${PWD}/${i} | sed 's|/eos/cms/|/|g'; done > ./new_dir/${j}.txt; done
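For reference, a minimal Python equivalent of the one-liner above (the glob depth and the /eos/cms prefix are taken from the shell version; the rest is an illustration, not framework code):

import glob
import os

# Write one file list per sample directory, as the shell one-liner above:
# glob four levels below each sample directory and strip the local
# /eos/cms prefix so the lists contain /store/... style paths.
os.makedirs("new_dir", exist_ok=True)
for sample in sorted(os.listdir(".")):
    if not os.path.isdir(sample) or sample == "new_dir":
        continue
    with open(os.path.join("new_dir", sample + ".txt"), "w") as out:
        for f in sorted(glob.glob(os.path.join(sample, "*", "*", "*", "*.root"))):
            out.write(os.path.join(os.getcwd(), f).replace("/eos/cms/", "/") + "\n")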

2 - Running framework - Prep Step

1) Prep step
 ./submit.py -T Wlv2016 -F prep-v1 -J prep -N 10   

--> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

PREPout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/prep_v1/

--> New method for the PREP step (USE THIS METHOD!!!): ./submit.py -T Zll2016Nano -J run --modules=Prep.VHbb -F prep --input PREPin --output PREPout --set='Directories.samplefiles:=<!Directories|samplefiles_split!>'
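The <!Section|key!> placeholders used throughout the .ini files and in --set overrides (as in the command above) are cross-references that the framework's config parser expands. A minimal sketch of such a substitution, assuming a plain nested-dict config (illustrative only, not Xbb's actual parser):

import re

def resolve(value, config):
    # Recursively expand <!Section|key!> references in a config value.
    pattern = re.compile(r"<!(.+?)\|(.+?)!>")
    match = pattern.search(value)
    while match is not None:
        section, key = match.group(1), match.group(2)
        value = value[:match.start()] + config[section][key] + value[match.end():]
        match = pattern.search(value)
    return value

# Example: --set='Directories.samplefiles:=<!Directories|samplefiles_split!>'
config = {"Directories": {"samplefiles_split": "samples_split/"}}
print(resolve("<!Directories|samplefiles_split!>", config))  # samples_split/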

3 - Running framework - Sys Step

./submit.py -T Wlv2016 -F sysnew-sys -J sysnew --addCollections Sys.sys_all_BoostedAndResolved  -I

List of systematic uncertainties included in general.ini:

sys_all_BoostedAndResolved = ['Sys.TTweights','Sys.LeptonWeights','Sys.EWKweights','Sys.BTagWeights','Sys.isSignal', 'Sys.isWH', 'Sys.isData', 'Sys.HeppyStyleGen', 'Sys.FitCorr','Sys.GetTopMass','Sys.GetWTMass','Sys.DYspecialWeight','Sys.VptWeightSimFit','Sys.DoubleBTagWeightsSimFit'] 

The Python module for each systematic uncertainty is in myutils/.
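Schematically, each Sys.* entry corresponds to a class in myutils/ that adds one or more branches per event during the sysnew step. The sketch below only illustrates the idea; the class and method names are assumptions, not the framework's real interface:

class ExampleWeightModule(object):
    # Illustrative stand-in for a myutils Sys.* module (hypothetical API).
    def __init__(self):
        self.branchName = "exampleWeight"  # hypothetical output branch

    def getBranches(self):
        # Declare the branches this module adds to the output tree
        return [self.branchName]

    def processEvent(self, tree):
        # Toy example: flat weight with a pT-dependent correction; a real
        # module would read scale factors from a ROOT/CSV payload instead.
        weight = 1.0
        if tree.V_pt > 250.0:
            weight *= 1.02
        return {self.branchName: weight}

class _Event(object):
    V_pt = 300.0  # dummy event for the demo

print(ExampleWeightModule().processEvent(_Event()))  # {'exampleWeight': 1.02}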

--> run without the Higgs module ('Sys.HiggsCandidateSystematics') to make it faster; the step then has to be rerun with the Higgs module included once the plots are done

--> using the cMVAv2 BTagWeights for the 94X campaign (2016 reprocessing); CSV file: cMVAv2_Moriond17_B_H.csv

SYSout: root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/sys_v6/

--> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

The weights in 'weightF' (in general.ini) are only available after the SYS step because they come from the modules in general.ini that account for the systematic uncertainties.
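In practice weightF is a multiplicative expression over per-event weight branches, several of which only exist once the corresponding Sys.* modules have filled them. A toy illustration (all branch names are hypothetical):

# Toy illustration: weightF is a product of branches created at the SYS
# step; the branch names below are hypothetical.
event = {"genWeight": 0.98, "puWeight": 1.01, "bTagWeight": 0.95, "muonSF": 0.99}
weightF = "genWeight * puWeight * bTagWeight * muonSF"
print(eval(weightF, {}, event))  # product of the four per-event weights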

4 - Running framework - Cacheplot Step

./submit.py -T Wlv2016 -F cacheplot-v2 -J cacheplot -i 

In order to cache one specific sample, add '-S SingleElectron' to the command.

--> check that all samples were processed:

./submit.py -T Wlv2016 -J checklogs --resubmit

Output of the cache step: tmpSamples = root://t3dcachedb03.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/VHbbPostNano2016/Wlv/tmp/v3/ [no need to change the cacheplot output directory each time: the hash module takes care of picking up the correct samples. Everything depends on plottingSamples: <!Directories|SYSout!>]
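The reason the tmp path can stay fixed is that cached files are keyed by a hash of everything that defines them; if the sample, cut, or input directory changes, so does the hash, and stale files are simply ignored. A minimal sketch of the idea (the hashed fields are assumptions):

import hashlib

def cache_file_name(sample, cut, input_dir):
    # Any change to the sample, the cut string, or the input directory
    # (e.g. a new SYSout version) changes the digest, so an outdated
    # cache entry can never be picked up by mistake.
    key = "|".join([sample, cut, input_dir])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return "tmp_%s_%s.root" % (sample, digest)

print(cache_file_name("SingleElectron", "V_pt>250", "Wlv/sys_v6/"))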

--> Plots can also be made right after the PREP step, but some of the weights may be missing, since they are only computed at the SYS step (they come from the evaluation of the systematic-uncertainty modules).

5 - Running framework - Plot Step

Define any new variables in vhbbPlotDef.ini and add the corresponding entries in plots.ini.

NB: Cut_BOOST = (<!General|Boost_doubleb!> && <!General|DphiMET_Lep!> < 2 && <!General|NaddLep!> == 0 && V_pt > 250) in cuts.ini --> using Boost_doubleb avoids complaints about btag_jetidx being invalid when no fat jet is found. For the prep step, only <!General|Boost_doubleb!> was applied, which is more inclusive.

Additional line in plots.ini to define which variables to plot:

var_additionalBTAGALGOS: DeepAK8_bbVSlight,DeepAK8_bbVST 
The variable definition for the plots is in vhbbPlotDef.ini.

6 - Running framework - BDT Training Step

** In plots.ini:

trainingBKG = <!Plot_general|WJet!>,<!Plot_general|DY!>,<!Plot_general|ST!>,<!Plot_general|TT!>,<!Plot_general|VV!>

trainingSig = <!Plot_general|allSIG!>

where allSIG in trainingSig is 'WminusH','WplusH','ZH','ggZH' and trainingBKG contains all the backgrounds *except* QCD (its templates are spiky).

///////

**Variables used for training:

Nominal:
- FatJet_msoftdrop_nom
- FatJet_pt_nom
- MET_Pt
- V_mt
- SA5
- FatJet_pt[Hbb_fjidx]/V_pt
- abs(FatJet_eta[Hbb_fjidx]-V_eta)
- FatJet_deepTagMD_bbvsLight[Hbb_fjidx]
- 1/(1+(FatJet_deepTagMD_TvsQCD[Hbb_fjidx]/FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])*(1-FatJet_deepTagMD_HbbvsQCD[Hbb_fjidx])/(1-FatJet_deepTagMD_TvsQCD[Hbb_fjidx]))
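The last input is a combined DeepAK8 discriminant that folds the top-vs-QCD score into the Hbb-vs-QCD score; written out as a function (a direct transcription of the expression above):

def bb_vs_top(hbb_vs_qcd, t_vs_qcd):
    # 1 / (1 + (TvsQCD/HbbvsQCD) * (1-HbbvsQCD)/(1-TvsQCD));
    # tends to 1 for Hbb-like fat jets and to 0 for top-like ones.
    return 1.0 / (1.0 + (t_vs_qcd / hbb_vs_qcd) * (1.0 - hbb_vs_qcd) / (1.0 - t_vs_qcd))

print(bb_vs_top(hbb_vs_qcd=0.9, t_vs_qcd=0.1))  # ~0.988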

///////

** In training.ini:

- systematics = nominal

./submit.py -T LxplusZll -F cachetraining-v1 -J cachetraining [cachetraining step] (./submit.py -T Wlv2016 -J checklogs --resubmit to repeat the killed jobs)

./submit.py -T LxplusZll -F runtraining-v1 -J runtraining [training step]

6B - Running framework - BDT Training Step [adding systematics]

Using /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/MakeSysList.py, I get the list of UP/DOWN variations of the systematic uncertainties on the BDT inputs, to be added to training.ini.
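Schematically, such a generator loops over the BDT inputs and the systematic sources and emits the Up/Down-varied branch names (illustrative only; the actual inputs and naming scheme of MakeSysList.py may differ):

# Illustrative Up/Down variation list for BDT inputs; the variable and
# systematic names below are examples, not the actual MakeSysList.py output.
variables = ["FatJet_msoftdrop", "FatJet_pt", "MET_Pt"]
systematics = ["jesTotal", "jer", "jms", "jmr"]

for syst in systematics:
    for direction in ("Up", "Down"):
        print(" ".join("%s_%s%s" % (v, syst, direction) for v in variables))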

7 - Running framework - BDT Evaluation Step

./submit.py -T Zvv2017 -F eval-v1 -J eval -I -N 1

If there are failed jobs, launch:

./submit.py -T LxplusZll -F eval-v1-reprocess-failed-jobs -J eval -k -N 1

8 - Running framework - Datacards

1) cache the datacards: ./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 ('./submit.py -T LxplusZll -F cachedc-v1 -J cachedc --parallel=8 -k' to resubmit the failed jobs). The output of this step goes to the tmp directory (as for the cacheplot step).

2) produce datacards: ./submit.py -T LxplusZll -F rundc-v1 -J rundc

3) merge root files: ./submit.py -T LxplusZll -F rundc-v1 -J mergedc

4) run Combine to get the Significance (see section 9)

9 - Running framework - CombineHarvester statistical method

Datacards produced in Step 8.3 are in log_Wl2016_v2/run-dc/Limits/*txt

To merge all the datacards: python ../../../../scripts/combineCards.py Wlfe=vhbb_DC_TH_Wle_Wlfv11_BOOST.txt Wlfm=vhbb_DC_TH_Wlm_Wlfv11_BOOST.txt Whfe=vhbb_DC_TH_Wle_Whf_BOOST.txt Whfm=vhbb_DC_TH_Wlm_Whf_BOOST.txt tte=vhbb_DC_TH_Wle_tt_BOOST.txt ttm=vhbb_DC_TH_Wlm_tt_BOOST.txt SRe=vhbb_DC_TH_Sige_BOOST.txt SRm=vhbb_DC_TH_Sigu_BOOST.txt > vhbb_DC_TH_M125_Wlv_Boostovb.txt

Using CombineHarvester to get the significance: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/

--> in the script directory, the datacards are inside the Limits/ folder (with and without DeepAK8 in the BDT training for the SR)

10 - Running framework - Fit Convergence

combine -M FitDiagnostics -m 125 --robustFit=1 --stepSize=0.01 --X-rtd MINIMIZER_MaxCalls=9999999 --cminApproxPreFitTolerance=10 --saveNorm -v 3 --saveShapes --saveWithUncertainties --cminPreScan vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt

Datacards with the jet scale and resolution shape uncertainties removed: /t3home/acalandr/VHbb/boosted_2016/CMSSW_10_1_0/src/Xbb/python/CMSSW_10_2_13/src/HiggsAnalysis/CombinedLimit/scripts/vhbb_DC_TH_M125_Wlv_Boostovb_removeJETscalesReso_all.txt

11 - Running framework - Producing prefit and postfit plots

Prefit: ./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test

Postfit: ./submit.py -T Wlv2016 -J postfitplot --local -F postfit_test --set="Fit.FitType:=shapes_fit_s"

////// ////// ////// //////

Output of the 2017 analysis [after synchronisation with AT]

--> Xbb: /t3home/acalandr/VHbb/boosted_2016/datacard_check_boosted_21Jan2020

--> CombineHarvester: /t3home/acalandr/VHbb/boosted_2016/datacard_check_boosted_21Jan2020/combineHarvester/CMSSW_10_2_13/src/CombineHarvester

1) Zvv:

/pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/Zvv/VHbbPostNano2017_V11/eval/Jan2020/v2

cache datacards: /pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/Zvv/VHbbPostNano2017_V11/Jan2020/tmp/v4/

Produce shapes: /work/acalandr/logs_Zvv2017/rundc-v39/Limits/rundc-v39

——

2) Wlv:

/pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/Wlv/VHbbPostNano2017_V11/eval/Jan2020/v1

cache: /pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/Wlv/VHbbPostNano2017_V11/tmp/v12/ --> DONE!

Produce shapes: /work/acalandr/logs_Wlv2017/rundc-v91

———

3) Zll:

/pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/Zll/VHbbPostNano2017_V11/

cache: /pnfs/psi.ch/cms/trivcat/store/user/acalandr/VHbb/Zll/VHbbPostNano2017_V11/tmp/v6/

Produce shapes: /work/acalandr/logs_Zll2017/rundc-v61

Statistical tools 1 - Asimov in SR

1) creation of datacards: python scripts/VHLegacy.py --Znn_fwk Xbb --Wmn_fwk Xbb --Wen_fwk Xbb --Zee_fwk Xbb --Zmm_fwk Xbb

==

2) creation of workspace: combineTool.py -M T2W -i output/${COMBFOLDERSTXS}2017/cmb/ -o "ws_stxs_fine.root" -P PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/.*ZH_lep_PTV_75_150_hbb:r_zhlow[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_0J_hbb:r_zhmednoj[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_GE1J_hbb:r_zhmedwithj[1,0,5]' --PO 'map=.*/.*ZH_lep_PTV_GT250_hbb:r_zhhi[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_150_250_0J_hbb:r_whmed[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_150_250_GE1J_hbb:r_whmed[1,0,5]' --PO 'map=.*/.*WH_lep_PTV_GT250_hbb:r_whhi[1,0,5]'

==

3) best fit: combineTool.py -M MultiDimFit -d output/${COMBFOLDERSTXS}2017/cmb/ws_stxs_fine.root --setParameters r_zhlow=1,r_zhmednoj=1,r_zhmedwithj=1,r_zhhi=1,r_whmed=1,r_whhi=1 --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --saveToys --saveWorkspace -t -1 -n .STXSfine_BestFit_prefit

==

4) scan with systematics: combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 3 --job-mode script --task-name STXS_FINE_VH_scans -n .STXS.FINE.VH >jobs1.txt

5) run the batch scan with systematics: for i in `cat jobs1.txt | awk '{print $4}'`; do sbatch --job-name=STXSfit${i/.sh/} --mem=3000M --time=0-01:30 --output=/mnt/t3nfs01/data01/shome/$USER/VHbb/CMSSW_10_1_0/src//Xbb/python/logs_Wlv2017//fit_${i/.sh/}.log --account=cn-test ./${i} ; done

6) scan without systematics: combineTool.py -M MultiDimFit -d higgsCombine.STXSfine_BestFit_prefit.MultiDimFit.mH120.123456.root -D 'toys/toy_asimov' --generate $(./scripts/getPOIs_STXS.py STXSfine -g) --redefineSignalPOIs $(./scripts/getPOIs_STXS.py STXSfine -p) --setParameterRanges $(./scripts/getPOIs_STXS.py STXSfine -r) $(./scripts/getPOIs_STXS.py STXSfine -O) --saveInactivePOI=1 --points 50 --floatOtherPOIs 1 --snapshotName "MultiDimFit" --skipInitialFit --algo grid --split-points 3 --freezeParameters allConstrainedNuisances --job-mode script --task-name STXS_FINE_VH_scans_frall -n .STXS.fr.all.FINE.VH >jobs2.txt

7) run the batch scan without systematics: for i in `cat jobs2.txt | awk '{print $4}'`; do sbatch --job-name=STXSfit${i/.sh/} --mem=3000M --time=0-00:30 --output=/mnt/t3nfs01/data01/shome/$USER/VHbb/CMSSW_10_1_0/src//Xbb/python/logs_Wlv2017//fit_${i/.sh/}.log --account=cn-test ./${i} ; done

8) plot likelihood scan:

mkdir VHbb_STXS_scans

cd VHbb_STXS_scans

mkdir results

mkdir plots

mv ../higgsCombine.STXS.FINE.VH.*.root results/

mv ../higgsCombine.STXS.fr.all.FINE.VH.*.root results/

cd results

for P in $(../../scripts/getPOIs_STXS.py STXSfine -P); do hadd -k -f scan.${P}.root higgsCombine.STXS.FINE.VH.${P}.POINTS.*.root; rm higgsCombine.STXS.FINE.VH.${P}.POINTS.*.root; done;

for P in $(../../scripts/getPOIs_STXS.py STXSfine -P); do hadd -k -f scan.${P}.fr.all.root higgsCombine.STXS.fr.all.FINE.VH.${P}.POINTS.*.root; rm higgsCombine.STXS.fr.all.FINE.VH.${P}.POINTS.*.root; done;

cd ../

INPUT="results"; OUTPUT="plots"; for P in $(../scripts/getPOIs_STXS.py STXSfine -P); do eval python ../scripts/plot1DScan.py -o scan_nominal_${P} --POI ${P} --translate ../scripts/pois.json --model STXS --json ${OUTPUT}/STXSfine.json --others \"${INPUT}/scan.${P}.fr.all.root:Freeze all:8\" --breakdown "Syst,Stat" --meta "POIs:${P}" -m ${INPUT}/scan.${P}.root --y-max 10 --no-input-label --outdir ${OUTPUT}/; done

9) final summary plot: python ../scripts/summaryPlot.py -i 'plots/STXSfine.json:STXS/r_whmed,r_whhi,r_zhlow,r_zhmednoj,r_zhmedwithj,r_zhhi' --vlines '1.0:LineStyle=2' --subline="41.5 fb^{-1} (13 TeV - 2017)" -o plots/summary_stxs --translate ../scripts/pois.json

Statistical tools 2 - fit to the data CRs for SF determination

1) Create the workspace from the CR-only datacards/shapes: combineTool.py -M T2W -i output/${COMBFOLDERSTXS}2017/cmb_CRonly/ -o "ws_forCR.root" --PO verbose

2) Fit: combineTool.py -M FitDiagnostics -d output//cmb_CRonly/ws_forCR.root --there --cminDefaultMinimizerStrategy 0 --robustFit 1 --freezeParameters r --verbose 5 --X-rtd MINIMIZER_MaxCalls=9999999 --saveShapes --saveWithUncertainties

-- AlessandroCalandri - 2019-07-02
