H -> ZZ -> 2l2j / 2l1J analysis (Full 2016 Run 2 data)
Communication
Meetings
- HZZ meetings (Friday, 14:00)
- Working meetings are called when needed on Monday at 16:00, CERN time.
Documentation
- Analysis Note: AN-17-019
- PAS (2016, full dataset): HIG-17-012, together with other channels
Samples, Cross sections
Data
The data to be used are the 23Sep2016 ReReco for Runs B to G and the PromptReco for Run H.
/DoubleMuon/Run2016B-23Sep2016-v3/MINIAOD
/DoubleMuon/Run2016C-23Sep2016-v1/MINIAOD
/DoubleMuon/Run2016D-23Sep2016-v1/MINIAOD
/DoubleMuon/Run2016E-23Sep2016-v1/MINIAOD
/DoubleMuon/Run2016F-23Sep2016-v1/MINIAOD
/DoubleMuon/Run2016G-23Sep2016-v1/MINIAOD
/DoubleMuon/Run2016H-PromptReco-v[1,2,3]/MINIAOD
/DoubleEG/Run2016B-23Sep2016-v3/MINIAOD
/DoubleEG/Run2016C-23Sep2016-v1/MINIAOD
/DoubleEG/Run2016D-23Sep2016-v1/MINIAOD
/DoubleEG/Run2016E-23Sep2016-v1/MINIAOD
/DoubleEG/Run2016F-23Sep2016-v1/MINIAOD
/DoubleEG/Run2016G-23Sep2016-v1/MINIAOD
/DoubleEG/Run2016H-PromptReco-v[1,2,3]/MINIAOD
/SingleMuon/Run2016B-23Sep2016-v3/MINIAOD
/SingleMuon/Run2016C-23Sep2016-v1/MINIAOD
/SingleMuon/Run2016D-23Sep2016-v1/MINIAOD
/SingleMuon/Run2016E-23Sep2016-v1/MINIAOD
/SingleMuon/Run2016F-23Sep2016-v1/MINIAOD
/SingleMuon/Run2016G-23Sep2016-v1/MINIAOD
/SingleMuon/Run2016H-PromptReco-v[1,2,3]/MINIAOD
/SingleElectron/Run2016B-23Sep2016-v3/MINIAOD
/SingleElectron/Run2016C-23Sep2016-v1/MINIAOD
/SingleElectron/Run2016D-23Sep2016-v1/MINIAOD
/SingleElectron/Run2016E-23Sep2016-v1/MINIAOD
/SingleElectron/Run2016F-23Sep2016-v1/MINIAOD
/SingleElectron/Run2016G-23Sep2016-v1/MINIAOD
/SingleElectron/Run2016H-PromptReco-v[1,2,3]/MINIAOD
These are the data for the ttbar control region:
/MuonEG/Run2016B-23Sep2016-v3/MINIAOD
/MuonEG/Run2016C-23Sep2016-v1/MINIAOD
/MuonEG/Run2016D-23Sep2016-v1/MINIAOD
/MuonEG/Run2016E-23Sep2016-v1/MINIAOD
/MuonEG/Run2016F-23Sep2016-v1/MINIAOD
/MuonEG/Run2016G-23Sep2016-v1/MINIAOD
/MuonEG/Run2016H-PromptReco-v[1,2,3]/MINIAOD
JSON:
Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt
(36.814 /fb)
Luminosities:
- ICHEP dataset (Runs B, C, D): 12.9 /fb
- Run E: 4.32 /fb
- Run F: 3.37 /fb
- Run G: 8.02 /fb
- Run H: 9.20 /fb
Grand total: 36.8 /fb
MC
The trick to merge the DY b-jet samples correctly:
https://github.com/CJLST/ZZAnalysis/blob/2l2q_80X/AnalysisStep/test/prod/setGenericRedirAndFilters.sh
It must be run between job creation and submission, i.e.:
batch.py -i xxx.py -o <outDir> samples.csv
source setGenericRedirAndFilters.sh <outDir>
resubmit.csh
This is not critical, but it gives you some b-tagged events at high mass; if you run the alpha factor on regular jet-binned samples there are very few.
Pileup Reweighting (Needs to be updated)
TO BE DONE
Ntuples
V3 Ntuples
Location: /eos/cms/store/caf/user/sudha/ZZ2l2q/Moriond2017/V3
V2 Ntuples
Location: /eos/cms/store/caf/user/sudha/ZZ2l2q/Moriond2017/V2
V1 Ntuples
Location of samples:
https://github.com/trtomei/ZZAnalysis/blob/2l2q_80X/AnalysisStep/test/Plotter/goodDatasets_Moriond2017_V1.txt
Signal samples:
/afs/cern.ch/user/t/tomei/public/HZZ2L2Q/Moriond2017/V1/
Data, backgrounds and high mass VBF signal:
/store/caf/user/sudha/ZZ2l2q/Moriond2017/V1/
V0 Ntuples
Location in:
/afs/cern.ch/user/t/tomei/public/HZZ2L2Q/Moriond2017/V0/
Instructions to make ntuples and plots
Framework -
https://github.com/sudhaahuja/ZZAnalysis/tree/2l2q_80X
Instructions -
https://github.com/CJLST/ZZAnalysis/wiki/SubmittingJobs
List of datasets to be used -
https://github.com/sudhaahuja/ZZAnalysis/blob/2l2q_80X/AnalysisStep/test/prod/samples2l2q_Moriond2017.csv
Plotting script -
https://github.com/sudhaahuja/ZZAnalysis/blob/2l2q_80X/AnalysisStep/test/Plotter/plotDataVsMC_2l2q.C
Temporary note (before submitting jobs):
For Signal: In ZZAnalysis/AnalysisStep/prod/analyzer_20152l2q.py, add the following line
process.ZZTree.skipEmptyEvents = False
For Data: Make the following changes
In ZZAnalysis/AnalysisStep/test/MasterPy/ZZ2l2qAnalysis.py, set the global tag for Runs B-G (SeptRepro) or Run H (Prompt) data accordingly:
process.GlobalTag = GlobalTag(process.GlobalTag, '80X_dataRun2_2016SeptRepro_v7', '')
#process.GlobalTag = GlobalTag(process.GlobalTag, '80X_dataRun2_Prompt_v16', '') # For RunH
Objects, methods (Needs to be cross-checked and updated)
Except where mentioned otherwise, content is taken directly from miniAOD; for documentation please refer to WorkBookMiniAOD.
Muons
ID:
- As in the 4-lepton analysis:
- Loose Muons: pT > 5, |eta| < 2.4, dxy < 0.5, dz < 1, (isGlobalMuon || (isTrackerMuon && numberOfMatches > 0)) && muonBestTrackType != 2
- Tight Muons: Loose Muons + PF muon isolation: iso/pT < 0.35, using PF combined relative isolation with cone size R = 0.3 and Δβ correction. The cut is applied after recovered FSR photons are subtracted from the isolation cone (see below); a sketch of the cut follows.
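For concreteness, here is a minimal sketch of this isolation cut, with the FSR-subtracted isolation sums as plain inputs (the struct and helper names are illustrative, not the framework's):

#include <algorithm>

struct MuonIsoSums {
  double chargedHadronPt;   // sum pT of charged hadrons from the PV
  double neutralHadronEt;   // sum ET of neutral hadrons
  double photonEt;          // sum ET of photons, after FSR subtraction
  double puChargedHadronPt; // sum pT of charged hadrons from pileup vertices
};

// PF combined relative isolation, cone R = 0.3, with delta-beta correction:
// the neutral component is reduced by half of the pileup charged-hadron sum
// and floored at zero.
double muonRelIso(const MuonIsoSums& s, double muonPt) {
  double neutral = std::max(0.0, s.neutralHadronEt + s.photonEt - 0.5 * s.puChargedHadronPt);
  return (s.chargedHadronPt + neutral) / muonPt;
}

// Tight muons: loose muons + muonRelIso(...) < 0.35.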
Ghost cleaning:
process.cleanedMu = cms.EDProducer("PATMuonCleanerBySegments",
    src = cms.InputTag("calibratedMuons"),
    preselection = cms.string("track.isNonnull"),
    passthrough = cms.string("isGlobalMuon && numberOfMatches >= 2"),
    fractionOfSharedSegments = cms.double(0.499)
)
Electrons
ID:
- As in 4-lepton analysis:
- Loose Electrons: pT > 7, |eta| < 2.5, dxy < 0.5, dz < 1.
The conversion-rejection cut gsfTrack.hitPattern().numberOfHits(HitPattern::MISSING_INNER_HITS) <= 1 was used before we moved to the Spring15 BDT; we don't use it anymore since this variable is now part of the BDT inputs.
- Tight Electrons: Loose Electrons + non-triggering MVA ID, using the following recipe:
- Lepton cross-cleaning: remove electrons which are within ΔR(η,φ) < 0.05 of a muon passing tight ID && SIP < 4.
- Isolation: New recipe, as of 7_6_X samples: The isolation cut is applied after recovered FSR photons are subtracted from the isolation cone (see below).
- iso/pT < 0.35, using PF combined relative isolation with cone size R=0.3:
double Ana::pfIso03(const pat::Electron& elec, double Rho) {
  // rho-based pileup subtraction, with effective areas binned in the
  // supercluster eta (see note below)
  double PUCorr = Rho * ElecEffArea(elec.superCluster()->eta());
  // PF combined relative isolation, cone R = 0.3; the neutral component is
  // floored at zero after the pileup subtraction
  double iso = (elec.pfIsolationVariables().sumChargedHadronPt
                + std::max(elec.pfIsolationVariables().sumPhotonEt
                           + elec.pfIsolationVariables().sumNeutralHadronEt - PUCorr, 0.0))
               / elec.pt();
  return iso;
}
- using rho correction with Spring15-25ns-based effective areas from the EGamma POG. Note that these effective areas are binned in the eta of the electron's supercluster, not the electron eta.
Photons for FSR
Start from PF photons from the particleFlow collection.
- Preselection: pT > 2 GeV, |η| < 2.4, photon PF relative isolation less than 1.8.
The PF isolation is computed using a cone of 0.3, a threshold of 0.2 GeV on charged hadrons with a veto cone of 0.0001, and 0.5 GeV on neutral hadrons and photons with a veto cone of 0.01, including also the contribution from PU vertices (same radius and threshold as for the charged isolation).
- Supercluster veto: remove all PF photons that match with any electron passing loose ID and SIP cuts; matching is according to (|Δφ| < 2, |Δη| < 0.05) OR (ΔR < 0.15), with respect to the electron's supercluster.
- Photons are associated to the closest lepton in the event among all those passing loose ID + SIP cut.
- Discard photons that do not satisfy ΔR(γ,l)/ET(γ)² < 0.012 and ΔR(γ,l) < 0.5.
- If more than one photon is associated to the same lepton, the one with the lowest ΔR(γ,l)/ET(γ)² is selected (a sketch of this ranking follows this list).
- For each selected FSR photon, exclude that photon from the isolation cone of all leptons in the event passing the loose ID + SIP cut, if it was inside the isolation cone and outside the isolation veto (ΔR > 0.01 for muons and (ele->superCluster()->eta() < 1.479 || dR > 0.08) for electrons; note: these requirements should probably be rechecked for consistency with the isolation algorithms).
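A minimal sketch of the ΔR/ET² ranking used in the association steps above, with simple four-vector-like structs (the names are illustrative, not the framework's):

#include <cmath>
#include <limits>
#include <vector>

struct P4 { double pt, eta, phi; };

double deltaR(const P4& a, const P4& b) {
  double dEta = a.eta - b.eta;
  double dPhi = std::fabs(a.phi - b.phi);
  if (dPhi > M_PI) dPhi = 2. * M_PI - dPhi;
  return std::sqrt(dEta * dEta + dPhi * dPhi);
}

// Return the index of the best FSR photon for one lepton, or -1 if none
// passes dR(gamma,l) < 0.5 and dR(gamma,l)/ET^2 < 0.012; among the survivors
// the lowest dR(gamma,l)/ET^2 wins.
int bestFsrPhoton(const P4& lepton, const std::vector<P4>& photons) {
  int best = -1;
  double bestScore = std::numeric_limits<double>::max();
  for (std::size_t i = 0; i < photons.size(); ++i) {
    double dR = deltaR(lepton, photons[i]);
    double score = dR / (photons[i].pt * photons[i].pt);
    if (dR < 0.5 && score < 0.012 && score < bestScore) {
      bestScore = score;
      best = static_cast<int>(i);
    }
  }
  return best;
}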
Lepton efficiency scale factors
Muon scale factors
The histogram with overall data-to-simulation scale factors (for tracking, reconstruction, identification, impact parameter, and isolation requirements) is available here.
Electron scale factors
Electron efficiencies are measured for ID|Reco, ID+ISO+SIP|Reco and SIP|Reco in 6 electron pT bins (7, 10, 20, 30, 40, 50, 1000) and 4 electron supercluster |η| bins (0.0, 0.8, 1.479, 2.0, 2.5). A novelty with respect to Run I is a separate set of scale factors for crack electrons; GsfElectron::isGap() is used to determine whether the electron is a crack electron. These scale factors are officially approved by the EGamma POG (presentation), derived with 76X data and ready for use.
- Scale factors for new MVA ID working point with respect to the reconstruction with full systematics :
- Scale factors for ID+ISO+SIP, for |SIP| < 4 and iso/pT < 0.35 with cone size R=0.3 working points with respect to the reconstruction with full systematics:
- Scale factors for SIP with respect to the reconstruction with only central values provided:
Please note that scale factors are measured, and should therefore be applied, for electrons up to 1000 GeV, but for simplicity the upper bound in the provided ROOT files is always 200 GeV (see the lookup sketch below). All the fits and plots can be found here.
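As an illustration, a minimal sketch of a per-electron lookup with the pT clamped below the 200 GeV upper bound of the files; the axis convention (supercluster |η| on x, pT on y) is an assumption, so check it against the actual files:

#include <algorithm>
#include <cmath>
#include "TH2.h"

double electronScaleFactor(const TH2& h, double scEta, double pt) {
  // The provided files stop at 200 GeV, so clamp the pT used for the lookup.
  double ptForLookup = std::min(pt, 199.0);
  int bin = h.FindFixBin(std::fabs(scEta), ptForLookup);
  return h.GetBinContent(bin);
}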
Lepton momentum scale and resolution, event-by-event mass error calibration
Muon scale and resolution corrections
Muon momentum scale corrections are applied using the KalmanMuonCorrector class.
The corrections can be downloaded under $CMSSW_BASE/src with:
git clone https://github.com/bachtis/Analysis.git -b KaMuCa_V2 KaMuCa
A KalmanMuonCalibrator object can be created using the input string "DATA_76X_13TeV" or "MC_76X_13TeV". Then:
/// ====== ON DATA (correction only) =====
double corrPt = calibrator.getCorrectedPt(mu.pt(), mu.eta(), mu.phi(), mu.charge());
double corrPtError = corrPt * calibrator.getCorrectedError(corrPt, mu.eta(), mu.bestTrack()->ptError()/corrPt );
/// ====== ON MC (correction plus smearing) =====
double corrPt = calibrator.getCorrectedPt(mu.pt(), mu.eta(), mu.phi(), mu.charge());
double corrPtError = corrPt * calibrator.getCorrectedError(corrPt, mu.eta(), mu.bestTrack()->ptError()/corrPt );
double smearedPt = calibrator.smear(corrPt, mu.eta());
double smearedPtError = smearedPt * calibrator.getCorrectedErrorAfterSmearing(smearedPt, mu.eta(), corrPtError /smearedPt );
Electron scale and resolution corrections
WARNING: EGamma corrections should be applied only to 25 ns data, not to 50 ns data.
Use 7_6_X branch of the EGamma code from https://twiki.cern.ch/twiki/bin/view/CMS/EGMSmearer
Apply the correction before all cuts and before the computation of the relative isolation, but do not recompute the MVA ID discriminator.
ak4 Jets
Start from the slimmedJets collection.
In 76X, reapply Jet energy corrections:
- MC: Fall15_25nsV2_MC, apply 'L1FastJet','L2Relative','L3Absolute'
- data: Fall15_25nsV2_DATA, apply 'L1FastJet','L2Relative','L3Absolute','L2L3Residual'
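As an illustration, a sketch of rebuilding these corrections standalone with FactorizedJetCorrector from CondFormats/JetMETObjects; the local text-file copies (standard names for the Fall15_25nsV2 set) are an assumption:

#include <memory>
#include <string>
#include <vector>
#include "CondFormats/JetMETObjects/interface/FactorizedJetCorrector.h"
#include "CondFormats/JetMETObjects/interface/JetCorrectorParameters.h"

std::unique_ptr<FactorizedJetCorrector> makeAK4Corrector(bool isData) {
  // Levels in the order in which they must be applied.
  const std::string tag = isData ? "Fall15_25nsV2_DATA" : "Fall15_25nsV2_MC";
  std::vector<JetCorrectorParameters> levels;
  levels.emplace_back(tag + "_L1FastJet_AK4PFchs.txt");
  levels.emplace_back(tag + "_L2Relative_AK4PFchs.txt");
  levels.emplace_back(tag + "_L3Absolute_AK4PFchs.txt");
  if (isData) levels.emplace_back(tag + "_L2L3Residual_AK4PFchs.txt"); // data only
  return std::make_unique<FactorizedJetCorrector>(levels);
}

double correctedPt(FactorizedJetCorrector& corr, double rawPt, double eta,
                   double area, double rho) {
  // Feed the RAW (uncorrected) jet kinematics, then rescale the pT.
  corr.setJetPt(rawPt);
  corr.setJetEta(eta);
  corr.setJetA(area);
  corr.setRho(rho);
  return rawPt * corr.getCorrection();
}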
looseJetID: follow instructions in
https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetID#Recommendations_for_13_TeV_data
pileupJetId: the PU jet ID is currently buggy; a fix will arrive soon. In the meantime, it should not be applied. Still true?
Previously used cut:
float jpumva = 0.;
jpumva = j->userFloat("pileupJetId:fullDiscriminant");
if (jpt > 20) {
  if (jeta > 3.) {
    if (jpumva <= -0.45) passPU = false;
  } else if (jeta > 2.75) {
    if (jpumva <= -0.55) passPU = false;
  } else if (jeta > 2.5) {
    if (jpumva <= -0.6) passPU = false;
  } else if (jpumva <= -0.63) passPU = false;
} else {
  if (jeta > 3.) {
    if (jpumva <= -0.95) passPU = false;
  } else if (jeta > 2.75) {
    if (jpumva <= -0.94) passPU = false;
  } else if (jeta > 2.5) {
    if (jpumva <= -0.96) passPU = false;
  } else if (jpumva <= -0.95) passPU = false;
}
The jets are required to have pT > 30 and |eta| < 2.4.
They must be cleaned with a ΔR > 0.4 cut with respect to all tight leptons in the event (cf. ID definitions at HiggsZZ4l2015#Muons and HiggsZZ4l2015#Electrons) passing the SIP and isolation cuts computed after FSR correction, as well as with respect to all FSR photons attached to these leptons (a cleaning sketch is given below).
For extra jets in the event (not forming the ZZ candidate) the |eta| cut is relaxed to 4.7.
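A minimal sketch of this cross-cleaning (the same logic applies to the merged-jet cleaning below, with ΔR > 0.8); the struct is illustrative and the deltaR helper is as in the FSR sketch above:

#include <vector>

// Keep the jet only if it is farther than dRmin (0.4 for ak4, 0.8 for ak8)
// from every tight lepton and every FSR photon attached to those leptons.
bool isCleanJet(const P4& jet, const std::vector<P4>& leptonsAndFsrPhotons,
                double dRmin) {
  for (const auto& obj : leptonsAndFsrPhotons)
    if (deltaR(jet, obj) < dRmin) return false;
  return true;
}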
ak8 (merged) Jets
Start from the slimmedJetsAK8 collection.
In 80X, reapply Jet energy corrections:
- MC: Fall15_25nsV2_MC, apply 'L1FastJet','L2Relative','L3Absolute'
- data: Fall15_25nsV2_DATA, apply 'L1FastJet','L2Relative','L3Absolute','L2L3Residual'
Additionally, apply the L2 and L3 corrections only to the jet pruned mass.
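A sketch of this step, reusing the FactorizedJetCorrector recipe above but with only the L2Relative and L3Absolute AK8 text files loaded (file names again an assumption); the correction is evaluated at the raw jet kinematics and applied to the raw pruned mass:

double correctedPrunedMass(FactorizedJetCorrector& l2l3, double rawPrunedMass,
                           double rawPt, double eta, double area, double rho) {
  // Evaluate the L2*L3 factor at the jet kinematics...
  l2l3.setJetPt(rawPt);
  l2l3.setJetEta(eta);
  l2l3.setJetA(area);
  l2l3.setRho(rho);
  // ...and rescale the pruned mass only.
  return rawPrunedMass * l2l3.getCorrection();
}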
looseJetID: follow instructions in
https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetID#Recommendations_for_13_TeV_data
The jets are required to have pT > 170, |eta| < 2.4, and tau21 < 0.6.
They must be cleaned with a ΔR > 0.8 cut with respect to all tight leptons in the event (cf. ID definitions at HiggsZZ4l2015#Muons and HiggsZZ4l2015#Electrons) passing the SIP and isolation cuts computed after FSR correction, as well as with respect to all FSR photons attached to these leptons (cf. the cleaning sketch in the ak4 section).
The scale factors for the working points can be found at the dedicated jet W-tagging twiki:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetWtagging#Working_points_and_scale_factors
Trigger requirements
- The paths are now:
- HLT_Ele23_Ele12_CaloIdL_TrackIdL_IsoVL_DZ_v* || HLT_DoubleEle33_CaloIdL_GsfTrkIdVL_v* || HLT_Ele27_WPTight_Gsf_v* || HLT_Ele25_eta2p1_WPTight_Gsf_v* || HLT_Ele27_eta2p1_WPLoose_Gsf_v*
- HLT_Mu17_TrkIsoVVL_Mu8_TrkIsoVVL_DZ_v* || HLT_Mu17_TrkIsoVVL_TkMu8_TrkIsoVVL_DZ_v* || HLT_IsoMu24_v* || HLT_IsoTkMu24_v*
Do we really need / want such a complicated trigger selection?
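Whatever the final list, the selection is just an OR over versioned path names; a minimal sketch with the fired-path names as plain strings (as obtained from edm::TriggerNames in CMSSW):

#include <string>
#include <vector>

// True if any fired path starts with one of the wanted (unversioned) prefixes.
bool firedAny(const std::vector<std::string>& firedPaths,
              const std::vector<std::string>& wantedPrefixes) {
  for (const auto& path : firedPaths)
    for (const auto& prefix : wantedPrefixes)
      if (path.compare(0, prefix.size(), prefix) == 0) return true;
  return false;
}

// e.g. for the muon stream:
// firedAny(fired, {"HLT_Mu17_TrkIsoVVL_Mu8_TrkIsoVVL_DZ_v",
//                  "HLT_Mu17_TrkIsoVVL_TkMu8_TrkIsoVVL_DZ_v",
//                  "HLT_IsoMu24_v", "HLT_IsoTkMu24_v"});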
Analysis flow (Needs to be updated)
- Require at least one good vertex:
!isFake && ndof > 4 && |z| <= 24 && position.Rho <= 2
cms.EDFilter("VertexSelector",
    src = cms.InputTag("offlinePrimaryVertices"),
    cut = cms.string('!isFake && ndof > 4 && abs(z) <= 24 && position.Rho <= 2'),
    filter = cms.bool(True),
)
- IP and isolation requirement: all leptons should satisfy cuts described above (on FSR-subtracted isolation)
- Z to lepton candidates are made of OSSF lepton pairs passing the above requirements.
- In addition, require 55 < mll < 120 GeV and pT(ll) > 100 GeV.
- Z to jet candidates are either a merged jet passing the above ak8 requirements or a jet pair passing the above ak4 requirements.
- In addition, require 40 < mjj (or mJ) < 180 GeV and pT(jj) > 100 GeV (pT(J) is already cut at 170 GeV at miniAOD level); the mass cuts are very loose in order to keep both signal-region and sideband events, the latter being used for background estimation.
- Among all ZZ pairs, require that:
- ΔR(η,φ) > 0.02 between each of the leptons (to remove ghosts)
- the two highest-pT leptons must pass pT > 40 and 24 GeV, respectively
- define the Z1 as the one with jets (two ak4 jets or one ak8 jet); the Z with leptons is the Z2.
- m(2l2j / 2l1J) > 300 GeV
- If more than one ZZ candidate survives, choose the one with the Z2 closest in mass to the nominal Z. If two or more candidates share the same Z2 and differ only by the Z1, choose the one with the highest-pT Z1 (the pT of the merged jet, or the vector sum of the jet pTs for resolved jets). Double-check this; a sketch of the arbitration follows below.
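A sketch of this arbitration with an illustrative candidate struct; the tie-breaking assumes that candidates sharing the same Z2 have an identical mZ2:

#include <cmath>
#include <vector>

struct ZZCand {
  double mZ2;  // mass of the leptonic Z
  double ptZ1; // pT of the merged jet, or of the dijet system for resolved jets
};

const ZZCand* chooseCandidate(const std::vector<ZZCand>& cands) {
  const double mZnominal = 91.1876; // GeV
  const ZZCand* best = nullptr;
  for (const auto& c : cands) {
    if (!best) { best = &c; continue; }
    double dBest = std::fabs(best->mZ2 - mZnominal);
    double dCur = std::fabs(c.mZ2 - mZnominal);
    // Prefer the Z2 closest to the nominal Z mass; for the same Z2, the
    // highest-pT Z1 wins.
    if (dCur < dBest || (dCur == dBest && c.ptZ1 > best->ptZ1)) best = &c;
  }
  return best; // nullptr if the event has no candidate
}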
K factors
- For the time being we only apply a flat k-factor of 1.231 to the DY+jets background.
Kinematic Discriminants
The latest construction of the spin-0/spin-2 discriminants and the corresponding templates is described in this talk:
https://indico.cern.ch/event/566815/contributions/2290419/attachments/1344113/2025575/HighMass_2l2q160926.pdf
Event Categorization
- VBF-tagged category: at least two extra jets and
vbfmela > 1.043 - 460. / (ZZMass + 634.)
- B-tagged category: at least two b-tags among the two jets (subjets) that make the resolved (merged) Z1. We use CSVv2 with a b-tagging threshold of 0.46 (nominally the loose working point; double-check). A sketch of the assignment follows below.
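A sketch of the category assignment, assuming the VBF check takes precedence over the b-tag check (variable names are illustrative):

enum class Category { VBFTagged, BTagged, Untagged };

Category assignCategory(double vbfMela, double zzMass, int nExtraJets, int nBTagsOnZ1) {
  // VBF tag: at least two extra jets plus the mass-dependent MELA cut.
  if (nExtraJets >= 2 && vbfMela > 1.043 - 460. / (zzMass + 634.))
    return Category::VBFTagged;
  // B tag: at least two b-tagged jets (or subjets) on the hadronic Z1.
  if (nBTagsOnZ1 >= 2)
    return Category::BTagged;
  return Category::Untagged;
}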
Using the Z mass constraint to fit the jet momenta; more details in kinZfitter.
To get the code:
cd $CMSSW_BASE/src
git clone https://github.com/tocheng/KinZfitter.git
cd KinZfitter
git checkout -b from-Zhadv1.1 Zhadv1.1
Combination Cards
Frameworks
CJLST
Efficiency calculation (notes from Candice)
Main repo: https://github.com/CandiceYou/HighMassCombInputs
For running the efficiency, resolution, and 2D templates, it is sufficient to uncomment the relevant block in this bash script and run it:
https://github.com/CandiceYou/HighMassCombInputs/blob/2l2q/run/run2l2q_all.sh
When the new trees arrive, the input files will be changed here:
https://github.com/CandiceYou/HighMassCombInputs/blob/2l2q/interface/ResoEff.h#L19-L21
and here
https://github.com/CandiceYou/HighMassCombInputs/blob/2l2q/src/selection.cc#L41-L134
- The inputfiles_spin0ggH and inputfiles_spin0VBF arrays list the sample masses and pick up the files in the following format:
- /ggHiggs/ZZ2l2qAnalysis.root
- /VBFHiggs/ZZ2l2qAnalysis.root
- The background samples and data files are in selection.cc.
Since the current samples are in different directories with different naming, I temporarily added an inputDir2 and hacked selection.cc a bit to add the high-mass samples. This will be simplified once the new samples arrive.
These are the 2D templates I got with the last samples:
http://cayou.web.cern.ch/cayou/HighMass/170213/
I think the efficiency is an easier test run, since the resolution and templates involve many scripts. If you run the default run2l2q_all.sh, the efficiencies will be produced.
The code will make a new set of trees, applying the selections and labelling each event for the 12 categories. This step usually takes some time.
To speed it up for a quick test, you can select only a few samples by reducing the arrays here:
https://github.com/CandiceYou/HighMassCombInputs/blob/2l2q/interface/ResoEff.h#L20-L21
Review page
-- ThiagoTomei - 2017-02-17
-- SudhaAhuja - 2017-01-12