PAT Examples: Higgs Analysis: H->ZZ->4l analyses

This example comes from the link: here.

Physics Motivation/Goals

The purpose of the High Mass H->ZZ exercise is to run the full H->ZZ->4l analysis on 2010+2011 data and Summer11 Monte Carlo samples. The 4e, 4mu and 2e2mu final states are treated consistently at the same time, with emphasis on the strategy used to suppress the Zbbar and ttbar backgrounds.
The event yields at each step of the selection will be studied with Monte Carlo simulation of the backgrounds and of the signal for different Higgs masses.
Distributions of the relevant observables in data and MC will be examined at different selection steps.
Students will be asked to evaluate the significance of the various cuts and to optimize them: can you find even better cuts than the standard analysis?

Contacts

Sara Bolognesi & Nicola De Filippis & Alexey Drozdetskiy & Marco Meneghelli

H->ZZ->4l analysis in a nutshell:

Introductory slides here

  • Signatures: 4e, 4mu and 2e2mu final states
  • Smaller rate compared to H->WW, but very clean signal and mass peak reconstruction
  • Very small backgrounds:
    • irreducible ZZ
    • reducible Zbb, tt with leptons from b/c hadron decays
    • reducible Z+jets, W+jets, QCD with fake leptons
  • Selection strategy and observables:
    • 4 well-reconstructed and isolated leptons
    • leptons coming from the primary vertex
    • di-lepton pairs compatible with at least one on-mass-shell Z

Pre-requisites

  • Basic knowledge of C++ and CMSSW
  • Basic knowledge of physics objects and the identification techniques: electrons and muons
  • Knowledge of ROOT and ability to write, compile and execute macros
  • Short Exercises suggested:
    • Electrons: here
    • Muons: here

Getting started

Set up CMSSW_4_4_2 locally (e.g., in your cmslpc or lxplus account). Four configurations are provided for the exercises: one for "FNAL", one for "CERN", one for "PISA", and one for a generic local PBS batch scheduler.

At FNAL:

bash    # switch to the bash shell
source /uscmst1/prod/sw/cms/bashrc prod
cmscvsroot CMSSW
cd nobackup

At CERN:

bash    # switch to the bash shell
cd scratch0 

At PISA: setup here

bash    # switch to the bash shell
source /afs/pi.infn.it/grid_exp_sw/cms/scripts/setcms.sh
cmscvsroot CMSSW
cd /gpfs/gpfsddn/cms/user/`whoami`

Go ahead:

scramv1 project CMSSW CMSSW_4_4_2
cd CMSSW_4_4_2/src/
eval `scramv1 runtime -sh`

Download the HiggsToZZ4leptons code and compile it

cvs co HiggsAnalysis/HiggsToZZ4Leptons
cvs co HiggsAnalysis/Skimming
cvs co Configuration/Skimming/python/PDWG_HZZSkim_cff.py
scramv1 b 

This code runs on the centrally produced samples (MC and data) and stores the relevant variables in root-tuples. The configuration files that apply the skim and produce the root-tuples are:

HiggsAnalysis/HiggsToZZ4Leptons/test/HiggsToZZ_HZZSkim_mc.py
HiggsAnalysis/HiggsToZZ4Leptons/test/HiggsToZZ_HZZSkim_data.py

You don't need to run this code, since it is slow and we provide the root-tuples to you. You can look at it and try running it in your spare time.

Location of the root-tuples

  • At FNAL:
/pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/data  for 2010 and 2011 data
/pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/sig    for MC H->ZZ->4l signal (gluon fusion and VBF)
/pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/bkg   for Summer11 background samples

  • At CERN:
/castor/cern.ch/user/n/ndefilip/DAS/data for 2010 and 2011 data
/castor/cern.ch/user/n/ndefilip/DAS/sig for MC H->ZZ->4l signal (gluon fusion and VBF)
/castor/cern.ch/user/n/ndefilip/DAS/bkg for Summer11 background samples
/castor/cern.ch/user/m/mene/HIGGS for the exercise on monitoring

  • At PISA:
/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/data  for 2010 and 2011 data
/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/sig    for MC H->ZZ->4l signal (gluon fusion and VBF)
/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/bkg   for Summer11 background samples
/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/HIGGS for the exercise on monitoring

Preliminary setup

Step 1: The input files

Go to the HiggsAnalysis/HiggsToZZ4Leptons/test/macros directory, where a lot of material is provided.

cd HiggsAnalysis/HiggsToZZ4Leptons/test/macros
Create the tree of directories which will be used later to store the output:
bash createdir.sh

At CERN the input files are stored on CASTOR (as reported above). We have to make sure that the files are copied to disk (and not only on tape), i.e. that they are "STAGED". To do that, run this script:

bash stager_get.sh

Copy one of the tuples locally and look into it. If you are on lxplus you can copy it into the /tmp/ directory, for instance.
In case of FNAL configuration:

/opt/d-cache/dcap/bin/dccp  /pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/data/roottree_leptons_DoubleMu_Cert_160404-177515_7TeV_PromptReco_Collisions11_JSON_03Oct2011-v1.root some_local_path_you_like
In case of CERN configuration:
rfcp /castor/cern.ch/user/n/ndefilip/DAS/data/roottree_leptons_DoubleMu_Cert_160404-177515_7TeV_PromptReco_Collisions11_JSON_03Oct2011-v1.root some_local_path_you_like
In case of PISA configuration:
cp  /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/data/roottree_leptons_DoubleMu_Cert_160404-177515_7TeV_PromptReco_Collisions11_JSON_03Oct2011-v1.root some_local_path_you_like
Open the root-ple copied locally with the command:
root -l some_local_path_you_like/roottree_leptons_DoubleMu_Cert_160404-177515_7TeV_PromptReco_Collisions11_JSON_03Oct2011-v1.root 

The content of the root-tuples, with a brief explanation, is provided here.
Let's look at just a few basic variables:
How many events does it contain?
How many muons are there per event?
What is the muon pT spectrum?
What is their isolation distribution?
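These questions boil down to counting and binning. The sketch below shows that logic in plain Python, assuming the per-event muon pT (or isolation) values have already been read out of the root-tuple into Python lists, for example with PyROOT; the function names and the input layout are hypothetical and not part of the HiggsToZZ4Leptons package, and the real branch names are those listed in the root-tuple description.

```python
from collections import Counter

def histogram(values, edges):
    """Count values into bins [edges[i], edges[i+1]); values outside are dropped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def summarize_muons(events_pt, pt_edges):
    """events_pt: one list of muon pT values [GeV/c] per event (hypothetical layout).

    Returns (number of events, {n_muons: n_events}, pT histogram).
    """
    multiplicity = Counter(len(ev) for ev in events_pt)
    all_pt = [pt for ev in events_pt for pt in ev]
    return len(events_pt), dict(multiplicity), histogram(all_pt, pt_edges)

# Toy input: three events with 2, 4 and 1 muons (made-up numbers)
toy_events = [[25.0, 12.0], [40.0, 30.0, 8.0, 5.5], [60.0]]
n_ev, mult, pt_hist = summarize_muons(toy_events, [0, 10, 20, 50, 100])
print(n_ev, mult, pt_hist)  # 3 {2: 1, 4: 1, 1: 1} [2, 1, 3, 1]
```

The same histogram() helper can be reused for the isolation values with suitable bin edges.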

Step 2: The analysis code

Caveat: you don't need to know these macros in detail, just try to run them. If you are interested, you can look at them more closely in your spare time.

The macros that analyze the root-tuples are
BaselineMacro4mu.C, BaselineMacro4e.C, BaselineMacro2e2mu.C.
Those macros run the full selection for 4mu, 4e and 2e2mu and save histograms in output files in ROOT format. They also print some info in text format for each event.

These macros are run by the following scripts:
compilebaseline_4mu_single.C, compilebaseline_4e_single.C, compilebaseline_2e2mu_single.C
Let's modify the 4mu script to use a signal input file with our favorite mass (mH = 120 GeV/c^2 in the files below):

  • for CERN /castor/cern.ch/user/n/ndefilip/DAS/sig/roottree_leptons_GluGluToHToZZTo4L_M-120_7TeV-powheg-pythia6.root
  • for FNAL dcap://cmsgridftp.fnal.gov:24125/pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/sig/roottree_leptons_GluGluToHToZZTo4L_M-120_7TeV-powheg-pythia6.root
  • for PISA /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/sig/roottree_leptons_GluGluToHToZZTo4L_M-120_7TeV-powheg-pythia6.root
Compile all the macros:
cmsenv; bash compilebaseline.sh

This creates several executables called RunBaselineMacro*.
Let's run them. Pass as argument the site configuration: CERN, FNAL or PISA. You may also redirect the output to a local file (e.g., in /tmp/).

./RunBaselineMacro4mu FNAL > some_local_path_you_like/out.txt &
or
./RunBaselineMacro4mu CERN  > some_local_path_you_like/out.txt &
or
./RunBaselineMacro4mu PISA  > some_local_path_you_like/out.txt &

While the job is running, let's look at the output text file. The macro also produces a file with histograms; let's look at it as well.
How many events survived the cuts?
What is the resolution of the Higgs mass peak?

Try to answer this question more quantitatively. The folder MASS_AN/ contains some macros for fits; the main one is Analyze_Mass.C.
First copy the file with the mass histogram into the folder MASS_AN/ (preferably under a new name):

cp output.root MASS_AN/output_4mu.root

Then run Analyze_Mass.C giving as input the file name, the file type (S for signal or B for background) and the fit type.

cd MASS_AN
root -l
.x Analyze_Mass.C("output_4mu.root","S","gaus")
Try the various fit types proposed in the macro.

Change the range for fitting in Analyze_Mass.C according to the Higgs mass range.

Which is the best one? What is the mass resolution?
Do the same for 4e and 2e2mu final state.
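Analyze_Mass.C performs these fits in ROOT. To see what the width of a Gaussian fit measures, here is a small stand-alone Python sketch of a different, simpler technique, Caruana's method: a least-squares parabola fitted to the logarithm of the bin contents around the peak. It is an illustration only and not what Analyze_Mass.C does; it ignores Poisson weights, so for real histograms with few entries the ROOT chi-square or likelihood fits are preferable. The function names are hypothetical.

```python
import math

def _solve3(m, v):
    """Solve the 3x3 linear system m x = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        sol.append(det(mc) / d)
    return sol

def gaussian_peak_fit(centers, contents, lo, hi):
    """Estimate (mean, sigma) of a peak from histogram bins with lo <= x <= hi.

    Fits ln(y) = a + b*u + c*u^2 by unweighted least squares, with
    u = x - x0 (x0 = mean bin centre in the window, for numerical stability).
    A Gaussian gives c = -1/(2 sigma^2) and mean = x0 - b/(2c).
    """
    pts = [(x, y) for x, y in zip(centers, contents) if lo <= x <= hi and y > 0]
    if len(pts) < 3:
        raise ValueError("need at least 3 non-empty bins in the fit range")
    x0 = sum(x for x, _ in pts) / len(pts)
    s = [0.0] * 5  # s[k] = sum of u^k, k = 0..4
    t = [0.0] * 3  # t[k] = sum of u^k * ln(y), k = 0..2
    for x, y in pts:
        u, ly = x - x0, math.log(y)
        for k in range(5):
            s[k] += u ** k
        for k in range(3):
            t[k] += u ** k * ly
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    _, b, c = _solve3(normal, t)
    if c >= 0:
        raise ValueError("log-contents are not concave: no peak in the fit range")
    return x0 - b / (2.0 * c), math.sqrt(-1.0 / (2.0 * c))
```

As with the ROOT fits, the result depends on the fit range, which should be chosen around the expected Higgs mass.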


Now run on one signal sample (let's say 4mu with mH = 150 GeV/c^2), on the ttbar sample and on the ZZ sample, and superimpose the histograms of mZ1, isolation and impact parameter for the 3 samples, normalized to the same area. Try to understand which cuts best discriminate the signal from the background. Copy the histogram files from the sample area:

At FNAL:

/opt/d-cache/dcap/bin/dccp /pnfs//cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/output_GluGluToHToZZTo4L_M-150_7TeV-powheg-pythia6.root .
/opt/d-cache/dcap/bin/dccp /pnfs//cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/output_TTTo2L2Nu2B_7TeV-powheg-pythia6_H150.root . 
/opt/d-cache/dcap/bin/dccp /pnfs//cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/output_ZZTo4mu_7TeV-powheg-pythia6_H150.root . 

At CERN:

rfcp /castor/cern.ch/user/n/ndefilip/DAS/output_GluGluToHToZZTo4L_M-150_7TeV-powheg-pythia6.root .
rfcp /castor/cern.ch/user/n/ndefilip/DAS/output_TTTo2L2Nu2B_7TeV-powheg-pythia6_H150.root . 
rfcp /castor/cern.ch/user/n/ndefilip/DAS/output_ZZTo4mu_7TeV-powheg-pythia6_H150.root . 

At Pisa:

cp  /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/output_GluGluToHToZZTo4L_M-150_7TeV-powheg-pythia6.root .
cp  /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/output_TTTo2L2Nu2B_7TeV-powheg-pythia6_H150.root . 
cp  /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/output_ZZTo4mu_7TeV-powheg-pythia6_H150.root . 


You may find the macro Sgn_Bkg_superimpose/Sgn_Bkg_superimpose.C useful: it normalizes and plots the histograms of mZ1, isolation and impact parameter for a set of input files. First copy the root files into the folder Sgn_Bkg_superimpose/, then:

root -l
.x Sgn_Bkg_superimpose.C("file1","file2","file3")
If you want you can add some other variables, by editing the macro.
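"Normalizing to the same area" means dividing each histogram by its total content, so that only the shapes are compared and not the yields. A minimal Python sketch of that step, with made-up bin contents; in a ROOT macro the usual equivalent is h->Scale(1./h->Integral()), since TH1::Integral() by default sums the bin contents without under/overflow.

```python
def normalize_to_unit_area(contents):
    """Scale bin contents so that they sum to 1 (shape comparison only)."""
    total = sum(contents)
    if total == 0:
        raise ValueError("empty histogram cannot be normalized")
    return [c / total for c in contents]

# Toy mZ1 histograms with the same binning (made-up contents)
signal = [2.0, 6.0, 2.0]
ttbar = [10.0, 10.0, 20.0]
print(normalize_to_unit_area(signal))  # [0.2, 0.6, 0.2]
print(normalize_to_unit_area(ttbar))   # [0.25, 0.25, 0.5]
```

After normalization, a bin where the signal fraction is clearly larger than the background fraction hints at where a cut could be placed.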
Try also to fit the ZZ background with the dedicated fit "ZZ" (ZZ.C); it is a complicated function, see CMS AN 202/2011 for reference.

Step 3: Run the analysis code iteratively, on all the samples, using the batch system:

In this step we will run the previous executable on the full list of samples: data, signal and background Monte Carlo (skip this step if you are at PISA).
The ingredients are the following:
  • The list of all the samples with their cross section and weights is reported here:
    data_input_all.txt, bkg_input_all.txt, sig_input_all.txt
  • Script to run the executable and save the stdout/stderr:
    submit_HZZ4LeptonsAnalysis_FNAL.sh (to run at FNAL, open it and modify the path to your working directory)
    submit_HZZ4LeptonsAnalysis_CERN.sh (to run at CERN, open it and modify the path to your working directory)
  • A template for running with CONDOR at FNAL: condor_template.cfg (open it and modify the email address used for notifications)
  • Scripts to submit jobs on all the samples of data, MC signal and bkg:
    loopcheck_signal_2e2mu.sh loopcheck_signal_4e.sh loopcheck_signal_4mu.sh
    loopcheck_bkg_2e2mu.sh loopcheck_bkg_4e.sh loopcheck_bkg_4mu.sh
    loopcheck_data_2e2mu.sh loopcheck_data_4e.sh loopcheck_data_4mu.sh

Each loopcheck script does the following:

  • requires one argument selecting the site configuration: FNAL, CERN, or a generic value
  • loops over the samples listed in the input txt files (data_input_all.txt, bkg_input_all.txt, sig_input_all.txt)
  • creates cards for each sample, saved in BkgCards4mu, BkgCards4e, BkgCards2e2mu
  • prepares a bash script for each job saved in
    • jobs/submit_HZZ4LeptonsAnalysis_h150_<sample name>_4mu.sh
    • jobs/submit_HZZ4LeptonsAnalysis_h150_<sample name>_4e.sh
    • jobs/submit_HZZ4LeptonsAnalysis_h150_<sample name>_2e2mu.sh
  • creates scripts for CONDOR scheduler at FNAL for each shell submit_*.sh script.
  • submits each job to the local scheduler
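The bookkeeping done by a loopcheck script can be sketched as follows. This is a Python illustration only (the real scripts are bash): it assumes a hypothetical input format with one sample per line as "sample_name cross_section weight", and it maps any site other than FNAL or CERN to a hypothetical "GENERIC" configuration. It only builds the list of cards and job scripts; it does not write or submit anything.

```python
def plan_jobs(input_lines, channel, site):
    """Return one job description per sample listed in an input txt file.

    Assumed (hypothetical) line format: 'sample_name cross_section weight';
    text after '#' is treated as a comment.
    """
    if site not in ("FNAL", "CERN"):
        site = "GENERIC"  # hypothetical stand-in for the generic scheduler setup
    jobs = []
    for line in input_lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, xsec, weight = line.split()[:3]
        jobs.append({
            "sample": name,
            "xsec": float(xsec),
            "weight": float(weight),
            "card": f"BkgCards{channel}/{name}.txt",
            "script": f"jobs/submit_HZZ4LeptonsAnalysis_h150_{name}_{channel}.sh",
            "site": site,
        })
    return jobs

# Toy input with made-up sample names and numbers
lines = ["# name xsec weight", "SampleA 1.0 0.5", "", "SampleB 2.0 1.0"]
for job in plan_jobs(lines, "4mu", "CERN"):
    print(job["script"])
```

Keeping the sample list in one text file and deriving card and script names from it is what lets the same loop serve data, signal and background.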

Let's run them! As usual, put CERN or FNAL as argument if you are running at CERN or FNAL.

chmod +x loopcheck*

For 2e2mu analysis at FNAL or CERN:
./loopcheck_data_2e2mu.sh FNAL (or CERN)
./loopcheck_bkg_2e2mu.sh FNAL (or CERN)
./loopcheck_signal_2e2mu.sh FNAL (or CERN)

For 4mu analysis at FNAL or CERN:
./loopcheck_data_4mu.sh FNAL (or CERN)
./loopcheck_bkg_4mu.sh FNAL (or CERN)
./loopcheck_signal_4mu.sh FNAL (or CERN)

For 4e analysis at FNAL or CERN:
./loopcheck_data_4e.sh FNAL (or CERN)
./loopcheck_bkg_4e.sh FNAL (or CERN)
./loopcheck_signal_4e.sh FNAL (or CERN)

The output files with histograms are saved in the histos4mu, histos4e and histos2e2mu directories. These files are used as input to produce the plots of the analysis.
The files with standard output are saved in

*jobs/RunBaselineMacro<4l>_<type of data>_<sample name>_<4l>.log*  for FNAL configuration
*$CASTOR_HOME/DAS/RunBaselineMacro<4l>_<type of data>_<sample name>_<4l>.log*  for CERN configuration

This output is the same as the one we looked at in the previous step, but now it is produced for all the samples (data and MC). In the next step we will use macros to look at the physics content of what we just produced.

Exercise 1: Make your own favorite plots

The macro PlotEvent4mu.C plots the number of events after each selection cut for all samples and for the data. Let's run it.

root -l
.L PlotEvent4mu.C+
PlotEvent4mu()
.q
display plots/h_nEvent_4l_new_4mu_log.png

Look at the plot:
Each bin on the x-axis corresponds to a selection step. A full description of the cuts is provided here; this is a summary:

  1. First Z: a pair of lepton candidates of opposite charge and matching flavour satisfying m1,2 > 50 GeV/c^2, pT,1 > 20 GeV/c and pT,2 > 10 GeV/c, Riso,1 + Riso,2 < 0.35 and |SIP3D1,2| < 4; the pair with reconstructed mass closest to the nominal Z boson mass is retained and denoted Z1.
  2. Three or more leptons: at least one more lepton candidate of any flavour or charge.
  3. Four or more leptons and a matching pair: a fourth lepton candidate with the flavour of the third lepton candidate from the previous step, and with opposite charge.
  4. Choice of the "best 4l" and Z1, Z2 assignments: retain a second lepton pair, denoted Z2, among all the remaining l+l- combinations with MZ2 > 12 GeV/c^2 and such that the reconstructed four-lepton invariant mass satisfies mZ1Z2 > 100 GeV/c^2. For the 4e and 4mu final states, at least three of the four combinations of opposite-sign pairs must satisfy mll > 12 GeV/c^2. If more than one Z2 combination satisfies all the criteria, the one built from the leptons of highest pT is chosen.
  5. Relative isolation for selected leptons: for any combination of two leptons i and j, irrespective of flavour or charge, the sum of the combined relative isolation Riso,j + Riso,i < 0.35.
  6. Impact parameter for selected leptons: the significance of the 3D impact parameter to the event vertex, SIP3D, is required to satisfy |SIP3D| < 4 for each lepton.
  7. Z1 and Z2 kinematics:
    Low-Mass selection : 50 < MZ1 < 120 GeV/c^2 and 12 < MZ2 < 120 GeV/c^2 (best for significance at MH < 130 GeV/c^2)
    Baseline selection : 60 < MZ1 < 120 GeV/c^2 and 20 < MZ2 < 120 GeV/c^2 (best for significance at 130 < MH < 180 GeV/c^2)
    High-Mass selection : 60 < MZ1 < 120 GeV/c^2 and 60 < MZ2 < 120 GeV/c^2 (best for significance at MH > 180 GeV/c^2)
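Cut 1 and the mass windows of cut 7 can be written out explicitly. The following is a minimal Python sketch, not code taken from BaselineMacro4mu.C: leptons are hypothetical dicts with keys pt, eta, phi, charge, flavour, riso and sip3d; the pair mass uses the massless approximation m^2 = 2 pT1 pT2 (cosh Δη - cos Δφ); the nominal Z mass is the PDG value. Steps 2 to 6 are not implemented.

```python
import math
from itertools import combinations

MZ_NOMINAL = 91.1876  # GeV/c^2, PDG Z boson mass

def pair_mass(l1, l2):
    """Invariant mass of two leptons treated as massless, from (pt, eta, phi)."""
    m2 = 2.0 * l1["pt"] * l2["pt"] * (math.cosh(l1["eta"] - l2["eta"])
                                      - math.cos(l1["phi"] - l2["phi"]))
    return math.sqrt(max(m2, 0.0))

def select_z1(leptons):
    """Cut 1: opposite-charge, same-flavour pair closest to the Z mass.

    Returns (i, j, mass) for the chosen pair, or None if no pair passes.
    """
    best = None
    for i, j in combinations(range(len(leptons)), 2):
        a, b = leptons[i], leptons[j]
        if a["flavour"] != b["flavour"] or a["charge"] + b["charge"] != 0:
            continue
        pt_lead, pt_sub = sorted((a["pt"], b["pt"]), reverse=True)
        if pt_lead <= 20.0 or pt_sub <= 10.0:
            continue
        if a["riso"] + b["riso"] >= 0.35:
            continue
        if abs(a["sip3d"]) >= 4.0 or abs(b["sip3d"]) >= 4.0:
            continue
        m = pair_mass(a, b)
        if m <= 50.0:
            continue
        if best is None or abs(m - MZ_NOMINAL) < abs(best[2] - MZ_NOMINAL):
            best = (i, j, m)
    return best

# Cut 7: (MZ1 range, MZ2 range) in GeV/c^2 for the three selections
MASS_WINDOWS = {
    "low":      ((50.0, 120.0), (12.0, 120.0)),
    "baseline": ((60.0, 120.0), (20.0, 120.0)),
    "high":     ((60.0, 120.0), (60.0, 120.0)),
}

def passes_z_windows(mz1, mz2, selection="baseline"):
    (lo1, hi1), (lo2, hi2) = MASS_WINDOWS[selection]
    return lo1 < mz1 < hi1 and lo2 < mz2 < hi2
```

Changing the numbers in MASS_WINDOWS is a quick way to see how the low-mass, baseline and high-mass selections differ on the same events.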

What is the main background before the cuts? What is the main background after all the cuts?
What cuts are more effective in cutting the various backgrounds?
What is the S/B (signal over background) before the cuts and after all the cuts?
Are there big differences between 4mu and 4e final state?

Another useful macro is PlotStack4l.C, which creates plots of interesting variables at different selection steps for all the samples and the data. By default it produces plots summing up the distributions for 4e, 4mu and 2e2mu for the samples listed in the input file filelist_4l.txt. You can run on each channel separately by editing the macro and replacing filelist_4l.txt with filelist_4e.txt, filelist_4mu.txt or filelist_2e2mu.txt according to the final state.

root -l
.L PlotStack4l.C+
PlotStack4l()
.q
display plots/h_hafterbestZ1_Z1mass_4l_log.png
This macro produces a plot of the mass of the first Z at the beginning of the selection (just after cut 1).
Which backgrounds contain a real Z? Do you understand the shape of the distribution for each background source?

Let's try to produce the plot for another variable: the four-lepton mass after cuts 1, 2, 3 and 4 and after the full selection.

  • comment the line
     std::string histolabel = "hafterbestZ1_Z1mass"; 
  • uncomment the line
     std::string histolabel = "hfourlepbestmass_4l_afterpresel";  
  • change the X and Y range of the histogram in the line
    TH2F *hframe= new TH2F("hframe","hframe",500,95,605.,500,0.0004,18.);//mass 
  • change the binning using the variable nRebin (e.g., set nRebin = 10 in the macro)
  • change the scale from logarithmic to linear: set useLogY = false in the macro
  • you also need to change the labels of the x and y axes in the lines:
         hframe->SetYTitle("Events/10 GeV/c^{2}");
         hframe->SetXTitle("M_{4l} [GeV/c^{2}]");
         
  • run again the macro

Try to produce the same plot after the full selection:

std::string histolabel = "hfourlepbestmass_4l_afterSel_new"; 

Let's look at the isolation (e.g., hafterbestZ1_X) and impact parameter (e.g., hafterbestZ1_IP) distributions. Which backgrounds are most strongly rejected by these cuts?

Please produce the plots for the 3 channels separately and combined.

For experts: please produce a plot of the selection efficiency vs. the Higgs mass at the main steps of the selection.
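For each mass point the efficiency of a selection step is the number of events surviving that step divided by the starting number of events. A minimal Python sketch with the simple binomial uncertainty, using hypothetical function names and made-up yields; ROOT's TEfficiency class offers better-behaved intervals near 0 and 1.

```python
import math

def efficiency(n_pass, n_total):
    """Efficiency n_pass/n_total with binomial error sqrt(eff*(1-eff)/n_total).

    The binomial error vanishes at eff = 0 or 1, where Clopper-Pearson
    or Wilson intervals are more appropriate.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_pass <= n_total:
        raise ValueError("need 0 <= n_pass <= n_total")
    eff = n_pass / n_total
    return eff, math.sqrt(eff * (1.0 - eff) / n_total)

def efficiency_vs_mass(yields):
    """yields: {mH: (n_after_step, n_total)} -> [(mH, eff, err)] sorted by mH."""
    return [(m, *efficiency(n, tot)) for m, (n, tot) in sorted(yields.items())]

# Made-up yields for two mass points
for mh, eff, err in efficiency_vs_mass({150: (30, 100), 120: (20, 100)}):
    print(mh, round(eff, 3), round(err, 3))
```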

Exercise 2: Monitor the main variables

The search for the Higgs boson in the decay channel H → ZZ → 4l requires, as a first step, the reconstruction and selection of a Z boson decaying into a pair of electrons or muons. It is therefore important to continuously monitor the behavior of the selection of Z bosons decaying leptonically.

To achieve this, we divide the data recorded by CMS in 2011 into successive luminosity slots of 20 pb^-1 each. For each lumi-slot we monitor 4 variables (after the first step of the selection, the Z1 reconstruction):

  • a variable called "Z-yield", defined as the number of lepton pairs Nll, with l = e, μ, with a reconstructed invariant mass mll in the range [80, 100] GeV/c^2, divided by L, the luminosity of the lumi-slot. With this definition the Z-yield is equivalent to the cross section σ(pp → Z) times the branching ratio BR(Z → ll) times the overall selection efficiency.
  • The (mean value of) transverse momentum: pt of Z1
  • The (mean value of) isolation of leptons: Riso = Riso_l1 + Riso_l2
  • The (mean value of) significance of impact parameters SIP3D = max( SIP3D_l1,SIP3D_l2 )
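The slicing into lumi-slots and the Z-yield computation can be sketched as below. This is a Python illustration, not the logic of ZmumuMonitor.C: it assumes hypothetical per-luminosity-section inputs (luminosity in pb^-1, number of Z candidates with 80 < mll < 100 GeV/c^2, and the sum of one monitored variable over those candidates) in time order. A slot is closed as soon as it reaches 20 pb^-1, so slots can slightly exceed that size, and the last partial slot is kept here, which the real code may handle differently.

```python
def _summary(lumi, n_z, sum_var):
    return {"lumi": lumi,
            "z_yield": n_z / lumi,  # events per pb^-1, i.e. sigma x BR x eff in pb
            "mean_var": sum_var / n_z if n_z else float("nan")}

def make_lumi_slots(sections, slot_size=20.0):
    """Group consecutive (lumi_pb, n_z, sum_var) sections into lumi-slots."""
    slots = []
    lumi, n_z, sum_var = 0.0, 0, 0.0
    for sec_lumi, sec_nz, sec_sum in sections:
        lumi += sec_lumi
        n_z += sec_nz
        sum_var += sec_sum
        if lumi >= slot_size:  # slot is full: store it and start a new one
            slots.append(_summary(lumi, n_z, sum_var))
            lumi, n_z, sum_var = 0.0, 0, 0.0
    if lumi > 0:  # last, partially filled slot
        slots.append(_summary(lumi, n_z, sum_var))
    return slots

# Made-up sections: (lumi [pb^-1], Z candidates, sum of pT of Z1 [GeV/c])
toy = [(10.0, 100, 4000.0), (10.0, 120, 5000.0), (15.0, 150, 6000.0),
       (5.0, 40, 1600.0), (3.0, 30, 900.0)]
for k, slot in enumerate(make_lumi_slots(toy)):
    print(k, slot["lumi"], slot["z_yield"], round(slot["mean_var"], 2))
```

A stable Z-yield across slots is what one expects if the detector and the selection behave the same way throughout the run period.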

The code you need for this exercise is in the folder: HiggsAnalysis/HiggsToZZ4Leptons/test/macros/ES_PISA

The .root files with all the 2011 statistics and the associated .csv files, containing the lumi info, are in CASTOR at CERN:

 /castor/cern.ch/user/m/mene/HIGGS/Mu_ALL_7nov.root 
 /castor/cern.ch/user/m/mene/HIGGS/Mu_ALL_7nov.csv 
 /castor/cern.ch/user/m/mene/HIGGS/Ele_ALL_7nov.root 
 /castor/cern.ch/user/m/mene/HIGGS/Ele_ALL_7nov.csv 

or at PISA in the directory

/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/HIGGS

First run the code that divides the 2011 data (4.64 fb^-1) into 20 pb^-1 lumi-slots, applies the Z1 selection and fills the histograms of the variables you want to monitor. You will get 4 histograms for each lumi-slot, one per variable: Z-yield, pt, Iso, SIP.
You have to do this twice, for muons Z → 2 μ and electrons Z → 2 e.
The code for this is in the files ZmumuMonitor.C and ZeleeleMonitor.C; take a look at them before running.
Run the code (this might take hours):

  • Interactively: open the files and modify the path to your working directory
     
         CERN:  rfcp /castor/cern.ch/user/m/mene/HIGGS/Ele_ALL_7nov.csv .       or     PISA: cp /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/HIGGS/Ele_ALL_7nov.csv .
         CERN:  rfcp /castor/cern.ch/user/m/mene/HIGGS/Mu_ALL_7nov.csv .       or     PISA: cp /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/HIGGS/Mu_ALL_7nov.csv .
         root -q -b submit_ZeleeleMonitor.C
         root -q -b submit_ZmumuMonitor.C 
         
  • Via batch: open the files and modify the path to your working directory
     
         bsub -q 1nd -J Mu_CERN < submit_ZmumuMonitor.sh    (at CERN)
         bsub -q 1nd -J Ele_CERN < submit_ZeleeleMonitor.sh     (at CERN)
         
     
         bsub -q local < submit_ZmumuMonitor.sh    (at PISA)
         bsub -q local < submit_ZeleeleMonitor.sh    (at PISA)
         
Once you have run the code, you will have the histograms stored in the files: ZmumuMass.histo and ZeleeleMass.histo
(Copies of) these files can be found in CASTOR ( /castor/cern.ch/user/m/mene/HIGGS/ ).
Plot one or more of these histograms for the various variables you want to monitor. What do you observe in these distributions?

Run on the histos to obtain the variables you want to monitor as functions of the lumi-slot number (what do you expect to see?):

     root -l
     .x  PlotZmumuX.C
     
     root -l
     .x  PlotZeleeleX.C
     

You should get 4 plots for each Z decay channel (μμ and ee), one per variable.
Comment on these plots! Do they look as you expected? Can you see any anomalies? If so, can you explain them?

Think about other variables that, in your opinion, are worth monitoring, and edit the .C files to monitor them.
In this exercise we stopped after the first step of the selection chain (Z1 selection). Why? Would you like to go further in the selection chain and monitor other variables?

Exercise 3: Optimize the cuts

The script cut_optimization.C is a template that can be used to re-optimize some of the final cuts of the HZZ4l analysis. It runs on the 4e final state,
using the set of ASCII input files prepared in the previous section
(you can even copy them to your laptop and run from there).

Let's look at the macro and run it.

root -l
.L cut_optimization.C+
analysis()

The output is a table with the data, MC-background and MC-signal yields after the implemented sets of cuts, together with measures of signal vs. background discrimination. There are various ways of estimating how effective the cuts are at enhancing the signal and rejecting the background (aka "significance").
The macro uses two estimators:

  • the simple ratio of the number of signal events to the number of background events: s/b
  • a more correct estimator is the significance defined by the log likelihood ratio, ScL: sqrt(2.0*(s+b)*log(1+s/b)-2.0*s).
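Both estimators, and a simple scan of one lower cut that maximizes ScL, can be sketched in Python as follows. The ScL formula is the one quoted above (math.log1p(s/b) equals log(1+s/b)); the scan function, its (value, weight) input layout and the toy numbers are hypothetical and not taken from cut_optimization.C. Cut values leaving no background are skipped, since ScL diverges there and such points usually reflect limited MC statistics rather than a real gain.

```python
import math

def s_over_b(s, b):
    return s / b

def scl(s, b):
    """Likelihood-ratio significance ScL = sqrt(2 (s+b) ln(1+s/b) - 2 s)."""
    if b <= 0:
        raise ValueError("background must be positive")
    return math.sqrt(2.0 * (s + b) * math.log1p(s / b) - 2.0 * s)

def best_lower_cut(sig, bkg, thresholds):
    """Scan a cut 'x > t'; return (t, s, b, ScL) with the largest ScL.

    sig, bkg: lists of (value, weight) pairs for signal and background events.
    """
    best = None
    for t in thresholds:
        s = sum(w for x, w in sig if x > t)
        b = sum(w for x, w in bkg if x > t)
        if s <= 0 or b <= 0:
            continue
        z = scl(s, b)
        if best is None or z > best[3]:
            best = (t, s, b, z)
    return best

# Toy weighted events (made-up values of some discriminating variable)
sig = [(80.0, 1.0), (90.0, 1.0), (95.0, 1.0)]
bkg = [(20.0, 5.0), (40.0, 5.0), (85.0, 1.0)]
print(best_lower_cut(sig, bkg, [0.0, 30.0, 50.0, 88.0]))
```

For s much smaller than b, ScL reduces to the familiar s/sqrt(b), while for larger s/b it grows more slowly than s/b.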

Now edit the macro and try to find a better-optimized set of cuts in terms of signal vs. background. Try also to change the macro so that it runs on 4mu and 2e2mu.

Useful links

SwGuide for the HZZ4leptons package here


Introduction

Goal: build and run the H->ZZ->4l analyses using PAT objects and algorithms

How to achieve that goal:

  • set up the PAT layers to run on the current HZZ skim samples
  • produce PAT-tuples with the relevant collections in the event
  • make the analysis code work with edm::View, so that the same code runs on RECO and PAT objects
  • prepare configuration files to use PAT objects and modules for the analysis
  • run the analysis with PAT configuration files
  • extract the results (plots and numbers)

How to get the code

Create a CMSSW working area:

 
   scramv1 project CMSSW CMSSW_2_2_13
   cd CMSSW_2_2_13/src

Download the relevant tags for HiggsAnalysis code:

 
   cvs co -r hzz4l_2213_V01_01_04 HiggsAnalysis/HiggsToZZ4Leptons
   cvs co -r V00-02-18 HiggsAnalysis/Skimming
   cvs co -r V00-01-11-1 RecoEgamma/ElectronIdentification
   cvs co -r CMSSW_2_2_13 PhysicsTools/Utilities/interface/AndSelector.h
   cvs co -r CMSSW_2_2_13 PhysicsTools/Utilities/interface/OrSelector.h
   cvs co -r CMSSW_2_2_13 PhysicsTools/UtilAlgos/interface/EventSetupInitTrait.h

How to set up the code to create PAT-tuples:

Modify some files:

 
   PhysicsTools/Utilities/interface/AndSelector.h
   PhysicsTools/Utilities/interface/OrSelector.h
   PhysicsTools/UtilAlgos/interface/EventSetupInitTrait.h

replacing the string "helpers" with "newhelpers" everywhere.
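The renaming can be done with any editor or with sed; as an alternative, here is a small Python sketch, meant to be run from CMSSW_2_2_13/src. It is a hypothetical helper, not part of the recipe, and it makes one deliberate choice: it replaces "helpers" only as a whole word, so that existing "newhelpers" is left alone and running it twice is harmless. If "helpers" also appears inside longer identifiers that must be renamed, a plain string replacement is needed instead.

```python
import re
from pathlib import Path

FILES = [
    "PhysicsTools/Utilities/interface/AndSelector.h",
    "PhysicsTools/Utilities/interface/OrSelector.h",
    "PhysicsTools/UtilAlgos/interface/EventSetupInitTrait.h",
]

# Whole-word match: 'newhelpers' contains no word boundary before 'helpers',
# so it is not rewritten again on a second pass.
PATTERN = re.compile(r"\bhelpers\b")

def rename_helpers(text):
    return PATTERN.sub("newhelpers", text)

def patch_files(src_dir="."):
    """Rewrite the three headers in place (src_dir = CMSSW_2_2_13/src)."""
    for rel in FILES:
        path = Path(src_dir) / rel
        path.write_text(rename_helpers(path.read_text()))
```

Call patch_files() once after checking out the three headers with cvs.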

Follow the recipe at the link to set up PATv2 in CMSSW_2_2_13:

 
   addpkg CondFormats/JetMETObjects  V01-08-04
   addpkg PhysicsTools/RecoAlgos V08-06-16-06-02
   addpkg PhysicsTools/PFCandProducer V03-01-16
   addpkg RecoMET/Configuration V00-04-02-17
   addpkg RecoMET/METAlgorithms V02-05-00-21
   addpkg RecoMET/METProducers V02-08-02-17
   addpkg DataFormats/METReco V00-06-02-09
   addpkg DataFormats/MuonReco V07-02-12-03
   addpkg JetMETCorrections/Type1MET VB04-00-02-04
   addpkg RecoJets/JetAssociationAlgorithms V01-04-03
   addpkg JetMETCorrections/Algorithms V01-08-02-01
   addpkg JetMETCorrections/Configuration V01-08-15
   addpkg JetMETCorrections/JetPlusTrack V03-02-06
   addpkg JetMETCorrections/Modules V02-09-02

Modify PhysicsTools/PatAlgos/python/patEventContent_cff.py to add some collections needed for the analysis

 
   patEventContent = [
       'keep *_cleanLayer1Photons_*_*', 
       'keep *_cleanLayer1Electrons_*_*', 
       'keep *_cleanLayer1Muons_*_*', 
       'keep *_cleanLayer1Taus_*_*', 
       'keep *_cleanLayer1Jets_*_*',
       'keep *_layer1METs_*_*',
       'keep *_cleanLayer1Hemispheres_*_*',
       'keep *_cleanLayer1PFParticles_*_*',
       'keep *_offlinePrimaryVertices_*_*',
       'keep recoGsfTrackExtras_pixelMatchGsfFit_*_*',
       'keep recoTrackExtras_pixelMatchGsfFit_*_*',
       'keep recoTracks_generalTracks_*_*',
       'keep recoTrackExtras_generalTracks_*_*',
       'keep *_offlineBeamSpot_*_*'
   ]

Modify PhysicsTools/PatAlgos/python/recoLayer0/electronId_cff.py to run the same loose electronId we are currently using for the HZZ analyses

 
   import FWCore.ParameterSet.Config as cms

   from RecoEgamma.ElectronIdentification.electronIdCutBasedClassesExt_cfi import *
   import RecoEgamma.ElectronIdentification.electronIdCutBasedClassesExt_cfi
 
   eidRobustHighEnergy = RecoEgamma.ElectronIdentification.electronIdCutBasedClassesExt_cfi.eidCutBasedClassesExt.clone()

   patElectronId = cms.Sequence(
     eidRobustHighEnergy
   )

Modify the file PhysicsTools/PatAlgos/python/patSequences_cff.py by removing the trigger matching:

 
   beforeLayer1Objects = cms.Sequence(
      patAODReco +  # use '+', as there is no dependency
      patMCTruth   # among these sequences
   )

How to run the code to create PAT-tuples:

Compile the code:

 
   cd CMSSW_2_2_13/src
   scramv1 b

Run the code to produce PAT-tuples:

   
   cd HiggsAnalysis/HiggsToZZ4Leptons/test  
   cmsRun patLayer1_fromAOD_full.cfg.py

The output PAT-tuple will include the following collections:

PAToutput.bmp

How to produce PAT-tuples with CRAB (example configuration file):

 
      /afs/cern.ch/user/n/ndefilip/public/crab_pat_H350_ZZ_4l_10TeV_GEN_HLT_4mu.cfg
      

Selection for HZZ analysis

Preselection consists of:

   -- PAT Layers for cleaning and electronID with the same tuning
   -- PAT Layers for building <PAT::object>  objects
   -- at least 2 PAT electrons with pT> 5 GeV/c irrespective of the charge
   -- at least 2 PAT muons with pT > 5 GeV/c  irrespective of the charge
   -- candidate combiner to build Zs and H 
   -- at least 1 Z->ee candidate with mll  >  12 GeV/c2
   -- at least 1 Z->mumu candidate with mll  >  12 GeV/c2
   -- at least one H candidate    with mllll > 100 GeV/c2
   -- two loose isolated electrons and muons

Full Selection consists of:

   -- tight isolation on leptons
   -- impact parameter constraint
   -- 2dIso vs pT cuts, pT cuts, mZ, mZ*, mH cuts --> not included in the python sequences

A simplified schema of the analysis can be found here.

How to set up the HZZ analysis code

  • Use of edm::View to access physics objects (RECO or PAT), such as:

     edm::Handle<edm::View<reco::Muon> > muons;
     edm::Handle<edm::View<reco::GsfElectron> > electrons;

  • Prepare configuration files to use PAT for preselection and complete analysis, such as:

    HiggsAnalysis/HiggsToZZ4Leptons/python/hTozzTo4leptonsPreselectionPAT_2e2mu_cff.py
    HiggsAnalysis/HiggsToZZ4Leptons/python/hTozzTo4leptonsCompleteAnalysisPAT_2e2mu_cff.py

  • Example of a preselection python cfg file for 2e2mu analysis:

import FWCore.ParameterSet.Config as cms

# Electron selection: pT > 5 GeV/c and |eta| < 2.5
from PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi import *
import PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi
hTozzTo4leptonsElectronSelector = PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi.selectedLayer1Electrons.clone()
hTozzTo4leptonsElectronSelector.src = cms.InputTag("cleanLayer1Electrons")
hTozzTo4leptonsElectronSelector.cut = cms.string('pt > 5. & abs(eta) < 2.5')

# Muon selection: pT > 5 GeV/c for |eta| < 1.1, pT > 3 GeV/c and p > 9 GeV/c for |eta| >= 1.1
from PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi import *
import PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi
hTozzTo4leptonsMuonSelector = PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi.selectedLayer1Muons.clone()
hTozzTo4leptonsMuonSelector.src = cms.InputTag("cleanLayer1Muons")
hTozzTo4leptonsMuonSelector.cut = cms.string('(pt > 5. & abs(eta) < 1.1) | (pt > 3. & p > 9. & abs(eta) >= 1.1)')

# zToEE
from HiggsAnalysis.HiggsToZZ4Leptons.zToEE_cfi import *
# zToMuMu
from HiggsAnalysis.HiggsToZZ4Leptons.zToMuMu_cfi import *

# hTozzToEEMuMu
from HiggsAnalysis.HiggsToZZ4Leptons.hTozzTo4leptons_cfi import *

# Electron loose isolation: trackIso/pt < 0.7
from PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi import *
import PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi
hTozzTo4leptonsElectronIsolationProducer = PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi.selectedLayer1Electrons.clone()
hTozzTo4leptonsElectronIsolationProducer.src = cms.InputTag("hTozzTo4leptonsElectronSelector")
hTozzTo4leptonsElectronIsolationProducer.cut = cms.string('trackIso/pt < 0.7')

# Muon loose isolation: weighted sum of track, ECAL and HCAL isolation < 60
from PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi import *
import PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi
hTozzTo4leptonsMuonIsolationProducer = PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi.selectedLayer1Muons.clone()
hTozzTo4leptonsMuonIsolationProducer.src = cms.InputTag("hTozzTo4leptonsMuonSelector")
hTozzTo4leptonsMuonIsolationProducer.cut = cms.string('(2.0*trackIso+1.5*ecalIso+1.*hcalIso) < 60')

# Common preselection
from HiggsAnalysis.HiggsToZZ4Leptons.hTozzTo4leptonsCommonPreselectionSequences_cff import *
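To check by hand which leptons a given cut string keeps, it can help to spell the strings out as ordinary logic. The plain-Python predicates below are an illustration only: the dictionary keys are hypothetical stand-ins for the PAT object accessors, and in CMSSW the strings are parsed and applied by the selector modules themselves.

```python
def electron_preselected(e):
    # 'pt > 5. & abs(eta) < 2.5'
    return e["pt"] > 5.0 and abs(e["eta"]) < 2.5

def muon_preselected(m):
    # '(pt > 5. & abs(eta) < 1.1) | (pt > 3. & p > 9. & abs(eta) >= 1.1)'
    central = m["pt"] > 5.0 and abs(m["eta"]) < 1.1
    forward = m["pt"] > 3.0 and m["p"] > 9.0 and abs(m["eta"]) >= 1.1
    return central or forward

def electron_loose_iso(e):
    # 'trackIso/pt < 0.7'
    return e["trackIso"] / e["pt"] < 0.7

def muon_loose_iso(m):
    # '(2.0*trackIso+1.5*ecalIso+1.*hcalIso) < 60'
    return 2.0 * m["trackIso"] + 1.5 * m["ecalIso"] + 1.0 * m["hcalIso"] < 60.0

# A 4 GeV/c muon at |eta| = 1.5 (p = pT cosh(eta), about 9.4 GeV/c) passes
print(muon_preselected({"pt": 4.0, "eta": 1.5, "p": 9.4}))
```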

How to run the HZZ analysis code

  • Configuration files to run the preselection in HiggsAnalysis/HiggsToZZ4Leptons/test:
   - HiggsToZZPreselection_2e2mu.py for 2e2mu preselection
   - HiggsToZZPreselection_4e.py for 4e preselection
   - HiggsToZZPreselection_4mu.py for 4mu preselection
   - HiggsToZZPreselection_4l.py for 2e2mu,4e and 4mu preselection

  • Configuration files to run the full analysis in HiggsAnalysis/HiggsToZZ4Leptons/test:
   - HiggsToZZCompleteAnalysis_2e2mu.py for 2e2mu full analysis
   - HiggsToZZCompleteAnalysis_4e.py for 4e full analysis
   - HiggsToZZCompleteAnalysis_4mu.py for 4mu full analysis
   - HiggsToZZCompleteAnalysis_4l.py for 2e2mu,4e and 4mu full analysis

  • Edit a configuration file and set a flag for PAT usage to 'true'
   usePAT='true'

  • Be sure that the input list of files is built with PAT-tuples such as
   /castor/cern.ch/user/n/ndefilip/PAT/PATLayer1_Output.fromAOD_full_1_h150_2e2mu.root
   /castor/cern.ch/user/n/ndefilip/PAT/PATLayer1_Output.fromAOD_full_1_h150_4e.root
   /castor/cern.ch/user/n/ndefilip/PAT/PATLayer1_Output.fromAOD_full_1_h150_4mu.root

  • Run the analysis:
   - cmsRun HiggsToZZCompleteAnalysis_2e2mu.py  #  for 2e2mu full analysis
   - cmsRun HiggsToZZCompleteAnalysis_4e.py       #  for 4e full analysis
   - cmsRun HiggsToZZCompleteAnalysis_4mu.py     #  for 4mu full analysis

  • Output files for 2e2mu analysis:
   preselect2e2mu.out             --> preselection efficiency for 2e2mu
   offselect2e2mu.out              --> offline selection efficiency for 2e2mu
   hTozzToEEMuMuCSA07.root  --> EDM file with filtered events
   roottree_2e2mu.root            --> ROOT tree with relevant variables

How to extract results: numbers and plots

Preselection:

cat preselect2e2mu.out
*********************** 
Preselection efficiency 
*********************** 

nSkim           : 155
nElec           : 107
nMuon           : 95
Z->EE           : 93
Z->MuMu         : 87
H->ZZ           : 87
loose IsolEle   : 87
loose IsolMu    : 82
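The counts in preselect2e2mu.out appear to be the numbers of events surviving each successive step (they decrease monotonically), so step-by-step and cumulative efficiencies follow by division. A minimal Python sketch that parses the "name : count" lines of such a file; the function names are hypothetical.

```python
def parse_preselection(text):
    """Parse 'name : count' lines of a preselect*.out file, keeping their order."""
    steps = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, _, value = line.rpartition(":")
        value = value.strip()
        if value.isdigit():
            steps.append((name.strip(), int(value)))
    return steps

def efficiency_table(steps):
    """Rows of (name, count, efficiency w.r.t. previous step, cumulative efficiency)."""
    first = steps[0][1]
    rows, prev = [], first
    for name, n in steps:
        rows.append((name, n, n / prev if prev else 0.0, n / first))
        prev = n
    return rows

# Usage, from the directory where the job ran:
# with open("preselect2e2mu.out") as f:
#     for row in efficiency_table(parse_preselection(f.read())):
#         print("%-16s %5d  %.3f  %.3f" % row)
```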

If you run on all the signal samples and use the ROOT macro:

/afs/cern.ch/user/n/ndefilip/public/HZZ2e2muEfficiency.C
root -q HZZ2e2muEfficiency.C

you can produce a plot like this:

Effpresel.bmp

Plots after the complete analysis could be done with the macro:

/afs/cern.ch/user/n/ndefilip/public/simpleplots.C
root -q simpleplots.C

and you could obtain distributions like: masees.gif

Review status

Reviewer/Editor and Date (copy from screen) Comments
RogerWolf - 13 May 2009 Created the template page
SamirGuragain - 31 May 2012 Copied the recent instructions from https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCMSDataAnalysisSchoolHighMassHiggsSearchExercise and pasted here. The existing instructions in deprecated CMSSW release are in show/hide under the Useful links.

Responsible: KatiLassilaPerini
Last reviewed by: Samir Guragain 05-31-2012

Topic attachments
  • Effpresel.bmp (bitmap, r1, 2229.2 K, 2009-06-19 11:42, NicolaDeFilippis)
  • HZZ4LeptonsAnalysisSchema.bmp (bitmap, r1, 1766.3 K, 2009-06-19 01:33, NicolaDeFilippis)
  • PAToutput.bmp (bitmap, r3, 1206.9 K, 2009-06-18 23:54, NicolaDeFilippis)
  • masees.gif (GIF, r1, 10.2 K, 2009-06-19 13:03, NicolaDeFilippis)
Topic revision: r11 - 2012-05-31 - unknown
 