Notes on Monte Carlo Validation for E/gamma HLT


The validation procedure for the electron and photon triggers is based on histograms produced by the module HLTriggerOffline/Egamma/src/EmDQM.cc. This module is very versatile and is usually configured to take the products used by the various EDFilters in an HLT path as input and fill histograms of the corresponding objects' transverse energy, pseudorapidity and azimuthal angle.

Current workflow for central DQM (starting from CMSSW_7_0_0_pre10)

Since CMSSW version 4.3.0.pre6 the validation histograms are produced centrally from release validation (relval) samples and stored on the relval DQM server in the HLT/HLTEgammaValidation workspace. In its default automatic mode, the module EmDQM searches the HLT menu with which the sample was created for egamma paths and then produces the histograms for these paths. The configuration file for the procedure can be found here.

The following workflow is obsolete as of CMSSW_7_0_0_pre10: the module EmDQMFeeder searched the HLT menu with which the sample was created for egamma paths and generated the configuration for the EmDQM module, which then produced the histograms for these paths. The configuration file for the procedure can be found here.

There are four groups of E/gamma HLT paths considered for MC validation: single electron, double electron, single photon and double photon paths.

To avoid very low values when calculating the efficiencies for paths with a high Et threshold, only events with particles that pass the Et cut of the path are considered (e.g. for an HLT_Photon50 path the event has to have at least one photon with Et above 50 GeV).
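The event preselection described above can be sketched as follows. This is a minimal illustration, not the code used in EmDQM: the function name passes_et_cut, the n_required parameter (assumed here to be 2 for double-object paths) and the example Et values are all hypothetical.

```python
def passes_et_cut(gen_ets, et_threshold, n_required=1):
    """Return True if at least n_required generated particles have Et above the threshold.

    gen_ets      -- Et values (GeV) of the generated electrons/photons in one event
    et_threshold -- Et cut of the HLT path (GeV), e.g. 50 for an HLT_Photon50 path
    n_required   -- assumption: 1 for single-object paths, 2 for double-object paths
    """
    n_above = sum(1 for et in gen_ets if et > et_threshold)
    return n_above >= n_required


# Hypothetical events: lists of generated photon Et values in GeV.
events = [[62.0, 12.5], [48.3], [51.0, 55.2]]

# Only events with at least one photon above 50 GeV enter the efficiency denominator.
selected = [i for i, ets in enumerate(events) if passes_et_cut(ets, 50.0)]
print(selected)
```

With this preselection the second event does not count against the path efficiency, since it could never have fired a 50 GeV threshold.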

Running the validation steps yourself

Running all validation (not only E/gamma and not only HLT)

This can be done by running step 2 and step 3 described here (full and fast simulation).

In the fastsim examples given there one can replace the first (file) argument given to the cmsDriver.py command for step 2+3 (e.g. SingleMuPt10_cfi.py) by other files in Configuration/Generator/python/, e.g.:

cmsDriver.py WE_cfi.py -s GEN,FASTSIM,HLT,VALIDATION --pileup=NoPileUp --conditions auto:mc --eventcontent=FEVTDEBUGHLT --datatier GEN-SIM-DIGI-RECO -n 1000

(Note that the name of the file to be used in step 3 then changes as well.)

Step 2 needs relval data samples (GEN-SIM-DIGI-RAW-HLTDEBUG data tier) and generates a CMSSW ROOT file. It runs instances of the EmDQM module to produce histograms. Step 3 runs an instance of EmDQMPostProcessor and generates a ROOT file with (DQM) histograms.

Running just E/gamma HLT validation

Building HLTriggerOffline/Egamma

CMSSW_7_0_0_pre11 and higher

export SCRAM_ARCH=slc6_amd64_gcc481
cmsrel CMSSW_7_0_0_pre11
cd CMSSW_7_0_0_pre11/src
cmsenv
git cms-addpkg HLTriggerOffline/Egamma
scram b
cd HLTriggerOffline/Egamma/test

CMSSW_5_X

(tested with CMSSW_5_0_0_pre4)

export SCRAM_ARCH=slc5_amd64_gcc434
cmsrel CMSSW_5_0_0_pre4
cd CMSSW_5_0_0_pre4/src
cmsenv
addpkg HLTriggerOffline/Egamma
scram b
cd HLTriggerOffline/Egamma/test

Note that from CMSSW version 5_1_Y onwards slc5_amd64_gcc461 should be used.

CMSSW_4_2

(tested with CMSSW_4_2_0_pre7)

export SCRAM_ARCH=slc5_amd64_gcc434
cmsrel CMSSW_4_2_0_pre7
cd CMSSW_4_2_0_pre7/src
cmsenv
addpkg HLTriggerOffline/Egamma
sed -i 's/^use_new_method = False/use_new_method = True/' HLTriggerOffline/Egamma/python/EgammaValidation_cff.py
scram b
cd HLTriggerOffline/Egamma/test

CMSSW_4_2 code in CMSSW_3_11

cmsrel CMSSW_3_11_1_hltpatch1
cd CMSSW_3_11_1_hltpatch1/src
cmsenv
addpkg HLTriggerOffline/Egamma
cvs update -r V00-03-07 HLTriggerOffline/Egamma/python HLTriggerOffline/Egamma/test
scram b
cd HLTriggerOffline/Egamma/test

CMSSW_3_11

cmsrel CMSSW_3_11_0
cd CMSSW_3_11_0/src
cmsenv
addpkg HLTriggerOffline/Egamma V00-03-00
scram b
cd HLTriggerOffline/Egamma/test

Running the DQM step

CMSSW_7_0_0_pre11 and higher

Run the DQM step and harvesting for W→eν (single electron):

./egammaHltValidate.py --follow --this-project-area --cfg=testEmDQM_cfg.py wen

This will look for a WEN relval sample for the current CMSSW version in DBS, run the egamma HLT validation on the egamma paths used in that sample, and create a ROOT file with the relval histograms.

Other supported samples are Zee (double electron), PhotonJetPt10 (single photon) and H130gg (double photon).

A specific data file to run on instead of the standard relval MC datasets can be specified with the --file option (can be given more than once):

./egammaHltValidate.py --follow --this-project-area --cfg=testEmDQM_cfg.py --file=/store/relval/path/to/your/file.root

When running on real data the --data option can be used. (This option is not really mature yet.)

CMSSW_5_X

Run the DQM step and harvesting for W→eν (single electron):

./egammaHltValidate.py --follow --this-project-area --cfg=testEmDQMFeeder_cfg.py wen

This will look for a WEN relval sample for the current CMSSW version in DBS, run the egamma HLT validation on the egamma paths used in that sample, and create a ROOT file with the relval histograms.

Other supported samples are Zee (double electron), PhotonJetPt10 (single photon) and H130gg (double photon).

A specific data file to run on instead of the standard relval MC datasets can be specified with the --file option (can be given more than once):

./egammaHltValidate.py --follow --this-project-area --cfg=testEmDQMFeeder_cfg.py --file=/store/relval/path/to/your/file.root

When running on real data the --data option can be used. (This option is not really mature yet.)

CMSSW_3_11

Run the DQM step and harvesting for W→eν (single electron):

./egammaHltValidate.py --follow --this-project-area wen

This will go to DBS, find a RelValWE sample for CMSSW_3_11_0 (if there is more than one, you will be prompted to select one) and run the DQM and harvesting steps on it.

You should then see a file WEN_3_11_0.root in the working directory which contains the histograms which normally appear on the DQM relval server.

You can look at the produced histograms with ROOT or get a quick text summary of the efficiencies per filter module by running

./egammaHltPrintRelvalEfficiencies.py --ignore-empty WEN_X_Y_Z.root

If you want an even shorter summary that prints just the efficiency of the paths, run:

./egammaHltPrintRelvalEfficiencies.py --ignore-empty --summary WEN_X_Y_Z.root

Tools (other than the DQM gui)

Most of these tools print a small help text with available options when run with a single command line argument -h.

  • HLTriggerOffline/Egamma/test/egammaHltPrintRelvalEfficiencies.py: A tool to print a summary of the efficiencies (and numbers of events) in the histograms found in a DQM histogram file. This is useful to get a quick overview of whether, and at which step, the histogramming procedure failed (i.e. the efficiency suddenly drops to zero from some module on).
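The kind of check this summary enables can be sketched in a few lines. The following is not the implementation of egammaHltPrintRelvalEfficiencies.py; the function names, the module names and the event counts are hypothetical, and the input is assumed to be a list of per-module event counts in path order.

```python
def first_zero_step(steps):
    """Return the name of the first module with zero events, or None if there is none.

    steps -- list of (module_name, n_events) tuples in the order of the HLT path
    """
    for name, n_events in steps:
        if n_events == 0:
            return name
    return None


def print_summary(steps):
    """Print events and efficiency relative to the first step for each module."""
    total = steps[0][1] if steps else 0
    for name, n_events in steps:
        eff = n_events / total if total else 0.0
        print(f"{name:20s} {n_events:8d} {eff:8.1%}")


# Hypothetical per-module event counts for one path.
steps = [("l1_seed", 950), ("l1_match", 940), ("et_filter", 0), ("pixel_match", 0)]
print_summary(steps)
print("first empty step:", first_zero_step(steps))
```

A sudden drop to zero that persists for all later modules, as in this example, usually points at a problem with the inputs of that module rather than at a genuine inefficiency.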

Generating old-style python configuration files

In order to generate a set of (old-style) configuration files from the MC menu, one can use the tool HLTriggerOffline/Egamma/test/makePerPathConfigFiles.py. This creates a set of configuration files which can be included from HLTriggerOffline/Egamma/python/EgammaValidation_cff.py. Example usage:

cd $CMSSW_BASE/src/HLTriggerOffline/Egamma/test 
cvs update -A makePerPathConfigFiles.py
./makePerPathConfigFiles.py testdir

where testdir is a directory which will be created with one configuration file per path (which is not skipped). These files can then be moved to the python subdirectory but must be added by hand in HLTriggerOffline/Egamma/python/EgammaValidation_cff.py: paths.Wenu, paths.Zee, paths.GammaJet and paths.DiGamma must be adapted and pathlumi must have one entry per path which can e.g. be achieved by doing:

for path in paths.Wenu + paths.Zee + paths.GammaJet + paths.DiGamma:
    pathlumi[path] = '8e29'

Frequently Asked Questions (FAQ)

Which histogram (produced by the final harvesting step) contains what ?

A directory (inside the produced ROOT file) with histograms for E/gamma HLT validation typically looks like:

DQMData/Run 1/HLT/Run summary/HLTEgammaValidation/<path_name>/

where the run number is 1 for MC and the real run number for data. <path_name> is derived from the path being monitored, starting with HLT_ and ending with _DQM.
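As a small illustration of this layout, the directory name can be assembled as below. The helper dqm_directory and the example path name HLT_Photon50_DQM are hypothetical and only follow the pattern stated above.

```python
def dqm_directory(path_name, run=1):
    """Build the directory name inside the ROOT file for one monitored path.

    path_name -- directory name of the path; expected to start with HLT_ and end with _DQM
    run       -- run number: 1 for MC, the real run number for data
    """
    if not (path_name.startswith("HLT_") and path_name.endswith("_DQM")):
        raise ValueError(f"unexpected path name: {path_name!r}")
    return f"DQMData/Run {run}/HLT/Run summary/HLTEgammaValidation/{path_name}/"


# Illustrative path name, following the HLT_..._DQM convention.
print(dqm_directory("HLT_Photon50_DQM"))
```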

In each path's directory, there are several histograms:

| histogram name | description |
| global histograms | |
| final_eff_vs_<var> | |
| gen_<var> | |
| reco_<var> | |
| total_eff, total_eff_MC_matched | |
| efficiency_by_step, efficiency_by_step_MC_matched | |
| per-module histograms | |
| <moduleName><var>{_isolation,}{_all,_MC_matched,} | |
| efficiency_<moduleName>_vs_<var>{_isolation,}{_all,_MC_matched,} | Efficiency histograms produced by the harvesting step (EmDQMPostProcessor.cc). The denominator is taken from the bins of the histogram gen_<var> (e.g. for MC) or reco_<var> (e.g. for data). |

In the above table, <var> can be et, eta or phi. {a,b} means 'a or b', e.g. {_all,_MC_matched,} means _all or _MC_matched or the empty string (note the last comma).
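The efficiency histograms in the table are ratios taken bin by bin against gen_<var> or reco_<var>. A minimal sketch of such a division on plain lists of bin contents is given below. It does not reproduce EmDQMPostProcessor.cc: the function name and the example bin contents are hypothetical, and setting the efficiency to zero for empty denominator bins is an assumption made here.

```python
def binwise_efficiency(numerator, denominator):
    """Divide two histograms bin by bin.

    numerator   -- bin contents of a per-module histogram (objects passing the module)
    denominator -- bin contents of gen_<var> (MC) or reco_<var> (data)
    Bins with an empty denominator get efficiency 0.0 (assumption of this sketch).
    """
    if len(numerator) != len(denominator):
        raise ValueError("histograms must have the same number of bins")
    return [n / d if d > 0 else 0.0 for n, d in zip(numerator, denominator)]


# Hypothetical bin contents of gen_et and of one module's et histogram.
gen_et = [0, 40, 100, 80]
passed_et = [0, 10, 90, 76]
print(binwise_efficiency(passed_et, gen_et))
```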

Note that _isolation histograms do not exist for all modules, only for those where it makes sense. Also, there are (currently) no efficiency_ histograms for _isolation histograms and for histograms with the suffix _all.

A python script (for illustration purposes) which generates all names for a given example path can be downloaded here.
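Independently of that script, the {a,b} notation can be expanded with a few lines of Python. The sketch below is not the downloadable script. The helper names and the module name someFilter are hypothetical, and the expansion lists every combination allowed by the pattern, without removing the combinations that are noted above as not existing.

```python
import itertools
import re


def expand_braces(pattern):
    """Expand every {a,b,...} group in pattern into all combinations.

    An empty alternative (as in "{_all,_MC_matched,}") yields the empty string.
    """
    # Split into literal text and brace groups; the capturing group keeps the braces.
    parts = re.split(r"(\{[^{}]*\})", pattern)
    choices = []
    for part in parts:
        if part.startswith("{") and part.endswith("}"):
            choices.append(part[1:-1].split(","))
        else:
            choices.append([part])
    return ["".join(combo) for combo in itertools.product(*choices)]


def per_module_names(module_name, variables=("et", "eta", "phi")):
    """List the per-module histogram names following the pattern in the table."""
    names = []
    for var in variables:
        names += expand_braces(module_name + var + "{_isolation,}{_all,_MC_matched,}")
    return names


# "someFilter" is a placeholder module name.
for name in per_module_names("someFilter", variables=("et",)):
    print(name)
```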

MC matched histograms

  • The MC-matched histograms contain only HLT candidate objects that match a generator particle (electron or photon) within some ΔR.
  • In the non-MC-matched plots all trigger candidates are filled.

In the standard relval production the differences (between 'all' and 'MC matched') are usually small: there are few candidates that do not correspond to real electrons, e.g. from the W/Z. Where they differ, you will typically see spuriously low efficiencies for the early selection steps (L1 matching, cluster Et), as these are the objects for which there is noticeable contamination even in signal MC.

Histograms

Despite the automatic menu building, some histograms in the DQM harvesting output are empty/have zero efficiency. What went wrong ?

  • Check that you were using the correct HLT process name. Usually this is HLT, but e.g. in the case of reprocessing the name can be different (e.g. REHLT). You may want to check with edmDumpEventContent.

  • Especially for pre-releases, check whether the HLT path itself actually accepted some events. You may find cms.print-hlt-summary.py useful for this task.

How can I get an efficiency greater than 100% for the regional match filter ?

In other words: how can more events pass the regional match filter than passed the level 1 seed (which is typically just before the regional match filter) ?

Most likely this is a problem related to the MC matching, which uses a maximum distance in ΔR. The η and φ resolution of the level 1 trigger is quite coarse, and it is possible for an event to have ΔR(Level 1) > ΔR(HLT) such that the object fails the L1 seed MC match while it passes the ΔR cut at the regional match filter.

Keep in mind that it is not trivial to find a perfect cut on ΔR for the MC matching of the Level 1 objects. If the cut is too strict (maximum value too small), one can get efficiencies greater than 100%; if it is too loose (maximum value too large), one starts collecting random background clusters, which causes an apparent inefficiency of the Level 1 match filter.
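The effect can be illustrated numerically with ΔR = sqrt(Δη² + Δφ²), where Δφ is wrapped into [-π, π]. The sketch below is not the matching code of EmDQM: the maximum ΔR of 0.2 and the (η, φ) coordinates are hypothetical values chosen to show one generated electron that fails the match at Level 1 but passes it at HLT.

```python
import math


def delta_phi(phi1, phi2):
    """Difference in azimuthal angle, wrapped into the range [-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


MAX_DR = 0.2  # hypothetical matching cut, not the value used by EmDQM

gen = (1.00, 0.50)  # generated electron (eta, phi)
l1 = (1.20, 0.70)   # coarse Level 1 object position (hypothetical)
hlt = (1.01, 0.51)  # HLT candidate position (hypothetical)

l1_matched = delta_r(*gen, *l1) < MAX_DR
hlt_matched = delta_r(*gen, *hlt) < MAX_DR
print(f"dR(L1)  = {delta_r(*gen, *l1):.3f}  matched: {l1_matched}")
print(f"dR(HLT) = {delta_r(*gen, *hlt):.3f}  matched: {hlt_matched}")
```

An object like this one is missing from the MC-matched histogram of the L1 seed but present in that of the regional match filter, which is how the ratio of the two can exceed 100%.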

Links

-- AndreHolzner - 15-Dec-2010

Topic revision: r21 - 2014-01-07 - ThomasReis
 