"Cuts in Categories" (CiC) Electron Identification

Introduction

This electron identification method was developed in the e/gamma POG and has been available in CMSSW since version 1.6. It is simply a set of cuts optimized to select electrons from Z or W decay and to reject fakes from jets or conversions. The technique has also been employed to select electrons from other sources, for example from J/psi.

It is possible to get a very clean sample of identified electrons in CMS, but the cost in efficiency is high. For most analyses, we want the highest efficiency possible while reducing the effect of fake electrons to the point that they do not increase the errors of the analysis.

Efficient electron ID in CMS is quite different from ID in many other experiments due to the large and varying amount of material in the tracker and the high magnetic field. We have found that many of the features of the electron ID problem in CMS can be dealt with by dividing the problem into categories. The first and most basic division is between barrel and endcap, where both the ECAL and the tracker change considerably and different cuts must be applied. The second, the fbrem vs. E/p categorization, deals with the large amount of radiation in the tracker material and the significant probability that the track will not be well measured. Finally, we find that there is a great deal of difference between low ET and high ET electrons due to the very high magnetic field, so we implement cuts in a few different ET regions (from version V06, or CMSSW_3_8_5, the ET bins are replaced by cuts varying continuously with the transverse energy).

The Cut Variables

The selection is performed with cuts (one cut-value for each category) on the following variables:

  • Track match with ECAL: delta_phi_in, delta_eta_in, e_seed/p_in
  • HCAL energy directly behind ECAL cluster: H/E
  • Cluster Shape: sigma_ieta_ieta
  • Track vertex: Impact Parameter w.r.t. reco vertex
  • Conversion Rejection: number of missing hits near the beginning of the track (also rejects really bad tracks), dcot and dist variables
  • Isolation: Tracker Isolation (0.3), ECAL Isolation (jurassic 0.4), HCAL Isolation (0.4)

The Outputs

The algorithm gives as output a bit pattern for each electron candidate for each of the 9 defined severity levels of cuts. There is one bit for electron ID, a second for electron isolation, a third for conversion rejection and a fourth for impact parameter. The cut severity levels are called

  • VeryLoose
  • Loose
  • Medium
  • Tight
  • SuperTight
  • HyperTight1
  • HyperTight2
  • HyperTight3
  • HyperTight4

The names give an indication of how severe the cuts are for selecting di-leptons from Z. The HyperTight cuts might be appropriate for selecting single leptons without much help from missing ET. In any case, each step of cut severity decreases the fake rate by about a factor of two for ET > 20 GeV.

Performance

Electron Efficiency

The selection efficiency for the Cuts in Categories Electron ID was evaluated using a sample including both Z->ee and W->enu events. For electrons with ET of 30 GeV the efficiency in the barrel is greater than 97% for the "VeryLoose" selection and greater than 75% for "HyperTight1". For these two selections the efficiency reaches a plateau at 99% and 91%, respectively, for electrons with ET > 40 GeV. Our optimization procedure maximizes the overall efficiency for the signal sample. In the endcap, where the background is higher, the procedure sets the cuts tighter than in the barrel and the resulting signal efficiency is lower.

Fake Rate

The rate of fake electrons is shown below. In the barrel, the estimated fake rate goes from 1% at 12 GeV to 10% for high energy electrons in the "VeryLoose" selection. It is greatly reduced for the "HyperTight1" selection: from 0.05% at 12 GeV to 0.5% for higher energy electrons. Each intermediate level of selection provides about a factor of 2 rejection on the background. The fake rate is lowest at low ET, where the signal to background ratio is worst and the cuts must be the tightest. Similarly in the endcap, S/B is worse than in the barrel, so the cuts must be tighter and both the efficiency and the fake rate drop.

It is important to note that the QCD di-jet sample used in these studies has been pre-selected at generator level to enhance its electromagnetic component. The pre-selection included a requirement for at least one electromagnetic cluster with energy above 20 GeV in the event, hence underestimating the background fraction for ET < 20 GeV. In the latest versions of the selections we have improved the performance at low pT, allowing electrons to be selected efficiently down to 10 GeV; at lower energies the selection is still effective. A binned QCD sample has been used to optimize the cuts correctly in the lowest energy range.

What's it good for?

This electron ID should be generally useful for electrons from W, Z, and top, in an ET range between 5 and perhaps 500 GeV. We have extended it to very low ET for J/psi and Higgs to 4 leptons; however, the cut values currently in CMSSW used only W and Z as signal in the optimization. The cut values can be changed with a configuration file so, in principle, more physics channels can be accommodated. No attempt was made to be efficient for electrons from B decay, but we are told that the ID cuts without the isolation cuts have some reasonable efficiency.

Categories

We divide the electron candidates between barrel and endcap at the ECAL boundary.

The fbrem vs. E/p categorization

The fbrem vs. E/p categorization separates electrons with quite different measurement characteristics and with very different signal to background ratios, S/B. Some electron tracks are measured to lose significant energy in the tracker material and thus are very unlikely to be fakes from particles that do not radiate. These electrons are particularly well identified if the track momentum and the ECAL energy match, indicating that both are well measured. We can maintain high efficiency on these electrons by not applying cuts that are too tight.

On the other hand, some electrons do not radiate much energy in the inner parts of the tracker and are thus not well separated from normal charged hadrons, which are plentiful. Since the thickness of the tracker material varies, these non-radiating electrons are an important fraction of our best measured electrons and must be accepted. To reduce the background due to charged particles and overlaps, some tight cuts need to be placed on these electron candidates, and since they are well measured, tight cuts are possible. One particularly important cut removes low E/pin candidates that likely come from charged hadrons. This is a useful feature of the CMS ECAL, where hadrons are about half as effective in producing light as electrons of the same energy.

A fairly large fraction of electrons also have a track that is mis-measured, primarily due to large energy loss in the tracker at low radius. This category of electron might be faked by an overlap between a lower momentum charged particle and a high ET pi-zero. To help reduce fakes yet keep reasonable efficiency, we need to place some tighter cuts in this category, but since the track may not be well measured, the delta_phi_in cut, for example, shouldn't be tightened.

An example of the classification plot for signal and background in the barrel is shown below:

class_signal_barrel.jpg class_bg_barrel.jpg

At this time, we distinguish three categories of electron candidates:

  • Low-Brem (green): 0.9 < E/pin < 1.2 and fbrem < 0.12 (barrel), 0.82 < E/pin < 1.22 and fbrem < 0.2 (endcap); fake-like region with a high population of both real and fake electrons,
  • Bremming (blue): 0.9 < E/pin < 1.2 and fbrem > 0.12 (barrel), 0.82 < E/pin < 1.22 and fbrem > 0.2 (endcap); electron-like region with little contamination from fakes,
  • Bad-Track (red): the remaining regions; not many real electrons, but too many to simply cut out.

The code is set up to further categorize pure tracker-driven electrons and electrons in cracks.
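
The three-way split listed above can be written as a small function. This is only a sketch in Python, not the CMSSW implementation: the function and argument names are hypothetical, the handling of values lying exactly on a boundary is an assumption, and the crack and pure tracker-driven categories are left out.

```python
def classify(e_over_pin, fbrem, is_barrel):
    """Return 'bremming', 'low-brem' or 'bad-track' for one electron candidate.

    The thresholds are the ones quoted in the text. How a candidate sitting
    exactly on a threshold is treated is an assumption of this sketch.
    """
    if is_barrel:
        ep_lo, ep_hi, fbrem_cut = 0.9, 1.2, 0.12
    else:
        ep_lo, ep_hi, fbrem_cut = 0.82, 1.22, 0.2

    in_ep_window = ep_lo < e_over_pin < ep_hi
    if in_ep_window and fbrem >= fbrem_cut:
        return "bremming"    # radiating electron with matching E and p
    if in_ep_window:
        return "low-brem"    # little radiation, well measured
    return "bad-track"       # everything outside the E/pin window


# Example: the same candidate can fall in different categories
# depending on the detector region.
print(classify(0.85, 0.30, is_barrel=False))
print(classify(0.85, 0.30, is_barrel=True))
```

Because the E/pin window is wider in the endcap, a candidate with E/pin = 0.85 is inside the window there but outside it in the barrel.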

ET Dependence

Low ET electrons can dominate the fake rate if measures are not taken to deal with the ET dependence. At the same time, we want good efficiency, and simply cutting out low ET electrons would be a bad choice for many analyses. Therefore, the cuts vary continuously as a function of ET. Clearly life would be simpler if we just required ET > 30 GeV, but that will not ultimately be what CMS needs. From version V06 we have lowered the electron threshold to 10 GeV, improving the performance of the selection in the low ET region.

We already have cuts for the J/psi region exploiting both the ECAL driven and the tracker driven electrons. These low ET cuts work very well. We essentially eliminate background when looking for J/psi by requiring the Loose selection for each electron. Of course the efficiency of the cuts is somewhat lower than for 30 GeV electrons.

Cut Optimization

What are we trying to optimize?

The cuts are optimized to give the best signal to background ratio for single electrons. While the cuts are optimized in categories, the method ensures that individual electrons all have an S/B larger than some given value; thus the cuts are matched between barrel and endcap, and vary linearly as a function of 1/ET. The cuts for ID and isolation are also matched to accept the same S/B for a single electron.

Optimization Method

A fairly simple procedure to set the cuts is used:

  • produce the “n-1” distributions for each cut variable, that is, distributions where all the cuts have been applied save the one for the plotted variable.
  • use a very safe procedure to fit a smooth curve to the background to signal ratio as a function of this one cut variable in question.
  • set the cut for this variable at a pre-specified value of b/s.
  • set cuts using the same b/s specification for all the cut variables and iterate a few times.

In this way, each cut is removing events with the same purity, that is the same signal to background ratio, thus the overall purity of accepted electrons is maximized for a given efficiency. This clearly means that cuts will be “tighter” in the bad-track category, than in the bremming-electron category, and tighter in the endcap than in the barrel. It has been found that this procedure gives a stable result even for very low statistics signal and background samples.
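
As an illustration of the third step, the sketch below places a cut where the per-bin b/s ratio of one n-1 distribution reaches a pre-specified value. It is not the optimization code used for the released cuts: the "very safe" fit is replaced here by a plain moving average, the variable is assumed to be cut from above, and all names are hypothetical. The iteration over all variables is not shown.

```python
def cut_at_fixed_b_over_s(edges, signal, background, target, window=1):
    """Return the position where the smoothed b/s ratio first reaches `target`.

    edges      : ascending bin edges, len(signal) + 1 values
    signal     : signal counts per bin of the n-1 distribution
    background : background counts per bin of the n-1 distribution
    target     : pre-specified b/s value at which the cut is placed
    window     : half-width in bins of the moving average (assumed smoothing)

    Candidates below the returned value would be kept. If the ratio never
    reaches the target, the last edge is returned (no cut).
    """
    n = len(signal)
    centers = [0.5 * (edges[i] + edges[i + 1]) for i in range(n)]

    # Smoothed background-to-signal ratio in each bin.
    ratio = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        s = sum(signal[lo:hi])
        b = sum(background[lo:hi])
        ratio.append(b / s if s > 0 else float("inf"))

    if ratio[0] >= target:
        return edges[0]
    for i in range(1, n):
        if ratio[i] >= target:
            # Linear interpolation between neighbouring bin centers.
            frac = (target - ratio[i - 1]) / (ratio[i] - ratio[i - 1])
            return centers[i - 1] + frac * (centers[i] - centers[i - 1])
    return edges[-1]


# Toy n-1 distribution: flat signal, background rising with the variable.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [100, 100, 100, 100]
background = [10, 20, 40, 80]
print(cut_at_fixed_b_over_s(edges, signal, background, target=0.3, window=0))
```

Using the same `target` for every variable and every category is what makes each cut remove candidates of the same purity.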

The result of this procedure to set the cuts must depend on the signal and background samples chosen. For our baseline electron ID cuts, we have used as signal di-electrons with masses above 40 GeV/c^2, a sample dominated by the Z resonance. For background, we have used di-jet events. The cut selection is done at the single electron level. With this choice of signal and background, the background to signal ratio decreases rapidly with ET for ET less than about half the Z mass.

At this time, we have determined cuts for 9 different b/s values. Each level of cuts reduces the background rate by about a factor of 2, thus the cuts range from very high efficiency that might be useful in multilepton events, to very large background rejection that might be useful to detect single electrons.
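
The factor-of-two rule gives a quick estimate of the background reduction between any two levels. The helper below is only a back-of-the-envelope sketch with hypothetical names; it assumes exactly a factor of 2 per step, whereas the text says "about".

```python
LEVELS = ["VeryLoose", "Loose", "Medium", "Tight", "SuperTight",
          "HyperTight1", "HyperTight2", "HyperTight3", "HyperTight4"]


def relative_background(level, reference="VeryLoose"):
    """Background rate of `level` relative to `reference`,
    assuming exactly a factor of 2 per step (an idealization)."""
    steps = LEVELS.index(level) - LEVELS.index(reference)
    return 0.5 ** steps


print(relative_background("HyperTight1"))  # five steps from VeryLoose
print(relative_background("HyperTight4"))  # eight steps from VeryLoose
```

Five steps correspond to a reduction of 32, the same order as the factor of about 20 between the VeryLoose and HyperTight1 barrel fake rates quoted in the Fake Rate section.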

Currently we are reviewing rho-based corrected isolation cuts in categories.

Checking the Optimization

One advantage of this method is that we have the ability to both check and understand the performance of the cuts by looking at the n-1 plots used in the optimization. There are a lot of plots but they can all be viewed rather simply as a single web page. One can see which variables are powerful and which are doing essentially nothing. Of course these plots are viewed to make sure there were no problems with fits for cut setting. An example of a web page showing all the n-1 cuts, for ET>50 GeV, is given here. Please only view this with a good network connection and a computer with sufficient memory.

Data and MC optimization

The current selection has been optimized for electrons down to 10 GeV and, due to the lack of MC background statistics, we have tried two strategies. The first option was to set the cuts using background electrons selected from data (electrons from unprescaled triggers, vetoing Z and W events). Here it has been possible to set even the tightest working points because we had plenty of background electrons (so we have working points up to HyperTight4). The second option was to use QCD MC, and in this case the limited MC statistics did not allow the cuts to be set beyond HyperTight1.

There is not much difference in the final performance, but if you need to go down in ET we suggest using the DataTuning.

Special Optimization for HZZ searches

A special selection has been released using only ID and conversion rejection cuts. Below you can find the performance (efficiency and fakerate) for different working points, available from VeryLoose to HyperTight1. See below for instructions on how to use it.


Why not just throw the problem into a multivariate blender?

The software used to optimize the cuts also produces a likelihood function for each level of the cuts. For looser cuts, this likelihood function performs somewhat better than the fixed cuts but we find that for the tighter cuts, the likelihoods don't improve over the cuts. There may be some value to using these likelihoods later, however, we want to understand as many features of the electron ID as possible because one can often do better than the blender method when things are understood. At least for the early data analysis, the cuts make sense. We believe that the above method augmented by a likelihood will perform as well as any multivariate analysis.

What about di-electrons?

As stated above, the cuts optimize S/B for single electrons. This can be simply applied to di-electrons, however, one will find that S/B is lower for events with both electrons in the endcap than for an event with both electrons in the barrel. While the barrel and the endcap are cut at the same s/b value for each electron, still the overall S/B for all electrons in the endcap is much lower than in the barrel. One could improve this situation simply by applying a more severe cut level for endcap-endcap di-electrons than for barrel-barrel or barrel-endcap.

How to use it in CMSSW

CMSSW >35X

Nine different sets of cuts are currently provided:

  • VeryLoose
  • Loose
  • Medium
  • Tight
  • SuperTight
  • HyperTight1
  • HyperTight2
  • HyperTight3
  • HyperTight4

In order to save space in the standard event content, only the Loose and Tight cuts are computed by default. If you want to try a different working point, you need to:

  1. check out the tag:
    V00-03-07-08 RecoEgamma/ElectronIdentification
    (in case you want to use the latest cuts in CMSSW_3_6_X and CMSSW_3_5_X)
    V00-03-13 RecoEgamma/ElectronIdentification
    (in case you want to use the latest cuts in CMSSW_3_7_X)
    V00-03-14-03 RecoEgamma/ElectronIdentification
    (in case you want to use the latest cuts in CMSSW_3_8_X)
    V00-03-32 RecoEgamma/ElectronIdentification
    (in case you want to use the latest cuts in CMSSW >= 3_9_X)

  2. in the python directory of the package you can find
    cutsInCategoriesElectronIdentificationV06_cfi.py (sequence names: eidXXXMC)
  3. there is also the corresponding optimization based on data
    cutsInCategoriesElectronIdentificationV06_DataTuning_cfi.py (sequence names: eidXXX)
    and
    cutsInCategoriesFixedIsolationElectronIdentificationV06_DataTuning_cfi.py
  4. if you want to use the optimization for HZZ searches, take
    cutsInCategoriesHZZElectronIdentificationV06_cfi.py (sequence names: eidHZZXXX)
The following example shows how to access the results in the corresponding ValueMap:

// Read electrons
edm::Handle<reco::GsfElectronCollection> electrons ;
e.getByLabel(electronProducer_,electrons) ;

//Read eID results
edm::Handle<edm::ValueMap<float> > eIDValueMap;
e.getByLabel("my_preferred_eID", eIDValueMap);
const edm::ValueMap<float> & eIDmap = * eIDValueMap;

// Loop over electrons
for (unsigned int i = 0; i < electrons->size(); i++){
   edm::Ref<reco::GsfElectronCollection> electronRef(electrons,i);
   std::cout << "Event " << e.id() << " eID = " << eIDmap[electronRef] << std::endl;
}

The returned float can be interpreted as a bit pattern. The values below add up when more than one set of cuts is passed:

  • 0 - no cut passed
  • 1 - eID cuts passed
  • 2 - iso cuts passed
  • 4 - conversion rejection passed
  • 8 - ip cut passed

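Below are three examples of how to check the eID results. The float is cast to an integer first, since the bitwise & operator is not defined for floating-point operands in C++:

eID+Iso+ConversionRejection+IP -> ((static_cast<int>(eIDmap[electronRef]) & 15) == 15)
Iso only -> ((static_cast<int>(eIDmap[electronRef]) & 2) == 2)
eID+ConversionRejection+IP -> ((static_cast<int>(eIDmap[electronRef]) & 13) == 13)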
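
The same checks can be done outside CMSSW, for example on values copied into an ntuple. The following is a minimal Python sketch; the constant and function names are hypothetical, and only the bit values (1, 2, 4, 8) come from the list above.

```python
# Bit values as listed above; the names are chosen for this sketch.
EID, ISO, CONV, IP = 1, 2, 4, 8


def passes(eid_value, required_mask):
    """True if every bit of `required_mask` is set in the stored value."""
    bits = int(eid_value)  # the ValueMap stores the pattern as a float
    return (bits & required_mask) == required_mask


def decode(eid_value):
    """Expand the stored value into one boolean per group of cuts."""
    bits = int(eid_value)
    flags = (("eID", EID), ("iso", ISO), ("conversion", CONV), ("ip", IP))
    return {name: bool(bits & flag) for name, flag in flags}


value = 13.0  # eID + conversion rejection + ip, but not isolation
print(passes(value, EID | CONV | IP))
print(passes(value, EID | ISO | CONV | IP))
print(decode(value))
```

Building the mask from named constants (`EID | CONV | IP`) rather than writing 13 makes it easier to see which cuts an analysis actually requires.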

Structure of the Configuration file

CMSSW >= 3_8_X

The configuration file is structured as follows. Each eID selection has its own block starting with:
eidLoose = eidCutBasedExt.clone()
eidLoose.electronIDType = 'classbased'
eidLoose.electronQuality = 'xxxxxxxx'
...

Within each selection block variable cuts are defined in sets of vdouble (let's take detain as an example):

cutdetain = cms.vdouble(1.37e-02, 9.33e-03, 2.57e-02, 2.92e-02, 5.14e-02, 2.89e-02, 4.00e-02, 3.08e-02, 3.20e-02),
cutdetainl = cms.vdouble(1.29e-02, 7.58e-03, 2.57e-02, 2.45e-02, 8.16e-02, 2.55e-02, 1.89e-02, 1.40e-01, 2.77e-02),

Each cut has two sets of thresholds: the upper one (cutdetain in our example, set at 40 GeV) and the lower one (cutdetainl, note the final l in the name, set at 10 GeV). The actual cut value is derived by interpolating between these two.

The nine numbers correspond to the cuts for electrons in the following categories:
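
A sketch of the interpolation is given below. It assumes the interpolation is linear in 1/ET, as stated in the Cut Optimization section, and that the cut is held constant below 10 GeV and above 40 GeV; the latter, like the function and argument names, is an assumption of this sketch and not taken from the CMSSW code.

```python
def interpolated_cut(et, cut_at_40, cut_at_10, et_low=10.0, et_high=40.0):
    """Cut value at transverse energy `et` (GeV) for one category.

    Linear in 1/ET between `et_low` and `et_high`; constant outside that
    range (assumption of this sketch).
    """
    if et <= et_low:
        return cut_at_10
    if et >= et_high:
        return cut_at_40
    frac = (1.0 / et_low - 1.0 / et) / (1.0 / et_low - 1.0 / et_high)
    return cut_at_10 + frac * (cut_at_40 - cut_at_10)


# First entries of cutdetain and cutdetainl from the example above
# (bremming electrons in the barrel).
cut_high, cut_low = 1.37e-02, 1.29e-02
for et in (10.0, 16.0, 40.0):
    print(et, interpolated_cut(et, cut_high, cut_low))
```

Because the interpolation is linear in 1/ET rather than in ET, the halfway point between the two thresholds is reached at ET = 16 GeV, not at 25 GeV.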

number  electron type                  region
1       bremming electron              barrel
2       lowbrem electron               barrel
3       badtrack electron              barrel
4       bremming electron              endcap
5       lowbrem electron               endcap
6       badtrack electron              endcap
7       crack electron                 barrel
8       crack electron                 endcap
9       pure tracker-driven electron   -

The number returned by CutBasedElectronID::classify(..) in RecoEgamma/ElectronIdentification/src/CutBasedElectronID.cc is the number in the above table minus one (??).

CMSSW <= 3_7_X

The configuration file is structured as follows. Each eID selection has its own block starting with:
eidLoose = eidCutBasedExt.clone()
eidLoose.electronIDType = 'classbased'
eidLoose.electronQuality = 'xxxxxxxx'
...

Within each selection block, variable cuts are defined in sets of vdouble (let's take detain as an example):

cutdetain = cms.vdouble(
9.89e-03, 4.84e-03, 1.46e-02, 1.46e-02, 9.02e-03, 1.72e-02, 1.37e-02, 4.77e-02, 2.75e-02,
9.67e-03, 3.77e-03, 9.24e-03, 1.30e-02, 6.66e-03, 1.23e-02, 1.25e-02, 2.28e-02, 1.12e-02,
1.06e-02, 3.80e-03, 8.97e-03, 1.39e-02, 6.67e-03, 1.22e-02, 1.22e-02, 1.93e-02, 2.39e-03
),

Each row corresponds to an ET bin (ET > 30, 20 < ET < 30, ET < 20 GeV) and the nine numbers in a row correspond to the cuts for electrons in the following categories:

  1. bremming electron barrel
  2. lowbrem electron barrel
  3. badtrack electron barrel
  4. bremming electron endcap
  5. lowbrem electron endcap
  6. badtrack electron endcap
  7. crack electron barrel
  8. crack electron endcap
  9. pure tracker-driven electron
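
With this layout, the cut for a given candidate is found by combining the ET bin and the category into one index of the flat vdouble. The sketch below shows the arithmetic in Python; the function names are hypothetical, the category index is taken to be zero-based (list number minus one), and the treatment of ET exactly equal to 20 or 30 GeV is an assumption.

```python
N_CATEGORIES = 9

# cutdetain from the example above: 3 ET rows x 9 categories, stored flat.
cutdetain = [
    9.89e-03, 4.84e-03, 1.46e-02, 1.46e-02, 9.02e-03, 1.72e-02, 1.37e-02, 4.77e-02, 2.75e-02,
    9.67e-03, 3.77e-03, 9.24e-03, 1.30e-02, 6.66e-03, 1.23e-02, 1.25e-02, 2.28e-02, 1.12e-02,
    1.06e-02, 3.80e-03, 8.97e-03, 1.39e-02, 6.67e-03, 1.22e-02, 1.22e-02, 1.93e-02, 2.39e-03,
]


def et_bin(et):
    """Row index: 0 for ET > 30, 1 for 20 < ET <= 30, 2 otherwise (GeV)."""
    if et > 30.0:
        return 0
    if et > 20.0:
        return 1
    return 2


def lookup_cut(cuts, category, et):
    """Pick the cut for a zero-based `category` and transverse energy `et`."""
    return cuts[category + N_CATEGORIES * et_bin(et)]


# Bremming endcap electron (list item 4, so category 3) at ET = 25 GeV.
print(lookup_cut(cutdetain, 3, 25.0))
```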

Links

-- MatteoSani - 08-Mar-2010

-- JimBranson - 18-Mar-2010

Topic revision: r55 - 2016-06-22 - BibhuprasadMahakud1
 