Electron-faking-tau Veto Optimization in 2012

Introduction

This twiki provides documentation for the electron-faking-tau veto (eveto) reoptimization in autumn 2012.

The veto is a BDT made with TMVA and scripted via taumva. The veto is trained and tested on taus (signal) and electrons (background) from DY/Zleplep MC.

Contact Alex Tuna (tuna at cern dot ch) with questions or concerns about the measurement.

Plots

Plots of input variables can be found here.

For now, plots of BDT investigations can be found in the Version notes section.

Input variables

17 variables were considered for the BDT. (Please alert Alex if the descriptions are incorrect!) Some notes are included.

  • absdeltaeta : The difference between the cluster eta and the leading track eta.
    • Created by hand to mimic variables like el_deltaEta1, which are used in electron identification.
  • absdeltaphi : The difference between the cluster phi and the leading track phi.
    • See notes for absdeltaeta .
  • calcVars_corrFTrk : 1.0 / etOverPtLeadTrk, with an nVertex correction applied.
  • calcVars_corrCentFrac : Same as seedCalo_centFrac, with an nVertex correction applied.
  • calcVars_ChPiEMEOverCaloEME : (Leading track momentum - energy of the HCAL) divided by the energy of the ECAL.
    • E3 is part of the HCAL term.
    • "Leading track momentum" is actually the sum of track momenta, but here we only have one track.
  • calcVars_EMFractionAtEMScale_moveE3 : The fraction of energy deposited in the ECAL versus the energy deposited in the ECAL + HCAL.
    • Here, E3 is moved from the HCAL term to the ECAL term by hand.
    • This version of EMFraction is better modeled than the d3pd version because E3 is mis-modeled in 2012.
  • calcVars_pi0BDTPrimaryScore : The score of the tau pi0 BDT.
    • This is not used in the training because the tight timescale does not allow for sufficient studies.
  • calcVars_PSSFraction : The fraction of energy deposited in the presampler plus strips versus the energy deposited in the ECAL + HCAL.
  • etOverPtLeadTrk : The energy of the ECAL + HCAL divided by the momentum of the leading track.
    • Not used because it is redundant with calcVars_corrFTrk.
  • seedCalo_centFrac : The amount of ECAL energy with dR < 0.1 divided by the total ECAL energy (does not include E3).
    • Not used because it is redundant with calcVars_corrCentFrac .
  • seedCalo_hadRadius : The equivalent of emRadius, except in the HCAL.
    • Not used at high eta because the contribution of E3 is mis-modeled.
  • seedCalo_isolFrac : The amount of ECAL energy in a dR ring from 0.1 to 0.2 divided by the total ECAL energy (does not include E3).
  • seedTrk_hadLeakEt : The amount of energy in the first layer of the HCal divided by the leading track momentum.
  • seedTrk_secMaxStripEt : The maximum single strip energy in a 100x3 eta/phi window of the strips.
    • Only used in the barrel because it is not well defined in the crack and for track eta > 1.7.
  • seedTrk_secMaxStripEtOverPt : seedTrk_secMaxStripEt divided by the leading track momentum.
    • Created by hand to reduce momentum-dependence of seedTrk_secMaxStripEt.
  • seedTrk_sumEMCellEtOverLeadTrkPt : The energy of the ECAL divided by the momentum of the leading track (includes E3).
    • Not used because it is redundant with etOverPtLeadTrk and calcVars_corrFTrk.
  • TRTHTOverLT_LeadTrk : The ratio of high threshold TRT hits to low threshold hits of the leading track.
    • Not used at high eta because the TRT coverage ends near 2.00.
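The hand-made variables above can be sketched in a few lines. This is a minimal illustration following the descriptions in the list, not the code used to fill the ntuples: the function and argument names are hypothetical, and wrapping the phi difference into [-pi, pi) is an assumption that the list does not state.

```python
import math


def abs_delta_eta(cluster_eta, track_eta):
    """|cluster eta - leading track eta|, as described for absdeltaeta."""
    return abs(cluster_eta - track_eta)


def abs_delta_phi(cluster_phi, track_phi):
    """|cluster phi - leading track phi|.

    Assumption: the difference is wrapped into [-pi, pi) before taking
    the absolute value.
    """
    dphi = (cluster_phi - track_phi + math.pi) % (2.0 * math.pi) - math.pi
    return abs(dphi)


def em_fraction_move_e3(ecal_energy, hcal_energy, e3_energy):
    """EM fraction with E3 moved from the HCAL term to the ECAL term.

    Assumption: ecal_energy excludes E3 and hcal_energy includes it, so
    the denominator (ECAL + HCAL) is unchanged by the move.
    Returns None when the total energy is not positive.
    """
    total = ecal_energy + hcal_energy
    if total <= 0.0:
        return None
    return (ecal_energy + e3_energy) / total


def sec_max_strip_et_over_pt(sec_max_strip_et, lead_track_momentum):
    """seedTrk_secMaxStripEt divided by the leading track momentum."""
    if lead_track_momentum <= 0.0:
        return None
    return sec_max_strip_et / lead_track_momentum
```

For example, with 30 units of ECAL energy (E3 excluded) and 20 units of HCAL energy of which 5 are E3, the moved-E3 fraction is (30 + 5) / 50 = 0.7.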

Some helpful links for understanding how the input variables are made:

Grooming

Before being fed to taumva, taus from the d3pd are groomed as follows:

  • pT > 20 GeV
  • eta (cluster) < 2.5
  • numTrack = 1
  • author = 1 | 3
  • pass JetBDTLoose
  • categorization:
    • signal:
      • trueTauAssoc_index >=0
    • background:
      • Leading truth object in dR cone is electron or photon (same as 2012 SF measurement)
      • No overlap with tightPP electrons
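The grooming and categorization above could be expressed roughly as follows. This is a sketch under stated assumptions, not the actual grooming script: the dictionary keys are hypothetical stand-ins for the d3pd branches, pT is assumed to be in GeV, the eta cut is assumed to apply to the absolute value, and signal is checked before background. The PDG IDs 11 (electron) and 22 (photon) are the standard ones.

```python
ELECTRON_PDG_ID = 11
PHOTON_PDG_ID = 22


def passes_grooming(tau):
    """Apply the kinematic and quality cuts from the grooming list.

    `tau` is a plain dict with hypothetical keys; the real branch names
    in the d3pd may differ.
    """
    return (
        tau["pt"] > 20.0                    # assumed to be in GeV
        and abs(tau["cluster_eta"]) < 2.5   # assumption: cut is on |eta|
        and tau["numTrack"] == 1
        and tau["author"] in (1, 3)
        and tau["JetBDTLoose"]
    )


def categorize(tau):
    """Return "signal", "background", or None for a tau candidate."""
    if not passes_grooming(tau):
        return None
    # Signal: matched to a true tau.
    if tau["trueTauAssoc_index"] >= 0:
        return "signal"
    # Background: leading truth object in the dR cone is an electron or
    # photon, and the candidate does not overlap a tightPP electron.
    truth_is_egamma = abs(tau["leading_truth_pdg_id"]) in (
        ELECTRON_PDG_ID,
        PHOTON_PDG_ID,
    )
    if truth_is_egamma and not tau["overlaps_tightPP_electron"]:
        return "background"
    return None
```

Candidates that pass grooming but match neither category are dropped (returned as None) in this sketch.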

Version notes

v02-06 (2012-11-30-04h47m20s)

tau_calcVars_EMFractionAtEMScale_moveE3
tau_seedCalo_isolFrac
tau_seedTrk_hadLeakEt
tau_seedTrk_secMaxStripEt
tau_TRTHTOverLT_LeadTrk

  • 5 input variables, 10 trees.
  • Signal scores, background scores, and ROC curves
  • No trainings crash.
  • Performance of low pT bin improves without additional over-training. Good!
  • Performance of high pT bin improves but with additional over-training. Bad.
  • I am now suspicious of secMaxStripEt because I found out it is simply a raw energy, not energy / lead track pT or something like that.
  • Action item: Discuss with SM.

v02-05 (2012-11-30-04h36m22s)

tau_seedCalo_isolFrac
tau_seedTrk_hadLeakEt
tau_seedTrk_secMaxStripEt
tau_TRTHTOverLT_LeadTrk

  • 4 input variables, 10 trees.
  • Signal scores, background scores, and ROC curves
  • Low pT bin did not crash. This indicates the crashes are at least partly correlated with available statistics.
  • Over-training is reduced further, possibly to acceptable levels. I will now focus on improving performance, which has degraded since I reduced the list of input variables.
  • Action item: Re-introduce EMFraction.

v02-04 (2012-11-30-04h19m59s)

tau_seedCalo_isolFrac
tau_seedTrk_hadLeakEt
tau_seedTrk_secMaxStripEt
tau_TRTHTOverLT_LeadTrk

  • 4 input variables, 10 trees.
  • Signal scores, background scores, and ROC curves
  • pT bin 30-60 GeV crashed. For this configuration, the threshold for crashing appears to lie between 20 and 50 trees.
  • Over-training is significantly reduced, and performance degradation is small. I will stick with 10 trees for now.
  • Action item: Merge some pT bins to see if statistics boost helps.

v02-03 (2012-11-30-04h07m51s)

tau_seedCalo_isolFrac
tau_seedTrk_hadLeakEt
tau_seedTrk_secMaxStripEt
tau_TRTHTOverLT_LeadTrk

  • 4 input variables, 50 trees.
  • Signal scores, background scores, and ROC curves
  • NB: The plots say v02-02, but these are v02-03. If in doubt, consult the date in the URL.
  • pT bin 30-60 GeV did not crash here. Only change was increased number of trees. This behavior is not understood, but hopefully Noel's patches to taumva will help.
  • Performance is only slightly improved, at the cost of more significant over-training.
  • Action item: Go to 10 trees, keep 4 input variables.

v02-02 (2012-11-30-03h55m42s)

tau_seedCalo_isolFrac
tau_seedTrk_hadLeakEt
tau_seedTrk_secMaxStripEt
tau_TRTHTOverLT_LeadTrk

  • 4 input variables, 20 trees.
  • Signal scores, background scores, and ROC curves
  • pT bin 30-60 GeV crashed. Not clear why.
  • Performance degradation (compared to 7 input variables) is significant.
  • Over-training is reduced but still present.
  • Action item: Go back to 50 trees, keep 4 input variables.

v02-01 (2012-11-29-13h34m54s)

tau_absdeltaeta
tau_calcVars_EMFractionAtEMScale_moveE3
tau_etOverPtLeadTrk
tau_seedCalo_isolFrac
tau_seedTrk_hadLeakEt
tau_seedTrk_secMaxStripEt
tau_TRTHTOverLT_LeadTrk

  • 7 input variables, 20 trees.
  • Signal scores, background scores, and ROC curves
  • Over-training is reduced but still present. Correlated with available statistics?
  • Action item: Reduce number of input variables to see if this reduces over-training.

v02-00 (2012-11-29-13h05m27s)

tau_absdeltaeta
tau_calcVars_EMFractionAtEMScale_moveE3
tau_etOverPtLeadTrk
tau_seedCalo_isolFrac
tau_seedTrk_hadLeakEt
tau_seedTrk_secMaxStripEt
tau_TRTHTOverLT_LeadTrk

v01-00 (2012-11-29-04h46m15s)

tau_absdeltaeta
tau_absdeltaphi
tau_calcVars_ChPiEMEOverCaloEME
tau_calcVars_EMFractionAtEMScale_moveE3
tau_calcVars_PSSFraction
tau_etOverPtLeadTrk
tau_seedCalo_isolFrac
tau_seedTrk_hadLeakEt
tau_seedTrk_secMaxStripEt (not used at high eta)
tau_TRTHTOverLT_LeadTrk (not used at high eta)

  • Signal scores, background scores, and ROC curves
  • Variables expected to offer minor gains (redundant or not powerful) were removed.
  • Two trainings did not converge: pT 60--100 GeV, eta 2.00--3.00 and pT 100+ GeV, eta 2.00--3.00.
  • Action item: Reduce variable list even further. Focus on specific kinematic region (inner barrel eta bin 0.00 - 0.80).

v00-00 (2012-11-28-08h22m31s)

tau_absdeltaeta
tau_absdeltaphi
tau_calcVars_corrFTrk
tau_calcVars_corrCentFrac
tau_calcVars_ChPiEMEOverCaloEME
tau_calcVars_EMFractionAtEMScale_moveE3
tau_calcVars_pi0BDTPrimaryScore
tau_calcVars_PSSFraction
tau_etOverPtLeadTrk
tau_seedCalo_centFrac
tau_seedCalo_hadRadius
tau_seedCalo_isolFrac
tau_seedTrk_secMaxStripEt (not used at high eta)
tau_seedTrk_sumEMCellEtOverLeadTrkPt
tau_TRTHTOverLT_LeadTrk (not used at high eta)

  • First iteration
  • Signal scores, background scores, and ROC curves
  • All 15 variables were considered in all regions, except that secMaxStripEt and TRTHTOverLT_LeadTrk were excluded at high eta, where they are unstable/undefined.
  • A handful of trainings did not converge.
  • Among converged trainings, over-training is observed everywhere.
  • Action item: Reduce variable list significantly in hopes that trainings will converge and over-training will be reduced. Re-introduce variables one-by-one to judge value.
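The version notes repeatedly weigh performance (ROC curves) against over-training. One simple way to put numbers on both is sketched below, assuming BDT scores are available as plain lists for the training and testing samples. This is an illustration only and not necessarily the check behind the plots linked above; the function names are hypothetical, and the train-minus-test AUC difference is a stand-in measure chosen for simplicity.

```python
def roc_points(signal_scores, background_scores, thresholds):
    """Return (signal efficiency, background rejection) per threshold.

    A candidate passes the veto if its score is above the threshold.
    """
    points = []
    for cut in thresholds:
        sig_eff = sum(s > cut for s in signal_scores) / len(signal_scores)
        bkg_eff = sum(b > cut for b in background_scores) / len(background_scores)
        points.append((sig_eff, 1.0 - bkg_eff))
    return points


def auc(signal_scores, background_scores):
    """Area under the ROC curve.

    Computed as the probability that a random signal score exceeds a
    random background score, with ties counted as one half.
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


def overtraining_gap(train_sig, train_bkg, test_sig, test_bkg):
    """AUC on the training sample minus AUC on the testing sample.

    A gap well above zero suggests the BDT has learned fluctuations of
    the training sample.
    """
    return auc(train_sig, train_bkg) - auc(test_sig, test_bkg)
```

Comparing this gap between versions with different numbers of trees or input variables gives a single number to track alongside the ROC curves, although with low statistics the gap itself fluctuates.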

Meetings

Meetings (previous e-veto)

Indico search for "Harvey Maddocks"

Links

-- AlexanderTuna - 27-Nov-2012

Topic revision: r9 - 2012-11-30 - unknown
 