MVA track selection for Heavy Ions reconstruction

Setup

In CMSSW_7_5_0_pre5 the MVA track selection is in place, but it has to be turned on explicitly:

process.hiInitialStepSelector.useAnyMVA = cms.bool(True)
process.hiLowPtTripletStepSelector.useAnyMVA = cms.bool(True)
process.hiPixelPairStepSelector.useAnyMVA = cms.bool(True)
process.hiDetachedTripletStepSelector.useAnyMVA = cms.bool(True)

In earlier versions one has to cherry-pick the following commits:

git cherry-pick c70379229c145a7cc3beed736b02d167a7c8f342
git cherry-pick 1ed94c592715db21bc2410c973f47489f44a84ed
git cherry-pick 5fdc9869f35a5cce160c7aa1f5635fae3d08e522
git cherry-pick c5c5093366b7e2d7483227a3fa709dbd8bde9762
git cherry-pick 4b19cb3d0c9e4bf270efa5baa39a466e0f1e4e99
git cherry-pick be692bcfcaa92f98d50d907879da15734eb37068
git cherry-pick d916c065ebd0693229169baca63bf2231aa9cc4e
git cherry-pick 519d275bbd5dfae8485c8e4443a6b46edec27b05

After that, rebuild:

scram b -j20

Also add the following lines to get the payloads (if they are not in the GlobalTag):

# CondDBSetup PSet passed to the PoolDBESSource below (CondCore/DBCommon location used in CMSSW 7_x)
from CondCore.DBCommon.CondDBSetup_cfi import CondDBSetup

# GBRForest payloads for the HI MVA selectors, iterations 4-7
process.gbrforest = cms.ESSource("PoolDBESSource", CondDBSetup,
    toGet = cms.VPSet(
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter4_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter4')
        ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter5_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter5')
        ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter6_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter6')
        ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter7_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter7')
        )
    ),
    connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS')
)

# Prefer this source over any GBRWrapperRcd payloads coming from the GlobalTag
process.es_prefer_forest = cms.ESPrefer("PoolDBESSource", "gbrforest")
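The four payload entries differ only in the iteration number. The sketch below generates the record/tag/label triples in plain Python, following the naming pattern of the tags listed above. The helper name gbrforest_payloads and its version argument are hypothetical; they are not part of CMSSW.

```python
def gbrforest_payloads(iterations=(4, 5, 6, 7), version="v0"):
    """Return (record, tag, label) triples for the HI MVA selector forests.

    Hypothetical helper reproducing the pattern of the tags above:
    tag   = GBRForest_HIMVASelectorIter<N>_<version>_offline
    label = HIMVASelectorIter<N>
    """
    payloads = []
    for it in iterations:
        label = "HIMVASelectorIter%d" % it
        tag = "GBRForest_%s_%s_offline" % (label, version)
        payloads.append(("GBRWrapperRcd", tag, label))
    return payloads


if __name__ == "__main__":
    for record, tag, label in gbrforest_payloads():
        print(record, tag, label)
```

Inside a cfg, the triples could then be wrapped as cms.VPSet(*[cms.PSet(record=cms.string(r), tag=cms.string(t), label=cms.untracked.string(l)) for r, t, l in gbrforest_payloads()]).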

The following instructions are outdated and kept for reference only. To get the development code with the MVA track selection, run the following commands:

cmsrel CMSSW_7_5_0_pre3
cd CMSSW_7_5_0_pre3/src/
cmsenv
git cms-init
git remote add stas https://github.com/istaslis/cmssw.git
git fetch stas
git checkout -b hiMVAtrackSelector75X stas/hiMVAtrackSelector75X
git cms-addpkg RecoHI/HiTracking
scram b -j20

The following code is needed in the cfg, because the MVA calibrations are not in the GlobalTag (GT) and the MVA is turned off by default:

(Use the same process.gbrforest PoolDBESSource and process.es_prefer_forest ESPrefer block as in the Setup section above.)



process.hiInitialStepSelector.useAnyMVA = cms.bool(True)
process.hiLowPtTripletStepSelector.useAnyMVA = cms.bool(True)
process.hiPixelPairStepSelector.useAnyMVA = cms.bool(True)
process.hiDetachedTripletStepSelector.useAnyMVA = cms.bool(True)

For a simple check, open the output ROOT file with root -l step3_RAW2DIGI_L1Reco_RECO_nomva.root and run:

Events->Draw("floatedmValueMap_hiLowPtTripletStepQual_MVAVals_RERECO.obj.values_","(recoTracks_hiLowPtTripletStepQual__RERECO.obj.qualityMask()&(1<<2))!=0")
This displays the MVA output values of the highPurity tracks (bit 2 of qualityMask()) in the low-pT triplet step. The distribution must be cut at 0.35.
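The selection string tests bit 2 of qualityMask(). The sketch below decodes the mask in plain Python, assuming the reco::TrackBase::TrackQuality numbering loose = 0, tight = 1, highPurity = 2. It also checks the expectation that no highPurity track falls below the 0.35 MVA cut. The function names and the (mask, mva) pairs are illustrative only; in practice the values would come from the two branches used in the Draw command.

```python
# Bit positions of reco::TrackBase::TrackQuality (only the three used here).
QUALITY_BITS = {"loose": 0, "tight": 1, "highPurity": 2}


def has_quality(quality_mask, name):
    """True if the bit for `name` is set, i.e. (mask & (1 << bit)) != 0."""
    return (quality_mask & (1 << QUALITY_BITS[name])) != 0


def quality_names(quality_mask):
    """Quality levels whose bits are set, ordered by bit position."""
    ordered = sorted(QUALITY_BITS.items(), key=lambda kv: kv[1])
    return [name for name, bit in ordered if quality_mask & (1 << bit)]


def highpurity_below_cut(tracks, cut=0.35):
    """Return the (mask, mva) pairs flagged highPurity but with mva below `cut`.

    For the low-pT triplet step this list is expected to be empty.
    """
    return [(m, v) for m, v in tracks if has_quality(m, "highPurity") and v < cut]


print(quality_names(7))               # ['loose', 'tight', 'highPurity']
print(has_quality(3, "highPurity"))   # False: 3 is binary 011
```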

To get the tight and highPurity working points in the analyser (for the fake rate only), set:

process.anaTrack.qualityStrings = cms.untracked.vstring(['highPurity','tight','loose'])

Todo

  1. Train an MVA for one algorithm (e.g. the second iteration) with the variables from MultiTrackSelector.cc (see below).
  2. Check that the output (track pT distribution) of MultiTrackSelector is the same as that of TMVA.
  3. Train all other algorithms and make a combined efficiency/fake-rate plot for all algorithms.

Deadlines

  • CMSSW_7_5_0_pre1: 9 March 2015
  • CMSSW_7_5_0_pre2: 30 March 2015
  • CMSSW_7_5_0_pre3: 13 April 2015
  • CMSSW_7_5_0_pre4: 27 April 2015
  • CMSSW_7_5_0_pre5: 11 May 2015 (Last open release)
  • CMSSW_7_5_0_pre6: 25 May 2015 (Bug fix release as needed)
  • CMSSW_7_5_0: 8 June 2015

CMSSW usage

The MultiTrackSelector class in CMSSW 7 defines the track quality, either by cuts (if no MVA information is provided) or by decision tree(s).

The variables used are listed in MultiTrackSelector.cc, line 599.

They are:

  • tmva_lostmidfrac_
  • tmva_minlost_
  • tmva_nhits_
  • tmva_relpterr_
  • tmva_eta_
  • tmva_chi2n_no1dmod_
  • tmva_chi2n_
  • tmva_nlayerslost_
  • tmva_nlayers3D_
  • tmva_nlayers_
  • tmva_ndof_

GBRForest

The MultiTrackSelector class uses the GBRForest class (github) to store the set of decision trees. The forest can be read either from a file or from the database. To produce a file containing a GBRForest from the TMVA XML output, use GBRForestWriter from the RecoMET/METPUSubtraction package. The following code is a simplified version of writeGBRForests_cfg.py. It reads the file TMVAClassification_BDT.weights.xml, which has the 4 input variables 'var1+var2', 'var1-var2', 'var3', 'var4' and the 2 spectators 'var1*2', 'var1*3'. It then saves the GBRForest object to the file GBRForestfile.root with the label HITrackMVAForest.

import FWCore.ParameterSet.Config as cms
process = cms.Process("writeGBRForests")
# CV: needs to be set to 1 so that GBRForestWriter::analyze method gets called exactly once
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(1) )

process.source = cms.Source("EmptySource")
process.load('Configuration/StandardSequences/Services_cff')

process.gbrForestWriter = cms.EDAnalyzer("GBRForestWriter",
    jobs = cms.VPSet(
        cms.PSet(
            inputFileName = cms.string('TMVAClassification_BDT.weights.xml'),
            inputFileType = cms.string("XML"),
            inputVariables = cms.vstring(['var1+var2','var1-var2','var3','var4']),
            spectatorVariables = cms.vstring(['var1*2','var1*3']),
            gbrForestName = cms.string("HITrackMVAForest"),
            outputFileType = cms.string("GBRForest"),
            outputFileName = cms.string("GBRForestfile.root")
        )
    )
)
process.p = cms.Path(process.gbrForestWriter)
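The inputVariables and spectatorVariables in the writer cfg are typed by hand, so they can drift from the weight file. The order matters too, because a forest addresses its inputs by index. The sketch below is a hedged pre-flight check. It assumes the usual TMVA weight-file layout, with <Variables>/<Variable VarIndex=... Expression=...> and <Spectators>/<Spectator SpecIndex=... Expression=...> elements. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def tmva_expressions(xml_text):
    """Return (variables, spectators) from a TMVA weight file, ordered by index.

    Assumes <Variables>/<Variable VarIndex=.. Expression=..> and
    <Spectators>/<Spectator SpecIndex=.. Expression=..> elements.
    """
    root = ET.fromstring(xml_text)
    variables = sorted(root.iterfind(".//Variables/Variable"),
                       key=lambda e: int(e.get("VarIndex")))
    spectators = sorted(root.iterfind(".//Spectators/Spectator"),
                        key=lambda e: int(e.get("SpecIndex")))
    return ([e.get("Expression") for e in variables],
            [e.get("Expression") for e in spectators])


def check_writer_config(xml_text, input_variables, spectator_variables):
    """Raise ValueError if the cfg lists differ from the weight file (order included)."""
    variables, spectators = tmva_expressions(xml_text)
    if variables != list(input_variables):
        raise ValueError("inputVariables %s != weight file %s"
                         % (list(input_variables), variables))
    if spectators != list(spectator_variables):
        raise ValueError("spectatorVariables %s != weight file %s"
                         % (list(spectator_variables), spectators))
```

For the cfg above, one would read the file in binary mode, e.g. with open("TMVAClassification_BDT.weights.xml", "rb") as f, and call check_writer_config(f.read(), ['var1+var2','var1-var2','var3','var4'], ['var1*2','var1*3']) before running cmsRun.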

To use this file from MultiTrackSelector in one of the steps:

process.detachedTripletStepSelector.GBRForestFileName = cms.string("GBRForestfile.root")     
process.detachedTripletStepSelector.GBRForestLabel = cms.string("HITrackMVAForest")

The file can contain many forests, e.g. one per step, each with a different GBRForestLabel.
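When several steps share one file, each selector needs the same file name but its own label. That label has to equal the gbrForestName of the corresponding writer job. The bookkeeping is sketched below as plain Python. The helper name and the label scheme (prefix plus step name) are hypothetical, not an existing convention.

```python
def per_step_forest_settings(steps, file_name="GBRForestfile.root",
                             label_prefix="HITrackMVAForest_"):
    """Map '<step>Selector' module names to GBRForestFileName/GBRForestLabel values.

    Hypothetical helper: all steps read the same file, and each step gets its
    own label, which must match the gbrForestName used for that step's job
    in the GBRForestWriter cfg.
    """
    settings = {}
    for step in steps:
        settings[step + "Selector"] = {
            "GBRForestFileName": file_name,
            "GBRForestLabel": label_prefix + step,
        }
    return settings


if __name__ == "__main__":
    for module, s in per_step_forest_settings(["detachedTripletStep"]).items():
        print(module, s)
```

In the cfg, each entry would be applied to the matching module (e.g. process.detachedTripletStepSelector) with cms.string values, exactly like the two assignments above.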

Response function

The GetClassifier function of the GBRForest class defines the response function that combines the outputs of all trees. It may differ from the one used in TMVA's MethodBDT; there is probably a simple mapping between the two.
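A hedged guess at that mapping follows. For BoostType=Grad, TMVA's MethodBDT turns the summed tree response F into 2/(1+exp(-2F)) - 1, which equals tanh(F). If GetClassifier applies the same transformation, the two outputs coincide; this should be checked in GBRForest.h of the release used. If one of them instead returns the raw sum F, cuts can be translated with atanh, as sketched here. The function names are illustrative.

```python
import math


def grad_boost_output(raw_sum):
    """BDTG-style output 2/(1 + exp(-2F)) - 1 for a summed tree response F.

    Equivalent to math.tanh(raw_sum); tanh avoids overflow for very large |F|.
    """
    return 2.0 / (1.0 + math.exp(-2.0 * raw_sum)) - 1.0


def raw_sum_for_output(output):
    """Inverse of grad_boost_output: F = atanh(y), defined for -1 < y < 1."""
    return math.atanh(output)


# The transformation is monotonic, so a cut on one scale maps to an
# equivalent cut on the other.
bdtg_cut = -0.047  # e.g. the BDTG4 cut suggested in the 22_02_2015 training
raw_cut = raw_sum_for_output(bdtg_cut)
print(raw_cut, grad_boost_output(raw_cut))
```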

22_02_2015 training

Trained three BDTGs, one for each algorithm, with the following parameters: "!H:!V:NTrees=1000:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=3"
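The option string follows TMVA's colon-separated syntax: Key=Value pairs and bare boolean flags, with a leading ! negating a flag (H is help, V is verbose). The small hypothetical helper below splits it into a dict for easier reading and comparison between trainings. Values are kept as strings.

```python
def parse_tmva_options(option_string):
    """Split a colon-separated TMVA option string into a dict.

    'Key=Value' -> {'Key': 'Value'}, 'Flag' -> {'Flag': True},
    '!Flag' -> {'Flag': False}. Empty tokens are skipped.
    """
    options = {}
    for token in option_string.split(":"):
        if not token:
            continue
        if "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
        elif token.startswith("!"):
            options[token[1:]] = False
        else:
            options[token] = True
    return options


BDTG_OPTIONS = ("!H:!V:NTrees=1000:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:"
                "UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=3")

if __name__ == "__main__":
    for key, value in parse_tmva_options(BDTG_OPTIONS).items():
        print(key, value)
```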

Variables used (trkLostMidFrac is always 0 in the Forest, so I didn't use it):

  • trkMinLost
  • trkNHit
  • relPtErr := TMath::Abs(trkPtError/trkPt)
  • trkEta
  • Chi2perDOF := trkChi2/trkNdof
  • trkNlayersLost
  • trkNlayer3D
  • trkNlayer
  • trkNdof
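The two derived variables, relPtErr and Chi2perDOF, are ratios of forest branches. The plain-Python sketch below builds the full 22_02_2015 input set for one track. It assumes the track is given as a dict keyed by the branch names above; the dict representation and the function name are illustrative assumptions.

```python
def training_inputs(trk):
    """Build the 22_02_2015 training variables for one track.

    `trk` maps HiForest branch names to values for a single track.
    trkLostMidFrac is left out on purpose (always 0 in the Forest).
    Assumes trkPt and trkNdof are non-zero.
    """
    return {
        "trkMinLost": trk["trkMinLost"],
        "trkNHit": trk["trkNHit"],
        "relPtErr": abs(trk["trkPtError"] / trk["trkPt"]),  # TMath::Abs(trkPtError/trkPt)
        "trkEta": trk["trkEta"],
        "Chi2perDOF": trk["trkChi2"] / trk["trkNdof"],       # trkChi2/trkNdof
        "trkNlayersLost": trk["trkNlayersLost"],
        "trkNlayer3D": trk["trkNlayer3D"],
        "trkNlayer": trk["trkNlayer"],
        "trkNdof": trk["trkNdof"],
    }
```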
Samples Used:
  • trkAlgo = 4 or trkAlgo = 5
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2/HiForest_100_1_oCa.root (50 events, training)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2/HiForest_101_1_CET.root (50 events, training)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2/HiForest_102_1_LLd.root (50 events, testing)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2/HiForest_103_1_IfN.root (50 events, testing)
  • trkAlgo = 6
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2_merged/Hiforest_Dijet_pthat80_740pre6_merged.root (5800 events, split randomly 50/50 into training/testing trees)
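The exact procedure behind the random 50/50 split of the trkAlgo = 6 sample is not documented here. A reproducible split by event index could look like the sketch below. The event-level granularity, the seed and the function name are arbitrary assumptions.

```python
import random


def split_training_testing(n_events, seed=12345, train_fraction=0.5):
    """Randomly assign event indices to training and testing sets.

    Shuffles range(n_events) with a fixed seed, so the split is reproducible,
    and cuts the shuffled list at train_fraction.
    """
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_events))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


if __name__ == "__main__":
    train, test = split_training_testing(5800)
    print(len(train), len(test))  # 2900 2900
```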
Output File Locations:
  • Trees: /mnt/hadoop/cms/store/user/abaty/MVATrackSelection/TMVA_algo*.root
  • BDT weight files for implementation: /net/hisrv0001/home/abaty/TMVA/CMSSW_6_2_12_patch1/src/TMVA-v4.2.0/test/weightsStore/training_2_22_2015/
Suggested cuts (can be changed for our own needs):
  • BDTG4 > -0.047
  • BDTG5 > -0.02
  • BDTG6 > -0.041
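Applying these cuts means choosing the threshold by trkAlgo. The sketch below does this with a plain lookup table. The function name is hypothetical. Unknown algorithms raise an error rather than silently passing or failing.

```python
# Suggested BDTG cuts from the 22_02_2015 training, keyed by trkAlgo.
BDTG_CUTS = {4: -0.047, 5: -0.02, 6: -0.041}


def passes_bdtg_cut(trk_algo, bdtg_value, cuts=BDTG_CUTS):
    """True if the BDTG output is above the suggested cut for this trkAlgo.

    Raises KeyError for algorithms without a trained BDTG.
    """
    return bdtg_value > cuts[trk_algo]
```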

Training Update 04/03/2015

Added the following variables:

  • trkDxy1/trkDxyError1
  • trkDz1/trkDzError1
  • trkPt (Low pt step only, trkAlgo==5)
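The added variables are the transverse and longitudinal impact-parameter significances, plus trkPt for the low-pT step only. Below is a plain-Python sketch with the track given as a dict keyed by forest branch names. The output keys dxySig and dzSig and the function name are hypothetical.

```python
def added_inputs(trk, trk_algo):
    """Variables added in the 04/03/2015 update, for one track.

    dxySig = trkDxy1/trkDxyError1, dzSig = trkDz1/trkDzError1, and trkPt
    only for the low-pT step (trkAlgo == 5). Errors are assumed non-zero.
    """
    extra = {
        "dxySig": trk["trkDxy1"] / trk["trkDxyError1"],
        "dzSig": trk["trkDz1"] / trk["trkDzError1"],
    }
    if trk_algo == 5:
        extra["trkPt"] = trk["trkPt"]
    return extra
```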

Switched to a new 74X sample that has the correct beam spot:

  • trkAlgo = 4 or trkAlgo = 5
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS/HiForest_100_1_C7l.root (50 events, training)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS/HiForest_101_1_MtA.root (50 events, training)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS/HiForest_103_1_RG9.root (50 events, testing)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS/HiForest_104_1_L0g.root (50 events, testing)
  • trkAlgo = 6
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS_merged/HiForest_Dijet_pthat80_740pre6_BS.root (5800 events, split randomly 50/50 into training/testing trees)

Output files (the suggested cuts are now in the weights folder):

OR /net/hisrv0001/home/abaty/TMVA/CMSSW_6_2_12_patch1/src/TMVA-v4.2.0/test/weightsStore/training_3_4_2015

-- StanislavLisniak - 2015-02-09

Topic revision: r8 - 2015-06-19 - StanislavLisniak
 