Difference: MVATrackSelection (1 vs. 8)

Revision 8 - 2015-06-19 - StanislavLisniak

Line: 1 to 1
 
META TOPICPARENT name="StanislavLisniak"

MVA track selection for Heavy Ions reconstruction

Setup

Added:
>
>
In CMSSW_7_5_0_pre5 the MVA track selection is in place, but it has to be turned on:

process.hiInitialStepSelector.useAnyMVA = cms.bool(True)
process.hiLowPtTripletStepSelector.useAnyMVA = cms.bool(True)
process.hiPixelPairStepSelector.useAnyMVA = cms.bool(True)
process.hiDetachedTripletStepSelector.useAnyMVA = cms.bool(True)
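The four switches above can also be set in a loop. A minimal cfg sketch, assuming `process` is the cms.Process of your configuration and that all four selectors are already defined in it. A misspelt or missing selector name raises AttributeError, which is preferable to silently leaving one step without MVA:

```python
import FWCore.ParameterSet.Config as cms

# HI tracking-step selectors that carry the useAnyMVA switch (names from the lines above).
hiMVASelectors = [
    "hiInitialStepSelector",
    "hiLowPtTripletStepSelector",
    "hiPixelPairStepSelector",
    "hiDetachedTripletStepSelector",
]

# 'process' is assumed to be the cms.Process defined earlier in the cfg.
for name in hiMVASelectors:
    getattr(process, name).useAnyMVA = cms.bool(True)
```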

In earlier versions one has to cherry-pick the following commits:

git cherry-pick c70379229c145a7cc3beed736b02d167a7c8f342
git cherry-pick 1ed94c592715db21bc2410c973f47489f44a84ed
git cherry-pick 5fdc9869f35a5cce160c7aa1f5635fae3d08e522
git cherry-pick c5c5093366b7e2d7483227a3fa709dbd8bde9762
git cherry-pick 4b19cb3d0c9e4bf270efa5baa39a466e0f1e4e99
git cherry-pick be692bcfcaa92f98d50d907879da15734eb37068
git cherry-pick d916c065ebd0693229169baca63bf2231aa9cc4e
git cherry-pick 519d275bbd5dfae8485c8e4443a6b46edec27b05

After that, rebuild with scram b -j20.

Also add the following lines to get the payloads (if they are not in the GlobalTag):

from CondCore.DBCommon.CondDBSetup_cfi import CondDBSetup  # CMSSW 7_x location of CondDBSetup

process.gbrforest = cms.ESSource("PoolDBESSource", CondDBSetup,
    toGet = cms.VPSet(
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter4_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter4')
                 ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter5_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter5')
                 ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter6_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter6')
                 ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter7_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter7')
                 )
    ),
    # Frontier connection to the production conditions database
    connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS')
)

# Prefer these payloads over any with the same record/label coming from the GlobalTag
process.es_prefer_forest = cms.ESPrefer("PoolDBESSource", "gbrforest")
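The four payloads above differ only in the iteration number. A small plain-Python helper (hypothetical, not part of CMSSW) that reproduces the tag/label pattern, assuming the `_v0_offline` suffix is the same for every iteration:

```python
def hi_mva_payloads(iterations=(4, 5, 6, 7), version="v0"):
    """Return (tag, label) pairs following the naming pattern of the
    GBRForest payloads listed above (one pair per tracking iteration)."""
    return [
        ("GBRForest_HIMVASelectorIter%d_%s_offline" % (it, version),
         "HIMVASelectorIter%d" % it)
        for it in iterations
    ]


payloads = hi_mva_payloads()
```

Inside the cfg, each pair can be wrapped as cms.PSet(record = cms.string('GBRWrapperRcd'), tag = cms.string(tag), label = cms.untracked.string(label)) and the resulting list passed as cms.VPSet(*psets).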

Don't look below...

 To get the latest code with the MVA track selection, run the following commands.

Revision 7 - 2015-05-23 - StanislavLisniak

Line: 1 to 1
 
META TOPICPARENT name="StanislavLisniak"

MVA track selection for Heavy Ions reconstruction

Line: 62 to 62
  which displays the MVA output values of highPurity tracks from the low-pT triplet step; these values must be cut at 0.35
Added:
>
>
To get the tight and highPurity working points in the analyser (for the fake rate only):
process.anaTrack.qualityStrings = cms.untracked.vstring(['highPurity','tight','loose'])
 

Todo

  1. Train an MVA tree for one algorithm (e.g. Second) with the variables from MultiTrackSelector.cc (see below).

Revision 6 - 2015-05-11 - StanislavLisniak

Line: 1 to 1
 
META TOPICPARENT name="StanislavLisniak"

MVA track selection for Heavy Ions reconstruction

Added:
>
>

Setup

To get the latest code with the MVA track selection, run the following commands.

cmsrel CMSSW_7_5_0_pre3
cd CMSSW_7_5_0_pre3/src/
cmsenv
git cms-init
git remote add stas https://github.com/istaslis/cmssw.git
git fetch stas
git checkout -b hiMVAtrackSelector75X stas/hiMVAtrackSelector75X
git cms-addpkg RecoHI/HiTracking
scram b -j20

The following code is needed in the cfg, since the MVA calibrations are not in the GT (GlobalTag) and MVA is turned off by default.

from CondCore.DBCommon.CondDBSetup_cfi import CondDBSetup  # CMSSW 7_x location of CondDBSetup

process.gbrforest = cms.ESSource("PoolDBESSource", CondDBSetup,
    toGet = cms.VPSet(
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter4_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter4')
                 ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter5_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter5')
                 ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter6_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter6')
                 ),
        cms.PSet(record = cms.string('GBRWrapperRcd'),
                 tag = cms.string('GBRForest_HIMVASelectorIter7_v0_offline'),
                 label = cms.untracked.string('HIMVASelectorIter7')
                 )
    ),
    # Frontier connection to the production conditions database
    connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS')
)

# Prefer these payloads over any with the same record/label coming from the GlobalTag
process.es_prefer_forest = cms.ESPrefer("PoolDBESSource", "gbrforest")



process.hiInitialStepSelector.useAnyMVA = cms.bool(True)
process.hiLowPtTripletStepSelector.useAnyMVA = cms.bool(True)
process.hiPixelPairStepSelector.useAnyMVA = cms.bool(True)
process.hiDetachedTripletStepSelector.useAnyMVA = cms.bool(True)


For a simple check, open the output ROOT file with root -l step3_RAW2DIGI_L1Reco_RECO_nomva.root and run

Events->Draw("floatedmValueMap_hiLowPtTripletStepQual_MVAVals_RERECO.obj.values_","(recoTracks_hiLowPtTripletStepQual__RERECO.obj.qualityMask()&(1<<2))!=0")
which displays the MVA output values of highPurity tracks from the low-pT triplet step; these values must be cut at 0.35
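The selection string tests bit 2 of qualityMask(), which is the highPurity bit of reco::TrackBase::TrackQuality (loose = 0, tight = 1, highPurity = 2). The same check in plain Python, on hypothetical lists of quality masks and MVA values, assuming that highPurity for this step means an MVA value of at least 0.35, so nothing should come out below the cut:

```python
# Bit positions in reco::TrackBase::TrackQuality.
LOOSE, TIGHT, HIGH_PURITY = 0, 1, 2


def has_quality(quality_mask, bit):
    """Same test as (qualityMask() & (1<<bit)) != 0 in the Draw selection."""
    return (quality_mask & (1 << bit)) != 0


def high_purity_values(quality_masks, mva_values):
    """MVA values of the tracks flagged highPurity (what the Draw command plots)."""
    return [v for q, v in zip(quality_masks, mva_values) if has_quality(q, HIGH_PURITY)]


def values_below_cut(values, cut=0.35):
    """Values that violate the expected cut; should be empty for highPurity tracks."""
    return [v for v in values if v < cut]


# Hypothetical tracks: 0b111 = loose+tight+highPurity, 0b011 = loose+tight, 0b001 = loose.
masks = [0b111, 0b011, 0b001, 0b111]
mva = [0.80, 0.20, -0.50, 0.40]
hp = high_purity_values(masks, mva)
```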
 

Todo

  1. Train an MVA tree for one algorithm (e.g. Second) with the variables from MultiTrackSelector.cc (see below).

Revision 5 - 2015-03-12 - StanislavLisniak

Line: 1 to 1
 
META TOPICPARENT name="StanislavLisniak"

MVA track selection for Heavy Ions reconstruction

Line: 10 to 10
 

Deadlines

Changed:
<
<
  • February 19 - tracking meeting
  • CMSSW_7_4_0_pre7: 3 February 2015 (last open prerelease) - oops, it has passed
  • CMSSW 7.5 will probably come out in May (if at all)
  • March 5 - end result at the Tracking meeting.
>
>
  • CMSSW_7_5_0_pre1: 9 March 2015
  • CMSSW_7_5_0_pre2: 30 March 2015
  • CMSSW_7_5_0_pre3: 13 April 2015
  • CMSSW_7_5_0_pre4: 27 April 2015
  • CMSSW_7_5_0_pre5: 11 May 2015 (Last open release)
  • CMSSW_7_5_0_pre6: 25 May 2015 (Bug fix release as needed)
  • CMSSW_7_5_0: 8 June 2015
 

CMSSW usage

Revision 4 - 2015-03-04 - AustinBaty

Line: 1 to 1
 
META TOPICPARENT name="StanislavLisniak"

MVA track selection for Heavy Ions reconstruction

Line: 105 to 103
 
  • BDTG5 > -0.02
  • BDTG6 > -0.041
Added:
>
>

Training Update 04/03/2015

Added the following variables:

  • trkDxy1/trkDxyError1
  • trkDz1/trkDzError1
  • trkPt (Low pt step only, trkAlgo==5)
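The new inputs are impact-parameter significances. A minimal sketch of how they could be computed per track, assuming a dict keyed by the HiForest branch names above; the output names dxySig and dzSig are hypothetical:

```python
def update_inputs(trk):
    """Inputs added in the 04/03/2015 training for one track.

    trk: dict keyed by HiForest branch names (trkDxy1, trkDxyError1,
    trkDz1, trkDzError1, trkPt, trkAlgo)."""
    inputs = {
        "dxySig": trk["trkDxy1"] / trk["trkDxyError1"],  # signed, as trkDxy1/trkDxyError1
        "dzSig": trk["trkDz1"] / trk["trkDzError1"],     # signed, as trkDz1/trkDzError1
    }
    if trk["trkAlgo"] == 5:  # trkPt is an input for the low-pT step only
        inputs["trkPt"] = trk["trkPt"]
    return inputs


example = {"trkDxy1": 0.02, "trkDxyError1": 0.01, "trkDz1": -0.3,
           "trkDzError1": 0.1, "trkPt": 1.5, "trkAlgo": 5}
low_pt_inputs = update_inputs(example)
```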

Switched to a new 74X sample with the correct beam spot:

  • trkAlgo = 4 or trkAlgo = 5
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS/HiForest_100_1_C7l.root (50 events, training)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS/HiForest_101_1_MtA.root (50 events, training)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS/HiForest_103_1_RG9.root (50 events, testing)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS/HiForest_104_1_L0g.root (50 events, testing)
  • trkAlgo = 6
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_BS_merged/HiForest_Dijet_pthat80_740pre6_BS.root (5800 events, split randomly 50/50 into training/testing trees)
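A reproducible 50/50 random split like the one used for the merged trkAlgo = 6 file can be sketched as below: shuffle with a fixed seed, then cut in half (the seed value is arbitrary). TMVA can also do the split itself through the SplitMode=Random option of PrepareTrainingAndTestTree.

```python
import random


def split_train_test(entries, seed=12345):
    """Shuffle the entries with a fixed seed and split them 50/50.

    Returns (training, testing); with an odd count the extra entry goes to testing."""
    items = list(entries)
    random.Random(seed).shuffle(items)
    half = len(items) // 2
    return items[:half], items[half:]


train, test = split_train_test(range(5800))  # e.g. the 5800 events of the merged file
```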

Output Files (suggested cuts now in the weights folder):

OR /net/hisrv0001/home/abaty/TMVA/CMSSW_6_2_12_patch1/src/TMVA-v4.2.0/test/weightsStore/training_3_4_2015
 -- StanislavLisniak - 2015-02-09

Revision 3 - 2015-02-22 - AustinBaty

Line: 1 to 1
 
META TOPICPARENT name="StanislavLisniak"

MVA track selection for Heavy Ions reconstruction

Todo

Changed:
<
<
  1. Train an MVA tree for one algorithm (e.g. Second) with the variables from MultiTrackSelector.cc (see below).
  2. Check that the output (track pT distribution) from MultiTrackSelector is the same as from TMVA.
  3. Train all other algorithms and make a general efficiency/fake-rate plot for all algorithms.
>
>
  1. Train an MVA tree for one algorithm (e.g. Second) with the variables from MultiTrackSelector.cc (see below).
  2. Check that the output (track pT distribution) from MultiTrackSelector is the same as from TMVA.
  3. Train all other algorithms and make a general efficiency/fake-rate plot for all algorithms.
 

Deadlines

Line: 16 to 16
 
  • March 5 - end result on Tracking meeting.

CMSSW usage

Added:
>
>
 The MultiTrackSelector class in CMSSW 7 defines the track quality. The quality can be assigned either by cuts (if no MVA information is provided) or by decision tree(s).
Changed:
<
<
The variables used are listed in MultiTrackSelector.cc, line 599.
>
>
The variables used are listed in MultiTrackSelector.cc, line 599.
  They are:
Line: 35 to 35
 
  • tmva_nlayers_
  • tmva_ndof_
Deleted:
<
<
 

GBRForest

Changed:
<
<
The MultiTrackSelector class uses the GBRForest class (github) to store the set of decision trees. The forest can be read from a file or from the database. To produce a file with the GBRForest from the TMVA XML output, use GBRForestWriter from the RecoMET/METPUSubtraction package. The following code is a simplified version of writeGBRForests_cfg.py. It reads the file TMVAClassification_BDT.weights.xml with 4 variables 'var1+var2','var1-var2','var3','var4' and 2 spectators 'var1*2','var1*3', and saves the GBRForest object to the file GBRForestfile.root with the label HITrackMVAForest.
>
>
The MultiTrackSelector class uses the GBRForest class (github) to store the set of decision trees. The forest can be read from a file or from the database. To produce a file with the GBRForest from the TMVA XML output, use GBRForestWriter from the RecoMET/METPUSubtraction package. The following code is a simplified version of writeGBRForests_cfg.py. It reads the file TMVAClassification_BDT.weights.xml with 4 variables 'var1+var2','var1-var2','var3','var4' and 2 spectators 'var1*2','var1*3', and saves the GBRForest object to the file GBRForestfile.root with the label HITrackMVAForest.
 
import FWCore.ParameterSet.Config as cms
Line: 61 to 61
  ) ) )
Changed:
<
<
process.p = cms.Path(process.gbrForestWriter)
>
>
process.p = cms.Path(process.gbrForestWriter)
  To use this file from MultiTrackSelector in one of the steps:

process.detachedTripletStepSelector.GBRForestFileName = cms.string("GBRForestfile.root")     
Changed:
<
<
process.detachedTripletStepSelector.GBRForestLabel = cms.string("HITrackMVAForest")
>
>
process.detachedTripletStepSelector.GBRForestLabel = cms.string("HITrackMVAForest")
 
Changed:
<
<
The file can contain several forests, one for each step, each with a different GBRForestLabel.
>
>
The file can contain several forests, one for each step, each with a different GBRForestLabel.
 

Response function

Changed:
<
<
The GetClassifier function of the GBRForest class defines the response function that combines the outputs of all trees. It may differ from the one used in TMVA's MethodBDT; probably there is a simple mapping between the two.
>
>
The GetClassifier function of the GBRForest class defines the response function that combines the outputs of all trees. It may differ from the one used in TMVA's MethodBDT; probably there is a simple mapping between the two.

22_02_2015 training

Trained three BDTGs, one for each algorithm, with the following parameters: "!H:!V:NTrees=1000:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=3"
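TMVA option strings are colon-separated tokens: Key=Value sets a parameter, a bare name switches a flag on and a leading ! switches it off. A small parser (hypothetical helper, not part of TMVA) makes the BDTG settings above explicit; the comments summarise what each option controls in TMVA:

```python
def parse_tmva_options(options):
    """Turn 'Key=Value:Flag:!Flag' into {'Key': 'Value', 'Flag': True, ...}."""
    parsed = {}
    for token in options.split(":"):
        if not token:
            continue
        if "=" in token:
            key, value = token.split("=", 1)
            parsed[key] = value  # values are kept as strings, as TMVA receives them
        elif token.startswith("!"):
            parsed[token[1:]] = False
        else:
            parsed[token] = True
    return parsed


# BDTG options from the 22_02_2015 training:
#   H / V           print help / verbose output (both off here)
#   NTrees          number of trees in the forest
#   MinNodeSize     minimum share of training events in a leaf
#   BoostType=Grad  gradient boosting; Shrinkage is the learning rate
#   UseBaggedBoost  each tree is grown on a random subsample (BaggedSampleFraction)
#   nCuts           grid points used when scanning a variable for the best cut
#   MaxDepth        maximum depth of each tree
BDTG_OPTIONS = ("!H:!V:NTrees=1000:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:"
                "UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=3")
opts = parse_tmva_options(BDTG_OPTIONS)
```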

Variables Used (trkLostMidFrac is always 0 in the Forest so I didn't use it):

  • trkMinLost
  • trkNHit
  • relPtErr := TMath::Abs(trkPtError/trkPt)
  • trkEta
  • Chi2perDOF := trkChi2/trkNdof
  • trkNlayersLost
  • trkNlayer3D
  • trkNlayer
  • trkNdof
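The two derived inputs in the list are simple ratios of HiForest branches. A sketch that builds the full input set for one track, assuming a dict keyed by the branch names used above and trkNdof > 0 (a track with zero degrees of freedom would need separate handling):

```python
def training_inputs(trk):
    """Inputs of the 22_02_2015 training for one track, in the order listed above."""
    return {
        "trkMinLost": trk["trkMinLost"],
        "trkNHit": trk["trkNHit"],
        "relPtErr": abs(trk["trkPtError"] / trk["trkPt"]),  # TMath::Abs(trkPtError/trkPt)
        "trkEta": trk["trkEta"],
        "Chi2perDOF": trk["trkChi2"] / trk["trkNdof"],      # trkChi2/trkNdof
        "trkNlayersLost": trk["trkNlayersLost"],
        "trkNlayer3D": trk["trkNlayer3D"],
        "trkNlayer": trk["trkNlayer"],
        "trkNdof": trk["trkNdof"],
    }


track = {"trkMinLost": 0, "trkNHit": 14, "trkPtError": 0.1, "trkPt": 2.0,
         "trkEta": 0.5, "trkChi2": 12.0, "trkNdof": 8, "trkNlayersLost": 0,
         "trkNlayer3D": 5, "trkNlayer": 11}
inputs = training_inputs(track)
```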
Samples Used:
  • trkAlgo = 4 or trkAlgo = 5
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2/HiForest_100_1_oCa.root (50 events, training)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2/HiForest_101_1_CET.root (50 events, training)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2/HiForest_102_1_LLd.root (50 events, testing)
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2/HiForest_103_1_IfN.root (50 events, testing)
  • trkAlgo = 6
    • /mnt/hadoop/cms/store/user/dgulhan/HiForest_Dijet_pthat80_740pre6_v2_merged/Hiforest_Dijet_pthat80_740pre6_merged.root (5800 events, split randomly 50/50 into training/testing trees)
Output File Locations:
  • Trees: /mnt/hadoop/cms/store/user/abaty/MVATrackSelection/TMVA_algo*.root
  • BDT weight files for implementation: /net/hisrv0001/home/abaty/TMVA/CMSSW_6_2_12_patch1/src/TMVA-v4.2.0/test/weightsStore/training_2_22_2015/
Suggested cuts (can be changed for our own needs):
  • BDTG4 > -0.047
  • BDTG5 > -0.02
  • BDTG6 > -0.041
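Applying the suggested cuts per track, assuming BDTGn is the classifier trained for trkAlgo == n and that a track passes when its BDTG value is strictly above the cut:

```python
# Suggested cuts from the 22_02_2015 training, keyed by trkAlgo (assumed BDTGn <-> trkAlgo n).
SUGGESTED_BDTG_CUTS = {4: -0.047, 5: -0.02, 6: -0.041}


def passes_bdtg(trk_algo, bdtg_value, cuts=SUGGESTED_BDTG_CUTS):
    """True if the track's BDTG output is above the cut for its algorithm."""
    if trk_algo not in cuts:
        raise ValueError("no suggested BDTG cut for trkAlgo=%d" % trk_algo)
    return bdtg_value > cuts[trk_algo]
```

Passing a different cuts dict is the place to tighten or loosen the working point for our own needs.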
  -- StanislavLisniak - 2015-02-09

Revision 2 - 2015-02-09 - StanislavLisniak

Line: 1 to 1
 
META TOPICPARENT name="StanislavLisniak"

MVA track selection for Heavy Ions reconstruction

Added:
>
>

Todo

  1. Train an MVA tree for one algorithm (e.g. Second) with the variables from MultiTrackSelector.cc (see below).
  2. Check that the output (track pT distribution) from MultiTrackSelector is the same as from TMVA.
  3. Train all other algorithms and make a general efficiency/fake-rate plot for all algorithms.

Deadlines

  • February 19 - tracking meeting
  • CMSSW_7_4_0_pre7: 3 February 2015 (last open prerelease) - oops, it has passed
  • CMSSW 7.5 will probably come out in May (if at all)
  • March 5 - end result at the Tracking meeting.
 

CMSSW usage

The MultiTrackSelector class in CMSSW 7 defines the track quality. The quality can be assigned either by cuts (if no MVA information is provided) or by decision tree(s).
Line: 24 to 37
 

GBRForest

Changed:
<
<
The MultiTrackSelector class uses the GBRForest class to store the set of decision trees. The forest can be read from a file or from the database. To produce a file with the GBRForest from the TMVA XML output, use GBRForestWriter from the RecoMET/METPUSubtraction package. The following code is a simplified version of writeGBRForests_cfg.py. It reads the file TMVAClassification_BDT.weights.xml with 4 variables 'var1+var2','var1-var2','var3','var4' and 2 spectators 'var1*2','var1*3', and saves the GBRForest object to the file GBRForestfile.root with the label HITrackMVAForest.
>
>
The MultiTrackSelector class uses the GBRForest class (github) to store the set of decision trees. The forest can be read from a file or from the database. To produce a file with the GBRForest from the TMVA XML output, use GBRForestWriter from the RecoMET/METPUSubtraction package. The following code is a simplified version of writeGBRForests_cfg.py. It reads the file TMVAClassification_BDT.weights.xml with 4 variables 'var1+var2','var1-var2','var3','var4' and 2 spectators 'var1*2','var1*3', and saves the GBRForest object to the file GBRForestfile.root with the label HITrackMVAForest.
 
import FWCore.ParameterSet.Config as cms
Line: 60 to 73
  The file can contain several forests, one for each step, each with a different GBRForestLabel.
Changed:
<
<
>
>

Response function

The GetClassifier function of the GBRForest class defines the response function that combines the outputs of all trees. It may differ from the one used in TMVA's MethodBDT; probably there is a simple mapping between the two.
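A toy version of a forest evaluation, to make the response question concrete. Assumptions to check against the release in use: the raw forest response F is an initial offset plus the leaf value selected by each tree, and GetClassifier maps it to 2/(1+exp(-2F)) - 1, which equals tanh(F). If TMVA's gradient-boosted BDT applies the same transform, the two outputs would coincide. The trees here are single-cut stumps rather than full GBRTree objects:

```python
import math

# Toy tree: (variable index, cut value, leaf value if x <= cut, leaf value if x > cut).
# Real trees are deeper, but each evaluation still ends in a single leaf value.


def forest_response(x, trees, initial=0.0):
    """Raw forest output F: initial offset plus the leaf value picked by each tree."""
    total = initial
    for var, cut, low, high in trees:
        total += high if x[var] > cut else low
    return total


def classifier(response):
    """Assumed GetClassifier transform: maps F into (-1, 1); equal to tanh(F)."""
    return 2.0 / (1.0 + math.exp(-2.0 * response)) - 1.0


trees = [(0, 0.5, -0.1, 0.2), (1, 2.0, 0.3, -0.4)]
response = forest_response([1.0, 3.0], trees)  # 0.2 + (-0.4)
score = classifier(response)
```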
  -- StanislavLisniak - 2015-02-09

Revision 1 - 2015-02-09 - StanislavLisniak

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="StanislavLisniak"

MVA track selection for Heavy Ions reconstruction

CMSSW usage

The MultiTrackSelector class in CMSSW 7 defines the track quality. The quality can be assigned either by cuts (if no MVA information is provided) or by decision tree(s).

The variables used are listed in MultiTrackSelector.cc, line 599.

They are:

  • tmva_lostmidfrac_
  • tmva_minlost_
  • tmva_nhits_
  • tmva_relpterr_
  • tmva_eta_
  • tmva_chi2n_no1dmod_
  • tmva_chi2n_
  • tmva_nlayerslost_
  • tmva_nlayers3D_
  • tmva_nlayers_
  • tmva_ndof_
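GBRForest evaluates its inputs as a plain float array by position, so the inputs must be filled in the same order as the variables in the TMVA training. A sketch of a guarded fill, assuming the list above reflects the order used in MultiTrackSelector.cc; the short names and the helper are hypothetical:

```python
# Order of the MVA inputs, as listed above (tmva_<name>_ members).
MVA_INPUT_ORDER = (
    "lostmidfrac", "minlost", "nhits", "relpterr", "eta",
    "chi2n_no1dmod", "chi2n", "nlayerslost", "nlayers3D", "nlayers", "ndof",
)


def to_input_vector(values):
    """Return the inputs as floats in MVA_INPUT_ORDER.

    A missing name raises KeyError instead of silently shifting the
    remaining inputs to the wrong positions."""
    missing = [name for name in MVA_INPUT_ORDER if name not in values]
    if missing:
        raise KeyError("missing MVA inputs: " + ", ".join(missing))
    return [float(values[name]) for name in MVA_INPUT_ORDER]
```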

GBRForest

The MultiTrackSelector class uses the GBRForest class to store the set of decision trees. The forest can be read from a file or from the database. To produce a file with the GBRForest from the TMVA XML output, use GBRForestWriter from the RecoMET/METPUSubtraction package. The following code is a simplified version of writeGBRForests_cfg.py. It reads the file TMVAClassification_BDT.weights.xml with 4 variables 'var1+var2','var1-var2','var3','var4' and 2 spectators 'var1*2','var1*3', and saves the GBRForest object to the file GBRForestfile.root with the label HITrackMVAForest.

import FWCore.ParameterSet.Config as cms
process = cms.Process("writeGBRForests")
# CV: needs to be set to 1 so that GBRForestWriter::analyze method gets called exactly once
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(1) )

process.source = cms.Source("EmptySource")
process.load('Configuration/StandardSequences/Services_cff')

process.gbrForestWriter = cms.EDAnalyzer("GBRForestWriter",
    jobs = cms.VPSet(
        cms.PSet(
            inputFileName = cms.string('TMVAClassification_BDT.weights.xml'),
            inputFileType = cms.string("XML"),
            inputVariables = cms.vstring(['var1+var2','var1-var2','var3','var4']),
            spectatorVariables = cms.vstring(['var1*2','var1*3']),
            gbrForestName = cms.string("HITrackMVAForest"),
            outputFileType = cms.string("GBRForest"),
            outputFileName = cms.string("GBRForestfile.root")
        )
    )
)
process.p = cms.Path(process.gbrForestWriter)

To use this file from MultiTrackSelector in one of the steps:

process.detachedTripletStepSelector.GBRForestFileName = cms.string("GBRForestfile.root")     
process.detachedTripletStepSelector.GBRForestLabel = cms.string("HITrackMVAForest")

The file can contain several forests, one for each step, each with a different GBRForestLabel.

-- StanislavLisniak - 2015-02-09

 