Jet Probability calibrations

Introduction

Jet-Probability taggers (JP and JBP) use as calibration the negative impact parameter significance distributions to calculate the probability that a track comes from the primary vertex. From the probabilities of all tracks associated to a jet, the probability (or rather: the confidence) that the jet comes from the primary vertex is calculated. A discriminator is then deduced from this probability.
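
For reference, jet-probability-style taggers combine the per-track probabilities P_i into the jet confidence with the standard combination sketched below (the exact implementation lives in the RecoBTag code):

    P_{\mathrm{jet}} = \Pi \sum_{k=0}^{N-1} \frac{(-\ln \Pi)^{k}}{k!},
    \qquad \Pi = \prod_{i=1}^{N} P_{i},

where N is the number of selected tracks in the jet.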

The calibration is performed independently on data and MC, but with similar selections. Data are from the JetHT primary dataset and MC is the QCD_Pt80to120 sample, requiring for both the HLT_PFJet80 trigger and using only (AK4) jets with pT > 30 GeV and |eta| < 2.5. To take into account track quality effects, which can influence the impact parameter significance distributions, different categories are defined.

On this twiki page you will find the category definitions and recipes for creating the calibration payloads.

Definition of track categories

Track selection

Tight tracks provided by the Tracking group are used. Only tracks selected for the JP and JBP taggers are considered (a code sketch of this selection follows the list):

  • dR (track, jet axis) < 0.3
  • pT > 1 GeV
  • chi^2 < 5
  • number of pixel hits >= 1 (new selection; it was >= 2 for versions < CMSSW_8_0_20)
  • number of pixel+strip hits >= 1 (new selection; it was >= 8 for versions < CMSSW_8_0_20)
  • |IPxy| < 0.2 cm and |dz| < 17 cm
  • |distance to jet axis| < 0.07 cm
  • decay length < 5 cm
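
A minimal sketch of this selection in Python (the field names on "trk" and "jet" are hypothetical; in CMSSW these cuts are applied inside the b-tagging code itself):

import math

def delta_r(eta1, phi1, eta2, phi2):
    # angular distance between the track and the jet axis
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def passes_jp_track_selection(trk, jet):
    # JP/JBP track selection as listed above (post-CMSSW_8_0_20 hit cuts)
    return (delta_r(trk.eta, trk.phi, jet.eta, jet.phi) < 0.3
            and trk.pt > 1.0                    # GeV
            and trk.chi2 < 5.0
            and trk.nPixelHits >= 1             # was >= 2 before CMSSW_8_0_20
            and trk.nPixelStripHits >= 1        # was >= 8 before CMSSW_8_0_20
            and abs(trk.ipXY) < 0.2             # cm
            and abs(trk.dz) < 17.0              # cm
            and abs(trk.distToJetAxis) < 0.07   # cm
            and trk.decayLength < 5.0)          # cm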

Category definition for versions <= CMSSW_8_X_Y (up to 2016 data and MC)

There are 10 categories defined as:

  • category 0: for tracks with a number of pixel hits = 1, p > 1 GeV and |eta| < 2.5
  • category 1: for tracks with chi^2 > 2.5, a number of pixel hits >= 2, p > 1 GeV and |eta| < 2.5

for tracks with chi^2 < 2.5:

  • categories 2-4: for |eta| in the ranges [0, 0.8], [0.8, 1.6] and [1.6, 2.5], with a number of pixel hits >= 3 and p < 8 GeV
  • category 5: with a number of pixel hits = 2 and p < 8 GeV
  • categories 6-8: for |eta| in the ranges [0, 0.8], [0.8, 1.6] and [1.6, 2.5], with a number of pixel hits >= 3 and p > 8 GeV
  • category 9: with a number of pixel hits = 2 and p > 8 GeV

For versions < CMSSW_8_0_20: the minimum number of pixel hits was >= 2, the total number of hits was >= 8.

For versions >= CMSSW_8_0_20: the minimum number of pixel hits is >= 1, the total number of hits is >= 1 (so >= 3 in practice for 3D tracking).

If the impact parameter significance of the track is > 50, the category number is set to -1.

Category definition for versions >= CMSSW_9_X_Y (from 2017 data and MC)

There are 10 categories defined as:

  • category 0: for tracks without a hit in the first pixel layer, but with >= 1 pixel hit in total, p >= 0.1 GeV, any eta
  • category 1: for tracks with a hit in the first pixel layer, but with <= 3 pixel hits in total, p >= 0.1 GeV, any eta

for tracks with a hit in the first pixel layer and with >= 4 pixel hits in total:

  • categories 2-4: for |eta| < 1 and p in the ranges 0.1-3, 3-6 and > 6 GeV, respectively
  • categories 5-7: for 1 < |eta| < 2 and p in the ranges 0.1-6, 6-12 and > 12 GeV, respectively
  • categories 8-9: for |eta| > 2 and p in the ranges 0.1-18 and > 18 GeV, respectively

If the impact parameter significance of the track is > 50, the category number is set to -1.
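
A minimal sketch of this category assignment in Python (function and argument names are illustrative; the thresholds are the ones listed above, and p >= 0.1 GeV is assumed for all categories):

def jp_track_category(has_first_pixel_hit, n_pixel_hits, p, eta, ip_sig):
    # Category definition for versions >= CMSSW_9_X_Y.
    if ip_sig > 50:
        return -1                # impact parameter significance too large
    if not has_first_pixel_hit:
        return 0                 # no hit in first pixel layer, >= 1 pixel hit in total
    if n_pixel_hits <= 3:
        return 1                 # hit in first pixel layer, <= 3 pixel hits in total
    # hit in the first pixel layer and >= 4 pixel hits in total:
    if abs(eta) < 1.0:
        return 2 if p < 3.0 else 3 if p < 6.0 else 4
    if abs(eta) < 2.0:
        return 5 if p < 6.0 else 6 if p < 12.0 else 7
    return 8 if p < 18.0 else 9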

NEW RECIPE to create the calibration payload: for CMSSW versions >= 7_X_X

1. Run the BTagAnalyzer code:

see https://twiki.cern.ch/twiki/bin/viewauth/CMS/BTagAnalyzer and

  • request track information in runBTagAnalyzer_cfg.py, with:

process.btagana.useSelectedTracks = True

process.btagana.produceJetTrackTree = True

  • select only events passing the HLT_PFJet80 trigger (or any relevant trigger, which should just be the same in Data and MC): replace the line smalltree->Fill(); by if ( PFJet80 ) smalltree->Fill();

2. Read the produced Ntuples and create a calibration .xml file

see https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideBTagJetProbabilityCalibration#The_IPHC_JPstudies_package

  • copy IPHC_JPstudies to your user directory, outside CMSSW (!!)
  • cd userdirectory/IPHC_JPstudies/CategoryDef
  • update JetProbaCalib.h to match the Ntuple format (the default is for 7xx)
  • update JetProbaCalib.C for the eventual jet pT cut (the default is > 30 GeV)
  • update RunJetProbaCalib.C with the input Ntuple name
  • the macro as provided works in ROOT 5; if you use ROOT >= 6, replace gROOT->ProcessLine() by R__LOAD_LIBRARY(CategoryDef.C++) and make sure the macro loads the compiled library before running
  • then: root -b RunJetProbaCalib.C -q
  • the output is Histo_25.xml: replace there all the "::" by "__" (a sketch follows this list)
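
The "::" to "__" replacement can be done with sed or, equivalently, with a minimal Python sketch:

# rewrite Histo_25.xml with every "::" replaced by "__"
with open("Histo_25.xml") as f:
    text = f.read()
with open("Histo_25.xml", "w") as f:
    f.write(text.replace("::", "__"))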

3. Convert the text .xml file into a sqlite .db file:

(see https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideBTagJetProbabilityCalibration#For_CMSSW_5_3_X_releases)

4. Export the calibration file to the conditions database

  • from lxplus, in a similar CMSSW directory:
  • a valid CRAB account is needed!

cd CMSSW_7_x_x/src

cmsenv

~/bin/uploadConditions.py ~/JPcalib_name/JPcalib_name_vi.db

  • answer the questions: type "enter" for the default
  • only the non-default answers are listed below
inputTag: 1 (for probBTagPDF3D_tag_mc)

destinationDatabase: oracle://cms_orcon_prod/CMS_CONDITIONS (for prod) or oracle://cms_orcoff_prep/CMS_CONDITIONS (for prep)

for MC: since []: 1

userText []: JP calibration from QCD MC in RunIISpring15DR74-Asympt25ns_MCRUN2_74_V9

destinationTag []: JPcalib_MC74X_25ns_vi

for Data: since []: 246908 (first run number for 25ns data)

userText []: JP calibration from 2015D JetHT with PFJet80

destinationTag []: JPcalib_Data76X_2015D_vi

5. Restart as in step 1, after having activated the new payload in runBTagAnalyzer_cfg.py:

  • trkProbaCalibTag = "JPcalib_Data74X_2015D_vi"
  • check that the track probability distribution is flat with the new payload

OLD RECIPE to create the calibration payload: for CMSSW versions <= 6_X_X

Here follows the old recipe on how to use the RecoBTag/ImpactParameterLearning package to produce the standard calibrations, but also how to define your own calibration.

The main procedure to create new calibrations using crab was the following:

  • run the module ImpactParameterLearning using crab and get a collection of calibration xml files
  • run the module SumHistoCalibration to merge all calibration files into one final file in binary, XML or sqlite format

The ImpactParameterLearning package can be used to produce calibrations from Reco/AOD data samples. It takes as input a TrackIPTagInfoCollection and creates negative impact parameter significance distributions from the "selected tracks". Output histograms can be produced in the XML file format (which can be used for calibration studies) or in sqlite format (which is used to interact with the database).

To produce calibrations using a large number of events, the following strategy can be used. First, using crab, the ImpactParameterLearning module produces a list of calibration files in XML format. Then the SumHistoCalibration module can be used to merge all these histograms into a single set of histograms written to an XML file or a sqlite file. This last file can be used to recalculate track probabilities and the JetProbability taggers.

For CMSSW_1_8_4 release

The following tag can be used to produce calibrations running over the pre-CSA08 fastsim samples.

       cvs co -r jetProba_1_8_4 RecoBTag/ImpactParameterLearning
   

For CMSSW_2_2_X releases

The following tag can be used with CMSSW 2_2_X releases.

       cvs co -r jetProba_Calib_2_2_X RecoBTag/ImpactParameterLearning
   

All the useful cfg files were converted into py files.

For CMSSW_3_1_X releases

The following tag can be used with CMSSW 3_1_X releases.

       cvs co -r jetProba_Calib_3_1_X RecoBTag/ImpactParameterLearning
   

An example of a configuration file to produce DQM validation histograms from the new calibration is also provided: RecoBTag/ImpactParameterLearning/test/histoMaker.py

For CMSSW_5_3_X releases

       cvs co RecoBTag/PerformanceMeasurements/
       cvs co -r V01-04-09 RecoBTag/ImpactParameter/
       cvs co -d bTag/CommissioningCommonSetup UserCode/bTag/CommissioningCommonSetup
       cvs co -r V02-00-01 RecoBTag/ImpactParameterLearning
   

By default, the BTagAnalyzer tree is produced, to perform checks and validations.

How to use it.

An example configuration file can be found in RecoBTag/ImpactParameterLearning/calib_cfg.py. Here is an example of the ImpactParameterCalibration module definition:

 process.ipCalib = cms.EDFilter("ImpactParameterCalibration",
    writeToDB = cms.bool(False),
    writeToBinary = cms.bool(True),
    nBins = cms.int32(10000),
    maxSignificance = cms.double(50.0),
    writeToRootXML = cms.bool(True),
    tagInfoSrc = cms.InputTag("impactParameterTagInfos"),
    inputCategories = cms.string('HardCoded'),
    primaryVertexSrc = cms.InputTag("offlinePrimaryVertices")
)
   

Parameters of interest are writeToDB and writeToRootXML, which allow you to write the calibrations in sqlite or XML format. The parameter inputCategories sets the way the categories are defined: HardCoded means that the standard category definition is used (hardcoded in the ImpactParameterCalibration module), while RootXML takes the categories defined in an XML file. This last option allows you to define your own calibration and to fill it. The input XML file locations then have to be defined with the parameters calibFile3d and calibFile2d, as in the sketch below.
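
A minimal sketch of the RootXML variant (assuming calibFile2d and calibFile3d take the file locations as plain cms.string values; the file names here are illustrative):

# use categories defined in external XML files instead of the hard-coded ones
process.ipCalib.inputCategories = cms.string('RootXML')
process.ipCalib.calibFile2d = cms.string('myCategories2D.xml')  # illustrative path
process.ipCalib.calibFile3d = cms.string('myCategories3D.xml')  # illustrative path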

When a sqlite output is required, the following service needs to be added to your config file:

 process.PoolDBOutputService = cms.Service("PoolDBOutputService",
    authenticationMethod = cms.untracked.uint32(1),
    loadBlobStreamer = cms.untracked.bool(True),
    catalog = cms.untracked.string('file:mycatalog_new.xml'),
    DBParameters = cms.PSet(
        messageLevel = cms.untracked.int32(0),
        authenticationPath = cms.untracked.string('.')
    ),
    timetype = cms.untracked.string('runnumber'),
    connect = cms.string('sqlite_file:btagnew_new.db'),
    toPut = cms.VPSet(
        cms.PSet(
            record = cms.string('BTagTrackProbability2DRcd'),
            tag = cms.string('probBTagPDF2D_tag')
        ),
        cms.PSet(
            record = cms.string('BTagTrackProbability3DRcd'),
            tag = cms.string('probBTagPDF3D_tag')
        )
    )
)

where "btagnew_new.db" is the name of the output file.

Using crab, you should be able to produce a list of XML files. The SumHistoCalibration module can be used to sum all the histograms contained in these XML files and to produce a final sqlite file which can be used by the JetProbability algorithms. An example config file can be found in RecoBTag/ImpactParameterLearning/test/sumXMLs_cfg.py. The module definition is:

process.sumCalib = cms.EDFilter("SumHistoCalibration",
    xmlfiles2d = cms.vstring("RecoBTag/ImpactParameterLearning/test/2d_1.xml", "RecoBTag/ImpactParameterLearning/test/2d_2.xml"),
    xmlfiles3d = cms.vstring("RecoBTag/ImpactParameterLearning/test/3d_1.xml", "RecoBTag/ImpactParameterLearning/test/3d_2.xml"),
    sum2D          = cms.bool(True),
    sum3D          = cms.bool(True),
    writeToDB      = cms.bool(True),
    writeToRootXML = cms.bool(False),
    writeToBinary  = cms.bool(False)
)

where xmlfiles2d and xmlfiles3d correspond to the lists of XML calibration files you want to sum: one list for the 2D and one for the 3D impact parameter significance calibration. sum2D and sum3D are booleans used to switch the summation of the histograms on or off. writeToDB, writeToRootXML and writeToBinary define the output format.

The IPHC_JPstudies package

To use this package, first install a CMSSW project:

cmsrel CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
Then clone the package from git:
git clone https://github.com/IPHC/IPHC_JPstudies.git
and finally compile:
scramv1 b 

Create your own categories

WARNING: you need at least ROOT version 5.34/18 to run all these codes!!!

Go to the folder "CategoryDef". Here you have several pieces of code to create your own calibrations. First, CategoryDef.C/.h defines the variables used in the calibrations. By default, only the variables that enter the official calibration appear (see section "Variables used"). If you want, for example, to add a category depending on the decay length, you have to add the decay length in the function "Copy". Be careful: to really use your new variable, other codes have to be changed as well (see later).

In CategoriesDefinition.C/.h, you define all the categories for your own calibration: you simply set a minimum and a maximum for each variable used, for each category.

Finally, you can produce your calibration file with JetProbaCalib.C/.h. Check the function "IsInCategory" in case you added a new variable to the categories. If you want to change the track selection, edit the function "passTrackSel". In both cases, don't forget to change the .h accordingly. You can run this code with RunJetProbaCalib.C in ROOT:

.x RunJetProbaCalib.C
In this code, you can choose on which sample you want to run to create your calibrations.

By default, the output is a ROOT file named "calibeHistoWrite_std.root" containing the histograms of all your categories.

Check your calibration

Now that you have created your new calibration, let's validate it. Use directly RunJetProbaValidation.C:

.x RunJetProbaValidation.C

but first change the final line "theanalyzer->ComputeProba("calibeHistoWrite.root");" to point to the proper calibration file. The output file "JP_myCalib.root" will contain the histogram "trackPCat_all"; if your calibrations are fine, it should be flat (for that, don't forget that you have to run on the same events as the ones used to produce the calibrations).
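
A minimal PyROOT sketch of such a flatness check (assuming the file and histogram names above; the constant-fit test is an illustrative choice):

import ROOT

# fit the track probability distribution with a constant (pol0);
# a well-calibrated sample should be compatible with a flat line
f = ROOT.TFile.Open("JP_myCalib.root")
h = f.Get("trackPCat_all")
fitres = h.Fit("pol0", "SQ").Get()   # "S" returns the fit result, "Q" is quiet
print("chi2/ndf = %.2f" % (fitres.Chi2() / fitres.Ndf()))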

Compare performances

In the folder IPHC_JPstudies, you can use HighPtStudy.C/.h to compare the performance of different calibrations. You have to change the functions "IsInCategory" and "passTrackSel" if you changed them in JetProbaCalib.C. To run the code, use RunHighPtStudy.C in ROOT:

.x RunHighPtStudy.C
In that code, plug in your new calibrations by changing the last line in an appropriate way:
theanalyzer->Loop("CategoryDef/CalibrationFiles/calibeHistoWrite_std.root");

You can also change the input files; if you want to validate your calibration, they should be the same as the ones used to produce the calibration.

As output, you will get another ROOT file, "study_histo.root", containing the plots needed to make the efficiency curves. To get these plots, use the code "DoPlots.C": in the last function, DoPlots(), simply change the inputs of DoPlots_perf, for example:

DoPlots_perf("JetProba", "JetProbaNew","study_histo_file_reference.root","study_histo_containing_newCalib.root"); 
and then run the code.

Calibration in 31x to 36x releases

In your configuration file, you need to add the following line to use the global tag:

process.load('Configuration/StandardSequences/FrontierConditions_GlobalTag_cff')

Then, add the following lines to read the new calibrations from a sqlite file:

from CondCore.DBCommon.CondDBCommon_cfi import *
process.load("RecoBTag.TrackProbability.trackProbabilityFakeCond_cfi")
process.trackProbabilityFakeCond.connect = "sqlite_fip:calibefile.db"
process.es_prefer_trackProbabilityFakeCond = cms.ESPrefer("PoolDBESSource","trackProbabilityFakeCond")

"calibefile.db" is the name of the calibration file you have produced.

If you want to run with the new calibration on crab, you should add the following lines to the config file, replacing the previous lines used for interactive running.

Each .db file contains two different tags, the first one for the 2D calibration and the second one for the 3D calibration. If you have produced your personal .db files, the tags they contain need to be put in the database (contact the DB expert); then add to your config file the following lines with the names that you gave to your tags.

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("name_tag_2D"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("name_tag_3D"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU"))
)

There are also some tags already available in the database that contain the new calibration for Data, MC ideal alignment and MC startup alignment. You can use the same lines as above and replace name_tag_2D and name_tag_3D with the corresponding names; these tags were produced with the 36X release.

Calibration in 39x release:

For 39x Data, use the default Jet Probability calibration from the GlobalTag.

For 39x MC, the sqlite file can be found at: /afs/cern.ch/user/c/cferro/public/btagnew_mc_39X_QCD.db

  • To run cmsRun interactively, add in your .py:

from CondCore.DBCommon.CondDBCommon_cfi import *
process.load("RecoBTag.TrackProbability.trackProbabilityFakeCond_cfi")
process.trackProbabilityFakeCond.connect = cms.string("sqlite_fip:RecoBTag/PerformanceMeasurements/test/btagnew_mc_39X_QCD.db")
process.es_prefer_trackProbabilityFakeCond = cms.ESPrefer("PoolDBESSource","trackProbabilityFakeCond")
 

  • Using CRAB, add instead in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_Qcd80to120Winter10_v1_mc"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_Qcd80to120Winter10_v1_mc"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU"))
)

and in your .cfg:

additional_input_files = btagnew_mc_39X_QCD.db

Calibration in 41x and 42x Data, 42x and 44x (START44_V5) MC:

The default Jet Probability Calibration from the GlobalTag is not optimal and needs to be replaced in the following way:

  • To run cmsRun interactively:

The sqlite files can be found in /afs/cern.ch/user/c/cferro/public/ (copy the file you need to your directory):

btagnew_Data_2010_41X.db (for 2010 data),

btagnew_Data_2011_41X.db (for 2011 data),

btagnew_MC_414_2011.db (for Spring11 and Summer11 MC).

and add in your .py (just use the right .db line):

from CondCore.DBCommon.CondDBCommon_cfi import *
process.load("RecoBTag.TrackProbability.trackProbabilityFakeCond_cfi")
process.trackProbabilityFakeCond.connect = cms.string(
# "sqlite_fip:RecoBTag/PerformanceMeasurements/test/btagnew_Data_2010_41X.db")
# "sqlite_fip:RecoBTag/PerformanceMeasurements/test/btagnew_Data_2011_41X.db")
"sqlite_fip:RecoBTag/PerformanceMeasurements/test/btagnew_MC_414_2011.db")
process.es_prefer_trackProbabilityFakeCond = cms.ESPrefer("PoolDBESSource","trackProbabilityFakeCond")
 

  • Using CRAB:

Comment the previous lines and add instead in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
#        tag = cms.string("TrackProbabilityCalibration_2D_2010Data_v1_offline"),
#        tag = cms.string("TrackProbabilityCalibration_2D_2011Data_v1_offline"),
       tag = cms.string("TrackProbabilityCalibration_2D_2011_v1_mc"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
#        tag = cms.string("TrackProbabilityCalibration_3D_2010Data_v1_offline"),
#        tag = cms.string("TrackProbabilityCalibration_3D_2011Data_v1_offline"),
       tag = cms.string("TrackProbabilityCalibration_3D_2011_v1_mc"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU"))
)

where the new tags are:

  • 2010 data: TrackProbabilityCalibration_2D_2010Data_v1_offline and TrackProbabilityCalibration_3D_2010Data_v1_offline
  • 2011 data: TrackProbabilityCalibration_2D_2011Data_v1_offline and TrackProbabilityCalibration_3D_2011Data_v1_offline
  • Spring11 and Summer11 MC: TrackProbabilityCalibration_2D_2011_v1_mc and TrackProbabilityCalibration_3D_2011_v1_mc

So just include the appropriate ones in your cfg as above.

Calibration in 44x Data (Nov2011 Rereco) and Fall11 START44_V9B MC:

For the Data, the default Jet Probability calibration can be used.

For the MC, the default Jet Probability Calibration from the GlobalTag is not optimal and needs to be replaced in the following way when using CRAB:

  • add in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_MC_80_Ali44_v1"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_MC_80_Ali44_v1"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU"))
)

The jet probability tagger has to be re-run and so the general track collection and the PF jets have to be in the edm files.

Calibration in 52x Data and MC:

For the Monte Carlo, no new calibration is necessary yet.

For 52x Data in 2012A and 2012B, the default Jet Probability Calibration from the GlobalTag is not optimal and needs to be replaced in the following way:

  • Using CRAB, when reading AOD of the PromptReco data,

add in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_2012DataTOT_v1_offline"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_2012DataTOT_v1_offline"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU"))
)

The jet probability tagger has to be re-run and so the general track collection and the PF jets have to be in the edm files.

Calibration in 53x Data and MC:

For 53x reprocessed Data from 22Jan2013, the default JP calibration used in the AOD (and RECO) productions is fine.

For 53x Data (prompt-reco data and reprocessings prior to 22Jan2013) and for 53x MC, the default Jet Probability Calibration from the GlobalTag is not optimal and needs to be replaced in the following way, when using CRAB:

  • Data, add in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_Data53X_v2"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_Data53X_v2"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU"))
)

  • MC, add in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_MC53X_v2"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_MC53X_v2"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU"))
)

The jet probability tagger has to be re-run and so the general track collection and the PF jets have to be in the edm files.

Validation Tools, Calibration Studies

An example of a configuration file to produce the official b-tag POG DQM validation histograms from the new calibration is also provided: RecoBTag/ImpactParameterLearning/test/histoMaker.py.

-- DanielBloch - 22-Mar-2011
