Jet Probability calibrations

Introduction

Jet-Probability taggers use as calibration the negative impact parameter significance distributions to calculate the probability that a track comes from the primary vertex. Given the probabilities of all tracks associated to a jet, the probability that this jet comes from the primary vertex is calculated. A discriminator is then deduced from this probability.
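As an illustration of this combination, here is a minimal sketch (not code from the package; the helper name and the final -log10 discriminator convention are assumptions) of how per-track probabilities are turned into a jet probability with the usual product formula:

import math

def jet_probability(track_probs):
    # Combine per-track probabilities (each one the probability that the
    # track comes from the primary vertex) with the usual product formula:
    # P_jet = Pi * sum_{k=0}^{N-1} (-ln Pi)^k / k!, where Pi = product of p_i.
    pi = 1.0
    for p in track_probs:
        pi *= max(p, 1e-30)  # protect the logarithm against p = 0
    log_pi = -math.log(pi)
    return pi * sum(log_pi**k / math.factorial(k) for k in range(len(track_probs)))

# example: three tracks, one of them clearly displaced
p_jet = jet_probability([0.9, 0.7, 0.001])
discriminator = -math.log10(p_jet)  # assumed discriminator convention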

These b-taggers need calibrations, which are the negative impact parameter significance distributions extracted from data (QCD multi-jet events). To take into account track quality effects (which can influence the impact parameter significance distributions), several categories are set up, each corresponding to a different track quality.

On this twiki page you will find the category definitions, an explanation of how to use the RecoBTag/ImpactParameterLearning package to produce the standard calibrations, and also how to define your own calibration. Some validation/study tools will also be described (not yet implemented).

The main procedure to create new calibrations using crab is the following:

  • run the module ImpactParameterLearning using crab and get a collection of calibration xml files
  • run the module SumHistoCalibration to merge all calibration files into one final file in binary, XML or sqlite format

The categories definition

Variables used

The track categories are defined using the following track variables:

  • chi^2,
  • momentum,
  • eta,
  • number of hits,
  • number of pixel hits,
  • presence of a hit on the first pixel layer (not yet used),
  • trackQuality (not yet used).

For CMSSW releases higher than CMSSW_2_0_0, the category definition should include the trackQuality variable provided by the Tracking group.

Categories definition

The categories used are the ones created for the PTDR. There are 9 categories, defined as follows (an illustrative sketch is given after the list):

  • a single category for tracks with chi^2 > 2.5

and, for tracks with chi^2 < 2.4:

  • 3 categories for |eta| in the ranges [0,0.8], [0.8,1.6] and [1.6,2.5], with a number of pixel hits >= 3 and p < 8
  • 1 category with a number of pixel hits = 2 and p < 8
  • 3 categories for |eta| in the ranges [0,0.8], [0.8,1.6] and [1.6,2.5], with a number of pixel hits >= 3 and p > 8
  • 1 category with a number of pixel hits = 2 and p > 8
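
A minimal sketch of this classification is given below; the function name, the assumed GeV units of the momentum cut and the handling of the chi^2 gap between 2.4 and 2.5 are illustrative assumptions, not code taken from the package.

def track_category(chi2, eta, p, n_pixel_hits):
    # Assign a track to one of the 9 PTDR-style categories (0-8), following
    # the list above; boundary handling is an assumption of this sketch.
    if chi2 > 2.5:
        return 0                                      # single high-chi2 category
    eta_bin = 0 if abs(eta) < 0.8 else (1 if abs(eta) < 1.6 else 2)
    if n_pixel_hits >= 3:
        return 1 + eta_bin if p < 8 else 5 + eta_bin  # three |eta| bins per momentum range
    return 4 if p < 8 else 8                          # tracks with exactly 2 pixel hits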

The RecoBTag/ImpactParameterLearning package

The ImpactParameterLearning package can be used to produce calibrations from Reco/AOD data samples. It takes as input a TrackIPTagInfoCollection and creates negative impact parameter significance distributions from the "selected tracks". The output histograms can be produced in the XML file format (which can be used for calibration studies) or in sqlite format (which is used to interact with the database).

To produce calibrations using a large amount of events, the following strategy can be used. First, using crab, the ImpactParameterLearning module produces a list of calibration files in XML format. Then the SumHistoCalibration module can be used to merge all these histograms into a single set of histograms, written to an XML file or a sqlite file. This last file can be used to recalculate the track probabilities and the JetProbability taggers.

For CMSSW_1_8_4 release

The following tag can be used to produce calibrations running over the pre-CSA08 fastsim samples.

       cvs co -r jetProba_1_8_4 RecoBTag/ImpactParameterLearning
   

For CMSSW_2_2_X releases

The following tag can be used with CMSSW 2_2_X releases.

       cvs co -r jetProba_Calib_2_2_X RecoBTag/ImpactParameterLearning
   

All the relevant cfg files have been converted into py files.

For CMSSW_3_1_X releases

The following tag can be used with CMSSW 3_1_X releases.

       cvs co -r jetProba_Calib_3_1_X RecoBTag/ImpactParameterLearning
   

An example of configuration file to produce DQM validation histograms from the new calibration is also provided: RecoBTag/ImpactParameterLearning/test/histoMaker.py

For CMSSW_5_3_X releases

       cvs co RecoBTag/PerformanceMeasurements/
       cvs co -r V01-04-09 RecoBTag/ImpactParameter/
       cvs co -d bTag/CommissioningCommonSetup UserCode/bTag/CommissioningCommonSetup
       cvs co -r V02-00-01 RecoBTag/ImpactParameterLearning
   

By default, the BTagAnalyzer tree is produced, to perform checks and validations.

How to use it

An example of configuration file can be found in RecoBTag/ImpactParameterLearning/calib_cfg.py. Here is an example of the ImpactParameterCalibration module definition:

 process.ipCalib = cms.EDFilter("ImpactParameterCalibration",
    writeToDB = cms.bool(False),
    writeToBinary = cms.bool(True),
    nBins = cms.int32(10000),
    maxSignificance = cms.double(50.0),
    writeToRootXML = cms.bool(True),
    tagInfoSrc = cms.InputTag("impactParameterTagInfos"),
    inputCategories = cms.string('HardCoded'),
    primaryVertexSrc = cms.InputTag("offlinePrimaryVertices")
)
   

Parameters of interest are writeToDB and writeToRootXML, which allow you to write calibrations in sqlite or XML format. The parameter inputCategories sets the way the categories are defined: HardCoded means that the standard category definition is used (hardcoded in the ImpactParameterCalibration module), while RootXML takes the categories defined in an xml file. This last option allows you to define your own calibration and to fill it. The input xml file locations have to be defined with the parameters calibFile3d and calibFile2d.
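
For example, switching the module above to a user-defined category definition could look like the following sketch (the xml file names are placeholders, and plain cms.string parameters are assumed here; check the actual parameter types in the module):

# sketch: read the category definition from custom xml files instead of the
# hardcoded one (file names below are placeholders)
process.ipCalib.inputCategories = cms.string('RootXML')
process.ipCalib.calibFile2d = cms.string('my_categories_2d.xml')
process.ipCalib.calibFile3d = cms.string('my_categories_3d.xml')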

When an sqlite output is required, the following module needs to be added to your config file:

 process.PoolDBOutputService = cms.Service("PoolDBOutputService",
    authenticationMethod = cms.untracked.uint32(1),
    loadBlobStreamer = cms.untracked.bool(True),
    catalog = cms.untracked.string('file:mycatalog_new.xml'),
    DBParameters = cms.PSet(
        messageLevel = cms.untracked.int32(0),
        authenticationPath = cms.untracked.string('.')
    ),
    timetype = cms.untracked.string('runnumber'),
    connect = cms.string('sqlite_file:btagnew_new.db'),
    toPut = cms.VPSet(cms.PSet(
        record = cms.string('BTagTrackProbability2DRcd'),
        tag = cms.string('probBTagPDF2D_tag')
    ), 
        cms.PSet(
            record = cms.string('BTagTrackProbability3DRcd'),
            tag = cms.string('probBTagPDF3D_tag')
        ))
)

where "btagnew_new.db" is the name of the output file.

Using crab, you should be able to produce a list of xml files. The module SumHistoCalibration can be used to sum all the histograms contained in these xml files and to produce a final sqlite file which can be used by the JetProbability algorithms. An example of config file can be found in RecoBTag/ImpactParameterLearning/test/sumXMLs_cfg.py. The module definition is:

process.sumCalib = cms.EDFilter("SumHistoCalibration",
    xmlfiles2d = cms.vstring("RecoBTag/ImpactParameterLearning/test/2d_1.xml", "RecoBTag/ImpactParameterLearning/test/2d_2.xml"),
    xmlfiles3d = cms.vstring("RecoBTag/ImpactParameterLearning/test/3d_1.xml", "RecoBTag/ImpactParameterLearning/test/3d_2.xml"),
    sum2D = cms.bool(True),
    sum3D = cms.bool(True),
    writeToDB = cms.bool(True),
    writeToRootXML = cms.bool(False),
    writeToBinary = cms.bool(False)
)

where xmlfiles2d and xmlfiles3d correspond to the lists of xml calibration files you want to sum; there is one list for the 2D and one for the 3D impact parameter significance calibration. sum2D and sum3D are booleans used to switch the summation of the histograms on or off. writeToDB, writeToRootXML and writeToBinary define the output format.

Calibration in 31x to 36x releases

In your configuration file, you need to add the following line to use the global tag:

process.load('Configuration/StandardSequences/FrontierConditions_CMS.GlobalTag_cff')

Then, add the following lines to read the new calibrations from a sqlite file:

from CondCore.DBCommon.CondDBCommon_cfi import *
process.load("RecoBTag.TrackProbability.trackProbabilityFakeCond_cfi")
process.trackProbabilityFakeCond.connect = "sqlite_fip:calibefile.db"
process.es_prefer_trackProbabilityFakeCond = cms.ESPrefer("PoolDBESSource","trackProbabilityFakeCond")

"calibefile.db" is the name of the calibration file you have produced.

If you want to run with the new calibration using crab, you should add the following lines in the config file, replacing the previous lines that you added to run interactively.

Each .db file contains two different tags, the first one for the 2D calibration and the second one for the 3D calibration. If you have produced your own db file, its tags need to be uploaded to the database (contact the DB expert); then add in your config file the following lines, with the names that you gave to your tags:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("name_tag_2D"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("name_tag_3D"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU"))
)

There are also some tags already available in the database that contain the new calibrations for Data, MC ideal alignment and MC startup alignment. So you can use the same lines as above and replace name_tag_2D and name_tag_3D with the corresponding tag names.

These tags were produced with the 36X release.

Calibration in 39x release:

For 39x Data, use the default Jet Probability calibration from the GlobalTag.

For 39x MC, the sqlite file can be found at: /afs/cern.ch/user/c/cferro/public/btagnew_mc_39X_QCD.db

  • To run cmsRun interactively, add in your .py:

from CondCore.DBCommon.CondDBCommon_cfi import *
process.load("RecoBTag.TrackProbability.trackProbabilityFakeCond_cfi")
process.trackProbabilityFakeCond.connect =cms.string( "sqlite_fip:RecoBTag/PerformanceMeasurements/test/btagnew_mc_39X_QCD.db")
process.es_prefer_trackProbabilityFakeCond = cms.ESPrefer("PoolDBESSource","trackProbabilityFakeCond")
 

  • Using CRAB, add instead in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_Qcd80to120Winter10_v1_mc"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_Qcd80to120Winter10_v1_mc"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU"))
)

and in your .cfg:

additional_input_files = btagnew_mc_39X_QCD.db

Calibration in 41x and 42x Data, 42x and 44x (START44_V5) MC:

The default Jet Probability Calibration from the GlobalTag is not optimal and needs to be replaced in the following way:

  • To run cmsRun interactively:

The sqlite files can be found in /afs/cern.ch/user/c/cferro/public/ (copy the relevant file to your directory):

btagnew_Data_2010_41X.db (for 2010 data),

btagnew_Data_2011_41X.db (for 2011 data),

btagnew_MC_414_2011.db (for Spring11 and Summer11 MC).

and add in your .py (just use the right .db line):

from CondCore.DBCommon.CondDBCommon_cfi import *
process.load("RecoBTag.TrackProbability.trackProbabilityFakeCond_cfi")
process.trackProbabilityFakeCond.connect =cms.string(
# "sqlite_fip:RecoBTag/PerformanceMeasurements/test/btagnew_Data_2010_41X.db")
# "sqlite_fip:RecoBTag/PerformanceMeasurements/test/btagnew_Data_2011_41X.db")
"sqlite_fip:RecoBTag/PerformanceMeasurements/test/btagnew_MC_414_2011.db")
process.es_prefer_trackProbabilityFakeCond = cms.ESPrefer("PoolDBESSource","trackProbabilityFakeCond")
 

  • Using CRAB:

Comment the previous lines and add instead in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
#        tag = cms.string("TrackProbabilityCalibration_2D_2010Data_v1_offline"),
#        tag = cms.string("TrackProbabilityCalibration_2D_2011Data_v1_offline"),
       tag = cms.string("TrackProbabilityCalibration_2D_2011_v1_mc"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
#        tag = cms.string("TrackProbabilityCalibration_3D_2010Data_v1_offline"),
#        tag = cms.string("TrackProbabilityCalibration_3D_2011Data_v1_offline"),
       tag = cms.string("TrackProbabilityCalibration_3D_2011_v1_mc"),
       connect = cms.untracked.string("frontier://FrontierProd/CMS_COND_31X_BTAU"))
)

where the new tags are:

  • 2010 data: TrackProbabilityCalibration_2D_2010Data_v1_offline and TrackProbabilityCalibration_3D_2010Data_v1_offline
  • 2011 data: TrackProbabilityCalibration_2D_2011Data_v1_offline and TrackProbabilityCalibration_3D_2011Data_v1_offline
  • Spring11 and Summer11 MC: TrackProbabilityCalibration_2D_2011_v1_mc and TrackProbabilityCalibration_3D_2011_v1_mc

So just include the appropriate ones in your .py as above.

Calibration in 44x Data (Nov2011 Rereco) and Fall11 START44_V9B MC:

For the Data, the default Jet Probability calibration can be used.

For the MC, the default Jet Probability Calibration from the GlobalTag is not optimal and needs to be replaced in the following way when using CRAB:

  • add in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_MC_80_Ali44_v1"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_MC_80_Ali44_v1"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU"))
)

The Jet Probability tagger has to be re-run, and so the general track collection and the PF jets have to be present in the EDM files.
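
As a sketch of what re-running the tagger can look like (the RecoBTag module and sequence names below are the standard ones but should be checked against your release; the PF jet collection and the path name are assumptions):

# sketch: re-run the impact parameter tag infos and the Jet Probability tagger
# so that the new calibration is picked up
process.load("RecoJets.JetAssociationProducers.ak5JTA_cff")
process.load("RecoBTag.Configuration.RecoBTag_cff")
# point the track association to the PF jet collection (assumed: ak5PFJets)
process.ak5JetTracksAssociatorAtVertex.jets = cms.InputTag("ak5PFJets")
process.jpRerun = cms.Path(
    process.ak5JetTracksAssociatorAtVertex *
    process.impactParameterTagInfos *
    process.jetProbabilityBJetTags
)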

Calibration in 52x Data and MC:

For the Monte Carlo, no new calibration is necessary yet.

For 52x Data in 2012A and 2012B, the default Jet Probability Calibration from the GlobalTag is not optimal and needs to be replaced in the following way:

  • Using CRAB: when reading AOD of the PromptReco Data,

add in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_2012DataTOT_v1_offline"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_2012DataTOT_v1_offline"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU"))
)

The Jet Probability tagger has to be re-run, and so the general track collection and the PF jets have to be present in the EDM files.

Calibration in 53x Data and MC:

For 53x reprocessed Data from 22Jan2013, the default JP calibration used in the AOD (and RECO) productions is fine.

For 53x Data (prompt-reco data and reprocessings prior to 22Jan2013) and for 53x MC, the default Jet Probability Calibration from the GlobalTag is not optimal and needs to be replaced in the following way, when using CRAB:

  • Data, add in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_Data53X_v2"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_Data53X_v2"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU"))
)

  • MC, add in your .py:

process.GlobalTag.toGet = cms.VPSet(
  cms.PSet(record = cms.string("BTagTrackProbability2DRcd"),
       tag = cms.string("TrackProbabilityCalibration_2D_MC53X_v2"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU")),
  cms.PSet(record = cms.string("BTagTrackProbability3DRcd"),
       tag = cms.string("TrackProbabilityCalibration_3D_MC53X_v2"),
       connect = cms.untracked.string("frontier://FrontierPrep/CMS_COND_BTAU"))
)

The Jet Probability tagger has to be re-run, and so the general track collection and the PF jets have to be present in the EDM files.

Validation Tools, Calibration Studies

An example of configuration file to produce the official btag POG DQM validation histograms from the new calibration is also provided: RecoBTag/ImpactParameterLearning/test/histoMaker.py.

-- DanielBloch - 22-Mar-2011
