Documentation for TagProbeFitTreeAnalyzer

Summary

The TagProbeFitTreeAnalyzer fits the TTree output of the TagProbeFitTreeProducer to obtain efficiencies (or fake rates). The input TTree usually contains the mass of the tag-and-probe pair and the parameters of the probe, including a flag that indicates whether the probe passes. The input data is split into user-specified bins of the probe parameters. In each bin the mass distributions of passing and failing probes are fit simultaneously with a user-defined PDF to extract the signal efficiency. The efficiencies are output in the form of a RooDataSet (which is like a TTree or a table). Each row of the RooDataSet corresponds to a bin and contains the efficiency with asymmetric errors and, for each probe parameter, its mean in the bin, with the distances from the mean to the bin edges stored as asymmetric errors.
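As a rough illustration of the last sentence, the sketch below computes the mean of a probe parameter in one bin and its distances to the two bin edges. The function name summarize_bin and the sample values are hypothetical and only mirror the description above; they are not taken from the module's code.

```python
def summarize_bin(values, low_edge, high_edge):
    """Return (mean, err_lo, err_hi) for the probe values that fall in one bin.

    err_lo and err_hi are the distances from the mean to the lower and
    upper bin edge, as described for the output RooDataSet rows.
    """
    in_bin = [v for v in values if low_edge <= v < high_edge]
    if not in_bin:
        raise ValueError("no probes in this bin")
    mean = sum(in_bin) / len(in_bin)
    return mean, mean - low_edge, high_edge - mean


# Hypothetical probe pt values; only the first three fall in [3.5, 4.5).
pt_values = [3.6, 4.0, 4.4, 5.0]
mean, err_lo, err_hi = summarize_bin(pt_values, 3.5, 4.5)
```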

[Figure: Slide2.jpg, one page summary of efficiency fitting]

Documentation

Location of the code

Sources and headers:

Example configuration:

Test package:

  • by running make in the test directory

Configuration of the module

Summary table

| parameter | type | req. (default) | description |
| InputFileNames | vstring | yes | list of file names that contain parts of the input Tree |
| InputDirectoryName | string | yes | the path within the input files that contains the input Tree |
| InputTreeName | string | yes | the name of the input Tree |
| OutputFileName | string | yes | the name of the output file |
| NumCPU | uint32 | no (1) | number of CPUs to use for fitting, please validate before using more than 1 |
| SaveWorkspace | bool | no (false) | specifies whether to save the workspace with the data for each bin |
| Variables | PSet | yes | defines variables used in the fits |
| Categories | PSet | yes | defines categories (discrete variables) used in the fits |
| Cuts | PSet | no | defines dynamical categories by applying simple cuts to the variables in the dataset |
| PDFs | PSet | no | defines PDFs that can be used in the fits |
| Efficiencies | PSet | yes | defines what efficiencies to calculate |
| binnedFit | bool | no (false) | set to True to perform a binned fit. In this case, you need also to specify the parameter binsForFit (uint32) |

Input and Output Parameters

InputFileNames, InputDirectoryName, InputTreeName, OutputFileName

  • As input the module needs the TTree produced by the TagProbeFitTreeProducer (or other software) with branches corresponding to the signal-background discriminating variables (mass of the tag and probe pair), the parameters of the probe on which the efficiency depends (pt, eta, ...) and the flags or category variables whose efficiency is to be measured. To specify the TTree uniquely the user has to set the InputFileNames, InputDirectoryName and InputTreeName parameters of the module. Note that the TTree can span multiple root files with the same directory structure; in that case the full list of files can be given in InputFileNames.

[Figure: InputFitTree.png]

  • The module outputs efficiencies and various plots into a root file with a name specified by the OutputFileName parameter.

process.TagProbeFitTreeAnalyzer = cms.EDAnalyzer("TagProbeFitTreeAnalyzer",
    InputFileNames = cms.vstring("testTagProbeFitTreeProducer_JPsiMuMu.root"),
    InputDirectoryName = cms.string("MuonID"),
    InputTreeName = cms.string("fitter_tree"),
    OutputFileName = cms.string("testTagProbeFitTreeAnalyzer_JPsiMuMu.root"),
    ...
)

Operational Parameters

NumCPU, SaveWorkspace

  • The NumCPU parameter specifies how many CPUs can be used to speed up the fitting procedure. The results with multiple CPUs should be close to those obtained with one. The user should check that increasing the CPU number does not give different results.

  • By setting SaveWorkspace to true the module will save for each bin the RooWorkspace containing the dataset and PDF in the initial and final states.

(
    NumCPU = cms.uint32(1),
    SaveWorkspace = cms.bool(True),
)

Probe Parameter Definitions

Variables, Categories

To make the probe parameters found in the input TTree available to the fitter, the user needs to define them in the module configuration. There are two types of parameters: Reals and Integers. In RooFit they correspond to RooRealVar and RooCategory. In this module they are called Variables and Categories.

  • Variables are defined by a vector of exactly four strings corresponding to the title, lower limit of the range, the upper limit of the range and the units.
Variables = cms.PSet(
    mass = cms.vstring("Tag-Probe Mass", "2.5", "3.8", "GeV/c^{2}"),
    pt = cms.vstring("Probe p_{T}", "0", "1000", "GeV/c"),
    eta = cms.vstring("Probe #eta", "-2.5", "2.5", ""),
),

  • Categories are defined by a vector of exactly two strings corresponding to the title and the definition of the category as done with the RooFactory string parser. Note that the name of the category within the string definition ("dummy") is ignored and replaced with the python variable name ("mcTrue" or "Glb"). Within the brackets the user assigns a name to each numeric value of the integer variable in the TTree.
Categories = cms.PSet(
    mcTrue = cms.vstring("MC true", "dummy[true=1,false=0]"),
    Glb = cms.vstring("Global Muon", "dummy[true=1,false=0]"),
    TM = cms.vstring("Tracker Muon", "dummy[true=1,false=0]"),
),

  • Dynamical categories can be defined by applying simple cuts on variables that are already in the tree and declared in the Variables PSet. They are defined by a vector of exactly three strings corresponding to the title, the name of the variable to cut on and the value at which to cut (a number, but entered as a string). This creates a category with two states, above and below, for when the variable is above or below the cut.
    At the moment, dynamical categories can only be used to define the numerators of the efficiencies, not the denominators; this might be improved in the future, on request.
Cuts = cms.PSet(
    relIso15 = cms.vstring("Rel Isol < 0.15", "relIso", "0.15"),
    relIso10 = cms.vstring("Rel Isol < 0.10", "relIso", "0.10"),
),

  • The name of the python variables must be the same as the name of the branch in the input TTree.
  • There is no need to define the parameters that are not used in the module even if they are present in the TTree.
  • It is not a problem to define more variables and categories than are present in the TTree, as long as they are not used in the Efficiency calculations.
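The three definition formats above (four strings for a variable, two for a category, three for a cut) can be checked with a few lines of plain Python before running the module. This is only a sketch: the helper names are hypothetical, the parsing is a simplification of what the RooFactory string parser does, and the treatment of a value exactly at the cut is an assumption.

```python
import re


def parse_variable(entry):
    """Variables: [title, lower limit, upper limit, units]."""
    if len(entry) != 4:
        raise ValueError("a variable needs exactly four strings")
    title, low, high, units = entry
    return {"title": title, "low": float(low), "high": float(high), "units": units}


def parse_category(name, entry):
    """Categories: [title, 'ignoredName[state=index,...]']."""
    if len(entry) != 2:
        raise ValueError("a category needs exactly two strings")
    title, definition = entry
    match = re.fullmatch(r"\s*\w+\[(.*)\]\s*", definition)
    if match is None:
        raise ValueError("cannot parse category definition: " + definition)
    states = {}
    for item in match.group(1).split(","):
        state, index = item.split("=")
        states[int(index)] = state.strip()
    # The name inside the definition string ("dummy") is dropped on purpose;
    # the python variable name is used instead.
    return {"name": name, "title": title, "states": states}


def parse_cut(entry):
    """Cuts: [title, variable name, threshold as a string]."""
    if len(entry) != 3:
        raise ValueError("a cut needs exactly three strings")
    title, variable, threshold = entry
    return {"title": title, "variable": variable, "threshold": float(threshold)}


def cut_state(value, threshold):
    # Assumption: a value exactly at the threshold is counted as "above".
    return "below" if value < threshold else "above"


mass = parse_variable(["Tag-Probe Mass", "2.5", "3.8", "GeV/c^{2}"])
glb = parse_category("Glb", ["Global Muon", "dummy[true=1,false=0]"])
iso = parse_cut(["Rel Isol < 0.15", "relIso", "0.15"])
```

With these definitions, a probe with relIso = 0.10 would fall in the below state of relIso15, which is the state matching the title "Rel Isol < 0.15".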

Probability Density Function (PDF) Definitions

PDFs

To extract the signal efficiency the module performs an extended unbinned maximum likelihood fit simultaneously on passing and failing probes. The efficiency is a floating parameter of the fit, so the fit result directly contains its central value and asymmetric errors. The user needs to specify the PDF of the signal (either a single PDF named signal, or two PDFs named signalPass and signalFail), the PDF of the passing background, named backgroundPass, and the PDF of the failing background, named backgroundFail. The module combines these PDFs to fit the passing probes with Nsignal*efficiency*signal + NbackgroundPass*backgroundPass and the failing probes with Nsignal*(1-efficiency)*signal + NbackgroundFail*backgroundFail. The user can control the initial state of the PDFs and of the efficiency, and can also specify the value of signalFractionInPassing. All of this can be conveniently defined with the help of the RooFactory's string parser as a vector of strings. Several PDFs can be defined, each identified by a python variable name (gaussPlusLinear, gaussPlusQuadratic and twoVoigtians below).

Tip: you can import PDFs or any other RooFit object (e.g. a template) from another root file using the syntax #import rootfile:workspacename:objectname (put exactly one space between #import and the file name).

Tip: the proper syntax for the FFT convolution of two PDFs is FCONV::pdf(variable, pdf1, pdf2). Note that this requires CMSSW_3_9_2 or later.
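A PDF definition using the FCONV syntax might look like the following sketch, written as a plain Python list so that it can be inspected outside CMSSW. All object names (bw, res, m0, zero, sigmaRes) and all starting values are hypothetical; the two components are listed before the convolution on the assumption that the strings are processed in order.

```python
# Hypothetical signal model: a Breit-Wigner convolved with a Gaussian resolution.
bw_conv_gauss = [
    "BreitWigner::bw(mass, m0[91.2,89,93], width[2.495])",
    "Gaussian::res(mass, zero[0], sigmaRes[1.5,0.5,5])",
    "FCONV::signal(mass, bw, res)",  # FCONV::pdf(variable, pdf1, pdf2)
    "Exponential::backgroundPass(mass, lp[0,-5,5])",
    "Exponential::backgroundFail(mass, lf[0,-5,5])",
    "efficiency[0.9,0,1]",
    "signalFractionInPassing[0.9]",
]

# Position of the convolution relative to its two components.
conv_index = bw_conv_gauss.index("FCONV::signal(mass, bw, res)")
```

In a module configuration the list would be wrapped as cms.vstring(*bw_conv_gauss) and placed in the PDFs PSet under a name of the user's choice.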

PDFs = cms.PSet(
    gaussPlusLinear = cms.vstring(
        "Gaussian::signal(mass, mean[3.1,3.0,3.2], sigma[0.03,0.01,0.05])",
        "Chebychev::backgroundPass(mass, cPass[0,-1,1])",
        "Chebychev::backgroundFail(mass, cFail[0,-1,1])",
        "efficiency[0.9,0,1]",
        "signalFractionInPassing[0.9]",
    ),
    gaussPlusQuadratic = cms.vstring(
        "Gaussian::signal(mass, mean[3.1,3.0,3.2], sigma[0.03,0.01,0.05])",
        "Chebychev::backgroundPass(mass, {cPass1[0,-1,1], cPass2[0,-1,1]})",
        "Chebychev::backgroundFail(mass, {cFail1[0,-1,1], cFail2[0,-1,1]})",
        "efficiency[0.9,0,1]",
        "signalFractionInPassing[0.9]",
    ),
    twoVoigtians = cms.vstring(
        # allow different means and sigmas for passing and failing probes
        "Voigtian::signalPass(mass, meanP[90,80,100], width[2.495], sigmaP[3,1,20])",
        "Voigtian::signalFail(mass, meanF[90,80,100], width[2.495], sigmaF[3,1,20])",
        "Exponential::backgroundPass(mass, lp[0,-5,5])",
        "Exponential::backgroundFail(mass, lf[0,-5,5])",
        "efficiency[0.9,0,1]",
        "signalFractionInPassing[0.9]",
    ),
),
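The way the passing and failing models share the efficiency parameter can be sketched numerically in plain Python. This is not the module's code: the Gaussian signal, the flat background, the function names and all numbers are illustrative assumptions, and the Gaussian is not renormalized to the finite mass range.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian density (normalized over the whole real line)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def flat(x, low, high):
    """Flat density on [low, high], standing in for a background shape."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def expected_densities(x, n_signal, efficiency, n_bkg_pass, n_bkg_fail):
    """Unnormalized passing and failing models at mass x."""
    signal = gaussian(x, 3.1, 0.03)
    background = flat(x, 2.5, 3.8)
    passing = n_signal * efficiency * signal + n_bkg_pass * background
    failing = n_signal * (1 - efficiency) * signal + n_bkg_fail * background
    return passing, failing


def expected_yields(n_signal, efficiency, n_bkg_pass, n_bkg_fail):
    """Total expected number of passing and failing probes."""
    return {
        "pass": n_signal * efficiency + n_bkg_pass,
        "fail": n_signal * (1 - efficiency) + n_bkg_fail,
    }


yields = expected_yields(n_signal=1000, efficiency=0.9, n_bkg_pass=200, n_bkg_fail=300)
signal_pass = yields["pass"] - 200
signal_fail = yields["fail"] - 300
recovered_efficiency = signal_pass / (signal_pass + signal_fail)
```

Because the same Nsignal and efficiency appear in both samples, the fit constrains the efficiency from the passing and failing mass distributions together rather than from two independent signal yields.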

Efficiency Specifications

Efficiencies, EfficiencyCategoryAndState, UnbinnedVariables, BinnedVariables, BinToPDFmap

With one module the user can calculate many efficiencies as long as they use the same input TTree and can save the output in the same output root file. Each efficiency calculation is identified with the python variable name (pt_eta and pt_eta_mcTrue below) and the results are saved in the output file under that directory name.

  • The user needs to specify the efficiency of which state and of which category needs to be calculated with the help of the EfficiencyCategoryAndState parameter.
    • In the typical case where passing probes are identified by the state of a single category, you should set EfficiencyCategoryAndState to a pair of strings: the category and state names as defined in the Categories PSet.
    • You can also define passing probes as the logical AND of multiple categories (e.g. if you have one category for lepton id and one for isolation, and you want to compute the product of the two in one go). In this case, you should set EfficiencyCategoryAndState to an even-sized list of strings of categories and states, e.g. cms.vstring("id", "pass", "isolation", "pass").
    • There is currently no way of defining the passing probes as the logical OR of multiple categories.
  • All variables that are used by the PDFs need to be specified either in the list of UnbinnedVariables or in the BinnedVariables. The binning for real variables is specified with the list of bin edges. For categories it is a list of state names. Note that one can merge two states into one "bin" with the syntax 'true,false'.
  • The user has to associate a PDF to each bin of probe data. This is done with the help of the BinToPDFmap. The first string is the name of the default PDF. It is followed by pairs of bin name and PDF name. The bin names are constructed automatically following the pattern eta_bin#;pt_bin#. Note that wildcards are supported, as seen in the example below. If the BinToPDFmap parameter is missing, no fitting is performed; efficiencies obtained by counting all passing and failing probes are always available and are useful for MC truth efficiencies.
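The logical AND of several category and state pairs, together with the counting efficiency that is available without a fit, can be sketched as follows. The function names and the four example probes are hypothetical; the category names id and isolation are taken from the example above.

```python
def is_passing(probe, category_and_state):
    """True if the probe is in the requested state of every listed category."""
    if len(category_and_state) % 2 != 0:
        raise ValueError("expected an even-sized list of category/state names")
    pairs = zip(category_and_state[0::2], category_and_state[1::2])
    return all(probe[category] == state for category, state in pairs)


def counting_efficiency(probes, category_and_state):
    """Fraction of probes that pass, with no fit and no background subtraction."""
    n_pass = sum(is_passing(probe, category_and_state) for probe in probes)
    return n_pass / len(probes)


# Hypothetical probes, each with one state per category.
probes = [
    {"id": "pass", "isolation": "pass"},
    {"id": "pass", "isolation": "fail"},
    {"id": "fail", "isolation": "pass"},
    {"id": "pass", "isolation": "pass"},
]
eff_id = counting_efficiency(probes, ["id", "pass"])
eff_id_and_iso = counting_efficiency(probes, ["id", "pass", "isolation", "pass"])
```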

Efficiencies = cms.PSet(
    pt_eta = cms.PSet(
        EfficiencyCategoryAndState = cms.vstring("Glb","true"),
        UnbinnedVariables = cms.vstring("mass"),
        BinnedVariables = cms.PSet(
            pt = cms.vdouble(3.5, 4.5, 6.0, 8.0, 50.0),
            eta = cms.vdouble(-2.1, -1.2, 0.0, 1.2, 2.1)
        ),
        BinToPDFmap = cms.vstring("gaussPlusLinear", "*pt_bin0*", "gaussPlusQuadratic")
    ),
    pt_eta_mcTrue = cms.PSet(
        EfficiencyCategoryAndState = cms.vstring("Glb","true"),
        UnbinnedVariables = cms.vstring("mass"),
        BinnedVariables = cms.PSet(
            mcTrue = cms.vstring("true"),
            pt = cms.vdouble(3.5, 4.5, 6.0, 8.0, 50.0),
            eta = cms.vdouble(-2.1, -1.2, 0.0, 1.2, 2.1)
        )
    )
)
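How a probe ends up in a named bin, and how that bin name is matched against the BinToPDFmap, can be sketched in plain Python. Several things here are assumptions rather than documented behaviour: the helper names, shell-style wildcard matching, the rule that the last matching pair wins, and the treatment of values on a bin edge. The bin-name format follows the pattern quoted in the text above.

```python
from bisect import bisect_right
from fnmatch import fnmatchcase


def bin_index(value, edges):
    """Index of the bin containing value, or None if it lies outside the edges."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def bin_name(pt, eta, pt_edges, eta_edges):
    """Build a bin name following the pattern eta_bin#;pt_bin#."""
    return "eta_bin%d;pt_bin%d" % (bin_index(eta, eta_edges), bin_index(pt, pt_edges))


def resolve_pdf(name, bin_to_pdf_map):
    """The first entry is the default PDF; the rest are (pattern, PDF name) pairs."""
    pdf = bin_to_pdf_map[0]
    rest = bin_to_pdf_map[1:]
    for pattern, pdf_name in zip(rest[0::2], rest[1::2]):
        if fnmatchcase(name, pattern):
            pdf = pdf_name  # assumption: the last matching pair wins
    return pdf


pt_edges = [3.5, 4.5, 6.0, 8.0, 50.0]
eta_edges = [-2.1, -1.2, 0.0, 1.2, 2.1]
pdf_map = ["gaussPlusLinear", "*pt_bin0*", "gaussPlusQuadratic"]

low_pt_bin = bin_name(4.0, 0.5, pt_edges, eta_edges)
high_pt_bin = bin_name(7.0, 0.5, pt_edges, eta_edges)
low_pt_pdf = resolve_pdf(low_pt_bin, pdf_map)
high_pt_pdf = resolve_pdf(high_pt_bin, pdf_map)
```

In this sketch only the lowest pt bin is fit with gaussPlusQuadratic, while every other bin falls back to the default gaussPlusLinear, which is the intent of the pt_eta configuration above.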

Output Description

coming soon

Example configuration files

Example configuration files for TagProbeFitTreeAnalyzer are provided under PhysicsTools/TagAndProbe/test/.

Review status

| Reviewer/Editor and Date | Comments |
| ZoltanGecse - 25-Mar-2010 | created template page |

Responsible: ZoltanGecse
Last reviewed by: Most recent reviewer

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| InputFitTree.png | r1 | 139.0 K | 2010-06-03 - 01:41 | ZoltanGecse | |
| Slide2.jpg | r1 | 141.6 K | 2010-06-03 - 10:18 | ZoltanGecse | One page summary of efficiency fitting |
| plot.gif | r1 | 6.9 K | 2010-04-01 - 23:47 | ZoltanGecse | example efficiency plot |
Topic revision: r8 - 2010-12-08 - GiovanniPetrucciani