Histogramming Utilities


Goal: simplify histogram management by providing common generic utilities.

ExpressionHisto: a generic configurable histogram utility

The template utility ExpressionHisto<T> holds a histogram of the spectrum of a specified variable, and can be filled with the value of that variable for objects of the template argument type T. The variable to be accumulated in the histogram is specified as a configurable string, which is interpreted by the same expression parser used elsewhere in CMSSW.

The histogram has to be initialized using the framework service TFileService.

An example of usage, making distributions of track variables, is sketched below:

    // create histogram from a module configuration
    const edm::ParameterSet cfg = . . . ;
    ExpressionHisto<reco::Track> histo ( cfg );

    // initialize using TFileService
    edm::Service<TFileService> fs;
    histo.initialize( fs );

    // get a track collection from the event
    edm::Handle<reco::TrackCollection> tracks;
    event.getByLabel( tracks_, tracks );
    for( reco::TrackCollection::const_iterator trk = tracks->begin(); 
         trk != tracks->end(); ++ trk ) 
      histo.fill( * trk );

To specify a weight for the item in the plot, use histo.fill( * trk, weight ) instead of histo.fill( * trk ).

If the module contains the following piece of configuration:

    min =  cms.untracked.double(0.0)
    max =  cms.untracked.double(200.0)
    nbins =  cms.untracked.int32(50)
    description =  cms.untracked.string("track transverse momentum  [GeV]") # the plot title
    name =  cms.untracked.string("tk_pt") # the plot name in ROOT
    plotquantity =  cms.untracked.string("pt") # plot item.pt()

the histogram will be filled with the track pt spectrum in the range from 0 to 200 GeV, with 50 bins.
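
To make the binning implied by this configuration concrete, here is a minimal Python sketch of uniform binning. It is an illustration only, not CMSSW or ROOT code: the function names bin_width and bin_index are hypothetical, and bins are counted from 0 here, whereas ROOT numbers regular bins from 1 and keeps separate underflow and overflow bins.

```python
def bin_width(vmin, vmax, nbins):
    """Width of each bin for a uniformly binned histogram."""
    return (vmax - vmin) / nbins


def bin_index(value, vmin, vmax, nbins):
    """Return the 0-based bin holding value, or None if it is out of range."""
    if value < vmin or value >= vmax:
        return None
    return int((value - vmin) / bin_width(vmin, vmax, nbins))


# Values taken from the configuration above: 50 bins from 0 to 200 GeV.
width = bin_width(0.0, 200.0, 50)
first_bin = bin_index(3.9, 0.0, 200.0, 50)
third_bin = bin_index(10.0, 0.0, 200.0, 50)
outside = bin_index(250.0, 0.0, 200.0, 50)
```

With these settings each bin is 4 GeV wide, so a track with pt = 10 GeV lands in the third bin (index 2 in this 0-based sketch).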

Arrays of plots

One often needs to book a series of identical plots, e.g. for the pt of the first 5 jets in the event. Recent versions of ExpressionHisto (PhysicsTools/UtilAlgos V06-04-05) allow you to specify an additional parameter in the cfg file:

    itemsToPlot = cms.untracked.int32(n)

If the parameter is present and greater than 0, n separate histograms are booked instead of just one. From your EDAnalyzer you can then fill the idx-th plot by calling histo.fill( * trk, weight, idx ).

The names of the plots are chosen this way:

  • If there is a "%d" in the plot name or plot title, the "%d" is replaced with (index+1), that is 1,2,3,...
  • If there is no "%d", a [#N] is appended to the plot name and title, where again N is 1,2,3...
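
The two naming rules above can be sketched in a few lines of Python. This is a hedged illustration, not the ExpressionHisto source: the function name plot_labels is hypothetical, the rule is applied to each label (name or title) independently, and the single space placed before "[#N]" is an assumption about the exact formatting.

```python
def plot_labels(label, n_plots):
    """Build one label per plot following the two rules above."""
    labels = []
    for index in range(n_plots):
        number = index + 1  # plots are numbered 1, 2, 3, ...
        if "%d" in label:
            labels.append(label.replace("%d", str(number)))
        else:
            # Assumption: a single space separates the label from "[#N]".
            labels.append(f"{label} [#{number}]")
    return labels


with_placeholder = plot_labels("jet_%d_pt", 3)
without_placeholder = plot_labels("tk_pt", 2)
```

Here with_placeholder holds jet_1_pt, jet_2_pt, jet_3_pt, while without_placeholder holds the original name followed by [#1] and [#2].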

For convenience, the fill function returns false if idx falls outside the range of booked plots, so that the EDAnalyzer can stop looping over the items to plot.
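
The loop-stopping pattern that this return value enables can be modelled as follows. This is a toy Python stand-in, not the C++ class: HistoArrayModel and fill_leading are hypothetical names, and the 0-based idx with the check idx >= number of plots is an assumption about the indexing convention.

```python
class HistoArrayModel:
    """Toy stand-in for an array of n_plots histograms (not the CMSSW class)."""

    def __init__(self, n_plots):
        self.values = [[] for _ in range(n_plots)]

    def fill(self, value, weight=1.0, idx=0):
        """Record (value, weight) in plot idx; return False if idx is not booked."""
        if idx >= len(self.values):
            return False
        self.values[idx].append((value, weight))
        return True


def fill_leading(histo, pts):
    """Fill one plot per item, stopping as soon as fill() reports no plot left."""
    filled = 0
    for idx, pt in enumerate(pts):
        if not histo.fill(pt, 1.0, idx):
            break
        filled += 1
    return filled


histo = HistoArrayModel(n_plots=3)
n_filled = fill_leading(histo, [120.0, 80.0, 45.0, 30.0, 12.0])
```

With three plots booked and five items available, the loop stops after the third item, so the caller does not need to know the value of itemsToPlot.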

HistoAnalyzer: a configurable histogramming module

The template analyzer module HistoAnalyzer<C> produces a configurable set of histograms containing spectra of quantities taken from the objects in an input collection. The type C of the input collection is specified as the template argument.

The module has to be instantiated for the specific type. For instance, the following C++ lines instantiate a module that produces histograms from a reco::CandidateCollection:

#include "FWCore/Framework/interface/MakerMacros.h"
#include "PhysicsTools/UtilAlgos/interface/HistoAnalyzer.h"
#include "DataFormats/Candidate/interface/Candidate.h"

typedef HistoAnalyzer<reco::CandidateCollection> CandHistoAnalyzer;

DEFINE_FWK_MODULE( CandHistoAnalyzer );

The module configuration should contain a parameter set specifying the configuration for each of the histograms to be produced. All these parameter sets are put into a VPSet with the name histograms.

Existing modules

Currently there are two instantiations:

  • CandHistoAnalyzer: works only on reco::CandidateCollection, a.k.a. edm::OwnVector<reco::Candidate>.
  • CandViewHistoAnalyzer: works on all candidate collections through View<Candidate>, but it can still access only the member functions of the base class reco::Candidate, not the additional ones of the concrete candidate type (e.g. GsfElectron).


An example is the configuration fragment below producing pt and η spectra for a given input collection:

process.TFileService = cms.Service("TFileService",
    fileName = cms.string("histo.root")
)

process.hists = cms.EDAnalyzer("CandViewHistoAnalyzer",
    src = cms.InputTag("allMuons"),
    histograms = cms.VPSet(
        # one PSet per histogram
        cms.PSet(
            min = cms.untracked.double(0.0),
            max = cms.untracked.double(200.0),
            nbins = cms.untracked.int32(50),
            description = cms.untracked.string('muon transverse momentum [GeV]'),
            name = cms.untracked.string('muonPt'),
            plotquantity = cms.untracked.string('pt')
        ),
        cms.PSet(
            min = cms.untracked.double(-2.0),
            max = cms.untracked.double(2.0),
            nbins = cms.untracked.int32(50),
            description = cms.untracked.string('muon pseudo rapidity'),
            name = cms.untracked.string('muonEta'),
            plotquantity = cms.untracked.string('eta')
        )
    )
)

Because it uses ExpressionHisto, HistoAnalyzer can also make an array of plots of the same quantity for the first n items of a collection, using the itemsToPlot parameter. For example:

process.plotJets = cms.EDAnalyzer('CandViewHistoAnalyzer',
    src = cms.InputTag("slimmedJets"),
    histograms = cms.VPSet(cms.PSet(
        itemsToPlot = cms.untracked.int32(5),
        min = cms.untracked.double(0.0),
        max = cms.untracked.double(200.0),
        nbins = cms.untracked.int32(50),
        description = cms.untracked.string('jet %d p_{T} [GeV]'),
        name = cms.untracked.string('jet_%d_pt'),
        plotquantity = cms.untracked.string('pt')
    ))
)

Note that this module assumes that the collection is already sorted in some user-defined order, and won't attempt to re-order it to plot the "highest pt" items or the like. Because of this, using itemsToPlot with unordered collections like ctfWithMaterialTracks will most likely give arbitrary results.
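
To illustrate what "already sorted" means here, the following Python sketch orders a toy collection by decreasing pt before taking the first n items. It is only a model of the idea: the Jet record and the leading_by_pt function are hypothetical, and in a real job the ordering would have to be done upstream, before the collection reaches the analyzer.

```python
from collections import namedtuple

# Hypothetical minimal jet record; real candidates expose pt() as a method.
Jet = namedtuple("Jet", ["pt", "eta"])


def leading_by_pt(items, n):
    """Return the n items with the highest pt, in decreasing pt order."""
    return sorted(items, key=lambda item: item.pt, reverse=True)[:n]


unordered = [Jet(30.0, 0.1), Jet(120.0, -1.2), Jet(75.0, 2.0)]
leading = leading_by_pt(unordered, 2)
```

Without the sort, the "first" two items would be the 30 GeV and 120 GeV jets simply because of their position in the collection.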

Plotting from weighted events

If you're running on events from a weighted sample, such as an NLO sample that has both positive and negative weights, filling the plots with the same weight for every event won't give meaningful results. Because of this, recent versions of HistoAnalyzer have been extended so that they can read an event-by-event weight and use it for the plots.

The configuration is straightforward:

process.plotJets = cms.EDAnalyzer('CandViewHistoAnalyzer',
    src = cms.InputTag('slimmedJets'),
    weights = cms.untracked.InputTag(...),
    histograms = cms.VPSet(...)
)

If the weights parameter is not specified, all items are weighted 1.0 as usual.

The weights parameter must point to a branch containing a double, that is, something produced with produces<double>() and read through edm::Handle<double> hWgt;.
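
The effect of an event-by-event weight can be seen in a small Python sketch that sums weights per bin. It is a plain model under stated assumptions, not HistoAnalyzer code: the function weighted_counts and the toy values are hypothetical, and the binning reuses the 0 to 200 GeV, 50-bin example from earlier on this page.

```python
def weighted_counts(values, weights, vmin, vmax, nbins):
    """Sum of weights per bin for a uniformly binned histogram."""
    counts = [0.0] * nbins
    width = (vmax - vmin) / nbins
    for value, weight in zip(values, weights):
        if vmin <= value < vmax:
            counts[int((value - vmin) / width)] += weight
    return counts


# Three toy events in the same bin, one of them with a negative weight.
pts = [10.0, 11.0, 9.0]
event_weights = [1.0, 1.0, -1.0]
weighted = weighted_counts(pts, event_weights, 0.0, 200.0, 50)
unweighted = weighted_counts(pts, [1.0] * len(pts), 0.0, 200.0, 50)
```

The negative-weight event cancels one of the positive ones in the weighted histogram, whereas giving every event weight 1.0 counts all three.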

Possible extensions

2D histograms could also be added.


The main author of these utilities is:
  • Benedikt Hegner
Some recent extensions are by Giovanni Petrucciani.

Review Status

Editor/Reviewer and date Comments
ThiagoTomei - 2016-06-03 Translated configs to python
BenediktHegner - 08 Jun 2007 Page author

Responsible: BenediktHegner
Last reviewed by:

Topic revision: r8 - 2016-06-06 - ThiagoTomei