4.3 Physics Analysis Toolkit Tutorial
Goals of this page
This twiki was derived from the actual tutorial held on April 25, 2008; links to the tutorial material are summarized on this tutorial page.
This tutorial will guide you through the steps to set up the PAT and start playing with it. In particular you will:
- learn how to produce PAT objects
- have a look inside the AOD and PAT event content
- look into configurable parameters of the PAT
- learn how to analyze PAT output with bare ROOT, FWLite and an EDAnalyzer
- explore PAT files with the Starter Kit
- get pointers for getting started with the PAT on the grid
Getting Started
This tutorial is meant to familiarize you with some first steps with the PAT. The goal is to make a few physics-level plots within about 60 minutes. Some basic knowledge of the PAT design is assumed. If you have not heard about the layered structure of the PAT, it is advisable to first browse through the PAT introduction linked to on this page.
Before you can start, make sure you have an account to log in on lxplus.cern.ch and that you have cvs working, with the repository set to CMSSW (using the project CMSSW command). It is also useful to already have some vague idea of how CMS software works. The first few slides of the PAT introduction talk mentioned in the previous paragraph can help there. For more details you can consult the CMS Workbook. Especially the pages on SCRAM, ROOT basics and the troubleshooting page can help you get started.
Checking out and compiling the PAT code
To check out and compile the PAT code, follow this recipe, which you can copy-paste literally in sh-like shells.
cd /tmp/`whoami`
# start a new CMSSW project area
scramv1 p CMSSW CMSSW_1_6_11
cd CMSSW_1_6_11/src
# setup the environment
cmsenv # a.k.a. eval `scramv1 runtime -(c)sh`
unset PYTHONPATH     # needed at FNAL, doesn't hurt at CERN; in csh-like shells use: unsetenv PYTHONPATH
# checkout needed backports
cvs co -r PAT_1611_080424 DataFormats/RecoCandidate
cvs co -r PAT_1611_080424 RecoMuon/MuonIdentification
cvs co -r PAT_1611_080424 TrackingTools/TrackAssociator
cvs co -r PAT_1611_080424 JetMETCorrections/Type1MET
# checkout the PAT
cvs co -r PAT_1611_080424 DataFormats/PatCandidates
cvs co -r PAT_1611_080424 PhysicsTools/PatUtils
cvs co -r PAT_1611_080424 PhysicsTools/PatAlgos
# build!
scramv1 b
And you're compiling the PAT!
Some background information on the compilation:
- you need to use CMSSW_1_6_11; previous versions won't work
- the compilation takes a long time, of order 15 minutes. At a certain point it seems to be stuck, but in fact it is not. You can either go for the coffee option while waiting, or already do the next step of looking into the AOD event content.
- the PAT is in fact already in the release, so in principle you don't need to check out code and compile; you could just run an appropriate config file. Since the release of CMSSW_1_6_11, however, PAT development has been frantic, and you should add the new PAT code to profit from the numerous improvements.
- the above recipe will be much simpler in the upcoming 1_6_12 release. This tutorial will then be updated.
Producing PAT Layer-1 objects
Once compilation has run to completion, you are ready to start producing PAT Layer-1 objects. For this we use the patLayer1_fromAOD_full.cfg config file. Have a look at its contents by doing:
cd $CMSSW_BASE/src/PhysicsTools/PatAlgos/test
less patLayer1_fromAOD_full.cfg
The main components of this example config file are shown in the figure below.
- Screenshot of the Layer-1 cfg file:
To run the example config file and produce PAT Layer-1 objects from the predefined CSA07 top skim file with AOD event content, do the following in sh-like shells:
cd $CMSSW_BASE/src/PhysicsTools/PatAlgos/test
cmsRun patLayer1_fromAOD_full.cfg 2>&1 | tee output.txt
or in c-like shells:
cmsRun patLayer1_fromAOD_full.cfg |& tee output.txt
In the same folder you see many similar example files. These allow you to produce PAT Layer-0 event content from AOD in both fast and full simulation, and from scratch for fast simulation. For the Layer-1 objects, examples are available to run both Layer-0 and Layer-1 from the same inputs, or to produce the Layer-1 objects from Layer-0 fast or full simulation input.
As an alternative, you might for instance want to try:
cd $CMSSW_BASE/src/PhysicsTools/PatAlgos/test
cmsRun patLayer1_fromScratch_fast.cfg 2>&1 | tee output.txt
or in c-like shells:
cmsRun patLayer1_fromScratch_fast.cfg |& tee output.txt
for which you don't need any input data.
When you have run the PAT job, it is useful to have a look at the output:
less output.txt
One interesting piece of information is the TrigReport, which shows the details of all paths and filters the events went through during processing, and their success/failure rate. It is an easy diagnostic tool to spot potential problems and their location, and it is also a good place to monitor the filter efficiencies. An excerpt of the TrigReport is shown in the figure below.
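The TrigReport numbers are simple counters: for each filter on a path, how many events it saw and how many it passed. The C++ sketch below uses only the standard library and is not CMSSW code; the event fields and filter names are hypothetical. It reproduces that bookkeeping, including the fact that an event rejected by one filter on a path is not seen by the filters after it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical reduced event: only the quantities the example filters use.
struct Event {
  int nJets;
  double met;  // GeV
};

// Counters for one filter on a path, mirroring the visited/passed/failed
// columns of a TrigReport. Field names are illustrative, not CMSSW's.
struct FilterCount {
  std::string name;
  std::function<bool(const Event&)> accept;
  std::size_t visited = 0;
  std::size_t passed = 0;

  std::size_t failed() const { return visited - passed; }
  double efficiency() const {
    return visited ? static_cast<double>(passed) / visited : 0.0;
  }
};

// Runs each event through the filters in order. As on a path, an event that
// fails one filter is not seen by the filters after it. Returns the number of
// events accepted by the whole path.
std::size_t runPath(const std::vector<Event>& events,
                    std::vector<FilterCount>& path) {
  std::size_t accepted = 0;
  for (const Event& ev : events) {
    bool ok = true;
    for (FilterCount& f : path) {
      ++f.visited;
      if (!f.accept(ev)) {
        ok = false;
        break;
      }
      ++f.passed;
    }
    if (ok) ++accepted;
  }
  return accepted;
}
```

Because later filters only see the survivors of earlier ones, the efficiency of a filter in such a report is relative to the events that reached it, not to all processed events.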
Other interesting information can be found higher up in the output, just after the event processing, where the cleaner modules dump their report. For each PATObjectCleaner that was run, you can see a summary and a full breakdown of the failure/success rates of the different bits that are encoded in the Layer-0 objects' status.
- Screenshot of the cleaner summary:
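The cleaner breakdown is, in essence, a count of how often each status bit is set. As a hedged sketch, assuming a 32-bit status word in which a set bit marks a failed criterion and a status of zero means the object passed everything (the actual bit assignments used by the PAT cleaners are not reproduced here), the per-bit bookkeeping looks like this in C++:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for illustration: a set bit means "this criterion failed",
// and status == 0 means the object survived all criteria.
constexpr std::size_t kNumBits = 32;

struct CleanerSummary {
  std::size_t total = 0;                             // objects examined
  std::size_t allPassed = 0;                         // objects with status == 0
  std::array<std::size_t, kNumBits> failedPerBit{};  // how often each bit was set
};

CleanerSummary summarize(const std::vector<std::uint32_t>& statusWords) {
  CleanerSummary s;
  for (std::uint32_t status : statusWords) {
    ++s.total;
    if (status == 0) ++s.allPassed;
    // test every bit individually, so one object can count against several criteria
    for (std::size_t bit = 0; bit < kNumBits; ++bit) {
      if (status & (std::uint32_t{1} << bit)) ++s.failedPerBit[bit];
    }
  }
  return s;
}
```

Note that an object failing several criteria is counted once per failed bit, so the per-bit numbers need not add up to the number of rejected objects.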
Looking at the AOD and PAT Layer-1 event content
It is instructive to look into the event content of the ROOT file that PAT Layer-1 produces by default. For comparison, however, first take a look at the general AOD content. To do so, follow these steps:
root -l rfio:/castor/cern.ch/cms/store/CSA07/skim/2008/1/17/CSA07-CSA07Electron-Chowder-A3-PDElectron-ReReco-100pb-Skims6/0000/1EADC8C5-0CC5-DC11-90F9-000423D65A7E.root
# or, alternatively, the same file at an easier location
root -l rfio:/castor/cern.ch/user/l/lowette/aod_example.root
# and then browse
root [1] new TBrowser()
Have a look at the Events tree (double-click on "ROOT Files", then on the file itself, and select the Events tree). You will see a long list of branches, each containing some data product that is part of the standard CMSSW_1_6_X AOD. Many of these branches depend on each other through associations and references. The screenshot below shows a portion of the full list.
- Screenshot of the AOD content:
When one looks into the PAT Layer-1 event content though, one gets a much simplified view:
root -l PATLayer1_Output.fromAOD_full.root
# and then browse
root [1] new TBrowser()
A screenshot of the content is shown below.
What happened here is that the PAT has dropped a fair amount of low-level and expert information from the standard AOD event content. If your analysis needs such information, it is sufficient to add the appropriate "keep" statements to the PoolOutputModule in the PAT cfg. On the other hand, much of the simplification comes from the Layer-1 objects embedding related information that the AOD stores externally through references and associations. This makes the interface to these objects much easier for the user, and makes it easy to process PAT Layer-1 objects afterwards, for instance in an event selection.
- Screenshot of the Layer-1 content:
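The difference between the two layouts can be illustrated with a toy model. The C++ sketch below uses hypothetical Track and jet types, not the DataFormats classes: one jet refers to tracks by index into a separate collection (as references and associations do in the AOD), the other carries copies of them (as embedding does in Layer-1). Only the embedded version remains usable once the external collection is dropped from the output file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not DataFormats classes.
struct Track {
  double pt;  // GeV
};

// AOD-like: the jet stores only indices into a separate track collection,
// so it cannot be interpreted without that collection.
struct JetWithRefs {
  std::vector<std::size_t> trackIndices;
};

// Layer-1-like: the referenced tracks are copied into the jet itself.
struct JetEmbedded {
  std::vector<Track> tracks;
};

// Follows the references. Fails if the external collection was dropped from
// the file (modelled as a null pointer) or an index is out of range.
bool sumTrackPt(const JetWithRefs& jet, const std::vector<Track>* tracks,
                double& sum) {
  if (tracks == nullptr) return false;
  sum = 0.0;
  for (std::size_t i : jet.trackIndices) {
    if (i >= tracks->size()) return false;
    sum += (*tracks)[i].pt;
  }
  return true;
}

// The embedded version needs nothing but the jet.
double sumTrackPt(const JetEmbedded& jet) {
  double sum = 0.0;
  for (const Track& t : jet.tracks) sum += t.pt;
  return sum;
}

// "Embedding": resolve the references once, while the collection is still
// available, and copy the referenced objects into the jet.
bool embed(const JetWithRefs& jet, const std::vector<Track>& tracks,
           JetEmbedded& out) {
  out.tracks.clear();
  for (std::size_t i : jet.trackIndices) {
    if (i >= tracks.size()) return false;
    out.tracks.push_back(tracks[i]);
  }
  return true;
}
```

The price of embedding is duplicated storage when several objects point to the same track; the gain is that each object is self-contained.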
Configurable parameters
PAT's design choice was to make everything fully configurable with config files. Although reasonable defaults are provided, users should, after some initial playing, configure the PAT for their analysis needs. The PAT cfi & cff include chain is complex, though (intuitively organized in PhysicsTools/PatAlgos/data), and documentation is only now appearing.
To further improve user-friendliness, all possible parameters are also available as "replace" commands in one file for each PAT layer. These files are auto-generated, so the formatting is not ideal and some comments have been stripped, but within those files you find links to the originating cff/cfi files, which should facilitate navigation in the include tree.
The files can be found in the PhysicsTools/PatAlgos/test directory, named PATLayer0_ReplaceDefaults_full.cff and PATLayer1_ReplaceDefaults_full.cff. Such files also make it convenient for groups of users to define common sets of configurables: everyone then runs the PAT with the same set of "replace" statements by including the same file.
In the pictures below you can see screenshots of both replace-files.
Looking at PAT Layer-0 objects (this section can be skipped)
Using the examples, you can produce Layer-0 objects yourself in the same way you would make Layer-1 objects. To avoid waiting for it, you can also start from a pre-made file. Browse the file like this:
root -l rfio:/castor/cern.ch/user/l/lowette/PATLayer0_Output.fromAOD_full.root
root [1] new TBrowser()
All the branches ending in "_PAT" are produced in the PAT Layer-0 processing step. Many of those are technical constructs needed for the internal handling of associations. In the screenshot below you can see several of those branches. Again, this is "experts-only" stuff: if you choose to save PAT Layer-0 output and run Layer-1 + your analysis from Layer-0 input, you will not have to care about the Layer-0 internals either.
- Screenshot of the Layer-0 content:
Analysis in bare ROOT
Open the Layer-1 output file in root as before:
root -l PATLayer1_Output.fromAOD_full.root
root [1] new TBrowser()
When you now browse one of the branches, you get access to the various (also private) members of the class stored in the branch. When you double-click such a member, a plot will appear showing the distribution of that member over all events and all entries per event (you might need to double-click twice). The same kind of plots can also be made with the TTree::Draw() method, as in the following:
Events->SetAlias("jets","patJets_selectedLayer1Jets__PAT.obj");
Events->Draw("jets.partonFlavour_");
You can build full macros with such commands, and they are indeed useful for fast checks of the contents of root files, but you will quickly hit the limitations of this approach.
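Conceptually, drawing jets.partonFlavour_ loops over all events and over every jet in each event, filling one entry per jet. A minimal C++ sketch of that bookkeeping, with a hypothetical JetRecord type standing in for the stored pat::Jet and a count per flavour value in place of a histogram:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for a stored jet: only the member being plotted.
struct JetRecord {
  int partonFlavour;
};
using EventJets = std::vector<JetRecord>;  // all jets of one event

// One entry per jet, accumulated over all events: the same bookkeeping that a
// TTree::Draw of a member of a vector branch performs, here as a count per
// flavour value instead of a histogram.
std::map<int, int> countFlavours(const std::vector<EventJets>& events) {
  std::map<int, int> counts;
  for (const EventJets& jets : events) {
    for (const JetRecord& jet : jets) ++counts[jet.partonFlavour];
  }
  return counts;
}
```

An event without jets contributes no entries at all, which is why the number of entries in such a plot generally differs from the number of events.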
Analysis with FWLite
To benefit from the storage of classes inside the ROOT files, it is much more convenient to load the libraries and dictionaries into ROOT and have full access to the DataFormats class interfaces. This is achieved by starting your macro (or your session at the ROOT prompt) with the following two commands:
gSystem->Load("libFWCoreFWLite.so");
AutoLibraryLoader::enable();
When you now open the previously produced PAT Layer-1 output file and browse it, you see that not only the members but also all methods of the objects have become available, as shown in the screenshot below. As in the bare ROOT example, you can now plot the jet parton flavour, this time by double-clicking the pat::Jet::partonFlavour() method.
- Screenshot of ROOT file browsing with FWLite:
This way of looking at classes with FWLite is still close to the bare ROOT way of working, but FWLite also provides handles to access events much as in the full framework. For instance, FWLite gives you access to the fwlite::Event class and to getByLabel() methods similar to those of the full framework (with a slightly different interface). A simple example of an interactive FWLite macro that plots the pat::MET distribution is reproduced here:
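{
  // load the FWLite libraries and enable automatic loading of dictionaries
  gSystem->Load("libFWCoreFWLite");
  AutoLibraryLoader::enable();
  gSystem->Load("libDataFormatsFWLite.so");
  #include "DataFormats/FWLite/interface/Handle.h"
  // input: the PAT Layer-1 file produced earlier in this tutorial
  TFile tmpf("PATLayer1_Output.fromAOD_full.root");
  fwlite::Event ev(&tmpf);
  // output file; the histogram booked after it is written into it
  TFile * outfile = new TFile("outfile.root", "RECREATE");
  TH1F * methist = new TH1F("patMET", "patMET", 100, 0, 500);
  // loop over all events in the input file
  for (ev.toBegin(); !ev.atEnd(); ++ev) {
    fwlite::Handle<vector<pat::MET> > met;
    met.getByLabel(ev, "selectedLayer1METs");
    // guard against events without a MET object before calling front()
    if (!met->empty()) methist->Fill(met->front().et());
  }
  outfile->Write();
  outfile->Close();
}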
You can run it with the following commands:
wget http://cern.ch/lowette/fwlitetest.C
root -b -n -q fwlitetest.C
To look at the actual output, do the following:
root -l outfile.root
root [1] patMET.Draw();
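The MET histogram booked in the macro above has 100 bins between 0 and 500, i.e. 5 GeV per bin. The small C++ class below is a sketch, not ROOT code, that follows ROOT's numbering convention for fixed-width bins as we understand it: bin 0 is the underflow, bins 1 to 100 cover [0, 500), and bin 101 is the overflow. It is only meant to show where a given MET value ends up.

```cpp
#include <cassert>
#include <vector>

// Minimal fixed-bin histogram using TH1-style bin numbering:
// bin 0 = underflow, bins 1..nbins cover [lo, hi), bin nbins+1 = overflow.
class FixedHist {
 public:
  FixedHist(int nbins, double lo, double hi)
      : nbins_(nbins), lo_(lo), hi_(hi), counts_(nbins + 2, 0.0) {
    assert(nbins > 0 && hi > lo);
  }

  int findBin(double x) const {
    if (x < lo_) return 0;             // underflow
    if (x >= hi_) return nbins_ + 1;   // overflow (the upper edge is excluded)
    int bin = 1 + static_cast<int>(nbins_ * (x - lo_) / (hi_ - lo_));
    return bin > nbins_ ? nbins_ : bin;  // guard against rounding just below hi
  }

  void fill(double x) { counts_[findBin(x)] += 1.0; }
  double binContent(int bin) const { return counts_.at(bin); }
  double binWidth() const { return (hi_ - lo_) / nbins_; }

 private:
  int nbins_;
  double lo_, hi_;
  std::vector<double> counts_;  // nbins + 2 entries including under/overflow
};
```

With these settings an event with MET = 42 GeV lands in bin 9, and any event with MET of 500 GeV or more ends up in the overflow bin rather than being lost.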
A macro like the one above can also be made to compile with ACLiC or using scram. Examples of compiled macros will appear on this twiki. For details on analysis with FWLite, also have a look at this page.
Analysis within the CMSSW framework
In the full framework, in contrast to FWLite, you (or the algorithms you want to use) have full freedom to access the geometry, the magnetic field, databases, etc. To use this mode of analysis, you have to define your analysis class inheriting from EDAnalyzer, for which you can use the mkedanlzr program to create a template (this program is in your path after a cmsenv). See also these details on using an EDAnalyzer.
The downside of working within the CMSSW framework is that you have to work on a computer with the full CMSSW release installed, that you need to recompile with scram after each modification, and that you have to bear with the startup time of CMSSW (although this has improved a lot).
An example of an EDAnalyzer accessing PAT objects can be found on the PAT example twiki.
Analysis with the Starter Kit
The Starter Kit is a tool conceived for fast data exploration and plotting. It has a well-maintained introductory wiki page of its own, with many detailed examples and screenshots. Here we just give a recipe for the impatient user to get some first plots out of the same environment and output as used in the tutorial above.
Compile:
cd $CMSSW_BASE/src
cvs co -r PAT_1610_080229 PhysicsTools/StarterKit
scramv1 b
cd PhysicsTools/StarterKit/test
mv ../../PatAlgos/test/PATLayer1_Output.fromAOD_full.root .
rm -f StarterKitDemo.cfg
wget http://cern.ch/lowette/StarterKitDemo.cfg
and run:
cmsRun StarterKitDemo.cfg 2>&1 | tee skoutput.txt
or in c-like shells:
cmsRun StarterKitDemo.cfg |& tee skoutput.txt
Now you can start browsing the file with histograms:
root -l StarterKit.root
root [1] new TBrowser()
The PAT on the Grid
The PAT simplifies analysis on the grid. You can choose to run either Layer-0, Layer-0 + Layer-1, or the full PAT + analysis in one go. The output of the PAT can easily be controlled by dropping branches and with the built-in Layer-1 event preselection. Your analysis code can then be run quickly and easily offline on the produced output.
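The effect of an event preselection on the output size is easy to estimate. The C++ sketch below uses a hypothetical reduced event holding only jet transverse momenta and a placeholder cut (at least N jets above a pT threshold); it is not the PAT's built-in Layer-1 preselection, whose actual cuts are set in the PAT configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduced event: just the jet transverse momenta in GeV.
struct SimpleEvent {
  std::vector<double> jetPt;
};

// Illustrative preselection: keep the event if it has at least minJets jets
// with pt above ptCut. Cut values are placeholders, not PAT defaults.
bool passesPreselection(const SimpleEvent& ev, std::size_t minJets,
                        double ptCut) {
  std::size_t n = 0;
  for (double pt : ev.jetPt) {
    if (pt > ptCut) ++n;
  }
  return n >= minJets;
}

// Fraction of events that would be written out after the preselection.
double keptFraction(const std::vector<SimpleEvent>& events,
                    std::size_t minJets, double ptCut) {
  if (events.empty()) return 0.0;
  std::size_t kept = 0;
  for (const SimpleEvent& ev : events) {
    if (passesPreselection(ev, minJets, ptCut)) ++kept;
  }
  return static_cast<double>(kept) / events.size();
}
```

Since the output size scales roughly with the kept fraction, even a loose preselection can shrink the files you have to transfer back from the grid.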
To start using the PAT in grid jobs, you can simply submit the example cfg files in the PhysicsTools/PatAlgos/test folder to produce Layer-0 or Layer-1 objects. The only things you might want to change are the replace statements and the event content to be saved. Once that is done, you need to set up a working crab.cfg, for which we link here the CRAB howto.
As an example, you can submit jobs to the grid that run PAT Layer-1 on 5x1000 events of a particular top skim with the following commands (adapt the storage element where you want your output files to arrive!):
cd $CMSSW_BASE/src/PhysicsTools/PatAlgos/test
wget http://cern.ch/lowette/crab.cfg
# now adapt the crab.cfg so the output arrives on your storage element area
crab -create all -submit all
# take some coffee...
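The 5x1000 events above correspond to splitting the sample into five jobs of 1000 events each. The C++ sketch below shows only that splitting arithmetic, with a hypothetical JobRange type; the actual splitting is done by CRAB from the settings in crab.cfg, which are not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration of splitting a sample into jobs; in practice
// CRAB does this from the settings in crab.cfg.
struct JobRange {
  long firstEvent;  // index of the first event processed by the job (0-based)
  long nEvents;     // number of events processed by the job
};

std::vector<JobRange> splitEvents(long totalEvents, long eventsPerJob) {
  assert(totalEvents >= 0 && eventsPerJob > 0);
  std::vector<JobRange> jobs;
  for (long first = 0; first < totalEvents; first += eventsPerJob) {
    long remaining = totalEvents - first;
    // the last job takes whatever is left if the total is not a multiple
    jobs.push_back({first, remaining < eventsPerJob ? remaining : eventsPerJob});
  }
  return jobs;
}
```

For a total that is not a multiple of the job size, the last job simply processes the remainder, so no events are skipped or processed twice.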
What next?
Now it's your turn: time to do physics!
Main Information Sources
PAT main twiki
PAT example twiki
Starter Kit
CMS Offline Workbook
CMS Software Guide
Review status
Responsible: StevenLowette
Last reviewed by: StevenLowette - 6 May 2008