
4.3 Physics Analysis Toolkit Tutorial


Goals of this page

This tutorial will guide you through the steps to set up the PAT and start playing with it. In particular you will:

  • learn how to produce PAT objects
  • have a look inside the AOD and PAT event content
  • look into configurable parameters of the PAT
  • learn how to analyze PAT output with bare ROOT, FWLite and an EDAnalyzer
  • explore PAT files with the Starter Kit
  • get pointers for getting started with PAT on the grid

If you are looking for the latest, greatest prescription, see the PAT software guide.


Getting Started

This tutorial is meant to familiarize you with the first steps with the PAT. The goal is to make a few physics-level plots within about 60 minutes. Some basic knowledge of the PAT design is assumed. If you have not heard about the layered structure of the PAT, it is advisable to first browse through the PAT introduction linked to this page.

Checking out and compiling the PAT code

If you've been following the WorkBook all along, you should have completed this in the StarterKit Tutorial. The recipe itself is linked here:

https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookAnalysisStarterKit#To_set_up_a_new_area_and_run

You can also install a fresh release by moving to your working area and running:

cmsrel CMSSW_2_1_9
cd CMSSW_2_1_9/src
cmsenv

Producing PAT Layer-1 objects

You are now ready to start producing PAT Layer-1 objects. Instead of using the more "high level" tasks in the StarterKit tutorial, we will use the native PAT configuration files for running over AOD samples.

The file we will be looking at is the patLayer1_fromAOD_full.cfg.py.

Have a look at its contents, by doing:

   cd $CMSSW_BASE/src
   cp $CMSSW_RELEASE_BASE/src/PhysicsTools/PatAlgos/test/*.py .
   less patLayer1_fromAOD_full.cfg.py

Be sure to copy the Python files from $CMSSW_RELEASE_BASE/src/PhysicsTools/PatAlgos/test/ into your working directory; we will be using them in the rest of the tutorial.

The main components of this example config file are shown in the figure below.

  • Screenshot of the Layer-1 cfg file:
    scrshot_layer1cfg.jpg

To run the example config file and produce PAT Layer-1 objects from the predefined CSA07 top skim file with AOD event content, do the following in sh-like shells:

   cd $CMSSW_BASE/src
   cmsRun patLayer1_fromAOD_full.cfg.py 2>&1 | tee output.txt 
or in csh-like shells:
   cmsRun patLayer1_fromAOD_full.cfg.py  |& tee output.txt 

In the same folder ($CMSSW_RELEASE_BASE/src/PhysicsTools/PatAlgos/test) you will find many similar example files (now copied into your local directory). These allow you to produce PAT Layer-0 event content from AOD in both fast and full simulation, and from scratch for fast simulation. For the Layer-1 objects, examples are available to run both Layer-0 and Layer-1 from the same inputs, or to produce the Layer-1 objects from Layer-0 fast or full simulation input. As an alternative, you might for instance want to try:

   cmsRun patLayer1_fromScratch_fast.cfg.py 2>&1 | tee output.txt
or in csh-like shells:
   cmsRun patLayer1_fromScratch_fast.cfg.py  |& tee output.txt 
for which you don't need any input data.

When you have run the PAT job, it is useful to have a look at the output:

   less output.txt

One interesting piece of information is the TrigReport, which shows the details of all paths and filters the events went through during processing, and their success/failure rate. It is an easy diagnostic tool to spot potential problems and their location, and it is also a good place to monitor the filter efficiencies. An excerpt of the TrigReport is shown in the Figure below.

Other interesting information can be found higher up in the output, just after the event processing, where the cleaner modules dump their report. For each PATObjectCleaner that was run, you can see a summary and a full breakdown of the success/failure rates of the different bits encoded in the status of the Layer-0 objects.

  • Screenshot of the cleaner summary:
    scrshot_cleanerreport.jpg

Looking at the AOD and PAT Layer-1 event content

It is instructive to look into the event content of the ROOT file that the PAT Layer-1 produces by default. For comparison, however, one should first take a look at the general AOD content. To do so, we will examine one file from the StarterKit example. In order to open the file in ROOT directly, we need to convert the logical file name (LFN: an alias that is the same at every site where the data is stored) to the physical file name (PFN: the actual location of the file at a given site). To do this, we will use EdmFileUtil.

Its usage is

EdmFileUtil
Allowed options:
  -h [ --help ]                print help message
  -f [ --file ] arg            data file (Required)
  -c [ --catalog ] arg         catalog
  -l [ --ls ]                  list file content
  -P [ --print ]               Print all
  -u [ --uuid ]                Print uuid
  -v [ --verbose ]             Verbose printout
  -d [ --decodeLFN ]           Convert LFN to PFN
  -b [ --printBranchDetails ]  Call Print()sc for all branches
  -t [ --tree ] arg            Select tree used with -P and -b options
  --allowRecovery              Allow root to auto-recover corrupted files
  -e [ --events ] arg          Show event ids for events within a range or set 
                               of ranges , e.g., 5-13,30,60-90 

Thus, to find the PFN corresponding to the LFN of the following file, we use the -d option:

EdmFileUtil -d /store/relval/CMSSW_2_1_0/RelValZMM/GEN-SIM-DIGI-RAW-HLTDEBUG-RECO/STARTUP_V4_v1/0000/08D532C8-9C60-DD11-AB1C-000423D99996.root

The output of this at CERN is:

rfio:/castor/cern.ch/cms/store/relval/CMSSW_2_1_0/RelValZMM/GEN-SIM-DIGI-RAW-HLTDEBUG-RECO/STARTUP_V4_v1/0000/08D532C8-9C60-DD11-AB1C-000423D99996.root

The output of this at Fermilab is:

dcap://cmsdca1.fnal.gov:24138/pnfs/fnal.gov/usr/cms/WAX/11/store/relval/CMSSW_2_1_0/RelValZMM/GEN-SIM-DIGI-RAW-HLTDEBUG-RECO/STARTUP_V4_v1/0000/08D532C8-9C60-DD11-AB1C-000423D99996.root

(Note: This example was run on cmslpc at Fermilab; the PFN will be different at other locations.)
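
The two outputs above differ only in the site-specific prefix that is placed in front of the LFN. As a rough illustration of that idea, here is a standalone Python sketch (it does not need CMSSW) that prepends the two prefixes seen above. It is not how EdmFileUtil works internally, since the real tool resolves names through a file catalog; the names SITE_PREFIXES and lfn_to_pfn are hypothetical and were introduced only for this example.

```python
# Site prefixes copied from the two example outputs above.
# Assumption: a plain prefix is enough for illustration; other sites differ.
SITE_PREFIXES = {
    "cern": "rfio:/castor/cern.ch/cms",
    "fnal": "dcap://cmsdca1.fnal.gov:24138/pnfs/fnal.gov/usr/cms/WAX/11",
}


def lfn_to_pfn(lfn, site):
    """Hypothetical helper: prepend a site prefix to a logical file name."""
    if not lfn.startswith("/store/"):
        raise ValueError("expected an LFN starting with /store/: %r" % lfn)
    return SITE_PREFIXES[site] + lfn


lfn = ("/store/relval/CMSSW_2_1_0/RelValZMM/GEN-SIM-DIGI-RAW-HLTDEBUG-RECO/"
       "STARTUP_V4_v1/0000/08D532C8-9C60-DD11-AB1C-000423D99996.root")

for site in ("cern", "fnal"):
    print(site, lfn_to_pfn(lfn, site))
```

For real work, always take the PFN from EdmFileUtil -d at your own site rather than building it by hand.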

We can now examine the contents of this file by opening it in ROOT:

   root -l <output given by EdmFileUtil>

*Be sure to substitute the proper physical file name into the preceding command!*

Have a look at the Events tree (double-click on "ROOT Files", then on the file itself, and select the Events tree). You will see a long list of branches, each containing a data product that is part of the standard CMSSW_2_1_x AOD. Many of these branches depend on each other through associations and references. The screenshot below shows a portion of the full list.

  • Screenshot of the AOD content:
    scrshot_aodcontent.jpg

When one looks into the PAT Layer-1 event content though, one gets a much simplified view:

   root -l PATLayer1_Output.fromScratch_fast.root 
   # and then browse
   root [1] new TBrowser()

A screenshot of the content is shown below. The PAT has dropped a good deal of low-level and expert information from the standard AOD event content. If your analysis needs such information, it is sufficient to add the appropriate "keep" statements to the PoolOutputModule in the PAT cfg. Much of the simplification also comes from the fact that the Layer-1 objects embed related information that the AOD stores externally through references and associations. This makes the interface to these objects much easier for the user, and makes it easy to process PAT Layer-1 objects afterwards, for instance in an event selection.

  • Screenshot of the Layer-1 content:
    scrshot_layer1content.jpg
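
To get a feeling for how "keep" statements select event content, here is a minimal standalone Python sketch (not a CMSSW configuration, and not the framework's actual code). It assumes that the commands are applied in order, that the last matching command wins, and that patterns are shell-style wildcards matched against the full branch name. The function kept_branches is hypothetical, and the branch names are illustrative examples rather than a listing taken from a real file.

```python
from fnmatch import fnmatchcase


def kept_branches(branches, output_commands):
    """Apply 'keep'/'drop' commands in order; the last matching command wins."""
    kept = []
    for branch in branches:
        keep = False  # assumption of this sketch: nothing is kept by default
        for command in output_commands:
            action, pattern = command.split(None, 1)
            if fnmatchcase(branch, pattern):
                keep = (action == "keep")
        if keep:
            kept.append(branch)
    return kept


# Illustrative branch names only (type_label_instance_process).
branches = [
    "recoTracks_generalTracks__RECO",
    "patMuons_selectedLayer1Muons__PAT",
    "patJets_selectedLayer1Jets__PAT",
]

commands = [
    "drop *",
    "keep *_selectedLayer1Muons_*_*",
    "keep *_selectedLayer1Jets_*_*",
]
print(kept_branches(branches, commands))

# Adding one more "keep" statement brings a dropped AOD branch back.
print(kept_branches(branches, commands + ["keep recoTracks_generalTracks_*_*"]))
```

In the real job the equivalent statements go into the output module of the PAT cfg; check the example config files for the exact parameter names.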

Configurable parameters

PAT's design choice was to make everything fully configurable with config files. Although reasonable defaults are provided, the user should, after initial playing, configure the PAT for his/her analysis needs. The PAT cfi & cff include chain is complex though. We are currently attempting to make this easier.

The default cleaning, for example, looks like this:

  • Screenshot of the Layer-0 cleaning flow:
    pythonconfigbrowser-snapshot4.jpg

This figure was produced with the Aachen group's Python Config Browser (currently in beta). It can be obtained by following the instructions here.

Stay tuned for more tools that are being developed to make this process more robust and less error-prone.

Looking at PAT Layer-0 objects (this section can be skipped)

Using the examples, you can produce Layer-0 objects yourself in the same way as you would make Layer-1 objects. To avoid waiting for the job, you can also start from a pre-made file. Produce and browse the file like this:

   cmsRun patLayer0_fromScratch_fast.cfg.py
   root -l PATLayer0_Output.fromScratch_fast.root
   root [1] new TBrowser()

All the branches ending in "_PAT" are produced in the PAT Layer-0 processing step. Many of those are internal technical tricks that are needed for the internal handling of associations. In the screenshot below you can see several of those branches. Again, this is "experts-only" stuff. When you choose to save PAT Layer-0 output and run Layer-1 + your analysis from Layer-0 input, you will not have to care about the Layer-0 internals either.

  • Screenshot of the Layer-0 content:
    scrshot_layer0content.jpg

Examples

We now present several examples of using the PAT in analyses with the CMSSW_2_1_x series.

Bare ROOT

  • There is a worked example of how to do analysis with bare ROOT on the StarterKit twiki here.

FWLite

An example FWLite macro fwlitebatchtest.C is attached, together with a short macro to easily run it (runtest.C). You can retrieve them with:

wget -nc https://twiki.cern.ch/twiki/pub/CMSPublic/WorkBookPATBackUp/fwlitebatchtest.C https://twiki.cern.ch/twiki/pub/CMSPublic/WorkBookPATBackUp/runtest.C

The FWLite macro will loop over the events and plot the size of the jet collection. You can run it with:

root -l -b -q runtest.C
This will compile the macro (with a few harmless warning messages) and run it. ROOT notes: -l prevents ROOT from showing the splash screen, -b makes it run in batch mode (no graphics display) and with -q it will quit the ROOT prompt after execution. It will create a ROOT file fwlitebatchtest.root with a histogram containing the size of the jet collection:

  • Number of jets per event (simple macro output):
    hNumJets.png

Obviously, this contains a lot of jets (more than 40 in some cases), because there is basically no pt cut on the jets. A slightly modified version of the macro (fwlitebatchtest2.C) will require a minimum pt for a jet to be counted. You can retrieve this macro and the script to run it with:

wget -nc https://twiki.cern.ch/twiki/pub/CMSPublic/WorkBookPATBackUp/fwlitebatchtest2.C https://twiki.cern.ch/twiki/pub/CMSPublic/WorkBookPATBackUp/runtest2.C
and run it:
root -l -b -q runtest2.C
The resulting histogram indeed looks more reasonable:

  • Number of jets per event (with a cut on jet pt):
    hNumJets2.png
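
The effect of the pt requirement can be sketched without ROOT or FWLite. The standalone Python snippet below counts jets per event with and without a minimum pt. The jet pt values are made up for illustration, and the 30 GeV threshold is an assumption; it is not necessarily the cut used in fwlitebatchtest2.C. The function count_jets is a hypothetical helper, not part of the attached macros.

```python
def count_jets(jet_pts, min_pt=0.0):
    """Count the jets in one event whose pt exceeds min_pt (in GeV)."""
    return sum(1 for pt in jet_pts if pt > min_pt)


# Made-up jet pt values (GeV) for three events, for illustration only.
events = [
    [85.2, 41.0, 12.3, 6.1, 3.4],
    [55.7, 9.8, 4.2],
    [120.4, 63.9, 31.5, 8.8],
]

for min_pt in (0.0, 30.0):
    counts = [count_jets(jet_pts, min_pt) for jet_pts in events]
    print("min pt = %5.1f GeV -> jets per event: %s" % (min_pt, counts))
```

Without a cut every reconstructed jet is counted; with the cut only the hard jets remain, which is why the second histogram peaks at a much lower multiplicity.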

If you prefer to run the scripts fully in batch (for example on a batch queue), another script (runtestbatch.sh) is attached.

Other FWLite examples

  • There is a worked example of how to do analysis with FWLite on the StarterKit twiki here.

  • There is a slightly more complicated FWLite analysis with the SWGuideCATopTag algorithm here.

Full Framework Example

  • There is a worked example of how to do analysis with the full framework on the StarterKit twiki here. It is also instructive to go through the line-by-line walkthrough of this example here.

  • There is also a tutorial from the top group linked here that shows how to use the full framework along with the TQAF event hypothesis construction.

Configuration Example

  • There is a complex configuration example provided in the boosted top jet algorithm, SWGuideCATopTag, here.

This provides a complicated configuration with a custom jet algorithm, a custom "top tagging algorithm", jet corrections that are not run in the default production, and even a non-standard quark matching routine. The PAT can handle all of this quite well, with a small number of configuration changes and no coding at all!

Links to older software release examples

Here are some older examples with earlier versions of CMSSW:

What next?

Now it's your turn: time to do physics!

Main Information Sources

PAT main twiki
PAT example twiki
Starter Kit
CMS Offline Workbook
CMS Software Guide

Review status

Reviewer/Editor and Date (copy from screen) Comments
KatiLassilaPerini - 18 April 2008 created the template page
StevenLowette - 06 May 2008 filled the page with content
FredericRonga - 13 Jun 2008 user feedback

Responsible: StevenLowette
Last reviewed by: StevenLowette - 6 May 2008
