The TauAnalysis/TauIdEfficiency Package


software administrators: MichalBluj, ChristianVeelken

Introduction

The TauAnalysis/TauIdEfficiency package contains tools for tau commissioning tasks, like measurement of the tau id. efficiency or jet --> tau fake-rates. The package is part of the TauAnalysis subsystem.

The work-flows for measuring the tau id. efficiency and for determining jet --> tau fake-rates are described in the following sections.

Tau id. efficiency measurement

The tau id. efficiency is measured via Tag&Probe. Details of the measurement are documented in AN-11/200 and AN-11/514. The more technical aspects of the tau id. efficiency measurement are described in the following.

Workflow

The workflow for measuring tau id. efficiencies via Tag&Probe consists of 3 steps:

The purpose of the first stage is to select samples of QCD, TTbar, W+jets and Z+jets events from AOD. ROOT files containing the information needed to measure the tau id. efficiency (so-called PAT-tuples) are produced in the second stage. In the third stage, the PAT-tuples are analyzed and histograms are produced. The final plots are made automatically at the end of the third stage.

Preselection of Events (Skimming)

The selection of QCD, W+jets and Z+jets events is done on the grid. You need to set up the crab environment first. A tutorial on how to set up and run crab can be found here.

Once the crab environment is set up, edit the file submitSkimTauIdEffSample_grid.py. The parameters which you need to update are:

  • outputFilePath: directory on castor where AOD files containing the selected events will be stored
  • jobId: version label, can be chosen freely
NOTE: You need to create the directory outputFilePath yourself. It is recommended to set access permissions to 777 (anybody has read and write access).
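For orientation, the edited lines in submitSkimTauIdEffSample_grid.py might look like the sketch below. The variable names mirror the parameter names above and the values are hypothetical placeholders, not values to copy; on castor the directory can be created with rfmkdir and opened up with rfchmod 777.

outputFilePath = '/castor/cern.ch/user/<u>/<username>/tauIdEff/skims/'  # hypothetical path; create it yourself
jobId = '2012May01v1'  # free-form version label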

Now edit the file userRegistry.py. Create a new entry with your username, the jobId which you have chosen in submitSkimTauIdEffSample_grid.py and the keyword 'ZtoMuTau_tauIdEff', following the example for user 'veelken'.
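The structure of such an entry might look roughly like the sketch below; treat it as a guess and copy the exact structure of the existing 'veelken' entry in your checkout instead:

# hypothetical sketch of a userRegistry.py entry; the actual data structure may differ
jobId_mapping = {
    'veelken'         : { 'ZtoMuTau_tauIdEff' : '<jobId chosen by veelken>' },
    '<your_username>' : { 'ZtoMuTau_tauIdEff' : '<your jobId>' }
}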

Once you have updated both files, you can submit the crab jobs by executing

cd $CMSSW_BASE/src/TauAnalysis/Skimming/test/
./submitSkimTauIdEffSample_grid.py

You can monitor the progress of the crab jobs on the CMS dashboard.
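If you prefer the command line, the task status can also be queried with the standard crab status command (replace the argument with the crab working directory of your task):

crab -status -c <crab_working_directory>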

NOTE: It is a good idea to check that the configuration file for preselecting events works before starting to submit crab jobs. As a test, execute:

cmsRun TauAnalysis/Skimming/test/skimTauIdEffSample_cfg.py
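For a quick test it helps to limit the number of processed events. You can do this by adjusting the standard CMSSW maxEvents setting in the config, e.g.:

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(100)  # process only 100 events for the test
)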

Production of PAT-tuples

The production of PAT-tuples is performed using the CERN lxbatch system. A description of the lxbatch system can be found here.

Before submitting any jobs, edit the file submitTauIdEffMeasPATTupleProduction_lxbatch.py. The parameters which you need to update are:

  • outputFilePath: directory on castor where PAT-tuple files will be stored
  • jobId: version label used when submitting the skimming jobs
  • version: version label, can be chosen freely
NOTE: You need to create the directory outputFilePath yourself. It is recommended to set access permissions to 777 (anybody has read and write access).
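As with the skimming script, the edited lines might look like the following sketch (hypothetical values; the variable names are assumed to mirror the parameter names above):

outputFilePath = '/castor/cern.ch/user/<u>/<username>/tauIdEff/patTuples/'  # hypothetical castor path; create it yourself
jobId   = '2012May01v1'  # must repeat the label used when submitting the skimming jobs
version = 'v1'           # free-form version label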

Once you have updated the file, you can submit jobs to the CERN lxbatch system by executing

cd $CMSSW_BASE/src/TauAnalysis/TauIdEfficiency/test/commissioning/
./submitTauIdEffMeasPATTupleProduction_lxbatch.py

You can monitor the progress of the jobs by executing

bjobs -w

NOTE: It is a good idea to check that the PAT-tuple production works before starting to submit the jobs. As a test, execute:

cmsRun TauAnalysis/TauIdEfficiency/test/commissioning/produceTauIdEffMeasPATTuple_cfg.py

Analysis of PAT-tuples via FWLite macro

Once the lxbatch jobs have finished processing, you can start analyzing the PAT-tuples and making the final plots. The analysis of PAT-tuples is performed on a local machine. In order to speed-up processing, copy the PAT-tuples from castor to a local disk first.

The script copyTauIdEffPATtuples.py automates the copy procedure. Before running it, edit the file; the parameters which you need to update are:

  • jobId: version label used when submitting the skimming jobs
  • version: version label used when submitting the PAT-tuple production jobs
  • sourceFilePath: location of ROOT files containing PAT-tuples on castor
  • targetFilePath: directory to store PAT-tuples on local disk
NOTE: You need to create the directory targetFilePath yourself. It is recommended to set access permissions to 777 (anybody has read and write access).
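A sketch of the edited parameters (hypothetical paths; assumed variable names):

jobId          = '2012May01v1'  # label used when submitting the skimming jobs
version        = 'v1'           # label used when submitting the PAT-tuple production jobs
sourceFilePath = '/castor/cern.ch/user/<u>/<username>/tauIdEff/patTuples/'  # hypothetical
targetFilePath = '/data1/<username>/tauIdEff/patTuples/'  # hypothetical local disk; create it yourself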

Start copying the PAT-tuples by executing:

cd $CMSSW_BASE/src/TauAnalysis/TauIdEfficiency/test/commissioning/
./copyTauIdEffPATtuples.py
The script will tell you when all files have finished copying.

To analyze the PAT-tuples and make the final plots, first edit the file runTauIdEffMeasAnalysis.py. The parameters which you need to update are:

  • jobId: version label used when submitting the skimming jobs
  • version: version label used when submitting the PAT-tuple production jobs
  • label: version label for final plots, can be chosen freely
  • inputFilePath: directory in which PAT-tuples are stored on local disk
  • outputFilePath: directory to store temporary ROOT files containing histograms
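A sketch of the edited parameters (hypothetical values; assumed variable names):

jobId          = '2012May01v1'   # label used when submitting the skimming jobs
version        = 'v1'            # label used when submitting the PAT-tuple production jobs
label          = 'plots_v1'      # free-form label for the final plots
inputFilePath  = '/data1/<username>/tauIdEff/patTuples/'   # where copyTauIdEffPATtuples.py put the PAT-tuples
outputFilePath = '/data1/<username>/tauIdEff/histograms/'  # scratch space for temporary ROOT files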
Once you have finished editing, execute
cd $CMSSW_BASE/src/TauAnalysis/TauIdEfficiency/test/
./runTauIdEffMeasAnalysis.py
The script will ask you to execute a 'make' command. The 'make' command starts the FWLite macro which analyzes the PAT-tuples and makes the final plots. A message in the terminal will inform you once all plots have been made.

NOTE: Analyzing the PAT-tuples takes about 12 hours. Make sure that your terminal does not close and your afs token does not expire before the processing has finished.
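One way to protect such a long-running session on lxplus (an assumption about your working environment, not something the package requires) is to work inside a detachable terminal session and renew the Kerberos/afs tokens right before starting:

screen -S tauIdEff  # detachable session; reattach later with 'screen -r tauIdEff'
kinit && aklog      # refresh the Kerberos ticket and the afs token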

The plots will appear in the directory outputFilePath/$version/$runPeriod/plots. There will be versions in PNG, PDF and EPS format.

Checklist

The following checklist summarizes the updates which you typically need to make to configuration files whenever you start measuring the tau id. efficiency for a new data-taking period:

  • submitSkimTauIdEffSample_grid.py: outputFilePath, jobId
  • userRegistry.py: entry for your username with the new jobId
  • submitTauIdEffMeasPATTupleProduction_lxbatch.py: outputFilePath, jobId, version
  • copyTauIdEffPATtuples.py: jobId, version, sourceFilePath, targetFilePath
  • runTauIdEffMeasAnalysis.py: jobId, version, label, inputFilePath, outputFilePath
NOTE: You need to create the outputFilePath and targetFilePath directories yourself. It is recommended to set access permissions to 777 (anybody has read and write access).

Determination of jet --> tau fake-rates

Workflow

The workflow for determining jet --> tau fake-rates consists of 4 steps:

The purpose of the first stage is to select samples of QCD, W+jets and Z+jets events from AOD and to produce ROOT files containing the information needed to determine jet --> tau fake-rates (so-called PAT-tuples). In the second stage, the PAT-tuples are analyzed and histograms are produced. This is done on the CERN lxbatch system, which can process many jobs in parallel. The output of these jobs is collected in the third stage, called "harvesting" of histograms. The final plots are made in the fourth stage.

Production of PAT-tuples

The selection of QCD, W+jets and Z+jets events and the production of PAT-tuples containing the selected events are done on the grid. You need to set up the crab environment first. A tutorial on how to set up and run crab can be found here.

Once the crab environment is set up, edit the file submitCommissioningPATTupleProductionJobs_grid.py. The parameters which you need to update are:

  • castorFilePath: directory on castor where PAT-tuple files will be stored
  • crabFilePath: local directory on machine on which submitCommissioningPATTupleProductionJobs_grid.py is executed
  • version: version label, can be chosen freely
NOTE: You need to create the directories castorFilePath and crabFilePath yourself. It is recommended to set access permissions to 777 (anybody has read and write access).
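A sketch of the edited parameters (hypothetical paths; assumed variable names):

castorFilePath = '/castor/cern.ch/user/<u>/<username>/tauFakeRate/patTuples/'  # hypothetical castor path; create it yourself
crabFilePath   = '/data1/<username>/tauFakeRate/crab/'  # hypothetical local directory; create it yourself
version        = 'v1'  # free-form version label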

Once you have updated the file, you can submit the crab jobs by executing

cd $CMSSW_BASE/src/TauAnalysis/TauIdEfficiency/test/commissioning/
./submitCommissioningPATTupleProductionJobs_grid.py

You can monitor the progress of the crab jobs on the CMS dashboard.

NOTE: It is a good idea to check that the PAT-tuple production works before starting to submit crab jobs. As a test, execute:

cmsRun TauAnalysis/TauIdEfficiency/test/commissioning/produceCommissioningWplusJetsEnrichedPATTuple_cfg.py

Analysis of PAT-tuples via FWLite macro

Once the majority of crab jobs have finished processing, you can start analyzing the PAT-tuples. The analysis of the PAT-tuples is performed in parallel jobs for the QCD, W+jets and Z+jets selections and for data and Monte Carlo, using the CERN batch system. Click here for an introduction on how to use the CERN batch system within CMSSW.

Before you can start submitting the analysis jobs, you need to make a few modifications to the config file runTauFakeRateAnalysis.py. The parameters which you need to update are:

  • version: version label as defined in submitCommissioningPATTupleProductionJobs_grid.py
  • inputFilePath: directory in which PAT-tuples are stored on castor. Needs to match castorFilePath defined in submitCommissioningPATTupleProductionJobs_grid.py
  • harvestingFilePath: directory on castor where ROOT files containing histograms are stored. Please choose a directory different from inputFilePath
  • outputFilePath: local directory on machine on which runTauFakeRateAnalysis.py is executed.
  • runPeriod: either "2011RunA" or "2011RunB"
  • runFWLiteTauFakeRateAnalyzer: set this flag to True
  • runLXBatchHarvesting: set this flag to False
  • runMakePlots: set this flag to False
NOTE: You need to create the harvestingFilePath and outputFilePath directories yourself. It is recommended to set access permissions to 777 (anybody has read and write access).
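A sketch of the edited parameters for this first pass (hypothetical paths; assumed variable names; the three run* flags get flipped in the harvesting and plotting passes described below):

version            = 'v1'  # as defined in submitCommissioningPATTupleProductionJobs_grid.py
inputFilePath      = '/castor/cern.ch/user/<u>/<username>/tauFakeRate/patTuples/'   # = castorFilePath from the production step
harvestingFilePath = '/castor/cern.ch/user/<u>/<username>/tauFakeRate/harvesting/'  # must differ from inputFilePath
outputFilePath     = '/data1/<username>/tauFakeRate/'  # local directory on this machine
runPeriod          = '2011RunA'  # or '2011RunB'
runFWLiteTauFakeRateAnalyzer = True   # stage 1: analyze the PAT-tuples
runLXBatchHarvesting         = False
runMakePlots                 = False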

Once you have updated this file, you can start submitting the PAT-tuple analysis jobs to lxbatch. Execute:

./runTauFakeRateAnalysis.py
The script will check which PAT-tuples exist on castor and build a Makefile for submitting the analysis jobs to the CERN batch system. After the script has run, it will tell you which command you need to execute in order to start the job submission.

You can monitor the progress of the analysis jobs by executing

bjobs -w

Harvesting of histograms

After all analysis jobs have finished processing ('bjobs -w' does not show any running jobs), you can start harvesting the ROOT files which contain the histograms.

Edit runTauFakeRateAnalysis.py and change:

  • runFWLiteTauFakeRateAnalyzer: set this flag to False
  • runLXBatchHarvesting: set this flag to True
  • runMakePlots: set this flag to False

Then execute

./runTauFakeRateAnalysis.py
and execute the command which the script tells you to run.

The harvesting will run on the CERN batch system. To monitor the progress of the jobs, execute

bjobs -w
again.

Making the final plots

Once all harvesting jobs have finished, you can proceed with making the final plots. Edit runTauFakeRateAnalysis.py and update:

  • runFWLiteTauFakeRateAnalyzer: set this flag to False
  • runLXBatchHarvesting: set this flag to False
  • runMakePlots: set this flag to True

Then execute

./runTauFakeRateAnalysis.py
and execute the command which the script tells you to run.

The plots will appear in the directory outputFilePath/$version/$runPeriod/plots. There will be versions in PNG, PDF and EPS format.

-- ChristianVeelken - 02-Feb-2012
