Submitting Toy Monte Carlo using CRAB

Introduction

Running Toy Monte Carlo experiments to study fit properties can be very CPU-intensive. Submitting Toy Monte Carlo productions to the GRID or to LSF with CRAB distributes the CPU load and reduces the time the user waits for results.

Below we describe how to use CRAB to submit Toy Monte Carlo productions with a simple example.

Software package and release

The example below runs under release CMSSW_3_1_0_pre10 and is released in the package PhysicsTools/RooStatsCms.

You can check out the code and get ready to run it using the recipe below:

cmsrel CMSSW_3_1_0_pre10
cd CMSSW_3_1_0_pre10/src
cmsenv
addpkg PhysicsTools/RooStatsCms V??-??-??
cd PhysicsTools/RooStatsCms
scram b 
cd test
ln -s $CMSSW_BASE/test/slc4_ia32_gcc345/testCrabToyMC .
cp testCrabToyMC.cfg crab.cfg

A simple Toy Monte Carlo

The application we want to test is a simple fit program that fits a Gaussian signal on top of an exponential background. The program accepts the random number seed as a command-line argument, so that it can be driven externally by CRAB.

The C++ code is available below:

The seed can be passed through the option -s, as can be seen running the program with the help option, -h:

> ./testCrabToyMC -h
./testCrabToyMC [options] :
  -h [ --help ]         produce help message
  -s [ --seed ] arg     random generator seed

CRAB job driver script

A shell script is the interface between CRAB and each of the jobs into which the full Toy Monte Carlo production is split. The script receives two main inputs from CRAB:

  • a job index, passed as argument #1 (=$1)
  • the parameter $MaxEvents, specifying how many events should be processed by that job. In our case, an event corresponds to a single toy experiment.

When running the driver script locally, which is helpful for debugging, the parameter $MaxEvents can also be set by passing an optional second argument, as shown when running the script with the help option:

> ./testCrabToyMC.sh help
usage: testCrabToyMC.sh <job index> [<max events>]

Internally, the script runs the Toy Monte Carlo $MaxEvents times and uses the global event number as the random seed. The event number runs from $j to $jmax, computed from the job index $i and the number of events per job $n ($MaxEvents) as follows:

j=`expr \( $i - 1 \) \* $n + 1`
jmax=`expr $j + $n - 1`

With this approach each toy experiment is seeded by its global event number, so the set of seeds, and therefore the generated toys, does not depend on how CRAB splits the jobs. The Toy Monte Carlo production stays reproducible even with different CRAB splitting configurations.
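The seed bookkeeping described above can be checked with a short sketch. The Python below is not part of the package: it only mirrors the two expr formulae, assuming that $i is the 1-based job index and $n the number of events per job ($MaxEvents). The function names seed_range and all_seeds are hypothetical.

```python
def seed_range(job_index, events_per_job):
    """Return (j, jmax), mirroring the expr formulae of the driver script.

    Assumption: job_index is the 1-based CRAB job index ($i) and
    events_per_job is the number of toys per job ($n).
    """
    j = (job_index - 1) * events_per_job + 1
    jmax = j + events_per_job - 1
    return j, jmax


def all_seeds(number_of_jobs, events_per_job):
    """Collect, in order, the seeds used by every job of a production."""
    seeds = []
    for i in range(1, number_of_jobs + 1):
        j, jmax = seed_range(i, events_per_job)
        seeds.extend(range(j, jmax + 1))
    return seeds


# 100 toys split as 10 jobs x 10 events or as 5 jobs x 20 events
# use the same seeds, 1 to 100.
print(seed_range(2, 10))
print(all_seeds(10, 10) == all_seeds(5, 20))
```

With 10 events per job, job 2 covers seeds 11 to 20, and both splittings of 100 toys yield the same list of seeds.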

There are two main outputs of each job:

  • the file that contains the fit results, fitResults.txt. Each line contains the fit results for a single experiment (event).
  • a directory that contains logs and plots for each single experiment, outputToy, which is tarred and compressed at the end of the job so that it can be retrieved via CRAB.

The two outputs receive a job index suffix, so they become fitResults_1.txt, fitResults_2.txt, etc., and likewise for outputToy.tgz.

The shell source code of the driver script is available below:

CRAB job configuration

CRAB job submission is configured with the file testCrabToyMC.cfg. The configuration specifies the driver script:

script_exe = testCrabToyMC.sh

the executable, as additional input:

additional_input_files = testCrabToyMC

the outputs of each job:

output_file = fitResults.txt, outputToy.tgz

and the way the Monte Carlo production is split:

total_number_of_events=100
number_of_jobs=10

Switching from the GRID scheduler (glite) to the LSF scheduler lets the production run on a local LSF queue, for example on lxplus, which may be faster during a testing phase.

#scheduler = glite
scheduler = lsf

Optional script arguments can be passed using the following configuration fragment:

script_arguments = arg1,arg2,arg3,...

The complete CRAB configuration file is available below:

Submitting the jobs via CRAB

CRAB submission can now be performed as for any other CMSSW analysis application:

  • create your jobs:
crab -create

  • submit your jobs:
crab -submit

  • monitor your jobs' status:
crab -status

  • after jobs are finished, retrieve the output:
crab -getoutput

The output consists of many files named fitResults_N.txt and outputToy_N.tgz, where N is the job index. The fitResults_N.txt files should be concatenated and used for the analysis of the Toy Monte Carlo (computing pulls, etc.). The outputToy_N.tgz files can be uncompressed to obtain the logs and plots of each single fit in the directory outputToy.
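The concatenation step could be done along the following lines. This is a minimal sketch, not a tool shipped with the package: it assumes the retrieved files sit in one directory and follow the fitResults_<job index>.txt naming described earlier. The merged file name fitResults_all.txt and the function names are hypothetical.

```python
import glob
import os
import re


def job_index(path):
    """Extract the job index from a name such as fitResults_12.txt."""
    match = re.search(r"_(\d+)\.txt$", os.path.basename(path))
    return int(match.group(1)) if match else -1


def concatenate_fit_results(directory, merged_name="fitResults_all.txt"):
    """Merge every fitResults_<index>.txt in directory, in job order.

    merged_name is a hypothetical output name, not one used by the package.
    Returns the number of lines (toy experiments) written.
    """
    pattern = os.path.join(directory, "fitResults_*.txt")
    merged_path = os.path.join(directory, merged_name)
    # Keep only files with a numeric suffix; this also skips the merged
    # file itself when the function is run a second time.
    inputs = [p for p in glob.glob(pattern) if job_index(p) >= 0]
    inputs.sort(key=job_index)
    n_lines = 0
    with open(merged_path, "w") as merged:
        for path in inputs:
            with open(path) as part:
                for line in part:
                    if line.strip():
                        merged.write(line.rstrip("\n") + "\n")
                        n_lines += 1
    return n_lines
```

Sorting on the numeric job index rather than on the file name keeps fitResults_10.txt after fitResults_2.txt, so the merged lines follow the seed order.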

Example: fit results inspection

It can be interesting to study the fitted values of the model parameters, i.e. the exponential slope, the Gaussian mean and sigma, and the signal and background yields. The Python program testCrabToyMC_summary.py can be used to inspect the logs in the outputToy_N.tgz archives (N being the job index) and produce plots of the fitted values of all parameters and of the corresponding pulls. The usage is simple:
testCrabToyMC_summary.py dir1 dir2 dir3 ... dirN
where dirX is a directory in which the .tgz files are stored. After execution, PNG images are created for the parameter values and pulls (see the attachments at the end of the page), and the displayed histograms are saved to a ROOT file.
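For reference, the pull of a fitted parameter is conventionally defined as (fitted value - true value) / fitted uncertainty; for an unbiased fit with correctly estimated errors, the pulls have mean close to 0 and width close to 1. The sketch below illustrates that definition only. It is not the code of testCrabToyMC_summary.py, and since the layout of the fit results is not described here, it takes plain lists of values and errors as input. The function names are hypothetical.

```python
import math


def pulls(fitted_values, fitted_errors, true_value):
    """Pull of each toy: (fitted - true) / error."""
    return [(v - true_value) / e for v, e in zip(fitted_values, fitted_errors)]


def mean_and_width(values):
    """Sample mean and standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, math.sqrt(variance)


# Made-up numbers for illustration, not results from the example.
example = pulls([9.0, 10.0, 11.0], [1.0, 1.0, 1.0], true_value=10.0)
print(mean_and_width(example))
```

A pull mean far from 0 points to a biased estimate, while a width far from 1 points to over- or underestimated errors.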

Topic attachments

  • b_h.png (8.5 K): Background yield
  • b_h_pull.png (8.5 K): Background yield pull
  • lambda_h.png (8.7 K): Exponential slope
  • lambda_h_pull.png (7.9 K): Exponential slope pull
  • mu_h.png (8.5 K): Gaussian mean
  • mu_h_pull.png (7.9 K): Gaussian mean pull
  • s_h.png (8.7 K): Signal yield
  • s_h_pull.png (8.5 K): Signal yield pull
  • sigma_h.png (7.7 K): Gaussian sigma
  • sigma_h_pull.png (7.9 K): Gaussian sigma pull

All attachments are revision r1, uploaded on 2009-06-23 by DaniloPiparo.
Topic revision: r5 - 2009-06-23 - DaniloPiparo
 