9.6 How to Pick Events (Interactive and CRAB3)
Goals of this page:
The goal of this page is to help users get a copy of a subset of events from a dataset. The utility is a Python script called edmPickEvents.py. It can run an interactive job for a small number of events (an additional command-line option lets you pick more than 20 events when running interactively), or it can set up a CRAB job. As a recent improvement, the script lets you access events from data that is not local by using xrootd and CRAB3.
If you plan on making very large "skims" (e.g., all W candidates), please consider collaborating with others so as to minimize the number of identical collections we have.
How to setup the environment to run edmPickEvents.py
Note that edmPickEvents.py is broken in CMSSW versions before 8_3_0. Please use a recent CMSSW release.
The script edmPickEvents.py is part of the PhysicsTools/Utilities package. The version that uses DAS instead of the phased-out DBS2 is integrated into CMSSW_5_3_18 and later. The version that uses CRAB3 instead of CRAB2 is integrated into CMSSW_5_3_29 and later (within the CMSSW_5_3_X release cycle) and into CMSSW_7_4_7 and later.
ssh lxplus.cern.ch
cmsrel CMSSW_10_2_14
cd CMSSW_10_2_14/src/
cmsenv
How to Run edmPickEvents.py
This script does not work on simulation because of a limitation in the DAS client (see this thread).
The script edmPickEvents.py can be run interactively or via a CRAB job. If you have only a few events to pick, run it interactively. You can pick more than 20 events interactively by using the --maxEventsInteractive command-line option:
edmPickEvents.py --maxEventsInteractive=30 "/Charmonium/Run2018C-17Sep2018-v1/AOD" events.txt
If you have a lot of events to pick, submit a CRAB job. This version of the script is compatible with CRAB3 and produces a crabConfig file that can be used directly to submit the jobs.
You can change some of the CRAB parameters if needed (for example, the site where the output files are stored or the number of lumi sections).
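The choice between the plain interactive call, the --maxEventsInteractive option and the --crab option can be sketched in a few lines of Python. This is only an illustration, not part of edmPickEvents.py: the helper names count_events and build_pick_command are hypothetical, and the threshold of 20 is taken from the text above rather than from the script itself. The flags are the ones shown on this page.

```python
import shlex

# Assumption: 20 is the interactive limit mentioned on this page.
DEFAULT_INTERACTIVE_LIMIT = 20


def count_events(lines):
    """Count the non-blank lines of an events file (one Run:Lumi:Event per line)."""
    return sum(1 for line in lines if line.strip())


def build_pick_command(dataset, events_file, n_events, use_crab=False):
    """Return the edmPickEvents.py argument list suited to n_events (hypothetical helper)."""
    cmd = ["edmPickEvents.py"]
    if use_crab:
        # Let the script write a CRAB3 configuration instead of running locally.
        cmd += [dataset, events_file, "--crab"]
    elif n_events > DEFAULT_INTERACTIVE_LIMIT:
        # Raise the interactive limit explicitly.
        cmd += ["--maxEventsInteractive=%d" % n_events, dataset, events_file]
    else:
        cmd += [dataset, events_file]
    return cmd


if __name__ == "__main__":
    args = build_pick_command(
        "/Charmonium/Run2018C-17Sep2018-v1/AOD", "events.txt", n_events=30
    )
    # Print the command line instead of executing it.
    print(" ".join(shlex.quote(a) for a in args))
```

The sketch only builds the command line; running it still requires the CMSSW environment set up as described earlier.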
Run edmPickEvents.py Interactively
To run the script interactively, do the following, for example:
edmPickEvents.py "/Charmonium/Run2018C-17Sep2018-v1/AOD" 319337:60:30203079
where
- /Charmonium/Run2018C-17Sep2018-v1/AOD is the dataset you want to pick events from. You can only use one dataset per job.
- 319337:60:30203079 is one event you want to pick. The syntax is Run:Lumi:Event. You can pick more than one event by separating the events with a comma: Run1:Lumi1:Event1, Run2:Lumi2:Event2.
or
edmPickEvents.py "/Charmonium/Run2018C-17Sep2018-v1/AOD" events.txt
where
- events.txt is a text file that contains 319337:60:30203079 (and other events if desired, one per line).
The screen output looks like this:
edmCopyPickMerge outputFile=pickevents.root \
eventsToProcess=319337:60:30203079 \
inputFiles=/store/data/Run2018C/Charmonium/AOD/17Sep2018-v1/60000/FFE6BEB2-BE6C-5A4F-A6D0-41999FDE5942.root
In this case, you can either paste the edmCopyPickMerge command printed above into your shell, or run edmPickEvents.py with the --runInteractive flag, which will run it for you (warning: this can take a long time). This creates a ROOT file called pickevents.root in the directory from which you executed the command. Also note that the edmCopyPickMerge script locates the edmPickEvents.py configuration file and then uses it with cmsRun.
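Since a malformed entry is only noticed once the job runs, it can help to check the Run:Lumi:Event syntax before calling the script. The following is a minimal sketch under stated assumptions: the function names parse_event, parse_event_list and format_events_file are hypothetical, and the checks cover only the syntax described in this section (three non-negative integers separated by colons, with events separated by commas or given one per line).

```python
import re

# Run:Lumi:Event, three non-negative integers separated by colons.
EVENT_RE = re.compile(r"^(\d+):(\d+):(\d+)$")


def parse_event(text):
    """Parse one 'Run:Lumi:Event' string into a tuple of three integers."""
    match = EVENT_RE.match(text.strip())
    if match is None:
        raise ValueError("not in Run:Lumi:Event form: %r" % text)
    return tuple(int(group) for group in match.groups())


def parse_event_list(text):
    """Parse events separated by commas (command line) or newlines (events file)."""
    tokens = [token for token in re.split(r"[,\n]", text) if token.strip()]
    return [parse_event(token) for token in tokens]


def format_events_file(events):
    """Render events one per line, as expected in a file such as events.txt."""
    return "".join("%d:%d:%d\n" % event for event in events)


if __name__ == "__main__":
    events = parse_event_list("319337:60:30203079")
    print(format_events_file(events), end="")
```

Writing the result of format_events_file to events.txt gives a file that can be passed as the second argument in the examples above.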
Run edmPickEvents.py with CRAB
BEWARE: the CRAB configuration that edmPickEvents.py writes out might still be for CRAB2 if you use releases before CMSSW_7_4_7!
If you are running over a large number of events, or if you just don't want to wait for a long job to finish, you can use edmPickEvents.py to set up a CRAB job for you.
To run a CRAB job:
- First set up the CRAB environment following the instructions at SWGuideCrab.
- Then run the script as follows (in this case we are running the events listed in events.txt via a CRAB job):
edmPickEvents.py "/Charmonium/Run2018C-17Sep2018-v1/AOD" events.txt --crab
When you run this, it gives the following screen output:
Please visit CRAB twiki for instructions on how to setup environment for CRAB:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideCrab
Setup your environment for CRAB and edit pickevents_crab.py to make any desired changed. Then run:
crab submit -c pickevents_crab.py
This creates the CRAB configuration file pickevents_crab.py for you.
- If desired, you can modify the contents of pickevents_crab.py. Most people should find the defaults sufficient.
- Nota bene: if you use a CMSSW version earlier than CMSSW_10_4_X, you will need to enable this option in pickevents_crab.py: config.JobType.allowUndistributedCMSSW = True
- Then submit the job:
crab submit -c pickevents_crab.py
- CRAB creates a working directory whose name is built from the current date and time.
- You can check the status of your job via crab status. At the time this example was run, the command was:
crab status -d crab_pickevents_20201029_184718/crab_pickEvents
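Once the CRAB job finishes successfully, you can retrieve the output ROOT files (CRAB3 copies them to the results subdirectory of the CRAB project directory) with:
crab getoutput -d crab_pickevents_20201029_184718/crab_pickEvents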
To merge the output of CRAB, you can use the following commands:
ls -1 pick*root | perl -ne 'print "file:$_"' > myFiles.txt
edmCopyPickMerge inputFiles_load=myFiles.txt outputFile=pickevents_merged.root maxSize=1000000
where maxSize=1000000 means 1 million kB (about 1 GB).
(In cmsRun, local files must be given with the prefix file:.)
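For those who prefer not to use the perl one-liner, the file list can also be written with a short Python sketch. The function name write_file_list and its parameters are hypothetical; the sketch simply mirrors the ls and perl pipeline above by collecting the files matching pick*root in one directory and writing each name with the file: prefix.

```python
import glob
import os


def write_file_list(directory=".", pattern="pick*root", list_name="myFiles.txt"):
    """Write 'file:<name>' lines for every match of pattern in directory.

    Returns the entries that were written (hypothetical helper).
    """
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    # Keep names relative to the directory, as 'ls -1 pick*root' does.
    entries = ["file:%s" % os.path.basename(path) for path in paths]
    with open(os.path.join(directory, list_name), "w") as handle:
        for entry in entries:
            handle.write(entry + "\n")
    return entries


if __name__ == "__main__":
    for entry in write_file_list():
        print(entry)
```

The resulting myFiles.txt can be passed to edmCopyPickMerge through inputFiles_load as shown above. As with the shell version, the pattern also matches a previously merged file such as pickevents_merged.root if it sits in the same directory, so it may be worth removing or renaming it first.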
Review Status
-- SudhirMalik - 09-Sep-2010 Added instructions for CMSSW_7_4_X compatible with CRAB3