4.2.4.3 Exercise 04: How to create a PAT Tuple via crab
This exercise runs a CRAB job to create a pat::Tuple from the AOD data.
BEWARE THIS PAGE IS WAY OBSOLETE
BEWARE. THIS PAGE NEEDS TO BE UPDATED. DO NOT BLINDLY FOLLOW INSTRUCTIONS BELOW.
IN PARTICULAR THE FOLLOWING TOOLS OR PROCEDURE USED BELOW ARE DEPRECATED OR SIMPLY NOT WORKING:
- CRAB2 : USE CRAB3 INSTEAD
- SL5: USE SL6 AND AN UP-TO-DATE SCRAM ARCH AND CMSSW RELEASE
- DBS CLI: USE DAS INSTEAD
- DBS2 URLs: USE DBS3
- CASTOR: USE EOS
Introduction
This exercise does not teach the details of the Grid, nor is it a CRAB tutorial. For a CRAB tutorial you should refer to
WorkBookCRAB2Tutorial. A complete guide to CRAB is at
SWGuideCrab. This exercise runs a CRAB job to create a PAT tuple from (skimmed) data and is part of the PAT tutorial exercises. The purpose of these exercises is to show PAT users how they can use Grid tools to create their PAT tuples for CMS analysis.
Having your storage space set up may take several days, Grid jobs run with some latency, and there can be problems.
You should set aside about a week to complete these exercises. The actual effort required is not a whole week but a few hours; the rest is latency. For
CRAB questions unrelated to this twiki, please use the links
CRAB2 FAQ and
CRAB3.
Pre-requisites for these exercises
To perform this set of exercises you need a Grid Certificate, CMS VO membership and SiteDB registration. It is also assumed that you have a CERN account. You must have a space to write your output files to, say castor. All lxplus users have an account on castor where they can write their CRAB job output.
Obtain a CERN account
- Use the following link for a CMS CERN account: CMS CERN account
- A CERN account is needed, for example, to log in to any e-learning web site, or to obtain a file from the afs area. A CERN account will be needed for future exercises.
- Obtaining a CERN account can be time-consuming. To expedite the process please ask the relevant institutional team leader to perform the necessary "signing" after the online form has been submitted and received for initial processing by the secretariat.
Obtain a Grid Certificate, CMS VO membership and SiteDB registration
- A Grid Certificate, CMS VO membership and SiteDB registration will be needed for the next set of exercises. The registration process can be time-consuming (actions by several people are required), so it is important to start it as soon as possible. There are three main requirements, which can be simply summarized: a certificate ensures that you are who you claim to be; a registration in the VO recognizes you (identified by your certificate) as a member of CMS; SiteDB is a database and web interface that CMS uses to track sites, also used by the CRAB publication step to find out the hypernews username of a person from their Grid Certificate's DN (Distinguished Name). Please look at Get Your Grid Certificate, CMSVO and SiteDB registration to complete these three steps. All three steps must be completed before you can successfully submit jobs on the Grid.
NOTE:
Legend of colors for this tutorial:
GRAY background for the commands to execute (cut&paste)
GREEN background for the output sample of the executed commands
BLUE background for the configuration files (cut&paste)
Step 1 - Setup a release, setup CRAB environment and verify your grid certificate is OK
To set up a CMSSW release for our purpose, we will execute the following instructions:
Log in to lxplus.cern.ch and go to your working area (preferably the scratch0 area, for example /afs/cern.ch/user/m/malik/scratch0/). Then execute the following commands. We will refer to /afs/cern.ch/user/m/malik/scratch0/ as YOURWORKINGAREA.
Create a directory for this exercise (to avoid interference with code from the other exercises).
mkdir exercise04
cd exercise04
Create a local release area and enter it.
setenv SCRAM_ARCH slc6_amd64_gcc491
cmsrel CMSSW_7_4_1_patch4
cd CMSSW_7_4_1_patch4/src
Set up the environment.
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh
cmsenv
More on setting up the local environment and preparing user analysis code is given
here.
To set up CRAB, execute the following command. The explanation of this command is given
here.
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh
To verify your certificate: it is assumed that you have successfully installed and followed the above setup instructions. Now verify that your Grid certificate has all the information needed.
Initialize your proxy
voms-proxy-init -voms cms
You should see the following output:
Enter GRID pass phrase:
Your identity: /DC=org/DC=doegrids/OU=People/CN=Sudhir Malik 503653
Creating temporary proxy .......................... Done
Contacting lcg-voms.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "cms" Done
Creating proxy ................... Done
Your proxy is valid until Fri Jul 1 22:41:34 2011
Now run the following command
voms-proxy-info -all | grep -Ei "role|subject"
The output should look like this:
subject : /DC=org/DC=doegrids/OU=People/CN=Sudhir Malik 503653/CN=proxy
subject : /DC=org/DC=doegrids/OU=People/CN=Sudhir Malik 503653
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/uscms/Role=NULL/Capability=NULL
Step 2 - Know your data set name in the DBS
The dataset name is needed to run the CRAB job.
In this exercise, we will use the dataset /Jet/Run2011B-PromptReco-v1/AOD.
I picked this dataset from the twiki
PhysicsPrimaryDatasets. Further, this dataset is also used in the script
PhysicsTools/PatExamples/test/patTuple_data_cfg.py
, a standard script provided by the PAT group to work with collision data, and hence I picked it.
Before moving further, it would be interesting to see how to find this dataset in DAS.
To do this, open DAS in a browser. In the ADVANCED KEYWORD SEARCH box, type the following:
dataset dataset=/Jet/Run2011B*
A page pops up (after a few seconds). There are several datasets listed on this page. The one we use is listed as
/Jet/Run2011B-PromptReco-v1/AOD
You can also make use of the DBS Command Line Interpreter (CLI):
- Having set your CMSSW environment invoke the dbs command:
cd exercise04/CMSSW_7_4_1_patch4/src
cmsenv
dbs search --query "find dataset where dataset=/Jet/Run2011B*"
The result below confirms what you already obtained with the web interface:
Using DBS instance at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
-------------------------------------------------------
dataset
/Jet/Run2011B-v1/RAW
/Jet/Run2011B-raw-19Oct2011-HLTTest-v1/USER
/Jet/Run2011B-PromptReco-v1/RECO
/Jet/Run2011B-PromptReco-v1/DQM
/Jet/Run2011B-PromptReco-v1/AOD
/Jet/Run2011B-LogError-PromptSkim-v1/RAW-RECO
/Jet/Run2011B-HighMET-PromptSkim-v1/RAW-RECO
/Jet/Run2011B-19Oct2011-HLTTest-v1/USER
/Jet/Run2011B-19Oct2011-HLTTest-v1/DQM
/Jet/Run2011B-15Nov2011-HiggsCert-v1/DQM
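All the dataset names listed above follow the CMS convention /PrimaryDataset/ProcessingString/DataTier. As a side illustration (plain Python, not a CMS tool; the helper name is made up), such a name can be picked apart like this:

```python
# Sketch: split a CMS dataset path into its three conventional parts.
# The function name is illustrative, not part of any CMS tool.

def split_dataset(name):
    """Return (primary_dataset, processing_string, data_tier) for a
    dataset path of the form /Primary/Processing/Tier."""
    parts = name.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("not a /Primary/Processing/Tier path: %s" % name)
    return tuple(parts)

primary, processing, tier = split_dataset("/Jet/Run2011B-PromptReco-v1/AOD")
print(primary, processing, tier)  # Jet Run2011B-PromptReco-v1 AOD
```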
Step 3 - Run your PATtuple config file locally to make sure it works
First copy the python script patTuple_data_cfg.py to your local area. To do this, execute the following command:
cvs co -r CMSSW_7_4_1_patch4 PhysicsTools/PatExamples/test/patTuple_data_cfg.py
Before executing the python script patTuple_data_cfg.py, we need to edit it and make
three modifications. If you are daring, try to run the config file
without and
with the modifications one by one and see the error messages you might get. The modifications are:
1.
Change the line for global tag
process.GlobalTag.globaltag = 'GR_R_42_V12::All'
to
process.GlobalTag.globaltag = 'GR_R_44_V11::All'
because we need to use the appropriate global tag. More on global tags can be found at
SWGuideFrontierConditions.
2.
Change
inputJetCorrLabel = ('AK5PF', ['L1Offset', 'L2Relative', 'L3Absolute', 'L2L3Residual'])
to
inputJetCorrLabel = ('AK5PF', [])#NO jet energy correction yet
because I chose not to use any jet correction.
3.
Replace all the input data files with the following:
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/786/CA31E47B-1AA2-E011-B432-003048F1C58C.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/786/48B98498-02A2-E011-96B6-BCAEC5329732.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/785/6E3DC514-C4A1-E011-BBC5-0030487CD6F2.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/784/B01C8DF5-03A2-E011-A186-BCAEC518FF68.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/754/8027445A-D6A1-E011-91DE-0030487CD77E.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/746/C2ADFBCB-A3A1-E011-97DA-003048D37560.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/746/A49A3BEE-E0A1-E011-AB60-BCAEC532970F.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/746/90E492BD-DEA1-E011-8D48-003048F110BE.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/746/2E7786CA-A3A1-E011-9241-BCAEC532970D.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/746/1A134FCA-A3A1-E011-BAC3-BCAEC5329702.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/740/F6DB1AFB-7AA1-E011-95F6-003048F11DE2.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/740/E46112AF-74A1-E011-ABD5-BCAEC5329721.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/715/1852D1D3-26A1-E011-899C-001D09F28EC1.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/696/548BA9E0-26A1-E011-AD09-003048F118C4.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/676/AA128CD7-3DA1-E011-938E-001D09F34488.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/676/A4562376-47A1-E011-AF79-0030487CBD0A.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/676/809361D8-3DA1-E011-ACCF-001D09F2423B.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/676/383D553B-3FA1-E011-BA14-003048D2BC42.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/676/04437321-50A1-E011-B51A-001D09F29169.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/F82CB1ED-EAA0-E011-9CA9-001D09F28EC1.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/ECD4DADB-DCA0-E011-980E-0019B9F72D71.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/EC4F6380-16A1-E011-8A26-003048D2BDD8.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/CE809180-16A1-E011-A5DF-0030486780A8.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/90F2050B-DAA0-E011-A767-001D09F2441B.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/74536EF9-DEA0-E011-892A-001D09F2905B.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/601C1E07-F9A0-E011-AA76-001D09F295FB.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/5C09D510-3BA1-E011-A20F-001D09F254CE.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/4A48B4DB-DCA0-E011-8026-001D09F244BB.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/24B51FC9-FBA0-E011-AAE0-003048F1182E.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/22D01B9C-18A1-E011-9F0B-0030486780A8.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/1AA90303-F9A0-E011-98E5-003048F11C5C.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/675/0C6593C3-E1A0-E011-9D48-001D09F28EC1.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/674/EADA254E-A7A1-E011-AE17-BCAEC518FF44.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/674/940EB3F6-CBA0-E011-B453-0019B9F72BFF.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/674/80BBE492-ABA0-E011-942D-001D09F34488.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/674/602510F6-A5A0-E011-82E5-001D09F29114.root',
'/store/data/Run2011A/Jet/AOD/PromptReco-v4/000/167/673/A02BB63F-E3A0-E011-9A58-0030487C7E18.root'
because these files correspond to our chosen dataset.
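If you prefer, the first two modifications can be scripted instead of done by hand in an editor. The sketch below (plain Python; it assumes the config file contains exactly the strings quoted above) shows the idea:

```python
# Sketch: apply the global-tag and jet-correction edits as plain string
# replacements. The "old" strings are taken from the text above and are
# assumed to appear verbatim in patTuple_data_cfg.py.

EDITS = [
    ("process.GlobalTag.globaltag = 'GR_R_42_V12::All'",
     "process.GlobalTag.globaltag = 'GR_R_44_V11::All'"),
    ("inputJetCorrLabel = ('AK5PF', ['L1Offset', 'L2Relative', 'L3Absolute', 'L2L3Residual'])",
     "inputJetCorrLabel = ('AK5PF', [])  # NO jet energy correction yet"),
]

def apply_edits(cfg_text, edits=EDITS):
    """Replace each old string with its new version, failing loudly if absent."""
    for old, new in edits:
        if old not in cfg_text:
            raise ValueError("expected string not found: " + old)
        cfg_text = cfg_text.replace(old, new)
    return cfg_text
```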
Now, from your directory /YOURWORKINGAREA/exercise04/CMSSW_7_4_1_patch4/src, execute the following command:
cmsRun PhysicsTools/PatExamples/test/patTuple_data_cfg.py
If successful, this will create an output file jet2011A_aod.root of about 79MB with 1000 events.
Now you can prepare the CRAB configuration file.
Step 4 - Prepare your crab.py script for CRAB job submission
Your CRAB configuration file for job submission should look like this: crab.cfg.
In this crab.cfg file, the name of the output file is jet2011A_aod.root. Make sure that this output file name is the same as the one defined in the line
process.out.fileName = cms.untracked.string('jet2011A_aod.root')
in the file patTuple_data_cfg.py.
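To double-check that the two names agree, a throwaway snippet like this can extract the output file name from the config text (plain Python; the regular expression is an assumption about the cfg's formatting, not an official tool):

```python
import re

# Sketch: extract the output file name from the cfg text so it can be
# compared with the name used in crab.cfg. The cfg line is illustrative.
cfg_text = "process.out.fileName = cms.untracked.string('jet2011A_aod.root')"

match = re.search(r"fileName\s*=\s*cms\.untracked\.string\(['\"]([^'\"]+)['\"]\)",
                  cfg_text)
output_name = match.group(1) if match else None
print(output_name)  # jet2011A_aod.root
```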
To look at what each of the lines in crab.cfg means, refer to the
CRAB Tutorial.
This crab.cfg file contains a storage_path of
/srm/managerv2?SFN=/castor/cern.ch/user/m/malik/pattutorial/
that we create in the next step. In this exercise we run over the dataset /Jet/Run2011A-PromptReco-v4/AOD. This crab.cfg also uses a file called
Cert_160404-167151_7TeV_PromptReco_Collisions11_JSON.txt
This file describes which luminosity sections in which runs are considered good and should be processed. In CMS, this kind of file is in the JSON format (JSON stands for JavaScript Object Notation).
To find the most current good luminosity section files in JSON format, please visit
https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions11/
.
To know on how to work with files for Good Luminosity Sections in JSON format please look at the twiki
SWGuideGoodLumiSectionsJSONFile.
Instructions on how to set up your CRAB jobs over selected lumi sections are given at
Running over selected luminosity blocks.
You can find the official (centrally produced) JSON format files
here
or on afs in the CAF area
/afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification
At the same locations as the JSON format file you can also find the txt file reporting the history of the quality and DCS selection and, at the bottom, the snippet to configure any CMSSW application for running on the selected good lumi sections.
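The JSON file itself is just a dictionary mapping run numbers (as strings) to lists of [first, last] luminosity-section ranges. A minimal sketch of checking whether a given run and lumi section is certified (the run numbers and ranges below are invented for illustration):

```python
import json

# Sketch: a good-lumi JSON maps run numbers to [first, last] lumi ranges.
# The run numbers and ranges below are invented for illustration only.
good_lumis_json = '{"160431": [[19, 218]], "160577": [[254, 306], [310, 370]]}'
good_lumis = json.loads(good_lumis_json)

def is_good(run, lumi, mask):
    """True if (run, lumi) falls inside one of the certified ranges."""
    for first, last in mask.get(str(run), []):
        if first <= lumi <= last:
            return True
    return False

print(is_good(160577, 300, good_lumis))  # True
print(is_good(160577, 308, good_lumis))  # False (falls in the 307-309 gap)
```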
Step 5 - Get the JSON file, create a writable directory in castor and set up the CRAB environment
Get the JSON file to your WORKINGDIRECTORY/exercise04/CMSSW_7_4_1_patch4/src area
wget http://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions11/7TeV/Prompt/Cert_160404-167151_7TeV_PromptReco_Collisions11_JSON.txt
Notice how the character "s" was removed from "https"; the wget command does not like "https" for some reason.
Create a directory in castor called pattutorial
rfmkdir /castor/cern.ch/user/m/malik/pattutorial
Create a directory in castor called jet2011A_AOD
rfmkdir /castor/cern.ch/user/m/malik/pattutorial/jet2011A_AOD
This directory has to be group writable, so change the permissions accordingly
rfchmod 775 /castor/cern.ch/user/m/malik/pattutorial/jet2011A_AOD
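Mode 775 means read/write/execute for the owner and group, and read/execute for others. As a quick aside, Python's stat module can decode such an octal mode into the familiar rwx string:

```python
import stat

# Decode the octal directory mode 775 into its symbolic form.
mode = stat.S_IFDIR | 0o775
print(stat.filemode(mode))  # drwxrwxr-x
```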
To see your created directory do:
nsls /castor/cern.ch/user/m/malik/pattutorial/jet2011A_AOD
Set up the CRAB environment
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh
Step 6 - Create CRAB job
To create your CRAB job, execute the following command:
crab -create
You should see the following output on the screen:
crab: Version 2.7.8 running on Sat Jul 2 18:18:27 2011 CET (16:18:27 UTC)
crab. Working options:
scheduler glite
job type CMSSW
server ON (use_server)
working directory /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110702_181826/
Enter GRID pass phrase:
Your identity: /DC=org/DC=doegrids/OU=People/CN=Sudhir Malik 503653
Creating temporary proxy ............................................................................ Done
Contacting voms.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch] "cms" Done
Creating proxy ............................................................................................. Done
Your proxy is valid until Sun Jul 10 18:18:35 2011
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested (A)DS /Jet/Run2011A-PromptReco-v4/AOD has 194 block(s).
crab: 495 jobs created to run on 36743 lumis
************** MC dependence removal ************
removing MC dependencies for photons
removing MC dependencies for electrons
removing MC dependencies for muons
removing MC dependencies for taus
removing MC dependencies for jets
WARNING: called applyPostfix for module/sequence patJetPartonMatch which is not in patDefaultSequence!
WARNING: called applyPostfix for module/sequence patJetGenJetMatch which is not in patDefaultSequence!
WARNING: called applyPostfix for module/sequence patJetGenJetMatch which is not in patDefaultSequence!
WARNING: called applyPostfix for module/sequence patJetPartonAssociation which is not in patDefaultSequence!
=================================================
Type1MET corrections are switched off for other
jet types but CaloJets. Users are recommened to
use pfMET together with PFJets & tcMET together
with JPT jets.
=================================================
WARNING: called applyPostfix for module/sequence patDefaultSequence which is not in patDefaultSequence!
WARNING: called applyPostfix for module/sequence patDefaultSequence which is not in patDefaultSequence!
WARNING: called applyPostfix for module/sequence patDefaultSequence which is not in patDefaultSequence!
removed from lepton counter: taus
---------------------------------------------------------------------
INFO : some objects have been removed from the sequence. Switching
off PAT cross collection cleaning, as it might be of limited
sense now. If you still want to keep object collection cross
cleaning within PAT you need to run and configure it by hand
WARNING: called applyPostfix for module/sequence countPatCandidates which is not in patDefaultSequence!
---------------------------------------------------------------------
INFO : cleaning has been removed. Switch output from clean PAT
candidates to selected PAT candidates.
switchOnTrigger():
PATTriggerProducer module patTrigger exists already in sequence patDefaultSequence
==> entry re-used
---------------------------------------------------------------------
switchOnTrigger():
PATTriggerEventProducer module patTriggerEvent exists already in sequence patDefaultSequence
==> entry re-used
---------------------------------------------------------------------
crab: Checking remote location
crab: WARNING: The stageout directory already exists. Be careful not to accidentally mix outputs from different tasks
crab: Creating 495 jobs, please wait...
crab: Total of 495 jobs created.
Log file is /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110702_181826/log/crab.log
Step 7 - Submit CRAB job
To submit the CRAB job you should execute the following command:
crab -submit
You should see the following output on the screen:
crab: Version 2.7.8 running on Sat Jul 2 18:22:20 2011 CET (16:22:20 UTC)
crab. Working options:
scheduler glite
job type CMSSW
server ON (default)
working directory /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110702_181826/
crab: Registering credential to the server : vocms58.cern.ch
crab: Credential successfully delegated to the server.
crab: Starting sending the project to the storage vocms58.cern.ch...
crab: Task crab_0_110702_181826 successfully submitted to server vocms58.cern.ch
crab: Total of 495 jobs submitted
Log file is /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110702_181826/log/crab.log
Step 8 - Status of CRAB job
To see the status of your CRAB job, execute the following command:
crab -status
You should see the following output on the screen:
crab: Version 2.7.8 running on Sat Jul 2 18:23:19 2011 CET (16:23:19 UTC)
crab. Working options:
scheduler glite
job type CMSSW
server ON (default)
working directory /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110702_181826/
crab:
ID END STATUS ACTION ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ---------- ----------- ---------
1 N Submitting SubRequested
2 N Submitting SubRequested
3 N Submitting SubRequested
4 N Submitting SubRequested
5 N Submitting SubRequested
6 N Submitting SubRequested
7 N Submitting SubRequested
8 N Submitting SubRequested
9 N Submitting SubRequested
10 N Submitting SubRequested
--------------------------------------------------------------------------------
11 N Submitting SubRequested
12 N Submitting SubRequested
13 N Submitting SubRequested
14 N Submitting SubRequested
15 N Submitting SubRequested
16 N Submitting SubRequested
17 N Submitting SubRequested
18 N Submitting SubRequested
19 N Submitting SubRequested
20 N Submitting SubRequested
--------------------------------------------------------------------------------
21 N Submitting SubRequested
22 N Submitting SubRequested
23 N Submitting SubRequested
24 N Submitting SubRequested
25 N Submitting SubRequested
26 N Submitting SubRequested
27 N Submitting SubRequested
28 N Submitting SubRequested
29 N Submitting SubRequested
30 N Submitting SubRequested
--------------------------------------------------------------------------------
.................................................................
.................................................................
--------------------------------------------------------------------------------
471 N Submitting SubRequested
472 N Submitting SubRequested
473 N Submitting SubRequested
474 N Submitting SubRequested
475 N Submitting SubRequested
476 N Submitting SubRequested
477 N Submitting SubRequested
478 N Submitting SubRequested
479 N Submitting SubRequested
480 N Submitting SubRequested
--------------------------------------------------------------------------------
481 N Submitting SubRequested
482 N Submitting SubRequested
483 N Submitting SubRequested
484 N Submitting SubRequested
485 N Submitting SubRequested
486 N Submitting SubRequested
487 N Submitting SubRequested
488 N Submitting SubRequested
489 N Submitting SubRequested
490 N Submitting SubRequested
--------------------------------------------------------------------------------
491 N Submitting SubRequested
492 N Submitting SubRequested
493 N Submitting SubRequested
494 N Submitting SubRequested
495 N Submitting SubRequested
crab: 495 Total Jobs
>>>>>>>>> 495 Jobs Submitting
crab: You can also follow the status of this task on :
CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=malik_crab_0_110702_181826_2c05hd
Server page: http://vocms58.cern.ch:8888/logginfo
Your task name is: malik_crab_0_110702_181826_2c05hd
Log file is /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110702_181826/log/crab.log
You can also see the status of your CRAB job on the dashboard URL listed at the end of the screen output of the above step, i.e.
http://dashb-cms-job-task.cern.ch/taskmon.html
(here your task).
The contents of this URL will depend on the status of your jobs, i.e. running, pending, successful, failed etc. An example screen shot of this URL is here:
Another view of the URL, obtained by clicking on "TaskMonitorId", is:
NOTE: There could be a mismatch between the job status reported by the command crab -status and the one shown on the dashboard URL.
If you see any jobs with status successful, you should see the output root files in castor, in the directory
/castor/cern.ch/user/m/malik/pattutorial/jet2011A_AOD
To look at this directory, do
rfdir /castor/cern.ch/user/m/malik/pattutorial/jet2011A_AOD
If there are any output files in this directory, it should look like this:
-rw-r--r-- 1 cms003 zh 1245918093 Jun 30 17:33 jet2011A_aod_100_0_I5h.root
-rw-r--r-- 1 cms003 zh 1274219226 Jun 30 17:39 jet2011A_aod_101_0_CRr.root
-rw-r--r-- 1 cms003 zh 972050289 Jun 30 17:20 jet2011A_aod_102_0_3DV.root
-rw-r--r-- 1 cms003 zh 1089918123 Jun 30 17:56 jet2011A_aod_103_0_tFV.root
-rw-r--r-- 1 cms003 zh 903472800 Jun 30 17:49 jet2011A_aod_105_0_wDk.root
-rw-r--r-- 1 cms003 zh 1193558045 Jun 30 17:34 jet2011A_aod_106_0_KpW.root
-rw-r--r-- 1 cms003 zh 647739659 Jun 30 17:03 jet2011A_aod_107_0_MGr.root
-rw-r--r-- 1 cms003 zh 1025885141 Jun 30 17:47 jet2011A_aod_108_0_9Rj.root
-rw-r--r-- 1 cms003 zh 1284572326 Jun 30 17:57 jet2011A_aod_109_0_iyZ.root
-rw-r--r-- 1 cms003 zh 1311061481 Jun 30 11:53 jet2011A_aod_10_1_Sj5.root
-rw-r--r-- 1 cms003 zh 1280795427 Jul 01 07:32 jet2011A_aod_110_2_EhQ.root
-rw-r--r-- 1 cms003 zh 1294175267 Jun 30 18:15 jet2011A_aod_113_0_esC.root
-rw-r--r-- 1 cms003 zh 1277053249 Jun 30 18:19 jet2011A_aod_114_0_pq3.root
-rw-r--r-- 1 cms003 zh 1285075447 Jun 30 18:17 jet2011A_aod_115_0_fOy.root
-rw-r--r-- 1 cms003 zh 1259237238 Jun 30 18:25 jet2011A_aod_116_0_RXi.root
-rw-r--r-- 1 cms003 zh 1245886124 Jun 30 18:22 jet2011A_aod_117_0_DIT.root
-rw-r--r-- 1 cms003 zh 1215675870 Jun 30 13:55 jet2011A_aod_11_1_kT4.root
-rw-r--r-- 1 cms003 zh 1267111659 Jun 30 18:17 jet2011A_aod_120_0_BNj.root
-rw-r--r-- 1 cms003 zh 1249178192 Jun 30 18:20 jet2011A_aod_121_0_Pw9.root
-rw-r--r-- 1 cms003 zh 1067304425 Jun 30 18:14 jet2011A_aod_122_0_GDS.root
-rw-r--r-- 1 cms003 zh 1249161461 Jun 30 18:18 jet2011A_aod_123_0_4Ku.root
-rw-r--r-- 1 cms003 zh 1011336451 Jun 30 17:25 jet2011A_aod_124_0_FaR.root
-rw-r--r-- 1 cms003 zh 1100894887 Jun 30 17:30 jet2011A_aod_125_0_w6E.root
-rw-r--r-- 1 cms003 zh 1303136732 Jun 30 17:32 jet2011A_aod_127_0_PCd.root
-rw-r--r-- 1 cms003 zh 1292889530 Jun 30 17:37 jet2011A_aod_128_0_UGc.root
-rw-r--r-- 1 cms003 zh 843154097 Jun 30 13:18 jet2011A_aod_12_1_jS0.root
-rw-r--r-- 1 cms003 zh 1288254355 Jun 30 17:40 jet2011A_aod_130_0_SI7.root
-rw-r--r-- 1 cms003 zh 1288040088 Jun 30 17:31 jet2011A_aod_131_0_BNS.root
-rw-r--r-- 1 cms003 zh 1286172683 Jun 30 17:34 jet2011A_aod_132_0_gby.root
-rw-r--r-- 1 cms003 zh 1261984151 Jun 30 17:31 jet2011A_aod_133_0_xeu.root
-rw-r--r-- 1 cms003 zh 1929437920 Jun 30 17:59 jet2011A_aod_134_0_mzS.root
-rw-r--r-- 1 cms003 zh 1291247174 Jun 30 17:31 jet2011A_aod_135_0_5lU.root
-rw-r--r-- 1 cms003 zh 1277940439 Jun 30 17:34 jet2011A_aod_136_0_B6g.root
-rw-r--r-- 1 cms003 zh 1257422925 Jun 30 17:30 jet2011A_aod_137_0_0cf.root
-rw-r--r-- 1 cms003 zh 1278923610 Jun 30 17:42 jet2011A_aod_138_0_9Vn.root
-rw-r--r-- 1 cms003 zh 1282749623 Jun 30 17:31 jet2011A_aod_139_0_yFP.root
-rw-r--r-- 1 cms003 zh 1320252214 Jun 30 15:36 jet2011A_aod_13_1_MYg.root
-rw-r--r-- 1 cms003 zh 1281544555 Jun 30 17:43 jet2011A_aod_140_0_EIG.root
-rw-r--r-- 1 cms003 zh 1255142335 Jun 30 17:40 jet2011A_aod_142_0_j7Q.root
-rw-r--r-- 1 cms003 zh 1260618421 Jun 30 17:36 jet2011A_aod_143_0_1wD.root
-rw-r--r-- 1 cms003 zh 1169947714 Jun 30 18:14 jet2011A_aod_145_0_nF8.root
Step 9 - Job output retrieval: check for output files in /castor
For the jobs which are in the "Done" state it is possible to retrieve the log files of the jobs (just the log files), because the output files are copied to the Storage Element associated to the T2 specified in the crab.cfg, which is
/castor/cern.ch/user/m/malik/pattutorial/jet2011A_AOD
in our case; in fact return_data is 0, which means the output files are staged out to the storage element rather than returned with the job. The following command retrieves the log files of all "Done" jobs of the last created CRAB project. The job results will be copied into the res subdirectory of your CRAB project (for example crab_0_110702_181826):
crab -getoutput
When you execute this command you should see output that looks like:
crab: Version 2.7.8 running on Fri Jul 1 10:06:14 2011 CET (08:06:14 UTC)
crab. Working options:
scheduler glite
job type CMSSW
server ON (default)
working directory /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/
crab: Only 494 jobs will be retrieved from 495 requested.
(for details: crab -status)
crab: Starting retrieving output from server vocms58.cern.ch...
crab: Results of Jobs # 1 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 2 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 3 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 4 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 5 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 6 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 7 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 8 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 9 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 10 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 11 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 12 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 13 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 14 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 15 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 16 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 17 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 18 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 19 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 20 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 21 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 22 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 23 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 24 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 25 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
crab: Results of Jobs # 26 are in /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/
................................
................................
................................
It may happen that while doing crab -getoutput your disk quota gets exceeded. In that case you will not be able to look at the log files of all the jobs.
If you succeed in getting the output of however many jobs got done with crab -getoutput, your crab_0_110702_181826/res directory should have all the log files and look like the output below.
Note: let us say jobs 1 to 50, 60 to 70 and 95 are done, and you do not want to wait till all the jobs are done; you can get their output by doing
crab -getoutput 1-50,60-70,95
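The job-list argument accepts comma-separated single IDs and dash ranges. A hedged sketch (plain Python mimicking the argument syntax only; this is not taken from CRAB itself) of how such a specification expands:

```python
# Sketch: expand a CRAB-style job list like "1-50,60-70,95" into job IDs.
# This mimics the command-line syntax only; it is not CRAB code.

def expand_job_list(spec):
    """Return the list of job IDs described by a spec like '1-50,60-70,95'."""
    jobs = []
    for token in spec.replace(" ", "").split(","):
        if "-" in token:
            first, last = (int(x) for x in token.split("-"))
            jobs.extend(range(first, last + 1))
        else:
            jobs.append(int(token))
    return jobs

print(len(expand_job_list("1-50,60-70,95")))  # 62 jobs in total
```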
...................
...................
...................
-rw------- 1 malik zh 48K Jun 30 11:30 crab_fjr_2.xml
-rw------- 1 malik zh 4.5M Jun 30 11:30 CMSSW_2.stdout
-rw------- 1 malik zh 49K Jun 30 11:32 crab_fjr_4.xml
-rw------- 1 malik zh 4.4M Jun 30 11:32 CMSSW_4.stdout
-rw------- 1 malik zh 48K Jun 30 11:32 crab_fjr_14.xml
-rw------- 1 malik zh 4.8M Jun 30 11:32 CMSSW_14.stdout
-rw------- 1 malik zh 50K Jun 30 11:33 crab_fjr_24.xml
-rw------- 1 malik zh 5.0M Jun 30 11:33 CMSSW_24.stdout
-rw------- 1 malik zh 49K Jun 30 11:33 crab_fjr_16.xml
-rw------- 1 malik zh 5.0M Jun 30 11:33 CMSSW_16.stdout
-rw------- 1 malik zh 49K Jun 30 11:34 crab_fjr_9.xml
-rw------- 1 malik zh 4.6M Jun 30 11:34 CMSSW_9.stdout
-rw------- 1 malik zh 50K Jun 30 11:34 crab_fjr_23.xml
-rw------- 1 malik zh 5.0M Jun 30 11:34 CMSSW_23.stdout
-rw------- 1 malik zh 49K Jun 30 11:37 crab_fjr_21.xml
-rw------- 1 malik zh 5.0M Jun 30 11:37 CMSSW_21.stdout
-rw------- 1 malik zh 48K Jun 30 11:38 crab_fjr_1.xml
-rw------- 1 malik zh 4.4M Jun 30 11:38 CMSSW_1.stdout
-rw------- 1 malik zh 49K Jun 30 11:38 crab_fjr_6.xml
-rw------- 1 malik zh 4.6M Jun 30 11:38 CMSSW_6.stdout
-rw------- 1 malik zh 48K Jun 30 11:41 crab_fjr_7.xml
-rw------- 1 malik zh 4.8M Jun 30 11:42 CMSSW_7.stdout
-rw------- 1 malik zh 49K Jun 30 11:49 crab_fjr_20.xml
-rw------- 1 malik zh 5.0M Jun 30 11:49 CMSSW_20.stdout
-rw------- 1 malik zh 49K Jun 30 11:50 crab_fjr_5.xml
-rw------- 1 malik zh 4.6M Jun 30 11:50 CMSSW_5.stdout
-rw------- 1 malik zh 48K Jun 30 11:53 crab_fjr_10.xml
Step 10 - Check log files to trace problems, if any
You can look at the log files in
crab_0_110702_181826/res
directory after executing the above step to see the details in case a job fails.
You can also print a short report about the task, namely the total number of events and files processed/requested/available, the name of the dataset path, a summary of the status of the jobs, and so on. A summary file of the runs and luminosity sections processed is written to res/. In principle crab -report should generate all the info needed for an analysis. To get a report, execute the following. Note that in this case 1 job out of 495 did not execute.
crab -report
The output of this command should look something like this (this is an old cut-and-paste, but it conveys the message):
crab: Version 2.7.8 running on Fri Jul 1 09:59:55 2011 CET (07:59:55 UTC)
crab. Working options:
scheduler glite
job type CMSSW
server ON (default)
working directory /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/
crab: --------------------
Dataset: /Jet/Run2011A-PromptReco-v4/AOD
Remote output :
SE: srm-cms.cern.ch srm-cms.cern.ch srmPath: srm://srm-cms.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/user/m/malik/pattutorial/jet2011A_AOD/
------------------
------------------
Total Jobs : 495
Luminosity section summary file: /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/res/lumiSummary.json
# Jobs: Cleared:494
----------------------------
Log file is /afs/cern.ch/user/m/malik/scratch0/PAT2011JULY/CMSSW_4_2_4/src/crab_0_110630_103547/log/crab.log
If you want to publish your output to DBS, you need to re-run your CRAB job with some options modified. Please see
here for more details.
Step 11 - Open an output root file to make sure you see the plots of variables.
Make sure to open a root file to see if it contains what you wanted. Once you have your output root files, you are ready to analyze them.
To open a root file from your castor storage area, do as follows (as an example; replace the path and root file name with what you actually have in your area):
root -l /castor/cern.ch/user/m/malik/pattutorial/jet2011A_AOD/jet2011A_aod_89_0_Afy.root
Note:
If you are doing this exercise in the context of the PAT Tutorial course, in case of problems don't hesitate to contact the PAT support:
SWGuidePAT#Support. Having successfully finished
Exercise 4 you might want to proceed to
Exercise 5 to learn how to access the pat::Candidate collections that you just produced within an EDAnalyzer or within an FWLite executable. For an overview you can go back to the
WorkBookPATTutorial entry page.
--
SudhirMalik - 2-July-2010
StefanoBelforte - 2015-05-24 - put BIG warning about most of the things here being obsolete
Main.Sudhir - 30 June 2011 - update to CMSSW_4_2_4
Main.Sudhir - 19 May 2010 - update to CMSSW_3_8_3, last screen shot on "crab -report" still old
Main.Sudhir - 13 May 2010 - update to CMSSW_3_6_1, screen shots still for CMSSW_3_5_7