5.6 Data Analysis with CRAB
Introduction and Editorial Note
This Workbook chapter reproduces text from the CRAB guide,
SWGuideCrab, and from other SWGuide pages linked there. Text from the SWGuide TWiki pages is included rather than linked, for easier reading, and is slightly reorganized; any change made there will be reflected here. There should be no need to edit this TWiki page to update the instructions.
CRAB is a utility to submit CMSSW jobs to distributed computing resources. By using CRAB you will be able to:
- Access CMS data and Monte Carlo samples, which are distributed to CMS-aligned centres worldwide.
- Exploit the CPU and storage resources at CMS-aligned centres.
Prerequisites
To use CRAB to submit your CMSSW job to the Grid you must meet some prerequisites:
Get a Grid certificate and register with the CMS VO
CRAB submits jobs to the Grid (LCG), so you need to run it from a User Interface (UI) with a valid certificate issued by your appropriate Certification Authority, and you need a valid proxy. You also need to be registered on the VOMRS server. To get a certificate from the CERN CA and register with the CMS VO, see the detailed instructions in the
SWGuideLcgAccess page. If you get a certificate from another Certification Authority, the procedure to register with the CMS VO using your certificate should be the same.
Set up your certificate for LCG
See instructions in
this Offline Workbook page
Test your grid certificate
- Is your personal certificate able to generate Grid proxies? To find out, set up your environment and then run this command:
grid-proxy-init -debug -verify
In case of failure, the possible causes are:
- the certificate/key pair is not installed in
$HOME/.globus/usercert.pem $HOME/.globus/userkey.pem
(a.k.a. "pem files")
- the certificate has expired
- the certificate and the private key do not match
In the first case, you either do not have a certificate at all or still have to install it on the UI; in the second case, you should get a new certificate; in the third case, you have probably installed your certificate incorrectly.
- Are you a member of the CMS VO? To see if this is the case, you can execute this command:
voms-proxy-init -voms cms
If you get an error, chances are that you did not register with the CMS VO, or that your registration has expired. In this case, please follow the instructions in the SWGuideLcgAccess page.
- You can verify the expiration date of your certificate with:
openssl x509 -subject -dates -noout -in $HOME/.globus/usercert.pem
- see also: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideVomsFAQ
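The checks above can also be scripted. The following is a minimal Python sketch, not part of CRAB or the Grid tools: the function names `missing_pem_files`, `parse_not_after` and `is_expired` are hypothetical, and the parser assumes that the `openssl x509 -dates` output contains a line of the usual form `notAfter=Jun  5 12:00:00 2030 GMT`.

```python
import os
from datetime import datetime


def missing_pem_files(globus_dir):
    """Return the names of the expected pem files that are absent from globus_dir."""
    expected = ("usercert.pem", "userkey.pem")
    return [name for name in expected
            if not os.path.isfile(os.path.join(globus_dir, name))]


def parse_not_after(openssl_output):
    """Extract the expiry date from the text printed by `openssl x509 -dates`.

    Assumption: the output has a line like 'notAfter=Jun  5 12:00:00 2030 GMT'.
    """
    for line in openssl_output.splitlines():
        if line.startswith("notAfter="):
            stamp = line[len("notAfter="):].strip()
            # Drop the trailing time-zone label (assumed to be GMT).
            stamp = stamp.rsplit(" ", 1)[0]
            return datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    raise ValueError("no notAfter line found")


def is_expired(not_after, now):
    """Return True when the expiry date is not later than `now`."""
    return not_after <= now
```

One possible use is to call `missing_pem_files(os.path.expanduser("~/.globus"))` first, and then pass the captured output of the openssl command to `parse_not_after` and compare the result with `datetime.utcnow()`. This does not check that the certificate and the private key match; `grid-proxy-init -debug -verify` remains the reference test.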
Test the code locally
Before launching a million-event analysis job on the Grid, be sure to test your code locally in a clean area.
- Build a new CMSSW area (for example, CMSSW_5_2_5_...; pick as appropriate to your job):
cmsrel CMSSW_5_2_5
cd CMSSW_5_2_5/src
cmsenv
- Check out from the CVS repository only the code or configuration files you need to modify, and build your local libraries, including your analysis code.
- Make sure that the code you check out is compatible with the CMSSW version you are using.
- Make sure that the CMSSW version you are using is compatible with the data you intend to read.
- Prepare a test job accessing the data you will access in your Grid job. There are several ways to read the proper data:
- The easiest way is to use the xrootd service to read data directly from a remote site. How to do this is explained in Using Xrootd Service for remote Data Accessing.
- You can also use the xrootd service to copy a data file from a suitable dataset to your local machine (for example, to work without a network connection), as explained in File download with command-line tools.
- If no suitable files exist, you can generate some events using the configuration file which is available from the DAS service.
- Test your CMSSW configuration file locally in order to avoid problems with the ParameterSet parsing.
- Run the job interactively (e.g. at CERN on lxplus):
cmsRun your-pset-config-file.py
Validate a CMSSW config file
In CRAB2, users can validate their CMSSW configuration file by first creating the task with
crab -create
and then launching
crab -validateCfg
The configuration file is then checked and validated by the corresponding Python API. Note that it is not enough to check that the configuration file runs interactively, because in interactive mode CMSSW is too tolerant of Python errors in the configuration file.
At times a user may worry that the problem is in CRAB or CRAB validation rather than in the configuration file; in this case, one can use the following test, which does not involve CRAB:
edmConfigHash your-pset-config-file.py
Note that passing this test is necessary, but not necessarily sufficient, for the CMSSW configuration file to be valid.
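As a weaker but quick preliminary check, the configuration file can be tested for plain Python syntax errors without running CMSSW at all. This is a minimal sketch using the built-in `compile` function; the helper name `config_syntax_error` is hypothetical, and the check only catches syntax errors, so it does not replace CRAB validation or `edmConfigHash`.

```python
def config_syntax_error(source, filename="your-pset-config-file.py"):
    """Return a short description of the first Python syntax error, or None.

    The source is only compiled, never executed, so no CMSSW modules
    need to be importable for this check to run.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "%s:%s: %s" % (filename, err.lineno, err.msg)
    return None
```

The source text can be obtained with `open("your-pset-config-file.py").read()`; a return value of `None` means only that the file is syntactically valid Python.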
Another problem can be hidden characters (^M, i.e. carriage returns) in the configuration file, especially if it was downloaded from the web. To find them, you can use the command
cat -v your-pset-config-file.py
and remove them with the command
perl -pi -e 'tr/\cM//d;' your-pset-config-file.py
Then you can revalidate the configuration file.
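The same detection and clean-up can be sketched in Python, the language of the configuration file itself. The helper names `lines_with_carriage_return` and `strip_carriage_returns` are hypothetical; the sketch works on the raw bytes of the file so that the carriage returns are not silently translated on reading.

```python
def lines_with_carriage_return(data):
    """Return the 1-based numbers of lines containing a carriage return.

    These are the lines on which `cat -v` would display ^M.
    """
    return [number
            for number, line in enumerate(data.split(b"\n"), start=1)
            if b"\r" in line]


def strip_carriage_returns(data):
    """Delete every carriage-return byte, as the perl one-liner in the text does."""
    return data.replace(b"\r", b"")
```

To apply it, read the file in binary mode with `open("your-pset-config-file.py", "rb").read()`, inspect the reported line numbers, and write the stripped bytes back in binary mode.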
Use CRAB at CERN
Please see
SWGuideCrab, in particular:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3CheatSheet#Environment_setup
Use CRAB outside CERN
Preferred way: use CVMFS
- Set up the Grid UI according to your site's directions
- Follow the same instructions as at CERN (CVMFS is globally available)
Basic CRAB Commands
Please see
SWGuideCrab in particular see
CRAB3Commands
Common operations with CRAB
Please see
SWGuideCrab
Return results locally
Please see
SWGuideCrab in particular see
CRAB3ConfigurationFile
Copy results to a Storage Element
Please see
SWGuideCrab in particular see
CRAB3ConfigurationFile
Publish copied results in a Storage Element to a DBS instance
Please see
SWGuideCrab in particular see
CRAB3ConfigurationFile
Analyse published results
Please see
SWGuideCrab
Review status
Complete Review, no changes. The information on page is quite clear.
Responsible:
StefanoBelforte
Last reviewed by: Review Me