Overview
This page contains a few instructions on how to work with data on the grid.
Download a single file
Before submitting jobs to the grid, it is useful to test your scripts locally. To do so, locate and download a single test file. Here are the instructions:
With the xrdcp command:
xrdcp root://cms-xrd-global.cern.ch//store/path/to/file /some/local/path
From the US, use root://cmsxrootd.fnal.gov//store/path/to/file. If the global redirector is not working, try xrootd-cms.infn.it.
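For example, to check and copy the MiniAOD file used below through the fallback redirector (a sketch; assumes a valid grid proxy created with voms-proxy-init -voms cms):
# check that the file is visible, then copy it locally
xrdfs xrootd-cms.infn.it stat /store/data/Run2017H/SingleMuon/MINIAOD/17Nov2017-v2/90000/FA9FA831-8B34-E811-BA1D-008CFAC93CFC.root
xrdcp root://xrootd-cms.infn.it//store/data/Run2017H/SingleMuon/MINIAOD/17Nov2017-v2/90000/FA9FA831-8B34-E811-BA1D-008CFAC93CFC.root /tmp/miniAOD.root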
Other ways of getting a file
Download from a specific site
Locate the file in DAS by searching for file dataset=DATASET, for example:
file dataset=/SingleMuon/Run2017H-17Nov2017-v2/MINIAOD
Choose one site (in this example T2_US_MIT) and get the file PFN by executing the following commands (replace the site and file name with the ones you need):
site=T2_US_MIT
lfn=/store/data/Run2017H/SingleMuon/MINIAOD/17Nov2017-v2/90000/FA9FA831-8B34-E811-BA1D-008CFAC93CFC.root
pfl=`curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4`
Then create a user proxy:
voms-proxy-init -voms cms
The proxy is created in /tmp/x509up_u{UID}. Set your UID accordingly, then set the correct X509_USER_PROXY and copy the file:
UID=58751
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy -n 1 $pfl "file:///`pwd`/miniAOD.root"
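Note that in bash the UID variable is a read-only built-in that already holds your numeric user id, so the assignment above may be rejected; a variant that avoids hard-coding the number (a sketch):
# build the proxy path from the numeric user id directly
export X509_USER_PROXY=/tmp/x509up_u$(id -u)
gfal-copy -n 1 $pfl "file://$(pwd)/miniAOD.root"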
Locating a PFN (physical file name)
Alternatively, you can locate the physical file name with edmFileUtil:
edmFileUtil -d /store/relval/CMSSW_10_6_4/RelValZMM_13/MINIAODSIM/PUpmx25ns_106X_upgrade2018_realistic_v9-v1/10000/DBE18AD9-E36D-B449-B659-A71362DAC57A.root
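The PFN printed by edmFileUtil -d can then be used directly, for example to copy the file locally (a sketch; assumes the returned PFN is an XRootD URL readable with your proxy, and the destination path is a placeholder):
pfn=$(edmFileUtil -d /store/relval/CMSSW_10_6_4/RelValZMM_13/MINIAODSIM/PUpmx25ns_106X_upgrade2018_realistic_v9-v1/10000/DBE18AD9-E36D-B449-B659-A71362DAC57A.root)
xrdcp "$pfn" /tmp/relval_test.root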
Submit jobs using crab
Crab operations
After submitting a job via the CRAB3ConfigurationFile, a folder PROJECTFOLDER will appear. You can follow the submission progress and perform operations using the crab commands.
The full list of crab commands can be found here:
CRAB3Commands
Here is the list of the most common commands:
- Inspect how the submission process proceeds:
crab status -d PROJECTFOLDER
- In case of errors, get more details with:
crab status --verboseErrors
- To resubmit failed jobs with extra options:
crab resubmit -d PROJECTFOLDER --maxmemory=4000 --maxjobruntime=360 --numcores=1 --jobids=1,2
- To kill a project:
crab kill -d PROJECTFOLDER
- The results will appear in the directory set by config.Data.outLFNDirBase; the corresponding EOS path can be extracted with:
cat PROJECTFOLDER/crab.log | grep config.Data.outLFNDirBase | awk '{print $3}' | sed -e 's/"\//\/eos\/cms\//g' | sed -e 's/\/group//g' | sed -e 's/"//g'
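Once the EOS path is known, the transferred output can be listed directly on lxplus (a sketch; the path below is a hypothetical placeholder, replace it with your own outLFNDirBase):
eos ls /eos/cms/store/user/<username>/<outLFNDirBase_subpath>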
Production
To reprocess data (or MC), do the following:
- Set up a CMSSW release:
cmsrel CMSSW_12_4_3
cd CMSSW_12_4_3/src
cmsenv
- Produce the config file with cmsDriver.py:
cmsDriver.py RECO -s RAW2DIGI,L1Reco,RECO --data --era Run3 --scenario pp --conditions 124X_dataRun3_Prompt_Candidate_2022_07_26_15_08_24 --eventcontent RECO --datatier RECO --filein file:RAW.root --customise Configuration/DataProcessing/Utils.addMonitoring --python_filename=pset_rereco.py --no_exec -n -1
- Create the CRAB configuration file (crab_cfg.py):
import CRABClient
from CRABClient.UserUtilities import config
config = config()
config.General.requestName = 'Alignment_rereco'
config.General.workArea = 'crab_projects'
config.General.transferOutputs = True
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_rereco.py'
config.Data.inputDataset = '/ZeroBias/Run2022A-v1/RAW'
config.Data.inputDBS = 'global'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 20
config.Data.runRange = '354329-354332'
config.Data.publication = True
config.Data.outLFNDirBase = '/store/user/mpitt/PROPOG/Alignment/v1'
config.Site.storageSite = "T2_CH_CERN"
- Submit the task:
crab submit -c crab_cfg.py
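Optionally, the task can first be validated without submitting jobs; the CRAB client provides a --dryrun option for this (a sketch):
# build the task and report the expected splitting without submitting
crab submit --dryrun -c crab_cfg.py
# if the splitting looks fine, submit the prepared task
crab proceed -d PROJECTFOLDER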
View jobs
To view running jobs, go to https://monit-grafana.cern.ch/, click on JOBS → CMS Task Monitoring - Task View.
Debug failed jobs
If some of the jobs have errors, you can rerun the job locally using the following commands:
- Inspect the job IDs with:
crab status --long -d PROJECTFOLDER
If all jobs have errors, look at the first few jobs. Here we rerun the job with --jobid=0.
- To see the job log, run:
crab getlog --short -d PROJECTFOLDER --jobid=0
and inspect the PROJECTFOLDER/crab.log file.
To rerun the job locally:
- Run:
crab preparelocal -d PROJECTFOLDER
- From PROJECTFOLDER/local, execute run_job.sh 1 to run the first job; the job will be executed locally after unpacking the CMSSW setup.
- Kill the process with Ctrl+C.
- Run:
cmsRun -j FrameworkJobReport.xml PSet.py
to inspect the output.
To inspect memory usage (you are limited to 2 GB by default), execute ps aux in a different shell.
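For example, to follow the memory of the running cmsRun process (a sketch):
# print the ps entry of the cmsRun process every 5 seconds
watch -n 5 'ps aux | grep [c]msRun'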
Alternatively, run the configuration directly from the job inputs directory:
cd JOBNAME/inputs
cmsRun PSet.py
Crabcache
Before executing these lines, run
export X509_USER_PROXY=/tmp/x509up_u58751
- To get all the files uploaded by a user to the crabcache and the amount of quota (in bytes) they are using (see below for how to find your grid username):
curl -X GET 'https://cmsweb.cern.ch/crabcache/info?subresource=userinfo&username=mpitt' --key $X509_USER_PROXY --cert $X509_USER_PROXY -k
- To get more information about one specific file (the file must be owned by the user who makes the query):
curl -X GET 'https://cmsweb.cern.ch/crabcache/info?subresource=fileinfo&hashkey=697a932e19bd2912710fe0322de3eff41a5553f1f9820117a8262f0ebcd3640a' --key $X509_USER_PROXY --cert $X509_USER_PROXY -k
- To remove a specific file (currently you can only remove your files. In the future power users should be able to remove everything):
curl -X GET 'https://cmsweb.cern.ch/crabcache/info?subresource=fileremove&hashkey=697a932e19bd2912710fe0322de3eff41a5553f1f9820117a8262f0ebcd3640a' --key $X509_USER_PROXY --cert $X509_USER_PROXY -k
- To get the basic quota information:
curl -X GET 'https://cmsweb.cern.ch/crabcache/info?subresource=basicquota' --key $X509_USER_PROXY --cert $X509_USER_PROXY -k
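If you do not remember the grid username to put in these queries, it can be obtained from the CRAB client (assuming a valid proxy):
crab checkusername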
Restoring task folders:
- To get the full task list, execute:
crab tasks
- To restore lost folder:
crab remake --task=XXX
- To clean the cache of a killed job:
crab purge FOLDER
Obtaining Luminosity per dataset
The crab report command lists the location of the JSON-formatted report file. Copy this file to lxplus:
cp PROJECTFOLDER/results/processedLumis.json .
# set up BRIL (for the first time, run the pip install command below)
export PATH=$HOME/.local/bin:/cvmfs/cms-bril.cern.ch/brilconda/bin:$PATH
# pip install --install-option="--prefix=$HOME/.local" brilws
# get lumi from the crab submission:
brilcalc lumi -b "STABLE BEAMS" -i processedLumis.json -c /cvmfs/cms.cern.ch/SITECONF/T0_CH_CERN/JobConfig/site-local-config.xml -u /fb
Several options exist to retrieve info about a dataset; here is an example of finding the AOD parent file of a miniAOD file:
for f in `dasgoclient --query="parent file=/store/data/Run2017D/SingleElectron/MINIAOD/09Aug2019_UL2017-v1/50000/FD85D6D5-1095-EE44-9BDF-202A69E0F25C.root"`; do
dasgoclient --query="child file=$f" | grep AOD/09Aug2019_UL
done
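The same dasgoclient syntax can be used for other queries, e.g. to list the sites hosting a given file (a sketch):
dasgoclient --query="site file=/store/data/Run2017D/SingleElectron/MINIAOD/09Aug2019_UL2017-v1/50000/FD85D6D5-1095-EE44-9BDF-202A69E0F25C.root"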

The additional option
--normtag /afs/cern.ch/user/l/lumipro/public/Normtags/normtag_DATACERT.json
is not working for me...
Accessing grid files in condor
To use the local HTCondor batch system to analyze files located at remote sites, add use_x509userproxy = true to the condor JDL file and set up the proxy in your run file (it is recommended to set the proxy path first); a minimal JDL sketch is shown after the commands below:
export X509_USER_PROXY=${HOME}/private/.x509up_${UID}
echo YOURPASSWORD | voms-proxy-init -voms cms -rfc -out ${HOME}/private/.x509up_${UID} -valid 192:00
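A minimal JDL sketch using the proxy (executable and output file names are placeholders, not from the original page):
# with use_x509userproxy = true, condor picks up the proxy pointed to by X509_USER_PROXY at submit time
universe     = vanilla
executable   = run_script.sh
use_x509userproxy = true
output = job.$(ClusterId).$(ProcId).out
error  = job.$(ClusterId).$(ProcId).err
log    = job.$(ClusterId).log
queue 1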
Debug condor jobs
Full list of jobs:
condor_q -nobatch
To connect to a running job:
condor_ssh_to_job JobId
If jobs are on hold:
condor_q -hold -af HoldReason
Update SSH key in GitHub
On Linux, run:
ssh-keygen -t rsa
cat /afs/cern.ch/user/m/mpitt/.ssh/id_rsa.pub
Go to GitHub → Settings → SSH and GPG keys → New SSH key, and paste the content of the /afs/cern.ch/user/m/mpitt/.ssh/id_rsa.pub file.
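After adding the key, the connection can be verified from the command line (a sketch):
ssh -T git@github.com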
--
MichaelPitt - 2019-12-08