Production validation harvesting
This was set up following instructions from Luca Malgeri to provide a first handle on how to run harvesting for validation in 3.1.0.
NOTES:
- These instructions may only work on LXPLUS; I am not sure whether the setup scripts used below are available elsewhere.
- The instructions below assume a BASH shell. For a CSH-family shell, source the .csh versions of the scripts instead of the .sh ones.
- These instructions were put together for harvesting of the RelVal samples, which are stored on the CERN Tier-1. The PreProduction samples will all be on Tier-2s; updated instructions for running harvesting with CRAB on those samples will appear here as soon as possible.
Setting up and running the harvesting
- We start with a clean 3.1.0 setup in a recognizable place:
mkdir prodval_harvesting
cd prodval_harvesting
cmsrel CMSSW_3_1_0
cd CMSSW_3_1_0/src
- Source the required setup scripts to get CRAB and the LCG UI working. NOTE: The order of these is important. If you find you end up with the wrong version of Python (i.e. 2.3.x instead of 2.4.x), the order was probably wrong.
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
cmsenv
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh
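As a quick sanity check on the sourcing order (see the NOTE above), one can print the Python version now visible in the environment; in this setup it should report 2.4.x:

```python
# Print the Python version picked up after sourcing the setup scripts.
# Per the note above, a 3.1.0 setup should give 2.4.x; seeing 2.3.x
# means the scripts were probably sourced in the wrong order.
import sys

print(sys.version.split()[0])
```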
- Now we get the required configuration for prodval harvesting (still in the src dir):
addpkg Configuration/StandardSequences
cvs co -r 1.7 Configuration/StandardSequences/python/Harvesting_cff.py
cvs co -r V01-00-17 HLTriggerOffline/Common
scram build
NOTE: This will probably complain about a missing module named volumeBasedMagneticField_85l_cfi. This appears to be harmless for our purposes.
- At this point we can ask cmsDriver.py to perform some (white!) magic for us and create the configuration file:
cmsDriver.py step3_MC -s HARVESTING:validationprodHarvesting --harvesting AtJobEnd --conditions FrontierConditions_GlobalTag,MC_31X_V1::All --filein file:step2_MC_RAW2DIGI_RECO_VALIDATION.root --mc --no_exec
This should create a configuration file called step3_MC_HARVESTING_MC.py in your working directory.
- The datasets to be validated are listed in DBS; pick any of the GEN-SIM-RECO datasets from there.
- All of these datasets should be available at CERN's Tier-1, so we can just run on them through CASTOR. Modify the step3_MC_HARVESTING_MC.py file to read in the files corresponding to the desired dataset; in this case we want to process /RelValMinBias_2M_PROD/CMSSW_3_1_0_pre11-MC_31X_V1-v1/GEN-SIM-RECO. Hint: by clicking on the 'plain' LFN link just below the dataset name on the DBS page, one can get a list of the corresponding file names. After modification, your step3 config file should look something like this:
# Auto generated configuration file
# using:
# Revision: 1.123
# Source: /cvs_server/repositories/CMSSW/CMSSW/Configuration/PyReleaseValidation/python/ConfigBuilder.py,v
# with command line options: step3_MC -s HARVESTING:validationprodHarvesting --harvesting AtJobEnd --conditions FrontierConditions_GlobalTag,MC_31X_V1::All --filein file:step2_MC_RAW2DIGI_RECO_VALIDATION.root --mc --no_exec
import FWCore.ParameterSet.Config as cms
process = cms.Process('HARVESTING')
# import of standard configurations
process.load('Configuration/StandardSequences/Services_cff')
process.load('FWCore/MessageService/MessageLogger_cfi')
process.load('Configuration/StandardSequences/MixingNoPileUp_cff')
process.load('Configuration/StandardSequences/GeometryIdeal_cff')
process.load('Configuration/StandardSequences/MagneticField_38T_cff')
process.load('Configuration/StandardSequences/EDMtoMEAtJobEnd_cff')
process.load('Configuration/StandardSequences/Harvesting_cff')
process.load('Configuration/StandardSequences/FrontierConditions_GlobalTag_cff')
process.configurationMetadata = cms.untracked.PSet(
    version = cms.untracked.string('$Revision: 1.1 $'),
    annotation = cms.untracked.string('step3_MC nevts:1'),
    name = cms.untracked.string('PyReleaseValidation')
)
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(1)
)
process.options = cms.untracked.PSet(
    Rethrow = cms.untracked.vstring('ProductNotFound'),
    fileMode = cms.untracked.string('FULLMERGE')
)
# Input source
process.source = cms.Source("PoolSource",
    processingMode = cms.untracked.string('RunsAndLumis'),
    fileNames = cms.untracked.vstring(
        "/store/relval/CMSSW_3_1_0_pre11/RelValMinBias_2M_PROD/GEN-SIM-RECO/MC_31X_V1-v1/0000/E49ABFE2-EE64-DE11-A43C-000423D991D4.root",
        "/store/relval/CMSSW_3_1_0_pre11/RelValMinBias_2M_PROD/GEN-SIM-RECO/MC_31X_V1-v1/0000/D497326B-C564-DE11-AE6A-000423D95030.root",
        "/store/relval/CMSSW_3_1_0_pre11/RelValMinBias_2M_PROD/GEN-SIM-RECO/MC_31X_V1-v1/0000/A65659E9-C664-DE11-847D-001D09F295A1.root"
    )
)
# Additional output definition
# Other statements
process.GlobalTag.globaltag = 'MC_31X_V1::All'
# Path and EndPath definitions
process.edmtome_step = cms.Path(process.EDMtoME)
process.validationprodHarvesting = cms.Path(process.postValidation*process.hltpostvalidation_prod)
process.dqmHarvesting = cms.Path(process.DQMOffline_SecondStep*process.DQMOffline_Certification)
process.validationHarvestingFS = cms.Path(process.HarvestingFastSim)
process.validationHarvesting = cms.Path(process.postValidation*process.hltpostvalidation)
process.dqmsave_step = cms.Path(process.DQMSaver)
# Schedule definition
process.schedule = cms.Schedule(process.edmtome_step,process.validationprodHarvesting,process.dqmsave_step)
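The fileNames list in the config can be generated from the plain-text LFN list obtained via the DBS 'plain' link mentioned above. A minimal sketch (the helper name and the assumption of one LFN per line of input are mine, not part of the official tooling):

```python
# Hypothetical helper: turn a plain-text list of LFNs (one per line, as
# copied from the DBS 'plain' link) into a fileNames block that can be
# pasted into the PoolSource of the step3 config.
def lfns_to_vstring(text):
    # Keep only lines that look like ROOT file LFNs.
    lfns = [line.strip() for line in text.splitlines()
            if line.strip().endswith('.root')]
    body = ',\n        '.join('"%s"' % lfn for lfn in lfns)
    return 'fileNames = cms.untracked.vstring(\n        %s\n    )' % body

sample = """
/store/relval/CMSSW_3_1_0_pre11/RelValMinBias_2M_PROD/GEN-SIM-RECO/MC_31X_V1-v1/0000/E49ABFE2-EE64-DE11-A43C-000423D991D4.root
/store/relval/CMSSW_3_1_0_pre11/RelValMinBias_2M_PROD/GEN-SIM-RECO/MC_31X_V1-v1/0000/D497326B-C564-DE11-AE6A-000423D95030.root
"""
print(lfns_to_vstring(sample))
```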
- Finally, we can run the harvesting. This should only take a little time and produce an output file of about 15 MB.
cmsRun step3_MC_HARVESTING_MC.py
After telling you that it opened all files and that nothing bad happened, cmsRun will leave you with a new ROOT file with a long name. In the above case: DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root
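The long output name encodes a version number and a run number; a small sketch to pull them out (the pattern is inferred from the example filename above, not taken from an official specification):

```python
import re

# Parse the version and run number out of a DQM output filename.
# Pattern inferred from the example name produced above:
#   DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root
name = "DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root"
m = re.match(r"DQM_V(\d{4})_R(\d{9})__", name)
version, run = int(m.group(1)), int(m.group(2))
print("version %d, run %d" % (version, run))  # → version 1, run 1
```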
Making the results available for scrutiny
It is easiest for `public' access if things are stored in your CASTOR area. For CASTOR usage, see WorkBookSetComputerNode.
--
JeroenHegeman - 03 Jul 2009