Production validation harvesting

This page was set up following instructions from Luca Malgeri to provide a first handle on how to run harvesting for validation in CMSSW 3.1.0.


  • These instructions may work only on LXPLUS. I'm not sure whether the setup scripts used below are available elsewhere.
  • The instructions below assume a BASH shell. For a CSH-family shell, use the .csh versions of the setup scripts instead of the .sh ones.
  • These instructions were put together for harvesting the RelVal samples (which are stored at the CERN Tier-1). The PreProduction samples will all be on Tier-2s. Updated instructions for running the harvesting with CRAB on the PreProduction samples will be posted here as soon as possible.

Setting up and running the harvesting

  • We start with a clean 3.1.0 setup in a recognizable place:
mkdir prodval_harvesting
cd prodval_harvesting
cmsrel CMSSW_3_1_0
cd CMSSW_3_1_0/src
  • Source the required setup scripts to get CRAB and the LCG UI working. NOTE: The order of these is important. If you find you end up with the wrong version of Python (i.e. 2.3.x instead of 2.4.x), the order was probably wrong.
source /afs/
source /afs/
  • Now we get the required configuration for prodval harvesting (still in the src dir):
addpkg Configuration/StandardSequences
cvs co -r 1.7 Configuration/StandardSequences/python/
cvs co -r V01-00-17 HLTriggerOffline/Common
scram build
NOTE: This will probably complain about a missing module named volumeBasedMagneticField_85l_cfi. This appears to be harmless for our purposes.
  • At this point we can ask to perform some (white!) magic for us to create the configuration file:
step3_MC -s HARVESTING:validationprodHarvesting --harvesting AtJobEnd --conditions FrontierConditions_GlobalTag,MC_31X_V1::All --filein file:step2_MC_RAW2DIGI_RECO_VALIDATION.root --mc --no_exec
This should create a configuration file called in your working dir.
  • The URL to find the datasets to be validated is this one. Pick any of the GEN-SIM-RECO datasets from there.
  • All of these datasets should be available at CERN's Tier-1, so we can just run on them through CASTOR. Modify the file to read in the files corresponding to the desired dataset. In this case we want to process the /RelValMinBias_2M_PROD/CMSSW_3_1_0_pre11-MC_31X_V1-v1/GEN-SIM-RECO dataset. Hint: clicking on the 'plain' LFN link just below the dataset name in the DBS page gives a list of the corresponding file names.
    After modification, your step3 config file should look something like this:
# Auto generated configuration file
# using: 
# Revision: 1.123 
# Source: /cvs_server/repositories/CMSSW/CMSSW/Configuration/PyReleaseValidation/python/,v 
# with command line options: step3_MC -s HARVESTING:validationprodHarvesting --harvesting AtJobEnd --conditions FrontierConditions_GlobalTag,MC_31X_V1::All --filein file:step2_MC_RAW2DIGI_RECO_VALIDATION.root --mc --no_exec
import FWCore.ParameterSet.Config as cms

process = cms.Process('HARVESTING')

# import of standard configurations

process.configurationMetadata = cms.untracked.PSet(
    version = cms.untracked.string('$Revision: 1.1 $'),
    annotation = cms.untracked.string('step3_MC nevts:1'),
    name = cms.untracked.string('PyReleaseValidation')
)
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(1)
)
process.options = cms.untracked.PSet(
    Rethrow = cms.untracked.vstring('ProductNotFound'),
    fileMode = cms.untracked.string('FULLMERGE')
)
# Input source
process.source = cms.Source("PoolSource",
    processingMode = cms.untracked.string('RunsAndLumis'),
    fileNames = cms.untracked.vstring(
        # file names of the chosen dataset go here
    )
)

# Additional output definition

# Other statements
process.GlobalTag.globaltag = 'MC_31X_V1::All'

# Path and EndPath definitions
process.edmtome_step = cms.Path(process.EDMtoME)
process.validationprodHarvesting = cms.Path(process.postValidation*process.hltpostvalidation_prod)
process.dqmHarvesting = cms.Path(process.DQMOffline_SecondStep*process.DQMOffline_Certification)
process.validationHarvestingFS = cms.Path(process.HarvestingFastSim)
process.validationHarvesting = cms.Path(process.postValidation*process.hltpostvalidation)
process.dqmsave_step = cms.Path(process.DQMSaver)

# Schedule definition
process.schedule = cms.Schedule(process.edmtome_step,process.validationprodHarvesting,process.dqmsave_step)
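Pasting a long list of file names into the config by hand is error-prone, so the input source block can also be generated. The following is only a minimal sketch in plain Python (it does not need CMSSW): the helper names lfns_to_file_names and render_source_block are hypothetical, and the /store/example/... names are made-up placeholders, not real LFNs. It assumes the names from the 'plain' LFN list can be used as they are.

```python
def lfns_to_file_names(lfns, indent="        "):
    """Format LFNs as the quoted, comma-separated body of a vstring."""
    # Drop blank lines and stray whitespace from a copy-pasted list.
    cleaned = [lfn.strip() for lfn in lfns if lfn.strip()]
    return ",\n".join("%s'%s'" % (indent, lfn) for lfn in cleaned)


def render_source_block(lfns):
    """Return the text of a process.source block reading the given files."""
    return (
        'process.source = cms.Source("PoolSource",\n'
        "    processingMode = cms.untracked.string('RunsAndLumis'),\n"
        "    fileNames = cms.untracked.vstring(\n"
        "%s\n"
        "    )\n"
        ")" % lfns_to_file_names(lfns)
    )


if __name__ == "__main__":
    # Placeholder names only; replace with the list copied from DBS.
    example = ["/store/example/file_1.root", "/store/example/file_2.root"]
    print(render_source_block(example))
```

The printed text can then be pasted over the process.source block of the step3 config.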
  • Finally, we can run the harvesting. This should only take a little time and produce an output file of about 15 MB.
After telling you that it opened all files and that nothing bad happened, cmsRun will leave you with a new ROOT file with a long name. In the above case: DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root.
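When several harvesting outputs pile up, it can help to pull the run number and workflow back out of the file name. The sketch below assumes the name always follows the pattern seen in the example above (DQM_V, four digits, _R, nine digits, then workflow parts separated by double underscores). That pattern is inferred from this single example, and the function name parse_dqm_file_name is hypothetical.

```python
import re

# Pattern inferred from the example name above; treat it as an assumption.
DQM_NAME = re.compile(
    r"^DQM_V(?P<version>\d{4})_R(?P<run>\d{9})__(?P<workflow>.+)\.root$"
)


def parse_dqm_file_name(name):
    """Split a harvesting output file name into version, run and workflow."""
    match = DQM_NAME.match(name)
    if match is None:
        raise ValueError("not a DQM harvesting file name: %r" % name)
    return {
        "version": int(match.group("version")),
        "run": int(match.group("run")),
        # Workflow parts are separated by double underscores.
        "workflow": match.group("workflow").split("__"),
    }


if __name__ == "__main__":
    info = parse_dqm_file_name(
        "DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root"
    )
    print(info)
```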

Making the results available for scrutiny

Public access is easiest if the output is stored in your CASTOR area. For CASTOR usage see here or WorkBookSetComputerNode.
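As a rough illustration, the copy to CASTOR could be prepared as below. This sketch assumes the usual CERN layout /castor/cern.ch/user/<first letter>/<username> for user areas and the rfcp copy command. The helper names and the user name jdoe are hypothetical, and the code only builds the command without running it.

```python
import os


def castor_user_dir(username):
    """Return the assumed CASTOR user area for a CERN user name."""
    if not username:
        raise ValueError("username must not be empty")
    return "/castor/cern.ch/user/%s/%s" % (username[0], username)


def rfcp_command(local_file, username):
    """Build (but do not run) an rfcp command copying a file to CASTOR."""
    destination = "%s/%s" % (
        castor_user_dir(username),
        os.path.basename(local_file),
    )
    return ["rfcp", local_file, destination]


if __name__ == "__main__":
    # 'jdoe' is a placeholder user name.
    print(" ".join(rfcp_command(
        "DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root", "jdoe")))
```

The printed command can be checked by eye before running it in the shell. Whether others can read the copied file still depends on the permissions of the CASTOR directory.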

-- JeroenHegeman - 03 Jul 2009

Topic revision: r1 - 2009-07-03 - JeroenHegeman