HLT release validation system: tags and testing
SWGuideTriggerSoftwareValidation
Tag collection
Notice of discontinuation.
A new consolidated procedure has now been implemented for the integration of validation tags. Tags for the trigger validation packages will now be considered together with those of the other CMS systems. Trigger validation developers are henceforth asked to submit new tags
elsewhere, according to the described procedures:
This notice means that the trigger validation tag collection and testing procedure, which has been in place for a long time, is now terminated. The testing guidelines have also been transferred and are maintained elsewhere. I hope the new procedures we have defined will serve our developers equally well from now on.
Thanks! -- NunoLeonardo - 04 Aug 2009
Subsystems enter new tags for trigger validation (please edit), along with a brief explanation of the changes and of any dependencies on other required tags
### example
### cvs co -r Vxx-yy-zz HLTriggerOffline/<my_package>
Tag verification for release integration (do not edit table)
| package | tag | notes |
| addpkg HLTriggerOffline/Common | V01-01-00 | integrated |
| addpkg HLTriggerOffline/Egamma | V00-01-11 | integrated |
| addpkg HLTriggerOffline/Muon | V01-03-02 | integrated |
| addpkg HLTriggerOffline/Tau | V04-01-00 | integrated |
| addpkg HLTriggerOffline/Top | V00-01-03 | integrated |
| addpkg HLTriggerOffline/special | V01-00-16 | integrated |
| addpkg HLTriggerOffline/SUSYBSM | V00-05-11 | integrated |
| addpkg HLTriggerOffline/HeavyFlavor | V01-01-03 | integrated |
| addpkg HLTriggerOffline/JetMET | V00-01-05 | integrated |
| dependencies on | tag | |
| addpkg Validation/RecoMuon | V00-02-60 | |
| addpkg DQMServices/ClientConfig | V03-03-08 or later | |
| addpkg DQMOffline/Trigger | V06-00-06 or later | |
(b) Validation/Tools: used by Muon, Top
(a) DQMOffline/Trigger, DQM/HLTEvF: required by Tau
last tested in ib:
CMSSW_3_2_X_2009-07-23-1700
Note: in some cases it may be necessary to run relval step 1 before the standard testing stage i.
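For convenience, the tag table above can be turned into checkout commands programmatically. The following is a minimal Python 3 sketch, assuming `addpkg` is invoked as `addpkg <package> <tag>`; the `TAGS` dictionary and the `checkout_commands` helper are hypothetical names introduced here, and only a few rows of the table are copied in.

```python
# Hypothetical helper: build addpkg commands from (package, tag) pairs.
# Only a subset of the table rows is reproduced here for illustration.
TAGS = {
    "HLTriggerOffline/Common": "V01-01-00",
    "HLTriggerOffline/Egamma": "V00-01-11",
    "HLTriggerOffline/Muon": "V01-03-02",
    "HLTriggerOffline/Tau": "V04-01-00",
}


def checkout_commands(tags):
    """Return one 'addpkg <package> <tag>' line per entry, in insertion order."""
    return [f"addpkg {package} {tag}" for package, tag in tags.items()]


if __name__ == "__main__":
    # Print the commands so they can be pasted into a shell in the src/ area.
    print("\n".join(checkout_commands(TAGS)))
```

The printed lines are meant to be reviewed before being run; the sketch does not execute anything itself.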
Testing code for release integration
Note: the central testing instructions have now been transferred to CMS.RelValTesting.
- Check the latest integration builds, which are highlighted in showBuilds. An example is below.
cmsrel CMSSW_3_1_X_BUILDDATE-TIME
cd CMSSW_3_1_X_BUILDDATE-TIME/src
- Checkout the package tags you wish to verify
cvs co -r VXX-XX-XX HLTriggerOffline/PACKAGE
scramv1 b
- execute the cmsDriver command ( stage i )
grep VALIDATION $CMSSW_RELEASE_BASE/src/Configuration/PyReleaseValidation/data/cmsDriver_highstats_hlt.txt | grep RECO2 | sed 's/STEP2 ++ RECO2 @@@//' | awk '{print $0, " --filein /store/relval/CMSSW_3_1_0_pre10/RelValTTbar/GEN-SIM-DIGI-RAW-HLTDEBUG/STARTUP_31X_v1/0008/86BFF7D7-EE57-DE11-9AC8-001D09F24763.root --no_exec -n 2"}'
- if problems arise, one can proceed with the hlt validation alone, by modifying the cmsDriver command, eg
cmsDriver.py step2 -s RAW2DIGI,RECO,POSTRECO,VALIDATION:hltvalidation
- to bypass the metoedm memory limitation
ulimit -v 3000000
(bash) or limit vmem unlim
(tcsh) (see also eg
)
- check cmsDriver_highstats_hlt.txt
less $CMSSW_RELEASE_BASE/src/Configuration/PyReleaseValidation/data/cmsDriver_highstats_hlt.txt
- the lines therein that include step2 and also start with "RECO" are the ones that run relval step 2, where validation gets executed
- pick a recent release input file (eg list)
- or otherwise re-run the step1 of relval eg
cmsDriver.py TTbar_Tauola.cfi -s GEN:ProductionFilterSequence,SIM,DIGI,L1,DIGI2RAW,HLT --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --relval 25000,100 --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --eventcontent FEVTDEBUGHLT -n 2 --no_exec
- examples
cmsDriver.py step2 -s RAW2DIGI,RECO,POSTRECO,ALCA:MuAlCalIsolatedMu+RpcCalHLT,VALIDATION --relval 25000,100 --datatier GEN-SIM-RECO --geometry Pilot2 --eventcontent RECOSIM --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --no_exec --filein /store/relval/CMSSW_3_1_0_pre8/RelValTTbar/GEN-SIM-RECO/STARTUP_31X_v1/0006/D887D58E-DB4D-DE11-A5BE-001D09F24691.root
- verify harvesting (postprocessing) sequence ( stage ii )
cmsDriver.py harvest -s HARVESTING:validationHarvesting --mc --conditions FrontierConditions_GlobalTag,STARTUP31X_V1::All --harvesting AtJobEnd --filein file:step2_RAW2DIGI_RECO_POSTRECO_ALCA_VALIDATION.root
- note: to ensure postprocessor histograms make it to the generated dqm file, be sure to add
--harvesting AtJobEnd
- check for global tag inconsistencies, see also [[https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideFrontierConditions#31X_pre_releases_and_integration][conditions]]
- verify dqm qt sequence ( stage iii )
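The stage i selection described above (picking the step 2 validation lines out of cmsDriver_highstats_hlt.txt and appending an input file) can also be sketched in Python. This is a minimal sketch mirroring the grep/sed/awk pipeline; the function name `select_step2_commands` is hypothetical, and the exact layout of the lines in the text file is assumed from the `STEP2 ++ RECO2 @@@` prefix used in that pipeline.

```python
# Hypothetical Python 3 equivalent of the grep | grep | sed | awk pipeline.
# Assumption: relevant lines contain "VALIDATION" and "RECO2" and carry the
# prefix "STEP2 ++ RECO2 @@@" in front of the actual cmsDriver command.
PREFIX = "STEP2 ++ RECO2 @@@"


def select_step2_commands(lines, input_file, n_events=2):
    """Return cmsDriver commands for step 2 validation with --filein appended."""
    commands = []
    for line in lines:
        if "VALIDATION" in line and "RECO2" in line:
            # Drop the bookkeeping prefix (first occurrence only, like sed s///).
            command = line.replace(PREFIX, "", 1).strip()
            commands.append(
                f"{command} --filein {input_file} --no_exec -n {n_events}"
            )
    return commands
```

In practice the `lines` argument would come from reading the text file under `$CMSSW_RELEASE_BASE`, and `input_file` would be a recent relval file such as the one used in the shell example.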
Additional checks/requirements
- check for number of bins booked (thanks A.Rizzi, ~arizzi/public/bincounter.C )
root -l -b -q ~nuno/public/validation/bincounter.C\(\"DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root\"\)
- hlt subsystem bin counting
root -l -b -q ~nuno/public/validation/bincounterHLT.C\(\"DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root\"\)
- variable binning is forbidden (would otherwise induce failure in merging step); check example (thanks A.Meyer, ~ameyer/public/merge.py)
cmsRun ~nuno/public/validation/merge.py
- clarification on relval workflow stages (thanks D.Reyes); relval processing runs in single chain job with three steps:
- step 1:
- in case of fullsim: GEN,SIM,DIGI,L1,DIGI2RAW,HLT
- fastsim and merging jobs have only 1 step
- step 2: reco, validation , dqm (me2edm)
- step 3: alca
That is, the Validation sequence is run in step 2 for fullsim, and in step 1 for merging and (since recently also) fastsim jobs.
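The stage assignment just described can be written down as a small lookup. This is only an illustrative sketch; the names `VALIDATION_STEP` and `validation_step` are hypothetical and not part of any CMSSW package.

```python
# Illustrative lookup: in which relval step the Validation sequence runs,
# following the description above (fullsim: step 2; fastsim, merging: step 1).
VALIDATION_STEP = {"fullsim": 2, "fastsim": 1, "merging": 1}


def validation_step(workflow):
    """Return the relval step number that runs validation for a workflow type."""
    try:
        return VALIDATION_STEP[workflow]
    except KeyError:
        raise ValueError(f"unknown workflow type: {workflow!r}")
```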
- append the following to the <...>_cfg.py
process.load("DQMServices.Components.DQMStoreStats_cfi")
process.myStats = cms.Path(process.dqmStoreStats)
process.schedule.append(process.myStats)
process.SimpleMemoryCheck = cms.Service("SimpleMemoryCheck",
    ignoreTotal=cms.untracked.int32(1),
    oncePerEventMode=cms.untracked.bool(False)
)
- dqm guidelines re number of bins and memory (thanks A.Meyer, G.D.-Ricca)
- online dqm: restrictions are rather loose, as long as the total size of the executable does not exceed 1.2 GB or so. But keep each single histogram to a reasonable size, definitely below a few thousand bins. (instructions)
- offline dqm: the restrictions are rather harsh, as the modules have to run in the same job as the full reconstruction. The total size of DQM in memory cannot exceed ~200 MB or so. This means each subsystem has a maximum of a few hundred thousand bins. Please try to stay well below that order of magnitude. (instructions)
- validation: the total number of bins, and thus the total memory, likely have the same constraints as the offline DQM. The limits are even stricter if one envisages running the Validation and the Offline DQM sequences in the same process ...
- notes: (1) there are no signs that anyone with the required skills will start working with the metoedm converter anytime soon, so the only workaround is to keep the memory usage low; (2) one has also to take into account that if for example the reco step will start requiring more memory (for any reason, like a new algorithm), we will have to cut on the DQM sequences to stay within the RAM limits on the standard worker nodes (1.0-2.0 GB, possibly shared between different grid processes)
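Given per-subsystem bin counts (for instance as read off from the bin counter macros above), the guideline of staying below a few hundred thousand bins per subsystem can be checked with a few lines of Python. This is a minimal sketch: the helper name, the example numbers and the default threshold of 300000 bins are all assumptions chosen for illustration, not official limits.

```python
# Hypothetical bin-budget check. The default limit is an assumed stand-in
# for "a few hundred thousand bins"; adjust it to the actual guideline.
def subsystems_over_budget(bins_per_subsystem, max_bins=300_000):
    """Return the sorted names of subsystems whose bin count exceeds max_bins."""
    return sorted(
        name for name, n_bins in bins_per_subsystem.items() if n_bins > max_bins
    )


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = {"Muon": 120_000, "Tau": 450_000, "Egamma": 80_000}
    print(subsystems_over_budget(example))
```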
Developments
Two-menu workflow
In 31X the production workflows are being adapted to incorporate both HLT menus in the same sample edm file.
The digi, L1, HLT collections for both conditions/menus are accessible via distinct process names:
- 8E29, STARTUP conditions, processName: 'HLT8E29'
- 1E31, IDEAL conditions, processName: 'HLT'
Both menus are to be validated.
Those responsible for validation are asked to adapt their configuration sequences so as to produce results for both menus.
Notes:
- the validation sequences should be adapted to execute the relevant modules twice
- modifications at the level of the python configuration should in general suffice
- in each case the proper input collections should be retrieved
- the dqm output folders should be differentiated, both at source and harvesting levels
- cmsDriver.py now defaults to HLT:8E29 for STARTUP conditions, and HLT:1E31 for IDEAL conditions (no need to specify explicitly in -s)
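The notes above amount to running each validation module once per menu, with its own input collections and its own dqm folder. The plain Python sketch below shows one way to derive such per-menu settings before applying them in a configuration; it does not use the CMSSW configuration classes, and the helper name, the label suffix and the folder layout are assumptions. The process names are those listed above, and `TriggerResults::<process>` follows the usual `label:instance:process` input tag notation.

```python
# Plain-Python sketch (no CMSSW imports): derive per-menu settings for a
# validation module that has to run once for each HLT menu.
MENUS = {
    "8E29": {"conditions": "STARTUP", "process_name": "HLT8E29"},
    "1E31": {"conditions": "IDEAL", "process_name": "HLT"},
}


def per_menu_settings(module_label, base_folder):
    """Return one settings dict per menu: module label, input tag, dqm folder."""
    settings = []
    for menu, info in MENUS.items():
        settings.append(
            {
                # Hypothetical naming scheme: append the menu name to the label.
                "label": f"{module_label}{menu}",
                # Input collection taken from the menu's own process.
                "trigger_results": f"TriggerResults::{info['process_name']}",
                # Separate output folder so the two menus do not overwrite
                # each other at source or harvesting level.
                "dqm_folder": f"{base_folder}/{menu}",
            }
        )
    return settings
```

Each returned dictionary could then be used to configure a clone of the original module, so that the python configuration is the only place that changes.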
Testing sequences:
cmsDriver.py TTbar_Tauola.cfi --step=GEN,SIM,DIGI,L1,DIGI2RAW,HLT --conditions=FrontierConditions_GlobalTag,STARTUP_31X::All --fileout=GenHLT_8E29.root --number=100 --mc --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --eventcontent=FEVTDEBUGHLT --python_filename=GenHLT_8E29.py --processName=HLT8E29 --relval 9000,100 --no_exec
cmsDriver.py RelVal --step=DIGI,L1,DIGI2RAW,HLT --conditions=FrontierConditions_GlobalTag,IDEAL_31X::All --filein=file:GenHLT_8E29.root --fileout=GenHLT_8E29_1E31.root --number=100 --mc --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --eventcontent=FEVTDEBUGHLT --python_filename=DigiHLT_1E31.py --processName=HLT --relval 9000,100 --no_exec
cmsDriver.py step2 -s RAW2DIGI,RECO,POSTRECO,ALCA:MuAlCalIsolatedMu+RpcCalHLT,VALIDATION --datatier GEN-SIM-RECO --eventcontent RECOSIM --conditions FrontierConditions_GlobalTag,IDEAL_31X::All --python_filename=RECO.py --relval 9000,100 --no_exec
Generic client
The generic client, aka postprocessor, has moved to DQMServices/ClientConfig. This migration has been incorporated for all modules.
Here we highlight a few desired developments of interest for validation.
- general description of tool (twiki)
- commented config parameters (to replace example as guideline)
- currently the module throws an exception when the expected dqm folders given in the configuration are not found; it would be best if this were reported through message logging without causing the job to crash (given that it is to be executed by production)
- for correct normalization wrt references in the gui, dqm recommends defining efficiencies as tprofile instead of th1f's
Pileup relvals
Defined hlt validation pu sequences, which include an adapted set of modules ( see )
- hltvalidation_pu in HLTValidation_cff.py
- hltpostvalidation_pu in HLTValidationHarvest_cff.py
to be appended to the sequences
- validation_pu in Validation_cff.py
- validationHarvestingPU in Harvest_cff.py
test
cmsDriver.py TTbar_Tauola.cfi -s GEN:ProductionFilterSequence,SIM,DIGI,L1,DIGI2RAW,HLT --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --relval 9000,50 --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --pileup LowLumiPileUp --eventcontent FEVTDEBUGHLT --dump_python -n 2
#(replace with 300p4 samples as input to mix module)
cmsDriver.py step2 -s RAW2DIGI,RECO,POSTRECO,ALCA:MuAlCalIsolatedMu+RpcCalHLT,VALIDATION:validation_pu --relval 25000,100 --datatier GEN-SIM-RECO --eventcontent RECOSIM --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --filein file:TTbar_Tauola_cfi_GEN_SIM_DIGI_L1_DIGI2RAW_HLT_PU.root -n 2
cmsDriver.py harvest -s HARVESTING:validationHarvestingPU --mc --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --harvesting AtJobEnd --filein file:step2_RAW2DIGI_RECO_POSTRECO_ALCA_VALIDATION.root
FastSim validation workflow
This describes the steps towards including trigger validation in the fastsim relval workflows. This is aimed at 31x and onwards.
New sequences for stages i and ii are defined for fastsim.
- hltvalidation_fastsim
- hltpostvalidation_fastsim
We propose to preserve a single-step approach for fastsim (which will include: generation, fastsim, reco, l1, hlt, validation).
This requires that the validation sequence stage i (dqm sources) be included in an endpath (so that the triggerresults collections become available).
The modifications needed to the configs currently generated by cmsDriver are as follows:
relval step1:
#process.schedule.extend([process.reconstruction,process.out_step])
process.load('HLTriggerOffline.Common.HLTValidation_cff')
process.load("DQMServices.Components.MEtoEDMConverter_cff")
process.validation_step = cms.EndPath(process.hltvalidation_fastsim + process.MEtoEDMConverter )
process.schedule.extend([process.reconstruction,process.validation_step,process.out_step])
dqm harvesting: adapt cmsDriver command to pick up fastsim sequence (for trigger: hltpostvalidation_fastsim)
- the test requires up-to-date 31x tags, as specified above
- the virtual memory needs to be increased on lxplus (see post) as
ulimit -v 3000000
(bash) or
limit vmem unlim
(tcsh)
Example test:
limit vmem unlim
cmsRun /afs/cern.ch/user/n/nuno/public/fastsim/stagei.py
cmsRun /afs/cern.ch/user/n/nuno/public/fastsim/stageii.py
root -l DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root
Sequences confirmed by subsystems:
tau, muon, alca-pi0, top,
egamma,bphysics,4vector.
Status
- trigger: hltvalidation_fastsim, hltpostvalidation_fastsim sequences defined since cmssw.310pre3 in:
- HLTriggerOffline/Common/HLTValidation_cff.py
- HLTriggerOffline/Common/HLTValidationHarvest_cff.py
- fastsim: validation sequence defined since cmssw.310pre4 in
- FastSimulation/Configuration/Validation_cff.py
- FastSimulation/Configuration/python/Harvesting_cff.py
- cmsDriver: updates to include fastsim validation sequence available (tags queued for 310pre5)
- Configuration/PyReleaseValidation/ConfigBuilder.py
- Configuration/PyReleaseValidation/cmsDriverOptions.py
- Configuration/StandardSequences/python/Harvesting_cff.py
- relval: commands updated in
- Configuration/PyReleaseValidation/cmsDriver_highstats_hlt.txt
- Configuration/PyReleaseValidation/cmsDriver_standard_hlt.txt
- missing:
- trigger: extend sequence (now done)
- fastsim: harvesting (now done)
- cmsdriver: harvesting (now done)
- relval: harvesting, testing workflows (now done)
- integration of tags (now done)
Test
- stage i (gen-hlt-validation, ie relval step1)
cmsDriver.py TTbar_Tauola_cfi.py -s GEN:ProductionFilterSequence,FASTSIM,VALIDATION --pileup=NoPileUp --conditions=FrontierConditions_GlobalTag,IDEAL_30X::All --eventcontent=FEVTDEBUGHLT --beamspot=Early10TeVCollision --datatier GEN-SIM-DIGI-RECO -n 10 --relval 100000,1000
cmsDriver.py harvest -s HARVESTING:validationHarvestingFS --mc --conditions FrontierConditions_GlobalTag,IDEAL_30X::All --harvesting AtJobEnd --filein file:TTbar_Tauola_cfi_py_GEN_FASTSIM_VALIDATION.root
Memory checks
For checking for potential memory leaks, valgrind may prove a useful tool for identifying the methods and parts of the code involved. The following instructions are based on a recent usage of valgrind, for reference (cf also 1, 2).
addpkg DQM/L1TMonitor;
scramv1 b clean;
scramv1 b -v USER_CXXFLAGS="-g";
valgrind --tool=memcheck `cmsvgsupp` --leak-check=yes --show-reachable=yes --num-callers=20 --track-fds=yes --log-file=valgrind.out cmsRun $CMSSW_RELEASE_BASE/src/Configuration/GlobalRuns/python/recoT0DQM_EvContent_38T_cfg.py
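Once the job has finished, the leak summary at the end of valgrind.out gives a quick overview before digging into the individual loss records. The sketch below extracts it in Python; the helper name `leak_summary` is hypothetical, and the regular expression assumes the usual memcheck summary layout (`==PID==    definitely lost: N bytes in M blocks`), which may differ between valgrind versions.

```python
import re

# Matches memcheck LEAK SUMMARY lines such as
#   ==12345==    definitely lost: 1,024 bytes in 2 blocks
# (layout assumed; numbers may contain thousands separators).
LEAK_LINE = re.compile(
    r"^==\d+==\s+(definitely lost|indirectly lost|possibly lost|still reachable):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def leak_summary(log_text):
    """Return {category: (bytes, blocks)} parsed from a memcheck log."""
    summary = {}
    for line in log_text.splitlines():
        match = LEAK_LINE.match(line)
        if match:
            kind, n_bytes, n_blocks = match.groups()
            summary[kind] = (
                int(n_bytes.replace(",", "")),
                int(n_blocks.replace(",", "")),
            )
    return summary
```

A typical use would be `leak_summary(open("valgrind.out").read())`, followed by a look at the "definitely lost" entry first.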
Gui related
- get the harvested root files from the gui
Links
- CMS tag collector, Integration Builds
- DQM CMS.ValidationTagCollector DQMOffline#TesT SWGuideValidation CMS.RelValTesting CMS.DataCert CMS.DQMReferenceHandling
- gui, certificates
- data ops and release planning: twiki forum, CMS.ReleaseSchedule relvalOps
- software release fora: announcement, integration, development, validation
- register for cvs package notifications
Responsible: Nuno Leonardo, Tom Danielson
Responsible:
NunoLeonardo
-- MonicaVazquezAcosta - 16 Oct 2008
-- NunoLeonardo - 26 Nov 2008