HLT release validation system: tags and testing

MOVED TO... SWGuideTriggerSoftwareValidation

Tag collection

Notice of discontinuation. A new consolidated procedure has been implemented for the integration of validation tags. Tags for the trigger validation packages will now be handled together with those of the other CMS systems. Trigger validation developers are henceforth asked to submit new tags elsewhere, according to the described procedures:

With this notice, the trigger validation tag collection and testing procedure, which has been in place for a long time, is discontinued. The testing guidelines have also been transferred and are maintained elsewhere. I hope the new procedures we have defined will continue to serve our developers equally well from now on. Thanks! -- NunoLeonardo - 04 Aug 2009

Subsystems enter new tags for trigger validation (please edit), along with a brief explanation of the changes and of any dependencies on other tags

### example
### cvs co -r Vxx-yy-zz HLTriggerOffline/<my_package>

Tag verification for release integration (do not edit table)

package                              tag                 notes
addpkg HLTriggerOffline/Common       V01-01-00           DONE integrated
addpkg HLTriggerOffline/Egamma       V00-01-11           DONE integrated
addpkg HLTriggerOffline/Muon         V01-03-02           DONE integrated
addpkg HLTriggerOffline/Tau          V04-01-00           DONE integrated
addpkg HLTriggerOffline/Top          V00-01-03           DONE integrated
addpkg HLTriggerOffline/special      V01-00-16           DONE integrated
addpkg HLTriggerOffline/SUSYBSM      V00-05-11           DONE integrated
addpkg HLTriggerOffline/HeavyFlavor  V01-01-03           DONE integrated
addpkg HLTriggerOffline/JetMET       V00-01-05           DONE integrated
dependencies on tag
addpkg Validation/RecoMuon           V00-02-60           DONE
addpkg DQMServices/ClientConfig      V03-03-08 or later  DONE
addpkg DQMOffline/Trigger            V06-00-06 or later  DONE

(b) Validation/Tools: used by Muon, Top
(a) DQMOffline/Trigger, DQM/HLTEvF: required by Tau
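
The "or later" requirement on the dependency tags can be checked mechanically. The following is a minimal sketch, assuming every tag follows the Vxx-yy-zz pattern shown above so that a numeric comparison of the three fields orders the tags; the helper names parse_tag and satisfies are hypothetical and not part of any CMSSW tool.

```python
import re

# Assumed tag pattern: 'V' followed by three dash-separated numeric fields.
TAG_PATTERN = re.compile(r"^V(\d+)-(\d+)-(\d+)$")


def parse_tag(tag):
    """Turn a tag such as 'V03-03-08' into the tuple (3, 3, 8)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError("unexpected tag format: %r" % tag)
    return tuple(int(field) for field in match.groups())


def satisfies(checked_out, required, or_later=False):
    """Return True if the checked-out tag meets the table requirement."""
    if or_later:
        # Tuples compare field by field, so a later tag compares greater.
        return parse_tag(checked_out) >= parse_tag(required)
    return parse_tag(checked_out) == parse_tag(required)


if __name__ == "__main__":
    # DQMServices/ClientConfig is listed as 'V03-03-08 or later'.
    print(satisfies("V03-03-10", "V03-03-08", or_later=True))
```

Tags that do not match the assumed pattern raise a ValueError rather than being silently compared as strings.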

last tested in ib: CMSSW_3_2_X_2009-07-23-1700
Note: in some cases it may be necessary to run relval step 1 before the standard testing stage i.

Testing code for release integration

Note: the central testing instructions have now been transferred to CMS.RelValTesting.

  • Check the latest integration builds, which are highlighted in showBuilds.
    An example is below.
     cmsrel CMSSW_3_1_X_BUILDDATE-TIME
     cd CMSSW_3_1_X_BUILDDATE-TIME/src
  • Checkout the package tags you wish to verify
     cvs co -r VXX-XX-XX HLTriggerOffline/PACKAGE
     scramv1 b
  • execute the cmsDriver command ( stage i )
less $CMSSW_RELEASE_BASE/src/Configuration/PyReleaseValidation/data/cmsDriver_highstats_hlt.txt | grep VALIDATION | grep RECO2 | sed s/"STEP2 ++ RECO2 @@@"// | awk '{print $_, " --filein /store/relval/CMSSW_3_1_0_pre10/RelValTTbar/GEN-SIM-DIGI-RAW-HLTDEBUG/STARTUP_31X_v1/0008/86BFF7D7-EE57-DE11-9AC8-001D09F24763.root --no_exec -n 2"}'
    • if problems occur, one can proceed with the HLT validation alone, by modifying the cmsDriver command, eg
cmsDriver.py step2 -s RAW2DIGI,RECO,POSTRECO,VALIDATION:hltvalidation
    • to bypass the MEtoEDM memory limitation, use ulimit -v 3000000 (bash) or limit vmem unlim (tcsh) (see also eg)
    • check cmsDriver_highstats_hlt.txt
        less $CMSSW_RELEASE_BASE/src/Configuration/PyReleaseValidation/data/cmsDriver_highstats_hlt.txt 
    • the lines therein that are marked as step2 and that also start with "RECO" are the ones that run relval step 2, where validation gets executed
    • pick an input file from a recent release (eg list)
    • or otherwise re-run step 1 of relval, eg
      cmsDriver.py TTbar_Tauola.cfi -s GEN:ProductionFilterSequence,SIM,DIGI,L1,DIGI2RAW,HLT --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --relval 25000,100 --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --eventcontent FEVTDEBUGHLT -n 2 --no_exec
    • examples
       cmsDriver.py step2 -s RAW2DIGI,RECO,POSTRECO,ALCA:MuAlCalIsolatedMu+RpcCalHLT,VALIDATION --relval 25000,100 --datatier GEN-SIM-RECO --geometry Pilot2 --eventcontent RECOSIM --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --no_exec --filein /store/relval/CMSSW_3_1_0_pre8/RelValTTbar/GEN-SIM-RECO/STARTUP_31X_v1/0006/D887D58E-DB4D-DE11-A5BE-001D09F24691.root
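
The grep/sed/awk pipeline used for stage i can also be written as a short Python function, which may be easier to adapt. This is only a sketch: it assumes, from the sed expression above, that the relevant lines of cmsDriver_highstats_hlt.txt have the form "STEP2 ++ RECO2 @@@ <command>"; the function name step2_commands and the placeholder input file are hypothetical.

```python
import os

# Prefix removed by the sed expression in the shell pipeline.
PREFIX = "STEP2 ++ RECO2 @@@"


def step2_commands(lines, input_file, n_events=2):
    """Select the step-2 validation lines and append the test options."""
    commands = []
    for line in lines:
        # Same selection as: grep VALIDATION | grep RECO2
        if "VALIDATION" not in line or "RECO2" not in line:
            continue
        # Same as the sed step: drop the first occurrence of the prefix.
        command = line.replace(PREFIX, "", 1).strip()
        commands.append(
            "%s --filein %s --no_exec -n %d" % (command, input_file, n_events)
        )
    return commands


if __name__ == "__main__":
    base = os.environ.get("CMSSW_RELEASE_BASE")
    if base:
        path = os.path.join(
            base,
            "src/Configuration/PyReleaseValidation/data",
            "cmsDriver_highstats_hlt.txt",
        )
        with open(path) as handle:
            # Placeholder input: replace with a real relval file.
            for command in step2_commands(handle, "file:my_input.root"):
                print(command)
```

Run inside a CMSSW environment, the script prints one cmsDriver command per selected line; outside of one it does nothing.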
      

  • verify harvesting (postprocessing) sequence ( stage ii )
    • Test HLT alone (cfg)
      cmsRun HLTriggerOffline/Common/test/hltHarvesting_cfg.py
    • Test the full harvesting step with cmsDriver
 
cmsDriver.py harvest -s HARVESTING:validationHarvesting --mc --conditions FrontierConditions_GlobalTag,STARTUP31X_V1::All --harvesting AtJobEnd --filein file:step2_RAW2DIGI_RECO_POSTRECO_ALCA_VALIDATION.root

    • note: to ensure postprocessor histograms make it to the generated dqm file, be sure to add --harvesting AtJobEnd
    • check for global tag inconsistencies, see also [[https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideFrontierConditions#31X_pre_releases_and_integration][conditions]]
  • verify dqm qt sequence ( stage iii )
    • execute standalone job (cfg) below, and display in dqm gui
       cmsRun $CMSSW_RELEASE_BASE/src/HLTriggerOffline/Common/test/hltSourceHarvestCompare_cfg.py 

  • search for input files
    • dbs query, example
      find file where dataset like *RelValTTbar* and release=CMSSW_2_2_0
    • check also nsls /castor/cern.ch/cms/store/relval/

Additional checks/requirements

  • check for number of bins booked (thanks A.Rizzi, ~arizzi/public/bincounter.C )
root -l -b -q ~nuno/public/validation/bincounter.C\(\"DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root\"\) 
    • hlt subsystem bin counting
root -l -b -q ~nuno/public/validation/bincounterHLT.C\(\"DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root\"\) 

  • variable binning is forbidden (would otherwise induce failure in merging step); check example (thanks A.Meyer, ~ameyer/public/merge.py)
 cmsRun ~nuno/public/validation/merge.py
  • clarification on relval workflow stages (thanks D.Reyes); relval processing runs in single chain job with three steps:
    • step 1:
      • in case of fullsim: GEN,SIM,DIGI,L1,DIGI2RAW,HLT
      • fastsim and merging jobs have only 1 step
    • step 2: reco, validation , dqm (me2edm)
    • step 3: alca
That is, the Validation sequence is run in step 2 for fullsim, and in step 1 for merging and (since recently also) fastsim jobs.
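
The placement of the Validation sequence can be summarised as a small lookup. In the sketch below the step contents for fullsim are taken from the list above (with DQM standing for the dqm/me2edm part); for the single-step fastsim and merging workflows only VALIDATION is listed, since the text states nothing more about them. The names WORKFLOW_STEPS and step_running are hypothetical.

```python
# Sequences per relval step, as described in the workflow clarification.
WORKFLOW_STEPS = {
    "fullsim": {
        1: ["GEN", "SIM", "DIGI", "L1", "DIGI2RAW", "HLT"],
        2: ["RECO", "VALIDATION", "DQM"],
        3: ["ALCA"],
    },
    # Single-step workflows: only the placement of VALIDATION is known here.
    "fastsim": {1: ["VALIDATION"]},
    "merging": {1: ["VALIDATION"]},
}


def step_running(sequence, workflow):
    """Return the step number in which a sequence runs, or None."""
    for step, sequences in sorted(WORKFLOW_STEPS[workflow].items()):
        if sequence in sequences:
            return step
    return None


if __name__ == "__main__":
    for workflow in sorted(WORKFLOW_STEPS):
        print(workflow, step_running("VALIDATION", workflow))
```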

  • append the following to the <...>_cfg.py
process.load("DQMServices.Components.DQMStoreStats_cfi")
process.myStats = cms.Path(process.dqmStoreStats)
process.schedule.append(process.myStats)

process.SimpleMemoryCheck = cms.Service("SimpleMemoryCheck",
 ignoreTotal=cms.untracked.int32(1),
 oncePerEventMode=cms.untracked.bool(False)
)

  • dqm guidelines re number of bins and memory (thanks A.Meyer, G.D.-Ricca)
    • online dqm: restrictions are rather loose, as long as the total size of the executable does not exceed 1.2 GB or so. But keep each single histogram to a reasonable size, definitely below a few thousand bins. (instructions)
    • offline dqm: the restrictions are rather harsh, as the modules have to run in the same job as the full reconstruction. The total size of DQM in memory cannot exceed ~200 MB or so. This means each subsystem has a maximum of a few hundred thousand bins. Please try to stay well below that order of magnitude. (instructions)
    • validation: the total number of bins, and thus the total memory, likely has the same constraints as the offline DQM. The limits are even stricter if one envisages running the Validation and the Offline DQM sequences in the same process ...
    • notes: (1) there are no signs that anyone with the required skills will start working on the MEtoEDM converter anytime soon, so the only workaround is to keep the memory usage low; (2) one also has to take into account that if, for example, the reco step starts requiring more memory (for any reason, like a new algorithm), we will have to cut down on the DQM sequences to stay within the RAM limits of the standard worker nodes (1.0-2.0 GB, possibly shared between different grid processes)
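
To get a feeling for how bin counts translate into memory, the bin storage alone can be estimated with a few lines of Python. This is a rough lower bound under stated assumptions: it counts 4 bytes per cell for float histograms (TH1F, TH2F) and 8 bytes for double ones (TH1D, TH2D), includes the underflow and overflow cell on each axis, and does not model per-histogram object overhead or error arrays. The function names are hypothetical.

```python
# Bytes used per stored cell for the bin contents only.
BYTES_PER_BIN = {"TH1F": 4, "TH1D": 8, "TH2F": 4, "TH2D": 8}


def histogram_cells(nbins_x, nbins_y=None):
    """Number of stored cells, including under/overflow on each axis."""
    cells = nbins_x + 2
    if nbins_y is not None:
        cells *= nbins_y + 2
    return cells


def estimate_megabytes(histograms):
    """Estimate bin storage for (type, nbins_x, nbins_y or None) entries."""
    total_bytes = 0
    for kind, nbins_x, nbins_y in histograms:
        total_bytes += BYTES_PER_BIN[kind] * histogram_cells(nbins_x, nbins_y)
    return total_bytes / (1024.0 * 1024.0)


if __name__ == "__main__":
    # Hypothetical booking: 500 TH1F of 100 bins and 20 TH2F of 50x50 bins.
    booked = [("TH1F", 100, None)] * 500 + [("TH2F", 50, 50)] * 20
    print("%.3f MB of bin contents" % estimate_megabytes(booked))
```

The actual footprint should be measured with the bin counter macros and the DQMStoreStats and SimpleMemoryCheck services mentioned above; this estimate only helps to compare booking choices.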

Developments

Two-menu workflow

In 31X the production workflows are being adapted to incorporate both HLT menus in the same sample EDM file.

The digi, L1, HLT collections for both conditions/menus are accessible via distinct process names:

    • 8E29, STARTUP conditions, processName: 'HLT8E29'
    • 1E31, IDEAL conditions, processName: 'HLT'

Both menus are to be validated. Those responsible for validation are asked to adapt their configuration sequences so as to produce results for both menus.

Notes:

  1. the validation sequences should be adapted to execute the relevant modules twice
  2. modifications at the level of the python configuration should in general suffice
  3. in each case the proper input collections should be retrieved
  4. the dqm output folders should be differentiated, both at source and harvesting levels
  5. cmsDriver.py now defaults to HLT:8E29 for STARTUP conditions, and HLT:1E31 for IDEAL conditions (no need to specify explicitly in -s)
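
The notes above amount to running each module once per menu, with the input collections and the DQM output folder chosen per menu. The sketch below shows one way to organise the per-menu settings in plain Python before they are passed to the configuration. The process names are those listed above; the folder naming scheme (menu name appended to a base folder), the base folder "HLT/Muon", and the function names are assumptions for illustration only.

```python
# Menu definitions as listed in the two-menu workflow description.
MENUS = {
    "8E29": {"conditions": "STARTUP", "process_name": "HLT8E29"},
    "1E31": {"conditions": "IDEAL", "process_name": "HLT"},
}


def trigger_results_tag(menu):
    """InputTag string 'label:instance:process' for the menu's TriggerResults."""
    return "TriggerResults::%s" % MENUS[menu]["process_name"]


def dqm_folder(base_folder, menu):
    """Hypothetical naming: append the menu so the outputs stay apart."""
    return "%s/%s" % (base_folder.rstrip("/"), menu)


def menu_settings(base_folder):
    """One settings dictionary per menu, for source and harvesting modules."""
    return [
        {
            "menu": menu,
            "input_tag": trigger_results_tag(menu),
            "dqm_folder": dqm_folder(base_folder, menu),
        }
        for menu in sorted(MENUS)
    ]


if __name__ == "__main__":
    for settings in menu_settings("HLT/Muon"):
        print(settings)
```

Each dictionary can then be used to clone a validation module and its harvesting counterpart with the matching input tag and folder.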

Testing sequences:

cmsDriver.py TTbar_Tauola.cfi --step=GEN,SIM,DIGI,L1,DIGI2RAW,HLT --conditions=FrontierConditions_GlobalTag,STARTUP_31X::All --fileout=GenHLT_8E29.root --number=100 --mc --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --eventcontent=FEVTDEBUGHLT --python_filename=GenHLT_8E29.py --processName=HLT8E29 --relval 9000,100 --no_exec

cmsDriver.py RelVal --step=DIGI,L1,DIGI2RAW,HLT --conditions=FrontierConditions_GlobalTag,IDEAL_31X::All --filein=file:GenHLT_8E29.root --fileout=GenHLT_8E29_1E31.root --number=100 --mc --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --eventcontent=FEVTDEBUGHLT --python_filename=DigiHLT_1E31.py --processName=HLT --relval 9000,100 --no_exec

cmsDriver.py step2 -s RAW2DIGI,RECO,POSTRECO,ALCA:MuAlCalIsolatedMu+RpcCalHLT,VALIDATION --relval 25000,100 --datatier GEN-SIM-RECO --eventcontent RECOSIM --conditions FrontierConditions_GlobalTag,IDEAL_31X::All --python_filename=RECO.py --relval 9000,100 --no_exec

Generic client

The generic client, aka postprocessor, has moved to DQMServices/ClientConfig. This migration has been incorporated for all modules. Here we highlight a few desired developments of interest for validation.

  1. general description of tool (twiki)
  2. commented config parameter (to replace example as guideline)
  3. currently the module throws an exception when the expected dqm folders given in the configuration are not found; it would be best if this were reported through message logging without causing the job to crash (namely, as it is to be executed by production)
  4. for correct normalization wrt references in the gui, dqm recommends defining efficiencies as tprofile instead of th1f's

Pileup relvals

HLT validation pileup (pu) sequences have been defined, which include an adapted set of modules (see)

  1. hltvalidation_pu in HLTValidation_cff.py
  2. hltpostvalidation_pu in HLTValidationHarvest_cff.py
to be appended to the sequences
  1. validation_pu in Validation_cff.py
  2. validationHarvestingPU in Harvest_cff.py

test

cmsDriver.py TTbar_Tauola.cfi -s GEN:ProductionFilterSequence,SIM,DIGI,L1,DIGI2RAW,HLT --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --relval 9000,50 --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --pileup LowLumiPileUp --eventcontent FEVTDEBUGHLT --dump_python -n 2
#(replace with 300p4 samples as input to mix module)

cmsDriver.py step2 -s RAW2DIGI,RECO,POSTRECO,ALCA:MuAlCalIsolatedMu+RpcCalHLT,VALIDATION:validation_pu --relval 25000,100 --datatier GEN-SIM-RECO --eventcontent RECOSIM --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --filein file:TTbar_Tauola_cfi_GEN_SIM_DIGI_L1_DIGI2RAW_HLT_PU.root -n 2

cmsDriver.py harvest -s HARVESTING:validationHarvestingPU --mc --conditions FrontierConditions_GlobalTag,STARTUP_30X::All --harvesting AtJobEnd --filein file:step2_RAW2DIGI_RECO_POSTRECO_ALCA_VALIDATION.root

FastSim validation workflow

This describes the steps towards including trigger validation in the fastsim relval workflows. This is aimed at 31x and onwards.

New sequences for stages i and ii are defined for fastsim.
  • hltvalidation_fastsim
  • hltpostvalidation_fastsim

We propose to preserve a single step approach for the fastsim (which will include: generation, fastsim, reco, l1, hlt, validation ).

This requires that the validation sequence stage-i (dqm sources) be included in an endpath (so that the triggerresults collections become available).

The modifications needed to the configs currently generated by cmsDriver are as follows:

relval step1:

#process.schedule.extend([process.reconstruction,process.out_step])
process.load('HLTriggerOffline.Common.HLTValidation_cff')
process.load("DQMServices.Components.MEtoEDMConverter_cff")
process.validation_step = cms.EndPath(process.hltvalidation_fastsim + process.MEtoEDMConverter )
process.schedule.extend([process.reconstruction,process.validation_step,process.out_step])
dqm harvesting: adapt the cmsDriver command to pick up the fastsim sequence (for trigger: hltpostvalidation_fastsim)

  • the test requires up-to-date 31x tags, as specified above
  • the virtual memory limit needs to be increased on lxplus (see post): ulimit -v 3000000 (bash) or limit vmem unlim (tcsh)

Example test:

limit vmem unlim
cmsRun /afs/cern.ch/user/n/nuno/public/fastsim/stagei.py
cmsRun /afs/cern.ch/user/n/nuno/public/fastsim/stageii.py
root -l DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root

Sequences confirmed by subsystems: tau, muon, alca-pi0, top, egamma, bphysics, 4vector.

Status

  1. trigger: hltvalidation_fastsim, hltpostvalidation_fastsim sequences defined since cmssw.310pre3 in:
    1. HLTriggerOffline/Common/HLTValidation_cff.py
    2. HLTriggerOffline/Common/HLTValidationHarvest_cff.py
  2. fastsim: validation sequence defined since cmssw.310pre4 in
    1. FastSimulation/Configuration/Validation_cff.py
    2. FastSimulation/Configuration/python/Harvesting_cff.py
  3. cmsDriver: updates to include fastsim validation sequence available (tags queued for 310pre5)
    1. Configuration/PyReleaseValidation/ConfigBuilder.py
    2. Configuration/PyReleaseValidation/cmsDriverOptions.py
    3. Configuration/StandardSequences/python/Harvesting_cff.py
  4. relval: commands updated in
    1. Configuration/PyReleaseValidation/cmsDriver_highstats_hlt.txt
    2. Configuration/PyReleaseValidation/cmsDriver_standard_hlt.txt
  5. missing:
    1. trigger: extend sequence (now done)
    2. fastsim: harvesting (now done)
    3. cmsdriver: harvesting (now done)
    4. relval: harvesting, testing workflows (now done)
    5. integration of tags (now done)

Test

  • stage i (gen-hlt-validation, ie relval step1)
cmsDriver.py TTbar_Tauola_cfi.py -s GEN:ProductionFilterSequence,FASTSIM,VALIDATION --pileup=NoPileUp --conditions=FrontierConditions_GlobalTag,IDEAL_30X::All --eventcontent=FEVTDEBUGHLT --beamspot=Early10TeVCollision --datatier GEN-SIM-DIGI-RECO -n 10 --relval 100000,1000
  • stage ii (harvesting)
cmsDriver.py harvest -s HARVESTING:validationHarvestingFS --mc --conditions FrontierConditions_GlobalTag,IDEAL_30X::All --harvesting AtJobEnd --filein file:TTbar_Tauola_cfi_py_GEN_FASTSIM_VALIDATION.root

Memory checks

When checking for potential memory leaks, valgrind may prove a useful tool for identifying the methods and parts of the code involved. The following instructions are based on a recent use of valgrind, for reference. (cf also 1 2)

addpkg DQM/L1TMonitor;
scramv1 b clean;
scramv1 b -v USER_CXXFLAGS="-g";
valgrind --tool=memcheck `cmsvgsupp` --leak-check=yes --show-reachable=yes --num-callers=20 --track-fds=yes --log-file=valgrind.out cmsRun $CMSSW_RELEASE_BASE/src/Configuration/GlobalRuns/python/recoT0DQM_EvContent_38T_cfg.py

Gui related

  • get the harvested root files from the gui
    • from the browser: https://cmsweb.cern.ch/dqm/offline/data/browse/ROOT
    • from command line with wget
      source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
      voms-proxy-init
      wget --ca-directory $X509_CERT_DIR/ --certificate=$X509_USER_PROXY --private-key=$X509_USER_PROXY https://cmsweb.cern.ch/dqm/offline/data/browse/ROOT/RelVal/CMSSW_3_5_x/DQM_V0002_R000000001__RelValSinglePiPt1__CMSSW_3_5_0_pre3-MC_3XY_V15_FastSim-v1__GEN-SIM-DIGI-RECO.root
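
When several harvested files have to be fetched, the wget call above can be assembled in Python. The sketch below only builds the argument list, using the same wget options and the same environment variables (X509_CERT_DIR, X509_USER_PROXY) as the command line above; the function name dqm_wget_command is hypothetical, and a valid proxy from voms-proxy-init is assumed to exist before the command is run.

```python
import os


def dqm_wget_command(url, environ=None):
    """Build the wget argument list for a certificate-protected DQM file."""
    environ = os.environ if environ is None else environ
    proxy = environ["X509_USER_PROXY"]
    return [
        "wget",
        "--ca-directory", environ["X509_CERT_DIR"],
        # The grid proxy file serves as both certificate and private key.
        "--certificate=" + proxy,
        "--private-key=" + proxy,
        url,
    ]


if __name__ == "__main__":
    if "X509_USER_PROXY" in os.environ and "X509_CERT_DIR" in os.environ:
        base = "https://cmsweb.cern.ch/dqm/offline/data/browse/ROOT"
        print(" ".join(dqm_wget_command(base)))
```

The returned list can be passed to subprocess.check_call to perform the download.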
      

Links

  • CMS tag collector Integration Builds
  • DQM CMS.ValidationTagCollector DQMOffline#TesT SWGuideValidation CMS.RelValTesting CMS.DataCert CMS.DQMReferenceHandling
  • gui certificates
  • data ops and release planning twiki forum CMS.ReleaseSchedule relvalOps
  • software release fora announcement integration development validation
  • register for cvs package notifications

Responsible: Nuno Leonardo, Tom Danielson

-- MonicaVazquezAcosta - 16 Oct 2008 -- NunoLeonardo - 26 Nov 2008

Topic revision: r166 - 2011-04-20 - NunoLeonardo
 