How to measure timing with Fast Simulation


Global timing from a normal cmsRun job, for each producer

You can run a normal Fast Simulation job in any release. The cfg file used to estimate the overall Fast Simulation timing is FastSimulation/Configuration/test/IntegrationTest_cfg.py. It uses the famosWithEverything sequence, but it can be shrunk to use only, e.g., the famosWithBTagging sequence if you are primarily interested in b tagging. The steps to measure absolute timing are as follows.

  1. Set up a CMSSW area, in any release you'd like to test the timing with, and go to CMSSW_X_Y_Z/src
  2. eval `scram runtime -csh`
  3. project CMSSW
  4. addpkg FastSimulation/Configuration
  5. Edit FastSimulation/Configuration/test/IntegrationTest_cfg.py: change maxEvents to 100 and uncomment the following three lines,
    process.Timing =  cms.Service("Timing")
    process.load("FWCore.MessageService.MessageLogger_cfi")
    process.MessageLogger.destinations = cms.untracked.vstring("detailedInfo.txt","cout")
    
    and choose the sequence you'd like
  6. cmsRun FastSimulation/Configuration/test/IntegrationTest_cfg.py
  7. When the job is over, check the last line on your screen, which should look something like 956.731u 6.573s 16:07.27 99.5%  0+0k 0+0io 0pf+0w, and note the 99.5% in this line. This is the fraction of wall-clock time actually spent executing your job; it can be much smaller if your machine is very busy.
  8. Extract the timing information from the message logger printout with find . -name "detailedInfo.txt" | xargs grep "TimeModule>" > TimingInfo.txt
  9. Get the magic code that processes the above timing information: cp ~pjanot/public/timing.cpp .
  10. Edit timing.cpp and change double factor to the number you remembered from point 7 (in this case, 0.995).
  11. Compile it: g++ timing.cpp
  12. Run it: ./a.out
  13. Read the output, which looks like this
Timing per module 
================= 
  1                                  CaloMuonProducer :     461 ms/event
  2                                    MuonIdProducer :     376 ms/event
  3                                  GsfTrackProducer :     352 ms/event
  4                                     TrackProducer :     347 ms/event
  5                                  PoolOutputModule :     344 ms/event
  6                                     FamosProducer :     237 ms/event
  7                                FastjetJetProducer :     190 ms/event
  8                                 CaloTowersCreator :    66.9 ms/event
  9                                  GoodSeedProducer :      64 ms/event
 10                             PrimaryVertexProducer :    52.8 ms/event
 11                                       METProducer :    40.8 ms/event
 12                                  PFElecTkProducer :    37.1 ms/event
 13                                   PFBlockProducer :    26.5 ms/event
 14                                 PFClusterProducer :    25.1 ms/event
 15                                        V0Producer :    22.4 ms/event
 16                            Pythia6GeneratorFilter :    17.3 ms/event
 17                      InputGenJetsParticleSelector :    15.5 ms/event
 18                              MuIsoDepositProducer :    13.9 ms/event
[...]
Total time = 2.93e+03 ms/event 
================= 
Timing per label  
================= 
  1                                                                          CaloMuonProducer:calomuons :     461 ms/event
  2                                                                                MuonIdProducer:muons :     376 ms/event
  3                                                                  GsfTrackProducer:electronGsfTracks :     352 ms/event
  4                                                                            PoolOutputModule:display :     263 ms/event
  5                                                                          FamosProducer:famosSimHits :     237 ms/event
  6                                                     TrackProducer:iterativeSecondTracksWithTriplets :    98.4 ms/event
  7                                                         TrackProducer:iterativeThirdTracksWithPairs :    90.9 ms/event
  8                                                      TrackProducer:iterativeFirstTracksWithTriplets :      85 ms/event
[...]
================= 
Total time = 2.93e+03 ms/event 
Total time = 2.93e+03 +/- 1.5e+03 ms/event
Min.  time = 650 ms/event
Max.  time = 8.79e+03 ms/event

In this output, the time spent per event in each module is given, as well as the total time, the rms, and the smallest and largest time spent per event. If you are interested in muons, you see that CaloMuonProducer and MuonIdProducer together take almost 1 s/event, and you might wonder what to do to improve on that. For this, you need more information, i.e., a breakdown of where the time actually goes in each producer. This is the topic of the next section.
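For illustration, the essence of what timing.cpp computes can be sketched in a few lines of Python. The TimeModule> line layout assumed here (event, run, module label, module class, time in seconds) and the way the CPU fraction from point 7 is applied are assumptions; the real timing.cpp and the exact MessageLogger format may differ between releases.

```python
# Sketch of a TimingInfo.txt summary, assuming each grepped line reads
# "TimeModule> <event> <run> <label> <class> <seconds>".
from collections import defaultdict

def summarize(lines, factor=1.0):
    """Return {module class: average ms/event}, scaled by the CPU
    fraction noted in point 7 above (e.g. 0.995)."""
    total = defaultdict(float)
    events = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "TimeModule>":
            continue
        event, _run, _label, module, seconds = fields[1:6]
        events.add(event)                # count distinct events seen
        total[module] += float(seconds)  # accumulate time per module class
    n = len(events) or 1
    # convert seconds to ms, average over events, apply the CPU fraction
    return {m: 1000.0 * t * factor / n for m, t in total.items()}

sample = [
    "TimeModule> 1 1 calomuons CaloMuonProducer 0.5",
    "TimeModule> 2 1 calomuons CaloMuonProducer 0.4",
]
print(summarize(sample, factor=0.995))
```

With the two hypothetical sample lines above, CaloMuonProducer averages 0.45 s over two events, i.e. 450 ms/event before the factor is applied.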

Note in passing: the code of timing.cpp can certainly be improved to show histograms, pie charts, warnings, etc. Please feel free to upgrade the code, and to send me the modified version! (mailto:patrick.janot@cernNOSPAMPLEASE.ch)

Breakdown measurements with valgrind

Change the cfg file

You can use exactly the same cfg.py file as above, with a few changes:

  1. Limit the number of events (maxEvents) to 10: valgrind is very slow.
  2. Choose the sequence with the producer you are interested in, e.g., famosWithBTagging or famosWithJets, as in the two aforementioned examples.
  3. Add the following lines at the beginning of the cfg, to profile only a range of events (the timing of the first event is usually biased by initialization)
process.ProfilerService = cms.Service (
      "ProfilerService",
       firstEvent = cms.untracked.int32(3),
       lastEvent = cms.untracked.int32(12),
       paths = cms.untracked.vstring('p1') 
)
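Putting the three changes together, the cfg additions might look like this. This is a sketch only, assuming the stock IntegrationTest_cfg.py layout in which the process and the path 'p1' are already defined; names and values must match your release.

```python
# Sketch: additions to IntegrationTest_cfg.py for a valgrind run.
# The process object and the path 'p1' come from the existing cfg.
import FWCore.ParameterSet.Config as cms

# 1. valgrind is very slow: limit the number of events
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))

# 3. profile a range of events, skipping the first one, whose timing
#    is biased by initialization
process.ProfilerService = cms.Service(
    "ProfilerService",
    firstEvent = cms.untracked.int32(3),
    lastEvent = cms.untracked.int32(12),
    paths = cms.untracked.vstring('p1')
)
```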

Run valgrind

Then, just run valgrind on it with the following magic command

valgrind --tool=callgrind --combine-dumps=yes --instr-atstart=no --dump-instr=yes --separate-recs=1 cmsRun FastSimulation/Configuration/test/IntegrationTest_cfg.py

Go for a coffee

You don't need a documentation for that, do you?

Check the output with the GUI

And, after a coffee, when the job is over, just check the output

setenv PATH /afs/cern.ch/cms/bin/i386_linux24:${PATH}
kcachegrind callgrind.out.xxxxx

In this graphical interface, you'll be able to click on your favourite producer and check the fraction of time spent in the various classes, methods, submethods, etc. Then go to your code, optimize, and repeat the operation until the FastSimulation conveners are happy!

Review Status

Reviewer/Editor and Date: PatrickJanot - 06 November 2007
Comments: First version of the documentation

Responsible: PatrickJanot
Last reviewed by: AndreaGiammanco

Topic revision: r7 - 2012-03-05 - AndreaGiammanco