Welcome to the performance studies TWiki.

Useful References

Other nice links

Useful literature

How to set optimization attributes on individual functions (GCC)

void func1() __attribute__ ((optimize("-O2","-ftree-vectorize","-ftree-vectorizer-verbose=7")));
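A minimal compilable sketch, assuming GCC. The function name saxpy and its arguments are hypothetical. The attribute asks GCC to compile only this function at -O3 with tree vectorization, whatever the global optimization level is. Per the GCC manual, string arguments starting with O select an optimization level and other strings are taken as suffixes of -f. Compilers that do not support the attribute, such as clang, ignore it with a warning.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: only this function gets -O3 and -ftree-vectorize.
// The attribute goes on the declaration, as in the func1 example above.
void saxpy(float a, const float* x, float* y, std::size_t n)
    __attribute__((optimize("O3", "tree-vectorize")));

void saxpy(float a, const float* x, float* y, std::size_t n) {
  assert(n == 0 || (x != nullptr && y != nullptr));
  for (std::size_t i = 0; i < n; ++i) {
    // Independent iterations: a good candidate for the auto-vectorizer.
    y[i] = a * x[i] + y[i];
  }
}
```

To see whether the loop was actually vectorized, gcc 4.6 prints a report with -ftree-vectorizer-verbose=N; newer GCC releases replaced that option with -fopt-info-vec.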

CPU, Memory and Physics Performance Studies: experience with gcc461

The idea here is to study the performance of the cmsRun application in terms of CPU time, memory usage and physics output. The main objective is to pave the way for a more comprehensive and homogeneous improvement of the code that exploits the newest CPU capabilities. As a first step, we study the effects of compiling with the gcc 4.3.4 and gcc 4.6.1 compilers.


  • release: CMSSW_4_4_0_pre7
  • machines: vocms123, vocms122
  • Dataset: /RelValProdTTbar/CMSSW_4_4_0_pre5-START44_V2-v1/GEN-SIM-DIGI-RAW-HLTDEBUG
    • Castor location: /castor/cern.ch/cms/store/relval/CMSSW_4_4_0_pre5/RelValTTbar/GEN-SIM-DIGI-RAW-HLTDEBUG/START44_V2-v1/
    • 750 events merged into: /build/hauth/cmssw_compare/inputFiles/CMSSW_4_4_0_pre5_RelValTTbar_GEN-SIM-DIGI-RAW-HLTDEBUG/CMSSW_4_4_0_pre5_RelValTTbar_GEN-SIM-DIGI-RAW-HLTDEBUG_750Evts.root

Config files

The config files are located here.

They are also cloned here.

In your CMSSW folder, create symbolic links to these config files:

ln -s ../config/CMSSW_4_4_0_pre7/RECO_NOOUTPUT_MEM.py 
ln -s ../config/CMSSW_4_4_0_pre7/RECO_NOOUTPUT_TIME.py 
ln -s ../config/CMSSW_4_4_0_pre7/RECO_NOOUTPUT.py 
ln -s ../config/CMSSW_4_4_0_pre7/RECO_VALIDATION_DQM.py 

Details on the Test Setup

Work directories:

CMSSW_4_4_0_pre7: default release area created with scram p, no custom compilation


gcc434CMSSW_4_4_0_pre7: custom compile with gcc434, no custom flags


gcc461CMSSW_4_4_0_pre7: custom compile with gcc461, custom fastjet compile, custom gcc flags: -msse3


gcc461oCMSSW_4_4_0_pre7: custom compile with gcc461, custom fastjet compile, custom gcc flags: -msse3 -ffast-math -O3



Note: no PU included
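The gcc461o build adds -ffast-math, which among other things allows the compiler to reassociate floating-point operations, for example to split a sum into several partial sums when it vectorizes a reduction. Floating-point addition is not associative, so such a reordering can change results; this is one reason to compare the physics output of the builds and not only their timing. The sketch below only emulates that kind of reordering by hand (sum_sequential and sum_partial are hypothetical names, and the exact order GCC chooses may differ).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain left-to-right sum: the order required by strict IEEE semantics.
double sum_sequential(const std::vector<double>& v) {
  double s = 0.0;
  for (double x : v) s += x;
  return s;
}

// Emulates a vectorized reduction: element i goes into partial sum i % lanes,
// and the partial sums are added together at the end.
double sum_partial(const std::vector<double>& v, std::size_t lanes) {
  assert(lanes > 0);
  std::vector<double> acc(lanes, 0.0);
  for (std::size_t i = 0; i < v.size(); ++i) acc[i % lanes] += v[i];
  double s = 0.0;
  for (double a : acc) s += a;
  return s;
}
```

With v = {1e20, 1.0, -1e20, 1.0}, sum_sequential returns 1.0 (the first 1.0 is lost when added to 1e20), while sum_partial(v, 2) returns 2.0.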

Example cmsDriver command to run the reco sequence

cmsDriver.py step3_RELVAL --conditions auto:startup -s RAW2DIGI,L1Reco,RECO --datatier GEN-SIM-RECO,AODSIM --eventcontent RECOSIM,AODSIM --no_exec --python_filename RECOPROD1_START44_V2.py \
    --filein=file:/build/hauth/cmssw_compare/inputFiles/CMSSW_4_4_0_pre5_RelValTTbar_GEN-SIM-DIGI-RAW-HLTDEBUG/CMSSW_4_4_0_pre5_RelValTTbar_GEN-SIM-DIGI-RAW-HLTDEBUG_750Evts.root -n -1
Taken from DBS: https://cmsweb.cern.ch/dbs_discovery/getAppConfigs?dbsInst=cms_dbs_prod_global&appPath=*&procPath=/RelValProdTTbar/CMSSW_4_4_0_pre5-START44_V2-v1/GEN-SIM-RECO&ajax=0&userMode=user

Example cmsDriver command to generate DQM output

cmsDriver.py step3_RELVAL --conditions auto:startup -s RAW2DIGI,L1Reco,RECO,VALIDATION,DQM --datatier DQM --eventcontent DQM --no_exec --python_filename RECO_VALIDATION_DQM.py \
    --filein=file:/build/hauth/cmssw_compare/inputFiles/CMSSW_4_4_0_pre5_RelValTTbar_GEN-SIM-DIGI-RAW-HLTDEBUG/CMSSW_4_4_0_pre5_RelValTTbar_GEN-SIM-DIGI-RAW-HLTDEBUG_750Evts.root -n -1 --mc

Example cmsDriver command to run reco without output (for benchmarking purposes)

cmsDriver.py step3_RELVAL --conditions auto:startup -s RAW2DIGI,L1Reco,RECO --no_exec --python_filename RECO_NOOUTPUT.py  --no_output \
    --filein=file:/build/hauth/cmssw_compare/inputFiles/CMSSW_4_4_0_pre5_RelValTTbar_GEN-SIM-DIGI-RAW-HLTDEBUG/CMSSW_4_4_0_pre5_RelValTTbar_GEN-SIM-DIGI-RAW-HLTDEBUG_750Evts.root -n -1 --mc

Example cmsDriver command with fixed PU

cmsDriver.py step2 -s RAW2DIGI,L1Reco,RECO,DQM --relval 25000,100 --datatier GEN-SIM-RECO,DQM --eventcontent RECODEBUG,DQM --geometry DB --conditions DESIGN44_V4::All --pileup "E7TeV_FIX_1_BX156,{'N':22}" --no_exec --python_filename RECO1_MC44_V4_N22_DESIGN44_V4.py


|  | *434 pass2* | *461 pass2* | *461o pass2* |
| *434 pass1* | time=XXX, Phys=XXX | time=XXX, Phys=XXX | time=XXX, Phys=XXX |
| *461 pass1* | time=XXX, Phys=XXX | time=XXX, Phys=XXX | time=XXX, Phys=XXX |
| *461o pass1* | time=XXX, Phys=XXX | time=XXX, Phys=XXX | time=XXX, Phys=XXX |

Location of RelMon reports:

Location of IgProf reports:

Simple memory checker and timing module information:
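A minimal sketch, assuming Linux/POSIX, of the two quantities these reports are about: the wall-clock time of a unit of work and the peak resident memory of the process. It is not the CMSSW simple memory checker or timing module; time_ms and peak_rss_kb are hypothetical helpers.

```cpp
#include <cassert>
#include <chrono>
#include <sys/resource.h>

// Wall-clock time of one call to `work`, in milliseconds.
// steady_clock is monotonic, so it is not affected by system clock changes.
template <typename F>
double time_ms(F&& work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Peak resident set size of this process; Linux reports ru_maxrss in kilobytes.
long peak_rss_kb() {
  struct rusage usage {};
  const int rc = getrusage(RUSAGE_SELF, &usage);
  assert(rc == 0);
  return usage.ru_maxrss;
}
```

In an event loop, each event's processing could be wrapped in time_ms, and peak_rss_kb read once at the end of the job.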

Optimization Projects

Vectorized Cephes

svn co https://ekptrac.physik.uni-karlsruhe.de/svn/hauth/CMSSW/vect CMSSW_vect




New code, vectorized version (using regular exp):


New code, vectorized version NOT used (using regular exp):
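A minimal sketch, assuming the idea is a Cephes-style exp written as straight-line code (no branches, no libm exp call) so that a loop over it is a candidate for auto-vectorization, unlike a loop calling the regular exp. The constants are those of Cephes exp.c: range reduction x = n ln2 + r with ln2 split into two parts, then a rational approximation of exp(r). fast_exp and fast_exp_array are hypothetical names, and there is no overflow, underflow or NaN handling (inputs are assumed to be well inside ±700). Whether GCC actually vectorizes the loop depends on the flags (for instance std::floor must be inlined rather than called), so check the vectorizer report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr double kLog2e = 1.4426950408889634073599;  // 1/ln(2)
constexpr double kC1 = 6.93145751953125E-1;          // kC1 + kC2 = ln(2)
constexpr double kC2 = 1.42860682030941723212E-6;
// Cephes exp.c rational approximation coefficients.
constexpr double kP[3] = {1.26177193074810590878E-4, 3.02994407707441961300E-2,
                          9.99999999999999999910E-1};
constexpr double kQ[4] = {3.00198505138664455042E-6, 2.52448340349684104192E-3,
                          2.27265548208155028766E-1, 2.00000000000000000009E0};

// Branch-free exp for moderate |x| (no overflow/underflow/NaN handling).
inline double fast_exp(double x) {
  // Range reduction: n = round(x / ln2), r = x - n*ln2 with |r| <= ln2/2.
  const double n = std::floor(kLog2e * x + 0.5);
  double r = x - n * kC1;
  r -= n * kC2;
  // exp(r) = 1 + 2 r P(r^2) / (Q(r^2) - r P(r^2)).
  const double rr = r * r;
  const double p = r * ((kP[0] * rr + kP[1]) * rr + kP[2]);
  const double q = ((kQ[0] * rr + kQ[1]) * rr + kQ[2]) * rr + kQ[3];
  const double exp_r = 1.0 + 2.0 * p / (q - p);
  // Build 2^n by writing the biased exponent into the bits of a double.
  const std::uint64_t bits =
      static_cast<std::uint64_t>(static_cast<std::int64_t>(n) + 1023) << 52;
  double two_n;
  std::memcpy(&two_n, &bits, sizeof two_n);
  return exp_r * two_n;
}

// Simple loop with no branches and no libm exp call.
void fast_exp_array(const double* in, double* out, std::size_t size) {
  assert(size == 0 || (in != nullptr && out != nullptr));
  for (std::size_t i = 0; i < size; ++i) out[i] = fast_exp(in[i]);
}
```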


CPU Profile of various Computing Resources


processor   : 15
vendor_id   : GenuineIntel
cpu family   : 6
model      : 26
model name   : Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz
stepping   : 5
cpu MHz      : 1600.000
cache size   : 8192 KB
physical id   : 1
siblings   : 8
core id      : 3
cpu cores   : 4
apicid      : 23
fpu      : yes
fpu_exception   : yes
cpuid level   : 11
wp      : yes
flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr *sse sse2* ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 *ssse3* cx16 xtpr *sse4_1 sse4_2* popcnt lahf_lm
bogomips   : 4532.91
clflush size   : 64
cache_alignment   : 64
address sizes   : 40 bits physical, 48 bits virtual
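On Linux, the instruction sets that the build flags rely on can be checked against the flags line of /proc/cpuinfo. Note that SSE3, required by -msse3, is listed there as pni, as in the dump above. A small sketch, assuming the x86 Linux cpuinfo layout (ARM kernels use a Features line instead); read_flags_line and has_cpu_flag are hypothetical helpers.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns the first line starting with "flags" in a cpuinfo-style file,
// or an empty string if the file cannot be read or has no such line.
std::string read_flags_line(const std::string& path = "/proc/cpuinfo") {
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("flags", 0) == 0) return line;  // line starts with "flags"
  }
  return "";
}

// True if `flag` appears as a whole word after the colon of a flags line.
bool has_cpu_flag(const std::string& flags_line, const std::string& flag) {
  assert(!flag.empty());
  const auto colon = flags_line.find(':');
  std::istringstream words(colon == std::string::npos ? flags_line
                                                      : flags_line.substr(colon + 1));
  std::string word;
  while (words >> word) {
    if (word == flag) return true;
  }
  return false;
}
```

For example, has_cpu_flag(read_flags_line(), "pni") tells whether a -msse3 build can run on the current machine.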

Possible Hardware to buy and investigate

NVIDIA Tesla C2070 card: http://www.computery.de/product_info.php?info=p283591_NVIDIA-TESLA--C2070----.html, about 2000 Euro

-- DaniloPiparo - 26-Aug-2011

Topic revision: r24 - 2011-10-27 - DaniloPiparo