Applying PGO (Profile-Guided Optimization) to CMS Reconstruction

I applied PGO (combined with LTO) to CMS reconstruction in CMSSW_8_1_0_pre7 (slc7, gcc 5.3).

Executive summary

PGO (trained on simulated data) improves the speed of reconstruction and HLT of typical real data by about 10%. The result was obtained on an Intel i7-6700K CPU @ 4.00GHz (Skylake workstation) for both 4-thread and 8-thread jobs (in the latter case hyper-threading is in effect). The result is not reproduced on other architectures such as Haswell and Broadwell, where the difference is only marginal.

Instrumentation and "performance" data gathering

I downloaded a limited set of packages
DataFormats  FWCore    MagneticField  RecoEgamma     RecoLocalTracker  RecoPixelVertexing  RecoVertex        TrackingTools
EventFilter  Geometry  RecoCaloTools  RecoLocalCalo  RecoParticleFlow  RecoTracker         TrackPropagation

and compiled with

setenv USER_CXXFLAGS "-flto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fprofile-generate=/data/innocent/pgo/pgodata -Wno-error=maybe-uninitialized"

I had to add -Wno-error=maybe-uninitialized as well to avoid compilation errors (even in the standard library itself).

I then ran HLT and reconstruction (with output) on a few hundred Zmumu and ttbar events from the standard 25202 and 25206 workflows. In my opinion it is enough to run on a few tens of events. I took care to run single-threaded, as gcc PGO data gathering seems to corrupt its data file if run concurrently.

In a subsequent test I ran the standard

runTheMatrix.py --useInput=all --list=limited
and even some "addOnTests.py". I then added -fprofile-correction when recompiling for the production version.

Compilation of the "production" version

After a scram b clean I recompiled with
setenv USER_CXXFLAGS "-flto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fprofile-use=/data/innocent/pgo/pgodata -Wno-error=maybe-uninitialized -fprofile-correction"

One library failed to link.

Building shared library tmp/slc7_amd64_gcc530/src/RecoParticleFlow/PFClusterTools/src/RecoParticleFlowPFClusterTools/libRecoParticleFlowPFClusterTools.so
`_ZNSt6vectorIdSaIdEED1Ev' referenced in section `.text' of /tmp/innocent/cctfRGXp.ltrans0.ltrans.o: defined in discarded section `.gnu.linkonce.t._ZNSt6vectorIdSaIdEED5Ev' of tmp/slc7_amd64_gcc530/src/RecoParticleFlow/PFClusterTools/src/RecoParticleFlowPFClusterTools/PFEnergyCalibration.o (symbol from plugin)
`_ZNSt6vectorIdSaIdEED1Ev' referenced in section `.text.unlikely' of /tmp/innocent/cctfRGXp.ltrans17.ltrans.o: defined in discarded section `.gnu.linkonce.t._ZNSt6vectorIdSaIdEED5Ev' of tmp/slc7_amd64_gcc530/src/RecoParticleFlow/PFClusterTools/src/RecoParticleFlowPFClusterTools/PFEnergyCalibration.o (symbol from plugin)
collect2: error: ld returned 1 exit status
config/SCRAM/GMake/Makefile.rules:1917: recipe for target 'tmp/slc7_amd64_gcc530/src/RecoParticleFlow/PFClusterTools/src/RecoParticleFlowPFClusterTools/libRecoParticleFlowPFClusterTools.so' failed
make: *** [tmp/slc7_amd64_gcc530/src/RecoParticleFlow/PFClusterTools/src/RecoParticleFlowPFClusterTools/libRecoParticleFlowPFClusterTools.so] Error 1

Using gcc6 it continues to fail:

>> Building shared library tmp/slc6_amd64_gcc600/src/RecoParticleFlow/PFClusterTools/src/RecoParticleFlowPFClusterTools/libRecoParticleFlowPFClusterTools.so
/tmp/innocent/cc4WDpmj.ltrans13.ltrans.o: In function `PFResolutionMap::dCrackPhi(double, double)':
:(.text.unlikely+0xa24): undefined reference to `std::vector<double, std::allocator<double> >::~vector()'
/cvmfs/cms.cern.ch/slc6_amd64_gcc600/external/gcc/6.0.0-cms2/bin/../lib/gcc/x86_64-pc-linux-gnu/6.1.1/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/innocent/cc4WDpmj.ltrans13.ltrans.o: relocation R_X86_64_PC32 against undefined symbol `_ZNSt6vectorIdSaIdEED1Ev' can not be used when making a shared object; recompile with -fPIC
/cvmfs/cms.cern.ch/slc6_amd64_gcc600/external/gcc/6.0.0-cms2/bin/../lib/gcc/x86_64-pc-linux-gnu/6.1.1/../../../../x86_64-pc-linux-gnu/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
config/SCRAM/GMake/Makefile.rules:1917: recipe for target 'tmp/slc6_amd64_gcc600/src/RecoParticleFlow/PFClusterTools/src/RecoParticleFlowPFClusterTools/libRecoParticleFlowPFClusterTools.so' failed
gmake: *** [tmp/slc6_amd64_gcc600/src/RecoParticleFlow/PFClusterTools/src/RecoParticleFlowPFClusterTools/libRecoParticleFlowPFClusterTools.so] Error 1
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

So I compiled RecoParticleFlow/PFClusterTools without LTO:

setenv USER_CXXFLAGS "-fprofile-use=/data/data/vin/pgo/pgodata -Wno-error=maybe-uninitialized"

A simple (radical) solution seems to exist: the code in question is not used and can be removed.

I ran a test job and it crashed with a segmentation violation in

Thread 4 "cmsRun" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f38cc5bf700 (LWP 27204)]
0x00007f39220ff3e4 in __memmove_ssse3_back () from /usr/lib64/libc.so.6
(gdb) where
#0  0x00007f39220ff3e4 in __memmove_ssse3_back () from /usr/lib64/libc.so.6
#1  0x00007f390970752e in std::vector<bool, std::allocator<bool> >::_M_copy_aligned(std::_Bit_const_iterator, std::_Bit_const_iterator, std::_Bit_iterator) [clone .isra.205] [clone .lto_priv.572] () from /data/data/vin/pgo/CMSSW_8_1_0_pre7/lib/slc7_amd64_gcc530/pluginRecoTrackerMeasurementDetPlugins.so
#2  0x00007f3909707657 in std::vector<bool, std::allocator<bool> >::vector(std::vector<bool, std::allocator<bool> > const&) [clone .lto_priv.474] ()
   from /data/data/vin/pgo/CMSSW_8_1_0_pre7/lib/slc7_amd64_gcc530/pluginRecoTrackerMeasurementDetPlugins.so
#3  0x00007f390970135c in MeasurementTrackerEventProducer::produce(edm::Event&, edm::EventSetup const&) ()
   from /data/data/vin/pgo/CMSSW_8_1_0_pre7/lib/slc7_amd64_gcc530/pluginRecoTrackerMeasurementDetPlugins.so
I also recompiled RecoTracker/MeasurementDet/plugins without LTO (even though it is a major code component).

Simple modifications of the code (constructor and instantiation of MeasurementTrackerEvent) did not remove the segfault.

The good news is that with gcc 6.0 no segmentation violations were observed while running several tests.

The full recipe is:

rm -rf /data/data/vin/pgo/pgodata/*
cd $CMSSW_BASE/src/
scram b clean
setenv USER_CXXFLAGS "-flto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fprofile-generate=/data/innocent/pgo/pgodata -Wno-error=maybe-uninitialized"
scram b -j 8 > & build.log &
cd /data/data/vin/matrix/
rm -rf *
runTheMatrix.py --useInput=all --list=limited > & lim.log &
addOnTests.py -t hlt_data_GRun
cd $CMSSW_BASE/src/
scram b clean
setenv USER_CXXFLAGS "-flto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fprofile-use=/data/data/vin/pgo/pgodata -Wno-error=maybe-uninitialized -fprofile-correction"
scram b -j 8 -k > & build.log &
cd RecoParticleFlow/PFClusterTools
touch src/*
setenv USER_CXXFLAGS "-fprofile-use=/data/data/vin/pgo/pgodata -Wno-error=maybe-uninitialized -fprofile-correction"
scram b -j 8
cd ../../RecoTracker/MeasurementDet/plugins/
touch *
scram b -j 8
cd $CMSSW_BASE/src/
setenv USER_CXXFLAGS "-flto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fprofile-use=/data/data/vin/pgo/pgodata -Wno-error=maybe-uninitialized -fprofile-correction"
scram b -j 8
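
The three flag sets used in the recipe above differ only in the profile option, the presence of the LTO options, and -fprofile-correction. As a minimal sketch of that pattern (build_pgo_flags and its parameters are hypothetical names for illustration, not part of scram or CMSSW), the strings could be assembled like this:

```python
# Sketch only: build_pgo_flags is a hypothetical helper, not a scram feature.
# The flag names themselves are the ones used in the recipe.
COMMON_LTO = ["-flto", "-fipa-icf", "-flto-odr-type-merging",
              "-fno-fat-lto-objects", "-Wodr"]


def build_pgo_flags(phase, profile_dir, lto=True):
    """Return the USER_CXXFLAGS string for phase 'generate' or 'use'."""
    if phase not in ("generate", "use"):
        raise ValueError("phase must be 'generate' or 'use'")
    flags = list(COMMON_LTO) if lto else []
    flags.append("-fprofile-%s=%s" % (phase, profile_dir))
    flags.append("-Wno-error=maybe-uninitialized")
    if phase == "use":
        # added only when recompiling for the production version
        flags.append("-fprofile-correction")
    return " ".join(flags)


if __name__ == "__main__":
    profile_dir = "/data/data/vin/pgo/pgodata"
    print(build_pgo_flags("generate", profile_dir))
    print(build_pgo_flags("use", profile_dir))
    # packages that fail with LTO are rebuilt with lto=False
    print(build_pgo_flags("use", profile_dir, lto=False))
```

Keeping the profile directory as a single parameter makes it harder for the generate and use steps to point at different locations by accident.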

Results

I ran reconstruction (4 threads) on 2000 events from real 2016 data (Run 274172, JetHT dataset) and compared the standard release with my new version.

Regressions

The standard regression tool (JRValidation) shows only negligible differences: http://innocent.home.cern.ch/innocent/regress/jetHT81X_pgo/

Global timing

Results obtained when training with runTheMatrix are fully consistent with those below.

[innocent@vinavx3 run2016]$ grep -A 9 "Time Summary" reco_81Xori.log
 Time Summary: 
 - Min event:   0.905522
 - Max event:   15.824
 - Avg event:   4.1114
 - Total loop:  2075.15
 - Total job:   2076.34
 Event Throughput: 0.963784 ev/s
 CPU Summary: 
 - Total loop:  8180.63
 - Total job:   8181.74
[innocent@vinavx3 run2016]$ grep -A 9 "Time Summary" reco_81Xpgo.log
 Time Summary: 
 - Min event:   0.817703
 - Max event:   14.1506
 - Avg event:   3.65765
 - Total loop:  1848.19
 - Total job:   1849.32
 Event Throughput: 1.08214 ev/s
 CPU Summary: 
 - Total loop:  7269.44
 - Total job:   7270.46
or (if you prefer)
[innocent@vinavx3 run2016]$ grep -A 5 "\[sec\]" reco_81Xori.log
TimeReport ---------- Event  Summary ---[sec]----
TimeReport       event loop CPU/event = 4.090838
TimeReport      event loop Real/event = 1.038137
TimeReport     sum Streams Real/event = 4.111742
TimeReport efficiency CPU/Real/thread = 0.985139

[innocent@vinavx3 run2016]$ grep -A 5 "\[sec\]" reco_81Xpgo.log
TimeReport ---------- Event  Summary ---[sec]----
TimeReport       event loop CPU/event = 3.635210
TimeReport      event loop Real/event = 0.924637
TimeReport     sum Streams Real/event = 3.657976
TimeReport efficiency CPU/Real/thread = 0.982875
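
The figures in the two reports can be cross-checked with a few lines of arithmetic. The sketch below assumes (it is not stated in the logs) that the efficiency line is CPU/event divided by Real/event times the number of threads, and that throughput is the number of events divided by the total loop time. The printed numbers are consistent with both assumptions, and the resulting gain is roughly 11% in both CPU and real time per event.

```python
# Numbers copied from the Time Summary and TimeReport output above.
# The formulas are assumptions that happen to reproduce the logged values.
N_THREADS = 4
N_EVENTS = 2000

ORI = {"cpu_per_event": 4.090838, "real_per_event": 1.038137, "total_loop": 2075.15}
PGO = {"cpu_per_event": 3.635210, "real_per_event": 0.924637, "total_loop": 1848.19}


def efficiency(cpu_per_event, real_per_event, n_threads):
    """CPU time per event relative to the real time available on n_threads."""
    return cpu_per_event / (real_per_event * n_threads)


def throughput(n_events, total_loop):
    """Events processed per second of wall-clock loop time."""
    return float(n_events) / total_loop


def reduction_percent(before, after):
    """Relative reduction of a time-like quantity, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for label, run in (("ori", ORI), ("pgo", PGO)):
        print("%s efficiency %.6f throughput %.5f ev/s" % (
            label,
            efficiency(run["cpu_per_event"], run["real_per_event"], N_THREADS),
            throughput(N_EVENTS, run["total_loop"])))
    for key in ("cpu_per_event", "real_per_event"):
        print("%s reduction: %.1f%%" % (key, reduction_percent(ORI[key], PGO[key])))
```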

Timing at module level

A cursory look at the module level, using

grep TimeReport $1 | cut -d ' ' -f 14- | sort -n -r | grep 0
python ../pyTools/timeReport.py | sort -n -r | cut -c 12- | head -40
seems to indicate a fairly uniform gain across modules (with some surprises).

Producer Ori PGO gain %
muons1stStep 0.2784 0.2492 10.48
hbheprereco 0.2539 0.2169 14.55
RECOoutput 0.2513 0.2550 -1.46
pixelPairStepTrackCandidates 0.1876 0.1632 13.00
jetCoreRegionalStepTrackCandidates 0.1751 0.1505 14.05
initialStepTrackCandidatesPreSplitting 0.1427 0.1252 12.30
initialStepTrackCandidates 0.1423 0.1247 12.39
detachedTripletStepTrackCandidates 0.1286 0.1125 12.51
lowPtTripletStepTrackCandidates 0.1102 0.0961 12.79
pixelLessStepSeeds 0.1055 0.0943 10.69
pixelPairStepTracks 0.0928 0.0780 16.00
electronGsfTracks 0.0925 0.0710 23.24
initialStepTracks 0.0868 0.0733 15.51
initialStepTracksPreSplitting 0.0865 0.0731 15.50
unsortedOfflinePrimaryVertices 0.0801 0.0651 18.69
detachedTripletStepTracks 0.0773 0.0644 16.77
conversionTrackCandidates 0.0704 0.0638 9.41
pixelLessStepTrackCandidates 0.0697 0.0613 11.98
detachedTripletStepSeeds 0.0683 0.0606 11.28
earlyMuons 0.0649 0.0572 11.84
earlyDisplacedMuons 0.0643 0.0566 11.98
ecalMultiFitUncalibRecHit 0.0578 0.0499 13.54
lowPtTripletStepTracks 0.0556 0.0458 17.54
pixelPairStepSeeds 0.0522 0.0436 16.52
trackExtrapolator 0.0501 0.0453 9.51
tobTecStepTrackCandidates 0.0468 0.0405 13.49
convTrackCandidates 0.0451 0.0400 11.35
particleFlowBlock 0.0365 0.0340 6.87
allConversions 0.0340 0.0402 -18.03
tobTecStepSeedsTripl 0.0295 0.0259 11.98
trackerDrivenElectronSeeds 0.0292 0.0248 15.01
siStripMatchedRecHits 0.0290 0.0269 7.22
lowPtTripletStepSeeds 0.0289 0.0253 12.57
firstStepPrimaryVertices 0.0263 0.0209 20.66
firstStepPrimaryVerticesPreSplitting 0.0249 0.0199 20.34
cosmicMuons1Leg 0.0233 0.0204 12.44
pixelLessStepTracks 0.0222 0.0194 12.49
pixelPairElectronSeeds 0.0190 0.0162 14.93
photonConvTrajSeedFromSingleLeg 0.0185 0.0144 22.52
particleFlowDisplacedVertexCandidate 0.0182 0.0163 10.64
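
The gain column in the table above is consistent with (Ori - PGO) / Ori expressed in percent, so a negative value means the module became slower. The following is a minimal sketch of that calculation, not the timeReport.py script used above (which is not shown here); parse_rows and gain_percent are illustrative names. Because the times in the table are rounded to four decimals, recomputed gains can differ from the listed ones in the last digits.

```python
# Sketch only: recompute the "gain %" column from rounded table entries.
def gain_percent(ori, pgo):
    """Relative time saved by the PGO build, in percent of the original time."""
    return 100.0 * (ori - pgo) / ori


def parse_rows(text):
    """Parse lines of the form 'producer ori pgo [gain]' into tuples."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        rows.append((fields[0], float(fields[1]), float(fields[2])))
    return rows


# A few rows copied from the table above.
SAMPLE = """\
muons1stStep 0.2784 0.2492 10.48
electronGsfTracks 0.0925 0.0710 23.24
allConversions 0.0340 0.0402 -18.03
"""

if __name__ == "__main__":
    for name, ori, pgo in parse_rows(SAMPLE):
        print("%-20s %.4f %.4f %7.2f" % (name, ori, pgo, gain_percent(ori, pgo)))
```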

Performance counters

I also gathered a performance-counter report at symbol level using

~/pmu-tools/ocperf.py record -e task-clock -e cycles -e resource_stalls.any -e rs_events.empty_cycles -e uops_executed.stall_cycles -e branch-misses -e offcore_requests_outstanding.demand_data_rd_ge_6 cmsRun doRecoPerf.py > & bha.log &
Due to changes in inlining, it is very difficult to match symbols between the two reports at this level.

Samples: 21M of event 'cycles', Event count (approx.): 20850919838206                                                                                                                                         
Overhead  Command  Shared Object                                                   Symbol                                                                                                                     
   2.49%  cmsRun   libMagneticFieldParametrizedEngine.so                           [.] magfieldparam::TkBfield::getBxyz                                                                                       
   2.08%  cmsRun   libz.so.1.2.8                                                   [.] 0x0000000000002f50                                                                                                     
   1.84%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::makeAtomStep                                                                                  
   1.66%  cmsRun   libTrackingToolsTrajectoryState.so                              [.] BasicTrajectoryState::createLocalErrorFromCurvilinearError                                                             
   1.58%  cmsRun   libTrackingToolsAnalyticalJacobians.so                          [.] AnalyticalCurvilinearJacobian::computeFullJacobian                                                                     
   1.48%  cmsRun   libm-2.17.so                                                    [.] __atanf                                                                                                                
   1.39%  cmsRun   libjemalloc.so.2                                                [.] free                                                                                                                   
   1.39%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] AnalyticalPropagator::propagatedStateWithPath                                                                          
   1.35%  cmsRun   libm-2.17.so                                                    [.] __sin_avx                                                                                                              
   1.34%  cmsRun   libRecoVertexPrimaryVertexProducer.so                           [.] DAClusterizerInZ_vect::update                                                                                          
   1.32%  cmsRun   libRecoLocalTrackerSiPixelRecHits.so                            [.] VVIObjF::VVIObjF                                                                                                       
   1.24%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::getNextState                                                                                  
   1.18%  cmsRun   libjemalloc.so.2                                                [.] malloc                                                                                                                 
   1.17%  cmsRun   libRecoLocalCaloHcalRecAlgos.so                                 [.] FitterFuncs::PulseShapeFunctor::EvalPulse                                                                              
   1.16%  cmsRun   pluginRecoTrackerFinalTrackSelectorsPlugins.so                  [.] TrackMVAClassifier<(anonymous namespace)::mva >::computeMVA                                                      
   1.05%  cmsRun   libTrackingToolsTrajectoryState.so                              [.] BasicTrajectoryState::checkCurvilinError                                                                               
   1.01%  cmsRun   pluginTrackingToolsGsfToolsPlugins.so                           [.] KullbackLeiblerDistanceDetails::compute<5u>                                                                            
   1.00%  cmsRun   libTrackingToolsKalmanUpdators.so                               [.] (anonymous namespace)::lupdate<2u>                                                                                     
   0.97%  cmsRun   libRecoLocalCaloHcalRecAlgos.so                                 [.] FitterFuncs::PulseShapeFunctor::funcHPDShape                                                                           
   0.96%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::refToDest                                                                                     
   0.95%  cmsRun   libm-2.17.so                                                    [.] __ieee754_log_avx                                                                                                      
   0.93%  cmsRun   pluginRecoTrackerMeasurementDetPlugins.so                       [.] TkGluedMeasurementDet::doubleMatch                             
   0.88%  cmsRun   libGeometryEcalAlgo.so                                          [.] std::_Rb_tree, std::less, std::allocator >::_M_insert_unique::operator()                                                                               
   0.75%  cmsRun   libRecoLocalTrackerSiPixelRecHits.so                            [.] SiPixelTemplateReco::PixelTempReco2D                                                                                   
   0.74%  cmsRun   libm-2.17.so                                                    [.] __kernel_tanf                                                                                                          
   0.68%  cmsRun   libRecoVertexKalmanVertexFit.so                                 [.] KalmanVertexUpdator<5u>::positionUpdate                                                                                
   0.67%  cmsRun   libfastjet.so.0.0.0                                             [.] fastjet::LazyTiling9::run                                                                                              
   0.67%  cmsRun   pluginTrackingToolsTrackAssociatorPlugins.so                    [.] CaloDetIdAssociator::crossedElement                                                                                    
   0.65%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixForwardPlaneCrossing::position                                                                                    
   0.61%  cmsRun   libMagneticFieldInterpolation.so                                [.] LinearGridInterpolator3D::interpolate                                                                                  
   0.57%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixArbitraryPlaneCrossing::positionInDouble                                                                          
   0.54%  cmsRun   libRecoLocalTrackerSiStripRecHitConverter.so                    [.] StripCPEfromTrackAngle::stripErrorSquared                                                                              
   0.53%  cmsRun   libRecoTrackerTkHitPairs.so                                     [.] InnerDeltaPhi::phiRange                                                                                                
   0.51%  cmsRun   libRecoLocalTrackerSiPixelRecHits.so                            [.] VVIObjFDetails::sincosint                                                                                              
   0.47%  cmsRun   libTrackingToolsKalmanUpdators.so                               [.] Chi2MeasurementEstimator::estimate                                                                                     
   0.46%  cmsRun   libRecoEgammaEgammaPhotonAlgos.so                               [.] ROOT::Math::meta_matrix_dot<16u>::g >, ROOT::M
   0.45%  cmsRun   libm-2.17.so                                                    [.] __ieee754_atan2_avx                                                                                                    
   0.44%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixArbitraryPlaneCrossing2Order::pathLength                                                                          
   0.44%  cmsRun   pluginRecoTrackerFinalTrackSelectorsPlugins.so                  [.] TrackMVAClassifier<(anonymous namespace)::mva >::computeMVA                                                     
   0.43%  cmsRun   libm-2.17.so                                                    [.] __cos_avx                                                                                                              
   0.43%  cmsRun   libTrackingToolsGsfTools.so                                     [.] BasicMultiTrajectoryState::combine                                                                                     
   0.43%  cmsRun   libTrackingToolsAnalyticalJacobians.so                          [.] JacobianCurvilinearToLocal::compute                                                                                    
   0.43%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::refToMagVolume                                                                                
   0.42%  cmsRun   libGeometryCommonTopologies.so                                  [.] TkRadialStripTopology::localError                                                                                      
   0.39%  cmsRun   libTrackingToolsMaterialEffects.so                              [.] MultipleScatteringUpdator::compute                                                                                     
   0.37%  cmsRun   libMagneticFieldParametrizedEngine.so                           [.] OAEParametrizedMagneticField::isDefined                                                                                
   0.37%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::propagate                                                                                     
   0.36%  cmsRun   libz.so.1.2.8                                                   [.] 0x0000000000002f57                                                                                                     
   0.36%  cmsRun   libRecoParticleFlowPFProducer.so                                [.] PFBlockAlgo::findBlocks                                                                                                
   0.35%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixBarrelCylinderCrossing::HelixBarrelCylinderCrossing                                                               
   0.34%  cmsRun   libGeometryCommonTopologies.so                                  [.] TkRadialStripTopology::measurementError                                                                                
   0.34%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixArbitraryPlaneCrossing::pathLength                                                                                
   0.33%  cmsRun   libm-2.17.so                                                    [.] __atan2f_finite                                                                                                        
   0.32%  cmsRun   libRecoLocalTrackerSiPixelRecHits.so                            [.] SiPixelTemplate::interpolate                                                                                           
   0.31%  cmsRun   libz.so.1.2.8                                                   [.] 0x0000000000002f5f                                                                                                     
   0.31%  cmsRun   libz.so.1.2.8                                                   [.] 0x0000000000002f71                                                                                                     
   0.30%  cmsRun   libRecoTrackerTkHitPairs.so                                     [.] HitPairGeneratorFromLayerPair::doublets                                                                                
   0.29%  cmsRun   libTrackingToolsTrackFitters.so                                 [.] TrajectoryStateCombiner::combine                                                                                       
   0.29%  cmsRun   pluginGammaConversionTrackingForConversionPlugins.so            [.] HitPairGeneratorFromLayerPairForPhotonConversion::checkRZCompatibilityWithSeedTrack                                    
   0.28%  cmsRun   libRecoPixelVertexingPixelTriplets.so                           [.] ThirdHitPredictionFromCircle::phi                                                                                      
   0.28%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] AnalyticalPropagator::propagateWithPath                                                                                
   0.28%  cmsRun   libRecoVertexKalmanVertexFit.so                                 [.] ROOT::Math::SVector::operator=

Samples: 19M of event 'cycles', Event count (approx.): 18493519501008                                                                                                                                                
Overhead  Command  Shared Object                                                   Symbol                                                                                                                            
   2.48%  cmsRun   libMagneticFieldParametrizedEngine.so                           [.] magfieldparam::TkBfield::getBxyz                                                                                              
   2.33%  cmsRun   libz.so.1.2.8                                                   [.] 0x0000000000002f50                                                                                                            
   2.16%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] AnalyticalPropagator::propagateWithPath                                                                                       
   1.73%  cmsRun   libTrackingToolsTrajectoryState.so                              [.] BasicTrajectoryState::createLocalError                                                                                        
   1.55%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::makeAtomStep                                                                                         
   1.52%  cmsRun   libm-2.17.so                                                    [.] __sin_avx                                                                                                                     
   1.52%  cmsRun   libm-2.17.so                                                    [.] __atanf                                                                                                                       
   1.47%  cmsRun   libjemalloc.so.2                                                [.] free                                                                                                                          
   1.39%  cmsRun   libTrackingToolsAnalyticalJacobians.so                          [.] AnalyticalCurvilinearJacobian::computeFullJacobian                                                                            
   1.35%  cmsRun   libTrackingToolsKalmanUpdators.so                               [.] KFUpdator::update                                                                                                             
   1.27%  cmsRun   libjemalloc.so.2                                                [.] malloc                                                                                                                        
   1.25%  cmsRun   libRecoLocalTrackerSiPixelRecHits.so                            [.] VVIObjF::VVIObjF                                                                                                              
   1.20%  cmsRun   pluginRecoTrackerFinalTrackSelectorsPlugins.so                  [.] TrackMVAClassifier<(anonymous namespace)::mva >::computeMVA                                                             
   1.17%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::getNextState                                                                                         
   1.14%  cmsRun   pluginRecoTrackerMeasurementDetPlugins.so                       [.] TkGluedMeasurementDet::measurements                                                                                           
   1.10%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::refToDest                                                                                            
   1.09%  cmsRun   pluginTrackingToolsGsfToolsPlugins.so                           [.] CloseComponentsMerger<5u>::merge                                                                                              
   1.01%  cmsRun   libm-2.17.so                                                    [.] __ieee754_log_avx                                                                                                             
   0.97%  cmsRun   libTrackPropagationRungeKutta.so                                [.] RKOneCashKarpStep::operator()                                                                                      
   0.96%  cmsRun   libRecoVertexPrimaryVertexProducer.so                           [.] DAClusterizerInZ_vect::update                                                                                                 
   0.95%  cmsRun   libTrackingToolsTrackAssociator.so                              [.] DetIdAssociator::fillSet                                                                                                      
   0.94%  cmsRun   libRecoLocalCaloHcalRecAlgos.so                                 [.] FitterFuncs::PulseShapeFunctor::funcHPDShape                                                                                  
   0.87%  cmsRun   libRecoEgammaEgammaPhotonAlgos.so                               [.] ROOT::Math::AssignSym::Evaluate::positionUpdate                                                                                       
   0.65%  cmsRun   libTrackPropagationRungeKutta.so                                [.] RKPropagatorInS::propagateWithPath                                                                                            
   0.64%  cmsRun   libRecoLocalCaloHcalRecAlgos.so                                 [.] FitterFuncs::PulseShapeFunctor::EvalPulse                                                                                     
   0.59%  cmsRun   pluginTrackingToolsTrackAssociatorPlugins.so                    [.] CaloDetIdAssociator::crossedElement                                                                                           
   0.54%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixForwardPlaneCrossing::position                                                                                           
   0.52%  cmsRun   libRecoLocalTrackerSiStripRecHitConverter.so                    [.] StripCPEfromTrackAngle::stripErrorSquared                                                                                     
   0.51%  cmsRun   libm-2.17.so                                                    [.] __ieee754_atan2_avx                                                                                                           
   0.50%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixArbitraryPlaneCrossing::positionInDouble                                                                                 
   0.50%  cmsRun   libRecoLocalTrackerSiPixelRecHits.so                            [.] SiPixelTemplateReco::PixelTempReco2D                                                                                          
   0.49%  cmsRun   libTrackingToolsKalmanUpdators.so                               [.] Chi2MeasurementEstimator::estimate                                                                                            
   0.49%  cmsRun   libm-2.17.so                                                    [.] __cos_avx                                                                                                                     
   0.46%  cmsRun   libTrackingToolsMaterialEffects.so                              [.] MultipleScatteringUpdator::compute                                                                                            
   0.44%  cmsRun   libMagneticFieldInterpolation.so                                [.] RectangularCylindricalMFGrid::uncheckedValueInTesla                                                                           
   0.42%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::refToMagVolume                                                                                       
   0.41%  cmsRun   libz.so.1.2.8                                                   [.] 0x0000000000002f57                                                                                                            
   0.41%  cmsRun   libTrackingToolsAnalyticalJacobians.so                          [.] JacobianCurvilinearToLocal::JacobianCurvilinearToLocal                                                                        
   0.41%  cmsRun   libGeometryCommonTopologies.so                                  [.] TkRadialStripTopology::localError                                                                                             
   0.41%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixBarrelCylinderCrossing::HelixBarrelCylinderCrossing                                                                      
   0.40%  cmsRun   libTrackingToolsTrackFitters.so                                 [.] TrajectoryStateCombiner::combine                                                                                              
   0.39%  cmsRun   libMagneticFieldParametrizedEngine.so                           [.] OAEParametrizedMagneticField::isDefined                                                                                       
   0.38%  cmsRun   libRecoTrackerTkDetLayers.so                                    [.] CompositeTECWedge::groupedCompatibleDetsV                                                                                     
   0.38%  cmsRun   libRecoVertexKalmanVertexFit.so                                 [.] KalmanVertexUpdator<5u>::chi2Increment                                                                                        
   0.38%  cmsRun   libTrackingToolsGeomPropagators.so                              [.] HelixArbitraryPlaneCrossing2Order::pathLength                                                                                 
   0.36%  cmsRun   libRecoLocalTrackerSiPixelRecHits.so                            [.] VVIObjFDetails::sincosint                                                                                                     
   0.35%  cmsRun   pluginRecoTrackerFinalTrackSelectorsPlugins.so                  [.] TrackMVAClassifier<(anonymous namespace)::mva >::computeMVA                                                            
   0.35%  cmsRun   pluginRecoTrackerTkSeedGeneratorPlugins.so                      [.] MultiHitGeneratorFromChi2::hitTriplets                                                                                        
   0.35%  cmsRun   libz.so.1.2.8                                                   [.] 0x0000000000002f5f                                                                                                            
   0.35%  cmsRun   libz.so.1.2.8                                                   [.] 0x0000000000002f71                                                                                                            
   0.34%  cmsRun   libTrackingToolsTrackAssociator.so                              [.] TrackDetectorAssociator::fillMuon                                                                                             
   0.34%  cmsRun   libRecoPixelVertexingPixelTriplets.so                           [.] ThirdHitPredictionFromCircle::phi                                                                                             
   0.33%  cmsRun   libRecoParticleFlowPFProducer.so                                [.] PFBlockAlgo::findBlocks                                                                                                       
   0.33%  cmsRun   libm-2.17.so                                                    [.] __atan2f_finite                                                                                                               
   0.33%  cmsRun   pluginRecoTrackerMeasurementDetPlugins.so                       [.] TkStripMeasurementDet::simpleRecHits                                                                                          
   0.32%  cmsRun   libTrackingToolsMaterialEffects.so                              [.] EnergyLossUpdator::compute                                                                                                    
   0.32%  cmsRun   libDataFormatsGeometrySurface.so                                [.] Plane::side                                                                                                                   
   0.31%  cmsRun   libTrackingToolsGsfTools.so                                     [.] BasicMultiTrajectoryState::BasicMultiTrajectoryState                                                                          
   0.31%  cmsRun   pluginRecoLocalCaloEcalRecProducersPlugins.so                   [.] EcalUncalibRecHitRatioMethodAlgo::computeTime                                                                    
   0.31%  cmsRun   libTrackPropagationSteppingHelixPropagator.so                   [.] SteppingHelixPropagator::propagate                                                                                            
   0.31%  cmsRun   pluginRecoTrackerCkfPatternPlugins.so                           [.] TrajectorySegmentBuilder::segments                                                                                            
   0.29%  cmsRun   libRecoTrackerTkHitPairs.so                                     [.] RecHitsSortedInPhi::RecHitsSortedInPhi                                                                                        
   0.29%  cmsRun   libjemalloc.so.2                                                [.] je_arena_dalloc_bin_junked_locked                                                                                             
   0.29%  cmsRun   libRecoTrackerTkMSParametrization.so                            [.] MultipleScatteringParametrisation::operator()                                                                                 
   0.28%  cmsRun   libfastjet.so.0.0.0                                             [.] fastjet::LazyTiling25::run                                                                                                    
   0.28%  cmsRun   libRecoLocalTrackerSiStripRecHitConverter.so                    [.] StripCPEfromTrackAngle::localParameters                                                                                       
   0.28%  cmsRun   libGeometryCommonTopologies.so                                  [.] TkRadialStripTopology::coveredStrips                                                                                          
   0.28%  cmsRun   libMagneticFieldInterpolation.so                                [.] SpecialCylindricalMFGrid::uncheckedValueInTesla                                                                               
   0.27%  cmsRun   libGeometryCommonTopologies.so                                  [.] TkRadialStripTopology::measurementError                                                                                       
   0.27%  cmsRun   libGeometryCommonTopologies.so                                  [.] BowedSurfaceDeformation::positionCorrection                                                                                   
   0.27%  cmsRun   pluginRecoTrackerCkfPatternPlugins.so                           [.] GroupedCkfTrajectoryBuilder::advanceOneLayer                                                                                  
   0.26%  cmsRun   libjemalloc.so.2                                                [.] je_arena_tcache_fill_small                                                                                                    
   0.26%  cmsRun   libTrackingToolsAnalyticalJacobians.so                          [.] JacobianLocalToCurvilinear::JacobianLocalToCurvilinear                                                                        
   0.26%  cmsRun   libTrackingToolsDetLayers.so                                    [.] GeomDetCompatibilityChecker::isCompatible                                                                                                                                                                                                                                                                                        

At the global level one gets the following counters (reminder: CMS compiles with vectorization at SSE3 level, no AVX):

counter Ori PGO diff %
arith_divider_active 2003813331082 2029221833145 -1.27
branch-instructions 5003390142742 4683583763677 6.39
branch-misses 94561145829 102263340302 -8.15
cycle_activity_stalls_mem_any 7791185641089 6421826074225 17.58
cycle_activity_stalls_total 9907112712454 8391164277761 15.30
cycles 30560976528399 27377605436798 10.42
fp_arith_inst_retired_128b_packed_double 250556381329 258529244007 -3.18
fp_arith_inst_retired_128b_packed_single 338210547618 326550462044 3.45
fp_arith_inst_retired_256b_packed_double 7 13 -85.71
fp_arith_inst_retired_256b_packed_single 52421 509416 -871.78
fp_arith_inst_retired_scalar_double 3317711064737 3366454136777 -1.47
fp_arith_inst_retired_scalar_single 1641464270133 1640125049783 0.08
instructions 35919309265662 33991078348989 5.37
mem_load_retired_l1_hit 10188053373388 9519339641382 6.56
mem_load_retired_l2_hit 156509805644 151082264975 3.47
mem_load_retired_l3_hit 44401681528 43235662633 2.63
mem_load_retired_l3_miss 5390997682 5440045205 -0.91
offcore_requests_outstanding_demand_data_rd_ge_6 18615085948 79097548673 -324.91
resource_stalls_any 13301542668571 10705626454772 19.52
rs_events_empty_cycles 1365684826390 1423106473467 -4.20
seconds_time_elapsed 2040 1852 9.21
task-clock_(msec) 7976059 7162005 10.21
uops_executed_cycles_ge_1_uop_exec 20658306464831 18984610500196 8.10
uops_executed_cycles_ge_2_uops_exec 13619518631801 13160925641623 3.37
uops_executed_cycles_ge_3_uops_exec 7952791476040 8098899341323 -1.84
uops_executed_cycles_ge_4_uops_exec 3947942099884 4229263951400 -7.13
uops_executed_stall_cycles 9909067962691 8391461244629 15.32
uops_issued_vector_width_mismatch 11548292529815 11634427914000 -0.75
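
The "diff %" column is consistent with (Ori - PGO) / Ori x 100, so a positive value means the PGO build needed less of that quantity. A minimal Python sketch under that assumption follows; the function name diff_percent is only illustrative and is not part of the original tooling. It uses three rows copied from the table:

```python
def diff_percent(ori, pgo):
    """Relative reduction of a counter going from the original to the PGO build."""
    return (ori - pgo) / ori * 100.0


# Rows copied from the table above: (counter, Ori, PGO).
rows = [
    ("cycles", 30560976528399, 27377605436798),
    ("instructions", 35919309265662, 33991078348989),
    ("branch-misses", 94561145829, 102263340302),
]

for name, ori, pgo in rows:
    print(f"{name:15s} {diff_percent(ori, pgo):7.2f}")
```

A negative value, as for branch-misses, means the PGO build counted more of those events than the original build.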

HLT

I ran on 12000 events from run 275124. I also added code to remove the synchronization at each lumi section (https://github.com/wddgit/cmssw/tree/lumis). HLT is much faster than reco: a 4-thread job can easily become I/O bound and be affected by the many lumi-section transitions. The 12K events (8GB) should fit in memory after the first run.

Unfortunately I got a seg-fault in the AnalyticalTrackSelector constructor, which required recompiling RecoTracker/FinalTrackSelectors/plugins without lto as well.

timing

[innocent@vinavx3 runHLT2016]$ grep -A 5 "\[sec\]" hlt_ori81.log2
TimeReport ---------- Event  Summary ---[sec]----
TimeReport       event loop CPU/event = 0.086881
TimeReport      event loop Real/event = 0.024135
TimeReport     sum Streams Real/event = 0.088524
TimeReport efficiency CPU/Real/thread = 0.899959
[innocent@vinavx3 runHLT2016]$ grep -A 5 "\[sec\]" hlt_pgo81.log2
TimeReport ---------- Event  Summary ---[sec]----
TimeReport       event loop CPU/event = 0.078948
TimeReport      event loop Real/event = 0.022299
TimeReport     sum Streams Real/event = 0.080974
TimeReport efficiency CPU/Real/thread = 0.885097
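
The two summaries can be compared directly. The sketch below is one possible way to do it in Python, not the procedure used for this page: it parses the TimeReport lines quoted above with a regular expression and prints the relative gain of the per-event times. The helper names (parse_summary, gain_percent) are hypothetical.

```python
import re

# Excerpts copied from the two logs quoted above.
ORI_LOG = """\
TimeReport ---------- Event  Summary ---[sec]----
TimeReport       event loop CPU/event = 0.086881
TimeReport      event loop Real/event = 0.024135
TimeReport     sum Streams Real/event = 0.088524
TimeReport efficiency CPU/Real/thread = 0.899959
"""
PGO_LOG = """\
TimeReport ---------- Event  Summary ---[sec]----
TimeReport       event loop CPU/event = 0.078948
TimeReport      event loop Real/event = 0.022299
TimeReport     sum Streams Real/event = 0.080974
TimeReport efficiency CPU/Real/thread = 0.885097
"""

# Matches "TimeReport <label> = <number>"; the header line has no "=" and is skipped.
LINE = re.compile(r"^TimeReport\s+(.+?)\s*=\s*([0-9.eE+-]+)\s*$")


def parse_summary(text):
    """Return {label: value} for every 'TimeReport label = value' line."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def gain_percent(ori, pgo):
    """Relative time saved by the PGO build, in percent."""
    return (ori - pgo) / ori * 100.0


ori = parse_summary(ORI_LOG)
pgo = parse_summary(PGO_LOG)
for label in ("event loop CPU/event", "event loop Real/event", "sum Streams Real/event"):
    print(f"{label:25s} {gain_percent(ori[label], pgo[label]):6.2f} %")
```

The efficiency line is a ratio rather than a time, so it is left out of the gain calculation.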

at module level

Producer Ori PGO gain %
hltEgammaGsfTracksUnseeded 0.0237 0.0182 23.31
hltDisplacedhltIter4PFlowPixelLessSeeds 0.0201 0.0179 10.76
hltEgammaElectronPixelSeedsUnseeded 0.0173 0.0151 12.44
hltDisplacedhltIter4PFlowCkfTrackCandidates 0.0145 0.0130 10.35
hltIter2ElectronsCkfTrackCandidates 0.0136 0.0119 12.66
hltParticleFlowBlockForTaus 0.0131 0.0123 6.26
hltIsolPixelTrackProdHB 0.0101 0.0122 -20.69
hltCSCHaloData 0.0099 0.0099 0.44
hltVerticesPF 0.0096 0.0089 7.47
hltParticleFlowBlockReg 0.0094 0.0089 5.81
hltParticleFlowBlock 0.0091 0.0084 7.63
hltPixelTracks 0.0080 0.0065 18.78
hltParticleFlowClusterHFForEgammaUnseeded 0.0078 0.0072 7.66
hltIsolPixelTrackProdHE 0.0076 0.0091 -19.96
hltGlbTrkMuonsNoVtx 0.0068 0.0060 11.54
hltParticleFlowClusterPSUnseeded 0.0058 0.0046 19.87
hltMuCkfTrackCandidates 0.0053 0.0046 12.55
hltEgammaCkfTrackCandidatesForGSFUnseeded 0.0052 0.0046 10.70
hltIter1PFlowPixelSeeds 0.0049 0.0038 22.04
hltAK8PFJetsTrimR0p1PT0p03 0.0047 0.0048 -1.38
hltGlbTrkMuons 0.0046 0.0040 11.37
hltParticleFlowClusterHF 0.0044 0.0039 12.65
hltIter1PFlowCkfTrackCandidates 0.0044 0.0039 12.16
hltTowerMakerForAllBeamHaloCleaned 0.0041 0.0033 17.58
hltEgammaGsfTracks 0.0035 0.0027 23.42
hltIter2PFlowCkfTrackCandidates 0.0031 0.0027 12.61
hltIter0PFlowCkfTrackCandidates 0.0031 0.0027 11.98
hltInclusiveVertexFinderPF 0.0030 0.0028 7.30
hltEgammaElectronPixelSeeds 0.0029 0.0025 12.40
hltIter2PFlowCkfTrackCandidatesForPhotons 0.0028 0.0023 18.81
hltIter2DisplacedJpsiCkfTrackCandidates 0.0027 0.0023 13.51
hltIter2ElectronsCtfWithMaterialTracks 0.0027 0.0024 9.75
hltIter1PFlowCkfTrackCandidatesForTau 0.0026 0.0023 11.30
hltHighEtaEle20Selector 0.0025 0.0030 -20.25
hltIter2DisplacedTau3muCkfTrackCandidates 0.0024 0.0021 12.66
hltPixelTracksGlbTrkMuon 0.0024 0.0021 12.16
hltAK8TrimModJets 0.0024 0.0023 1.64
hltIter0PFlowCkfTrackCandidatesForPhotons 0.0023 0.0020 13.32
hltIter2ElectronsPixelSeeds 0.0023 0.0020 14.81
hltEgammaGsfElectronsUnseeded 0.0023 0.0021 7.97
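
A few modules get slower with PGO (negative gain). As an illustration only, the Python sketch below takes a handful of rows copied from the table, lists the modules that regress, and orders all of them by absolute time saved. The "gain %" column was presumably computed from unrounded timings, so recomputing it from the four-decimal values shown here may differ in the last digits.

```python
# A few rows copied from the table above: (producer, Ori, PGO).
modules = [
    ("hltEgammaGsfTracksUnseeded", 0.0237, 0.0182),
    ("hltDisplacedhltIter4PFlowPixelLessSeeds", 0.0201, 0.0179),
    ("hltIsolPixelTrackProdHB", 0.0101, 0.0122),
    ("hltPixelTracks", 0.0080, 0.0065),
    ("hltIsolPixelTrackProdHE", 0.0076, 0.0091),
    ("hltHighEtaEle20Selector", 0.0025, 0.0030),
]

# Modules that became slower with PGO.
regressions = [name for name, ori, pgo in modules if pgo > ori]

# Modules ordered by absolute time saved, largest saving first.
by_saving = sorted(modules, key=lambda m: m[1] - m[2], reverse=True)

print("slower with PGO:", regressions)
for name, ori, pgo in by_saving:
    print(f"{name:42s} {ori - pgo:+.4f}")
```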

Performance Counters

counter Ori PGO diff %
arith_divider_active 214822386406 213569673717 0.58
branch-instructions 925031172213 860138743106 7.02
branch-misses 19320385473 19430058461 -0.57
cycle_activity_stalls_mem_any 1009452280776 866041434869 14.21
cycle_activity_stalls_total 1309652792639 1157488571103 11.62
cycles 4014228469111 3649932205042 9.08
fp_arith_inst_retired_128b_packed_double 53986899121 55980633902 -3.69
fp_arith_inst_retired_128b_packed_single 21079043496 20323329547 3.59
fp_arith_inst_retired_256b_packed_double 13 13 0.00
fp_arith_inst_retired_256b_packed_single 84711 126912 -49.82
fp_arith_inst_retired_scalar_double 252124445493 251619554458 0.20
fp_arith_inst_retired_scalar_single 124050430103 123279486351 0.62
instructions 5498351571318 5143077318552 6.46
mem_load_retired_l1_hit 1512210371447 1428293526339 5.55
mem_load_retired_l2_hit 19337741323 19241213627 0.50
mem_load_retired_l3_hit 5483904025 5409868454 1.35
mem_load_retired_l3_miss 2038078359 2096114259 -2.85
offcore_requests_outstanding_demand_data_rd_ge_6 26134941297 36162512979 -38.37
resource_stalls_any 1414985661187 1193712200091 15.64
rs_events_empty_cycles 257475283925 257061596841 0.16
seconds_time_elapsed 306 283 7.34
task-clock_(msec) 1055635 960126 9.05
uops_executed_cycles_ge_1_uop_exec 2704040938986 2490436233092 7.90
uops_executed_cycles_ge_2_uops_exec 2002392604536 1899541903228 5.14
uops_executed_cycles_ge_3_uops_exec 1321539136703 1288100218371 2.53
uops_executed_cycles_ge_4_uops_exec 742445276972 734131228637 1.12
uops_executed_stall_cycles 1310343143215 1155636886326 11.81
uops_issued_vector_width_mismatch 1042376696465 1036182252960 0.59
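
Derived quantities can help when reading the raw counters. The Python sketch below computes two of them from values copied from the HLT table: instructions per cycle and the fraction of cycles counted as stalled. These ratios are my own additions for illustration and do not appear in the original measurements; the helper name ratio is hypothetical.

```python
# Values copied from the HLT counter table above: counter -> (Ori, PGO).
counters = {
    "cycles": (4014228469111, 3649932205042),
    "instructions": (5498351571318, 5143077318552),
    "cycle_activity_stalls_total": (1309652792639, 1157488571103),
}


def ratio(numerator, denominator, build):
    """Ratio of two counters for build 0 (Ori) or build 1 (PGO)."""
    return counters[numerator][build] / counters[denominator][build]


for build, label in enumerate(("Ori", "PGO")):
    ipc = ratio("instructions", "cycles", build)
    stalled = ratio("cycle_activity_stalls_total", "cycles", build)
    print(f"{label}: IPC = {ipc:.3f}, stalled fraction = {stalled:.3f}")
```

With these numbers the PGO build retires fewer instructions and also spends a smaller share of its cycles stalled, which matches the reduction in cycles reported in the table.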

-- VincenzoInnocente - 2016-06-25

Topic revision: r14 - 2016-07-01 - VincenzoInnocente
 