
Trigger Software Upgrade Public Results

Introduction

This page collects approved plots that can be shown by ATLAS speakers at conferences and similar events. Please do not add figures on your own. Contact the responsible project leader in case of questions and/or suggestions. Follow the guidelines on the trigger public results page.

Phase-I Upgrade public plots

ATL-COM-DAQ-2016-116 ATLAS Trigger GPU Demonstrator Performance Plots

The ratio of event throughput rates with GPU acceleration to the CPU-only rates as a function of the number of ATLAS trigger (Athena) processes running on the CPU. Separate tests were performed with Athena configured to execute only Inner Detector Tracking (ID), only Calorimeter topological clustering (Calo), or both (ID & Calo). The system was configured either to perform the work on the CPU or to offload it to one or two GPUs. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1. The ID track seeding takes about 30% of the event processing time on the CPU and is accelerated by about a factor of 5 on the GPU. As a result, throughput increases by about 35% with GPU acceleration for up to 14 Athena processes. The Calorimeter clustering algorithm takes about 8% of the event processing time on the CPU and is accelerated by about a factor of 2 on the GPU; however, the effect of the acceleration is offset by a small increase in the time of the non-accelerated code, and as a result a small decrease in speed is observed when offloading to the GPU.
png eps pdf
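The throughput gains quoted in these captions follow from Amdahl's law: accelerating a step that takes a fraction f of the per-event time by a factor s bounds the overall speedup at 1/((1-f)+f/s). A minimal sketch (the function name `amdahl_speedup` is illustrative, not part of the trigger software):

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the work is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# ID track seeding: ~30% of event time, ~5x faster on the GPU
id_gain = amdahl_speedup(0.30, 5.0)    # ~1.32

# Calorimeter clustering: ~8% of event time, ~2x faster on the GPU
calo_gain = amdahl_speedup(0.08, 2.0)  # ~1.04

print(f"ID: {id_gain:.2f}  Calo: {calo_gain:.2f}")
```

With f = 0.30 and s = 5 for ID track seeding this gives about 1.32, consistent with the ~35% throughput increase observed; with f = 0.08 and s = 2 for calorimeter clustering the ideal gain is only about 4%, which is easily cancelled by the offloading overhead.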
Event throughput rates with and without GPU acceleration as a function of the number of ATLAS trigger (Athena) processes running on the CPU. Separate tests were performed with Athena configured to execute only Inner Detector Tracking (ID), only Calorimeter topological clustering (Calo), or both (ID & Calo). The system was configured either to perform the work on the CPU or to offload it to one or two GPUs. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1. A significant rate increase is seen when the ID track seeding is offloaded to the GPU. The ID track seeding takes about 30% of the event processing time on the CPU and is accelerated by about a factor of 5 on the GPU. A small rate decrease is observed when the calorimeter clustering is offloaded to the GPU. The calorimeter clustering takes about 8% of the event processing time on the CPU and is accelerated by about a factor of 2 on the GPU; however, the effect of the acceleration is offset by a small increase in the time of the non-accelerated code. There is only a relatively small increase in rate when the number of Athena processes is increased above the number of physical cores (28).
png eps pdf
The time-averaged mean number of ATLAS trigger (Athena) processes in a wait state pending the return of work offloaded to the GPU, as a function of the number of Athena processes running on the CPU. Separate tests were performed with Athena configured to execute only Inner Detector Tracking (ID), only Calorimeter topological clustering (Calo), or both (ID & Calo). The system was configured to offload work to one or two GPUs. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1. When offloaded to the GPU, the ID track seeding takes about 8% of the total event processing time, and so the average number of Athena processes waiting is less than 1 for up to about 12 Athena processes. The offloaded calorimeter clustering takes about 4% of the total event processing time, and so the average number of Athena processes waiting is less than 1 for up to about 25 Athena processes.
png eps pdf
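The quoted crossover points (about 12 and 25 processes) are consistent with a simple occupancy model: assuming each of n Athena processes spends a fraction f of its event time with work pending on the GPU, roughly n × f processes are waiting at any instant while the GPU is far from saturation. This is a back-of-the-envelope sketch, not the analysis used for the plot:

```python
def mean_waiting(n_processes, offload_fraction):
    """Approximate time-averaged number of processes in the wait state,
    valid only while the GPU is far from saturation."""
    return n_processes * offload_fraction

# ID seeding offloaded: ~8% of total event time -> <1 waiting up to ~12 processes
print(mean_waiting(12, 0.08))   # 0.96
# Calo clustering offloaded: ~4% -> <=1 waiting up to ~25 processes
print(mean_waiting(25, 0.04))   # 1.0
```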
Breakdown of the time per event for Inner Detector Track Seeding offloaded to a GPU, showing the time fraction for the kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). Track Seeding consists of the formation of triplets of hits compatible with a track. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter-Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 12 Athena processes running on the CPU. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1.
png pdf
Breakdown of the time per event for Inner Detector Tracking offloaded to a GPU, showing the time fraction for the Counting, Doublet Making and Triplet Making kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The Counting kernel determines the number of pairs of Inner Detector hits, and the Doublet and Triplet Making kernels form combinations of 2 and 3 hits, respectively, compatible with a track. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter-Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 12 Athena processes running on the CPU. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1.
png pdf
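To illustrate what the Counting, Doublet Making and Triplet Making kernels compute, here is a CPU-side sketch in Python; the real kernels run in CUDA, and the layer and z-window cuts below are hypothetical placeholders rather than the FastTrackFinder selection:

```python
def make_doublets(hits, max_dz=10.0):
    """Pair hits in adjacent layers with a compatible z separation.
    Each hit is (r, z, layer); hits are assumed ordered by layer."""
    doublets = []
    for i, (r1, z1, layer1) in enumerate(hits):
        for r2, z2, layer2 in hits[i + 1:]:
            if layer2 == layer1 + 1 and abs(z2 - z1) < max_dz:
                doublets.append(((r1, z1, layer1), (r2, z2, layer2)))
    return doublets

def make_triplets(doublets):
    """Join two doublets that share a middle hit into a triplet."""
    triplets = []
    for d1 in doublets:
        for d2 in doublets:
            if d1[1] == d2[0]:  # middle hit shared
                triplets.append((d1[0], d1[1], d2[1]))
    return triplets

hits = [(30.0, 1.0, 0), (60.0, 2.0, 1), (90.0, 3.5, 2)]
doublets = make_doublets(hits)
triplets = make_triplets(doublets)
print(len(doublets), len(triplets))  # 2 1
```

The Counting step corresponds to evaluating how many doublets each hit produces (so that GPU output buffers can be sized before the Doublet Making pass); here it is implicit in the list lengths.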
Breakdown of the time per event for the ATLAS trigger process (Athena) running Inner Detector (ID) Track Seeding on the CPU or offloaded to a GPU, showing the time fraction for the Counting, Doublet Making and Triplet Making kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The Counting kernel determines the number of pairs of ID hits, and the Doublet and Triplet Making kernels form combinations of 2 and 3 hits, respectively, compatible with a track. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter-Process Communication (IPC) time that accounts for the transfer of data between the Athena processes and the process handling communication with the GPU. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. Measurements were made with one GPU and 12 Athena processes running on the CPU. Athena was configured to run only ID tracking. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1.
png pdf
Breakdown of the time per event for Calorimeter clustering offloaded to a GPU, showing the time fraction for the kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter-Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 14 Athena processes running on the CPU. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1.
png pdf
Breakdown of the time per event for Calorimeter clustering offloaded to a GPU, showing the time fraction for the Classification, Tagging and Growing kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The Classification kernel identifies calorimeter cells that will initiate (seed), propagate (grow), or terminate a cluster; the Tagging kernel assigns a unique tag to seed cells; and the Growing kernel associates neighbouring growing or terminating cells to form clusters. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter-Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 14 Athena processes running on the CPU. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1.
png pdf
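The Classification / Tagging / Growing sequence can be sketched on a toy one-dimensional strip of cells. The significance thresholds (4, 2, 0) follow the usual topo-clustering scheme, but this is an illustrative serial version, not the GPU kernels themselves:

```python
SEED, GROW, TERMINATE, NONE = "seed", "grow", "terminate", "none"

def classify(significances, t_seed=4.0, t_grow=2.0, t_term=0.0):
    """Classification kernel: label each cell by its signal significance."""
    labels = []
    for s in significances:
        if s > t_seed:
            labels.append(SEED)
        elif s > t_grow:
            labels.append(GROW)
        elif s > t_term:
            labels.append(TERMINATE)
        else:
            labels.append(NONE)
    return labels

def grow(labels):
    """Tagging + Growing: give each seed cell a unique tag (its index),
    then sweep, attaching neighbouring grow/terminate cells to a tagged
    neighbour until nothing changes. Terminate cells join a cluster but
    do not propagate it further."""
    tags = [i if l == SEED else None for i, l in enumerate(labels)]
    changed = True
    while changed:
        changed = False
        for i, l in enumerate(labels):
            if l in (GROW, TERMINATE) and tags[i] is None:
                for j in (i - 1, i + 1):  # 1-D neighbours
                    if 0 <= j < len(tags) and tags[j] is not None \
                            and labels[j] != TERMINATE:
                        tags[i] = tags[j]
                        changed = True
                        break
    return tags

sig = [0.5, 2.5, 6.0, 3.0, 1.0, 0.2]
print(grow(classify(sig)))  # [2, 2, 2, 2, 2, None]
```

The last cell stays unclustered because its only tagged neighbour is a terminate cell, which does not propagate the cluster.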
Breakdown of the time per event for the ATLAS trigger process (Athena) running Calorimeter clustering on the CPU and offloaded to a GPU, showing the time for the Classification, Tagging and Growing kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The Classification kernel identifies calorimeter cells that will initiate (seed), propagate (grow), or terminate a cluster; the Tagging kernel assigns a unique tag to seed cells; and the Growing kernel associates neighbouring growing or terminating cells to form clusters. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter-Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. There is a small increase in the execution time of the non-accelerated code when the calorimeter clustering is offloaded to the GPU. The system consisted of two Intel Xeon E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVIDIA GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 14 Athena processes running on the CPU. Athena was configured to run only Calorimeter Clustering. The input was a simulated 𝑡𝑡̅ dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision were superimposed, corresponding to an instantaneous luminosity of 1.7×10^34 cm^-2 s^-1.
png pdf



ATL-COM-DAQ-2016-119 Performance plots of HLT Inner Detector tracking algorithm implemented on GPU

Transverse impact parameter distributions for the simulated tracks correctly reconstructed by the GPU-accelerated tracking algorithm and the standard CPU-only algorithm. The reference CPU algorithm was FastTrackFinder, consisting of a track seed (spacepoint triplet) maker and combinatorial track following; the GPU algorithm was FastTrackFinder with a GPU-accelerated track seed maker. The simulated tracks were required to have pT > 1 GeV and |eta| < 2.5.
png eps pdf
Transverse momentum distributions for the simulated tracks correctly reconstructed by the GPU-accelerated tracking algorithm and the standard CPU-only algorithm. The reference CPU algorithm was FastTrackFinder, consisting of a track seed (spacepoint triplet) maker and combinatorial track following; the GPU algorithm was FastTrackFinder with a GPU-accelerated track seed maker. The simulated tracks were required to have pT > 1 GeV and |eta| < 2.5.
png eps pdf
Track reconstruction efficiency as a function of simulated track azimuth for the GPU-accelerated tracking algorithm and the standard CPU-only algorithm. The reference CPU algorithm was FastTrackFinder, consisting of a track seed (spacepoint triplet) maker and combinatorial track following; the GPU algorithm was FastTrackFinder with a GPU-accelerated track seed maker. The simulated tracks were required to have pT > 1 GeV and |eta| < 2.5.
png eps pdf
Track reconstruction efficiency as a function of simulated track transverse momentum for the GPU-accelerated tracking algorithm and the standard CPU-only algorithm. The reference CPU algorithm was FastTrackFinder, consisting of a track seed (spacepoint triplet) maker and combinatorial track following; the GPU algorithm was FastTrackFinder with a GPU-accelerated track seed maker. The simulated tracks were required to have pT > 1 GeV and |eta| < 2.5.
png eps pdf
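A binned efficiency like the ones plotted is simply the number of correctly reconstructed simulated tracks divided by the number of simulated tracks passing the selection, per bin. A minimal sketch (the helper `binned_efficiency` is hypothetical, not part of the trigger software):

```python
import bisect

def binned_efficiency(truth_values, matched_flags, edges):
    """Per-bin efficiency: matched / total simulated tracks in each bin
    of some truth variable (e.g. pT or azimuth)."""
    n_tot = [0] * (len(edges) - 1)
    n_match = [0] * (len(edges) - 1)
    for value, matched in zip(truth_values, matched_flags):
        b = bisect.bisect_right(edges, value) - 1
        if 0 <= b < len(n_tot):
            n_tot[b] += 1
            n_match[b] += matched
    return [m / t if t else 0.0 for m, t in zip(n_match, n_tot)]

# Four simulated tracks (pT in GeV), three of them correctly reconstructed
pts = [1.5, 2.5, 2.7, 6.0]
flags = [1, 1, 0, 1]
print(binned_efficiency(pts, flags, [1.0, 2.0, 5.0, 10.0]))  # [1.0, 0.5, 1.0]
```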


Responsible: JohnBaines, TomaszBold
Subject: public

Topic attachments
| Type | Attachment | Version | Size | Date | Who |
| pdf | CaloExecutionTimePiChart1.pdf | r1 | 14.4 K | 2016-09-22 17:14 | JohnTMBaines |
| png | CaloExecutionTimePiChart1.png | r1 | 100.1 K | 2016-09-22 17:21 | JohnTMBaines |
| pdf | CaloExecutionTimePiChart2.pdf | r2 | 15.9 K | 2016-09-23 10:53 | JohnTMBaines |
| png | CaloExecutionTimePiChart2.png | r2 | 114.0 K | 2016-09-23 10:53 | JohnTMBaines |
| pdf | CaloExecutionTimePiChart3.pdf | r1 | 16.5 K | 2016-09-22 17:12 | JohnTMBaines |
| png | CaloExecutiontimePiChart3.png | r1 | 135.4 K | 2016-09-22 17:12 | JohnTMBaines |
| eps | HLT_a0.eps | r1 | 13.4 K | 2016-09-23 17:16 | DmitryEmeliyanov1 |
| pdf | HLT_a0.pdf | r1 | 16.4 K | 2016-09-23 17:16 | DmitryEmeliyanov1 |
| png | HLT_a0.png | r1 | 17.4 K | 2016-09-23 17:16 | DmitryEmeliyanov1 |
| eps | HLT_pT.eps | r1 | 10.1 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| pdf | HLT_pT.pdf | r1 | 14.8 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| png | HLT_pT.png | r1 | 16.7 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| eps | HLT_pT_eff.eps | r1 | 10.6 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| pdf | HLT_pT_eff.pdf | r1 | 14.9 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| png | HLT_pT_eff.png | r1 | 16.6 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| eps | HLT_phi_eff.eps | r1 | 10.2 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| pdf | HLT_phi_eff.pdf | r1 | 14.5 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| png | HLT_phi_eff.png | r1 | 15.3 K | 2016-09-23 17:28 | DmitryEmeliyanov1 |
| pdf | IDexecutiontimePiChart1.pdf | r1 | 15.2 K | 2016-09-22 17:12 | JohnTMBaines |
| png | IDexecutiontimePiChart1.png | r1 | 47.2 K | 2016-09-22 17:14 | JohnTMBaines |
| pdf | IDexecutiontimePiChart2.pdf | r1 | 16.6 K | 2016-09-22 17:12 | JohnTMBaines |
| png | IDexecutiontimePiChart2.png | r1 | 132.6 K | 2016-09-22 17:12 | JohnTMBaines |
| pdf | IDexecutiontimePiChart3.pdf | r1 | 17.0 K | 2016-09-22 17:12 | JohnTMBaines |
| png | IDexecutiontimePiChart3.png | r1 | 153.8 K | 2016-09-22 17:12 | JohnTMBaines |
| eps | occupancyG2.eps | r1 | 9.5 K | 2016-09-22 15:19 | JohnTMBaines |
| pdf | occupancyG2.pdf | r1 | 18.9 K | 2016-09-22 15:19 | JohnTMBaines |
| png | occupancyG2.png | r1 | 16.8 K | 2016-09-22 15:19 | JohnTMBaines |
| eps | rateG2.eps | r1 | 12.3 K | 2016-09-22 15:19 | JohnTMBaines |
| pdf | rateG2.pdf | r1 | 21.8 K | 2016-09-22 15:19 | JohnTMBaines |
| png | rateG2.png | r1 | 20.5 K | 2016-09-22 15:19 | JohnTMBaines |
| eps | speedupG2.eps | r1 | 13.7 K | 2016-09-22 15:19 | JohnTMBaines |
| pdf | speedupG2.pdf | r1 | 21.2 K | 2016-09-22 15:19 | JohnTMBaines |
| png | speedupG2.png | r1 | 20.9 K | 2016-09-22 15:19 | JohnTMBaines |
Topic revision: r5 - 2016-09-23 - DmitryEmeliyanov1
 