%ATLASPUBLICHEADER%

---+!! <nop> Trigger Software Upgrade Public Results

%TOC%

%STARTINCLUDE%

---+ Introduction

Approved plots that can be shown by ATLAS speakers at conferences and similar events. *Please do not add figures on your own.* Contact the responsible project leader in case of questions and/or suggestions. Follow the guidelines on the [[TriggerPublicResults#Guidelines][trigger public results]] page.

---+ Phase-I Upgrade public plots

---++ [[https://cds.cern.ch/record/2213588/][ATL-COM-DAQ-2016-116]] ATLAS Trigger GPU Demonstrator Performance Plots

<table class="twikiTable" width="100%" bgcolor=#f5f5fa border=1 cellpadding=10 cellspacing=10>
<colgroup><col width="70%"></colgroup>
<tbody>
<tr><td bgcolor="#eeeeee">
The ratio of event throughput rates with GPU acceleration to the CPU-only rates as a function of the number of ATLAS trigger (Athena) processes running on the CPU. Separate tests were performed with Athena configured to execute only Inner Detector Tracking (ID), only Calorimeter topological clustering (Calo), or both (ID & Calo). The system was configured either to perform the work on the CPU or to offload it to one or two GPUs. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>. The ID track seeding takes about 30% of the event processing time on the CPU and is accelerated by about a factor of 5 on the GPU; as a result the throughput increases by about 35% with GPU acceleration for up to 14 Athena processes. The Calorimeter clustering algorithm takes about 8% of the event processing time on the CPU and is accelerated by about a factor of 2 on the GPU; however, the effect of the acceleration is offset by a small increase in the time of the non-accelerated code, and as a result a small decrease in speed is observed when offloading to the GPU.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/speedupG2.png"/><br> [[%ATTACHURL%/speedupG2.png][png]] [[%ATTACHURL%/speedupG2.eps][eps]] [[%ATTACHURL%/speedupG2.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Event throughput rates with and without GPU acceleration as a function of the number of ATLAS trigger (Athena) processes running on the CPU. Separate tests were performed with Athena configured to execute only Inner Detector Tracking (ID), only Calorimeter topological clustering (Calo), or both (ID & Calo). The system was configured either to perform the work on the CPU or to offload it to one or two GPUs. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>. A significant rate increase is seen when the ID track seeding is offloaded to the GPU: the ID track seeding takes about 30% of the event processing time on the CPU and is accelerated by about a factor of 5 on the GPU. A small rate decrease is observed when the calorimeter clustering is offloaded to the GPU: the calorimeter clustering takes about 8% of the event processing time on the CPU and is accelerated by about a factor of 2 on the GPU, but the effect of the acceleration is offset by a small increase in the time of the non-accelerated code. There is only a relatively small increase in rate when the number of Athena processes is increased above the number of physical cores (28).
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/rateG2.png"/><br> [[%ATTACHURL%/rateG2.png][png]] [[%ATTACHURL%/rateG2.eps][eps]] [[%ATTACHURL%/rateG2.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
The time-averaged mean number of ATLAS trigger (Athena) processes in a wait state pending the return of work offloaded to the GPU, as a function of the number of Athena processes running on the CPU. Separate tests were performed with Athena configured to execute only Inner Detector Tracking (ID), only Calorimeter topological clustering (Calo), or both (ID & Calo). The system was configured to offload work to one or two GPUs. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>. When offloaded to the GPU, the ID track seeding takes about 8% of the total event processing time, and so the average number of Athena processes waiting is less than 1 for up to about 12 Athena processes. The offloaded calorimeter clustering takes about 4% of the event processing time, and so the average number of Athena processes waiting is less than 1 for up to about 25 Athena processes.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/occupancyG2.png"/><br> [[%ATTACHURL%/occupancyG2.png][png]] [[%ATTACHURL%/occupancyG2.eps][eps]] [[%ATTACHURL%/occupancyG2.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Breakdown of the time per event for Inner Detector Track Seeding offloaded to a GPU, showing the time fraction for the kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). Track Seeding consists of the formation of triplets of hits compatible with a track. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 12 Athena processes running on the CPU. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/IDexecutiontimePiChart1.png"/><br> [[%ATTACHURL%/IDexecutiontimePiChart1.png][png]] [[%ATTACHURL%/IDexecutiontimePiChart1.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Breakdown of the time per event for Inner Detector Tracking offloaded to a GPU, showing the time fraction for the Counting, Doublet Making and Triplet Making kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The Counting kernel determines the number of pairs of Inner Detector hits, and the Doublet and Triplet Making kernels form combinations of 2 and 3 hits, respectively, compatible with a track. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 12 Athena processes running on the CPU. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/IDexecutiontimePiChart2.png"/><br> [[%ATTACHURL%/IDexecutiontimePiChart2.png][png]] [[%ATTACHURL%/IDexecutiontimePiChart2.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Breakdown of the time per event for the ATLAS trigger process (Athena) running Inner Detector (ID) Track Seeding on the CPU or offloaded to a GPU, showing the time fraction for the Counting, Doublet Making and Triplet Making kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The Counting kernel determines the number of pairs of ID hits, and the Doublet and Triplet Making kernels form combinations of 2 and 3 hits, respectively, compatible with a track. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter Process Communication (IPC) time that accounts for the transfer of data between the Athena processes and the process handling communication with the GPU. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. Measurements were made with one GPU and 12 Athena processes running on the CPU. Athena was configured to run only ID tracking. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/IDexecutiontimePiChart3.png"/><br> [[%ATTACHURL%/IDexecutiontimePiChart3.png][png]] [[%ATTACHURL%/IDexecutiontimePiChart3.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Breakdown of the time per event for Calorimeter clustering offloaded to a GPU, showing the time fraction for the kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 14 Athena processes running on the CPU. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/CaloExecutionTimePiChart1.png"/><br> [[%ATTACHURL%/CaloExecutionTimePiChart1.png][png]] [[%ATTACHURL%/CaloExecutionTimePiChart1.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Breakdown of the time per event for Calorimeter clustering offloaded to a GPU, showing the time fraction for the Classification, Tagging and Growing kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The Classification kernel identifies calorimeter cells that will initiate (seed), propagate (grow), or terminate a cluster; the Tagging kernel assigns a unique tag to seed cells; and the Growing kernel associates neighbouring growing or terminating cells to form clusters. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 14 Athena processes running on the CPU. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/CaloExecutionTimePiChart2.png"/><br> [[%ATTACHURL%/CaloExecutionTimePiChart2.png][png]] [[%ATTACHURL%/CaloExecutionTimePiChart2.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Breakdown of the time per event for the ATLAS trigger process (Athena) running Calorimeter clustering on the CPU and offloaded to a GPU, showing the time for the Classification, Tagging and Growing kernels running on the GPU (GPU execution) and the overhead associated with offloading the work (other). The Classification kernel identifies calorimeter cells that will initiate (seed), propagate (grow), or terminate a cluster; the Tagging kernel assigns a unique tag to seed cells; and the Growing kernel associates neighbouring growing or terminating cells to form clusters. The overhead comprises the time to convert data structures between CPU and GPU data formats, the data transfer time between CPU and GPU, and the Inter Process Communication (IPC) time that accounts for the transfer of data between the ATLAS trigger (Athena) processes and the process handling communication with the GPU.
There is a small increase in the execution time of the non-accelerated code when the calorimeter clustering is offloaded to the GPU. The system consisted of two Intel(R) Xeon(R) E5-2695 v3 14-core CPUs with a clock speed of 2.30 GHz and two NVidia !GK210GL GPUs in a Tesla K80 module. Measurements were made using one GPU and with 14 Athena processes running on the CPU. Athena was configured to run only Calorimeter clustering. The input was a simulated ttbar dataset converted to a raw detector output format (bytestream). An average of 46 minimum-bias events per simulated collision was superimposed, corresponding to an instantaneous luminosity of 1.7x10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/CaloExecutiontimePiChart3.png"/><br> [[%ATTACHURL%/CaloExecutiontimePiChart3.png][png]] [[%ATTACHURL%/CaloExecutiontimePiChart3.pdf][pdf]] </td></tr>
</tbody>
</table>
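The throughput numbers quoted in the captions above can be cross-checked with a simple Amdahl's-law estimate. The sketch below is illustrative only: the offloaded fractions (about 30% for ID track seeding, about 8% for Calorimeter clustering) and the kernel speed-up factors (about 5 and about 2) are the approximate values quoted in the captions, not additional measurements, and offloading overheads are neglected.

<verbatim>
# Illustrative Amdahl's-law estimate using the approximate fractions and
# kernel speed-ups quoted in the captions above (not new measurements).

def overall_speedup(offloaded_fraction, kernel_speedup):
    """Throughput gain when a fraction of the per-event CPU time is
    accelerated by kernel_speedup and the remainder is unchanged."""
    remaining = (1.0 - offloaded_fraction) + offloaded_fraction / kernel_speedup
    return 1.0 / remaining

print("ID track seeding : %.2fx" % overall_speedup(0.30, 5.0))  # ~1.32x, cf. ~35% observed
print("Calo clustering  : %.2fx" % overall_speedup(0.08, 2.0))  # ~1.04x, easily offset by overheads
</verbatim>

With these inputs the estimate gives roughly a 30% gain for ID track seeding and only about 4% for Calorimeter clustering, consistent with the observed ~35% throughput increase for ID and with the small net decrease for Calo once the extra time spent in the non-accelerated code is taken into account.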
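The scaling seen in the occupancy plot above follows from an equally simple estimate: if each of N Athena processes independently spends a fraction f of its time waiting for the GPU, the time-averaged number of waiting processes is roughly N*f. The sketch below is illustrative only, using the fractions quoted in the caption (about 8% for offloaded ID track seeding and about 4% for offloaded Calorimeter clustering).

<verbatim>
# Illustrative estimate behind the occupancy plot: with N Athena processes
# each spending a fraction f of its time waiting for the GPU, the
# time-averaged number of waiting processes is roughly N * f.

def mean_waiting(n_processes, wait_fraction):
    return n_processes * wait_fraction

for label, f in (("ID track seeding", 0.08), ("Calo clustering", 0.04)):
    print("%-17s: N*f reaches 1 at N ~ %.1f processes" % (label, 1.0 / f))
# -> about 12 processes for ID (f ~ 0.08) and about 25 for Calo (f ~ 0.04),
#    matching the behaviour described in the caption.
</verbatim>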
---++ [[https://cds.cern.ch/record/2214099/][ATL-COM-DAQ-2016-119]] Performance plots of HLT Inner Detector tracking algorithm implemented on GPU

<table class="twikiTable" width="100%" bgcolor=#f5f5fa border=1 cellpadding=10 cellspacing=10>
<colgroup><col width="70%"></colgroup>
<tbody>
<tr><td bgcolor="#eeeeee">
Transverse impact parameter distributions for the simulated tracks correctly reconstructed by the GPU-accelerated tracking algorithm and the standard CPU-only algorithm. The reference CPU algorithm was FastTrackFinder, consisting of a track seed (spacepoint triplet) maker and combinatorial track following; the GPU algorithm was FastTrackFinder with a GPU-accelerated track seed maker. The simulated tracks were required to have pT > 1 GeV and |eta| < 2.5.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/HLT_a0.png"/><br> [[%ATTACHURL%/HLT_a0.png][png]] [[%ATTACHURL%/HLT_a0.eps][eps]] [[%ATTACHURL%/HLT_a0.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Transverse momentum distributions for the simulated tracks correctly reconstructed by the GPU-accelerated tracking algorithm and the standard CPU-only algorithm. The reference CPU algorithm was FastTrackFinder, consisting of a track seed (spacepoint triplet) maker and combinatorial track following; the GPU algorithm was FastTrackFinder with a GPU-accelerated track seed maker. The simulated tracks were required to have pT > 1 GeV and |eta| < 2.5.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/HLT_pT.png"/><br> [[%ATTACHURL%/HLT_pT.png][png]] [[%ATTACHURL%/HLT_pT.eps][eps]] [[%ATTACHURL%/HLT_pT.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Track reconstruction efficiency as a function of simulated track azimuth for the GPU-accelerated tracking algorithm and the standard CPU-only algorithm. The reference CPU algorithm was FastTrackFinder, consisting of a track seed (spacepoint triplet) maker and combinatorial track following; the GPU algorithm was FastTrackFinder with a GPU-accelerated track seed maker. The simulated tracks were required to have pT > 1 GeV and |eta| < 2.5.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/HLT_phi_eff.png"/><br> [[%ATTACHURL%/HLT_phi_eff.png][png]] [[%ATTACHURL%/HLT_phi_eff.eps][eps]] [[%ATTACHURL%/HLT_phi_eff.pdf][pdf]] </td></tr>
<tr><td bgcolor="#eeeeee">
Track reconstruction efficiency as a function of simulated track transverse momentum for the GPU-accelerated tracking algorithm and the standard CPU-only algorithm. The reference CPU algorithm was FastTrackFinder, consisting of a track seed (spacepoint triplet) maker and combinatorial track following; the GPU algorithm was FastTrackFinder with a GPU-accelerated track seed maker. The simulated tracks were required to have pT > 1 GeV and |eta| < 2.5.
</td>
<td align="center"> <img width="300" src="%ATTACHURL%/HLT_pT_eff.png"/><br> [[%ATTACHURL%/HLT_pT_eff.png][png]] [[%ATTACHURL%/HLT_pT_eff.eps][eps]] [[%ATTACHURL%/HLT_pT_eff.pdf][pdf]] </td></tr>
</tbody>
</table>

<!-- *********************************************************** -->
<!-- Do NOT remove the remaining lines, but add requested info as appropriate -->
<!-- *********************************************************** -->
-----
<!-- For significant updates to the topic, consider adding your 'signature' (beneath this editing box) -->
<!-- Person responsible for the page: Either leave as is - the creator's name will be inserted; Or replace the complete REVINFO tag (including percentages symbols) with a name in the form Main.TwikiUsersName -->
%RESPONSIBLE% Main.JohnBaines, Main.TomaszBold <br>
%SUBJECT% public %BR%
<!-- Once this page has been reviewed, please add the name and the date e.g. Main.StephenHaywood - 31 Oct 2006 -->
%STOPINCLUDE%