Multi Threading Task Force Entry Point


CLOSED
With the public release of Geant4 Version 10.0 (December 2013) and the successful integration of multi-threading capabilities, the task force has been absorbed by the Run, Event and Detector Description Working Group, which has taken over the task force's responsibilities.

This page will remain available and will continue to be updated.

This is the twiki page for the Geant4 Multi Threading Task Force. The task force was created by the Geant4 SB in February 2013 with a two-year mandate and the following charges:

  1. collect all the relevant expertise and act as the center of gravity,
  2. drive the collaboration-wide effort of MT migration, and
  3. drive the collaboration-wide effort of documenting MT-related matters.
The Task Force coordinator is Andrea Dotti. The task force has an e-group, geant4-mt-developers, open to Geant4 collaboration members.

Announcements

Geant4 Version 10.2 was released on 6 December 2015.
It can be downloaded from the Geant4 website. Consult the Release Notes for a list of updates and announcements regarding multi-threading capabilities in Geant4.
The manuals have been updated and can be downloaded and consulted from the User Documentation pages. Note that the manuals linked from this twiki have been integrated into the official Geant4 documentation. In the coming months the content of these pages will evolve to keep track of new developments and improvements in Geant4 Version 10.X.Y. Please refer to the User Documentation for information regarding Version 10.2.

Documentation and manuals

This section contains links to the current documentation regarding multi-threading. Please note that these pages are work in progress. An edited version of these pages has been merged into the Geant4 documentation manuals for Geant4 Version 10.2.

How-to multithreading for kernel developers

Geant4MTForKernelDevelopers documents how to modify and develop Geant4 code taking MT requirements into account: what split-classes are, thread-safety via thread-local storage, and the design of the relevant classes.

A page with details for Hadronic WG is available here: HadronicsMTNotes

How-to multithreading for application developers

Geant4MTForApplicationDevelopers includes documentation on how to migrate user-code to be used with a multi-threaded build of Geant4.
QuickMigrationGuideForGeant4V10 contains the twiki version of the previous link that has become the content of the official Geant4 documentation.

Geant4 on Xeon Phi

A tentative manual to run Geant4 on Xeon Phi architectures can be found here: XeonPhiSupport

Results

Warning: Not yet fully updated with Geant4 Version 10.1 results.
This section contains a few highlights of physics and performance results obtained with Geant4 Version 10.2 with multi-threading enabled. Note that the results should be considered preliminary. You are welcome to contact Andrea Dotti if you need more information. If you need a copy of these results for a presentation or note, you should contact Andrea Dotti and Makoto Asai for permission.

Physics Performances

Physics performance has been evaluated by comparing the simulation output of a simplified version of LHC calorimeters obtained with Geant4 Version 10.0.beta with multi-threading against the output of the same simulation application run with the sequential version of Geant4. As expected, the results are statistically compatible. This set of plots compares the simulations obtained with showers of 20 GeV protons impinging on a W/liquid Ar sampling calorimeter (red sequential, blue multi-threaded). Some extracts are shown here: energy deposited in the calorimeter (total of absorber and active material), the spectrum of all secondary neutrons produced in the shower and a similar spectrum for pions.

The following plot compares the response, as a function of pion beam energy, of three different versions of Geant4 against test-beam data (NB: in this case 10.0.beta-cand02-MT refers to a multi-threaded build of Geant4, but using a sequential version of the application).

More information can be found here.

Reproducibility

Reproducibility has been studied for the simplified calorimeter setups. Weak reproducibility is confirmed for Geant4 multi-threaded: if a job is started twice with the same initial random seed the status of the RNG engines is the same at the end of both jobs.

Strong reproducibility is confirmed in the majority of cases: re-seeding a sequential application during the event loop with the random seed of a particular event obtained with the multi-threaded application reproduces the RNG engine status in 100% of the cases. The optional Radioactive Decay module violates reproducibility and this problem is being addressed. A possible non-reproducibility when using the NeutronHP module is also being investigated.
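
The two notions of reproducibility above can be illustrated with a toy model. The sketch below is not the Geant4 implementation and uses Python's standard `random` module rather than a Geant4 RNG engine. It assumes, for illustration only, that a master generator hands out one seed per event and that each event is simulated with its own freshly seeded generator. All function names are hypothetical.

```python
import random


def event_seeds(master_seed, n_events):
    """Derive one seed per event from a single master seed (toy assumption)."""
    master = random.Random(master_seed)
    return [master.getrandbits(32) for _ in range(n_events)]


def simulate_event(seed):
    """Toy 'event': draw ten random numbers; return the sum and final RNG state."""
    rng = random.Random(seed)
    energy = sum(rng.random() for _ in range(10))
    return energy, rng.getstate()


def run_job(master_seed, n_events, order=None):
    """Simulate all events; 'order' mimics threads picking events in any order."""
    seeds = event_seeds(master_seed, n_events)
    order = range(n_events) if order is None else order
    return {i: simulate_event(seeds[i]) for i in order}


if __name__ == "__main__":
    # Weak reproducibility: same initial seed, same final RNG status.
    assert run_job(42, 8) == run_job(42, 8)

    # Strong reproducibility: one event re-run alone, re-seeded with its own
    # seed, matches the same event from a job processed in a different order.
    shuffled = run_job(42, 8, order=[5, 2, 7, 0, 3, 6, 1, 4])
    assert simulate_event(event_seeds(42, 8)[5]) == shuffled[5]
    print("toy job is weakly and strongly reproducible")
```

In this toy model strong reproducibility holds by construction, because an event depends only on its own seed. A module that draws random numbers outside this per-event scheme would break it.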

More information can be found here. Note: The results of this presentation are very old (version 10.0.beta) and not updated, but slides 8 and 9 contain details about our reproducibility tests.

CPU and Memory Performances

There are two metrics that are important for multi-threaded applications:
  1. Linear speedup of throughput: the ideal speedup is linear with the number of threads (twice as many events can be produced in the same amount of time if the number of threads doubles). Since threads compete for resources, the real-world speedup is less than linear.
  2. Absolute throughput of 1 thread with respect to sequential Geant4: threads have an overhead that may reduce the absolute performance of the same application when multi-threading is enabled.

For Geant4 Version 10.0 and 10.1 the collaboration has concentrated on maximizing speedup linearity, while absolute throughput has not yet been optimized. However, results show that the absolute throughput of Geant4 Version 10.1 is very similar with or without multi-threading enabled. In addition, linearity of speedup is maintained up to a large number of threads (>90% efficiency).
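
The two metrics and the efficiency figure quoted above can be written down as simple ratios. The sketch below assumes that efficiency means measured speedup divided by the number of threads, which is the usual definition but is not spelled out on this page. The function names and all numbers in the example are hypothetical, not measured Geant4 results.

```python
def throughput(n_events, wall_seconds):
    """Events processed per second of wall-clock time."""
    return n_events / wall_seconds


def speedup(throughput_n, throughput_1):
    """Metric 1: throughput with N threads relative to 1 thread."""
    return throughput_n / throughput_1


def efficiency(throughput_n, throughput_1, n_threads):
    """Fraction of the ideal linear speedup actually achieved (1.0 = ideal)."""
    return speedup(throughput_n, throughput_1) / n_threads


def mt_to_sequential_ratio(throughput_mt_1, throughput_seq):
    """Metric 2: 1-thread MT throughput relative to the sequential build."""
    return throughput_mt_1 / throughput_seq


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    t1 = throughput(1000, 100.0)    # 1 thread
    t16 = throughput(14720, 100.0)  # 16 threads, same wall time
    print("speedup:", speedup(t16, t1))
    print("efficiency:", efficiency(t16, t1, 16))
    print("MT/sequential:", mt_to_sequential_ratio(t1, throughput(1020, 100.0)))
```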

The most recent results can be obtained on the Geant4 Profiling and Benchmarking Results page in section 6.

The performance of development versions of Geant4 Version 10.0 has also been measured on Intel architectures and on Intel Xeon Phi (MIC) architectures, showing good linearity for a large number of threads (see XeonPhiSupport for compilation instructions):

Memory consumption as a function of the number of threads has also been measured for the Intel Xeon Phi architecture (the results compare the latest public releases):

The following table shows a preliminary inter-comparison of different architectures. Numbers are the raw throughput obtained for a single computing unit (in general a CPU, a full card for MIC):

Finally, the following plot shows an inter-comparison of the performance obtained with the same setup and application when changing compilation options:

Weak Vs Strong Scaling: how to measure MT performances

When measuring speedup as a function of the number of threads in MT mode, it is important to specify whether the measurements are performed for strong or weak scaling. Strong scaling measures the time needed to perform the simulation keeping the total number of events fixed, independently of the number of threads; weak scaling keeps the number of events per thread fixed.
In general, strong scaling shows a larger deviation from ideal linearity than weak scaling because, simplifying, the job is declared finished only when the slowest thread is done. While the last events are being completed, all other threads have already finished and sit idle, so this tail behaves like a sequential part of the code and introduces a stronger deviation from linearity. Depending on how events are scheduled to threads and on the nature of the events to be simulated (complex and long vs simple and fast), the differences may be large, as demonstrated in the following two plots, showing the exact same simulation (CMS geometry, 50 GeV pions) in the two regimes.
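
The tail effect described above can be reproduced with a small toy model. The sketch below is not a model of the Geant4 scheduler. It assumes, for illustration only, that each event is handed to whichever thread becomes free first and that the workload is a fixed pattern in which every eighth event is five times longer than the others. All names and numbers are hypothetical.

```python
import heapq


def make_events(n_events, long_every=8, short=1.0, long=5.0):
    """Deterministic toy workload: every 'long_every'-th event is long."""
    return [long if (i + 1) % long_every == 0 else short for i in range(n_events)]


def wall_time(event_times, n_threads):
    """Wall time when each event goes to the thread that frees up first."""
    free_at = [0.0] * n_threads  # time at which each thread becomes idle
    heapq.heapify(free_at)
    for t in event_times:
        start = heapq.heappop(free_at)  # earliest available thread
        heapq.heappush(free_at, start + t)
    return max(free_at)  # the job ends when the slowest thread ends


def strong_scaling_speedup(total_events, n_threads):
    """Fixed total number of events: speedup = T(1 thread) / T(N threads)."""
    events = make_events(total_events)
    return wall_time(events, 1) / wall_time(events, n_threads)


def weak_scaling_speedup(events_per_thread, n_threads):
    """Fixed events per thread: speedup = throughput(N) / throughput(1)."""
    base = make_events(events_per_thread)
    scaled = make_events(events_per_thread * n_threads)
    throughput_1 = len(base) / wall_time(base, 1)
    throughput_n = len(scaled) / wall_time(scaled, n_threads)
    return throughput_n / throughput_1


if __name__ == "__main__":
    n = 4
    print("strong scaling efficiency:", strong_scaling_speedup(8, n) / n)
    print("weak scaling efficiency:", weak_scaling_speedup(8, n) / n)
```

With 4 threads this toy workload gives an efficiency of 0.5 in the strong regime (8 events in total) and 0.8 in the weak regime (8 events per thread). In both cases threads wait for the one that received the last long event, but that wait is a smaller fraction of the job when the number of events grows with the number of threads.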

Weak Scaling:

Strong Scaling (fewer data points have been considered in this case):

There is no single correct way to measure speedup; the appropriate choice depends on the application type. We usually prefer to report results obtained in the weak scaling regime because they are more representative of the speedup of the throughput (events/sec) and are less sensitive to the specific type of application and events being simulated. In publications reporting this number, please specify whether weak or strong scaling was measured.


Acknowledgements: The results of this page have been produced by S.Y. Jun (FNAL), A. Ribon, G. Lestaris and W. Pokorski (CERN) and A. Dotti (SLAC).


The old version of this page (used during the development of Geant4 Version 10.0.beta) is available at revision 71 (check History at the bottom of this page).
Topic attachments
MTDevel.mpp (223.5 K, 2013-03-25, AndreaDotti): Developments todos and Gantt Chart
MTDevel.pdf (41.2 K, 2013-03-25, AndreaDotti): Todos (update March 25)
Memory.png (19.9 K, 2013-07-01, AndreaDotti): Memory linearity
edep.png (33.5 K, 2013-07-02, AndreaDotti): Energy deposit in calorimeter (comparison)
fnal3.png (57.4 K, 2014-07-09, AndreaDotti): AMD Performances: scalability G4 V10.0.p02
memoryusage.png (r2, 49.3 K, 2016-02-17, AndreaDotti)
mic.png (101.7 K, 2013-07-01, AndreaDotti): Xeon Phi performances
mic2.png (211.6 K, 2014-07-09, AndreaDotti): Xeon Phi Performances: V10.0
neutrons.png (37.4 K, 2013-07-02, AndreaDotti): Neutrons spectra (Comparison)
output_20GeV_p_WLAr_10.0.beta.cand02.pdf (285.9 K, 2013-07-02, AndreaDotti): Comparison between MT and sequential SimplifiedCalorimeter
pions.png (39.7 K, 2013-07-02, AndreaDotti): Pions spectra in showers (comparison)
strongScaling.png (12.6 K, 2014-12-15, AndreaDotti)
testbeam.png (97.7 K, 2013-07-02, AndreaDotti): Test beam (comparison)
throughput_cmsExpMT_pi_E50.png (r2, 9.1 K, 2016-02-17, AndreaDotti)
weakScaling.png (14.3 K, 2014-12-15, AndreaDotti)
Topic revision: r85 - 2016-02-17 - AndreaDotti