Advanced multi-threading Topics for Geant4 Application Developers

This page is a work in progress. It is based on Geant4 Version 10.0.beta. Some interfaces and classes may be reviewed and refined for the final release.

How to customize the threading model

This section describes how to change the default behavior of threads in Geant4. Its goal is to describe the internal details of how threads are started and controlled. This information is needed by advanced users who want to modify the behavior of the parallelism model.

For example, in increasing order of complexity, the user can:

  1. Change the way events are assigned to threads (e.g., from the default round-robin to a queue-based model)
  2. Add user-specific initialization code in thread initialization/termination functions
  3. Completely replace the threading model (by default based on pthreads) to allow custom threading libraries or frameworks (e.g., TBB)
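To illustrate the first customization point, here is a minimal sketch in plain standard C++, not Geant4 classes. It contrasts a static round-robin assignment with a queue-based one, where workers pull the next event index from a shared atomic counter. The function names RoundRobinEvents and QueueBasedAssignment are hypothetical and are not part of the Geant4 API. "Simulating" an event is reduced to recording its index.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Round-robin (hypothetical helper): worker w statically owns
// events w, w + nWorkers, w + 2*nWorkers, ...
std::vector<int> RoundRobinEvents(int workerId, int nWorkers, int nEvents) {
  std::vector<int> mine;
  for (int evt = workerId; evt < nEvents; evt += nWorkers) mine.push_back(evt);
  return mine;
}

// Queue-based (hypothetical helper): each worker repeatedly takes the next
// unprocessed event index from a shared atomic counter, so faster workers
// simply end up processing more events.
std::vector<std::vector<int>> QueueBasedAssignment(int nWorkers, int nEvents) {
  std::atomic<int> nextEvent{0};
  std::vector<std::vector<int>> processed(nWorkers);  // one private slot per worker
  std::vector<std::thread> workers;
  for (int w = 0; w < nWorkers; ++w) {
    workers.emplace_back([&, w] {
      for (int evt = nextEvent.fetch_add(1); evt < nEvents; evt = nextEvent.fetch_add(1))
        processed[w].push_back(evt);  // "simulate" event evt
    });
  }
  for (auto& t : workers) t.join();
  return processed;
}
```

With round-robin, the set of events each worker processes is fixed in advance. With the queue, it depends on timing, which balances the load when events take very different amounts of time. In both cases every event is processed exactly once.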

Several classes, all in the run category (source/run), handle threading in Geant4. Their interfaces and usage are described hereafter.

Introduction: event-level parallelism via multi-threading in Geant4 Version 10.0.beta

Overview

Geant4 Version 10.0.beta introduces parallelism at the event level: events are tracked concurrently by independent threads. It uses a master/worker model in which worker threads perform the simulation, while the main control flow controls and steers the work. A general overview of a multi-threaded Geant4 application is shown in the following diagram:

[Figure: General Schema of a MT application (version for V10), attachment G4V10GeneralSchema.jpg]

Per-event seeds are pre-generated by the main control flow to guarantee reproducibility. Each worker is responsible for creating a new G4Run and simulating a subset of the events. At the end of the run, the results from each worker's run are merged into the global run.
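The seeding and merging scheme can be sketched as follows. The sketch assumes std::mt19937_64 as a stand-in for the CLHEP engines and a vector of per-event values as a stand-in for G4Run; none of these names come from Geant4. The master draws one seed per event before any worker starts, and each event re-seeds its own engine. As a result, the outcome of an event depends neither on which thread simulates it nor on how many threads there are.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <random>
#include <thread>
#include <utility>
#include <vector>

// Master: pre-generate one seed per event from a single master engine.
std::vector<std::uint64_t> GenerateEventSeeds(std::uint64_t masterSeed, int nEvents) {
  std::mt19937_64 master(masterSeed);
  std::vector<std::uint64_t> seeds(nEvents);
  for (auto& s : seeds) s = master();
  return seeds;
}

// "Simulate" one event: the result depends only on that event's seed.
double SimulateEvent(std::uint64_t eventSeed) {
  std::mt19937_64 engine(eventSeed);  // re-seeded per event, as a worker would
  std::uniform_real_distribution<double> edep(0.0, 1.0);
  double sum = 0.0;
  for (int step = 0; step < 10; ++step) sum += edep(engine);
  return sum;
}

// Each worker fills a private "run"; the results are merged into the
// global run once, at the end of the worker's event loop.
std::vector<double> RunJob(std::uint64_t masterSeed, int nEvents, int nThreads) {
  const auto seeds = GenerateEventSeeds(masterSeed, nEvents);
  std::vector<double> globalRun(nEvents, 0.0);
  std::mutex mergeMutex;
  std::vector<std::thread> workers;
  for (int w = 0; w < nThreads; ++w) {
    workers.emplace_back([&, w] {
      std::vector<std::pair<int, double>> localRun;  // worker-private results
      for (int evt = w; evt < nEvents; evt += nThreads)
        localRun.emplace_back(evt, SimulateEvent(seeds[evt]));
      std::lock_guard<std::mutex> lock(mergeMutex);  // lock only for the merge
      for (const auto& [evt, value] : localRun) globalRun[evt] = value;
    });
  }
  for (auto& t : workers) t.join();
  return globalRun;
}
```

The merge stores per-event values instead of accumulating a floating-point sum. This keeps the merged result bitwise identical regardless of the order in which workers finish.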

General design

The Geant4 Version 10.0.beta threading model is based on the POSIX standard and the pthread library (on Linux and Mac OS). Using POSIX standards guarantees maximum portability across systems and integration with advanced parallelization frameworks (for example, we have verified that this model works together with TBB and MPI).

Threads share the most memory-consuming objects (geometry and physics tables), while they own per-thread instances of the other classes (e.g., sensitive detectors, hits, etc.). This allows the code to be lock-free (i.e., no mutex is used during the event loop), which guarantees maximum scalability (cf. Euro-Par 2010, Part II, LNCS 6272, pp. 287-303). Thread safety is obtained via Thread Local Storage.
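The sharing pattern can be sketched with standard C++11 thread_local, which here stands in for Geant4's own thread-local-storage mechanism (an assumption of this sketch). A read-only table is shared by all threads, while a per-thread counter plays the role of hits. The names kSharedTable, tlsHitCount, RunWorker and RunAllWorkers are hypothetical.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Shared and read-only after construction (stand-in for geometry/physics tables).
const std::vector<double> kSharedTable = {0.5, 1.0, 1.5, 2.0};

// Per-thread state (stand-in for hits): every thread gets its own copy,
// so the event loop below needs no mutex.
thread_local long tlsHitCount = 0;

long RunWorker(int nEvents) {
  for (int evt = 0; evt < nEvents; ++evt)
    for (double v : kSharedTable)  // concurrent reads of shared data are safe
      if (v > 1.0) ++tlsHitCount;  // writes touch only this thread's copy
  return tlsHitCount;
}

std::vector<long> RunAllWorkers(int nThreads, int nEvents) {
  std::vector<long> results(nThreads, 0);
  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t)
    threads.emplace_back([&results, t, nEvents] { results[t] = RunWorker(nEvents); });
  for (auto& th : threads) th.join();
  return results;
}
```

Each thread starts with its own zero-initialized tlsHitCount. Every worker therefore reports the same count, and no synchronization is needed inside the loop.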

In Geant4 Version 10.0.beta, threads (a.k.a. workers) are started to perform the simulation of G4Events. An additional control flow (a.k.a. master) is responsible for spawning workers, assigning events, starting one or more runs, collecting and reducing results from workers, and finally terminating the workers. There is a single master, while there is one worker per thread. In the sequential version of Geant4, by contrast, the same entity (an instance of G4RunManager, or of a user-defined subclass) plays both the master and the worker role.

As in the sequential version of Geant4, master and workers are represented by instances of subclasses of G4RunManager: the G4MTRunManager class represents the master, while G4WorkerRunManager instances represent the workers. The user is responsible for instantiating a single G4MTRunManager (or derived user class) instance, which in turn instantiates and owns one or more G4WorkerRunManager instances. Users should never directly instantiate the G4WorkerRunManager class.

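A simplified class diagram of the classes relevant for multi-threading is shown here:

[Figure: Class diagram for relevant classes: Worker / Master and RunManagers, attachment G4V10MasterWorker-ClassDiag.jpg]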

As in sequential Geant4, the user interacts with the Geant4 kernel via user initializations and user actions. In multi-threaded Geant4, user initialization instances (G4VUserDetectorConstruction, G4VUserPhysicsList and the new G4VUserActionInitialization) are shared among all threads and are therefore attached to the G4MTRunManager instance. User actions (G4VUserPrimaryGeneratorAction, G4UserRunAction, G4UserSteppingAction and G4UserTrackingAction) are not shared: a separate instance exists for each thread.

Since the master thread does not simulate events, user actions have no function in G4MTRunManager and cannot be assigned to it. G4UserRunAction is the exception to this rule: it can be attached to the master G4MTRunManager to allow merging of partial results produced by workers.

The role of G4VUserActionInitialization class

The new G4VUserActionInitialization class has been introduced to allow the instantiation of per-worker user actions (see the QuickMigrationGuideForGeant4V10 manual). The pure virtual method virtual void Build() const is invoked by each G4WorkerRunManager and should contain the user code needed to instantiate user actions. The virtual method virtual void BuildForMaster() const is invoked by G4MTRunManager instead and typically contains the user code needed to instantiate the master G4UserRunAction.

Since the instance of this class is shared among threads, care is needed if non-const resources are modified (the methods are marked as const to enforce correct behavior).
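The Build()/BuildForMaster() split can be sketched with hypothetical stand-in classes; these are not the Geant4 classes. In real Geant4 code, Build() takes no arguments and registers actions through SetUserAction(...). Here an explicit ActionRegistry argument stands in for the per-thread run manager so that the sketch compiles on its own. The const qualifiers mirror the requirement that the shared instance itself is never modified.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for user action classes (NOT the Geant4 classes).
struct UserAction {
  virtual ~UserAction() = default;
  virtual std::string Name() const = 0;
};
struct PrimaryGeneratorAction : UserAction {
  std::string Name() const override { return "PrimaryGenerator"; }
};
struct RunAction : UserAction {
  std::string Name() const override { return "RunAction"; }
};

// Stand-in for the actions owned by one run manager (one per worker, one for master).
struct ActionRegistry {
  std::vector<std::unique_ptr<UserAction>> actions;
};

// One instance is shared by all threads, so both methods are const:
// they only create new per-thread objects and never modify *this.
class ActionInitialization {
 public:
  void Build(ActionRegistry& worker) const {           // called once per worker
    worker.actions.push_back(std::make_unique<PrimaryGeneratorAction>());
    worker.actions.push_back(std::make_unique<RunAction>());
  }
  void BuildForMaster(ActionRegistry& master) const {  // called once, by the master
    master.actions.push_back(std::make_unique<RunAction>());  // run action only
  }
};
```

The master receives only a run action, used for merging. Each worker receives its own, independent set of actions.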

The role of the master: G4MTRunManager class

This class implements the role of the master. It is a subclass of G4RunManager and as such implements the methods needed to start a run. A few new methods are designed to allow customization of the threading model.

Important Note: The following documentation is based on Geant4 Version 10.0.beta. As such we plan to review and refine APIs for the final Geant4 Version 10.0 release. Some interfaces may change, but the changes will be minor and the general structure will remain.

| Method name | Purpose | Is virtual? | Comments |
| void SetNumberOfThreads(G4int) | Sets the number of worker threads to be used in the simulation. | No | A corresponding UI command exists: /run/numberOfThreads n. The number of workers cannot be changed after the first run. |
| G4int GetNumberOfThreads() const | Returns the number of workers. | No | |
| static long GetSeed(G4int i) | Returns the i-th seed from the list of RNG seeds. | No | Handling of RNG seeds for the events may be refined for the final release. Used internally by G4WorkerRunManager to re-seed the current event. |
| static G4int NumberOfAvailableSeeds() | Returns the number of RNG seeds left. | No | Handling of RNG seeds for the events may be refined for the final release. |
| G4bool InitializeSeeds(G4int nEvts) | Called by the master to allow user-defined RNG seed initialization. Returns true if initialization was performed. | Yes | In combination with AddOneSeed, it allows the user to customize RNG seeding. |
| void AddOneSeed(long seed) | Adds the given seed to the list of RNG seeds. | No | |
| void InitializeSeedsQueue(G4int ns) | Prepares the RNG seeds queue. | No | |
| void PrepareCommandStack() | Prepares the UI commands to be passed to workers. | Yes | |
| std::vector GetCommandStack() | Returns a copy of the UI commands for workers. | No | |
| static G4ScoringManager* GetMasterScoringManager() | Returns the master instance of the scoring manager. | No | To be reviewed for the final release. |
| static masterWorlds_t& GetMasterWorlds() | Returns the list of master physical worlds. | No | To be reviewed for the final release. |
| static void addWorld(G4int counter, G4VPhysicalVolume* w) | Adds a physical world to the list of master worlds. | No | To be reviewed for the final release. |
| void AddWorkerRunManager(G4WorkerRunManager*) | Called by worker threads to register their G4WorkerRunManager instance in the list of workers. | No | It is the responsibility of the master to delete worker run managers. |
| void MergeScores(const G4ScoringManager*) | Called by workers, passing the worker-private scoring manager, to merge local results into the global ones. | No | |
| void MergeRun(const G4Run*) | Called by workers, passing the worker-private run, to merge local results into the global ones. | No | |

Some methods are used to implement barriers between master and workers. A barrier is a synchronization mechanism between master and workers. Whenever the master needs all workers to be synchronized before continuing the job, it requests a barrier. The master's barrier method does not return until all workers have reached the barrier point. Each barrier therefore consists of two calls. The master thread calls a barrier method to wait for the workers. Each worker calls a method of the shared G4MTRunManager instance to signal that it has reached the barrier point. The worker's call likewise does not return until all threads have reached the barrier point.

Note: The barrier mechanism may be reviewed for the final release, and we may factor out the code into separate classes.

Currently the barrier mechanism is implemented in G4MTRunManager with the methods:

  1. Beginning-of-event-loop barrier: ThisWorkerReady (for workers) and WaitForReadyWorkers (for the master)
  2. End-of-event-loop barrier: ThisWorkerFinishWork (for workers) and WaitForEndEventLoopWorkers (for the master)
  3. Barrier to signal a new run or a request to terminate the thread: ThisWorkerWaitForNextAction (for workers) and NewActionRequest (for the master). This may be expanded in the future to allow more actions (e.g., changing the state of worker threads).
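A master/worker barrier of the kind described above can be sketched with a mutex and a condition variable. The class MasterWorkerBarrier and the helper RunBarrierDemo are hypothetical. Only the method names are borrowed from the beginning-of-event-loop pair ThisWorkerReady / WaitForReadyWorkers; this is not the Geant4 implementation. Each worker signals its arrival and then blocks. The master blocks until all workers have arrived and then releases them all at once.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical master/worker barrier (NOT the Geant4 implementation).
class MasterWorkerBarrier {
 public:
  explicit MasterWorkerBarrier(int nWorkers) : nWorkers_(nWorkers) {}

  // Worker side: signal arrival, then block until the master releases everyone.
  void ThisWorkerReady() {
    std::unique_lock<std::mutex> lock(mutex_);
    const long myGeneration = generation_;
    ++arrived_;
    cv_.notify_all();  // wake the master if it is already waiting
    cv_.wait(lock, [&] { return generation_ != myGeneration; });
  }

  // Master side: block until every worker has arrived, then release them all.
  void WaitForReadyWorkers() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return arrived_ == nWorkers_; });
    arrived_ = 0;   // reset for the next barrier
    ++generation_;  // workers waiting on the old generation may now proceed
    cv_.notify_all();
  }

 private:
  const int nWorkers_;
  int arrived_ = 0;
  long generation_ = 0;
  std::mutex mutex_;
  std::condition_variable cv_;
};

// Usage sketch: returns how many workers, once past the barrier,
// observed that all workers had already arrived.
int RunBarrierDemo(int nWorkers) {
  MasterWorkerBarrier barrier(nWorkers);
  std::atomic<int> arrived{0}, sawEveryone{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < nWorkers; ++i)
    workers.emplace_back([&] {
      arrived.fetch_add(1);
      barrier.ThisWorkerReady();  // blocks until the master releases all workers
      if (arrived.load() == nWorkers) sawEveryone.fetch_add(1);
    });
  barrier.WaitForReadyWorkers();
  for (auto& t : workers) t.join();
  return sawEveryone.load();
}
```

The generation counter makes the same object reusable for successive barriers. A released worker that arrives at the next barrier cannot be confused with one still waiting at the previous barrier.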

Some virtual base class methods from G4RunManager have been modified with respect to sequential mode to allow for simulating events with multithreading. In particular the event-loop logic is changed since the G4MTRunManager is not responsible for the actual simulation of any event:

| Inherited method | Change introduced |
| Constructors | G4MTRunManager uses the protected base-class constructor that accepts a boolean argument. The boolean specifies whether the instantiated run manager implements the master (false) or worker (true) behavior. |
| Destructor | Threads are terminated in the destructor. Note: this may change in the final release. |
| TerminateEventLoop | Empty. |
| ProcessOneEvent | Empty. |
| TerminateOneEvent | Empty. |
| InitializeEventLoop | Enough RNG seeds to re-seed each event are generated; then the worker threads are created and started. The master then signals to the workers that a new run can start and waits for them to start the event loop (barrier). |
| RunTermination | The master waits for the workers to end the event loop, then calls the base-class method. It is important that the master first waits for the workers and only then executes the base-class functionality (see the equivalent method of G4WorkerRunManager). |
| ConstructScoringWorlds | After calling the base method, fills the list of physical worlds (used by workers). |
| SetUserAction | These methods, with the exception of SetUserAction(G4UserRunAction*), have no purpose for G4MTRunManager; they throw an exception if the user attempts to use them. |

The role of the worker threads: G4WorkerRunManager class

This class is instantiated by the Geant4 kernel for each worker thread and is used to implement the per-worker event-loop simulation. It inherits from the G4RunManager class and modifies the behavior of the sequential simulation. In particular, the worker should not construct new instances of shared objects (the user initializations), but only initialize the thread-private objects.

Some new interfaces have been introduced:

| Method name | Purpose | Is virtual? | Comments |
| SetWorkerThread(G4WorkerThread*) | Stores a reference to the per-worker instance of G4WorkerThread (see later). | No | |
| SetupDefaultRNGEngine() | Creates a per-worker RNG engine of the same type as the one used in the master. | No | This may be reviewed for the final release. If a non-CLHEP engine is used in the application, additional code is needed in the G4UserWorkerInitialization class (see later). |

Some methods inherited from the base class G4RunManager have been changed to take into account the new behavior needed for multi-threading:

| Inherited method | Change introduced |
| Constructors | G4WorkerRunManager uses the protected base-class constructor that accepts a boolean argument. The boolean specifies whether the instantiated run manager implements the master (false) or worker (true) behavior. |
| Destructor | User initializations are shared among threads and owned by the master, so they are not deleted here. |
| InitializeGeometry() | Only the worker-private sensitive detectors and field manager are instantiated; the rest is shared with the master. |
| DoEventLoop(...) | Only the subset of events assigned to this worker is processed in the event loop. In addition, the barrier mechanism is used to signal to the master when the event loop starts and ends. It also calls a per-worker user hook just before the beginning-of-event-loop barrier (G4UserWorkerInitialization::WorkerRunStart(), see later) and one just after the event loop (G4UserWorkerInitialization::WorkerRunEnd()). |
| ProcessOneEvent(G4int ievt) | Before performing the same operations as sequential Geant4, re-seeds the current worker RNG engine with the per-event seed. |
| RunTermination() | Reduces the local scorers and G4Run into the global ones, then signals to the master, via the barrier mechanism, that the run is terminated. The signal must be sent after the reduction has taken place, to ensure that G4MTRunManager::RunTermination() is executed at the correct moment. |
| ConstructScoringWorlds() | Scoring worlds in workers need special treatment, since the process manager associated with particles is per-thread. |
| SetUserInitialization | Since user initializations belong to the master and have no function in workers, these methods throw an exception if the user tries to access them. The exception is the physics list, which requires special treatment for workers. |
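The ordering requirement for RunTermination() can be sketched as follows. The names GlobalRun, WorkerRunTermination, MasterRunTermination and RunDemo are hypothetical, and an event counter stands in for the merged G4Run and scorers. Each worker merges first and only then increments the count of finished workers. The master reads the merged result only after that count reaches the number of workers.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the master's global run.
struct GlobalRun {
  std::mutex mutex;
  std::condition_variable cv;
  long totalEvents = 0;     // merged result
  int finishedWorkers = 0;  // workers that have merged and signalled
};

// Worker side of run termination: reduce FIRST, then signal the master.
void WorkerRunTermination(GlobalRun& run, long localEvents) {
  std::lock_guard<std::mutex> lock(run.mutex);
  run.totalEvents += localEvents;  // 1) merge the local run into the global one
  ++run.finishedWorkers;           // 2) only then report "finished"
  run.cv.notify_all();
}

// Master side: wait for every worker; the merged result is then complete.
long MasterRunTermination(GlobalRun& run, int nWorkers) {
  std::unique_lock<std::mutex> lock(run.mutex);
  run.cv.wait(lock, [&] { return run.finishedWorkers == nWorkers; });
  return run.totalEvents;  // safe: every worker merged before signalling
}

long RunDemo(int nWorkers, long eventsPerWorker) {
  GlobalRun run;
  std::vector<std::thread> workers;
  for (int w = 0; w < nWorkers; ++w)
    workers.emplace_back([&] { WorkerRunTermination(run, eventsPerWorker); });
  const long total = MasterRunTermination(run, nWorkers);
  for (auto& t : workers) t.join();
  return total;
}
```

If a worker signalled before merging, the master could wake up and read the global run while the last contribution was still missing.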

Worker thread specific initializations: G4VUserActionInitialization, G4UserWorkerInitialization and G4WorkerThread classes

These three classes implement different aspects of the multi-threaded behavior and are used by the worker and master run managers to control the threading model.

G4VUserActionInitialization is the simplest one and is used by master and workers to create the local instances of user actions. See Geant4MTForApplicationDevelopers.

G4UserWorkerInitialization is used by the kernel to initialize worker threads. It is the class that implements the details of the threading model and provides some additional user hooks for further customization of worker behavior. The Geant4 Version 10.0.beta use of pthreads (via the wrappers defined in G4Threading.hh) becomes concrete in this class. In particular, the static method:

static void* StartThread(void* context)
is the one passed to the macro G4THREADCREATE (a wrapper around the pthread_create function). For each worker thread, this function performs the following operations, in order:
  • Step 0: Initializes per-thread cout and cerr streams
  • Step 1: Initializes the per-thread RNG engine and the per-thread part of split classes (geometry and physics vectors)
  • Step 2: Instantiates a G4WorkerRunManager
  • Step 3: Sets up the shared references to the detector construction and physics list
  • Step 4: Calls user methods to define user actions via G4VUserActionInitialization::Build(), then calls the user hook G4UserWorkerInitialization::WorkerStart()
  • Step 5: Enters a loop waiting for new actions to be performed. Currently only two actions are supported: "do a run" or "terminate thread". If a new run is requested, it gets the list of UI commands and executes them. Note: currently the list of commands must contain the command "/run/beamOn" for worker threads to start processing events. This will change in the future. In addition, a temporary workaround to re-initialize part of the split classes between runs is in place; it will be removed or improved for the final release.
  • Step 6: Calls the user hook G4UserWorkerInitialization::WorkerStop() and terminates.
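Steps 5 and 6 can be sketched as a loop that blocks until the master posts an action. This is a hypothetical stand-in with three simplifications. A small queue (ActionMailbox) replaces the barrier-based ThisWorkerWaitForNextAction / NewActionRequest pair. Steps 0 to 4 are reduced to a single history entry. Executing the UI commands of a run is reduced to recording "run".

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical action types, mirroring the two actions described above.
enum class WorkerAction { NextRun, Terminate };

// Hypothetical mailbox the master uses to post actions to one worker.
class ActionMailbox {
 public:
  void Post(WorkerAction a) {
    {
      std::lock_guard<std::mutex> lock(m_);
      q_.push_back(a);
    }
    cv_.notify_one();
  }
  WorkerAction WaitForNextAction() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    WorkerAction a = q_.front();
    q_.pop_front();
    return a;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<WorkerAction> q_;
};

// Sketch of a worker's "main" function.
std::vector<std::string> WorkerMain(ActionMailbox& mailbox) {
  std::vector<std::string> history{"initialized"};      // steps 0-4
  for (;;) {
    WorkerAction action = mailbox.WaitForNextAction();  // step 5
    if (action == WorkerAction::Terminate) break;
    history.push_back("run");  // would execute the UI commands, incl. /run/beamOn
  }
  history.push_back("stopped");                          // step 6
  return history;
}

// Master side of the demo: request nRuns runs, then termination.
std::vector<std::string> RunWorkerDemo(int nRuns) {
  ActionMailbox mailbox;
  std::vector<std::string> history;
  std::thread worker([&] { history = WorkerMain(mailbox); });
  for (int r = 0; r < nRuns; ++r) mailbox.Post(WorkerAction::NextRun);
  mailbox.Post(WorkerAction::Terminate);
  worker.join();
  return history;
}
```

The worker stays alive between runs and exits only when asked to. This is why the per-thread setup in steps 0 to 4 is paid once per job rather than once per run.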

Some virtual methods are provided to allow customization of the threading model without rewriting the StartThread function. By default, all these methods are empty.

| Method | Purpose |
| WorkerInitialize() | Called once after the thread is created but before the local G4WorkerRunManager is instantiated. |
| WorkerStart() | Called once at the beginning of the simulation job, when kernel classes and user action classes have already been instantiated but geometry and physics have not yet been initialized. |
| WorkerRunStart() | Called before an event loop. Geometry and physics have already been set up for the thread. All threads are synchronized and ready to start the local event loop. |
| WorkerRunEnd() | Called when the local event loop has finished but before synchronization across threads. |
| WorkerStop() | Called once at the end of the program, just before the thread stops. |

Note that these methods are all marked as const: they are called concurrently by all worker threads, while the instance of G4UserWorkerInitialization is shared among threads. In addition, it is a good idea to implement in these methods only very general code that is independent of the Geant4 application (i.e., these methods should not be confused with user actions). One can think of the StartThread function as the equivalent of the main function for worker threads.
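The const requirement can be illustrated with a hypothetical hook class modeled on the table above; it is not G4UserWorkerInitialization. A single instance is shared by all threads, so any state a hook updates must itself be thread-safe. Here, mutable atomic counters record how often each hook ran, and the hypothetical WorkerMain calls the hooks in the documented order.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical hook interface modeled on the table above; all hooks default to empty.
class WorkerInitialization {
 public:
  virtual ~WorkerInitialization() = default;
  virtual void WorkerInitialize() const {}
  virtual void WorkerStart() const {}
  virtual void WorkerRunStart() const {}
  virtual void WorkerRunEnd() const {}
  virtual void WorkerStop() const {}
};

// Shared by all threads: hooks are const, and the only state they touch
// is atomic (declared mutable so that const hooks may update it).
class CountingInitialization : public WorkerInitialization {
 public:
  void WorkerStart() const override { started_.fetch_add(1); }
  void WorkerRunStart() const override { runs_.fetch_add(1); }
  void WorkerStop() const override { stopped_.fetch_add(1); }
  int Started() const { return started_.load(); }
  int Runs() const { return runs_.load(); }
  int Stopped() const { return stopped_.load(); }

 private:
  mutable std::atomic<int> started_{0}, runs_{0}, stopped_{0};
};

// Order in which a worker's "main" function would invoke the hooks.
void WorkerMain(const WorkerInitialization& init, int nRuns) {
  init.WorkerInitialize();
  init.WorkerStart();
  for (int r = 0; r < nRuns; ++r) {
    init.WorkerRunStart();
    // ... local event loop ...
    init.WorkerRunEnd();
  }
  init.WorkerStop();
}

void RunWorkers(const WorkerInitialization& init, int nThreads, int nRuns) {
  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t)
    threads.emplace_back([&init, nRuns] { WorkerMain(init, nRuns); });
  for (auto& th : threads) th.join();
}
```

With four threads and three runs, WorkerStart and WorkerStop each run four times and WorkerRunStart runs twelve times, all on the same shared instance.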

The virtual method void SetupRNGEngine(const CLHEP::HepRandomEngine* aRNGEngine) const is called by worker threads to set the random number generator engine. The default implementation "clones" the engine from the master. Users need to re-implement this method if they use a non-standard RNG engine (i.e., one different from those provided in the CLHEP version supported by Geant4).

The virtual method G4Thread* CreateAndStartWorker(G4WorkerThread* workerThreadContext) is called by the kernel to create and start a new worker thread. Users should not re-implement this method unless they want to replace the default threading model (e.g., to avoid using pthreads). The function parameter is a per-worker instance of the G4WorkerThread class. This is a simple container that holds information about the thread: the thread id, the total number of events and the total number of workers. Note: this last class will be reviewed for the final release.
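A replacement threading model can be sketched with std::thread instead of pthreads. WorkerContext is a hypothetical struct holding the information the text attributes to G4WorkerThread (thread id, total events, total workers). CreateAndStartWorker here returns a std::unique_ptr<std::thread> rather than a G4Thread*. Neither matches the Geant4 signatures. Each worker derives its share of events from its context, assuming round-robin assignment.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical per-worker context (NOT G4WorkerThread).
struct WorkerContext {
  int threadId;
  int numberOfEvents;
  int numberOfThreads;
};

// Number of events this worker handles under round-robin assignment.
int EventsForWorker(const WorkerContext& ctx) {
  int n = 0;
  for (int evt = ctx.threadId; evt < ctx.numberOfEvents; evt += ctx.numberOfThreads) ++n;
  return n;
}

// Hypothetical replacement for the thread-creation hook, built on std::thread.
// The caller owns the returned thread and must join it.
std::unique_ptr<std::thread> CreateAndStartWorker(WorkerContext ctx, std::atomic<int>& done) {
  return std::make_unique<std::thread>([ctx, &done] { done.fetch_add(EventsForWorker(ctx)); });
}

int RunWithStdThreads(int nEvents, int nThreads) {
  std::atomic<int> eventsDone{0};
  std::vector<std::unique_ptr<std::thread>> workers;
  for (int id = 0; id < nThreads; ++id)
    workers.push_back(CreateAndStartWorker({id, nEvents, nThreads}, eventsDone));
  for (auto& w : workers) w->join();
  return eventsDone.load();
}
```

Passing the context by value keeps each thread independent of the caller's loop variable. The per-worker shares always add up to the total number of events.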

Memory Handling in MT

This subject is analyzed in detail in Geant4MTForKernelDevelopers.

MPI-based simulations

Geant4 comes with examples that show how to use MPI to perform job parallelization: examples/extended/parallel/MPI/. These also work with multi-threaded Geant4. Note: this section is under construction; detailed information will be provided.

Random Number Generators

The requirement on RNGs for Geant4 Version 10.0 is full event-level strong reproducibility.

Each event simulated in Geant4 Version 10 should be reproducible (on the same hardware) with strong reproducibility, independently of the number of threads used in the original program. Strong reproducibility is defined as the ability to start a new job with one event and one thread and exactly reproduce any event of a generic job.

Strategy:

Use the CLHEP implementation with a thin Geant4 wrapper to guarantee multi-threading capability (as in the prototype mt_9.4.p01). Tests have been created to verify the basic functionality of RNGs in G4MT: threads should have independent streams of random numbers, and threads should be reproducible.

With lower priority, we are investigating the possibility of using RNGs developed specifically for parallel applications. The interest in these parallel RNGs (pRNGs) is mainly for performance reasons; any alternative should have at least the same "randomness" performance as the CLHEP ones.

Two unit tests have been developed in global/HEPRandom category: testRandMT and testRandMarsaglia.

The testRandMT unit test checks the technical implementation of the Geant4 random layer on top of CLHEP for MT, on all available distributions and all engines, with a subset of tests:

  1. Basic functionality (generate random numbers without crashes)
  2. Weak Reproducibility with a single thread
  3. Strong Reproducibility with a single thread
  4. Simple MT: start several threads with the same initial random seed; check that all final numbers are the same
  5. Weak Reproducibility with multiple threads
  6. Strong Reproducibility with multiple threads
  7. NoHistoryMemory1: start multiple threads with different seeds, then reseed halfway with a common seed. Check that all final numbers are the same
  8. NoHistoryMemory2: the opposite of the previous test. Start all threads with a common seed and reseed halfway with different seeds. Final results are different
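The two NoHistoryMemory tests, and with equal seeds the Simple MT test, can be sketched as follows. This is not testRandMT itself. std::mt19937_64 stands in for the CLHEP engines, and DrawWithReseed and RunThreads are hypothetical names. Each thread owns its engine. It draws half of its numbers, reseeds, draws the rest, and reports the last number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Seed with firstSeed, draw n/2 numbers, reseed with secondSeed, draw the
// remaining numbers; return the last number drawn.
std::uint64_t DrawWithReseed(std::uint64_t firstSeed, std::uint64_t secondSeed, int n) {
  std::mt19937_64 engine(firstSeed);
  std::uint64_t last = 0;
  for (int i = 0; i < n / 2; ++i) last = engine();
  engine.seed(secondSeed);  // halfway reseed discards the previous state
  for (int i = n / 2; i < n; ++i) last = engine();
  return last;
}

// One thread per (firstSeed, secondSeed) pair; every thread owns its engine.
std::vector<std::uint64_t> RunThreads(const std::vector<std::uint64_t>& firstSeeds,
                                      const std::vector<std::uint64_t>& secondSeeds, int n) {
  std::vector<std::uint64_t> finals(firstSeeds.size());
  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < firstSeeds.size(); ++t)
    threads.emplace_back([&, t] { finals[t] = DrawWithReseed(firstSeeds[t], secondSeeds[t], n); });
  for (auto& th : threads) th.join();
  return finals;
}
```

Reseeding discards the engine's previous state, so only the second seed determines the final number. That "no history memory" property is exactly what the tests check.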

testRandMarsaglia generates a file in the format needed by the DIEHARD statistical test battery from Marsaglia et al. Random numbers are generated in different threads.

Interesting links

For the moment, this is just a collection of links; there is a related thread in the MT mailing list.
  1. http://csrc.nist.gov/groups/ST/toolkit/rng/documents/SP800-22rev1a.pdf
  2. http://csrc.nist.gov/groups/ST/toolkit/rng/index.html
  3. http://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf
  4. http://docs.nvidia.com/cuda/curand/index.html#topic_2__nist
  5. http://docs.nvidia.com/cuda/curand/index.html#topic_1
  6. http://sprng.fsu.edu
  7. http://pdf.aminer.org/000/669/477/testing_random_number_generators.pdf
  8. http://www.stat.fsu.edu/pub/diehard/
  9. http://www.ics.uci.edu/~smyth/courses/ics178/random_number_generators_article.pdf
Topic attachments
  • G4V10GeneralSchema.jpg (r2, 153.1 K, 2013-12-05, AndreaDotti): General Schema of a MT application (version for V10)
  • G4V10MasterWorker-ClassDiag.jpg (r1, 50.2 K, 2013-06-27, AndreaDotti): Class diagram for relevant classes: Worker / Master and RunManagers
Topic revision: r15 - 2013-12-11 - AndreaDotti