Types of Modules in the Threaded Framework

Complete: 4

Contacts

Uses of Member Data in Modules

The two broad types of modules in the threaded framework differ in how their member data is used. We must therefore first discuss the various ways modules use data.

Configuration

Configuration data is immutable module state which is set at construction time. One example would be a variable that holds a copy of a configuration parameter. Additionally, if a helper class is instantiated in the constructor and that helper never changes any internal state (i.e. changes any member data) while processing Events, then that helper would also be configuration data. In contrast, if you have a helper class which is instantiated in the constructor but the internals of the helper do change while processing the Event (because it is caching data while doing its calculation), that helper would NOT be configuration data. It would instead be considered 'Stream Scratch' data.
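The distinction between the two kinds of helper can be sketched in plain C++. This is only an illustration: ToyScaler and ToyCachingScaler are hypothetical names, not framework classes. The first helper exposes only a const method and has no mutable state, so it behaves as Configuration data. The second fills a cache as a side effect, so it would count as 'Stream Scratch' data.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical helper whose state never changes after construction:
// it would count as 'Configuration' data.
class ToyScaler {
public:
  explicit ToyScaler(double factor) : factor_(factor) {}
  double scale(double x) const { return factor_ * x; }  // const: no member is modified
private:
  const double factor_;
};

// Hypothetical helper that caches results while processing Events:
// it would count as 'Stream Scratch' data.
class ToyCachingScaler {
public:
  explicit ToyCachingScaler(double factor) : factor_(factor) {}
  double scale(int x) {  // non-const: fills the cache as a side effect
    auto it = cache_.find(x);
    if (it == cache_.end()) it = cache_.emplace(x, factor_ * x).first;
    return it->second;
  }
  std::size_t cached() const { return cache_.size(); }
private:
  const double factor_;
  std::map<int, double> cache_;
};
```

A practical rule of thumb under these assumptions: if every method the helper needs during Event processing can be declared const without using mutable members, the helper is a candidate for Configuration data.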

Shared

By Shared we refer to thread-safe read/write data which is shared across all Streams. An example would be an output file and an associated serial task queue. A module would create tasks whose purpose is to write data to the file and then add those tasks to the serial task queue. The serial task queue would then run one task at a time, thereby guaranteeing that only one write to that file is happening at any moment in time. This provides a thread-safe mechanism to write to the file, which can therefore be shared across all concurrently running Streams.
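The output-file example can be sketched with the standard library alone. ToySerialQueue and ToyOutput below are hypothetical stand-ins written for this illustration, not the framework's own classes, and a std::ostringstream plays the role of the file. Whichever thread finds the queue idle drains it, so at most one write task runs at any moment.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serial task queue.
class ToySerialQueue {
public:
  // Enqueue a task. The thread that finds the queue idle drains it,
  // so tasks never run concurrently with each other.
  void push(std::function<void()> task) {
    std::unique_lock<std::mutex> lock(mutex_);
    tasks_.push_back(std::move(task));
    if (running_) return;  // another thread is already draining the queue
    running_ = true;
    while (!tasks_.empty()) {
      auto next = std::move(tasks_.front());
      tasks_.pop_front();
      lock.unlock();
      next();  // runs outside the lock, but only one task at a time
      lock.lock();
    }
    running_ = false;
  }
private:
  std::mutex mutex_;
  std::deque<std::function<void()>> tasks_;
  bool running_ = false;
};

// 'Shared' member data of a hypothetical output module.
struct ToyOutput {
  std::ostringstream file;  // stands in for the output file
  ToySerialQueue queue;
  void write(int stream, int event) {
    queue.push([this, stream, event] { file << stream << ':' << event << '\n'; });
  }
};

// Write from several Streams at once and count the lines that were written.
int linesWritten(int nStreams, int eventsPerStream) {
  ToyOutput out;
  std::vector<std::thread> streams;
  for (int s = 0; s < nStreams; ++s)
    streams.emplace_back([&out, s, eventsPerStream] {
      for (int e = 0; e < eventsPerStream; ++e) out.write(s, e);
    });
  for (auto& t : streams) t.join();
  int lines = 0;
  const std::string text = out.file.str();
  for (char c : text)
    if (c == '\n') ++lines;
  return lines;
}
```

Calling linesWritten(4, 100) exercises four concurrent Streams. Every line arrives intact because the stream object is only ever touched from inside a queued task.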

Run and LuminosityBlock Cache

This is data which depends upon quantities associated with a Run or LuminosityBlock but not upon Event data. This data can therefore be shared by all Streams which are processing that Run or LuminosityBlock. Each Run or LuminosityBlock would get its own instance of such data. That data would therefore be created on the Global begin Run or begin LuminosityBlock transition and not be allowed to change after that point.
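A minimal sketch of this idea, assuming a hypothetical ToyRunCache and an invented calibration formula: the cache is built once when the Run begins and is then handed to every Stream as a pointer to const, so no Stream can change it afterwards.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical per-Run cache: depends only on Run-level quantities.
struct ToyRunCache {
  unsigned int runNumber;
  double calibration;  // invented value derived from the run number, for illustration
};

// Would be called once per Run, on the Global begin Run transition.
std::shared_ptr<const ToyRunCache> makeRunCache(unsigned int run) {
  return std::make_shared<ToyRunCache>(ToyRunCache{run, 1.0 + 0.01 * (run % 10)});
}

// Each Stream only reads the cache while processing Events of that Run.
double sumOverStreams(unsigned int run, int nStreams, int eventsPerStream) {
  auto cache = makeRunCache(run);
  std::vector<double> perStream(nStreams, 0.0);  // one slot per Stream
  std::vector<std::thread> streams;
  for (int s = 0; s < nStreams; ++s)
    streams.emplace_back([cache, s, eventsPerStream, &perStream] {
      for (int e = 0; e < eventsPerStream; ++e) perStream[s] += cache->calibration;
    });
  for (auto& t : streams) t.join();
  double total = 0.0;
  for (double v : perStream) total += v;
  return total;
}
```

Because the Streams hold a pointer to const, the compiler rejects any attempt to modify the cache during Event processing, which mirrors the rule that the data may not change after the begin transition.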

Stream Scratch

Stream scratch data covers all data structures which can be modified during the processing of one Event. To avoid concurrency problems, we duplicate these structures so each Stream has its own copy. Since these are Stream based they can be updated on an Event as well as on Stream begin Run or Stream begin LuminosityBlock transitions.

Stream based Job, Run and LuminosityBlock Summary

This pertains to summary information obtained from the Events seen by one particular Stream. For example, you can have a count of the number of Events seen and the number of Events passed by one Stream for one particular LuminosityBlock. Normally a summary based only on the Events seen by one Stream is not a particularly interesting quantity, since there is no a priori way to specify which Events will be seen by which Streams. However, having all Streams independently build a partial summary and then using those partial summaries to build a final Global summary is a useful pattern.

Global based Job, Run and LuminosityBlock Summary

This is data which summarizes data obtained for all Events in a given Run, LuminosityBlock or Job. As mentioned for the Stream based summaries, a good pattern to follow to create a Global summary is first to construct Stream partial summaries and, once those partial summaries are completed, to combine them to form the Global summary.
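The partial-summary pattern can be sketched as follows. ToySummary, the round-robin assignment of Events to Streams and the "keep even Events" selection are all assumptions made for the illustration. Each Stream fills its own partial summary without any locking, and the partial summaries are merged only after every Stream has finished.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical summary for one LuminosityBlock.
struct ToySummary {
  int seen = 0;
  int passed = 0;
  void add(const ToySummary& other) {
    seen += other.seen;
    passed += other.passed;
  }
};

// Build one partial summary per Stream, then merge them into a Global summary.
// Assumption: event e is handled by Stream e % nStreams.
ToySummary summarize(int nEvents, int nStreams) {
  std::vector<ToySummary> partial(nStreams);  // one per Stream, never shared
  std::vector<std::thread> streams;
  for (int s = 0; s < nStreams; ++s)
    streams.emplace_back([&partial, s, nEvents, nStreams] {
      for (int e = s; e < nEvents; e += nStreams) {
        ++partial[s].seen;
        if (e % 2 == 0) ++partial[s].passed;  // assumed selection: keep even events
      }
    });
  for (auto& t : streams) t.join();  // stands in for the end of the LuminosityBlock
  ToySummary global;
  for (const auto& p : partial) global.add(p);
  return global;
}
```

The Global result does not depend on which Stream saw which Event, even though each individual partial summary does.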

Comparing Stream and Global Modules

As mentioned above, the two broad categories of threaded framework modules differ in how their member data is used. Each category accommodates the standard uses of module data described in the previous section in its own way.

Stream Module

A Stream module is one which only cares about Events individually and not as a group and therefore works just fine if it only sees a subset of the Events coming from the Source. Most EDProducers and EDFilters fall into this category. This 'Event independence' nature allows us to create an instance of the module per Stream. Since the module is per Stream it means its member data automatically functions as 'Configuration', 'Stream Scratch' or 'Stream based Summary' data. The various extension interfaces available for a Stream module allow one to also associate 'Shared', 'Run/LuminosityBlock Cache' and 'Global Summary' style data to per Stream module instances which arise from the same module configuration. A full description of the C++ interface available to a Stream Module can be found here.
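As an illustration only, the sketch below mimics a Stream module without using the framework's base classes. ToyStreamFilter and its members are hypothetical. One instance is created per Stream from the same configuration, so ordinary non-const member data can serve as Configuration, Stream Scratch and Stream based Summary data.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical Stream-style module; the framework's real interface is not used here.
class ToyStreamFilter {
public:
  explicit ToyStreamFilter(int threshold) : threshold_(threshold) {}
  // Non-const: free to change member data because no other Stream sees this instance.
  bool filter(const std::vector<int>& event) {
    scratch_.clear();  // buffer reused from Event to Event
    for (int x : event)
      if (x > threshold_) scratch_.push_back(x);
    const bool pass = !scratch_.empty();
    ++seen_;
    if (pass) ++passed_;
    return pass;
  }
  int seen() const { return seen_; }
  int passed() const { return passed_; }
private:
  const int threshold_;       // 'Configuration'
  std::vector<int> scratch_;  // 'Stream Scratch'
  int seen_ = 0;              // 'Stream based Summary'
  int passed_ = 0;            // 'Stream based Summary'
};

// Replicate the module once per Stream and run the Streams concurrently.
int totalPassed(int nStreams, int eventsPerStream) {
  std::vector<ToyStreamFilter> modules;
  modules.reserve(nStreams);
  for (int s = 0; s < nStreams; ++s) modules.emplace_back(5);  // same configuration
  std::vector<std::thread> streams;
  for (int s = 0; s < nStreams; ++s)
    streams.emplace_back([&modules, s, eventsPerStream] {
      for (int e = 0; e < eventsPerStream; ++e) modules[s].filter({e % 10});
    });
  for (auto& t : streams) t.join();
  int total = 0;
  for (const auto& m : modules) total += m.passed();
  return total;
}
```

No locks or atomics appear anywhere in the module itself, which is the main attraction of this style.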

Global Module

A Global module is one which must see all Events coming from a Source in order to do its job. Most EDAnalyzers and OutputModules fall into this category. Because the module needs to see all transitions, one instance of the module must be shared across all Streams. This means the member data of a Global module can only function as 'Configuration' or 'Shared' data. In other words, a Global module's member data must either be immutable or thread-safe. The extension interfaces available for a Global module do allow one to associate the data needed for all the other module data uses with a given module instance. A full description of the C++ interface available to a Global Module can be found here.
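A comparable sketch for a Global-style module, again with hypothetical names and without the framework's base classes: a single instance is called from every Stream at once, so its per-Event method is const and the only member that changes is an atomic counter.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical Global-style module: one instance shared by all Streams.
class ToyGlobalCounter {
public:
  explicit ToyGlobalCounter(int threshold) : threshold_(threshold) {}
  // const: the only member that changes is atomic, i.e. thread-safe.
  void analyze(int value) const {
    if (value > threshold_) passed_.fetch_add(1, std::memory_order_relaxed);
  }
  int passed() const { return passed_.load(); }
private:
  const int threshold_;                 // 'Configuration': immutable
  mutable std::atomic<int> passed_{0};  // 'Shared': thread-safe read/write
};

// Call the single module instance from several Streams concurrently.
int countPassed(int nStreams, int eventsPerStream) {
  ToyGlobalCounter module(5);
  std::vector<std::thread> streams;
  for (int s = 0; s < nStreams; ++s)
    streams.emplace_back([&module, eventsPerStream] {
      for (int e = 0; e < eventsPerStream; ++e) module.analyze(e % 10);
    });
  for (auto& t : streams) t.join();
  return module.passed();
}
```

Declaring the method const lets the compiler flag accidental writes to ordinary members. The scratch buffer used in a Stream-style module would not compile here without extra work.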

Comparison

  • Memory Usage: A Global module will use less memory than a Stream module since we replicate a new Stream module instance for each Stream in the system. However, if the work being done by the module requires an extensive amount of 'Stream Scratch' data the memory difference is likely to be small.
  • Writeability: We expect that Stream modules would actually be easier to write in most cases since developers are very accustomed to using member data to store temporary data about the Event (i.e. 'Stream Scratch').

One Modules

A One module is a module which, for whatever reason, cannot be used by multiple threads simultaneously and is therefore thread-unsafe. Like a Global module, one instance of a One module is shared across all Streams. However, unlike a Global module, the framework enforces thread-safety by only allowing one Event, LuminosityBlock or Run at a time to be passed to a One module instance. The use of One modules can be very bad for performance. If we had a job with N Streams then it is possible that N-1 Streams will be stuck waiting while a One module is processing one Event. If a One module needs to know about begin or end transitions for either a Run or LuminosityBlock, it will prohibit the system from fully finishing the begin transition processing of the next Run/LuminosityBlock until the One module has finished its end transition for the previous Run/LuminosityBlock. The reason is that the One module may be accumulating information over that LuminosityBlock or Run. This constraint once again affects the performance of the job. A full description of the C++ interface available to a One Module can be found here.
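The serialization described above can be imitated with a mutex. In this sketch ToyOneModule and ToyOneRunner are hypothetical, and the mutex merely stands in for whatever mechanism the framework actually uses. The module itself has no protection, so it is the caller that guarantees calls never overlap, and every other Stream has to wait its turn.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical thread-unsafe module: plain member data, no protection of its own.
class ToyOneModule {
public:
  void analyze(int value) { sum_ += value; }  // safe only if calls never overlap
  long sum() const { return sum_; }
private:
  long sum_ = 0;
};

// Stand-in for the framework side: serializes every call into the module.
class ToyOneRunner {
public:
  void process(int value) {
    std::lock_guard<std::mutex> guard(mutex_);  // the other Streams wait here
    module_.analyze(value);
  }
  long sum() const { return module_.sum(); }
private:
  std::mutex mutex_;
  ToyOneModule module_;
};

// Feed the single module from several Streams and return its accumulated sum.
long sumAcrossStreams(int nStreams, int eventsPerStream) {
  ToyOneRunner runner;
  std::vector<std::thread> streams;
  for (int s = 0; s < nStreams; ++s)
    streams.emplace_back([&runner, eventsPerStream] {
      for (int e = 0; e < eventsPerStream; ++e) runner.process(1);
    });
  for (auto& t : streams) t.join();
  return runner.sum();
}
```

The result is correct, but while one Stream holds the lock the remaining N-1 Streams can do nothing with this module, which is the performance cost the text warns about.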

Classic Modules

A Classic module inherits from the original module base classes used in CMSSW. A Classic module is a One module which says it is dependent upon both LuminosityBlocks and Runs. Therefore a configuration which contains a Classic module will have the same restrictions on processing concurrent LuminosityBlocks or Runs as a One module. In addition, all Classic modules tell the system that they could potentially interfere with all other Classic modules. Therefore the system will only allow one Classic module to run at a time. If there are many Classic modules in a configuration the performance can be severely impacted, all the way to only one thread doing work at a time.
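The "only one Classic module at a time" rule can be pictured as a single lock shared by every such module in the job. The sketch below assumes that picture. ToyClassicResource and ToyClassicModule are hypothetical, and the mutex is a stand-in for the framework's real mechanism. It records the largest number of Classic-style modules ever found running together.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// One resource shared by every Classic-style module in the job (hypothetical).
struct ToyClassicResource {
  std::mutex mutex;
  int running = 0;     // modules currently executing; only touched under the mutex
  int maxRunning = 0;  // largest value of 'running' ever observed
};

class ToyClassicModule {
public:
  explicit ToyClassicModule(ToyClassicResource& shared) : shared_(shared) {}
  void process() {
    std::lock_guard<std::mutex> guard(shared_.mutex);  // held for the whole call
    ++shared_.running;
    shared_.maxRunning = std::max(shared_.maxRunning, shared_.running);
    ++calls_;  // the module's own, unprotected state
    --shared_.running;
  }
  int calls() const { return calls_; }
private:
  ToyClassicResource& shared_;
  int calls_ = 0;
};

// Run two different modules from several Streams and report the peak concurrency.
int maxConcurrentClassic(int nStreams, int eventsPerStream) {
  ToyClassicResource shared;
  ToyClassicModule a(shared), b(shared);  // two different modules, one lock
  std::vector<std::thread> streams;
  for (int s = 0; s < nStreams; ++s)
    streams.emplace_back([&a, &b, s, eventsPerStream] {
      for (int e = 0; e < eventsPerStream; ++e) ((s + e) % 2 == 0 ? a : b).process();
    });
  for (auto& t : streams) t.join();
  return shared.maxRunning;
}
```

Even though a and b are unrelated modules, the peak concurrency never exceeds one. This is why a configuration dominated by Classic modules can degrade to a single thread doing work.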

The original EDProducer base class interface has been slightly modified: the beginRun and endRun methods now take edm::Run const&, and the beginLuminosityBlock and endLuminosityBlock methods similarly now take edm::LuminosityBlock const& as an argument. This means a Classic EDProducer can no longer put data into the Run or LuminosityBlock. If you need to do that, you must transition to a Stream, Global or One module.

Review status

Reviewer/Editor and Date Comments
Last reviewed by: ChrisDJones - 14-Nov-2012 Created the page

Responsible: ChrisDJones
Last reviewed by:

Topic revision: r9 - 2018-02-16 - ChrisDJones
 