cmsRun no longer supports forking mode. The feature was removed from CMS software in the 9_3_X release series and is absent from all later ones, because it was not being used and it interfered with the development of multithreaded software. This section is kept for people who may still be using older releases; nothing described here works in modern releases, and the section should probably be deleted at some point.

multiprocess cmsRun

cmsRun can be configured to run in multiprocess or forking mode. In this mode of operation, the main cmsRun executable:

  • reads the first Event from the first input file
  • loads all EventSetup conditions associated with the IOVs (Run, LumiSection, Time) of the first event
  • closes the input file
  • forks up to maxChildProcesses cmsRun child processes, optionally setting the CPU affinity of the children so that each runs on a separate logical processor
  • controls the forked processes, instructing them to process one block of maxSequentialEventsPerChild events at a time and to skip the events processed by the other children
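
The block-wise distribution of events described in the last step can be pictured with a small stand-alone model. This is only a sketch, not cmsRun code: the function name assign_blocks is hypothetical, and the round-robin order is an assumption, since the text above does not say in which order the master hands out blocks.

```python
from typing import Dict, List, Tuple


def assign_blocks(total_events: int,
                  max_child_processes: int,
                  max_sequential_events_per_child: int) -> Dict[int, List[Tuple[int, int]]]:
    """Return, for each child index, the half-open event ranges [first, last) it processes.

    ASSUMPTION: blocks are handed out round-robin; the real master may use
    a different order.
    """
    if max_child_processes < 1 or max_sequential_events_per_child < 1:
        raise ValueError("both limits must be at least 1")

    blocks: Dict[int, List[Tuple[int, int]]] = {
        child: [] for child in range(max_child_processes)
    }
    first = 0
    block_index = 0
    while first < total_events:
        # Each block holds at most max_sequential_events_per_child events.
        last = min(first + max_sequential_events_per_child, total_events)
        blocks[block_index % max_child_processes].append((first, last))
        first = last
        block_index += 1
    return blocks


if __name__ == "__main__":
    for child, ranges in assign_blocks(450, 2, 100).items():
        print(child, ranges)
```

Every event falls into exactly one block, so each child skips precisely the ranges assigned to the others.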

The advantage of this mode is that the EventSetup conditions are preloaded in memory by the master, and shared (via the copy-on-write mechanism) with all the forked processes, leading to a substantial reduction in memory usage.
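
The preload-then-fork pattern can be illustrated outside CMSSW with plain POSIX fork. The sketch below is not how cmsRun is implemented: run_forked and its arguments are hypothetical names, and the dictionary stands in for the EventSetup conditions. It only shows that data loaded before the fork is readable in every child without being loaded again.

```python
import os


def run_forked(conditions: dict, n_children: int) -> list:
    """Fork n_children processes that read data loaded by the parent.

    Returns the exit code of each child, in fork order (POSIX only).
    """
    expected_size = len(conditions)  # computed once, before forking
    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: `conditions` was inherited from the parent; its memory
            # pages stay shared until one of the processes writes to them.
            ok = len(conditions) == expected_size
            os._exit(0 if ok else 1)
        pids.append(pid)

    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        exit_codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
    return exit_codes


if __name__ == "__main__":
    preloaded = {"run": 1, "lumi": 1, "payload": list(range(1000))}
    print(run_forked(preloaded, 4))
```

In CPython the sharing is less effective than in C++, because reference counting writes to object headers even on read access and so dirties the shared pages; the example demonstrates the pattern, not the memory saving.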

To enable the multiprocess mode, add this to the process options:

process.options = cms.untracked.PSet(
    multiProcesses = cms.untracked.PSet(
        maxChildProcesses                       = cms.untracked.int32( 16 ),
        maxSequentialEventsPerChild             = cms.untracked.uint32( 100 ),
        setCpuAffinity                          = cms.untracked.bool( True ),
        continueAfterChildFailure               = cms.untracked.bool( False ),
        eventSetupDataToExcludeFromPrefetching  = cms.untracked.PSet( )
    )
)

output files and logs

When running in multiprocess mode, the output files produced by the PoolOutputModule, DQMRootOutputModule and DQMFileSaver are automatically renamed, so that different processes do not overwrite the same file. For example, if a job is configured to write its output to data.root and is run with 16 forked children, it will create 16 output files named data_00.root, data_01.root, ..., data_15.root.

In a similar way, the standard output and standard error from the forked processes are redirected to separate files named after the pid of the master process: redirectout_31269_00.log, redirectout_31269_01.log, and so on.
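
When collecting the results of such a job, the expected file names can be generated from the two examples above. The helpers below are a hypothetical convenience, not part of CMSSW. The two-digit, zero-padded index is inferred from the examples (the behaviour beyond 100 children is not stated), and only the redirectout_ pattern shown above is modelled.

```python
import os


def child_output_name(filename: str, child_index: int, width: int = 2) -> str:
    """Insert the zero-padded child index before the file extension."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{child_index:0{width}d}{ext}"


def child_log_name(master_pid: int, child_index: int, width: int = 2) -> str:
    """Build the redirected log name from the master pid and the child index."""
    return f"redirectout_{master_pid}_{child_index:0{width}d}.log"


if __name__ == "__main__":
    # ASSUMPTION: 16 children and master pid 31269, as in the examples above.
    outputs = [child_output_name("data.root", i) for i in range(16)]
    logs = [child_log_name(31269, i) for i in range(16)]
    print(outputs[0], outputs[-1])
    print(logs[0], logs[-1])
```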

known issues

  • In order to properly run in multiprocess mode, some modules (notably Sources, OutputModules, some Services and all modules that write files with specific names) may need to implement the preForkReleaseResources() and postForkReacquireResources(...) methods.
    As of CMSSW 7.0.0, at least the PoolOutputModule, DQMRootOutputModule, DQMFileSaver, RandomNumberGeneratorService and all sources inheriting from InputSource should implement them properly.
    Using other types of sources, output modules, or services may lead to unexpected behaviour.

  • Reading input files with a PoolSource over eos/xrootd in a multiprocess job does not currently work (all forked processes hang when they try to open the input files). Reading from local files works fine.

  • Running in multiprocess mode generates spurious endOfRun / beginOfRun transitions in the forked processes when they run across different files.

-- AndreaBocci - 01 Apr 2014

Topic revision: r3 - 2018-07-12 - DavidDagenhart
 