Framework Handles for Memory Optimization


Contacts

Introduction

The following are several mechanisms provided by the framework group that can be used to control memory usage of cmsRun.

Deleting Products Early

If large temporary data structures are used to pass information between modules in the framework, the framework can delete those data structures as soon as the last module that reads the data has been run for that Event.

Because it is easy to misconfigure this optimization, it is strongly advised to use it only as a last resort. It is far better to first check whether the temporary data structure can be redesigned, or whether the algorithms involved can be modified slightly to minimize the amount of information they need to pass.

This feature is set up via the job configuration, and two things are required. First, the data products that can be deleted early must be listed in the canDeleteEarly untracked vstring parameter of the top level options PSet. Second, each module that uses such a data product must contain an untracked vstring parameter named mightGet that contains the name of the data product. The name of the data product must be in the same form as used in the OutputModule outputCommands specification, except that wildcards are not allowed. That is, the data product name is the exact name of the ROOT branch that would be used to hold that data.

The system waits to delete a product until all the modules that need the data have either been run or will no longer be run. That is, it works even if a module is on a path after an EDFilter that has stopped the processing of the Event, so that the module will not be run for this Event. It also works properly if the same module appears on multiple paths, even if all those paths have EDFilters.
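The rule described above can be illustrated with a small toy model in plain Python. This is only a sketch of the bookkeeping, not the framework's implementation: the function name simulate_event, the representation of paths as lists of labels, and the filter labels filterA and filterB are all hypothetical, and paths are processed one after another for simplicity.

```python
def simulate_event(paths, filter_passes, consumers):
    """Toy model of one Event (not the real framework code).

    paths         : list of paths, each a list of module labels in order
    filter_passes : dict mapping an EDFilter label to True (pass) or False (stop)
    consumers     : labels of the modules that list the product in mightGet
    Returns a log of what happened, including when the product is deleted.
    """
    has_run = set()
    # Number of path positions at which each consumer could still be reached.
    remaining = {c: sum(path.count(c) for path in paths) for c in consumers}
    log = []
    deleted = False

    def maybe_delete():
        nonlocal deleted
        # Delete once every consumer has run or can no longer be run.
        if not deleted and all(n == 0 for n in remaining.values()):
            deleted = True
            log.append("delete product")

    for path in paths:
        for i, label in enumerate(path):
            if label not in has_run:  # a module runs at most once per Event
                has_run.add(label)
                log.append("run " + label)
            if label in remaining:
                remaining[label] = 0  # this consumer is done with the product
                maybe_delete()
            if not filter_passes.get(label, True):
                # The filter stopped this path: later modules on this path
                # will not be reached through it.
                for skipped in path[i + 1:]:
                    if remaining.get(skipped, 0) > 0:
                        remaining[skipped] -= 1
                maybe_delete()
                break
    return log


paths = [["makeFoo", "filterA", "fred"],
         ["makeFoo", "filterB", "fred", "wilma"]]

# fred is skipped on the first path but still runs on the second one.
print(simulate_event(paths, {"filterA": False, "filterB": True}, {"fred", "wilma"}))
# Both filters fail: no consumer runs, yet the product can still be deleted.
print(simulate_event(paths, {"filterA": False, "filterB": False}, {"fred", "wilma"}))
```

In this model, passing consumers={"makeFoo"} deletes the product immediately after makeFoo runs, because the producer is then the only module the system has to wait for.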

Example

Say that a std::vector<Foo> is generated by a module with label makeFoo and an empty product instance name in the RECO process. In addition, the EDProducers with labels fred and wilma both use that data product. To have that data product deleted early, add the following to the configuration:

process.options = cms.untracked.PSet( canDeleteEarly = cms.untracked.vstring("Foos_makeFoo__RECO") )

process.fred = cms.EDProducer(...., mightGet = cms.untracked.vstring("Foos_makeFoo__RECO") ... )
process.wilma = cms.EDProducer(...., mightGet = cms.untracked.vstring("Foos_makeFoo__RECO") ... )
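The name Foos_makeFoo__RECO used above is made of four fields joined by underscores: a class name (Foos here), the module label, the product instance name (empty in this example, hence the double underscore), and the process name. A minimal helper to compose such a name and reject wildcards might look like the sketch below. The function branch_name is hypothetical and not part of the framework, and treating * and ? as the wildcard characters is an assumption.

```python
def branch_name(class_name, module_label, instance_name, process_name):
    """Compose a product name of the form used in canDeleteEarly and mightGet.

    Hypothetical helper for illustration; it only joins the four fields
    and refuses wildcards, which are not allowed for these parameters.
    """
    parts = [class_name, module_label, instance_name, process_name]
    for part in parts:
        if "*" in part or "?" in part:
            raise ValueError("wildcards are not allowed: %r" % part)
    return "_".join(parts)


name = branch_name("Foos", "makeFoo", "", "RECO")
print(name)  # Foos_makeFoo__RECO
```

Building the string once and reusing it for canDeleteEarly and for every mightGet avoids the typos that make this optimization easy to misconfigure.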

Deleting a data product that is never used

It is possible to tell the system to delete a data product that is not used by any module. You do that by declaring that the module which creates the object also mightGet that object. This causes the system to delete the data product right after the module which creates it has been run.

process.options = cms.untracked.PSet( canDeleteEarly = cms.untracked.vstring("Foos_makeFoo__RECO") )
process.makeFoo = cms.EDProducer(...., mightGet = cms.untracked.vstring("Foos_makeFoo__RECO") ... )

Controlling ROOT's Storage Buffers

The PoolSource and PoolOutputModule have parameters which can be used to control how much memory ROOT will reserve for use while reading and writing data.

PoolSource parameters

The following parameters are useful for controlling the amount of memory used by the PoolSource [use the command edmPluginHelp -p PoolSource for more details].
  • cacheSize : Size of the read-ahead cache used by ROOT. The larger the cache, the fewer reads ROOT has to make to the file system, and therefore the lower the latency.
  • treeMaxVirtualSize : Set the size of ROOT's TTree TBasket cache. Normally ROOT just uses the size specified from the ROOT file itself.
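As a sketch, the two parameters could be set as follows. The process name, the file name, and both numeric values are placeholders and not recommendations. The parameter types shown (untracked uint32 and untracked int32) and the assumption that the sizes are given in bytes should be checked against the output of edmPluginHelp -p PoolSource for the release in use.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")  # hypothetical process name

process.source = cms.Source(
    "PoolSource",
    # Hypothetical input file.
    fileNames=cms.untracked.vstring("file:input.root"),
    # Read-ahead cache: a smaller value saves memory at the cost of more reads.
    cacheSize=cms.untracked.uint32(10 * 1024 * 1024),  # illustrative value
    # Size of the TTree TBasket cache, overriding what the file specifies.
    treeMaxVirtualSize=cms.untracked.int32(20 * 1024 * 1024),  # illustrative value
)
```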

PoolOutputModule parameters

The following parameters are useful for controlling the amount of memory used by the PoolOutputModule [use the command edmPluginHelp -p PoolOutputModule for more details].
  • splitLevel : Specifies how many branches to split a data product into. More branches usually mean faster read back, but also more memory buffers. However, a lower splitLevel usually also means less compression of the buffer, which can lead to the need for larger compressed buffer sizes.
  • overrideInputFileSplitLevels : With the default value of False, a data product that is read from a file is written with the split level from that file and not with the PoolOutputModule's splitLevel.
  • eventAutoFlushCompressedSize : Controls the maximum sum of memory used by ROOT for all 'compressed' output buffers for a file. This indirectly controls the size of the uncompressed buffers based on how much each object is compressed. The larger the eventAutoFlushCompressedSize the faster ROOT can read back the data and the smaller the output file.
  • basketSize : Controls the initial size of the uncompressed buffers. However, after ROOT has learned how well each branch compresses, ROOT resets the basket size for each individual branch based on the value of eventAutoFlushCompressedSize.
  • compressionLevel : The higher the compression level, the smaller the needed compression buffer. However, a higher compression level can also increase the time needed to compress the objects and, depending on the algorithm, the time needed to decompress the data.
  • compressionAlgorithm : The algorithm used to compress the data in the file. Different algorithms use different amounts of memory and different amounts of time to compress and decompress the file. In general, the default ZLIB is much faster and takes less memory but doesn't compress as well as the optional LZMA.
  • outputCommands : The fewer data products written to the file, the fewer buffers are needed.
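The parameters above could be combined as in the following sketch. The process name, module label, file name, keep statement, and all numeric values are placeholders chosen for illustration and not tuned recommendations. The parameter types are assumptions to be verified with edmPluginHelp -p PoolOutputModule.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")  # hypothetical process name

process.out = cms.OutputModule(
    "PoolOutputModule",
    fileName=cms.untracked.string("output.root"),  # hypothetical file name
    # Fewer branches per product means fewer buffers, but usually slower read back.
    splitLevel=cms.untracked.int32(0),
    # Apply splitLevel also to products that were read from an input file.
    overrideInputFileSplitLevels=cms.untracked.bool(True),
    # Cap on the summed size of the compressed output buffers.
    eventAutoFlushCompressedSize=cms.untracked.int32(5 * 1024 * 1024),
    # Initial size of each uncompressed buffer, before ROOT readjusts it.
    basketSize=cms.untracked.int32(16384),
    compressionAlgorithm=cms.untracked.string("ZLIB"),
    compressionLevel=cms.untracked.int32(7),
    # Write only what is needed: fewer products means fewer buffers.
    outputCommands=cms.untracked.vstring("drop *", "keep *_fred_*_*"),
)

process.end = cms.EndPath(process.out)
```

Since several of these settings trade memory against file size or read-back speed, it is worth changing one at a time and measuring the effect on a representative job.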

-- ChrisDJones - 20-Feb-2012

Topic revision: r1 - 2012-02-20 - ChrisDJones
 