Brian Bockelman's Work on Understanding CMSSW I/O

This page is dedicated to working through various ROOT I/O options and optimizations and applying them to CMSSW. The ultimate goal is to improve the overall performance with all SEs, and to make WAN analysis reasonable.

Read Optimizations

These optimizations work when the file is read.

Fix for TTreeCache

Almost all ROOT I/O optimizations depend on TTreeCache working, because TTreeCache is what allows ROOT to reliably predict the reads it will perform.

How TTreeCache works

  1. The cacheSize in the CMSSW pool input module is set to non-zero in your .py file; I recommend 20MB.
  2. For the first 100 events, reads are performed as normal; TTreeCache observes which branches you are using.
  3. After the first 100 events, TTreeCache knows exactly what buffers it will need to read for the rest of the execution.
  4. TTreeCache reads in the next 20MB of data it will use.
  5. CMSSW runs through the events in the file until the TTreeCache no longer has the necessary data.
  6. TTreeCache then refills. Steps 4-6 repeat until all events have been read from the file.

When TTreeCache is working (and your job is I/O-bound, not CPU-bound), you will see the first 100 events run at normal speed. Then you will see CMSSW stall for several seconds on one event (as it loads the data), and then run very quickly through at least 100 events. This "stall a bit, then go fast" pattern will repeat. If you use Xrootd, it may be able to prefetch data, eliminating the "stall" portion.
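
The train-then-refill cycle described above can be sketched as a toy simulation. This is not ROOT code: the function name simulate_ttreecache, the per-event sizes, and the assumption that a refill grabs whole events until the cache is full are all illustrative simplifications. It only shows where the "stall" events would fall for a given cache size.

```python
# Toy model of the train-then-refill cycle described above.
# Not ROOT code: the function name and event sizes are illustrative assumptions.

def simulate_ttreecache(event_sizes, cache_size, training_events=100):
    """Return the event indices at which a simulated cache refill happens.

    event_sizes     -- bytes of used-branch data per event (hypothetical numbers)
    cache_size      -- cache capacity in bytes (the cacheSize parameter)
    training_events -- events read normally while the cache learns the branches
    """
    refills = []
    cached_until = training_events  # first event index NOT covered by the cache
    for event in range(training_events, len(event_sizes)):
        if event >= cached_until:
            # The cache is exhausted: "stall" here and load the next chunk.
            refills.append(event)
            filled = 0
            end = event
            # Always take at least one event so the loop makes progress,
            # even if a single event is larger than the cache.
            while end < len(event_sizes) and (
                end == event or filled + event_sizes[end] <= cache_size
            ):
                filled += event_sizes[end]
                end += 1
            cached_until = end
        # Otherwise the event is served from the cache (the "fast" phase).
    return refills


if __name__ == "__main__":
    sizes = [50 * 1024] * 1000  # 1000 events of 50 KB each (made-up numbers)
    print(simulate_ttreecache(sizes, cache_size=20 * 1024 * 1024))
```

With these made-up sizes, a 20MB cache covers 409 events per refill, so the simulated stalls fall on events 100, 509, and 918.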

TTreeCache patch

By default, CMSSW requests all branches from TTreeCache. This reads in too much data, and it also triggers a ROOT bug that causes at least 2x that amount of data to be read. The patch below lets ROOT do its own training to discover which branches are used.

First, check out the input module and file adaptor packages:

cmsenv
export CVSROOT=:pserver:anonymous@cmscvs.cern.ch:/cvs_server/repositories/CMSSW
addpkg IOPool/Input
addpkg IOPool/TFileAdaptor

Then, apply the following patch:

Before CMSSW_3_5_4

/usr/bin/curl -k https://twiki.cern.ch/twiki/pub/Main/CmsIOWork/ttreecache_rollup3.patch | patch -p0

After CMSSW_3_5_4

/usr/bin/curl -k https://twiki.cern.ch/twiki/pub/Main/CmsIOWork/ttreecache_rollup5.patch | patch -p0

Build your working area:

scram b

Enable TTreeCache usage

Add the cacheSize variable to the Source module. Change this:
process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring("/store/mc/Summer09/PhotonJet_Pt3000/AODSIM/MC_31X_V3_AODSIM-v1/0019/F2D13D2D-6180-DE11-902D-001A9254452C.root"),
                            )
To this:
process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring("/store/mc/Summer09/PhotonJet_Pt3000/AODSIM/MC_31X_V3_AODSIM-v1/0019/F2D13D2D-6180-DE11-902D-001A9254452C.root"),
                            cacheSize = cms.untracked.uint32(20*1024*1024),
                            )
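
The expression 20*1024*1024 above is the cache size in bytes. As a minimal sketch, the helper below (cache_size_bytes is a hypothetical name, not part of CMSSW) converts megabytes to bytes and checks that the result fits in the unsigned 32-bit parameter.

```python
# Hypothetical helper, not part of CMSSW: convert a cache size in MB to the
# byte count passed to cms.untracked.uint32(...) and check that it fits.

UINT32_MAX = 2**32 - 1


def cache_size_bytes(megabytes):
    """Return the cache size in bytes for a size given in MB."""
    size = int(megabytes * 1024 * 1024)
    if not 0 <= size <= UINT32_MAX:
        raise ValueError("cacheSize %d does not fit in a uint32" % size)
    return size


if __name__ == "__main__":
    print(cache_size_bytes(20))  # 20971520, same as 20*1024*1024
```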

Fix Storage-Only Mode

Storage-only mode does not work unless you apply a patch and turn on caching with a non-zero cacheSize.

The patch is only necessary prior to CMSSW_3_5_4.

First, check out the TFileAdaptor package:

export CVSROOT=:pserver:anonymous@cmscvs.cern.ch:/cvs_server/repositories/CMSSW
PackageManagement.pl --anoncvs --pack "IOPool/TFileAdaptor" --release CMSSW_3_3_6

Then, apply this patch:

/usr/bin/curl https://twiki.cern.ch/twiki/pub/Main/CmsIOWork/cache_storage.patch | patch -p0

To enable storage-only mode (any CMSSW version), make the following change to your file adaptor configuration:

process.AdaptorConfig = cms.Service("AdaptorConfig", 
    enable=cms.untracked.bool(True),
    stats = cms.untracked.bool(True),
    cacheHint = cms.untracked.string("storage-only"),
)

Additionally, change the cacheSize to a non-zero value for your input module. It should look something like this:

process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring("/store/mc/Summer09/PhotonJet_Pt3000/AODSIM/MC_31X_V3_AODSIM-v1/0019/F2D13D2D-6180-DE11-902D-001A9254452C.root"),
                            cacheSize = cms.untracked.uint32(20*1024*1024),
                            )

Storage-only mode is almost never explicitly enabled unless you send jobs to a POSIX site (in which case, it is the fastest method available).

WARNING: Storage-only mode should never be set manually in CRAB jobs. This is because some storage systems (RFIO) cause job crashes in this mode. For CRAB jobs, we always recommend cacheHint="auto-detect".

Avoid RFIO crashes

If you plan on submitting to DPM or Castor using CRAB, you need to make the following changes to your source and adaptor config:

process.source.cacheSize = cms.untracked.uint32(20*1024*1024)
process.AdaptorConfig = cms.Service("AdaptorConfig", 
    enable=cms.untracked.bool(True),
    stats = cms.untracked.bool(True),
    cacheHint = cms.untracked.string("auto-detect"),
    readHint = cms.untracked.string("direct-unbuffered"),
)

(i.e., set readHint to direct-unbuffered, set cacheHint to auto-detect, and use a 20MB cache).

It is fine to keep this readHint and cacheHint setting for all CRAB jobs.
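
The recommendations from the last two sections can be summarized in a small lookup. This is only a sketch: recommended_hints is a hypothetical helper, it returns a plain dict rather than a CMSSW object, and the fallback for jobs that are neither CRAB nor POSIX is an assumption, since this page gives no advice for that case.

```python
# Hypothetical helper summarizing the advice on this page; it returns a plain
# dict of values to copy into process.source / AdaptorConfig by hand.

CACHE_SIZE = 20 * 1024 * 1024  # 20MB, the value recommended on this page


def recommended_hints(crab_job, posix_site=False):
    """Return the hint settings suggested for a job of the given kind."""
    if crab_job:
        # Never force storage-only in CRAB jobs: RFIO sites can crash.
        return {
            "cacheSize": CACHE_SIZE,
            "cacheHint": "auto-detect",
            "readHint": "direct-unbuffered",
        }
    if posix_site:
        # Storage-only is the fastest method when reading from a POSIX site.
        return {"cacheSize": CACHE_SIZE, "cacheHint": "storage-only"}
    # Assumption: no advice is given for this case; fall back to auto-detect.
    return {"cacheSize": CACHE_SIZE, "cacheHint": "auto-detect"}


if __name__ == "__main__":
    print(recommended_hints(crab_job=True))
```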

Switching to native ROOT adaptors

It is possible to turn off the CMSSW AdaptorConfig and use the ROOT native adaptors. We currently do not recommend this, except possibly for xrootd. If you add the following to your adaptor config:
native=cms.untracked.string("rfio")
then it should switch to the ROOT adaptors (but you will lose performance statistics from the job).

The entire AdaptorConfig might look like this:

      process.AdaptorConfig = cms.Service("AdaptorConfig",
          stats = cms.untracked.bool(True),
          native = cms.untracked.string("xrootd"),
      )

Topic attachments

Attachment                  Rev  Size   Date              Who             Comment
PATSimple.py.txt            r1   1.4 K  2009-12-04 03:33  BrianBockelman
cache_storage.patch         r1   1.1 K  2010-01-04 15:34  BrianBockelman  Patch to make storage-only cache work.
read_coalesce.patch         r2   2.7 K  2009-12-16 21:05  BrianBockelman  Version 2 of the read coalescing patch.
test_cfg.py.txt             r1   9.3 K  2009-12-04 04:06  BrianBockelman
ttreecache.patch            r1   0.8 K  2009-12-04 03:38  BrianBockelman
ttreecache_rollup.patch     r2   3.6 K  2010-01-20 02:46  BrianBockelman
ttreecache_rollup3.patch    r1   3.3 K  2010-01-26 22:29  BrianBockelman
ttreecache_rollup5.patch    r1   5.8 K  2010-04-23 18:23  BrianBockelman
ttreecache_training.patch   r1   0.7 K  2010-01-19 14:40  BrianBockelman  Patch to decrease TTreeCache training time from 100 events to 2 events.

Topic revision: r24 - 2010-06-14 - BrianBockelman
 