Brian Bockelman's Work on Understanding CMSSW I/O
This page works through various ROOT I/O options and optimizations and applies them to CMSSW. The ultimate goal is to improve overall performance with all storage elements (SEs) and to make WAN analysis practical.
Read Optimizations
These optimizations apply when a file is read.
Fix for TTreeCache
Almost all ROOT I/O optimizations depend on TTreeCache working. TTreeCache allows ROOT to reliably predict the reads it will perform.
How TTreeCache works
- The cacheSize in the CMSSW pool input module is set to non-zero in your .py file; I recommend 20MB.
- For the first 100 events, reads are performed as normal; TTreeCache observes which branches you are using.
- After the first 100 events, TTreeCache knows exactly what buffers it will need to read for the rest of the execution.
- TTreeCache reads in the next 20MB of data it will use.
- CMSSW runs through the events in the file until the TTreeCache no longer has the necessary data.
- TTreeCache then refills. The fill-and-process cycle in the last three steps repeats until all events have been read from the file.
When TTreeCache is working (and your job is I/O-bound, not CPU-bound), you will see the first 100 events run at normal speed. CMSSW will then stall for several seconds on one event (as it loads the data) and afterwards run very quickly through at least 100 events. This "stall a bit, then go fast" pattern repeats. If you use Xrootd, it may be able to prefetch data, eliminating the "stall" portion.
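The cycle described above can be illustrated with a toy model. This is only a sketch and not ROOT's actual implementation: the function name, the fixed per-event size of 50 KB, and the assumption that every event needs the same number of bytes are all hypothetical and chosen purely for illustration.

```python
def simulate_ttreecache(n_events, bytes_per_event, cache_size, learn_events=100):
    """Toy model of the learn / fill / process cycle (not ROOT's real code).

    Returns a list of (event_index, action) tuples. The action is "learn"
    for the uncached reads during the training phase and "refill" for each
    event at which the cache has to be loaded again.
    """
    if bytes_per_event <= 0 or cache_size < bytes_per_event:
        raise ValueError("the cache must hold at least one event")
    # Assumption: every event needs the same number of bytes.
    events_per_fill = cache_size // bytes_per_event
    log = []
    cached_until = 0  # first event index NOT covered by the current cache
    for i in range(n_events):
        if i < learn_events:
            log.append((i, "learn"))
        elif i >= cached_until:
            # This is the "stall": one large read that covers many events.
            log.append((i, "refill"))
            cached_until = i + events_per_fill
    return log


log = simulate_ttreecache(n_events=1000, bytes_per_event=50 * 1024,
                          cache_size=20 * 1024 * 1024)
refills = [i for i, action in log if action == "refill"]
print(len(refills), refills)
```

In this model a 1000-event job stalls only a handful of times after the training phase; every other event is served from memory.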
TTreeCache patch
By default, CMSSW requests all branches from TTreeCache. This reads in too much data and triggers a ROOT bug that causes at least 2x that amount of data to be read. The patch below lets ROOT do its own training to discover which branches are used.
First, check out the input module:
cmsenv
export CVSROOT=:pserver:anonymous@cmscvs.cern.ch:/cvs_server/repositories/CMSSW
addpkg IOPool/Input
addpkg IOPool/TFileAdaptor
Then, apply the following patch:
Before CMSSW_3_5_4
/usr/bin/curl -k https://twiki.cern.ch/twiki/pub/Main/CmsIOWork/ttreecache_rollup3.patch | patch -p0
After CMSSW_3_5_4
/usr/bin/curl -k https://twiki.cern.ch/twiki/pub/Main/CmsIOWork/ttreecache_rollup5.patch | patch -p0
Build your working area:
scram b
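The choice between the two patches above can be sketched as a small helper. The function names are hypothetical. The instructions say "before" and "after" CMSSW_3_5_4 but do not say which patch applies to CMSSW_3_5_4 itself, so this sketch refuses to guess for that release.

```python
import re

# Patch file names taken from the instructions above.
PATCH_BEFORE_354 = "ttreecache_rollup3.patch"
PATCH_AFTER_354 = "ttreecache_rollup5.patch"


def parse_release(name):
    """Turn 'CMSSW_3_3_6' (or 'CMSSW_3_6_0_pre2') into a tuple like (3, 3, 6)."""
    match = re.match(r"^CMSSW_(\d+)_(\d+)_(\d+)", name)
    if match is None:
        raise ValueError("not a CMSSW release name: %r" % name)
    return tuple(int(part) for part in match.groups())


def ttreecache_patch(release):
    """Return the patch file name for a release (hypothetical helper)."""
    version = parse_release(release)
    if version < (3, 5, 4):
        return PATCH_BEFORE_354
    if version > (3, 5, 4):
        return PATCH_AFTER_354
    # The page does not state which patch applies to 3_5_4 itself.
    raise ValueError("no patch is documented for CMSSW_3_5_4 itself")


print(ttreecache_patch("CMSSW_3_3_6"))
```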
Enable TTreeCache usage
Add the cacheSize variable to the Source module. Change this:
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("/store/mc/Summer09/PhotonJet_Pt3000/AODSIM/MC_31X_V3_AODSIM-v1/0019/F2D13D2D-6180-DE11-902D-001A9254452C.root"),
)
To this:
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("/store/mc/Summer09/PhotonJet_Pt3000/AODSIM/MC_31X_V3_AODSIM-v1/0019/F2D13D2D-6180-DE11-902D-001A9254452C.root"),
    cacheSize = cms.untracked.uint32(20*1024*1024),
)
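The expression 20*1024*1024 in the configuration above suggests that cacheSize is given in bytes; that reading is an assumption here, as is the hypothetical helper name. A minimal sketch that converts megabytes to such a value and checks that it fits the unsigned 32-bit parameter type:

```python
UINT32_MAX = 2**32 - 1  # largest value a uint32 parameter can hold


def cache_size_bytes(megabytes):
    """Convert a size in MB to bytes and check that it fits in a uint32."""
    size = int(megabytes * 1024 * 1024)
    if not 0 <= size <= UINT32_MAX:
        raise ValueError("cache size %d does not fit in a uint32" % size)
    return size


print(cache_size_bytes(20))
```

The returned integer can be passed to cms.untracked.uint32 in place of the literal expression.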
Fix Storage-Only Mode
Storage-only mode does not work unless you apply a patch and turn on caching with a non-zero cacheSize.
The patch is only necessary prior to CMSSW_3_5_4.
First, check out the TFileAdaptor package:
export CVSROOT=:pserver:anonymous@cmscvs.cern.ch:/cvs_server/repositories/CMSSW
PackageManagement.pl --anoncvs --pack "IOPool/TFileAdaptor" --release CMSSW_3_3_6
Then, apply this patch:
/usr/bin/curl https://twiki.cern.ch/twiki/pub/Main/CmsIOWork/cache_storage.patch | patch -p0
To enable (any CMSSW version), you will want to make the following change to your file adaptor configuration:
process.AdaptorConfig = cms.Service("AdaptorConfig",
    enable = cms.untracked.bool(True),
    stats = cms.untracked.bool(True),
    cacheHint = cms.untracked.string("storage-only"),
)
Additionally, change the cacheSize to a non-zero value for your input module. It should look something like this:
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("/store/mc/Summer09/PhotonJet_Pt3000/AODSIM/MC_31X_V3_AODSIM-v1/0019/F2D13D2D-6180-DE11-902D-001A9254452C.root"),
    cacheSize = cms.untracked.uint32(20*1024*1024),
)
Storage-only mode is almost never explicitly enabled unless you send jobs to a POSIX site (in which case, it is the fastest method available).
WARNING: Never set storage-only mode manually in CRAB jobs, because some storage systems (RFIO) cause job crashes in this mode. For CRAB jobs, we always recommend cacheHint="auto-detect".
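The guidance above can be summarized as a small decision helper. The function and argument names are hypothetical, and falling back to "auto-detect" when neither case applies is an assumption of this sketch rather than something the page states.

```python
def choose_cache_hint(submitted_via_crab, posix_site):
    """Pick a cacheHint string following the recommendations above."""
    if submitted_via_crab:
        # CRAB jobs may land on RFIO storage, where storage-only crashes.
        return "auto-detect"
    if posix_site:
        # On a POSIX site, storage-only is described as the fastest method.
        return "storage-only"
    # Assumption: default to auto-detect in every other case.
    return "auto-detect"


print(choose_cache_hint(submitted_via_crab=False, posix_site=True))
```

The CRAB check comes first on purpose, so that a CRAB job never ends up with "storage-only" even if the target is believed to be a POSIX site.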
Avoid RFIO crashes
If you plan on submitting to DPM or Castor using CRAB, make the following changes to your source and adaptor config:
process.source.cacheSize = cms.untracked.uint32(20*1024*1024)
process.AdaptorConfig = cms.Service("AdaptorConfig",
    enable = cms.untracked.bool(True),
    stats = cms.untracked.bool(True),
    cacheHint = cms.untracked.string("auto-detect"),
    readHint = cms.untracked.string("direct-unbuffered"),
)
(i.e., set readHint to direct-unbuffered, set cacheHint to auto-detect, and use a 20MB cache).
It is fine to keep these readHint and cacheHint settings for all CRAB jobs.
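Before submitting, the three settings can be sanity-checked with plain Python. This is a hypothetical helper, not part of CMSSW. It also flags stray whitespace in the hint strings, on the assumption that the hints are matched as exact strings.

```python
def rfio_safety_problems(cache_size, cache_hint, read_hint):
    """Return a list of deviations from the RFIO-safe settings above."""
    problems = []
    if cache_size <= 0:
        problems.append("cacheSize must be non-zero")
    for name, value in (("cacheHint", cache_hint), ("readHint", read_hint)):
        # Assumption: hints are compared as exact strings, so padding matters.
        if value != value.strip():
            problems.append("%s has stray whitespace: %r" % (name, value))
    if cache_hint.strip() != "auto-detect":
        problems.append("cacheHint should be 'auto-detect'")
    if read_hint.strip() != "direct-unbuffered":
        problems.append("readHint should be 'direct-unbuffered'")
    return problems


print(rfio_safety_problems(20 * 1024 * 1024, "auto-detect", "direct-unbuffered"))
```

An empty list means the settings match the recommendation.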
Switching to native ROOT adaptors
It is possible to turn off the CMSSW AdaptorConfig and use the ROOT native adaptors. We currently do not recommend this, except possibly for xrootd. If you add the following:
native=cms.untracked.string("rfio")
to your adaptor config, then it should switch to the ROOT adaptors (but you will lose performance statistics from the job).
The entire AdaptorConfig might look like this:
process.AdaptorConfig = cms.Service("AdaptorConfig",
    stats = cms.untracked.bool(True),
    native = cms.untracked.string("xrootd"),
)