TFileAdaptor - AdaptorConfig - StorageFactory - <source-config>

CMSSW normally reads data via a set of I/O adaptors. The interface between CMSSW and ROOT is called TFileAdaptor. The underlying C++ interface for POSIX-like storage objects is called StorageFactory.

As of the CMSSW 3.8 series, the I/O parameters are normally configured via $CMS_PATH/SITECONF/local/JobConfig/site-local-config.xml, the site-local configuration file. This configuration is automatically picked up by every CMSSW job via a standard service known as TFileAdaptor (C++) and AdaptorConfig (ParameterSet). In site-local-config.xml the I/O parameters are inside a <source-config> block. The values can be overridden for a specific job via SiteLocalConfigService. For more information about these, please see SWIntTrivial and SiteLocalConfigService. The parameters are:

| Type | Default | <source-config> | SiteLocalConfigService | Legacy | Explanation |
| boolean | true | N/A | N/A | AdaptorConfig.enable | (1) |
| boolean | true | N/A | N/A | AdaptorConfig.stats | (2) |
| string | .:$TMPDIR | <cache-temp-dir name="VAL"/> | overrideSourceCacheTempDir | AdaptorConfig.tempDir | (3) |
| double | 10 | <cache-min-free value="VAL"/> | overrideSourceCacheMinFree | AdaptorConfig.tempMinFree | (4) |
| string | application-only | <cache-hint value="VAL"/> | overrideSourceCacheHintDir | AdaptorConfig.cacheHint | (5) |
| string | auto-detect | <read-hint value="VAL"/> | overrideSourceReadHint | AdaptorConfig.readHint | (6) |
| vstring | (empty) | <native-protocols> <prefix>VAL</prefix> </native-protocols> | overrideSourceNativeProtocols | AdaptorConfig.native | (7) |
| unsigned integer | 0 | <ttree-cache-size value="VAL"/> | overrideSourceTTreeCacheSize | PoolSource.cacheSize | (8) |
| unsigned integer | 0 | <timeout-in-seconds value="VAL"/> | overrideSourceTimeout | none | (9) |

  1. enable: If false, the adaptor layer is disabled.
  2. stats: If true, the storage layer statistics are added to the job report.
  3. tempDir: Colon (":") separated list of directories to use for remote file downloads, including "lazy-download" segments. The first directory in the list which is both local to the system and has enough free space will be used. Environment variables like $TMPDIR can be used in the list. Normally the current directory (".") should be the first on the list.
  4. tempMinFree: Minimum required free space in gigabytes for a temporary download directory. Directories mentioned in <cache-temp-dir> with less space available than this will not be used. This is useful for pruning various common local directories such as /tmp, /var/tmp, /data, or /build which may not exist or be large enough on all systems on the site.
  5. cacheHint: See below for complete details.
  6. readHint: See below for complete details.
  7. native: The list of I/O protocols (such as "srm", "rfio", "dcap", etc.) that should use native ROOT I/O bindings instead of the CMS ones. The value "all" forces all protocols to use native bindings.
  8. cacheSize: Size of TTree read cache in bytes. If the value is the default zero, ROOT will not cache anything. If the value is non-zero, then the I/O layer caching options affect how the value is interpreted. More complete documentation about the caching systems is on the statistics interpretation page.
  9. timeout: Timeout limit in seconds for opening a file. If the value is the default zero, or the I/O protocol does not support this option, there is no timeout. If the value is non-zero, and the I/O protocol supports this option, an error will be returned if the input file is not opened in the prescribed time. Currently, only the "dcap" protocol supports the timeout. This parameter was not available prior to 3_10_0_pre3.
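
As an illustration of how the <source-config> elements map to the parameter names above, the following Python sketch parses a sample block with the standard library. The sample values, the function name parse_source_config and the output keys are assumptions made for illustration only. CMSSW itself reads this block in C++, and the position of the block inside site-local-config.xml is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample block; element and attribute names follow the table above,
# the values are made up for this example.
SAMPLE = """
<source-config>
  <cache-temp-dir name=".:$TMPDIR"/>
  <cache-min-free value="10"/>
  <cache-hint value="auto-detect"/>
  <read-hint value="auto-detect"/>
  <native-protocols>
    <prefix>dcap</prefix>
  </native-protocols>
  <ttree-cache-size value="20971520"/>
  <timeout-in-seconds value="300"/>
</source-config>
"""

# (element name, output key, converter) for elements that use a "value" attribute.
VALUE_ELEMENTS = (
    ("cache-min-free", "tempMinFree", float),
    ("cache-hint", "cacheHint", str),
    ("read-hint", "readHint", str),
    ("ttree-cache-size", "cacheSize", int),
    ("timeout-in-seconds", "timeout", int),
)


def parse_source_config(xml_text):
    """Return a dict of the I/O settings present in a <source-config> block."""
    root = ET.fromstring(xml_text)
    settings = {}

    # <cache-temp-dir> carries its value in "name", unlike the other elements.
    node = root.find("cache-temp-dir")
    if node is not None:
        settings["tempDir"] = node.get("name")

    for tag, key, convert in VALUE_ELEMENTS:
        node = root.find(tag)
        if node is not None:
            settings[key] = convert(node.get("value"))

    # <native-protocols> holds one <prefix> child per protocol.
    prefixes = [p.text.strip() for p in root.findall("native-protocols/prefix") if p.text]
    if prefixes:
        settings["native"] = prefixes
    return settings


if __name__ == "__main__":
    for key, value in parse_source_config(SAMPLE).items():
        print(key, "=", value)
```

Settings missing from the block are simply absent from the result, which mirrors the idea that the defaults in the table apply when an element is not given.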

Notes:

  • All ROOT caching is per-file; there are no shared caches.
  • Fast cloning does not use ROOT's TTreeCache and will use raw I/O.
  • A separate document explains storage statistics reported by TFileAdaptor.

Cache hints

The cacheHint indicates how file caching requested in PoolSource.cacheSize should be implemented. Possible values are "application-only", "storage-only", "lazy-download" and "auto-detect".

application-only
This is the default and means ROOT will do the caching. If PoolSource.cacheSize is non-zero, a TTreeCache of that size will be created per open file. Asynchronous read-ahead will be turned off and the cache will be filled with normal reads.

storage-only
Means ROOT will drive the caching using a prefetch list, but will not allocate a cache of its own. If PoolSource.cacheSize is non-zero, a TTreeCache with a read-list of that size will be created, but no actual cache buffer -- the ROOT cache will be "virtual" and could in fact be very large. ROOT will hand over the prefetch list to the storage layer, which is expected to do its own caching. This method only makes sense if the underlying storage binding is capable of prefetching, which is currently true for local files (and anything downloading into a local file, such as srm, storm, gsiftp) and RFIO. Using this method with an incompatible storage system such as dCache will trigger an error.

lazy-download
Means remote files will be downloaded to a local shadow file on demand in 128MB segments. ROOT reads will be directed to this local file; ROOT will never read directly from the remote file. If PoolSource specifies a non-zero cache, it will behave as a "storage-only" virtual / prefetch cache. Note that the file will be downloaded lazily even if PoolSource.cacheSize is zero. The local shadow file will be created in the specified temporary directory and will be removed automatically when the corresponding remote file is closed. If no suitable local temporary directory with sufficient free space can be found, lazy download is automatically switched off.
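
The temporary directory selection that lazy download depends on can be sketched roughly as follows. This is not the CMSSW implementation: the function name pick_temp_dir is hypothetical, a gigabyte is assumed to be 10^9 bytes, and the check that the directory is local to the system is left out because it is platform specific.

```python
import os
import shutil


def pick_temp_dir(temp_dirs=".:$TMPDIR", min_free_gb=10.0):
    """Return the first usable directory from a colon-separated list, or None.

    A directory is considered usable here if it exists and has at least
    min_free_gb gigabytes free (1 GB assumed to be 1e9 bytes). The "local to
    the system" requirement from the text is NOT checked in this sketch.
    """
    for entry in temp_dirs.split(":"):
        path = os.path.expandvars(entry)
        # expandvars leaves unset variables as literal "$NAME"; skip those.
        if not path or "$" in path:
            continue
        if not os.path.isdir(path):
            continue
        free_gb = shutil.disk_usage(path).free / 1e9
        if free_gb >= min_free_gb:
            return path
    # No suitable directory: per the text, lazy download is switched off.
    return None


if __name__ == "__main__":
    chosen = pick_temp_dir()
    if chosen is None:
        print("no suitable temporary directory, lazy download would be off")
    else:
        print("shadow files would go to", chosen)
```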

auto-detect
This tells the I/O layer to pick the best strategy suited for the I/O technology in use. This will be "lazy-download" for RFIO, dCache and the "file" protocol, including any method which downloads remote files to local disk.

Read hints

The readHint indicates how I/O reads should be performed. Possible values are "direct-unbuffered", "read-ahead-buffered" and "auto-detect".

direct-unbuffered
Requests to disable all read buffering. If caching is off, this will guarantee the application will read just the bytes actually needed by ROOT and nothing more. The servers are likely to see hundreds of thousands of badly scattered reads. Note that if the cacheHint is "storage-only", this hint will be ignored.

read-ahead-buffered
Requests to use reasonable read-ahead buffering at the I/O layer. This is usually a prerequisite for storage-level caching. This may increase application throughput, but depending on the I/O layer read-ahead size and the file input, it may inflate the data requested from the disk servers by a factor of 100 or more. This is because CMSSW ROOT file organisation is generally bad and the reads are usually small; you may end up seeing a full read-ahead buffer read for every small read. Note: RFIO bugs can corrupt data with certain I/O patterns when read-ahead is enabled. Do not use cacheHint = storage-only, readHint = read-ahead-buffered and PoolSource.cacheSize > 0 with RFIO.
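
To make the inflation factor concrete, here is a small worst-case calculation. The function name read_inflation and the sizes used (1 KiB reads, a 128 KiB read-ahead buffer) are hypothetical numbers chosen for illustration, not measured CMSSW values. The model assumes that every request smaller than the buffer costs one full buffer read and that no buffer is ever reused.

```python
def read_inflation(request_sizes, read_ahead_bytes):
    """Ratio of bytes fetched from the server to bytes the application asked for.

    Worst-case model: each request triggers a read of at least read_ahead_bytes,
    and consecutive requests never fall into the same buffer.
    """
    requested = sum(request_sizes)
    if requested <= 0:
        raise ValueError("need at least one non-empty read request")
    transferred = sum(max(size, read_ahead_bytes) for size in request_sizes)
    return transferred / requested


if __name__ == "__main__":
    # 1000 scattered 1 KiB reads against a 128 KiB read-ahead buffer.
    factor = read_inflation([1024] * 1000, 128 * 1024)
    print("inflation factor:", factor)
```

With these assumed sizes the factor equals the ratio of buffer size to read size, so small scattered reads are what push it past 100.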

auto-detect
This is the default and requests optimum read-ahead buffering given the other I/O choices. In practice this usually means modest read-ahead is turned on.

Use recommendations

The recommended I/O setting is 10-50MB cache with both caching and read hints set to "auto-detect". These settings should produce good results for RFIO, dCache, XROOTD, Storm and access via "file" protocol (e.g. Hadoop).

Leave read-ahead on for local files, including those accessed via "lazy-download." The I/O system exploits the operating system buffer cache and needs buffered reads for this.

Use a non-zero virtual cache with read-ahead enabled for XROOTD or anything involving POSIX files (Storm, SRM, GsiFTP, or just plain files). There is no practical reason to use "application-only" cache for these methods, or to disable caching completely, or to disable read-ahead.

Too small a tree cache can adversely affect the amount of data read from upstream disk servers. The usual read-ahead considerations will need to be added on top of this, so an "application-only" cache with "read-ahead-buffered" reads could generate worse load than no cache at all. The currently recommended cache size is FIXME.

A non-zero cache with the "storage-only" cache hint usually results in data corruption with RFIO.

A non-zero cache with the "storage-only" cache hint results in an error with dCache.

In general, read-ahead is necessary for decent performance, but it risks inflating disk server load dramatically. Prefer "lazy-download" caching over "read-ahead-buffered" for remote files.
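
The warnings in this section can be collected into a simple pre-flight check. The sketch below is a hypothetical helper, not part of CMSSW; the function name check_io_settings is invented, and the protocol strings "rfio" and "dcap" are assumed to identify RFIO and dCache access respectively.

```python
def check_io_settings(protocol, cache_hint, read_hint, cache_size):
    """Return a list of warnings for combinations this page advises against.

    protocol:   assumed protocol prefix, e.g. "rfio", "dcap", "file"
    cache_hint: "application-only", "storage-only", "lazy-download", "auto-detect"
    read_hint:  "direct-unbuffered", "read-ahead-buffered", "auto-detect"
    cache_size: TTree cache size in bytes (0 means no cache)
    """
    warnings = []
    if cache_hint == "storage-only" and cache_size > 0:
        if protocol == "rfio":
            warnings.append("storage-only with a non-zero cache usually corrupts data on RFIO")
        elif protocol == "dcap":
            warnings.append("storage-only with a non-zero cache results in an error on dCache")
    if cache_hint == "storage-only" and read_hint == "direct-unbuffered":
        warnings.append("direct-unbuffered is ignored when cacheHint is storage-only")
    if read_hint == "read-ahead-buffered" and protocol != "file":
        warnings.append("for remote files prefer lazy-download over read-ahead-buffered")
    return warnings


if __name__ == "__main__":
    for message in check_io_settings("rfio", "storage-only", "read-ahead-buffered", 20 * 1024 * 1024):
        print("warning:", message)
```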

Additional comments

Note that if the file is badly organised and even a small portion of data is copied forward in the CMSSW job, in practice "lazy-download" will fetch the entire file up front to the local disk, and all further access will be to that file. For sizeable input files, or for multiple concurrent jobs each downloading files simultaneously, this will inevitably force the file contents out of the operating system buffer cache, and the job will slow down to local disk write rates (usually ~40MB/s) while this takes place.

If the input is well organised, or CMSSW only reads sparsely from it, or the file is small enough to fit into the buffer cache entirely, or there isn't enough buffer cache pressure to force the file off to disk, then reads typically go directly to memory and the application runs faster.
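
The difference between badly and well organised input under "lazy-download" can be illustrated by counting which 128MB segments a set of reads touches. This is a sketch only: the function name segments_touched is hypothetical, 128MB is taken to mean 128 * 1024 * 1024 bytes, and the read patterns are invented.

```python
SEGMENT_BYTES = 128 * 1024 * 1024  # segment size from the text, assumed to be MiB


def segments_touched(reads, segment_bytes=SEGMENT_BYTES):
    """Return the set of segment indices covered by (offset, length) reads."""
    touched = set()
    for offset, length in reads:
        if length <= 0:
            continue
        first = offset // segment_bytes
        last = (offset + length - 1) // segment_bytes
        touched.update(range(first, last + 1))
    return touched


if __name__ == "__main__":
    # Hypothetical 2 GiB file, i.e. 16 segments.
    # Badly organised: sixteen 4 KiB reads, one landing in every segment.
    scattered = [(i * SEGMENT_BYTES + 1000, 4096) for i in range(16)]
    # Well organised: the same sixteen 4 KiB reads, laid out contiguously.
    clustered = [(i * 4096, 4096) for i in range(16)]

    print("scattered reads touch", len(segments_touched(scattered)), "segments")
    print("clustered reads touch", len(segments_touched(clustered)), "segments")
```

In the scattered case 64 KiB of requested data pulls down every segment of the file, while the clustered case needs only the first one.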


Responsible: LassiTuura
Topic revision: r10 - 2011-09-01 - unknown
 