CMSSW storage statistics

CMSSW's I/O adapter layer, TFileAdaptor, generates at the end of the job a framework job report StorageStatistics segment on I/O configuration and usage statistics. A complete example report is included at the end.

The first part of the report is the storage parameters. These are the parameters that were used to configure the AdaptorConfig service.

The second part will include the actual statistics summary if the file adaptor layer was enabled. It has three major parts to it: a general summary of ROOT's statistics at the end, summary for the main I/O entry point tstoragefile, and summaries for various I/O technologies used in the job, such as rfio, dcap, file or local-cache.

Note that I/O statistics are only reported when CMS' own I/O adapters are used. I/O outside these adapters, e.g. using root protocol, will not be reported.

Understanding ROOT I/O in CMSSW

A CMSSW job will normally run with CMS I/O adaptors enabled. These adaptors override ROOT's support for a variety of native I/O systems such as RFIO, dCache, SRM, local files and so on. We do this to a small part to simply get accurate data on what happens at the I/O layer; ROOT's own statistics are quite limited. The main reason is that we gain significant improvements and control over what happens at the I/O layer. ROOT has eventually recognised the value of most of these changes and incorporated them to the main code body, but it usually takes quite a lot of time for that to happen.

CMSSW simply uses TFile::Open() to open files. We install hooks into ROOT to redirect those accesses to our own code, where we both gather the statistics described on this page and redirect the operations to our own I/O layer.

Independent of the I/O layer settings, a CMSSW job may be configured to use a read cache, called TTreeCache, via PoolSource.cacheSize. If a cache has been enabled, it may operate in one of two modes: as an actual cache that buffers data, or as a virtual cache which simply maintains a prefetch list of data to fetch but without buffering the data. The cache will automatically adopt one of these modes depending on the lower level I/O technology in use. If the lower level supports asynchronous prefetching the cache will become virtual, otherwise it will act as a normal, real cache.

CMSSW I/O adaptors interpose in the selection logic and can force the cache to adopt one of these modes, or it can let ROOT do its default logic. For further details please refer to the adaptor documentation. The significance to the present documentation is that the numbers will vary depending on the I/O technology in use and the job configuration.

The single most useful numbers are the tstoragefile:read and the tstoragefile:write. These two are the top-level all-inclusive numbers; the rest is details you may safely skip if you are not interested in all the gory details of the low-level I/O performance. Of course these numbers will only be in the report if the adaptors were enabled and files were read using protocols that are redirected to the CMS layer.

Counter value summary

Each reported counter value indicates the following values:

subsystem
The I/O layer originating the value.

counter-name
The I/O operation being counted.

num-operations
Number of times the operation was attempted.

num-succesful-operations
Number of times the operation was a success. The operation may have genuinely failed, in which case the job probably stopped with an error. For cache accesses a success means a cache hit and failure means a cache miss.

total-megabytes
Total volume of data processed in the requests. This is the total amount of volume read, written or prefetched. The value will be zero for counters where it is not meaningful, for example file seeks or opens.

total-msecs
Total real time spent on the operations, in milliseconds.

min-msecs, max-msecs
The least and the greatest amount of time an operation took, in real time milliseconds.

General ROOT summary

The general ROOT summary includes two entries, tfile read and tfile write statistics. These are simply the values reported by ROOT and may or may not be accurate. You probably should not trust these values.

The main I/O statistics

The main ROOT I/O statistics are reported for tstoragefile; this reports the I/O data at the main entry point where ROOT invoked CMSSW I/O layer. The following parameters may be reported:

open, close
Number of files opened or closed, respectively. You can use these statistics to make sure the application closed properly all the files it opened.

construct, destruct
Number of file objects constructed or destructed. You can use these statistics to make sure the application is destroying all file objects.

flush
Number of times a file was flushed. This normally happens when the file being written is checkpointed. Frequent flushes would indicate a potential performance problem.

seek
The number of seeks and time spent on them. It's common to have one to three seeks per read.

stat
The number of file size check operations. This would usually be once per opened file.

read, write
Number of times and volume of data that ROOT read from or wrote to a file. These are the main statistics entry points and should be used to evaluate I/O performance.

read-actual, write-actual
The number of times and volume of data that resulted in an actual read from a lower-level I/O layer, i.e. wasn't satisfied from a file cache, or a write to the lower level. If there is no file cache, or the file cache is virtual, read-actual should be equal to read value; in case of virtual cache there will be additional read-async and read-prefetch-to-cache values. If ROOT has a normal file cache, this value and a few others will add up to the read value. CMSSW does not use write caches so write and write-actual should always be equal.

readv-actual
The number of times and volume of data that was read using scatter-gather I/O. When ROOT has a normal file cache, it reads data in scatter-gather operations; the lower I/O level may support these natively or translate them into single reads. This value indicates the number of scatter-gather operations. Each operation typically involves many reads.

read-prefetch-to-cache
The number of times and volume of data that was requested to be prefetched to the cache. This will occur for every read if and only if ROOT has a virtual file cache, and never otherwise. The number of successful operations indicates the number of times there was a cache hit, which will be true whenever the cache is active. In other words, the number of misses indicates how many reads occurred while the cache was not active. Note that since the cache is only prefetch lists and no actual data, both hit and a miss will always result in an actual read as well. Some cache hits will generate read-async requests.

read-via-cache, write-via-cache
The number of times and volume of data that was read from or written to a cache. This will occur for every read if and only if ROOT has an actual normal file cache, and never otherwise. The number of successful operations indicates the number of times the operation was satisfied by a cache. Accessing the cache may generate further read or write accesses. CMSSW does not use write caches so write-via-cache should always be zero.

read-async
The number of times and volume of data that was scheduled to be read asynchronously by the lower level I/O system. This will happen if and only if ROOT has a virtual file cache. ROOT will always make one probe to find out if the lower level I/O system supports asynchronous reads. A value greater than one indicates there was a virtual file cache and ROOT requested caching from the lower I/O levels. The number of operations indicates how many prefetches ROOT requested from the underlying system.

Technology-specific I/O statistics

Each component of the I/O stack also reports its own statistics. These would in general be a strict subset of the tstoragefile statistics. The following parameters may be reported:

open, close, construct, destruct, position, flush
As with tstoragefile statistics except position is the same as seek.

stagein
Number of files staged in with this storage technology. There should be one stage-in for every input file. Not all storage technologies support or need stage-in.

read, write
Number of times and volume of data read or written using the technology using single read or write operation.

readv, writev
Number of times and volume of data read or written using the technology using scatter-gather I/O.

prefetch
Number of times and volume of data read asynchronously to storage-technology specific cache. For RFIO this means read-ahead buffer's preseek list. For local files this means prefetching data to the operating system's buffer cache. For XROOTD it means that system's prefetch buffer.

What is "local-cache"?

The local-cache I/O statistics indicate files were cached locally from a remote disk server in large segments. It is an I/O optimisation used to avoid excessive number of small reads going to the disk servers. It is used by RFIO and dCache I/O layers. All reads from ROOT will be redirected to a local anonymous file usually resident in the file system buffer. This includes also prefetch orders given by ROOT's file caches. The local cache will be populated from the remote file in large chunks as necessary. There are normally 1'000-10'000 times fewer reads to the actual backing store (rfio or dcap) compared to local-cache.

Complete statistics example

<StorageStatistics>
<storage-factory-summary>
 <storage-factory-params>
  <param name='enabled' value='true' unit='boolean'/>
  <param name='cache-hint' value='lazy-download' unit='string'/>
  <param name='read-hint' value='auto-detect' unit='string'/>
  <param name='stats' value='true' unit='boolean'/>
 </storage-factory-params>
 <storage-factory-stats>
<storage-timing-summary>
 <counter-value subsystem='file' counter-name='close' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='0.006912' min-msecs='0.006912' max-msecs='0.006912'/>
 <counter-value subsystem='file' counter-name='construct' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='0.001024' min-msecs='0.001024' max-msecs='0.001024'/>
 <counter-value subsystem='file' counter-name='destruct' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='0.003072' min-msecs='0.003072' max-msecs='0.003072'/>
 <counter-value subsystem='file' counter-name='flush' num-operations='6' num-successful-operations='6' total-megabytes='0' total-msecs='215.624' min-msecs='1.87008' max-msecs='200.245'/>
 <counter-value subsystem='file' counter-name='open' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='0.205056' min-msecs='0.205056' max-msecs='0.205056'/>
 <counter-value subsystem='file' counter-name='position' num-operations='126463' num-successful-operations='126463' total-megabytes='0' total-msecs='137.376' min-msecs='0' max-msecs='0.434176'/>
 <counter-value subsystem='file' counter-name='prefetch' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='file' counter-name='read' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='file' counter-name='readv' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='file' counter-name='write' num-operations='126459' num-successful-operations='126459' total-megabytes='938.641' total-msecs='88624' min-msecs='0.002048' max-msecs='2126.31'/>
 <counter-value subsystem='file' counter-name='writev' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='local-cache' counter-name='close' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='753.012' min-msecs='753.012' max-msecs='753.012'/>
 <counter-value subsystem='local-cache' counter-name='construct' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='local-cache' counter-name='destruct' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='0.141056' min-msecs='0.141056' max-msecs='0.141056'/>
 <counter-value subsystem='local-cache' counter-name='position' num-operations='244805' num-successful-operations='244805' total-megabytes='0' total-msecs='236.307' min-msecs='0' max-msecs='2.02291'/>
 <counter-value subsystem='local-cache' counter-name='prefetch' num-operations='3675' num-successful-operations='3675' total-megabytes='2729.2' total-msecs='3929.92' min-msecs='0' max-msecs='950.982'/>
 <counter-value subsystem='local-cache' counter-name='read' num-operations='101809' num-successful-operations='101809' total-megabytes='1555.79' total-msecs='72359.1' min-msecs='0.000768' max-msecs='5087.2'/>
 <counter-value subsystem='local-cache' counter-name='readv' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='local-cache' counter-name='write' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='local-cache' counter-name='writev' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='rfio' counter-name='close' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='173.915' min-msecs='173.915' max-msecs='173.915'/>
 <counter-value subsystem='rfio' counter-name='construct' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='0.002048' min-msecs='0.002048' max-msecs='0.002048'/>
 <counter-value subsystem='rfio' counter-name='destruct' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='0.027904' min-msecs='0.027904' max-msecs='0.027904'/>
 <counter-value subsystem='rfio' counter-name='open' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='4205.43' min-msecs='4205.43' max-msecs='4205.43'/>
 <counter-value subsystem='rfio' counter-name='position' num-operations='4' num-successful-operations='4' total-megabytes='0' total-msecs='1.03706' min-msecs='0.001024' max-msecs='0.557056'/>
 <counter-value subsystem='rfio' counter-name='prefetch' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='rfio' counter-name='read' num-operations='15' num-successful-operations='15' total-megabytes='1850.94' total-msecs='62882.7' min-msecs='1101.38' max-msecs='5067.64'/>
 <counter-value subsystem='rfio' counter-name='readv' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='rfio' counter-name='stagein' num-operations='1' num-successful-operations='1' total-megabytes='0' total-msecs='173.003' min-msecs='173.003' max-msecs='173.003'/>
 <counter-value subsystem='rfio' counter-name='write' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='rfio' counter-name='writev' num-operations='0' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
 <counter-value subsystem='tstoragefile' counter-name='close' num-operations='2' num-successful-operations='2' total-megabytes='0' total-msecs='753.288' min-msecs='0.018176' max-msecs='753.27'/>
 <counter-value subsystem='tstoragefile' counter-name='construct' num-operations='2' num-successful-operations='2' total-megabytes='0' total-msecs='10163' min-msecs='20.6561' max-msecs='10142.3'/>
 <counter-value subsystem='tstoragefile' counter-name='flush' num-operations='6' num-successful-operations='6' total-megabytes='0' total-msecs='215.667' min-msecs='1.87802' max-msecs='200.255'/>
 <counter-value subsystem='tstoragefile' counter-name='read' num-operations='101809' num-successful-operations='101809' total-megabytes='1555.79' total-msecs='77463' min-msecs='0.004096' max-msecs='5087.21'/>
 <counter-value subsystem='tstoragefile' counter-name='read-actual' num-operations='101809' num-successful-operations='101809' total-megabytes='1555.79' total-msecs='72473.8' min-msecs='0.001792' max-msecs='5087.21'/>
 <counter-value subsystem='tstoragefile' counter-name='read-async' num-operations='3675' num-successful-operations='3675' total-megabytes='2729.19' total-msecs='3934.77' min-msecs='0.000768' max-msecs='950.984'/>
 <counter-value subsystem='tstoragefile' counter-name='read-prefetch-to-cache' num-operations='101794' num-successful-operations='41194' total-megabytes='944.264' total-msecs='4114.99' min-msecs='0.002816' max-msecs='1680.18'/>
 <counter-value subsystem='tstoragefile' counter-name='seek' num-operations='371256' num-successful-operations='371256' total-megabytes='0' total-msecs='823.234' min-msecs='0.000768' max-msecs='2.02496'/>
 <counter-value subsystem='tstoragefile' counter-name='stat' num-operations='3' num-successful-operations='3' total-megabytes='0' total-msecs='0.032' min-msecs='0.004864' max-msecs='0.016128'/>
 <counter-value subsystem='tstoragefile' counter-name='write' num-operations='126459' num-successful-operations='126459' total-megabytes='938.641' total-msecs='89004.4' min-msecs='0.004864' max-msecs='2126.31'/>
 <counter-value subsystem='tstoragefile' counter-name='write-actual' num-operations='126459' num-successful-operations='126459' total-megabytes='938.641' total-msecs='88806.1' min-msecs='0.00384' max-msecs='2126.31'/>
 <counter-value subsystem='tstoragefile' counter-name='write-via-cache' num-operations='126459' num-successful-operations='0' total-megabytes='0' total-msecs='0' min-msecs='0' max-msecs='0'/>
</storage-timing-summary>
  <storage-root-summary>
   <counter-value subsystem='tfile' counter-name='read' total-megabytes='0.841771'/>
   <counter-value subsystem='tfile' counter-name='write' total-megabytes='0'/>
  </storage-root-summary>
 </storage-factory-stats>
</storage-factory-summary>
</StorageStatistics>


Responsible: LassiTuura
Edit | Attach | Watch | Print version | History: r4 < r3 < r2 < r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r4 - 2011-09-01 - unknown
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback