CMSSW on High Latency Links

Much of my CMSSW I/O work has aimed at minimizing interactions with the SE (storage element) and improving asynchronous I/O. Both goals have been met to some extent, which for the first time makes it possible to run CMSSW over high-latency (and even low-bandwidth) links.

To run in high-latency mode, you need to apply the patches from CmsIOWork2. Warning: this will take 15-20 minutes, even if everything goes "right".

In CMSSW_3_5_x, this requires patching 3 CMSSW modules and 3 ROOT modules (libTree.so, libNetx.so, libRIO.so). We hope that the first CMSSW release built on ROOT 5.26 will contain all the necessary patches and fixes.

Using the USCMS xrootd service

USCMS has put together a read-only xrootd service consisting of three T2 sites (Nebraska, Caltech, and UCSD). Any LFN located at one of those three sites is accessible to anyone with a CMS grid certificate. CMSSW can access this using:

process.source.fileNames = [
    'root://xrootd.unl.edu//$LFN',
]
Replace $LFN above with your actual LFN; make sure you keep the double slash (//). An example would be:
process.source.fileNames = [
    'root://xrootd.unl.edu//store/user/bbockelm.nocern/38B7A52D-0490-DE11-AB38-001F2907EE22.root',
    'root://xrootd.unl.edu//store/user/bbockelm.nocern/38B7A52D-0490-DE11-AB38-001F2907EE22-flushed.root',
]
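
When many LFNs are involved, the URLs can be built programmatically rather than typed by hand. The following is a minimal sketch; the helper name lfn_to_xrootd_url and the REDIRECTOR constant are hypothetical and not part of CMSSW. It assumes the only requirement is the one stated above: the host must be followed by a double slash before the LFN path.

```python
# Hypothetical helper (not part of CMSSW): build xrootd URLs from LFNs.
REDIRECTOR = "xrootd.unl.edu"  # host taken from the examples above


def lfn_to_xrootd_url(lfn, host=REDIRECTOR):
    """Return root://<host>//<lfn>, always keeping the double slash."""
    # Strip any leading slashes so exactly two separate host and path.
    return "root://%s//%s" % (host, lfn.lstrip("/"))


lfns = [
    "/store/user/bbockelm.nocern/38B7A52D-0490-DE11-AB38-001F2907EE22.root",
    "/store/user/bbockelm.nocern/38B7A52D-0490-DE11-AB38-001F2907EE22-flushed.root",
]
file_names = [lfn_to_xrootd_url(lfn) for lfn in lfns]

# In a CMSSW configuration one would then assign:
#   process.source.fileNames = file_names
for name in file_names:
    print(name)
```

Stripping the leading slash before joining means the helper produces the same URL whether the LFN is written as /store/... or store/...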

Using USCMS xrootd service with CRAB

Patches forthcoming!

Results

Unfortunately, using xrootd removes our ability to collect statistics in the Framework Job Report. Thus, we are left with interpreting the xrootd statistics.

The following two tests were run on lxplus5, with the data coming from the USCMS data service. The RTT between lxplus5 and Nebraska is approximately 130 ms. The tests are the same PAT tuple creation Leo Sala and I have been running. The file used was

root://xrootd.unl.edu//store/user/bbockelm.nocern/38B7A52D-0490-DE11-AB38-001F2907EE22-flushed.root
which is the ROOT 5.26 version of the CMS file 38B7A52D-0490-DE11-AB38-001F2907EE22.root. All 6600 or so events were analyzed.

This first test manually turns asynchronous calls off in the code:

Low level caching info:
 StallsRate=0.365248
 StallsCount=103
 ReadsCounter=282
 BytesUsefulness=0.579334
 BytesSubmitted=1334373 BytesHit=773048

XrdClient counters:
 ReadBytes:                 812573
 WrittenBytes:              0
 WriteRequests:             0
 ReadRequests:              182
 ReadMisses:                104
 ReadHits:                  78
 ReadMissRate:              0.571429
 ReadVRequests:             19
 ReadVSubRequests:          19
 ReadVSubChunks:            3029
 ReadVBytes:                413166717
 ReadVAsyncRequests:        0
 ReadVAsyncSubRequests:     0
 ReadVAsyncSubChunks:       0
 ReadVAsyncBytes:           0
 ReadAsyncRequests:         2
 ReadAsyncBytes:            502784

1099.638u 11.296s 21:45.35 85.1%   0+0k 0+0io 41pf+0w
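
The ratios printed in this dump appear to be simple quotients of the neighboring counters. This is an inference from the numbers themselves, not something taken from the XrdClient documentation. The sketch below recomputes them from the raw counters of the run above, which can be a useful sanity check when comparing runs.

```python
# Counters copied from the run with asynchronous calls turned off.
sync = {
    "StallsCount": 103, "ReadsCounter": 282,
    "BytesSubmitted": 1334373, "BytesHit": 773048,
    "ReadRequests": 182, "ReadMisses": 104,
}


def ratios(c):
    """Recompute the printed ratios (assumed definitions, see lead-in)."""
    return {
        "StallsRate": c["StallsCount"] / float(c["ReadsCounter"]),
        "BytesUsefulness": c["BytesHit"] / float(c["BytesSubmitted"]),
        "ReadMissRate": c["ReadMisses"] / float(c["ReadRequests"]),
    }


sync_ratios = ratios(sync)
for name in sorted(sync_ratios):
    print("%s=%.6f" % (name, sync_ratios[name]))
```
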

This second test uses the default Xrootd behavior:

Low level caching info:
 StallsRate=0.102136
 StallsCount=660
 ReadsCounter=6462
 BytesUsefulness=0.801967
 BytesSubmitted=431606693 BytesHit=346134190

XrdClient counters:
 ReadBytes:                 346173715
 WrittenBytes:              0
 WriteRequests:             0
 ReadRequests:              5805
 ReadMisses:                227
 ReadHits:                  5578
 ReadMissRate:              0.039104
 ReadVRequests:             0
 ReadVSubRequests:          0
 ReadVSubChunks:            0
 ReadVBytes:                0
 ReadVAsyncRequests:        3
 ReadVAsyncSubRequests:     3
 ReadVAsyncSubChunks:       360
 ReadVAsyncBytes:           56568980
 ReadAsyncRequests:         1236
 ReadAsyncBytes:            381606487

1108.537u 10.224s 20:31.39 90.8%   0+0k 0+0io 39pf+0w
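
Since the byte counters are hard to compare directly, the timing lines give the clearest comparison between the two runs. The sketch below assumes those lines are csh-style time output (user CPU seconds, system CPU seconds, elapsed wall time, CPU percentage); the function name parse_time_line is hypothetical.

```python
def parse_time_line(line):
    """Parse '<user>u <sys>s <[hh:]mm:ss.ss> <pct>% ...' into seconds."""
    user, system, elapsed = line.split()[:3]
    wall = 0.0
    for part in elapsed.split(":"):  # fold hh:mm:ss fields into seconds
        wall = wall * 60 + float(part)
    cpu = float(user.rstrip("u")) + float(system.rstrip("s"))
    return {"cpu": cpu, "wall": wall, "efficiency": cpu / wall}


# Timing lines copied from the two runs above.
sync_run = parse_time_line("1099.638u 11.296s 21:45.35 85.1%   0+0k 0+0io 41pf+0w")
async_run = parse_time_line("1108.537u 10.224s 20:31.39 90.8%   0+0k 0+0io 39pf+0w")

saved = sync_run["wall"] - async_run["wall"]
print("sync:  %.2f s wall, %.1f%% CPU efficiency"
      % (sync_run["wall"], 100 * sync_run["efficiency"]))
print("async: %.2f s wall, %.1f%% CPU efficiency"
      % (async_run["wall"], 100 * async_run["efficiency"]))
print("wall time saved by async: %.2f s" % saved)
```

The CPU time is nearly identical in both runs, so the difference in wall time reflects time spent waiting on I/O rather than a change in the amount of computation.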

The xrootd counter statistics are not very useful, because it is hard to determine how many bytes were actually transferred. Most of the bytes in the async case are double-counted, because ROOT manages the cache in the sync case while XrdClient manages it in the async case.

Topic revision: r4 - 2010-05-13 - BrianBockelman
 