5.13 Using Xrootd Service (AAA) for Remote Data Access

Goals of this workbook page

This page describes Any Data, Anytime, Anywhere (AAA), CMS's implementation of a generic xrootd service for analyzing CMS data located at any grid site with bare ROOT or the CMSSW/FWLite environment, without downloading it to your local storage. Much effort has been invested in having ROOT and CMSSW read remote files efficiently, so that you will be able to analyze data without knowing whether the input file is on your computer or halfway around the world! AAA also allows for greater resilience against damaged or missing input files, and for greater use of opportunistic resources.

Introduction to the AAA Service

To access a particular file, no matter where in the world it is, you only need to know the Logical File Name (LFN) of the file. The LFN uniquely identifies any file located within the /store directory tree anywhere in CMS storage. The LFNs of files that are in defined CMS datasets can be found through the DAS service (see WorkBook Chapter 5.4). For files that are not in official datasets, such as those in /store/user or /store/group, you will have to know the actual file names yourself. The examples below demonstrate how files can be accessed through their LFNs in various contexts. What you don't need to know is the actual physical location of the file: based on the LFN, the system looks up the physical location using a "redirector" that queries potential locations for you and points your application to a valid location without any intervention from you. Your application then proceeds as it would if your file were local.
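As an illustration of how an LFN turns into something an application can open, the following minimal Python sketch joins a redirector host name and an LFN into the root:// form used in the examples on this page. The function name xrootd_url is hypothetical, the default host is the global redirector mentioned later on this page, and the check that the LFN starts with /store/ is an assumption based on the description above.

```python
def xrootd_url(lfn, redirector="cms-xrd-global.cern.ch"):
    """Build a root:// URL from a redirector host name and a CMS LFN.

    Hypothetical helper for illustration; not part of ROOT or CMSSW.
    """
    if not lfn.startswith("/store/"):
        raise ValueError("expected an LFN starting with /store/, got: " + lfn)
    # The LFN keeps its own leading slash, so the URL has '//' before 'store'.
    return "root://" + redirector + "/" + lfn


print(xrootd_url("/store/myfile.root"))
```

The physical location of the file never appears here: only the redirector and the LFN are needed.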

Interested users can consult CmsXrootdArchitecture for technical details on AAA implementation.

Warning for LXPLUS users

Currently, LXPLUS at CERN seems to be using an unusual IPv6 configuration, which confuses certain XrootD releases (such as the 4.0.4 release in CMSSW 7-8). If you get the error "No servers are available to read the file" for a file that you expect to be reachable, try setting an environment variable that forces the use of IPv4.

If you are using bash, run

export XRD_NETWORKSTACK=IPv4

If you are using tcsh, run

setenv XRD_NETWORKSTACK IPv4

Quick steps to analyze remote data located in remote Tier-2 sites

Have a valid grid proxy

To use AAA, you MUST have a valid grid proxy with a valid VOMS extension for CMS. This requires that you already have a grid certificate installed (see Chapter 5 of the CMS workbook). The needed grid proxy is obtained via the usual command

voms-proxy-init --voms cms

Note that neither grid-proxy-init nor a plain voms-proxy-init without the --voms cms option will work. Letting xrootd ask for the passphrase and create a proxy internally will not work either, since it uses grid-proxy-init.
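A script can check for a usable proxy before it tries to open remote files. The sketch below is one possible way to do that from Python by calling voms-proxy-info. The function name cms_proxy_seconds_left is hypothetical, and the -vo and -timeleft options are assumed to behave as in the standard VOMS client tools (printing the VO name and the remaining lifetime in seconds).

```python
import shutil
import subprocess


def cms_proxy_seconds_left():
    """Return the proxy lifetime in seconds if it carries the cms VO, else 0.

    Hypothetical helper; assumes 'voms-proxy-info -vo' prints the VO name
    and 'voms-proxy-info -timeleft' prints the remaining seconds.
    """
    if shutil.which("voms-proxy-info") is None:
        return 0  # VOMS client tools are not installed on this machine
    try:
        vo = subprocess.run(["voms-proxy-info", "-vo"],
                            capture_output=True, text=True)
        left = subprocess.run(["voms-proxy-info", "-timeleft"],
                              capture_output=True, text=True)
    except OSError:
        return 0
    if vo.returncode != 0 or "cms" not in vo.stdout.split():
        return 0  # no proxy at all, or a proxy without the CMS VOMS extension
    try:
        return max(0, int(left.stdout.strip()))
    except ValueError:
        return 0  # output was not a plain number of seconds


if cms_proxy_seconds_left() == 0:
    print("No usable CMS proxy found; run: voms-proxy-init --voms cms")
```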

Know your redirector

As stated above, when you attempt to open a file, your application must query a redirector to find the file. You must specify the redirector to the application. Which redirector you use depends on your region, to minimize the distance over which the data must travel and thus minimize the reading latency. These "regional" redirectors will try file locations in your region first before trying to go overseas.

If you are working in the US, it is best to use cmsxrootd.fnal.gov, while in Europe and Asia, it is best to use xrootd-cms.infn.it. There is also a "global redirector" at cms-xrd-global.cern.ch which will query all locations.

In the examples below, cmsxrootd.fnal.gov is always used, but feel free to replace that with a choice more appropriate for your region.
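The regional choice described above can be written down as a small lookup table. This is only a sketch: the region labels and the function name pick_redirector are hypothetical, while the host names are the ones listed on this page. Unknown regions fall back to the global redirector.

```python
# Host names are taken from this page; the region keys are hypothetical labels.
REGIONAL_REDIRECTORS = {
    "us": "cmsxrootd.fnal.gov",
    "europe": "xrootd-cms.infn.it",
    "asia": "xrootd-cms.infn.it",
}
GLOBAL_REDIRECTOR = "cms-xrd-global.cern.ch"


def pick_redirector(region):
    """Return the regional redirector, or the global one for other regions."""
    return REGIONAL_REDIRECTORS.get(region.lower(), GLOBAL_REDIRECTOR)


print(pick_redirector("Europe"))
```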

Open a file using ROOT

If you are using bare ROOT, you can open files in the xrootd service just like you would any other file:

TFile *f = TFile::Open("root://cmsxrootd.fnal.gov//store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root");

Note the prefix root://cmsxrootd.fnal.gov/ (or any other redirector name) in front of your LFN. This returns a TFile object, and you can proceed normally. The same is true for the FWLite environment.

BEWARE: do not use the apparently equivalent constructor syntax, which is known not to work:

TFile("root://cmsxrootd.fnal.gov//store/foo/bar")
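For PyROOT users, the static TFile::Open call can be used in the same way. The sketch below assumes a ROOT installation with Python bindings. The function name open_remote and its error handling are hypothetical additions for illustration.

```python
def open_remote(lfn, redirector="cmsxrootd.fnal.gov"):
    """Open a remote CMS file through AAA and return the TFile object.

    Hypothetical helper; needs a ROOT installation with PyROOT.
    """
    if not lfn.startswith("/store/"):
        raise ValueError("expected an LFN starting with /store/, got: " + lfn)
    import ROOT  # imported lazily so the check above works without ROOT

    url = "root://" + redirector + "/" + lfn
    # Use the static TFile.Open factory, not the TFile constructor.
    f = ROOT.TFile.Open(url)
    if not f or f.IsZombie():
        raise OSError("could not open " + url)
    return f
```

With a valid proxy in place, calling open_remote with the LFN from the C++ example (the part of the URL starting at /store/mc/SAM/) should return a TFile on which f.ls() lists the contents.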

Open a file in CMSSW

You want to edit the PoolSource line in your python configuration file to point directly at the Xrootd service, instead of using a generic LFN.

For example, this might be the "before" picture:

process.source = cms.Source("PoolSource",
                            # replace 'myfile.root' with the source file you want to use
                            fileNames = cms.untracked.vstring('/store/myfile.root')
                            )

Here's the same file, but accessed through the Xrootd Service by simply adding prefix root://cmsxrootd.fnal.gov/ :

process.source = cms.Source("PoolSource",
                            # replace 'myfile.root' with the source file you want to use
                            fileNames = cms.untracked.vstring('root://cmsxrootd.fnal.gov//store/myfile.root')
                            )

Note that if your site has fallback configured, you don't even need to make the above change: CMSSW will automatically read the file from a remote site!
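When a configuration lists several input files, the prefix can be added in one place instead of being typed into every entry. The following sketch uses plain Python, so it runs without CMSSW. The second file name is hypothetical, and the commented lines show how the resulting list could be handed to PoolSource.

```python
REDIRECTOR = "cmsxrootd.fnal.gov"

lfns = [
    "/store/myfile.root",
    "/store/myotherfile.root",  # hypothetical second input file
]

# Prepend the redirector to every LFN; each LFN keeps its leading slash.
file_names = ["root://" + REDIRECTOR + "/" + lfn for lfn in lfns]

# In a CMSSW configuration the list can then be passed on, for example:
#   process.source = cms.Source("PoolSource",
#                               fileNames = cms.untracked.vstring(*file_names))
print(file_names)
```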

Let CRAB find your file

Since you are now able to read data from any location, your CRAB job doesn't necessarily have to run at the same site where the data actually resides. This potentially gives your grid jobs access to a much wider range of grid sites -- indeed, any in CMS.

To allow CRAB2 to ignore the location of your dataset and just run where there is an open batch slot, include the line

data_location_override = sites list

in the [GRID] block of the configuration file.

Here, sites list is a list of site names where you will allow your jobs to run (even if the target dataset is not physically at those sites). For example, data_location_override = T2_US will allow your job to run at any T2 site in the US. This only works with the remoteGlidein scheduler for CRAB.

In CRAB3, the equivalent line in the configuration file will be

config.Data.ignoreLocality = True

This option is false by default, but users can explore its behavior.

Open a file in Condor Batch or CERN Batch

Condor

If you want to use a local Condor batch system to analyze user/group skims located at remote sites, the only modification needed is to add

use_x509userproxy = true

to your Condor JDL file (the file that defines universe, Executable, etc.).

For OLDER versions of HTCondor (before 8.0.0), you need:

     x509userproxy = /tmp/x509up_uXXXX

The string /tmp/x509up_uXXXX is the value shown in the "path:" line of the output of "voms-proxy-info -all"; it is the file that contains your valid grid proxy. Condor will pass this information to the worker node of the Condor batch system.

CERN Batch

Jobs submitted to the CERN batch farm will look for a valid grid proxy in the location pointed to by the environment variable $X509_USER_PROXY. (If $X509_USER_PROXY is not set, Xrootd looks for the proxy in the default location in /tmp.)

To make your proxy available to the lxbatch jobs, first copy your proxy to an area in AFS, for example your home directory:

cp /tmp/x509up_uXXXX  /afs/cern.ch/user/u/username/

If you are submitting a job with bsub myscript.sh, then in myscript.sh, set this environment variable:

export X509_USER_PROXY=/afs/cern.ch/user/u/username/x509up_uXXXX
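The lookup order described above ($X509_USER_PROXY first, then the default file in /tmp) can be sketched in Python, for example to print which proxy file a batch job would pick up. The function name proxy_path is hypothetical, and the sketch assumes that the XXXX in x509up_uXXXX is your numeric user ID.

```python
import os


def proxy_path(environ=None, uid=None):
    """Return the proxy file that would be used, following the order above.

    Hypothetical helper; assumes the default file is /tmp/x509up_u<uid>.
    """
    environ = os.environ if environ is None else environ
    uid = os.getuid() if uid is None else uid
    # An explicitly set X509_USER_PROXY wins over the default location.
    return environ.get("X509_USER_PROXY") or "/tmp/x509up_u" + str(uid)


path = proxy_path()
print(path, "exists" if os.path.isfile(path) else "is missing")
```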

File download with command-line tools

If for some reason (perhaps intensive debugging of a particular event) you wish to have the file available locally, AAA also provides a command-line tool called xrdcp. This utility ships with stand-alone ROOT and with CMSSW, and provides a much easier way to copy a grid file than lcg-cp, srm-cp, or the FileMover utility.

xrdcp root://cmsxrootd.fnal.gov//store/path/to/file /some/local/path
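If the copy has to happen from inside a script, the same xrdcp command can be assembled and launched from Python. This is a minimal sketch: the function names xrdcp_command and download are hypothetical, and xrdcp is assumed to be on your PATH.

```python
import subprocess


def xrdcp_command(lfn, destination, redirector="cmsxrootd.fnal.gov"):
    """Return the xrdcp command line (as a list) for copying one LFN."""
    return ["xrdcp", "root://" + redirector + "/" + lfn, destination]


def download(lfn, destination, redirector="cmsxrootd.fnal.gov"):
    """Run xrdcp; raises CalledProcessError if the copy fails."""
    subprocess.run(xrdcp_command(lfn, destination, redirector), check=True)


print(" ".join(xrdcp_command("/store/path/to/file", "/some/local/path")))
```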

Where is "anywhere"?

The data at any CMS site are currently available through AAA.

If you wish to check whether your desired file is actually accessible through AAA, execute the command xrdfs cms-xrd-global.cern.ch locate /store/path/to/file. As long as you do not get the message No servers have the file, it is safe for you to use the AAA service!
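The check above can also be scripted. The sketch below builds the xrdfs locate command and applies the rule from the text (the file is usable unless the message "No servers have the file" appears). The function names are hypothetical, and treating a non-zero exit code as a failure is an extra precaution that the text does not require.

```python
import subprocess

NOT_FOUND_MESSAGE = "No servers have the file"


def locate_command(lfn, redirector="cms-xrd-global.cern.ch"):
    """Return the xrdfs locate command line (as a list) for one LFN."""
    return ["xrdfs", redirector, "locate", lfn]


def is_accessible(xrdfs_output):
    """Apply the rule from the text to the captured xrdfs output."""
    return NOT_FOUND_MESSAGE not in xrdfs_output


def check(lfn):
    """Run xrdfs locate and report whether the file looks reachable."""
    result = subprocess.run(locate_command(lfn),
                            capture_output=True, text=True)
    # Extra precaution (assumption): also require a zero exit code.
    return result.returncode == 0 and is_accessible(result.stdout + result.stderr)
```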

Support

If you run into a problem, please post to the Computing Tools HyperNews forum and include the printout from the debugging command below.

xrdcp -d 1 -f root://cmsxrootd.fnal.gov//store/<path-to-file> /dev/null

Here /store/<path-to-file> is the LFN of your file.

Review status

Reviewer/Editor and Date (copy from screen) Comments
-- KenBloom - 29 Jan 2014 Substantial revisions to reflect current status
-- LucianoBarone - 27 Nov 2013 added Rome among sites with Xrootd enabled
-- JohnStupak - 14-September-2013 Review with minor changes
-- StefanoBelforte - 16 Jan 2013 more details on how Crab uses this
-- JieChen - 8 Jan 2013 modified the content slightly to be user friendly
-- JieChen - 7 Jan 2013 move the page from Brian Bockelman's HdfsXrootdto page to SWGuide format

Responsible: BrianBockelman JieChen

Topic revision: r39 - 2018-07-13 - LeonardoCristella