FAX for end-users
FAX (Federated ATLAS Xrootd) is a way to access most ATLAS data from anywhere, without needing to know where the files are stored. It relies on a gLFN (global logical file name) convention: when a user needs a file, they only specify its gLFN, and FAX locates a site holding the file and delivers it to the user. More than 40 of the largest ATLAS sites have been federated, and more sites will join soon.
Why use FAX?
- FAX simplifies your code: whether your code runs interactively on your laptop, on your Tier3 (even one with no disk space), on a batch farm, or on the grid, file naming stays the same.
- FAX allows for recovery: if a file is missing at one site (the site is down or there is another problem), there is a good chance it exists somewhere else in the federation, and FAX will deliver it.
- FAX allows the use of more advanced tools such as frun.
Prerequisites
Datasets must be registered in DDM (Rucio). All grid-produced datasets are registered automatically, whether they are part of official production or simply the result of a user's job. If your files are not registered, it is trivial to register them. A very detailed description of how to do this is given here.
Usage
Set up environment
All sites now have CVMFS mounted, so setting up FAX is easily done using localSetupFAX:
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
localSetupFAX
As you can see from its output, localSetupFAX sets up the grid middleware, the Rucio client and xrootd, and selects the best endpoint. If you want to choose the endpoint yourself, you can find the endpoint addresses and their current statuses here, then do:
export STORAGEPREFIX=root://endpointAddress:port/
You will also need a grid proxy, which requires a valid ATLAS grid certificate:
voms-proxy-init -voms atlas
Check availability of your dataset/datacontainer in FAX
Until all of the sites join the federation, it is important to have a simple way to check whether a dataset is available in FAX. The tool for this is called fax-is-dataset-covered and is available after running localSetupFAX.
Usage is trivial: just give it a dataset or data container name. For each input dataset, it prints the number of FAX endpoints holding complete and incomplete replicas.
Example:
fax-is-dataset-covered data12_8TeV.periodH2.physics_Muons.PhysCont.NTUP_SMWZ.grp13_v01_p1067/
data12_8TeV.00212809.physics_Muons.merge.NTUP_SMWZ.f481_m1233_p1067_p1141_tid01012924_00 complete replicas: 1 incomplete: 0
data12_8TeV.00212858.physics_Muons.merge.NTUP_SMWZ.f481_m1233_p1067_p1141_tid01014068_00 complete replicas: 1 incomplete: 0
data12_8TeV.00212815.physics_Muons.merge.NTUP_SMWZ.f481_m1233_p1067_p1141_tid01012923_00 complete replicas: 1 incomplete: 0
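The output above can also be checked automatically. The following is a minimal Python sketch, assuming every result line has the form `<dataset> complete replicas: N incomplete: M` exactly as in the example. The names `parse_coverage` and `uncovered` are hypothetical helpers, not part of the FAX tools.

```python
import re

# Line format assumed from the example output above:
#   <dataset name> complete replicas: <N> incomplete: <M>
LINE_RE = re.compile(
    r"^(\S+)\s+complete replicas:\s*(\d+)\s+incomplete:\s*(\d+)\s*$"
)


def parse_coverage(text):
    """Return {dataset: (complete, incomplete)} for every matching line."""
    coverage = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, complete, incomplete = match.groups()
            coverage[name] = (int(complete), int(incomplete))
    return coverage


def uncovered(coverage):
    """List datasets that have no complete replica at any FAX endpoint."""
    return sorted(
        name for name, (complete, _) in coverage.items() if complete == 0
    )
```

One possible use is to save the output of fax-is-dataset-covered to a file, pass its contents to `parse_coverage`, and inspect `uncovered(...)` before submitting jobs.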
Find gLFNs of your files
Obtain the gLFNs (global logical file names) using Rucio, which scales better with load and is faster and more robust than the older approaches. To print the Rucio gLFNs, use the following executable, which is available once FAX is set up (see above):
fax-get-gLFNs data12_8TeV.periodH2.physics_Muons.PhysCont.NTUP_SMWZ.grp13_v01_p1067/
root://fax.mwt2.org:1094//atlas/rucio/data12_8TeV:NTUP_SMWZ.01014068._000103.root.1
root://fax.mwt2.org:1094//atlas/rucio/data12_8TeV:NTUP_SMWZ.01014068._000123.root.1
:
The results should use your local $STORAGEPREFIX, which in this example is root://fax.mwt2.org:1094/.
In order to get a text file with the list of individual filenames, you can do:
fax-get-gLFNs data12_8TeV.periodH2.physics_Muons.PhysCont.NTUP_SMWZ.grp13_v01_p1067/ > my_list_of_gLFNS.txt
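Because each gLFN is the storage prefix followed by a path starting with `/atlas/rucio/`, a saved list can be pointed at a different endpoint by swapping the prefix. The sketch below is only an illustration under that assumption. The function names are hypothetical, and it does not check whether the chosen endpoint is up or can serve the files.

```python
def split_glfn(glfn):
    """Split a gLFN into (storage prefix, path under the prefix).

    Assumes the layout seen in the example output:
    root://host:port/ + /atlas/rucio/<scope>:<file name>
    """
    marker = "//atlas/rucio/"
    index = glfn.find(marker)
    if index < 0:
        raise ValueError("not a recognised gLFN: %r" % glfn)
    # Keep the first slash with the prefix and the second with the path.
    return glfn[: index + 1], glfn[index + 1:]


def with_prefix(glfn, storage_prefix):
    """Return the same file addressed through another storage prefix."""
    _, path = split_glfn(glfn)
    return storage_prefix.rstrip("/") + "/" + path


def read_glfns(lines):
    """Keep only lines that look like gLFNs (drop blanks and other output)."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line.startswith("root://")]
```

For example, `read_glfns(open("my_list_of_gLFNS.txt"))` would give the list written by the command above, and `with_prefix(g, os.environ["STORAGEPREFIX"])` (after `import os`) would re-address each entry through the endpoint currently set in your environment.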
To use these files
- from ROOT:
TFile *f = TFile::Open("root://myRedirector:port//atlas/rucio/user/ivukotic:group.test.hc.NTUP_SMWZ.root");
- from the command line, to make a local copy with xrdcp:
xrdcp $STORAGEPREFIX/atlas/rucio/user/ivukotic:group.test.hc.NTUP_SMWZ.root /tmp/myLocalCopy.root
- from prun: instead of giving the --inDS myDataset option, provide --pfnList my_list_of_gLFNS.txt (where the file list my_list_of_gLFNS.txt is generated using the example above).
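When many files need to be copied locally, the xrdcp form shown above can be generated for every entry of a gLFN list. The following Python sketch only builds the command lines and does not execute anything. The helper names and the choice of local file name (the part of the gLFN after the scope) are assumptions made for illustration.

```python
import os


def local_copy_path(glfn, dest_dir):
    """Pick a local file name: the part of the gLFN after '<scope>:'."""
    last_component = glfn.rsplit("/", 1)[-1]     # e.g. data12_8TeV:NTUP...root.1
    file_name = last_component.split(":", 1)[-1]  # drop the scope, if present
    return os.path.join(dest_dir, file_name)


def xrdcp_commands(glfns, dest_dir="/tmp"):
    """Build one 'xrdcp <source> <destination>' argument list per gLFN."""
    return [["xrdcp", glfn, local_copy_path(glfn, dest_dir)] for glfn in glfns]
```

Each returned list could then be passed to `subprocess.run(...)` once a valid grid proxy exists. Returning argument lists rather than shell strings avoids quoting problems with the `:` and `/` characters in gLFNs.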
Using frun
If you are using prun on local resources, please consider using frun.py instead. frun is a simple wrapper around prun: all it does is replace the inDS that you give with the corresponding gLFNs. All other options that you use are simply forwarded to prun. The main advantage of frun is that if files are missing at the site where the program runs, the FAX layer will provide them as long as they exist anywhere in the US or UK (hopefully soon worldwide).
Example:
frun -v --inDS=user.ilijav.HCtest.1 --exec "echo %IN | sed -e \"s/,/\n/g\" > input.txt; ./doRead.sh" \
--athenaTag=17.2.0 \
--noBuild \
--outputs=input.txt,ilija.txt \
--outDS=user.ivukotic.HCtestREWRITTEN.1 \
--site=ANALY_CERN_XROOTD
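In the --exec string above, the `echo %IN | sed -e "s/,/\n/g" > input.txt` step turns the comma-separated list of input files that is substituted for %IN into one file name per line. For readers less familiar with sed, the same transformation can be sketched in Python. The function name is hypothetical and the sample file names are made up.

```python
def in_list_to_lines(in_value):
    # Mimics: echo %IN | sed -e "s/,/\n/g"
    # Every comma becomes a newline; echo contributes the final newline.
    return in_value.replace(",", "\n") + "\n"


if __name__ == "__main__":
    # Hypothetical value standing in for what replaces %IN at run time.
    sample = "fileA.root,fileB.root,fileC.root"
    print(in_list_to_lines(sample), end="")
```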
Example analysis jobs running against FAX
To test the efficiency of FAX data access, to have a way to stress a site, and to have examples that can be used in tutorials, we came up with analysis examples that should be easy to run.
More information and Troubleshooting
A tutorial can be found here.
Tutorial.tar contains a PowerPoint presentation of FAX, examples of its usage, and ready-to-test source code.
If something does not work or you have any questions, please feel free to contact
atlas-adc-fax-operations@cernNOSPAMNOSPAMPLEASE.ch
Major updates:
--
IlijaVukotic - 05-Sep-2012