CMS XRootD Architecture and AAA

This is the homepage for the XRootD-based data federation in CMS, also called AAA.

Installation/Upgrade Best Effort Advisory

  • In view of Run 3 and the transition to token-based authentication, upgrading to an xrootd 5.x version is recommended.
  • OSG sites are encouraged to install the Shoveler for detailed xrootd monitoring (file reads/writes through xrootd/https/davs other than FTS). See the appropriate link below. More information will be posted for the EU/Asia sites.
  • 5.3.4 as of Jan 19 is a good candidate. However, this version has a throttle issue (see this xrootd issue). Most sites probably don't use the throttle, so this should not be a big problem.
  • 5.4.0 has the fix for the throttle issue in earlier versions, but it has a high load issue on the redirector (see this xrootd issue). A fix appears to be on its way for the next version.
  • 5.4.1 is released with improvements for the TLS issue and the redirector issue.
  • 5.4.2 is released:
    • XRootD has a bug impacting CEPH and there is a fix for this in v5.5 which was backported to v5.4.2. For sites with CEPH, this version is recommended.
    • OSG sites: Use OSG 3.6 repo for 5.4.2-1.1 (the throttling patches)
    • non-OSG sites: Use the stable xrootd 5.4.2 + xrootd-cmstfc from the OSG osg-upcoming-development repo
  • 5.4.3 is released:

Documentation

For Users

The following user documentation is also available:

  • XRootD Client Usage - How to utilize the current infrastructure, on your desktop, in ROOT or in a CRAB job.
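
As a rough illustration of what client access through the federation looks like, the sketch below builds an XRootD URL from a CMS logical file name (LFN). The redirector hostname `cms-xrd-global.cern.ch`, the helper name `xrootd_url`, and the example LFN are assumptions made for illustration and are not taken from this page; the XRootD Client Usage documentation is the place to check which endpoints are currently in use.

```python
# Hypothetical helper: build a root:// URL for a file served via a redirector.
DEFAULT_REDIRECTOR = "cms-xrd-global.cern.ch"  # assumed hostname, not from this page


def xrootd_url(lfn: str, redirector: str = DEFAULT_REDIRECTOR) -> str:
    """Return root://<redirector>/<lfn> for an LFN starting with /store/."""
    if not lfn.startswith("/store/"):
        raise ValueError(f"expected an LFN starting with /store/, got {lfn!r}")
    # The LFN keeps its leading slash, so the URL contains a double slash
    # between the host and the path.
    return f"root://{redirector}/{lfn}"


if __name__ == "__main__":
    # Made-up LFN, used only to show the shape of the resulting URL.
    print(xrootd_url("/store/example/file.root"))
```

The resulting string could then be handed to a client such as `xrdcp` or ROOT's `TFile::Open`.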

For Admins

The following documentation is aimed at the sysadmins of CMS sites:

For Operators

Introduction

CMS is exploring a new architecture for data access, emphasizing the following four items:

  • Reliability: The end-user should never see an I/O error or failure propagated up to their application unless no USCMS site can serve the file. Failures should be caught as early as possible and I/O retried or rerouted to a different site (possibly degrading the service slightly).
  • Transparency: All actions of the underlying system should be automatic for the user: catalog lookups, redirections, reconnections. There should not be a different workflow for accessing the data "close by" versus halfway around the world. This implies the system serves user requests almost instantly; opening files should be a "lightweight" operation.
  • Usability: All CMS application frameworks (CMSSW, FWLite, bare ROOT) must natively integrate with any proposed solution. The proposed solution must not degrade the event processing rate significantly.
  • Global: A CMS user should be able to get at any CMS file through the Xrootd service.

To achieve these goals, we will be pursuing a distributed architecture based upon the XRootD protocol and software developed by SLAC. The proposed architecture is also similar to the current data management architecture of the ALICE experiment. Note that we specifically did not list scalability here - we already have an existing infrastructure that scales just fine. We have no intention of replacing current CMS data access methods for production.

We believe that these goals will greatly reduce the difficulty of data access for physicists on the small or medium scale. This new architecture has four deliverables for CMS:

  1. A production-quality, global XRootD infrastructure.
  2. Fallback data access for jobs running at the T2.
  3. Interactive access for CMS physicists.
  4. A disk-free data access system for T3 sites.

Architecture

To explore the XRootD architecture, we put together a prototype for the WLCG, involving CMS sites worldwide and all the relevant storage technologies. This prototype wrapped up in January 2011, and we are moving to a regional redirector-based system. This injects another layer into the hierarchy, which keeps requests within the local network region when possible.

Local-region redirection

The image below shows the communication paths for a user application querying the regional redirector when the desired file is within the region. First (1), the user application attempts to open the file in the regional redirector. If the regional redirector does not know the file's location, it will then query all of the logged-in sites (2). In this diagram, Site A responds that it has the file, so the redirector redirects (3) the client to Site A's xrootd server. Finally, the client contacts Site A (4) and starts reading data (5). This is all implemented within the Xrootd client; no user interaction is necessary.

Regional Xrootd.png
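
The five steps above can be mimicked with a small toy model. This is only a sketch of the described flow, not how the XRootD redirector is implemented: the class and function names are hypothetical, sites are queried one after another rather than concurrently, and the "read" is just a string.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Site:
    name: str
    files: Set[str]  # LFNs this (hypothetical) site can serve


@dataclass
class RegionalRedirector:
    sites: List[Site]  # the sites currently logged in to this redirector
    cache: Dict[str, Site] = field(default_factory=dict)  # known file locations

    def locate(self, lfn: str) -> Optional[Site]:
        """Return the site the client should be redirected to, if any."""
        if lfn in self.cache:              # location already known
            return self.cache[lfn]
        for site in self.sites:            # (2) query the logged-in sites
            if lfn in site.files:
                self.cache[lfn] = site
                return site                # (3) redirect the client
        return None


def open_file(redirector: RegionalRedirector, lfn: str) -> str:
    site = redirector.locate(lfn)          # (1) open at the regional redirector
    if site is None:
        raise FileNotFoundError(lfn)
    # (4) and (5): the client contacts the site and starts reading.
    return f"reading {lfn} from {site.name}"


if __name__ == "__main__":
    site_a = Site("Site A", {"/store/example/file.root"})
    site_b = Site("Site B", set())
    print(open_file(RegionalRedirector([site_b, site_a]), "/store/example/file.root"))
```

A second request for the same file is answered from the redirector's cache without querying the sites again.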

Cross-region redirection

The image below shows the communication paths for a user application querying the regional redirector when the desired file is not within the region. This proceeds as in the previous case, except all local sites respond they do not have the file. Then, the regional redirector will contact the other regions (3); if the file location is not in cache, the other regional redirector will query its sites (4). In this example, the user is redirected to Site C (5) and successfully opens the file (6 and 7).

Regional Xrootd Regional Redirect.png
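
The cross-region case can be sketched in the same spirit. The model below is a simplification under stated assumptions: region and site names are made up, each region keeps its own location cache, and the home region simply tries the other regions in turn once its own sites have all answered that they do not have the file.

```python
from typing import Dict, Optional, Set, Tuple

# region name -> site name -> set of LFNs held at that site (all hypothetical)
Regions = Dict[str, Dict[str, Set[str]]]
# region name -> LFN -> site name (one location cache per regional redirector)
Caches = Dict[str, Dict[str, str]]


def query_region(regions: Regions, caches: Caches, region: str, lfn: str) -> Optional[str]:
    """Ask one regional redirector for a file; return the site name or None."""
    cache = caches.setdefault(region, {})
    if lfn in cache:                       # location already cached
        return cache[lfn]
    for site, files in regions[region].items():   # query that region's sites
        if lfn in files:
            cache[lfn] = site
            return site
    return None


def locate(regions: Regions, caches: Caches, home: str, lfn: str) -> Optional[Tuple[str, str]]:
    """Return (region, site) for lfn, preferring the client's home region."""
    order = [home] + [r for r in regions if r != home]
    for region in order:
        site = query_region(regions, caches, region, lfn)
        if site is not None:
            return region, site
    return None


if __name__ == "__main__":
    regions = {
        "Region 1": {"Site A": set(), "Site B": set()},
        "Region 2": {"Site C": {"/store/example/file.root"}},
    }
    print(locate(regions, {}, "Region 1", "/store/example/file.root"))
```

Because the home region is always tried first, a file available locally is never served from another region in this model.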

Fallback Access

In the prototype, most sites won't use Xrootd as their primary access method; instead, they will use it primarily as a fallback. The image below shows how the file access would work for such a site:

FallbackAccess.png
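
The fallback idea can be illustrated with a generic "try each access method in order" helper. This is a minimal sketch, not the mechanism CMSSW uses: the helper `open_with_fallback`, the local storage path, and the `redirector.example` hostname are all hypothetical, and the federation "open" only returns the URL it would use.

```python
from typing import Callable, Sequence


def open_with_fallback(lfn: str, openers: Sequence[Callable[[str], object]]):
    """Try each opener in order; return the first successful result."""
    errors = []
    for opener in openers:
        try:
            return opener(lfn)
        except OSError as exc:     # I/O failure: remember it, try the next method
            errors.append(exc)
    raise OSError(f"all access methods failed for {lfn}: {errors}")


def open_local(lfn: str):
    # Hypothetical local storage mount; raises FileNotFoundError if absent.
    return open("/hypothetical/local/storage" + lfn, "rb")


def open_via_federation(lfn: str) -> str:
    # Stand-in for a real XRootD open: just return the URL that would be used.
    return f"root://redirector.example/{lfn}"


if __name__ == "__main__":
    # The local open fails, so the federation opener is used instead.
    print(open_with_fallback("/store/example/file.root",
                             [open_local, open_via_federation]))
```

Only when every method in the list fails does the error reach the caller, which matches the reliability goal stated in the introduction.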

Other notes related to AAA

Global and Regional Redirectors

The service availability monitoring for the global and regional redirectors

Participating Sites

The list of all sites participating in (subscribed to) AAA is monitored here.

Tests and Issues

Scale tests

Historical records

  • The tests we performed for the Xrootd Demonstrator (an initiative dating back to 2010) are documented on this page.
  • We are also trying to document all the issues we observe with the xrootd-based system here: CmsXrootdIssues.
  • We record the CMSSW/ROOT I/O improvements needed here: CmsRootIoIssues.

Presentations and Workshops

Topic attachments

Attachment | Type | Revision | Size | Date | Who | Comment
AAA_DPM-Federica.pdf | pdf | r1 | 799.6 K | 2015-02-25 15:28 | MericTaze |
FallbackAccess.png | png | r1 | 27.5 K | 2010-07-26 22:30 | BrianBockelman |
GlobalAccess.png | png | r1 | 80.5 K | 2010-07-26 21:56 | BrianBockelman |
Regional_Xrootd.png | png | r1 | 51.7 K | 2011-02-11 19:49 | BrianBockelman | Diagram of xrootd usage when file is in local region
Regional_Xrootd_Regional_Redirect.png | png | r1 | 56.2 K | 2011-02-11 19:49 | BrianBockelman | Diagram of xrootd usage when file is not in local region
ken-CHEP2013-paper.pdf | pdf | r1 | 1233.4 K | 2015-02-25 15:28 | MericTaze |
ken-aaa_xrootd_150127.pdf | pdf | r1 | 2666.6 K | 2015-02-25 15:28 | MericTaze |
ken-osg-ahm-2014-aaa_140410.pdf | pdf | r1 | 2616.7 K | 2015-02-25 15:28 | MericTaze |
matevz-osg-ahm-2014-BeyondIoPatterns-FS14.pdf | pdf | r1 | 3816.2 K | 2015-02-25 15:28 | MericTaze |
Topic revision: r73 - 2022-06-14 - BockjooKim
 