Xrootd Production and Integration

The xrootd data access system is oriented toward the end user, so our priorities are a satisfactory end-user experience and a stable system. To that end, we partition sites into production and integration infrastructures. The integration infrastructure lets us send controlled tests to a participating site to help it stabilize operations without exposing it to chaotic user load. The production infrastructure should be stable, allow all CMS users, and export any file registered to the site in PhEDEx.

Once a site feels it is ready (for example, it passes the production checklist below) and it meets our production criteria, it will be allowed into the production infrastructure.

Xrootd Production Checklist

System administrators are expected to go through the following checklist to verify their site's xrootd installation.

  • Verify your xrootd server is not blocked by a firewall. A user should be able to access it from the public internet.
  • Verify the xrootd server requires GSI-authentication for offsite clients. Any CMS user with a grid certificate should be able to read official CMS files.
  • Try opening and downloading a few files using ROOT and the xrdcp client, respectively.
  • Try writing a file via xrootd and make sure it fails. We suggest that xrootd remain read-only (although sites can certainly ignore this suggestion).
  • Any file registered to your site in PhEDEx should be available through Xrootd.
  • Verify Xrootd exports the CMS namespace, not the site namespace. That is, file names should start with /store when read through Xrootd.
  • Set up a mapping for the test namespace. Have /store/test/xrootd/$SITENAME/store/(.*) map to /store/$1. This allows us to query the redirector for a specific file and get a response only from hosts at $SITENAME, which is an important characteristic for our testing.
  • Join your site to a regional redirector within 50ms RTT of your xrootd server.
    • xrootd.unl.edu for US sites
    • xrootd.ba.infn.it for EU sites
    • We are looking for volunteers to run an Asian redirector
  • Read the documentation on throttling Xrootd. Consider the impact of a malicious (or uneducated) user. Do you feel you have adequate controls for the following dimensions: site bandwidth utilized, IOPS, and namespace queries?
  • Read the documentation on monitoring. You will want to keep an eye on the monitoring described there, and may want to integrate it into your own site monitoring.
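
The test-namespace mapping in the checklist can be illustrated with a short Python sketch. This is not the xrootd configuration itself; it only shows what the rule /store/test/xrootd/$SITENAME/store/(.*) to /store/$1 does to a path. The function names and the site name T2_XX_Example are hypothetical and chosen for illustration.

```python
import re
from typing import Optional

# Prefix of the test namespace, taken from the checklist's mapping rule.
TEST_PREFIX = "/store/test/xrootd"


def to_test_path(site_name: str, lfn: str) -> str:
    """Wrap a CMS file name (starting with /store/) in a site's test namespace."""
    if not lfn.startswith("/store/"):
        raise ValueError(f"expected a path starting with /store/, got {lfn!r}")
    return f"{TEST_PREFIX}/{site_name}{lfn}"


def from_test_path(site_name: str, path: str) -> Optional[str]:
    """Apply the mapping for one site.

    Returns the /store/... name, or None when the path is not in this
    site's test namespace (so only the named site would answer for it).
    """
    pattern = re.compile(
        rf"^{re.escape(TEST_PREFIX)}/{re.escape(site_name)}/store/(.*)$"
    )
    match = pattern.match(path)
    return f"/store/{match.group(1)}" if match else None


if __name__ == "__main__":
    test_path = to_test_path("T2_XX_Example", "/store/mc/sample.root")
    print(test_path)
    print(from_test_path("T2_XX_Example", test_path))
    print(from_test_path("T2_YY_Other", test_path))
```

Because a path in another site's test namespace maps to nothing, a redirector query for the test path should be answered only by hosts at the named site, which is the property the checklist relies on.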

Production Criteria

We are looking for the following criteria when evaluating whether a site is ready for production:

  1. At least three xrootd hosts are required at T2 sites and at least two at T3 sites.
    • For T2s, the expected load should require two servers to handle; T3 sites should need one server. The additional server provides redundancy.
    • This requirement is so an end user can expect reasonable performance when accessing official CMS data.
    • Each server ought to be equivalent to a 4-core machine with at least 8 GB of RAM.
    • Fewer servers can be used for sites serving only a unique namespace, for example a T3 with a private namespace that does not conflict with the official CMS namespace.
  2. 95% availability in the redirector as measured by heartbeat tests.
    • The heartbeat tests frequently (approximately every 10 minutes) download a few bytes from a single, known file.
    • This will be done directly against each known server (not through the redirector), and, in a separate test, via the integration redirector.
  3. 95% success rate in the SAM tests.
    • A cmsRun job will be run approximately hourly on your site via SAM/Nagios. It will access root://cms-xrd-global.cern.ch/store/test/xrootd/$SITENAME/store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root to look for your site in the redirector.
    • Success rate is the percentage of successful jobs run in the time period under consideration.
  4. "Good standing" in the site status board. What "good standing" means is left up to the site; we ask that a site run xrootd in production only if it feels all the other fundamental aspects of running a site are covered.
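
As a rough illustration of how criteria 1 to 3 combine, the sketch below checks a site's host count and computes the two percentages from lists of pass/fail results. It is not the evaluation tooling itself. The names SiteReport and failed_criteria are hypothetical, and two choices are assumptions: the 95% threshold is treated as inclusive, and a period with no results counts as 0%. Criterion 4 and the exception for sites serving only a unique namespace are not modeled.

```python
from dataclasses import dataclass
from typing import List

MIN_HOSTS = {"T2": 3, "T3": 2}  # criterion 1: load-bearing servers plus one spare
THRESHOLD_PERCENT = 95.0        # criteria 2 and 3 (assumed inclusive)


@dataclass
class SiteReport:
    tier: str                  # "T2" or "T3"
    hosts: int                 # number of xrootd hosts at the site
    heartbeat_ok: List[bool]   # one entry per heartbeat test in the period
    sam_ok: List[bool]         # one entry per SAM job in the period


def percent_ok(results: List[bool]) -> float:
    """Percentage of successful results; an empty period counts as 0% (assumption)."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)


def failed_criteria(report: SiteReport) -> List[str]:
    """Return the names of the criteria the site does not currently meet."""
    failures = []
    if report.hosts < MIN_HOSTS[report.tier]:
        failures.append("hosts")
    if percent_ok(report.heartbeat_ok) < THRESHOLD_PERCENT:
        failures.append("heartbeat")
    if percent_ok(report.sam_ok) < THRESHOLD_PERCENT:
        failures.append("sam")
    return failures


if __name__ == "__main__":
    # Hypothetical T2 with 3 hosts, 19 of 20 heartbeats OK, all SAM jobs OK.
    report = SiteReport("T2", 3, [True] * 19 + [False], [True] * 20)
    print(failed_criteria(report))
```

Returning the list of unmet criteria, rather than a single yes/no, makes it easier for a site to see which requirement is holding it back.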

All production sites must also be in the integration infrastructure, as we will periodically upgrade the monitoring and test new code and configuration there.


Topic revision: r6 - 2014-04-17 - unknown