Difference: XrootdProductionChecklist (3 vs. 4)

Revision 4 - 2012-09-06 - BrianBockelman

Line: 1 to 1
 
META TOPICPARENT name="CmsXrootdArchitecture"

Xrootd Production and Integration

Line: 17 to 17
 
  • Verify that Xrootd exports the CMS namespace, not the site namespace. That is, file names should start with /store when read through Xrootd.
  • Set up a mapping for the test namespace: /store/test/xrootd/$SITENAME/store/(.*) should map to /store/$1. This allows us to query the redirector for a specific file and get a response only from hosts at $SITENAME, which is important for our testing.
  • Join your site to a regional redirector within 50ms RTT of your xrootd server.
Added:
    • xrootd.unl.edu for US sites
    • xrootd.ba.infn.it for EU sites
    • We are looking for volunteers to run an Asian redirector
 
  • Read the documentation on throttling Xrootd. Consider the impact of a malicious (or uneducated) user. Do you have adequate controls over the following dimensions: site bandwidth used, IOPS, and namespace queries?
  • Read the documentation on monitoring. You will want to keep an eye on these metrics, and may want to integrate them into your own site monitoring.
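The test-namespace rule itself lives in your site's Xrootd configuration, not on this page. As a rough illustration only, the Python sketch below expresses the intended rewrite as a regular expression so you can check which paths should resolve where. The function name `map_test_path` is hypothetical, and the example site name is simply one that appears on this page.

```python
import re


def map_test_path(lfn: str, site_name: str) -> str:
    """Hypothetical helper mirroring the checklist rule
    /store/test/xrootd/$SITENAME/store/(.*) -> /store/$1.

    Paths outside this site's test namespace are returned unchanged.
    """
    pattern = re.compile(
        r"^/store/test/xrootd/" + re.escape(site_name) + r"/store/(.*)$"
    )
    match = pattern.match(lfn)
    if match is None:
        return lfn
    return "/store/" + match.group(1)
```

For example, `map_test_path("/store/test/xrootd/T2_US_Nebraska/store/mc/foo.root", "T2_US_Nebraska")` yields `/store/mc/foo.root`, while the same path under another site's name is left alone. That asymmetry is what lets a redirector query for a test path be answered only by hosts at the named site.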

Production Criteria

Changed:
From (revision 3):
Your site must pass the following criteria in order to be listed as production. These will be re-evaluated monthly at all production sites in order to maintain a minimum quality-of-service level in production. Eventually, these items will be integrated into the normal site status board.
To (revision 4):
We are looking for the following criteria when evaluating whether a site is ready for production:
 
  1. T2 sites require at least three xrootd hosts; T3 sites require at least two.
    • For T2s, the expected load should require two servers; for T3s, one server. The additional host provides redundancy.
Line: 32 to 35
 
  1. 95% availability in the redirector as measured by heartbeat tests.
    • The heartbeat tests frequently (approximately every 10 minutes) download a few bytes from a single, known file.
    • This will be done directly against each known server (not through the redirector), and, in a separate test, via the integration redirector.
Changed:
From (revision 3):
  1. 95% availability in the random file tests.
    • The random file test will attempt to download a random file registered at the site in PhEDEx approximately once an hour.
    • This will be done via the regional redirector.
  2. 95% success rate in xrootd JobRobot.
    • A CRAB task will be run approximately daily at one site in the region (T2_US_Nebraska for the US) on the JobRobot test utilizing files from the remote site via the redirector.
    • Success rate is the percentage of successful jobs run that day.
To (revision 4):
  1. 95% success rate in the SAM tests.
    • A cmsRun job will be run approximately hourly at your site via SAM/Nagios. It will access root://cms-xrd-global.cern.ch/store/test/xrootd/$SITENAME/store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root to check that the redirector can locate your site.
    • Success rate is the percentage of successful jobs run in the time period under consideration.
  2. "Good standing" in the site status board. What "good standing" means is left up to the site; we ask that the site run xrootd in production only if it feels all the other fundamental aspects of running a site are covered.
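To show how the numeric criteria combine, here is a minimal Python sketch, assuming per-period counts of heartbeat probes and SAM jobs are already available from monitoring. `SiteRecord`, `meets_rate`, `production_failures`, and all field names are hypothetical and not part of any CMS tool. "Good standing" is left to the site and is not modeled, and treating a period with no measurements as failing is a choice made only in this sketch.

```python
from dataclasses import dataclass

# Thresholds taken from the production criteria above.
MIN_HOSTS = {"T2": 3, "T3": 2}
MIN_PERCENT = 95


@dataclass
class SiteRecord:
    """Hypothetical per-period summary for one site."""
    tier: str             # "T2" or "T3"
    xrootd_hosts: int     # xrootd servers the site runs
    heartbeat_ok: int     # successful heartbeat probes
    heartbeat_total: int  # heartbeat probes attempted
    sam_ok: int           # successful SAM cmsRun jobs
    sam_total: int        # SAM jobs run in the period


def meets_rate(ok: int, total: int) -> bool:
    """True if ok/total >= 95%, compared in integers; no data counts as failing."""
    return total > 0 and ok * 100 >= MIN_PERCENT * total


def production_failures(site: SiteRecord) -> list[str]:
    """Return the criteria this site misses (an empty list means all pass)."""
    failures = []
    needed = MIN_HOSTS[site.tier]
    if site.xrootd_hosts < needed:
        failures.append(f"needs at least {needed} xrootd hosts")
    if not meets_rate(site.heartbeat_ok, site.heartbeat_total):
        failures.append("heartbeat availability below 95% or not measured")
    if not meets_rate(site.sam_ok, site.sam_total):
        failures.append("SAM success rate below 95% or not measured")
    return failures
```

With heartbeats roughly every 10 minutes (about 144 per day) and hourly SAM jobs (24 per day), `SiteRecord("T2", 3, 140, 144, 23, 24)` passes every check, while a day with only 22 of 24 successful SAM jobs (about 91.7%) falls below the threshold. Comparing `ok * 100` against `95 * total` avoids floating-point rounding right at the 95% boundary.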
 
Deleted:
If one of these criteria is not currently measured (for example, we estimate the JobRobot test won't be available until May 1), then the site is excused from that criterion.

All production sites must also be joined to the integration redirector in order for the monitoring to function.

Added:
All production sites must also be joined to the integration redirector, as we will periodically upgrade the monitoring and test new code and configuration there.
 