Tier-2 to Tier-2 Data Transfer Link Commissioning Procedures

Motivation

To be used in the Production instance of PhEDEx, a data transfer link must first go through a commissioning procedure in the Debug instance using the LoadTest infrastructure. New CMS Tier-2 sites should commission PhEDEx data transfer links to and from all Tier-1 and all other Tier-2 sites in CMS. Since there are ~50 Tier-2 sites, or ~100 transfer links from a new Tier-2 to and from all other Tier-2 sites, this process will go much more quickly if the data administrators of the new site can carry out much of the day-to-day commissioning tasks. This twiki outlines the procedures, metrics and usual challenges in commissioning Tier-2 to Tier-2 links, although much of it is applicable for Tier-1 link commissioning too. Documentation for commissioning links to and from Tier-1 sites can be found where?.

PhEDEx Site Commissioning

Instructions for setting up PhEDEx at a new site can be found here. Before commissioning Tier-2 to Tier-2 links, the site should have commissioned all the Tier-1 links to and from their site.

Setting up the Tier-2 to Tier-2 LoadTest

Injections

Unlike Production instance data transfers, the LoadTest injects new replicas at pre-set rates. In general, a Tier-2 site can only set the injection rates for links originating at its own site. As a baseline, injection rates should be set to 0.03 MB/s, which corresponds to at least one file per day per link.

  • You will need to ask central operations to set the injection rates to the baseline for links to your site.
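
The baseline rate can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the 2.5 GB LoadTest file size quoted in the metrics section and 1 GB = 1024 MB, and the function name is hypothetical rather than part of PhEDEx.

```python
SECONDS_PER_DAY = 24 * 60 * 60
FILE_SIZE_MB = 2.5 * 1024  # assumption: 2.5 GB per file, 1 GB = 1024 MB


def rate_for_files_per_day(n_files, file_size_mb=FILE_SIZE_MB):
    """Return the injection rate in MB/s that yields n_files per day."""
    return n_files * file_size_mb / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = rate_for_files_per_day(1)
    # One file per day needs just under 0.03 MB/s, so the baseline
    # of 0.03 MB/s gives at least one file per day per link.
    print(f"{rate:.4f} MB/s")
```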

Link to the PhEDEx injection page.

Subscriptions

Subscription requests should be made in PhEDEx for LoadTest samples over every Tier-2 to Tier-2 link in both directions. This is best accomplished with:

  • A subscription to the commissioning Tier-2 to download the LoadTest samples from every other Tier-2.
  • A second subscription to every other Tier-2 to download the LoadTest sample from the commissioning Tier-2.
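
As a rough illustration of what the two subscriptions cover, the following sketch enumerates the links in each direction. The site names are hypothetical placeholders and the function is not a PhEDEx API; it only shows why ~50 other Tier-2 sites give ~100 links.

```python
def loadtest_links(new_site, other_sites):
    """Return (inbound, outbound) lists of (source, destination) links."""
    # First subscription: the commissioning site downloads from every other Tier-2.
    inbound = [(src, new_site) for src in other_sites]
    # Second subscription: every other Tier-2 downloads from the commissioning site.
    outbound = [(new_site, dst) for dst in other_sites]
    return inbound, outbound


if __name__ == "__main__":
    # Hypothetical site names, for illustration only.
    others = [f"T2_XX_Site{i:02d}" for i in range(50)]
    inbound, outbound = loadtest_links("T2_XX_New", others)
    print(len(inbound) + len(outbound), "links to commission")
```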

Metrics for Commissioning

To become a commissioned link in the Production instance of PhEDEx, allowing real Production data transfers between sites, a link must pass a commissioning metric. The metric for any data transfer link from a Tier-2 site (including Tier-2 to Tier-2 links) is:

  • 5 MB/s average transfer rate in 24h, i.e. transfer 422 GB (typically 169 files of 2.5 GB each) in less than 24h.

For all links from Tier-1 sites, the metric is 4 times higher, or 20 MB/s average over 24h (approximately 680 files of 2.5 GB each in less than 24h).
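
The volume and file counts follow from the rate. A minimal sketch of the arithmetic, assuming 1 GB = 1024 MB (which reproduces the 422 GB and 169 files quoted for Tier-2 links); the function names are hypothetical:

```python
import math

MB_PER_GB = 1024  # assumption: binary convention, matches the 422 GB figure


def required_volume_gb(rate_mb_per_s, hours=24):
    """Volume in GB transferred at a constant rate over the given time."""
    return rate_mb_per_s * hours * 3600 / MB_PER_GB


def required_files(rate_mb_per_s, file_size_gb=2.5, hours=24):
    """Smallest whole number of files that reaches the required volume."""
    return math.ceil(required_volume_gb(rate_mb_per_s, hours) / file_size_gb)


if __name__ == "__main__":
    for label, rate in (("from Tier-2", 5), ("from Tier-1", 20)):
        print(label, required_volume_gb(rate), "GB,", required_files(rate), "files")
```

Under this assumption the Tier-1 figure comes out at 675 files, in line with the approximate value quoted above.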

You can check the progress of your commissioning activities on the PhEDEx links page. There you can see which links to and from a site are commissioned.

  • The links that do not exist yet are marked in white. Please ask central operations to create these links before attempting to commission them.
  • The links that still need to be commissioned are marked with "Link is deactivated" (purple).
  • The links that are already commissioned are marked in other colors. Green means commissioned and running in the Production instance.

Procedures

To begin the procedure, a site initiates commissioning of the links from its own site:

  • Choose a link to another Tier-2 site that has itself successfully transferred files to other sites (with more than 90% success) in the past day or so (see the PhEDEx transfer activity page).
  • Inject ~200 files on this link from the Injection page.
  • Check back in several hours and see if the ~200 files have successfully transferred.

If the files transferred successfully in less than 24h, the link will be automatically activated in the Production instance at 23h30 CET. Then choose another link and repeat the procedure.
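
The selection and pass criteria in this procedure can be sketched as follows. This is an illustration under stated assumptions, not the check PhEDEx itself performs: the success and failure counts are assumed to be read by hand from the transfer activity page, the site names are hypothetical, and files are taken to be 2.5 GB with 1 GB = 1024 MB.

```python
FILE_SIZE_GB = 2.5  # LoadTest file size quoted in the metric
MB_PER_GB = 1024    # assumption: binary convention


def candidate_partners(stats, min_success=0.90):
    """Pick sites whose recent success fraction exceeds min_success.

    stats maps a site name to (successful, failed) transfer counts.
    """
    good = []
    for site, (ok, failed) in stats.items():
        total = ok + failed
        if total > 0 and ok / total > min_success:
            good.append(site)
    return sorted(good)


def meets_metric(files_done, elapsed_hours, target_mb_per_s=5.0):
    """True if enough data for a 24h average of target_mb_per_s arrived in under 24h."""
    required_mb = target_mb_per_s * 24 * 3600
    transferred_mb = files_done * FILE_SIZE_GB * MB_PER_GB
    return elapsed_hours < 24 and transferred_mb >= required_mb


if __name__ == "__main__":
    stats = {"T2_AA_Alpha": (95, 5), "T2_BB_Beta": (80, 20)}  # hypothetical
    print(candidate_partners(stats))
    print(meets_metric(files_done=200, elapsed_hours=20))
```

Under these assumptions, ~200 files of 2.5 GB leave some margin over the 169 files the metric requires.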

Once all the Tier-2 links from your site are commissioned, please contact central operations to initiate injections to commission links to your site.

Problems

If you see transfer errors, as a site administrator you are in a unique position to debug the problems quickly, rather than waiting for help from experts who may be several timezones away. Broadly, data transfer problems fall into two groups: general issues that affect all data transfers to or from a site, and issues that affect only one or very few links.

General failure to read from or write to your site:

  • Failure of your SRM
  • Network interruption

General failure to read from your site:

  • Authentication issues for remote CMS users
  • Network interruption

General failure to write to your site:

  • Expired proxy for your PhEDEx agent

See a pattern of failure for several links?

  • What do the links have in common?
    • Geography? (Look to networking)
    • Type of remote storage element/SRM? (Look to the configuration of your agents? Or to similar failures at sites like yours?)
    • Shared FTS Server? (Look to the configuration of the FTS channels open to you)
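
One way to look for such a pattern is to tabulate what the failing links share. The sketch below assumes you have noted, by hand, a few attributes of each failing link; the attribute names, site names and values are hypothetical and are not read from PhEDEx.

```python
from collections import Counter


def common_factors(failing_links, attributes=("region", "storage", "fts_server")):
    """For each attribute, return its most frequent value and the fraction of failing links sharing it."""
    result = {}
    if not failing_links:
        return result
    for attr in attributes:
        counts = Counter(link[attr] for link in failing_links)
        value, n = counts.most_common(1)[0]
        result[attr] = (value, n / len(failing_links))
    return result


if __name__ == "__main__":
    # Hypothetical notes on three failing links.
    failing = [
        {"site": "T2_AA_Alpha", "region": "EU", "storage": "dCache", "fts_server": "fts-a"},
        {"site": "T2_BB_Beta", "region": "US", "storage": "DPM", "fts_server": "fts-a"},
        {"site": "T2_CC_Gamma", "region": "EU", "storage": "StoRM", "fts_server": "fts-a"},
    ]
    for attr, (value, share) in common_factors(failing).items():
        print(f"{attr}: {value} shared by {share:.0%} of failing links")
```

An attribute shared by all or nearly all failing links (the FTS server in this made-up example) is the first place to look.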

-- JamesLetts - 12-Apr-2012

Topic revision: r7 - 2012-04-14 - JamesLetts
 