Data Transfer Link Commissioning Procedures

Motivation

To be used in the Production instance of PhEDEx, a data transfer link must first go through a commissioning procedure in the Debug instance using the LoadTest infrastructure.

  • New CMS Tier-1 sites should commission PhEDEx data transfer links to and from T0, all other Tier-1 and all Tier-2 sites in CMS.
  • New CMS Tier-2 sites should commission PhEDEx data transfer links to and from T0, all Tier-1 and all other Tier-2 sites in CMS.
With 8 Tier-1 sites and ~50 Tier-2 sites, a new site has ~100 transfer links to and from all other sites to commission. The process goes much more quickly if the data administrators of the new site carry out most of the day-to-day commissioning tasks themselves. This TWiki outlines the procedures, metrics and usual challenges in commissioning links.

PhEDEx Site Commissioning

Intended to: Site Admins

Instructions for setting up PhEDEx at a new site can be found here. Before commissioning Tier-2 to Tier-2 links, the site should have commissioned all the Tier-1 links to and from their site.

PhEDEx Links Creation

Intended to: Central Ops

Before links can be commissioned, they must be created in the Prod/Debug/Dev instances. Instructions for creating links can be found here. When creating links, make sure each link is given the proper link weight. A description of link weights can be found here.

  • By default, LinkNew will create the links in an 'active' state
  • For Prod links, normally we should disable the links after creation, and then they will be reenabled automatically after commissioning
  • To disable links, use PhEDEx utility: DDTLinksManage. Run the script like this:
    • PHEDEX/Utilities/DDTLinksManage -db ~/phedex/info/DBParam:Prod/OPS linklist.txt
    • linklist.txt is a plain text file with one link per line. For example: T1_IT_CNAF_Disk T2_US_MIT disable
    • Each direction of a bidirectional link should be disabled separately, with its own line in the file
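
As an illustration of the link list file described above, the following is a minimal Python sketch that writes one line per direction for each pair of sites. The line format ("source destination disable") is inferred from the single example given above, and the helper names link_lines and write_link_list are hypothetical, not part of PhEDEx.

```python
from pathlib import Path

def link_lines(site_a, site_b):
    """Return one 'disable' line per direction for a pair of sites.

    Assumption: each line is 'SOURCE DESTINATION disable', as in the
    single example shown in this page.
    """
    return [f"{site_a} {site_b} disable", f"{site_b} {site_a} disable"]

def write_link_list(path, site_pairs):
    """Write the link list file, covering both directions of every pair."""
    lines = []
    for site_a, site_b in site_pairs:
        lines.extend(link_lines(site_a, site_b))
    Path(path).write_text("\n".join(lines) + "\n")
    return lines

if __name__ == "__main__":
    for line in link_lines("T1_IT_CNAF_Disk", "T2_US_MIT"):
        print(line)
```

The resulting file would then be passed to DDTLinksManage as in the command shown above.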

Setting up the LoadTest

Intended to: Site Admins and Central Ops

Injections

Please refer to the Creating and Injecting LoadTest07 Samples web page.

Unlike data transfers in the Production instance, the LoadTest injects new replicas at pre-set rates. In general, a site can only set the injection rates for links from its own site. As a baseline, injection rates should be set to 0.1 MB/s, which corresponds to at least one file per day per link.

  • For links to your site, you will need to ask central operations to set the injection rates to the baseline.

Link to the PhEDEx injection page.

Subscriptions

Subscription requests should be made in PhEDEx for LoadTest samples over every independent link in both directions. This is best accomplished with:

  • A subscription for the commissioning site to download the LoadTest samples from every other site.
  • A second subscription for every other site to download the LoadTest sample from the commissioning site.
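
To make the bookkeeping concrete, here is a small Python sketch that enumerates the source/destination pairs covered by the two subscriptions. It only lists the pairs and does not talk to PhEDEx; the names LinkRequest and loadtest_requests are hypothetical, and T2_XX_Other is a placeholder site name.

```python
from typing import List, NamedTuple, Tuple

class LinkRequest(NamedTuple):
    source: str       # site the LoadTest sample is read from
    destination: str  # site that subscribes to (downloads) the sample

def loadtest_requests(
    commissioning_site: str, other_sites: List[str]
) -> Tuple[List[LinkRequest], List[LinkRequest]]:
    """Return (inbound, outbound) pairs for a commissioning site.

    inbound:  the commissioning site downloads from every other site.
    outbound: every other site downloads from the commissioning site.
    """
    inbound = [LinkRequest(s, commissioning_site) for s in other_sites]
    outbound = [LinkRequest(commissioning_site, s) for s in other_sites]
    return inbound, outbound

if __name__ == "__main__":
    # T2_XX_Other is a placeholder, not a real site.
    inbound, outbound = loadtest_requests(
        "T2_US_MIT", ["T1_IT_CNAF_Disk", "T2_XX_Other"]
    )
    for req in inbound + outbound:
        print(f"{req.source} -> {req.destination}")
```

With N other sites this gives 2N pairs, one per independent link direction.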

Commissioning Steps

Intended to: Site Admins and Central Ops

  1. Go to the LoadTest Injection Page.
  2. Select "Show Options" --> "Create Injection" from Source_Site to Destination_Site at 0.1 MB/s to start. This should create an "Injection Dataset" in the LoadTest Injection Page. Currently sites can only create LoadTest streams FROM their site, not TO their site. To create a stream TO your site, ask the transfer team for help.
  3. Go to "Requests" --> "Create Request" --> "Transfer Request" in Debug instance, and create a normal transfer request for the "Injection Dataset" created in the previous step to the Destination_Site. Create Transfer Request. (Remember: request the "Injection Dataset", not the "Source Dataset", and for T1 tapes use Buffer not MSS).
  4. When this request is approved by Destination_Site admins (or by central admins to make things faster), you should see transfers on Quality Plots Link in Debug. If quality is RED, we need to debug the errors and fix them with the site admins. If quality is GREEN we are ready for commissioning.
  5. Go to the LoadTest Injection Page and use "Actions" --> "One Time Injections" to inject 1000 files into the "Injection Dataset". (Note: inject 1000 files for a T1-->T2 link, but only 300 files for a T2-->T1 link.)
  6. The next day, check the rate here. If the files were transferred in less than 24 hours, the rate is sufficient for commissioning: if the average rate is >20 MB/s for T1-->T2 links or >5 MB/s for T2-->T1 links, the link is commissioned and will be activated automatically in the Production instance around 23:00 CERN time (it should become green): Production Links. Then move on to commissioning the next link. Note: if everything is OK, you can commission ~3-4 links per day.
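
For a quick manual estimate of whether a link is likely to pass step 6, the rate check can be sketched in Python as below. The actual decision is made automatically by PhEDEx; this sketch only mirrors the thresholds quoted above. It assumes the tier can be read from the site name prefix (as in T1_IT_CNAF_Disk or T2_US_MIT) and that the average is simply the transferred volume divided by the elapsed time. All function names are hypothetical.

```python
# Minimum average rates quoted in step 6, in MB/s, keyed by source tier.
MIN_RATE_MB_S = {"T1": 20.0, "T2": 5.0}

def source_tier(site_name):
    """Assumption: the tier is the prefix before the first underscore."""
    return site_name.split("_", 1)[0]

def average_rate_mb_s(transferred_mb, elapsed_s):
    """Average rate as transferred volume divided by elapsed time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return transferred_mb / elapsed_s

def passes_rate_check(source_site, transferred_mb, elapsed_s):
    """True if the average rate exceeds the threshold for the source tier."""
    threshold = MIN_RATE_MB_S[source_tier(source_site)]
    return average_rate_mb_s(transferred_mb, elapsed_s) > threshold

if __name__ == "__main__":
    day = 24 * 3600
    print(passes_rate_check("T1_IT_CNAF_Disk", 2_000_000, day))
    print(passes_rate_check("T2_US_MIT", 400_000, day))
```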

Metrics for Commissioning

Intended to: Site Admins and Central Ops

To become a commissioned link in the Production instance of PhEDEx, allowing real Production data transfers between sites, a link must pass a commissioning metric.

  • The metric for any data transfer link from a Tier-1 site (Tier-1 to Tier-2 and Tier-3 links) is: 20MB/s average transfer rate over 24h (approximately 680 files of 2.5GB each in less than 24h)
  • The metric for any data transfer link from a Tier-2 site (Tier-2 to Tier-1 and Tier-3 links) is: 5MB/s average transfer rate over 24h, i.e. transfer 422 GB (typically 169 files of 2.5GB each) in less than 24h.
  • The metric for any data transfer link from a Tier-3 site (Tier-3 to Tier-1 and Tier-2 links) is: 5MB/s average transfer rate over 24h, i.e. transfer 422 GB (typically 169 files of 2.5GB each) in less than 24h.
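
The volumes and file counts in these metrics follow from the rate alone. The Python sketch below reproduces the arithmetic, assuming 1 GB = 1024 MB (an assumption chosen because it reproduces the 422 GB figure above) and the 2.5 GB file size quoted in the metrics. The function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 3600
MB_PER_GB = 1024      # assumption: binary convention, matches the 422 GB figure
FILE_SIZE_GB = 2.5    # file size quoted in the metrics

def required_volume_gb(rate_mb_s):
    """Volume to transfer in 24h to sustain the given average rate."""
    return rate_mb_s * SECONDS_PER_DAY / MB_PER_GB

def required_files(rate_mb_s, file_size_gb=FILE_SIZE_GB):
    """Smallest whole number of files that covers the required volume."""
    return math.ceil(required_volume_gb(rate_mb_s) / file_size_gb)

if __name__ == "__main__":
    for rate in (5, 20):
        print(rate, required_volume_gb(rate), required_files(rate))
```

Under these assumptions, 5 MB/s corresponds to 421.875 GB or 169 files, and 20 MB/s to 1687.5 GB or 675 files, in line with the approximately 680 files quoted for Tier-1 links.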

You can check the progress of your commissioning activities on the PhEDEx links page. There you can see the commissioned links to and from a site.

  • The links that do not exist yet are marked in white. Please ask the central operations to create the links before attempting commissioning.
  • The links that still need to be commissioned are marked with "Link is deactivated" (purple).
  • The links that are already commissioned are marked in other colors. Green means commissioned and running in the Production instance.

Procedures

Intended to: Site Admins

Sites can initiate commissioning of links from their site to begin the procedure:

  • Choose a link to another site that has itself successfully received files from other sites (with more than 90% success) in the past day or so (see the PhEDEx transfer activity page).
    • For links from a Tier-1 site, inject ~1000 files on this link from the Injection page.
    • For links from a Tier-2 site, inject ~300 files on this link from the Injection page.
  • Check back in several hours to see whether the injected files have transferred successfully.

If the files successfully transferred in less than 24h, the link will be automatically activated in the Production instance at 23h30 CET. Choose another link and commission it.

Once all the Tier-2 links from your site are commissioned, please contact central operations to initiate injections to commission links to your site.

Problems

Intended to: Site Admins

If you see transfer errors, as a site administrator you are in a unique position to debug the problems quickly, rather than waiting for help from experts who may be several timezones away. Broadly, data transfer problems fall into general issues that affect all data transfers to or from a site, and those that affect perhaps one or very few links.

General failure to read from or write to your site:

  • Failure of your SRM
  • Network interruption

General failure to read from your site:

  • Authentication issues for remote CMS users
  • Network interruption

General failure to write to your site:

  • Expired proxy for your PhEDEx agent

See a pattern of failure for several links?

  • What do the links have in common?
    • Geography? (Look to networking)
    • Type of remote storage element/SRM? (Look to the configuration of your agents? Or to similar failures at sites like yours?)

Please refer to the web page in order to troubleshoot transfers.

Topic revision: r10 - 2015-12-10 - unknown