Transfer Service Operations Procedure

This page describes the procedure to check the status of the WLCG transfer service. The scope is the transfers managed by the CERN-PROD FTS servers: the Tier-0 export and the CAF FTS service.

Both the FTS infrastructure at CERN and the actual status of the transfers should be checked.

The goal is to make sure that any issues affecting the service (whether FTS or SRM related) are reported to the correct site as soon as possible. The issues should also be tracked here to make sure that they are being followed up by the responsible party.

The daily logs should be recorded in TransferOperationsDailyLog.

FTS infrastructure tests

The purpose of these tests is to check that the basic FTS service infrastructure at CERN is operating correctly. A number of alarms and status checks are made. If any of these tests shows problems, report them to the FTS operations staff here, or via fts-support@cern.ch.

External tests

  • The main external FTS probe is made by SAM. It checks that the service is correctly registered in the information system and that it is responding to user requests (SAM FTS test: check that the CERN-PROD entries are both OK).

Internal tests

  • These are fabric-level tests that make sure that the daemons are running correctly and responding. If there have been any alarms, these are reported at the IT daily morning meeting (09.00) for the gridfts cluster. All of the alarms have operator procedures which should fix the problem quickly after it occurs. However, if a problem is still open (i.e. if the procedure did not work), it should be investigated and reported. The current tickets can be reviewed at Logger (search domain: FIO, cluster: gridfts).

FTS overall service monitoring

The purpose of this is to check the status of the overall transfer service, i.e. the service level that the experiment users actually receive. As well as depending on the reliability of the FTS servers, this also depends critically on the reliability of the SRMs that the FTS uses to make the transfers. Any failing transfers highlighted by the FTS or by other monitoring should be reported to the relevant site (using GGUS) so that the issue can be followed up.

  • Regularly check the Gridview overview: GRIDVIEW. This indicates which channels are successfully transferring data, but it does not show failed transfers. The daily report should include a summary of the current major transfers (i.e. which experiments, which sites, and the approximate average transfer rate).

  • Check the FTS daily report for failing sites: FTS report. This report is currently generated only once per day and covers the previous 24 hours of transfers. For any issues, try to understand which experiments are affected (e.g. all of them, or just one?).

  • To see in more detail what the current problems are, run the log-parsing utility to find the top reasons for failing transfers. The GGUS tickets that are submitted should focus on the major causes of error first.
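
The idea of ranking failure reasons so that tickets address the major causes first can be sketched as below. This is only an illustration and not the actual log-parsing utility: the line format ("<channel> <state> <reason>"), the channel names and the function name top_failure_reasons are all assumptions made for the example.

```python
from collections import Counter


def top_failure_reasons(lines, n=3):
    """Return the n most common failure reasons as (reason, count) pairs.

    Assumes a hypothetical line format: "<channel> <state> <reason text>".
    The real FTS transfer logs are laid out differently.
    """
    reasons = Counter()
    for line in lines:
        # Split into channel, state and the free-text reason.
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[1] == "FAILED":
            reasons[parts[2]] += 1
    return reasons.most_common(n)


# Made-up sample lines, for illustration only.
sample = [
    "CERN-SITEA FAILED SRM timeout on destination",
    "CERN-SITEB DONE -",
    "CERN-SITEA FAILED SRM timeout on destination",
    "CERN-SITEC FAILED source file not found",
]

for reason, count in top_failure_reasons(sample):
    print(f"{count:4d}  {reason}")
```

The most frequent reason comes first, which is the one a GGUS ticket should normally address first.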

How to log a problem

  1. Report infrastructure FTS problems to the CERN FTS administrators here or via fts-support@cern.ch. If no one is available, submit a GGUS ticket for tracking.
  2. Report site storage problems (at Tier-1 sites or at CERN Castor) using the GGUS submission portal. Please use the TransferOperatonsGgusTemplate.
  3. Track the GGUS ticket number (and FTS infrastructure issues) and status in the daily report.
  4. A summary of outstanding issues should be reported in the weekly report for the Joint Operations Meeting.
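
One possible way to keep track of ticket numbers and their status for the daily and weekly reports is sketched below. The record fields, the status labels ("open", "in progress", "solved") and the ticket identifiers are hypothetical placeholders; they are not taken from GGUS or from the TransferOperationsDailyLog.

```python
from dataclasses import dataclass


@dataclass
class TrackedIssue:
    ticket: str   # GGUS ticket number, or a local label for an FTS infrastructure issue
    site: str     # site the issue was reported to
    summary: str  # one-line description of the problem
    status: str   # assumed labels: "open", "in progress", "solved"


def outstanding(issues):
    """Return the issues that still need follow-up."""
    return [issue for issue in issues if issue.status != "solved"]


def format_report(issues):
    """Render the outstanding issues as plain-text lines for a report."""
    return "\n".join(
        f"{issue.ticket}  {issue.site}  [{issue.status}]  {issue.summary}"
        for issue in outstanding(issues)
    )


# Placeholder entries, for illustration only.
issues = [
    TrackedIssue("GGUS-0001", "SITEA", "SRM timeouts on destination", "open"),
    TrackedIssue("GGUS-0002", "SITEB", "source files not found", "solved"),
]

print(format_report(issues))
```

Only unsolved entries appear in the output, so the same list can feed both the daily report and the weekly summary of outstanding issues.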

How to follow up a problem

Regularly check the GGUS tickets that have been submitted to make sure that they are being followed up. Sometimes an administrator will ask for more information, typically the FTS transfer log.

For the moment, contact the FTS administrators at CERN for this.


Last edit: AlexanderUzhinskiy on 2007-05-21 - 09:38

Maintainer: GavinMcCance
