Regional Operation Centre instructions and procedures
[ not yet complete ]
Who is this page for
This page is for Regional Operation Centres (ROCs) who support the FTS service at their Tier-1.
It describes what needs to be done to correctly identify problems with the part of the overall File Transfer Service that your ROC is responsible for.
What is the FTS?
The File Transfer Service provides production data transfers between the Tier-0 (CERN), Tier-1 sites and Tier-2 sites. Confusingly, the software used to provide this service is also called File Transfer Service, or FTS. This document focuses on the overall service rather than on the software specifically.
FTS servers are installed at the Tier-0 and at each of the Tier-1 centres.
Every transfer that can be made (i.e. every source-destination pair) is the responsibility of one (and only one) FTS server. The transfers your Tier-1 site is responsible for are defined in the following document (https://uimon.cern.ch/twiki/pub/LCG/FtsServerInstall15/SC4FTSsetupplan.doc). To summarise:
- If it involves the Tier-0, then the Tier-0 (CERN) FTS is responsible.
- For Tier-1 to Tier-1 transfers, your Tier-1 FTS is responsible when you are the destination (you pull data to you).
- Transfers FROM any Tier-2 to your Tier-1 are handled by the FTS at your Tier-1 (you pull data to you).
- Transfers going TO a Tier-2 from anywhere are handled by the FTS server at that Tier-2's associated Tier-1 (you handle transfers for which one of your T2 sites is the destination).
The document above recommends which channels each Tier-1 site should set up in order to service the transfers for which it is responsible.
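The responsibility rules summarised above can be sketched as a small lookup. This is only an illustration: the site names, the `TIER1_SITES` set and the `T2_TO_T1` mapping are hypothetical, and the sketch assumes the rules are applied in the order listed, so that a transfer involving the Tier-0 always goes to the Tier-0 FTS.

```python
from typing import Dict, Set

TIER0 = "CERN"

# Hypothetical example topology, for illustration only.
TIER1_SITES: Set[str] = {"T1_A", "T1_B"}
T2_TO_T1: Dict[str, str] = {"T2_X": "T1_A", "T2_Y": "T1_B"}


def responsible_fts(source: str, destination: str,
                    tier1_sites: Set[str] = TIER1_SITES,
                    t2_to_t1: Dict[str, str] = T2_TO_T1) -> str:
    """Return the site whose FTS server handles source -> destination."""
    # Anything involving the Tier-0 is handled by the Tier-0 FTS
    # (assumed to take precedence over the other rules).
    if TIER0 in (source, destination):
        return TIER0
    # Transfers TO a Tier-2 are handled by that Tier-2's associated Tier-1.
    if destination in t2_to_t1:
        return t2_to_t1[destination]
    # Transfers TO a Tier-1 (from a Tier-1 or any Tier-2) are pulled
    # by the destination Tier-1.
    if destination in tier1_sites:
        return destination
    raise ValueError(f"unknown destination site: {destination}")


# Usage: which FTS server should be checked for a given transfer?
print(responsible_fts("T1_B", "T1_A"))   # destination Tier-1 pulls
print(responsible_fts("T1_B", "T2_X"))   # associated Tier-1 of T2_X
```

If a ticket concerns a transfer for which this lookup does not return your own Tier-1, the transfer was most likely handled by another site's FTS server.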
Why did my ROC get assigned a ticket?
Your ROC will most likely be assigned a GGUS ticket for one of the following reasons:
- An SFT has failed, indicating a problem with a test job that was handled by your FTS server
- A user has reported a problem with a job that was handled by your FTS server
- You have received a user or VO request to "add a channel / add a host / etc"
Problems in detail
A job failed
In the first two cases (a failed SFT test job or a user-reported job failure), the ticket may indicate a problem in the storage layer rather than a failure of your FTS server. In that situation the FTS is simply reporting the error to the user. For example:
I submitted a job, but it failed. It says:
Error in SRM get: the file you requested failed error XXXXXXX
You should attempt to decode the error message (some are easy, some are not), since it comes from the SRM that failed, not from the FTS directly. Often there is enough information to work out at which end of the transfer the error occurred. If it looks like a storage problem at one of the sites you are responsible for, assign it appropriately. If it looks like an error at the other site, note the diagnosis in the ticket and reassign it to the relevant ROC.
If the channel is running SRM copy, always assign the ticket to the ROC of the destination and ask them to investigate the problem, since the FTS logs contain very little information in this case. You should also share the ticket with your local FTS support, so that the destination ROC has access to the information that your FTS does have.
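The routing decision described above can be written down as a short function. This is a sketch under stated assumptions, not an official procedure: the `Assignment` record, the `roc_of` mapping and all site and ROC names are hypothetical, and it assumes you have already worked out from the error message which site failed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Assignment:
    roc: str                    # ROC that should own the ticket
    note: str                   # diagnosis to record on the ticket
    share_with_local_fts: bool  # also share with your local FTS support?


def triage_failed_job(my_roc: str, roc_of: Dict[str, str], destination: str,
                      uses_srm_copy: bool, failing_site: str) -> Assignment:
    """Decide who should get a 'job failed' ticket (illustrative only)."""
    if uses_srm_copy:
        # FTS logs hold little detail for SRM copy channels, so the
        # destination ROC investigates; local FTS support is kept in the loop.
        return Assignment(roc_of[destination],
                          "SRM copy channel: destination storage to investigate",
                          True)
    owner = roc_of[failing_site]
    if owner == my_roc:
        # Storage problem at one of your own sites.
        return Assignment(my_roc, f"storage problem at {failing_site}", False)
    # Error at the other end: record the diagnosis and reassign.
    return Assignment(owner, f"error appears to be at {failing_site}", False)


# Hypothetical mapping of sites to the ROCs that support them.
roc_of = {"T1_A": "ROC_A", "T2_X": "ROC_A", "T1_B": "ROC_B"}
print(triage_failed_job("ROC_A", roc_of, "T1_B", True, "T1_A"))
```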
A job failed on a channel running SRM copy (and you are the destination)
You have been reassigned the ticket because the FTS at another ROC asked the dCache SRM at your Tier-1 to transfer some files, but the job failed.
Please ask your storage support to investigate the problem, since the logs of the FTS that submitted the job contain rather limited information about what went wrong - all the pertinent logs are in the dCache which was handling the copy. There should be a link on the ticket to the staff supporting the FTS which submitted the job, should your storage support need any information from them.
I've been asked to add a channel
You should have set up all necessary channels already. Adding a new channel should only be done when a new site appears.
In any case, check that the channel complies with the model defined for WLCG (https://uimon.cern.ch/twiki/pub/LCG/FtsServerInstall15/SC4FTSsetupplan.doc).
Generally the advice is DO NOT add the channel. Bring up the issue at the next operations meeting for discussion.
I've been asked to change channel parameters
For example:
- Switch a channel to Active or Inactive
- Change the number of concurrent files on a channel
Assign the ticket to your local FTS server support, even if that is you.
Installation problems or problems with YAIM
Try to solve the problem, if possible.
Otherwise, assign to the 3rd level "Installation and Configuration" support unit.
Last edit: GavinMcCance on 2006-05-30 - 16:20
Maintainer: GavinMcCance