FTS Channel Administration 1.5
This page describes the basic administration and monitoring of channels on the service. Underlying details of the server (starting, stopping, logfiles, etc.) are described in FtsServerAdmin15. Server installation and configuration is described in FtsServerInstall15. Standalone client installation and configuration is described in FtsClientInstall15.
Creating new channels
The FtsServerInstall15 guide details how to create channels in the database. In short, you need to:
- Add a new channel in the DB using glite-transfer-channel-add.
- Add a new agent daemon configuration to one of your agent nodes to service this new channel.
See FtsConfigurationSetup15 for an example of how to set up channels and all the various VO shares.
Managing channels
A channel or "queue" can be:
- a point-to-point network connection, e.g. at the Tier-0 a single channel is defined to each Tier-1 site.
- a "catch-all" channel.
A channel is the basic unit of management. The channel manager can control whether a channel is active or not, and how much work is put on the network through it. Each channel can be managed independently.
To perform any channel management operations, you must have a valid grid or VOMS proxy:
voms-proxy-init
and you must also be authorised, either as a service administrator or as a channel administrator for the given channel. Use the glite-transfer-getroles command to see what privileges you have on the service. Note that by default, the root account on the FTS server machine is listed in the server administrator mapfile (in fact the host certificate is listed in the mapfile, and the root account uses it as its user certificate).
First, list the channels already defined on the system (on a new install with a blank database, there should be none):
glite-transfer-channel-list
Note that man pages are available for all commands.
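The two read-only checks above can be wrapped in a small bash function that stops at the first failure. This is a sketch, not part of the FTS client. The function name fts_preflight is hypothetical, and the sketch assumes you have already run voms-proxy-init. That command prompts for a passphrase, so it is deliberately left out.

```bash
#!/usr/bin/env bash
# Sketch: run the read-only pre-flight checks described above.
# Assumes a valid proxy already exists (voms-proxy-init is interactive).
fts_preflight() {
    # Which roles (service or channel administrator) do we hold?
    if ! glite-transfer-getroles; then
        echo "glite-transfer-getroles failed: check your proxy and services.xml" >&2
        return 1
    fi
    # Which channels are defined? (none on a fresh install)
    if ! glite-transfer-channel-list; then
        echo "glite-transfer-channel-list failed" >&2
        return 1
    fi
}
```

After sourcing the file, run fts_preflight. A non-zero return means one of the two commands failed, and its error text can be compared with the messages listed under "Problems you may see".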
Problems you may see
If you see:
list: listChannels: SOAP fault: SOAP-ENV:Client - CGSI-gSOAP: Could not open connection ! (TCP connect failed in tcp_connect())
then either the service is down, or your services.xml file points to the wrong endpoint. You can see which endpoint the client is trying to contact by using the -v flag:
glite-transfer-channel-list -v
If you see:
list: Service discovery: No services of type org.glite.ChannelManagement were found
then the services.xml file is either missing or not readable by the client.
If you see:
list: listChannels: SOAP fault: "http://xml.apache.org/axis/":Server.NoService - The AXIS engine could not find a target service to invoke! targetService is ChannelManagement
then either the endpoint specified in the services.xml file is incorrect, or the service is misconfigured.
To check whether the endpoint specified in the services.xml file is correct, connect to it directly with a web browser, e.g.
https://yourhostname:8443/glite-data-transfer-fts/services/FileTransfer
(you will need your grid certificate loaded in your browser). If the service is listening, you should see a web page with a message like:
Hi there, this is an AXIS service!
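The same check can be run from the command line with curl instead of a browser. This is a hedged sketch, and check_fts_endpoint is a hypothetical name. It makes two assumptions:
- Your proxy file holds both the certificate and the key. The sketch reads X509_USER_PROXY and falls back to the usual /tmp/x509up_u<uid> location.
- CA certificates are installed in /etc/grid-security/certificates.

If the server rejects the proxy, point --cert and --key at your user certificate and key instead.

```bash
#!/usr/bin/env bash
# Sketch: fetch the FileTransfer endpoint and look for the AXIS greeting.
# Assumed: the proxy file holds certificate + key, CA certs are in
# /etc/grid-security/certificates. Return codes: 0 ok, 1 no greeting,
# 2 connection failed.
check_fts_endpoint() {
    local host=$1 body
    local url="https://${host}:8443/glite-data-transfer-fts/services/FileTransfer"
    local proxy=${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}
    if ! body=$(curl --silent --cert "$proxy" --key "$proxy" \
                     --capath /etc/grid-security/certificates "$url"); then
        echo "cannot connect to $url" >&2
        return 2
    fi
    if grep -q 'Hi there, this is an AXIS service!' <<<"$body"; then
        echo "OK: $url"
    else
        echo "connected, but no AXIS greeting from $url" >&2
        return 1
    fi
}
```

For example, check_fts_endpoint yourhostname separates "cannot connect" (return 2) from "something answered, but it is not the AXIS service" (return 1).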
If you see:
list: listChannels: You are not authorised for channel management upon this service
then you are not in the manager mapfile. Look in the FTS server logs (org.glite.data) to see how the authorisation decision was made.
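The four messages above can be matched mechanically, which helps when the client runs from a cron job or a monitoring script. This is a sketch with a hypothetical function name. It matches only the message fragments quoted on this page and prints the likely cause given for each.

```bash
#!/usr/bin/env bash
# Sketch: map the client error messages quoted above to their likely cause.
# Returns 1 if the message is not one of the known cases.
diagnose_fts_error() {
    case "$1" in
        *"Could not open connection"*)
            echo "service down, or services.xml points to the wrong endpoint" ;;
        *"No services of type"*)
            echo "services.xml missing or not readable by the client" ;;
        *"Server.NoService"*)
            echo "wrong endpoint in services.xml, or service misconfigured" ;;
        *"not authorised for channel management"*)
            echo "not in the manager mapfile; check the org.glite.data logs" ;;
        *)
            echo "unrecognised error" >&2
            return 1 ;;
    esac
}
```

Typical use: diagnose_fts_error "$(glite-transfer-channel-list 2>&1)".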
Changing the state of channels
Although any parameter of a channel may be changed after it has been created, only three operations are common: setting it active, setting it inactive and setting the number of concurrent transfers.
To set a channel CHANNELNAME to state Inactive:
glite-transfer-channel-set -S Inactive CHANNELNAME
This will stop putting any further work on the network. Any individual file transfers that have already been started will complete, so it can take a minute or two for all activity to stop.
To set the channel to state Active:
glite-transfer-channel-set -S Active CHANNELNAME
This will start putting work on the network if there are any jobs assigned to that channel in the Pending state.
To change the number of concurrent transfers being put on the network for a given channel:
glite-transfer-channel-set -f 10 CHANNELNAME
This will make the agent try to maintain 10 concurrent transfers on the channel while it is Active.
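The three common operations above can be wrapped as a pair of hypothetical helper functions. The sketch sets the transfer count before activating the channel, so the agent never starts work at a stale concurrency. That ordering is a choice made for this sketch, not something the service requires. The default of 10 just mirrors the example above.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around glite-transfer-channel-set.

# Stop putting new work on the channel. Transfers already started
# still complete, so activity can take a minute or two to stop.
pause_channel() {
    glite-transfer-channel-set -S Inactive "$1"
}

# Set the number of concurrent transfers, then (re)activate the channel.
# The default of 10 is only an example value.
resume_channel() {
    local channel=$1 files=${2:-10}
    case $files in
        *[!0-9]*) echo "file count must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$files" -eq 0 ]; then
        echo "file count must be greater than zero" >&2
        return 1
    fi
    glite-transfer-channel-set -f "$files" "$channel" &&
        glite-transfer-channel-set -S Active "$channel"
}
```

For example, run pause_channel CHANNELNAME before an SRM intervention, then resume_channel CHANNELNAME 10 afterwards.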
Dropping channels
To drop a channel:
glite-transfer-channel-drop CHANNELNAME
Currently, a channel cannot be dropped if any jobs have been assigned to it (because constraints are not cascaded on the database delete). We will provide a script to do this soon. Make sure that you are happy with a channel definition before submitting work that will be assigned to it.
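Until that cleanup script exists, a guard like the following sketch checks for assigned jobs before attempting a drop. It assumes glite-transfer-list -c CHANNEL STATE prints one job per line and prints nothing when there are none. The loop covers only the job states named on this page, which may not be the complete set. Treat a clean result as a hint, not a guarantee.

```bash
#!/usr/bin/env bash
# Sketch: drop a channel only if no jobs appear to be assigned to it.
# Assumed: 'glite-transfer-list -c CHANNEL STATE' prints one job per line,
# and nothing when there are none. The state list is not guaranteed complete.
safe_drop_channel() {
    local channel=$1 state jobs
    for state in Pending Active Hold Canceling Done; do
        jobs=$(glite-transfer-list -c "$channel" "$state") || return 2
        if [ -n "$jobs" ]; then
            echo "$channel still has jobs in state $state; not dropping" >&2
            return 1
        fi
    done
    glite-transfer-channel-drop "$channel"
}
```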
Manual interventions on channels
When a single transfer attempt for a file fails, the system will retry it after a delay. The number of retries and the delay are configurable on the service (the defaults are 3 retries, with at least 10 minutes between attempts).
If a file fails, it is placed in state Waiting. After the retry period has elapsed, it is placed back in the Pending state and will subsequently be picked up again by the transfer agent.
If a file has used up its maximum number of retries, it is placed in the Hold state. This state indicates that some manual intervention is needed to rescue (or cancel) the file. The channel admin should periodically check for jobs with files in Hold state:
glite-transfer-list -c CHANNELNAME Hold
If there are any, you have two options. You may fix whatever you believe the problem to be (e.g. reboot the SRM cluster) and allow one more retry:
glite-transfer-channel-signal -c CHANNELNAME Pending
All jobs on the channel in Hold state will be dropped back onto the Pending queue. The other option is to set all the jobs on the channel in Hold state to Canceling, if you believe that there is nothing you can do to rescue them (for example, if the filenames are invalid):
glite-transfer-channel-signal -c CHANNELNAME Canceling
You can also apply the signal command to a single job rather than to all jobs on the channel:
glite-transfer-channel-signal -j 17cc7055-d3ac-11d9-a905-f173b72e4547 Pending
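Instead of signalling the whole channel at once, you may prefer to handle the held jobs one by one so that each action is logged. This sketch uses a hypothetical function name and only the per-job form of glite-transfer-channel-signal shown above. It assumes glite-transfer-list -c CHANNEL Hold prints one job ID per line.

```bash
#!/usr/bin/env bash
# Sketch: send Pending (retry) or Canceling to each held job in turn.
# Assumed: 'glite-transfer-list -c CHANNEL Hold' prints one job ID per line.
resolve_held_jobs() {
    local channel=$1 action=$2 jobs job rc=0
    case $action in
        Pending|Canceling) ;;
        *) echo "action must be Pending or Canceling" >&2; return 1 ;;
    esac
    jobs=$(glite-transfer-list -c "$channel" Hold) || return 2
    while read -r job; do
        [ -n "$job" ] || continue           # skip blank lines
        echo "$job -> $action"
        glite-transfer-channel-signal -j "$job" "$action" || rc=3
    done <<<"$jobs"
    return "$rc"
}
```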
Running a test job
Identify a source SRM file name and a suitable destination SRM file name. The file names must be in the full SURL format:
srm://castorgridsc.cern.ch:8443/srm/managerv1?SFN=/castor/cern.ch/grid/dteam/storage/transfer-test/1gig/file-001.dat
Load into MyProxy the user proxy that will be used for the transfer. You must use the DN_as_username mode (the -d option of myproxy-init) and set the maximum age of the proxy suitably. Make sure that the domain names in the SURLs match the channel definition, that the channel is Active, and that the number of concurrent files on the channel is greater than zero.
voms-proxy-init
myproxy-init -d
[enter your MyProxy password twice]
glite-transfer-submit \
srm://castorgridsc.cern.ch:8443/srm/managerv1?SFN=/castor/cern.ch/grid/dteam/storage/transfer-test/1gig/file-000.dat \
srm://srm1.tier1.sara.nl:8443/srm/managerv1?SFN=/data/dteam/testdestfile-glite-1.1.1-000.dat
Choose your SURLs appropriately. This will return a UUID job identifier, e.g.
17cc7055-d3ac-11d9-a905-f173b72e4547
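The domain-name check and the submission can be scripted as below. Both functions are hypothetical helpers:
- surl_host only manipulates strings of the srm://host:port/... form shown above.
- submit_test_transfer assumes glite-transfer-submit prints the job UUID as the last line of its output.

```bash
#!/usr/bin/env bash
# Sketch: print the host part of an SRM SURL, for comparison with the
# sites in the channel definition.
surl_host() {
    case $1 in
        srm://*) ;;
        *) echo "not an SRM SURL: $1" >&2; return 1 ;;
    esac
    local rest=${1#srm://}        # strip the scheme
    rest=${rest%%/*}              # keep host[:port]
    printf '%s\n' "${rest%%:*}"   # drop the port
}

# Sketch: submit one test transfer and print only the job UUID.
# Assumed: glite-transfer-submit prints the UUID as its last output line.
submit_test_transfer() {
    local src=$1 dst=$2 out
    out=$(glite-transfer-submit "$src" "$dst") || return 1
    printf '%s\n' "$out" | tail -n 1
}
```

With the SURLs above, surl_host returns castorgridsc.cern.ch for the source and srm1.tier1.sara.nl for the destination.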
To check the status of the job, run:
glite-transfer-status --verbose 17cc7055-d3ac-11d9-a905-f173b72e4547
which should print the state and some other information. You should see the job progressing through:
- Submitted (if you're fast). The job has not yet been assigned to a channel.
- Pending. The job has been assigned to a channel and is awaiting a transfer slot.
- Active. The job is being transferred, or the file is awaiting retry.
- Hold. The single file in the job failed 3 times and needs to be rescued by manual intervention.
- Done. The job and its file have completed successfully.
See the FtsServerAdmin15 guide for what is happening on the FTS machine itself.
If the job has entered state Done, the test succeeded.
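A polling loop saves rerunning the status command by hand. This sketch assumes that glite-transfer-status without --verbose prints just the state name. POLL_INTERVAL (seconds, default 30) and MAX_POLLS (default 120) are hypothetical knobs of this script, not FTS settings. The poll limit exists because the job states listed above may not be exhaustive, and an unexpected final state would otherwise loop forever.

```bash
#!/usr/bin/env bash
# Sketch: poll a job until it reaches Done (success) or Hold (needs manual
# intervention). Assumed: 'glite-transfer-status JOBID' prints the state.
# Return codes: 0 Done, 1 Hold, 2 status command failed, 3 gave up.
wait_for_job() {
    local job=$1 interval=${POLL_INTERVAL:-30} tries=${MAX_POLLS:-120} state
    while [ "$tries" -gt 0 ]; do
        state=$(glite-transfer-status "$job") || return 2
        echo "$(date +%H:%M:%S) $job $state"
        case $state in
            Done) return 0 ;;
            Hold) echo "$job needs manual intervention" >&2; return 1 ;;
        esac
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo "gave up waiting for $job" >&2
    return 3
}
```

For example, run wait_for_job 17cc7055-d3ac-11d9-a905-f173b72e4547. A return of 1 points you to "Manual interventions on channels".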
Last edit: GavinMcCance on 2006-05-03 - 18:27
Maintainer: GavinMcCance