FTA change channel type procedure for Release 2.0.
What is it?
This is the procedure to change the type of an existing channel from URLCOPY to SRMCOPY or vice versa.
When to use it?
When you have an existing channel configured as URLCOPY or SRMCOPY, with an agent running for it, and you want to change its transfer type.
Reason
Changing the type of a channel is not like changing any other parameter: it requires particular care.
While the channel agent daemon runs, it periodically checks the status of the running transfers. This status is serialized in the .mem files found under the /var/tmp/glite-url-copy-edguser folder, and these files have a different structure depending on whether the transfer is an urlcopy or an srmcopy transfer. Therefore, something like the following can happen:
- the urlcopy channel agent is stopped while there are running transfers
- the channel type is set to srmcopy
- the channel agent is restarted
- the agent picks up an urlcopy mem file and tries to read it as an srmcopy mem file
- the transfer status verification fails.
Possible consequences are:
- transfers that were actually successful are considered failed and retried
- database corruption: on some occasions two active transfers for the same file are created in the database; eventually the channel agent disables many of its actions because it cannot cache the active transfer data.
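Before restarting an agent under a new type, it can help to see which .mem files are still present. The sketch below uses a hypothetical helper (list_mem_files is not part of gLite); it assumes the files sit directly in /var/tmp/glite-url-copy-edguser and end in .mem, and it only lists them, leaving any judgement about which transfer a file belongs to to the operator.

```shell
#!/bin/sh
# Hypothetical helper: list the .mem state files in the agent state folder.
# Assumption: the files live directly in the folder (not in subfolders).
# Default folder taken from this page; pass another one as $1 if needed.
list_mem_files() {
    dir="${1:-/var/tmp/glite-url-copy-edguser}"
    found=0
    for f in "$dir"/*.mem; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo "$f"
        found=$((found + 1))
    done
    echo "$found .mem file(s) in $dir"
}
```

Run with no argument, it checks the default folder; the last output line always gives the count.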
Procedure
In this example we use a channel named "CERN-CERN" and change its type from urlcopy to srmcopy.
Drain and stop the channel
Set the channel state to Inactive so that no new transfers are started:
glite-transfer-channel-set -S Inactive CERN-CERN
Wait until there are no running transfers, i.e. grep the process table for processes of the form CHANNEL-NAME__*:
ps aux | grep CERN-CERN__ | grep -v grep
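The drain wait can be scripted as a polling loop. This is a minimal sketch, assuming (as the step above does) that every running transfer appears in the process table with CHANNEL-NAME__ in its command line; channel_transfer_count and wait_for_drain are hypothetical helper names, and the 60 second default interval is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: count processes whose command line contains "<channel>__".
# "grep -v grep" drops the grep commands of this very pipeline.
channel_transfer_count() {
    ps aux | grep -v grep | grep -c -- "${1}__"
}

# Hypothetical helper: poll until no transfer processes for the channel remain.
# wait_for_drain CHANNEL [INTERVAL_SECONDS]
wait_for_drain() {
    channel="$1"
    interval="${2:-60}"   # arbitrary default polling interval
    while [ "$(channel_transfer_count "$channel")" -gt 0 ]; do
        echo "$(channel_transfer_count "$channel") transfer(s) still running on $channel"
        sleep "$interval"
    done
    echo "No running transfers left on $channel"
}
```

For example, wait_for_drain CERN-CERN 30 returns once the count reaches zero, at which point the agent can be stopped.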
Stop the channel:
service transfer-agents --instance glite-transfer-channel-agent-urlcopy-CERN-CERN stop
Change the channel type
Edit the site-info.def file and change
FTA_CERN_CERN="URLCOPY"
to
FTA_CERN_CERN="SRMCOPY"
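The same edit can be made non-interactively with sed. The sketch below uses a hypothetical helper (set_channel_type is not a YAIM command); it assumes a sed that supports -i (GNU or BusyBox), that the variable is defined on a single line of the form NAME="VALUE" with an upper-case value, and it keeps a .bak copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: set_channel_type FILE VARIABLE NEWTYPE
# Rewrites the line VARIABLE="OLDTYPE" in FILE to VARIABLE="NEWTYPE".
set_channel_type() {
    file="$1"; var="$2"; newtype="$3"
    cp -p "$file" "$file.bak"   # keep a backup of the original file
    # Assumption: the current value is upper case, e.g. "URLCOPY".
    sed -i "s/^${var}=\"[A-Z]*\"/${var}=\"${newtype}\"/" "$file"
    grep "^${var}=" "$file"     # print the resulting line for checking
}
```

For example, set_channel_type site-info.def FTA_CERN_CERN SRMCOPY prints the rewritten line so the change can be checked before rerunning YAIM.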
Rerun the YAIM configuration script to rebuild the config files:
/opt/glite/yaim/scripts/configure_node site-info.def FTA2
Restart the channel
service transfer-agents --instance glite-transfer-channel-agent-srmcopy-CERN-CERN start
Note that the channel agent instance name has changed from glite-transfer-channel-agent-urlcopy-CERN-CERN to glite-transfer-channel-agent-srmcopy-CERN-CERN.
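Because the instance name embeds the channel type, the stop and start commands must use different names. A small hypothetical helper can build the name; it assumes the pattern seen in this example (glite-transfer-channel-agent-, the type in lower case, a hyphen, the channel name) holds for every channel, which is only inferred from this one case.

```shell
#!/bin/sh
# Hypothetical helper: agent_instance TYPE CHANNEL
# Builds the init.d instance name, lower-casing the type (URLCOPY -> urlcopy).
agent_instance() {
    ctype=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'glite-transfer-channel-agent-%s-%s\n' "$ctype" "$2"
}
```

For example, service transfer-agents --instance "$(agent_instance URLCOPY CERN-CERN)" stop addresses the old agent, and the same command with SRMCOPY and start addresses the new one.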
Troubleshooting
Why
The problem stems from the fact that the lock file is a function of the agent name, so glite-transfer-channel-agent-srmcopy-CERN-CERN and glite-transfer-channel-agent-urlcopy-CERN-CERN have different lock files. Consequently, both agents can run against the same database schema at the same time, and this causes the DB corruption.
I can't stop the old agent
If you do not stop the old agent before rerunning the YAIM configuration, the init.d script will no longer let you address the old agent (i.e. the stop command won't stop it). In this case, kill the old agent with SIGINT.
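Once you have found the old agent's process ID (for example with ps), the SIGINT step can be wrapped so that you also wait for the process to exit. interrupt_agent is a hypothetical helper, not part of the transfer-agents scripts, and the 30 second default timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: interrupt_agent PID [TIMEOUT_SECONDS]
# Sends SIGINT to PID and waits for it to exit.
# Returns 0 if it exited, 1 if it was not running, 2 if still alive after the timeout.
interrupt_agent() {
    pid="$1"
    timeout="${2:-30}"   # arbitrary default
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is not running"
        return 1
    fi
    kill -INT "$pid"
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "process $pid still running after ${timeout}s"
            return 2
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "process $pid has exited"
    return 0
}
```

A return value of 2 means the agent ignored the interrupt, and you will need to investigate before starting the new agent.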
The newly configured channel is 'stuck'
If both agents have already been running at the same time, the schema may be corrupted. The new agent checks for a valid schema and disables itself if it finds a problem; this means the agent is effectively down until the schema is fixed. The symptom is that all jobs on the channel stay in the Ready state. In the channel agent log file you will find messages about actions being disabled. In this case, contact fts-support@cern.ch for the fix.
Last edit: GavinMcCance on 2008-01-15 - 11:45
Maintainer: PaoloTedesco