FTS Configuration model and YAIM variables reference.

This document applies to FTS version 2.2

Structure of the configuration file

The YAIM configuration file contains the information to configure the FTS web server (FTS) and the channel and VO agents (FTA).

Web server configuration

FTS web-service YAIM variables are all prefixed with FTS_. Any variable with this prefix that the script does not understand will cause it to fail; this catches misspelled variable names.

There is only one parameter which MUST be set: FTS_DBURL, which specifies the JDBC Oracle connection string. This is usually similar to the agents' FTA_GLOBAL_DB_CONNECTSTRING parameter, but not in every case, so it must be specified explicitly. It can be provided by your database administrator.

If the FTS web service and the agents are configured in the same file, as is usually the case, it is not necessary to provide the DB username and password, since these are taken from the agents' parameters. The database connection parameters may be overridden if desired (for example if the web service is using a different DB role from the agents).

Parameter name     Default (corresponding agent parameter)
FTS_DB_TYPE        Value of FTA_GLOBAL_DBTYPE
FTS_DB_USER        Value of FTA_GLOBAL_DB_USER
FTS_DB_PASSWORD    Value of FTA_GLOBAL_DB_PASSWORD

FTS_DB_TYPE=ORACLE
FTS_DB_USER=lcg_fts_prod_ws_w
FTS_DB_PASSWORD=yyyyyyyy
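
The fallback from web-service parameters to agent parameters can be sketched in Python as follows. This is only an illustration of the behaviour described above, not the YAIM implementation (YAIM itself is written in bash); the function name resolve_ws_db_settings and the use of a plain dictionary to hold the configuration are assumptions made for the example.

```python
# Illustrative sketch only; names below are hypothetical helpers.
# Web-service variable -> agent variable that provides its default.
WS_DEFAULTS = {
    "FTS_DB_TYPE": "FTA_GLOBAL_DBTYPE",
    "FTS_DB_USER": "FTA_GLOBAL_DB_USER",
    "FTS_DB_PASSWORD": "FTA_GLOBAL_DB_PASSWORD",
}


def resolve_ws_db_settings(config):
    """Return the effective web-service DB settings for a config dict."""
    # FTS_DBURL has no fallback: it must always be given explicitly.
    if "FTS_DBURL" not in config:
        raise ValueError("FTS_DBURL must be set explicitly")
    settings = {"FTS_DBURL": config["FTS_DBURL"]}
    for ws_var, agent_var in WS_DEFAULTS.items():
        # An explicit FTS_* value wins; otherwise use the agents' value.
        settings[ws_var] = config.get(ws_var, config.get(agent_var))
    return settings
```

With only FTS_DB_USER overridden, the password and type would still come from the agents' global parameters.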

By default, the web service publishes its endpoint in BDII using its hostname. If you instead wish to publish it using a DNS alias, set the parameter FTS_HOST_ALIAS.

 
# Node names
[...]
FTS_HOST=%FTS_WS_HOSTNAME%.$MY_DOMAIN
[...]

# BDII/GIP specific settings
[...]
BDII_FTS_URL="ldap://$FTS_HOST:2170/mds-vo-name=resource,o=grid"
[...]

# FTS config file for web-service
FTS_DBURL=... # The JDBC url for connecting to the DB
FTS_HOST_ALIAS=prod-fts-ws.cern.ch

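Note: FTS_HOST is not described further in this document. In the example above it is set in the node names section to the fully qualified host name of the web-service node, and it is used to build BDII_FTS_URL.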

Agents configuration

The agents configuration must provide information about:

  1. Which agents are on which hosts
  2. Type of each named agent
  3. Global parameters, applied to all agents
    • Logging
    • DB connection details
  4. Type specific parameters, applied to all agents of a given type
    • e.g. all 3rd party copy agents
  5. Specific parameters for individual agents

Section 1: which agents are on which hosts

First, give symbolic names to the machines where the agents will be installed using the variable FTA_MACHINES:

 
FTA_MACHINES="ONE TWO"

Then, for each of the machines defined, specify the host name using the variable FTA_AGENTS_${MACHINE}_HOSTNAME and the list of agents that will run on that machine using the variable FTA_AGENTS_${MACHINE}:

 
FTA_AGENTS_ONE_HOSTNAME="fts001.cern.ch"
FTA_AGENTS_ONE="CERN-CERN DTEAM"
FTA_AGENTS_TWO_HOSTNAME="fts002.cern.ch"
FTA_AGENTS_TWO="CERN-RAL CERN-IN2P3"
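
To make the relationship between these variables explicit, here is a minimal Python sketch that builds a host-to-agents mapping from them. It is not part of YAIM; the function name agents_by_host and the dictionary representation of the configuration are assumptions made for the example.

```python
def agents_by_host(config):
    """Map each host name to the list of agents configured to run on it."""
    layout = {}
    # FTA_MACHINES is a space-separated list of symbolic machine names.
    for machine in config["FTA_MACHINES"].split():
        hostname = config[f"FTA_AGENTS_{machine}_HOSTNAME"]
        # FTA_AGENTS_<MACHINE> is a space-separated list of agent names.
        layout[hostname] = config[f"FTA_AGENTS_{machine}"].split()
    return layout


config = {
    "FTA_MACHINES": "ONE TWO",
    "FTA_AGENTS_ONE_HOSTNAME": "fts001.cern.ch",
    "FTA_AGENTS_ONE": "CERN-CERN DTEAM",
    "FTA_AGENTS_TWO_HOSTNAME": "fts002.cern.ch",
    "FTA_AGENTS_TWO": "CERN-RAL CERN-IN2P3",
}
print(agents_by_host(config))
```

A missing FTA_AGENTS_${MACHINE}_HOSTNAME or FTA_AGENTS_${MACHINE} variable shows up as a KeyError in this sketch, which mirrors the requirement that both be defined for every machine listed in FTA_MACHINES.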

Agents naming conventions

In order to be consistent and to ensure that the FTA channel discovery mechanism works, please observe the following rules when naming channels and VO agents:

  • The agent names should all be upper case, e.g. DTEAM or CERN-RAL
  • For VO agents:
    • It is preferable if the VO agent names do not contain dashes
  • For channel agents:
    • The channel agent names should specify source and destination separated with a single dash.
    • The source and destination should be simple site identifiers (they do not need to be GOC DB names) and should contain only alphanumeric characters. The name should match the pattern [A-Z]*-[A-Z]*.
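
These rules can be checked mechanically. The following Python sketch is one possible reading of them, not the check performed by the channel discovery mechanism: it assumes that site identifiers consist of upper-case letters and digits (so that a name such as CERN-IN2P3 is accepted), and the function name check_agent_name is hypothetical.

```python
import re

# Assumption: source and destination are upper-case alphanumeric
# identifiers separated by exactly one dash.
CHANNEL_NAME = re.compile(r"^[A-Z0-9]+-[A-Z0-9]+$")


def check_agent_name(name, is_channel):
    """Return a list of naming-rule problems (empty if the name is fine)."""
    problems = []
    if name != name.upper():
        problems.append("name should be upper case")
    if is_channel:
        if not CHANNEL_NAME.match(name):
            problems.append("channel name should be SOURCE-DEST with a single dash")
    elif "-" in name:
        problems.append("VO agent names should preferably not contain dashes")
    return problems


for name, is_channel in [("CERN-RAL", True), ("DTEAM", False), ("CERN--RAL", True)]:
    print(name, check_agent_name(name, is_channel))
```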

Section 2: specify the type of each named agent

For each agent defined, specify its type. The type is either:

  • URLCOPY for a 3rd party copy channel agent.
  • SRMCOPY for an SRM copy channel agent.
  • VOAGENT for a VO agent.

Construct the YAIM variable to use as FTA_${agent-name}. Since the underlying shell of YAIM is bash, and bash variable names cannot contain dashes, replace every dash '-' in the agent name with an underscore '_'. Continuing the previous example:

FTA_CERN_CERN="URLCOPY"
FTA_DTEAM="VOAGENT"
FTA_CERN_RAL="URLCOPY"
FTA_CERN_IN2P3="SRMCOPY"
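
The name conversion is simple enough to express in a few lines of Python. This is a minimal sketch for illustration; the helper name type_variable is hypothetical and not part of YAIM.

```python
def type_variable(agent_name):
    """Return the YAIM variable that holds the type of the given agent."""
    # Dashes are not valid in bash variable names, so use underscores.
    return "FTA_" + agent_name.replace("-", "_")


agent_types = {
    "CERN-CERN": "URLCOPY",
    "DTEAM": "VOAGENT",
    "CERN-RAL": "URLCOPY",
    "CERN-IN2P3": "SRMCOPY",
}
for name, agent_type in agent_types.items():
    print(f'{type_variable(name)}="{agent_type}"')
```

Running it prints the same four assignments as in the example above.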

Sections 3, 4 and 5: specify agent parameters

Most of the agents' configuration parameters may be specified at different levels:

  • GLOBAL level
  • TYPE level
  • INSTANCE level

[Figure: parameters_model.gif]

Agents come with reasonable defaults for most of the parameters, and for the vast majority of parameters the default should be fine.

Global parameters are applied to all agents and override the defaults.

Type-specific parameters are applied to all agents of a given type, e.g. all URLCOPY agents. These parameters override the default and global ones; typically, most of the parameters that need a non-default value are changed at this level.

Instance-specific parameters change the value of a variable for a single agent, and override default values, global and type-specific parameters. Such parameters should (ideally) be used rarely.

The general format of a YAIM variable is: FTA_${SCOPE}_${PARAMNAME}

SCOPE is either GLOBAL, TYPEDEFAULT_${TYPE}, or the name of the agent (replacing dashes with underscores).

Example:

# All agents:
FTA_GLOBAL_LOG_PRIORITY="INFO"
# All URLCOPY channel agents:
FTA_TYPEDEFAULT_URLCOPY_LOG_PRIORITY="DEBUG"
# the named CERN-RAL agent (note the '-' converted to '_')
FTA_CERN_RAL_LOG_PRIORITY="INFO"
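
The override order (instance over type over global over built-in default) can be sketched as a lookup that tries the most specific variable first. This Python sketch only illustrates the model described above; it is not how the YAIM scripts are implemented, and the function name resolve_parameter is hypothetical.

```python
def resolve_parameter(config, agent_name, agent_type, param, default=None):
    """Return the effective value of a parameter for one agent.

    Lookup order, most specific first: instance, type, global, default.
    """
    instance_scope = agent_name.replace("-", "_")
    candidates = [
        f"FTA_{instance_scope}_{param}",
        f"FTA_TYPEDEFAULT_{agent_type}_{param}",
        f"FTA_GLOBAL_{param}",
    ]
    for variable in candidates:
        if variable in config:
            return config[variable]
    return default


config = {
    "FTA_GLOBAL_LOG_PRIORITY": "INFO",
    "FTA_TYPEDEFAULT_URLCOPY_LOG_PRIORITY": "DEBUG",
    "FTA_CERN_RAL_LOG_PRIORITY": "INFO",
}
print(resolve_parameter(config, "CERN-RAL", "URLCOPY", "LOG_PRIORITY"))   # INFO (instance)
print(resolve_parameter(config, "CERN-CERN", "URLCOPY", "LOG_PRIORITY"))  # DEBUG (type)
print(resolve_parameter(config, "DTEAM", "VOAGENT", "LOG_PRIORITY"))      # INFO (global)
```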

Please note that DB connection parameters have no defaults, and should be specified at global level:

FTA_GLOBAL_DB_CONNECTSTRING="(DESCRIPTION=(LOAD_BALANCE=no)(ADDRESS=(PROTOCOL=TCP)(HOST=lcgtestdb1.cern.ch)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=lcgtestdb2.cern.ch)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=lcg_fts.cern.ch)))"
FTA_GLOBAL_DB_USER=lcg_fts_prod_w
FTA_GLOBAL_DB_PASSWORD=xxxxxxxx

YAIM variables reference

Web service

FTS_DBURL: the JDBC url to connect to the database.

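FTS_HOST: not described in this document. In the web server configuration example it holds the fully qualified host name of the web-service node and is used to build BDII_FTS_URL.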

FTS_HOST_ALIAS: DNS alias to be published in BDII.

FTS_DB_TYPE: the database type for the web service (default = value of FTA_GLOBAL_DBTYPE). The only allowed value for this variable is ORACLE.

FTS_DB_USER: the database username for the web service (default = value of FTA_GLOBAL_DB_USER).

FTS_DB_PASSWORD: the database password for the web service (default = value of FTA_GLOBAL_DB_PASSWORD).

Names, types and locations of agents

FTA_MACHINES: a list of identifiers for machines where agents are located, separated by spaces (e.g. "ONE TWO THREE").

FTA_AGENTS_${MACHINE}_HOSTNAME: the hostname for the agents machine. ${MACHINE} must be one of the values specified with FTA_MACHINES.

FTA_AGENTS_${MACHINE}: the list of agents on the machine, separated by spaces. ${MACHINE} must be one of the values specified with FTA_MACHINES.

FTA_${AGENT}: the type of the agent. ${AGENT} must be one of the values specified in the FTA_AGENTS_${MACHINE} variables. Allowed values:

  • URLCOPY: a urlcopy channel agent
  • SRMCOPY: an srmcopy channel agent
  • VOAGENT: a VO agent

Common agents parameters

GUC_MAXTRANSFERS: The maximum number of concurrent transfers the agent will process (it acts as a hard limit on the number of files specified for a channel). Default is 50.

ACTIONS_SRMVERSION: The default SRM version to be used when looking for an SRM endpoint (source and/or destination) in case of ambiguity. Default is 1.1.

ACTIONS_SRMVERSIONPOLICY: The policy to apply when choosing an SRM version in case of ambiguity, i.e. if the specified SURL (either source or destination) does not contain enough information to uniquely identify an endpoint (i.e. port and/or application path). Allowed values are:

  • default: use the value specified by ACTIONS_SRMVERSION
  • with-space-token: choose 2.2 if a space token is provided; use default mechanism otherwise.
  • force: deprecated. Do not use.

ACTIONS_SURLNORMALIZATION: Format to which the agent will convert SURLs before transferring. Allowed values:

  • compact (default value): srm://<host>/<path>
  • compact-with-port: srm://<host>:<port>/<path>
  • fully-qualified: srm://<host>:<port>/srm/managerv2?SFN=<path>
  • disabled: no conversion performed

For SRMCOPY channel agents, in order to prevent an issue with dCache 1.6.6, it is recommended to set:

FTA_TYPEDEFAULT_SRMCOPY_ACTIONS_SURLNORMALIZATION=compact-with-port

ACTIONS_MAXFILESTOCANCEL: The maximum number of Ready/Active files to cancel at each iteration of the Cancel action. Default is 500.

AGENT_VOSHARETYPE: how the VO share for a channel should be interpreted. The allowed values are:

  • normalized: the share is the value of the channel voshare property for the given VO, normalized to the sum of the shares of all the VOs in the same channel. This option can be used when channel administrators want to guarantee slots for certain VOs in order to implement some form of QoS, at the possible cost of total throughput (transfer slots are reserved for a VO even if that VO has no job to process).

  • absolute: the share is the value of the channel voshare property expressed as a percentage. No normalization is performed, which means that the sum of all the shares on the same channel can exceed 100%. This option can be used when channel administrators want to balance the share between the VOs, preventing a single VO from fully allocating a channel while minimizing the risk of allocating slots to VOs that have no job to process. This option requires some tuning of the VO share values based on experience, but it allows a compromise between throughput and QoS.

  • normalized-on-active: the share is the value of the channel voshare property for the given VO, normalized to the sum of the shares of all the VOs in the same channel that have at least one job that can be processed by the Channel Agent (job state should be Active, Pending or Canceling). This option is the default and should be used when the channel administrators want to optimize the throughput of the channel (the channel can be fully allocated even by one VO), at the cost of a lower QoS.

As an example, supposing you have a channel that has 30 files and 3 VOs, you could have:

VO    Share  Normalized  Absolute   Normalized-on-active*
             Max Files   Max Files  Max Files
VO_1  50     15          15         0
VO_2  30     9           9          18
VO_3  20     6           6          12

(* supposing VO_1 has no job to submit)

As you can see, when the VO shares sum to 100 there is no difference between the "normalized" and "absolute" setups. If the shares do not sum to 100, however, you can have:

VO    Share  Normalized  Absolute   Normalized-on-active*
             Max Files   Max Files  Max Files
VO_1  70     14          21         0
VO_2  50     10          15         19
VO_3  30     6           9          11

(* supposing VO_1 has no job to submit)

Please note that the values in the "Max Files" columns correspond to the maximum number of files a VO is authorized to submit at the same time. In any case, the constraint imposed by the "files" channel property is always respected.
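
The figures in the two tables can be reproduced with the following Python sketch. It is an illustration of the three share types as described above, not the agent's actual code. Two points are assumptions: results are rounded to the nearest integer (chosen because it matches the tables), and the function name max_files and its arguments are hypothetical.

```python
def max_files(shares, channel_files, share_type, active=None):
    """Return {vo: max files} for one channel.

    shares        -- {vo: integer voshare value}
    channel_files -- the channel "files" property
    share_type    -- "normalized", "absolute" or "normalized-on-active"
    active        -- VOs with at least one processable job
                     (only used for "normalized-on-active")
    """
    eligible = set(shares)
    if share_type == "absolute":
        total = 100
    elif share_type == "normalized":
        total = sum(shares.values())
    elif share_type == "normalized-on-active":
        if active is not None:
            eligible &= set(active)
        total = sum(shares[vo] for vo in eligible)
    else:
        raise ValueError(f"unknown share type: {share_type!r}")

    result = {}
    for vo, share in shares.items():
        if vo not in eligible or total == 0:
            result[vo] = 0
            continue
        # Integer round-to-nearest (assumption, matches the tables above).
        files = (2 * share * channel_files + total) // (2 * total)
        # The channel "files" property is never exceeded.
        result[vo] = min(files, channel_files)
    return result


shares = {"VO_1": 70, "VO_2": 50, "VO_3": 30}
print(max_files(shares, 30, "normalized"))
print(max_files(shares, 30, "absolute"))
print(max_files(shares, 30, "normalized-on-active", active=["VO_2", "VO_3"]))
```

The three printed dictionaries correspond to the three "Max Files" columns of the second table.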

Srmcopy channel agents parameters

GUC_MAXBULKSIZE: the maximum size for a SrmCopy bulk request. Default is 100.

AGENT_CHECK_INTERVAL: the frequency for checking the status of active SrmCopy requests. Recommended value is 30.

VO agents parameters

PYTHONPATH: the paths where the python modules and strategies can be loaded. Unless you have a non-default setup, this value should be set to: ${GLITE_LOCATION}/lib/python2.2/site-packages:${GLITE_LOCATION}/lib/python/glite/fts/strategies

ACTIONS_RETRYMODULE: the name of the python module that provides the retry logic for the VO. Recommended value is smarter_retry.

ACTIONS_RETRYPARAMS: the parameters passed to the retry logic. The format of this string depends on the strategy module itself. For the smarter_retry module, this value looks like:

"MaxFailures = 3 ; HoldEnabled = false ; OverwriteFailedFiles = true ; OverwriteExistingFiles = false ; DefaultRetryDelay = 300 ; RetryDelayForTimeoutOnGet = 1800 ; RetryDelayForDestFileExists = 300 ;"

Hopefully, the parameter names are self-explanatory. Please note that if a VO needs a shorter retry delay, you may also need to modify the parameter AGENT_RETRY_INTERVAL, which by default is set to 60 seconds. For example, if a VO wants a retry delay of 30 seconds, you may need to specify:

FTA_%VO%_AGENT_RETRY_INTERVAL=30
FTA_%VO%_ACTIONS_RETRYPARAMS="MaxFailures = 3 ; HoldEnabled = false ; OverwriteFailedFiles = true ; OverwriteExistingFiles = false ; DefaultRetryDelay = 30 ; RetryDelayForTimeoutOnGet = 1800 ; RetryDelayForDestFileExists = 300 ;"
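
To illustrate the format of the ACTIONS_RETRYPARAMS string, here is a small Python sketch that splits it into name/value pairs and flags the case discussed above. It is not the parser used by the smarter_retry module; both function names are hypothetical, and the comparison against AGENT_RETRY_INTERVAL reflects only our reading of the note above.

```python
def parse_retry_params(text):
    """Split a 'Name = value ; Name = value ;' string into a dict of strings."""
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return params


def needs_shorter_retry_interval(params, agent_retry_interval=60):
    """True if DefaultRetryDelay is below the agent's retry interval."""
    return int(params["DefaultRetryDelay"]) < agent_retry_interval


params = parse_retry_params(
    "MaxFailures = 3 ; HoldEnabled = false ; OverwriteFailedFiles = true ; "
    "OverwriteExistingFiles = false ; DefaultRetryDelay = 30 ; "
    "RetryDelayForTimeoutOnGet = 1800 ; RetryDelayForDestFileExists = 300 ;"
)
print(params["DefaultRetryDelay"])
print(needs_shorter_retry_interval(params))
```

With a DefaultRetryDelay of 30 and the default interval of 60 seconds the check returns True, which is the situation in which AGENT_RETRY_INTERVAL is lowered in the example above.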

ACTIONS_ENABLEUNKNOWNSOURCE / ACTIONS_ENABLEUNKNOWNDEST: Enable unknown source/destination site. If this value is set to true and, during the allocation phase, the source/destination SE is not listed in the Information System, the SE is assigned to a fake site called "UNKNOWN"; otherwise, job allocation fails. These options are useful to transfer files from/to an SE that is on a different Grid. Please note that to use these options, appropriate channels (i.e. accepting "UNKNOWN" as either source or destination) must be defined as well. Default value is false.

Recommended values

For channel agents, we recommend setting:

  • FTA_TYPEDEFAULT_%TYPE%_FSM_ENABLEHOLD=false # since the "Hold" state is a VO policy, only VOAgents should move files to this state
  • FTA_TYPEDEFAULT_%TYPE%_AGENT_CANCEL_INTERVAL=60 # Check every minute whether there are active transfers to cancel
  • FTA_TYPEDEFAULT_%TYPE%_AGENT_DEFAULTINTERVAL=5 # Execute the ChannelAgent operations (fetch new transfers, check the status of the active ones) every 5 seconds instead of every 3 seconds, in order to reduce the load on the DBServer

where %TYPE% stands for URLCOPY and SRMCOPY (please set the values for both types).

Notes

Oracle InstantClient location

The site-info.def file should contain the ORACLE_LOCATION variable for Oracle InstantClient. For example, for the recommended 10.2.0.3 InstantClient:

ORACLE_LOCATION=/usr/lib/oracle/10.2.0.3

Obsolete parameters

The following parameters are now obsolete, and trying to set them through the YAIM template will result in an error. The table lists the parameter name, the corresponding field of the t_channel table in the database and the flag of the command line interface tool glite-transfer-channel-set that should be used to set the value.

Parameter name              Database field                     glite-transfer-channel-set flag
GUC_TRANSFERTIMEOUT         urlcopy_tx_to (urlcopy channels)   --urlcopy-tx-to
                            srmcopy_to (srmcopy channels)      --srmcopy-to
GUC_HTTPTIMEOUT             http_to                            --http-timeout
GUC_STREAMS                 nostreams                          -T
GUC_TCPBUFFERSIZE           tcp_buffer_size                    -u
GUC_TCPBLOCKSIZE            blocksize                          --block-size
GUC_SRMPUTTIMEOUT           urlcopy_put_to                     --urlcopy-put-to
GUC_SRMGETTIMEOUT           urlcopy_get_to                     --urlcopy-get-to
GUC_SRMPUTDONETIMEOUT       urlcopy_putdone_to                 --urlcopy-putdone-to
GUC_SRMGETDONETIMEOUT       urlcopy_getdone_to                 --urlcopy-getdone-to
GUC_TRANSFERMARKERSTIMEOUT  urlcopy_txmarks_to                 --urlcopy-tx-marks-to
GUC_SRMCOPYDIRECTION        srmcopy_direction                  --srmcopy-direction
GUC_LOGLEVEL                tx_loglevel                        --tx-log-level
GUC_SRMSTATUSTIMEOUT        srmcopy_refresh_to                 --srmcopy-refresh-to
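
Before upgrading, it can be useful to scan an existing configuration for these obsolete parameters. The Python sketch below shows one way to do this; it is not part of YAIM, the function name find_obsolete is hypothetical, and it assumes that obsolete parameters appear as YAIM variables of the form FTA_${SCOPE}_${PARAMNAME}.

```python
# Obsolete parameter -> glite-transfer-channel-set flag, from the table above.
OBSOLETE_FLAGS = {
    "GUC_TRANSFERTIMEOUT": "--urlcopy-tx-to (urlcopy) or --srmcopy-to (srmcopy)",
    "GUC_HTTPTIMEOUT": "--http-timeout",
    "GUC_STREAMS": "-T",
    "GUC_TCPBUFFERSIZE": "-u",
    "GUC_TCPBLOCKSIZE": "--block-size",
    "GUC_SRMPUTTIMEOUT": "--urlcopy-put-to",
    "GUC_SRMGETTIMEOUT": "--urlcopy-get-to",
    "GUC_SRMPUTDONETIMEOUT": "--urlcopy-putdone-to",
    "GUC_SRMGETDONETIMEOUT": "--urlcopy-getdone-to",
    "GUC_TRANSFERMARKERSTIMEOUT": "--urlcopy-tx-marks-to",
    "GUC_SRMCOPYDIRECTION": "--srmcopy-direction",
    "GUC_LOGLEVEL": "--tx-log-level",
    "GUC_SRMSTATUSTIMEOUT": "--srmcopy-refresh-to",
}


def find_obsolete(variables):
    """Return (variable, CLI flag) pairs for obsolete YAIM variables."""
    hits = []
    for variable in variables:
        if not variable.startswith("FTA_"):
            continue
        for param, flag in OBSOLETE_FLAGS.items():
            # The parameter name is the last part of the YAIM variable.
            if variable.endswith("_" + param):
                hits.append((variable, flag))
    return hits


print(find_obsolete(["FTA_GLOBAL_GUC_STREAMS", "FTA_GLOBAL_LOG_PRIORITY"]))
```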

Migration script: update_channels.py

IMPORTANT NOTE: if you are upgrading from FTS 2.1, this script should not be necessary.

The update_channels.py script copies the parameters defined in the channel agents' configuration files into the database. It is located in the same directory as the agents' configuration files, ${GLITE_LOCATION}/etc/glite-data-transfer-agents.d.

In FTS 2.1 it was not yet possible to set all channel parameters properly through the command line interface, so this script was run by YAIM whenever a channel was updated. With FTS 2.2 this is no longer necessary since the CLI allows setting all channel parameters, and YAIM will produce an error in case one of the obsolete parameters is set. Your database should be synchronized with the agents' configuration files if you are upgrading from FTS 2.1, and running this script should not be necessary, but it is still provided in case you want to run it manually before upgrading.

The script requires no parameters. You can run it simply with:

python update_channels.py

The script will:

  • locate and read the channel agents configuration files.
  • generate an sql update script for the t_channel table.
  • run the sql script.

The following output files are produced:

  • update_channels.log: the log file of the script (same contents as standard output).
  • update_channels.oracle.log: the output of the sqlplus command.
  • update_channels.sql: the sql update script.

In case an error occurs, please mail FtsSupport and include the log files. The log files should contain no sensitive information (such as database connection parameters), but you might want to check before sending them.

Explicitly setting an agent name

The mandatory Name parameter of the agents configuration is already set with a sensible default by the YAIM scripts. This parameter is used by the channel agents as the name of the managed channel and by the VO agents as the name of the managed VO.

If you need to set it explicitly, use the parameter FTA_%AGENT%_AGENT_NAME.

Searching for parameters

A prototype tool exists for searching for parameters and determining the correct YAIM variable:

 cd /opt/glite/share/config/glite-data-transfer-agents/yaim

xsltproc -stringparam Param timeout find-param.xsl ../glite-transfer-channel-agent-urlcopy-oracle.config.xml

xsltproc -stringparam Prefix FTA_GLOBAL -stringparam Param timeout find-param.xsl ../glite-transfer-channel-agent-urlcopy-oracle.config.xml

where "timeout" is the string to search for. The variable names and descriptions are searched.

The Prefix option allows you to get the full yaim variable with the appropriate scope prefix.


Last edit: UnknownUser on 2010-05-11 - 17:23

Maintainer: PaoloTedesco


Topic revision: r5 - 2009-05-08 - AkosFrohner
 