FTS Installation

This document describes the installation and upgrade procedures for FTS.

This document applies to FTS version 2.2.3

Before you start

Hardware requirements

You should install the FTS and FTA servers on reasonable Scientific Linux 4 or Scientific Linux 5 machines, ideally of mid-range server-class.

It is recommended to have at least 512 MB of memory and a modern processor.

Large amounts of disk space are not critical, so a disk-server class machine is not needed.

Read the Generic Installation and Configuration Guide

Read the generic installation guides:

Please note that YAIM is only used to configure the system. You have to install the service components either by YUM (via gLite metapackages) or by your local package management system.

For FTS-specific YAIM details, see the FtsConfiguration.

Software Prerequisites

  • There should be a valid host certificate and host key in /etc/grid-security/.
  • Your system should have Oracle InstantClient installed on it. This is available from Oracle. The recommended version is currently 10.2.0.3. You need at least oracle-instantclient-basic and oracle-instantclient-sqlplus for a successful installation.
  • For the host running the FTS web-service, the firewall should be open for incoming connections on port tcp/8443. The connection from the client to the web-service is secured with SSL/TLS.
  • For the host running the FTA agent daemons, the outgoing firewall (if present) should allow access to your MyProxy server, the database, all source SRM/gridFTP clusters and all destination SRM/gridFTP clusters.
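The certificate prerequisite above can be checked before running YAIM. The following is a minimal sketch and not part of the FTS distribution: the function name check_fts_prereqs is hypothetical, and the file names hostcert.pem and hostkey.pem are an assumption (the conventional grid names), so adjust them to your site.

```shell
#!/bin/sh
# Sketch: verify that a host certificate and host key are present.
# Assumption: the files are called hostcert.pem and hostkey.pem.
check_fts_prereqs() {
    dir=${1:-/etc/grid-security}
    status=0
    for f in hostcert.pem hostkey.pem; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```

Run check_fts_prereqs as root on each node; it prints the missing files and returns a non-zero status if any are absent.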

Info SL4

Info SL5
  • SL5 comes with java-1.6.0-openjdk, which is satisfactory as a Java runtime, so there is no need to install one by hand.
  • On SLC5 the user edginfo should exist:
          useradd -m -p *NP* -c 'EDG info user' -u 153 -g edginfo -G infosys -d /home/edginfo edginfo
         

Choose deployment scenario

Examine the possible deployment scenarios from the deployment model section and choose the one that best fits your needs.

The minimum suggested configuration is:

  • two machines for the FTS web-service (for high availability)
  • one machine for all the FTA agent daemons (both the VO agents and the channel agents)

Both the web-service instances (if there is more than one) and the agent daemons can be (and should be) spread across machines for scalability.

Grid dependencies

FTS requires:

  • a MyProxy server
    • For legacy mode, the MyProxy server must be configured to allow the FTS host certificate to be an "authorized_retriever" (it should allow access from all machines running FTA agents).
    • For delegation renewal mode, FTS can use the same MyProxy as the WMS. Due to MyProxy restrictions, the server cannot be the same used by the resource broker to renew user credentials.
  • the production storage endpoints to be running SRM (currently SRM v2.2).
    • SRM services should publish themselves into the EGEE BDII information system. SRM v1.1 is still supported.

Prepare the site-info.def file

The site-wide YAIM configuration file, site-info.def, should be prepared in advance. It must contain the configuration for the web-service and agents machines.

Two template files are available:

  • /opt/glite/yaim/examples/siteinfo/site-info.def for site configuration.
  • /opt/glite/yaim/examples/fta-info.def for FTS/FTA.

The FTS/FTA example is provided separately as it contains many parameters. If you want to use the example, append it to the end of the example site-info.def file first. The structure of the configuration file and detailed configuration instructions are available in the #YAIM_variables_reference section.
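Appending the FTS/FTA example to the site example can be done as follows. This is only a sketch: the function name build_site_info and the default output path ./site-info.def are hypothetical, while the two template paths are the ones listed above.

```shell
#!/bin/sh
# Sketch: build a working site-info.def by appending the FTS/FTA example
# to a copy of the site example. Arguments override the default paths.
build_site_info() {
    site_tpl=${1:-/opt/glite/yaim/examples/siteinfo/site-info.def}
    fta_tpl=${2:-/opt/glite/yaim/examples/fta-info.def}
    out=${3:-./site-info.def}
    # refuse to write anything if a template is missing
    [ -r "$site_tpl" ] && [ -r "$fta_tpl" ] || return 1
    cat "$site_tpl" "$fta_tpl" > "$out"
}
```

The resulting file still has to be edited by hand; the templates only provide the variable names and example values.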

Identify the agents you need

This will depend on what VO transfers your site is required to serve, and this depends on whether you are the tier-0 or a tier-1. Please read this document which describes the suggested channel deployment model.

Once you have identified the channels you need and the VOs that you want to serve, you will need to define:

  • An agent daemon for every VO you want to serve
  • An agent daemon for every channel you want to serve
If you are unsure of what agents you need and how to define them, there is an example at FtsServerDeployExampleTier1.
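As a planning aid, the rule above (one agent per VO plus one agent per channel) can be written down as a small helper. This is a sketch only: list_required_agents is a hypothetical name, and the output is a plain checklist, not the agent names required by the naming convention.

```shell
#!/bin/sh
# Sketch: print one line per agent daemon that has to be defined.
# $1: space-separated list of VOs, $2: space-separated list of channels.
list_required_agents() {
    vos=$1
    channels=$2
    for vo in $vos; do
        echo "VO agent: $vo"
    done
    for ch in $channels; do
        echo "channel agent: $ch"
    done
}
```

For example, list_required_agents "atlas cms" "CERN-FNAL" (illustrative VO names) prints three lines, one for each agent to configure.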

New installation

Package Installation

The package installation is managed by YUM.

For gLite3.1/SL4 you need the DAG and JPackage repositories besides the node-specific gLite repository.

For gLite3.2/SL5 only DAG is required, not jpackage.

For example, using YUM for gLite 3.1:

cd /etc/yum.repos.d
wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.1/lcg-CA.repo
wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.1/jpackage.repo
wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.1/dag.repo

There are two distinct server types to install, FTS (the web-service daemons) and FTA (the agent daemons). From the deployment model you have chosen, identify which machines will be running FTS web-service and which machines will be running FTA agents.

Installing FTS web-service

You need to add the FTS gLite repository and install the glite-FTS_oracle metapackage.

cd /etc/yum.repos.d
wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/glite-FTS.repo
yum install --enablerepo jpackage5-generic bouncycastle
yum install glite-FTS_oracle

Installing FTA agents

You need to add the FTA gLite repository and install the glite-FTA_oracle metapackage.

cd /etc/yum.repos.d
wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/glite-FTA.repo
yum install --enablerepo dag log4cpp
yum install glite-FTA_oracle

Configuring and Starting services

On each node, run the YAIM configuration script, dependent on the node type.

Configuring FTS web-service

For the web-service nodes, run (note the name FTS2 rather than FTS):
/opt/glite/yaim/bin/yaim -c -s my-sitecfg.h -n  FTS2 
If you have a fresh database, it will stop with a note similar to:
Database schema does not appear to be loaded. Please load it using the command:
sqlplus fts_xxxx/xxxxxxxxx@(DESCRIPTION=(LOAD_BALANCE=no)(ADDRESS=(PROTOCOL=TCP)(HOST=grid8.cern.ch)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=fts-pilot.cern.ch))) @/opt/glite/etc/glite-data-transfer-fts/schema/oracle/oracle-schema.sql
Run the suggested command to load the schema and then rerun the YAIM configuration.

Depending on your connection string, you may have to put quotes around it, as:

sqlplus fts_xxxx/xxxxxxxxx@"(DESCRIPTION=(LOAD_BALANCE=no)(ADDRESS=(PROTOCOL=TCP)(HOST=grid8.cern.ch)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=fts-pilot.cern.ch)))" @/opt/glite/etc/glite-data-transfer-fts/schema/oracle/oracle-schema.sql
Normally oracle-instantclient adds itself to the dynamic library configuration via the /etc/ld.so.conf.d/oracle-instantclient.conf file. If it did not happen, then you may also have to add the library path first:
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib/oracle/10.2.0.3/client/lib 
You have to use lib64 instead of lib on 64 bit platforms.
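The choice between lib and lib64 can be made automatically. The sketch below assumes that x86_64 is the only 64 bit architecture in use and that InstantClient 10.2.0.3 is installed under the path shown above; oracle_lib_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the InstantClient library directory for this platform.
# Assumption: x86_64 is the only 64 bit architecture to handle.
oracle_lib_dir() {
    arch=${1:-$(uname -m)}
    base=/usr/lib/oracle/10.2.0.3/client
    case "$arch" in
        x86_64) echo "$base/lib64" ;;
        *)      echo "$base/lib" ;;
    esac
}
```

It can then be used as export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$(oracle_lib_dir) before calling sqlplus.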

Starting the FTS web-service

To start the FTS web-service (if YAIM didn't already):
service tomcat5 start 
A single daemon will start under the tomcat:tomcat user.

Configuring the Initial Channels

You should start the web-service first and create the channels in the DB using the command glite-transfer-channel-add before configuring the FTA agent daemons. The FTA agent daemons will fail to start if the corresponding channel has not yet been created.

For every channel agent you have defined, create the channel using the command:

glite-transfer-channel-add SOURCE-DEST SITE1 SITE2 
To choose the agents' names, see the agents naming convention section in the configuration reference.

Channel parameters can be configured with the glite-transfer-channel-set command line tool.
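If you have many channels, they can be created in a loop. The sketch below assumes a hypothetical plain-text file with one "SOURCE DEST" pair per line and builds the SOURCE-DEST channel name from it; the function name add_channels and the CHANNEL_ADD variable are illustrative, and only the glite-transfer-channel-add invocation shown above is used.

```shell
#!/bin/sh
# Sketch: create one channel per "SOURCE DEST" line of a plain-text file.
# Set CHANNEL_ADD=echo for a dry run that only prints the arguments.
CHANNEL_ADD=${CHANNEL_ADD:-glite-transfer-channel-add}

add_channels() {
    while read -r src dst; do
        # skip blank lines and comment lines
        case "$src" in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so the command cannot consume the list
        $CHANNEL_ADD "$src-$dst" "$src" "$dst" < /dev/null || return 1
    done < "$1"
}
```

The loop stops at the first channel that cannot be created, so the list can be corrected and the remaining lines retried.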

Configuring FTA agents

For the agent nodes, run (note that the name is FTA2 rather than FTA):

/opt/glite/yaim/bin/yaim -c -s site-info.def -n FTA2 

Starting FTA agents

To start the FTA agent and information system caching daemons:

service transfer-agents start
service glite-sd2cache start

Every agent daemon that you have chosen to configure on the box will be started in sequence.

The second command starts a daemon that populates the services.xml information system cache file, as described in LCG.FtsServicesXmlGliteSd2Cache. You may want to populate the cache initially by running

sh /etc/cron.daily/glite-sd2cache-cron 

To start a single agent daemon:

service transfer-agents start --instance glite-transfer-channel-agent-srmcopy-CERN-FNAL 

You can get a list of the names of the agents defined on the machine with the command

service transfer-agents status

Installing the history and purge packages

Starting with FTS 2.2 the history package is installed as part of the core schema, so you do not need to install this in a separate step. In addition to the history package FTS 2.2 comes with a purge package as well to delete very old (older than 90 days) entries from the history tables.

To activate the history and purge packages you need to start their DBMS jobs:

SQL> exec fts_history.start_job;
SQL> exec fts_purge_history.start_job;

Setup your site BDII to publish the FTS information

The FTS web-service node is installed with a BDII GRIS where it publishes the FTS endpoints and the channels you have defined. Publish the contents of this resource BDII via your site BDII. This will then be picked up by the top-level BDII servers.

Register the FTS web-service node in the GOC database

Go to the GOCDB and register the FTS web-service node (or its DNS alias) in the Grid Operations Center database. Register with node type FTS. This will automatically add your node to the LCG SAME monitoring.

Upgrade from a previous version

Important note

With FTS 2.1, various channel settings previously handled via YAIM and the agents' configuration files were moved to the database. Since the channel management command line interface had not been updated accordingly, a migration script, update_channels.py, was provided and was called by YAIM each time a channel definition was updated. This script copied the new parameters from the generated configuration files to the database.

Starting from FTS 2.2 those parameters are obsolete, and trying to set them through your YAIM configuration scripts will result in an error.

Upgrade recommendation

It is recommended that you update the nodes directly (i.e. without a re-install) using the gLite apt repository. There is a script to upgrade the DB schema so the current job information and queues will be maintained - the config_FTS2_ws script will guide you to run the schema upgrade(s).

Upgrade paths

From 2.2.X -> 2.2.3

The certification process always checks upgrades from the last version available to production. Thus this upgrade has been certified.

From 2.1 -> 2.2.3

This path has not been explicitly checked in certification, but we know of no reason why it will not work. FTS support is ready to help any sites who wish to do this upgrade.

From 2.0

There is no direct upgrade path from 2.0 to 2.2.3 (gLite3.0/SL3 is no longer supported).

Upgrade procedure

Announce the downtime

Announce the upgrade and proposed downtime at the WLCG Operations Meeting and using the EGEE broadcast tool, as per WLCG procedure.

Assume at least half a day for the upgrade.

Drain the service

Set all transfer channels inactive - this will drain the network of active jobs and ensure their status is committed fully to the database.

for i in $(glite-transfer-channel-list); do glite-transfer-channel-set -S Inactive $i; done

Newly submitted jobs will sit in the queue in Pending state and should be processed when the service is restarted.

Active files will be completed.

Wait until there are no more active transfers.
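Waiting can be scripted as a polling loop. The sketch below is hedged in two ways: it assumes that glite-transfer-list accepts the state Active in the same way as the Pending example used for client testing, and it treats an empty listing as "drained". The names wait_for_drain and LIST_ACTIVE are hypothetical. Note that this lists jobs rather than individual file transfers.

```shell
#!/bin/sh
# Sketch: poll until the listing command prints nothing.
# Assumption: "glite-transfer-list Active" lists the jobs still running.
LIST_ACTIVE=${LIST_ACTIVE:-glite-transfer-list Active}

wait_for_drain() {
    interval=${1:-60}   # seconds between polls
    tries=${2:-120}     # maximum number of polls
    while [ "$tries" -gt 0 ]; do
        # a failing listing command must not be mistaken for "drained"
        out=$($LIST_ACTIVE) || return 2
        if [ -z "$out" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

A return status of 0 means the listing was empty, 1 means jobs were still listed after the last poll, and 2 means the listing command itself failed.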

Switch off your monitoring and alarms

Alternatively, set the nodes to Maintenance. This procedure is site-specific. In particular, make sure that no daemons are running during the schema upgrade.

Stop any of the PL/SQL jobs

For example, the 'history' job. Log into the owner account using sqlplus:

  • exec fts_stats.stop_hourly_job
  • exec fts_history.stop_job
  • exec fts_statecount.stop_job
  • Verify that select * from user_jobs; returns no rows.

This will stop all known FTS DBMS jobs. It is likely that not all of these packages are installed at your site, so don't worry if some of these commands fail. The critical thing is that all the FTS PL/SQL jobs are stopped.
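The statements above can be sent to sqlplus in one go. This is a sketch: stop_jobs_sql is a hypothetical helper that only prints the SQL, and "whenever sqlerror continue" is included so that a package missing at your site does not abort the remaining statements.

```shell
#!/bin/sh
# Sketch: print the SQL*Plus input that stops the known FTS DBMS jobs
# and then lists any jobs that are still defined.
stop_jobs_sql() {
    cat <<'EOF'
whenever sqlerror continue
exec fts_stats.stop_hourly_job
exec fts_history.stop_job
exec fts_statecount.stop_job
select * from user_jobs;
exit
EOF
}
```

Pipe the output into a session of the schema owner, for example stop_jobs_sql | sqlplus -S "theuser/thepassword@(theconnectionstring)", and check that the final query returns no rows.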

Stop the servers and close DB sessions

Stop the agents and tomcat servers on all machines.
  service tomcat5 stop
  service transfer-agents stop
  service glite-sd2cache stop

Stop all 'monitoring' scripts you may have running that access the DB directly.

Make sure they are stopped:

ps aux | grep j2 | grep -v grep 
ps aux | grep glite | grep -v grep

should produce nothing.

Remove any other open sessions you may have to the DB account (both the owner and the writer account, if you are using writer accounts).

Please ask your DBA to check that there are no sessions attached to the FTS schema at this point (and investigate / kill them if there are).

Recommended: ask the DBA to back up your schema

This will allow you to roll back if something goes wrong with the service upgrade.

You don't have to do this, but doing it makes the upgrade a little safer.

This copy may take some time (possibly hours).

Possible approaches: export the schema to a file, or export it to another account.

Recommended: Archive the logfiles

You may wish to archive some or all of the logfiles and empty the logging directories afterwards.

Upgrade the software

Upgrade the RPMs from the gLite distribution, see EGEE.DMFtsPatchStatus for the details!

You might need to remove the obsolete RPMs manually:

  rpm -e --nodeps glite-data-srm-cli glite-data-transfer-api-c 

Check for block fragmentation with your DBA

It has been noted on CERN-PROD that the history PL/SQL script can lead to bad block fragmentation on the Oracle database. A support request has been opened with Oracle to help understand this.

For normal running, this is not so bad (except that it uses too much space), since FTS accesses these blocks via an index. However, it can slow down the schema upgrade a lot, since adding new indices to a table requires a full table scan (i.e. it needs to read every block in the table).

Your DBA should know how to check for fragmented blocks.

It is recommended that you take the opportunity of the downtime to defragment any fragmented tables. This may take a while (a couple of hours).

The note is FtsRelease20TableFragmentation.

Upgrade the schema and install the history packages

The web-service yaim configuration template does not need to be changed, so you can run the yaim configuration script for the FTS node which will tell you the SQL commands that must be run to upgrade the schema and install the new history and purge-history packages.

/opt/glite/yaim/scripts/configure_node site-info.def glite-FTS2

For sites using a reader/writer DB account setup: regardless of what yaim tells you to run, make sure you run the schema upgrade using the schema owner account.

You should expect this to take several minutes per update. If you didn't defragment the schema it may take several hours.

For upgrades to 2.2.3:

# upgrade schema from 3.3.0 to 3.4.0
sqlplus "theuser/thepassword@(theconnectionstring)" @/opt/glite/etc/glite-data-transfer-fts/schema/oracle/oracle-upgrade_3.3.0-3.4.0.sql
# upgrade schema from 3.4.0 to 3.4.1
sqlplus "theuser/thepassword@(theconnectionstring)" @/opt/glite/etc/glite-data-transfer-fts/schema/oracle/oracle-upgrade_3.4.0-3.4.1.sql
# install the history package
sqlplus "theuser/thepassword@(theconnectionstring)" @/opt/glite/etc/glite-data-transfer-fts/schema/oracle/fts_history_pack.sql
sqlplus "theuser/thepassword@(theconnectionstring)" @/opt/glite/etc/glite-data-transfer-fts/schema/oracle/fts_history_body_pack.sql
# install the purge-history package
sqlplus "theuser/thepassword@(theconnectionstring)" @/opt/glite/etc/glite-data-transfer-fts/schema/oracle/fts_purge_pack.sql
sqlplus "theuser/thepassword@(theconnectionstring)" @/opt/glite/etc/glite-data-transfer-fts/schema/oracle/fts_purge_body_pack.sql
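The six scripts above can be applied in order from a small wrapper. This is a sketch under assumptions: run_upgrade and the SQLPLUS variable are hypothetical names, and the wrapper only stops when sqlplus itself returns a non-zero status. sqlplus may return 0 even when a statement fails, so the output should still be reviewed after each script.

```shell
#!/bin/sh
# Sketch: apply the 2.2.3 schema upgrade and package scripts in order.
# Set SQLPLUS=echo for a dry run that only prints the command arguments.
SQLPLUS=${SQLPLUS:-sqlplus}
SCHEMA_DIR=/opt/glite/etc/glite-data-transfer-fts/schema/oracle

run_upgrade() {
    connect=$1   # e.g. "theuser/thepassword@(theconnectionstring)"
    for script in \
        oracle-upgrade_3.3.0-3.4.0.sql \
        oracle-upgrade_3.4.0-3.4.1.sql \
        fts_history_pack.sql \
        fts_history_body_pack.sql \
        fts_purge_pack.sql \
        fts_purge_body_pack.sql
    do
        echo "applying $script" >&2
        # stdin is /dev/null so sqlplus exits once the script has finished
        $SQLPLUS "$connect" "@$SCHEMA_DIR/$script" < /dev/null || return 1
    done
}
```

Run it with the schema owner credentials. Which upgrade scripts actually apply depends on your current schema version, so follow what the YAIM output tells you if it differs from this list.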

Further information on schema changes and FTS versioning can be found at DBSchemaChanges.

Optional: reader/writer account setup

If your Oracle DB setup uses reader/writer accounts (ask your DBA), apply your local procedure to make the necessary object grants and synonyms in the various accounts. These have to be remade since extra schema objects have been added.

If you don't know what this means, it's probably OK to skip this step.

Ask your DBA to check that all the schema objects are valid

This is good practice. They should be recompiled if not.

Restart the history and purge jobs

SQL> exec fts_history.start_job;
SQL> exec fts_purge_history.start_job;

FTS web-service configuration

There are no new parameters to add for FTS 2.2.

You should have already run this yaim component to upgrade the schema. It should have started the tomcat5 daemon already.

Update the services.xml file

The /opt/glite/etc/services.xml file shall contain the local services:

  • org.glite.FileTransfer
  • org.glite.ChannelManagement
  • org.glite.Delegation

If these service types are not present, or do not point to the locally configured web service, then please remove this file and re-run the YAIM configuration script, which should create the appropriate entries.
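A quick check for the three service types can be scripted. The sketch below only tests that each type string occurs somewhere in the file; it does not verify that the entries point to the local web service. check_services_xml is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which of the three required service types are absent
# from services.xml. Only string presence is checked, not the endpoints.
check_services_xml() {
    file=${1:-/opt/glite/etc/services.xml}
    status=0
    for svc in org.glite.FileTransfer org.glite.ChannelManagement org.glite.Delegation; do
        if ! grep -q -F "$svc" "$file" 2>/dev/null; then
            echo "service type not found: $svc" >&2
            status=1
        fi
    done
    return $status
}
```

A non-zero return status means at least one type is missing (or the file does not exist), in which case the file should be regenerated as described above.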

Note: the services.xml file is also updated by the glite-sd2cache component, which adds information about the known storage elements in BDII.

FTA agent configuration

There are no new required parameters to add for FTS 2.2.

As noted in FtsChangesFrom15To20, you should change agent type VOAGENT_PYTHON to VOAGENT.

The yaim target is FTA2.

To start the agents after reconfig, run:

service transfer-agents start

Check DB connections

Ask your DBA to check that the connections are back on the database correctly, with correct service names, load-balanced properly, etc.

Test a few of your favourite commands

As you want.

Run a few test jobs

The client tools are the same as before except for the changes and additions described in FtsChangesFrom15To20.

It's worth testing that delegation works (glite-transfer-submit -v without the -p option will use credential delegation by default on an FTS 2.2 server).

Re-open all the channels

As you like, e.g.:

for i in $(glite-transfer-channel-list); do glite-transfer-channel-set -S Active -m "Service upgraded to FTS 2.2.3" $i; done

Re-enable your service monitoring

... or otherwise bring the nodes out of maintenance. Site specific.

Announce that the service is back

Please use the EGEE broadcast tool to announce that the service is back.

Client side

Installation

Please install the client as part of the gLite 3.1 or 3.2 UI. See Generic install guide 3.1 or Generic install guide 3.2.

Configuration

The client command line tools must be told the endpoint of the service portal that they are to talk to.

This can be done directly on the command line with the -s option:

   glite-transfer-channel-list -s https://HOST.cern.ch:8443/TEST/glite-data-transfer-fts/services/ChannelManagement

or using one of the two options below.

Static services.xml file

If your FTS server is publishing into the EGEE.BDII, you are recommended to use the approach in 'EGEE.BDII configuration' below; skip this section.

If your FTS server is not publishing in the EGEE.BDII, then you can copy the /opt/glite/etc/services.xml file from the web service nodes to the clients for the end point locations.

EGEE.BDII configuration

If your FTS is publishing in the EGEE.BDII, the following should be set on the client:

 export GLITE_SD_PLUGIN=bdii
 export GLITE_SD_SITE=CERN-SC
 export LCG_GFAL_INFOSYS=lcg-bdii.cern.ch:2170

where lcg-bdii.cern.ch:2170 is the appropriate EGEE.BDII (the one in the example is the LCG top-level EGEE.BDII), and CERN-SC is the GOC-DB name for the site your FTS server is running at (the example is for the CERN production EGEE.BDII at CERN-SC). You do not need a client services.xml if you are using EGEE.BDII.
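Before running the client tools it can be useful to confirm that the three variables are set in the current shell. The following sketch uses a hypothetical helper name, check_bdii_env, and checks only that each variable is non-empty, not that its value is correct for your site.

```shell
#!/bin/sh
# Sketch: verify that the service discovery variables are set and non-empty.
check_bdii_env() {
    status=0
    for var in GLITE_SD_PLUGIN GLITE_SD_SITE LCG_GFAL_INFOSYS; do
        # indirect lookup of the variable named in $var
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "$var is not set" >&2
            status=1
        fi
    done
    return $status
}
```

Because the check runs in the calling shell, define the function in (or source it into) the same session in which you run the glite-transfer commands.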

Path and library path

The environment variable LD_LIBRARY_PATH must include /opt/glite/lib (on 32 bit platforms) or /opt/glite/lib64 (on 64 bit platforms).

The environment variable PATH should include /opt/glite/bin.

We provide a script fts.sh as an example (the tcsh profile script is left as an exercise for the sysadmin). Download it to /etc/profile.d/fts.sh

export GLITE_LOCATION=${GLITE_LOCATION:-/opt/glite}
if [ -z "$PATH" ]; then
    export PATH=$GLITE_LOCATION/bin
else
    export PATH=$PATH:$GLITE_LOCATION/bin
fi
if [ -z "$LD_LIBRARY_PATH" ]; then
    export LD_LIBRARY_PATH=$GLITE_LOCATION/lib:$GLITE_LOCATION/lib64
else
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GLITE_LOCATION/lib:$GLITE_LOCATION/lib64
fi

If you have installed the client packages as part of the glite-UI or glite-WN node types, then appropriate environment scripts shall be available in /opt/glite/etc/profile.d:

ln -s /opt/glite/etc/profile.d/* /etc/profile.d/

Testing

A basic test can be made for the client command lines.

From an unprivileged account, obtain a standard grid proxy or a VOMS proxy. Ensure that this grid proxy has been entered in the manager mapfile of the FTS service you are testing against, or you will receive an authorisation error from the service.

   grid-proxy-init

Then try a simple management command:

   glite-transfer-channel-list -v

which should, after contacting the service, print the service, interface and schema version. If the service is new, it will return no channel names since none have yet been configured. Try next a job list:

   glite-transfer-list -v Pending

which should print the same set of version numbers, and again, if the service is new, will return no Pending jobs, since none have yet been submitted.

If both of the tests work, then the client command line is correctly configured.

Problems you may see

If you see:

list: listChannels: SOAP fault: SOAP-ENV:Client - CGSI-gSOAP: Could not open connection ! (TCP connect failed in tcp_connect())

then either the service is down, or your services.xml file is pointing to the wrong endpoint. You can see what endpoint the client is attempting to connect to by using the verbose flag:

glite-transfer-channel-list -v

If you see:

list: Service discovery: No services of type org.glite.ChannelManagement were found

then either the services.xml file is missing or is not readable by the client.

If you see:

list: listChannels: SOAP fault: "http://xml.apache.org/axis/":Server.NoService - The AXIS engine could not find a target service to invoke!  targetService is ChannelManagement

then either the endpoint specified in the services.xml file is incorrect, or the service is misconfigured.

To check whether the endpoint specified in the services.xml file is correct, connect to it directly with a web browser. e.g. connect to https://yourhostname:8443/site-fts/glite-data-transfer-fts/services/FileTransfer (you will need your grid certificate loaded in your browser). If the service is listening, you should see a web page with a message like:

Hi there, this is an AXIS service!

If you see:

list: listChannels: You are not authorised for channel management upon this service

then you are not a channel manager. Look in the FTS server logs (org.glite.data) to see how the authorisation decision was made. See the FtsServerAdmin14 guide for using the glite-transfer-channel-addmanager command.


Last edit: OliverKeeble on 2010-09-23 - 13:40

Maintainer: AkosFrohner


Topic revision: r13 - 2010-09-23 - OliverKeeble
 