FTS Server Upgrade 1.5 to 2.0 for LCG service
This page assumes you are running the latest FTS 1.5 version (FtsRelease15) and describes how to upgrade the service to FTS 2.0 (FtsRelease20).
Upgrading from any other release, or applying cumulative upgrades from earlier releases, is not recommended (instead, make a clean install; see
FtsServerInstall20).
Read through all the steps first.
Changes from release 1.5 are described in
FtsChangesFrom15To20.
Note!
FtsRelease20Patch1232 contains specific
manual instructions if you are not yet using YAIM 4.0.
Requirements
The requirements are the same as for your current 1.5 version; see
FtsServerInstall15 and
FtsServerInstall20; the deployment model and recommendations are the same.
Make sure you know the Grid dependencies
The Grid dependencies are the same as in 1.5, see
FtsServerInstall15 and
FtsServerInstall20. You should ensure that all the SRMs you plan to contact are published correctly in the WLCG BDII.
Upgrade recommendation
It is recommended that you update the nodes directly (i.e. without a re-install) using the gLite apt repository. There is a script to upgrade the DB schema so the current job information and queues will be maintained - the
config_FTS2_ws
script will guide you to run the schema upgrade(s).
When can I not upgrade?
There are no known conditions under which you can't upgrade from
FtsRelease15.
Upgrade procedure
This is based upon experience from the CERN-PROD upgrade to FTS 2.0:
FtsTier0ServerInterventions.
Announce the downtime
Announce the upgrade and proposed downtime at the WLCG Operations Meeting and using the EGEE broadcast tool, as per WLCG procedure.
Assume at least half a day for the upgrade.
Drain the service
Set all transfer channels inactive - this will drain the network of active jobs and ensure their status is committed fully to the database.
for i in `glite-transfer-channel-list`; do glite-transfer-channel-set $i Inactive; done
Newly submitted jobs will sit in the queue in
Pending
state and should be processed when the service is restarted.
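The drain loop above can be wrapped in a small helper so that the same code is reused when re-opening the channels at the end. This is only a sketch: `CHANNEL_LIST_CMD` and `CHANNEL_SET_CMD` are hypothetical override variables introduced here (they are not part of FTS) and default to the client commands used on this page. The argument order for `-m` follows the re-open example later in this procedure.

```shell
# Sketch: set every FTS channel to one state in a single call.
# CHANNEL_LIST_CMD / CHANNEL_SET_CMD are hypothetical hooks (not part of FTS);
# they default to the real client commands and can be overridden for a dry run.
CHANNEL_LIST_CMD=${CHANNEL_LIST_CMD:-glite-transfer-channel-list}
CHANNEL_SET_CMD=${CHANNEL_SET_CMD:-glite-transfer-channel-set}

set_all_channels() {
    # $1 = target state (e.g. Inactive or Active); $2 = optional message for -m
    state=$1
    message=${2:-}
    failed=0
    for channel in $("$CHANNEL_LIST_CMD"); do
        if [ -n "$message" ]; then
            "$CHANNEL_SET_CMD" "$channel" "$state" -m "$message" || failed=1
        else
            "$CHANNEL_SET_CMD" "$channel" "$state" || failed=1
        fi
    done
    # Non-zero if any single channel could not be changed.
    return "$failed"
}
```

Typical use would be `set_all_channels Inactive` before the intervention; the function returns non-zero if any channel could not be set, so a failure is not silently skipped.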
Go for a coffee
If all channels are working properly, it should take 5 to 15 minutes to drain the channels. If there are problems with some of the SRMs, you may have to wait up to 30 minutes for bad transfers to time out.
Switch off your monitoring and alarms
...or otherwise set the nodes to Maintenance. Site specific. In particular make sure that no daemons are running during the schema upgrade.
Stop any of the magic PL/SQL jobs
For example, the (in)famous 'history' job. Log into the owner account using sqlplus:
- exec fts_stats.stop_hourly_job
- exec fts_history.stop_job
- exec fts_statecount.stop_job
- Verify that select * from user_jobs; returns no rows.
This will stop all known FTS DBMS jobs. It's likely that not all of these packages are installed at your site, so don't worry if some of these commands fail. The critical thing is that all the FTS PL/SQL jobs are stopped.
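The stop commands can also be fed to sqlplus in one go. The sketch below only prints the statements; `stop_jobs_sql` is a name made up for this example. `whenever sqlerror continue` is the SQL*Plus default and is spelled out so that a missing package does not abort the remaining commands.

```shell
# Sketch: print the SQL*Plus commands that stop the known FTS DBMS jobs
# and then list whatever jobs remain.
stop_jobs_sql() {
    cat <<'EOF'
whenever sqlerror continue
exec fts_stats.stop_hourly_job
exec fts_history.stop_job
exec fts_statecount.stop_job
select * from user_jobs;
exit
EOF
}

# Usage, as the schema owner (the connect string is site specific):
#   stop_jobs_sql | sqlplus user/pass@"connectstring"
```

The final query should report no rows selected; if it lists jobs, they still need to be stopped by hand.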
Stop the servers and close DB sessions
Stop the agents and tomcat servers on all machines.
service tomcat5 stop
service transfer-agents stop
Stop any 'monitoring' scripts you may have running that access the DB directly.
Make sure they are stopped:
ps aux | grep j2 | grep -v grep
ps aux | grep glite | grep -v grep
should produce nothing.
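The two checks above can be combined into one command that fails loudly, which is convenient if the upgrade is scripted. A minimal sketch, using the same `j2` and `glite` patterns; the function names are invented for this example.

```shell
# Sketch: report any FTS-related process that is still alive.
filter_fts_procs() {
    # Reads "ps aux"-style lines on stdin and prints those matching the
    # j2 / glite patterns, ignoring the grep processes themselves.
    grep -E 'j2|glite' | grep -v grep
}

assert_fts_stopped() {
    # "|| true" keeps this safe under "set -e" when nothing matches.
    leftovers=$(ps aux | filter_fts_procs) || true
    if [ -n "$leftovers" ]; then
        echo "Still running:" >&2
        echo "$leftovers" >&2
        return 1
    fi
    echo "No FTS processes found."
}
```

Run `assert_fts_stopped` on every node; a non-zero exit status means something still needs to be stopped.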
Remove any other open sessions you may have to the DB account (both the owner and the writer account, if you are using writer accounts).
Please ask your DBA to check that there are no sessions attached to the FTS schema at this point (and investigate / kill them if there are).
Recommended: ask the DBA to back up your schema
This will allow you to roll back if something goes wrong with the service upgrade.
You don't have to do this, but doing it makes the upgrade a little safer.
This copy may take some time (like hours).
Possibilities: export the schema to a file, or export it to another account.
[grid-service-databases, Gordon] Recommendation is for the DBA to make a full database backup at this point.
Recommended: Archive the logfiles
See the
FtsServerAdmin20 for details of logfiles and archive those that you wish to archive.
You may wish to empty those logging directories afterwards.
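One way to archive the logs is a single dated tarball. This is a sketch only: `archive_logs` and the archive name are invented for illustration, and the log directories are passed in as arguments because their locations are documented in FtsServerAdmin20, not here.

```shell
# Sketch: archive one or more log directories into a dated tarball.
archive_logs() {
    # $1 = destination directory; remaining arguments = log directories
    dest=$1
    shift
    [ "$#" -gt 0 ] || { echo "usage: archive_logs DEST LOGDIR..." >&2; return 2; }
    mkdir -p "$dest" || return 1
    archive="$dest/fts15-logs-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" "$@" || return 1
    # Print the archive path so the caller can verify it before deleting logs.
    echo "$archive"
}
```

Check the archive with `tar -tzf` before emptying the original directories.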
Upgrade the software
Upgrade the RPMs from the gLite distribution. The patch is patch 1232. gLite release 3.0 update [
to be defined ]
Read release notes for patch 1232
The release notes are here:
FtsRelease20Patch1232
In particular, unless you are reading this in August and already have the correct version of yaim 4.0, you should override the default config_FTS2_ws and config_FTA2_agents in /opt/glite/yaim/functions/local/ with the versions provided in the release notes.
Read the 'changes' note
This changes note is here:
FtsChangesFrom15To20
In particular, note the deployment changes:
- In site-info.def, the agent type VOAGENT_PYTHON is now VOAGENT.
- The yaim configuration targets are FTS2 and FTA2. Do not use FTS and FTA since these are for the 1.5 release.
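The agent type rename can be applied mechanically. The sketch below rewrites the token wherever it appears in the file and keeps a backup; `rename_agent_type` and the `.pre-fts2` suffix are invented for this example, and it assumes that VOAGENT_PYTHON appears in site-info.def only as an agent type value.

```shell
# Sketch: rewrite VOAGENT_PYTHON to VOAGENT in a yaim config file, keeping a backup.
rename_agent_type() {
    # $1 = path to site-info.def
    conf=$1
    cp -p "$conf" "$conf.pre-fts2" || return 1
    # Only the longer token is matched, so an existing VOAGENT value is untouched.
    sed 's/VOAGENT_PYTHON/VOAGENT/g' "$conf.pre-fts2" > "$conf" || return 1
    # Show the resulting agent type lines for review (no match is not an error).
    grep -n 'VOAGENT' "$conf" || true
}
```

Review the printed lines before running yaim; the original file remains available under the backup name.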
Check for block fragmentation with your DBA
It has been noted on CERN-PROD that the history PL/SQL script can lead to bad block fragmentation on the Oracle database. A support request has been opened with Oracle to help understand this.
For normal running, this is not so bad (except that you use too much space), since FTS accesses these blocks via an index. It can, however, slow down the schema upgrade a lot, since adding new indices to a table requires a full table scan (i.e. it needs to read every block in the table).
Your DBA should know how to check for fragmented blocks.
It is recommended that you take the opportunity of the downtime to defragment any fragmented tables. This may take a while (a couple of hours).
The note is
FtsRelease20TableFragmentation.
Upgrade the schema
The best way is to run the yaim configuration tool for the web-service. There are no new parameters to add for FTS 2.0.
/opt/glite/yaim/scripts/configure_node site-info.def glite-FTS2
Yaim will prompt you to upgrade the schema (it will tell you what to run). For sites using a reader/writer DB account setup: regardless of what yaim tells you to run, make sure you run the schema upgrade using the schema owner account.
- If you are upgrading from FTS 1.5, you are using schema version 2.2.1. It will upgrade to 3.0.0. Run yaim again to upgrade to the final schema version 3.1.0.
- If you already installed FTS 2.0 from an earlier patch, you are using schema version 3.0.0. Run yaim to upgrade to the final 3.1.0.
- There is a new schema (the delegation schema) that you need to load. Yaim also checks for this and will prompt you to install it.
You should expect this to take several minutes per update. If you didn't defragment the schema, it may take several hours.
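As a quick reference, the upgrade paths described above can be written as a lookup. This is purely illustrative: `schema_upgrade_path` is a hypothetical helper, it does not query the database, and the delegation schema still has to be loaded separately when yaim prompts for it.

```shell
# Sketch: print the remaining schema upgrade steps for a given starting version.
schema_upgrade_path() {
    case $1 in
        2.2.1) echo "2.2.1 -> 3.0.0 -> 3.1.0 (run yaim twice)" ;;
        3.0.0) echo "3.0.0 -> 3.1.0 (run yaim once)" ;;
        3.1.0) echo "3.1.0 is already the final schema version" ;;
        *)     echo "unknown schema version: $1" >&2; return 1 ;;
    esac
}
```

For example, `schema_upgrade_path 2.2.1` covers the normal upgrade from FTS 1.5.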
Upgrade (or install) the history schema
If you have been running the history job, you should upgrade the package to the latest version and upgrade the schema.
Note that a better versioned and released version of this tool is coming soon.
In the meantime:
- Take version 0.1.8-1 from FtsAdminTools20 ("FTS History package")
- Upgrade the history package itself:
  - sqlplus user/pass@"connectstring" < /opt/glite/share/glite-data-transfer-scripts/plsql/fts_history_pack.sql
  - sqlplus user/pass@"connectstring" < /opt/glite/share/glite-data-transfer-scripts/plsql/fts_history_pack_body.sql
- Upgrade the history schema to the 3.0 series:
  - sqlplus user/pass@"connectstring" < /opt/glite/share/glite-data-transfer-scripts/plsql/fts_history_tables-upgrade_2.2.1-3.0.0.sql
You should expect the 3rd step to take a while, depending on how many entries you have (it took 20 minutes on CERN-PROD).
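The three sqlplus calls can be run in sequence with a small wrapper. This is a sketch under assumptions: `SQLPLUS` and `PLSQL_DIR` are hypothetical override variables added for dry runs, and the loop stops only if sqlplus itself fails or a script is missing. By default sqlplus does not return a failure status for SQL errors, so the output still has to be read.

```shell
# Sketch: apply the three history SQL scripts in order.
SQLPLUS=${SQLPLUS:-sqlplus}   # hypothetical override hook, handy for a dry run
PLSQL_DIR=${PLSQL_DIR:-/opt/glite/share/glite-data-transfer-scripts/plsql}

upgrade_history() {
    # $1 = connection in the user/pass@connectstring form used above
    for script in fts_history_pack.sql \
                  fts_history_pack_body.sql \
                  fts_history_tables-upgrade_2.2.1-3.0.0.sql; do
        echo "Applying $script"
        # Stops if the script file is missing or sqlplus exits non-zero.
        "$SQLPLUS" "$1" < "$PLSQL_DIR/$script" || return 1
    done
}
```

As with the manual commands, the password is visible on the command line while sqlplus runs, so use this only on a trusted node.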
If you have modified the history schema by hand, the upgrade may fail: please ask your DBA. The target schema (i.e. what it should look like) is in /opt/glite/share/glite-data-transfer-scripts/plsql/create_fts_history_tables.sql.
If you haven't been running the history job package previously, you should install it now: see
FtsAdminTools20 for this.
Optional: reader/writer account setup
If your Oracle DB setup uses reader/writer accounts (ask your DBA), apply your local procedure to make the necessary objects grants and synonyms in the various accounts. These have to be remade since extra schema objects have been added.
If you don't know what this means, it's probably OK to skip this step.
Ask your DBA to check that all the schema objects are valid
This is good practice. They should be recompiled if not.
Restart the history job
Using the procedure described in
FtsAdminTools20 "Start the DBMS job":
SQL> exec fts_history.submit_job;
SQL> exec dbms_job.run(xxx);
FTS web-service configuration
There are no new parameters to add for FTS 2.0.
You should have already run this yaim component to upgrade the schema. It should have started the tomcat5 daemon already.
Edit the services.xml file
Either do this by hand, or by running the famous
make-services.sh
script.
When upgrading, yaim does not replace the existing
services.xml
file. You should modify this, adding the block for the delegation port-type that you will find in:
/tmp/glite-fts-add-delegation.services.xml
For reference, there should be 3 entries for your web-service, looking something like this:
<service name='EGEEfts'>
  <parameters>
    <endpoint>https://fts107.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer</endpoint>
    <type>org.glite.FileTransfer</type>
    <version>3.3.0</version>
  </parameters>
</service>
<service name='EGEEchannel'>
  <parameters>
    <endpoint>https://fts107.cern.ch:8443/glite-data-transfer-fts/services/ChannelManagement</endpoint>
    <type>org.glite.ChannelManagement</type>
    <version>3.3.0</version>
  </parameters>
</service>
<service name='EGEEdelegation'>
  <parameters>
    <endpoint>https://fts107.cern.ch:8443/glite-data-transfer-fts/services/gridsite-delegation</endpoint>
    <type>org.glite.Delegation</type>
    <version>3.3.0</version>
  </parameters>
</service>
This will allow you to test the service from the local node (i.e. it will allow the FTS client on the local node to find the locally running tomcat server).
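A quick sanity check is to confirm that all three port-types are present in the file after editing. A minimal sketch: `check_services_xml` is a name invented here, the path to services.xml is passed in because it depends on your installation, and the check only looks for the `<type>` elements shown above without validating the XML.

```shell
# Sketch: confirm that services.xml mentions all three FTS 2.0 port-types.
check_services_xml() {
    # $1 = path to services.xml
    missing=0
    for svc_type in org.glite.FileTransfer \
                    org.glite.ChannelManagement \
                    org.glite.Delegation; do
        if ! grep -qF "<type>$svc_type</type>" "$1"; then
            echo "missing service type: $svc_type" >&2
            missing=1
        fi
    done
    # Non-zero if at least one port-type is absent.
    return "$missing"
}
```

A missing `org.glite.Delegation` entry is the likely result if the delegation block was not added during the upgrade.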
Please update the file on all nodes using the latest
make-services.sh
script (for 2.0).
The procedure is described at
FtsServerServicesXml20.
It is recommended that you first generate a fresh file (i.e. without merging from the old 1.5 one):
./make-services.sh --ftshost prod-fts-ws.cern.ch --serxml /root/services.xml --addvo ops --verbose
After the upgrade is complete (wait an hour or so), regenerate the file to pick up any residual SRMs that were missing from the BDII when you first ran the script:
./make-services.sh --ftshost prod-fts-ws.cern.ch --serxml /root/services.xml --oldxml /root/services.xml.old --addvo ops --verbose
FTA agent configuration
There are no new required parameters to add for FTS 2.0.
As noted in FtsChangesFrom15To20, you should change agent type VOAGENT_PYTHON to VOAGENT.
The yaim target is FTA2.
To start the agents after reconfig, run:
service transfer-agents start
Check DB connections
Ask your DBA to check that the connections are back on the database correctly, with correct service names, load-balanced properly, etc.
Test a few of your favourite commands
As you want.
Run a few test jobs
The client tools are the same as before except for the changes and additions described in
FtsChangesFrom15To20.
It's worth testing that delegation works (glite-transfer-submit -v without the -p option will use credential delegation by default on an FTS 2.0 server).
Re-open all the channels
As you like, e.g.:
for i in `glite-transfer-channel-list`; do glite-transfer-channel-set $i Active -m "Service upgraded to FTS 2.0"; done
Re-enable your service monitoring
... or otherwise bring the nodes out of maintenance. Site specific.
Announce that the service is back
Please use the EGEE broadcast tool to announce that the service is back.
Maintainer:
GavinMcCance