Last Page Update
SteveTraylen
2007-10-10

FTS Server Upgrade 1.5 to 2.0 for LCG service

This assumes you are running the latest 1.5 FTS version FtsRelease15 and describes how to upgrade the service to FTS 2.0, FtsRelease20.

It is not recommended to upgrade from any other release, nor to apply cumulative upgrades from the earlier releases (instead make a clean install FtsServerInstall20).

Read through all the steps first.

Changes from release 1.5 are described in FtsChangesFrom15To20.

Note!

FtsRelease20Patch1232 contains specific manual instructions if you are not yet using YAIM 4.0.

Requirements

The requirements are the same as for your current 1.5 version; see FtsServerInstall15 and FtsServerInstall20. The deployment model and recommendations are also the same.

Make sure you know the Grid dependencies

The Grid dependencies are the same as in 1.5; see FtsServerInstall15 and FtsServerInstall20. You should ensure that all the SRMs you plan to contact are published correctly in the WLCG BDII.

Upgrade recommendation

It is recommended that you update the nodes directly (i.e. without a re-install) using the gLite apt repository. There is a script to upgrade the DB schema so the current job information and queues will be maintained - the config_FTS2_ws script will guide you to run the schema upgrade(s).

When can I not upgrade?

There are no known conditions under which you can't upgrade from FtsRelease15.

Upgrade procedure

This is based upon experience from the CERN-PROD upgrade to FTS 2.0: FtsTier0ServerInterventions.

Announce the downtime

Announce the upgrade and proposed downtime at the WLCG Operations Meeting and using the EGEE broadcast tool, as per WLCG procedure.

Assume at least half a day for the upgrade.

Drain the service

Set all transfer channels inactive - this will drain the network of active jobs and ensure their status is committed fully to the database.

for i in `glite-transfer-channel-list`; do glite-transfer-channel-set $i Inactive; done

Newly submitted jobs will sit in the queue in Pending state and should be processed when the service is restarted.
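
The one-line loop above can be wrapped in a small helper that reports which channels failed to change state. This is only a sketch: the function name set_all_channels and the two override variables are hypothetical (they exist so the loop can be dry-run with echo), and it assumes, as the loop above does, that glite-transfer-channel-list prints channel names separated by whitespace.

```shell
#!/bin/sh
# Hypothetical helper: set every channel to the given state.
# CHANNEL_LIST_CMD and CHANNEL_SET_CMD are illustrative overrides, not gLite
# settings; they default to the client commands used in the loop above.
CHANNEL_LIST_CMD=${CHANNEL_LIST_CMD:-glite-transfer-channel-list}
CHANNEL_SET_CMD=${CHANNEL_SET_CMD:-glite-transfer-channel-set}

set_all_channels() {
    state=$1
    shift
    failed=0
    # Word splitting on the list output is intentional: one channel per word.
    for channel in $($CHANNEL_LIST_CMD); do
        # Any remaining arguments are passed through to the set command.
        if ! $CHANNEL_SET_CMD "$channel" "$state" "$@"; then
            echo "failed to set $channel to $state" >&2
            failed=1
        fi
    done
    return $failed
}
```

To drain, call set_all_channels Inactive. Setting CHANNEL_SET_CMD=echo first prints what would be run without touching the service.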

Go for a coffee

If all channels are working properly, it should take 5 to 15 minutes to drain the channels. If there are problems with some of the SRMs, you may have to wait up to 30 minutes for bad transfers to time out.

Switch off your monitoring and alarms

...or otherwise set the nodes to Maintenance. Site specific. In particular make sure that no daemons are running during the schema upgrade.

Stop any of the magic PL/SQL jobs

For example, the (in)famous 'history' job. Log into the owner account using sqlplus:

  • exec fts_stats.stop_hourly_job
  • exec fts_history.stop_job
  • exec fts_statecount.stop_job
  • Verify that select * from user_jobs; returns no rows.

This will stop all known FTS DBMS jobs. It's likely that not all of these packages are installed on your site, so don't worry if some of these fail. The critical thing is that all the FTS PL/SQL jobs are stopped.
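
If you prefer to review the statements before running them, they can be emitted from a small shell function and piped into sqlplus. This is a sketch only: the function name stop_fts_jobs_sql is hypothetical, and the statements are exactly those listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL*Plus commands listed above so they can be
# reviewed first and then piped into sqlplus under the schema owner account.
stop_fts_jobs_sql() {
    cat <<'EOF'
exec fts_stats.stop_hourly_job
exec fts_history.stop_job
exec fts_statecount.stop_job
select * from user_jobs;
EOF
}
```

A possible invocation, with placeholder credentials, is stop_fts_jobs_sql | sqlplus user/pass@"connectstring". Check the output of the final select by eye: it should return no rows.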

Stop the servers and close DB sessions

Stop the agents and tomcat servers on all machines.
  service tomcat5 stop
  service transfer-agents stop

Stop all 'monitoring' scripts you may have running that access the DB directly.

Make sure they are stopped:

ps aux | grep j2 | grep -v grep 
ps aux | grep glite | grep -v grep

should produce nothing.
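
The same check can be scripted so that a later step refuses to continue while daemons are still up. A minimal sketch, using the same j2 and glite patterns as the commands above; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a process listing on stdin and print any line that
# matches the same patterns as the ps | grep commands above.
find_fts_processes() {
    grep -E 'j2|glite' | grep -v grep
}

# Returns 0 when nothing matches (safe to proceed), 1 otherwise.
# Matching lines are printed so you can see what is still running.
fts_processes_stopped() {
    if ps aux | find_fts_processes; then
        echo "FTS-related processes are still running" >&2
        return 1
    fi
    return 0
}
```

For example: fts_processes_stopped || exit 1 at the top of a site-specific upgrade script. Note that the patterns are as broad as in the manual check, so an unrelated process with glite in its command line will also be reported.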

Remove any other open sessions you may have to the DB account (both the owner and the writer account, if you are using writer accounts).

Please ask your DBA to check that there are no sessions attached to the FTS schema at this point (and investigate / kill them if there are).

Recommended: ask the DBA to back up your schema

This will allow you to roll back if something goes wrong with the service upgrade.

You don't have to do this, but doing it makes the upgrade a little safer.

This copy may take some time (like hours).

Possibilities: export the schema to a file, or export it to another account.

[grid-service-databases, Gordon] Recommendation is for the DBA to make a full database backup at this point.

Recommended: Archive the logfiles

See FtsServerAdmin20 for details of the logfiles, and archive those that you wish to keep.

You may wish to empty those logging directories afterwards.
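
One way to archive a log directory is a dated tarball. This is a sketch under stated assumptions: the function name archive_logs is hypothetical, and both directories are arguments because the log locations are site-specific (see FtsServerAdmin20 for the real paths).

```shell
#!/bin/sh
# Hypothetical helper: pack a log directory into a dated tarball.
# Both arguments are site-specific; no FTS log path is assumed here.
archive_logs() {
    log_dir=$1
    archive_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$archive_dir/$(basename "$log_dir")-$stamp.tar.gz"
    mkdir -p "$archive_dir" || return 1
    # -C keeps paths inside the tarball relative to the parent of the log directory.
    tar -czf "$archive" -C "$(dirname "$log_dir")" "$(basename "$log_dir")" || return 1
    # Print the archive path so the caller can record or verify it.
    echo "$archive"
}
```

The function prints the path of the tarball it created. It does not delete anything; empty the original directory yourself once you have checked the archive.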

Upgrade the software

Upgrade the RPMs from the gLite distribution. The patch is patch 1232. gLite release 3.0 update [ to be defined ]

Read release notes for patch 1232

The release notes are here: FtsRelease20Patch1232

In particular, unless you are reading this in August, and already have the correct version of yaim 4.0, you should override the default config_FTS2_ws and config_FTA2_agents in /opt/glite/yaim/functions/local/ with the version provided in the release notes.

Read the 'changes' note

This changes note is here: FtsChangesFrom15To20

In particular, note the deployment changes:

  • In site-info.def, the agent type VOAGENT_PYTHON is now VOAGENT.
  • The yaim configuration targets are FTS2 and FTA2. Do not use FTS and FTA since these are for the 1.5 release.

Check for block fragmentation with your DBA

It has been noted on CERN-PROD that the history PL/SQL script can lead to bad block fragmentation on the Oracle database. A support request is open with Oracle to help understand this.

For normal running, this is not so bad (except that you use too much space), since FTS accesses these blocks via an index. However, it can slow down the schema upgrade a lot, since adding new indices to a table requires a full table scan (i.e. it needs to read every block in the table).

Your DBA should know how to check for fragmented blocks.

It is recommended that you take the opportunity of the downtime to de-fragment any fragmented tables. It may take a while (a couple of hours).

The note is FtsRelease20TableFragmentation.

Upgrade the schema

The best way is to run the yaim configuration tool for the web-service. There are no new parameters to add for FTS 2.0.

/opt/glite/yaim/scripts/configure_node site-info.def glite-FTS2

Yaim will prompt you to upgrade the schema (it will tell you what to run). For sites using a reader/writer DB account setup: regardless of what yaim tells you to run, make sure you run the schema upgrade using the schema owner account.

  • If you are upgrading from FTS 1.5, you are using schema version 2.2.1. It will upgrade to 3.0.0. Run yaim again to upgrade to the final schema version 3.1.0.
  • If you already installed FTS 2.0 from an earlier patch, you are using schema version 3.0.0. Run yaim to upgrade to the final 3.1.0.
  • There is a new schema (the delegation schema) that you need to load. Yaim also checks for this and will prompt you to install it.
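
The version steps in the bullets above can be written down as a tiny lookup, which may help when working out how many yaim runs are still needed. This is an illustration only: the function names are hypothetical, and the script does not query the database or run yaim.

```shell
#!/bin/sh
# Illustration only: map a current FTS schema version to the version that the
# next yaim-prompted upgrade produces, following the bullets above.
next_schema_version() {
    case $1 in
        2.2.1) echo 3.0.0 ;;
        3.0.0) echo 3.1.0 ;;
        3.1.0) echo 3.1.0 ;;   # already at the final version
        *) echo "unknown schema version: $1" >&2; return 1 ;;
    esac
}

# Print every version reached on the way to the final 3.1.0,
# one line per schema upgrade (i.e. per yaim run).
upgrade_path() {
    current=$1
    while [ "$current" != 3.1.0 ]; do
        current=$(next_schema_version "$current") || return 1
        echo "$current"
    done
}
```

Starting from 2.2.1 (FTS 1.5), upgrade_path prints two lines, one for each schema upgrade; starting from 3.0.0 it prints one.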

You should expect this to take several minutes per update. If you didn't defragment the schema, it may take several hours.

Upgrade (or install) the history schema

If you have been running the history job, you should upgrade the package to the latest version and upgrade the schema.

Note that a better versioned and released version of this tool is coming soon.

In the meantime...

  1. Take version 0.1.8-1 from FtsAdminTools20 ("FTS History package")
  2. Upgrade the history package itself:
    • sqlplus user/pass@"connectstring" < /opt/glite/share/glite-data-transfer-scripts/plsql/fts_history_pack.sql
    • sqlplus user/pass@"connectstring" < /opt/glite/share/glite-data-transfer-scripts/plsql/fts_history_pack_body.sql
  3. Upgrade the history schema to the 3.0 series:
    • sqlplus user/pass@"connectstring" < /opt/glite/share/glite-data-transfer-scripts/plsql/fts_history_tables-upgrade_2.2.1-3.0.0.sql

You should expect the 3rd step to take a while, depending on how many entries you have (it took 20 minutes on CERN-PROD).
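
Steps 2 and 3 above can be run in order from a small wrapper. This is a sketch only: the function name upgrade_history and the two override variables are hypothetical, the default credentials are the same placeholders as in the commands above and must be replaced with the schema owner's, and the script names are those listed in the steps.

```shell
#!/bin/sh
# Hypothetical wrapper: run the history package and schema scripts in the order
# given above, stopping if a script is missing or sqlplus returns non-zero.
# SQLPLUS_CMD and PLSQL_DIR are illustrative overrides; the defaults mirror the
# commands above, with placeholder credentials that must be edited.
SQLPLUS_CMD=${SQLPLUS_CMD:-'sqlplus user/pass@connectstring'}
PLSQL_DIR=${PLSQL_DIR:-/opt/glite/share/glite-data-transfer-scripts/plsql}

upgrade_history() {
    for script in \
        fts_history_pack.sql \
        fts_history_pack_body.sql \
        fts_history_tables-upgrade_2.2.1-3.0.0.sql
    do
        if [ ! -r "$PLSQL_DIR/$script" ]; then
            echo "missing script: $PLSQL_DIR/$script" >&2
            return 1
        fi
        echo "running $script" >&2
        $SQLPLUS_CMD < "$PLSQL_DIR/$script" || return 1
    done
}
```

Do not rely on the exit status alone: SQL errors inside a script are not necessarily reflected in the status sqlplus returns, so read the output of each step before trusting the result.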

If you have modified the history schema by hand, the upgrade may fail: please ask your DBA. The target schema (i.e. what it should look like) is in /opt/glite/share/glite-data-transfer-scripts/plsql/create_fts_history_tables.sql.

If you haven't been running the history job package previously, you should install it now: see FtsAdminTools20 for this.

Optional: reader/writer account setup

If your Oracle DB setup uses reader/writer accounts (ask your DBA), apply your local procedure to make the necessary objects grants and synonyms in the various accounts. These have to be remade since extra schema objects have been added.

If you don't know what this means, it's probably OK to skip this step.

Ask your DBA to check that all the schema objects are valid

This is good practice. Any invalid objects should be recompiled.

Restart the history job

Using the procedure described in FtsAdminTools20 "Start the DBMS job":

SQL> exec fts_history.submit_job;

SQL> exec dbms_job.run(xxx);

FTS web-service configuration

There are no new parameters to add for FTS 2.0.

You should have already run this yaim component to upgrade the schema. It should have started the tomcat5 daemon already.

Edit the services.xml file

Either do this by hand, or by running the famous make-services.sh script.

When upgrading, yaim does not replace the existing services.xml file. You should modify this, adding the block for the delegation port-type that you will find in:

/tmp/glite-fts-add-delegation.services.xml

For reference, there should be 3 entries for your web-service, looking something like this:

  <service name='EGEEfts'>
    <parameters>
      <endpoint>https://fts107.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer</endpoint>
      <type>org.glite.FileTransfer</type>
      <version>3.3.0</version>
    </parameters>
  </service>

  <service name='EGEEchannel'>
    <parameters>
      <endpoint>https://fts107.cern.ch:8443/glite-data-transfer-fts/services/ChannelManagement</endpoint>
      <type>org.glite.ChannelManagement</type>
      <version>3.3.0</version>
    </parameters>
  </service>

  <service name='EGEEdelegation'>
    <parameters>
      <endpoint>https://fts107.cern.ch:8443/glite-data-transfer-fts/services/gridsite-delegation</endpoint>
      <type>org.glite.Delegation</type>
      <version>3.3.0</version>
    </parameters>
  </service>

This will allow you to test the service from the local node (i.e. it will allow the FTS client on the local node to find the locally running tomcat server).
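
A quick way to confirm that the file contains all three entries is to look for the three type values shown above. A minimal sketch: the function name check_services_xml is hypothetical, the path of your services.xml is passed as an argument, and the check is a plain text match rather than real XML parsing.

```shell
#!/bin/sh
# Hypothetical check: confirm that a services.xml file declares the three
# port-types shown above. Reports each missing type on stderr and returns
# non-zero if any is absent. This is a text match, not an XML parse.
check_services_xml() {
    file=$1
    missing=0
    for svc_type in org.glite.FileTransfer org.glite.ChannelManagement org.glite.Delegation; do
        if ! grep -qF "<type>$svc_type</type>" "$file"; then
            echo "missing service type: $svc_type" >&2
            missing=1
        fi
    done
    return $missing
}
```

If org.glite.Delegation is reported missing after an upgrade, the delegation block from /tmp/glite-fts-add-delegation.services.xml has probably not been added yet.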

Please update the file on all nodes using the latest make-services.sh script (for 2.0).

The procedure is described at FtsServerServicesXml20.

It is recommended that you make an initial one (i.e. without merging from the old one from 1.5):

./make-services.sh --ftshost prod-fts-ws.cern.ch --serxml /root/services.xml --addvo ops --verbose

and after the end of the upgrade (wait a while, like an hour or so), remake it to pick up any residual SRMs that may have been missing from BDII when you ran the script the first time:

./make-services.sh --ftshost prod-fts-ws.cern.ch --serxml /root/services.xml --oldxml /root/services.xml.old --addvo ops --verbose

FTA agent configuration

There are no new required parameters to add for FTS 2.0.

As noted in FtsChangesFrom15To20, you should change agent type VOAGENT_PYTHON to VOAGENT.

The yaim target is FTA2.

To start the agents after reconfig, run:

service transfer-agents start

Check DB connections

Ask your DBA to check that the connections are back on the database correctly, with correct service names, load-balanced properly, etc.

Test a few of your favourite commands

As you want.

Run a few test jobs

The client tools are the same as before except for the changes and additions described in FtsChangesFrom15To20.

It's worth testing that delegation works (glite-transfer-submit -v without the -p option will use credential delegation by default on an FTS 2.0 server).

Re-open all the channels

As you like, e.g.:

for i in `glite-transfer-channel-list`; do glite-transfer-channel-set $i Active -m "Service upgraded to FTS 2.0"; done

Re-enable your service monitoring

... or otherwise bring the nodes out of maintenance. Site specific.

Announce that the service is back

Please use the EGEE broadcast tool to announce that the service is back.


Maintainer: GavinMcCance


Topic revision: r11 - 2007-10-10 - SteveTraylen
 