FTS proxy corruption issue

Impact and Symptom

Total service failure for a given user on all channels on the FTS server: all transfers for a given user fail with the message "Could NOT load client credentials".


We now believe we understand the cause:

The proxy is only delegated if required (the condition is lifetime < 4 hours). The delegation is performed by the glite-transfer-submit CLI. The first submit client that sees that the proxy needs to be redelegated is the one that does it. The proxy then stays on the server for ~8 hours or so (default lifetime is 12 hours).
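The timing above can be written out as a small sketch. The function and variable names are illustrative only and are not taken from the FTS client code; the 4-hour and 12-hour values are the ones quoted in the text.

```shell
#!/bin/sh
# Values quoted in the text above.
RENEW_THRESHOLD_HOURS=4
DEFAULT_LIFETIME_HOURS=12

# needs_delegation REMAINING_SECONDS
# Succeeds when the remaining proxy lifetime is below the renewal threshold.
needs_delegation() {
    [ "$1" -lt $((RENEW_THRESHOLD_HOURS * 3600)) ]
}

# Hours a freshly delegated proxy is used before a client renews it.
renewal_window_hours() {
    echo $((DEFAULT_LIFETIME_HOURS - RENEW_THRESHOLD_HOURS))
}
```

With the defaults, renewal_window_hours gives 8, which is where the ~8 hour figure comes from.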

We found a race condition in the delegation: if two submit clients for the same user detect at the same time that the proxy needs to be renewed, they both try to do it, and the delegation requests can get mixed up, so that what finally ends up in the database is the certificate from one request and the key from the other (i.e. the proxy is corrupted). We don't detect this, and the proxy remains invalid for the next ~8 hours (i.e. until the proxy certificate expires, whereupon another delegation is attempted).


The real fix requires a server side update.

This is being tracked in Savannah:

Workaround on the client side

There are two options:

Use the legacy myproxy mode

Use the legacy myproxy mode that the FTS 2.0 server still supports. Upload the proxy to and add the -p option to the glite-transfer-submit CLI, as before. The drawback of this option is that only plain grid proxies can be used, i.e. the proxy the FTS gets will not be a VOMS proxy.

Delegate the certificate separately from the job submission

This is the recommended workaround.

Run, ~every hour, per FTS server instance, per user:

/opt/glite/bin/glite-delegation-init -f -s

where the URL is the same as the FileTransfer one except for sed 's/FileTransfer/gridsite-delegation/'.
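As a sketch of that substitution, assuming a made-up endpoint (fts.example.org and the path shown are placeholders, not a real FTS server; delegation_url is an illustrative helper name):

```shell
#!/bin/sh
# delegation_url FILETRANSFER_URL
# Prints the delegation endpoint derived from the FileTransfer endpoint,
# using the sed expression given in the text.
delegation_url() {
    printf '%s\n' "$1" | sed 's/FileTransfer/gridsite-delegation/'
}

# Placeholder endpoint for illustration only.
FTS_URL="https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer"
delegation_url "$FTS_URL"
```

The printed value is what would be passed to glite-delegation-init in place of the FileTransfer URL.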

Make sure you run only one instance of this per server, per user at a time, or you'll be open to the same race condition.

This will ensure you always have a fairly up-to-date proxy on the FTS server, so the transfer-submit commands will never attempt a delegation.
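One way to keep to a single running instance when the command is driven from cron is to wrap it in a lock. The sketch below uses an atomic mkdir as the lock; run_exclusive, the lock path, and DELEGATION_URL are illustrative names and not part of the gLite tools.

```shell
#!/bin/sh
# run_exclusive LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir is atomic, so at most
# one caller holds the lock at a time. Returns 75 if the lock is already held.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another delegation is already running, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}

# Hypothetical hourly use, one lock per server and per user
# (DELEGATION_URL stands for the delegation endpoint of that server):
#   run_exclusive "/tmp/fts-delegation-$USER.lock" \
#       /opt/glite/bin/glite-delegation-init -f -s "$DELEGATION_URL"
```

If the job is killed while it holds the lock, the directory is left behind and has to be removed by hand; flock(1) avoids that where it is available.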

Workaround on the server side

We can implement a (nasty) cron on the server side looking for corrupted proxies and deleting them from the disk and the DB.

This is not nice, because all the jobs will fail until you submit another (since you've still got no valid proxy) - and then when you submit another, you risk the same race condition.

Assuming you continue to submit jobs most of the time, it will limit the damage of a bad delegation to several minutes.
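The check at the heart of such a cron can be sketched with openssl: a proxy is corrupted in the sense described above when the certificate's public key does not match the private key stored with it. This is only an illustration of the comparison, not the cron actually deployed; proxy_key_matches is a hypothetical helper, and it assumes the certificate and key have already been extracted to two PEM files.

```shell
#!/bin/sh
# proxy_key_matches CERT_PEM KEY_PEM
# Succeeds when the public key in the certificate equals the public key
# derived from the private key, i.e. both came from the same request.
# Returns 1 on mismatch and 2 if either file cannot be read or parsed.
proxy_key_matches() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ "$cert_pub" = "$key_pub" ]
}
```

A cleanup job would call this for each stored proxy and delete the ones for which it returns 1.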

Wednesday 20/02/08: This cron job has now been implemented on CERN-PROD's FTS-T0-EXPORT and FTS-T2-SERVICE.


Topic revision: r1 - 2008-02-20 - GavinMcCance