Last page update: AkosFrohner, 2009-06-05

FTS version 2.1 Known Issues

This page tracks the current known issues of FTS release 2.1. See also DMFtsPatchStatus for the status of patches.

Could NOT load client credentials - "delegation" issue

This bug is present in both SLC3 FTS 2.0 and SLC4 FTS 2.1 and is tracked at BUG:33449.

This issue will be fixed by PATCH:2760.

Symptoms: all the transfers from a certain user fail with the error 'SOURCE error during PREPARATION phase: [PERMISSION] [SrmPing] failed: SOAP-ENV:Client - CGSI-gSOAP: Could NOT load client credentials'.

Cause: corruption of the proxy certificate on disk.

Resolution:

  • delete credentials from the database

This can be done by the user, by running:

glite-delegation-destroy -s https://<server>:<port>/glite-data-transfer-fts/services/gridsite-delegation -v

or by the admin, by deleting the user's rows from the T_CREDENTIAL and T_CREDENTIAL_CACHE tables in the database.

  • delete credentials from disk

On every FTS agent machine, check whether the /tmp folder contains an x509up_ file for the user and delete it.

  • submit a new job
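
The disk cleanup step above can be sketched as a small helper that lists candidate proxy files before anything is deleted. This is only a sketch: the function name list_proxy_files is hypothetical, and since the part of the file name after x509up_ depends on the local setup, the files are only listed for manual review, not removed.

```shell
# Hypothetical helper: list proxy files left in a directory (default /tmp)
# with their details, so the admin can decide which ones to delete.
list_proxy_files() {
    dir="${1:-/tmp}"
    # Only regular files directly in $dir whose name starts with x509up_
    find "$dir" -maxdepth 1 -type f -name 'x509up_*' -exec ls -l {} \;
}
```

Running list_proxy_files on each agent machine shows the files; the one belonging to the affected user can then be removed with rm.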

Explanation: The proxy is only delegated if required (the condition is that the remaining lifetime is < 4 hours). The delegation is performed by the glite-transfer-submit CLI. The first submit client that sees that the proxy needs to be redelegated is the one that does it; the proxy then stays on the server for ~8 hours or so (the default lifetime is 12 hours).

We found a race condition in the delegation: if two clients detect at the same time that the proxy needs to be renewed (as is likely), they both try to do it, and the delegation requests can get mixed up, so that what finally ends up in the DB is the certificate from one request and the key from the other. We don't detect this, and the proxy remains invalid for the next ~8 hours.
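
The timing in the explanation above can be written out as a short sketch. The two values are the defaults quoted in the text and may differ on a given installation; the function names are hypothetical and only illustrate the arithmetic, they are not part of the FTS clients.

```shell
# Values taken from the explanation above; both are defaults and may differ.
DELEGATION_THRESHOLD_HOURS=4    # clients redelegate when less than this is left
DEFAULT_LIFETIME_HOURS=12       # lifetime of a freshly delegated proxy

# Succeeds (exit 0) if a client would try to redelegate, given the
# remaining lifetime of the server-side proxy in whole hours.
needs_delegation() {
    [ "$1" -lt "$DELEGATION_THRESHOLD_HOURS" ]
}

# Hours after a fresh delegation during which no client attempts another one.
quiet_window_hours() {
    echo $((DEFAULT_LIFETIME_HOURS - DELEGATION_THRESHOLD_HOURS))
}
```

With the defaults, quiet_window_hours gives the ~8 hours mentioned above, which is also how long a corrupted proxy stays in place before any client tries to replace it.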

The real fix requires a server side update (ongoing).

The quick fix. There are two options:

a) Use the legacy myproxy mode that the 2.0 server still supports. Upload the proxy to myproxy-fts.cern.ch and add -p to the submit, as before. I see CMS have started to do this on some jobs.

b) Run, ~every hour, per FTS server instance:

/opt/glite/bin/glite-delegation-init -f -s https://prod-fts-ws.cern.ch:8443/glite-data-transfer-fts/services/gridsite-delegation

The URL is the same as the FileTransfer one, with FileTransfer replaced by gridsite-delegation (sed 's/FileTransfer/gridsite-delegation/').

Make sure you run only one instance of this per server at a time, or you'll be open to the same race condition. It will ensure you always have a newish proxy on the server, so the transfer-submit commands will never attempt a delegation.
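
Option b) could be wrapped roughly as follows. This is a sketch under assumptions: the function names and the lock path used in the usage note are hypothetical, and the lock only protects against overlapping runs on the same machine, not against a second host running the same job.

```shell
# Derive the delegation endpoint from the FileTransfer endpoint.
delegation_url() {
    echo "$1" | sed 's/FileTransfer/gridsite-delegation/'
}

# Run a command only if no other instance holds the lock.
# mkdir is atomic, so only one caller can create the lock directory.
run_exclusive() {
    lockdir="$1"; shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another instance is running, skipping" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}
```

An hourly cron job could then call, for example: run_exclusive /var/lock/fts-delegation.lock /opt/glite/bin/glite-delegation-init -f -s "$(delegation_url https://prod-fts-ws.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer)"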

Configuration notice

Transferring timeout parameters to the DB

FTS 2.1 stores the channel timeout parameters in the database, instead of the configuration files. See also FtsYaimValues21 for the migration script!

These parameters are still initialized through Yaim variables and are still stored in the channel agent configuration files (.properties.xml); however, a Python script is used to transfer these values into the database. This script is normally called by the Yaim configuration script: /opt/glite/etc/glite-data-transfer-agents.d/update_channels.py

This script assumes that the channels have already been created via the 'glite-transfer-channel-add' command. If a channel does not exist yet, an error is displayed:

INFO: Transfering the timeout parameters from the channel configurations to the DB:
  ERROR: Channel CERN-CERN was nout found in FTS, aborting.
  ERROR: Please create the channel by
  ERROR:     glite-transfer-channel-add CERN-CERN source-site destination-site
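
When several channels are configured, the missing ones can be collected from the script output with a filter like the one below. It is a sketch only: the function name is hypothetical, and the pattern is based solely on the sample message shown above, so it may need adjusting if the wording differs in another version.

```shell
# Print the channel names that the update script reported as missing.
# Reads the script output on stdin; the message format is taken from the
# sample output above.
missing_channels() {
    sed -n 's/.*ERROR: Channel \([^ ]*\) was .*found in FTS.*/\1/p'
}
```

Each printed name can then be created with glite-transfer-channel-add before Yaim is run again.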

Issues related to SL4

Oracle instantclient libstdc++ dependency on SL4

Oracle Instantclient 10.2.0.3 and 10.2.0.4 RPMs packaged by Oracle come with a deprecated dependency: libstdc++.so.5.

The problem manifests itself as agents not starting up; they are waiting inside a mutex:

   /etc/init.d/transfer-agents start
        ...
        Starting Service glite-transfer-channel-agent-srmcopy-CERN-TOSS[WARNING]
        Service still starting after 60 seconds
        Starting Service glite-transfer-channel-agent-urlcopy-CERN-CERN[WARNING]   
        Service still starting after 60 seconds

Run the following command to check whether you have this problem:

rpm -ql oracle-instantclient-basic | grep libocci | xargs ldd | grep libstdc++

If you see libstdc++.so.5 => /usr/lib64/libstdc++.so.5 , then this is a problematic package.

If you see libstdc++.so.6 => /usr/lib64/libstdc++.so.6 , then you are safe.
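
The same check can be turned into a one-word verdict, which is handy when looping over several machines. A minimal sketch; the function name is hypothetical and it only inspects the text that ldd prints.

```shell
# Classify the libstdc++ dependency from `ldd` output read on stdin.
# Prints "problematic" for libstdc++.so.5, "safe" for libstdc++.so.6,
# and "unknown" if neither is found.
classify_libstdcxx() {
    deps=$(cat)
    case "$deps" in
        *libstdc++.so.5*) echo problematic ;;
        *libstdc++.so.6*) echo safe ;;
        *)                echo unknown ;;
    esac
}
```

Usage: rpm -ql oracle-instantclient-basic | grep libocci | xargs ldd | classify_libstdcxx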

             Original         CERN
Version:     10.2.0.3-1       10.2.0.3-3.slc4
Dependency:  libstdc++.so.5   libstdc++.so.6
gcc version: 3.2.3            3.4.6

Oracle has an OCCI library built with the new libstdc++.so.6 dependency at http://www.oracle.com/technology/tech/oci/occi/occidownloads.html

Since CERN has no licence to re-distribute the re-packaged Oracle Instantclient package outside CERN, one has to replace the OCCI library by hand. Download the 32 bit or the 64 bit tarball and copy the library over the existing Instantclient directory structure:

tar -zxf ~/occi_gcc343_x86_64_102030.tar.gz
cp libocci.so.10.1 $(rpm -ql oracle-instantclient-basic | grep libocci)
ldconfig
/etc/init.d/transfer-agents restart

Note that the transfer agents might need to be killed (kill -9) by hand for a successful restart!

Starting Tomcat on SL4

With version 5.5.27-7.jpp5 of tomcat5 on SL4 you may experience the following problem:

# /etc/init.d/tomcat5 start
/etc/init.d/tomcat5: line 196: log_success_msg: command not found

This is an issue with the redhat-lsb implementation, as described in RedHat BUG#171052.

You may replace the /lib/lsb/init-functions file with the attached init-functions file, until this problem is fixed.

The alternative is to replace '#!/bin/bash' by '#!/bin/sh' in /etc/init.d/tomcat5 (see the corresponding JPackage bug#311).
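
The interpreter line replacement could be done as sketched below. The function name is hypothetical; the sed expression only touches the first line and leaves a .bak copy next to the file, which you may prefer to move out of /etc/init.d afterwards.

```shell
# Replace a '#!/bin/bash' first line by '#!/bin/sh', keeping a .bak copy.
# Only line 1 is touched; files with another interpreter are left unchanged.
switch_shebang_to_sh() {
    sed -i.bak '1s|^#!/bin/bash|#!/bin/sh|' "$1"
}
```

Usage: switch_shebang_to_sh /etc/init.d/tomcat5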

Unsigned JPackage package

When installing the java-1.5.0-sun-compat package from the JPackage repository, the most recent version might give you an error message:

Package java-1.5.0-sun-compat-1.5.0.17-1jpp.i586.rpm is not signed

The issue is tracked upstream as JPackage bug#314.

A workaround, suggested by Marc Caubet Serrabou, is to use an older version of the same package, which is signed.

You can exclude a given package version in the Yum repository description /etc/yum.repos.d/jpackage.repo:

[jpackage5-generic]
name=JPackage 5, generic
baseurl=http://linuxsoft.cern.ch/jpackage/5.0/generic/free/
enabled=1
protect=1
exclude=*1.5.0.17*
gpgkey=http://www.jpackage.org/jpackage.asc
gpgcheck=1

[jpackage5-generic-nonfree]
name=JPackage 5, generic non-free
baseurl=http://linuxsoft.cern.ch/jpackage/1.7/generic//non-free/
enabled=1
protect=1
exclude=*1.5.0.17*
gpgkey=http://www.jpackage.org/jpackage.asc
gpgcheck=1
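
To verify that every section of the repository file carries an exclude line, a small awk filter can help. This is a sketch with a hypothetical function name; it only looks for lines starting with exclude= and does not check the pattern itself.

```shell
# Print the names of repository sections that have no exclude= line.
repos_without_exclude() {
    awk '
        /^\[.*\]$/ {
            # A new section starts: report the previous one if needed.
            if (name != "" && !seen) print name
            name = substr($0, 2, length($0) - 2)
            seen = 0
            next
        }
        /^exclude=/ { seen = 1 }
        END { if (name != "" && !seen) print name }
    ' "$1"
}
```

Usage: repos_without_exclude /etc/yum.repos.d/jpackage.repo (no output means all sections have an exclude line).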

Cosmetic Issues

Error reporting for Active transfers

BUG:32942 reported that, due to the error classification improvements in FTS 2.1, the error messages retrieved via the web-service interface are truncated. The reason was that FTS 2.1 stores the error messages in multiple database fields and the web service only read one of them.

This problem is fixed by PATCH:2551.

The fix has a side effect: empty errors are reported as Reason:       error during  phase: []. This is tracked as BUG:43927.

Missing javamail

In some cases you may see the following error in /var/log/tomcat5/catalina.out:

SEVERE: Failure loading extension /usr/share/tomcat5/common/lib/[javamail].jar
java.io.FileNotFoundException: /usr/share/tomcat5/common/lib/[javamail].jar (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.catalina.util.ExtensionValidator.addSystemResource(ExtensionValidator.java:209)
        at org.apache.catalina.util.ExtensionValidator.addFolderList(ExtensionValidator.java:410)
        at org.apache.catalina.util.ExtensionValidator.<clinit>(ExtensionValidator.java:105)
...

When tomcat5 is installed from the JPackage repository, it should also install its javamail dependency, and the RPM's postinstall scriptlet should configure the Tomcat service to use its jar file:

$ rpm -q tomcat5 --requires | grep javamail
javamail = 0:1.3.1
javamail = 0:1.3.1
$  rpm -q classpathx-mail --provides | grep javamail
javamail = 0:1.3.1
javamail-monolithic = 0:1.3.1
$ rpm -q tomcat5 --scripts

For some unknown reason the proper dependency, classpathx-mail, sometimes fails to install. In that case you can fix this by hand:

yum install classpathx-mail
build-jar-repository /var/lib/tomcat5/common/lib javamail

Since the FTS web service does not depend on javamail to run, this is rather a cosmetic issue.

Wrongly configured JAVA_HOME

If you get failures when starting tomcat, check /var/log/tomcat5/catalina.out. If you see lines like

Found JAVA_HOME: /..

then the JAVA_HOME line in /etc/tomcat5/tomcat5.conf is most probably commented out. Uncomment it and re-run the Yaim configuration.
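
The configuration file can be checked directly with a sketch like the following. The function name is hypothetical, and it assumes the setting appears as a plain JAVA_HOME=... line, possibly prefixed by #.

```shell
# Succeeds (exit 0) if the file contains a commented-out JAVA_HOME
# assignment and no active one.
java_home_commented_out() {
    grep -Eq '^[[:space:]]*#[[:space:]]*JAVA_HOME=' "$1" &&
        ! grep -Eq '^[[:space:]]*JAVA_HOME=' "$1"
}
```

Usage: java_home_commented_out /etc/tomcat5/tomcat5.conf && echo "JAVA_HOME is commented out"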

Broken link in /var/lib/tomcat5/common/lib

The effect is that Tomcat does not start. Check this directory. If you find that [commons-collections-tomcat5].jar is a broken link, do the following:

wget http://mirrors.dotsrc.org/jpackage/5.0/generic/free/RPMS/jakarta-commons-collections-tomcat5-3.1-9.jpp5.noarch.rpm
rpm -ivh --force jakarta-commons-collections-tomcat5-3.1-9.jpp5.noarch.rpm
pushd /var/lib/tomcat5/common/lib
ln -sf /usr/share/java/commons-collections-tomcat5.jar \[commons-collections-tomcat5\].jar
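
To find such broken links in the first place, or to confirm that the fix worked, the directory can be scanned as sketched below. The function name is hypothetical; -maxdepth is not in POSIX find but is available in GNU find.

```shell
# List symbolic links in a directory whose target does not exist.
list_broken_links() {
    dir="${1:-/var/lib/tomcat5/common/lib}"
    # test -e follows symlinks, so it fails for links with a missing target
    find "$dir" -maxdepth 1 -type l ! -exec test -e {} \; -print
}
```

An empty result from list_broken_links means that all links in the directory resolve.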

Missing libaio

In some cases the required libaio package does not get installed. If so, install it manually:

yum install libaio

FTS only Yaim config

Previously one had to specify the FTA DB variables even in an FTS-only Yaim configuration. This was fixed in BUG:51199, and an RPM with the fix is available.


Last edit: AkosFrohner on 2009-06-05 - 10:08

Maintainers: AkosFrohner


Topic attachments

Attachment                   Size   Date                Who          Comment
glite-data-transfer-agents   0.2 K  2008-10-06 - 16:51  AkosFrohner  glite-data-transfer-agents.logrotate
init-functions               0.5 K  2009-01-09 - 17:16  AkosFrohner