FTS version 2.1 Known Issues
This is where current known issues are tracked for the FTS release 2.1. See also
DMFtsPatchStatus for the status of patches!
Could NOT load client credentials - "delegation" issue
This bug is present in both SLC3 FTS 2.0 and SLC4 FTS 2.1 and is tracked at BUG:33449.
This issue will be fixed by PATCH:2760.
Symptoms: all the transfers from a certain user fail with the error 'SOURCE error during PREPARATION phase: [PERMISSION] [SrmPing] failed: SOAP-ENV:Client - CGSI-gSOAP: Could NOT load client credentials'.
Cause: corruption of the proxy certificate on disk.
Resolution:
- delete credentials from the database
The user can do this by running:
glite-delegation-destroy -s https://<server>:<port>/glite-data-transfer-fts/services/gridsite-delegation -v
or by the admin, deleting the rows for the user from the T_CREDENTIAL and T_CREDENTIAL_CACHE tables in the db.
- delete credentials from disk
On all FTS agent machines, check whether the /tmp folder contains an x509up_ file for the user and delete it.
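The disk check can be scripted. The following is a minimal sketch, assuming the proxy files sit directly in /tmp and that their names start with `x509up_`; the function names `list_proxy_files` and `proxy_subject` are hypothetical. It only lists candidates and prints the certificate subject, so the file belonging to the affected user can be identified before anything is deleted.

```shell
# List candidate proxy files in a directory (default: /tmp).
# Nothing is deleted here; review the list first.
list_proxy_files() {
    dir="${1:-/tmp}"
    # -maxdepth 1: look only at the top level of the directory
    find "$dir" -maxdepth 1 -type f -name 'x509up_*'
}

# Print the subject of the first certificate in a proxy file,
# which helps to match the file to a user DN.
proxy_subject() {
    openssl x509 -in "$1" -noout -subject
}
```

A typical session would run `list_proxy_files`, call `proxy_subject` on each result, and then remove the matching file by hand.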
Explanation
The proxy is only delegated if required (the condition is lifetime < 4 hours). The delegation is performed by the glite-transfer-submit CLI. The first submit client that sees that the proxy needs to be redelegated is the one that does it; the proxy then stays on the server for about 8 hours (the default lifetime is 12 hours). We found a race condition in the delegation: if two clients detect at the same time that the proxy needs to be renewed (as is likely), they both try to do it, and the delegation requests can get mixed up, so that what finally ends up in the DB is the certificate from one request and the key from the other. We don't detect this, and the proxy remains invalid for the next ~8 hours.
The real fix requires a server side update (ongoing).
The quick fix. There are two options:
a) Use the legacy myproxy mode that the 2.0 server still supports. Upload the proxy to myproxy-fts.cern.ch and add -p to the submit, as before. I see CMS have started to do this on some jobs.
b) Run, ~every hour, per FTS server instance:
/opt/glite/bin/glite-delegation-init -f -s https://prod-fts-ws.cern.ch:8443/glite-data-transfer-fts/services/gridsite-delegation
where the URL is the same as the FileTransfer one, except for sed 's/FileTransfer/gridsite-delegation/'.
Make sure you run only one instance of this per server at a time, or you'll be open to the same race condition. It will ensure you always have a newish proxy on the server, so the transfer-submit commands will never attempt a delegation.
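Option b) can be wrapped in a small script. The following is a sketch only: the function names, the lock directory /tmp/fts-delegation.lock and the idea of calling the wrapper from cron are assumptions, not part of the FTS distribution. It derives the delegation URL with the sed expression quoted above and uses an atomic `mkdir` as a lock, so that two overlapping runs cannot both start a delegation.

```shell
# Derive the delegation endpoint from the FileTransfer endpoint.
delegation_url() {
    printf '%s\n' "$1" | sed 's/FileTransfer/gridsite-delegation/'
}

# Run a command only if no other instance holds the lock.
# mkdir is atomic, so two concurrent callers cannot both succeed.
run_exclusive() {
    lockdir="$1"
    shift
    if mkdir "$lockdir" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$lockdir"
        return $status
    fi
    echo "another instance is running, skipping" >&2
    return 1
}

# Hypothetical wrapper, meant to be called about once per hour.
# $1 is the FileTransfer URL of the FTS server instance.
refresh_delegation() {
    url=$(delegation_url "$1")
    run_exclusive /tmp/fts-delegation.lock \
        /opt/glite/bin/glite-delegation-init -f -s "$url"
}
```

If a run is killed before it removes the lock directory, the directory has to be removed by hand; this sketch does not handle stale locks.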
Configuration notice
Transferring timeout parameters to the DB
FTS 2.1 stores the channel timeout parameters in the database instead of the configuration files.
See also FtsYaimValues21 for the migration script!
These parameters are still initialized through Yaim variables and are still stored in the channel
agent configuration files (.properties.xml); however, a Python script is used to transfer these
values into the database. This script is normally called by the Yaim configuration script:
/opt/glite/etc/glite-data-transfer-agents.d/update_channels.py
This script assumes that the channels have already been created via the 'glite-transfer-channel-add'
command. If a channel does not exist yet, an error is displayed:
INFO: Transfering the timeout parameters from the channel configurations to the DB:
ERROR: Channel CERN-CERN was nout found in FTS, aborting.
ERROR: Please create the channel by
ERROR: glite-transfer-channel-add CERN-CERN source-site destination-site
Issues related to SL4
Oracle instantclient libstdc++ dependency on SL4
The Oracle Instantclient 10.2.0.3 and 10.2.0.4 RPMs packaged by Oracle come with a deprecated
dependency: libstdc++.so.5.
The problem manifests itself by agents not starting up; they are waiting inside a mutex:
/etc/init.d/transfer-agents start
...
Starting Service glite-transfer-channel-agent-srmcopy-CERN-TOSS[WARNING]
Service still starting after 60 seconds
Starting Service glite-transfer-channel-agent-urlcopy-CERN-CERN[WARNING]
Service still starting after 60 seconds
Please execute the following command to check whether you have this problem:
rpm -ql oracle-instantclient-basic | grep libocci | xargs ldd | grep libstdc++
If you see libstdc++.so.5 => /usr/lib64/libstdc++.so.5, then this is a problematic package.
If you see libstdc++.so.6 => /usr/lib64/libstdc++.so.6, then you are safe.
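The check above can be turned into a small filter. This is a sketch under the assumption that the ldd output contains the library name exactly as shown above; the function name `classify_libstdcxx` is hypothetical.

```shell
# Read ldd output on stdin and report which libstdc++ the OCCI library uses.
classify_libstdcxx() {
    deps=$(cat)
    case "$deps" in
        *libstdc++.so.5*) echo "problematic" ;;
        *libstdc++.so.6*) echo "safe" ;;
        *)                echo "unknown" ;;
    esac
}
```

It would be used as the last stage of the pipeline: rpm -ql oracle-instantclient-basic | grep libocci | xargs ldd | classify_libstdcxx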
| | Original | CERN |
| Version: | 10.2.0.3-1 | 10.2.0.3-3.slc4 |
| Dependency: | libstdc++.so.5 | libstdc++.so.6 |
| gcc version: | 3.2.3 | 3.4.6 |
Oracle has an OCCI library built with the new libstdc++.so.6 dependency at
http://www.oracle.com/technology/tech/oci/occi/occidownloads.html
Since CERN has no licence to re-distribute the re-packaged Oracle Instantclient
package outside CERN, one has to replace the OCCI library by hand. One has
to download the 32 bit or the 64 bit tarball and copy the library over the existing Instantclient directory structure:
tar -zxf ~/occi_gcc343_x86_64_102030.tar.gz
cp libocci.so.10.1 $(rpm -ql oracle-instantclient-basic | grep libocci)
ldconfig
/etc/init.d/transfer-agents restart
Note that the transfer agents might need to be killed (kill -9) by hand for a successful restart!
Starting Tomcat on SL4
With version 5.5.27-7.jpp5 of tomcat5 on SL4 you may experience the following problem:
# /etc/init.d/tomcat5 start
/etc/init.d/tomcat5: line 196: log_success_msg: command not found
This is an issue with the redhat-lsb implementation, as described in RedHat BUG#171052.
Until this problem is fixed, you may replace the /lib/lsb/init-functions file with the attached init-functions file.
The alternative is to replace '#!/bin/bash' by '#!/bin/sh' in /etc/init.d/tomcat5 (see the corresponding JPackage bug#311).
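The shebang replacement can be done with sed. The following is a minimal sketch; the function name `use_sh_shebang` and the `.orig` backup suffix are assumptions made for illustration. It only touches the first line and keeps a copy of the original script.

```shell
# Replace a bash shebang on the first line with /bin/sh, keeping a backup.
use_sh_shebang() {
    script="$1"
    cp -p "$script" "$script.orig"
    # Writing back through a redirect keeps the permissions of the script.
    sed '1s|^#!/bin/bash|#!/bin/sh|' "$script.orig" > "$script"
}
```

It would be called as: use_sh_shebang /etc/init.d/tomcat5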
Unsigned JPackage package
When installing the java-1.5.0-sun-compat package from the JPackage repository, the most recent version
might give you an error message:
Package java-1.5.0-sun-compat-1.5.0.17-1jpp.i586.rpm is not signed
The issue is tracked upstream as JPackage bug#314.
A workaround, suggested by Marc Caubet Serrabou, is to use an older version of the same package, which is signed.
You can disable a given package in the Yum repository description /etc/yum.repos.d/jpackage.repo:
[jpackage5-generic]
name=JPackage 5, generic
baseurl=http://linuxsoft.cern.ch/jpackage/5.0/generic/free/
enabled=1
protect=1
exclude=*1.5.0.17*
gpgkey=http://www.jpackage.org/jpackage.asc
gpgcheck=1
[jpackage5-generic-nonfree]
name=JPackage 5, generic non-free
baseurl=http://linuxsoft.cern.ch/jpackage/1.7/generic//non-free/
enabled=1
protect=1
exclude=*1.5.0.17*
gpgkey=http://www.jpackage.org/jpackage.asc
gpgcheck=1
Cosmetic Issues
Error reporting for Active transfers
BUG:32942 reported that, due to the error classification improvements in FTS 2.1, the error messages
retrieved via the web-service interface are truncated. The reason was that FTS 2.1 stores the error messages in multiple
database fields and the web-service only read one of them.
This problem is fixed by PATCH:2551.
The fix has a side effect: empty errors are reported as "Reason: error during phase: []". It is tracked as BUG:43927.
Missing javamail
In some cases you may see the following error in /var/log/tomcat5/catalina.out:
SEVERE: Failure loading extension /usr/share/tomcat5/common/lib/[javamail].jar
java.io.FileNotFoundException: /usr/share/tomcat5/common/lib/[javamail].jar (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at org.apache.catalina.util.ExtensionValidator.addSystemResource(ExtensionValidator.java:209)
at org.apache.catalina.util.ExtensionValidator.addFolderList(ExtensionValidator.java:410)
at org.apache.catalina.util.ExtensionValidator.<clinit>(ExtensionValidator.java:105)
...
When tomcat5 is installed from the JPackage repository, it should install its javamail dependency as well
and configure the Tomcat service to use its jar file in the RPM's postinstall scriptlet:
$ rpm -q tomcat5 --requires | grep javamail
javamail = 0:1.3.1
javamail = 0:1.3.1
$ rpm -q classpathx-mail --provides | grep javamail
javamail = 0:1.3.1
javamail-monolithic = 0:1.3.1
$ rpm -q tomcat5 --scripts
For some unknown reason, it sometimes fails to install the proper dependency, classpathx-mail,
in which case you can fix this by hand:
yum install classpathx-mail
build-jar-repository /var/lib/tomcat5/common/lib javamail
Since the FTS web service does not depend on javamail to run, this is rather a cosmetic issue.
Wrongly configured JAVA_HOME
If you get failures when starting tomcat, check /var/log/tomcat5/catalina.out. If you see lines like
Found JAVA_HOME: /..
then most probably, the line in /etc/tomcat5/tomcat5.conf including JAVA_HOME is commented out. Uncomment it, and re-run the yaim configuration.
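A quick way to confirm this is to grep the configuration file. The sketch below assumes the setting appears as a `JAVA_HOME=` assignment, optionally preceded by `export`; the function name `java_home_commented_out` is hypothetical.

```shell
# Succeed if the file has a commented-out JAVA_HOME line and no active one.
java_home_commented_out() {
    conf="$1"
    grep -Eq '^[[:space:]]*#[[:space:]]*JAVA_HOME=' "$conf" &&
        ! grep -Eq '^[[:space:]]*(export[[:space:]]+)?JAVA_HOME=' "$conf"
}
```

For example: java_home_commented_out /etc/tomcat5/tomcat5.conf && echo "JAVA_HOME is commented out"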
Broken link in /var/lib/tomcat5/common/lib
The effect is that Tomcat does not start. Check this directory. If you find that [commons-collections-tomcat5].jar is a broken link, do the following:
wget http://mirrors.dotsrc.org/jpackage/5.0/generic/free/RPMS/jakarta-commons-collections-tomcat5-3.1-9.jpp5.noarch.rpm
rpm -ivh --force jakarta-commons-collections-tomcat5-3.1-9.jpp5.noarch.rpm
pushd /var/lib/tomcat5/common/lib
ln -sf /usr/share/java/commons-collections-tomcat5.jar \[commons-collections-tomcat5\].jar
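To find broken links in the directory in the first place, something like the following can be used. It is a sketch only; the function name `find_broken_links` is hypothetical. `test -e` follows symbolic links, so it fails exactly for links whose target is missing.

```shell
# Print every symbolic link in a directory whose target does not exist.
find_broken_links() {
    find "$1" -maxdepth 1 -type l ! -exec test -e {} \; -print
}
```

For example: find_broken_links /var/lib/tomcat5/common/lib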
Missing libaio
It happened to me that the required libaio package did not get installed. Do it manually in this case:
yum install libaio
FTS only Yaim config
One had to specify the FTA DB variables in an FTS-only Yaim configuration. This was fixed in BUG:51199, and you can get the RPM.
Last edit: AkosFrohner on 2009-06-05 - 10:08
Maintainers: AkosFrohner