Design of the integration of the FDT tool with PhEDEx.

The diagram of the intended integration outlines the development of three new components to interface the high-performance transfer tool FDT (Fast Data Transfer) with PhEDEx (the CMS data transfer system):

  • FDT - Perl module, the transfer backend in PhEDEx; source code is located within the PhEDEx installation
  • fdtcp - Python wrapper, the interface between PhEDEx and FDT data transfers; features:
    • prepares the copyjob/fileList transfer file as required by FDT
    • performs the necessary translation of source and destination file names
    • harvests report and log files to propagate back to PhEDEx
    • invokes the remote fdtd service (forwarding the certificate proxy for authentication)
  • fdtd - FDT daemon, a wrapper around the FDT service; features:
    • permanently running daemon on FDT-enabled sites
    • receives requests (PYRO (Python Remote Objects) calls) to launch either the FDT client party on the source site or the FDT server party on the destination site; the FDT writing server must use the Hadoop HDFS-FDT adapter if the destination storage backend is Hadoop ("FDT client" and "FDT server" here mean the FDT Java application, fdt.jar)
    • responsible for authentication.
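The fdtcp duties listed above (copyjob preparation and file-name translation) can be sketched as follows; the function names and the assumed "source destination" per-line fileList format are illustrative only, not the actual fdtcp code:

```python
# Sketch only: hypothetical helpers, not the actual fdtcp implementation.

def translate_pfn(pfn):
    """Translate an fdt:// PFN to the local path expected by FDT,
    e.g. 'fdt://host.example.org:8444/store/f.root' -> '/store/f.root'."""
    rest = pfn.split("://", 1)[1]        # drop the scheme
    return "/" + rest.split("/", 1)[1]   # drop host:port, keep the path

def write_filelist(transfers, path):
    """Write the copyjob/fileList file handed to the FDT Java application:
    one 'source destination' pair per line (assumed format)."""
    with open(path, "w") as f:
        for src, dst in transfers:
            f.write("%s %s\n" % (translate_pfn(src), translate_pfn(dst)))
```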

PhEDEx-FDT integration design

The scenario assumes transfer of file(s) from site A to site B. The source site (A) fdtd invokes the client (reading, source) party of the FDT application, and the destination site (B) fdtd invokes the server (writing, destination) party. The FDT application (Java) is invoked only on demand, and the process exits once the transfer is over.

Both fdtcp and fdtd will be written in Python. Communication between fdtcp and fdtd will utilise PYRO (Python Remote Objects).

An envisaged feature is on-the-fly transfer monitoring. For this it may be worth considering a RESTful web service API implemented with the CherryPy web framework, so that monitoring could also be queried from a web browser.
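A minimal sketch of such a status query (the transfer table and function names are hypothetical; in the real design CherryPy would expose this as a GET handler):

```python
import json

# Hypothetical in-memory table of ongoing transfers; fdtd would populate it
# from FDT progress reports.
TRANSFERS = {"transfer-001": {"state": "RUNNING", "bytesDone": 123456789}}

def lookup(path):
    """Resolve a request path like '/transfers/<id>' to (HTTP status, JSON body)."""
    key = path.rstrip("/").split("/")[-1]
    status = TRANSFERS.get(key)
    if status is None:
        return 404, json.dumps({"error": "unknown transfer"})
    return 200, json.dumps(status)
```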

Active monitoring utilising MonALISA ApMon will be used throughout the transfer stream, from the PhEDEx transfer backend down to the FDT Java application.

Status 2010-03-04 - fdtcp/fdtd functional (though without Grid authentication yet)

Source code is available in the phedex-fdt repository

Using latest FDT 0.9.10

Third-party transfers have been implemented and tested.

Group transfers utilising a batch transfer file (i.e. a copyjobfile, as known from srmcp) are implemented. Distinct (source host, destination host) pairs are broken down into corresponding third-party transfers, while entries sharing the same (source, destination) pair result in a group transfer using a single FDT Java client/server pair. Usage: --copyjobfile=./copyjobfile

example of copyjobfile:
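A hypothetical copyjobfile and a sketch of the grouping described above (hostnames and paths are illustrative only; the per-line "sourcePFN destinationPFN" format follows srmcp):

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical copyjobfile content: one 'sourcePFN destinationPFN' per line.
COPYJOB = """\
fdt://host-a.example.org:8444/store/f1.root fdt://host-b.example.org:8444/store/f1.root
fdt://host-a.example.org:8444/store/f2.root fdt://host-b.example.org:8444/store/f2.root
fdt://host-a.example.org:8444/store/f3.root fdt://host-c.example.org:8444/store/f3.root
"""

def group_transfers(copyjob_text):
    """Group (src, dst) pairs by (source host, destination host): each group
    is served by a single FDT Java client/server pair, while distinct pairs
    become separate third-party transfers."""
    groups = defaultdict(list)
    for line in copyjob_text.splitlines():
        if line.strip():
            src, dst = line.split()
            groups[(urlparse(src).hostname, urlparse(dst).hostname)].append((src, dst))
    return groups
```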

The main remaining issues are implementing Grid authentication and, eventually, adding MonALISA ApMon monitoring.

Producing a report file with transfer results is not yet implemented. It is not functionally required by PhEDEx, which carries out its own checks; the report file content is only published on the web interface for debugging purposes.

The plan for Grid authentication is that the fdtd instance will have a host certificate and host key available for invoking the FDT server instance using GSI authentication as described here. The FDT client instance will be launched with the user key and user certificate as forwarded by the user launching the job. User mapping at the destination site will be carried out using the GUMS (Grid User Management System) service. The final file ownership change will be done via a simple sudo-run external script, so that neither fdtd nor the FDT Java application has to run with root privileges.

Status 2010-03-15 - Grid authentication outline

Grid authentication scheme

The user of fdtcp (or the PhEDEx agent) has their Grid proxy certificate ready in the /tmp/x509up_u<UID> file

  1. Authentication (and encryption of the initial handshake) between fdtcp and the remote fdtd by the user's proxy: two options considered so far (only one of them will be used):
    1. Utilize GSI Python infrastructure pyGlobus, PyGridWare and pyGsi
      • download links are non-functional (Not Found); correction: pyGlobus is included in the GT releases, the latest GT release containing pyGlobus is 4.0.7 (March 2008), there has been no update to pyGlobus since, and the latest GT release is 5.0.0 (January 2010)
      • dependency on pieces of Twisted and Zope
      • PyGridWare authenticates proxy against grid map file, no support for GUMS
    2. Use PYRO/SSL support for secure transport of the user's proxy to the remote fdtd
      • both fdtcp and fdtd require a key and certificate issued by a trusted CA
      • no further dependencies required apart from SSL support (M2Crypto)
      • PYRO is already used for fdtcp - fdtd communication
  2. At this point the user's proxy is authenticated against the grid map file (the same in both sub-scenarios of the previous point) and fdtd finds the local Grid user matching the remote user's proxy certificate (mapping)
  3. fdtd launches either the client party or the server party of FDT under the local Grid user via sudo
  4. Both the FDT client and server perform Grid authentication (GSI): the server is started with its pair of service key and certificate, and the client is supplied the user's proxy certificate
  5. Authorization is enforced by the local Grid user's rights, i.e. reading from the specified location on the FDT client side and writing on the FDT server side. Finally, the actual data transfer begins and files arrive under the correct ownership
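Step 2 (the gridmap-file lookup) amounts to matching the proxy's subject DN against quoted-DN entries; a minimal sketch with invented sample entries:

```python
import shlex

# Invented sample grid-mapfile content; real entries map a quoted certificate
# subject DN to a local account.
GRIDMAP = '''
"/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=john/CN=34343434/CN=John Smith" phedex
"/DC=org/DC=doegrids/OU=People/CN=Jane Doe 12345" uscms01
'''

def parse_gridmap(text):
    """Parse grid-mapfile lines into a DN -> local account mapping
    (simplified: ignores comments and multi-account entries)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            dn, account = shlex.split(line)  # shlex honours the quoted DN
            mapping[dn] = account
    return mapping
```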

Grid authentication solution

  • Using SSL directly in PYRO is possibly the easiest and most transparent solution, but it requires creating an additional SSL key/certificate pair for the client; this may not be acceptable in the CMS/Grid environment
  • Using a Grid proxy certificate with PYRO/SSL doesn't work; the certificate doesn't verify
  • Laborious experiments with pyGlobus and PyGridWare showed that the libraries are rather obsolete; attempts to set up GSI authentication and secure encrypted communication were not successful
  • The final solution for Grid authentication is to develop an AuthClient/AuthService pair: simple GSI authentication applications in Java, taking the GSI authentication solution wholesale from FDT and using this pair for safe transfer of the user's proxy to the remote host
  • It turned out that point 4 can't be accomplished: at that stage the FDT Java server runs under the local grid user account (unprivileged) and as such has no access to the grid credentials (service key and certificate). Instead of this (secondary) GSI authentication, the -f <allowedIPsList> mechanism of the FDT Java server is used, allowing only particular client IPs to connect. This is considered strong enough and is in fact similar to how the ssh mode in FDT works anyway.

Grid authentication and overall control flow overview

The above picture shows the current grid authentication and control flow. As explained above, grid authentication currently uses a pair of very lightweight Java applications, AuthClient and AuthService, which perform grid authentication against the gridmap file. As an alternative method of authentication (and the preferred one in the US grid environment), GUMS will be used.

Status 2011-05-15 - PhEDEx debug transfers intensively tested

  • FDT-HDFS writing adapter (Hadoop serializer) latest version ready for testing since 2011-02
  • Since then, a series of PhEDEx debug transfers iterations has been carried out between Caltech T2 and Nebraska T2 clusters in both directions.
  • The fdtcp/fdtd layers are now stable enough; a number of enhancements and fixes have been implemented since 2011-02
  • However, the reliability of these FDT PhEDEx debug transfers is still not satisfactory due to issues in the HDFS cluster(s); further testing is being conducted.

Installation and configuration


  • The project code repository is available here: Mercurial repository
  • Project RPM packages incl. dependencies are available through Caltech Koji server - packages
    • cog-jglobus
    • psutil
    • pyro
    • python-apmon
    • fdt
    • fdt-hdfs
    • fdtcp

  • Further requirements: Python v2.4 or higher, Java v1.6, a Grid-enabled environment, and a valid Grid certificate proxy before running an fdtcp transfer job


  • If the target system is not enabled to load packages from the Koji repository (yum install ), then wget the latest versions of the package RPMs and run rpm -i|-U on them.

Set up

  • A new fdt account shall be created during installation, along with the configuration directory (/etc/fdtcp) and the log directory (/var/log/fdtd); the fdtd service runs under this account
  • Appropriate sudo configuration is needed for the fdt user to run the bash wrapper scripts; example /etc/sudoers entries:
Cmnd_Alias FDT_CMD = /usr/bin/, /usr/bin/
Runas_Alias FDT_USR = ALL, !root
#Defaults    requiretty - needs to be disabled

  • /etc/sudoers must allow non-tty use so that sudo can be invoked from fdtd running as a system daemon: disable "Defaults requiretty" as shown above (otherwise sudo fails with "sudo: sorry, you must have a tty to run sudo")
  • Configuration files are located in /etc/fdtcp/; values are well documented, and port numbers and log directories may be adjusted (/etc/fdtcp/fdtd.conf)
  • Generate a service key and certificate for the fdtd service with correct ownership (fdt) of these files (the usual location is /etc/grid-security/fdt/ but it can be customized in the /etc/fdtcp/ file)
  • fdtd uses a gridmap file for mapping remote users to local grid accounts; the default location is /etc/grid-security/grid-mapfile, configurable in /etc/fdtcp/. The entries in the file should look like this:
    • "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=john/CN=34343434/CN=John Smith" phedex
  • start/stop the system daemon via /etc/init.d/fdtd

  • Due to library path incompatibilities between Hadoop 0.19-2 (currently Caltech) and Hadoop 0.20 (currently Nebraska, packages coming from Cloudera), the /etc/fdtcp/ file needs further editing:
# Hadoop 0.19-2:
export HADOOP_POSIX_PREFIX=/mnt/hadoop

# Hadoop 0.20:
# export HADOOP_POSIX_PREFIX=/mnt/hadoop must be disabled

Test of correct fdtcp/fdtd installation

The destination has to be Hadoop storage here, with fdtd services running on the hosts involved and fdtcp available locally.

fdtcp -h
fdtcp fdt:// fdt://

Test of direct FDT Hadoop - Hadoop transfer (just for reference)

Client (compute-1-0):

source /etc/fdtcp/
java -cp $FDTJAR -P 16 -p 54321 -c -d / -fl /home/maxa/fdtcp/fdt-filelist-hadoop -rCount 5 -noupdates

Server (

source /etc/fdtcp/
java -Dedu.caltech.hep.hdfs.HadoopFileChannel.useNewAdapter=true -cp $FDTJAR:$FDTHDFSJAR:$FDTHDFSLIBS -bs 64K -p 54321 -wCount 5 -f -nolock -S -noupdates

PhEDEx integration

TFC (Trivial File Catalog) configuration

See Caltech, Nebraska storage.xml files under CVS for reference.

PhEDEx agent configuration

The proxy is referenced through the env. variable X509_USER_PROXY, but an alternative proxy can be specified via the CLI option --x509userproxy.
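Resolving the proxy location then follows the usual convention (a sketch; the fallback is the /tmp/x509up_u<UID> path mentioned earlier):

```python
import os

def resolve_proxy_path():
    """Return the Grid proxy file to use: X509_USER_PROXY if set, otherwise
    the conventional /tmp/x509up_u<uid> location (POSIX only)."""
    return os.environ.get("X509_USER_PROXY") or "/tmp/x509up_u%d" % os.getuid()
```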

Example of Caltech PhEDEx debug instance download agent configuration (file Config.Debug):

### AGENT LABEL=download-debug-nebraska PROGRAM=Toolkit/Transfer/FileDownload
 -db              ${PHEDEX_DBPARAM}
 -nodes           ${PHEDEX_NODE}
 -delete          ${PHEDEX_CONF}/FileDownloadDelete
 -validate        ${PHEDEX_CONF}/FileDownloadVerify
 -accept          'T2_US_Nebraska'
 -backend         FDT
 -protocols       fdt
 -command         /usr/bin/fdtcp,--debug=DEBUG,--timeout=10

Additionally, if you want to export files using FDT (so others can download), you'll need to change your export agent to support that protocol:

### AGENT LABEL=exp-pfn PROGRAM=Toolkit/Transfer/FileExport
-db          ${PHEDEX_DBPARAM}
-nodes       ${PHEDEX_NODE}
-storagemap  ${PHEDEX_MAP}
-protocols   'srmv2','direct','fdt'

PhEDEx FDT transfer backend

A Perl transfer backend for PhEDEx has been developed within this project and added into the PhEDEx codebase (thus it shall be automatically distributed with PhEDEx releases at some point); it is also available from the fdtcp project repository.

The module needs to be located in the PHEDEX/perl_lib/PHEDEX/Transfer directory of the PhEDEx installation.

The PhEDEx download agent runs the verify script once the transfer is complete. After this verification, the debug download agent, which just downloads files forever, shall delete the files at the destination. If files with the correct length are detected by FDT, they are not transferred again.

Monitoring - MonALISA utilisation (via ApMon)

monitoring clusters: fdtd_server_writer, fdtd_client_reader, fdtcp

fdtcp:
   authentication  [time, seconds]
   cleanup         [time, seconds]
fdtd: fdtd_server_writer
   fdt_server_init [time, seconds]
   FDT Java started so that transfer-id is propagated with -enable_apmon -monID %(monID)s
fdtd: fdtd_client_reader
   no parameters are reported
   FDT Java started so that transfer-id is propagated with -enable_apmon -monID %(monID)s

Status 2012-06-xx - test transfers Caltech-Nebraska resumed

The previous data transfer tests were done bidirectionally between the Caltech and Nebraska T2 clusters with the following HDFS versions:

Caltech T2 (accessed through gridftp{01,05} machines):

$ hadoop version
Hadoop 0.19.2-dev
Subversion -r 748415
Compiled by wart on Mon Mar 23 15:21:37 PDT 2009

This version, as stated above and in more detail in the corresponding trac tickets, has major issues handling the highly parallel load from FDT.

Nebraska T2 (accessed through {srm,red-gridftp11} machines) had HDFS of some bugfix version built on top of 0.20.

This new round of transfer tests has following set up:

Caltech T2:

$ ssh
$ hadoop version
Hadoop 0.20.2+737
Subversion  -r 98c55c28258aa6f42250569bd7fa431ac657bdbd
Compiled by mockbuild on Tue Nov 29 02:52:15 EST 2011
From source with checksum 15b415650bf28785987688b54d5e292c

Nebraska T2:

$ ssh
$ hadoop version
Hadoop 0.20.2+737
Subversion  -r 98c55c28258aa6f42250569bd7fa431ac657bdbd
Compiled by mockbuild on Sat Feb 18 18:01:53 CST 2012
From source with checksum 15b415650bf28785987688b54d5e292c

Hadoop (HDFS) 0.20.2 was released on 26 February 2010 and is the latest base release supported by OSG. The latest update to the 0.20 base release was made on 17 October 2011, but its relation to +737 is not clear.

Installation notes, prerequisites

Service key, certificate files, grid-mapfile:

$ cd /etc/grid-security/
$ ls -lR1 | grep "fdt\|cms-grid-mapfile"
-rw-r--r-- 1 root root 203843 Jun 13 01:40 cms-grid-mapfile
drwxr-xr-x 2 root root   4096 Dec  8  2011 fdt
-rw-r--r-- 1 fdt fdt 1663 Feb 11 16:14 fdtcert.pem
-r-------- 1 fdt fdt 1679 Feb 11 16:15 fdtkey.pem


As above, check that the username is correct and requiretty is disabled

RPM installation packages

RPM package building is carried out by the Caltech Koji server; the particular packages are listed above.

sudo rpm -i cog-jglobus-1.4-2.el5.x86_64.rpm

sudo rpm -U psutil-0.4.0-2.el5.centos.x86_64.rpm

sudo rpm -i pyro-3.12-1.el5.noarch.rpm

sudo rpm -i python26-pyro-3.12-2.el5.centos.noarch.rpm

sudo rpm -i fdt-0.9.18-1.el5.noarch.rpm

sudo rpm -i fdt-hdfs-0.4.0-1.el5.noarch.rpm

sudo rpm -i fdtcp-0.4.11-1.el5.noarch.rpm

rpm -qa | grep "cog-jglobus\|psutil\|pyro\|apmon\|fdt\|fdt-hdfs\|fdtcp"

Post-RPM installation tweaks

You may want to set up this Koji repository on the machine(s) so that a plain "yum install" works (yum --enablerepo caltech-devel install).

It's necessary to update /etc/fdtcp/ (paths) for Hadoop 0.20. This shall be made default in the next fdtcp release.

the name of the grid-mapfile needs to be adjusted

For Hadoop 0.20 the HADOOP_POSIX_PREFIX variable has to be disabled.

Once the packages are installed, fdtd (the service) will likely fail since the ownership of /etc/grid-security/fdt/ needs to be changed to the fdt user. This should be fixed in the fdtcp installation package.

The Caltech machine is el5 (Python 2.4 default) and Nebraska is el6 (Python 2.6 default); this version mismatch causes trouble:

The Caltech machine executables (fdtcp, fdtd, fdtd-log_analyzer) need this modification:

#!/usr/bin/env python26

The fdtcp package on the Caltech end installs under Python 2.4 (unlike the final 2.6 version and unlike the other packages, which were rebuilt for 2.6). This tweak has to be done for the moment:

so that the fdtcplib Python package is available when running under Python 2.6.

The M2Crypto library is a remnant from some older tests; if it's not available, the corresponding import line can be removed.

There are issues binding the correct name on the Caltech node: the hostname command returns the private name. For the moment, configure fdtd.conf to set the hostname to the public IP address.

The new version of the cog-jglobus package at Nebraska (cog-jglobus-1.8.0) changed the structure of the JAR files (and possibly also the API). It seems fine to force-install the current cog-jglobus-1.4-2 from the FDT set of RPMs on the Nebraska machine and go with that. Later, the new version of cog-jglobus has to be adopted (making sure that automatic system updaters such as puppetd or the yum auto-updater won't remove the obsolete package).

On Nebraska, AuthService complained about the host cert/key files: org.globus.common.ChainedIOException: Authentication failed [Caused by: Operation unauthorized (Mechanism level: [JGLOBUS-56] Authorization failed. Expected "/CN=host/" target but received "/DC=org/DC=doegrids/OU=Services/CN=fdt/")]
This was sorted out by loading the host (rather than the service (fdt)) cert/key files into AuthService. Probably not the sanest solution security-wise, but agreed upon.

All these above tweaks shall be sorted out in the next release of the package(s).

Test transfers commands

Check that the directories are readable and writable by the grid-mapped users at both ends.
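A pre-flight check of this kind can be sketched with os.access (the directory names are whatever the transfer endpoints use):

```python
import os

def check_dirs(read_dir, write_dir):
    """Return a list of problems if the source directory is not readable or
    the destination directory not writable for the current (grid-mapped) user."""
    problems = []
    if not os.access(read_dir, os.R_OK | os.X_OK):
        problems.append("cannot read %s" % read_dir)
    if not os.access(write_dir, os.W_OK | os.X_OK):
        problems.append("cannot write %s" % write_dir)
    return problems
```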

[zdenek@red-fdt test]$ fdtcp --debug=DEBUG --timeout=10 fdt:// fdt://
[zdenek@compute-4-1 ~]$ fdtcp --debug=DEBUG --timeout=10  fdt:// fdt://

The results of these tests were summarized on the ticket and in an email.

Nebraska Hadoop 2.0 tests

Caltech (reading from memory /dev/zero) to Nebraska HDFS tests.

[zdenek@red-fdt ~]$ hadoop version
Hadoop 2.0.0-cdh4.1.1
Subversion file:///builddir/build/BUILD/hadoop-2.0.0-cdh4.1.1/src/hadoop-common-project/hadoop-common -r 581959ba23e4af85afd8db98b7687662fe9c5f20
Compiled by mockbuild on Sat Dec 29 11:51:39 PST 2012
From source with checksum 95f5c7f30b4030f1f327758e7b2bd61f

Putting everything together for the FDT server with the HDFS adapter on the Nebraska (writer) end took a while, collecting all the new dependencies:

source /etc/fdtcp/
export FDTHDFSLIBS=/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/hadoop-hdfs.jar:/usr/lib/hadoop/hadoop-common.jar:/usr/share/java/commons-lang.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop/hadoop-auth.jar:/usr/lib/hadoop/client/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/client/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar

java -XX:-HeapDumpOnOutOfMemoryError -Xms1024m -Xmx1024m -Dedu.caltech.hep.hdfs.HadoopFileChannel.useNewAdapter=true -cp $FDTJAR:$FDTHDFSJAR:$FDTHDFSLIBS -bs 64K -p 5555 -bio -wCount 5 -nolock -S -noupdates

Client Caltech side:

/dev/zero / /mnt/hadoop/user/zdenek/destdir/01.test
/dev/zero / /mnt/hadoop/user/zdenek/destdir/02.test

java -XX:-HeapDumpOnOutOfMemoryError -Xms1024m -Xmx1024m -cp $FDTJAR -bio -P 16 -p 5555 -c  -d / -fl ./filelest-mem-hdfs -rCount 5 -bs 64K -noupdates

Results, logs, behaviour on the server Nebraska side:

cd /mnt/hadoop/user/zdenek/destdir
[zdenek@red-fdt destdir]$ echo "lkakjdladfj" >> file
no output from echo, nothing added into the file

[zdenek@red-fdt destdir]$ echo "lkakjdladfj" >> file
bash: echo: write error: Operation not supported

[zdenek@red-fdt destdir]$ echo "lkakjdladfj" >> file
[zdenek@red-fdt destdir]$ ls -la
total 5177352
-rw-r--r--.  1 zdenek nobody         24 Feb 21 14:35 file 

FDT server logs:

WARNING: NewHadoopWriter - /mnt/hadoop/user/zdenek/destdir/.05.test got Exception. Will stop. Cause:
org.apache.hadoop.fs.FSError: Input/output error
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(
    at org.apache.hadoop.fs.FSOutputSummer.write1(
    at org.apache.hadoop.fs.FSOutputSummer.write(
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(
    at edu.caltech.hep.hdfs.NewHadoopBufferWriter.writeHDFSBuffer(

-- ZdenekMaxa - 01-Feb-2010

Topic attachments
I Attachment History Action Size Date Who Comment
PNGpng authentication_and_control_flow.png r1 manage 40.7 K 2011-01-18 - 18:41 UnknownUser  
PNGpng grid_authentication.png r1 manage 19.7 K 2010-07-14 - 11:36 UnknownUser Grid authentication for FDT
PNGpng phedex-fdt.png r1 manage 34.8 K 2010-02-08 - 17:10 UnknownUser PhEDEx - FDT integration design
Topic revision: r30 - 2013-02-21 - unknown