Data Transfer Tools for Theory QCD Application

Grid Tools

We investigate xrootd and FTS for this purpose.

xrootd

lxplus.cern.ch : the tools are already installed in: /usr/bin

SLC4,SLC5: yum install xrootd-client

SL4,SL5: find and install the rpm from the linuxsoft.cern.ch extras repository

Alternatively see below to install "from scratch".

Install and setup xrootd client tools "from scratch" (latest development version)

Minimal steps (should be fine for SLC4,SLC5,RHEL):

Get the installer from the xrootd homepage

I installed the latest CVS development version, which should become the production version in a few weeks.

bash xrd-installer --install

By default the client tools are installed in ~/xrdserver. Add its lib and bin directories to your environment:

export LD_LIBRARY_PATH=~/xrdserver/lib:$LD_LIBRARY_PATH
export PATH=~/xrdserver/bin:$PATH

You are ready to go.
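
When xrdcp is launched from a script rather than an interactive shell, the same two settings can be passed to the child process. This is a minimal sketch, assuming the default ~/xrdserver prefix mentioned above; the helper name xrootd_env is made up for illustration.

```python
import os
import shutil

def xrootd_env(prefix="~/xrdserver", base_env=None):
    """Return a copy of the environment with the xrootd bin/lib dirs prepended.

    prefix is the install location (assumed to be the installer default).
    """
    env = dict(os.environ if base_env is None else base_env)
    root = os.path.expanduser(prefix)
    for var, sub in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        new_dir = os.path.join(root, sub)
        old = env.get(var, "")
        # Prepend, mirroring: export VAR=<prefix>/<sub>:$VAR
        env[var] = new_dir + (os.pathsep + old if old else "")
    return env

if __name__ == "__main__":
    env = xrootd_env()
    # shutil.which returns None when xrdcp is not on the given PATH
    print("xrdcp found at:", shutil.which("xrdcp", path=env["PATH"]))
```

The returned dictionary can be given to subprocess.run(..., env=env) so that the shell profile does not need to be modified.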

Extra steps in case of problems (what I did to get it running on ubuntu 9.10)

If the installer fails to compile the xrootd packages, check that all the needed packages (including the dev versions) are installed on the system.

To authenticate via Kerberos 5, make sure that the krb5 package is installed and configured to include CERN.CH.

If needed, add CERN.CH to the [realms] section of the configuration file /etc/krb5.conf:

[realms]
     CERN.CH = {
         default_domain = cern.ch
         kpasswd_server = afskrb5m.cern.ch
         admin_server = afskrb5m.cern.ch
         kdc = afsdb2.cern.ch
         kdc = afsdb3.cern.ch
         kdc = afsdb1.cern.ch

         v4_name_convert = {
            host = {
              rcmd = host
            }
         }
     }

I also made it the default realm:

[libdefaults]
        default_realm = CERN.CH

Transfer your files

kinit user@CERN.CH

You are ready to play with Castor transfer:

  • upload: xrdcp /etc/hosts root://castorpublic.cern.ch//castor/cern.ch/user/m/moscicki/tmp.test
  • download and dump on screen: xrdcp root://castorpublic.cern.ch//castor/cern.ch/user/m/moscicki/tmp.test -
  • download in verbose mode and pipe to /dev/null: xrdcp -d 3 -f root://castorpublic//castor/cern.ch/theory/pcqcd/L12T12_b5.8458_id1/cond/rome/L12T12_b5.8458_cond_run1.tar /dev/null
  • for WAN transfers use the -S15 option to use up to 15 parallel streams and speed up transfers
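
For scripted transfers the command lines above can be assembled programmatically. The sketch below only builds and prints the command; it uses just the options that appear in the list above (-f and -S<n>), and the helper names castor_url and xrdcp_argv are invented for this example.

```python
import shlex

def castor_url(path, host="castorpublic.cern.ch"):
    """Build a root:// URL; note the double slash before the absolute path."""
    if not path.startswith("/"):
        raise ValueError("Castor path must be absolute")
    return "root://%s/%s" % (host, path)

def xrdcp_argv(src, dst, streams=None, force=False):
    """Return the xrdcp argument list for copying src to dst."""
    argv = ["xrdcp"]
    if force:
        argv.append("-f")
    if streams is not None:
        argv.append("-S%d" % streams)   # same form as in the text: -S15
    argv += [src, dst]
    return argv

if __name__ == "__main__":
    remote = castor_url("/castor/cern.ch/user/m/moscicki/tmp.test")
    argv = xrdcp_argv("/etc/hosts", remote, streams=15)
    # Print only; pass argv to subprocess.run(argv, check=True) to execute it
    print(" ".join(shlex.quote(a) for a in argv))
```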

Interactive command line client:

  • xrd castorpublic.cern.ch
  • browse the tree with dirlist and cd

Some transfer tests done by me

Create a big random file on a local disk:

pcarda75: dd if=/dev/urandom of=20GB.RANDOM.TEST bs=20M count=1000
1000+0 records in
1000+0 records out
20971520000 bytes (21 GB) copied, 5908.11 s, 3.5 MB/s

Copy it to Castor using xrootd (intranet):

time xrdcp 20GB.RANDOM.TEST root://castorpublic.cern.ch//castor/cern.ch/user/m/moscicki/20GB.RANDOM.TEST
Disabling apmon monitoring since env variable APMON_CONFIG was not found
[xrootd] Total 20000.00 MB	|====================| 100.00 % [10.9 MB/s]

real	32m6.181s
user	0m9.925s
sys	2m37.810s
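
As a cross-check of the numbers above, the average rates can be recomputed from the file size and the elapsed times. This is plain arithmetic on the figures in the two transcripts; the helper names are made up, and MB here means 10^6 bytes.

```python
import re

def parse_real(text):
    """Convert a `time` value such as '32m6.181s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if m is None:
        raise ValueError("unexpected time format: %r" % text)
    return int(m.group(1)) * 60 + float(m.group(2))

def rate_mb_per_s(nbytes, seconds):
    """Average rate in MB/s (1 MB = 10**6 bytes)."""
    return nbytes / seconds / 1e6

# bs=20M count=1000 -> 20 * 1024 * 1024 * 1000 = 20971520000 bytes
SIZE = 20 * 1024 * 1024 * 1000

# dd from /dev/urandom: 5908.11 s
print(round(rate_mb_per_s(SIZE, 5908.11), 1))                   # 3.5
# xrdcp to Castor: wall-clock time 32m6.181s
print(round(rate_mb_per_s(SIZE, parse_real("32m6.181s")), 1))   # 10.9
```

Both values agree with what dd and xrdcp reported, so the wall-clock time of the xrdcp run is dominated by the transfer itself.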

Setting up proxy server at CERN with public access

Instructions are here:

/afs/cern.ch/sw/arda/install/theory/xrootd

Grid tools at CERN

Enable debug output for SOAP clients (srmcp,lcg-cp):

export CGSI_TRACE=1

Setup environment:

source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh

GridFTP

Example:

grid-proxy-init
globus-url-copy gsiftp://lxfsrk5801.cern.ch:2811///castor/cern.ch/user/m/moscicki/tmp.test file:///tmp/tmp.test

srmcp from dcache

Here is the trick:

srmcp -srm_protocol_version 2 srm://srm-public.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/user/m/moscicki/tmp.test file:///./tmp.test

Warning: the destination path is relative!
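
Because the destination is relative, the directory from which srmcp is started decides where the file ends up. The following sketch makes that directory explicit. The URL layout and the file:///./ form are copied from the command above; the function names are hypothetical. Passing the arguments as a list also keeps the shell from interpreting the "?" in the SRM URL.

```python
import os

def srm_v2_url(castor_path, host="srm-public.cern.ch", port=8443):
    """Build the SRM v2 URL in the form used above."""
    return "srm://%s:%d/srm/managerv2?SFN=%s" % (host, port, castor_path)

def srmcp_call(castor_path, local_name, workdir):
    """Return (argv, cwd) for downloading castor_path into workdir/local_name."""
    argv = [
        "srmcp", "-srm_protocol_version", "2",
        srm_v2_url(castor_path),
        "file:///./" + local_name,      # relative destination, as in the text
    ]
    return argv, os.path.abspath(workdir)

if __name__ == "__main__":
    argv, cwd = srmcp_call("/castor/cern.ch/user/m/moscicki/tmp.test",
                           "tmp.test", "/tmp")
    # To execute: subprocess.run(argv, cwd=cwd, check=True)
    print(cwd, argv)
```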

Tests with big files from the CERN intranet

The first attempt ran until the grid proxy expired:

[lxplus250] /afs/cern.ch/user/m/moscicki > date
Thu Feb 25 15:39:01 CET 2010
[lxplus250] /afs/cern.ch/user/m/moscicki > time srmcp -srm_protocol_version 2 srm://srm-public.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/user/m/moscicki/20GB.RANDOM.TEST file:///tmp/20GB.RANDOM.TEST
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/afs/cern.ch/project/gd/LCG-share/3.1.38-0/d-cache/srm
[main] ERROR gsi.CertificateRevocationLists  - CRL /afs/cern.ch/project/gd/LCG-share2/certificates/684261aa.r0 failed to load.
java.security.GeneralSecurityException: [JGLOBUS-16] CRL data not found.
	at org.globus.gsi.CertUtil.loadCrl(CertUtil.java:526)
	at org.globus.gsi.CertificateRevocationLists.loadCrl(CertificateRevocationLists.java:174)
	at org.globus.gsi.CertificateRevocationLists.reload(CertificateRevocationLists.java:129)
	at org.globus.gsi.CertificateRevocationLists$DefaultCertificateRevocationLists.refresh(CertificateRevocationLists.java:225)
	at org.globus.gsi.CertificateRevocationLists.getDefault(CertificateRevocationLists.java:209)
	at org.globus.gsi.CertificateRevocationLists.getDefaultCertificateRevocationLists(CertificateRevocationLists.java:197)
	at org.globus.gsi.gssapi.GlobusGSSContextImpl.verifyChain(GlobusGSSContextImpl.java:717)
	at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext(GlobusGSSContextImpl.java:513)
	at org.globus.gsi.gssapi.net.GssSocket.authenticateClient(GssSocket.java:107)
	at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:145)
	at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:166)
	at org.apache.axis.transport.http.HTTPSender.writeToSocket(HTTPSender.java:440)
	at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:138)
	at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
	at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
	at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
	at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
	at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
	at org.apache.axis.client.Call.invoke(Call.java:2767)
	at org.apache.axis.client.Call.invoke(Call.java:2443)
	at org.apache.axis.client.Call.invoke(Call.java:2366)
	at org.apache.axis.client.Call.invoke(Call.java:1812)
	at org.dcache.srm.v2_2.SrmSoapBindingStub.srmStatusOfGetRequest(SrmSoapBindingStub.java:2213)
	at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.dcache.srm.client.SRMClientV2.handleClientCall(SRMClientV2.java:178)
	at org.dcache.srm.client.SRMClientV2.srmStatusOfGetRequest(SRMClientV2.java:449)
	at gov.fnal.srm.util.SRMGetClientV2.start(SRMGetClientV2.java:324)
	at gov.fnal.srm.util.SRMDispatcher.work(SRMDispatcher.java:817)
	at gov.fnal.srm.util.SRMDispatcher.main(SRMDispatcher.java:368)
[main] ERROR gsi.CertificateRevocationLists  - CRL /afs/cern.ch/project/gd/LCG-share2/certificates/7b54708e.r0 failed to load.
java.security.GeneralSecurityException: [JGLOBUS-16] CRL data not found.
	at org.globus.gsi.CertUtil.loadCrl(CertUtil.java:526)
	at org.globus.gsi.CertificateRevocationLists.loadCrl(CertificateRevocationLists.java:174)
	at org.globus.gsi.CertificateRevocationLists.reload(CertificateRevocationLists.java:129)
	at org.globus.gsi.CertificateRevocationLists$DefaultCertificateRevocationLists.refresh(CertificateRevocationLists.java:225)
	at org.globus.gsi.CertificateRevocationLists.getDefault(CertificateRevocationLists.java:209)
	at org.globus.gsi.CertificateRevocationLists.getDefaultCertificateRevocationLists(CertificateRevocationLists.java:197)
	at org.globus.gsi.gssapi.GlobusGSSContextImpl.verifyChain(GlobusGSSContextImpl.java:717)
	at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext(GlobusGSSContextImpl.java:513)
	at org.globus.gsi.gssapi.net.GssSocket.authenticateClient(GssSocket.java:107)
	at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:145)
	at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:166)
	at org.apache.axis.transport.http.HTTPSender.writeToSocket(HTTPSender.java:440)
	at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:138)
	at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
	at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
	at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
	at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
	at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
	at org.apache.axis.client.Call.invoke(Call.java:2767)
	at org.apache.axis.client.Call.invoke(Call.java:2443)
	at org.apache.axis.client.Call.invoke(Call.java:2366)
	at org.apache.axis.client.Call.invoke(Call.java:1812)
	at org.dcache.srm.v2_2.SrmSoapBindingStub.srmStatusOfGetRequest(SrmSoapBindingStub.java:2213)
	at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.dcache.srm.client.SRMClientV2.handleClientCall(SRMClientV2.java:178)
	at org.dcache.srm.client.SRMClientV2.srmStatusOfGetRequest(SRMClientV2.java:449)
	at gov.fnal.srm.util.SRMGetClientV2.start(SRMGetClientV2.java:324)
	at gov.fnal.srm.util.SRMDispatcher.work(SRMDispatcher.java:817)
	at gov.fnal.srm.util.SRMDispatcher.main(SRMDispatcher.java:368)
java.lang.RuntimeException: credential remaining lifetime is less than one minute 
	at org.dcache.srm.client.SRMClientV2.handleClientCall(SRMClientV2.java:167)
	at org.dcache.srm.client.SRMClientV2.srmAbortFiles(SRMClientV2.java:347)
	at gov.fnal.srm.util.SRMGetClientV2.abortAllPendingFiles(SRMGetClientV2.java:432)
	at gov.fnal.srm.util.SRMGetClientV2.start(SRMGetClientV2.java:386)
	at gov.fnal.srm.util.SRMDispatcher.work(SRMDispatcher.java:817)
	at gov.fnal.srm.util.SRMDispatcher.main(SRMDispatcher.java:368)
srm client error: 
java.lang.Exception:  stopped 
java.lang.RuntimeException: credential remaining lifetime is less than one minute 
	at org.dcache.srm.client.SRMClientV2.handleClientCall(SRMClientV2.java:167)
	at org.dcache.srm.client.SRMClientV2.srmAbortFiles(SRMClientV2.java:347)
	at gov.fnal.srm.util.SRMGetClientV2.abortAllPendingFiles(SRMGetClientV2.java:432)
	at gov.fnal.srm.util.SRMGetClientV2.run(SRMGetClientV2.java:409)
	at java.lang.Thread.run(Thread.java:595)

real	691m35.696s
user	6m6.926s
sys	2m18.016s
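
A long-running copy like this one can be guarded by comparing the remaining proxy lifetime with a rough estimate of the transfer time before starting. The sketch below assumes the Globus command grid-proxy-info -timeleft, which prints the remaining lifetime in seconds; the safety margin of 2 and the function names are arbitrary choices for this example.

```python
import subprocess

def proxy_timeleft():
    """Remaining proxy lifetime in seconds, from `grid-proxy-info -timeleft`."""
    out = subprocess.run(["grid-proxy-info", "-timeleft"],
                         capture_output=True, text=True, check=True).stdout
    return int(out.strip())

def proxy_covers_transfer(timeleft_s, size_bytes, rate_bytes_per_s, margin=2.0):
    """True if the proxy outlives the estimated transfer time times margin."""
    needed = size_bytes / rate_bytes_per_s * margin
    return timeleft_s >= needed

# Usage on a machine with the grid tools set up (not executed here):
#   if not proxy_covers_transfer(proxy_timeleft(), 20971520000, 10.9e6):
#       raise SystemExit("renew the proxy with grid-proxy-init first")
```

The expected rate is a guess that has to come from earlier measurements. If the transfer stalls, as it apparently did here, the proxy can still expire, so this check only catches the obvious cases.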

Now change to /tmp first, to avoid exceeding the disk quota:

[lxplus250] /tmp > date
Fri Feb 26 16:30:46 CET 2010
[lxplus250] /tmp > time srmcp -srm_protocol_version 2 srm://srm-public.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/user/m/moscicki/20GB.RANDOM.TEST file:///20GB.RANDOM.TEST
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/afs/cern.ch/project/gd/LCG-share/3.1.38-0/d-cache/srm
SRMClientV2 : srmReleaseFiles: try # 0 failed with error
SRMClientV2 : ; nested exception is: 
	java.io.EOFException
SRMClientV2 : srmReleaseFiles: try again

real	5m18.570s
user	0m46.486s
sys	2m10.577s

It fails again (maybe due to the previous attempt, where the quota was exceeded in a local directory because the destination path is relative).

We try again:

[lxplus250] /tmp > time srmcp -srm_protocol_version 2 srm://srm-public.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/user/m/moscicki/20GB.RANDOM.TEST file:///20GB.RANDOM.TEST
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/afs/cern.ch/project/gd/LCG-share/3.1.38-0/d-cache/srm

real	6m18.579s
user	0m45.207s
sys	1m29.777s

Initial problems

Version mismatch if protocol v1 (the default) is used:

A possible reason for this error is an outdated client SOAP protocol that is not understood by the server deployed at CERN:

> srmcp srm://srm-public.cern.ch/castor/cern.ch/user/m/moscicki/tmp.test file:///tmp/tmp.txt
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/afs/cern.ch/project/gd/LCG-share/3.1.38-0/d-cache/srm
SRMClientV1 : Method 'ns1:get' not implemented: method name or namespace not recognized
SRMClientV1 : get : try # 0 failed with error
SRMClientV1 : Method 'ns1:get' not implemented: method name or namespace not recognized
srm copy of at least one file failed or not completed

srmcp on my ubuntu box

I copied the entire d-cache directory and the LCG certificates directory:

cp -a /afs/cern.ch/project/gd/LCG-share/3.1.38-0/d-cache .
scp -r lxplus:/afs/cern.ch/project/gd/LCG-share2/certificates .

A fix: srmcp is implemented as two bash scripts which finally call java. The shebang line of these scripts points to sh, which is wrong on systems where sh is not bash (which is often the case).
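
The fix can be applied by hand in an editor, or with a small helper such as the one sketched here. It assumes the first line of the script is exactly #!/bin/sh (the notes above do not quote the line, so this is a guess) and leaves anything else untouched. The function name is made up for illustration.

```python
from pathlib import Path

def use_bash_shebang(script_path):
    """Rewrite a leading '#!/bin/sh' line to '#!/bin/bash'.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(script_path)
    lines = path.read_text().splitlines(keepends=True)
    if not lines:
        return False
    first = lines[0].rstrip("\r\n")
    # Accept '#!/bin/sh' and '#! /bin/sh'; shebangs with options are skipped
    if first.replace(" ", "") != "#!/bin/sh":
        return False
    lines[0] = "#!/bin/bash\n"
    path.write_text("".join(lines))   # rewrites in place, file mode is kept
    return True
```

It would be called once for each of the two wrapper scripts in the copied d-cache directory.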

Workdir environment:

export X509_CERT_DIR=/home/moscicki/srmcp_standalone_client/certificates

moscicki@pcarda75 ~/srmcp_standalone_client

Getting these errors:

copy failed with the error
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message:  (error code 1) [Nested exception message:  Custom message: Unexpected reply: 500-Command failed. : globus_xio: Unable to connect to 127.0.1.1:41398
500-globus_xio: System error in connect: Connection refused
500-globus_xio: A system call failed: Connection refused
500 End.].  Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom message: Unexpected reply: 500-Command failed. : globus_xio: Unable to connect to 127.0.1.1:41398

Solution: the problem is caused by a wrongly configured /etc/hosts as described at: https://computing.llnl.gov/linux/slurm/faq.html#ubuntu

Some systems by default will put your host in the /etc/hosts file as something like

127.0.1.1	snowflake.llnl.gov	snowflake
This will cause srun and other things to grab 127.0.1.1 as its address instead of the correct address, so the communication doesn't work. The solution is to either remove this line or set a different nodeaddr that is known by your other nodes.

A test of host configuration in python: socket.gethostbyname(socket.gethostname())
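
The one-liner can be wrapped into a small check that also says whether the result is a loopback address, which is the symptom described above. This is a sketch using only the standard library; the function names are made up for illustration.

```python
import ipaddress
import socket

def is_loopback(address):
    """True for addresses in the loopback range, e.g. 127.0.1.1."""
    return ipaddress.ip_address(address).is_loopback

def check_local_hostname():
    """Return (hostname, resolved address, loopback flag) for this machine."""
    name = socket.gethostname()
    addr = socket.gethostbyname(name)
    return name, addr, is_loopback(addr)

if __name__ == "__main__":
    try:
        name, addr, bad = check_local_hostname()
        print(name, addr, "LOOPBACK: check /etc/hosts" if bad else "ok")
    except OSError as exc:
        # socket.gaierror (a subclass of OSError) if the name does not resolve
        print("hostname lookup failed:", exc)
```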

Anyway, even if the client is wrongly configured, this is probably a flaw in the gsiftp protocol or in the server implementation, which relies on an IP address sent by the client instead of using the client IP address from the connection itself.

lcg-cp

lcg-cp srm://srm-public.cern.ch/castor/cern.ch/user/m/moscicki/tmp.test file:///tmp/tmp.txt

-- JakubMoscicki - 12-Jan-2010

Topic attachments

srmcp_standalone_client.tgz (r1, 10536.4 K, 2010-03-03 16:06, JakubMoscicki): java-based srmcp standalone client from dcache and certification authorities certificates
Topic revision: r10 - 2010-03-03 - JakubMoscicki
 