Knowledge base of the most "popular" FTS errors
This page describes the most frequent FTS errors seen on production Tier-0 export transfers and explains their causes.
Id – internal error id;
Sample – error pattern: the error text without personal information;
Type – application where the error occurred (if it can be determined), or User if it is a human mistake;
FTS classification – general classification;
Reason – why the error occurred.
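The record layout described above could be represented in code roughly as follows. This is only a sketch: the class name FtsErrorEntry and its field names are hypothetical and were chosen here for illustration; they are not part of FTS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FtsErrorEntry:
    """One knowledge-base record, mirroring the fields listed above (hypothetical type)."""
    id: int              # internal error id
    sample: str          # error pattern, without personal information
    type: str            # application where the error occurred, or "User"
    classification: str  # FTS classification
    reason: str          # why the error occurred


# Entry 32 from this page, transcribed by hand.
entry_32 = FtsErrorEntry(
    id=32,
    sample="SOURCE during PREPARATION phase: [INVALID_PATH] specified file(s) does not exist",
    type="User",
    classification="INVALID_PATH",
    reason="wrong path",
)
```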
Id - 25
Sample - FINAL:SRM_DEST: Failed on SRM put: SRM getRequestStatus timed
out on put;
Type - SRM
FTS Classification - REQUEST_TIMEOUT
Reasons - most recently, high load on the storage system; could also be
FTS misconfiguration (too short timeouts)
Id - 21
Sample - DESTINATION during PREPARATION phase: [GENERAL_FAILURE]
CastorStagerInterface.c:2507 Device or resource busy (errno=0, serrno=0)
Type - Castor;
FTS Classification - STORAGE_INTERNAL_ERROR
Reason - normally means the file is considered as still being written by some other request. An "nsls" (or srmLs) then shows the file having a size of zero bytes. Such files may be left behind when a request was terminated ungracefully;
Id - 40
Sample - Destination and source file sizes don't match!!
Type dCache
FTS Classification - INVALID_SIZE
Reasons - gridFTP doors problem.
Id - 19
Sample - FINAL:SRM_DONE_DEST: failing to do 'setDone' on target SRM
Type SRM
FTS Classification - REQUEST_FAILURE
Reasons -
Id - 16
Sample - The server sent an error response: 421 421 Timeout (900 seconds):
closing control connection.
Type GRIDFTP
FTS Classification - CONNECTION
Reasons - bug in the gridftp code in retrieve() in ftpd.c
Maarten Litmaath comment - "That bug is triggered by another problem: the operation timed out.
As far as I remember, the error is always due to the destination"
Id - 1
Sample - Operation was aborted (the gridFTP transfer timed out)
Type GRIDFTP
FTS Classification - GRIDFTP
Reasons - intermittent transfer timeout; there can be many causes
Id - 8
Sample - the server sent an error response: 425 425 Can't open data
connection. timed out() failed
Type GRIDFTP
FTS Classification - CONNECTION
Reasons - the attempt to establish the data connection(s) with the peer
failed
Maarten Litmaath comment - "This can have at least 2 causes:
1. The connection to the data port (in the destination GLOBUS_TCP_PORT_RANGE)
is blocked by a firewall or by a temporary network problem.
2. The connection succeeded, but the data transfer timed out.
This problem, too, is due to the destination."
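To tell the first cause (blocked data port) from the second, one can probe ports in the destination's data port range from the source side. The sketch below makes two assumptions: that the GLOBUS_TCP_PORT_RANGE value has the form "min,max" (or "min max"), and that a refused connection means the port is reachable but idle, while a timeout hints at a firewall. The function names are hypothetical, and the interpretation of the results is a heuristic, not a definitive diagnosis.

```python
import socket


def parse_port_range(value):
    """Parse a port range string such as "20000,25000" into (low, high)."""
    low, high = (int(part) for part in value.replace(",", " ").split())
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {value!r}")
    return low, high


def probe_port(host, port, timeout=5.0):
    """Try one TCP connection and report "open", "refused", "timeout" or "error"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered: the port is reachable, but nothing is listening now.
        return "refused"
    except socket.timeout:
        # No answer at all: possibly filtered by a firewall (assumption).
        return "timeout"
    except OSError:
        return "error"
```

For example, `probe_port(host, parse_port_range("20000,25000")[0])` checks the first port of a range; the host name and the range must be taken from the actual destination configuration.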
Id - 30
Sample - FINAL:SRM_DEST: Failed on SRM put: Failed SRM put on [address] no
TURL retrieved for [address]
Type SRM (Castor?)
FTS Classification - GENERAL_FAILURE
Reasons - internal error on the destination SE
Id - 11
Sample - an end-of-file was reached
Type dCache
FTS Classification - GRIDFTP
Reasons - Error transmitted by the dCache client when the file system is full or the data connection was closed prematurely for any other reason
Id - 31
Sample - Failed on SRM get: SRM getRequestStatus timed out on get
Type - SRM
FTS Classification - REQUEST_TIMEOUT
Reasons - Source file is not staged, has to be recalled from tape. SE too busy (the timeout usually is 180 s). SE in bad shape.
Id - 12
Sample - 451 Local resource failure: malloc: Cannot allocate memory
Type Castor
FTS Classification - STORAGE_INTERNAL_ERROR
Reasons - After a timeout caused by inactivity on the data channels the
CASTOR Grid-ftp server (at VDT level) tries to read the rest of the file
into memory and fails on the malloc because the memory limit on such processes has been set low (50 MB), exactly to cause the process to fail and exit under such circumstances:
the destination had a problem with writing the data, then it stalled.
Id - 34
Sample - Failed on SRM get: Failed SRM get on [address] no TURL retrieved
for [address]
Type SRM (Castor?)
FTS Classification - GENERAL_FAILURE
Reasons - internal error
Id - 9
Sample - the server sent an error response: 553 553 Address already in use
Type
FTS Classification - GRIDFTP
Reasons - possibly a bug?
Id - 107
Sample - the server sent an error response: 451 451 rfio read failure
Type CASTOR
FTS Classification - STORAGE_INTERNAL_ERROR
Reasons - CASTOR error that can happen due to misconfiguration, SW bug, HW error, and possibly overload
Id - 14
Sample - the server sent an error response: 426 426 Data connection.
data_write() failed: Handle not in the proper state
Type dCache
FTS Classification - GRIDFTP
Reasons - means that the side sending out data encountered an error
while sending the data to the ftp subsystem. However, often this simply
indicates that the TCP data connection(s) closed: probably the peer
closed them, although network problems have also been known to cause the
connection to reset
Maarten Litmaath comment - "The network problems could be caused by firewalls that
either are too strict (sometimes hardcoded in the firmware) or otherwise misconfigured"
Id - 313
Sample - SOURCE during PREPARATION phase: [REQUEST_TIMEOUT] failed to
prepare source file in 180 seconds
Type SRM
FTS Classification - REQUEST_TIMEOUT
Reasons - most recently, high load on the storage system; could also be
FTS misconfiguration, or the file was not available on disk and had to be staged in from tape
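Error strings of the form shown in the sample above ("SOURCE during PREPARATION phase: [REQUEST_TIMEOUT] ...") carry the side, the phase and the FTS classification in a fixed layout, so they can be split mechanically. The following is a minimal sketch that assumes only the layout visible in the samples on this page; the pattern and the function name parse_fts_error are hypothetical, and older-style messages (for example those starting with "FINAL:") are simply not matched.

```python
import re

# Layout assumed from the samples on this page:
#   [Final error on ]<SCOPE> during <PHASE> phase: [<CATEGORY>] <detail>
PATTERN = re.compile(
    r"^(?:Final error on )?"
    r"(?P<scope>SOURCE|DESTINATION|TRANSFER) during "
    r"(?P<phase>[A-Z_]+) phase: "
    r"\[(?P<category>[A-Z_]+)\]\s*"
    r"(?P<detail>.*)$"
)


def parse_fts_error(message):
    """Return a dict with scope, phase, category and detail, or None if the layout differs."""
    # Collapse line breaks and repeated spaces before matching.
    normalized = " ".join(message.split())
    match = PATTERN.match(normalized)
    return match.groupdict() if match else None
```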
Id - 32
Sample - SOURCE during PREPARATION phase: [INVALID_PATH] specified file(s)
does not exist
Type User
FTS Classification - INVALID_PATH
Reasons - wrong path
Id - 306
Sample - DESTINATION during PREPARATION phase: [REQUEST_TIMEOUT] failed
to prepare Destination file in 180 seconds
Type SRM
FTS Classification - REQUEST_TIMEOUT
Reasons - most recently, high load on the storage system; could also be
FTS misconfiguration (too short timeouts)
Id - 75
Sample - SOURCE during PREPARATION phase: [GENERAL_FAILURE]
CastorStagerInterface.c:2162 Required tape segments are not all
accessible (errno=0, serrno=0)
Type Castor
FTS Classification - GENERAL_FAILURE
Reasons - the file has to be staged in from a tape that currently is marked disabled because it has a problem
Id - 359
Sample - SOURCE during PREPARATION phase: [GENERAL_FAILURE] cannot
continue since no size has been returned after PrepareToGet or SrmStat
Type SRM
FTS Classification - GENERAL_FAILURE
Reasons - SRM internal error (bug): the files were incorrectly transferred
to CASTOR in the first place. srmcp, or an FTS SRMCOPY channel run by
a dCache SRM gridftp client, drops the connection in QUIT and calls srm
setFileStatus("Done") too early (before the gridftp server has closed
it). As a result the file size is not correctly updated
Maarten Litmaath comment - "CASTOR has been made more robust against such clients now"
Id - 311
Sample - TRANSFER during TRANSFER phase: [TRANSFER_TIMEOUT]
gridftp_copy_wait: Connection timed out
Type gridFTP
FTS Classification - TRANSFER_TIMEOUT
Reasons - the transfer takes too long, or some (control or data) connection could not even be made
Id - 321
Sample - DESTINATION during PREPARATION phase: [GENERAL_FAILURE]
destination file failed on the SRM with error [SRM_FAILURE]
Type SRM
FTS Classification - GENERAL_FAILURE
Reasons - Internal error at destination SRM
Id - 239
Sample - DESTINATION during FINALIZATION phase: [GENERAL_FAILURE] failed
to complete PrepareToPut request [id] on remote SRM [srm]:
[SRM_INVALID_REQUEST] ]
Type SRM
FTS Classification - GENERAL_FAILURE
Reasons - Internal error at destination SRM
Id - 304
Sample - empty file size returned
Type dCache
FTS Classification -
Reasons - the file exists, but has a zero file size. Such files can be left by ungracefully terminated requests
Id - 362
Sample - TRANSFER during TRANSFER phase: [TRANSFER_TIMEOUT]
globus_gass_copy_register_url_to_url: Connection timed out
Type gridFTP
FTS Classification - TRANSFER_TIMEOUT
Reasons - copying takes too long or some connection could not even be made, e.g. due to high load on the channel or on the network, or due to network/firewall problems.
Id - 309
Sample - DESTINATION during PREPARATION phase: [CONNECTION] failed to
contact on remote SRM [srm]. Givin' up after 3 tries
Type SRM
FTS Classification - CONNECTION
Reasons - cannot connect to the SRM: SRM downtime or a network/firewall problem.
Id - 90
Sample - FINAL:SRM_SOURCE: Failed on SRM get: Failed SRM get on [address]
call. Error is RequestFileStatus#-[] failed with error:[ at Wed Feb 21
12:18:44 CET 2007 state Failed : file not found : path [path] not found
Type User
FTS Classification - INVALID_PATH
Reasons - wrong path
Id - 365
Sample - Final error on SOURCE during TRANSFER phase: [TRANSFER_TIMEOUT]
globus_ftp_client_size: Connection timed out
Type gridFTP
FTS Classification - TRANSFER_TIMEOUT
Reasons - getting information takes too long or the connection could not be established
Id - 23
Sample - DESTINATION during PREPARATION phase: [GENERAL_FAILURE]
RequestFileStatus#[id] failed with error:[ [DATE] state Failed :
GetStorageInfoFailed : file exists, cannot write
Type User
FTS Classification - GENERAL_FAILURE
Reasons - the file already exists.
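The samples on this page can be used to map a raw error string back to its Id. The sketch below covers only a handful of entries and is meant as an illustration: the table KNOWN_ERRORS, the function lookup_error_id and the choice of which fragment of each sample to match on are assumptions made here, not part of FTS.

```python
import re

# (Id, pattern) pairs built from a few samples on this page.
# Each pattern is a fragment of the sample text, chosen by hand.
KNOWN_ERRORS = [
    (25, r"SRM getRequestStatus timed out on put"),
    (31, r"SRM getRequestStatus timed out on get"),
    (40, r"Destination and source file sizes don't match"),
    (16, r"421 421 Timeout"),
    (8, r"425 425 Can't open data connection"),
    (313, r"\[REQUEST_TIMEOUT\] failed to prepare source file"),
    (306, r"\[REQUEST_TIMEOUT\] failed to prepare Destination file"),
    (32, r"\[INVALID_PATH\] specified file\(s\) does not exist"),
]
COMPILED = [(error_id, re.compile(p, re.IGNORECASE)) for error_id, p in KNOWN_ERRORS]


def lookup_error_id(message):
    """Return the Id of the first matching entry, or None if nothing matches."""
    # Collapse line breaks so that wrapped log lines still match.
    text = " ".join(message.split())
    for error_id, pattern in COMPILED:
        if pattern.search(text):
            return error_id
    return None
```

A message that returns None is either not covered by this short table or is a new error that has no entry on this page yet.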
Last edit:
AlexanderUzhinskiy on 2007-12-17 - 14:54
Maintainer:
AlexanderUzhinskiy