Most "popular" FTS errors knowledge base

This page describes the most frequent FTS errors seen on production tier-0 export transfers and details their causes.

Id – internal error id;
Sample – error pattern: the error text without personal information;
Type – application where the error occurred (if it can be determined), or User if it is a human mistake;
FTS classification – general classification;
Reason – why the error occurred.
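
The record layout described above can be sketched as a small lookup table. This is only an illustration: the names `KnownError`, `KNOWN_ERRORS` and `lookup` are hypothetical, the two entries are copied from this page, and plain substring matching is an assumption rather than how FTS itself classifies errors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KnownError:
    """One knowledge-base entry, mirroring the fields listed on this page."""
    error_id: int
    sample: str          # distinctive part of the error text
    error_type: str      # application where the error occurred, or "User"
    classification: str  # FTS classification
    reason: str


# Two entries copied from this page; a complete table would list all of them.
KNOWN_ERRORS = [
    KnownError(25, "SRM getRequestStatus timed out on put", "SRM",
               "REQUEST_TIMEOUT",
               "high load of the storage system or too short FTS timeouts"),
    KnownError(11, "an end-of-file was reached", "dCache", "GRIDFTP",
               "file system full or data connection closed prematurely"),
]


def lookup(error_text: str) -> Optional[KnownError]:
    """Return the first entry whose sample occurs in error_text, else None."""
    lowered = error_text.lower()
    for entry in KNOWN_ERRORS:
        if entry.sample.lower() in lowered:
            return entry
    return None
```

Calling `lookup()` with a raw FTS error string returns the matching entry, or `None` when the error is not in the table.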

Id - 25

Sample - FINAL:SRM_DEST: Failed on SRM put: SRM getRequestStatus timed out on put;
Type - SRM
FTS Classification - REQUEST_TIMEOUT
Reasons - most recently, high load on the storage system, but it could also be an FTS misconfiguration (too short timeouts)

Id - 21

Sample - DESTINATION during PREPARATION phase: [GENERAL_FAILURE] CastorStagerInterface.c:2507 Device or resource busy (errno=0, serrno=0)
Type - Castor;
FTS Classification - STORAGE_INTERNAL_ERROR
Reason - normally means the file is considered as still being written by some other request. An "nsls" (or srmLs) then shows the file having a size of zero bytes. Such files may be left behind when a request was terminated ungracefully;

Id - 40

Sample - Destination and source file sizes don't match!!
Type - dCache
FTS Classification - INVALID_SIZE
Reasons - gridFTP doors problem.

Id - 19

Sample - FINAL:SRM_DONE_DEST: failing to do 'setDone' on target SRM
Type - SRM
FTS Classification - REQUEST_FAILURE
Reasons -

Id - 16

Sample - The server sent an error response: 421 421 Timeout (900 seconds): closing control connection.
Type - GRIDFTP
FTS Classification - CONNECTION
Reasons - bug in the gridftp code in retrieve() in ftpd.c

Maarten Litmaath comment - "That bug is triggered by another problem: the operation timed out.
As far as I remember, the error is always due to the destination"

Id - 1

Sample - Operation was aborted (the gridFTP transfer timed out)
Type - GRIDFTP
FTS Classification - GRIDFTP
Reasons - intermittent transfer timeout; there can be many reasons

Id - 8

Sample - the server sent an error response: 425 425 Can't open data connection. timed out() failed
Type - GRIDFTP
FTS Classification - CONNECTION
Reasons - the attempt to establish the data connection(s) with the peer failed

Maarten Litmaath comment - "This can have at least 2 causes:

1. The connection to the data port (in the destination GLOBUS_TCP_PORT_RANGE)
   is blocked by a firewall or by a temporary network problem.
2. The connection succeeded, but the data transfer timed out.

This problem, too, is due to the destination."
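
The first cause in the comment above (a blocked data port) can be checked roughly with a TCP probe. The sketch below is only an assumption-laden illustration: `parse_port_range` and `probe` are hypothetical helper names, and the range is assumed to be written as "min,max". A GridFTP server normally listens on a data port only while a transfer is set up, so "refused" does not by itself indicate a firewall, whereas "timeout" suggests that packets are being dropped.

```python
import socket


def parse_port_range(value: str) -> range:
    """Parse a 'min,max' (or 'min max') port range into an inclusive range."""
    low, high = (int(part) for part in value.replace(",", " ").split())
    return range(low, high + 1)


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and report 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # host reachable, nothing listening on this port
    except socket.timeout:
        return "timeout"   # no answer at all, e.g. packets dropped
    except OSError:
        return "error"     # any other network failure
```

For example, `probe(host, port)` could be run for a few ports taken from `parse_port_range(...)`, with the host and range supplied by the destination site.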

Id - 30

Sample - FINAL:SRM_DEST: Failed on SRM put: Failed SRM put on [address] no TURL retrieved for [address]
Type - SRM (Castor?)
FTS Classification - GENERAL_FAILURE
Reasons - internal error on the destination SE

Id - 11

Sample - an end-of-file was reached
Type - dCache
FTS Classification - GRIDFTP
Reasons - error transmitted by the dCache client when the file system is full or the data connection was closed prematurely for any other reason

Id - 31

Sample - Failed on SRM get: SRM getRequestStatus timed out on get
Type - SRM
FTS Classification - REQUEST_TIMEOUT
Reasons - the source file is not staged and has to be recalled from tape; the SE is too busy (the timeout is usually 180 s); or the SE is in bad shape.

Id - 12

Sample - 451 Local resource failure: malloc: Cannot allocate memory
Type - Castor
FTS Classification - STORAGE_INTERNAL_ERROR
Reasons - After a timeout caused by inactivity on the data channels, the CASTOR Grid-ftp server (at VDT level) tries to read the rest of the file into memory and fails on the malloc. The memory limit on such processes has been set low (50 MB) precisely so that the process fails and exits under such circumstances. The underlying cause is that the destination had a problem writing the data and then stalled.

Id - 34

Sample - Failed on SRM get: Failed SRM get on [address] no TURL retrieved for [address]
Type - SRM (Castor?)
FTS Classification - GENERAL_FAILURE
Reasons - internal error

Id - 9

Sample - the server sent an error response: 553 553 Address already in use
Type -
FTS Classification - GRIDFTP
Reasons - some kind of bug?

Id - 107

Sample - the server sent an error response: 451 451 rfio read failure
Type - CASTOR
FTS Classification - STORAGE_INTERNAL_ERROR
Reasons - CASTOR error that can happen due to misconfiguration, SW bug, HW error, and possibly overload

Id - 14

Sample - the server sent an error response: 426 426 Data connection. data_write() failed: Handle not in the proper state
Type - dCache
FTS Classification - GRIDFTP
Reasons - the side sending out data encountered an error while sending the data to the ftp subsystem. Often, however, this simply indicates that the TCP data connection(s) closed: probably the peer closed them, although network problems have also been known to cause the connection to reset

Maarten Litmaath comment - "The network problems could be caused by firewalls that
either are too strict (sometimes hardcoded in the firmware) or otherwise misconfigured"

Id - 313

Sample - SOURCE during PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds
Type - SRM
FTS Classification - REQUEST_TIMEOUT
Reasons - most recently, high load on the storage system, but it could also be an FTS misconfiguration, or the file was not available on disk and had to be staged in from tape
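
Several samples on this page follow the shape "SCOPE during PHASE phase: [CATEGORY] message", where the bracketed category matches the FTS classification. A minimal sketch of splitting such a message into its parts is given below; the regular expression is inferred only from the samples on this page and is not an official FTS format definition, and `PhasedError` and `parse_phased_error` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class PhasedError(NamedTuple):
    scope: str      # SOURCE, DESTINATION or TRANSFER
    phase: str      # e.g. PREPARATION, TRANSFER, FINALIZATION
    category: str   # e.g. REQUEST_TIMEOUT
    message: str    # remaining free text


# Pattern inferred from the samples on this page (an assumption).
_PHASED = re.compile(
    r"(?P<scope>SOURCE|DESTINATION|TRANSFER) during (?P<phase>[A-Z]+) phase: "
    r"\[(?P<category>[A-Z_]+)\] (?P<message>.*)"
)


def parse_phased_error(text: str) -> Optional[PhasedError]:
    """Split a phased error message into its parts, or return None."""
    match = _PHASED.search(text)
    if match is None:
        return None
    return PhasedError(**match.groupdict())
```

Older-style messages, such as those starting with "FINAL:SRM_DEST:", do not fit this shape and yield `None`.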

Id - 32

Sample - SOURCE during PREPARATION phase: [INVALID_PATH] specified file(s) does not exist
Type - User
FTS Classification - INVALID_PATH
Reasons - wrong path

Id - 306

Sample - DESTINATION during PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare Destination file in 180 seconds
Type - SRM
FTS Classification - REQUEST_TIMEOUT
Reasons - most recently, high load on the storage system, but it could also be an FTS misconfiguration (too short timeouts)

Id - 75

Sample - SOURCE during PREPARATION phase: [GENERAL_FAILURE] CastorStagerInterface.c:2162 Required tape segments are not all accessible (errno=0, serrno=0)
Type - Castor
FTS Classification - GENERAL_FAILURE
Reasons - the file has to be staged in from a tape that currently is marked disabled because it has a problem

Id - 359

Sample - SOURCE during PREPARATION phase: [GENERAL_FAILURE] cannot continue since no size has been returned after PrepareToGet or SrmStat
Type - SRM
FTS Classification - GENERAL_FAILURE
Reasons - SRM internal error (bug): the files were incorrectly transferred to CASTOR in the first place. With srmcp, or an FTS SRMCOPY channel run by a dCache SRM, the gridftp client drops the connection in QUIT and calls srm setFileStatus("Done") too early (before the gridftp server has closed it). As a result, the file size is not correctly updated

Maarten Litmaath comment - "CASTOR has been made more robust against such clients now"

Id - 311

Sample - TRANSFER during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out
Type - gridFTP
FTS Classification - TRANSFER_TIMEOUT
Reasons - the transfer takes too long or some (control or data) connection could not even be made

Id - 321

Sample - DESTINATION during PREPARATION phase: [GENERAL_FAILURE] destination file failed on the SRM with error [SRM_FAILURE]
Type - SRM
FTS Classification - GENERAL_FAILURE
Reasons - Internal error at destination SRM

Id - 239

Sample - DESTINATION during FINALIZATION phase: [GENERAL_FAILURE] failed to complete PrepareToPut request [id] on remote SRM [srm]: [SRM_INVALID_REQUEST] ]
Type - SRM
FTS Classification - GENERAL_FAILURE
Reasons - Internal error at destination SRM

Id - 304

Sample - empty file size returned
Type - dCache
FTS Classification -
Reasons - the file exists, but has a zero file size. Such files can be left by ungracefully terminated requests

Id - 362

Sample - TRANSFER during TRANSFER phase: [TRANSFER_TIMEOUT] globus_gass_copy_register_url_to_url: Connection timed out
Type - gridFTP
FTS Classification - TRANSFER_TIMEOUT
Reasons - copying takes too long or some connection could not even be made, e.g. due to high load on the channel or on the network, or due to network/firewall problems.

Id - 309

Sample - DESTINATION during PREPARATION phase: [CONNECTION] failed to contact on remote SRM [srm]. Givin' up after 3 tries
Type - SRM
FTS Classification - CONNECTION
Reasons - cannot connect to the SRM: SRM downtime or a network/firewall problem.

Id - 90

Sample - FINAL:SRM_SOURCE: Failed on SRM get: Failed SRM get on [address] call. Error is RequestFileStatus#-[] failed with error:[ at Wed Feb 21 12:18:44 CET 2007 state Failed : file not found : path [path] not found
Type - User
FTS Classification - INVALID_PATH
Reasons - wrong path

Id - 365

Sample - Final error on SOURCE during TRANSFER phase: [TRANSFER_TIMEOUT] globus_ftp_client_size: Connection timed out
Type - gridFTP
FTS Classification - TRANSFER_TIMEOUT
Reasons - getting information takes too long or the connection could not be established

Id - 23

Sample - DESTINATION during PREPARATION phase: [GENERAL_FAILURE] RequestFileStatus#[id] failed with error:[ [DATE] state Failed : GetStorageInfoFailed : file exists, cannot write
Type - User
FTS Classification - GENERAL_FAILURE
Reasons - the file already exists.


Last edit: AlexanderUzhinskiy on 2007-12-17 - 14:54

Maintainer: AlexanderUzhinskiy