FTS Transfer Timeouts
This page explains how the timeouts on a transfer are calculated. Only the timeouts of the transfer phase are covered: the actual gridftp transfer for urlcopy channels, or the srmCopy operation for srmcopy channels. All other timeouts (on get/put operations, http timeouts, etc.) are not considered.
In FTS versions <= 2.1, it was only possible to set:
- for urlcopy channels:
  - an absolute value for the timeout
  - some extra transfer failure conditions based on transfer markers
- for srmcopy channels:
  - an absolute value for the timeout, which was then multiplied by the number of files in the request
  - a failure condition based on the refresh interval between statusOfSrmCopyRequest operations
This model was inadequate to handle the case where files with very different sizes were transferred over the same channel.
Starting with FTS 2.2, more complex timeouts will be introduced for the transfers or copy operations.
See also FtsDbSchema.
Url copy channels
Transfer timeout = urlcopy_tx_to + tx_to_per_mb * file size in mb
Fail the transfer if:
- urlcopy_txmarks_to is set to a value N (not null) and no transfer markers are received for more than N seconds (regardless of whether the markers indicate transfer progress or not).
- url_copy_first_txmark_to is set to a value N (not null) and the first non-zero transfer marker is not received within N seconds from the start of the transfer.
- no_tx_activity_to is set to a value N (not null) and the transfer markers do not indicate any progress for more than N seconds.
If both urlcopy_tx_to and tx_to_per_mb are zero (or null), the transfer is considered to have no timeout.
urlcopy_tx_to acts as a lower limit on the transfer timeout, so that very small files do not fail just because the calculated timeout was too short for the transfer even to start.
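The formula above can be sketched in a few lines of Python. This is only an illustration, not the FTS implementation: the function name is hypothetical, the timeouts are assumed to be in seconds, and treating a computed value of zero as "no timeout" is an assumption of this sketch.

```python
from typing import Optional


def urlcopy_transfer_timeout(urlcopy_tx_to: Optional[float],
                             tx_to_per_mb: Optional[float],
                             file_size_mb: float) -> Optional[float]:
    """Return the transfer timeout, or None if there is no timeout.

    Implements: urlcopy_tx_to + tx_to_per_mb * file size in MB.
    A null (None) setting is treated like zero (assumption).
    """
    base = urlcopy_tx_to or 0
    per_mb = tx_to_per_mb or 0
    timeout = base + per_mb * file_size_mb
    # Assumption: a computed timeout of zero means "no timeout".
    return timeout if timeout > 0 else None


# Hypothetical channel settings: 300 fixed, plus 1.5 per MB.
small = urlcopy_transfer_timeout(300, 1.5, 1)       # 1 MB file
large = urlcopy_transfer_timeout(300, 1.5, 10000)   # 10000 MB file
```

With these hypothetical settings the 1 MB file still gets at least the fixed urlcopy_tx_to part, while the timeout of the 10000 MB file is dominated by the size-dependent part.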
urlcopy_txmarks_to and url_copy_first_txmark_to have the same meaning as before. The transfer is aborted if the first non-zero marker is not received within a certain time. Once the first non-zero marker has been received, the transfer is aborted if subsequent markers are not received at least every N seconds, whether or not the markers indicate progress (a transfer stuck at 50% is fine, as long as markers keep arriving).
no_tx_activity_to is new in FTS 2.2 and introduces a check on the value of the progress reported by the markers: if the markers indicate no progress for more than a certain time, kill the transfer.
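The three marker-based conditions can be illustrated with a small Python sketch. It is not the FTS code: the function name, the representation of a marker as a (time, bytes) pair, and the returned strings are all hypothetical. It also assumes that a setting of None or zero disables a check, and that the urlcopy_txmarks_to check only applies once the first non-zero marker has arrived, which is one reading of the description above.

```python
from typing import List, Optional, Tuple

# (seconds since transfer start, bytes transferred so far) -- hypothetical shape
Marker = Tuple[float, int]


def check_markers(markers: List[Marker], now: float,
                  txmarks_to: Optional[float] = None,
                  first_txmark_to: Optional[float] = None,
                  no_activity_to: Optional[float] = None) -> Optional[str]:
    """Return the name of the violated condition, or None if all is well.

    `now` is the number of seconds elapsed since the transfer started.
    """
    # Arrival time of the first marker reporting more than zero bytes.
    first_nonzero = next((t for t, b in markers if b > 0), None)

    # url_copy_first_txmark_to: no non-zero marker within N seconds of start.
    if first_txmark_to and first_nonzero is None and now > first_txmark_to:
        return "first_txmark"

    # urlcopy_txmarks_to: markers stopped arriving, progress is irrelevant.
    if txmarks_to and first_nonzero is not None:
        if now - markers[-1][0] > txmarks_to:
            return "txmarks"

    # no_tx_activity_to: markers arrive, but the byte count does not grow.
    if no_activity_to:
        last_progress, best = 0.0, 0
        for t, b in markers:
            if b > best:
                last_progress, best = t, b
        if now - last_progress > no_activity_to:
            return "no_activity"

    return None


# A transfer that keeps sending markers but is stuck at 100 bytes.
stuck = [(5, 100), (10, 100), (20, 100), (30, 100)]
reason = check_markers(stuck, now=40, txmarks_to=60, no_activity_to=30)
```

In the last call the markers arrive often enough to satisfy urlcopy_txmarks_to, so only the no_tx_activity_to check catches the stalled transfer.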
Examples
The following graph shows the behavior of some transfers failed because the transfer timeout was hit on the CERN-STAR channel on the T2 FTS service at CERN.
Please note that none of the above transfers reached 100%. Rather than having an extension grace period (as suggested in BUG:40947), it would probably be better to kill those transfers sooner, using no_tx_activity_to.
Srm copy channels
Transfer timeout = srmcopy_to * number of files + tx_to_per_mb * (sum of the sizes of all files)
Fail the transfer if:
- srmcopy_refresh_to is set to a value N (not null) and no status updates are received for more than N seconds.
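As a worked illustration of the srmcopy formula and the refresh condition, here is a minimal Python sketch. The function names and the sample values are hypothetical, the timeouts are assumed to be in seconds, and a srmcopy_refresh_to of None or zero is assumed to disable the check.

```python
from typing import Optional, Sequence


def srmcopy_transfer_timeout(srmcopy_to: float, tx_to_per_mb: float,
                             file_sizes_mb: Sequence[float]) -> float:
    """srmcopy_to * number of files + tx_to_per_mb * sum of all file sizes."""
    return srmcopy_to * len(file_sizes_mb) + tx_to_per_mb * sum(file_sizes_mb)


def srmcopy_refresh_expired(seconds_since_last_status: float,
                            srmcopy_refresh_to: Optional[float]) -> bool:
    """True if no status update arrived for more than srmcopy_refresh_to."""
    if not srmcopy_refresh_to:
        # Assumption: None or zero means the check is not configured.
        return False
    return seconds_since_last_status > srmcopy_refresh_to


# Hypothetical request: three files of 100, 2000 and 50 MB.
# 600 * 3 + 2 * (100 + 2000 + 50) = 1800 + 4300 = 6100
timeout = srmcopy_transfer_timeout(600, 2, [100, 2000, 50])
```

Because the size-dependent term uses the sum of all file sizes, a request with one large file and several small ones gets a single timeout covering the whole srmCopy operation.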
Last edit: AkosFrohner on 2009-06-03 - 11:13
Maintainer: PaoloTedesco