ASO Non-Blocking Rewrite

Introduction

This page covers a proposed rewrite of the TransferWorker component of ASO to introduce non-blocking behavior. The goals and non-goals are:

  • A non-goal is to manage concurrency. The TransferWorker should not try to rate limit submissions or throttle the number of transfers in the system (although it may try to rate-limit monitoring). We assume that any concurrency management occurs external to the system.
  • We keep the 1-to-1 relationship between active users and TransferWorker processes.
  • We aim for FTS3 compatibility and not FTS2.
  • We aim to separate serialization (CouchDB) of state from the transfer logic.
  • We aim to not block on the completion of transfers, but it is acceptable to block on the interaction with components (such as sending commands to SRM or FTS).
  • Each file transfer is managed by at most one TransferWorker. That is, we do not worry about two different TransferWorkers both trying to operate on the same file.
  • We aim to provide the best latency to users.

The TransferWorker process will have two main classes:

  • TransferAgent: This manages the transfer lifecycle of one or more files. It is responsible for submitting transfers to FTS3, monitoring the status, and performing any post-transfer cleanup.
  • TransferState: This manages the state of transfers in the system. It will provide a list of transfers to be performed, serialize the state, and make it possible to recover from an unexpected exit of the TransferWorker process.

TransferState

The TransferState object provides the following methods with the noted semantics:

  • getNewTransfers(): Returns a list of (source site, source LFN, dest site, dest LFN, ASO transfer hash) tuples. Each represents a new transfer which should enter the system.
  • setTransferID(list of ASO transfer hashes, FTS job ID): Throws an exception if the transfer state cannot be persisted. This records an FTS job ID for a set of ASO transfer hashes. This should be a durable commit.
  • setTransientTransferState(list of ASO transfer hashes, detail): Throws an exception if the transfer state cannot be persisted. This persists a transient transfer state. On error, the transfer will not be cancelled by the TransferAgent. The implementation may make a non-durable commit as an optimization.
  • getActiveTransfers(user): Returns a list of (list of ASO transfer hashes, FTS job ID) tuples for all ASO transfers in a non-terminal state.
  • setTerminalState(ASO transfer hash, state): Throws an exception if the transfer state cannot be persisted. This persists a terminal transfer state; the implementation should make a durable commit. If this fails, this call will be repeated in the future.
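
The method list above could be expressed as an abstract interface, which also keeps the CouchDB serialization separate from the transfer logic. The following is a minimal sketch under stated assumptions, not the actual ASO code: the method names come from the list above, while the argument names and the InMemoryTransferState class are hypothetical and exist only to exercise the interface without CouchDB.

```python
from abc import ABC, abstractmethod


class TransferState(ABC):
    """Interface sketch for the persistence layer described above."""

    @abstractmethod
    def getNewTransfers(self):
        """Return (source site, source LFN, dest site, dest LFN, hash) tuples."""

    @abstractmethod
    def setTransferID(self, hashes, fts_job_id):
        """Durably record the FTS job ID for the hashes; raise on failure."""

    @abstractmethod
    def setTransientTransferState(self, hashes, detail):
        """Record a transient state (may be non-durable); raise on failure."""

    @abstractmethod
    def getActiveTransfers(self, user):
        """Return (list of hashes, FTS job ID) tuples in non-terminal states."""

    @abstractmethod
    def setTerminalState(self, transfer_hash, state):
        """Durably record a terminal state; raise on failure."""


class InMemoryTransferState(TransferState):
    """Hypothetical dict-backed stand-in for tests; nothing here is durable."""

    def __init__(self, user, pending):
        self.user = user
        self.pending = list(pending)  # transfers not yet handed out
        self.jobs = {}                # FTS job ID -> list of hashes
        self.transient = {}           # hash -> transient detail
        self.terminal = {}            # hash -> terminal state

    def getNewTransfers(self):
        # Hand out each pending transfer exactly once.
        new, self.pending = self.pending, []
        return new

    def setTransferID(self, hashes, fts_job_id):
        self.jobs[fts_job_id] = list(hashes)

    def setTransientTransferState(self, hashes, detail):
        for transfer_hash in hashes:
            self.transient[transfer_hash] = detail

    def getActiveTransfers(self, user):
        if user != self.user:
            return []
        active = []
        for job_id, hashes in self.jobs.items():
            remaining = [h for h in hashes if h not in self.terminal]
            if remaining:
                active.append((remaining, job_id))
        return active

    def setTerminalState(self, transfer_hash, state):
        self.terminal[transfer_hash] = state
```

A CouchDB-backed implementation would subclass TransferState in the same way, so the TransferAgent never needs to know which backend is in use.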

TransferAgent

The TransferAgent will expose one method, execute, which performs the logic of transfers. The execute method will have a main loop with approximately the following logic; the loop will break whenever there is no transfer being managed, no new transfers available, and all completed transfers have been set to terminal state.

  1. Query TransferState.getNewTransfers for new transfers.
  2. If there are new transfers, group them into lists of transfers for (source site, destination site) pairs. Resolve into PFNs using PhEDEx data service calls. For each (source, destination) pair:
    1. Submit a transfer to FTS using fts-transfer-submit. Block until this returns.
    2. If successful, record the FTS job ID using TransferState.setTransferID. Otherwise, record the failure with TransferState.setTransientTransferState.
  3. Call TransferState.getActiveTransfers for a list of active transfers. As an optimization, this may be done once every N loops. For each:
    1. Call "fts-transfer-status --json" to get the status of the transfer.
  4. Sleep for 60 seconds.
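
The loop above can be sketched as follows. This is an illustration under stated assumptions rather than the proposed implementation: the submit and poll callables are hypothetical stand-ins for the fts-transfer-submit and fts-transfer-status calls (PFN resolution is assumed to happen inside submit), poll is assumed to return a mapping from ASO transfer hash to terminal state for finished files, and state is any object providing the TransferState methods described earlier.

```python
import time
from collections import defaultdict


def group_by_site_pair(transfers):
    """Group (src site, src LFN, dst site, dst LFN, hash) tuples by site pair."""
    groups = defaultdict(list)
    for src_site, src_lfn, dst_site, dst_lfn, transfer_hash in transfers:
        groups[(src_site, dst_site)].append((src_lfn, dst_lfn, transfer_hash))
    return dict(groups)


class TransferAgent:
    """Sketch of the main loop; submit and poll are hypothetical callables."""

    def __init__(self, state, user, submit, poll, sleep=time.sleep, interval=60):
        self.state = state
        self.user = user
        self.submit = submit  # (site pair, files) -> FTS job ID; raises on failure
        self.poll = poll      # FTS job ID -> {hash: terminal state} for finished files
        self.sleep = sleep
        self.interval = interval

    def execute(self):
        while True:
            # Steps 1 and 2: fetch new transfers, submit one job per site pair.
            new = self.state.getNewTransfers()
            for pair, files in group_by_site_pair(new).items():
                hashes = [transfer_hash for _, _, transfer_hash in files]
                try:
                    job_id = self.submit(pair, files)  # blocking here is acceptable
                except Exception as exc:
                    self.state.setTransientTransferState(
                        hashes, "submit failed: %s" % exc)
                else:
                    self.state.setTransferID(hashes, job_id)

            # Step 3: poll each active job and persist terminal states.
            unfinished = False
            for hashes, job_id in self.state.getActiveTransfers(self.user):
                finished = self.poll(job_id)
                for transfer_hash in hashes:
                    if transfer_hash not in finished:
                        unfinished = True
                        continue
                    try:
                        self.state.setTerminalState(
                            transfer_hash, finished[transfer_hash])
                    except Exception:
                        unfinished = True  # still active, so retried next pass

            # Exit once nothing is new and nothing is left non-terminal.
            if not new and not unfinished:
                break
            # Step 4: wait before the next pass.
            self.sleep(self.interval)
```

Injecting sleep and the two FTS callables keeps the loop testable without a live FTS3 server. The "once every N loops" optimization for getActiveTransfers is left out of this sketch for brevity.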
Topic revision: r1 - 2014-02-09 - BrianBockelman
 