HTTP / WebDAV Third-Party-Copy Technical Details

The HTTP TPC mechanism relies on the existing WebDAV COPY verb (see RFC 4918) and on interoperable implementations and interpretations of this part of the specification.

Particularly, in many WebDAV implementations, COPY is limited to resources inside the same service; for third-party-copy, we allow this to trigger transfers from remote services.

GridSite delegation

If one endpoint supports third-party-copy requests and the other endpoint supports authorisation bearer tokens (e.g., macaroons) then it is possible to achieve the transfer using a bearer token. If this is not the case, one of the endpoints must support third-party-copy with GridSite delegation to achieve the file transfer.

The following expectations apply when a third-party transfer with GridSite delegation is desired:

  • The client/user SHOULD have access to a credential that is valid for at least 20 minutes when making a GridSite COPY request.
  • The server SHOULD request the client delegates a fresh credential when the client makes a GridSite COPY request and either has no delegated credential or the delegated credential has expired.
  • The server MAY request the client delegates a fresh credential when the client makes a GridSite COPY request and the delegated credential has less than 20 minutes validity.
  • The server SHOULD NOT request the client delegates a fresh credential when the client makes a GridSite COPY request and the delegated credential has more than 20 minutes validity.
  • The server SHOULD reject the GridSite COPY request if the delegated credential has expired and the client failed to delegate a non-expired credential when requested.
  • The server MAY reject the GridSite COPY request if the delegated credential has less than 20 minutes validity and the client failed to delegate a credential with greater than 20 minutes validity when requested.
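
The request-side expectations above can be summarised as a small decision function. This is only a sketch: the function name and return values are hypothetical, and the text does not say what happens at exactly 20 minutes of remaining validity, so the sketch assumes that case counts as sufficient.

```python
from datetime import timedelta
from typing import Optional

# Threshold taken from the expectations listed above.
MIN_VALIDITY = timedelta(minutes=20)


def delegation_request_level(remaining: Optional[timedelta]) -> str:
    """Say how strongly the server is expected to request a fresh credential.

    `remaining` is the lifetime left on the delegated credential, or None
    when the client has not delegated any credential yet.
    """
    if remaining is None or remaining <= timedelta(0):
        # No delegated credential, or it has expired.
        return "SHOULD"
    if remaining < MIN_VALIDITY:
        # Still valid, but for less than 20 minutes.
        return "MAY"
    # Assumption: exactly 20 minutes is treated like "more than 20 minutes".
    return "SHOULD NOT"
```

For example, a credential with five minutes left yields "MAY", so a server may either proceed or ask for a fresh delegation.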

Server requests delegation

A server requests the client delegates a credential by responding to the COPY request with a 302 response that includes the X-Delegate-To response header and a Location response header.

The X-Delegate-To header's value is a space-separated list of absolute URLs. Each URL is a GridSite delegation endpoint.

The client SHOULD delegate a credential to one of the listed GridSite delegation endpoints. If a delegation attempt fails then the client SHOULD attempt to delegate with another of the supplied GridSite URLs. The client MAY contact the GridSite endpoints in any order, but SHOULD try all endpoints before failing.

Once the client has successfully delegated a credential then the client SHOULD issue the same COPY request to the URL provided in the Location response header.
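
A client-side sketch of this exchange might look as follows. The GridSite delegation step itself is out of scope here, so it is passed in as a hypothetical `delegate` callable that returns True on success; the function name and the error handling are likewise assumptions made for illustration.

```python
from typing import Callable, Mapping, Optional


def handle_delegation_redirect(
    status: int,
    headers: Mapping[str, str],
    delegate: Callable[[str], bool],
) -> Optional[str]:
    """Return the URL to which the COPY request should be re-issued.

    Returns None if the response is not a delegation request.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status != 302 or "x-delegate-to" not in lowered:
        return None

    # The header value is a space-separated list of absolute URLs.
    for endpoint in lowered["x-delegate-to"].split():
        if delegate(endpoint):
            return lowered["location"]

    # Every advertised endpoint was tried and none accepted the delegation.
    raise RuntimeError("delegation failed at all advertised endpoints")
```

The sketch tries the endpoints in the order given, which is one of the orders the text permits.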

Request Headers

We utilize several HTTP headers in the WebDAV Third-Party Copy (TPC) request that modify how the copy works. These are listed below:

RequireChecksumVerification
Controls whether a successful transfer requires the service to obtain a remote checksum value that uses the same checksum algorithm as a known checksum value. Such a checksum may be unavailable because the third-party server does not support reporting checksum values or because the local server does not know how to generate the supplied checksum. Valid values are true (fail transfer if the remote server does not provide a matching checksum value) and false (allow transfer to succeed in the absence of checksum information). Regardless of the value of RequireChecksumVerification, if the server determines the checksum is incorrect, it SHOULD fail the transfer. If not specified, then the behavior is implementation-specific. Clients SHOULD specify this header and not rely on implementation-specific behavior. We note two distinct behaviors:
  • DPM: DPM ignores this header and never performs checksum validation.
  • dCache: If not specified then (by default) true is used (the admin may change this default). dCache will always use RFC 3230 to request a checksum from the remote host and fail the transfer if there is a checksum mismatch. However, if the remote host does not return a checksum (or does not support the checksum type), this option controls whether or not dCache will fail the transfer.
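
The recommended semantics (a mismatch always fails; a missing checksum fails only when verification is required) can be sketched as below. The function and parameter names are hypothetical, and both checksum values are assumed to be already normalised to the same algorithm and encoding.

```python
from typing import Optional


def transfer_passes_checksum(
    require_verification: bool,
    known: Optional[str],
    remote: Optional[str],
) -> bool:
    """Decide whether the checksum stage lets the transfer succeed.

    `known` is the checksum the local server holds or computed; `remote` is
    the value reported by the other endpoint. Either is None if unavailable.
    """
    if known is not None and remote is not None:
        # Both values exist: a mismatch fails regardless of the header.
        return known == remote
    # No comparison is possible: the header value decides the outcome.
    return not require_verification
```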

Credential
Controls from which source the TPC request credentials will be obtained. Clients performing a COPY request SHOULD always set the Credential header.
  • Currently defined values are gridsite (obtain via gridsite delegation), oidc (obtain credential via OAuth 2.0 Token Exchange), or none.
  • If the client does not specify the Credential header, then the handling of credential delegation is implementation-specific.
  • If token authentication (or any other HTTP header-based authentication) with the inactive endpoint is used, then the client may set Credential to none and specify headers for the transfer via the TransferHeader* mechanism.
  • If the client specifies the Credential header, the server MUST utilize the corresponding mechanism or reject the request. If the server does not support the mechanism, it MUST respond with a 400 status code.
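
A server-side sketch of these rules is shown below. The exception class, the function name, and the exact-match comparison of header values are all assumptions made for illustration; a real server would map the exception onto its own 400 response.

```python
from typing import Optional, Set


class BadRequest(Exception):
    """Hypothetical exception that a server would translate into HTTP 400."""

    status = 400


def select_credential_mechanism(
    header_value: Optional[str], supported: Set[str]
) -> Optional[str]:
    """Pick the mechanism named by the Credential header.

    Returns None when the header is absent, leaving the choice to
    implementation-specific behaviour.
    """
    if header_value is None:
        return None
    if header_value not in supported:
        # The named mechanism must be used, or the request rejected.
        raise BadRequest(f"unsupported Credential mechanism: {header_value!r}")
    return header_value
```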

Overwrite
Controls whether the client desires the TPC request to overwrite an existing file. Valid values are T (overwrite any existing file) and F (fail request if file already exists). If not specified then T is used.

Source
The URL from which data will be read. This makes the COPY request a pull request. The value must be a valid URL. Must not be specified in a request that defines the Destination header.

Destination
The URL to which data will be written. This makes the COPY request a push request. Must not be specified in the same request that defines the Source header.

TransferHeader*
Any header whose name starts with TransferHeader is copied into the GET or PUT request without this prefix; for example, a header TransferHeaderAuthorization: bearer foo is copied into TPC requests as Authorization: bearer foo.
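
To illustrate how these request headers fit together, here is a minimal sketch with two hypothetical helpers: one builds the header set for a COPY request on the client, the other shows the prefix stripping a server would apply before issuing its GET or PUT. The defaults chosen for Credential, Overwrite and RequireChecksumVerification are assumptions of the sketch, as is the case-sensitive prefix match.

```python
from typing import Dict, Mapping, Optional

PREFIX = "TransferHeader"


def build_copy_headers(
    source: Optional[str] = None,
    destination: Optional[str] = None,
    transfer_headers: Optional[Mapping[str, str]] = None,
    credential: str = "none",
    overwrite: bool = True,
    require_checksum: bool = True,
) -> Dict[str, str]:
    """Build headers for a pull (Source) or push (Destination) COPY request."""
    # Source and Destination must not appear in the same request.
    if (source is None) == (destination is None):
        raise ValueError("specify exactly one of source or destination")
    headers = {
        "Credential": credential,
        "Overwrite": "T" if overwrite else "F",
        "RequireChecksumVerification": "true" if require_checksum else "false",
    }
    if source is not None:
        headers["Source"] = source
    else:
        headers["Destination"] = destination
    # Headers meant for the other endpoint travel with the prefix attached.
    for name, value in (transfer_headers or {}).items():
        headers[PREFIX + name] = value
    return headers


def strip_transfer_prefix(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Extract the headers to copy into the GET or PUT request."""
    return {
        name[len(PREFIX):]: value
        for name, value in request_headers.items()
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    }
```

Calling `build_copy_headers(source=..., transfer_headers={"Authorization": "bearer foo"})` yields a TransferHeaderAuthorization entry, which `strip_transfer_prefix` turns back into Authorization.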

Response

In addition to the general status codes possible, the following status codes have specific applicability to COPY:

201 (Created)
The source resource was successfully copied. The COPY operation resulted in the creation of a new resource.

202 (Accepted)
Copy request accepted. Servers may provide progress information about the COPY request as part of a chunked-encoding response (RFC 7230).

204 (No Content)
The source resource was successfully copied to a preexisting destination resource.

207 (Multi-Status)
Supported as in standard WebDAV (RFC-2518 10.6).

30X (Redirections)
All redirection codes MUST be supported by the client.

403 (Forbidden)
The operation is forbidden. A special case for COPY could be that the source and destination resources are the same resource.

409 (Conflict)
A resource cannot be created at the destination until one or more intermediate parent resources have been created.

412 (Precondition Failed)
A precondition header check failed, e.g., the Overwrite header is "F" and the destination URL is already mapped to a resource.

Note that the 202 (Accepted) response code indicates that the transfer request was accepted, but does not necessarily indicate whether the transfer request will succeed, nor is it an indication that the request has even started. There are a number of early checks that can cause the transfer request to fail before any bytes are moved (e.g., invalid URL provided in the Source: header). It is implementation-defined whether any of these are performed prior to accepting the transfer, or if "acceptance" is just into an internal queue.

The 202 (Accepted) response (and corresponding copy progress discussed below) is the only mechanism provided to indicate progress of a transfer. Accordingly, client applications (such as FTS) may time out long-running transfers if this mechanism is not used.
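
As a rough illustration, a client might sort the status codes above into a handful of next steps. The category names are hypothetical and the mapping is a simplification; in particular, a 202 only means that the body still has to be read to learn the outcome.

```python
def classify_copy_status(status: int) -> str:
    """Map a COPY response status code to the client's next step."""
    if status in (201, 204):
        # Copy already completed (new resource, or existing one replaced).
        return "done"
    if status == 202:
        # Accepted only: the chunked body carries progress and final result.
        return "read-body"
    if status == 207:
        # Multi-Status: the body has to be inspected for per-resource results.
        return "multi-status"
    if 300 <= status < 400:
        # Redirections must be supported by the client.
        return "redirect"
    # 403, 409, 412 and any other code are treated as a failed request.
    return "failed"
```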

Monitoring copy progress

The 202 (Accepted) response must be followed by a sequence of chunks as part of a chunked transfer encoding response. Each chunk is referred to as a "performance marker" and updates the client about the progress of the transfer. Each performance marker is sent periodically by the server and is meant to be processed independently by the client (i.e., a marker is not split across multiple chunks). The Xrootd implementation, for example, sends a performance marker every 5 seconds.

The format of the marker is as follows (newline characters are shown unencoded but are encoded in the response):

Perf Marker\n
Timestamp: $(UNIX TIMESTAMP)\n
Stripe Index: 0\n
Stripe Bytes Transferred: $(BYTES)\n
Total Stripe Count: 1\n
RemoteConnections: $(CONNECTIONS)\n
End\n

If no bytes have been transferred yet, then Stripe Index, Stripe Bytes Transferred, and Total Stripe Count may be omitted.

The RemoteConnections line is optional. If present, it should list the existing network connections currently associated with this transfer. This should be a comma-separated list of entries formatted as $(TRANSPORT):$(ADDRESS):$(PORT), where $(TRANSPORT) should be set to tcp for a TCP-based connection. An example entry is tcp:129.93.3.4:1234. Notes on this field:

  • This should indicate the connections that existed when the performance marker was created. It should not be the list of all historical connections for this transfer.
  • Connections should be provided even if the transfer has not started (but the connection exists - an example may be a HEAD request prior to the transfer) or if the transfer is done via a non-HTTPS-based protocol.
  • Another potential transport may be udt.
  • IPv4 addresses should be given in the normal dot-decimal representation. Clients should accept IPv4-mapped (e.g., ::ffff:192.0.2.128) addresses and interpret them as an IPv4-based connection.
  • Ordering of the connections is not considered significant.
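
A client could parse the RemoteConnections value along these lines. The sketch assumes each entry looks like tcp:129.93.3.4:1234, with IPv6 addresses optionally wrapped in square brackets; the function name and the tuple layout are hypothetical.

```python
import ipaddress
from typing import List, Tuple


def parse_remote_connections(value: str) -> List[Tuple[str, str, int]]:
    """Parse a RemoteConnections value into (transport, address, port) tuples."""
    connections = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # The transport ends at the first colon, the port starts at the last.
        transport, _, rest = entry.partition(":")
        host, _, port = rest.rpartition(":")
        address = ipaddress.ip_address(host.strip("[]"))
        # Interpret IPv4-mapped IPv6 addresses as plain IPv4 connections.
        if isinstance(address, ipaddress.IPv6Address) and address.ipv4_mapped:
            address = address.ipv4_mapped
        connections.append((transport, str(address), int(port)))
    return connections
```

Because ordering is not significant, a caller comparing two markers may want to compare the results as sets rather than lists.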

Example:

Perf Marker\n
Timestamp: 1537788010\n
Stripe Index: 0\n
Stripe Bytes Transferred: 238745\n
Total Stripe Count: 1\n
RemoteConnections: tcp:129.93.3.4:1234,tcp:[2600:900:6:1301:268a:7ff:fef6:a590]:2345\n
End\n
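
On the server side, a marker such as the one above could be produced by a small formatter. This is a sketch under assumptions: the function name and parameters are hypothetical, and a single stripe is hard-coded to match the format shown in this document.

```python
from typing import Optional, Sequence


def format_perf_marker(
    timestamp: int,
    bytes_transferred: Optional[int] = None,
    connections: Sequence[str] = (),
) -> str:
    """Render one performance marker as newline-terminated text."""
    lines = ["Perf Marker", f"Timestamp: {timestamp}"]
    if bytes_transferred is not None:
        # The stripe lines may be omitted before any bytes have moved.
        lines += [
            "Stripe Index: 0",
            f"Stripe Bytes Transferred: {bytes_transferred}",
            "Total Stripe Count: 1",
        ]
    if connections:
        # Optional line listing the currently open connections.
        lines.append("RemoteConnections: " + ",".join(connections))
    lines.append("End")
    return "\n".join(lines) + "\n"
```

Each returned string would then be written as its own chunk of the chunked response.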

The last response chunk should be either:

success: Created

or

failure: $(ERROR MESSAGE)

where $(ERROR MESSAGE) is a message explaining the failure, meant to be human readable.
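
Putting the pieces together, a client could read the decoded response body line by line, collecting performance markers and the final status. This is a minimal sketch: the function name is hypothetical, and it assumes the HTTP library has already removed the chunked framing and yields text lines.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_copy_body(
    lines: Iterable[str],
) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """Return (performance markers, final status line) from a 202 body."""
    markers: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    final: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line == "Perf Marker":
            current = {}  # start of a new marker
        elif line == "End":
            if current is not None:
                markers.append(current)
            current = None
        elif current is not None and ":" in line:
            # Split on the first colon only, since values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
        elif line.startswith(("success:", "failure:")):
            final = line
    return markers, final
```

A final value of None after the stream ends would mean the server closed the connection without reporting an outcome, which a client would presumably treat as a failure.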

Topic revision: r9 - 2018-10-23 - PaulMillarExCern
 