TWiki > LCG Web > WLCGTransferMonitoring (revision 37)

WLCG Transfer Monitoring

This page documents the WLCG Transfer Monitoring project.

Motivation

  • Currently there is no tool which provides an overall view of data transfers at the WLCG scope (across LHC experiments, across the various technologies used, for example FTS and xrootd, and across multiple local FTS instances).
  • Every LHC experiment follows its own data transfers through its VO-specific monitoring system.
  • Three LHC experiments use FTS for data transfer.
  • To obtain data transfer statistics, experiments parse the monitoring pages of the local FTS instances and/or generate statistics inside their VO-specific monitoring systems.
  • Such queries place additional load on the local FTS instances.
  • There is a clear similarity between the tasks performed by the VO-specific transfer monitoring systems. Operations such as aggregation of FTS transfer statistics are done by every VO separately, although they could be done once, centrally, and the results served to all experiments via a well-defined set of APIs.
  • In order to organize data transfers in the most efficient way, experiments would like information about FTS queues, about correlations of data transfers between experiments, and other information which is known to FTS but currently not available to the experiments, for example latencies related to SRM operations during data transfers.

Instrumentation of the local FTS instances for reporting of the monitoring information

Instrumenting the FTS instances to report data transfer information via MSG will make it possible to:

  • broadcast data transfer information to all interested parties, avoiding the additional load on the local FTS DBs caused by uncoordinated parsing of the local FTS monitoring pages by many clients
  • harmonize client-side code and make it independent of eventual changes to the FTS monitoring UIs

Xrootd federations, or any other services dealing with data transfer, can be instrumented in the same way to report data transfer monitoring information via MSG.
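
As an illustration, a service instrumented for MSG reporting would serialize its reports as JSON messages. A minimal Python sketch, assuming the draft field names used later on this page (the real FTS schema and transport are defined elsewhere):

```python
import json
import time

def make_transfer_started(agent_fqdn, transfer_id, src_url, dst_url):
    """Build a draft-style "transfer started" message as a JSON string.

    Field names follow the draft message format described on this
    page; the actual FTS schema may differ.
    """
    msg = {
        "agent_fqdn": agent_fqdn,    # FQN of the transfer agent
        "transfer_id": transfer_id,  # transfer id, unique per FTS job
        "src_url": src_url,
        "dst_url": dst_url,
        # Sample convention: UTC epoch milliseconds serialized as a string
        "timestamp": "%f" % (time.time() * 1000),
    }
    return json.dumps(msg)
```

The resulting string would then be published to the appropriate MSG topic or queue.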

Data transfer global monitoring system

Information from MSG will be consumed by the central transfer monitoring system, which will be responsible for the following tasks:

  • perform common tasks such as aggregation of transfer monitoring statistics, generation of summaries, etc.
  • expose transfer monitoring data to users (via UIs) and to other applications (via APIs)
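
As an illustration of the aggregation task, here is a minimal Python sketch that summarizes "transfer complete" events per channel, assuming the draft field names used on this page (the real aggregation procedures will be more elaborate):

```python
from collections import defaultdict

def summarize(events):
    """Aggregate "transfer complete" events into per-channel totals:
    number of OK/failed transfers and total bytes moved.

    Events are dicts using the draft field names from this page
    ("t_channel", "t_final_transfer_state", "tr_bt_transfered").
    """
    summary = defaultdict(lambda: {"ok": 0, "error": 0, "bytes": 0})
    for ev in events:
        ch = summary[ev["t_channel"]]
        if ev["t_final_transfer_state"] == "Ok":
            ch["ok"] += 1
        else:
            ch["error"] += 1
        # byte counts arrive as strings in the samples
        ch["bytes"] += int(ev["tr_bt_transfered"])
    return dict(summary)
```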

Architecture

page0001.jpg

Implementation

We have a good chance to make fast progress in the development of the global WLCG transfer monitoring system by re-using a lot of code and experience from the ATLAS DDM Dashboard in terms of schema, statistics aggregation procedures and user interface.

Transfers Dashboard

Links

Development

See WLCGTransfersDashboard.

Draft of the content of the information which has to be published from FTS to MSG

ATLAS, CMS and LHCb use FTS for data transfer. Each experiment has developed a monitoring system to monitor transfers within the scope of a single experiment. At the moment there is no monitoring tool which provides a single entry point to transfer monitoring information at the WLCG scope. The first step towards such a system is to enable publishing of data transfer monitoring information to MSG. This information can then be consumed either by the experiment-specific monitoring systems or by the WLCG global transfer monitoring system. The content of the information to be published was discussed among the FTS developers and the experiment representatives involved in the development of the data management systems. Below is the first draft describing the content of this information.

There will be two types of messages:

  • "Transfer events"

  • "Transfer queue"

  • Names of the topics/queues have the format: transfer*

transfer.start

transfer.complete

transfer.rejected

transfer.queue

For both types of messages, every message should contain:

  • Type of the message [A.U.: we will use different queues or topics for different types, so we don't need this parameter]
  • FTS instance identifier
  • UTC time stamp of the report

Transfer events can be of two types:

  • When transfer starts
  • When transfer finishes

"Transfer started": sent when the file moves from "Ready" to "Active" state in the queue.

  • Unique ID of the message. This field will include the transfer_id plus the FQN of the transfer agent. It will also associate the "transfer started" and "transfer completed" events. [A.U.: duplication; we already have "Transfer id" and "FQN of the agent initiating the transfer"]
  • FQN of the agent initiating the transfer
  • Transfer id
  • Source SURL
  • Destination SURL
  • Source host
  • Destination host [new values to be provided]

"Transfer complete": sent when the file moves from "Active" to "Done" state. Contains details about the transfer:

  • Unique ID of the message. This field will include the transfer_id plus the FQN of the transfer agent. It will also associate the "transfer started" and "transfer completed" events. [A.U.: duplication; we already have "Transfer id" and "FQN of the agent initiating the transfer"]
  • FQN of the agent initiating the transfer
  • Transfer mode (urlcopy, srmcopy). This field indicates whether the transfer is a gridftp transfer (urlcopy) or a direct srmcopy (srmcopy)
  • Transfer id
  • User DN [A.U.: can be moved to "user description"]
  • User description
  • Source SRM version (1.1, 2.0, etc)
  • Destination SRM version (1.1, 2.0, etc)
  • Source file type (SURL, TURL, URL)
  • Destination file type (SURL, TURL, URL)
  • VO
  • Time stamp transfer_started
  • Time stamp transfer_completed
  • Transfer duration
  • Time stamp checksum_started for source [A.U.: don't need this parameter because we have "checksum duration for source"]
  • Time stamp checksum_ended for source [A.U.: don't need this parameter because we have "checksum duration for source"]
  • Checksum duration for source
  • Time stamp checksum_started for destination [A.U.: don't need this parameter because we have "checksum duration for destination"]
  • Time stamp checksum_ended for destination [A.U.: don't need this parameter because we have "checksum duration for destination"]
  • Checksum duration for destination
  • Transfer timeout value
  • Checksum timeout value
  • Total bytes transferred (this will include the info retrieved from the performance markers, whether or not the transfer is successful)
  • Transfer average throughput (only for gridftp transfers), in kbps
  • Final transfer state: OK/Error/Aborted
  • Reason of failure: the error message, as detailed as possible
  • Failure phase (preparation, transfer, checksum, etc)
  • Source or destination failed
  • Number of streams
  • Tcp_buffer_size
  • Block_size
  • File size
  • A boolean to indicate whether the transfer was interrupted by a user (manual) [A.U.: parameter moved to "final transfer state" as Aborted]
  • Channel used
  • Type of the channel (a "dedicated" channel (CERN-CNAF), a "cloud" channel (CERN-T1S) or a "star" channel (CERN-STAR))
  • Sites in GOCDB convention linked by the channel [A.U.: if we are talking about Source Site Name and Target Site Name, we can get this information from the "Channel used" parameter. Probably the collector will do this work]
  • Time spent in SRM PREPARATION for the SOURCE
  • Time spent in SRM PREPARATION for the DESTINATION
  • Time of transfers (physical byte streaming)
  • Time spent in SRM FINALIZATION for the SOURCE
  • Time spent in SRM FINALIZATION for the DESTINATION
  • Time stamp of the event initiation (the time stamp the message was initiated)
  • SRM SPACE TOKEN

"Queue status reports" - sampled and sent at regular time intervals (e.g. once every 10 minutes), and split per VO.

  • Channel the report relates to
  • Type of the channel (a "dedicated" channel (CERN-CNAF), a "cloud" channel (CERN-T1S) or a "star" channel (CERN-STAR))
  • Sites in GOCDB convention linked by the channel
  • Number of files in "Active" transfer state, and Active/Max ratio - on the channel and on each "link" in the channel

where "link" = "(source endpoint, destination endpoint) ordered pair" - e.g. (srm-cms.cern.ch --> cmssrm.fnal.gov)

  • Number of files in "Ready" state waiting for transfer - on the channel and on each "link" in the channel
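
As an illustration, a consumer could aggregate the per-link counts of such a report as follows (a Python sketch assuming the draft queue-status structure shown on this page):

```python
def queue_totals(channels):
    """Sum Active/Ready counts per channel from a queue-status report.

    `channels` is the "channels" array of the draft queue-status
    message; each link is an ordered (source_host, dest_host) pair.
    """
    totals = {}
    for ch in channels:
        # counts arrive as strings in the samples
        active = sum(int(link["active"]) for link in ch["links"])
        ready = sum(int(link["ready"]) for link in ch["links"])
        totals[ch["channel_name"]] = {"active": active, "ready": ready}
    return totals
```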

"Transfer started message" draft example (see #FTS_message_structure for current message structure)

{
"agent_fqdn" : "fts501.cern.ch", // FQN of the transfer agent (together with the transfer id, this uniquely identifies the transfer in the monitoring system)
"transfer_id" : "CERN-GRIDKA__2012-01-16-1746_B4iOjC", // Transfer id
"endpnt" : "https://fts-pilot-service.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer", // End point
"timestamp" : "1326736000000.000000", // UTC time stamp of the report
"src_srm_v" : "2.2.0", // Source SRM version
"dest_srm_v" : "2.2.0", // Destination SRM version
"vo" : "cms", // Virtual organization
"src_url" : "srm://srm-cms.cern.ch/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_2ae", // Source url
"dst_url" : "srm://cmssrm-fzk.gridka.de/pnfs/gridka.de/cms/disk-only/store/PhEDEx_LoadTest07/LoadTest07_Debug_CH_CERN_Export/DE_KIT/5364/LoadTest07_T0_CERN_2ae_DWBDPqqijlHraA6m_5364", // Destination url
"src_hostname" : "srm-cms.cern.ch", // Source hostname
"dst_hostname" : "cmssrm-fzk.gridka.de", // Destination hostname
"src_site_name" : "CERN-PROD", // Source site name
"dst_site_name" : "FZK-LCG2", // Destination site name
"t_channel" : "CERN-GRIDKA", // Channel used
"srm_space_token_src" : "", // Source SRM SPACE TOKEN
"srm_space_token_dst" : "" // Destination SRM SPACE TOKEN
}

"Transfer complete message" draft example (see #FTS_message_structure for current message structure)

{
"tr_id" : "CERN-INFN__2012-01-12-1524_TChQ6e", // Transfer id
"endpnt" : "https://fts-pilot-service.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer", // End point
"src_srm_v" : "2.2.0", // Source SRM version
"dest_srm_v" : "2.2.0", // Destination SRM version
"vo" : "cms", // Virtual organization
"src_url" : "srm://srm-cms.cern.ch/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_337", // Source url
"dst_url" : "srm://storm-fe-cms.cr.cnaf.infn.it/cms/store/PhEDEx_LoadTest07/IMPORT/LoadTest07_Debug_CERN/CNAF/7903/LoadTest07_T0_CERN_337_SKCANYqjpE7Y2vtR_7903", // Destination url
"src_hostname" : "srm-cms.cern.ch", // Source hostname
"dst_hostname" : "storm-fe-cms.cr.cnaf.infn.it", // Destination hostname
"src_site_name" : "CERN-PROD", // Source site name
"dst_site_name" : "INFN-T1", // Destination site name
"t_channel" : "CERN-INFN", // Channel used
"channel_type" : "urlcopy", // Transfer mode (urlcopy, srmcopy)
"timestamp_tr_st" : "1326381897000.000000", // Time stamp transfer_started
"timestamp_tr_comp" : "1326381922000.000000", // Time stamp transfer_completed
"tr_timestamp_start" : "1326381886000.000000", // Transfer process start irrespective of failure in any phase
"tr_timestamp_complete" : "1326381934000.000000", // Transfer process complete irrespective of failure in any phase
"timestamp_chk_src_st" : "1326381886000.000000", // Time stamp checksum_started for source
"timestamp_chk_src_ended" : "1326381892000.000000", // Time stamp checksum_ended for source
"timestamp_checksum_dest_st" : "1326381934000.000000", // Time stamp checksum_started for destination
"timestamp_checksum_dest_ended" : "1326381934000.000000", // Time stamp checksum_ended for destination
"t_timeout" : "3600", // Transfer timeout value
"chk_timeout" : "3600", // Checksum timeout value
"t_error_code" : "", // Error code 
"tr_error_scope" : "", // Error scope (SOURCE|DESTINATION)
"t_failure_phase" : "", //Failure phase
"tr_error_category" : "", // Error category 
"t_final_transfer_state" : "Ok", // Final transfer state
"tr_bt_transfered" : "2684354560", // Total bytes transferred
"nstreams" : "3", // Number of streams
"buf_size" : "0", // Buffer_size
"tcp_buf_size" : "0", // Tcp_buffer_size
"block_size" : "0", // Block_size
"f_size" : "2684354560", // File size
"time_srm_prep_st" : "1326381886000.000000", // Time stamp SRM PREPARATION started
"time_srm_prep_end" : "1326381897000.000000", // Time stamp SRM PREPARATION ended
"time_srm_fin_st" : "1326381922000.000000", // Time stamp SRM FINALIZATION started
"time_srm_fin_end" : "1326381934000.000000", // Time stamp SRM FINALIZATION ended
"srm_space_token_src" : "", // Source SRM SPACE TOKEN
"srm_space_token_dst" : "", // Destination SRM SPACE TOKEN
"t__error_message" : "" // Error description
}
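
For illustration, the derived quantities discussed in the comments below (transfer duration, average throughput) can be recomputed from such a message. A Python sketch, assuming the sample convention of UTC epoch milliseconds serialized as strings:

```python
def derived_metrics(msg):
    """Compute transfer duration (seconds) and average throughput
    (KB/s) from a draft "transfer complete" message.

    Timestamps in the samples are UTC epoch milliseconds serialized
    as strings; the throughput formula (bytes/1024/duration) follows
    the suggestion in the comments section of this page.
    """
    start = float(msg["timestamp_tr_st"]) / 1000.0   # ms -> s
    end = float(msg["timestamp_tr_comp"]) / 1000.0
    duration = end - start
    throughput_kbps = int(msg["tr_bt_transfered"]) / 1024.0 / duration
    return duration, throughput_kbps
```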

"Queue status message" draft example (see #FTS_message_structure for current message structure)

{
"fts_id":"https://fts22-t0-export.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer", // FTS instance identifier
"time":"11:01:36.12", // UTC time stamp of the report
"channels": // array of the channels
[
   {
     "channel_name":"CERN-CERN", // name of the channel
     "channel_type":"", // type of the channel
     "links":  // array of links
       [
        {
          "source_host":"",  // source host of the link
          "dest_host":"", // dest host of the link
           "active":"512", // number of Active transfers
          "ready":"0" // number of Ready transfers
        }
       ]
   },
   {
     "channel_name":"CERN-DESY",
     "channel_type":"",
     "links":
       [
        {
          "source_host":"lxbra1910.cern.ch",
          "dest_host":"ennis.desy.de",
          "active":"24",
          "ready":"0"
        },
        {
          "source_host":"lxbra1910.cern.ch",
          "dest_host":"cork.desy.de",
          "active":"47",
          "ready":"0"
        },
       {
          "source_host":"lxbra1910.cern.ch",
          "dest_host":"galway.desy.de",
          "active":"6",
          "ready":"0"
       }
      ]
   }
  ]
}

Comments on draft

The following comments on the draft have been received and will be taken into account for the next iteration of the draft. Comments received up to 15/06/2011 are summarised in the section Recommended changes to draft. Any further comments can be added directly in this section or sent to wlcg-transfer-monitor@cern.ch

David Tuckett (transcribed from email dated 02/05/2011)

I've reviewed the sample FTS messages given on https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTransferMonitoring

In general, there is more information in the proposed FTS messages than in the current DDM Site Service callbacks, so we should be able to build a monitoring system with at least the same statistics as DDM Dashboard. However, I do have 3 comments.

Firstly, I am concerned as to how we map to the VO specific endpoints from the message fields.

In the started message, we have s_surl, d_surl, s_host and d_host. In the completed message, we have vo, channel and srm_space_token. Somehow these fields must map to vo endpoints. Taking ATLAS as an example, we need to recover something like site_token, e.g. "GRIF-LPNHE_PRODDISK". Perhaps the DQ2 team, who presumably know how this mapping is done in the opposite direction, can comment? [D.T. Added to recommendations as open issue.]

Secondly, what is the rationale for choosing which fields are in the started message and which are in the completed message?

For example, vo is in the completed message not the started message. It seems to me that if any of the fields are available at the time of the started message then they should be included there rather than in the completed message. That way, if the completed message is never received we can more easily investigate the issue. [D.T. Added to recommendations.]

Thirdly, I see only a single field for reason of failure. What sort of value do we expect to have in f_reason?

It would be useful to have both an error code and the error message (truncated if necessary). [D.T. Added to recommendations.]

Tony Wildish (transcribed from email dated 04/05/2011)

I've taken a look at the transfer monitoring wiki, and have a couple of comments/questions.

1) I suggest putting timestamps in epoch seconds (or milli/microseconds) rather than a human-readable time. You probably get epoch time from FTS, and it's best to let the client format it if, and only if, they want to. PhEDEx will prefer epoch seconds for the application logic, and would have to convert back to it anyway. [D.T. Added to recommendations.]

2) for transfer states, I believe there are other states that are of interest to us, such as if a job is canceled before it starts. We should have messages for those sorts of conditions. [D.T. The current draft covers the following end states: OK, Error, Aborted. More state transitions can be added if required.]

3) Rather than have a separate message for each transfer-type, I would have a semi-fixed structure, like:

transfer_message = {
            type = 'Active',
            other_param1 = 'asdf', // specific to the type
            other_param2 = 1234    // specific to the type
           }

so that for all transfer_messages I need only look at the 'type' to know what to do with it. I think this will make it easier to add client-code for new types. I can add a new handler for a new type, and maintain a constant core that simply dispatches different types to their handlers, without having to know what they are. If the type is encoded in the message-name, I can't do that. [D.T. The message type is distinguished by topic/queue name so unwanted messages can easily be ignored. Common attributes between message types should be consistent. e.g. t_id, afqn, time. Added to recommendations.]

E.g, if we decide that 'Pending' or 'Waiting' are also interesting transitions, they can be added, and I can process them, with no impact on my core code.
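
The dispatch pattern suggested here can be sketched in Python as follows (the handler names and return values are hypothetical, for illustration only):

```python
def dispatch(message, handlers):
    """Route a transfer message to a handler chosen by its 'type'
    field; unknown types are ignored, so new types can be added
    without touching this core code.
    """
    handler = handlers.get(message.get("type"))
    if handler is None:
        return None  # unknown type: skip, not fatal
    return handler(message)

# hypothetical handlers, for illustration only
handlers = {
    "Active": lambda m: ("active", m.get("other_param1")),
    "Pending": lambda m: ("pending", None),
}
```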

4) I'm a little unsure of the intent of all the information in the transfer-completed message. There are some redundant values there (transfer started/ended/duration, pick one and drop it!) and others that are not interesting (transfer average throughput). I would avoid calculating quantities that the user may not need and could calculate for themselves if they did. It will reduce your CPU overhead as well as wire-weight of the messages. [D.T. Added to recommendations.]

You could also drop the 'channel type', since the user should be able to look this up once and cache it, it's not needed with every completion message. Presumably such information can be obtained from the API? [D.T. Added to recommendations as open issue.]

5) the 'Transfer id' of a message, does this identify the FTS transfer job, or the individual file within that job? [D.T. Added to recommendations as open issue.]

Recommended changes to draft 15/06/2011

This section contains a summary of the recommended changes to the draft messages based on feedback on the draft messages.

General recommendations

  1. Replace all string timestamps by UTC seconds since Unix epoch.
    e.g. '2011-06-15T10:05:53.934122' is replaced by 1308132353.934122
    [JSON has no standard date format and it is easy and efficient to work with UTC epoch seconds. As clarification, see Python example below.]

    Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
    >>> import time
    >>> from datetime import datetime
    >>> s = time.time()
    >>> s # UTC seconds since Unix epoch
    1308132353.9341221
    >>> d = datetime.utcfromtimestamp(s)
    >>> d # Python UTC datetime object
    datetime.datetime(2011, 6, 15, 10, 5, 53, 934122)
    >>> d.isoformat() # ISO 8601 format string
    '2011-06-15T10:05:53.934122'
    >>>

"Transfer started message" recommendations

  1. Move any information that is available at transfer start from "Transfer complete message" to "Transfer started message".
    e.g. "vo", "channel", "srm_space_token", ...

"Transfer complete message" recommendations

  1. Add "time" // UTC seconds of the report.
    [To be consistent with other messages.]
  2. Add "f_code": 123 // Error code of failure.
    [Assuming such an error code exists, this supplements "f_reason", which is the error message, truncated if necessary.]
  3. Remove calculated values:
    • "t_duration"
      [Assuming this is just t_completed - t_started.]
    • "throughput"
      [Assuming this is just b_transferred/1024/(t_completed-t_started).]

Open issues

  1. How to translate from host, surl, channel, srm_space_token to VO-specific endpoint names such as "GRIF-LPNHE_PRODDISK"? [Julia will check that endpoint identifiers in the messages are sufficient to recover VO-specific names via BDII]
  2. Can "t_channel" in "Transfer complete message" be derived from "channel" or an API? [We will leave this in the message even if it is redundant because it simplifies the client code.]
  3. Does "t_id" identify an FTS job or an individual file within the FTS job? ["t_id" identifies the FTS job. Michail will investigate how to identify the individual file which the message is about.]

Info about dedicated message broker

Brokers

| hostname | state | ActiveMQ version | DNS alias | Monitoring links | Administrative UI |
| gridmsg007 | test | 5.5.1-fuse-01-11 | dashb-mb-test.cern.ch | | https://dashb-mb-test.cern.ch/admin/index.jsp |
| gridmsg107 | production | 5.5.1-fuse-01-06 | dashb-mb.cern.ch | | https://dashb-mb.cern.ch/admin/index.jsp |
| gridmsg108 | production | 5.5.1-fuse-01-06 | dashb-mb.cern.ch | | https://dashb-mb.cern.ch/admin/index.jsp |

Ports

  • STOMP+SSL: 6162 (consumer)
  • STOMP: 6163 (producer)
  • OpenWire+SSL: 6167
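
For illustration, the STOMP protocol spoken on these ports is frame-based and simple to serialize. A Python sketch of STOMP 1.0 frame construction (a real consumer should use an existing STOMP client library; the queue name follows the conventions used on this page):

```python
def stomp_frame(command, headers, body=""):
    """Serialize a STOMP 1.0 frame: the command line, one header per
    line, a blank line, the body, and a NUL terminator.

    A protocol sketch only, not a full client implementation.
    """
    lines = [command] + ["%s:%s" % (k, v) for k, v in headers]
    return "\n".join(lines) + "\n\n" + body + "\x00"

# e.g. subscribing to the FTS completion queue
sub = stomp_frame("SUBSCRIBE",
                  [("destination", "/queue/transfer.fts_monitoring_complete"),
                   ("ack", "auto")])
```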

Use-case details can be found here

Topics and queues on the test broker related to the Global WLCG transfer

| use-case | topic | queues |
| fts transfers | transfer.fts_monitoring_complete | transfer.fts_monitoring_complete |
| | transfer.fts_monitoring_start | transfer.fts_monitoring_complete |
| | transfer.fts_monitoring_queue_status | transfer.fts_monitoring_status |
| | transfer.fts_monitoring_rejected | |

Topics and queues on the production broker related to the Global WLCG transfer

| use-case | topic | queues |
| fts transfers | transfer.fts_monitoring_complete | Consumer.dashb.transfer.fts_monitoring_complete |
| | transfer.fts_monitoring_start | Consumer.dashb.transfer.fts_monitoring_start |
| | transfer.fts_monitoring_queue_status | ? |
| | transfer.fts_monitoring_rejected | |

Alarms

Alarms should be sent to dashb-mb-alarms@cern.ch; the current rule is that an alarm is raised when more than 5000 messages are stuck in the queues.

FTS message structure

Sample messages to demonstrate the structure/content:

fts_monitoring_start (extracted 2011-11-18T10:31:03 UTC):

{
    "agent_fqdn":"fts501.cern.ch",
    "transfer_id":"RAL-CERN__2011-11-18-1031_OlJgnE",
    "endpnt":"https://fts-pilot-service.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer",
    "timestamp":"1321612263000.000000",
    "src_srm_v":"2.2.0",
    "dest_srm_v":"2.2.0",
    "vo":"cms",
    "src_url":"srm://srm-cms.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/cms/store/LoadTest07/LoadTest07_RAL/LoadTest07_T1_UK_RAL_D2",
    "dst_url":"srm://srm-cms.cern.ch/castor/cern.ch/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_RAL/CERN/267/LoadTest07_RAL_D2_1tmRHS6x0eui1vgh_267",
    "src_hostname":"srm-cms.gridpp.rl.ac.uk",
    "dst_hostname":"srm-cms.cern.ch",
    "src_site_name":"RAL-LCG2",
    "dst_site_name":"CERN-PROD",
    "t_channel":"RAL-CERN",
    "srm_space_token_src":"",
    "srm_space_token_dst":"CMS_DEFAULT"
}

fts_monitoring_complete (extracted 2011-11-18T10:31:17 UTC):

{
    "tr_id":"RAL-CERN__2011-11-18-1031_14JIbi",
    "endpnt":"https://fts-pilot-service.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer",
    "src_srm_v":"2.2.0",
    "dest_srm_v":"2.2.0",
    "vo":"cms",
    "src_url":"srm://srm-cms.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/cms/store/LoadTest07/LoadTest07_RAL/LoadTest07_T1_UK_RAL_46",
    "dst_url":"srm://srm-cms.cern.ch/castor/cern.ch/cms/store/PhEDEx_LoadTest07/LoadTest07_Debug_RAL/CERN/267/LoadTest07_RAL_46_C7Kwk7iFTAEYo71h_267",
    "src_hostname":"srm-cms.gridpp.rl.ac.uk",
    "dst_hostname":"srm-cms.cern.ch",
    "src_site_name":"RAL-LCG2",
    "dst_site_name":"CERN-PROD",
    "t_channel":"RAL-CERN",
    "timestamp_tr_st":"1321612277000.000000",
    "timestamp_tr_comp":"1321612307000.000000",
    "timestamp_chk_src_st":"1321612264000.000000",
    "timestamp_chk_src_ended":"1321612271000.000000",
    "timestamp_checksum_dest_st":"1321612309000.000000",
    "timestamp_checksum_dest_ended":"1321612309000.000000",
    "t_timeout":"1800",
    "chk_timeout":"1800",
    "t_error_code":"",
    "tr_error_scope":"",
    "t_failure_phase":"",
    "tr_error_category":"",
    "t_final_transfer_state":"Ok",
    "tr_bt_transfered":"2684354560",
    "nstreams":"5",
    "buf_size":"0",
    "tcp_buf_size":"0",
    "block_size":"0",
    "f_size":"2684354560",
    "time_srm_prep_st":"1321612263000.000000",
    "time_srm_prep_end":"1321612276000.000000",
    "time_srm_fin_st":"1321612307000.000000",
    "time_srm_fin_end":"1321612309000.000000",
    "srm_space_token_src":"",
    "srm_space_token_dst":"cms:CMS_DEFAULT",
    "t__error_message":""
}

fts_monitoring_queue_status (extracted 2011-11-24T13:45:12 UTC):

{
    "fts_id":"https://vtb-generic-32.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer",
    "time":"1322142312000.000000",
    "vo":{
        "voname":"dteam",
        "channel":{
            "channel_name":"CERN-CERN",
            "channel_type":"",
            "links":[
                {
                    "source_host":"lxbra1910.cern.ch",
                    "dest_host":"lxbra2502.cern.ch",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"lxbra2502.cern.ch",
                    "dest_host":"lxbra2502.cern.ch",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"lxbra1910.cern.ch",
                    "dest_host":"lxbra1910.cern.ch",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                }
            ]
        },
        "channel":{
            "channel_name":"CERN-DESY",
            "channel_type":"",
            "links":[
                {
                    "source_host":"lxbra1910.cern.ch",
                    "dest_host":"ennis.desy.de",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"lxbra1910.cern.ch",
                    "dest_host":"dublin.desy.de",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"lxbra1910.cern.ch",
                    "dest_host":"cork.desy.de",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"lxbra1910.cern.ch",
                    "dest_host":"galway.desy.de",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                }
            ]
        },
        "channel":{
            "channel_name":"DESY-CERN",
            "channel_type":"",
            "links":[
                {
                    "source_host":"dublin.desy.de",
                    "dest_host":"lxbra1910.cern.ch",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"cork.desy.de",
                    "dest_host":"lxbra1910.cern.ch",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"ennis.desy.de",
                    "dest_host":"lxbra1910.cern.ch",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"galway.desy.de",
                    "dest_host":"lxbra1910.cern.ch",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                }
            ]
        },
        "channel":{
            "channel_name":"DESY-DESY",
            "channel_type":"",
            "links":[
                {
                    "source_host":"galway.desy.de",
                    "dest_host":"galway.desy.de",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"ennis.desy.de",
                    "dest_host":"ennis.desy.de",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"cork.desy.de",
                    "dest_host":"cork.desy.de",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                },
                {
                    "source_host":"dublin.desy.de",
                    "dest_host":"dublin.desy.de",
                    "active":"0",
                    "ratio":"0.000000%",
                    "ready":"0"
                }
            ]
        }
    }
}
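
Note that this sample repeats the "channel" key inside the "vo" object; a standard JSON parser such as Python's json.loads keeps only the last occurrence. A sketch showing how a consumer could preserve all occurrences using the object_pairs_hook parameter of json.loads:

```python
import json

def parse_channels(raw):
    """Parse a queue-status document with repeated "channel" keys,
    collecting every occurrence instead of silently keeping the last.

    Every value ends up wrapped in a list (one entry per occurrence
    of its key), which is the price of handling duplicates generically.
    """
    def keep_duplicates(pairs):
        out = {}
        for key, value in pairs:
            out.setdefault(key, []).append(value)
        return out
    doc = json.loads(raw, object_pairs_hook=keep_duplicates)
    return doc["vo"][0]["channel"]  # list of all channel objects
```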

Further Information

ATLAS DDM transfer callback rate

The following shows the rate of transfer callbacks (calculated over 1 minute bins) received by ATLAS DDM Dashboard for 5th March 2011. The chosen day has high load but is not atypical, see next plot.

callback_rate_1_day.png

The following shows the daily average, maximum and minimum rate of transfer callbacks (calculated over 10 minute bins) received by ATLAS DDM Dashboard from August 2010 to August 2011.

callback_rate_1_year.png

Average size of messages from FTS: Start Message: 0.6 kB; End Message: 1.6 kB.

XROOTD monitoring

XrootdMonitoring. The implementation of the XROOTD monitoring foresees three levels of hierarchy: site, federation and global.

Monitoring at the site level is implemented in the framework of the Tier3 monitoring project and is currently under validation by the ATLAS pilot sites. More information can be found in the attached talk. Installation and documentation instructions can be found at https://svnweb.cern.ch/trac/t3mon/wiki/xRootdAndGangliaDetailed. xrootd has built-in monitoring functionality which enables reporting of monitoring data via UDP.

There are currently two data flows enabled: the summary ("smry") one and the detailed one. The summary data flow does not contain per-file information and therefore does not provide enough information about transferred files, users, etc. The summary flow is currently used by the MonALISA xrootd monitoring repository. In order to have a complete view, CMS (UCSD) developed a collector which reads the detailed flow and creates per-file reports. Initially these reports were sent via UDP. Recently, a new version of the UCSD collector was created which reports to ActiveMQ. The UCSD collector is currently deployed one per federation (US ATLAS and US CMS), but the deployment policy can be flexible depending on the size of the federation and the volume of reported data. Reports sent to ActiveMQ are consumed by the WLCG Transfer Dashboard, the CMS popularity service and the new federation monitoring system. The latter should provide a complete view of all information required for operating an xrootd federation and for federation-member site support teams. Requirements for this system are being collected from the ATLAS xrootd community (see the presentation of Rob Gardner in the attachment).

The overall data flow is presented in the schema below.

* DataFlowFedMon.jpg:
DataFlowFedMon.jpg

Data flow for the federation-level monitor

XrootdFedDataFlow.jpg

XRootD - prototype file-access report message format

Example of a message currently sent by xrdmon on a file-close event. It is shown in plain text for readability; JSON is already used for the actual ActiveMQ messages. Some comments are inserted into the message itself using the // syntax.

#begin
// 0. Unique record ID constructed from current time; required by Gratia
//
unique_id=xrd-1341778615736000
//
// 1. Information about file and data-transfer
//
file_lfn=/atlas/dq2/user/ilijav/HCtest/user.ilijav.HCtest.1/group.test.hc.NTUP_SMWZ.root
file_size=797152257
start_time=1341778586
end_time=1341778615
// These byte-counts are reported by the server in the close message when xrootd.monitor contains the 'files' flag.
// They can be rounded up (if the number does not fit in the available number of bits).
read_bytes_at_close=797152257
write_bytes_at_close=0
// These are accumulated by the collector from individual read / write request
// traces that are only sent when xrootd.monitor contains the 'io' flag (requires (or maybe implies) also 'files').
read_bytes=797152257
read_operations=190
read_min=234497
read_max=8388608
read_average=4195538.194737
read_sigma=418468.184710
read_single_bytes=797152257
read_single_operations=190
read_single_min=234497
read_single_max=8388608
read_single_average=4195538.194737
read_single_sigma=418468.184710
read_vector_bytes=0
read_vector_operations=0
read_vector_min=0
read_vector_max=0
read_vector_average=0.000000
read_vector_sigma=0.000000
read_vector_count_min=0
read_vector_count_max=0
read_vector_count_average=0.000000
read_vector_count_sigma=0.000000
write_bytes=0
write_operations=0
write_min=0
write_max=0
write_average=0.000000
write_sigma=0.000000
//
// 2. Information about user / session
//
user_dn=
user_vo=atlas
user_role=
user_fqan=usatlas
client_domain=ochep.ou.edu
client_host=tier2-01
server_username=/DC=org/DC=doegrids/OU=People/CN=John R. Hover 47116
app_info=
//
// 3. Information about server
//
server_domain=usatlas.bnl.gov
server_host=dcdoor12
#end
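The plain-text record above is a simple key=value layout between #begin and #end markers, with // comment lines. A minimal, hypothetical sketch of parsing such a record into a dictionary (field names follow the example; the real ActiveMQ messages are JSON, so this only illustrates the layout shown here):

```python
# Hypothetical sketch: parse a plain-text xrdmon file-close record into a dict.
# Splitting on the FIRST '=' only matters for values that themselves contain
# '=' (e.g. the certificate DN in server_username).

def parse_record(text):
    """Collect key=value lines between #begin and #end, skipping // comments."""
    record = {}
    in_record = False
    for line in text.splitlines():
        line = line.strip()
        if line == "#begin":
            in_record = True
            continue
        if line == "#end":
            break
        if not in_record or line.startswith("//") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on first '=' only
        record[key] = value
    return record

sample = """#begin
unique_id=xrd-1341778615736000
file_size=797152257
read_bytes_at_close=797152257
write_bytes_at_close=0
#end"""

rec = parse_record(sample)
print(rec["file_size"])  # 797152257
```

All values come out as strings; a real consumer would convert the byte counts and timestamps to integers as needed.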

Notes:

  • The message is composed of three "sections":
    1. file and file-access information;
    2. user / session /client information;
    3. information about xrootd server that served the file / session.
  • About byte-counts:
    1. read/write_bytes_at_close are reported by the server and are rounded up if the numbers are too big.
    2. Other read_, read_single_ and read_vector_ fields are summed up internally by the collector when io traces are enabled. The read_ fields without single or vector are just the summed values of the single and vector entries and are redundant (they remain because they were the first thing implemented).
    3. For read_vector_ fields, one vector request is counted as one operation. Further, the read_vector_count_ fields give statistics about the number of file chunks asked for in a single operation.
    4. Configuring GSI authentication in xrootd is somewhat tricky as there is no common rule for how security plugins map various elements into the XrdSecEntity structure that is then passed down to monitoring. We (US CMS) use grid-mapfiles on some sites and GUMS via the xrootd-lcmaps authentication plugin (and get consistent results for those cases). Depending on what ATLAS decides to use, another iteration in XrdSecGsi might be needed.

Additional questions:

  • What happens if something goes wrong with file reading/transferring? Do we get the file-close message at all?

It depends on how it happens, but Gled will always send at least some info.

a) Client session gets disconnected -- the server will send the file-close message and all goes as normal.

b) File/session close UDP message is lost: there is a setting for the inactivity timeout for a given user; the default is, I think, 24 hours. Once this is reached, all the files get reported as closed. Fields that report the server-side read/write amount are zero in this case.

c) Server goes down / gets restarted. Again, we have a timeout for no messages from a server. Additionally, servers can be configured to send periodic "server identification" messages. When these are in use, we can detect that a server went down much faster (we, US CMS, send an ident every 5 min and consider a server to be down when three consecutive messages do not arrive; the time delta between idents is measured).
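The liveness rule in case c) can be sketched as follows. This is a hypothetical illustration (class and method names are not the actual Gled/collector API), assuming an ident period of 5 minutes and a threshold of three consecutive missed idents:

```python
# Hypothetical sketch of the server-liveness rule described above:
# a server sends an "ident" every IDENT_PERIOD seconds and is declared
# down once more than MISSED_LIMIT ident intervals elapse with no ident.

IDENT_PERIOD = 300   # 5 minutes, as used by US CMS
MISSED_LIMIT = 3     # three consecutive missed idents => server down

class ServerWatch:
    def __init__(self, now):
        self.last_ident = now

    def on_ident(self, now):
        # Record the arrival time of a "server identification" message.
        self.last_ident = now

    def is_down(self, now):
        # Down when more than MISSED_LIMIT ident intervals have passed
        # since the last ident was received.
        return (now - self.last_ident) > MISSED_LIMIT * IDENT_PERIOD

w = ServerWatch(now=0)
w.on_ident(0)
print(w.is_down(600))    # False: only two intervals elapsed
print(w.is_down(1000))   # True: more than three intervals elapsed
```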

  • In the native xrootd monitoring flow we have no info about failures. However, if we plot timed-out transfers/accesses we might be able to get an idea about efficiency. Is that correct?

Matevz could add a status flag telling whether a proper close record was received. To a first approximation:

if a proper close record was received and (read_bytes_at_close != 0 || write_bytes_at_close != 0), the transfer is OK; otherwise something went wrong (recall that timeout-closed files have zero server-side byte counts, see case b above). This can be used to calculate an efficiency.
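A minimal sketch of this efficiency approximation, assuming a boolean close flag and treating zero server-side byte counts (as for timeout-closed files in case b above) as a failed or incomplete transfer. Function and field names are illustrative:

```python
# Hypothetical sketch: classify a transfer as OK when a proper close record
# was received AND the server reported a non-zero byte count (timeout-closed
# files have zero server-side counts), then compute an efficiency.

def transfer_ok(close_reported, read_bytes_at_close, write_bytes_at_close):
    return close_reported and (read_bytes_at_close > 0 or write_bytes_at_close > 0)

def efficiency(transfers):
    """transfers: list of (close_reported, read_at_close, write_at_close) tuples."""
    if not transfers:
        return 0.0
    ok = sum(1 for t in transfers if transfer_ok(*t))
    return ok / len(transfers)

sample = [
    (True, 797152257, 0),   # normal read, closed properly -> OK
    (False, 0, 0),          # no close record -> failed
    (True, 0, 0),           # timeout-close, zeroed counts -> failed
    (True, 0, 12345),       # write, closed properly -> OK
]
print(efficiency(sample))  # 0.5
```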

Discussion about close to real time monitoring

  • What is the suggestion for the format for a more real-time view?

Matevz suggests looking into http://xrootd.t2.ucsd.edu:4243/?no_same_site

  • Summary of the mail discussion

Close-to-real-time information should be in the form of time-based events (~once per minute). What is interesting is the number of reads/writes, both single and vector ones. This is contained in the file-close events; however, a file may stay open for a long time, in which case a single close-file report is not enough. Seeks are useless and should not be included in the aggregated reports. What should be included in the time-based events is the file id and the bytes transferred. Matevz suggested to just send the summed-up single and vector reads over the last time bin for the files that had changed. Ilija agreed that this is what is needed, but the problem is that, according to Andy, this would require significant changes in xrootd and it is not the top priority for the xrootd development. The discussion is still ongoing.
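The suggested time-binned aggregation can be sketched as follows. This is a hypothetical illustration (event fields and bin size are assumptions, not an agreed message format): per one-minute bin, sum the single and vector read bytes for each file that changed in that bin.

```python
# Hypothetical sketch of the per-minute aggregation discussed above:
# for each time bin, report (file_id -> bytes read) summed over the
# single and vector reads that occurred in that bin.
from collections import defaultdict

BIN_SECONDS = 60

def aggregate(events):
    """events: iterable of dicts with 'time' (unix seconds), 'file_id',
    'read_single_bytes' and 'read_vector_bytes'."""
    bins = defaultdict(lambda: defaultdict(int))
    for ev in events:
        bin_start = ev["time"] - ev["time"] % BIN_SECONDS
        bins[bin_start][ev["file_id"]] += (
            ev["read_single_bytes"] + ev["read_vector_bytes"])
    return bins

events = [
    {"time": 10, "file_id": "f1", "read_single_bytes": 100, "read_vector_bytes": 50},
    {"time": 30, "file_id": "f1", "read_single_bytes": 200, "read_vector_bytes": 0},
    {"time": 70, "file_id": "f2", "read_single_bytes": 10, "read_vector_bytes": 5},
]
agg = aggregate(events)
print(agg[0]["f1"])   # 350
print(agg[60]["f2"])  # 15
```

Only files that appear in a bin are reported, matching the suggestion to send data only for files that had changed.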

Requirements and what is realistic to implement based on the current data flow

| *Requirement* | *Scope* | *Doable or not, based on what we can get from xrootd monitoring data* |
| Redirection statistics: fraction of time accesses are local, redirected within region, cloud, or global | Relevant both for WLCG monitor and federation-level monitor | OK, given we have a proper topology in place |
| Authentication successes/failures | Federation-level monitor | Currently this info is not available in the data reported to ActiveMQ, but it is available in the ML repository, therefore there is a way to retrieve it; it might be part of the summary flow, to be checked |
| Number of files opened | Federation-level monitor | Not possible to have it close to real time if we have only file-close reports. Should be possible if we also get reports when the file is opened; check with Matevz whether this kind of report can be added |
| Distinguish direct access versus copy | Relevant either for popularity or for federation-level monitoring, not relevant for Global Transfer Dashboard | Should provide the ability to understand whether a file was accessed directly from the remote storage, or whether instead a copy request was issued and then the file was accessed at the local site. In principle, to compare direct vs copy it should be enough to compare the number of read bytes to the number of transferred bytes, but the number of read bytes should not include local access to files which had been transferred beforehand. The first approximation would be to consider only remote reading |
| Distinguish local versus WAN | Relevant both for WLCG monitor and federation-level monitor | OK, since we can tell the difference based on server and client domains |
| Statistics for files actually used and mode of access | Federation-level monitor | OK, already available in popularity; can be enabled in the federation-level monitor as well |
| User statistics for direct access versus copy | Relevant either for popularity or for federation-level monitoring, not relevant for Global Transfer Dashboard | Similar to the number of bytes, but calculated per user, or in terms of users |
| For brokerage, cost matrix | - | To decide from where in the federation it will be most efficient to get a file, the job broker will need to know the cost of each transfer. Currently we define the cost in the simplest way: the number of seconds spent copying a well-defined amount of data from one federated site to another |
| Ranking plots, sites by data | Relevant both for WLCG monitor and federation-level monitor | OK |
| File lifetime distributions by site | - | Not clear whether this belongs here; where is this info supposed to come from, and how do we know when a file was deleted? |
| "Active" data volume at a site, absolute and as a fraction of capacity, where an "active" file is one used in the last X weeks/months | Federation-level monitor, or rather the popularity application | This requirement implies 1) keeping a long history for every file access (in order to define active files), which might not be a good idea for the federation-level monitor, and 2) knowledge of the storage capacity at every site; where should the latter come from? Not sure whether it belongs to federation monitoring; it might rather be a VO-level popularity application which can get aggregated daily data from the federation-level monitor through an API. On the other hand, we can enable transparent navigation from the federation-level monitor to the popularity plots/tables |
| Plot of file age at deletion (cleanup), and plot of average file age at deletion by site | Federation-level monitor, or rather the popularity application | Same as previous; looks like it belongs to a VO-level popularity application. Needs data about data cleaning at the site; where is that supposed to come from? |
| Site availability metrics and ranking plots | SSB | Though the natural place for this information is the SSB, the federation-level monitor should provide transparent navigation to the SSB plots, generating the required links from the federation monitor UI |
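The simple cost definition used for the brokerage cost matrix (seconds to copy a well-defined amount of data between two federated sites) can be sketched as below. The reference volume of 1 GB and the function name are assumptions for illustration:

```python
# Hypothetical sketch of the simple brokerage cost definition: the cost of
# a source->destination pair is the number of seconds needed to copy a fixed
# reference amount of data, estimated from an observed transfer.

REFERENCE_BYTES = 1024 ** 3  # assumed reference volume: 1 GB

def transfer_cost(observed_bytes, observed_seconds):
    """Seconds to copy REFERENCE_BYTES at the observed rate."""
    rate = observed_bytes / observed_seconds  # bytes per second
    return REFERENCE_BYTES / rate

# A site pair that moved 2 GB in 100 s costs 50 s per reference GB.
print(transfer_cost(2 * 1024 ** 3, 100))  # 50.0
```

A broker would fill a matrix of such costs for all source/destination pairs and pick the source with the lowest cost.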

Integration of ALICE xrootd traffic into WLCG transfer dashboard

To be completed by Sergey

Federation-level monitoring

Federation-level monitoring is currently being implemented for two persistency backends: ORACLE and Hadoop. ORACLE does not match the per-federation deployment model, but is used for quick prototyping of the functionality (understanding which metrics are required, aggregation levels, etc.) and of the user interface.

Federation-level monitoring should contain :

  • transfer throughput metrics (similar to ones in the WLCG Transfer Dashboard)
  • metrics which are currently available in the MonAlisa repository (so that users would have everything in one single place)
  • user access information (popularity-like)

User access information

To be completed by Alex

Importing data from the ML repository

To be completed by Sergey

Federation-level monitoring with ORACLE backend

One DB schema is created per federation: currently one for the CMS federation and another for the ATLAS one. Links to the UI are to be provided.

Federation-level monitoring with Hadoop/Hbase backend

Topic attachments
I Attachment History Action Size Date Who Comment
JPEGjpg DataFlowFedMon.jpg r1 manage 50.9 K 2012-08-01 - 12:55 JuliaAndreeva  
PDFpdf Requirements.pdf r1 manage 258.1 K 2012-08-01 - 13:09 JuliaAndreeva  
JPEGjpg Schema_xrootd_monitoring-1.jpg r1 manage 2629.8 K 2012-03-28 - 11:19 JuliaAndreeva  
Unknown file formatpptx Tier3Mon_slides_for_wlcg_transfer_monitoring_page.pptx r1 manage 176.9 K 2012-03-29 - 14:17 ArtemPetrosyan  
JPEGjpg XrootdFedDataFlow.jpg r1 manage 58.2 K 2012-08-01 - 12:59 JuliaAndreeva  
Topic revision: r37 - 2012-10-23 - JuliaAndreeva