SLS Tape Metrics Review - OBSOLETE (newer version here)
Introduction
This page contains documentation related to the WLCG SLS tape questionnaire and review presented at the WLCG Management Board meeting in November 2010. The goal is to review the existing SLS tape metrics and ensure that they are suited to reporting how well and how efficiently a site is performing, that they provide appropriate information from the user (experiment) perspective for improving operations, and that they allow meaningful comparisons between tape MSS sites.
SLS tape metrics questionnaire
Questionnaire Results
Of the 17 sites/VOs contacted, 16 returned a completed questionnaire. A handful of conclusions can be drawn from the answers received:
- All T0/T1 sites have their own internal tape monitoring in place, which is used for daily operations. The SLS tape metrics are used for "exporting" availability/efficiency metrics rather than for internal use; for several sites, it is not clear who the target audience is supposed to be.
- Most sites look at the SLS plots periodically, but tend to look only at their own SLS graphs, mostly to check that the SLS export is working correctly (see previous point).
- Among the 3 VOs that responded to the questionnaire, interest in the SLS tape metrics is currently moderate: one VO does not look at them and the other two do so only occasionally.
- The metrics considered relevant by the VOs differ from those considered relevant by the sites:
  - Among the core metrics, the VOs are most interested in availability, average file size and overall data transfer rate. The sites pay attention to tape repeat mounts and volume per mount, which the VOs do not consider important.
  - Among the other existing metrics, the VOs are particularly interested in total volume read/written and tape access (queueing) time.
  - As for new metrics that could be added, the VO wish list comprises file queue length, disk cache failure rate, and tapes containing inactive (not recently accessed) data. Drive transfer efficiency and drives used for housekeeping are seen as interesting only by the sites.
Initial proposal
The questionnaire suggests that the SLS tape views could be made more attractive by clearly defining the VOs as the target audience. The metrics could then be readjusted to match what the VOs want to see. This would significantly simplify the current metric set: the many metrics that are considered irrelevant or internal to the sites would be dropped, leaving the half-dozen metrics the VOs actually want to follow.
The proposed core metrics to retain are:
- Availability (definition to be reviewed)
- Average file size
- Average data transfer rate
- Average access queueing time
- Fraction of inactive data
- Total data stored
The proposed metric set is not tape-specific and could equally be used for non-tape archival storage (e.g. cloud-based storage). The reporting frequency and the exact specification of each metric can be worked out once the metric set has been agreed.
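Because the exact specification of each metric is still open at this stage, the following Python sketch shows only one possible way the six retained metrics could be computed from raw storage records. The record types (`StoredFile`, `Transfer`), the list of boolean availability probes, the function `core_metrics` and the one-year inactivity threshold are all hypothetical assumptions for illustration; they are not part of SLS or of the agreed metric definitions. Note that nothing in it depends on the storage being tape-based.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record types: field names are illustrative, not an SLS schema.
@dataclass
class StoredFile:
    size_bytes: int
    last_access: float  # UNIX timestamp of the most recent read or write

@dataclass
class Transfer:
    requested_at: float  # when the request entered the queue
    started_at: float    # when the data transfer actually began
    finished_at: float   # when the data transfer completed
    size_bytes: int

def core_metrics(files, transfers, probes, now, inactive_after_s=365 * 86400):
    """Compute the six proposed core metrics under assumed (hypothetical) definitions."""
    total_stored = sum(f.size_bytes for f in files)
    # Data counts as inactive if not accessed within the (assumed) threshold window.
    inactive = sum(f.size_bytes for f in files if now - f.last_access > inactive_after_s)
    moved = sum(t.size_bytes for t in transfers)
    busy = sum(t.finished_at - t.started_at for t in transfers)
    return {
        # Availability: fraction of monitoring probes that found the service up.
        "availability": sum(probes) / len(probes) if probes else 0.0,
        "avg_file_size_bytes": total_stored / len(files) if files else 0.0,
        # Aggregate rate: total bytes over total transfer time, not a mean of per-file rates.
        "avg_transfer_rate_bytes_per_s": moved / busy if busy > 0 else 0.0,
        # Queueing time: delay between request and start of the transfer.
        "avg_queue_time_s": mean(t.started_at - t.requested_at for t in transfers) if transfers else 0.0,
        "inactive_fraction": inactive / total_stored if total_stored else 0.0,
        "total_stored_bytes": total_stored,
    }
```

For example, with two files of 2 GB and 4 GB where only the larger one has not been accessed for over a year, `core_metrics` reports an average file size of 3 GB and an inactive fraction of two thirds. Computing the transfer rate as total bytes over total transfer time, rather than averaging per-file rates, keeps many small files from dominating the figure; whether that is the right choice is exactly the kind of detail left to the specification.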
Excel summary sheet
Per-site / VO questionnaire responses
Proposal specification
A specification for the above proposal can be found here:
TapeMetricsSpec
Proposal implementation status
See TapeMetricsImplementation for the current implementation status of each T0/T1 site.
More information
--
GermanCancio