Site

Question / Response
Site and Endpoints
What is the site name? INFN-T1
Which endpoint URLs do your archival systems expose? srm://storm-fe.cr.cnaf.infn.it for atlas; srm://storm-fe-cms.cr.cnaf.infn.it for cms; srm://storm-fe-lhcb.cr.cnaf.infn.it for lhcb; root://alice-xrootd-tsm.cr.cnaf.infn.it for alice
How is tape storage selected for a write (choice of endpoint, specification of a spacetoken, namespace prefix)? By endpoint and path. GPFS policies define the mapping between paths and tape pools.
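A minimal sketch of what "selection by endpoint and path" means from the client side, using the endpoints listed above; the namespace path in the example is a hypothetical placeholder, and the actual path-to-pool mapping is defined server-side by GPFS policies.

```python
# Illustrative sketch only: endpoints are taken from this survey, but the
# example path is hypothetical; path-to-tape-pool mapping is server-side.

# Archival endpoints per VO, as listed above.
TAPE_ENDPOINTS = {
    "atlas": "srm://storm-fe.cr.cnaf.infn.it",
    "cms":   "srm://storm-fe-cms.cr.cnaf.infn.it",
    "lhcb":  "srm://storm-fe-lhcb.cr.cnaf.infn.it",
    "alice": "root://alice-xrootd-tsm.cr.cnaf.infn.it",
}

def tape_url(vo: str, path: str) -> str:
    """Build a destination URL: the write is routed to tape by endpoint and
    by path (GPFS policies map paths to tape pools on the server side)."""
    if vo not in TAPE_ENDPOINTS:
        raise ValueError(f"unknown VO: {vo}")
    return TAPE_ENDPOINTS[vo] + path

# Example (hypothetical path under a tape-backed namespace):
# tape_url("atlas", "/atlas/tape/mc23/dataset/file.root")
```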
Queue
What limits should clients respect?  
---> Max number of outstanding requests in number of files or data volume We have experienced a queue of 100000 files to recall via SRM (StoRM), which was handled correctly. We do not know of a limit for xrootd.
---> Max submission rate for recalls or queries Up to 15 Hz
---> Min/Max bulk request size (srmBringOnline or equivalent) in files or data volume We can support bulk recall requests of up to 100000 files (see the sketch at the end of this section). In terms of data volume, a single bulk can fill the disk buffer in front of the tapes; the buffer size differs among the 4 LHC VOs.
Should clients back off under certain circumstances? Yes
---> How is this signalled to client? In case of massive repeated errors on almost all the requests, the tape administrators may ask users to stop their activity.
---> For which operations? For all operations.
Is it advantageous to group requests by a particular criterion (e.g. tape family, date)? Yes. Grouping requests by tape family would reduce the number of mounts of the same volume within a short period.
---> What criterion? Grouping as many requests as possible
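A client-side pacing sketch based on the limits above: bulks of up to 100000 files, a submission rate of at most 15 Hz, and grouping by tape family where it is known. The submit_bulk() callable and the tape-family metadata are hypothetical placeholders; how a client learns the tape family of a file is site- and VO-specific.

```python
# Sketch, assuming the limits quoted in this survey; submit_bulk() is a
# hypothetical wrapper around srmBringOnline or an equivalent bulk call.
import time
from itertools import groupby

MAX_BULK_FILES = 100_000      # supported bulk recall size
MIN_SUBMIT_INTERVAL = 1 / 15  # stay at or below ~15 Hz submission rate

def submit_recalls(files, submit_bulk):
    """files: iterable of (tape_family, surl) pairs.
    submit_bulk: callable taking a list of SURLs to recall in one request."""
    # Group by tape family first, to reduce repeated mounts of the same volume.
    ordered = sorted(files, key=lambda f: f[0])
    last_submit = 0.0
    for _, group in groupby(ordered, key=lambda f: f[0]):
        surls = [surl for _, surl in group]
        # Split each group into bulks no larger than the supported maximum.
        for i in range(0, len(surls), MAX_BULK_FILES):
            wait = MIN_SUBMIT_INTERVAL - (time.monotonic() - last_submit)
            if wait > 0:
                time.sleep(wait)
            submit_bulk(surls[i:i + MAX_BULK_FILES])
            last_submit = time.monotonic()
```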
Prioritisation
Can you handle priority requests? Not at the level of users or groups of users. We can handle priority at the VO level.
---> How is this requested? The administrators can manually assign more or fewer tape drives to specific VOs. We are working on an orchestrator, integrated into our tape system (GEMSS, IBM Spectrum Protect), that would dynamically assign drives to VOs on the basis of their requests and previous usage. This would optimize the usage of shared tape drives (all our production tape drives are shared among the experiments).
Protocol support
Are there any unsupported or partially supported operations (e.g. pinning) ? We support pinning.
Timeouts
What timeouts do you recommend? We recommend not setting any timeouts.
Do you have hardcoded or default timeouts? The default timeout of the backend system (GEMSS) is 4 days, but it can be changed by administrators.
Operations and metrics
Can you provide total sum of data stored by VO in the archive to 100TB accuracy? Yes
Can you provide space occupied on tapes by VO (includes deleted data, but not yet reclaimed space) to 100TB accuracy? Yes
How do you allocate free tape space to VOs? Tape manager software (IBM Spectrum Protect) allocates a new volume from a shared scratch pool.
What is the frequency with which you run repack operations to reclaim space on tapes after data deletion? We perform space reclamation after scheduled deletion campaigns by the experiments. Otherwise, we reclaim space when we notice that a certain number of volumes are full but have an occupancy below 80%.
Recommendations for clients
Recommendation 1 It would be useful to know the expected data flow during the year, in terms of writing to and reading from tape. This would help us plan the purchase of the number of tapes needed to fulfil pledges and to optimize the usage of resources shared among experiments. Major writing or reading activities should be announced in advance, as already sometimes happens.
---> Information required by users to follow advice  
Recommendation 2 In general, the correct usage of tape resources is to write mainly custodial data, limiting as much as possible the writing of data that will later be removed, since intense repack activity is resource-consuming and could limit production performance. In any case, it is recommended to write non-custodial data to dedicated storage pools, in order to limit the amount of data to repack after deletions.
Buffer Management
Should a client stop submitting recalls if the available buffer space reaches a threshold? Yes. Generally, when a high threshold is reached, GEMSS triggers the GPFS garbage collector, which removes files from the buffer starting from the oldest ones. It can happen that the file system is full (up to the garbage collector high threshold) of files that have been written to the buffer but not yet migrated to tape, e.g. when the writing rate to disk is higher than the migration rate to tape. The same happens if the buffer is full (up to the garbage collector high threshold) of recalled files that are all pinned until a date in the future. In both of these cases, or in a combination of them, the garbage collector cannot remove any file.
---> How can a client determine the buffer used and free space? SRM publishes these metrics for each storage area.
---> What is the threshold (high water mark)? At the moment there is no threshold set for clients at CNAF, but it would be desirable. The client high threshold should be slightly higher (e.g. 1% higher) than the one used by the garbage collector, which depends on the file system: alice 95%, atlas 89%, cms 97%, lhcb 97% (see the sketch at the end of this section).
---> When should the client restart submission (low water mark)? The client should restart submission when the occupancy has dropped to a percentage slightly lower (e.g. 1%-2% lower) than the garbage collector high threshold.
If the client does not have to back off on a full buffer, and you support pinning, how is the buffer managed? When the garbage collector runs, it removes files that are no longer pinned, starting from the oldest ones. If the buffer is full (up to the garbage collector high threshold) of recalled files that are all pinned until a date in the future, the garbage collector cannot remove any file.
---> Is data moved from buffer to another local disk, either by the HSM or by an external agent? Not automatically.
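A sketch of the client-side back-off logic described above, assuming the client's high water mark is set about 1% above the per-filesystem garbage-collector thresholds quoted in this section and that submission resumes 1%-2% below the garbage-collector threshold. The get_buffer_usage() helper is a hypothetical stand-in for querying the used/free space that SRM publishes per storage area.

```python
# Sketch only: thresholds come from this survey; get_buffer_usage() is a
# hypothetical helper returning the fraction of buffer space in use (0.0-1.0).

# Garbage-collector high thresholds per file system, as quoted above.
GC_HIGH_THRESHOLD = {"alice": 0.95, "atlas": 0.89, "cms": 0.97, "lhcb": 0.97}

def watermarks(vo: str):
    gc = GC_HIGH_THRESHOLD[vo]
    high = min(gc + 0.01, 1.0)   # stop submitting recalls above this
    low = gc - 0.02              # resume once occupancy drops below this
    return high, low

def should_submit(vo: str, currently_paused: bool, get_buffer_usage) -> bool:
    """Return True if the client may (re)start submitting recalls."""
    high, low = watermarks(vo)
    usage = get_buffer_usage(vo)
    if currently_paused:
        return usage < low       # restart only below the low water mark
    return usage < high          # keep submitting until the high water mark
```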
Additional questions
Should any other questions appear in subsequent iterations of this survey? Just a clarification: the first question of the "Buffer Management" section should refer to both recalls and migrations (currently it refers to recalls only).

-- OliverKeeble - 2018-01-30
