Question | Response |
---|---|
Site and Endpoints | |
What is the site name? | CC IN2P3 |
Which endpoint URLs do your archival systems expose? | For ATLAS / CMS / LHC (dCache): srm://ccsrm.in2p3.fr and for ALICE (XRootD): root://ccxrdralice.in2p3.fr:1096/ |
How is tape storage selected for a write (choice of endpoint, specification of a spacetoken, namespace prefix)? | In dCache, different spacetokens are used to select the tape pools. The XRootD endpoint for ALICE is tape-only. |
Queue | |
What limits should clients respect? | |
---> Max number of outstanding requests in number of files or data volume | > 100 K |
---> Max submission rate for recalls or queries | |
---> Min/Max bulk request size (srmBringOnline or equivalent) in files or data volume | Files: max 100 K, min 1 K. Volume: > 100 TB |
Should clients back off under certain circumstances? | |
---> How is this signalled to client? | SRM returns SRM_INTERNAL_ERROR at request level and SRM_FILE_BUSY at file level. XRootD stalls the client. |
---> For which operations? | For SRM, any synchronous operation. For XRootD, any operation can be stalled by the server. |
Is it advantageous to group requests by a particular criterion (e.g. tape family, date)? | |
---> What criterion? | Group requests by creation time in dCache: data written at the same time are grouped on the same tapes. Reading data in order of creation time helps reduce repeated mounts and dismounts of the same tapes. |
Prioritisation | |
Can you handle priority requests? | No. The tape archive is shared between all VOs and we do not handle priorities. However, all recall requests coming from dCache and XRootD benefit from our tape queuing system (TREQS: Tape Request Scheduler). |
---> How is this requested? | |
Protocol support | |
Are there any unsupported or partially supported operations (e.g. pinning)? | |
Timeouts | |
What timeouts do you recommend? | Timeouts should be increased to 24 h in order to benefit from large bulk recalls. |
Do you have hardcoded or default timeouts? | The default dCache timeout per request on a tape pool is 14400 s (4 h). |
Operations and metrics | |
Can you provide total sum of data stored by VO in the archive to 100TB accuracy? | Yes, the accounting value is computed in bytes. |
Can you provide space occupied on tapes by VO (includes deleted data, but not yet reclaimed space) to 100TB accuracy? | Yes, it is the same value as above. |
How do you allocate free tape space to VOs? | We monitor the storage class usage of all VOs and allocate tapes in batches of 50-100 when a storage class runs short of tapes. |
What is the frequency with which you run repack operations to reclaim space on tapes after data deletion? | We run repack manually when tape filling is below 70-80 %. We also run repack when we suspect a tape of generating errors on recall. |
Recommendations for clients | |
Recommendation 1 | Run prestaging (i.e. SRM BRINGONLINE) on large datasets with a very large timeout value. |
---> Information required by users to follow advice | |
Recommendation 2 | |
Buffer Management | |
Should a client stop submitting recalls if the available buffer space reaches a threshold? | |
---> How can a client determine the buffer used and free space? | |
---> What is the threshold (high water mark)? | |
---> When should the client restart submission (low water mark)? | |
If the client does not have to back off on a full buffer, and you support pinning, how is the buffer managed? | |
---> Is data moved from buffer to another local disk, either by the HSM or by an external agent? | |
Additional questions | |
Should any other questions appear in subsequent iterations of this survey? | |
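
The queue, grouping and timeout answers above can be combined into a client-side planning step. The following is a minimal sketch only, not a site-provided tool: the `ArchivedFile` type, its `created` field and the function names are hypothetical, and the constants simply restate the values given in the table (at most 100 K files per bulk request, at least 1 K, a 24 h timeout, and the two SRM statuses that signal back-off). It orders files by creation time, so that files likely to sit on the same tapes are requested together, and then splits them into bulk requests.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Values restated from the survey answers; adjust if the site changes them.
MAX_FILES_PER_REQUEST = 100_000  # "max 100 K" files per bulk request
MIN_FILES_PER_REQUEST = 1_000    # "min 1 K" files per bulk request
RECALL_TIMEOUT_S = 24 * 3600     # recommended 24 h timeout
BACKOFF_STATUSES = {"SRM_INTERNAL_ERROR", "SRM_FILE_BUSY"}


@dataclass(frozen=True)
class ArchivedFile:
    """Hypothetical record for a file to recall."""
    path: str
    created: float  # creation time as epoch seconds (assumed to be known)


def plan_bulk_recalls(
    files: Iterable[ArchivedFile],
    max_files: int = MAX_FILES_PER_REQUEST,
) -> List[List[ArchivedFile]]:
    """Sort files by creation time and split them into bulk requests.

    The last batch may be smaller than MIN_FILES_PER_REQUEST; the caller
    decides whether to submit it or wait for more files.
    """
    ordered = sorted(files, key=lambda f: f.created)
    return [ordered[i:i + max_files] for i in range(0, len(ordered), max_files)]


def should_back_off(status: str) -> bool:
    """Return True for the SRM statuses the site uses to signal back-off."""
    return status in BACKOFF_STATUSES
```

Each returned batch would then be submitted as one bulk staging request with `RECALL_TIMEOUT_S` as its timeout, and a client would pause submission whenever `should_back_off` returns true for a returned status. How the request is actually submitted depends on the client tool in use and is deliberately left out of the sketch.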