ASGC | |
BNL | Please send bulk requests; we prefer to do pre-staging |
CCIN2P3 | |
CERN | |
FNAL | |
GSDC-KISTI | For the moment, no limits are enforced on the client (experiment) side. |
INFN-CNAF | |
JINR | Only physical tape limits (all tapes) |
KIT-GridKa | |
NDGF | |
NIKHEF-SARA | |
NRC-KI | |
PIC | For read access, the requests should come in large bulks if possible |
STFC-RAL | |
Triumf | We accept any kind of recall, but prefer bulk requests. We purposely delay processing of requests in order to accumulate them into bulks |
ASGC | |
BNL | In theory, unlimited. We observed a record maximum of 245k requests, which were processed smoothly; it took about 5 days to complete. For our own reference: STAR, 2016-09-28. |
CCIN2P3 | > 100 K |
CERN | infinite |
FNAL | Queue depth is ~15k; if full, clients retry |
GSDC-KISTI | |
INFN-CNAF | infinite |
JINR | |
KIT-GridKa | |
NDGF | In theory unlimited, but not tested above a few million |
NIKHEF-SARA | No immediate limit. |
NRC-KI | |
PIC | No limit. But if the requests come through SRM, there is a limit of 15k requests per VO. |
STFC-RAL | infinite |
Triumf | No exact number; one peak during an ATLAS test was in the hundreds of thousands of requests, with no problem for us |
ASGC | |
BNL | Min: prefer no fewer than 1000. Max is unlimited, in theory. Try sending us as many as possible. |
CCIN2P3 | Min: 1k, Max: 100k |
CERN | Up to about 10 Hz. |
FNAL | no limit |
GSDC-KISTI | |
INFN-CNAF | |
JINR | |
KIT-GridKa | |
NDGF | As many as fit in an SRM request, 1k-10k I think. No limit on rate. |
NIKHEF-SARA | |
NRC-KI | |
PIC | We allow anything from a single request up to unlimited, but we recommend grouping requests into bulks of >= 1k. |
STFC-RAL | ~10Hz |
Triumf | 5k-30k is good, even 1k is OK, but a few at a time is not welcome |
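
A minimal sketch of the bulk-size and rate advice given above: group recall requests into chunks of at least ~1k files and submit at no more than ~10 Hz. The submit_bulk() function is a hypothetical placeholder for whatever staging client an experiment actually uses, and the numbers are taken from the answers above; adapt them per site.

```python
import time

CHUNK_SIZE = 1000   # "grouping requests >= 1k" (PIC), "5k-30k is good" (Triumf)
MAX_RATE_HZ = 10    # "Up to about 10 Hz" (CERN), "~10Hz" (STFC-RAL)

def submit_bulk(chunk):
    # Hypothetical placeholder for the experiment's bulk staging call.
    print("staging %d files" % len(chunk))

def paced_submission(files):
    # Split the full file list into bulks and pace the submissions.
    for i in range(0, len(files), CHUNK_SIZE):
        submit_bulk(files[i:i + CHUNK_SIZE])
        time.sleep(1.0 / MAX_RATE_HZ)   # stay under the advertised request rate

paced_submission(["srm://example.org/tape/file%06d" % n for n in range(5000)])
```
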
ASGC | |
BNL | same as above |
CCIN2P3 | > 100 TB |
CERN | 1 to 1000. The upper limit is not hard, but since SRM is based on XML, larger counts make request handling heavier. |
FNAL | no limit |
GSDC-KISTI | |
INFN-CNAF | |
JINR | |
KIT-GridKa | |
NDGF | See above. |
NIKHEF-SARA | |
NRC-KI | |
PIC | We allow anything from a single request up to unlimited, but we recommend grouping requests into bulks of >= 1k. |
STFC-RAL | 1-1000. Maybe we should check if anyone has submitted 1000 |
Triumf | A few TB to 200 TB |
ASGC | |
BNL | |
CCIN2P3 | |
CERN | YES |
FNAL | |
GSDC-KISTI | In case of maintenance, we may ask the clients to pause their activity |
INFN-CNAF | Yes |
JINR | |
KIT-GridKa | |
NDGF | Yeah |
NIKHEF-SARA | No |
NRC-KI | |
PIC | Yes. The system is dimensioned to work well, taking into account the PIC Tier-1 size and the experiments' expectations of the site. If the load is very high, problems might appear. |
STFC-RAL | YES |
Triumf | Ideally no; our HSM is able to handle hundreds of thousands of requests without load problems. However, there is a hard limit from the disk buffer size: we do not use any extra disk buffer for tape operations, so the disk buffer used for tape is shared with ATLAS (dCache HSM pools) and space is limited. Realistically speaking, a few TB to 200 TB a day is good enough, though it can reach 500 TB |
ASGC | |
BNL | |
CCIN2P3 | SRM_INTERNAL_ERROR at request level and SRM_FILE_BUSY at file level returned by SRM; the xrootd server stalls the client. |
CERN | SRM_INTERNAL_ERROR at request level and SRM_FILE_BUSY at file level returned by SRM; the xrootd server stalls the client. |
FNAL | |
GSDC-KISTI | We inform the experiment management directly via e-mail |
INFN-CNAF | SRM_INTERNAL_ERROR at request level and SRM_FILE_BUSY at file level returned by SRM; the xrootd server stalls the client. |
JINR | |
KIT-GridKa | |
NDGF | According to standard SRM signalling |
NIKHEF-SARA | |
NRC-KI | |
PIC | If the requests come through SRM, refusals occur when the requests reach 15k. This is an SRM limit to protect the service. Reaching the limit is an exceptional situation that rarely happens. |
STFC-RAL | SRM_INTERNAL_ERROR at request level and SRM_FILE_BUSY at file level returned by SRM; the xrootd server stalls the client. Or through admin processes: sysadmins communicating with the experiments |
Triumf | through SRM |
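
The back-pressure signals listed above (SRM_INTERNAL_ERROR at request level, SRM_FILE_BUSY at file level, xrootd stalls) all amount to "try again later" from the client's point of view. The sketch below shows one possible client-side reaction, assuming a hypothetical request_stage() call that maps those signals onto simple status strings; it is not the API of any real SRM or xrootd library.

```python
import time

def request_stage(chunk):
    # Hypothetical placeholder: a real client would map SRM_INTERNAL_ERROR,
    # SRM_FILE_BUSY or an xrootd stall onto return values like these.
    return "OK"

def stage_with_backoff(chunk, max_attempts=6):
    delay = 60                              # start with a one-minute wait
    for _ in range(max_attempts):
        status = request_stage(chunk)
        if status == "OK":
            return True
        # Refused at request level or busy at file level: back off and retry.
        time.sleep(delay)
        delay = min(delay * 2, 3600)        # exponential backoff, capped at one hour
    return False

stage_with_backoff(["srm://example.org/tape/file%d" % n for n in range(1000)])
```
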
ASGC | |
BNL | |
CCIN2P3 | For SRM, any synchronous operation. For xrootd, any operation can be stalled by the server. |
CERN | For SRM, any synchronous operation. For xrootd, any operation can be stalled by the server. |
FNAL | |
GSDC-KISTI | Maintenance, e.g. an urgent security update or a required upgrade of systems (xrootd clusters or backend filesystems) |
INFN-CNAF | For SRM, any synchronous operation. For xrootd, any operation can be stalled by the server. |
JINR | |
KIT-GridKa | |
NDGF | The ones giving error |
NIKHEF-SARA | |
NRC-KI | |
PIC | If 15k is reached through SRM, read/writes are affected. |
STFC-RAL | Potentially all |
Triumf | Depends on how ATLAS launches and checks requests |
ASGC | |
BNL | Yes, we constantly see repeat mounts of ATLAS tapes. A tape might be re-mounted within less than 15 minutes, with over 20 remounts a day, which really should be avoided. We try not to delay any request, but we may have to implement a way to delay processing of such frequently mounted tapes. |
CCIN2P3 | |
CERN | YES, absolutely. This helps to avoid requesting the same tape over and over again in a short period of time. |
FNAL | |
GSDC-KISTI | |
INFN-CNAF | Yes. Grouping requests by tape family would reduce the mounts of the same volume in a short period. |
JINR | |
KIT-GridKa | |
NDGF | Not really |
NIKHEF-SARA | No, the system will optimize recalls |
NRC-KI | |
PIC | For writing, the disk servers are configured to send bunches of files per tape family, which also reduces tape re-mounts. For reads, this helps reduce the number of tape re-mounts, since datasets are stored on tapes according to predefined tape families. |
STFC-RAL | YES |
Triumf | Tape families are grouped by data type, dataset, and date |
ASGC | |
BNL | Please do pre-staging: send us all requests at once, and send them fast. This is the best practice for handling sequential-access media. |
CCIN2P3 | Group requests by creation time in dCache: data written at the same time are grouped on the same tapes, so reading data according to creation time will help reduce mounts/dismounts of the same tapes. |
CERN | Simply grouping as many requests as possible should be enough |
FNAL | |
GSDC-KISTI | |
INFN-CNAF | Grouping as many requests as possible |
JINR | |
KIT-GridKa | |
NDGF | Roughly grouped by time might help a bit, but not much |
NIKHEF-SARA | |
NRC-KI | |
PIC | By tape family. |
STFC-RAL | |
Triumf | Data within a dataset will wait for at least 45 hours if the dataset size does not exceed a tape's capacity, or will be processed once the dataset size exceeds a single tape's capacity. Different datasets are packed together by project and data type, for example data17_900GeV or mc15_5TeV, and are further grouped by data type (datatape, mctape, etc.) |
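
Several answers above recommend grouping recall requests by tape family, dataset, or creation time before submission. The sketch below shows one possible client-side grouping, assuming purely for illustration that the parent directory of each file identifies its dataset; real tape-family mappings are site-specific and would replace dataset_key().

```python
from collections import defaultdict

def dataset_key(path):
    # Assumption for illustration: the parent directory of a file identifies
    # its dataset, and datasets map onto tape families.
    return path.rstrip("/").rsplit("/", 1)[0]

def group_by_dataset(paths):
    groups = defaultdict(list)
    for p in paths:
        groups[dataset_key(p)].append(p)
    return groups

files = [
    "srm://example.org/atlas/data17_900GeV/dataset_A/file1",
    "srm://example.org/atlas/data17_900GeV/dataset_A/file2",
    "srm://example.org/atlas/mc15_5TeV/dataset_B/file1",
]
for dataset, chunk in group_by_dataset(files).items():
    print(dataset, len(chunk))   # each chunk would then go out as one bulk request
```
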
ASGC | |
BNL | Yes we can |
CCIN2P3 | No, the tape archive is shared between all VOs and we do not handle priorities. However, all recall requests coming from dCache and xrootd benefit from our tape queuing system (TREQS: Tape Request Scheduler) |
CERN | YES |
FNAL | |
GSDC-KISTI | No. So far we have not been asked about any priority-related matters, since we only support one experiment (ALICE) for now. |
INFN-CNAF | Yes |
JINR | No. |
KIT-GridKa | |
NDGF | No |
NIKHEF-SARA | No |
NRC-KI | |
PIC | Yes, Enstore allows modifying the priority of a specific request |
STFC-RAL | YES |
Triumf | Quite often the tape system is quiet, so there has been no need yet; there is also no priority flag in ATLAS operations. It can be implemented if a particular circumstance is identified |
ASGC | |
BNL | Any tape that has at least one request flagged as high priority will be placed at the front of the queue. A prioritized tape will wait and get the next available drive. All priority tapes are processed with the same selected logic: by demand, FIFO, or LIFO. |
CCIN2P3 | |
CERN | Selected groups of users might have higher priorities than others. However, this is balanced between experiments. Contact Castor.Support@cern.ch. |
FNAL | |
GSDC-KISTI | |
INFN-CNAF | The administrators can manually assign more or fewer tape drives to specific VOs. We are working on an orchestrator, integrated into our tape system (GEMSS, IBM Spectrum Protect), that would dynamically assign drives to VOs on the basis of their requests and previous usage. This would optimize the usage of shared tape drives (all our production tape drives are shared among the experiments). |
JINR | |
KIT-GridKa | |
NDGF | If this is a strong request from one of our VOs, we would look at implementing it |
NIKHEF-SARA | |
NRC-KI | |
PIC | This is only available for admin purposes. VOs are typically using the tape system with the same priority level. |
STFC-RAL | In practice, administratively. Typically, to prioritise recalls for a given user/VO, we will allocate more drives. If a lot of data needs to be recalled (petabytes), CASTOR admins can help reschedule recalls to be more efficient. |
Triumf | |