Introduction
"Grid Spillover" is defined as reconstruction of ATLAS data at locations other than the Tier-0 with a configuration that is designed to give results compatible with the prompt Tier-0 reconstruction.
Spillover may be used in two modes:
- Mode 1: the standard method of prompt reconstruction for certain data during data-taking. This mode is primarily foreseen for non-Main streams, in which case the data presented to the end user in physics containers will have been uniformly reconstructed in spillover.
- Mode 2: occasional use to relieve backlogs of data that are otherwise reconstructed at Tier-0. In this case, datasets presented to the end user in physics containers will contain a mixture of Tier-0 and non-Tier-0 data.
For 2018 operation, Data Preparation anticipates that the "delayed" BphysLS stream will be spilled over in Mode 1, while Mode 2 may be used for the Main stream if LHC conditions require it (in particular, sustained operation in a high-intensity 8b4e scenario).
Metadata
The spillover tasks and their output will need to be configured and tracked. The standard ATLAS mechanism for this is the AMI tag, which configures the jobs and appears in the output dataset names in Rucio. Bulk data processing on Tier-0 is configured with "f" AMI tags. The equivalent spillover configuration (including any Athena job option changes needed for Grid operation) will be done with "k" AMI tags. For example, the AMI tags f950 and k950 should correspond to the same Athena release, reconstruction options, database tags, etc., the only difference being that f950 is the Tier-0 version and k950 is the Grid spillover version. (No use case is foreseen for spillover versions of x- or c-tags.)
Ideally, the k-version of a tag will be created at the same time as the f-version. If a k-tag is created incorrectly, a revised k-tag will be made, and the corresponding f-tag will be cloned to increment its number, so that the f- and k-tag numbers remain in sync.
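A minimal sketch of the f-/k-tag pairing rule, assuming only that a simple AMI tag is a single letter followed by a number, as in f950/k950. The helper names `spillover_tag` and `unpaired_tags` are hypothetical and do not correspond to any AMI API; `unpaired_tags` flags tag numbers that exist in only one of the two series, which is the situation the cloning rule is meant to prevent.

```python
import re

# Assumed form of a simple AMI tag: one lowercase letter plus digits (e.g. "f950").
_TAG_RE = re.compile(r"^([a-z])(\d+)$")

def split_tag(tag: str) -> tuple[str, int]:
    """Split a simple AMI tag such as 'f950' into ('f', 950)."""
    m = _TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a simple AMI tag: {tag!r}")
    return m.group(1), int(m.group(2))

def spillover_tag(tier0_tag: str) -> str:
    """Map a Tier-0 f-tag to its Grid spillover k-tag (f950 -> k950)."""
    prefix, number = split_tag(tier0_tag)
    if prefix != "f":
        raise ValueError(f"spillover versions exist only for f-tags, got {tier0_tag!r}")
    return f"k{number}"

def unpaired_tags(existing: set[str]) -> set[str]:
    """Return f-tags lacking a k-tag with the same number, and vice versa.
    Tags of other types (x, c, ...) are ignored."""
    f_nums = {split_tag(t)[1] for t in existing if t.startswith("f")}
    k_nums = {split_tag(t)[1] for t in existing if t.startswith("k")}
    return {f"f{n}" for n in f_nums - k_nums} | {f"k{n}" for n in k_nums - f_nums}
```

For instance, a tag set containing f950, k950 and f951 would report f951 as still needing its k-version.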
Note that the Grid production system (Prodsys) determines the set of output files a job produces through its own mechanism rather than through AMI tags. An automatic system must therefore be set up to translate the output file types, in a stream-dependent way, from the AMI configuration to Prodsys.
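The translation step might look like the following sketch. The per-stream dictionary stands in for whatever the AMI configuration actually contains, the format name `DRAW_EXAMPLE` is a placeholder, and the `project.run.stream.recon.FORMAT.tag` dataset-name pattern is an assumption for illustration; none of this is the real Prodsys interface.

```python
# Hypothetical per-stream output configuration standing in for the AMI tag
# content. Stream and format names are illustrative placeholders.
AMI_OUTPUTS = {
    "default": ["AOD", "HIST"],
    "physics_Main": ["AOD", "HIST", "DRAW_EXAMPLE"],
}

def prodsys_outputs(ami_outputs: dict[str, list[str]], stream: str) -> list[str]:
    """Output types to declare in the Prodsys task for a given stream.
    Streams without an explicit entry fall back to the default list."""
    formats = ami_outputs.get(stream, ami_outputs["default"])
    return sorted(set(formats))  # deterministic and duplicate-free

def output_datasets(project: str, run: int, stream: str, ktag: str,
                    ami_outputs: dict[str, list[str]] = AMI_OUTPUTS) -> list[str]:
    """Build output dataset names using an assumed
    project.run.stream.recon.FORMAT.tag naming pattern."""
    return [f"{project}.{run:08d}.{stream}.recon.{fmt}.{ktag}"
            for fmt in prodsys_outputs(ami_outputs, stream)]
```

Keeping the mapping in a single table, keyed by stream, means the Prodsys side never has to be edited by hand when an f-/k-tag pair changes its outputs.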
Reconstruction Configuration
The spillover reconstruction configuration will necessarily differ from that used at Tier-0, as Grid nodes are configured differently and optimal operation will depend on the envelope of available resources. In particular, Grid nodes have much less memory per core. Given the memory requirements of reconstruction and the very high requirements on data integrity, a limited number of appropriately resourced Grid endpoints with a good reliability record are expected to be used. Changes to the Tier-0 reconstruction to make it more compatible with what runs on the Grid will only be made if their impact on Tier-0 throughput is positive.
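As a rough way to reason about memory per core, the sketch below assumes a simple AthenaMP memory model: pages still shared copy-on-write after the fork are counted once, and each worker adds its own private memory. The function names, the cores-times-memory-per-core allowance and all numbers are hypothetical; real footprints must come from measurement.

```python
def athenamp_memory_mb(n_workers: int, shared_mb: float,
                       private_mb_per_worker: float) -> float:
    """Approximate total memory of an AthenaMP job: shared pages counted
    once plus each worker's private pages (mother-process overhead ignored)."""
    return shared_mb + n_workers * private_mb_per_worker

def fits_queue(n_cores: int, shared_mb: float, private_mb_per_worker: float,
               mem_per_core_mb: float) -> bool:
    """True if one worker per core fits in the job's memory allowance,
    assumed here to be n_cores * mem_per_core_mb."""
    needed = athenamp_memory_mb(n_cores, shared_mb, private_mb_per_worker)
    return needed <= n_cores * mem_per_core_mb
```

With a hypothetical 1500 MB shared and 1200 MB private per worker, an 8-core job needs about 11100 MB: this fits a queue offering 2000 MB per core (16000 MB) but not one offering 1300 MB per core (10400 MB). Anything that moves memory from the private term into the shared term relaxes the constraint, which is one motivation for forking after the first event.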
Based on current information, the Grid reconstruction configurations approved by Data Preparation are, in order of priority:
- Option 1: (not yet functional) AthenaMP RAWtoALL using the "rewind-after-first" fork-after-first-event implementation by Vakho Tsulaia. This requires a monitoring code update;
- Option 2: AthenaMP RAWtoALL not in fork-after-first-event mode;
- Option 3: AthenaMP "conventional" workflow not in fork-after-first-event mode;
- Option 4: AthenaMP "conventional" workflow with fork-after-first-event mode only in the RAWtoESD step.
Option 4 has been run in Grid production before; however, it is known to create incompatibilities in the monitoring (HIST) output and is therefore very strongly disfavored for Mode 2 operation, in which data quality signoffs will have to be done on spillover output. Option 3 is the safest mode that has been run in Grid production and is the proposed workflow for initial Mode 1 operation. Option 1 is closest to reproducing the nominal Tier-0 workflow (apart from running in multiprocessing mode) and will be the most compatible with non-spillover output; ideally it would be deployed for any Mode 2 use.
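The selection logic above can be summarized as a walk down the priority list. This is a hypothetical sketch: the `grid_proven` flag encodes the statement that Options 3 and 4 have run in Grid production, marking Option 2 as not yet proven is an assumption inferred from the text, and treating Option 4 as excluded (rather than merely disfavored) for Mode 2 is a simplification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RecoOption:
    number: int
    description: str
    functional: bool        # can it run at all today?
    grid_proven: bool       # has it already run in Grid production?
    hist_compatible: bool   # is its HIST output usable for DQ signoff?

# Priority order as listed above; flag values follow the text
# (Option 2's grid_proven=False is an assumption).
OPTIONS = [
    RecoOption(1, "RAWtoALL, rewind-after-first fork", False, False, True),
    RecoOption(2, "RAWtoALL, no fork-after-first-event", True, False, True),
    RecoOption(3, "conventional, no fork-after-first-event", True, True, True),
    RecoOption(4, "conventional, fork-after-first-event in RAWtoESD", True, True, False),
]

def choose_option(mode: int, options: list[RecoOption] = OPTIONS) -> RecoOption:
    """Return the highest-priority option that is functional and proven.
    Mode 2 output is used for DQ signoff, so it also needs usable HIST output."""
    for opt in options:
        if not (opt.functional and opt.grid_proven):
            continue
        if mode == 2 and not opt.hist_compatible:
            continue
        return opt
    raise RuntimeError(f"no approved configuration available for Mode {mode}")
```

With the flags as written, both modes currently resolve to Option 3; once Option 1 is marked functional and proven, it is chosen for both modes, matching the stated preference for Mode 2.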
Validation
Validation of the output of Grid workflows can occur at several levels:
- verification of the correct number of total events in the output datasets. These basic checks should be implemented as part of the Tier-0 orchestration of the Grid tasks;
- comparisons of the output of equivalent data processed at Tier-0 and in spillover. It should be sufficient to compare the output of a short job processed with the f- and k-tags, although there may be reasons to compare full runs processed at Tier-0 and on the Grid;
  - HIST output is the "easy" way to do such a comparison, but is probably only reliable for Options 1 to 3 above (Option 4 is too divergent to be useful);
  - a bit-level comparison of POOL output is required, but needs a tool that can extract and compare individual events between single-core and AthenaMP output.
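The two automated checks above could be sketched as follows, assuming that per-event payload bytes (for example, a serialized event record) can already be extracted from the POOL files; building that extraction is exactly the missing tool noted in the list. Events are keyed by (run, event) number rather than by position in the file, since AthenaMP workers may write events in a different order than a single-core job. All function names are hypothetical.

```python
import hashlib

def events_complete(expected: int, produced: dict[str, int]) -> list[str]:
    """Return the output formats whose event count differs from `expected`."""
    return sorted(fmt for fmt, n in produced.items() if n != expected)

def digest_index(events):
    """Map (run, event) -> SHA-256 of the event payload, in one pass.
    `events` yields (run, event, payload_bytes). Repeated keys are recorded."""
    index, duplicates = {}, []
    for run, evt, payload in events:
        key = (run, evt)
        if key in index:
            duplicates.append(key)
        index[key] = hashlib.sha256(payload).hexdigest()
    return index, duplicates

def compare_outputs(tier0_events, spill_events):
    """Order-independent comparison of Tier-0 and spillover event content."""
    a, dup_a = digest_index(tier0_events)
    b, dup_b = digest_index(spill_events)
    return {
        "missing": sorted(a.keys() - b.keys()),    # in Tier-0 only
        "extra": sorted(b.keys() - a.keys()),      # in spillover only
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        "duplicates": sorted(set(dup_a) | set(dup_b)),
    }
```

HIST files are not event-based, so only event-bearing formats would be passed to `events_complete`.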
Guidelines for spillover use
Data Preparation anticipates that in 2018:
- Mode 1 spillover will be active for the complete BphysLS stream for the full year, and possibly switched on for other bulk-processed streams.
- Mode 2 spillover may be used for Main as needed; however, this will require explicit signoff from Data Preparation. If possible, multiple runs should be released to spillover at once so that they can be monitored together.
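A request check applying these guidelines might look like the sketch below. The `SpilloverRequest` type, the `check_request` helper and the stream name `physics_BphysLS` are illustrative assumptions; the enabled Mode 1 set is meant to be extended if other streams are switched on.

```python
from dataclasses import dataclass

# Streams enabled for Mode 1 spillover. The 2018 plan names BphysLS and
# allows others to be added; the stream-name format is an assumption.
MODE1_STREAMS = {"physics_BphysLS"}

@dataclass
class SpilloverRequest:
    mode: int                  # 1 or 2, as defined in the Introduction
    stream: str
    runs: list[int]
    dp_signoff: bool = False   # explicit Data Preparation approval

def check_request(req: SpilloverRequest, mode1_streams: set[str] = MODE1_STREAMS):
    """Return (errors, warnings) for a proposed spillover request."""
    errors, warnings = [], []
    if req.mode not in (1, 2):
        errors.append(f"unknown spillover mode {req.mode}")
    elif req.mode == 1 and req.stream not in mode1_streams:
        errors.append(f"{req.stream} is not enabled for Mode 1 spillover")
    elif req.mode == 2:
        if not req.dp_signoff:
            errors.append("Mode 2 requires explicit Data Preparation signoff")
        if len(req.runs) < 2:
            # A recommendation, not a hard rule, so it is only a warning.
            warnings.append("prefer releasing several runs together for joint monitoring")
    return errors, warnings
```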
The DP reprocessing coordinator(s) will work with designated ADC personnel to ensure the smooth and complete operation of the Grid tasks.
ADC shall prepare a list of Tier-1 sites and queues to be used for spillover tasks, as well as the Rucio rules required for replication. As far as possible, queues without strong memory constraints should be used.
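As an illustration of how such a list could be derived, the sketch below filters a hypothetical queue catalogue by tier, memory per core and recent reliability, and puts the least memory-constrained queues first. The `Queue` fields, the thresholds and the queue names are placeholders, not ADC's actual criteria or site names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Queue:
    name: str               # placeholder queue name
    tier: int               # 1 for Tier-1 sites
    mem_per_core_mb: int    # memory available per core
    reliability: float      # fraction of successful jobs in a recent window (assumed metric)

def spillover_queues(queues: list[Queue], min_mem_mb: int,
                     min_reliability: float) -> list[Queue]:
    """Return eligible Tier-1 queues, most memory per core first."""
    eligible = [
        q for q in queues
        if q.tier == 1
        and q.mem_per_core_mb >= min_mem_mb
        and q.reliability >= min_reliability
    ]
    # Prefer more memory per core, then higher reliability; name breaks ties.
    return sorted(eligible, key=lambda q: (-q.mem_per_core_mb, -q.reliability, q.name))
```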
--
PeterOnyisi - 2018-04-25