CHEP2012 ABSTRACT:
Title: Consistency between Grid Storage Elements and File Catalogs for the LHCb experiment's data
In the distributed computing model of WLCG, Grid Storage Elements (SEs) are by construction completely decoupled from the File Catalogs (FCs) where the experiment's files are registered.
Experience of managing large volumes of data in such an environment shows that inconsistencies often arise. When data are deleted from the FC but remain physically on the SE, disk space is wasted; in the opposite case, when data registered in the FC are not found on the SE, serious operational problems follow.
Therefore, the LHCbDirac data management system has been equipped with a new dedicated system that runs systematic checks to ensure that the data stored on the SEs are consistent with the information reported in the FCs.
The objective of the checks is to spot any inconsistency above a certain threshold, which cannot be due solely to the expected latency between data upload and registration, and in such cases to try to identify the problematic data.
The system relies on information provided by the sites, which should make a full dump of their SEs available to the experiment on a weekly or monthly basis.
The definition of a common format and procedure to produce the storage dumps has been coordinated with the other LHC experiments, in order to provide a solution as generic as possible that can suit all LHC experiments and reduce the effort for the sites that are asked to provide such data.
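The kind of check described above can be illustrated with a minimal sketch. It is not the LHCbDirac implementation: the function names, the dictionary layout of the dump and catalog (path mapped to a timestamp), the one-day latency window, the 1% threshold and the file paths are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def compare_dump_with_catalog(se_dump, fc_entries, dump_time,
                              latency=timedelta(days=1)):
    """Compare a storage dump with catalog entries (hypothetical format).

    se_dump:    dict mapping file path -> creation time on the SE
    fc_entries: dict mapping file path -> registration time in the FC
    Files newer than (dump_time - latency) are skipped, since they may
    simply be in flight between upload and registration.
    """
    cutoff = dump_time - latency
    # On the SE but not in the FC: wasted disk space.
    dark = sorted(p for p, created in se_dump.items()
                  if p not in fc_entries and created <= cutoff)
    # In the FC but not on the SE: operationally serious.
    lost = sorted(p for p, registered in fc_entries.items()
                  if p not in se_dump and registered <= cutoff)
    return dark, lost


def exceeds_threshold(n_inconsistent, n_total, threshold=0.01):
    """True if the inconsistent fraction is above the assumed threshold."""
    return n_total > 0 and n_inconsistent / n_total > threshold


if __name__ == "__main__":
    dump_time = datetime(2011, 10, 24)
    se_dump = {
        "/lhcb/a": datetime(2011, 10, 1),
        "/lhcb/b": datetime(2011, 10, 1),
        "/lhcb/new": datetime(2011, 10, 23, 18),  # inside the latency window
    }
    fc_entries = {
        "/lhcb/a": datetime(2011, 10, 1),
        "/lhcb/c": datetime(2011, 10, 2),
    }
    dark, lost = compare_dump_with_catalog(se_dump, fc_entries, dump_time)
    print("on SE only:", dark)
    print("in FC only:", lost)
    print("above threshold:", exceeds_threshold(len(dark), len(se_dump)))
```

Using timestamps on both sides lets recent files be excluded from either list, so that only inconsistencies that cannot be explained by upload and registration latency are reported.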
- Track: "Distributed Processing and Analysis on Grids and Clouds"
- Authors: Vincent Bernardoff, David Bouvet, Marco Cattaneo, Philippe Charpentier, Pete Clarke, Joel Closier, Ricardo Graciani, Elisa Lanciotti, Victor Mendez, Raja Nandakumar, Daniela Remenska, Stefan Roiser, Vladimir Romanovski, Roberto Santinelli, Federico Stagni, Andrei Tsaregorodtsev, Mario Ubeda Garcia, Alexey Zhelezov
--
ElisaLanciotti - 06-Oct-2011
Topic revision: r3 - 2011-10-24