Title: Data storage accounting and verification in LHC experiments.

All major experiments at the Large Hadron Collider (LHC) use the hierarchical multi-tier model of distributed data storage proposed by the LHC Computing Grid (LCG) project. Data produced at the Tier-0 at CERN are distributed to Tier-1 centers, which are responsible for safe archiving, processing, and serving data to other sites. Multiple replicas of the same data exist in different locations, and the experiments maintain central data catalogs to keep track of data placement. Each site maintains its local file namespace, which can be accessed via the general Storage Resource Management (SRM) protocol, available as an LCG middleware component, or by various other means inherent to the local storage technology and infrastructure.

To verify the consistency of the central catalogs, the experiments ask sites to provide a full list of the files they hold on storage, including size, checksum, and other file attributes. Such storage dumps, provided at regular intervals, give a realistic view of the storage resource usage by the experiments. Comparing the central file catalog with the content of the local storage system at a site helps to detect missing or corrupted files, as well as files residing on storage without being registered in the central catalogs. Regular monitoring of data consistency and space usage serves as an additional internal check of the system's functionality and performance. Both the importance and the complexity of these tasks increase with the constant growth of the total data volumes during the active data-taking period at the LHC.

Adopting a standard format for the storage dumps allows the tools written by the experiments to be more generic. The common solutions developed help to reduce maintenance costs both at the large Tier-1 facilities supporting multiple virtual organizations and at the small sites that often lack manpower.
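The catalog-versus-dump comparison described above can be sketched as a simple set operation over file records. This is an illustrative sketch only: the record fields and the three-way classification are assumptions for demonstration, not the actual WLCG storage-dump schema or the experiments' tools.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileRecord:
    """Illustrative file record; real dumps carry more attributes."""
    lfn: str        # logical file name
    size: int       # size in bytes
    checksum: str   # e.g. an adler32 digest

def compare(catalog, dump):
    """Classify files by comparing catalog records with a storage dump.

    Returns (missing, corrupted, dark):
      missing   -- registered in the catalog but absent from storage
      corrupted -- present in both, but size or checksum disagree
      dark      -- on storage but not registered in the catalog
    """
    cat = {r.lfn: r for r in catalog}
    dmp = {r.lfn: r for r in dump}
    missing = sorted(set(cat) - set(dmp))
    dark = sorted(set(dmp) - set(cat))
    corrupted = sorted(
        lfn for lfn in set(cat) & set(dmp)
        if (cat[lfn].size, cat[lfn].checksum)
           != (dmp[lfn].size, dmp[lfn].checksum)
    )
    return missing, corrupted, dark

# Hypothetical example data
catalog = [
    FileRecord("/store/data/a.root", 100, "aa11"),
    FileRecord("/store/data/b.root", 200, "bb22"),
    FileRecord("/store/data/c.root", 300, "cc33"),
]
dump = [
    FileRecord("/store/data/a.root", 100, "aa11"),  # consistent
    FileRecord("/store/data/b.root", 200, "beef"),  # checksum mismatch
    FileRecord("/store/data/d.root", 400, "dd44"),  # unregistered ("dark")
]
missing, corrupted, dark = compare(catalog, dump)
print(missing)    # ['/store/data/c.root']
print(corrupted)  # ['/store/data/b.root']
print(dark)       # ['/store/data/d.root']
```

In practice the same comparison must tolerate replicas in flight (files being transferred or deleted between the dump and the catalog snapshot), which is one reason the dumps are taken at regular, agreed intervals.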
We discuss the requirements for the common tasks of data storage accounting and verification and present the solutions adopted by the WLCG sites to provide regular storage reports that can be consumed equally by the different LHC experiments. We illustrate the experiment-specific implementations that the LHC experiments have built on the common framework, reflecting the differences in their computing models.

  • Track: "Distributed Processing and Analysis on Grids and Clouds"

  • Presentation type: Talk

  • Authors: Elisa Lanciotti, Natalia Ratnikova, Nicolo' Magini

-- ElisaLanciotti - 06-Oct-2011
