Overview
The LHC experiments have requested both the LFC and DPM components to be
deployed in production during 2005.
While the software is of high quality, work is needed to improve the
deployment model of these components. In particular, the LFC must fit
into different roles for different experiments (e.g. as a local or global
catalogue component).
Also, many sites use MySQL as the database backend, especially
at the Tier-2 sites. We must provide a mechanism to back up such data
to the Tier-1 for archival storage.
This paper covers the following topics:
- How to deploy the LFC in the modes required by the experiments
- How to provide reliable backups of the data stored in MySQL for these components.
LFC in the Experiments' Computing Models
ALICE :
- AliEn as central metadata catalog
- LFC as local catalog at each site (Tier-0, Tier-1s and Tier-2s)
ATLAS :
- central "location" catalog
- LFC as local catalog at each site (Tier-0, Tier-1s and Tier-2s)
CMS :
- POOL catalog (flavour to be defined: MySQL, XML, LFC or other) as local site catalog
LHCb :
- LFC as central catalog
- LFC READ-ONLY replicas to be installed at selected Tier-1s, in the future
LFC Current Deployment
A Pilot LFC Service is available at CERN as part of the Service Challenge activity (see LfcInformation). It will be replaced by the Production LFC Service in September.
Architectural Components
In order to fit the experiments' computing models, we require the following
architectural building blocks, out of which experiment-specific solutions can be built.
Read-Write Global
To ensure failover: Oracle Master -> Oracle Slave. In case of a planned/unplanned outage of the Oracle Master, we could switch the Oracle Slave from READ-ONLY to READ-WRITE. The Slave becomes the Master until the first Master comes back online.
-> who really asked for this?
-> to be tested...
-> possible solutions: Data Guard, or Oracle Streams with the 3D project?
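As a rough, untested illustration only: with a Data Guard physical standby, activating the Slave as the new READ-WRITE Master could look something like the sketch below (the connect string is hypothetical, and the exact statements depend on the Oracle version and configuration):
<verbatim>
# Rough, untested sketch: activate the Oracle Slave (Data Guard physical
# standby) as the new READ-WRITE Master after an outage of the Master.
# The connect string is hypothetical; syntax depends on the Oracle version.
import cx_Oracle

standby = cx_Oracle.connect("sys/password@lfc_standby", mode=cx_Oracle.SYSDBA)
cur = standby.cursor()

# Apply whatever redo is still available, then activate the standby.
cur.execute("ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH")
cur.execute("ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE")
cur.execute("ALTER DATABASE OPEN")  # the Slave is now the READ-WRITE Master
</verbatim>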
Read-Write Local
The DPM could act as a local READ-WRITE LFC.
What does this imply at the LFC/DPM/EGEE BDII level?
- DPM changes required?
- LFC changes required?
- Info Sys changes required?
Read-Only Global Replicas
LHCb requires READ-ONLY global replicas of the LFC.
The replication would be Oracle to Oracle...
Per-VO slices are to be considered...
Read-Only Local Replicas
The replication would be Oracle to MySQL... possibly combined with the DPM...
The DPM could act as a local READ-ONLY LFC.
What does this imply at the LFC/DPM/EGEE BDII level?
Solution Choices
- Oracle Master -> Oracle READ-ONLY Slave (Data Guard, Streams)
- Oracle Master -> Oracle READ-ONLY Slave that can be turned into READ-WRITE (Data Guard, Streams)
In terms of database deployment, several possibilities exist:
| DB account | Pros | Cons |
| one per VO | entries belonging to one VO easily separable | as many LFC servers to set up/administer as there are VOs |
| one for all VOs | only one DB account to manage + few LFC servers to set up/administer | entries belonging to different VOs are not easy to separate from each other (unless there are separate tablespaces?) |
In the case of one DB account shared by all VOs:
| Tablespace | Pros | Cons |
| one for all VOs | only one tablespace to administer | VO entries not easily separable |
| one per VO | VO entries easily separable | many tablespaces to administer (is it a problem?) |
In the case of separate tablespaces, what to do with the special "/" and "/grid" entries? -> put them in another separate tablespace?
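To make the "one DB account + one tablespace per VO" option concrete, a setup script could look roughly like the sketch below (a hypothetical illustration, not a tested procedure: the account, datafile names/sizes and the extra "shared" tablespace for the "/" and "/grid" entries are all assumptions):
<verbatim>
# Hypothetical sketch: one tablespace per VO, plus a "shared" one for the
# special "/" and "/grid" entries. Account, file names and sizes are
# assumptions, not a tested recipe.
import cx_Oracle

conn = cx_Oracle.connect("system/password@lfc_db")  # hypothetical DSN
cur = conn.cursor()

for name in ["alice", "atlas", "cms", "lhcb", "shared"]:
    cur.execute("CREATE TABLESPACE lfc_%s "
                "DATAFILE 'lfc_%s.dbf' SIZE 500M AUTOEXTEND ON"
                % (name, name))
</verbatim>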
Per-VO Recommendation and Timescales
- ALICE :
- ATLAS :
- CMS :
- LHCb : Oracle to Oracle READ-ONLY replication using Data Guard (physical standby), as recommended by Nilo Segura. Needs to be tested. Timescale: October?
We should probably test the one DB account + many tablespaces option. Timescale: October.
MySQL Backups
As many sites use MySQL as the LFC/DPM database backend, we need to provide a standard procedure/tool to perform MySQL backups. In particular, the Tier-2s could ship their backups to their corresponding Tier-1, as the Tier-1s provide reliable storage.
Requirements:
- Volume :
- Rate :
- Tier-1 choice: the Tier-1 that the Tier-2 depends on
- DPM (per-host downloads... security...) -> ?
Problems:
- Security: when transferred to a Tier-1, the data needs to be protected. This is supposed to be handled by the Global Grid backup project (currently developed by Dimitar Shiyachki).
Solution Choices:
- MySQL replication + backups: have one Master and one Slave. The Slave serves as the Master's backup: it is regularly shut down and a cold backup is performed. Needs MySQL version 4 at least. (A sketch of this procedure is given after this list.)
- Using the future Global Grid backup: a dump of the database is regularly shipped from the Tier-2 to its corresponding Tier-1, as the Tier-1 provides reliable storage.
- Combination of options 1 and 2.
- Using InnoDB Hot Backup, to avoid having to run two MySQL servers: but it is not a free tool!
- Using the DPM
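To illustrate option 1, a cold backup of the Slave could be performed roughly as in the sketch below (paths and the init script name are assumptions, not a tested recipe; the Master keeps serving the LFC/DPM while the Slave is down):
<verbatim>
# Minimal sketch of a cold backup of the MySQL Slave (option 1).
# Paths and the init script name are assumptions.
import subprocess, time

DATADIR = "/var/lib/mysql"                         # assumed data directory
ARCHIVE = "/backup/lfc-%s.tar.gz" % time.strftime("%Y%m%d")

# 1. Shut the Slave down cleanly, so that the data files are consistent.
subprocess.check_call(["mysqladmin", "-u", "root", "shutdown"])

# 2. Archive the whole data directory: this is the "cold" backup.
subprocess.check_call(["tar", "czf", ARCHIVE, DATADIR])

# 3. Restart the Slave; it catches up with the Master automatically.
subprocess.check_call(["/etc/init.d/mysql", "start"])
</verbatim>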
There is also a cron job script written by Graeme Stewart, implementing the first solution, which allows you to back up your MySQL DB. You can find it, with instructions, here:
http://www.gridpp.ac.uk/wiki/MySQL_Backups
Recommendations and Timescales:
As a first step, we recommend option number 1, as it would be the fastest to put in place: the Master->Slave replication is already used in Lyon to perform the backups. If the tests are successful, this solution could realistically be made available to Tier-2s in early October.
When the Global Grid backup exists, the Slave can have its backups shipped to the Tier-1 reliable storage system.
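Until then, shipping a dump by hand could look roughly like this sketch (the GridFTP endpoint and local paths are pure assumptions; the actual Global Grid backup project may use a completely different mechanism):
<verbatim>
# Hypothetical sketch: ship a compressed mysqldump to the Tier-1 via GridFTP.
# The Tier-1 endpoint and local paths are assumptions.
import subprocess, time

DUMP = "/backup/lfc-%s.sql.gz" % time.strftime("%Y%m%d")
TIER1 = "gsiftp://se.tier1.example.org/backup/"  # hypothetical Tier-1 SE

# Dump all databases from the Slave and compress on the fly.
out = open(DUMP, "wb")
dump = subprocess.Popen(["mysqldump", "-u", "root", "--all-databases"],
                        stdout=subprocess.PIPE)
subprocess.check_call(["gzip", "-c"], stdin=dump.stdout, stdout=out)
dump.stdout.close()
dump.wait()
out.close()

# Transfer the dump to the Tier-1 storage element.
subprocess.check_call(["globus-url-copy", "file://" + DUMP, TIER1])
</verbatim>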
--
SophieLemaitre - 29 Aug 2005