LCG Management Board


Tuesday 14 March 2006 at 16:00




(Version 2 - 20.3.2006)


A.Aimar (notes), D.Barberis, L.Bauerdick, S.Belforte, I.Bird, K.Bos, N.Brook, T.Cass, Ph.Charpentier, L.Dell’Agnello, I.Fisk, D.Foster, B.Gibbard, J.Gordon, A.Heiss, F.Hernandez, M.Lamanna, E.Laure, M.Litmaath, P.Mato, G.Merino, B.Panzer, Di Qing, L.Robertson (chair), J.Shiers, J.Templon

Action List

Next Meeting:

Tuesday 21 March 2006 from 16:00 to 17:00

1.      Minutes and Matters arising (minutes)


1.1         Minutes of Previous Meeting (more information)

Minutes approved.

1.2         GDB Search Committee members

Volunteers for the Search Committee were identified at the GDB meeting:

-            John Gordon

-            Simon Lin

-            Klaus-Peter Mickel

-            Jeff Templon


2.      Action List Review (list of actions)



No outstanding actions.


3.       Short Progress Reports (from last meeting)


3.1         Progress with data recording tests at CERN (transparencies) - Bernd Panzer


The network topology (slide 2) shows the internal LCG network, with 10 Gb connections between CPU, disk and tape servers. The current setup includes about 50 disk servers and 40 tape servers.


Several tests have been performed in the last months (slide 3) in order to validate the complete Tier-0 data flows. The SC3 throughput rerun (16 disk servers in Castor2, reaching 1 GB/s to the T1 sites) and the Castor2 Data Recording Challenge (which reached 950 MB/s) provided useful feedback and positive results.


Slide 4 describes the tests done to check throughput rates to tape. These tests confirm good usage of the tape drive capabilities (about 75% of nominal performance with 2 GB files). This performance depends largely on the number of streams involved, so it will be studied further in order to understand, and avoid, the risk of access congestion on the disk pools.


Slide 5: Tests of the Castor2 disk server throughput (input/output) reached 4.4 GB/s, a rate comparable to the 4.5 GB/s expected for the proton-proton running of all 4 experiments. More tests will be performed with an increased number of data streams (currently 250).


In the experiments’ computing models there are several use cases that need to be studied and verified. For instance, some use cases first copy and then open the files, while others open the files directly on the disk servers without copying. What are the implications of these different use cases? A future note will describe in detail the results obtained.

3.2         Rfio and/or rootd - how is the decision going to be made - Ian Bird


Rootd is currently available in Castor2 and dCache. The DPM implementation is being studied and it is not yet clear whether rootd can replace rfio in all experiments’ use cases, and for other (non-HEP) applications.


If rootd cannot be used for all LHC applications, then rfio will be maintained, and the conflicts between the different rfio versions used by DPM and Castor2 will have to be resolved by assigning human resources to develop a common implementation.


4.      SRM for SC4 and long-term implementation (transparencies)
Invited: Maarten Litmaath


Maarten Litmaath presented the proposal for supporting two SRM storage classes for the SC4 run, and the MB also discussed longer-term strategies regarding SRM.


For SC4 the experiments need “permanent” and “durable” storage classes with this meaning:

-          Permanent:
Data is stored on tape; the system manages the cache. Access can sometimes be slow if data has to be retrieved from tape.

-          Durable:
Data is stored on disk, which is managed exclusively by the VO - access is faster as the data is always disk-resident.


An additional SRM storage class "durable-permanent" could be needed in order to guarantee that "durable" data is also stored permanently on tape.


The attached transparencies are the support material for the discussion and the decisions that follow.


In order to reduce software development for SC4, the proposal (for SC4 only) is that storage systems may support the different classes in different ways. Castor2 will support only one storage class in each SE – that is, there will be two different hostnames (castor-durable and castor-permanent), one for each of the two classes supported (see slide 4). dCache (slide 5) will use a single hostname with two different file paths. DPM (slide 6) currently uses the SAType attribute and publishes the same path for each SAType.
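The three addressing conventions above can be sketched as follows. This is only an illustration: the hostnames and paths below are hypothetical placeholders, not the actual SC4 endpoints.

```python
# Sketch of how the two SC4 storage classes could be addressed on each
# back-end, per slides 4-6. All hostnames and paths are illustrative
# placeholders, not the real SC4 endpoints.

def surl(backend: str, storage_class: str, lfn: str) -> str:
    """Build an example SURL for a given back-end and storage class."""
    if backend == "castor2":
        # Castor2: one class per SE, distinguished by hostname.
        host = {"durable": "castor-durable.example.org",
                "permanent": "castor-permanent.example.org"}[storage_class]
        path = "/castor/example.org/grid"
    elif backend == "dcache":
        # dCache: a single hostname, two different file paths.
        host = "dcache.example.org"
        path = {"durable": "/pnfs/example.org/durable",
                "permanent": "/pnfs/example.org/permanent"}[storage_class]
    elif backend == "dpm":
        # DPM: the class is advertised via the SAType attribute in the
        # information system; the published path is the same for each SAType.
        host = "dpm.example.org"
        path = "/dpm/example.org/home"
    else:
        raise ValueError(f"unknown back-end: {backend}")
    return f"srm://{host}{path}/{lfn}"
```

For example, `surl("castor2", "durable", "myfile.dat")` yields a SURL on the durable hostname, while the same call for dCache yields the same hostname but a class-specific path.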


The future uniform solution may require data to be copied and may require re-cataloguing of data produced during SC4 (e.g. because of hostname or path changes). If needed, catalogue migration/clean-up tools will have to be implemented.


During the discussion the MB agreed that:

-            The idea of having a new class “durable-permanent” should not be implemented for SC4; it will be discussed again for the long-term implementation. Some sites may nevertheless back durable storage with tape in SC4.

-            DPM should advertise itself as “durable” in order to match the above meaning of the SRM storage classes.

-            The “wantPermanent” attribute should be ignored.


On the client software: GFAL and lcg-utils will use the attribute “permanent” by default. FTS will continue to work with complete SURLs (as now), so the VO must choose the transfer end-points explicitly.
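The client-side conventions above can be summarised in a small sketch. The function names below are hypothetical and do not correspond to the real GFAL/lcg-utils or FTS APIs; they only capture the agreed behaviour.

```python
# Sketch of the client-side rules agreed above. Function names are
# illustrative, not the actual GFAL/lcg-utils/FTS interfaces.
from typing import Optional

DEFAULT_CLASS = "permanent"  # GFAL and lcg-utils default storage class

def resolve_class(requested: Optional[str]) -> str:
    """GFAL/lcg-utils-style resolution: fall back to 'permanent'
    when the caller does not request a class explicitly."""
    return requested if requested is not None else DEFAULT_CLASS

def fts_transfer(source_surl: str, dest_surl: str) -> dict:
    """FTS-style transfer request: both end-points must be complete SURLs,
    so the VO chooses the storage class explicitly via hostname/path."""
    for s in (source_surl, dest_surl):
        if not s.startswith("srm://"):
            raise ValueError(f"FTS requires a complete SURL, got: {s}")
    return {"source": source_surl, "destination": dest_surl}
```

The design choice here is that the class decision sits entirely with the VO for FTS, while the utility libraries supply a safe default.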


The usage of multiple SAPaths of the same SAType should be investigated because some sites will probably implement such solutions. This use case is not needed for SC4 and therefore it will be taken into account when defining the long-term solution.


Note: The example on slide 10 should contain different hostnames for the “-d” option, as it shows the commands referring to durable and permanent data used via Castor2.


If the SAType requested and the type of the SE addressed do not match, ideally an error should be returned (slide 11). But CASTOR and dCache will ignore the flag (for SRM v1.1), so they will not return an error. Therefore clients may decide that a mismatch between the indicated class and the SAType of the indicated SAPath is an error.
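The mismatch rule can be sketched as below. Since the SRM v1.1 servers ignore the flag, the strict check would have to live on the client side; the names here are illustrative only.

```python
# Sketch of the class-mismatch rule: SRM v1.1 servers (CASTOR, dCache)
# ignore the storage-class flag, so a client may itself treat a mismatch
# between the requested class and the advertised SAType as an error.
# Names are illustrative, not a real client API.

def check_class(requested: str, advertised_satype: str, strict: bool) -> None:
    """Raise if the requested class does not match what the SE advertises.

    With strict=False this mimics the SRM v1.1 server behaviour: the
    flag is ignored and no error is raised.
    """
    if strict and requested != advertised_satype:
        raise ValueError(
            f"storage class mismatch: requested {requested!r}, "
            f"SE advertises {advertised_satype!r}")
```

With `strict=False` a call such as `check_class("durable", "permanent", strict=False)` returns silently, which is exactly the current server-side behaviour.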

The general “ontology for storage” in terms of file retention time, quality of retention and transfer performance (slides 12, 13 and 14) should be clarified among the implementation projects, the experiments and the sites.


Slide 15 and following describe the features agreed for SRM 2.1 by the WLCG Baseline Services Working Group. They were originally planned to be implemented before WLCG Service Challenge 4 but have been delayed until Fall 2006. Since the workshop, many of the features requested (file types, space reservation, quotas, permissions, etc.) need to be revised and clarified, in order to match the recent discussions and the lessons learned since Summer 2005 (during the service and data challenges).


Therefore the MB agreed to appoint a permanent “SRM Coordination Committee” (slide 24), chaired by M.Litmaath, with the mandate to define the external details of the SRM 2.1 implementations and the storage classes to be used by LCG, and to monitor the evolution and testing of the corresponding implementations. The committee will include members from the SRM and mass storage system implementation projects, experiments, sites, deployment and middleware development. The last slide contains the list of members proposed: the parties involved can propose changes to their representatives (before the next MB) and additional sites can ask to join the group, but the number of members should not increase considerably.



The MB agreed to form an “SRM Coordination Committee” (SRMCC).



21 Mar 06 - M.Litmaath will clarify the SRM Coordination Committee membership with the SRM and Mass Storage projects, the experiments and the sites, and circulate the SRMCC membership list to the Management Board.


5.      Progress Reports for this Quarter (2006Q1)


5.1         2005Q4 reports (documents)


Not discussed at this meeting.


6.      AOB


6.1         VOBoxes discussions

The Overview Board will discuss the VOBoxes next week; K.Bos is invited to the meeting because additional information may be needed.


There will be a second workshop on VOBoxes to reach a conclusion on the subject. The GDB decided that until then no new services should be deployed at the sites. LHCb and ATLAS stated that they need VOBoxes deployed for SC4 (from June 2006). Currently all Tier-1 sites except NIKHEF have VOBoxes installed; NIKHEF stated that they will provide VOBoxes for SC4.


7.      Summary of New Actions




21 Mar 06 - M.Litmaath will clarify the membership of the SRM Coordination Committee and circulate it to the MB list.



The full Action List, current and past items, will be in this wiki page before next MB meeting.