LCG Management Board

Date/Time:

Tuesday 07 March 2006 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a057125

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 13.3.2006)

Participants:

A.Aimar (notes), D.Barberis, L.Bauerdick, S.Belforte, I.Bird, K.Bos, N.Brook, T.Cass, Ph.Charpentier, T.Ekelof, B.Gibbard, J.Gordon, I.Fisk, F.Hernandez, M.Lamanna, S.Lin, H.Marten, P.Mato, M.Mazzucato, G.Merino, B.Panzer, H.Renshall, L.Robertson (chair), Y.Schutz, J.Shiers

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 14 March 2006 from 16:00 to 17:00

1.      Minutes and Matters Arising (minutes)

 

1.1         Feedback on the minutes (E.Laure) (more information)

Minutes approved.

1.2         QR reports updated (7 March 2006) (more information)

Updates by: CC-IN2P3, FZK, NDGF, PIC, RAL, TRIUMF, US-ATLAS

 

Some updates received and included in the document.

https://twiki.cern.ch/twiki/pub/LCG/QuarterlyReports/QR_2005Q4.pdf

 

The reports and an executive summary will be sent to the Overview Board.

 

The production and submission process for the reports will be discussed in the future, in order to be able to produce them more quickly.

1.3         Matters Arising

 

Acceptance of the document “Conclusions, Decisions and Open Issues after the Mumbai SC4 Workshop”

The document was circulated to the GDB for comment. Three responses were received concerning the decision not to deploy the SRM 2.1 implementations for SC4, while mandating support for two storage classes in existing implementations even if some development is required. The reasons for not deploying SRM 2.1 in SC4 were reiterated: delivery schedule of the dCache implementation, need for compatibility testing, decisions on support of multiple storage classes. It was agreed to uphold the decision, while emphasizing the importance of pressing ahead with finalizing and testing the SRM 2.1 implementations.

 

There was some discussion on the current proposal for support of two storage classes in current implementations in SC4, which would require limited developments. In the case of Castor 2 this development would not be available before May (see point 9.1 below). [This discussion continued by email after the meeting and will have to be concluded at the next MB].

 

The document, which does not include such implementation details, was approved.

 

Tier-1 for Switzerland’s sites

Switzerland asked which Tier-1 they should use for data access. There is a continuing misunderstanding about the relationship between Tier-1s and Tier-2s. It is the responsibility of the experiment to assign these relationships, which may be different for different experiments at the same Tier-2. This is scheduled to be discussed at the GDB on 8 March.

 

Middleware development and releases

Ian Bird reported that there will be a meeting of the lead developers of the EGEE middleware packages and the deployment management in the coming weeks, to discuss how to improve the release process and testing procedures following the experience of the gLite 3.0 release, where many difficulties were encountered.

 

2.      Action List Review (list of actions)

 

         25 Feb 06 – MB members and SC4 Contacts should send feedback and clarification to J.Shiers on the document about “Conclusions, Decisions and Open Issues after the Mumbai SC4 Workshop”.

 

Done. Feedback received and the updated document was sent to the GDB for comments and approval (see above).

 

         28 Feb 06 - Experiments should clarify with B.Panzer the exact schedule (week no) and resources needed (KSI2000) in 2006.

 

Done. Presented in this meeting.

 

         28 Feb 06 - L.Robertson prepares a proposal, to be distributed, answering the recommendation on re-defining experiment support as an area.

 

To do.

 

         7 Mar 06 – J.Gordon reports to the MB about the feedback received on the accounting policies and on the request to install accounting software.

 

Done. Presented in this meeting.

 

3.      Acceptance of the Mumbai Decisions Paper

 

See above in the “Matters Arising” section.

 

4.      Resource Scheduling in 2006

 

4.1         Resource Scheduling at Regional Centres – G.Merino (document), H.Renshall (schedule table)

We now have the overall plans of the experiments for SC4. There is a need for more information to enable regional centres to plan the availability of resources and the appropriate support staff. It was agreed last year that this should be done through the weekly service coordination meeting. This item is intended to allow us to discuss and if appropriate agree on the level of detail and the accuracy of the timing required for each site.

 

G.Merino proposed a template for detailed resource planning for each Tier-1, to be used during SC4 and beyond. Each site needs to know what resources are needed by the experiments, and when, in order to be ready to provide them in time. As an example, the document shows:

-          Capacity needed (CPU, disk and tape) for each month and

-          Data flows that will be tested.

Each experiment should specify the resources needed for an average Tier-1, and each site will then normalize this to its own resources. It would be even better if the experiments could provide a table with the needs for each Tier-1. Sites not only have to schedule the availability of equipment, but also ensure that appropriate expertise is available when required.
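The normalization step described above can be sketched as a small calculation. All the numbers, site names, and the data shape below are invented for illustration; the real figures would come from the experiments' tables:

```python
# Illustrative sketch of the "average Tier-1" normalization described above.
# All values are hypothetical; a real table would come from the experiments.

# Requirement an experiment states for an *average* Tier-1, per month.
avg_tier1_need = {"cpu_ksi2k": 400, "disk_tb": 120, "tape_tb": 200}

# Each site's share of the total Tier-1 capacity pledged to the experiment
# (an average site would have a share of 1 / number_of_sites).
site_shares = {"SiteA": 0.20, "SiteB": 0.10, "SiteC": 0.05}

n_sites = len(site_shares)

def site_requirement(share: float) -> dict:
    """Scale the average-Tier-1 need by how large this site is
    relative to an average site (share * n_sites)."""
    factor = share * n_sites
    return {k: round(v * factor, 1) for k, v in avg_tier1_need.items()}

for site, share in site_shares.items():
    print(site, site_requirement(share))
```

A site holding exactly an average share (1/n_sites) recovers the stated average-Tier-1 need unchanged; larger sites scale it up proportionally.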

 

The experiments (in particular CMS, and then ATLAS) commented that it is difficult to plan their requirements in such detail and so far in advance. It would be easier to know what resources will be available and when. The sites know the experiments’ needs for SC4 (from the Mumbai Workshop) and should progressively make those resources available. Meanwhile the experiments will work to use all resources and capacity as they become available.

 

A complementary approach was presented by H.Renshall: taking all the experiments’ plans presented in Mumbai, he combined them into a single schedule table that omits the finer details but provides a simple starting point for the sites. The schedule presented provides a first “global view” of the needs of all experiments and then a “view for each site”, with monthly granularity for now.

 

Decision:

The MB agreed that these approaches are useful and that sites need to be able to do detailed planning based on the needs of the experiments.

The experiments agreed that any change to their plans should immediately be communicated to the sites (via the LCG Services Weekly Meeting) in order to discuss its feasibility (and plans) using the resources available at the sites. As a starting point, the tables prepared by H.Renshall will be maintained to reflect the changing plans.

 

The channel for this “sites resource planning” is the LCG Services Weekly Meetings:

-          Every week sites should summarize the status of their resource allocation, and

-          Every month they should plan with the experiments the details of the work and allocation of the following month.

The Service Meeting participants should receive a written report, every Monday before 12h, from all experiments and sites.

4.2         Scheduling of Resources at CERN – B.Panzer (document, transparencies )

The aim is to come to an agreement on the schedule for use of the CERN resources, taking account of specific experiment needs and general throughput tests. We should decide on the response to the request from ATLAS for time to test their online farm.

 

The attached document is an update of the one presented in February. There seem to be no major issues with the proposed resource allocation.

 

The only critical periods are in April and in November:

-          April: This is not a resource allocation issue. The experiments are doing several tests at the same time, and these should be planned carefully in the LCG Services Weekly Meetings.

-          November: The ATLAS request for their TDAQ tests was presented in the past and still needs approval by the other experiments and by the MB. It requires the other experiments to reduce their usage by up to 50% during a three-week period.

 

Decision: The other experiments and the MB accepted the proposal to assign IT resources for the ATLAS TDAQ as described in the attached document (i.e. for the weeks 45-47 in 2006).

 

5.      SFT Reports - Format and Details – P.Nyczyk (transparencies )

In SC4 the SFT results will be used to measure site reliability and availability. The purpose of this item is to agree on the frequency, format and level of detail that we, as the management board, want to see.

 

 

The SFT framework runs regular tests at all sites to check “service availability”. The new version of SFT is called SAME. Slides 5 to 7 show which tests are currently planned for each service (SRM, FTS, LFC, etc). Tests consist of “sensors” that measure the status and availability of each service. Sensors are being developed and will be progressively installed from now until the end of April. In the next couple of weeks (by end March 2006) sensors for SRM, CE, RB, and BDII will be available. Slide 8 has all the details on the status of the software development.

 

“VO specific” tests can be added to the framework in order to verify what each VO needs. LHCb has implemented a test to check the DIRAC installation; ATLAS is using the SFT tests with a customized set of tests, selecting a subset of tests via FCR (Freedom of Choice for Resources). The FCR choices of each experiment are available via the CIC portal (http://cic.in2p3.fr/). How experiments can provide specific tests is documented on the SFT/SAME wiki pages.

 

It was agreed that the standard metric for site availability will be success of the site specific tests for the CE, the SE and the site BDII. The site will be assumed to be unavailable between any failure of one of these tests and the next time that the test is run successfully. The percentage availability will be calculated on a daily basis. In addition, availability for each VO will be calculated in a similar way using the tests selected by the VO using FCR.
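The agreed availability rule can be expressed as a small calculation: an outage runs from a failed test until the next successful run of that test, and the daily percentage is the available fraction of the day. The sketch below uses an invented data shape (a time-ordered list of timestamped pass/fail results), not the actual SFT/SAME schema:

```python
# Minimal sketch of the agreed availability metric: a site counts as
# unavailable from a failed test until the next successful run of that test.
# The data shape is invented for illustration; real data would come from SFT.
from datetime import datetime, timedelta

def daily_availability(results, day_start, day_end):
    """results: time-ordered list of (timestamp, passed) tuples for one test.
    Returns the fraction of [day_start, day_end) the site counts as available."""
    unavailable = timedelta(0)
    down_since = None
    for ts, passed in results:
        if not passed and down_since is None:
            down_since = ts                      # a failure starts an outage
        elif passed and down_since is not None:
            unavailable += ts - down_since       # the next success ends it
            down_since = None
    if down_since is not None:                   # still down at end of day
        unavailable += day_end - down_since
    return 1 - unavailable / (day_end - day_start)

day = datetime(2006, 3, 7)
tests = [(day + timedelta(hours=h), ok)
         for h, ok in [(0, True), (6, False), (12, True), (18, True)]]
print(daily_availability(tests, day, day + timedelta(days=1)))  # 6h outage -> 0.75
```

Per-VO availability would be computed the same way, restricted to the tests that the VO has selected via FCR.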

 

Metrics data will be available for export from the SFT database and different summaries and data aggregations of site availability will be produced initially with external tools. Views to produce graphs directly from the SFT tool will also be implemented, but first one should learn what kind of “default views” are really useful.

 

Sites have requested a mechanism to annotate/explain the results of the SFT tests and, in particular, to comment on situations of “service unavailability” at their site. This feature request will be taken into account, noting that the feature is already available in other reporting tools.

 

6.      Preparation of the meeting with the LHCC referees - Tuesday 21 March

 

 

The topics requested by the referees are:

-          Outcome of Mumbai workshop

-          Status of SC4 planning: with emphasis on Tier-1 sites, presented by one of the Tier-1s.
F.Hernandez or D.Boutigny will do the presentation.

-          LCG 3D project: deployment and testing plans
Status report by D. Duellmann with input/slides from the experiments.
Possibly someone from INFN could also be present.

-          SEAL+ROOT migration status.
P.Mato will do the presentation, with input from the experiments.

 

7.      Accounting Status – J.Gordon (transparencies )

Report on the feedback received on site accounting policies, status of installation of accounting software, impediments to reporting.

 

On 22 February all sites (via the MB, GDB, and ROC managers) received the list of EGEE sites that are not publishing their accounting data, with a request to check the reasons/problems for the sites in their country or supported by their Tier-1/ROC.

 

The message also requested information on possible legal issues, and asked when the sites not doing so would start publishing accounting information (Slides 2 and 3).

 

Not many replies arrived (Slides 4 and 5). In green are the sites that have meanwhile started to publish; in yellow, those that have a good reason not to publish. In some cases the information in the sites database may be obsolete (e.g. some sites have changed name, others are not yet in production, etc.).

 

Slide 7 shows that 61 sites were not publishing any information in the period April 2005 - March 2006.

Not much progress is being made on the publication of accounting data, but this is considered very important by the VOs and funding organizations, in order to know how resources are being used.

 

Note:

The Overview Board has “accounting” for discussion at the next meeting (20 March).

 

Local usage (not via the grid) should also be accounted; simple instructions to add data to APEL from external sources are available from the APEL support site.

 

8.      GDB chair election process (document )

The term of office of Kors Bos finishes in September. We should agree on any changes to be made to the process for electing the chair; see the attached document for the process used in 2004.

 

The attached document describes the process used in 2004 and a proposed timetable for 2006.

 

Decision:

The MB agreed to use the same process as in 2004, with the proposed timetable.

 

A Search Committee needs to be formed soon; volunteers will be asked for at the GDB.

 

Update: Volunteers for the Search Committee were found at the GDB, on the following day:

-          John Gordon

-          Simon Lin

-          Klaus-Peter Mickel

-          Jeff Templon

 

9.      Short Progress Reports

Please inform the secretary if you would like to make or to hear additional short reports in this section, in particular on any potential problems or successes that have not been signalled in the last quarterly report or resolved at the Mumbai workshop.

9.1         Castor 2 - experiment migration, deployment, Mumbai fall-out – T. Cass

Migration of the experiments to CASTOR 2 is ongoing:

-          ALICE: migrated 100% and has been using CASTOR 2 for a couple of weeks; no major problems.

-          ATLAS: migration by end of March.

-          CMS and LHCb: configurations are ready; the LHCb migration will be done next week.

 

Site deployment status:

-          CNAF, PIC and RAL: deployment successful (PIC ongoing).

-          ASGC: late, because of a delay in purchasing new tapes.

 

Some work is needed in order to support more than one storage class in SRM 1.1; the same developments are also required for SRM 2.1. This cannot be completed before May.

 

Issue:

Different SRM/MSS systems at present implement multiple storage classes in different ways or not at all. A proposal was made at the Mumbai workshop to support two classes (permanent and durable) in a way that would be compatible from a user standpoint with any future implementation of multiple storage classes that may be decided for SRM 2.1. John Gordon said that he believed that we should use existing mechanisms in SC4. [The discussion continued by email and the conclusion will be reported at the next MB].

 

9.2         Progress with data recording tests at CERN – B.Panzer (transparencies)

Postponed to next MB meeting.

9.3         RFIO and/or rootd - how is the decision going to be made – I.Bird

Postponed to next MB meeting.

 

10. AOB

 

No AOB issues.

 

11. Summary of New Actions

 

 

No new actions assigned.

 

The full Action List, current and past items, will be in this wiki page before next MB meeting.