LCG Management Board

Date/Time:

Tuesday 24 October 2006 at 16:00

Agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=a063270

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 27.10.2006)

Participants:

A.Aimar (notes), D.Barberis, L.Bauerdick, I.Bird, K.Bos, N.Brook, T.Cass, Ph.Charpentier, Di Quing, B.Gibbard, J.Gordon, C.Grandi, F.Hernandez, M.Lamanna, J.Knobloch, H.Marten, G.Merino, L.Robertson (chair), Y.Schutz, J.Shiers, R.Tafirout, J.Templon

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 31 October from 16:00 to 17:00, CERN time

1.      Minutes and Matters arising (minutes)

 

1.1         Minutes of Previous Meeting

The minutes of the previous meeting were not distributed in time. Apologies.

 

Note: The minutes of the previous meeting are now available at the MB minutes page.

 

2.      Action List Review (list of actions)

Actions that are late are highlighted in RED.

  • 6 October - B.Panzer distributes to the MB a document on “where disk caches are needed in a Tier-1 site”, covering everything (buffers for tapes, network transfers, etc.).

 

Postponed to end October.

 

  • 10 Oct 2006 - The 3D Phase 2 sites should provide, in the next Quarterly Report, the 3D status and the time schedules for installations and tests of their 3D databases.

 

Ongoing. This was discussed later during the meeting and additional information will be requested during the review of the QR reports.

 

  • 13 Oct 2006 - Experiments should send to H.Renshall their resource requirements and work plans at all Tier-1 sites (CPU, disk, tape, network in and out, type of work) covering at least 2007Q1 and 2007Q2.

 

Ongoing. H.Renshall is contacting the experiments in order to obtain information about resource requirements.

 

3.      Proposal of OSG Participation to the MB – L.Bauerdick

 

The proposal by L.Bauerdick followed some discussions that he already had with L.Robertson.

 

The OSG is, with EGEE, one of the major providers of grid infrastructure (grid sites, middleware software and grid standards) used by the LCG. Until now some indirect representation was provided by the US-ATLAS and US-CMS representatives.

 

The proposal is that OSG be, from now on, directly represented at the LCG Management Board.

 

The Management Board endorsed the proposal that OSG be represented at the LCG MB.

L.Robertson will inform R.Pordes, the OSG Executive Director, and will ask her to join the Management Board.

 

H.Marten asked about a similar representation for Nordugrid. L.Robertson replied that the agreement with Nordugrid is that they are seen as one single Tier-1 facility, NDGF, and therefore Nordugrid is already represented by O.Smirnova.

 

4.      Update on tools for site monitoring, after the HEPiX meeting (Slides) – I.Bird

 

I.Bird presented the progress made at HEPiX in finding general tools for monitoring site facilities and services.

 

The discussions took place at HEPiX, after the presentation of the SAM/SFT testing system. The main sources of information were the developers of existing tools and the people at the sites who need monitoring tools (slide 2).

 

There are three groups of tools needed:

-          Fabrics and Sites Management

-          Monitoring

-          System Analysis

 

HEPiX will focus on Site Management only, which is one of the important aspects of the problem.

 

Fabrics and Sites Management

The discussions (slide 3) focused on how HEPiX could help and it was agreed that it should:

-          act soon and not wait for the next HEPiX meetings, which take place every 6 months

-          not be limited to one grid community, which is why HEPiX appears to be a more appropriate forum

-          be coupled with other activities on improving tests and reporting facilities

 

HEPiX agreed that there should be a series of workshops to improve site management. It is therefore urgent to:

-          Find a coordinator – immediately – and the participants

-          Send mail to the HEPiX board inviting participation

-          Some people already expressed interest in participating (but not in leading/coordinating)

 

The first actions could/should be:

-          Prepare a launch session at January Tier 2 workshop

-          Re-establish cookbook/best practices document that was discussed at earlier operations workshop

-          Take top 5 (or 10) issues seen by operations and ask the laboratories to document how they manage them

 

Monitoring and System Analysis

For Monitoring and System Analysis (slides 4 to 6), there is a need for a way to correlate the existing monitoring tools (fabric monitoring, SAM/SFT and job wrapper monitoring) into a high-level display and overview for site administrators and site grid managers.

 

The work should be based on what has been done by ARDA in the analysis and correlation of job logs:

-          Continuing to debug the RB and starting on the CE.

-          Instrumenting existing services (such as Logging and Bookkeeping, etc.) and adding job wrapper monitoring.

 

The goal is to better understand the underlying problems, improve error handling and provide dashboard views of the overall service (by VO, by site, by service).

 

Tools should clearly separate the way information is collected, transported and displayed, so that one could plug in different components and, for instance, provide multiple views or different displays of the same information.

 

The use of common interfaces should enable interoperability between different monitoring systems. For example, GIP allows different sensors to be "plugged in" and gives a common interface to transport layers, and Lemon sensors are re-used in GridIce.

 

There was also a proposal for a "sensor developer guide" explaining how to create standardized sensors. Adapters can then be used to take this information and publish it into any transport mechanism that is available. It will also be useful to have a repository of re-usable sensors, developed for the various monitoring tools (Nagios, Lemon, GridIce, etc.).
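
Illustration (not presented at the meeting): the separation of collection, transport and display described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions. All names in it (Measurement, Sensor, Transport, StaticSensor, InMemoryTransport, run_sensors, view_by) are hypothetical and do not correspond to the actual interfaces of GIP, Lemon, GridIce, Nagios or SAM/SFT.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Measurement:
    """One standardized reading produced by a sensor (hypothetical format)."""
    site: str
    service: str
    metric: str
    value: float


class Sensor(Protocol):
    """Collection layer: anything that can return a list of measurements."""
    def collect(self) -> List[Measurement]: ...


class Transport(Protocol):
    """Transport layer: moves measurements from sensors to displays."""
    def publish(self, measurements: List[Measurement]) -> None: ...
    def fetch(self) -> List[Measurement]: ...


class StaticSensor:
    """Hypothetical sensor that returns fixed readings (stands in for a real probe)."""
    def __init__(self, readings: Iterable[Measurement]) -> None:
        self._readings = list(readings)

    def collect(self) -> List[Measurement]:
        return list(self._readings)


class InMemoryTransport:
    """Hypothetical adapter that keeps published measurements in a local list."""
    def __init__(self) -> None:
        self._buffer: List[Measurement] = []

    def publish(self, measurements: List[Measurement]) -> None:
        self._buffer.extend(measurements)

    def fetch(self) -> List[Measurement]:
        return list(self._buffer)


def run_sensors(sensors: Iterable[Sensor], transport: Transport) -> None:
    """Collect from every sensor and hand the results to the chosen transport."""
    for sensor in sensors:
        transport.publish(sensor.collect())


def view_by(measurements: Iterable[Measurement], key: str) -> Dict[str, List[Measurement]]:
    """Display layer: group the same data by 'site', 'service' or 'metric'."""
    grouped: Dict[str, List[Measurement]] = {}
    for m in measurements:
        grouped.setdefault(getattr(m, key), []).append(m)
    return grouped
```

In this sketch, a different transport or a different view can be swapped in without touching the sensors: calling view_by(transport.fetch(), "site") and view_by(transport.fetch(), "service") gives two displays of the same information.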

 

Proposal

The proposal on how to proceed is:

-          Start the HEPiX group:
Find a coordinator to start immediately (should be someone from the LCG).

-          Start 2 RTAG-like groups on (1) Monitoring and (2) System analysis.
They should define the plan of development, priorities and participants responding to the needs of site management and views for each community.

 

The people to involve are:

-          For Monitoring: Site managers, SAM team, monitoring tool developers

-          For System analysis: ARDA team, IC (RTM) people, L&B team, etc.

 

J.Templon noted that the SAM/SFT tests are centrally run services, but that it is also important for the sites to have tools that they can run locally when they want to do so. I.Bird, J.Gordon and other site representatives agreed that this is a relevant point to bear in mind.

 

Action:

The MB members should send to I.Bird names of candidates for coordination of and participation in the three groups (Site Management, Monitoring and System Analysis) before Friday, 27 October 2006.



5.      Feedback on the RRB Meeting and Planning for 2007-8 – L.Robertson

 

The RRB Meeting took place that same morning. L.Robertson presented a Progress and Status Report on the LCG and C.Eck presented the budget, accounting and resource requirements.

 

There was not much reaction from the Board to the presentations, but there was important news: the MoU has been signed by 20 members and 10 signatures are still missing. The members that have signed cover 85% of the capacity agreed. The members that have not yet signed are mostly working on improving the commitment of their funding agencies.

 

It was also proposed that only those that have signed will participate in future RRB meetings.

 

The RRB requested that, before the next meeting, the values in the MoU be updated, taking into account the new plans and the requests revised by the experiments.

 

The MB then discussed whether it was necessary or useful to contact the funding agencies, through the sites or the experiments, to ensure that the revision would take place in time.

 

J.Gordon said that in the UK the proposal had already been made (via GridPP) to the funding agency and their answer should come by the end of the year.

 

J.Templon said that in The Netherlands the total funds for the next 4 years had already been discussed and this will not change. How the funds are spent over time is decided by the Tier-1 and will have to be revised depending on the experiments' requirements.

 

D.Barberis said that ATLAS had discussed the issue with the national computing representatives and that each national community will explain the situation with its funding agency.

 

N.Brook said that, similarly, LHCb had discussed this inside the experiment and in some countries the discussions were already taking place.

 

Y.Schutz said that ALICE is passing this information via their Collaboration Board and that the national representatives are expected to report to their funding agencies.

 

L.Robertson noted that different countries have different processes, but it is important to monitor the situation to make sure that the revisions will be completed by the next RRB meeting (April 2007).

 

H.Marten confirmed that in Germany the funding agencies discuss the resources with a Technical Advisory Board directly and with the sites. Therefore a revision should go via the same Technical Advisory Board.

 

The discussion then covered what kind of information should be revised with the funding agencies.

 

The agreement was that the summary tables prepared by C.Eck should be those on which new requirements and pledges are discussed.

The discussion should be actively pursued by the experiments and Tier-1 sites in each country and followed up by the Management Board.

 

Note: C.Eck redistributed updated resource tables after the MB meeting.

 

 

6.      AOB

 

 

No AOB

 

7.      Summary of New Actions 

 

 

Action:

The MB members should send to I.Bird names of candidates for coordination of and participation in the three groups (Site Management, Monitoring and System Analysis) before Friday, 27 October 2006.

 

The full Action List, current and past items, will be in this wiki page before the next MB meeting.