LCG Management Board

 

Date/Time:

Tuesday 20 December 2005 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a057114

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 5.1.2006)

Participants:

A.Aimar (notes), D.Barberis, L.Bauerdick, I.Bird, N.Brook, F.Carminati, T.Cass, I.Fisk, B.Gibbard, M.Lamanna, S.Lin, H.Marten, M.Mazzucato, G.Merino, B.Panzer, L.Robertson

Action List

https://uimon.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 10 January 2006 from 16:00

Minutes and Matters Arising

  • Minutes of Last Meeting ( minutes )


FZK suggested using “CPU core” instead of “processor” as a unit for CPU capacity. The MB agreed.

  • Quarterly reports (for sites, areas and experiments)

 

The Quarterly Reports will be distributed before the end of the year and should be completed and mailed back by 15 January 2006.
A small review team will be formed, consisting of the PO and two other members, one from a site and one from an experiment.

 

Actions:

 

31 Dec 05 - A.Aimar will distribute the Quarterly Reports for 2005 Q4.

 

15 Jan 06 - Area, experiment and site representatives complete the Quarterly Reports and send them to A.Aimar.

 

10 Jan 06 - MB members propose candidates for the QR review team (one from a site and one from an experiment).

 

 

Action List Review (15') ( list of actions )

 

  • 06 Dec 2005 - CNAF should send milestones for Tier1-Tier1 and Tier1-Tier2 operations. For next MB meeting.

On the way. CNAF will soon send a preliminary list.

  • 06 Dec 2005 - Experiments should fill in the VO boxes questionnaire on operations and send it to the GDB.

ATLAS, CMS and LHCb done. ALICE to do.
The latest information is maintained on the wiki page: https://uimon.cern.ch/twiki/bin/view/LCG/VoBoxesInfo

  • 06 Dec 2005 - J.Shiers - Confirm the reporting and tracking of operations problems deriving from the CMS data transfer service.

On the way. The proposal is to move the LCG SC weekly conference call to Monday, right after the EGEE operations meeting. Attendance should become more formal, with minutes of the meetings and regular participation by sites and experiments. An action list should be maintained.

Answer from J.Shiers to the MB mailing list:
We clearly need to improve / formalise the whole problem tracking / support / operations process. This is something that we are working on. At the Asia-Pacific T1-T2 workshop last week, it was proposed that we move the LCG 'SC' con-call to Monday to follow the existing operations call (with perhaps a 15' 'convenience' gap for people to disconnect / (re-)connect). A proposal for streamlining support has already been made at the recent GDB / LHCC review. So whilst it is correct that CMS and other experiments participating in the con-calls do bring up any on-going issues, this is not followed in a systematic or rigorous manner today.

  • 06 Dec 2005 - J.Shiers and K.Bos will distribute a proposal on how the experiments' plans can be made available to Tier1 sites. For next MB.

On the way. A proposal will be mailed by J.Shiers.

The current understanding is that a weekly report, based on existing reports in IT, will be distributed to the MB list.
The weekly reports currently available within IT are: (1) the weekly report of the EIS team on the experiments' activities, (2) the weekly report on LCG operations and (3) the EGEE operations report. A summary of these will be sent to the MB list every Monday afternoon and should improve the situation for Tier-1 sites.

It is important that the report not only summarizes past activities but also includes the experiments' plans for the following week.

  • 6 Dec 05 - L.Robertson and A.Aimar will prepare the High Level Milestones for formal approval by the MB.

Done.

The approval of these milestones is later in this meeting.

  • 6 Dec 05 - J.Shiers has initiated a team to define metrics to measure site reliability and availability, data transfer capability, and overall grid performance. He will distribute a proposal on this to the MB for input from the sites and services, before 6 December 2005.

On the way. H.Renshall will produce the proposal. The LHCC referees wanted something wider than the SFT values, including metrics about middleware reliability.

Action:
15 Jan 06 – I.Bird and L.Robertson will discuss metrics on middleware availability and reliability with J.Shiers and H.Renshall.

  • 10 Dec 05 - Tier-1 Sites (at GDB) should complete their plans following the guidelines of the presentation given at the GDB. (GDB presentation).

On the way. Some sites said that they will send the updated plans once their 2006 plans have been discussed at the beginning of January.

  • 10 Dec 05 - People responsible for each area and service should also define some measurable metrics to follow during Phase 2.

Action:
20 Jan 06 - L.Robertson will hold discussions with the area and service managers in order to define measurable metrics for Phase 2.

  • 12 Dec 05 - Sites should send updates to the capacity tables of the MoU.

A letter was sent to all centers asking for confirmation of: (1) the values in the MoU, (2) who will sign and (3) when they plan to sign the MoU. Sites should urgently reply.

  • 15 Dec 05 - The MB will review the situation of CASTOR in the middle of December, when the plans for distributing CASTOR should have been defined.

A successful installation has been performed at CNAF. In January the same installation will be done at RAL. However, all sites involved with CASTOR 2 should provide their roll-out plan.

Action:
20 Jan 06 - ASGC, CNAF, PIC and RAL provide a plan for the deployment of CASTOR 2 at each site.

  • 19 Dec 05 - L.Robertson should explain the Tier-1 milestones and the procurement issues at next Oversight Board meeting in December.

A summary is discussed later in this MB meeting.

  • 20 Dec 05 - D.Kelsey - The current text in Section 1.5 should include a reference stating that another document will explain how the user data must be stored and how long it must be kept.

Done.

 

D.Kelsey agreed that this is going to be included in a separate document.

  • 20 Dec 05 - L.Robertson - A proposal to answer the recommendation on re-defining experiments support as an area will be distributed before end 2005.

To do.

  • 20 Dec 05 - To complete the definition of milestone OPN-2, D.Foster should find information about the GEANT2 plans and send it to the MB.

To do.

  • End 2005 - The experiments provide a more “Tier-1 accessible” description of their models, data and workflow. It should be simpler than the TDR, to help Tier-1 sites understand what they need to set up.

To do.

 

Feedback from the POB meeting (L.Robertson)


POB comments about the LHCC Comprehensive Review

 

The LHCC report draft was distributed to the MB members. It is confidential until it is approved at the LHCC meeting in February. The points in it have all been discussed already in the MB during November.

The summary given by the CSO to the Overview Board highlighted the following concerns:

-          connection between experiments and middleware is weak;

-          delays of the Castor2 and 3D projects;

-          the analysis models are untested.


The connection between experiments and middleware is too weak: the presence of the experiments in the TCG is a decision in the right direction, but we will have to see how this works out in practice.

Delays of Castor2 have been discussed in detail on many occasions. It is not obvious that the 3D project could have gone faster as it took time to reach an agreement on what the experiments need within the constraints of the available technology.

 

POB discussion about SC4 planning and implementation


The experiment spokespersons stressed several times that SC4 needs to have a complete plan with clear milestones, in order to avoid misunderstandings and unachievable expectations.

The goal is to make sure that the sites know exactly what is needed, and the experiments know what will be delivered to them.

 
The experiments proposed a closer operational coordination of the SC4 activities. SC4 should have a daily meeting, a run coordinator, and daily follow-up of issues, with participation of the experiments and of all the active grid centers.

The MB discussion follows. Several different views were discussed, without a final decision.

 

The MB discussed the possibility of having time slots (of one or two weeks) dedicated to specific services. This would help to focus the (daily) meeting and would avoid requiring the same attendance during the whole 6 months of SC4.

 

The LCG Operations weekly meeting should continue and become a real operations meeting that follows up on progress. In addition, there is a need for a focused “problem-solving” meeting that concentrates and follows up on specific issues.

 

A daily meeting may not be needed, but someone must be in charge of proactively following and steering the progress of SC4 every day (with phone calls, mails and meetings when needed).

 

The “run coordinator” should not always be the same person but could be nominated for a period, or for a specific task and role, just as experiments have a trigger coordinator, an analysis coordinator, a run coordinator, etc.

 

One major difference is that these services will continue to improve after SC4 as well; therefore a defined, lasting service support should be set up.

 

POB discussion about reporting on accounting

The decision is that all accounting should be reported, both local and via the grid. There was no recommendation on how this should be done.

 

There was no discussion of the fact that capacity is not yet used, or is used by other experiments for the time being. The OB did not see this as a major issue (unused capacity can in most cases be used by other experiments) and considers it essential to continue to build up the capacity of the installations in order to learn how to install and support such a complex infrastructure.

 

POB discussion on minimal memory per core

Is the agreed memory per core 1 GB or 2 GB?

 

This should be decided and then communicated to the GDB. Experiments should state clearly what they need for 2008.

 

Currently CMS and LHCb require 1 GB per core. ATLAS and ALICE require 2 GB.

 

High Level And Areas Milestones ( documents )

 

Approved.

 

MB meeting format

 

The current format is a one-hour phone meeting every week, plus a longer “face to face” meeting in the week of the GDB.


The majority of the MB members like the current format, and it was decided to continue with it before making any change.

 

The one-hour format sometimes seems too short because of the time spent reviewing the minutes and the action list. The action list processing should be streamlined.

 

Longer discussions should be held during the face to face meetings.

 

AOB

 

No AOB.

 

Summary of New Actions

 

 

31 Dec 05 - A.Aimar will distribute the Quarterly Reports for 2005 Q4.

 

15 Jan 06 - Area, experiment and site representatives complete the Quarterly Reports and send them to A.Aimar.

 

10 Jan 06 - MB members propose candidates for the QR review team (one from a site and one from an experiment).

 

15 Jan 06 - I.Bird and L.Robertson will discuss metrics on middleware availability and reliability with J.Shiers and H.Renshall.

 

20 Jan 06 - L.Robertson will hold discussions with the area and service managers in order to define measurable metrics for Phase 2.

 

20 Jan 06 - ASGC, CNAF, PIC and RAL provide a plan for the deployment of CASTOR 2 at each site.

 

The full Action List, with current and past items, will be on this wiki page before the next MB meeting.