LCG Management Board

Date/Time:

Tuesday 11 July 2006 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a063092

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 3 - 19.7.2006)

Participants:

A.Aimar (notes), L.Bauerdick, M.Barroso, S.Belforte, I.Bird, K.Bos, N.Brook, T.Cass, Ph.Charpentier, L.Dell’Agnello, I.Fisk, B.Gibbard, J.Gordon (chair), M.Lamanna, H.Marten, G.Merino, Di Quing, H.Renshall, Y.Schutz, J.Shiers, O.Smirnova, R.Tafirout, J.Templon

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 18 July from 16:00 to 17:00

1.      Minutes and Matters arising (minutes)

 

1.1         Minutes of Previous Meeting

The minutes of the 4 July 2006 meeting were distributed only on Tuesday; comments will therefore be accepted for a few more days. Apologies from A.Aimar.

1.2         QR report 2006Q2

The reports have been due since 10 July 2006.

 

Received before the meeting (chronological order): CERN, FZK, ALICE, Deployment Area, SARA-NIKHEF, INFN, ATLAS, and PIC.

 

Update: Received since then: ASGC, TRIUMF, IN2P3, LHCb

 

Missing QRs at the time of the MB:

Applications Area, ARDA, CMS, DB-Services, GDB, NDGF, RAL, SC4, US-ATLAS, US-CMS, WLCG

 

2.      Action List Review (list of actions)

Actions that are late are highlighted in RED.

  • 23 May 06 - Tier-1 sites should confirm via email to J.Shiers that they have set up and tested their FTS channel configuration for transfers from all Tier-1 sites and to/from Tier-2 sites. Note: it is not sufficient to set up the channels; the action requires confirmation via email that transfers from all Tier-1 sites and to/from the "known" Tier-2 sites have been tested.

 

Not done: BNL, FNAL, NDGF and TRIUMF.

 

-          BNL had some difficulties that are being solved.

-          FNAL: Their FTS server is running. The connections to Tier-1s are running, the ones to the Tier-2s are still being defined and will be done this week.

-          NDGF does not have any hardware for now.

-          TRIUMF tested the Tier-1/Tier-2s but not yet the connection to other Tier-1 sites.

 

  • 31 May 06 - K.Bos should start a discussion forum to share experience and tools for monitoring rates and capacities and to provide information as needed by the VOs. The goal is then to make possible a central repository to store effective tape throughput monitoring information.

 

Not done.

 

  • 13 Jun 06 - D.Liko to distribute the Job Priority WG report to the MB.

 

Not done yet.

 

  • 30 Jun 06 - J.Gordon to report on the defined use cases and policies for user-level accounting, in agreement with the security policy working group and independently of the tools and technology used to implement it.

 

Not done.

J.Gordon presented the topic to the GDB in June and asked for feedback, but did not receive any.

He reminded the July GDB about it. If J.Gordon does not receive any use cases, he will propose some himself.

 

 

  • 20 Jul 06 - Tier-1 sites should nominate one or two representatives to the TCG. Nominations should be sent to I.Bird before 18 July 2006.

 

 

3.       Evolving the Operations and Service Coordination meetings (transparencies) - M.Barroso

 

The presentation, by M.Barroso, proposed some changes to the WLCG Operations and to the Service Coordination meetings.

 

First she presented an overview of the current meetings (slide 2).

3.1         Current Meetings

 

WLCG-OSG-EGEE Operations Meeting (OPS) - Link to Agendas

-          Mondays at 16:00 (Wednesdays from mid-September) in 28-R-15.
Usually lasts 1.5 hours.

-          Attendance by representatives of the ROCs, sites, VOs, operations coordination, GGUS team

-          Discusses and solves operational issues from the previous week, as raised by ROCs and VOs.
ROCs and VOs are supposed to send their weekly reports well before the meeting and to raise in advance the issues to discuss.

-          Distributes and discusses information about operational tools and procedures, future releases.

 

L.Bauerdick asked whether the move to Wednesdays in September was definitive. M.Barroso replied that no objections had been raised so far. Wednesday is considered better because (1) the OPS meeting would then not come just after the SCM meeting and (2) it would not be at the beginning of the week, when there are typically fewer attendees.

 

No objections from the MB to moving the OPS meeting to Wednesday at 16:00.

 

LCG Service Coordination Meeting (LCG SCM) - Link to the Agendas

-          Wednesdays at 10:00 in the OpenLab space at CERN (513 ground floor)

-          Attended by the people responsible for services in the FIO, PSS and GD groups at CERN
(see any SCM agenda for details).

-          Defines and discusses the deployment and delivery of the CERN services

 

LCG Resource Scheduling Meeting (LCG RSM) - Link to the Agendas

-          Mondays at 15:00 in 40 R-D-10, chaired by J.Shiers

-          Attended by the LHC experiments, Tier-0, SC and PPS representatives.

-          The experiments present their resource requirements, schedules and plans there.

 

J.Templon and J.Gordon asked for details about the RSM meeting.

J.Shiers explained that the meeting was started after Mumbai in order to receive the experiments' requirements and plans as soon as they change. For any change, H.Renshall updates the wiki page where the experiments' plans are collected. If the changes are important, he also reports them to the following Operations Meeting (OPS) in order to inform all the Tier-1 sites. H.Renshall also reports to the MB, either periodically or whenever there are important issues.

 

Daily CERN Operations Meeting - Link to the Agendas

-          Every morning at 09:00 in the OpenLab space.

-          Attended by the SMOD, GMOD, service teams in FIO, PSS, DES, CS and GD

-          It focuses only on short term operational issues.

3.2         People involved

Slide 3 shows the people and teams that coordinate the meetings:

-          The Operations Coordination team: Maite Barroso and Nicholas Thackray

-          The Service Coordination team: Jamie Shiers, Harry Renshall, James Casey, Maarten Litmaath, Flavia Donno

-          The Experiment Integration Support (EIS) team: Andrea Sciaba, Simone Campana, Patricia Mendez Lorenzo, Roberto Santinelli

3.3         Proposed Changes to the OPS and SCM Meetings (Slides 4 and 5)

 

Improve the Information Flow

-          The EIS team should report, for each experiment, to the SCM meeting with an overview of the achievements of the week, possible suggestions and the list of outstanding problems.

-          The Coordination team will collect and take this information (in writing) to the Operations Meeting, in a summarized way and focusing on the outstanding issues to solve. Problems should be discussed and followed-up at the Operations Meeting until they are solved.

 

Improve the Escalation of Problems

The experiments should report all problems, which should then be escalated until they are solved:

-          1st step: experiments report to GGUS

-          2nd step: experiments escalate unsolved GGUS tickets, via EIS, to the LCG SCM meeting

-          3rd step: the Service Coordination team brings the issue to the Operations Meeting, and a new action is created in the OPS Action List, referring to the original GGUS ticket(s), which are put on “hold” status.

 

J.Shiers noted that this process was used with CMS recently and some old outstanding problems were solved efficiently. L.Bauerdick agreed that this process worked very well for their case.

 

J.Gordon asked how the GGUS tickets were escalated. It was explained that they were put on hold and managed through the Operations Meeting, where the actions were referenced by GGUS ticket number.

 

Changes to the Operations Meeting

 

The OPS meeting should follow the action list more rigorously and prioritize the actions better:

-          Extract the “Top 10 actions” with highest priority, and discuss them at every meeting.

-          Define the escalation step to undertake when an action is not fulfilled on the due date (e.g. escalate to LCG MB).

 

As already mentioned above, when discussing the “information flow”, the OPS meeting will have a section “weekly report on experiments activities and issues”.

 

Ph.Charpentier noted that the SCM meeting involves mostly people from CERN services and that the participation should be re-discussed. J.Shiers agreed that the participants are mostly from CERN, but replied that many experts attend the meeting (on Castor, FTS, LFC, etc.) and that they cover not only CERN issues but all the GGUS tickets that they receive, from the whole LCG community.

The services not covered by the SCM have their own user support and escalation mechanisms.

All services are *covered* at SCM but some (dCache, WMS, etc) are not *represented* by the responsible people.

 

J.Templon said that it is important that the EIS team reports as much information as possible to the SCM meeting. I.Bird and M.Barroso noted that this is what should already be happening: the experiments should already “use” the EIS team as the channel for escalation if GGUS tickets are not solved.

 

Decision:

The MB endorsed the changes to the OPS and SCM meetings proposed in the presentation.

 

Note:

N.Brook restated that LHCb, as announced at the GDB, will continue to define parallel contact channels with the Tier-1 sites.

 

4.      LCG Bulletin Proposal (initial proposal of bulletin, email text ) - A.Aimar

 

A.Aimar described the purpose of the bulletin along with the text of the email distributed to the MB before the meeting.

 

The goal of the bulletin is to streamline the distribution of information in the LCG and to concentrate in a short summary "what should be known" by the people in the LCG, providing all links to such information.

The Bulletin will be distributed every two weeks. Therefore future issues will be shorter than the proposal below.

Everybody wishing to have some (relevant) information published in an issue should simply send it to A.Aimar.

 

The suggestions received by email and at the meeting were discussed.

 

D.Boutigny suggested reversing the chronological order (most recent first).

In the future the bulletin will be much shorter.

Note: Done already in Bulletin Issue No. 1.

 

J.Gordon suggested that the link to accounting and monitoring (SAM) data should be provided in a consistent way and be easily reachable from every bulletin issue.

Note: Already done in Bulletin Issue No. 1. The top heading on the bulletin “Sites Availability - Accounting Summary - LCG Planning” contains links that will be in every bulletin issue.

 

O.Smirnova suggested using a weblog tool where many people can contribute.

J.Gordon answered that the goal of the bulletin is to concentrate all the information that is already in several wikis, web pages, agendas, etc. and to provide a short summary from which that information can be reached. One more weblog where many people contribute would not help. A.Aimar's editorial input is what distinguishes the bulletin from just another wiki or blog.

 

J.Shiers suggested having one section that collects information from the sites to/from the experiments. However, it was unclear to the MB how this would work and whether there would be any contributions. This possibility is postponed for the moment.

 

The MB agreed on the sections in the initial proposal of the bulletin except the “Top Concerns” section. That section should be discussed in more detail at the MB. What should its contents be? How would entries be added to and removed from the list? Which body would decide what the top concerns are?

 

Action:

The MB should reconsider whether to have somewhere a “top concerns” list and, if so, how to manage it.

 

Decision:

The MB supports the initial proposal and the distribution of Issue No. 1.

It also suggested that the MB meetings should have an entry like “Items for the Bulletin”.

 

 

5.      AOB

 

 

No AOB.

 

6.      Summary of New Actions

 

 

 

Action:

31 July 2006 - The MB should reconsider whether to have somewhere a “top concerns” list and, if so, how to manage it.

 

The full Action List, with current and past items, will be on this wiki page before the next MB meeting.