LCG Management Board

Date/Time:

Tuesday 28 March 2006 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a061510

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 30.3.2006)

Participants:

A.Aimar (notes), D.Barberis, S.Belforte, I.Bird, K.Bos, D.Boutigny, N.Brook, T.Cass, Ph.Charpentier, L.Dell’Agnello, I.Fisk, J.Gordon, M.Lamanna, E.Laure, P.Mato, H.Marten, G.Merino, B.Panzer, Di Qing, L.Robertson (chair), Y.Schutz, J.Templon

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 4 April 2006 from 17:00 to 19:00 in Rome

1.      Minutes and Matters arising (minutes)

 

1.1         Minutes of Previous Meeting

Minutes approved.

 

Apologies from J. Gordon for not attending the previous meeting.

1.2         Update on the Plans Updates

The team that will review the 2006Q1 Quarterly Reports is being formed. Suggestions from MB members for possible candidates are welcome.

1.3        Overview Board Conclusions

The conclusions of the OB have been distributed to the MB and GDB mailing lists. No comments have been received so far.

 

2.      Action List Review (list of actions)

 

 

No outstanding actions.

 

3.      Heating in the Computer Centre - T.Cass

 

The upgrade of the air-conditioning system in the Computer Centre (Building 513) was planned for January to March 2006 and is being carried out over that period. This meant that, until the end of March 2006, no chilled water would be available for air-cooling the centre.

 

As a result, the unpredicted high temperatures (above 25 degrees) of last weekend considerably increased the heat in the CC. To avoid serious damage, some systems had to be switched off, which disrupted some services at CERN (e.g. Castor). Several other systems, the 10 GB routers for instance, came close to their “operating limits” and continued to operate under close monitoring.

 

The situation was back to normal on Tuesday. From the beginning of April an external water chiller will be installed to ensure full air-conditioning capacity until the work on the permanent air-conditioning system is completed.

 

The experiments (ATLAS in particular) stated that they were correctly informed about the emergency situation and about the unplanned service interruption.

T.Cass presented his apologies to the users for the inconvenience caused by the disruption of some services.

 

4.      LHCC Referees Feedback - L.Robertson

-          Job & site reliability

 

4.1         Report on the TDR Reviews

The report was distributed only as an attachment to February’s LHCC meeting. It does not require any response, but its recommendations should be taken into account for the Comprehensive Review in September.

4.2         LCG 3D performance targets

The LHCC referees requested that clear performance targets and tests should be defined for the LCG 3D DB project.

 

Action:

15 Apr 06 – D.Duellmann should add performance targets and tests to the LCG 3D project plan.

4.3         Metrics for job execution reliability

The referees stressed the need for better data, analysis and statistics on the causes of job execution errors. Clear figures on the reliability of job execution should be produced periodically (once a month to start with) and, whenever possible, generated automatically.

 

The failure rate still seems quite high. The experiments agreed that some preliminary investigation of their log files and databases is needed. In addition, they will collect the cases where the available information is insufficient; these will be passed to the middleware developers in order to progressively improve error reporting and logging.

 

Better error reporting and adequate logging quality are a priority for the project, because it is currently very difficult to find the causes of job failures, whether automatically or manually. This should become a priority for middleware development, and it should be communicated to the developers consistently in all working groups (e.g. TCG).

 

Overall coordination of this work is needed in order to collect the needs of all experiments. It is important to approach the issues in a homogeneous way and to have a single channel for requests to the middleware providers. The MB agreed that this work should be done in the framework of the Experiments Task Forces, involving the EIS support and ARDA people.

 

4.4         Metrics for site availability

A first useful set of metrics on the availability of each site can be produced using the success rate of the SFT tests.
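As an illustration only, the metric described above could be computed as the fraction of SFT runs that succeeded at each site. The sketch below assumes a hypothetical input of (site, passed) pairs, one per SFT run; the site names, the input format and the function name are invented for the example and do not reflect the actual SFT output.

```python
from collections import defaultdict


def site_availability(results):
    """Return {site: fraction of SFT runs that passed}.

    `results` is an iterable of (site, passed) pairs, one per SFT run.
    This input format is an assumption made for the example.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for site, ok in results:
        total[site] += 1
        if ok:
            passed[site] += 1
    # Only sites with at least one run appear in `total`, so no division by zero.
    return {site: passed[site] / total[site] for site in total}


# Hypothetical sample: four runs at SiteA (three passed), two at SiteB (one passed).
sample = [
    ("SiteA", True), ("SiteA", True), ("SiteA", False), ("SiteA", True),
    ("SiteB", True), ("SiteB", False),
]
availability = site_availability(sample)
for site, rate in sorted(availability.items()):
    print(f"{site}: {rate:.0%}")
```

With this sample the sketch reports 75% for SiteA and 50% for SiteB. A real report would also need to decide how to treat sites for which no test ran in the period.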

 

H.Renshall will provide an update at the next MB meeting in Rome.

 

5.      Topics for Next Face-to-Face Meeting in Rome

Proposals for topics should be sent to L.Robertson or A.Aimar in the next few days.

 

The VOBoxes meeting will have just ended on the same day. Clear conclusions will probably not yet be available, but a timetable of “what happens next” could be presented informally.

Update: This timetable will be presented at the GDB meeting in Rome, on Wednesday.

 

6.      AOB

 

 

No AOB.

 

7.      Summary of New Actions

 

 

Action:

15 Apr 06 – D.Duellmann should add performance targets and tests to the LCG 3D project plan.

 

 

The full Action List, with current and past items, will be available on the Action List wiki page before the next MB meeting.