LCG Management Board

 

Date/Time:

Tuesday 6 December 2005 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a057112

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 2 - 12.12.2005)

Participants:

A.Aimar (notes), L.Bauerdick, D.Boutigny, F.Carminati, T.Cass, Ph.Charpentier, C.Eck, I.Fisk, D.Foster, J.Gordon, M.Lamanna, H.Marten, P.Mato, M.Mazzucato, B.Panzer, L.Robertson (chair)

Action List

https://uimon.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 13 December 2005 at 16:00

Minutes and Matters Arising

Changes to Minutes of 22.11.2005 (document)

 

The changes proposed are approved.

 

Minutes of Last Meeting (documents)

 

The minutes are approved.

 

Matters Arising

L. Robertson received a draft of the Report of the LHCC Comprehensive Review. He had various comments and corrections which he will discuss this evening (6 December) with E.Tsesmelis.

 

Action List Review (list of actions, document)

 

  • 29 Nov 05 - ATLAS will send a proposal about reporting on issues concerning non-EGEE resources to the MB.
  • 29 Nov 05 - CMS will send a similar statement to formally confirm the CMS opinion on OSG reporting.

Done.

The final version of the Task Forces' functions is attached (link). It also explains how ATLAS and CMS will report on their non-EGEE grid sites:

  • ATLAS will hold the Task Force and Operations meetings consecutively, every second Friday.
  • CMS will report about their non-EGEE sites via FNAL.

December 2005

  • 06 Dec 2005 - J.Gordon will send more information to J.Shiers about the EGEE group on metrics.

Done.

  • 06 Dec 2005 - CNAF, NDGF, ASGC and FNAL should send feedback on the proposed targets for the SC3 re-run. For next week.

Done.

J.Shiers distributed a summary (link) of the feedback received:

 

From J.Shiers’ email:
- The target data rates to ASGC have been confirmed (75MB/s rather than 100MB/s due to network limitations)

- CNAF would like to attempt transfers at the full nominal rate (but outside the formal SC3 rerun, which as such uses the SC3 target of 150MB/s as max per site)

- The level of participation of NDGF is not clear at this time

- All other sites have confirmed the target rate.

  • 06 Dec 2005 - J.Shiers should circulate the full plan and tests for the SC3 throughput re-run. For next MB.

Done.

The current proposal makes no use of tapes as part of the SC3 re-run.
Early in SC4 (April) there will be disk-disk throughput tests at the MoU nominal rates and disk-tape tests at 50 or 75 MB/s.

The MB decided that the tests to tape at Tier1 sites should be re-instated in the SC3 re-run early in the year, no later than end February. It is felt important to check that the complete dataflow (from CERN disk to network to storage on tapes at the Tier-1) is working, even if these tests are not done using the latest version of the mass storage systems (Castor 1 or 2, dCache). Delaying the first tests to April risks that problems will be discovered too late to be rectified for SC4.

Action: 28 Feb 06 – The disk-tape tests should be re-instated in the SC3 throughput re-run at the beginning of the year, to be completed by the end of February. Such tests should verify that a steady satisfactory transfer rate can be reached and that past problems have been solved.

  • 06 Dec 2005 - ALICE, ATLAS and CMS should fill the VO boxes questionnaire on operations and send it to the GDB.

On the way.

LHCb is added to this action as it plans to use VO Boxes during SC4. Not all experiments have provided both VO Boxes questionnaires (operations and security).
The latest information is maintained on this wiki page: https://uimon.cern.ch/twiki/bin/view/LCG/VoBoxesInfo

  • 06 Dec 2005 - J.Shiers - Confirm the reporting and tracking of operations problems deriving from the CMS data transfer service.

On the way.

Answer from J.Shiers to the MB mailing list:
 We clearly need to improve / formalise the whole problem tracking / support / operations process. This is something that we are working on. At the Asia-Pacific T1-T2 workshop last week, it was proposed that we move the LCG 'SC' con-call to Monday to follow the existing operations call (with perhaps a 15' 'convenience' gap for people to disconnect / (re-)connect). A proposal for streamlining support has already been made at the recent GDB / LHCC review. So whilst it is correct that CMS and other experiments participating in the con-calls do bring up any on-going issues, this is not followed in a systematic or rigorous manner today.

  • 06 Dec 2005 - CNAF and FNAL should send milestones for Tier1-Tier1 and Tier1-Tier2 operations. For next MB meeting.

On the way.
The FNAL milestones (link) are attached to the agenda.

Action: 6 Dec 2005: CNAF should send milestones for Tier1-Tier1 and Tier1-Tier2 operations.

  • 06 Dec 2005 - CMS will send a contact name for the migration to Castor 2.

Done.
The CMS contact person is Nick Sinanis. 

  • 06 Dec 2005 - J.Shiers and K.Bos will distribute a proposal on how the experiments' plans can be made available to Tier1 sites. For next MB.
  • 6 Dec 05 - L.Robertson and A.Aimar will prepare the High Level Milestones for formal approval by the MB.

On the way.
The high level milestones and the areas milestones have been discussed and a proposal is being prepared.

  • 6 Dec 05 - we need the final plan of the 3D project. D.Duellmann should contact the sites to explain what is being planned (including replication at Tier-1 and Tier-2 sites, and clarifying the usages of Oracle, Frontier, Squid, mysql, etc.)

Done.

The presentation of the plan is part of this meeting.

  • 6 Dec 05 - J.Shiers has initiated a team to define metrics to measure site reliability and availability, data transfer capability, and overall grid performance. He will distribute a proposal on this to the MB for input from the sites and services, before 6 December 2005.

On the way.

H.Renshall will produce the proposal.

 

Milestones and Plans Issues

Network Milestones and Service Availability (more information)

 

Concerning the milestones:

 

OPN-1 (31.12.05): Tier-0/1 high-performance network operational at CERN and 3 Tier-1s.

OPN-2 (31.03.06): Tier-0/1 high-performance network operational at CERN and 6 Tier-1s, at least 3 via GEANT.

 
Email from D.Foster: 
 
OK, so I quote from my previous mail:
"So for the 3 T1's I think we can foresee Fermilab, SARA and IN2P3 but
also TRIUMF, BNL and CNAF will not be far behind, if at all by the end
of this year. ASGC has already 2.5G to netherlight (sharing the 10G
transport to CERN via SURFNET) and it is not completely clear how that
will evolve."
 
I think this is clear the list of 6 T1's, with the 3 before the end of
this year.
 
Only CNAF uses GEANT2/GARR.
 
The situation is changing all the time as priorities for connectivity
are adjusted by the NREN's and GEANT. I will ask for a GEANT specific
update.
 

Action: 20 Dec 05 - To complete the definition of milestone OPN-2, D.Foster should find information about the GEANT2 plans and send it to the MB.

 

DB Project Milestones (document, transparencies)


The plan and the presentation are linked above.

 

The presentation covered the following points:

-          Replication tests

-          Proposed Tier 1 Hardware Setup

-          Proposed 2006 Setup Schedule

-          Production Service Setup Status

-          Production Software Status

-          Open Issues

 

The 2006 Setup Schedule document has been discussed with the Tier1 sites; by the end of January all the agreed hardware should be available and tested.


Not all sites have DB teams; therefore only the sites already participating in 3D (ASGC, BNL, CNAF, FNAL, GridKA, IN2P3 and RAL) will be part of the pre-production in March 2006. The other sites (PIC, SARA, NDGF and TRIUMF) should join 3D now and prepare a production setup as soon as they can.

There will be a Tier-1 readiness workshop (Feb 2006) with:

-          reports from Tier-0 and Tier-1 sites about database and squid cache installation

-          reports from ATLAS, CMS and LHCb about conditions implementation for the main subdetectors.

 

Open Issues:

  • Squid support at Tier 1 sites:
It is not clear who will provide such support: the Tier 1 DB team or the sysadmin team.

  • Applications server support at Tier 0:
FroNtier and ATLAS AMI require the availability of an applications server.
IT-DS provides a J2EE hosting service but the support does not seem sufficient. The SLA of the service says that it only covers "medium-sized, non-critical apps".

  • Oracle streams production setup for the Tier 0 site:
Needed to decouple the production DB from Tier 1/network problems and to test alternative configurations that are being prepared with Oracle.

  • Oracle licenses and support for Tier1 sites must be clarified before the production period starts.

 

The MB approved the proposed plan.

 

Procurement and Resource Scheduling Issues (more information)

 

Attached are the summary of the procurement plans and the collection of emails discussing this issue.

 

The MB agreed that the tables in the MoU should match what the sites plan and, if needed, should be changed accordingly as soon as possible.

 

Action: 12 Dec 05 - Sites should urgently send updates to the capacity tables of the MoU.

 

For 2006 the nominal capacity can be made available by July 2006, but for the following years the required capacity should be available by April each year.

 

At present not all the available capacity is used, but:

-          such capacity is needed in order to provide realistic performance tests in the service challenges

-          the risks in delaying hardware acquisitions too much are high and they need to be reduced

-          the sites need to continue to gain experience into 2006 in order to be ready to provide full hardware capacity in 2007-08.

 

These issues should be explained to the funding agencies.

 

Action: 19 Dec 05 - L.Robertson should explain the Tier-1 milestones and the procurement issues at the next Oversight Board meeting in December.

 

ATLAS Large Scale Test in 2006 (document)

 

Postponed to next MB meeting.

 

Summary of New Actions

 

Action: 28 Feb 06 – The disk-tape tests should be re-instated in the SC3 throughput re-run at the beginning of the year, to be completed by the end of February. Such tests should verify that a steady satisfactory transfer rate can be reached and that past problems have been solved.

Action: 20 Dec 05 - To complete definition of milestone OPN-2 D.Foster should find information about the GEANT2 plans and send it to the MB.

 

Action: 12 Dec 05 - Sites should send updates to the capacity tables of the MoU.

 

Action: 19 Dec 05 - L.Robertson should explain the Tier-1 milestones and the procurement issues at the next Oversight Board meeting in December.

 

The full Action List, current and past items, will be on this wiki page before the next MB meeting.