LCG Management Board

 

Date/Time:

Tuesday 7 February 2006 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a057121

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 17.2.2006)

Participants:

A.Aimar (notes), S.Belforte, I.Bird, K.Bos, N.Brook, T.Cass, Ph.Charpentier, D.Foster, J.Gordon, M.Lamanna, E.Laure, H.Marten, P.Mato, G.Merino, B.Panzer, R.Popescu, R.Pordes, Di Qing, L.Robertson (chair), Y.Schutz

Action List

https://uimon.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 21 February 2006 from 16:00 to 18:00

1. Minutes and Matters Arising (minutes)

 

1.1 Minutes

No feedback was received. The minutes of the previous meeting were approved.

 

1.2 Matters Arising

 

FZK SC4 throughput tests in April

In April FZK will be in a “hardware set-up” phase and will have difficulty performing the SC4 throughput tests in that period.

This will be discussed with J.Shiers and the SC4 team.

 

SC4 Sites Planning

The initial site plans for SC4 (compiled by F.Donno) are available on the LCG/Planning page in the section: SC4 Planning.

 

Sites and experiments should review the plans and send their feedback to F.Donno.

2. Action List Review (list of actions)

 

  • End 2005 - CMS provide a more “Tier-1 accessible” description of their models, data and workflow.

    Done – draft provided on 11 February. The documents are all accessible from LCG/Experiments under the heading: Simplified Computing Models.

  • 20 Jan 06 - CNAF provide a plan for the deployment of CASTOR 2 at their site.

    Done. All plans are accessible from LCG/Planning under the heading: Castor 2 Plans.

  • 31 Jan 06 - B.Panzer will discuss with the experiments and present to the MB a plan with possible dates and resources for the experiments and for IT activities (performance tests, etc). The plan will include a proposal for the ATLAS TDAQ large tests.

    In progress. A note was sent to the MB members and the plan will be presented during this MB meeting.

  • 31 Jan 06 - J.Gordon will prepare a presentation on the situation of grid accounting.

    Done. Presented during this MB meeting.

  • 31 Jan 06 - D.Foster should find information about the GEANT2 plans and send it to the MB. Needed to complete definition of milestone OPN-2.

    Done. Presented during this MB meeting.

  • 31 Jan 06 - L.Robertson will discuss with the area and service managers in order to define measurable metrics for Phase 2.

    Done. The metrics that will be adopted were presented by H.Renshall at the Mumbai SC4 workshop.

 

3. EGO status - E.Laure (presentation)
Including report on the EGO workshop on 30/31 Jan 2006

 

The EGEE Project Office organized a workshop to discuss a strategy towards a permanent European Grid Organization (working name “EGO”), with the participation of the EGEE MB and of Kyriakos Baxevanides from the EU.

 

The goal of EGO is to provide a sustainable grid infrastructure, as a follow-up to the EGEE 2 project. A summary by W.Von Rueden will be distributed.

 

The presentation at the MB described the goals and the open issues. The model followed is that of GEANT for the network organization. EGO would be a consortium of the partners involved in EGEE 1 and 2 and would provide an infrastructure in collaboration with the National Grid Initiatives (NGIs). Slide 4 shows the list of NGI partners already working with EGEE.

 

The resources would be owned by the NGIs; EGO would enable them to work together and share common channels of communication and operations. Slides 5 and 6 show that EGO would provide:

-          Inter-operation among the NGI infrastructures, coordination of common services, and interaction with user communities (VOs) and other organizations (EU, GEANT, etc.)

-          Testing and certification of middleware, as well as coordination of the NGI support groups.

-          Dissemination, including media relations, event organization and public surveys, as well as public outreach and training of NGI management and operations.

 

Industry could benefit by making commercial use of this infrastructure or by acting as a sub-contractor for support, operations and infrastructure. Industry could also contribute via manpower (joint projects) and sponsorship (equipment).

 

The funding could come from several channels (NGIs, EU FP7, open lab, etc.) but this is all still under discussion. The organization would be governed as a separate entity, initially hosted at CERN but later moving to separate premises.

 

Slide 9 shows the calendar of key events, with the target date of Q1 2008 for the constitution of EGO.

 

The open issues that still need to be discussed are (slide 10):

-          What type of entity should EGO be (a consortium or an organisation)?

-          Should EGO be involved in grid development? Have R&D projects?

-          What is the right model for funding involving the EU and the NGI organizations?

-          How will it interact with user communities and bring benefit to industry?

 

A short discussion followed:

EGO’s focus will not be limited to Europe, because interoperability, standardization and collaboration with other organizations are fundamental, and the LCG should not have to interact with many grid bodies and separate regional organizations.

 

Currently some Tier-2 sites depend on EGEE funds for operations, but in the future the funding will have to come from other sources.

 

4. OSG status - R.Pordes (presentation)

 

Status of the middleware support:

-          Condor and Globus have funding for 5 more years from NSF.

-          GriPhyN has ended; perhaps one more year will be available to bridge the gap for VDT.

-          PPDG and iVDGL will complete by end of FY2006.

 

Slide 3 describes the organization chart and the next steps. The OSG consortium is being transformed into a project-like structure, with the goal of delivering and operating a Distributed Facility to meet LHC needs and timescales.

 

OSG includes three thrusts (slide 4):

-          Distributed Facility: including operation, security, middleware etc in order to define a long-term infrastructure

-          Education, Training and Outreach: with grid schools, minority and international outreach, etc

-          Science Driven Extension: adding capacity to the Facility via joint projects with external teams and organizations

 

The request for funds (in response to the SciDAC 2 call) has been sent; see slides 5 and 6 for the funding levels and the scope. A single proposal was sent to multiple program offices in DOE and NSF.

 

It is expected that the proposal will be answered by June 2006 and, if the answer is positive, work should start in FY07 (July or September 2006) for 3 or 5 years.

 

The cooperation between OSG and EGEE to deliver to the WLCG is continuing. Collaboration focuses mostly on:

-          OSG 0.4.0 includes the EGEE information providers, publishing through OSG-BDII to LCG-BDII, and support (to date) for RB submissions for ATLAS, CMS and Geant4.

-          Common testing and monitoring tools (SFT and ACDC).

-          OSG 0.4.1 is working on testing interoperation and collaboration with gLite-CEMON.

-          Synchronizing the schedules for the SRM deployment.

-          There are common concerns about the VOMS ADMIN situation, with issues of compatibility of role-based authorization between EGEE and OSG.

 

OSG is also working on interoperability with TeraGrid and contributing to the Multi-Grid Interoperability work between EGEE and OSG.

 

 

5. Accounting issues and policy - J.Gordon (document, transparencies)

 

 

There is no single solution for accounting: there are multiple grids and therefore multiple accounting systems have been developed.

 

The accounting system in production since December 2004 is APEL, with about 180 sites publishing data and 100K records per week. Slides 6 to 10 show several examples of views and aggregations of the accounting data.

 

The issues currently open are:

-          Incomplete deployment of APEL by the sites.
Only 80% of the sites publish their accounting data. Some sites do not have the correct configuration, while others refuse to collect and publish their data. Tier-1 sites and ROCs should help to persuade more sites.

-          Legal and privacy concerns.
APEL does not publish information on individual usage. APEL only records the activity, and the data could also be aggregated in order to protect individual privacy.

-          Validation of the data. Are all records collected? Is the normalisation correct?
There is no easy way to check the data, and quite often jobs are submitted locally without using the RB. Sites should check the logs and compare them with their existing local accounting systems.

-          Level of detail.
Sites can protect privacy by providing only aggregate numbers, but is this acceptable to the VOs? Should this data be centralized and also include local site submissions? Currently the summary is “every week the usage of each site by each VO” (a small aggregation sketch follows this list).

-          Other resources to account for in addition to CPU.
Storage could be accounted for once a day, along the lines of the GGF Usage Record schema (GGF UR). Memory and network usage cannot easily be accounted for.

-          Maintenance problems for APEL.
The system depends on data coming from many log files and other sources (Globus, batch systems, LSF, gLite, etc.); they are all different and change format at almost every release.

-          Standards and interoperability.
The GGF UR schema is being investigated (slides 20 to 23) and a web-service front-end is being developed in the UK (Manchester).
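
As an illustration of the level-of-detail point above, the following is a minimal sketch in Python (not APEL code) of how per-job records could be normalised to KSI2K and aggregated into the weekly per-site, per-VO summaries before publication; the record fields, the scaling factors and the example data are all invented for illustration.

# Minimal illustrative sketch, not APEL code: per-job records are normalised
# with a per-site KSI2K factor and aggregated by (site, VO, week), dropping
# the per-user detail so that individual privacy is preserved.
from collections import defaultdict
from datetime import date

# Hypothetical per-job records: (site, vo, user, raw_cpu_hours, week_ending).
job_records = [
    ("SiteA", "atlas", "user1", 12.0, date(2006, 2, 5)),
    ("SiteA", "atlas", "user2", 30.0, date(2006, 2, 5)),
    ("SiteA", "cms",   "user3",  8.0, date(2006, 2, 5)),
]

# Hypothetical scaling factors from raw CPU hours to normalised KSI2K-hours.
ksi2k_factor = {"SiteA": 1.1}

def weekly_summary(records):
    """Aggregate normalised CPU usage to (site, vo, week) totals."""
    totals = defaultdict(float)
    for site, vo, _user, cpu_hours, week in records:
        totals[(site, vo, week)] += cpu_hours * ksi2k_factor.get(site, 1.0)
    return dict(totals)

for (site, vo, week), ksi2k_hours in sorted(weekly_summary(job_records).items()):
    print(f"{week}  {site:8s} {vo:6s} {ksi2k_hours:8.1f} KSI2K-hours")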

 

The MB discussed the mandatory deployment of accounting at all sites, whether accounting should also include local job submissions, and the granularity of the data (VO, groups or users).

 

Decision:

As also stated in the MoU, accounting from the sites should be mandatory at the VO group level, and the data should be submitted to a central repository. All LHC usage (grid and locally submitted work) must be accounted for. Reports to the CB and to the RRB will only take into account the data in the central accounting repository.

 

Action (to report to GDB):

1 Apr 2006 - Sites should provide information to the GDB about any legal issue in their country concerning the reporting of accounting data.

 

Action:

28 Feb 2006 – J.Gordon will ask the sites to install the accounting software. If they don’t do it, they should explain the reasons for not doing so. J.Gordon will report to the MB early in March.

 

6. OPN update - D.Foster (transparencies)

Following the meeting on 31 January

 

The Optical Private Network for the LHC includes all the infrastructure needed for connecting the Tier-0, Tier-1 and Tier-2 sites and relies on many different networks (e.g. USLHCNet, NRENs, GEANT, the Internet).

 

A first clear objective is to move data quickly and reliably from the Tier-0 to the Tier-1 sites, but the complexity increases when also looking at the Tier-2 sites and all the possible interconnections.

 

Currently the status of the different parts of the LHCOPN to the Tier-1 sites is:

-          Complete: SARA, IN2P3, CNAF, FNAL

-          Spring: FZK, BNL

-          Mid-Year: RAL, NDGF, TRIUMF

-          End-Year: PIC

-          Unclear: ASGC (plans to go to 2x2.5 Gb/s)

 

Probably 6 sites will be ready for April, but 3 of them will not be on GEANT. Therefore the “OPN milestones” in the WLCG High Level Plan should be modified accordingly.

 

The open issues and problems to solve are:

-          Cost sharing with the GEANT partners has not been agreed yet and negotiations are still ongoing.

-          Some links are not dedicated but also carry other traffic; they will become fully available when needed.

-          Monitoring and Operations are not very clear yet. Several options exist but the exact processes have not been defined yet.
For Operations there are several Network Operations Centers involved (NREN NOCs, DANTE, a proposed EGEE NOC, etc) and it would be useful to have a single LCG entry point for network operations issues.

-          Many network operations are not supported at a 24x7 level.

 

Tier-1/Tier-1 connections are not organized as part of the OPN and are discussed directly between the sites, presumably using standard Internet connectivity. A connection via CERN could be provided, but it is not planned unless this turns out to be really needed. Backup links are usually available for all connections and several options are possible. The focus now is on the main links to the Tier-1 sites.

 

More information on the OPN status is available at the site: http://lhcopn.cern.ch

 

7. Scheduling of the CERN resources in 2006 - B.Panzer (document)

 

 

The document describes the planning for the IT resources at CERN for 2006. It includes allocations of CPU, disk storage and tapes.

Pages 3 and 4 list the planned activities. Experiments should check that their plans agree with this general schedule.

 

Clashes will have to be discussed. An example is on page 5: during the 4 weeks of the “ATLAS DAQ Tests” in October, the other experiments will have a reduced amount of resources available (e.g. in week 45 only 55% of the CPU will be available to the other experiments).

 

B.Panzer will produce a detailed week-by-week table describing the resources needed by each experiment for every week of 2006. Once this initial schedule is done, all clashes will have to be discussed and negotiated.
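
As an illustration of what such a week-based table could support, the sketch below is purely hypothetical (the capacity figure, the experiment requests and the pro-rata rule are not from the planning document); it flags the weeks in which the summed requests exceed the available CPU, as in the week-45 example above.

# Minimal illustrative sketch, not the actual planning document: a week-indexed
# table of CPU requests per experiment, checked against a total capacity to
# flag clashes. All numbers and names are invented.
TOTAL_CPU_KSI2K = 1000.0  # hypothetical total CERN CPU capacity

# Hypothetical requests in KSI2K, keyed by 2006 week number.
requests = {
    44: {"ATLAS": 450.0, "CMS": 300.0, "LHCb": 150.0},
    45: {"ATLAS": 450.0, "CMS": 400.0, "ALICE": 300.0},  # over-subscribed week
}

def find_clashes(requests, capacity):
    """Return weeks where the summed requests exceed the capacity, with a
    simple pro-rata scaling as one possible starting point for negotiation."""
    clashes = {}
    for week, per_exp in sorted(requests.items()):
        total = sum(per_exp.values())
        if total > capacity:
            scale = capacity / total
            clashes[week] = {exp: cpu * scale for exp, cpu in per_exp.items()}
    return clashes

for week, allocation in find_clashes(requests, TOTAL_CPU_KSI2K).items():
    print(f"Week {week}: requests exceed capacity; pro-rata allocation:")
    for exp, cpu in allocation.items():
        print(f"  {exp:6s} {cpu:7.1f} KSI2K")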

 

Action:

28 Feb 2006 - Experiments should clarify with B.Panzer their exact schedule (week numbers) and the resources needed (in KSI2000) in 2006 so that the whole plan can be consolidated.

 

8. AOB

 

Because of the CHEP conference, the next meeting is in two weeks.

 

9. Summary of New Actions

 

 

Action (to report to GDB):

1 Apr 2006 - Sites should provide information to the GDB about any legal issue in their country concerning the reporting of accounting data.

 

Action:

28 Feb 2006 – J.Gordon will ask the sites to install the accounting software. If they don’t do it, they should explain the reasons for not doing so. J.Gordon will report to the MB early in March.

 

Action:

28 Feb 2006 - Experiments should clarify with B.Panzer the exact schedule (week numbers) and resources needed (in KSI2000) in 2006.

 

The full Action List, with current and past items, will be on the wiki page before the next MB meeting.