WLCG Management Board

Date/Time

Tuesday 24 February 2009 – Phone Meeting - 16:00-17:00

Agenda

http://indico.cern.ch/conferenceDisplay.py?confId=49392

Members

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 – 3.3.2009)

Participants

A.Aimar (notes), I.Bird (chair), K.Bos, D.Britton, T.Cass, J.Casey, Ph.Charpentier, L.Dell’Agnello, M.Ernst, Qin Gang, A.Heiss, F.Hernandez, I.Fisk, S.Foffano, M.Kasemann, U.Marconi, P.Mato, G.Merino, A.Pace, R.Pordes, H.Renshall, Y.Schutz, J.Shiers, O.Smirnova, R.Tafirout, J.Templon

Invited

J.Casey

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Mailing List Archive

https://mmm.cern.ch/public/archive-list/w/worldwide-lcg-management-board/

Next Meeting

Tuesday 10 March 2009 16:00-17:00 – F2F Meeting

1.   Minutes and Matters Arising (Minutes)

 

1.1      Minutes of Previous Meeting

The minutes of the previous meeting were approved.

 

2.   Action List Review (List of actions)

 

 

  • SCAS Testing and Certification

This was discussed at the Overview Board and in a long thread of emails on the MB mailing list.

 

  • VOBoxes SLAs:
    • Experiments should respond to the VOBoxes SLAs at CERN (all 4) and at IN2P3 (CMS).
    • NL-T1 and NDGF should complete their VOBoxes SLAs and send them to the Experiments for approval.

 

Below is the latest assessment.

CERN: Done for ALICE, ATLAS, LHCb. CMS still needs to agree on the SLA document.

NL-T1: J.Templon reported that the NL-T1 SLA has been sent to the Experiments for review and approval. ATLAS approved the SLA.

NDGF: O.Smirnova reported that NDGF has sent their SLA proposal to ALICE and is waiting for a reply.

IN2P3: Waiting for CMS.

 

  • 16 Dec 2008 - Sites requested clarification on the data flows and rates from the Experiments. The best approach is to have the information in the form used for the Experiments' data flows (e.g. the Dataflow from LHCb).

 

J.Shiers reported that the CCRC 2007 document is the most recent version and clarifies what information the Sites need. The document will be updated for the WLCG Workshop.

 

  • 17 Feb 2009 - R.Pordes agreed to provide, within 2 weeks, the milestones for OSG reporting installed capacity into APEL.

Done. Presented in this meeting.

 

3.   LCG Operations Weekly Report (Experiments Dashboard Summaries; Minutes; Slides) - J.Shiers

 

Summary of status and progress of the LCG Operations since last MB meeting. The daily meetings summaries are always available here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOperationsMeetings

3.1      Summary

No “major” incidents last week. Usual background of problems that were addressed promptly – see the details in the minutes from the daily calls.

 

It was agreed to add some Key Performance Indicators (KPI) to the reports, such as:

-       Summary of (un)scheduled interventions (including overruns) at main sites,

-       Summary of sites “suspended” by VOs.
Open question: do sites always know that they have been suspended?
Experiments are to report changes of state at the daily meetings.

-       Production / analysis summaries, e.g. the CMS Site Availability report.

3.2      GGUS Summary

No alarm tickets and an increasing number of Team tickets this week.

 

VO concerned   USER   TEAM   ALARM   TOTAL
ALICE             3      0       0       3
ATLAS            16     16       0      32
CMS              13      0       0      13
LHCb              9      2       0      11
Totals           41     18       0      59
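
As a cross-check of the ticket counts above, the short sketch below recomputes the per-VO and per-type totals. The counts are copied from the GGUS summary; the dictionary layout and the function names are illustrative assumptions and not part of any GGUS tool.

```python
# Ticket counts per VO as (USER, TEAM, ALARM), copied from the GGUS summary.
tickets = {
    "ALICE": (3, 0, 0),
    "ATLAS": (16, 16, 0),
    "CMS": (13, 0, 0),
    "LHCb": (9, 2, 0),
}


def row_totals(counts):
    """Total number of tickets for each VO (sum across ticket types)."""
    return {vo: sum(values) for vo, values in counts.items()}


def column_totals(counts):
    """Total number of tickets for each type (sum across VOs)."""
    return tuple(sum(column) for column in zip(*counts.values()))


per_vo = row_totals(tickets)
per_type = column_totals(tickets)  # order: USER, TEAM, ALARM
grand_total = sum(per_type)

print(per_vo)
print(per_type, grand_total)
```

The recomputed values agree with the "Totals" row and the "TOTAL" column reported above.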

 

A test of the alarm tickets will take place soon, before the GDB, so that issues can be reported at the GDB.

 

Slide 4 has some examples of an intervention summary.

 

Slide 5 has an example of unscheduled interventions that should have been better planned.

 

L.Dell’Agnello reported that CNAF had the network down on Saturday night and they will send the SIR soon. He noted that it was only reported as a “team” ticket and not as an alarm. It should have been an alarm because it was a very serious failure.

 

Slides 6 to 8 show how the reliability of the sites working for a VO could be summarized, daily or weekly and with a colour scheme.

 

 

The example above is taken from the current dashboards and J.Shiers will distribute the information used to obtain the pictures above.

He pointed out that this is only an example of a possible visualization, and is not meant as a report on the actual data shown.

 

I.Bird asked whether the action of combining SARA and NIKHEF as a single site in the GOCDB is progressing.

J.Templon replied that a new site registration has been requested. Once it is available, a site BDII must be installed and configured to collect the information from both SARA and NIKHEF into a single BDII, rather than separately as is done now.

 

J.Casey asked that NL-T1 inform the teams at CERN when the change of Site is made. Many tools have the names of the Tier-1 Sites hardcoded (GridMap, GridView, SAM, etc.), so they must all be informed.

 

The same applies to FNAL and BNL: the BDII configuration must be fixed. M.Dimou is following the situation and will do FNAL in March and BNL in April. The goal is to make sure that they report the accounting information correctly.

 

4.   Feedback from the LHCC Mini Review and Overview Board Meeting (Slides) - I.Bird

 

4.1      LHCC Mini Review

The material available on the Mini Review is the following:

-       MMP.pdf presented in the closed sessions by the Review Chairman (Mario Martinez-Perez).

-       Conclusions_from_mini-review.pdf summarized by I.Bird.

 

Note: The other presentations of the closed sessions were not yet available.

 

The recommendations of the reviewers are:

-       Run some form of CCRC09 challenge, as in 2008, to verify conditions where at least ATLAS and CMS run in parallel.

-       Test reprocessing at Tier-1 Sites (recall from tape) and massive/chaotic user analysis at Tier-2 Sites.

-       Define metrics with which to evaluate the activity above.

-       Make sure we are not limited by resources when data comes. Sites should not delay their purchasing by one year.

 

The main comments were:

-       There is a need for an official statement on the 2009/2010 running time and LHC efficiency factor, common to all experiments, so that they can provide a consistent/coherent estimate of the resources needed in 2009/2010. The LHCC Committee will send this statement via M.Martinez-Perez.

-       The Experiments still suffer from MSS performance.

-       Applications Area: very good progress on all fronts, with a very mature organisation that is well managed and giving results.

4.2      WLCG Overview Board

The material available is the following:

-       IGB-POB-Status-230209.pdf presentation by I.Bird at the Overview Board meeting.

-       Summary_of_OB.pdf

 

Below are the main topics discussed:

 

LHC schedule:

-       I.Bird will circulate efficiency numbers, when received from the LHCC (ACTION),

-       Experiments need to reassess their new requirements for MB in the week after CHEP (ACTION)

 

New Action:

First week of April - Experiments present to the MB (Exp. with spokespersons) the assessment of their requirements for 2009-2010.

 

SCAS readiness:

-       A.Van Rijn will send an updated plan

 

INFN Resources:

-       INFN will agree their plan with the Experiments. The 2009 resources will be installed by April 2010, but INFN can buy 2010 resources earlier if really needed and urgent.

 

EGI:

-       The OB will send a statement to the EGI_DS PB. (ACTION: must quantify list of services)

-       The OB endorses the need for a gLite consortium.

 

The draft proposed is the following.

 

OB Statement on EGI

 

The WLCG Overview Board strongly supports the creation of a European Grid Infrastructure based on National Grid Initiatives with a European level coordination. In particular WLCG will rely on the National infrastructures to provide operational tools and services for the Tier 1 and Tier 2 sites in each country, and requires a European coordination body with which it, as an application community, can work together on requirements and evolution of the services. The Overview Board also supports the concept of a Specialised Support Centre for High Energy Physics, and WLCG would collaborate with EGI.org in the setting up of such an organisation.

 

The Overview Board is concerned about the timescales involved, in particular the timing of a transition between EGEE and the EGI/NGI model, which comes at a time during the first year of accelerator running when the disruption of existing services will be least tolerable. To this end the WLCG will work together with EGEE and the EGI_DS projects to propose and evaluate acceptable transition scenarios. There is also concern over the preparedness of the NGIs to be able to take over the core operation in 2010, and the Overview Board would like to see evidence of progress of the NGIs committing themselves to the EGI model.

 

 

 

US participation in ion program:

-       It should be resolved. OB endorses no multiple solutions (ACTION: I.Bird will follow up in consultation with ATLAS/CMS, etc).

 

Under the ALICE and CMS ion programs the US funding agencies asked for payment for the additional transatlantic network, but the OB asked that this be resolved by the US funding agencies.

 

I.Fisk asked for the names of the funding agencies that are asking for additional payment for the network.

I.Bird replied that the contact persons are D.Petravik and Bolek Wyslouch.

 

5.   Plans for Reporting OSG installed capacity (Slides) - R.Pordes

 

R.Pordes reported the plans of OSG, US ATLAS and US CMS on reporting the installed capacity at the OSG Sites.

The goal is to report installed capacity and information about the usage of such resources.

5.1      Timeline

Below is the timeline and progress of this activity.

 

Date | OSG Deliverable | Status
February 15 '09 | Complete development of GIP release and configuration scripts to meet needs for Site Reporting of Dynamic usage information. | Done. One patch identified as needed.
February 23-March 10 | Evaluation of mechanisms for monthly reporting of Installed Capacity from management to WLCG management reports. | Discussion with US LHC management, GOC, EGEE, for publishing of static reports from GOC to WLCG office (APEL). Decision on Mar 10th.
March 15 '09 | OSG release with needed GIP and Configuration scripts. | In test on Integration Test bed now.
March through May | Deployment on US LHC sites. Will depend on US LHC Tier-2 coordinators to encourage sites to upgrade. | -
May 1 '09 | Finish development of publishing and validation scripts for OSG and WLCG reports. | Downloaded GIP validation scripts from EGEE. Starting to install/look at these now.
July 1 '09 | Dissemination and validation of published reports. | -

 

I.Bird asked when the first draft report will be ready.

R.Pordes replied that for installed capacity it should be in May.

 

Discussions are taking place between OSG, US ATLAS and US CMS management. The intention is to share this plan with the agencies over the next month.

The monthly installed capacity/resources and the dynamic information for the VOs will be reported through 2 separate paths.

 

Note: One difference with EGEE is that all US Tier-1 and Tier-2s are each supporting a single VO. Therefore there are no issues of transfer of availability between VOs.

 

In addition, OSG is working on contributions to the “WLCG Installed Capacity Deployment Plan” (with F.Donno). They are:

-       Writing “OSG configuration and installation” (today or tomorrow)

-       Working on table with attributes for OSG software tools group (draft distributed)

-       Working on more detailed internal OSG documents and will move relevant information into WLCG document.

5.2      Monthly Installed Capacity Reporting

OSG will report the minimum of the pledged and the installed capacity; installed capacity above the pledges will not be reported. Meanwhile the dynamic information for the VOs will be correct.
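
The reporting rule above can be sketched as follows. This is only an illustration of the "minimum of pledged and installed" rule: the function name and the numbers are hypothetical and the units are left unspecified, since the minutes do not give them.

```python
def reported_capacity(pledged, installed):
    """Value reported to WLCG management under the rule stated above.

    Installed capacity above the pledge is not reported, so the result
    is capped at the pledged value.
    """
    return min(pledged, installed)


# Hypothetical figures, for illustration only.
print(reported_capacity(pledged=1000, installed=1200))  # installed exceeds pledge
print(reported_capacity(pledged=1000, installed=800))   # installed below pledge
```

With this rule a site that has installed more than it pledged appears at exactly its pledge, while a site below its pledge appears at its real installed value.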

 

The WLCG working group, GOC and EGEE monitoring are to evaluate the technical impact of directly publishing information that is input directly (and thus vetted by definition) by the US LHC management.

 

The information sent to APEL for WLCG management will be “text based”, and the values (CPU and storage per VO, both averages and peaks) are calculated on the OSG side.

 

M.Kasemann asked why the report gives the average rather than the storage at the end of the period.

R.Pordes replied that the agreement is that both “average” and “peak” installed capacity values will be reported.
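
To illustrate the difference between the values discussed above, the sketch below computes the average, the peak and the end-of-period value from a series of capacity samples. The sampling scheme, the function name and the numbers are assumptions made for this example; the minutes do not describe how OSG actually calculates the values, and only the average and the peak are agreed to be reported.

```python
from statistics import mean


def summarize_capacity(samples):
    """Summarize installed-capacity samples taken over one reporting period.

    `samples` is a chronologically ordered list of capacity values
    (hypothetical, e.g. one per week). The end-of-period value is
    included only for comparison with the average and the peak.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return {
        "average": mean(samples),
        "peak": max(samples),
        "end_of_period": samples[-1],
    }


# Hypothetical storage samples: capacity grows during the period.
summary = summarize_capacity([100, 100, 150, 250])
print(summary)
```

When capacity is added during the period, the average is lower than the end-of-period value, which is the distinction raised in the question above.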

5.3      Dynamic Information to the VOs

This information is for the VOs (i.e. not for the WLCG management reporting)

 

US CMS

Tier-2 sites have correct computing data and ~50% have correct storage data. Additions are in progress.

 

I.Bird asked whether it is now corrected because the Sites are using the latest version of dCache.

R.Pordes replied that this is likely the case.

 

US ATLAS

US ATLAS is working on making the information correct on OSG. It is currently US ATLAS policy not to publish resource discovery information to WLCG. US ATLAS may publish ATLAS sites to WLCG, but only for the usage attributes (i.e. not for resource discovery).

 

I.Bird asked whether, because the US T2s are invisible outside OSG, non-US users cannot access them.

R.Pordes replied that this is already the situation.

K.Bos replied that ATLAS does not use only the BDII: they can use Panda to submit their jobs to the US Sites.

 

I.Bird asked about the limitations of the cloud.

K.Bos replied that they try to limit data access to within a cloud, rather than across clouds.

 

US ATLAS and US CMS software and computing management are talking to ATLAS and CMS management about whether this dynamic information is already available in the VO services/databases. If so, is this additional information really still needed from the Grid layer (GIP/BDII)?

 

6.   AOB

 

 

No Meeting next week, because of the EGI Workshop in Catania.

6.1      Next F2F and GDB Meetings

The next meeting will be the F2F meeting in 2 weeks. The agenda should be coordinated with the GDB.

 

J.Gordon proposed the following:

-       Installed capacity (F.Donno)

-       Pilot jobs and SCAS status and schedule

-       CREAM installations and roll-out

-       WMS performance

-       Access to Accounting Data Policies (D.Kelsey)

 

On installed capacity, F.Donno will be replaced by S.Traylen during her absence.

 

J.Templon asked about deployment of 64 bit systems and how to limit the usage of threads and memory.

J.Gordon replied that this would be a good topic for a pre-GDB discussion or the GDB. However, it should be distributed to the Sites beforehand so that they come prepared to the meeting.

 

7.   Summary of New Actions

 

 

New Action:

4 March 2009 - M.Schulz to present the list of priorities for the Analysis working group.

 

New Action:

First week of April - Experiments present to the MB (Exp. with spokespersons) the assessment of their requirements for 2009-2010.