LCG Management Board

Date/Time

Tuesday 10 February 2009 – F2F Meeting - 16:00-18:00

Agenda

http://indico.cern.ch/conferenceDisplay.py?confId=49390

Members

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 – 13.2.2009)

Participants

A.Aimar (notes), D.Barberis, I.Bird (chair), K.Bos, M.Bouwhuis, D.Britton, T.Cass, Ph.Charpentier, L.Dell’Agnello, S.Foffano, Qin Gang, J.Gordon, F.Hernandez, M.Kasemann, P.Mato, G.Merino, S.Newhouse, B.Panzer, R.Pordes, H.Renshall, Y.Schutz, J.Shiers, O.Smirnova, R.Tafirout

Invited

R.Forty, F.Gianotti, M.Girone, H.Meinhard, J.Virdee

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Mailing List Archive

https://mmm.cern.ch/public/archive-list/w/worldwide-lcg-management-board/

Next Meeting

Tuesday 10 February 2009 16:00-17:00 – F2F Meeting

1.   LHC Schedule after the Chamonix Workshop (Slides)

 

1.1      Introduction

Before the usual agenda, I.Bird presented the latest news about the LHC schedule so that it could be discussed at the MB meeting. He summarized the previous discussions on the topic.

 

The spokespersons (or their representatives) of the 4 LHC Experiments were invited to participate in this discussion.

 

Before the Workshop this was the situation.

 

November 2008

The WLCG MB had agreed that with the information currently available and the present understanding of the accelerator schedule for 2009:

-       The amount of data gathered in 2009 is likely to be at least at the level originally planned; with pressure to run for as long a period as possible, it may be close to or exceed the amount originally anticipated for 2008 and 2009 together

-       The original planning meant that the capacity to be installed in 2009 was still close to x2 with respect to 2008 as part of the initial ramp up of WLCG capacity

-       Many procurement and acceptance problems arose in 2008 which meant that the 2008 capacities were very late in being installed; there is a grave concern that such problems will continue with the 2009 procurements

-       The 2009 procurement processes should have been well advanced by the time of the LHC problem in September

 

The WLCG MB thus does not regard the present situation as a reason to delay the 2009 procurements, and we urge the sites and funding agencies to proceed as planned.  It is essential that adequate resources are available to support the first years of LHC data taking.

 

January 2009

The MB agreed to wait for the results of the Chamonix Workshop (Feb. 6) to understand better the likely running schedule for 2009 and 2010.

 

The goal was to:

-       Prepare an updated plan for resource procurement, installation and commissioning, taking into account the new schedule and site constraints

-       Discuss this plan with the LHCC on Feb. 16 (mini-review) and following days

-       Present this plan to the RRB in April.

1.2      Implications of the Workshop in Chamonix

Below is the schedule agreed at the Workshop in Chamonix.  

 

 

In the timeline above one can see:

-       Injection in Sept. 2009

-       Collisions in Oct. 2009

-       A long run that will last about one year, starting in November 2009.

-       The start-up is delayed with respect to what was anticipated, but the LHC will run over the winter.

-       A short stop for LHC over Christmas. But one should check the Experiments’ work during that period (i.e. no stop for WLCG).

-       Energy will be limited to 5 TeV

-       Heavy Ion run is scheduled at the end of 2010.

-       At the end of 2010 there will be 6 months of shutdown.

 

In practice, this scenario takes the WLCG back to the original capacities and resources planned for 2009 and 2010.

 

2009
It is like the originally planned 2009 run, just shifted to October, and it will continue until the original 2010 run. Therefore all 2009 resources should be commissioned by September 2009.

2010
The 2010 run will start by April 2010 as planned. Disk installations could be staged as already discussed in the past at the MB (April and August?).

1.3      Possible Issues

The schedule has some implications for Sites and Experiments:

-       Sites will have to provide installations while the run is on, obviously without impact on the Experiments’ activities.

    • Are there Tier 1 issues with the installation scheduled for 2010?
    • Funding agencies (and some sites) could see the delay as a reason to push back all procurements by ~ 1 year (i.e. 2009 is like 2008 should have been etc.)

-       Experiments will not be able to change the original requirements

    • No change in budgets, but delay in some cases will allow for more resources for same cost
    • How to handle the ATLAS request for additional Tier 0 resources?
    • How do Experiment models deal with no shutdown?

 

What is the counter-argument? (Cosmics will not suffice as justification.) We must ensure that we have adequate resources to rapidly exploit the data from this first period of running; the computing must not be the obstacle to extracting physics.

 

The goal is to get physics out as rapidly as possible:

-       Is it useful to revisit the idea of a full analysis simulation exercise now (at least ATLAS+CMS simultaneously)?

1.4      Comments

ATLAS

D.Barberis stated that ATLAS is reassessing its requirements, but they are not yet available. Without details and subject to possible minor changes, he summarized that for ATLAS:

-       2009 is similar to what was originally planned: ATLAS will probably need the same 2009 resources, just shifted from April to October.

-       2010 will start in April as originally planned and the requirements will not change much.

 

LHCb

Ph.Charpentier noted that there are many other parameters of the schedule to discuss, not just the number of days of running (seconds of running, efficiency, event rates, etc.). How many seconds of running will there be over the 44 weeks of the run? If the efficiency is lower, fewer resources will be needed. These figures must be known in detail before requirements are defined.

 

F.Gianotti (ATLAS) replied that initially the duty cycle will be low (5-10%) but it could go up quickly. It is difficult to predict today how quickly this will happen. However, 5 TeV will be reached progressively but quite soon, i.e. as soon as possible.
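
As a rough illustration of the question raised above, the sketch below converts the 44 weeks of run and the 5-10% initial duty cycle quoted in the discussion into seconds of effective running. It assumes, purely for illustration, that the duty cycle stays constant over the whole period, which the discussion explicitly does not expect; the function name live_seconds is hypothetical and not part of any Experiment's planning tools.

```python
# Back-of-the-envelope estimate only: assumes a constant duty cycle over
# the whole run, which is a simplification made for this illustration.
SECONDS_PER_WEEK = 7 * 24 * 3600


def live_seconds(weeks: float, duty_cycle: float) -> float:
    """Return the seconds of effective running for a run of `weeks` weeks.

    `duty_cycle` is the fraction of wall-clock time spent running (0 to 1).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return weeks * SECONDS_PER_WEEK * duty_cycle


if __name__ == "__main__":
    # 44 weeks and 5-10% are the figures quoted in the discussion.
    for duty_cycle in (0.05, 0.10):
        seconds = live_seconds(44, duty_cycle)
        print(f"duty cycle {duty_cycle:.0%}: about {seconds / 1e6:.1f} million seconds")
```

Under these assumptions the result is roughly 1.3 to 2.7 million seconds, which shows how strongly the resource needs depend on the efficiency figures that are not yet known.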

 

CMS

J.Virdee added that the amount of data will be determined by how many events the Experiments decide to store. The availability of disk, for instance, must be a limiting factor for the data-taking. The long run could provide more complex events and how to manage them could be an issue. Experiments need some time to understand the implications of a long run. It is important that Sites are able to provide new resources while the LHC is in full run and WLCG data is being recorded and distributed.

 

M.Kasemann agreed with D.Barberis that 2009 and 2010 are going to be like the original full nominal years. Simply 2009 is shifted from April to October 2009 and is now contiguous with the start of the run in 2010. The 2009 resources should be ready by September 2009 and the 2010 by April 2010.

 

I.Fisk noted that adding new resources while the systems are running is likely not a problem. But if there are scaling limitations they will be reached by the long run. Tests in 2009 should try to identify and solve possible scaling limitations (queues, DBs, etc). The number of users will be at a level never reached until now.

 

J.Gordon asked when the Experiments’ detectors will be closed and will start taking data. Will they start with cosmics initially, even without collisions?

F.Gianotti replied that the schedule for closing the detector will be defined by early March.

J.Virdee confirmed that CMS will have 4-6 weeks of cosmics before collisions generating about 1-1.5 PB of data.

 

ALICE

Y.Schutz confirmed that ALICE is also preparing its new requirements after the Workshop in Chamonix. ALICE needs the 2009 resources by September and the 2010 resources at the end of 2010, for its heavy ion runs.

 

J.Virdee noted that organizing CCRC09 seems difficult (CMS is not in favour) and one will have to rely on the fact that the Experiments run at the same time, but it cannot be a coordinated effort.

I.Bird replied that one should at least test how many users can be supported before the system breaks. If it is not called “CCRC09”, there should be other specific coordinated tests to be performed. For instance, the full-scale User Analysis tests of the Experiments should overlap.

 

G.Merino asked that Experiments provide their estimates in terms of quarters so that Sites can tune their installations.

 

M.Bouwhuis (NL-T1) stressed that the Experiments should use the same formats for specifying their estimates. Sites have difficulty understanding different units and descriptions.

 

I.Bird concluded that the Experiments’ new plans and requirements must be ready by mid-March in order to be presented at the RRB in April.

 

2.   Minutes and Matters Arising (Minutes)

 

2.1      Minutes of Previous Meeting

FZK has explained in an email the delay in reporting an issue to the Operations meeting.

 

I.Bird suggested that this was an isolated issue and that it does not need further discussion. Item closed.

 

No other comments. The minutes of the previous MB meeting were approved.

 

3.   Action List Review (List of actions) 
 

Not all due actions were discussed.

  • VOBoxes SLAs:
    • Experiments should respond to the VOBoxes SLAs at CERN (all 4) and at IN2P3 (CMS).
    • NL-T1 and NDGF should complete their VOBoxes SLAs and send them to the Experiments for approval.

NL-T1: D.Barberis reported that ATLAS approves the SLA from NL-T1.

NDGF: O.Smirnova reported that NDGF has sent their SLA proposal to ALICE and is waiting for a reply.

 

4.   LCG Operations Weekly Report (Minutes daily meetings; Slides; WLCG Collaboration workshop agenda) – J.Shiers

 

Summary of status and progress of the LCG Operations since last MB meeting. The daily meetings summaries are always available here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOperationsMeetings

No “major” incidents last week. There was the usual background of problems that were addressed as they arose. Experiments were pleased with the quick replies from the Sites.

 

Some “SIRs” are still outstanding:

-       Interim report from FZK on Oracle problems affecting FTS/LFC

-       Report on CMS 500 lost files at FZK (see last week’s minutes).

-       High failure rate seen with ATLAS transfers to FZK (20-80%); the problem is understood and requires compacting the DB

 

No Alarms in the GGUS tickets this week.

 

VO concerned   USER   TEAM   ALARM   TOTAL
ALICE             3      0       0       3
ATLAS            25      8       0      33
CMS               5      0       0       5
LHCb             11      2       0      13
Totals           44     10       0      54

 

Slides 4 and 5 explain issues at FZK with their ORACLE DB and dCache. The response time with ORACLE should be clarified.

 

J.Shiers proposed a few questions about the coming WLCG Workshop:

  1. Experiment plans: Sites often say that these are not clear to them. Could a site provide a rapporteur to collect and present the information in the right format?

 

D.Barberis commented that if a template is defined the Experiments will probably be happy to complete it.

G.Merino agreed that it is useful to prepare a template.

 

  2. Given the recent LHC news does it make sense to have a talk from LHC operations or should we save this for later?
  3. More suggestions are welcome – they typically come in the last days (or hours) which can be difficult to handle.

 

5.   Changing the Day of the GDB Meeting (Slides) – J.Gordon

 

At the previous F2F meeting it was proposed to move the GDB to Tuesday, finishing by 4 PM. J.Gordon summarized the feedback received during the last month.

 

The factors in favour of this change were that GDB + MB would only take one day. In addition, it would be better if the MB followed the GDB, in order to approve right away the proposals coming from the GDB. On the other hand, the MB often needs some time before approving a proposal just made by the GDB.

 

The negative aspects are that people already have other Tuesday meetings scheduled and that travelling on Monday clashes with the EGEE/WLCG Weekly Operations Meeting. There would be little time to prepare GDB issues for the MB immediately afterwards and no time for pre-GDB meetings, which usually need to report to the GDB.

 

Given that moving the GDB causes changes for more people, why not move the F2F MB to Wednesday instead?

 

Proposal:

The proposal from J.Gordon is to leave the GDB and MB meetings unchanged.

 

I.Bird supported the proposal, provided that the F2F MB and the GDB avoid overlapping presentations in the future. This implies that the agendas are discussed at the MB before the GDB.

 

6.   AOB

 

6.1      Frequency of the MB

I.Bird asked whether the frequency of the MB meetings could be reduced to 2 MB meetings per month: one F2F meeting and one 2-hour phone meeting. To be discussed next week.

6.2      Tier-1 Reviews

Should Tier-1 reviews be organized as had been mentioned? How should they be organized? By visiting the Site, for instance? With a standard Q&A to fill in? Are the Experiments going to review the Tier-1 Sites?

 

J.Gordon suggested that maybe the Sites could review each other, on a rotating basis, with the Experiments’ participation.

F.Hernandez added that in his opinion it would also be very useful if the Sites shared best practices, rather than being reviewed.

 

I.Bird agreed that maybe a “review” is not the purpose and that it is more about “sharing knowledge”.

K.Bos noted that the ATLAS Tier-1 visits were very useful. The Sites who felt reviewed were the most difficult to work with. It should be a visit to the Sites, not a review.

 

7.    Summary of New Actions