LCG Management Board

Date/Time:

Tuesday 18 December 2007 16:00-17:00 – Phone Meeting

Agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=22191

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 2.1.2008)

Participants:

A.Aimar (notes), I.Bird, K.Bos, T.Cass, Ph.Charpentier, J.Gordon, F.Hernandez, M.Ernst, J.Knobloch, H.Marten, P.Mato, L.Robertson (chair), Y.Schutz, J.Shiers, O.Smirnova

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Mailing List Archive:

https://mmm.cern.ch/public/archive-list/w/worldwide-lcg-management-board/

Next Meeting:

Tuesday 8 January 2008 16:00-18:00 – F2F Meeting at CERN

1.    Minutes and Matters arising (Minutes) 

 

1.1      Minutes of Previous Meeting

Minutes of the previous meeting approved.

1.2      Experiments' Representatives for the HEP Benchmarking

The HEP Benchmarking working group is setting up several computers with different configurations (mostly different processor types) in order to prepare for:

-       The execution of the SPEC 2k and 2k6 benchmarks.

-       The tests of HEP Applications by the LHC Experiments.

 

The goal is to start the Experiments’ tests in January. The Experiments were therefore asked to propose the person(s) who will be responsible, for each Experiment, for collaborating with the HEP Benchmarking working group.

The replies received so far are:

-       ALICE: Peter Hristov

-       ATLAS: Alessandro De Salvo and Franco Brasolin for technical help.

-       CMS: No reply received.

-       LHCb: Asked for more information, for now how will be as contact for the information emails.

 

New Action:

8 Jan 2008 - H.Meinhard will distribute the information about HEP Benchmarking to the contacts for benchmarking in the Experiments.

 

H.Marten noted that it would be best to set up worker nodes that are not influenced by other limitations (I/O, queues, etc.).

L.Robertson agreed but noted that only the CPU time will be considered in any case. This approach avoids counting the delays caused by the execution environment (queues, network, etc.).

 

2.    Action List Review (List of actions) 

Actions that are late are highlighted in RED.

·         21 October 2007 - Sites should send H.Renshall their resource acquisition plans for CPU, disk and tape until April 2008.

 

NDGF and NL-T1 should send to H.Renshall and S.Foffano an estimate of the delivery of their 2008 capacity.

Done. From the input received, it seems that NDGF and NL-T1 will make their 2008 pledges available in October and November 2008.

 

L.Robertson warned that if more sites reduce their pledges, the lack of capacity (especially disk capacity) will become a major issue for the activities in 2008.

 

ATLAS: K.Bos commented that delays and reductions of capacity of this kind can cause considerable problems for ATLAS. ATLAS risks being short of disk space for the generation of the MC data needed for the challenges, especially in May 2008.

 

LHCb: Ph.Charpentier noted that, according to the data distributed by S.Foffano, LHCb seems to have enough resources. In reality NL-T1, if it fulfils its pledges, would be providing more than needed, and this hides the lack of resources at several other sites. LHCb needs to have its DST data at all sites and the current setup would not allow that.

 

ALICE: Y.Schutz noted that for ALICE, too, the ratio at the Tier-2 sites does not match ALICE’s expectations.

 

H.Marten asked Ph.Charpentier that LHCb send its exact 2008 needs for each site.

New Action:

8 Jan 2008 - Ph.Charpentier agreed to distribute the 2008 LHCb needs for each site to the MB mailing list and to the LHCb national representatives.

  • 30 Nov 2007 - The Tier-1 sites should send to A.Aimar the name of the person responsible for the operations of the OPN at their site.

Not complete.
Received from TW-ASGC, FR-CCIN2P3 (Jerome Bernier), IT-INFN (Stefano Zani), RAL (Robin Tasker)

  • 14 Dec 2007 - L.Dell’Agnello, F.Hernandez and G.Merino will prepare a questionnaire or a checklist for the Experiments in order to collect the Experiments’ requirements in a form suitable for the Sites.

Done. Discussed later in the Agenda.

  • 18 Dec 2007 - Experiments should nominate who is responsible for the benchmarking of their applications on the machines made available by the HEPiX Benchmarking Working Group.

Ongoing. See Section 1.2 above.

 

3.    Sites Services support during the Christmas period – Sites Roundtable

Replies received by email:

-       TRIUMF: 24x7 support and will be responding to alarms as usual.

-       FNAL: 24x7 support as usual, but off-hours calls will not be available on the evenings of December 25 and 31.

-       INFN: 24x7 support provided as usual, except on December 25 and 26 and on January 1.

-       NL-T1: support on a best-effort level for the whole period.

 

Present at the meeting:

-       BNL: 24x7 support for the whole period.

-       CERN: Operations coverage 24x7 as normal. Piquet on call as usual, with engineer-level support for most services (mail, grid services, DB support, etc.) and a 4-hour reply time, except on the evenings of December 25 and 31.

-       IN2P3: Support 24x7 as usual, but possibly with a longer reaction time.

-       KIT: Will run for the whole period. First-level support will monitor the system during the day and call second-level support. Requests by email may be answered with delay.

-       NDGF: Mostly best-effort support, except on December 27 and 28, when there will be full support.

-       RAL: Normal monitoring except on December 25 and 31.

 

No information:

-       PIC, ASGC: not represented and no information received.

 

The Experiments activities will be:

-       ALICE: Continue MC production at the current rate (4000 jobs/day).

-       ATLAS: Will be running short validation jobs. No massive production.

-       LHCb: Will not be running much because of some small problems; production will hopefully restart before the end of the week. The jobs will use the SEs, so the SEs should be available at the sites.

-       CMS: Not represented.

 

4.    SRM 2.2 and CCRC08 Updates (Report; Site Feedback Paper) J.Shiers

 

 

Note: There will be a January CCRC 08 F2F Meeting on Thursday January 10th 2008 from 09:00 in B160 1-009. The draft agenda is here.

 

J.Shiers presented a summary report for 2007. The report is available here.

The details are in the report, but in summary the main points reported were:

 

CCRC Preparation Overview

-       The CCRC Planning group met every week by phone and F2F once a month.

-       This has led to significant progress in defining the details of the two challenges foreseen (February 4th – 29th and May 5th – 30th).

-       The window for additional bug fix releases is very limited; additional fixes will be “fast tracked”, e.g. through use of pilot services (e.g. FTS, LFC) and/or the Applications Area as appropriate.

-       Much of the discussion has focussed on storage / data management (SRM 2.2); other areas have not been discussed in such detail, and “late surprises” cannot be excluded.

 

SRM2.2 Issues

-       The details for ATLAS and LHCb have been provided and sites are in the process of performing the necessary configuration. A status dashboard has been developed, showing the status of the specific end-points at the various sites.

-       The situation regarding ALICE and CMS is less clear. However, it turns out that, in particular for (multi-VO) dCache sites, supporting space tokens for some experiments and not for others is not a realistic option. It is therefore proposed that both ALICE and CMS use a single space token (see also the discussion below), e.g. ALICE_DEFAULT.

-       A document describing the implications of the current behaviour with respect to space tokens for Get and BringOnline operations is being prepared. After iteration with the storage and site experts, it is proposed that this be presented in January (GDB or CCRC08 F2F).

 

Recuperation of Tape Space

-       The default assumption is that data created during the challenge will be deleted thirty (30) days after the end of the challenge. This is essential to recuperate the tape space.

-       Any data that is to be retained must use well identified storage tokens (and back-end infrastructure) accordingly.
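As a small worked example of the default deletion rule above, the following sketch computes the earliest deletion date for each challenge. It assumes the end dates quoted earlier in these minutes (February 29th and May 30th 2008) and that the thirty days are counted as calendar days; the names used are illustrative only.

```python
from datetime import date, timedelta

# Default retention period stated in the minutes.
RETENTION = timedelta(days=30)

# Assumption: end dates of the two CCRC08 challenges as quoted in the minutes.
CHALLENGE_END = {
    "February challenge": date(2008, 2, 29),
    "May challenge": date(2008, 5, 30),
}


def deletion_date(end_of_challenge):
    """Return the date 30 calendar days after the end of a challenge."""
    return end_of_challenge + RETENTION


for name, end in CHALLENGE_END.items():
    print(f"{name}: ends {end.isoformat()}, "
          f"data deleted from {deletion_date(end).isoformat()}")
```

Under these assumptions, data from the February challenge would be due for deletion well before the May challenge starts.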

 

Critical Services

-       Following the WLCG Service Reliability workshop, an action plan has been established covering the services classified as critical – both in terms of the CCRC08 challenges and with the goal of obtaining measured improvement in service quality by the April WLCG Collaboration workshop.

 

Site Concerns

-       Following the presentations by the experiments at the December F2F meeting, additional questions were raised by some of the sites. These were presented to the ATLAS Tier1 jamboree and a revised version has been produced (see MB agenda page). Experiments are asked to provide the requested information no later than the January F2F meeting (30-minute slot per experiment).

 

Tracking the CCRC Challenges

-       Whilst the various tools for monitoring the state of the services and the experiment views (dashboards) are essentially in place, the way in which operational aspects of the challenge are followed up on a daily basis still has to be finalised.

 

Tiny Files

-       Files should be about 1 GB in size so that the tape systems are used efficiently. At present there are cases of very small files.

-       The record so far for small (raw data) files is 22 bytes. This is not a single case – it applies to a very large number of files – and is disastrous for the tape system and the total bandwidth to persistent storage. It is unlikely that the target data rates to tape at the Tier0 can be satisfied unless these persistent issues with small files are resolved. There is also a non-negligible overhead related to FTS setup for the transfer of such files (current transfers of all files show an average of 30 s in total, 10 s at source and 20 s at sink, with a long tail out to a few minutes), and this is also likely to be an issue for Tier1 sites.
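To illustrate why the fixed per-file overhead matters, here is a minimal sketch comparing the effective rate of a single transfer for a 22-byte file and for a 1 GB file. The 30 s average overhead and the two file sizes are taken from the report above; the 60 MB/s streaming rate is an assumed figure chosen only for illustration, and the model ignores concurrent transfers.

```python
# Illustrative model only: one transfer at a time, fixed setup overhead per file.
# From the minutes: 30 s average overhead, 22-byte and 1 GB file sizes.
# Assumption: 60 MB/s streaming rate once the transfer is running.

OVERHEAD_S = 30.0          # average FTS setup overhead per file (from the minutes)
STREAM_RATE_BPS = 60e6     # assumed streaming rate in bytes per second


def effective_rate(file_size_bytes, overhead_s=OVERHEAD_S,
                   stream_rate_bps=STREAM_RATE_BPS):
    """Return the effective bytes/second for one file, overhead included."""
    transfer_s = file_size_bytes / stream_rate_bps
    return file_size_bytes / (overhead_s + transfer_s)


for label, size in (("22 bytes", 22), ("1 GB", 1e9)):
    rate = effective_rate(size)
    print(f"{label:>8}: {rate:,.2f} bytes/s effective "
          f"({100 * rate / STREAM_RATE_BPS:.6f}% of the streaming rate)")
```

In this simplified model the 22-byte file achieves less than one byte per second, because almost all of the elapsed time is setup overhead, whereas the 1 GB file retains a sizeable fraction of the assumed streaming rate.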

 

New Action:

8 Jan 2008 - ALICE and ATLAS will present at the MB F2F in January how they intend to solve the issues caused by tiny files.

 

CCRC Focus

-       Need to focus on things that are known to work. Other possibilities may be desirable in the future – possibly on the timescale of the May challenge – but the list of outstanding actions for February is such that adding new issues (be they bug fixes or feature requests) when there is already an existing solution only stacks the odds against us.

 

New Action:

8 Jan 2008 - K.Bos has prepared a document describing the ATLAS requirements for the CCRC. J.Shiers will distribute it to the CCRC list.

 

5.    AOB

 

 

L.Robertson informed the MB that this was the last MB meeting he would attend.

I.Bird and the whole Management Board thanked Les for his outstanding contribution to the LCG Project.

 

6.    Summary of New Actions

 

The full Action List, current and past items, will be on this wiki page before the next MB meeting.

 

8 Jan 2008 - H.Meinhard will distribute the information about HEP Benchmarking to the contacts for benchmarking in the Experiments.

 

8 Jan 2008 - Ph.Charpentier agreed to distribute the 2008 LHCb needs for each site to the MB mailing list and to the LHCb national representatives.

 

8 Jan 2008 - ALICE and ATLAS will present at the MB F2F in January how they intend to solve the issues caused by tiny files.

 

8 Jan 2008 - K.Bos has prepared a document describing the ATLAS requirements for the CCRC. J.Shiers will distribute it to the CCRC list.