LCG Management Board

 

Date/Time:

Tuesday 17 January 2006 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a057118

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 2 - 24.1.2006)

Participants:

A.Aimar (notes), D.Barberis, L.Bauerdick, I.Bird, K.Bos, T.Cass, Ph.Charpentier, I.Fisk, B.Gibbard, J.Gordon, F.Hernandez, M.Lamanna, E.Laure, P.Mato, G.Merino, B.Panzer, L.Robertson, J.Shiers

Action List

https://uimon.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 24 January 2006 from 16:00 to 17:00

Minutes and Matters Arising

 

Modified minutes (after feedback from ASGC, IN2P3, PIC and RAL) (minutes)
The changes vs. the previous version are highlighted in blue.

 

The modified minutes received no further comments and are therefore considered approved.

 

Quarterly reports received at the time of the MB meeting
Areas: Applications
Experiments: LHCb, ALICE
Sites: PIC, CERN, FZK, IN2P3, RAL, US-CMS, US-ATLAS

 

The quarterly reports are urgent and should therefore be sent to A.Aimar as soon as possible.

This is the page where the QRs are available:

https://uimon.cern.ch/twiki/bin/view/LCG/QuarterlyReports

 

Outstanding action:

15 Jan 06 - All send the QR to A.Aimar.

 

Matters Arising

 

-          CNAF will be represented at the VO Boxes Workshop (24-25 Feb 2006) by D.Salomoni.

-          RAL will not take part in the SC3 throughput test re-run or in the April SC4 tape bandwidth tests, but will perform tape tests with CERN in June.

 

Action List Review (list of actions)

 

The action list is reviewed outside this meeting, by directly contacting the MB members involved.

 

Outstanding Actions:

  • 06 Dec 2005 - CNAF should send milestones for Tier1-Tier1 and Tier1-Tier2 operations, for the next MB meeting.

  • 06 Dec 2005 - ALICE should fill the VO boxes questionnaire on operations and send it to the GDB.

  • End 2005 - Experiments provide a more “Tier-1 accessible” description of their models, data and workflow.

  • 13 Jan 06 - Experiments and sites should all urgently send the name of one person with the authority to discuss the details of the plans for SC4.

  • 15 Jan 06 - Areas, experiments and sites representatives complete the Quarterly Reports and send them to A.Aimar.

  • 17 Jan 06 - Tier-1 sites send updated information and dates about their SC3 tape tests.

  • 20 Jan 06 - ASGC, CNAF, PIC and RAL provide a plan for the deployment of CASTOR 2.

Rescheduled actions:
31 Jan 06 MB - J.Gordon prepares a presentation on the situation of grid accounting.

 

31 Jan 06 - B.Panzer will discuss with the experiments and present to the MB a plan with possible dates and resources for the experiments and for IT activities.

 

SC4 Planning (more information)

More detailed planning for SC4 has started. Report on the status.
Discussion of the global time-scale for releasing the software, testing by the experiments, and deploying and installing the services at the sites.

 

 

Contact names received (more information)

 

I.Bird, L.Robertson and J.Shiers met the experiment coordinators and spokespersons to discuss the SC4 planning strategy (notes).

 

A first draft of the components that could be available for the SC4 challenge is being prepared by the SC4 team. For each service it will state exactly which features will be available.

 

The draft will be discussed with the contacts named by the experiments and sites, and a new draft will be prepared at the end of January for discussion at the Mumbai workshop. The target is to reach agreement on the final plan there, for final approval by the MB before the end of February.

 

In addition, by the end of January there will be a similar plan for each site, showing exactly when each service will be available at that site.

 

The current SC4 time scale, working backwards from the production start, is as follows (a cross-check of the intervals in code follows the list):

 

-          1st June: SC4 production phase must start.

-          30th April: SC4 software must be ready, in order to have one month to deploy it on all Tier-1 sites.

-          Early March: The beta version of the SC4 software must be deployed on the pre-production servers, to allow the experiments seven weeks of real testing, during which patches will be implemented.
The experiments should therefore be ready to start testing in early March.

-          28th February: The SC4 software must be packaged and released in order to install it on the pre-production servers (one week is needed for the installation).

-          31st January: gLite 1.5 released and tested. One month will be needed to check and package what is in the release.
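
As a rough cross-check of these intervals, a minimal sketch in Python; the milestone dates in the minutes are rounded, so the computed dates land within a few days of them:

from datetime import date, timedelta

# Forward re-check of the backwards schedule above (the dates in the
# minutes are rounded, so computed values differ by a few days).
glite_released = date(2006, 1, 31)                    # gLite 1.5 released and tested
packaged      = glite_released + timedelta(days=28)   # -> 28 Feb: one month to check and package
beta_deployed = packaged + timedelta(weeks=1)         # -> ~7 Mar: one week to install on pre-production
testing_done  = beta_deployed + timedelta(weeks=7)    # -> ~25 Apr: seven weeks of experiment testing
deployed      = testing_done + timedelta(days=30)     # -> ~25 May: one month to deploy at all Tier-1s

production_start = date(2006, 6, 1)                   # SC4 production phase must start
print(f"slack before production start: {(production_start - deployed).days} days")  # -> 7 days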

 

It was pointed out that this is one month later than in the plan published in the TDR in June. It does not, however, seem practical to proceed any faster.

 

The next release of the distribution containing any major changes would be around November 2006. Patches and updates will be released during SC4, but without functionality changes.

 

J.Shiers noted that, with this schedule, the throughput tests planned for April will not run with the same software used for the SC4 production service. I.Bird said that the schedule foresees the distribution being available for testing by the middle of March, and the key components for the throughput tests (FTS and the SRM implementations) should be available in the restricted throughput test environment at the end of April.

 

It is important that experiments allocate resources to validate the software in March and April.

 

The EGEE-TCG discussion on the content of gLite 3.0 is strongly interrelated with the SC4 planning work. High Energy Physics, in particular the LHC experiments and LCG, gives its input in the TCG on its priorities for the various developments. The TCG uses this input, together with input from other application areas, to develop a workplan and set priorities for the EGEE developers. This process should ensure an EGEE workplan closely focused on the needs of SC4.

 

Disk and tape capacities specified in the MoU (more information)

Follow-up to the discussion, started at the last GDB, about the meaning of the tape and disk capacity figures. Are the disk caches of the tape systems included in the total disk capacity?

 

The issue is whether the amount of disk capacity specified in the TDRs includes the disks used to provide caching for the mass storage systems and buffering for the network transfers.

 

LHCb and ATLAS said at the last GDB that they did not include this capacity in their MoU requests.

 

For the sites, the disk capacity specified in the MoU represents the usable capacity requested for funding, including all caches that will be needed. The size of the caches depends on the computing models and will therefore vary from one experiment to another.

 

LHCb intends to keep on mass storage the data that will be accessed infrequently and through scheduled processes (e.g. reprocessing). What they call “disk” is all data that needs to be permanently online and does not need to be stored on the mass storage system. This online data is in any case replicated or can be rebuilt, and therefore does not need to be stored on tape. The transient space used before data is sent to other sites or to tape is not included in the LHCb numbers (this is made clear in the LHCb TDR).

 

ATLAS has included the capacity of the buffers needed at the Tier-0 for calibration, alignment, processing and reconstruction. The disks of the CAF are accounted for separately in the TDR. Tier-1 sites, on the other hand, can organize their data storage as they see fit: ATLAS specifies the amount of data to keep online and the amount to keep on the mass storage system. The ATLAS values therefore do not include any estimate of the transient cache or buffering for the reprocessing of data at the Tier-1 sites.

 

CMS estimated the data sample needed at the Tier-1 sites, and all CMS data is going to be stored on the MSS. The “disk” capacity requested by CMS is assumed to be the MSS cache, with some specific parts of it pinned on disk.

 

ALICE was not represented at the meeting. From previous discussions, ALICE’s capacity requirements also include all their needs, for both permanent and transient data.

 

B.Panzer distributed a note with an estimate of the amount of space needed for reprocessing and data transfer in the case where the equipment is dedicated to these functions. The amount depends strongly on the computing model, but is of the order of 10% of the total capacity.
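
As an illustration of the scale implied by this estimate, a minimal sketch; the 10% fraction is from the note, while the example site capacity is purely hypothetical:

# Order-of-magnitude sketch of the buffer-space estimate. BUFFER_FRACTION
# comes from B.Panzer's note; the 2000 TB example capacity is hypothetical.
BUFFER_FRACTION = 0.10

def transient_space(total_disk: float) -> float:
    """Disk to set aside for MSS caching and transfer buffers (same unit as input)."""
    return total_disk * BUFFER_FRACTION

print(f"{transient_space(2000):.0f} TB of a hypothetical 2000 TB site")  # -> 200 TB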

 

The ATLAS and LHCb requirement numbers may need to be modified to take account of the misunderstanding.

 

CMS stated that any change of requirements by an experiment needs to be discussed at a general level, because it will influence the availability of resources for the other experiments sharing the same site. Any change must therefore be agreed at the MB.

 

The MB concluded that it is up to ATLAS and LHCb to consider whether their requirements need to be restated for the Tier-1 sites. If there are substantial issues to consider, the MB will discuss them.

 

 

AOB

 

 

Performance of dual core systems

 

RAL has noted from its initial tests that dual-core systems deliver only about 150% of the performance of a single-core system (i.e. a 50% improvement) on some applications, whereas previous tests at FZK, FNAL and other sites showed improvements of up to 98%, close to the ideal doubling. This discrepancy should be clarified and, if needed, reported to the MB in the future.
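
To make the two figures directly comparable, a minimal sketch; the throughput values are hypothetical, chosen only to reproduce the quoted percentages:

def improvement_pct(single_core: float, dual_core: float) -> float:
    """Percentage improvement of dual-core over single-core throughput."""
    return (dual_core / single_core - 1.0) * 100.0

# RAL-style result: dual core at 1.5x single-core throughput -> 50% improvement
print(improvement_pct(100.0, 150.0))  # 50.0
# FZK/FNAL-style result: dual core at ~1.98x -> ~98% improvement
print(improvement_pct(100.0, 198.0))  # 98.0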

 

Sites should share their benchmarking results and track this potential issue.

 

SC3 re-run status

The setting up of the re-run was much easier than in July 2005: many lessons have been learned since then.

 

The situation is much better than last year: many problems have been tackled, the sites are strongly motivated to reach or exceed their targets, and several T0-T1 throughput tests are already scheduled to shake out further problems.

 

All Tier-1 sites are online except NDGF (but work is in progress).

Some sites have reached their target rate (IN2P3, INFN, FZK and RAL) or exceeded it (TRIUMF and PIC).

BNL and FNAL are running at 100 MB/s, SARA has ongoing network problems, and ASGC is running at a very low rate.

 

The current average total rate is about 600-700 MB/s. During short periods the target rate (1 GB/s at CERN) has been exceeded, but it must be maintained over a sustained period for the re-run to be considered successful.
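
A one-line check of the remaining headroom, using the figures above:

# Headroom calculation with the numbers quoted in the minutes.
TARGET_MB_S = 1000                  # 1 GB/s target at CERN, sustained
avg_rate = (600 + 700) / 2          # reported average total rate, MB/s
print(f"{avg_rate / TARGET_MB_S:.0%} of target, "
      f"{TARGET_MB_S - avg_rate:.0f} MB/s short")  # -> 65% of target, 350 MB/s short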

 

 

 

 

Summary of New Actions

 

 

none

 

The full Action List, with current and past items, will be on this wiki page before the next MB meeting.