LCG Management Board

Date/Time:

Tuesday 13 March 2007 - 16:00 – 17:00 – Phone Meeting

Agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=11629

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 – 15.3.2007)

Participants:

A.Aimar (notes), I.Bird, Ph.Charpentier, L.Dell’Agnello, D.Duellmann, M.Ernst, J.Gordon, C.Grandi, F.Hernandez, J.Knobloch, E.Laure, M.Lamanna, H.Marten, P.Mato, G.Merino, R.Pordes, L.Robertson (chair), Y.Schutz, J.Shiers, R.Tafirout, J.Templon

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Mailing List Archive:

https://mmm.cern.ch/public/archive-list/w/worldwide-lcg-management-board/

Next Meeting:

Tuesday 20 March 2007 - 16:00-17:00 – Phone Meeting

1.      Minutes and Matters arising (minutes)

 

1.1         Minutes of Previous Meeting

Comments were received from F.Hernandez and J.Templon. The minutes of the 6 March 2007 meeting were updated (changes are in blue).

1.2         Documents Distributed

A.Aimar distributed the following documents for comment and approval by the MB next week:

 

-          QR Report modified (QR Template)

In the QR report the “Summary of Progress” is now split into several sections covering the different services and activities of each site. In this way all sites will report in a more uniform and complete way, covering all aspects of their service.

 

-          Targets and Milestones for 2007 - Upcoming Milestones (Targets and Milestones)

 

Targets for 2007
A.Aimar showed the latest targets provided by ATLAS and CMS.
The targets from ALICE and LHCb are awaited.

 

Milestones for 2007
A.Aimar will distribute the High Level Milestones in a matrix format (milestones vs sites) so that one can easily see, for each site, which milestones have been achieved.

 

2.      Action List Review (list of actions)

Actions that are late are highlighted in RED.

 

  • 22 Feb 2007 - Experiments (ALICE and LHCb in particular) should verify and update the Megatable (the Tier-0-Tier-1 peak value in particular) and inform C.Eck of any change.

Done. C.Eck reported that LHCb’s values are now correct. The values from CMS and ALICE still need to be updated.

 

  • 27 Feb 2007 - H.Renshall agreed to summarize the experiments’ work for 2007 in a “Calendar/Overview Table of the Experiments’ Activities in 2007”.

Not done. H.Renshall presented the Experiments’ Plans at the GDB on the following day.

 

  • 10 Mar 2007 – ALICE and LHCb should send to A.Aimar their data rates targets for 2007.

Not done.

 

  • 15 Mar 2007 - ALICE, CMS, LHCb should send to C.Eck their requirements until 2011.

Partially done. The ALICE values were sent to C.Eck. Waiting for CMS (D.Newbold) and LHCb (N.Brook).

 

  • 16 Mar 2007 - Tier-0 and Tier-1 sites should send to the MB list their Site Reliability Reports for February 2007.

 

 

3.      GDB Summary (Document) – J.Gordon

 

J.Gordon presented a summary of the latest GDB meeting. More details are available in this document.

Here are the main points, without going into all the details:

 

Introduction

-          Next GDB meeting will be in Prague, as well as the F2F MB Meeting. Phone conferencing will be available as usual.

-          No changes in the Pre-GDB and GDB Meetings. The Pre-GDB meeting will remain on Tuesdays with flexible time. The F2F MB meeting will remain at 16:00 on Tuesdays.

-          Countries with a Tier-1 representative should nominate a second (non-voting) representative for their Tier-2 sites.

-          Starting in March, Accounting Data will be produced both manually and automatically from the APEL repository. From May the Accounting Data will be extracted only from APEL, automatically, with no more manual accounting. Many sites do not yet publish their data in APEL.

SL4 Porting

-          The WN and UI are buildable on SL4, but not yet using ETICS.

-          Major decisions will be required if the gLite WMS or CE are not considered acceptable to the experiments. Continuing with LCG versions is doable but will require porting to SL4.

-          A standard way to advertise the OS run at a site (32-bit, 64-bit, mixed, SL3 on SL4) must be uniform across sites. A SAM test will check that only standard OS names are used.

BDII

-          Load problems have made the BDII a bottleneck, responsible for many job and test failures.

-          The short-term solution is that the Tier-1 sites add top-level regional BDIIs and that Tier-2 sites point to the right BDII, not to the BDII at CERN (which is a kind of “catch-all” server and is overloaded).

-          Longer-term improvements should take into account the BDII’s scalability problems and include more caching in the client, with clear separation between static and volatile information.
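As an illustration of the short-term workaround above, a Tier-2 site can redirect its clients from the CERN BDII to a regional top-level BDII by changing a single environment variable. A minimal sketch; the hostname below is a made-up example, not an actual regional BDII:

```shell
# Hedged sketch: redirecting a site's information-system queries to a
# regional top-level BDII instead of the overloaded CERN "catch-all".
# The hostname is hypothetical; each site would use its regional BDII.
export LCG_GFAL_INFOSYS=topbdii.regional-t1.example.org:2170

# Clients such as lcg-infosites and the data-management tools read this
# variable to decide which top-level BDII (host:port) to contact.
echo "Using BDII: $LCG_GFAL_INFOSYS"
```

In practice this setting would go into the site-wide grid environment profile so that all worker nodes and UIs pick it up consistently.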

VOMS

-          Need more coordination. In particular some working groups are needed to:

-          Coordinate between user registration and experiments

-          Verify that among implementations there are the same assumptions and behaviour (storage, batch, ACLs, generic attributes, etc).

This should not overlap with the TCG; the mandate of these groups would be to prepare the list of LCG priorities as input to the TCG.

Job Priorities

-          Sites agreed to report at the GDB in April about their progress.

Access Control for Storage

-          This was a report on how VOMS groups and roles are used by the different storage systems to control access to data.

-          DPM and StoRM already have full implementations. dCache has significant support, CASTOR has minimal support and BeStMan (DRM) has none.

-          We cannot expect grid-wide consistent VOMS-ACL support this year for files or space tokens.

Accounting

-          Accounting by primary FQAN (the same as used by Job Priorities) has been deployed in APEL but, to work correctly, it requires a patch which is currently in certification.

-          UserDN information is encrypted but the FQAN is currently not encrypted. While this may be needed in the future it was agreed that there was no reason to encrypt it for now because this would delay deployment.

 

Future GDB meetings should not take place at CERN in March, in order not to clash with the Geneva Car Show.

 

4.      Status of the 3D Project (Slides) – D.Duellmann

 

D.Duellmann presented a progress update of the 3D Project.

4.1         Requirements Update

The “Old” request (unchanged since November ’05 GDB)

 

ATLAS Tier-1:

-          3 node db server, 300 GB usable DB space

 

LHCb Tier-1:

-          2 node db server, 100 GB usable DB space

 

CMS T1+2:

-          2 squid server nodes, 100 GB cache space per node

 

An updated request for the first half of 2007 was collected at the January 3D workshop and presented at the GDB meetings linked below:

-          http://indico.cern.ch/conferenceDisplay.py?confId=10132

-          http://indico.cern.ch/conferenceDisplay.py?confId=8469

4.2         Security and Recovery

The Security and Backup/Recovery policy proposals by the CERN Tier-0 were agreed at the last GDB, also covering database backup at the Tier-1 sites. The main points are:

-          Sites are responsible for applying security patches in agreement with 3D project

-          RMAN based backup and recovery required as of April at all sites

-          Retention period of at least 30 days for full and incremental backup data
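For sites setting this up, the retention requirement above maps directly onto Oracle RMAN configuration. A minimal sketch; the backup schedule shown is a site-specific assumption, only the 30-day retention is mandated by the policy:

```sql
-- Hedged sketch of RMAN settings matching the agreed policy:
-- keep full and incremental backup data for at least 30 days.
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 30 DAYS;

-- Example cycle (a site choice, not part of the policy):
BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;  -- periodic full
BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;  -- daily incremental
```

With a recovery-window retention policy, RMAN itself decides which backups are obsolete, so the 30-day requirement is enforced uniformly regardless of each site's backup frequency.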

4.3         Resource Requests and Experiments Predictions

 

ATLAS and LHCb

 

Slide 4 shows the requests and predictions until 2008/2009 in terms of database space for ATLAS and LHCb.

 

Before July 2007: The table shows different phases and milestones.

-          In the next 6 months there are no new requests; the CPU count and disk space stay the same.

-          All sites should implement the current request and split the services for ATLAS and LHCb onto different nodes.

 

Before Nov 2007:

-          ATLAS would like all sites to upgrade to 4 GB, 64-bit DB servers.

-          LHCb would like all sites to have 2 LFC r/o servers in place.

 

From November 2007:

-          ATLAS and LHCb propose to double the space at each site (1.0 TB and 0.3 TB respectively).

-          New nodes may be requested in May 2007 and should be ready by November.

 

After June 2008:

-          For TAG data, up to 6.0 TB for a nominal year of ATLAS running. This is still being verified by ATLAS.

 

G.Merino asked whether the LFC servers for LHCb should be dedicated or could be shared with other VOs.

Ph.Charpentier replied that at CERN and CNAF the rate is really not high. One could start with one LFC server and monitor the load and performance. Sharing it among VOs also needs to be tried out. The LFC will be tested as soon as it is available at the sites.

D.Duellmann added that the LFC servers will be added at all Tier-1 sites and made available to the experiments in the coming weeks.

 

CMS

The squid installations at Tier-1 are at the same level as last year:

-          100 GB per Squid and 2-3 Squids per site.

-          About 10 users per Squid, and more than 100 across all sites.

4.4         Implementation at the Tier-1 Sites

All Phase 1 sites are in the ATLAS and LHCb tests: ASGC, BNL, CNAF, GridKA, IN2P3 and RAL.

TRIUMF and SARA (Phase 2) have also finished their clusters and have joined the Streams setup.

 

-          PIC has hired a DBA and set up a two-node cluster

-          NDGF has set up a single-node DB and is acquiring the hardware for a cluster

 

All sites are now progressing and there will no longer be a distinction between Phase 1 and 2 sites. One can expect all 10 database Tier-1 sites to be available for experiment production in April.

Some of the sites do not yet have the final number of nodes/experiment splitting implemented:

-          Experiments need to be prepared to distribute DB load according to the available resources

-          Sites should be prepared to add additional nodes, storage and server splitting to fulfill the request from November ‘05

4.5         Summary

All 10 database Tier-1 sites are expected to be available for ATLAS and LHCb conditions testing in April. The experiments’ software is ready for accessing the Tier-1 replicas from experiment jobs submitted via the grid.

 

No major resource upgrade is requested by the experiments until August, but sites should be prepared to fully implement the request from November ‘05.

 

The first realistic workload against the Tier-1 servers will soon arrive. This will provide input for the next resource review in May, in order to finalise the experiments’ requests for the setup of November 2007.

 

The Database Administrator Workshop at SARA (March 20-21) will finalise configuration, monitoring and deployment procedures.

 

 

5.      FTS V2 Status (Slides) – G.McCance

 

 

Postponed to next week.

 

6.      AOB

 

-          Reliability Data Summary for February 2007

 

The Reliability Summary was republished; the values for February are now all correct. The report applies the new reliability calculation to previous months, thereby changing the individual site averages for those months.

 

The MB agreed that this was the correct approach, in order to allow a direct comparison of the current month with the historic data.

 

-          MB end of August before CHEP

 

The GDB is on the Friday before CHEP. L.Robertson proposed to hold an MB meeting on the Tuesday at 4 PM (Geneva time) as usual. The MB agreed.

 

-          Referees Meeting on Monday

 

Each experiment should present their requirements and how they calculated them.

The other points on the agenda are “CASTOR Status and Plans” and “SRM 2.2 Status”.

 

7.      Summary of New Actions

 

 

No new actions.

 

The full Action List, current and past items, will be in this wiki page before next MB meeting.