LCG Management Board

 

Date/Time:

Tuesday 31 January 2006 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a057120

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 2.2.2006)

Participants:

A.Aimar (notes), D.Barberis, I.Bird, D.Boutigny, N.Brook, T.Cass, Ph.Charpentier, B.Gibbard, J.Gordon, H.Marten, P.Mato, G.Merino, B.Panzer, L.Robertson, J.Shiers, Y.Schutz

Action List

https://uimon.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 7 February 2006 from 16:00 to 18:00

1. Minutes and Matters Arising (minutes)

 

1.1 Minutes

No feedback received. Minutes of the previous meeting approved.

 

1.2 Quarterly Reports Expected

Note: After the meeting, all QRs were received.

 

They are now under review. F.Carminati and F.Hernandez agreed to help review the QRs for this quarter.

 

2. Action List Review (list of actions)

 

  • End 2005 - ALICE and CMS provide a more “Tier-1 accessible” description of their models, data and workflow.

ALICE sent its description after the MB meeting. The CMS description is still missing.

The documents are all accessible from this page under the heading Simplified Computing Models.

 

  • 15 Jan 06 - Area, experiment and site representatives complete the Quarterly Reports and send them to A.Aimar.

Done. All reports are available at the wiki page LCG/QuarterlyReports.

 

  • 20 Jan 06 - ASGC, INFN, PIC and RAL provide a plan for the deployment of CASTOR 2 at each site.

On the way.

RAL’s CASTOR 2 milestones are included in their general site milestones. PIC sent a plan after the meeting. The plans from ASGC and INFN are still missing.

 

  • 31 Jan 06 - B.Panzer will discuss with the experiments and present to the MB a plan with possible dates and resources for the experiments and for IT activities (performance tests, etc).

On the way. A note has been written, is currently being cross-checked, and will be sent to the MB members at the beginning of next week.

 

  • 31 Jan 06 MB - J.Gordon prepares a presentation on the situation of grid accounting.

To be presented at the next MB (7 Feb 2006).

 

  • 31 Jan 06 - D.Foster should find information about the GEANT2 plans and send it to the MB. This is needed to complete the definition of milestone OPN-2.

 

Update from D.Foster:

 

The plans for the GN2 wavelengths to connect to the T1 centers aim to
have most centers connected with dedicated 10G bandwidth by mid-year.
The exception seems to be PIC, where full connectivity is not expected
before Q4.
 
The pricing for dedicated connectivity has not yet been formally agreed
by the NREN policy committee, but the urgency is understood. In the
meantime some sites will be connected anyway, with CNAF already
complete and FzK coming.
 
In terms of the OPN, then, the following sites are "complete":
SARA - 10G provided by Surfnet but transition to GEANT expected.
IN2P3 - 10G provided by RENATER owned dark fiber.
CNAF - 10G connected via GEANT
FNAL - 10G link available but will share with other US transit traffic.
 
Coming soon
FzK - Currently GEANT IP service. 10G dedicated will be provided by
GEANT.
BNL - Currently 10G transatlantic link in place, 10G "last mile" via
ESNET expected soon.
 
Mid-Year
RAL - Currently 2x1G. Will go to 4x1G. SuperJanet4 should complete
UKERNA 10G link to RAL and then via GEANT to CERN.
NDGF - Currently GEANT IP service. Copenhagen 10G link via GEANT will be
transited to distributed T1 by Nordunet.
TRIUMF - Currently 10G to Amsterdam in place, transited via SURFNET to
CERN. Will pass to GEANT transit from Amsterdam.
 
End-Year
PIC - Currently 1G shared, will become 1G dedicated. 10G via GEANT
(maybe via drop off of Madrid dark fiber in Barcelona)
 
Unclear
ASCC - Currently 2x1G. Plans to go to 2x2.5G
 
Aggregate bandwidth from CERN is currently 70 Gb/sec (8.75
GB/sec).
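
As a quick check of the two figures quoted at the end of the update, the conversion from gigabits to gigabytes per second can be sketched as below. The function name is hypothetical and used only for illustration; the only assumption is the usual 8 bits per byte.

```python
def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a rate in gigabits/s to gigabytes/s, assuming 8 bits per byte."""
    return gbit_per_s / 8.0


# Aggregate bandwidth from CERN as quoted in D.Foster's update.
aggregate_gbit_per_s = 70.0
print(gbit_to_gbyte_per_s(aggregate_gbit_per_s))  # 8.75
```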

 

  • End Jan 06 - L.Robertson: Organize phone meeting with the LHCC referees, end of January.

Done. The organization of the LHCC meeting is explained in this MB meeting.

 

  • 31 Jan 06 - L.Robertson will discuss with the area and service managers in order to define measurable metrics for Phase 2.

On the way.

 

 

3. Note on SC4 planning (L.Robertson's email) (more information)

Summary of feedback received.

 

A note prepared by L.Robertson, I.Bird and E.Laure explained the process for the SC4 Planning and its relationship with the work of the TCG. The attached document is the new version, which takes into account the feedback received.

 

The changes are highlighted in blue and they clarify that:

-          after the workshop the conclusions will be approved formally by the GDB, prior to presentation to the MB

-          another major release will have to be scheduled for October 2006, between the end of SC4 and the commissioning of the system in April 2007

 

 

4. SC4 Planning (documents, transparencies)

Status update by F.Donno

 

F.Donno provided a summary of her work on SC4 planning.

 

4.1 SC4 Middleware Plan (HTML)

 

The plan specifies in detail all services expected to be part of the “gLite 3.0” distribution, ready for testing by the end of February.

The colours on the table cells mean:

-          green: available in production in June 06, i.e. the features of LCG 2.7 and gLite 1.5 that are in certification now

-          orange: will not be available in time for SC4 but code scheduled for delivery during the first few months of 2006 – a testing and deployment plan should be elaborated for testing in parallel with SC4 for introduction later in the year (this is mainly the SRM 2.1 implementation)

-          white: not planned for SC4 – development plan still to be made

 

The features were not discussed in detail. The experiments will comment further on the feature list and are currently preparing their plans based on what will be available.

In response to a question about which critical features were missing from the “green” list, the following were mentioned:

-          Castor and DPM should use the same version of RFIO, requiring changes of namespaces and protocol names.

-          There should be a unique way of scheduling FTS transfers, independent of the end points (currently the experiment itself has to understand the configuration and issue different requests at different sites).

 

The features in orange concern the SRM 2 implementations. The MB stressed that all implementations (CASTOR2, dCache and DPM) should be tested in great detail, individually and then together, before they are installed in any production system.

 

4.2 SC4 Sites Plan (HTML)

 

The document will contain, for each site, the services that will be deployed and the experiments that need them.

At the time of the meeting feedback from only a few sites had been received; therefore only the data from FZK is currently in the document linked above.

 

Some sites are going to install both SRM1 and SRM2, and it will be important to test how the two systems can coexist.

 

4.3 Pre-production Phase

Every new service will first be available on the pre-production service, so that the experiments' applications can run on it and thoroughly test the software before it is released to the sites. To provide a realistic testing environment, the pre-production service will have access to real (production) storage systems.

 

The middleware will be installed at the sites only in May, after the pre-production phase (March-April) is over and the software has been released and fully certified.

The throughput tests will be done in April, even though they will not use the latest versions of the middleware. They will still be very useful, as they will check that the site hardware infrastructure and operations procedures can reach the target rates.

 

The gLite RB and the LCG RB will both be available, but further discussions (at the CHEP workshop) and tests are needed to verify the status and features of the two implementations and to decide on a parallel deployment strategy.

 

Note

The SC4 plans are always reachable from the LCG Planning page: https://twiki.cern.ch/twiki/bin/view/LCG/Planning

Follow the links for “SC4 Middleware Plan” and “SC4 Sites Plan”.

 

5. SC3 Disk & Tape Tests (more information)

Update on the ongoing tape tests, review of data rates achieved to sites vs. MoU targets, other major SC activities.

 

Note: Better viewed in “Slide Show” mode.

 

5.1 SC3 Re-run – Individual Site Throughput Tests

 

The main goals of the SC3 re-run were:

-          Get data rates at all Tier-1 sites up to the values in the MoU

-          Re-deploy the required services at sites

-          Uncover remaining use cases, in order to define milestones and tests at future workshops in Mumbai (for Tier-1 sites) and at CERN (for Tier-2 sites)

 

Many sites met or exceeded the nominal rates required for SC4; the others exceeded the SC3 target rates. Stability of the services needs to be improved and will be the focus of SC4.

 

5.2 SC3 Re-run Tape Tests and SC4 Throughput Tests


BNL, DESY (Tier-2), GRIDKA, IN2P3, PIC, SARA, TRIUMF will execute the tape tests.

FNAL and RAL have already achieved the SC3 tape targets.

ASGC, INFN and NDGF will send their plans in the near future.

 

During the tape tests, in order to measure the real amount of “data to tape”, the sites will have to monitor what is really stored on the tapes, and what is handled in the disk buffers (no central monitoring available).


As mentioned, the SC4 throughput test will be in April with the technology available at that moment (e.g. CASTOR 1 for the sites running that SRM service).


The target throughput rates for the future Tier-1/Tier-1 and Tier-1/Tier-2 tests will be discussed at the CHEP workshop. Experiments have already said that they will need a rate of about 10 MB/s to each Tier-2.
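
To give a sense of scale for the 10 MB/s figure, the sketch below estimates the daily data volume per Tier-2. It assumes the rate is sustained for a full 24 hours and uses decimal units (1 GB = 1000 MB); neither assumption is stated in the minutes, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_gb(rate_mb_per_s: float) -> float:
    """Volume in GB moved in one day at a sustained rate (1 GB = 1000 MB assumed)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1000.0


# Rate the experiments said they will need to each Tier-2.
print(daily_volume_gb(10.0))  # 864.0
```

Under these assumptions, 10 MB/s corresponds to roughly 0.86 TB per day delivered to each Tier-2.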

 

6. Meeting with the LHCC Referees (more information)

Preparation for next Monday

 

6.1 Agenda and participation

 

The agenda of the meeting is here: http://agenda.cern.ch/fullAgenda.php?ida=a057187

Participation in the meeting is open to all MB members.

 

6.2 Topics of the Meeting

 

At the last review two of the major concerns were CASTOR 2 and Grid Services Metrics.

 

CASTOR2 Progress

-          Status of the SRM 2.1 Development

-          Status of Migration of LHC Production to Castor 2 

-          750 MB/sec Data Recording Milestone

-          ATLAS Tier-0 tests

 

Grid Service Metrics 

-          Proposal for Base Reliability Measures to be followed by the LHCC
(the slides for the metrics will be circulated to the MB before the presentation).

 

A summary of SC3 will also be provided by the experiments and the sites.

SC3 Expectations and Results 

-          Summary of the Experiment Experience (N.Brook)

-          Site Experience and SC3 Throughput Test Results (K.Bos)

 

The last point will be an update on the process defined for the SC4 Planning.

 

7. AOB

 

 

Next week's MB meeting will be face-to-face, but a phone connection will be available as usual.

 

 

8. Summary of New Actions

 

 

No new actions.

 

The full Action List, with current and past items, will be on this wiki page before the next MB meeting.