LCG Management Board

 

Date/Time:

Tuesday 22 November 2005 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a057098

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 4 - 6.12.2005)

Participants:

A.Aimar (notes), D.Barberis, L.Bauerdick, I.Bird, K.Bos, D.Boutigny, N.Brook, T.Cass, Ph.Charpentier, Di Quing, I.Fisk, D.Foster, J.Gordon, H.Marten, P.Mato, M.Mazzucato, G.Merino, B.Panzer, J.Shiers, Y.Schutz, L.Robertson (chair)

Excused:

B.Gibbard, M.Lamanna

Next Meeting:

Tuesday 29 November 2005 at 16:00

Minutes and matters arising

 

Minutes of the last two MB meetings - 8 and 15.11.2005 (see minutes)

Minutes of both meetings approved.

 

Duration of the MB meetings

The MB decided that the meetings should last one hour.

Participants should connect at 4 PM so that the meeting can start on time.

 

Status of the VO Box meeting

The dates agreed are 24-25 January 2006, location still to be decided.

 

Participating sites are: IN2P3, PIC, FZK, RAL and FNAL (but the dates are not convenient for FNAL and it may not be able to attend).


This meeting must clarify all issues related to VO boxes. Sites and experiments should therefore be represented by people able to take part in such a discussion. All arrangements will be handled directly with the experiments and sites by Jeff Templon and Cal Loomis.

 

Actions Review

 

  • ATLAS will send a proposal about reporting on issues concerning non-EGEE resources to the MB.

A proposal was sent; it will be discussed with J.Shiers and I.Bird.

 

  • CMS will send a similar statement to formally confirm the CMS opinion on OSG reporting.

Still to be done.

 

  • Tier-1 Sites (at GDB) should complete their plans following the guidelines of the presentation given at the GDB.

A few sites have sent their updated plans; others have asked to postpone until December, which is when they will make their internal plans. A couple of sites did not reply.

 

Feedback from the Comprehensive Review (documents)

 

Summary of concerns (document)

 

Summary of the concerns presented by the referees to the LHCC after the Comprehensive Review.

 

Discussion


- Lack of quantitative data


The referees want a few numbers, but highly relevant ones, to track progress in all services and areas. The LCG needs to build this set of metrics and then start explaining them to the referees during the LHCC meetings.

Action: J.Shiers has initiated a team to define metrics to measure site reliability and availability, data transfer capability, and overall grid performance. He will distribute a proposal on this to the MB for input from the sites and services, before 6 December 2005.


The next LHCC meeting is in early February, at the same time as CHEP. Therefore we need to organize a telephone meeting with the referees, one week before CHEP.

 

General tests of availability should be independent of the implementation (e.g. catalog availability and similar services) and work for all grids and sites.

SFT can run on different grid implementations and should provide a set of tests common to all sites. In addition there may be different tests depending on the grid infrastructure and VOs that a site has to support (e.g. FTS service metrics and RB metrics apply to VOs that are using the FTS and RB). L.Bauerdick noted that the SFT framework could already be used with Tier-1 sites in OSG, and that there is a proposal to deploy the necessary information providers on the Tier-2s before the end of the year.

Sites can add their specific tests which will be executed in addition to the standard SFT tests.
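As an illustration only, the idea of a common test set extended by site-specific tests could feed an availability figure along the following lines. This is a minimal sketch and not the SFT framework itself: the test names, the data layout and the rule "a site counts as available in a sampling round only if every required test passed" are assumptions made for the example, pending the metrics proposal.

```python
# Hypothetical test names; the real SFT test set is not defined in these minutes.
COMMON_TESTS = ("job_submit", "catalog_lookup")


def availability(rounds, required_tests):
    """Return the fraction of sampling rounds in which all required tests passed.

    rounds: list of dicts mapping a test name to True (passed) or False (failed).
    A test missing from a round is counted as failed.
    """
    if not rounds:
        raise ValueError("at least one sampling round is needed")
    good = sum(
        1 for results in rounds
        if all(results.get(test, False) for test in required_tests)
    )
    return good / len(rounds)


# Four sampling rounds for one (imaginary) site.
rounds = [
    {"job_submit": True, "catalog_lookup": True, "fts_transfer": True},
    {"job_submit": True, "catalog_lookup": True, "fts_transfer": False},
    {"job_submit": True, "catalog_lookup": False, "fts_transfer": True},
    {"job_submit": True, "catalog_lookup": True, "fts_transfer": True},
]

common_only = availability(rounds, COMMON_TESTS)
with_site_test = availability(rounds, COMMON_TESTS + ("fts_transfer",))
```

With the common tests alone the site is available in three rounds out of four; adding the site-specific test lowers the figure to two out of four, which shows why the set of required tests has to be agreed per site and VO before numbers are compared.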

Action: People responsible for each area and service should also define some measurable metrics to follow during Phase 2.


Action: L.Robertson - Organize phone meeting with the LHCC referees, end of January.

- CASTOR2 issues

  • A detailed migration plan is needed to complete the basic cycle of tests, with a commitment to tight collaboration between developers and experiments.

This is an item for later in the agenda [postponed to next week’s meeting].

  • Redundancy and performance issues have not yet been properly tested, in the opinion of the reviewers.

 

  • Documented plan for distributing CASTOR 2 to other sites to provide SRM 2.1 support for SC4


A person from the CASTOR team will visit CNAF at the beginning of December to investigate the installation of CASTOR 2 there. RAL is working on the development of SRM 2.1 and investigating the installation of CASTOR 2.
PIC and ASGC will be contacted soon.

 

The MB will follow the planning and progress closely for the next few months. T.Cass said that he plans to organise a CASTOR 2 Readiness Review, foreseen for March 2006.


Action: The MB will review the situation of CASTOR and its plans in the middle of December. By that time the plans for distributing CASTOR should have been defined.

- 3D database comes very late

The referees are concerned about the involvement and effective support of the sites for the 3D plan. For instance, what is the commitment to the Oracle solution or to other solutions that are being developed?


INFN: Supports the replication using Oracle, which has been tested between CNAF and CERN.

 

ATLAS: Committed to the 3D project.


For the first six months of production, starting in March, 3D will support
two different distribution techniques with complementary deployment
features:

- Oracle Streams for data transfer between online and offline (ATLAS, CMS,
   LHCb) and between Tier 0 and Tier 1 (ATLAS, LHCb). This solution offers
   transactionally consistent (asynchronous) replication but requires database
   services to be offered by the sites. ATLAS is planning to complement this
   with POOL and/or Octopus based data copies into MySQL/SQLite databases
   at Tier 2.

- FroNtier/squid (CMS) for read-only transfers between Tier 0, 1 and 2
   and distributed caching of database data. This approach does not require
   running a database server outside CERN (deployment of squid is considered to
   be minimal effort) but the impact of cache consistency on applications (possibly
   stale data in squid caches) needs validation by the experiments.

Both technologies are considered by ATLAS, LHCb and CMS as a fallback for their
initially different baselines.


SARA: Has purchased Oracle licenses and all hardware needed. No problems with the 3D plan.

PIC: Not aware until recently of the existence of the 3D project and of the need for Oracle on the Tier-1. PIC will support the solution but the schedule will be tight.

RAL: No problems, production equipment is being bought.

FZK: Working in the 3D project, testing Oracle, and will look into FroNtier.

 

There was some discussion about the preference of CMS for a solution different from that selected by the other experiments. Operating two services at Tier-1 centres would generate an additional load. L.Bauerdick explained that the FroNtier/Squid system was straightforward to deploy and support, and that this had been extensively discussed and agreed. M.Mazzucato considered that nonetheless this required sites supporting CMS and other experiments to expend effort investigating and operating the service.

Action: Before 6 December, we need the final plan of the 3D project. D.Duellmann should contact PIC and CNAF to ensure that they understand fully what is being planned (including replication at Tier-1 and Tier-2 sites and clarifying the usages of Oracle, FroNtier, Squid, MySQL, etc.).


INFN proposes to try duplicating a full database in order to check the general replication mechanism, and to learn Oracle replication.


The implications of running FroNtier and Squid, and the additional work needed at the sites, should also be clarified.
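The cache consistency concern noted above for the FroNtier/squid approach (possibly stale data in caches) can be illustrated with a generic time-to-live cache. This is a sketch under stated assumptions, not a description of how FroNtier or squid handle expiry: the class name, the fixed expiry interval and the injected clock are all hypothetical and chosen only to make the behaviour easy to follow.

```python
class TtlCache:
    """Read-only cache that keeps each fetched value for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock):
        self._fetch = fetch        # callable that reads from the origin database
        self._ttl = ttl_seconds    # how long a cached entry is served unchanged
        self._clock = clock        # callable returning the current time in seconds
        self._entries = {}         # key -> (value, time the value was fetched)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # served from cache, may be stale
        value = self._fetch(key)   # expired or missing: go back to the origin
        self._entries[key] = (value, now)
        return value


# Imaginary origin database and a controllable clock for the demonstration.
origin = {"calibration": "v1"}
current_time = [0.0]
cache = TtlCache(origin.__getitem__, ttl_seconds=60, clock=lambda: current_time[0])

first = cache.get("calibration")    # fetched from the origin

origin["calibration"] = "v2"        # the origin is updated
current_time[0] = 30.0
second = cache.get("calibration")   # still inside the TTL: old value returned

current_time[0] = 61.0
third = cache.get("calibration")    # TTL expired: new value fetched
```

Between the update at the origin and the expiry of the cached entry, a client reads the old value. Whether such a window is acceptable depends on the application, which is the point the experiments are asked to validate.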

- Application Areas issues

 

A few issues were raised by the referees.

- SEAL/ROOT:
The major concern is the move from SEAL to ROOT. The referees encourage the experiments to move quickly to ROOT so that SEAL support can end as early as possible. While P.Mato considers this a message to the experiments, Ph.Charpentier thinks that the Applications Area should take the initiative of ensuring that the equivalent functionality is available and setting the timetable for migration.

- PROOF: The referees consider that the major investments in PROOF should be matched by a decision by several experiments to use it.

 

ALICE is already using the current demo set-up.

ATLAS needs first to discuss this at the AF to see the implications on the applications.
CMS will participate in the PROOF test bed.
LHCb sees no use case for this now, but is interested to see the results and learn from the tests of ALICE and CMS.

The interest in PROOF should be clearly stated by the experiments. F.Rademakers is in contact with the experiments and the goals and plan for the evaluation will be presented to the MB. The current proposal is for a facility at a single centre, offered as a service for the experiments to try. In order to know what is possible to deploy and to maintain in the long term, people running services should be involved in the discussion.


Action: Presentation and demonstration of PROOF at the GDB in January, in order to have an open discussion in the LCG.

 

Action: Preparation of the goals and plans for the PROOF evaluation.

- Fabrics Area Issues

 

- Sharing of experience between Tier-1 centres should be improved

The MB agrees that this is very important, and HEPiX is performing this function well. Sites have been active in HEPiX, and there have been several presentations.

 

Sites are encouraged to discuss directly with one another the details of how they are setting up operations and services support.

 

- LHCC wants to understand and monitor fabric strategies


The referees should be presented with more details than in the past. The milestones collected from Tier-1 sites could also be presented to the LHCC in January.

- Procurement issues

 

CERN, IN2P3 and FNAL presented their acquisition procedures. FNAL has a very flexible process which enables the price to be fixed much later than in the longer European procedures. Possibilities for improvement at CERN will be investigated again, but it is not clear how the procedure could be changed.

 

As the hour was up, discussion of the remaining sections of the paper (Grid Deployment, Service Challenges, Management) will be continued at the next meeting.

 
Action: In order to make a list of actions and react to the recommendations, the MB should look at the document and at the LHCC reviewers’ presentations. Comments are expected before the next meeting.

 

 

 

Milestones Issues

 

The items below are postponed to the next MB meeting

 

 

 

AOB

 

No AOB.

 

Summary of the Pending Actions

 

New Actions:

 

  • J.Shiers has initiated a team to define metrics to measure site reliability and availability, data transfer capability, and overall grid performance. He will distribute a proposal on this to the MB for input from the sites and services, before 6 December 2005.

 

  • L.Robertson - Organize phone meeting with the LHCC referees, end of January.

 

  • People responsible for each area and service should also define some measurable metrics to follow during Phase 2.

 

  • The MB will review the situation of CASTOR in the middle of December, by which time the plans for distributing CASTOR should have been better defined.

 

  • Before 6 December, we need the final plan of the 3D project. D.Duellmann should contact the sites to explain what is being planned (including replication at Tier-1 and Tier-2 sites, and clarifying the usages of Oracle, FroNtier, Squid, MySQL, etc.).

  • F.Rademakers - Presentation/demonstration of PROOF at the GDB in January, and preparation of the goals and plans for the evaluation.

  • In order to make a list of actions and react to the recommendations, the MB should look at the document and the LHCC reviewers’ presentations. Comments are expected before the next meeting.

 

 

The full Action List, current and past items, is updated at this wiki page