LCG Management Board

 

Date/Time:

Tuesday 29 November 2005 at 16:00

Agenda:

http://agenda.cern.ch/fullAgenda.php?ida=a057105

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 2 - 2.12.2005)

Participants:

A.Aimar (notes), D.Barberis, L.Bauerdick, K.Bos, D.Boutigny, N.Brook, T.Cass, Ph.Charpentier, I.Fisk, D.Foster, B.Gibbard, J.Gordon, M.Lamanna, H.Marten, P.Mato, G.Merino, Di Qing, H.Renshall, L.Robertson (chair)

Next Meeting:

Tuesday 6 December 2005 at 16:00

Minutes and Matters Arising

 

Minutes of Last Meeting ( more information )

 

The minutes of 22 November 2005 were approved, pending one point from CMS:

 

---------------------------------------------

From: bauerdick@fnal.gov
Hi Alberto
in the minutes there is a paragraph which has some misunderstanding
 
CMS:  In the GDB the 3D plan had two baselines: replication using Oracle to Tier-1, 
and replication still undefined to Tier-2 centers. 

CMS uses data streaming of Oracle between online and offline, and this is being 
tested with the help of the IT team at CERN.  For distributing read-only data 
using replication of queries, CMS is evaluating a solution base on FroNtier and 
Squid and is working on this with the 3D project and some regional centers.  
CMS will decide in Spring 2006 which technologies to use.
 
I think the two baselines of LCG-3D are 
 
1) replication using Oracle to Tier-1, and while the replication to Tier-2 centers 
is still undefined.
2) Squid/Frontier caching of queries to the central Oracle instance at CERN
 
CMS is proactively pursuing the 2nd, Atlas the 1st. Depending on the results we 
will use 2), 1) or a combination of the two (e.g. stream replication to some 
Tier-1 centers and squid/frontier caching to Tier-2 centers, or whatever!)
 
All this is happening as part of the LCG-3D project.
 
        Cheers, LatB

---------------------------------------------

 

Action List Review ( list of actions )

 

Actions due now

29 Nov 05 - ATLAS will send a proposal about reporting on issues concerning non-EGEE resources to the MB.

29 Nov 05 - CMS will send a similar statement to formally confirm the CMS opinion on OSG reporting.

 

ATLAS and CMS have sent text and L.Robertson will integrate it and distribute it to the MB.

 

29 Nov 05 - MB sends feedback on the GDB summary.

 

No comments on the GDB summary. Action closed.

 

29 Nov 05 - In order to make a list of actions and react to the recommendations, the MB should look at the document and the LHCC reviewers’ presentations. Comments expected before next meeting.

 

No comments received. Action closed.

 


 

Actions due in the future

6 Dec 05 - J.Shiers has initiated a team to define metrics to measure site reliability and availability, data transfer capability, and overall grid performance. He will distribute a proposal on this to the MB for input from the sites and services, before 6 December 2005.

 

J.Gordon noted that there is already a team defining similar metrics for EGEE. L.Robertson (in the absence of J.Shiers and I.Bird) said that he believed that this was part of the background for the work. The LCG metrics must of course focus on what is important for LCG and cover all LCG sites [as discussed at the previous meeting]. H.Renshall confirmed that he is in touch with Piotr Nyczyk, who is coordinating the EGEE metrics work.

 

Action: J.Gordon will send more information to J.Shiers about the EGEE group on metrics.

 

Approval of Security Documents by the MB ( documents )

 

The GDB requested formal endorsement of the Security documents.

IN2P3 noted that, as had been discussed in the GDB, there are legal constraints on the information that sites are able to distribute.

FZK asked that the security group prepare a complement to Section 1.5 of the Security document, clearly specifying which data should be stored and how long it should be preserved.

 

Action: D.Kelsey - The current text in Section 1.5 should include a reference stating that another document will explain how the user data must be stored and how long it must be kept.

 

Decision: The MB endorses the 3 Security documents.

 

Approval Schedule and Target for re-run of SC3 ( more information )

 

On the 24th November J.Shiers sent a proposal about the targets for the re-run of the SC3 data transfers in January 2006.

 

All sites that answered have confirmed targets within ±20% of the proposed targets. The difference often depends on their current network set-up and ongoing work on their network equipment.

 

Four sites have not replied yet: CNAF, NDGF, ASGC and FNAL.

Action: CNAF, NDGF, ASGC and FNAL should send feedback on the proposed targets for the SC3 re-run. For next week.

 

It was pointed out that the proposal under discussion covers only disk-disk data transfers, while the SC3 throughput tests also included Tier-0 disk to Tier-1 tape. J.Gordon noted that J.Shiers has proposed a further series of tests including tape at Tier-1s for March and April.

 

Action: J.Shiers should circulate the full plan and tests for the SC3 throughput re-run. For next MB.

 

Feedback from the Comprehensive Review (15') ( documents )

 

Conclusion of the Discussion ( document )

 

Grid deployment

 

15. Should have a plan B if gLite components fail – But the basic essential components of gLite are already distributed with the exception of the new WMS. Other components are seen as a bonus rather than a necessity.


The strategy is that the current components are the “plan B”. If new components become available they will provide additional functionality, but the current components are sufficient.

Apart from bug fixes and new versions of the existing components, the only essential software needed is:

-    the Workload Management System (WMS) component and

-    (although these are not gLite components) support of the SRM 2.1 interface by CASTOR, dCache and DPM



VO Boxes

 

17. VO Boxes

a. wherever possible requirements should be met by general services,

b. but computing centres should be flexible and pragmatic enough in order to allow experiments to run.


The MB agrees with this recommendation. VO boxes will be discussed in detail at the meeting at the end of January.

The 2 questionnaires about VO boxes should be filled in well before the January meeting.

The GDB has already asked for them several times. (LHCb does not use VO boxes and so is not concerned.)

    ATLAS, ALICE and CMS have sent the questionnaire about security.

    No one has sent the questionnaire about VO box operations.

 

Action: ALICE, ATLAS and CMS should fill in the VO boxes questionnaire on operations and send it to the GDB.

Service Challenges

 

19. Concern over the reported CMS file transfer reliability – good that numbers were presented, but not clear to what extent LCG services are involved.

 

L.Bauerdick said that the underlying problems were fed into the Service Challenge weekly meeting. L.Robertson said that this had not been so clear in his discussions with J.Shiers (who was not present at this MB).

 

Action: J.Shiers - Confirm the reporting and tracking of operations problems deriving from the CMS data transfer service.

 

Management and planning

 

27. Look at re-defining experiment support as an area once the role of NA4 has been defined.


Action: L.Robertson - A proposal to answer the recommendation on re-defining experiment support as an area will be distributed before the end of 2005.

 

28. Relationship with EGEE and OSG improved - essential for interoperability and service quality, but there is no clear forum to coordinate all grid activities

 

J.Gordon noted that at Super Computing 2005 there was a meeting organized by Charlie Catlett to discuss coordination of grids, including interoperability issues. E.Laure, R.Pordes, and N.Geddes participated and their feedback was positive.

 

Conclusions

The written report will explain the requests of the committee in more detail. A response will be prepared for a future meeting with the LHCC referees, probably not the next one, as that coincides with CHEP.

 

The next review will be in September, not overlapping with Super Computing 2006, but during the first 2 days of the EGEE conference in Geneva.

Milestones Issues

Tier-1/Tier-1 and Tier-1/Tier-2 milestones ( more information )

 

No comments were received about the attached mail from RAL and IN2P3. These are examples, but other suggestions are still needed. SARA will send their milestones by the end of the week.

 

Action: CNAF and FNAL should send milestones for Tier1-Tier1 and Tier1-Tier2 operations. For next MB meeting.

 

Update on the SRM 2.1 development for Castor ( more information ) - T.Cass

 

The work is being done at RAL, and the document describes the status of the current development against the plan prepared in July 2005. The development is on schedule.

 

Verification suite
The code is tested internally, but a verification suite is being developed and is awaited by the Castor team.
This external verification suite is developed by the GD group at CERN, based on the test suite used for DPM (and therefore not entirely independent of all implementations).

 

The verification suite is intended as a test for all implementations of the SRM interface used by LCG (Castor, dCache, DPM). As the SRM 2.1 specification is in some cases ambiguous or not fully defined, there is a clear risk that the three implementations may have different interpretations. These differences will take time to resolve next year. However, the SRM working group is being restarted now, run by M.Litmaath, and will begin to look at possible differences.

 

Verification by the Experiments

L.Robertson asked about the status of development of 2.1 clients by the experiments. Usage of the three implementations by the experiments at an early stage will be an important factor in resolving differences in interpretation, as well as being essential as the final step in testing. The DPM implementation is already available to be used for testing by developers.

 

N.Brook said that a DPM implementation had been requested at CERN but this had not been granted. T.Cass said that the request had not been for a test DPM environment, but for a production SE running DPM – which is very different. There does not seem to be a need to set up a test DPM environment at CERN, as DPM is used at several sites where experiments can run their tests.


L.Bauerdick said that CMS would have problems with such tests due to the incompatibility between the RFIO version supported by DPM and the one used by Castor. T.Cass said that this affects only applications that need to use both Castor and DPM, and so this should not affect SRM 2.1 testing. L.Bauerdick said that the problem for CMS may be due to the use of an older version of POOL that does not support DPM.

 

In response to a comment by Ph.Charpentier that the different implementations of RFIO were a significant problem for production applications, requiring different versions for different sites, T.Cass agreed and said that he was preparing a plan to resolve this problem.


The experiments present (ATLAS, CMS and LHCb) all stated that they will use SRM 2.1 via ROOT or LCG utils (which in turn use GFAL) and do not expect to interface to it directly.

 

Migration of LHC experiments to Castor 2 ( transparencies )

 

The plan for the migration of experiment production to Castor 2, agreed in June, has not been implemented. A new plan now has to be prepared, with stronger commitments from the experiments. An appropriate contact person had been requested from each experiment, and ATLAS, ALICE and LHCb have responded.

 

Action: CMS will send a contact name for the migration to Castor 2.


Migration to Castor at different sites was mentioned in the LHCC review. T.Cass noted that the current status is:

    Visits planned to CNAF in December and RAL in January.

    PIC and ASGC still to be discussed.

 

 

The items below are postponed to the next MB meeting:

 

    Network milestones and service availability ( more information )

    Procurement Plans Issues ( document )

 

AOB

 

Communication of experiment plans to the Tier1 centers is being discussed in several mails, and a proposal on how to proceed is needed.

 
Action: J.Shiers and K.Bos will distribute a proposal on how experiment plans can be made available to Tier1 sites. For next MB.

 

 

Summary of New Actions

 

Action: J.Gordon will send more information to J.Shiers about the EGEE group on metrics.

 

Action: D.Kelsey - The current text in Section 1.5 should include a reference stating that another document will explain how the user data must be stored and how long it must be kept.


Action: CNAF, NDGF, ASGC and FNAL should send feedback on the proposed targets for the SC3 re-run. For next week.

 

Action: J.Shiers should circulate the full plan and tests for the SC3 throughput re-run. For next MB.


Action: ALICE, ATLAS and CMS should fill in the VO boxes questionnaire on operations and send it to the GDB.

 

Action: J.Shiers - Confirm the reporting and tracking of operations problems deriving from the CMS data transfer service.

 

Action: L.Robertson - A proposal to answer the recommendation on re-defining experiment support as an area will be distributed before the end of 2005.

 

Action: CNAF and FNAL should send milestones for Tier1-Tier1 and Tier1-Tier2 operations. For next MB meeting.

 

Action: CMS will send a contact name for the migration to Castor 2.

 

Action: J.Shiers and K.Bos will distribute a proposal on how experiment plans can be made available to Tier1 sites. For next MB.

 

 

The full Action List, current and past items, will be on this wiki page before the next MB meeting.