LCG Management Board

Date/Time:

Tuesday 21st August 2007 16:00-17:00 - Phone Meeting

Agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=17197

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 24.8.2007)

Participants:

I. Bird, T. Cass, Ph. Charpentier, L. Dell’Agnello, F. Donno, T. Doyle, M. Ernst, X. Espinal,

I. Fisk, S. Foffano (notes), P. Fuhrmann, J. Gordon, A. Heiss, M. Kasemann, J. Knobloch,

M. Lamanna, R. Tafirout, P. McBride, D. Petravick, R. Pordes, L. Robertson (chair), J. Shiers,

O. Smirnova, J. Templon

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Mailing List Archive:

https://mmm.cern.ch/public/archive-list/w/worldwide-lcg-management-board/

Next Meeting:

Tuesday 28th August 2007 16:00-17:00 - Phone Meeting

1.    Minutes and Matters arising (Minutes)

 

1.1      Minutes of Previous Meeting

 

There were no comments on the last minutes.

1.2      Matters arising

1.2.1       SRM 2.2 dCache clients for CMS

On the issue of dCache-specific SRM clients for CMS, M. Kasemann did not have final answers because of absences, so this action is postponed again.

1.2.2       SAM for OSG and NDGF

L. Robertson reported that this action, which has been outstanding for some weeks, was waiting for agreement on the composition of the test comparison teams. It is now agreed that J. Casey, D. Collados and J. Templon will do the assessment for the OSG tests, and P. Nyczyk, D. Collados and J. Templon for NDGF. Work has started with OSG: on 20/08/07 D. Collados circulated a document summarising the critical tests and giving a first comparison with the OSG tests.

L. Robertson requested OSG and NDGF to consult the document and start the process.

1.2.3       CMS Resource Requirements

L. Robertson asked if the CMS resource requirements were available. M. Kasemann confirmed that they would be discussed at the CMS MB during the week. L. Robertson requested that the information be sent to C. Eck with a copy to S. Foffano.

 

2.    SRM 2.2 Roll-out update (Report) - F. Donno

 

 

F. Donno presented the site reports and the current SRM 2.2 status.

 

dCache version 1.8.0.8, which was used by most sites, did not allow space configuration as required by LHCb. The dCache developers provided a fix, which was installed, but the documentation on space reservation was insufficient, so IN2P3, FZK and Edinburgh were unable to configure their space. The other dCache sites, SARA, BNL and NDGF, are also having difficulty configuring their sites. F. Donno is assisting with tests; however, problems remain.

P. Fuhrmann commented that the documentation will be delivered within the next few days. The strategy had been to deliver the functionality first and then provide documentation sufficient to help dCache sites configure correctly.

 

CASTOR at CERN and CNAF: because of absences, no support was available to work through the open issues list. This has now picked up again, and most issues are solved at CERN and CNAF, with a few still outstanding (the speed of get requests and the "too many threads" problem). Both CERN and CNAF can now be used for preliminary tests, excluding gfal, which is still not working.

 

L. Robertson asked what had happened and expressed concern that the configuration problems could be understood by only one person, who happened to be absent. T. Cass remarked that this coincided with the CASTOR team delivering version 2.1.4, which provides improved tapexdisk1 support, a high priority for ATLAS. He added that it had not been made clear to him that this issue was blocking progress, or how urgent it was. Support of 2.1.4 for ATLAS was therefore kept as the priority within the CASTOR team, particularly as there was a 2.1.4 deadline to meet during that period.

 

J. Gordon asked if LHCb was the only experiment that had planned to test during this period. F. Donno confirmed this, adding that ATLAS is scheduled to test in September and CMS in late October.

 

F. Donno concluded that no dCache sites are configured correctly and that, as of 21/08/07, two CASTOR sites (CERN and CNAF) “more or less” work. In response to an enquiry from L. Robertson about progress at BNL, M. Ernst announced full support for the tests and close collaboration with P. Fuhrmann to move to SRM version 2 as soon as possible, and before the end of 2007. Tests have been running for some weeks, and BNL is moving towards a functional test set as soon as possible. P. Fuhrmann added that, for the standard tests, although all systems are configured identically, the log file records all tests as passing even when they fail. This problem is being discussed with F. Donno. M. Ernst added that collaboration was good and he was confident that the situation would be more clearly understood within the next few days.

 

F. Donno explained the status of the experiment pre-testing (SRM-10) and of the high-level tool tests (SRM-14: lcg-utils, gfal and the latest FTS version). A back-up team has been formed to cover for people on vacation. This team has discovered bugs in these tools, which are summarised on a Wiki page. New lcg-utils and gfal versions are imminent and should fix some of these bugs. A new patch for FTS is being tested. F. Donno concluded that for SRM-10 and SRM-14 it had been difficult to debug the problems with lcg-utils and gfal because no site had been available to test against; however, a number of CASTOR tests had been performed on 21/08/07.

 

SRM-11 stress tests have resumed on CASTOR; however, the SRM internal error (too many threads) is blocking them. The stress test suite has now been distributed to the dCache developers. P. Fuhrmann confirmed that the stress tests are running against the BNL, Fermilab and DESY sites. T. Cass commented that S. De Witt was actively working on the problems blocking the stress testing of CASTOR.

 

SRM-16-B – ATLAS and CMS plans for tests. ATLAS has provided details of its plan; nothing has been received from CMS.

 

SRM-17 – F. Donno remarked that LHCb cannot start testing because no dCache sites are working. Only limited FTS transfer tests can be made, between CERN and CNAF.

 

A TWiki page gives the status of the open issues. On 16/08/07 CERN met the dCache developers and some of the dCache sites to discuss the open issues and their consequences for the availability of the sites and for the tests performed by the experiments. Minutes are available. For tapexdisk1 storage classes, the dCache implementation of space reservation was found to differ from the other SRM v2 implementations, and a new timescale for fixing this is expected.

 

Finally, F. Donno reported that issues had been found and reported for CASTOR, DPM and StoRM, and that the key developers were either already back from vacation or would be back next week to work on fixing them.

 

L. Robertson thanked F. Donno for the report and asked for comments or additions. Ph. Charpentier asked if compatibility and SRM implementation issues could be discussed at the WLCG Workshop in Victoria. He remarked that, as SRM is a layer hiding the underlying technology, it would not be desirable to need technology-specific actions; the CASTOR, dCache and DPM experts should therefore discuss how to resolve this. J. Shiers confirmed that this could be organised in the context of the workshop.

 

L. Robertson commented that the plan has slipped by 3-4 weeks since it was agreed on 10 July: LHCb was expected to start testing in early August, yet the first CASTOR and dCache sites would at best be available only in a few days. He expressed concern that a plan could be agreed at the beginning of July and then not adhered to during the following month. Although this coincided with the holiday period, holiday plans must have been known in advance.

 

Concerning dCache, P. Fuhrmann confirmed that all functionality is now present and that, once the documentation has been improved, he was confident that things could start moving quickly.

J. Templon confirmed that the key people were now available after the vacation period at SARA and asked F. Donno to report to him if there were response problems.

D. Petravick reiterated that liaison between WLCG and OSG was useful and should be improved, noting that WLCG is ahead of OSG. He added that no gap analysis has been done comparing the number of people who should be working on storage systems with the number actually working on them, and that those people are working very hard, to their limits.

 

R. Pordes commented that OSG as a whole is concentrating on integrating dCache 1.7 with SRM 1.1 in VDT and is not planning to deploy SRM 2.2 and dCache 1.8 in production across OSG until February 2008. I. Bird asked what the ATLAS T2s in the US support, and it was confirmed that the T2s supported by OSG deploy dCache 1.7. I. Bird asked ATLAS and CMS to clarify why in the US it is acceptable for OSG to decide the version, whereas in Europe it has to be the version ATLAS requires. Concerning the February timescale, M. Kasemann confirmed that it was acceptable for CMS, and I. Fisk commented that it was rather late but acceptable, especially as it concerned T2s. L. Robertson concluded that the separate planning process in the US should also be discussed at the workshop.

 

Pending the workshop, L. Robertson asked whether having three European sites start SRM 2.2 implementation simultaneously was too ambitious, or whether one site should be selected and concentrated on, as had been done for BNL. P. Fuhrmann confirmed that he is planning to concentrate on FZK, which will go into production in mid-October. It has been agreed to send an SRM developer (Timur Perelmutov) and a dCache developer to FZK while it upgrades. NDGF and BNL provide their own dCache developers, so they can upgrade simultaneously. P. Fuhrmann also confirmed that the documentation, including configuration details, will be ready this week, and he expects this to enable FZK and IN2P3 to set up and be ready next week, so that LHCb testing can start before CHEP.

Ph. Charpentier commented that key people are preparing for CHEP, and that the high-level tools need to be ready with the required functionality before LHCb can start testing.

 

L. Robertson concluded that an amended plan is needed, which will have to be presented at the referee meeting on 24th September. He suggested a weekly SRM Management phone call for people from the key sites involved in development and deployment, to ensure that important issues are highlighted and understood as quickly as possible.

 

3.    SRM 2.2 dCache status – (Report) - P. Fuhrmann

 

 

L. Robertson commented that most of the points had been covered in the above item, but asked if there were any further issues to raise.

P. Fuhrmann asked whether the interpretation of the space token behaviour would affect the LHCb testing, as it may take some time to fix. F. Donno confirmed that this was an ATLAS requirement.

 

P. Fuhrmann reiterated that the SRM 2.2 production upgrade plans will start with NDGF, FZK and BNL in mid-October, and asked if this was still possible. L. Robertson remarked that the original schedule had FZK upgrading in mid-October, but as the plan has already slipped by at least three weeks this might be too ambitious. Following discussion of the planning, L. Robertson concluded that 15th October would be a good target date; however, many unknowns must be understood between now and then before it can be decided how realistic it is.

 

4.    AOB

 


-       L. Robertson announced that we are preparing to start reporting the results of SAM tests at T2s. Draft results will be sent out to see if things are mature enough to go ahead.


-       July Tier-1 accounting - some comments were received, and L. Dell’Agnello’s late submission will exceptionally be accepted. A new version will be sent later in the week. J. Gordon has attached a paper to the agenda showing the discrepancies for July.


-       Tier-2 accounting reporting – several sites have already given feedback to S. Foffano; any further feedback should be sent as soon as possible. A new draft will be sent for the August data, and the September report will be shown at the C-RRB on 23rd October.


-       Next meeting - In view of the SRM discussion, a short MB meeting will take place on 28/08/07.

 


5.    Summary of New Actions

 

 

New Action: J. Shiers to organise discussions at the WLCG Workshop in Victoria on compatibility and SRM implementation issues and the US planning process.

 

New Action: L. Robertson to write a proposal for an SRM Management meeting following discussion with J. Shiers and F. Donno.

 

The full Action List, with current and past items, will be on the MB Action List wiki page before the next MB meeting.