LCG Management Board
Tuesday 16 October 2007 16:00-17:00 – Phone Meeting
(Version 1 - 18.10.2007)
A.Aimar (notes), D.Barberis, I.Bird, T.Cass, Ph.Charpentier, L.Dell’Agnello, T.Doyle, M.Ernst, S.Foffano, J.Gordon, F.Hernandez, M.Kasemann, J.Knobloch, M.Lamanna, U.Marconi, P.Mato, G.Merino, R.Pordes, Di Qing, L.Robertson, Y.Schutz, J.Shiers, R.Tafirout, J.Templon
Mailing List Archive:
Tuesday 23 October 2007 16:00-17:00 – Phone Meeting
1. Minutes and Matters arising (Minutes)
1.1 Minutes of Previous Meeting
The minutes of the previous meeting were distributed only on Tuesday morning. Unless the MB members have any comment or feedback in the next few days the minutes will be considered approved.
Update: No comments received, minutes approved.
1.2 Sites Names (Document)
A.Aimar proposed that the name of each site become unique in all reports, documents, tables, etc. Currently several names are used for some of the sites.
Below is the table with a couple of proposals, but it is up to each site to decide.
He proposed that:
- each site chooses the name that will be used to identify the site itself
- as for the Tier-2 sites, the two-letter ISO country code is prefixed to the name
- the only character used as separator is the hyphen (avoiding slashes, underscores, etc).
L.Robertson added that the Tier-2 sites were asked their name and the name they have registered in GOCDB. The proposal is that the Tier-1 sites do the same and this will not require any change in GOCDB.
L.Dell’Agnello noted that there is “CNAF-INFN” as the Tier-1 and a smaller Tier-2 that is also called “CNAF”. They will have to discuss and agree which name to choose.
The MB agreed to have the country code prefixed to all Tier-1 site names (except NDGF, which spans several countries).
The Tier-1 sites should define the name that will be used in all reports of the LCG.
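The naming rules proposed above (country-code prefix, hyphen as the only separator, NDGF exempt) could be checked mechanically. The following is a minimal sketch only: the function name, the exemption set and the example site names are hypothetical and are not part of any LCG or GOCDB tool.

```python
import re

# Hypothetical helper illustrating the proposed naming rules; not an LCG tool.
# Sites spanning several countries are exempt from the country prefix.
MULTI_COUNTRY_SITES = {"NDGF"}


def normalise_site_name(name: str, country_code: str) -> str:
    """Return the site name with hyphen separators and a country-code prefix."""
    # Replace slashes, underscores, whitespace and repeated hyphens by one hyphen.
    cleaned = re.sub(r"[\s/_\\-]+", "-", name.strip()).strip("-")
    if cleaned.upper() in MULTI_COUNTRY_SITES:
        return cleaned
    code = country_code.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError("country code must be two letters: %r" % country_code)
    prefix = code + "-"
    # Do not add the prefix twice if the site name already carries it.
    if cleaned.upper().startswith(prefix):
        return cleaned
    return prefix + cleaned
```

For instance, a made-up name such as "Example_Site/T1" with country code "it" would become "IT-Example-Site-T1", while "NDGF" would be returned unchanged.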
1.3 LCG MB Private Web Area (Slides)
The MB had agreed that the Sites would share their 24x7 and VO Boxes SLA documents but it should be done only among the members of the MB.
A.Aimar has created a private Web Area accessible only by the members of the MB and will distribute the details after the meeting.
Information sent by email after the meeting:
R.Tafirout asked whether confidential information can be put safely there.
L.Robertson replied that it is each site’s decision what to share and whether to remove confidential information from the documents. But it would be useful if the 24x7 and the VO Boxes documents were shared among sites, as requested by several of them.
2. Action List Review (List of actions)
Actions that are late are highlighted in RED.
I.Bird agreed to report to the Management Board about the progress of the Job Priority working group.
He will report next week after a new meeting.
· 16 October 2007 - Sites should send the pointers to their documents about 24x7 and VO Boxes to A.Aimar. A.Aimar will prepare a protected web area for confidential documents of the LCG Management Board.
Done. A.Aimar created the private Web Area and sites can upload the documents themselves.
· D.Barberis agreed to clarify with the Reviewers the kind of presentations and demos that they are expecting from the Experiments at the Comprehensive Review.
Ongoing. D.Barberis started the discussions with the Reviewers and with the other Computing Coordinators. He will send a summary via email in the next few days.
3. SRM 2.2 Weekly Update (Agenda Edinburgh workshop; dCache 1.8 deployment schedule; dCache site) – J.Shiers
J.Shiers presented the weekly update on the SRM Roll-out at the Tier-1 Sites.
The dCache 1.8 deployment schedule is now available (Link). Preparations for the production deployment at named sites, at defined dates, continue and are on track.
The outstanding issues that were blocking high-level tools were fixed with the latest dCache releases. A new issue was found and is being solved.
The situation is similar for CASTOR2 SRM. The 1.1.5 release is going to be available for deployment in the coming week(s) and fixes several outstanding bugs.
CNAF (CASTOR2 + STORM) is in production for ATLAS and is now available for testing by the Experiments.
A recent mail from Frank Wuerthwein proposes that dCache 1.8 is ready for production deployment at (CMS) Tier2s now (forwarded by GSSD & GDB mail lists).
It is important that Experiments test the SRM V2.2 features not only by running SRM 1.1 applications but also with applications using the SRM 2.2 features. This will come back later, under the CCRC'08 topic.
LHCb foresees restarting testing on Thursday this week, along the lines of the original plan:
- transfer of data
- access data from applications on the WN
- deletion of data (not originally included in our list of tests)
4. Update on CCRC-08 Planning (CCRC'08 Meetings, Slides) – J.Shiers
J.Shiers summarized the weekly CCRC’08 phone meeting held the day before.
The goals of the first meeting were to start working on:
1. First draft of a combined schedule
2. First draft of combined goals (as started in the CSA08 description)
3. Initial identification of key (existing) services for the February run
Slide 2 shows the overall proposed schedule.
Phase 1 - February 2008:
Possible scenario: blocks of functional tests, trying to reach 2008 scale for the tests at:
1. CERN: data recording, processing, CAF, data export
2. Tier-1’s: data handling (import, mass-storage, export), processing, analysis
3. Tier-2’s: Data Analysis, Monte Carlo, data import and export
Phase 2: May 2008
Duration of challenge: 1 week setup, 4 weeks challenge
Of course the Phase 1 results will be used to define the Phase 2:
- Use February (pre-)GDB to review metrics, tools to drive tests and monitoring tools
- Use March GDB to analyze CCRC phase 1
- Launch the May challenge at the WLCG workshop (April 21-25, 2008)
The next F2F CCRC meetings will cover:
- Nov 6: agreement on key services & goals – including with sites; draft schedule for component testing; check-point on Explicit Requirements (ERs)
- Dec 4: progress with component testing; plans for integration testing; remaining ERs; status of site services
- Jan 8: review metrics, tools to drive tests and monitoring tools; progress with integration
- Feb 12: mid-challenge assessment.
Slide 4 shows the tasks, activities, holidays, etc. from now to May 2008.
Slide 5 shows the current explicit requirements from the Experiments.
J.Gordon asked how many sites are going to run SL3 for WMS in CCRC.
J.Shiers replied that CERN will do so and maybe one other site will have to do it for LHCb, if needed.
But Ph.Charpentier added that LHCb does not require any WMS outside of CERN.
J.Templon asked how many RBs ATLAS is expecting to have at the Tier-1 sites.
J.Shiers replied that CCRC will soon prepare a clear list of which service should be running at which site (“what service and where” table).
J.Templon noted that the “pilot jobs” limitation will also imply a change in the way ALICE is operating and therefore ALICE should be added to LHCb in the corresponding cell, in the table above.
J.Shiers then highlighted (slide 10) the fact that there cannot be “implicit requirements”. For each service VOs must specify the installations required, the level of service and the target performance to be reached in CCRC-Feb08 and CCRC-May08.
Slides 12 and 13 show two examples (CMS and LHCb) of the kind of information about targets that is needed for the preparation. All Experiments agreed to produce this kind of information at the CCRC meeting on Monday.
At the CCRC meeting it was also agreed that a usual milestone plan will be initiated (by A.Aimar) and then used to plan and monitor CCRC’08.
The focus of the next 1-2 CCRC meetings is to obtain the detailed targets described above from all 4 Experiments:
- Week 1: equivalent of CMS targets
- Week 2: resource requirements at sites
Agendas for the next CCRC meetings (up to but not including F2F) are in Indico: http://indico.cern.ch/categoryDisplay.py?categId=1613
5. Report from the EGI Workshop (Slides) - J.Knobloch
J.Knobloch presented a summary of the EGI Workshop in Budapest and other information on EGI.
5.1 Introduction to EGI
As described in slide 2, starting nearly two years ago, CERN has prepared the ground (nickname: EGO) to create a sustainable infrastructure in Europe with the vision of transferring the know-how and the responsibility for a global e-infrastructure into a new organization independent from CERN. CERN has also offered to host this new organization at least initially, to facilitate a smooth transition from present CERN-led operations.
CERN’s proposal was well received and following a large information campaign initiated via EGEE, which included visits to many countries, the idea is now generally accepted. Supported by the position of e-IRG, the EU has now opened as part of FP7 a call for design studies with EGI in mind. Although CERN has proposed to lead the design project, the recent choice by the majority of the EGI preparation team was different. CERN accepts this and it does not change CERN’s position as far as the need for a sustainable e-infrastructure is concerned.
CERN must ensure that the needs of the LHC community are fully taken into account. CERN expects that the future EGI will gradually take over, together with the NGIs, the responsibility for the operation presently provided by EGEE, the software integration, certification and distribution, as well as the required support and training.
While global coordination is important, it is not sufficient and EGI will have to provide reliable long-term services until such services can be obtained from industry. CERN also sees a role for EGI in the coordination of future middleware developments and in standardization as described in the vision paper prepared for the February 2007 workshop in Munich.
The EGI project must help to ensure that National Funding will take over a large fraction of the EU funding for operations, which is expected to run down with time.
J.Templon asked how this mandate compares to the OMII mandate (http://omii-europe.org/).
I.Bird replied that the OMII goal is mainly the standardization of the existing middleware and this will not be part of the EGI mandate.
5.2 EGI Design Study
The current work concerns the setup and operation of a new organizational model of a sustainable pan-European grid infrastructure.
The main dates are:
- February 26-27: EGI Workshop in Munich
- May 2: Proposal submitted to the EC for funding within FP7-INFRA-2007-1, 1.2.1 Design Studies
- September 1: Project start (if approved)
- September 27: End of negotiations with EC
- October 2: EGI workshop in Budapest
The design study will involve about 300 person-months.
The institutes that will participate in the preparation of the design documents are:
- Johannes Kepler Universität Linz (GUP)
- Greek Research and Technology Network S.A. (GRNET)
- Istituto Nazionale di Fisica Nucleare (INFN)
- CSC - Scientific Computing Ltd. (CSC)
- CESNET, z.s.p.o. (CESNET)
- European Organization for Nuclear Research (CERN)
- Verein zur Förderung eines Deutschen Forschungsnetzes - DFN-Verein (DFN)
- Science & Technology Facilities Council (STFC)
- Centre National de la Recherche Scientifique (CNRS)
Slide 6 shows the Management Structure that has been agreed, with an Advisory Board and a Management Board.
The overall Project Director is Dieter Kranzlmüller (Linz) and 6 work packages have been defined (WP1 to WP6).
Below are the details of each work package, each with a leading partner (slide 7).
J.Knobloch also added that J.Shiers represents CERN in WP3: Functions Definitions.
I.Bird added that “Functions Definitions” will be about the functions of the EGI infrastructure from the NGIs, etc., not the functionalities and requirements of the middleware software.
J.Shiers said that he will distribute the links to the EGI web and to the Use Cases Letter that is being prepared.
5.3 Scope of WP5: Establishment of EGI
The main objectives of WP5 are:
- Generate with WP3 and WP4 the “blueprint” which will serve to establish EGI
- Get the Organization and its Conventions ratified by a significant majority of European States
- Prepare and start the transition from EGEE to EGI
WP5 will be based on the results from WP2, WP3 and WP4 and, if needed, direct investigations. It is vital that the process for establishing EGI completes successfully at least 3 months before the end of EGEE-III, anticipated to be March 2010. WP5 will therefore span 23 months starting in January 2008, not counting preparatory work done outside the project.
The main tasks of WP5 (with the partner working on each) will be:
- Establish the convention of the organisation (CERN)
- Get the convention agreed by a majority of European NGIs (all)
- Maintain the relationship with the EC in view of supporting EGI (CERN and GUP)
- Initiate and complete the ratification process with the NGIs willing to join EGI (all)
- Incorporate the organisation (CERN)
- Initiate and complete the hand-over from major RI-project (e.g. EGEE) operations (all)
The preparation work will be done mostly by the lead partner – CERN - but all partners will contribute to obtain the agreement from the NGIs and during the ratification process.
5.4 Project Status
The “Description of Work” and the “Grant Agreement Preparation Forms” have now been submitted to the EU.
The project started on 1 September 2007, even though it is still waiting for official approval:
- Development of an NGI knowledge base
- Collected use cases (NGIs, EGEE, …)
- Elected chair of advisory board: Gaspar Barreira (Portugal)
5.5 Next Steps
J.Templon asked when the proposal will be negotiated with all the other NGIs that do not participate in the design project.
J.Knobloch replied that all the NGIs are represented in the Advisory Board, which has a more operational role than its name suggests. That board is now more an oversight body for the whole project than a purely advisory one. In addition, the workshop in March 2008 is open to everyone who wants to contribute.
6. VO-specific SAM Tests (VO-specific SAM tests) - Experiments
The Experiments had agreed to comment on the results of the VO-specific SAM tests when they are above the targets (in red in the table below).
Y.Schutz will ask for information about the SAM tests.
D.Barberis explained that the ATLAS tests are still under development; therefore the values reported, both the positive and the negative ones, are not to be considered very realistic. The issues were mostly in the test configuration of the connection between SE and CE at some of the sites.
A major bug was fixed only in mid-October; therefore part of the values for this month will not be reliable either. From this week on the ATLAS SAM values should be correct.
S.Belforte and A.Sciabá are looking into the issue.
- FNAL: there is a name clash in the SRM endpoint that is being fixed.
- IN2P3: executing a test that is misconfigured at IN2P3. CMS will fix it.
The issues at INFN are about accessing files. This could be a mismatch between the file catalog and the SE and not a site problem.
There is no real daily follow-up of the tests and it is difficult to catch up at the end of the month.
The numbers in the table are overly positive because the SAM tests do not test all services needed by LHCb.
L.Dell’Agnello asked where INFN could find more information about the LHCb tests.
Ph.Charpentier replied that all the information is on the LHCb SAM tests page.
L.Robertson concluded that the tests are still under development and will be prepared during the next few months. Until then the results will be discussed in the MB but not presented in other reports.
7. Sites Reliability Reports for September 2007 (Sites Reports; Slides) - A.Aimar
A.Aimar briefly commented on the Site Reliability Reports for September 2007.
Here is the summary of the reliability since January 2007.
We again have 7 sites above 91% (the current target) and 2 above 82% (90% of the target). The two sites below target (FNAL and RAL) are very close, so we could easily have had 9 sites above target this time.
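The two thresholds quoted above are related by simple arithmetic: 90% of the 91% target is 81.9%, rounded to 82%. The sketch below shows one way to classify a monthly reliability figure against them. It assumes that reaching the target exactly counts as meeting it, which the minutes do not state; the function name and the sample values are hypothetical.

```python
TARGET_PCT = 91  # current reliability target quoted in the minutes


def classify_reliability(reliability_pct, target_pct=TARGET_PCT):
    """Classify a monthly reliability value against the target.

    Assumption: a value equal to a threshold is treated as meeting it.
    """
    # 90% of the target, rounded to a whole percent (0.9 * 91 = 81.9 -> 82).
    lower_pct = round(0.9 * target_pct)
    if reliability_pct >= target_pct:
        return "above target"
    if reliability_pct >= lower_pct:
        return "above 90% of target"
    return "below 90% of target"


# Hypothetical sample values, not taken from the site reports.
for value in (93, 90, 80):
    print(value, classify_reliability(value))
```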
Below is the progression of the global averages in the last 6 months:
Average 8 best sites: Sept 93%, Aug 94%, Jul 93%, Jun 87%, May 94%, Apr 92%
Average all sites: Sept 89%, Aug 88%, Jul 89%, Jun 80%, May 89%, Apr 89%
The site reports are available here and are summarized by the table below.
Most of the issues are on:
- SRM and SE components that will anyway be upgraded; therefore not much progress is expected until these upgrades.
- Operational issues regarding certificates, network or power problems or maintenance
J.Gordon explained that a certificate badly renewed on Friday was discovered only on Monday. This was enough to bring RAL down to 90%, and therefore below the target.
F.Hernandez noted that the scheduled downtime for IN2P3 is not taken into account; otherwise the availability would be better.
G.Merino reported that the cause of the unavailability on 11 September is not clear. The SAM tests resumed working at PIC without intervention. As only PIC was down that day, it is not clear why the SAM tests failed and then resumed at PIC.
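As a rough plausibility check of the 90% figure, a Friday-to-Monday outage of about three days in a 30-day month leaves 27 of 30 days up. The sketch below makes two assumptions that are not in the minutes: that the outage lasted three full days, and that reliability is simply the fraction of the month the site was up. The real SAM-based calculation may differ, and the function name is hypothetical.

```python
def reliability_after_outage(days_in_month, outage_days):
    """Percentage of the month the site was up, under the simple
    uptime-fraction assumption described above."""
    if not 0 <= outage_days <= days_in_month:
        raise ValueError("outage_days must be between 0 and days_in_month")
    return 100.0 * (days_in_month - outage_days) / days_in_month


# September has 30 days; a three-day outage is assumed for illustration.
print(reliability_after_outage(30, 3))
```

Under these assumptions a single long weekend of failed tests is enough to put a site just below the 91% target.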
J.Templon added that in October there will be a week with a major problem at SARA and he submitted a ticket about it.
L.Robertson noted that from next quarter the target should rise to 93%.
I.Fisk and R.Tafirout were confirmed as speakers at the Comprehensive Review.
9. Summary of New Actions
The full Action List, current and past items, will be in this wiki page before next MB meeting.