LCG Management Board

Date/Time:

Tuesday 12 December 2006 at 16:00

Agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=a063278

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 14.12.2006)

Participants:

A.Aimar (notes), S.Belforte, L.Betev, I.Bird, N.Brook, K.Bos, T.Cass, Ph.Charpentier, L.Dell’Agnello, F.Donno, I.Fisk, J.Gordon, C.Grandi, S.Ilyin, M.Lamanna, E.Laure, M.Litmaath, J.Knobloch, V.Korenkov, H.Marten, G.Merino, B.Panzer, L.Robertson (chair), J.Shiers, J.Templon

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 19 December 16:00-17:00 - Phone Meeting

1.      Minutes and Matters arising (minutes)

 

1.1         Minutes of Previous Meeting

No comments. Minutes approved.

1.2         Reviewers for the 2006Q4 QR Reports

The reviewers this quarter should be from ALICE and 2 Tier-1 sites.

1.3         Milestones and Targets for 2007: Discussion at the ECM

J.Shiers explained that a first proposal of milestones should be available within a week; input from ATLAS and CMS has already been received.

 

2.      Action List Review (list of actions)

Actions that are late are highlighted in RED.

 

  • 21 Nov 06 - I.Bird will prepare the mandate, participation and goals of the working groups on “Monitoring Tools” and “System Analysis”.

 

Done. Mandates distributed. Here is the link to the message in the MB mailing list archive (NICE password required).

 

  • 25 Nov 06 - Sites should send to H.Renshall their procurement plans by end of next week.

 

Not done. The values from H.Renshall were distributed two weeks ago; the procurement plans from the sites are therefore now due.

 

  • 29 Nov 2006 - L.Robertson and F.Carminati will discuss with K.Bos about changes to the mandate of the Storage Classes working group.

 

Done. Discussed in the GDB meeting the previous week.

The work of the Storage Classes group, as currently defined, still needs to complete:

-          Presentations at the GDB by the remaining Tier-1 sites, in order to review and comment on the different implementation choices of each Tier-1 site.

-          A presentation at the GDB by one Tier-2 site.

 

After that, the mandate will be expanded to include “data access” by the experiments (data, AOD, etc.).

The working group will also be extended to include one representative from each experiment and two representatives from the Tier-1 sites.

 

New Action:

19 Dec 2006 - K.Bos should distribute to the MB list the mandate, and participation, of the Storage Classes working group.

 

  • 19 December - J.Shiers and H.Renshall will report on the progress on the definition of targets and milestones for 2007 at the LCG ECM meeting.

 

In progress. Input from ATLAS and CMS has been received. A first proposal of the 2007 targets should be available within a week.

 

3.      Status of the Russian Tier-2 sites (slides) – S.Ilyin

 

 

S.Ilyin presented status and plans of the Russian Tier-2 sites (RuTier2 Cluster).

 

The RuTier2 Cluster is going to:

-          Serve all four experiments at all sites. All experiments will have access to all sites, even if some institutes do not participate in all collaborations.

-          Support all the basic functions such as analysis, simulation and user data support.

-          Provide some Tier-1 functions, i.e. storing a fraction of the data for local groups developing reconstruction algorithms.

 

Slide 2 shows the Russian sites participating in the Cluster. Six of them are already active or will start next year as Tier-2 sites; the others will mostly be Tier-3 sites.

The operational model that is being defined will have CPU resources and disk space distributed among the four experiments as decided by the Russian community. The details and the operational procedures will be developed in early 2007.

 

As stated in the MoU, there are two funding agencies (FASI and JINR) that contribute to the Russian LCG effort.
Their representatives are the following:

-          Russia and JINR representatives in C-RRB:
Yu.F. Kozlov (FASI) and V.I. Savrin (SINP MSU) for Russia, A.N. Sisakian  for JINR

-          Representatives in WLCG Collaboration Board:
V.A. Ilyin (SINP MSU), alternative V.V. Korenkov (JINR)

 

Slide 4: The process leading to the MoU signature was started in February 2006:

-           May 2006: approved by FASI and sent to the Ministry of Finance and the Ministry of Foreign Affairs

-           September 2006: approved by the Ministry of Finance

-           November 2006: returned by the MFA for improvement of the Russian version; resent to the MFA at the end of November

-           Current situation: waiting for MFA approval

 

Note: A reminder letter from CERN, addressed to the Head of FASI, S.N. Mazurenko, is needed.

 

The budget for the coming years is also being discussed: it will probably imply a 50% reduction for disk and a small reduction for CPU, as shown in slide 5.

 

Slide 6.

The association with a Tier-1 site depends on the experiment:

-          ALICE - FZK
to serve as a canonical Tier-1  for Russian Tier-2 sites

-          ATLAS – NIKHEF/SARA
to serve as a canonical Tier-1 for Russian Tier-2 sites

-          CMS - CERN
In May 2006 FZK decided that they have no facilities to serve the Russian CMS Tier-2 sites. In this urgent situation a solution was found by RDMS, CMS and the LCG management:
CERN agrees to act as a CMS Tier-1 centre for the purposes of receiving Monte Carlo data from Russian and Ukrainian Tier-2 centres.

-          LHCb - CERN
to serve as a canonical Tier-1 centre for Russian Tier-2 sites

 

CERN will act as a special-purpose Tier-1 centre for CMS, not as a general-purpose one, taking a share of the general distribution of AOD and RECO data as required.
In particular, CMS intends to use the second copy of the AOD and RECO data at CERN in order to improve the flexibility and robustness of the computing system.

 

Slide 7 shows the current and future networking connectivity.

International connectivity for Russian science is based on a 2.5 Gb/s link (Moscow – St. Petersburg – Stockholm). In mid-2007 it will be upgraded to a 10 Gb/s link.

 

Connectivity with Europe:

-          A GEANT2 PoP (connected to the Moscow G-NAP) was opened in November 2005 with 2×622 Mbps links to GEANT PoPs.
From December 2006 it is upgraded to 2.5 Gb/s.
The plan from mid-2007 is to have a 10 Gb/s link via Amsterdam.

Connectivity with USA, China, Japan and Korea - GLORIAD project:

-          622 Mbps: Chicago – Amsterdam – St. Petersburg – Moscow

-          155 Mbps: Moscow – Novosibirsk – Khabarovsk – Beijing
(2006: 622 Mbps – 1 Gb/s; 2007: 1 – 2.5 – 10 Gb/s)

 

Slide 8 shows the Russian regional connectivity:

-          Moscow: upgraded to 10 Gb/s (ITEP, RRC KI, SINP MSU, …, LPI, MEPhI)

-          IHEP (Protvino) 100 Mbps fibre-optic (to have 1 Gb/s in 2006)

-          JINR (Dubna) 1 Gb/s fibre optic (upgrade to 10 Gb/s in mid-2007)

-          BINP  (Novosibirsk) 45-100 Mbps (depends on the GLORIAD++ project)

-          INR RAS (Troitsk) 1 Gb/s in early 2007

-          PNPI (Gatchina) 1 Gb/s started

-          SPbSU (S-Petersburg)  1 Gb/s

 

The tests performed by CMS show a throughput of only 10 MBytes/s over a link with a nominal capacity of 1 Gbit/s.
The reasons for this reduced performance (bottlenecks, etc.) are under investigation.
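
For reference, a back-of-the-envelope conversion (not from the slides) shows that the measured rate is less than a tenth of the nominal capacity:

$10\ \mathrm{MByte/s} \times 8\ \mathrm{bit/Byte} = 80\ \mathrm{Mbit/s} \approx 0.08 \times 1\ \mathrm{Gbit/s}$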

 

N.Brook asked whether the breakdown of the resources (CPU vs. disk space) will be agreed with the experiments, because LHCb needs CPU rather than disk at the Tier-2 sites. S.Ilyin replied that this is being discussed within the experiments. The Russian Cluster is ready to adopt the breakdown that best suits the experiments.

 

H.Marten asked for details on which “Tier-1 functions” the Russian Cluster plans to provide.

S.Ilyin replied that it depends on the experiment, but these functions are only for specific purposes. For instance, LHCb will need some AOD data available for local use, and the developers of reconstruction programs will need some raw data locally for fast access and testing.

 

4.      Update on the SRM 2.2 Installations (slides) - M.Litmaath

 

4.1         Status of the services

Slides 2 to 4.

Compared to the previous status update, there are now results from the “S2 tests” developed by F.Donno.

Here are the links to the result pages:

http://cern.ch/grid-deployment/flavia/basic

http://cern.ch/grid-deployment/flavia/cross

http://cern.ch/grid-deployment/flavia/usecase

 

The scripts are run manually for now, but they will soon be run by hourly cron jobs.

F.Donno continues to improve the tests, adding important new use cases and also checking how the services react to invalid requests. When she finds issues, she reports them to the SRM developers.

 

The list of open issues is maintained here: https://twiki.cern.ch/twiki/bin/view/SRMDev/IssuesInTheSpecifications

None of these open issues is crucial or a show stopper.

 

Currently there is no second endpoint for any of the services, but these will be set up in the next few weeks.

 

L.Betev asked about the meaning of the error codes in the web pages above: some failures end up producing a “success” status.

F.Donno replied that this may depend on which server the tests are executed against, and that after a failure there can be a successful execution (or vice versa). However, such mismatches should not happen and will be reported.

 

The links to the LBL “SRM tester” tests are available here:

-          http://sdm.lbl.gov/srm-tester/v22-progress.html

-          http://sdm.lbl.gov/srm-tester/v22daily.html

 

The results and issues are discussed on the SRM tester and “srm-devel” mailing lists, available here:

-          https://hpcrdm.lbl.gov/mailman/listinfo/srmtester 

-          http://listserv.fnal.gov/archives/srm-devel.html

 

 

The rumour that “CASTOR supports only VOMS proxies” is not true and is not an issue:

-          FTS is not yet ready for VOMS (expected in January).

-          A fall-back is in any case available: with an old-style “VO map-file” it would be easy to implement (cf. LFC/DPM).

 

A new WSDL (of Sep. 27) is going to be installed by Dec. 15: it fixes some small long-standing issues (e.g. srmLs output format).

4.2         Status of Clients

FTS: The FTS development continues without surprises. The SRM client code is being unit-tested and will be checked with DPM later this week. The integration into the FTS framework needs a few days, and a release for installation on the “development test bed” is expected next week.

 

GFAL/lcg-utils: New RPM installation files are expected on the “test UI” by Wednesday 13 December, and a patch for gLite release certification is expected next week.

4.3         GLUE Schema

The two-month discussion has now produced a draft of the GLUE 1.3 schema: http://glueschema.forge.cnaf.infn.it/Spec/V13

Not everything originally proposed is included, but the important features are present.

 

S.Andreozzi has already started a first LDAP implementation, and a first version could be available next week.

L.Field started to configure the information providers that he manages.

4.4         Storage Classes Working Group

Further pre-GDB meetings have been agreed. Other Tier-1 and Tier-2 sites will present their plans for storage classes.

More experiment and Tier-1 representatives will join the group; A.Trumov is no longer available to chair it.

4.5         Plans

The main activity is to run the test suites as often as possible in order to find issues and verify improvements.

Feedback continues through e-mail and phone conferences with the developers.

 

The releases of GFAL/lcg-utils and FTS for SRM V2.2 will also be installed and verified.

 

During the WLCG workshop Jan. 22-26 there will be 3 SRM sessions:

-          Discuss SRM V2.2 deployment with sites

-          Discuss remaining issues with the SRM developers

-          Define some “mini-SC” milestones to check the progress of the SRM installations

 

S.Belforte noted that the pages with the test results contain mostly failures. It is difficult to see the progress and how far we are from the required production level. Is there real improvement visible compared to September?

M.Litmaath replied that there is definitely a lot of improvement. The remaining problems are sometimes due to the MSS back-end being offline, and not to real SRM issues. F.Donno added that there is now a clear metric of the situation and that in January the measurements will become much more frequent. The progress should then also become visible in the test results pages.

 

M.Litmaath said that all implementations have shown that the basic SRM V2.2 calls are working. When there is a problem the service providers are informed, but not all service providers and developer teams react with urgency to the issues raised. It can take even a week before a first (possibly negative) reply is received.

 

The MB agreed that by mid-January there will be a re-assessment of the situation, looking more closely at the specific implementations and installations at the sites.

 

5.      Summary of the GDB Meeting (document) – K.Bos

 

 

K.Bos presented a summary of the discussions and decisions taken at the GDB meeting the week before. See the document here.

5.1         Introduction

The introduction at the GDB included a presentation on SLC4 with the transition path from SLC3.

 

Ph.Charpentier questioned whether the software certified (and recompiled on SLC4) by the experiments and the Application Area will work with the middleware running on SLC4 in “SLC3-compatible mode”.

 

I.Bird replied that not all middleware components can yet run natively on SLC4 (e.g. Python warning problems), and that the SLC3-compatible mode has been tested at GRIF in Paris by ATLAS. The goal is to run gLite natively on SLC4 as a priority, but we are not there yet.

 

T.Cass said that CERN will move the lxplus service to SLC4 on 15 January 2007 only if the experiments’ applications are running. He also stressed that a target date must be fixed in order to have the SLC4 migration as early as possible in 2007; otherwise the issues will be postponed until it is too late.

 

L.Robertson suggested that Ph.Charpentier, T.Cass and M.Schulz clarify the issue outside the MB within a week. They should report unresolved issues and disagreements to the next MB. K.Bos added that there will again be an update on the SLC4 migration at the next GDB in January.

5.2         Storage Classes discussion

The progress on the Storage Classes mandate is described above, in the Action List section.

5.3         Megatable

C.Eck summarized the pre-GDB discussion at the GDB. An action was also decided in order to clarify the networking bandwidth needed.

Action:

D.Foster will form a group to discuss the network setup and performance needed according to the Megatable values.
To be discussed at the OPN meeting (12 January 2007); a report is needed by the OB at the end of January.

 

J.Gordon mentioned that it was agreed that “Tier-1 to Tier-1” traffic will use the OPN network, while “Tier-1 to Tier-2” traffic will go over the general network.

 

ALICE ion runs: L.Robertson noted that the ion runs were considered at 2×10⁶, which would mean assuming 100% efficiency.

 

Action:

The proposal to ALICE is to consider, as in the TDR, a value of 10⁶ for the ALICE ion runs. L.Betev agreed that ALICE should confirm it within a week.

5.4         Security

The grid security management model has been changed. LCG will no longer have operational responsibility; security will be managed by the grid infrastructures: R.Wartel for EGEE and D.Petravick for OSG.
D.Kelsey will act as the LCG representative and contact point. He also presented the LCG security requirements to the IGTF. The documents will be distributed to the GDB mailing list and should be approved during the GDB meetings in January and February 2007.

5.5         SAM availability tests

The status of the SAM tests and the algorithm for considering a site “available” were discussed. The sites requested a simpler interface to drill down to the causes, and to the tests, that cause a site to be considered “unavailable”.
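
As an illustration only (the actual SAM availability algorithm and test names are not detailed in these minutes), the basic idea is that a site counts as “available” in a given time bin only if all critical tests pass. A minimal Python sketch, with hypothetical test names:

    # Illustrative sketch: a site is "available" only if every critical test passed.
    # The test names below are assumptions, not the real SAM test set.
    CRITICAL_TESTS = {"job-submission", "replica-management", "ca-certs"}

    def site_available(results: dict) -> bool:
        # results maps test name -> "ok" or "error" for one site and time bin
        return all(results.get(test) == "ok" for test in CRITICAL_TESTS)

    # One failing critical test makes the site count as unavailable:
    print(site_available({"job-submission": "ok",
                          "replica-management": "error",
                          "ca-certs": "ok"}))   # -> False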

5.6         SAM tests in OSG

OSG will develop their own tests, implementing similar SAM metrics and tests and filling the SAM database directly. The strategy is to learn about the SAM DB by fully implementing one test, working with the SAM team at CERN.

The GDB will have to approve that the SAM tests developed on EGEE and OSG are equivalent.

5.7         Pilot Jobs and glexec

The agreement is that the JSPG will define an Acceptable Use Policy (AUP) that will clarify responsibilities and liabilities for the VOs running pilot jobs. Not all sites may accept pilot jobs and glexec. There will be special VOMS roles so that sites can choose which roles can access their WNs and use glexec.
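
For illustration only (this is not the agreed implementation, and the exact glexec interface is not specified in these minutes), the multi-user pilot pattern under discussion can be sketched as follows: the pilot fetches a payload belonging to some VO user and runs it through glexec, so that the site maps the execution to that user’s credentials rather than the pilot’s. The task queue, proxy path and environment variable below are assumptions:

    # Hypothetical sketch of a multi-user pilot job using glexec as an
    # identity-switching wrapper; names and paths are illustrative only.
    import os
    import subprocess

    def fetch_payload():
        # Placeholder: a real pilot would pull work from the VO's central task queue.
        return {"user_proxy": "/tmp/proxy_of_payload_owner",
                "command": ["./run_analysis.sh"]}

    def run_payload(payload):
        # glexec is assumed to read the payload owner's credentials from the
        # environment (the variable name is an assumption) and to run the
        # wrapped command under that user's identity.
        env = {**os.environ, "GLEXEC_CLIENT_CERT": payload["user_proxy"]}
        return subprocess.run(["glexec"] + payload["command"], env=env).returncode

    print(run_payload(fetch_payload()))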

 

I.Bird noted that the sites will then have to configure their setup themselves, and that the site configurations will be discussed in future TCG meetings.

 

The implementation of glexec was discussed at the last TCG meeting.

 

E.Laure reported that the “glexec” feature is available on the JRA1 preview test bed and that the experiments should use it and provide feedback. It is already deployed and being tested at FNAL.

 

C.Grandi added that the preview test bed is also complemented by installations at Helsinki and NIKHEF for parallel tests.

 

Ph.Charpentier noted that LHCb is already running single-user pilot jobs at all sites; they behave like regular gLite jobs. J.Templon explained that the issues, and the AUP on responsibility and liability, mainly concern multi-user pilot jobs.

 

L.Robertson noted that point 6 in the GDB summary says the JSPG will define the rules on responsibilities and liabilities between the grid infrastructures (EGEE, OSG, etc.) and the VOs. This implies that “when the user policy is accepted by the grids and experiments, then sites will accept to run pilot jobs”. Sites will have this setup by default, and can only opt out by not allowing certain VOMS roles to use their WNs.

 

The AUP document should be presented to the GDB by Spring 2007.

5.8         Accounting

J.Gordon presented at the GDB a summary of the status and plans for accounting.

 

Storage accounting (for disk and tape storage) is now available and one can select a time-window and a site. It is currently monitoring the UK sites but will soon start to collect the data from all the EGEE sites.

 

User-level accounting can now be done. Each user DN is encrypted for transmission but stored unencrypted in the database.
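
As a purely illustrative sketch (the actual APEL transport and key handling are not described in these minutes), the idea of encrypting the DN only while in transit could look like the following, here using the third-party Python “cryptography” package and an assumed shared key:

    # Illustrative only: encrypt the user DN for transport, store it in clear
    # at the central repository. Key management is not addressed here.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()        # assumed to be shared with the repository
    cipher = Fernet(key)

    user_dn = "/DC=ch/DC=cern/OU=Users/CN=some.user"   # hypothetical DN

    # Site side: publish the accounting record with the encrypted DN.
    record = {"site": "EXAMPLE-T2", "cpu_hours": 1234,
              "user_dn": cipher.encrypt(user_dn.encode())}

    # Repository side: decrypt on arrival and store the DN unencrypted in the DB.
    stored_dn = cipher.decrypt(record["user_dn"]).decode()
    print(stored_dn == user_dn)        # -> True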

 

The development of APEL and DGAS was discussed: the baseline solution remains the GOC DB, and sites can use APEL, DGAS or other tools. J.Gordon also added that APEL works with Update 10 of the gLite CE.

 

J.Gordon will present next week the status and the next steps for moving to automatic accounting.

 

6.      Long Term Planning for the CERN Computing Centre - T.Cass, L.Robertson

 

Postponed to next week.

 

7.      AOB

 

 

No AOB.

 

8.      Summary of New Actions 

 

 

Action:

19 Dec 2006 - K.Bos should distribute to the MB list the mandate, and participation, of the Storage Classes working group.

 

Action:

19 Dec 2006 - The proposal to ALICE is to consider, as in the TDR, a value of 10⁶ for the ALICE ion runs. L.Betev agreed that ALICE should confirm it within a week.

 

Action:

15 Jan 2007 - D.Foster will form a group to discuss the network setup and performance needed according to the Megatable values.
To be discussed at the OPN meeting (12 Jan 2007); a report is needed by the OB at the end of January.

 

 

 

 

The full Action List, with current and past items, will be on this wiki page before the next MB meeting.