LCG Management Board
Tuesday 12 December 2006 at 16:00
(Version 1 - 14.12.2006)
A.Aimar (notes), S.Belforte, L.Betev, I.Bird, N.Brook, K.Bos, T.Cass, Ph.Charpentier, L.Dell’Agnello, F.Donno, I.Fisk, J.Gordon, C.Grandi, S.Ilyin, M.Lamanna, E.Laure, M.Litmaath, J.Knobloch, V.Korenkov, H.Marten, G.Merino, B.Panzer, L.Robertson (chair), J.Shiers, J.Templon
Tuesday 19 December 16:00-17:00 - Phone Meeting
1. Minutes and Matters arising (minutes)
1.1 Minutes of Previous Meeting
No comments. Minutes approved.
1.2 Reviewers for the 2006Q4 QR Reports
reviewers this quarter should be from
1.3 Milestones and Targets for 2007: Discussion at the ECM
J.Shiers explained that a first proposal of milestones should be available within a week; input from ATLAS and CMS has already been received.
2. Action List Review (list of actions)
Actions that are late are highlighted in RED.
Done. Mandates distributed. Here is the link to the message in the MB mailing list archive (NICE password required).
Not done. The values from H.Renshall were distributed two weeks ago; therefore the procurement plans from the sites are due.
Done. Discussed in the GDB meeting the previous week.
The work of the Storage Classes group, as it is currently defined, still requires the completion of:
- Presentations at the GDB by the remaining Tier-1 sites, in order to see and comment on the different implementation choices of each Tier-1 site.
- Presentation at the GDB of one Tier-2 site.
After that the mandate will be expanded to include “data access” by the experiments (data, AOD, etc.).
The working group will also be extended to include one representative from each experiment and two representatives from the Tier-1 sites.
19 Dec 2006 - K.Bos should distribute to the MB list the mandate, and participation, of the Storage Classes working group.
On the way. Input from ATLAS and CMS received. A first proposal of the 2007 targets should be available within a week.
3. Status of the Russian Tier-2 sites (slides) – S.Ilyin
S.Ilyin presented status and plans of the Russian Tier-2 sites (RuTier2 Cluster).
The RuTier2 Cluster is going to:
- Serve all four experiments with all sites. All experiments will have access to all sites, even if some institutes do not participate in all collaborations.
- Support all the basic functions such as analysis, simulation and user data support.
- Provide some Tier-1 functions, i.e. storing a fraction of the data for local groups developing reconstruction algorithms.
Slide 2 shows the Russian sites that participate in the Cluster. Six of them are already active or will start next year as Tier-2 sites; the others will be mostly Tier-3 sites.
The operational model that is being defined will have CPU resources and disk space distributed among the four experiments as decided by the Russian community. The details and the operational procedures will be developed in early 2007.
As in the MoU, there are two financing agencies (FASI and JINR) that contribute to the Russian LCG effort.
Representatives in WLCG
Slide 4: The process to get to the MoU signatures was started in February 2006:
- May 2006 approved by FASI and sent to Ministry of Finances and to Ministry of Foreign Affairs
- September 2006 approved by Ministry of Finances
- November 2006 returned from MFA for improving the Russian version, end of November sent to MFA again
- Current situation: waiting for MFA approval
Note: They need a reminder letter from CERN addressed to the Chief of FASI, S.N. Mazurenko.
The budget for the coming years is also being discussed: It will probably imply a reduction of 50% for disk and a small reduction for CPU as indicated in slide 5.
The association with one Tier-1 site depends on each experiment:
ATLAS – NIKHEF/SARA
CMS - CERN
LHCb - CERN
CERN will act as a special-purpose Tier-1 centre for CMS, taking a share of the general distribution of AOD and RECO data as required.
Slide 7 shows the current and future networking connectivity.
International connectivity for Russian science is based on a 2.5 Gigabit/s link. A GEANT2 PoP (connected to the Moscow G-NAP) was opened in November 2005 with 2x622 Mbps links to GEANT PoPs.
622 Mbps -
155 Mbps -
Slide 8 shows the Russian regional connectivity:
- IHEP (Protvino) 100 Mbps fibre-optic (to have 1 Gb/s in 2006)
- JINR (Dubna) 1 Gb/s fibre-optic (upgrade to 10 Gb/s in mid 2007)
- INR RAS (Troitsk) 1 Gb/s in early 2007
- PNPI (Gatchina) 1 Gb/s started
- SPbSU (S-Petersburg) 1 Gb/s
The tests performed by CMS show a connectivity of only 10 MBytes/s, well below the nominal capacity of the link.
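For scale, the observed 10 MBytes/s corresponds to 80 Mbit/s, a small fraction of the Gigabit-class links listed above; a one-line conversion (nominal capacity chosen here purely for illustration):

```python
# Convert the observed CMS throughput from MBytes/s to Mbit/s,
# for comparison with link capacities quoted in Mbps or Gb/s.
observed_mbytes_per_s = 10
observed_mbit_per_s = observed_mbytes_per_s * 8  # 1 byte = 8 bits
print(observed_mbit_per_s)  # 80

# Fraction of a 1 Gb/s link actually used (1 Gb/s assumed for illustration):
print(observed_mbit_per_s / 1000)  # 0.08
```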
N.Brook asked whether the breakdown of the resources (CPU vs. disk space) will be agreed with the experiments because LHCb needs CPU on the Tier-2 sites, rather than disk. S.Ilyin replied that this is being discussed within the experiments. The Russian Cluster is ready to adopt the breakdown that better suits the experiments.
H.Marten asked details on which “Tier-1 functions” the Russian Cluster plans to perform.
S.Ilyin replied that it depends on the experiment, but these functions are only for specific purposes. For instance, LHCb will need some AOD data available for local usage, and the developers of reconstruction programs will need some raw data locally for fast access and testing.
4. Update on the SRM 2.2 Installations (slides) - M.Litmaath
4.1 Status of the services
Slides 2 to 4.
Compared to the previous status update, there are now the results of the “S2 tests” developed by F.Donno.
Here are the links to the result pages:
The scripts are run manually for now, but they will soon be started by hourly cron jobs.
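As an illustration, an hourly cron entry for launching such a test script could look like the following (the script path and log location are assumptions, not taken from the minutes):

```shell
# Hypothetical crontab entry: run the S2 test suite at the top of every hour
# and append both stdout and stderr to a log file.
0 * * * * /opt/s2-tests/run_s2_tests.sh >> /var/log/s2-tests.log 2>&1
```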
F.Donno continues to improve the tests, adding important new use cases and also checking how the services react to wrong instructions. When she finds issues she reports them to the SRM developers.
The list of open issues is maintained here: https://twiki.cern.ch/twiki/bin/view/SRMDev/IssuesInTheSpecifications
None of these open issues is crucial or a show stopper.
Currently there is no second endpoint for any of the services, but these will be set up in the next few weeks.
L.Betev asked about the meaning of the error codes in the web pages above. Some failures end up producing a “success” status.
F.Donno replied that this may depend on which server the tests are executed on: after a failure there could be a successful execution (or vice versa). This mismatch should not happen, however, and will be reported.
The links to the LBL “SRM tester” tests are available here:
The results and issues on the SRM tester and in the “srm-devel” mailing lists are available here:
The rumour that “CASTOR supports only VOMS proxies” is not true and is not an issue:
- FTS is not yet ready for VOMS (January)
- A fall-back is in any case available: with an old-style “VO map-file” it would be easy to implement (cf. LFC/DPM)
A new WSDL (of Sep. 27) is going to be installed by Dec. 15: it fixes some small long-standing issues (e.g. srmLs output format).
4.2 Status of Clients
FTS: The FTS development continues without surprises. The SRM client code is being unit-tested and will be checked with DPM later this week. The integration into the FTS framework needs a few days and a release for the installation on the “development test bed” is expected next week.
GFAL/lcg-utils: New RPM installation files are expected on the “test UI” by Wed 13 Dec, and a patch for gLite release certification is expected next week.
4.3 GLUE Schema
The two-month discussion has now produced a draft document of GLUE 1.3: http://glueschema.forge.cnaf.infn.it/Spec/V13
Not everything originally proposed is included, but the important features are present.
S.Andreozzi has already started a first LDAP implementation, and we could have a first version next week.
L.Field started to configure the information providers that he manages.
4.4 Storage Classes Working Group
Further future pre-GDB meetings have been agreed. Other Tier-1 and Tier-2 sites will present their plans about storage classes.
More experiments and Tier-1 representatives will join the group and A.Trumov is no longer available to chair it.
The main activity is to run the test suites as often as possible in order to find issues and verify improvements.
Feedback continues through e-mail and phone conferences with the developers.
Installation and verification of the releases of GFAL/lcg-utils and FTS for SRM V2.2 are also ongoing.
During the WLCG workshop Jan. 22-26 there will be 3 SRM sessions:
- Discuss SRM V2.2 deployment with sites
- Discuss remaining issues with the SRM developers
- Define some “mini-SC” milestones to check the SRM installations progress
S.Belforte noted that the pages with the test results contain mostly failures. It is difficult to spot the progress and how far we are from the production level required. Is there real improvement visible, compared to September?
M.Litmaath replied that there is definitely a lot of improvement. The remaining problems are sometimes due to the MSS back-end being offline and not because of real SRM issues. F.Donno also added that now there is a clear metric of the situation and in January the measurements will become much more frequent. The progress should then become visible also in the test results pages.
M.Litmaath said that all implementations have shown that the basic SRM V2.2 calls are working. When there is a problem the service providers are informed. But not all service providers and developer teams react with urgency to the issues raised. It can even take a week before receiving a first (maybe negative) reply.
The MB agreed that by mid-January there will be a re-assessment of the situation, looking more at the specific implementations and installations at the sites.
5. Summary of the GDB Meeting (document) – K.Bos
K.Bos presented a summary of the discussions and decisions taken at the GDB meeting the week before. See the document here.
The introduction at the GDB included a presentation on SLC4 with the transition path from SLC3.
Ph.Charpentier questioned whether the software certified (and recompiled on SLC4) by the experiments and the Application Area, will work with the middleware running on SLC4 but in “SLC3-compatible mode”.
I.Bird replied that not all middleware components can already run natively on SLC4 (e.g. Python warnings problems), and that the SLC3-compatible mode has been tested at GRIF.
T.Cass said that CERN will move the lxplus service to SLC4 on 15 January 2007 only if the experiments’ applications are running. He also stressed that a target date must be fixed in order to have the SLC4 migration as early as possible in 2007; otherwise the issues will be postponed until it is too late.
L.Robertson suggested that Ph.Charpentier, T.Cass and M.Schulz clarify the issue outside the MB within a week. They should report unsolved issues and disagreements to the next MB. K.Bos added that there will again be an update on the SLC4 migration at the next GDB in January.
5.2 Storage Classes discussion
The progress on the Storage Classes mandate is already described above in the Action List section.
C.Eck summarized the pre-GDB discussion at the GDB. An action was also decided in order to clarify the networking bandwidth needed.
D.Foster will form a group to discuss the network setup and performance needed according to the Megatable values.
J.Gordon mentioned that it was agreed that “Tier-1 to Tier-1” traffic will use the OPN network, while “Tier-1 to Tier-2” traffic will go over the general network.
The proposal to
The grid security management model has been changed. LCG will no longer have operational responsibility; security will be managed by the grid infrastructures: for EGEE (R.Wartel) and for OSG
5.5 SAM availability tests
The status of the SAM tests and the algorithm for considering a site as “available” was discussed. The sites requested a simpler interface to drill down to the causes and to the tests that make a site be considered “unavailable”.
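The availability algorithm itself is not spelled out in the minutes; a minimal sketch, assuming a site counts as “available” only when every critical test passes (the test names below are hypothetical), could look like:

```python
# Hypothetical set of critical SAM test names (assumption, not from the minutes).
CRITICAL_TESTS = {"job-submit", "replica-copy", "ca-certs"}

def site_available(results: dict) -> bool:
    """results maps test name -> True (pass) / False (fail).

    A site is considered available only if all critical tests pass;
    a missing result counts as a failure.
    """
    return all(results.get(test, False) for test in CRITICAL_TESTS)

print(site_available({"job-submit": True, "replica-copy": True, "ca-certs": True}))   # True
print(site_available({"job-submit": True, "replica-copy": False, "ca-certs": True}))  # False
```

A drill-down interface of the kind the sites requested would then simply list the critical tests whose result is False.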
5.6 SAM tests in OSG
OSG will develop their own tests, implementing similar SAM metrics and directly filling the SAM database. The strategy is to learn about the SAM DB by completely implementing one test, working with the SAM team at CERN.
The GDB will have to approve that the SAM tests developed on EGEE and OSG are equivalent.
5.7 Pilot Jobs and glexec
The agreement is that the JSPG will define an Acceptable User Policy (AUP) that will clarify responsibilities and liabilities for the VOs running pilot jobs. Maybe not all sites will accept pilot jobs and glexec. There will be special VOMS roles so that sites can choose which roles can access their WNs and use glexec.
I.Bird noted that the sites will then have to configure their setup themselves. The site configurations will be discussed in future TCG meetings.
The implementation of glexec was discussed at the last TCG meeting.
E.Laure reported that the “glexec” feature is available on the JRA1 preview test bed and that experiments should use it and provide feedback. It is already deployed and being tested at FNAL.
C.Grandi added that the preview test bed is also completed by installations in
Ph.Charpentier noted that LHCb is already running single-user pilot jobs on all sites; they behave like regular gLite jobs. J.Templon explained that the issues, and the AUP on responsibility and liability, mainly concern multi-user pilot jobs.
L.Robertson noted that point 6 in the GDB summary says the JSPG will define the rules about responsibilities and liabilities between grid infrastructures (EGEE, OSG, etc.) and the VOs. This implies that “when the user policy is accepted by the grids and experiments then sites will accept to run pilot jobs”. Sites will have this setup as default, and can only opt out by not allowing some VOMS roles to use their WNs.
The AUP document should be presented to the GDB by Spring 2007.
5.8 Accounting
J.Gordon presented at the GDB a summary of the status and plans for accounting.
Storage accounting (for disk and tape storage) is now available and one can select a time-window and a site. It is currently monitoring the
User-level accounting can now be done and the user DN is encrypted. Each user DN will be encrypted for transmission but stored unencrypted in the DB.
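The resulting flow can be sketched as follows. This is only an illustration of the transmit-encrypted / store-in-clear scheme described above: the record fields are invented, and base64 stands in for the real cipher, which the minutes do not specify.

```python
import base64

def encrypt_dn(dn: str) -> str:
    """Placeholder for the real encryption applied before transmission."""
    return base64.b64encode(dn.encode()).decode()

def decrypt_dn(blob: str) -> str:
    """Placeholder for the real decryption performed at the central repository."""
    return base64.b64decode(blob.encode()).decode()

# Wire format: the DN travels encrypted inside the accounting record.
record = {
    "site": "EXAMPLE-SITE",          # hypothetical field names
    "cpu_hours": 120.5,
    "user_dn": encrypt_dn("/DC=ch/DC=cern/CN=Jane Doe"),
}

# DB format: the central repository decrypts the DN and stores it in clear.
stored = dict(record, user_dn=decrypt_dn(record["user_dn"]))
print(stored["user_dn"])  # /DC=ch/DC=cern/CN=Jane Doe
```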
The development of APEL and DGAS was discussed; the base solution remains the GOC DB, and sites can use APEL, DGAS or other tools. J.Gordon also added that APEL works with Update 10 of the gLite CE.
J.Gordon will present next week the status and the next steps in order to move to automatic accounting.
6. Long Term Planning for the CERN Computing Centre - T.Cass, L.Robertson
Postponed to next week.
8. Summary of New Actions
19 Dec 2006 - K.Bos should distribute to the MB list the mandate, and participation, of the Storage Classes working group.
19 Dec 2006 - The proposal to
15 Jan 2007 - D.Foster will form a group to discuss the network setup and performance needed according to the Megatable values.
The full Action List, current and past items, will be in this wiki page before next MB meeting.