LCG Management Board
Tuesday 12 December 2006 at 16:00
(Version 1 - 14.12.2006)
A.Aimar (notes), S.Belforte, L.Betev, I.Bird, N.Brook, K.Bos, T.Cass, Ph.Charpentier, L.Dell’Agnello, F.Donno, I.Fisk, J.Gordon, C.Grandi, S.Ilyin, M.Lamanna, E.Laure, M.Litmaath, J.Knobloch, V.Korenkov, H.Marten, G.Merino, B.Panzer, L.Robertson (chair), J.Shiers, J.Templon
Tuesday 19 December 16:00-17:00 - Phone Meeting
1. Minutes and Matters arising (minutes)
1.1 Minutes of Previous Meeting
No comments. Minutes approved.
1.2 Reviewers for the 2006Q4 QR Reports
reviewers this quarter should be from
1.3 Milestones and Targets for 2007: Discussion at the ECM
J.Shiers explained that a first proposal of milestones should be available within a week; input from ATLAS and CMS has already been received.
2. Action List Review (list of actions)
Actions that are late are highlighted in RED.
Done. Mandates distributed. Here is the link to the message in the MB mailing list archive (NICE password required).
Not done. The values from H.Renshall were distributed two weeks ago; therefore the procurement plans from the sites are due.
Done. Discussed in the GDB meeting the previous week.
The work of the Storage Classes group, as it is currently defined, still requires the completion of:
- Presentations at the GDB by the remaining Tier-1 sites, in order to see and comment on the different implementation choices of each Tier-1 site.
- Presentation at the GDB of one Tier-2 site.
After that the mandate will be expanded to include “data access” by the experiments (data, AOD, etc.).
The working group will also be extended to include one representative from each experiment and two representatives from the Tier-1 sites.
19 Dec 2006 - K.Bos should distribute to the MB list the mandate, and participation, of the Storage Classes working group.
On the way. Input from ATLAS and CMS received. A first proposal of the 2007 targets should be available within a week.
3. Status of the Russian Tier-2 sites (slides) – S.Ilyin
S.Ilyin presented status and plans of the Russian Tier-2 sites (RuTier2 Cluster).
The RuTier2 Cluster is going to:
- Serve all four experiments with all sites. All experiments will have access to all sites, even if some institutes do not participate in all collaborations.
- Support all the basic functions such as analysis, simulation and user data support.
- Provide some Tier-1 functions, i.e. storing a fraction of the data for local groups developing reconstruction algorithms.
Slide 2 shows the Russian sites that participate in the Cluster. Six of them are already active or will start next year as Tier-2 sites; the others will be mostly Tier-3 sites.
The operational model that is being defined will have CPU resources and disk space distributed among the four experiments as decided by the Russian community. The details and the operational procedures will be developed in early 2007.
As in the MoU, there are two financing agencies (FASI and JINR) that contribute to the Russian LCG effort.
Representatives in WLCG
Slide 4: The process to get to the MoU signatures was started in February 2006:
- May 2006 approved by FASI and sent to Ministry of Finances and to Ministry of Foreign Affairs
- September 2006 approved by Ministry of Finances
- November 2006 returned from MFA for improving the Russian version, end of November sent to MFA again
- Current situation: waiting for MFA approval
Note: They need a reminder letter from CERN addressed to the Chief of FASI, S.N. Mazurenko.
The budget for the coming years is also being discussed: It will probably imply a reduction of 50% for disk and a small reduction for CPU as indicated in slide 5.
The association with one Tier-1 site depends on each experiment:
ATLAS – NIKHEF/SARA
CMS - CERN
LHCb - CERN
CERN will act as a special-purpose Tier-1 centre for CMS, taking a share of the general distribution of AOD and RECO data as required.
Slide 7 shows the current and future networking connectivity.
International connectivity for Russian science is based on a 2.5 Gigabit/s link. A GEANT2 PoP (connected to the Moscow G-NAP) was opened in November 2005 with 2x622 Mbps links to GEANT PoPs.
622 Mbps -
155 Mbps -
Slide 8 shows the Russian regional connectivity:
- IHEP (Protvino) 100 Mbps fibre-optic (to have 1 Gb/s in 2006)
- JINR (Dubna) 1 Gb/s fibre-optic (upgrade to 10 Gb/s in mid 2007)
- INR RAS (Troitsk) 1 Gb/s in early 2007
- PNPI (Gatchina) 1 Gb/s started
- SPbSU (S-Petersburg) 1 Gb/s
The tests performed by CMS show a connectivity of only 10 MBytes/s, well below the nominal capacity of the link.
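For scale, the observed 10 MBytes/s corresponds to 80 Mbit/s, a small fraction of the Gigabit-class links listed above; a one-line conversion (nominal capacity chosen here purely for illustration):

```python
# Convert the observed CMS throughput from MBytes/s to Mbit/s,
# for comparison with link capacities quoted in Mbps or Gb/s.
observed_mbytes_per_s = 10
observed_mbit_per_s = observed_mbytes_per_s * 8  # 1 byte = 8 bits
print(observed_mbit_per_s)  # 80

# Fraction of a 1 Gb/s link actually used (1 Gb/s assumed for illustration):
print(observed_mbit_per_s / 1000)  # 0.08
```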
N.Brook asked whether the breakdown of the resources (CPU vs. disk space) will be agreed with the experiments because LHCb needs CPU on the Tier-2 sites, rather than disk. S.Ilyin replied that this is being discussed within the experiments. The Russian Cluster is ready to adopt the breakdown that better suits the experiments.
H.Marten asked details on which “Tier-1 functions” the Russian Cluster plans to perform.
S.Ilyin replied that it depends on the experiment, but these functions are only for specific purposes. For instance, LHCb will need some AOD data available for local usage, and the developers of reconstruction programs will need some raw data locally for fast access and testing.
4. Update on the SRM 2.2 Installations (slides) - M.Litmaath
4.1 Status of the services
Slides 2 to 4.
Compared to the previous status update, there are now the results of the “S2 tests” developed by F.Donno.
Here are the links to the result pages:
The scripts are run manually for now, but they will soon be started by hourly cron jobs.
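As an illustration, an hourly cron entry for launching such a test script could look like the following (the script path and log location are assumptions, not taken from the minutes):

```shell
# Hypothetical crontab entry: run the S2 test suite at the top of every hour
# and append both stdout and stderr to a log file.
0 * * * * /opt/s2-tests/run_s2_tests.sh >> /var/log/s2-tests.log 2>&1
```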
F.Donno continues to improve the tests, adding important new use cases and also checking how the services react to wrong instructions. When she finds issues she reports them to the SRM developers.
The list of open issues is maintained here: https://twiki.cern.ch/twiki/bin/view/SRMDev/IssuesInTheSpecifications
None of these open issues is crucial or a show stopper.
Currently there is no second endpoint for any of the services, but these will be set up in the next few weeks.
L.Betev asked about the meaning of the error codes in the web pages above. Some failures end up producing a “success” status.
F.Donno replied that this may depend on which server the tests are executed on: after a failure there could be a successful execution (or vice versa). This mismatch should not happen, however, and will be reported.
The links to the LBL “SRM tester” tests are available here:
The results and issues on the SRM tester and in the “srm-devel” mailing lists are available here:
The rumour that “CASTOR supports only VOMS proxies” is not true and is not an issue:
- FTS is not yet ready for VOMS (January)
- A fall-back is in any case available: with an old-style “VO map-file” it would be easy to implement (cf. LFC/DPM)
A new WSDL (of Sep. 27) is going to be installed by Dec. 15: it fixes some small long-standing issues (e.g. srmLs output format).
4.2 Status of Clients
FTS: The FTS development continues without surprises. The SRM client code is being unit-tested and will be checked with DPM later this week. The integration into the FTS framework needs a few days and a release for the installation on the “development test bed” is expected next week.
GFAL/lcg-utils: New RPM installation files are expected on the “test UI” by Wed 13 Dec, and a patch for gLite release certification is expected next week.
4.3 GLUE Schema
The two-month discussion has now produced a draft document of GLUE 1.3: http://glueschema.forge.cnaf.infn.it/Spec/V13
Not everything originally proposed is included, but the important features are present.
S.Andreozzi has already started a first LDAP implementation, and we could have a first version next week.
L.Field started to configure the information providers that he manages.
4.4 Storage Classes Working Group
Further future pre-GDB meetings have been agreed. Other Tier-1 and Tier-2 sites will present their plans about storage classes.
More experiments and Tier-1 representatives will join the group and A.Trumov is no longer available to chair it.
The main activity is to run the test suites as often as possible in order to find issues and verify improvements.
Feedback continues through e-mail and phone conferences with the developers.
Installation and verification of the releases of GFAL/lcg-utils and FTS for SRM V2.2 are also ongoing.
During the WLCG workshop Jan. 22-26 there will be 3 SRM sessions:
- Discuss SRM V2.2 deployment with sites
- Discuss remaining issues with the SRM developers
- Define some “mini-SC” milestones to check the SRM installations progress
S.Belforte noted that the pages with the test results contain mostly failures. It is difficult to spot the progress and how far we are from the production level required. Is there real improvement visible, compared to September?
M.Litmaath replied that there is definitely a lot of improvement. The remaining problems are sometimes due to the MSS back-end being offline and not because of real SRM issues. F.Donno also added that now there is a clear metric of the situation and in January the measurements will become much more frequent. The progress should then become visible also in the test results pages.
M.Litmaath said that all implementations have shown that the basic SRM V2.2 calls are working. When there is a problem the service providers are informed. But not all service providers and developer teams react with urgency to the issues raised. It can even take a week before receiving a first (maybe negative) reply.
The MB agreed that by mid-January there will be a re-assessment of the situation, looking more at the specific implementations and installations at the sites.
5. Summary of the GDB Meeting (document) – K.Bos
K.Bos presented a summary of the discussions and decisions taken at the GDB meeting the week before. See the document here.
The introduction at the GDB included a presentation on SLC4 with the transition path from SLC3.
Ph.Charpentier questioned whether the software certified (and recompiled on SLC4) by the experiments and the Application Area, will work with the middleware running on SLC4 but in “SLC3-compatible mode”.
I.Bird replied that not all middleware components can already run natively on SLC4 (e.g. Python warnings problems), and that the SLC3-compatible mode has been tested at GRIF.
T.Cass said that CERN will move the lxplus service to SLC4 on 15 January 2007 only if the experiments’ applications are running. He also stressed that a target date must be fixed in order to have the SLC4 migration as early as possible in 2007; otherwise the issues will be postponed until it is too late.
L.Robertson suggested that Ph.Charpentier, T.Cass and M.Schulz clarify the issue outside the MB within a week. They should report unsolved issues and disagreements to the next MB. K.Bos added that there will again be an update on the SLC4 migration at the next GDB in January.
5.2 Storage Classes discussion
The progress on the Storage Classes mandate is already described above in the Action List section.
C.Eck summarized the pre-GDB discussion at the GDB. An action was also decided in order to clarify the networking bandwidth needed.
D.Foster will form a group to discuss the network setup and performance needed according to the Megatable values.
J.Gordon mentioned that it was agreed that “Tier-1 to Tier-1” traffic will use the OPN network, while “Tier-1 to Tier-2” traffic will go over the general network.
The proposal to
The grid security management model has been changed. LCG will no longer have operational responsibility; security will be managed by the grid infrastructures: for EGEE (R.Wartel) and for OSG
5.5 SAM availability tests
The status of the SAM tests and the algorithm for considering a site as “available” was discussed. The sites requested a simpler interface to drill down to the causes and to the tests that make a site be considered “unavailable”.
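The availability algorithm itself is not spelled out in the minutes; a minimal sketch, assuming a site counts as “available” only when every critical test passes (the test names below are hypothetical), could look like:

```python
# Hypothetical set of critical SAM test names (assumption, not from the minutes).
CRITICAL_TESTS = {"job-submit", "replica-copy", "ca-certs"}

def site_available(results: dict) -> bool:
    """results maps test name -> True (pass) / False (fail).

    A site is considered available only if all critical tests pass;
    a missing result counts as a failure.
    """
    return all(results.get(test, False) for test in CRITICAL_TESTS)

print(site_available({"job-submit": True, "replica-copy": True, "ca-certs": True}))   # True
print(site_available({"job-submit": True, "replica-copy": False, "ca-certs": True}))  # False
```

A drill-down interface of the kind the sites requested would then simply list the critical tests whose result is False.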
5.6 SAM tests in OSG
OSG will develop their own tests, implementing similar SAM metrics and directly filling the SAM database. The strategy is to learn about the SAM DB by completely implementing one test, working with the SAM team at CERN.
The GDB will have to approve that the SAM tests developed on EGEE and OSG are equivalent.
5.7 Pilot Jobs and glexec
The agreement is that the JSPG will define an Acceptable User Policy (AUP) that will clarify responsibilities and liabilities for the VOs running pilot jobs. Maybe not all sites will accept pilot jobs and glexec. There will be special VOMS roles so that sites can choose which roles can access their WNs and use glexec.
I.Bird noted that the sites will then have to configure their setup themselves. The site configurations will be discussed in future TCG meetings.
The implementation of glexec was discussed at the last TCG meeting.
E.Laure reported that the “glexec” feature is available on the JRA1 preview test bed and that experiments should use it and provide feedback. It is already deployed and being tested at FNAL.
C.Grandi added that the preview test bed is also completed by installations in
Ph.Charpentier noted that LHCb is already running single-user pilot jobs on all sites; they behave like regular gLite jobs. J.Templon explained that the issues, and the AUP on responsibility and liability, mainly concern multi-user pilot jobs.
L.Robertson noted that point 6 in the GDB summary says the JSPG will define the rules about responsibilities and liabilities between grid infrastructures (EGEE, OSG, etc.) and the VOs. This implies that “when the user policy is accepted by the grids and experiments then sites will accept to run pilot jobs”. Sites will have this setup as default, and can only opt out by not allowing some VOMS roles to use their WNs.
The AUP document should be presented to the GDB by Spring 2007.
5.8 Accounting
J.Gordon presented at the GDB a summary of the status and plans for accounting.
Storage accounting (for disk and tape storage) is now available and one can select a time-window and a site. It is currently monitoring the
User-level accounting can now be done and the user DN is encrypted. Each user DN will be encrypted for transmission but stored unencrypted in the DB.
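The resulting flow can be sketched as follows. This is only an illustration of the transmit-encrypted / store-in-clear scheme described above: the record fields are invented, and base64 stands in for the real cipher, which the minutes do not specify.

```python
import base64

def encrypt_dn(dn: str) -> str:
    """Placeholder for the real encryption applied before transmission."""
    return base64.b64encode(dn.encode()).decode()

def decrypt_dn(blob: str) -> str:
    """Placeholder for the real decryption performed at the central repository."""
    return base64.b64decode(blob.encode()).decode()

# Wire format: the DN travels encrypted inside the accounting record.
record = {
    "site": "EXAMPLE-SITE",          # hypothetical field names
    "cpu_hours": 120.5,
    "user_dn": encrypt_dn("/DC=ch/DC=cern/CN=Jane Doe"),
}

# DB format: the central repository decrypts the DN and stores it in clear.
stored = dict(record, user_dn=decrypt_dn(record["user_dn"]))
print(stored["user_dn"])  # /DC=ch/DC=cern/CN=Jane Doe
```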
The development of APEL and DGAS was discussed; the base solution remains the GOC DB, and sites can use APEL, DGAS or other tools. J.Gordon also added that APEL works with Update 10 of the gLite CE.
J.Gordon will present next week the status and the next steps in order to move to automatic accounting.
6. Long Term Planning for the CERN Computing Centre - T.Cass, L.Robertson
Postponed to next week.
8. Summary of New Actions
19 Dec 2006 - K.Bos should distribute to the MB list the mandate, and participation, of the Storage Classes working group.
19 Dec 2006 - The proposal to
15 Jan 2007 - D.Foster will form a group to discuss the network setup and performance needed according to the Megatable values.
The full Action List, current and past items, will be in this wiki page before next MB meeting.