LCG Management Board |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
Date/Time: |
Tuesday
11 March 16:00-17:00 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||
Agenda: |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
Members: |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
(Version 1 - 12.3.2008) |
||||||||||||||||||||||||||||||||||||||||||||||||||||||
Participants:
|
A.Aimar
(notes), I.Bird (chair), Ph.Charpentier, L.Dell’Agnello, M.Ernst, I.Fisk, J.Gordon,
C.Grandi, F.Hernandez, M.Kasemann, M.Litmaath, U.Marconi, H.Marten, P.Mato,
G.Merino, A.Pace, B.Panzer, R.Pordes, Di Qing, H.Renshall, M.Schulz,
O.Smirnova, R.Tafirout, J.Templon |
||||||||||||||||||||||||||||||||||||||||||||||||||||||
Action
List |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
Mailing
List Archive: |
https://mmm.cern.ch/public/archive-list/w/worldwide-lcg-management-board/ |
||||||||||||||||||||||||||||||||||||||||||||||||||||||
Next Meeting: |
Tuesday
18 March 2008 16:00-17:00 – Phone Meeting |
||||||||||||||||||||||||||||||||||||||||||||||||||||||
1. Minutes and Matters arising (Minutes) |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
1.1 Minutes of Previous Meeting
The minutes of the previous MB meeting were approved.
Missing in the previous MB minutes: a clean-up and update of the SAM tests was proposed by M.Schulz and should be done in the next few weeks.
1.2 Tape Efficiency Metrics
Not all sites can produce the metrics proposed. Therefore they should provide alternative metrics suitable for measuring and reporting on the performance of their MSS.
M.Ernst commented that it is not easy to obtain the metrics from HPSS because they have one single HPSS cache and cannot see the metrics for multiple writes. This information is not obtainable from HPSS, but this should not be an issue for BNL.
I.Bird suggested that the sites using HPSS could skip the metrics that are not relevant and propose their alternatives. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
2. Action List Review (List of actions)
Actions that are late are highlighted in RED. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
- 26 Feb 2008 - The Sites and Experiments should confirm to A.Aimar that they have updated the list of their contacts (correct emails, grid operators’ phones, etc). Here is the current contact information: https://twiki.cern.ch/twiki/bin/view/LCG/TierOneContactDetails Information confirmed only by:
- 29 Feb 2008 - A.Aimar will verify with the GridView team the possibility to recalculate the values for BNL. Not done. GridView was asked, but it still needs to be implemented.
- 29 Feb 2008 - A.Aimar will verify why the reliability values for the Tier-2 sites seem incorrect (being lower than availability). In progress. Being verified.
- 18 Mar 2008 - Sites should propose new tape efficiency metrics that they can implement, in case they cannot provide the metrics proposed. Will be verified next week. |
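The sketch below illustrates why a reliability value lower than availability looks suspicious. It assumes the usual definitions, which these minutes do not spell out: availability is uptime divided by the total period, and reliability is uptime divided by the period with scheduled downtime excluded. The function names and the sample hours are hypothetical.

```python
def availability(up_hours, total_hours):
    """Fraction of the whole period during which the site was up (assumed definition)."""
    return up_hours / total_hours


def reliability(up_hours, total_hours, scheduled_down_hours):
    """Same numerator, but scheduled downtime is removed from the period (assumed definition)."""
    return up_hours / (total_hours - scheduled_down_hours)


# Hypothetical 30-day month: 720 h in total, 48 h of scheduled downtime
# and 24 h of unscheduled downtime, so the site was up for 648 h.
total, scheduled, unscheduled = 720, 48, 24
up = total - scheduled - unscheduled

a = availability(up, total)
r = reliability(up, total, scheduled)
print(f"availability={a:.3f} reliability={r:.3f}")
```

Under these assumed definitions the denominator for reliability is never larger than the one for availability, so reliability should never come out lower than availability for the same period.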
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
3. CCRC08 Update (Slides) – H.Renshall
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
H.Renshall presented the weekly update on CCRC'08.
CCRC'08 is now in phase 1.5 (i.e. between the February phase 1 run and phase 2 in May). There are no formally coordinated activities or metrics in this phase as yet. Currently it involves individual Experiments doing functionality, throughput and stress testing of their computing model components and sites.
ALICE continue to exercise their data export, reaching up to 300 MB/sec to their 5 major Tier1 sites, well above the required 60 MB/sec for p-p running. They plan to add RAL in the near future (this or next week).
ATLAS performed their M6 detector cosmics run including data storage at Tier 0. 40 TB of data was stored on tape between Friday and Monday and calibration streams were sent to the planned 4 Tier 2 sites of Naples, Rome, Munich and Michigan.
This week they are functionality testing Tier-1 to Tier-1 data movement to PIC. They will then distribute 20 datasets of 100 files each to the other Tier-1 sites for multiple Tier-1 to PIC tests next week. This will then rotate round all Tier-1 sites over the next 2 months. Intensive MC production for CCRC’08 phase 2 continues.
G.Merino warned that next week PIC will have a long scheduled electrical shutdown. The ATLAS activities will not be possible during it and ATLAS should be aware of this.
ATLAS has also proposed a draft plan for March and April. See slide 5.
CMS continue preparations for the May run and a prior CMS global run in March. Reprocessing is going well with some site issues (dCache was slow at FNAL and IN2P3 had a disk pools problem). In Tier-1 to Tier-1 commissioning only the RAL to ASGC pair is missing. Their Tier 0 operations suffered from a one-week long instability in LSF, which had hit a performance-related bug/feature in synchronising to the failover server. Automatic failover is currently disabled while this is investigated, but CMS are suggesting a separate LSF instance for their Tier 0 operations.
M.Kasemann added that the agreement is that for the next 4 weeks they will try not to set up a new gateway for local submission and will try to use the single common gateway. If after 4 weeks this solution proves inadequate, it will be changed for May’s CCRC.
LHCb are preparing the workflow for their stripping jobs to be part of the 4 weeks of steady running at nominal rate in May. They are setting up to evaluate and act on the CPU time remaining for their grid jobs and setting up SRMv2 endpoints to go into their SAM test suite.
Future meetings coming up are the following:
- Next CCRC’08 Face-to-Face, Tuesday 1st April: Site-focused session in the morning, then Experiment and Service focused session in the afternoon.
- 21st – 25th April: WLCG Collaboration Workshop (Tier0/1/2) in CERN main auditorium. Possible themes:
  - WLCG Service Reliability: focus on Tier2s and progress since November 2007 workshop (1 day?)
  - CCRC'08 & Full Dress Rehearsals - status and plans (2 days?)
  - Operations track (2 days, parallel)
  - Analysis track (2 days, parallel)
- 12th – 13th June: CCRC’08 Post Mortem
J.Templon commented that the Site-focused session needs the presence of the Experiments. The session is focused on presenting how the sites work and how best to use their resources. Therefore, for the session to be useful, the Experiments’ presence is really necessary.
J.Templon noted that the data that ALICE is sending to SARA consists entirely of “0” data, which is either a mistake or data that is not real ALICE raw data but just testing the transfer. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
4. Update of the HL Milestones (HLM 11.03.2008) |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
The MB verified all due milestones in the High Level Milestones dashboard (HLM 11.03.2008). Here is the updated dashboard, after the discussion below: Updated Dashboard HLM 15.3.2008.
WLCG-07-01
- FZK: H.Marten confirmed that it will be ready by end of March 2008.
WLCG-07-02
- ASGC: Done.
- CNAF: 24x7 support is provided for all critical services but not for the non-critical ones. The timescale is end of April.
- PIC: Done.
- RAL: Not represented at the meeting.
WLCG-07-03
- ASGC: Done.
- NDGF: Done.
- SARA: In the process of changing to a single SARA/NIKHEF help desk. It will be ready by end of April.
WLCG-07-04/05/06
- ASGC: No SLA defined yet. Will be done by March.
- IN2P3: Not defined yet. After the proposal the steps should be quick because the procedures are actually already in place.
- CERN: The VOBox SLA is being discussed actively with the experiments so we are on the way to turning this green.
- FZK: Only the agreement from CMS is missing.
- NDGF: For ATLAS there are no VO Boxes to run at the sites. For ALICE they have 7 VO Boxes and the SLA is being prepared.
- PIC: For LHCb it is done. ATLAS has no VOBoxes. For CMS an SLA is being proposed, similar to the one of LHCb, and should be done by end of March.
- SARA: Document is ready but not implemented. Will be checked in the next two weeks.
WLCG-07-08
- CERN: The uploading of accounting data is now probably OK, but we are seeing discrepancies of around 10% between the APEL and local accounting, most probably due to the single normalisation factor used by APEL.
WLCG-07-17
- IN2P3: The supplied CPU material was not adequate and had to be sent back to the supplier. Hopefully it will be ready for May.
- CERN: Also has delivery problems and a supplier had to be replaced. Will be ready by end of April.
- CNAF: Due to administrative issues, will have the CPUs by mid-May and Storage by beginning of May.
- NDGF: CPU will be there by April and Storage by September.
- PIC: Will have CPU by end of April and Storage by June 2008.
WLCG-07-19
- The tests are being done in CCRC08, and some have been done recently. CASTOR is now at version 2.1.6 instead, but it was tested only at CERN by ATLAS and CMS.
WLCG-07-28b: Done.
WLCG-07-39: This is not complete yet and needs to be reviewed in the next few weeks.
WLCG-07-40
- CMS: They ramped up the resources but will do stress tests in May.
- LHCb: Means running analysis at CERN and will be done in May 2008.
WLCG-08-01
- OSG: There will be a VDT release with the SE tests. The Availability and Reliability calculations are progressing with the SAM team.
J.Templon added that it seems that the issues are minor and will be verified in April when D.Collados is back.
Action: Verify whether there are issues with NDGF SAM tests. Some comments from J.Templon and D.Collados were not replied to by M.Eller. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
5. Multi-Users Pilot Jobs Working Group (Slides) - M.Litmaath |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
The Pilot Jobs Frameworks working group, launched by the GDB, was mandated by the WLCG MB on Jan. 22, 2008. Its mission is to:
- Review security issues in the pilot job framework of each experiment.
- Define a minimum set of security requirements.
- Advise on improvements.
- Use of a common library or tool set for proxy management, though this seems unlikely.
- Report to GDB and MB in a time frame of a few months.
The members of the working group are:
- ALICE: Predrag Buncic
- ATLAS: Torre Wenaus
- CMS: Igor Sfiligoi
- LHCb: Andrei Tsaregorodtsev
- EGEE: David Groep
- FNAL: Eileen Berman
- OSG: Mine Altunay
- WLCG: Maarten Litmaath (chair)
3 phone conferences have been held, and a 4th call is planned for the end of March (Friday March 28). The discussion progresses mostly via the mailing list.
Each experiment is to provide a document about their system:
- LHCb were the first and the next version will incorporate feedback from discussions so far. They had it already before the meeting and have set the tone about the quality and content of the document.
- CMS provided a first version last week.
- ALICE and ATLAS needed more time and have not provided any document yet.
A security questionnaire is being discussed:
- Currently at v0.4.
- Agreement on the relevance/scope of a question is not always evident.
- Each document should provide the Experiment’s answers in an annexe.
Some experiments do not agree on some questions.
- E.g. how user tokens are used by the proxies, from submission until the job is started on the WN. What happens if the job crashes and how is the clean-up done?
- The Experiments replied that these requirements are not imposed on the general gLite components (e.g. WMS has a lot of proxies).
M.Schulz noted that the TCG has launched the security verification of LFC and other components. WMS will be reviewed as soon as the next version is out.
M.Litmaath added that the fact that some gLite components have not yet been verified is not a good reason not to verify the security of the VOs’ frameworks.
I.Bird proposed that VOs that provide the document and the questionnaire and pass the security check be allowed to use their framework, while those not passing or not providing the information should be on hold until they do so.
M.Litmaath replied that it is possible to configure gLExec to allow only some groups or users, but this will require configuration at each single site. In practice this is very difficult to achieve.
I.Bird replied that the PJF working group should report on each VO separately and then the GDB and MB could decide what to do.
J.Gordon added that in the future other applications (Bio-Med, etc) should go through the same verifications (but this is not a WLCG matter).
Ph.Charpentier noted that the sites should already have gLExec installed so that when the solutions are approved the sites are ready.
J.Templon added that gLExec can be configured at the sites only if they define all the details, and that the temporary space can be created in different ways at the sites (e.g. job’s subdirectory, permissions to protect the proxy files, etc).
Ph.Charpentier replied that guidance is needed from the gLExec experts.
M.Litmaath added that all these issues are covered by the discussions that are taking place in the working group.
I.Bird proposed that a general recommendations document on security should be provided even before the frameworks are all certified.
Ph.Charpentier proposed that one site could provide an example installation so that all VOs can test the environment and the configuration in depth before it is deployed elsewhere. |
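As an illustration of the kind of site-level detail mentioned above (a per-job subdirectory and permissions protecting the proxy files), here is a minimal sketch on a POSIX system. It is not how gLExec or any experiment framework does this. The helper names, the `proxy.pem` file name and the `job_0001` directory are hypothetical.

```python
import os
import stat
import tempfile


def write_private_proxy(job_dir, proxy_bytes, name="proxy.pem"):
    """Store proxy bytes in a per-job directory readable only by its owner."""
    os.makedirs(job_dir, exist_ok=True)
    os.chmod(job_dir, 0o700)  # owner-only access to the job subdirectory
    path = os.path.join(job_dir, name)
    # Create the file with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(proxy_bytes)
    os.chmod(path, 0o600)  # enforce the mode even if the file already existed
    return path


def is_private(path):
    """True if neither group nor others have any permission bit set."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 == 0


# Usage with a throwaway scratch area and dummy content (not a real proxy).
with tempfile.TemporaryDirectory() as scratch:
    proxy_path = write_private_proxy(os.path.join(scratch, "job_0001"), b"dummy")
    print(is_private(proxy_path), is_private(os.path.dirname(proxy_path)))
```

Where the temporary space lives and who owns it would differ between sites, which is the point made in the discussion.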
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
6. AOB |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
No AOB. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
7. Summary of New Actions |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
The full Action List, current and past items, will be on this wiki page before the next MB meeting. |