-- JamieShiers - 26 Sep 2007
WLCG Service Reliability Workshop, November 26 - 30, IT Amphitheatre
Please use this page to make your suggestions for topics to be covered at this workshop on WLCG Service Reliability
General
The intention is not to pack the agenda with back-to-back presentations, but to leave time for discussion and problem solving.
The outcome of the workshop should be a clear understanding of what needs to be done, by whom, and for which specific services, to address any significant holes in the overall service. Priorities should follow the experiments' Critical Services lists and be consistent with what is possible (at the Tier0 and other sites), e.g. in terms of on-call services, expert call-out and so forth.
Experiment visit
A visit to one of the experiments - hopefully CMS this time - will still be possible if we book early enough.
Lunch
A visit to the Rajpoute may also be required.
Critical services - Requirements
- The agenda for this day is already pretty full.
WLCG Operations - What is Required to support LHC experiments?
- Cross-site problem resolution
- Handling of VO-boxes and other experiment services
- FTS service debugging tools
- Review of (main) Tier1 services (experiment critical services view) against the 'Victoria checklist' (to be discussed at the October GDB). The checklist also needs to cover issues such as: is this service monitored? (basic levels: no contact, high load, (fan count wrong), ...)
- Any DB-related issues to be covered in the morning
Monitoring - What is Required to run Reliable Services?
- Any DB-related issues to be covered in the morning
Robust Services - Middleware Developers' Techniques & Tips
- Any DB-related issues to be covered in the morning
- Experience with experiment services (e.g. those used in Monday's Case Studies)
- Automatic configuration with minimum manual input for "standard installations".
- Site validation unit and functional tests as part of the middleware standard distribution (this forces the developers to think about installation and configuration issues).
- Real, certified service info providers
DB Applications - Performance and Reliability by Design
- Tom Kyte can't come but has been asked to suggest someone else who can