-- JamieShiers - 26 Sep 2007

WLCG Service Reliability Workshop, November 26 - 30, IT Amphitheatre

Please use this page to suggest topics to be covered at this workshop on WLCG Service Reliability.


The intention is not to pack the agenda with back-to-back presentations, but to leave time for discussion and problem solving.

The outcome of the workshop should be a clear understanding of what needs to be done, by whom, and to which specific services, in order to address any significant holes in the overall service. Priorities should follow the experiments' Critical Services lists and be consistent with what is possible (at the Tier0 and other sites), e.g. in terms of on-call services, expert call-out and so forth.

Experiment visit

A visit to one of the experiments - hopefully CMS this time - will still be possible if we book early enough in advance.


A visit to the Rajpoute may also be required.

Critical services - Requirements

  • The agenda for this day is already pretty full.

WLCG Operations - What is Required to support LHC experiments?

  • Interaction between experiment, site, WLCG, EGEE and OSG operations

  • Operations cook-book

  • Cross-site problem resolution

  • Handling of VO-boxes and other experiment services

  • FTS service debugging tools

  • Review of (main) Tier1 services, from the experiments' critical-services view, against the 'Victoria checklist' (to be discussed at the October GDB). The checklist also needs to cover issues such as: is this service monitored? (Basic levels: no contact, high load, (fan count wrong), ...)

  • Any DB-related issues to be covered in the morning

Monitoring - What is Required to run Reliable Services?

  • Any DB-related issues to be covered in the morning

Robust Services - Middleware Developers' Techniques & Tips

  • Any DB-related issues to be covered in the morning

  • LFC experience

  • FTS experience

  • CASTOR / SRM experience

  • Experience with experiment services (e.g. those used in Monday's Case Studies)

  • Automatic configuration with minimum manual input for "standard installations".

  • Site validation unit and functional tests as part of the middleware standard distribution (this forces the developers to think about installation and configuration issues).

  • Real, certified service info providers

DB Applications - Performance and Reliability by Design

  • Tom Kyte can't come but has been asked to suggest someone else who can
Topic revision: r4 - 2007-09-27 - JamieShiers