WLCGMonitoringConsolidation (revision 32)

Goal of the project

The goal of the project is the consolidation of WLCG monitoring, including a critical analysis of what is monitored, the technology used, and the deployment and support model. This should make it possible to:

  • reduce the complexity of the system
  • ensure simplified and more effective operations, support and service management
  • encourage an efficient deployment strategy, with a common development process
  • unify, where possible, the implementation of the monitoring components
Wherever reasonable, the effort should be aligned with the activities of the Agile Infrastructure Monitoring team at CERN.

The main objective of the project is to reach the point where the effort required for WLCG monitoring can be reduced to half of its current level.

Tentative timetable

The first stage of the project (3 months) should include:

  • Review the current systems and metrics used for WLCG monitoring
  • Collect and summarize requirements from the LHC computing community to understand the changed needs in monitoring
  • Propose a revised strategy and architecture for WLCG monitoring
  • Where possible, suggest implementation technologies and approaches for the transitions
  • Highlight areas that are expected to be problematic

Second stage: implementation of the new monitoring framework and components needed for the transition to the new toolchain.

A first prototype should be ready by the end of 2013. All monitoring should have transitioned to the new approach by summer 2014. A dedicated JIRA tracker is used to follow this task.

Member list

Mailing list: wlcg-mon-consolidation

  • Pablo -- project leader
  • Edward
  • Julia
  • Lionel
  • Luca
  • Marian
  • AI monitoring representative (Pedro)
  • Experiment representatives:
    • ALICE (Costin, Latchezar, Maarten)
    • ATLAS (Alessandra, with Simone and Alessandro as consultants)
    • CMS (Andrea, Nicolo, Stefano)
    • LHCb (Stefan)
  • Operations representatives (Maria, Pepe)

Some tasks of the project (not a complete list)

  1. Requirements:
    1. Understanding the requirements of the experiments regarding remote testing of the distributed sites and services
    2. Reviewing the existing Dashboard applications with the experiments, defining priorities, and agreeing on the further development and support model for applications that would require development effort for customization
  2. Understanding the needs of the sites, in particular the well-known request: 'I would like to see on a single display how my site is working for all LHC VOs'
  3. Review of the test submission part. Should we continue to have several ways of submitting tests: one for stress testing, another for regular tests (HammerCloud/Nagios/JobPilot)?
  4. Overall architecture (strongly depends on the previous point)
  5. Common framework
    1. Common implementation of the messaging interfaces
  6. Single web UI to WLCG monitoring information (not rewritten from scratch, but based on myWLCG, improving and extending its functionality and adapting the implementation to the design principles of the new framework).
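
As an illustration of the site requirement in task 2 (one display showing how a site is working for all LHC VOs), the following is a minimal sketch only. It assumes test results are available as (site, VO, service, status) records and that the worst status per VO is what a site administrator wants to see. The record layout, the status names, the severity ordering, and the function name `site_overview` are all hypothetical and are not taken from any existing WLCG monitoring component.

```python
# Hypothetical severity ordering: the worst status wins when aggregating.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def site_overview(results, site):
    """Return {vo: worst_status} for one site.

    `results` is an iterable of (site, vo, service, status) tuples;
    this record layout is an assumption made for the example.
    """
    worst = {}
    for r_site, vo, _service, status in results:
        if r_site != site:
            continue
        current = worst.get(vo, "OK")
        # Keep the most severe status seen so far for this VO.
        if SEVERITY[status] >= SEVERITY[current]:
            worst[vo] = status
    return worst


# Made-up sample data, for illustration only.
results = [
    ("SITE-A", "ATLAS", "CE", "OK"),
    ("SITE-A", "ATLAS", "SRM", "WARNING"),
    ("SITE-A", "CMS", "CE", "CRITICAL"),
    ("SITE-A", "LHCb", "CE", "OK"),
    ("SITE-B", "ALICE", "CE", "OK"),
]

overview = site_overview(results, "SITE-A")
for vo in sorted(overview):
    print(f"{vo}: {overview[vo]}")
```

A real implementation would read results from the common messaging layer rather than from an in-memory list, and the aggregation rule (worst status per VO) is only one possible choice.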

Various technical topics

Architecture reviews

Work on simplified version of NCG

Status

Prototype status

Meetings

Topic revision: r32 - 2014-08-18 - PabloSaiz