Project finished!
This project has finished; it was closed on 5 December 2014.
Thank you very much for your participation.
Goal of the project
The goal of the project is to consolidate WLCG monitoring. This includes a critical analysis of what is monitored, the technology used, and the deployment and support model. This should make it possible to:
- reduce the complexity of the system
- ensure simplified and more effective operations, support and service management
- encourage an efficient deployment strategy, with a common development process
- unify, where possible, the implementation of the monitoring components
Wherever reasonable, the effort should be aligned with the activities of the Agile Infrastructure Monitoring team at CERN.
The main objective of the project is to get to the point where the effort required for WLCG monitoring can be reduced to half of its current level.
Tentative timetable
The first stage of the project (3 months) should include:
- A review of the current systems and metrics used for WLCG monitoring
- Collection and summary of requirements from the LHC computing community, to understand how monitoring needs have changed
- A proposal for a revised strategy and architecture for WLCG monitoring
- Where possible, suggested implementation technologies and approaches for the transitions
- Identification of areas that are expected to be problematic
Second stage: implementation of the new monitoring framework and components needed for the transition to the new toolchain.
A first prototype should be ready by the end of 2013. All monitoring should have transitioned to the new approach by summer 2014.
There is a dedicated JIRA tracker to follow this task.
Member list
Mailing list: wlcg-mon-consolidation
- Pablo -- project leader
- Edward
- Julia
- Lionel
- Luca
- Marian
- AI monitoring representative (Pedro)
- Experiment representatives:
- ALICE (Costin, Latchezar, Maarten)
- ATLAS (Alessandra, with Simone and Alessandro as consultants)
- CMS (Andrea, Nicolo, Stefano)
- LHCb (Stefan)
- Operations representative (Maria, Pepe)
Some tasks of the project (not a complete list)
- Requirements:
- Understanding the requirements of the experiments regarding remote testing of the distributed sites and services
- Reviewing the existing Dashboard applications with the experiments, defining priorities, and agreeing on the further development and support model for applications that would require development effort for customization
- Understanding the needs of the sites, in particular the well-known request: 'I would like to see on a single display how my site is working for all LHC VOs'
- Review of the test submission part. Should we continue to have several ways of submitting tests: one for stress testing and another for regular tests (HammerCloud/Nagios/JobPilot)?
- Overall architecture (strongly depends on previous point)
- Common framework
- Common implementation of the messaging interfaces
- Single web UI for WLCG monitoring information (not rewritten from scratch but based on myWLCG, improving and extending its functionality and adapting the implementation to the design principles of the new framework)
Various technical topics
Architecture reviews
Status
Prototype status
Meetings