JRA1.5 - Definition and Implementation of the Infrastructure Area Work Plan - Task Force Service Monitoring

Mandate and expected results

  • Understand and document the requirements for service monitoring
  • Identify where new features are required in the EMI software stack to meet these requirements.
  • Produce a work plan, with time-line, for adding these new features.

Guidelines / Hints

  • Contact EGI to obtain the requirements (Nagios, etc.)
    • Contact established; information about Nagios probes and effort estimates is available
    • EGI presentation at the Prague AHM, followed by discussion of how EMI proceeds
  • Investigate off-the-shelf or existing solutions that will meet these requirements.
  • Hand-over discussions with EGI planned for the end of the week of 2011-02-07

EMI Key Objectives in Context

  • Short-term: Nagios Probes
  • Long-term: Investigations of service monitoring in EMI (to be refined)

Information about NAGIOS Probes from EGI (Emir)

  • During the preparation phase of EMI and EGI it was clearly agreed that the development of Nagios probes, which requires expertise in a specific software component, would be the responsibility of EMI. EGI would be responsible for the development and maintenance of the rest of the framework needed to collect and display monitoring results. In addition, I am quoting the relevant section of the EMI DoW: "EMI will investigate and adapt off-the-shelf solutions and develop sensors to be plugged in industry standard monitoring tools, such as Nagios and standard CIM-based tools."

  • During EGEE, the org.sam probes were developed and maintained by the SAM team. Funding didn't disappear; the responsibility was shifted to the service developers, i.e. EMI. Regarding the developer leaving, we were simply lucky that the person was around for the first 8 months of the EGI project, so he practically volunteered to maintain these probes.
    • However, this was never supposed to be a long-term solution, as it is not in our work plans.

  • Effort estimates: During EGEE, the SAM team allocated 1 FTE for the development of probes. However, this was basically for developing probes for all services (CE, SRM, WMS) from scratch. As the probes now exist, the effort should be smaller, since it will cover only maintenance and the changes caused by changes in the monitored services.

List of NAGIOS Probes from EGI (Emir)

  • There are more probes in the SVN repository:
    • https://www.sysadmin.hep.ac.uk/svn/grid-monitoring/trunk/probe
    • Some of them are internal to Nagios and maintained by the SAM team, and some are maintained by the VOs themselves.
    • We are still working on the full list of probes and metrics that test all gLite services.
    • For the time being, the ones described at the link I provided are those included in production and are the most important ones.
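
For orientation, the general shape of a Nagios probe can be sketched as below. This is a minimal illustration and not one of the org.sam probes from the repository above: it assumes only the standard Nagios plugin convention of one status line on stdout plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names `check_tcp` and `main`, the `connect` parameter, and the choice of a plain TCP reachability test are hypothetical.

```python
import socket

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0, connect=socket.create_connection):
    """Return (exit_code, status_line) for a TCP reachability check.

    `connect` is injectable (hypothetical parameter) so that the check
    can be exercised without a real network connection.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
    conn.close()
    return OK, f"OK - {host}:{port} is reachable"


def main(argv):
    """Parse HOST PORT, print one status line, return the exit code."""
    if len(argv) != 2 or not argv[1].isdigit():
        print("UNKNOWN - usage: probe HOST PORT")
        return UNKNOWN
    code, line = check_tcp(argv[0], int(argv[1]))
    print(line)
    return code
```

When run as a script, such a probe would end with `sys.exit(main(sys.argv[1:]))` so that Nagios receives the exit code. A real service probe would replace the TCP test with a check specific to the service (CE, SRM, WMS), which is where the component expertise mentioned above comes in.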

Task Force Leader

  • None - the task force may be transformed into work for several PTs, pending discussions

Task Force Members

  • TBD: Ask for members on infrastructure list (Laurence)
  • Each middleware consortium to nominate one person?

Task Force Evaluators

  • Project Technical Board (PTB)

Open Issues

  • Manpower: already reported to PTB/PEB

Tracking

  • 2010-09-01 - Laurence and Morris: Task Force created
  • 2010-11-09 - Balazs: Updates from EGI
  • 2010-11-09 - Balazs: Task force approved by PTB - search for leader taken by Laurence
  • 2011-01-27 - Morris: Receives information from EGI about Nagios Probes for EMI, further discussion in PEB planned
  • 2011-01-31 - PEB: Decision that we support the Probes we get from EGI

-- LaurenceField - 01-Sep-2010


Topic revision: r7 - 2011-02-14 - BalazsKonya