Siteview Sandbox.GridMap: A New Monitoring System for Grid Services at the Sites

Motivation of this activity

The CCRC08 experience was a very valuable benchmark for testing all Grid activities related to the LHC experiments (link to CCRC08Workshop). In particular, it gave the opportunity to test the monitoring infrastructure and to evaluate its functionality from the point of view of both the experiments and the sites. Monitoring was the key service for measuring the performance of the services and for detecting and fixing problems efficiently. The outcome of the test is the following:

Site administrators would like to be able to:

  • Compare the experiment's view of the site contribution with the information they get from their own monitoring systems
  • Understand if their site is contributing to the VO activity as expected

Some problems they found:

  • The main monitoring tools during this exercise were experiment specific tools. They proved to work well, but they are not straightforward to use for a person external to the experiment. Furthermore, they are different for every experiment
  • An additional requirement from sites is to have a definition of the targets from the experiments, otherwise it's not possible to understand whether the activity going on at their site meets the VO expectations or not

The objective of this project is to provide a new tool which should:

  • Provide an overall view of all the activities going on at the site, from one unique console. This should be a tool easy to use, also for persons external to the VO, and which does not require a particular knowledge of each experiment
  • Provide an overall view of the status of the activities, as it is evaluated by the VO, and allow fast and efficient detection of problems
  • For every activity and VO provide links to the source of information (VO specific monitoring system), so the problem can be investigated in an efficient way

Proposal for a new monitoring tool

Siteview Sandbox.GridMap is a high level monitoring tool which, from one unique console, offers an overall view of the computing activities of the LHC experiments at the site. It extracts data from the VO specific monitoring tools (Dashboard, Phedex, Monalisa, Dirac) and displays them in a uniform and simplified way in a common web interface using Gridmap technology.
The objects to monitor are the main VO activities at the site, such as job processing and data transfers, as well as a general site status evaluation from the VO perspective.

Information flow and architecture

The information sources are the different monitoring tools used by each experiment: DIRAC for LHCb, Monalisa and Dashboard for ALICE, Dashboard for ATLAS, and, for CMS, Dashboard for job processing and Phedex for data transfer.

Once the metrics have been extracted from the sources, they are published at a set of URLs.
A Dashboard collector periodically reads them from the URLs and stores them in a common database.
The values are then displayed in a Gridmap.
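
The flow described above (published URLs, periodic collector, common database) can be sketched as follows. This is only an illustration under stated assumptions, not the actual Dashboard collector: the tab-separated line format, the table name `metrics` and the functions `parse_metrics` and `collect` are all hypothetical, and the download step is passed in as a `fetch` function so the sketch does not depend on any real URL.

```python
import sqlite3

# Hypothetical publication format (an assumption, not the project's real one):
# one metric per line, tab-separated:
#   site <TAB> vo <TAB> activity <TAB> metric <TAB> value
def parse_metrics(text):
    """Turn the text published at one URL into a list of row tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        site, vo, activity, metric, value = line.split("\t")
        rows.append((site, vo, activity, metric, float(value)))
    return rows

def collect(urls, fetch, conn):
    """Read every URL once and store the metrics in the common table.

    `fetch` is any callable mapping a URL to the text published there.
    Returns the number of rows stored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(site TEXT, vo TEXT, activity TEXT, metric TEXT, value REAL)"
    )
    stored = 0
    for url in urls:
        rows = parse_metrics(fetch(url))
        conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)
        stored += len(rows)
    conn.commit()
    return stored
```

A periodic collector would simply call `collect` from a scheduler (for example a cron job) with a `fetch` function that downloads the page content.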

Because the metrics of all four experiments are stored in the same schema, results coming from different experiments can be displayed in the same plot. No new data are generated: the same data existing in the VO specific monitoring tools are presented in a different format, in parallel for every VO.

The database schema: the same schema should contain the data of the four experiments.
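
To illustrate why a single schema helps, here is a minimal sketch of a shared table and of a query returning one value per VO for the same site and metric, which is the kind of result a common plot needs. The table `vo_metrics`, its columns and the function `latest_per_vo` are assumptions made for this example; the real schema is not described on this page.

```python
import sqlite3

# Hypothetical shared table: every experiment writes rows of the same shape.
SCHEMA = """
CREATE TABLE IF NOT EXISTS vo_metrics (
    site        TEXT NOT NULL,
    vo          TEXT NOT NULL,
    activity    TEXT NOT NULL,
    metric      TEXT NOT NULL,
    value       REAL NOT NULL,
    measured_at TEXT NOT NULL  -- ISO 8601 timestamp, sorts chronologically as text
)
"""

def latest_per_vo(conn, site, activity, metric):
    """Return {vo: value} with the most recent value of one metric for each VO."""
    query = """
        SELECT m.vo, m.value
        FROM vo_metrics AS m
        JOIN (
            SELECT vo, MAX(measured_at) AS latest
            FROM vo_metrics
            WHERE site = ? AND activity = ? AND metric = ?
            GROUP BY vo
        ) AS l ON l.vo = m.vo AND l.latest = m.measured_at
        WHERE m.site = ? AND m.activity = ? AND m.metric = ?
    """
    params = (site, activity, metric) * 2
    return dict(conn.execute(query, params).fetchall())
```

Since all VOs share the same columns, no per-experiment code is needed at display time: the same query serves all four experiments.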

The gridmap: the information stored in the database will be displayed using the gridmap technology. A gridmap is being developed and can already display the data stored in the schema.

Activities and metrics to monitor

The first proposal for the list of metrics to monitor was drawn up on the basis of the feedback given by sites after CCRC08. The list will be updated according to further requirements.

This is the format that should be used to provide the data.

A summary of the metrics currently available.

Current status of the activity

This is a page which summarizes the current status of the activity.

Open questions

  • How to define the targets for each activity?

  • About the data transfer: is it worth storing all the data channel by channel in our shared database? This can be a lot of data. Some considerations here.

  • Can the experiments provide information about the different sub activities (MC production, user analysis, etc.) or do they just publish information about the main activities (job processing)? See more.

  • Metrics for ALICE activities: job processing metrics are provided, together with the pledged values. The overall site status is also provided. Data about data transfer are still missing.

  • Metrics for ATLAS activities: job processing and data transfer metrics are obtained through an API. The script calling the API is imported inside my collector. The status is also computed.

  • Metrics for CMS activities: job processing metrics are provided, together with the pledged values for the number of jobs.

  • Metrics for LHCb activities: metrics about data transfer and job processing activities are provided. The status of data transfers is still missing.
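
Since how to define the targets is still an open question, the following is only one possible illustration of how a measured value could be compared with a pledged value to obtain a status. The function `evaluate_status`, the status names and the two threshold fractions are hypothetical and would have to be agreed with each VO.

```python
def evaluate_status(measured, pledged, ok_fraction=0.8, warn_fraction=0.5):
    """Classify an activity by the ratio of its measured value to the pledge.

    The thresholds are illustrative defaults, not values defined by any VO.
    """
    if pledged <= 0:
        return "unknown"  # no usable target has been defined
    ratio = measured / pledged
    if ratio >= ok_fraction:
        return "ok"
    if ratio >= warn_fraction:
        return "warning"
    return "critical"
```

For example, with a pledge of 100 running jobs, a site running 60 jobs would be flagged as "warning" under these assumed thresholds.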

Feedback from users

Feedback from users here




User Guide

Related links

About Gridmap

Links to the existing CCRC08 servicemap showing experiment specific SLS and SAM data for critical services (only the map on the right!).

And link to the original CERN Sandbox.GridMap, showing the SAM and GridView data for reference.

Gridmap for the view of all the sites for a given VO (here CMS). Under development.

SAM visualization

for LHCb

and for ATLAS

-- ElisaLanciotti - 11 Jul 2008

Topic attachments

  • siteview_elisa_grenoble.pdf (PDF, 1856.8 K, 2008-11-28, ElisaLanciotti): Talk given at the Reunion LCG-France - Nov 28th 2008
Topic revision: r39 - 2020-08-21 - TWikiAdminUser