Support XRootD Federation Monitoring Dashboard

This page documents the support procedures for the XRootD Federation Monitoring Dashboard.

XRootD Federation Monitoring Dashboard Overview

The XRootD Federation Monitoring Dashboard consists of a web application, which runs under an Apache server and displays various statistics, and of several agents that perform tasks such as:

  • Generating statistics
  • Collecting information

The web application is managed through the standard Apache server daemon, which allows the web service to be stopped, started and restarted. The dashboard agents, on the other hand, are managed using the dashboard service configurator tool. The available XRootD Federation Monitoring Dashboard agents are:

Name             Description
XrootdCollector  Agent that collects the messages sent by XRootD to a broker and stores them in a database
Monitor          Agents that generate statistics from the information stored in the database

Note: You can find the log files for each of these components in /opt/dashboard/var/log/#NAME_LOG_FILE#

MSG Brokers

The XRootD Federation Monitoring Dashboard gets its messages from the MSG brokers provided by IT-GT. The brokers store the information sent by XRootD until it is delivered to the XRootD Federation Monitoring Dashboard.

Brokers Information

hostname    state        ActiveMQ version  DNS alias  Monitoring links
gridmsg007  integration  5.5.1-fuse-01-06
gridmsg107  production   5.5.1-fuse-01-06
gridmsg108  production   5.5.1-fuse-01-06

Topics and queues used in the brokers

The topic names used in any of the above brokers are:

  • xrdpop.fax_popularity
  • xrdpop.uscms_popularity

And the queue names are:

  • Consumer.dashb_atlas_xrootd-dev.xrdpop.uscms_popularity
  • Consumer.dashb_cms_xrootd-dev.xrdpop.uscms_popularity
  • transfer.atlas_xrootd_monitoring_rejected_queue
  • transfer.cms_xrootd_monitoring_rejected_queue

The queues link the collector to the topics xrdpop.fax_popularity and xrdpop.uscms_popularity: they store the information sent by XRootD until the collector retrieves it. Each queue name is the concatenation of the prefix Consumer.dashb_[atlas or cms]_xrootd-dev and the topic name, separated by a dot.
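
As a minimal sketch of this naming rule, the Python helper below builds a consumer queue name from an experiment and a topic. The function name consumer_queue_name is hypothetical and is not part of the dashboard code; only the prefix pattern and the topic names come from this page.

```python
def consumer_queue_name(experiment: str, topic: str) -> str:
    """Build a consumer queue name following the pattern described above.

    Hypothetical helper for illustration; not part of the dashboard tools.
    """
    if experiment not in ("atlas", "cms"):
        raise ValueError(f"unexpected experiment: {experiment!r}")
    # The prefix and the topic name are joined with a dot,
    # as in the queue names listed on this page.
    return f"Consumer.dashb_{experiment}_xrootd-dev.{topic}"


print(consumer_queue_name("cms", "xrdpop.uscms_popularity"))
# prints: Consumer.dashb_cms_xrootd-dev.xrdpop.uscms_popularity
```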

The queues transfer.atlas_xrootd_monitoring_rejected_queue and transfer.cms_xrootd_monitoring_rejected_queue, on the other hand, are where the collector sends any messages that it cannot decode. These queues should always be empty.

MSG Web Interface

The MSG brokers provide a web interface for supervising them in case of an incident. The links for each MSG broker are as follows:

Name broker Link access

Note: You must be registered with these brokers.

Once inside, you will see the topics that the publishers use to send information. When the collector is connected to a broker (see MSG Brokers), two queues are created. To see them, follow the Queues link at the top of the page, which shows the list of queues. It should include the queues listed above.

To check that everything is working, look at the information shown for each queue. The fields mean:

  • Name: queue name.
  • Number Of Pending Messages: the number of messages stored in the server waiting to be delivered. If the consumer is running smoothly, it should be 0.
  • Number Of Consumers: the number of active consumers on the queue. This number should be 1.
  • Messages Enqueued: number of messages sent to topics where you have an active subscription. Those messages have been allocated in a queue for delivery.
  • Messages Dequeued: number of messages that you have received (and acknowledged). They have been subtracted from the queue.

Therefore, the ideal situation for the start and complete messages will be: 1 consumer and enqueued messages ~= dequeued messages.

The following example illustrates this:

Name Number Of Pending Messages Number Of Consumers Messages Enqueued Messages Dequeued
Consumer.dashb_atlas_xrootd-dev.xrdpop.uscms_popularity 0 1 29770 29770
Consumer.dashb_cms_xrootd-dev.xrdpop.uscms_popularity 0 1 29730 29730

The best situation is enqueued messages = dequeued messages, which means that every message sent to the topic was received by the consumer and successfully inserted into the database.
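
The checks described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the QueueStats class, the queue_problems function and the tolerance parameter are hypothetical names, and the figures have to be copied by hand from the broker web interface, since this page does not describe any API for reading them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueStats:
    """One row of the queue table in the broker web interface."""
    name: str
    pending: int
    consumers: int
    enqueued: int
    dequeued: int


def queue_problems(stats: QueueStats, tolerance: int = 0) -> List[str]:
    """Return the deviations from the ideal situation (empty list if none).

    tolerance is a hypothetical allowance for messages still in flight,
    reflecting the "enqueued ~= dequeued" rule.
    """
    problems = []
    if stats.consumers != 1:
        problems.append(f"expected 1 consumer, found {stats.consumers}")
    if stats.pending != 0:
        problems.append(f"{stats.pending} pending messages")
    backlog = stats.enqueued - stats.dequeued
    if backlog > tolerance:
        problems.append(f"{backlog} messages enqueued but not dequeued")
    return problems


row = QueueStats(
    "Consumer.dashb_cms_xrootd-dev.xrdpop.uscms_popularity", 0, 1, 29730, 29730
)
print(queue_problems(row))
# prints: []
```

An empty list corresponds to the healthy state shown in the example table.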


Current Status (29/10/2012)

Dashboard agents

To see the status of the agents, type the following command in a terminal (see Example “Service Operation”):

[dboard@dashboard71 ~]$ dashb-agent-list
SERVICE GROUP             STATUS     SERVICES                 
transfer.stress.test      STOPPED    'stress.test',           
transfer.mock.producer    STOPPED    'transfer.mock.producer',
transfer.republisher      STOPPED    'fts_monitoring_start', 'fts_monitoring_complete',
transfer.collector        STARTED    'transfer.collector1', 'transfer.collector2',
transfer.monitor          STARTED    'computeStats', 'aggregateStats', 'computeErrorSummaries', 'aggregateErrorSummaries', 'deleteOldRecords',
                          STARTED    'curlVOFeedCMS', 'curlVOFeedAtlas', 'curlVOFeedLhcb', 'curlVOFeedAlice', 'curlTopologyWLCG',

Log files

Each dashboard agent shown above has a log file located at /opt/dashboard/var/log/#SERVICE_GROUP_NAME#

For example, to follow the log file of transfer.collector, type:

[dboard@dashboard71 ~]$ tail -f /opt/dashboard/var/log/transfer.collector
2012-04-18 10:48:22,563 - CollectMessages:255 - INFO - No Messages
2012-04-18 10:48:22,682 - CollectMessages:255 - INFO - No Messages
2012-04-18 10:48:23,068 - CollectMessages:245 - INFO - Recieved and tried to insert 2 messages, 2 successfully and 0 failed 
2012-04-18 10:48:23,168 - CollectMessages:255 - INFO - No Messages
2012-04-18 10:48:23,194 - CollectMessages:245 - INFO - Recieved and tried to insert 3 messages, 3 successfully and 0 failed 
2012-04-18 10:48:23,294 - CollectMessages:255 - INFO - No Messages
2012-04-18 10:48:23,668 - CollectMessages:255 - INFO - No Messages
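
To get a quick total of how many messages were inserted or failed, the log lines can be summarised with a short script. The Python sketch below assumes that the log format is exactly the one in the excerpt above, including the spelling "Recieved" as it appears in the log. The function name summarize_inserts is hypothetical.

```python
import re

# Matches the insertion summary lines shown in the log excerpt above.
# "Recieved" is spelled as it appears in the log.
INSERT_RE = re.compile(
    r"Recieved and tried to insert (\d+) messages, "
    r"(\d+) successfully and (\d+) failed"
)


def summarize_inserts(lines):
    """Sum the received, inserted and failed counts over the log lines."""
    totals = {"received": 0, "inserted": 0, "failed": 0}
    for line in lines:
        match = INSERT_RE.search(line)
        if match is None:
            continue  # e.g. "No Messages" lines
        received, inserted, failed = map(int, match.groups())
        totals["received"] += received
        totals["inserted"] += inserted
        totals["failed"] += failed
    return totals


sample = [
    "2012-04-18 10:48:22,682 - CollectMessages:255 - INFO - No Messages",
    "2012-04-18 10:48:23,068 - CollectMessages:245 - INFO - "
    "Recieved and tried to insert 2 messages, 2 successfully and 0 failed ",
    "2012-04-18 10:48:23,194 - CollectMessages:245 - INFO - "
    "Recieved and tried to insert 3 messages, 3 successfully and 0 failed ",
]
print(summarize_inserts(sample))
# prints: {'received': 5, 'inserted': 5, 'failed': 0}
```

A non-zero "failed" total would be a reason to look at the rejected queues on the brokers.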


To restart the collectors, type the following command:

[dboard@dashboard71 ~]$ dashb-agent-restart transfer.collector
[dboard@dashboard71 ~]$

To check that the log file is being updated, type:

tail -f /opt/dashboard/var/log/transfer.collector

The activity of the collectors can be seen through the link http://, which shows various statistics about the insertion rate.

Database Agents

Database agents are jobs that run inside the database. To see details about them, use sqlplus from an lxplus machine.

For example:

[ddarias] /home/ddieguez > ssh lxplus

[lxplus420] /afs/ > sqlplus

SQL*Plus: Release - Production on Thu May 3 11:36:43 2012

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Enter user-name: lcg_dashboard_tfr_r@lcgr    
Enter password: (see Note)
Connected to:
Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning, Real Application Clusters and Real Application Testing options

Note: To find the password, look at the database configuration file (dashboard-dao.cfg) in the directory /opt/dashboard/etc/dashboard-dao/ on one of the production machines mentioned above.


[dboard@dashboard71 ~]$ vi /opt/dashboard/etc/dashboard-dao/dashboard-dao.cfg

Once inside sqlplus, type this command:

SQL> set serveroutput on

After that, you can see the details of all jobs by executing the following procedure:


You will then see the details of each job.

Row: 1
  job_name: SERVER_UPDATE
  repeat_interval: FREQ = MINUTELY; INTERVAL = 20
  enabled: TRUE
  run_count: 63
  failure_count: 0
  last_start_date: 03-MAY-12 AM EUROPE/ZURICH
  last_run_duration: +000000000 00:00:00.108444
  next_run_date: 03-MAY-12 PM EUROPE/ZURICH

Row: 2
  job_name: VO_UPDATE
  repeat_interval: FREQ = MINUTELY; INTERVAL = 20
  enabled: TRUE
  run_count: 63
  failure_count: 0
  last_start_date: 03-MAY-12 AM EUROPE/ZURICH
  last_run_duration: +000000000 00:00:00.051942
  next_run_date: 03-MAY-12 PM EUROPE/ZURICH


Row: 8
  job_name: MSG_ALARM_CHECK
  repeat_interval: FREQ = MINUTELY; INTERVAL = 20
  enabled: TRUE
  run_count: 59
  failure_count: 0
  last_start_date: 03-MAY-12 AM EUROPE/ZURICH
  last_run_duration: +000000000 00:00:08.754900
  next_run_date: 03-MAY-12 PM EUROPE/ZURICH

Total number of jobs: 8
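
If the job listing is saved to a file or copied from the terminal, it can be scanned automatically for jobs that are disabled or have failed. The Python sketch below assumes the "Row: N" plus "key: value" layout shown above. The function names parse_jobs and unhealthy_jobs are hypothetical, and the sample text is made up for illustration (the failure count of 2 is not real output).

```python
def parse_jobs(text):
    """Parse the job listing into a list of dicts, one per "Row: N" block."""
    jobs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current row
        elif line.startswith("Row:"):
            current = {}
            jobs.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so values such as
            # "+000000000 00:00:00.108444" are kept whole.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return jobs


def unhealthy_jobs(jobs):
    """Return the names of jobs that are disabled or have recorded failures."""
    return [
        job.get("job_name", "?")
        for job in jobs
        if job.get("enabled") != "TRUE" or int(job.get("failure_count", "0")) > 0
    ]


# Made-up sample in the same layout as the listing above.
sample = """
Row: 1
  job_name: SERVER_UPDATE
  enabled: TRUE
  failure_count: 0

Row: 2
  job_name: VO_UPDATE
  enabled: TRUE
  failure_count: 2

Total number of jobs: 2
"""
print(unhealthy_jobs(parse_jobs(sample)))
# prints: ['VO_UPDATE']
```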


To restart the monitors, type the following commands from lxplus or sqldeveloper:



-- DanielDieguez - 29-Oct-2012
