Real Time Monitor

Introduction

The Real Time Monitor (RTM) presents a real-time view of the WLCG Grid in operation. It is based on information retrieved from the Grid information system and the Logging and Bookkeeping (LB) system. Sites participating in the infrastructure are found by querying the information system; the site entry supplies details such as the site's name and location. The LB servers are found by querying the information system for the LB service entry. The RTM then connects to the LB server database to read the job state information. The site information is used to plot the sites on the map, and the job state information is used to show where jobs are running. More details on the RTM can be found on the RTM page.

Publishing a Site

In order to show up in the RTM the site needs to be published in the information system. Instructions on how to publish the site entry correctly can be found here.

Publishing the RB server

Each RB server is found by looking for its service entry in the information system. An example entry is shown below. The information provider glite-info-server can also be used to publish this information. The entry is required so that the RB can be plotted on the RTM and associated with a site.

dn: GlueServiceUniqueID=hostname:7772,Mds-Vo-name=sitename,Mds-Vo-name=local,o=grid
objectClass: GlueTop
objectClass: GlueService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceUniqueID: hostname:7772
GlueServiceName: sitename-rb
GlueServiceType: ResourceBroker
GlueServiceVersion: 1.2.0
GlueServiceEndpoint: hostname:7772
GlueServiceURI: unset
GlueServiceAccessPointURL: not_used
GlueServiceStatus: OK
GlueServiceStatusInfo: No Problems
GlueServiceWSDL: unset
GlueServiceSemantics: unset
GlueServiceStartTime: 1970-01-01T00:00:00Z
GlueServiceOwner: VO1
GlueServiceOwner: VO2
GlueServiceAccessControlRule: VO1
GlueServiceAccessControlRule: VO2
GlueServiceAccessControlRule: infngrid
GlueForeignKey: GlueSiteUniqueID=sitename
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2
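
When the entry has to be produced for several hosts, it can be generated rather than edited by hand. The following is a minimal sketch in Python, not part of the RTM or of any gLite tool: the function name render_rb_entry and its parameters are hypothetical, and the attribute values are copied from the example above. It emits one GlueServiceOwner and one GlueServiceAccessControlRule line per VO, so any additional rule (such as infngrid in the example) would have to be passed in the list of VOs.

```python
def render_rb_entry(hostname, sitename, vos, port=7772):
    """Return an RB service entry as LDIF text (hypothetical helper).

    The attribute values mirror the example entry; hostname, sitename
    and vos are the only parts that vary.
    """
    unique_id = f"{hostname}:{port}"
    lines = [
        f"dn: GlueServiceUniqueID={unique_id},Mds-Vo-name={sitename},"
        "Mds-Vo-name=local,o=grid",
        "objectClass: GlueTop",
        "objectClass: GlueService",
        "objectClass: GlueKey",
        "objectClass: GlueSchemaVersion",
        f"GlueServiceUniqueID: {unique_id}",
        f"GlueServiceName: {sitename}-rb",
        "GlueServiceType: ResourceBroker",
        "GlueServiceVersion: 1.2.0",
        f"GlueServiceEndpoint: {unique_id}",
        "GlueServiceURI: unset",
        "GlueServiceAccessPointURL: not_used",
        "GlueServiceStatus: OK",
        "GlueServiceStatusInfo: No Problems",
        "GlueServiceWSDL: unset",
        "GlueServiceSemantics: unset",
        "GlueServiceStartTime: 1970-01-01T00:00:00Z",
    ]
    # One owner line and one access control rule per VO.
    lines += [f"GlueServiceOwner: {vo}" for vo in vos]
    lines += [f"GlueServiceAccessControlRule: {vo}" for vo in vos]
    lines += [
        f"GlueForeignKey: GlueSiteUniqueID={sitename}",
        "GlueSchemaVersionMajor: 1",
        "GlueSchemaVersionMinor: 2",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # rb.example.org and EXAMPLE-SITE are placeholder names.
    print(render_rb_entry("rb.example.org", "EXAMPLE-SITE", ["VO1", "VO2"]))
```

The same approach would work for the LB entry by changing the unique ID, service name, type and version to the values used there.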

Publishing the LB server

Note: We will not need this step, as we will hard-code the hostnames of the databases in a custom update script.

Each LB server is found by looking for its service entry in the information system. An example entry is shown below. The information provider glite-info-server can also be used to publish this information.

dn: GlueServiceUniqueID=https://hostname:9003/lb,Mds-Vo-name=sitename,Mds-Vo-name=local,o=grid
objectClass: GlueTop
objectClass: GlueService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceUniqueID: https://hostname:9003/lb
GlueServiceName: sitename-org.glite.lb.Server
GlueServiceType: org.glite.lb.Server
GlueServiceVersion: 1.6.2
GlueServiceEndpoint: https://hostname:9003/lb
GlueServiceURI: unset
GlueServiceAccessPointURL: https://hostname:9003/lb
GlueServiceStatus: OK
GlueServiceStatusInfo: No Problems
GlueServiceWSDL: unset
GlueServiceSemantics: unset
GlueServiceStartTime: 1970-01-01T00:00:00Z
GlueServiceOwner: VO1
GlueServiceOwner: VO2
GlueServiceAccessControlRule: VO1
GlueServiceAccessControlRule: VO2
GlueForeignKey: GlueSiteUniqueID=sitename
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

Job Status Information

Information about the job status is found by querying the LB database directly. For interoperability with other infrastructures, it is not necessary to run an LB server; it is enough to provide the information in a database. This requires setting up a MySQL database. For convenience we will use the RTM data representation rather than the LB server's.


mysql -u root -p
<enter your mysql password>

CREATE DATABASE RTM;
use RTM;

CREATE TABLE jobs (
  jobid varchar(128) NOT NULL,
  rb varchar(128),
  ui varchar(128),
  vo varchar(128),
  ce varchar(128),
  queue varchar(128),
  registered timestamp,
  state varchar(128),
  state_entered timestamp,
  PRIMARY KEY (`jobid`)
);

CREATE INDEX ce_index ON jobs (ce);
CREATE INDEX rb_index ON jobs (rb);
CREATE INDEX registered_index ON jobs (registered);

GRANT SELECT ON RTM.jobs TO 'gridrtm'@'tl00.hep.ph.ic.ac.uk' IDENTIFIED BY 'password';

Ensure that no firewall blocks connections to the MySQL server from the host tl00.hep.ph.ic.ac.uk.

Adding Information to the Database

INSERT INTO jobs (jobid, rb, ui, vo, ce, queue, registered, state, state_entered) VALUES (...);

The jobid must be unique for that job within the database; _FSktmi0w2Ctg6A9v6x6FVw_ is an example of what is currently used.

The rb is the hostname of the Workload Management Service. Use _unknown_ if not known.

The ui is the hostname from which the user submitted the job. Use _unknown_ if not known.

The vo is the name of the VO on whose behalf the job is being executed. Use _unknown_ if not known.

The ce is the hostname of the Computing Service where the job will be run. Use _unknown_ if not known.

The queue is the name of the queue where the job will be queued. It is typically of the form _jobmanager-LRMS-queue_. Use _unknown_ if not known.

The registered timestamp is when the job first entered the system. It is typically of the form _yyyy-mm-dd hh:mm:ss_ and should be in UTC.

The state is the current state of the job and can take one of the following values. The values are case sensitive.

Undef       Undefined
Submitted   Entered by the user to the User Interface or registered by Job Partitioner
Waiting     Accepted by WMS, waiting for resource allocation
Ready       Matching resources found
Scheduled   Accepted by LRMS queue
Running     Executable is running
Done        Execution finished, output is available
Cleared     Output transferred back to user and freed
Aborted     Aborted by system (at any stage)
Canceled    Canceled by user
Unknown     Status cannot be determined
Purged      Job has been purged from bookkeeping server (for LB->RGMA interface)

The state_entered timestamp is when the job entered that state. It is typically of the form _yyyy-mm-dd hh:mm:ss_.
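
The field rules above can be checked in the update script before a row is written. The following is a minimal sketch in Python under stated assumptions: the names job_row, utc_string, INSERT_SQL and UPSERT_SQL are hypothetical, the %s placeholders assume a MySQL driver that uses that DB-API parameter style, and the ON DUPLICATE KEY UPDATE variant is one possible way to record a later state for a jobid that is already in the table. It is not the RTM's own code.

```python
from datetime import datetime, timezone

# Allowed values of the state column (case sensitive).
JOB_STATES = (
    "Undef", "Submitted", "Waiting", "Ready", "Scheduled", "Running",
    "Done", "Cleared", "Aborted", "Canceled", "Unknown", "Purged",
)

# Parameterised statement for the jobs table defined earlier.
INSERT_SQL = (
    "INSERT INTO jobs "
    "(jobid, rb, ui, vo, ce, queue, registered, state, state_entered) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"
)

# Assumption: because jobid is the primary key, a later state change can
# reuse the same parameters and overwrite state and state_entered.
UPSERT_SQL = INSERT_SQL + (
    " ON DUPLICATE KEY UPDATE"
    " state = VALUES(state), state_entered = VALUES(state_entered)"
)


def utc_string(moment):
    """Format a timezone-aware datetime as 'yyyy-mm-dd hh:mm:ss' in UTC."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def job_row(jobid, state, registered, state_entered,
            rb=None, ui=None, vo=None, ce=None, queue=None):
    """Return the parameter tuple for INSERT_SQL, applying the field rules."""
    if not jobid:
        raise ValueError("jobid is required")
    if state not in JOB_STATES:
        raise ValueError(f"invalid state: {state!r}")

    def or_unknown(value):
        # Missing host, VO or queue names are stored as 'unknown'.
        return value if value else "unknown"

    return (
        jobid, or_unknown(rb), or_unknown(ui), or_unknown(vo),
        or_unknown(ce), or_unknown(queue),
        utc_string(registered), state, utc_string(state_entered),
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # ce.example.org is a placeholder hostname.
    print(job_row("FSktmi0w2Ctg6A9v6x6FVw", "Running", now, now,
                  ce="ce.example.org"))
```

With a DB-API cursor the row would be written with something like cursor.execute(INSERT_SQL, row); the connection details are left out because they depend on the driver in use.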

How the RTM Works

When the RTM finds the state Scheduled, it draws a magenta line from the rb to the ce. When it finds the state Done, it draws a yellow line from the ce to the rb. If it finds the state Aborted or Canceled, it draws a red line from the ce to the rb. The state Running contributes to the size of the pulsing pie charts and to the total number of running jobs (shown in the top right corner).
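
The drawing rules can be summarised as a lookup from job state to line colour and direction. The following Python sketch is only an illustration of those rules, not the RTM's implementation: the names LINE_FOR_STATE, line_for and running_per_ce are hypothetical, jobs are assumed to be dictionaries keyed by the column names of the jobs table, and counting Running jobs per ce is an assumption about how the pie charts are sized. The state is spelled Canceled, as it is stored in the database.

```python
from collections import Counter

# state -> (colour, column the line starts from, column the line ends at)
LINE_FOR_STATE = {
    "Scheduled": ("magenta", "rb", "ce"),
    "Done": ("yellow", "ce", "rb"),
    "Aborted": ("red", "ce", "rb"),
    "Canceled": ("red", "ce", "rb"),
}


def line_for(job):
    """Return (colour, from_host, to_host) for a job, or None if no line is drawn."""
    rule = LINE_FOR_STATE.get(job["state"])
    if rule is None:
        return None
    colour, source, target = rule
    return (colour, job[source], job[target])


def running_per_ce(jobs):
    """Count jobs in state Running for each ce."""
    return Counter(job["ce"] for job in jobs if job["state"] == "Running")


if __name__ == "__main__":
    # Placeholder hostnames for illustration.
    jobs = [
        {"state": "Scheduled", "rb": "rb.example.org", "ce": "ce1.example.org"},
        {"state": "Running", "rb": "rb.example.org", "ce": "ce1.example.org"},
        {"state": "Done", "rb": "rb.example.org", "ce": "ce2.example.org"},
    ]
    for job in jobs:
        print(job["state"], line_for(job))
    print(dict(running_per_ce(jobs)))
```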

-- LaurenceField - 17 Sep 2008

Topic revision: r4 - 2011-06-21 - AndresAeschlimann
 