---+ Real Time Monitor

---++ Introduction

The Real Time Monitor (RTM) presents a real-time view of the WLCG Grid in operation. It is based on information retrieved from the Grid information system and the Logging and Bookkeeping (LB) system. Sites participating in the infrastructure are found by querying the information system. The site entry supplies details such as the name and location of the site. The LB servers are found by querying the information system for the LB service entry. The RTM then connects to the LB server database to find the job state information. The site information is used to plot the sites on the map, and the job state information is used to show where jobs are running. More details on the RTM can be found on the [[http://gridportal.hep.ph.ic.ac.uk/rtm/][RTM page]].

---++ Publishing a Site

To show up in the RTM, a site must be published in the information system. Instructions on how to publish the site entry correctly can be found [[https://wiki.egi.eu/wiki/MAN1_How_to_publish_Site_Information][here]].

---++ Publishing the RB server

The Resource Broker (RB) servers are found by looking for their service entries in the information system. An example entry is shown below. The information provider _glite-info-server_ can also be used to publish this information. The entry is required so that the RB can be plotted on the RTM and associated with a site.

<verbatim>
dn: GlueServiceUniqueID=hostname:7772,Mds-Vo-name=sitename,Mds-Vo-name=local,o=grid
objectClass: GlueTop
objectClass: GlueService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceUniqueID: hostname:7772
GlueServiceName: sitename-rb
GlueServiceType: ResourceBroker
GlueServiceVersion: 1.2.0
GlueServiceEndpoint: hostname:7772
GlueServiceURI: unset
GlueServiceAccessPointURL: not_used
GlueServiceStatus: OK
GlueServiceStatusInfo: No Problems
GlueServiceWSDL: unset
GlueServiceSemantics: unset
GlueServiceStartTime: 1970-01-01T00:00:00Z
GlueServiceOwner: VO1
GlueServiceOwner: VO2
GlueServiceAccessControlRule: VO1
GlueServiceAccessControlRule: VO2
GlueServiceAccessControlRule: infngrid
GlueForeignKey: GlueSiteUniqueID=sitename
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2
</verbatim>

---++ Publishing the LB server

*Note: We will not need this step as we will hard code the hostnames of the databases in a custom update script*.

The LB servers are found by looking for their service entries in the information system. An example entry is shown below. The information provider _glite-info-server_ can also be used to publish this information.

<verbatim>
dn: GlueServiceUniqueID=https://hostname:9003/lb,Mds-Vo-name=sitename,Mds-Vo-name=local,o=grid
objectClass: GlueTop
objectClass: GlueService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceUniqueID: https://hostname:9003/lb
GlueServiceName: sitename-org.glite.lb.Server
GlueServiceType: org.glite.lb.Server
GlueServiceVersion: 1.6.2
GlueServiceEndpoint: https://hostname:9003/lb
GlueServiceURI: unset
GlueServiceAccessPointURL: https://hostname:9003/lb
GlueServiceStatus: OK
GlueServiceStatusInfo: No Problems
GlueServiceWSDL: unset
GlueServiceSemantics: unset
GlueServiceStartTime: 1970-01-01T00:00:00Z
GlueServiceOwner: VO1
GlueServiceOwner: VO2
GlueServiceAccessControlRule: VO1
GlueServiceAccessControlRule: VO2
GlueForeignKey: GlueSiteUniqueID=sitename
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
</verbatim>

---++ Job Status Information

Information about the job status is found by querying the LB database directly. For interoperability with other infrastructures, it is not necessary to run an LB server; it is enough to provide the information in a database. This requires setting up a MySQL database. For convenience we will use the RTM data representation rather than that of the LB server.

<verbatim>
mysql -u root -p
<enter your mysql password>

CREATE DATABASE RTM;
use RTM;

CREATE TABLE jobs(
  jobid varchar(128) NOT NULL,
  rb varchar(128),
  ui varchar(128),
  vo varchar(128),
  ce varchar(128),
  queue varchar(128),
  registered timestamp,
  state varchar(128),
  state_entered timestamp,
  PRIMARY KEY (`jobid`)
);

CREATE INDEX ce_index ON jobs (ce);
CREATE INDEX rb_index ON jobs (rb);
CREATE INDEX registered_index ON jobs (registered);

GRANT SELECT ON RTM.jobs TO 'gridrtm'@'tl00.hep.ph.ic.ac.uk' IDENTIFIED BY 'password';
</verbatim>

Ensure that no firewall blocks connections from the host _tl00.hep.ph.ic.ac.uk_.

---++ Adding Information to the Database

<verbatim>
INSERT INTO jobs (jobid, rb, ui, vo, ce, queue, registered, state, state_entered)
VALUES (...);

The jobid must be unique for that job within the database; _FSktmi0w2Ctg6A9v6x6FVw_ is an example of what is currently used.
The rb is the hostname of the Workload Management Service. Use _unknown_ if not known.
The ui is the hostname from which the user submitted the job. Use _unknown_ if not known.
The vo is the name of the VO for which the job is being executed. Use _unknown_ if not known.
The ce is the hostname of the Computing Service where the job will be run. Use _unknown_ if not known.
The queue is the name of the queue where the job will be queued. This is typically of the form _jobmanager-LRMS-queue_ . Use _unknown_ if not known.
The registered timestamp is when the job first entered the system. It is typically of the form _yyyy_mm_dd hh:mm:ss_ and should be in UTC.
The state is the current state of the job and can take one of the following values. Note that they are case sensitive.

Undef,       < Undefined.
Submitted,   < entered by the user to the User Interface or registered by Job Partitioner
Waiting,     < Accepted by WMS, waiting for resource allocation
Ready,       < Matching resources found
Scheduled,   < Accepted by LRMS queue
Running,     < Executable is running
Done,        < Execution finished, output is available
Cleared,     < Output transferred back to user and freed
Aborted,     < Aborted by system (at any stage)
Canceled,    < Canceled by user
Unknown,     < Status cannot be determined
Purged,      < Job has been purged from bookkeeping server (for LB->RGMA interface)

The state_entered timestamp is when the job entered that state. It is typically of the form _yyyy_mm_dd hh:mm:ss_
</verbatim>

---++ How the RTM Works

When the RTM finds the state _Scheduled_, it draws a magenta line from the rb to the ce. When it finds the state _Done_, it draws a yellow line from the ce to the rb. If it finds the state _Aborted_ or _Cancelled_, it draws a red line from the ce to the rb. The state _Running_ contributes to the size of the pulsing pie charts and to the total number of running jobs (shown in the top right corner).

-- Main.LaurenceField - 17 Sep 2008
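
As an illustration of the field rules in "Adding Information to the Database" and the drawing rules in "How the RTM Works", the following Python sketch builds a parameterised INSERT for the jobs table and reports which line the RTM would draw for a given state. It is only a sketch under stated assumptions: the function names =build_insert= and =line_for_state= are hypothetical and not part of the RTM, the =%s= placeholders assume a MySQL DB-API driver such as MySQLdb or PyMySQL, and because this page spells one state both _Canceled_ and _Cancelled_, the sketch accepts either spelling rather than picking one.

```python
# Hypothetical helpers for illustration; only the column names, the state
# names and the line colours are taken from this page.

# Column order of the jobs table created in "Job Status Information".
COLUMNS = ("jobid", "rb", "ui", "vo", "ce", "queue",
           "registered", "state", "state_entered")

# Fields for which the page says to use "unknown" when the value is not known.
DEFAULT_UNKNOWN = ("rb", "ui", "vo", "ce", "queue")

# Case-sensitive states from the page. "Cancelled" is added as an assumption
# because the page uses both spellings.
ACCEPTED_STATES = {
    "Undef", "Submitted", "Waiting", "Ready", "Scheduled", "Running",
    "Done", "Cleared", "Aborted", "Canceled", "Cancelled", "Unknown",
    "Purged",
}


def build_insert(job):
    """Return (sql, params) for inserting one job record.

    Timestamps are passed through unchanged; missing ones become None.
    """
    if not job.get("jobid"):
        raise ValueError("jobid is required and must be unique")
    if job.get("state") not in ACCEPTED_STATES:
        raise ValueError("unexpected state: %r" % (job.get("state"),))

    row = dict(job)
    for field in DEFAULT_UNKNOWN:
        if not row.get(field):
            row[field] = "unknown"

    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = "INSERT INTO jobs (%s) VALUES (%s)" % (", ".join(COLUMNS),
                                                 placeholders)
    params = tuple(row.get(column) for column in COLUMNS)
    return sql, params


def line_for_state(state):
    """Return (colour, start, end) for the line drawn, or None."""
    if state == "Scheduled":
        return ("magenta", "rb", "ce")
    if state == "Done":
        return ("yellow", "ce", "rb")
    if state in ("Aborted", "Canceled", "Cancelled"):
        return ("red", "ce", "rb")
    # Other states draw no line; Running feeds the pie charts and counter.
    return None
```

The pair returned by =build_insert= can be handed to a DB-API cursor as =cursor.execute(sql, params)=, which leaves quoting of the values to the driver.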
Topic revision: r4 - 2011-06-21 - AndresAeschlimann
Copyright © 2008-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.