Job Monitor

This page is part of the DIRAC Web Portal project. For a description of the basic functionality of the DIRAC Web Portal, look here.

Description

The Job Monitor provides information about the individual Jobs currently managed by the DIRAC Workload Management System. It shows details of the selected Jobs and allows certain Job manipulations.

Selectors

Selector widgets are provided in the left-side panel. These are drop-down lists from which one or several values can be chosen. Once the selection is done, press the Submit button to refresh the contents of the table in the right-side panel. Use the Reset button to clear the values in all the selector widgets.

The following Selectors are available:

DIRAC Site
The Job destination site in DIRAC nomenclature.

Status
The Job Status. The following Status values are possible:

Status      Comment
Received    Job is received by the DIRAC WMS
Checking    Job is being checked for sanity by the DIRAC WMS
Waiting     Job is entered into the Task Queue and is waiting to be picked up for execution
Running     Job is running
Stalled     Job has not shown any sign of life for 2 hours while in the Running state
Completed   Job finished execution of the user application, but some pending operations remain
Done        Job is fully finished
Failed      Job finished unsuccessfully
Killed      Job received KILL signal from the user
Deleted     Job is marked for deletion

Minor Status
More detailed specification of the Job status

Application Status
Status of the user application executed by the Job

Owner
The Job Owner. This is the nickname of the Job Owner corresponding to the Owner grid certificate DN.

JobGroup
Job groups classify Jobs to facilitate monitoring. They are used in massive data productions.

JobID
The identifier of a particular Job.

Time Span
The Time Span widget allows selecting Jobs whose Last Update timestamp falls within the specified time range.
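
As an illustration of how the selectors above might combine, here is a minimal Python sketch. It does not use the DIRAC code or API: the JobRecord class, the select_jobs function and the field names are hypothetical, and the matching rule (any of the chosen values within one selector, all selectors together, an empty selector meaning no restriction) is an assumption that this page does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class JobRecord:
    """Hypothetical, simplified view of one row of the Job Monitor table."""
    job_id: int
    site: str
    status: str
    owner: str
    job_group: str
    last_update: datetime


def select_jobs(
    jobs: Iterable[JobRecord],
    selectors: Dict[str, Set],
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> List[JobRecord]:
    """Return the jobs that match every selector and the optional time span."""
    selected = []
    for job in jobs:
        # Assumed semantics: OR within one selector, AND across selectors.
        # An empty set of values leaves that selector unrestricted.
        matches = all(
            not allowed or getattr(job, field) in allowed
            for field, allowed in selectors.items()
        )
        # Time Span: compare against the Last Update timestamp.
        in_span = (since is None or job.last_update >= since) and (
            until is None or job.last_update <= until
        )
        if matches and in_span:
            selected.append(job)
    return selected
```

Calling select_jobs(jobs, {"status": {"Running", "Done"}}) in this sketch would keep Jobs in either of the two states, which mirrors choosing several values in one drop-down list before pressing Submit.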

Columns

The information on the selected Jobs is presented in the right-side panel in the form of a table. Note that not all the available columns are displayed by default. You can display extra columns by choosing them in the menu activated by pressing the menu button (small triangle) in any column title field.

The following columns are provided:

JobId
The Job identifier inside the DIRAC Workload Management System

Site
The Job destination site in DIRAC nomenclature.

Status
The Job Status

Minor Status
More detailed specification of the Job status

Application Status
Status of the user application executed by the Job

Owner
The Job Owner. This is the nickname of the Job Owner corresponding to the Owner grid certificate DN.

OwnerDN
The Job Owner grid certificate DN.

OwnerGroup
The Job Owner group as defined in the DIRAC Configuration.

JobName
Each job can be given an individual name by the user as desired.

DIRACSetup
The DIRAC Setup instance in which the Job is executed

SubmissionTime
Job submission time stamp

LastUpdateTime
Job last status update time stamp

LastSignOfLife
Job last heart beat time stamp
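
The LastSignOfLife column relates to the Stalled status described in the Status selector: a Job is reported as Stalled when it has shown no sign of life for 2 hours while in the Running state. The Python sketch below only illustrates that rule. The function name and the constant are hypothetical, and how DIRAC itself detects stalled Jobs is not described on this page.

```python
from datetime import datetime, timedelta

# Threshold quoted in the Status table: no sign of life for 2 hours.
STALLED_AFTER = timedelta(hours=2)


def looks_stalled(status: str, last_sign_of_life: datetime, now: datetime) -> bool:
    """Return True if a Running job has been silent for longer than the threshold."""
    return status == "Running" and (now - last_sign_of_life) > STALLED_AFTER
```

Only Jobs in the Running state are considered, so a Job that has been Waiting for a long time is not flagged by this check.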

Extra Info and Operations

Clicking on the line corresponding to a Job opens a menu which allows certain operations on the Job. Currently, the following operations are available:

JDL
Get the Job JDL description.

Attributes
Get Job standard attributes. These are Job descriptors that are present for all the jobs.

Parameters
Get Job parameters. These are dynamic parameters published by the Job itself.

Logging Info
Get Job logging information showing various job state transitions together with their timestamps

Peek StandardOutput
Get access to the last published view of the Job standard output. The last 20 lines are given.

Get LogFile
Get access to the Job log files. This is only relevant for the Production Jobs which publish application log files.

Get Pending Request
Get the Job Pending Request for the operations that failed at the first attempt and are stored in the Failover Request System.

Get StagerReport
Get Stager report for the Job, if applicable.

Get Sandbox File
Get access to the Job Sandbox files.

Actions

  • Reset - restart Job from scratch
  • Kill - kill Job
  • Delete - delete Job from the DIRAC Workload Management System

Pilot

  • Get StdOut - get the standard output of the corresponding Pilot Job
  • Get StdErr - get the standard error output of the corresponding Pilot Job
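
The Peek StandardOutput operation above gives the last 20 lines of the Job standard output. As a rough illustration of that kind of "tail" view, the Python sketch below keeps only the final lines of a stream of text. The function name peek_tail is hypothetical and this is not the portal's implementation.

```python
from collections import deque
from typing import Iterable, List

PEEK_LINES = 20  # number of lines mentioned for Peek StandardOutput


def peek_tail(lines: Iterable[str], count: int = PEEK_LINES) -> List[str]:
    """Keep only the final `count` lines of an output stream."""
    # A deque with maxlen discards the oldest entries as new ones arrive,
    # so the whole output never has to be held in memory at once.
    return list(deque(lines, maxlen=count))
```

If the output is shorter than 20 lines, the sketch simply returns all of it.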
Topic revision: r6 - 2009-09-20 - AndreiTsaregorodtsev
 