CMS Job Monitoring collectors
Overview
This page describes the new implementation of the CMS job monitoring collector for the Experiment Dashboard monitoring project.
For now, it lists all the messages currently received, grouped by message type.
The first goal is to understand and define where each parameter needs to be written, so that the bulk queries for updates and inserts can be optimized.
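As a rough illustration of the bulk approach, the sketch below groups incoming messages by type and writes each group with a single `executemany` call. It uses Python's standard `sqlite3` module only as a stand-in for the real database. The `type` key, the type names, and the column names are assumptions made for the example, not the collector's actual schema.

```python
import sqlite3
from collections import defaultdict

# Hypothetical mapping from message type to the statement that stores it.
# Table and column names are illustrative assumptions.
STATEMENTS = {
    "task_meta": "INSERT INTO TASK (task_monitor_id) VALUES (:TaskId)",
    "job_meta": "INSERT INTO JOB (job_id, task_monitor_id) VALUES (:JobId, :TaskId)",
}


def group_by_type(messages):
    """Group message dicts by their (assumed) 'type' key."""
    groups = defaultdict(list)
    for msg in messages:
        groups[msg["type"]].append(msg)
    return dict(groups)


def bulk_write(conn, messages):
    """Run one executemany per known message type; return rows written per type."""
    written = {}
    for msg_type, group in group_by_type(messages).items():
        statement = STATEMENTS.get(msg_type)
        if statement is None:
            continue  # unknown types are skipped in this sketch
        conn.executemany(statement, group)
        written[msg_type] = len(group)
    conn.commit()
    return written
```

Grouping first keeps the number of round trips proportional to the number of message types rather than to the number of messages.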
Task Meta
This concerns information at the task/workflow level. Such a message usually means inserting a new task into the TASK table, or updating the existing information if the task already exists.
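The insert-or-update behaviour can be sketched as an update that falls back to an insert when no row matched. This is a minimal example with `sqlite3` as a stand-in database. The columns `task_monitor_id`, `user_name`, and `submission_tool` are hypothetical, and the real collector may well use a different mechanism (for example a single merge statement).

```python
import sqlite3


def upsert_task(conn, task_monitor_id, user_name, submission_tool):
    """Update the TASK row if it exists, otherwise insert it.

    Returns "updated" or "inserted" so the caller can see which path ran.
    """
    cur = conn.execute(
        "UPDATE TASK SET user_name = ?, submission_tool = ? "
        "WHERE task_monitor_id = ?",
        (user_name, submission_tool, task_monitor_id),
    )
    if cur.rowcount == 0:
        # No existing task matched: create it.
        conn.execute(
            "INSERT INTO TASK (task_monitor_id, user_name, submission_tool) "
            "VALUES (?, ?, ?)",
            (task_monitor_id, user_name, submission_tool),
        )
        outcome = "inserted"
    else:
        outcome = "updated"
    conn.commit()
    return outcome
```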
Job Meta
Job meta information is sent as soon as a job is created or submitted, and carries the static information about the job. It mainly results in populating the JOB and TASK_JOB tables.
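A minimal sketch of that step follows, under the assumption (not stated on this page) that TASK_JOB simply links a job row to its task. Both inserts run in one transaction so that a job is never stored without its link. The column names are hypothetical and `sqlite3` is only a stand-in for the real database.

```python
import sqlite3


def store_job_meta(conn, msg):
    """Insert a job and its task link atomically; return the new job key."""
    with conn:  # commits on success, rolls back if either insert fails
        cur = conn.execute(
            "INSERT INTO JOB (grid_job_id) VALUES (?)",
            (msg["GridJobID"],),
        )
        job_key = cur.lastrowid
        conn.execute(
            "INSERT INTO TASK_JOB (task_monitor_id, job_key) VALUES (?, ?)",
            (msg["TaskId"], job_key),
        )
    return job_key
```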
Job Status
Runtime
This information is usually sent directly from the worker node (WN). It may consist of small messages sent at short intervals (as in the CRAB-2 case).
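One possible way to handle such bursts, shown here only as a sketch, is to merge the small messages per job before touching the database, so that each job needs a single update. The assumption that runtime messages carry a `JobId` key, and that later values should override earlier ones, is ours and is not confirmed by this page.

```python
def coalesce_runtime(messages):
    """Merge a burst of small messages into one dict per job.

    Messages are applied in arrival order, so a later value for the
    same parameter overrides an earlier one.
    """
    merged = {}
    for msg in messages:
        merged.setdefault(msg["JobId"], {}).update(msg)
    return merged
```

For example, two messages for the same job, one with `StatusValue` and one with `StatusEnterTime`, would collapse into a single dictionary holding both parameters.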
Performance
This message type is currently not handled. The idea is to add a new table (JOB_PERFORMANCE) that points to the JOB table through a foreign key, because this information should be cleaned up periodically (after proper aggregation).
- JobId
- TaskId: task name that should go in TaskMonitorId
- StepName
- GridJobID
- StatusValue
- StatusEnterTime
- WriteTotalMB
- ReadTotalMB
- ReadMBSec
- ReadMaxMSec
- ReadAveragekB
- MinEventTime
- MaxEventTime
- MinEventCPU
- MaxEventCPU
- AvgEventCPU
- TotalEventCPU
- TotalJobCPU
- ReadPercentageOps
- ReadCachePercentageOps
- ReadTotalSecs
- WriteTotalSecs
- PeakValueVsize
- AvgEventTime
- TotalJobTime
- PeakValueRss
- ReadNumOps
- JobExitCode
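The proposed table could look roughly like the sketch below, which stores a subset of the parameters listed above and adds a periodic purge. Everything here is an assumption for illustration: the column names and types, the choice of `StatusEnterTime` (stored as an ISO-formatted string) as the purge criterion, and the use of `sqlite3` as a stand-in database. The aggregation step is not shown and would have to run before the purge.

```python
import sqlite3

# Hypothetical schema: a subset of the listed parameters, keyed to JOB.
SCHEMA = """
CREATE TABLE JOB (job_id INTEGER PRIMARY KEY);
CREATE TABLE JOB_PERFORMANCE (
    job_id            INTEGER NOT NULL REFERENCES JOB(job_id),
    step_name         TEXT,
    status_enter_time TEXT,
    read_total_mb     REAL,
    write_total_mb    REAL,
    total_job_cpu     REAL,
    total_job_time    REAL,
    peak_value_rss    REAL,
    job_exit_code     INTEGER
);
"""


def open_db(path=":memory:"):
    """Open a database with foreign keys enforced and the schema created."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def purge_performance(conn, cutoff):
    """Delete performance rows older than `cutoff`; return how many were removed.

    Only JOB_PERFORMANCE is touched, so the JOB rows survive the cleanup.
    """
    cur = conn.execute(
        "DELETE FROM JOB_PERFORMANCE WHERE status_enter_time < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Keeping the performance values in their own table means the cleanup is a plain delete on JOB_PERFORMANCE and never has to rewrite the long-lived JOB rows.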
--
MattiaCinquilli - 27-Jun-2011