The main goal of PA monitoring is to collect job monitoring information from the local PA databases into the Dashboard repository and to provide an interface to that data.
The application is installed on lxarda18. It consists of 2 parts:
| Parameter | Values | Description |
|-----------|--------|-------------|
| plot | pie, bar, cumulative, data | Type of the plot |
| what | evts_read_failed, evts_read_success, evts_read_any (both successful and failed) | Use when you want the sum of events read |
|  | evts_written_failed, evts_written_success, evts_written_any | Use when you want the sum of events written |
|  | time_failed, time_success, time_any | Sum of job run time: SUM("jobs_length"/3600) |
|  | jobs_failed, jobs_success, jobs_any | Number of jobs: SUM("inst_num") |
|  | CMSSW_fail_codes | Retrieves the data for the "CMSSW Application failure codes" plot |
|  | time_by_span | Data for the "Approximate resources utilized" or "Approximate batch slot count by ..." plots |
|  | reco_rate | For the "Tier 0 reco rate" plot |
|  | jobs_success_rates | For the "Jobs success rates" plot |
|  | dts_timings | For the "Unnormalized dataset timings" plot |
|  | jobs_failure_types | For the "Job failure type" query |
| sortby | prod_team, prod_agent, wf, dts, site, exit_code | Acts like the GROUP BY clause in a query |
| job_type | Merge, Processing, Any | Type of the jobs |
| starttime | Date in YYYY-mm-dd hh24:mi:ss format | For example, 2010-06-28 13:54:17 |
| endtime | Same format as starttime | End of the queried time interval |
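For illustration, here is a minimal Python sketch that builds a request from these parameters. The base URL is a placeholder (the real endpoint is not given here); only the parameter names and example values come from the table above.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint -- an assumption, not the documented URL of the service.
BASE_URL = "http://lxarda18.cern.ch/dashboard/request.py/paquery"

# Query parameters taken from the table above.
params = {
    "plot": "bar",                       # pie, bar, cumulative or data
    "what": "jobs_any",                  # number of jobs, both successful and failed
    "sortby": "site",                    # grouping, like an SQL GROUP BY
    "job_type": "Any",                   # Merge, Processing or Any
    "starttime": "2010-06-28 13:54:17",  # YYYY-mm-dd hh24:mi:ss
    "endtime": "2010-06-29 13:54:17",
}

url = BASE_URL + "?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as response:
    print(response.read().decode())
```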
To get a cumulative plot, do the same but change the plot type from bar to cumulative. You can also specify teams, agents, workflows, sites, and exit codes if needed, as sketched below.
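Continuing the sketch above, the same query can be turned into a cumulative plot and, optionally, restricted to a single site. The "site" filter parameter name and its value are assumptions; the text only says that teams, agents, workflows, sites, and exit codes can be specified, without naming the parameters.

```python
# Switch to a cumulative plot and add a hypothetical site filter.
params["plot"] = "cumulative"
params["site"] = "T1_US_FNAL"  # hypothetical filter name and value
url = BASE_URL + "?" + urllib.parse.urlencode(params)
```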
Almost all the queries are performed in the same way, but there are some exceptions.
The sources can be found in the Dashboard SVN repository. There are 2 modules: