Dashboard ProdAgent Monitoring

  • Common schema of Dashboard ProdAgent monitoring:
    PAMonitoring_small.jpg


The main goal of PA monitoring is to collect job monitoring information from the local PA databases into the Dashboard repository and to provide an interface to that data. The application is installed on lxarda18 and consists of two parts:

  • An action that receives XML reports containing job processing information and records them in the Dashboard repository.
    It can be tested with the command
    curl -d report="$(cat prodmon_report.xml)" http://lxarda18/dashboard/request.py/getPAinfo
    An example report file, prodmon_report.xml, is attached to this topic.

  • A web interface for retrieving the collected data, which is used to build plots on the CMS Prodmon page. To get plot data, use a URL of the form
     http://lxarda18/dashboard/request.py/PAquery?list_of_parameters

    The obligatory parameters are:
| Parameter | Values | Description |
| plot | pie, bar, cumulative, data | Type of the plot |
| what | evts_read_failed, evts_read_success, evts_read_any (both success and failed) | Sum of the events read |
| | evts_written_failed, evts_written_success, evts_written_any | Sum of the events written |
| | time_failed, time_success, time_any | SUM("jobs_length"/3600) |
| | jobs_failed, jobs_success, jobs_any | SUM("inst_num") |
| | CMSSW_fail_codes | Data for the "CMSSW Application failure codes" plot |
| | time_by_span | Data for the "Approximate resources utilized" or "Approximate batch slot count by ..." plots |
| | reco_rate | Data for the "Tier 0 reco rate" plot |
| | jobs_success_rates | Data for the "Jobs success rates" plot |
| | dts_timings | Data for the "Unnormalized dataset timings" plot |
| | jobs_failure_types | Data for the "Job failure type" query |
| sortby | prod_team, prod_agent, wf, dts, site, exit_code | Grouping key, like the GROUP BY clause in a query |
| job_type | Merge, Processing, Any | Type of the jobs |
| starttime | Date in the form YYYY-mm-dd hh24:mi:ss | For example, 2010-06-28 13:54:17 |
| endtime | Same format as starttime | |

Optional parameters

| Parameter | Values | Default value | Description |
| prod_team | a team name or a list of them | any team | For example, prod_team=Team1&prod_team=Team2&... |
| dts | a dataset name or a list of them | any dataset | dts=dts1&dts=dts2&... |
| prod_agent | a production agent name or a list of them | any agent | prod_agent=Agent1&... |
| wf | a workflow name or a list of them | any workflow | wf=wf1&wf=wf2&... |
| site | a site name or a list of them | any site | site=site1&... |
| span | 3600, 86400 | 3600 | Use span=3600 when retrieving data for the previous 7 days and span=86400 in other cases |
| exit_code | exit code numbers | any exit code | |
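
The two parameter lists above map onto a query string fairly directly. The following is a minimal Python sketch of that mapping, not part of the Dashboard code: the helper name build_paquery_url and the REQUIRED tuple are hypothetical, and it assumes the server accepts standard form encoding (spaces as "+", colons as "%3A") for the date values. List-valued optional filters are emitted as repeated keys, matching the prod_team=Team1&prod_team=Team2 form shown in the table.

```python
from datetime import datetime
from urllib.parse import urlencode

# Base URL taken from this page; the helper itself is only an illustration.
BASE_URL = "http://lxarda18/dashboard/request.py/PAquery"

# Obligatory parameters, as listed in the first table.
REQUIRED = ("plot", "what", "sortby", "job_type", "starttime", "endtime")

# Date layout from the table, e.g. 2010-06-28 13:54:17
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_paquery_url(required, optional=None):
    """Return a PAquery URL (hypothetical helper, assumed encoding rules)."""
    missing = [name for name in REQUIRED if name not in required]
    if missing:
        raise ValueError("missing obligatory parameters: " + ", ".join(missing))

    params = []
    for name in REQUIRED:
        value = required[name]
        # datetime objects are rendered in the documented date layout.
        if isinstance(value, datetime):
            value = value.strftime(TIME_FORMAT)
        params.append((name, value))

    # An optional filter may hold one value or a list; a list becomes
    # repeated keys such as prod_team=Team1&prod_team=Team2.
    for name, value in (optional or {}).items():
        values = value if isinstance(value, (list, tuple)) else [value]
        params.extend((name, item) for item in values)

    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_paquery_url(
        {
            "plot": "pie",
            "what": "evts_read_any",
            "sortby": "site",
            "job_type": "Merge",
            "starttime": datetime(2010, 6, 27, 13, 30),
            "endtime": datetime(2010, 6, 28, 13, 30),
        },
        {"prod_team": ["Team1", "Team2"], "span": 3600},
    ))
```

The sketch only checks that the obligatory parameters are present; it does not validate their values against the lists in the table.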

Some example URLs to get data for a plot:

http://lxarda18/dashboard/request.py/PAquery?starttime=2010-06-27 13:30:00&endtime=2010-06-28 13:30:00&what=evts_read_any&job_type=Merge&plot=pie&sortby=site
  • Jobs by exit codes
http://lxarda18/dashboard/request.py/PAquery?starttime=2010-06-25 13:30:00&endtime=2010-06-27 13:30:00&what=jobs_any&job_type=Merge&plot=bar&sortby=exit_code
To get a cumulative plot, use the same URL but change the plot type from bar to cumulative. You can also restrict the query to particular teams, agents, workflows, sites or exit codes if needed.
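
Switching an existing query from a bar plot to a cumulative plot amounts to rewriting one parameter. A small Python sketch of that step, using only the standard library, might look like the following. The function name with_plot_type is hypothetical, and the re-encoded URL uses standard form encoding for the date values, which the server is assumed to accept.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_plot_type(url, plot_type):
    """Return the URL with its 'plot' parameter replaced (illustrative helper)."""
    parts = urlsplit(url)
    # Keep every parameter in its original order, swapping only 'plot'.
    pairs = [
        (key, plot_type if key == "plot" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


if __name__ == "__main__":
    bar_url = (
        "http://lxarda18/dashboard/request.py/PAquery"
        "?starttime=2010-06-25 13:30:00&endtime=2010-06-27 13:30:00"
        "&what=jobs_any&job_type=Merge&plot=bar&sortby=exit_code"
    )
    print(with_plot_type(bar_url, "cumulative"))
```

All other parameters, including repeated filters such as several site values, pass through unchanged.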

Almost all the queries are performed in the same way but there are some exceptions.
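
The report upload action from the first part can be exercised from a script as well as with curl. The sketch below is a rough Python equivalent of the curl test shown earlier, under stated assumptions: the function names are hypothetical, and unlike curl -d, which sends the value as is, it percent-encodes the XML so that characters such as & survive form decoding. What the server returns is not documented on this page, so the response is passed back uninterpreted.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl command on this page.
REPORT_URL = "http://lxarda18/dashboard/request.py/getPAinfo"


def build_report_request(xml_text, url=REPORT_URL):
    """Build a POST request carrying the XML in the 'report' form field."""
    body = urlencode({"report": xml_text}).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send_report(path, url=REPORT_URL, timeout=30):
    """Post a report file; return the HTTP status and the raw response text."""
    with open(path, encoding="utf-8") as handle:
        request = build_report_request(handle.read(), url)
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Requires network access to lxarda18 and a local prodmon_report.xml.
    print(send_report("prodmon_report.xml"))
```

Building the request separately from sending it makes it possible to inspect the encoded body without contacting the server.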

The sources are in the Dashboard SVN, in two modules: arda.dashboard.dao-oracle-cmspa and arda.dashboard.web-cmspa.

-- IrinaSidorova - 25-Jun-2010

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| PAMonitoring_small.jpg | r1 | 54.4 K | 2010-06-25 - 20:07 | UnknownUser | Common schema of Dashboard ProdAgent monitoring |
| prodmon_report.xml | r1 | 0.8 K | 2010-06-25 - 21:19 | UnknownUser | An example of a prodmon report |
Topic revision: r2 - 2010-06-28 - unknown
 