-- JakubMoscicki - 15 Jun 2006
GangaPlotter: graphical summaries of job statistics using the matplotlib package
Piecharts
plotter.piechart(jobs, attr, **options)
- create a piechart for jobs, using attr as the piechart value. The attr may be a string naming the desired attribute or a one-argument function mapping a job to an arbitrary value.
Examples:
- plotter.piechart(jobs, "backend")
- plotter.piechart(jobs.select(1,300), "status")
- plotter.piechart(jobs, "backend.actualCE")
- plotter.piechart(jobs,lambda j: len(j.subjobs),title='number of subjobs')
- plotter.piechart(jobs,lambda j: len(j.subjobs)==10, title='len(j.subjobs)==10')
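The two forms of attr used in the examples above (a dotted string or a one-argument function) can be pictured with a small sketch. This is not GangaPlotter's implementation: resolve_attr and pie_slices are hypothetical helpers, and the stand-in job objects carry only the attributes needed here.

```python
from collections import Counter
from types import SimpleNamespace


def resolve_attr(job, attr):
    """Return the value that attr selects from job (hypothetical helper)."""
    if callable(attr):
        # A one-argument function is applied to the job directly.
        return attr(job)
    value = job
    # A dotted string such as "backend.actualCE" is followed one name at a time.
    for name in attr.split("."):
        value = getattr(value, name)
    return value


def pie_slices(jobs, attr):
    """Count how many jobs share each attribute value (one count per slice)."""
    return Counter(resolve_attr(job, attr) for job in jobs)


# Stand-in job objects; real Ganga jobs carry many more attributes.
demo_jobs = [
    SimpleNamespace(status="completed", backend=SimpleNamespace(actualCE="ce1"), subjobs=[]),
    SimpleNamespace(status="failed", backend=SimpleNamespace(actualCE="ce2"), subjobs=[1, 2]),
    SimpleNamespace(status="completed", backend=SimpleNamespace(actualCE="ce1"), subjobs=[]),
]

by_status = pie_slices(demo_jobs, "status")
by_ce = pie_slices(demo_jobs, "backend.actualCE")
by_subjobs = pie_slices(demo_jobs, lambda j: len(j.subjobs))
```

Each distinct value becomes one slice, sized by the number of jobs that produced it.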
Barcharts
plotter.barchart(jobs, xattr, yattr, **options)
- create a barchart for jobs, using xattr as the barchart x value and yattr as the barchart y value. The xattr and yattr may each be a string naming the desired attribute or a one-argument function mapping a job to an arbitrary value.
Examples:
- plotter.barchart(jobs,'backend','status',title='Job efficiency on each backend')
- plotter.barchart(jobs,lambda j:j.backend.CE.split(':')[0],'backend.status',title='Job efficiency on each site',xlabel='CE')
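One way to read the barchart examples above is as a cross-tabulation: jobs are counted per x value, split by y value. The sketch below is an assumption about that grouping, not the plotter's own code. bar_groups is a hypothetical helper, it accepts only functions for xattr and yattr, and the CE strings are made up.

```python
from collections import defaultdict
from types import SimpleNamespace


def bar_groups(jobs, xattr, yattr):
    """Count jobs per (x value, y value) pair; both selectors are functions here."""
    counts = defaultdict(lambda: defaultdict(int))
    for job in jobs:
        counts[xattr(job)][yattr(job)] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {x: dict(ys) for x, ys in counts.items()}


def site_of(job):
    """Host part of the CE string, as in the second example above."""
    return job.backend.CE.split(":")[0]


def status_of(job):
    return job.backend.status


def make_job(ce, status):
    """Build a stand-in job with only the attributes this sketch needs."""
    return SimpleNamespace(backend=SimpleNamespace(CE=ce, status=status))


# Made-up CE strings of the form host:port/queue.
demo_jobs = [
    make_job("ce1.example.org:2119/jobmanager-short", "Done"),
    make_job("ce1.example.org:2119/jobmanager-short", "Aborted"),
    make_job("ce1.example.org:2119/jobmanager-long", "Done"),
    make_job("ce2.example.org:2119/jobmanager-short", "Done"),
]

groups = bar_groups(demo_jobs, site_of, status_of)
```

Here each key of groups would be one position on the x axis, and the inner counts the bars drawn at that position.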
Histograms
plotter.histogram(jobs, attr, **options)
The data for a histogram must be numeric (non-numeric data calls for a barchart instead), so few of Ganga's job attributes can be histogrammed directly.
Nevertheless, the dataproc option (also available in the other chart generators) makes some useful histograms possible, for example a histogram of the elapsed time of the LCG jobs.
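Judging from the example further down, dataproc acts as a second mapping step: attr selects a raw value from each job, and dataproc turns that value into the number to be histogrammed. The sketch below illustrates that two-step flow with plain Python. histogram_values and bin_counts are hypothetical helpers, and the equal-width binning is only an illustration, not the binning matplotlib or the plotter applies.

```python
def histogram_values(jobs, attr, dataproc=None):
    """Map each job through attr, then through dataproc when one is given."""
    values = []
    for job in jobs:
        value = attr(job)
        if dataproc is not None:
            value = dataproc(value)
        values.append(value)
    return values


def bin_counts(values, lower, upper, nbins):
    """Count values in nbins equal-width bins covering [lower, upper]."""
    width = (upper - lower) / float(nbins)
    counts = [0] * nbins
    for v in values:
        if lower <= v <= upper:
            # The upper edge is counted in the last bin.
            index = min(int((v - lower) / width), nbins - 1)
            counts[index] += 1
    return counts


# Stand-in data: each "job" is just a dict holding a duration string.
demo_jobs = [{"elapsed": "12s"}, {"elapsed": "47s"}, {"elapsed": "55s"}, {"elapsed": "90s"}]

seconds = histogram_values(
    demo_jobs,
    attr=lambda j: j["elapsed"],                   # step 1: pick the raw attribute
    dataproc=lambda text: int(text.rstrip("s")),   # step 2: turn it into a number
)
counts = bin_counts(seconds, lower=0, upper=100, nbins=5)
```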
Example: application runtime histogram of the LCG jobs
In this example, we parse the file j.outputdir/__jobscript__.log to get the start and stop times of the application executable. First, define a parser as follows:
import os.path
import re
import time

def get_app_runtime(filepath):
    """Return the application runtime in seconds, parsed from a job script log."""
    # The timestamp is the text preceding the "[Info]" tag on the start and end lines.
    re_timebeg = re.compile(r'^(.*)\s+(\[Info\])\s+(Load application executable).*$')
    re_timeend = re.compile(r'^(.*)\s+(\[Info\])\s+(GZipping stdout and stderr).*$')
    time_format = '%a %b %d %H:%M:%S %Y'

    timebeg_str = ''
    timeend_str = ''
    timebeg_sec = 0
    timeend_sec = 0

    if os.path.exists(filepath):
        with open(filepath) as f:
            for l in f:
                line = l.strip()
                matches = re_timebeg.match(line)
                if matches:
                    timebeg_str = matches.group(1)
                    continue
                matches = re_timeend.match(line)
                if matches:
                    timeend_str = matches.group(1)
                    break  # nothing of interest follows the end marker

    if timebeg_str and timeend_str:
        timebeg_sec = time.mktime(time.strptime(timebeg_str.strip(), time_format))
        timeend_sec = time.mktime(time.strptime(timeend_str.strip(), time_format))

    # Evaluates to 0 when the file is missing or either marker was not found.
    return timeend_sec - timebeg_sec
It is handy to save this function in a script and load it into Ganga via execfile() each time you start a new Ganga session.
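To make the parser's expectations concrete, here is a small sketch of the timestamp arithmetic on two sample log lines. The log layout (a ctime-style timestamp, then "[Info]", then the message) is inferred from the regular expressions and the time format in the parser, and the sample lines themselves are made up. The sketch uses datetime rather than time.mktime so that the result does not depend on the local time zone.

```python
from datetime import datetime

# Assumed log layout: timestamp, "[Info]" tag, message. The lines are invented.
sample_log = """\
Thu Jun 15 10:00:05 2006 [Info] Load application executable
Thu Jun 15 10:02:35 2006 [Info] GZipping stdout and stderr
"""

TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def timestamp_of(line):
    """Parse the text before '[Info]' as a timestamp."""
    return datetime.strptime(line.split("[Info]")[0].strip(), TIME_FORMAT)


lines = sample_log.splitlines()
# Runtime is the gap between the start marker and the end marker.
runtime_sec = (timestamp_of(lines[1]) - timestamp_of(lines[0])).total_seconds()
```

For these two lines the runtime comes out as 150 seconds.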
Now we can generate a job elapsed time histogram:
plotter.histogram(jobs.select(status='completed'), attr=lambda j:j.outputdir+'/__jobscript__.log', dataproc=get_app_runtime, title='Application Runtime', xlabel='second', label='runtime')
and this is the result:
Scatter plots
plotter.scatter(jobs, xattr, yattr, **options)
- create a scatter plot for jobs, using xattr as the x value and yattr as the y value. The xattr and yattr may each be a string naming the desired attribute or a one-argument function mapping a job to an arbitrary value. The data can also be grouped by an optional attribute cattr; data belonging to different groups are distinguished by color and marker.
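The grouping by cattr can be sketched as follows. This is only an illustration of the idea, not the plotter's code: group_points and assign_styles are hypothetical helpers, the data points are invented, and the color and marker codes are ordinary matplotlib format characters chosen arbitrarily.

```python
from itertools import cycle


def group_points(points):
    """Group (x, y, category) triples into {category: [(x, y), ...]}."""
    groups = {}
    for x, y, category in points:
        groups.setdefault(category, []).append((x, y))
    return groups


def assign_styles(categories, colors=("b", "g", "r", "c"), markers=("o", "s", "^", "x")):
    """Give each category a (color, marker) pair, cycling when the pairs run out."""
    style_cycle = cycle(zip(colors, markers))
    return {category: next(style_cycle) for category in sorted(categories)}


# Invented data: (x value, y value, group label).
points = [(120, 300, "ch"), (95, 410, "it"), (130, 280, "ch"), (60, 150, "tw")]

groups = group_points(points)
styles = assign_styles(groups)
```

Each group would then be drawn as its own series with its assigned color and marker, which is what lets the groups be told apart in the plot.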
Example: application runtime (on X-axis) vs. job turnaround time (on Y-axis) of the LCG jobs, data grouped by country
We reuse the function get_app_runtime given above to process the job data for the plot.
plotter.scatter(jobs.select(status='completed'), xattr=lambda j:j.outputdir+'/__jobscript__.log', yattr=lambda j:(j.time.final() - j.time.submitted()).seconds, cattr='backend.actualCE', xdataproc=get_app_runtime, cattrext='by_country', title='Application Runtime v.s. Job Turnaround Time', xlabel='app runtime (sec)', ylabel='job turnaround (sec)', deep=True)
and this is the result:
Deep looping on sub-sub-sub-...-jobs
By default, the plotter loops over all sub-job levels of the given (top-level) jobs to extract the information for generating the plots. This can be disabled by passing the optional keyword argument deep=False to the plotter commands.
plotter.barchart(jobs,'status',deep=False)
Examples:
- plotter.piechart(jobs,'status',title='with subjob deep looping')
- plotter.piechart(jobs,'status',title='without subjob deep looping',deep=False)
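The difference between the two modes can be sketched as a recursive traversal. This is an assumption about the behaviour, not the plotter's code: iter_jobs is a hypothetical helper, and it assumes that with deep looping a job that has subjobs is replaced by its subjobs at every level rather than counted itself, which the real plotter may handle differently.

```python
from types import SimpleNamespace


def iter_jobs(jobs, deep=True):
    """Yield the jobs whose attributes would be plotted (hypothetical helper)."""
    for job in jobs:
        subjobs = getattr(job, "subjobs", [])
        if deep and subjobs:
            # Assumption: descend instead of yielding the parent, at every level.
            for sub in iter_jobs(subjobs, deep=True):
                yield sub
        else:
            yield job


def make_job(name, subjobs=()):
    """Build a stand-in job with a name and a list of subjobs."""
    return SimpleNamespace(name=name, subjobs=list(subjobs))


demo_jobs = [
    make_job("1", [make_job("1.0"), make_job("1.1", [make_job("1.1.0"), make_job("1.1.1")])]),
    make_job("2"),
]

deep_names = [j.name for j in iter_jobs(demo_jobs)]
top_names = [j.name for j in iter_jobs(demo_jobs, deep=False)]
```

With deep=False only the two top-level jobs are visited, while the default visits the four jobs at the bottom of the hierarchy.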