Ganga External Monitoring

This page describes the design and implementation strategy for Ganga to do external monitoring via systems such as Monalisa and Dashboard.

Ganga is a client to the external monitoring systems in two places:

  • job on a worker node via job wrapper script generated by Ganga
  • Ganga client (interactive or script session)

This feature may be useful in the following ways:

    • monitoring of the internal activity of the applications (number of processed events)
    • collecting information about the usage of Ganga (aka spyware)
    • potential crosschecking with other monitoring systems (RGMA, Dirac,...)
    • monitoring user actions (for example Dashboard wants to know when the job is submitted)

Implementation (02 Jul 2007)

This is work in progress which sits on this branch in CVS: Ganga-4-4-0-dev-branch-kuba-monitoring-services

Example of a specific monitoring service implementation

OutputServerMS implements a simple debug utility which streams the stdout and stderr back to the client quickly if the application fails to execute.

Ganga/Lib/MonitoringServices/OutputServerMS

How to use it:

  1. Run the server
    % Ganga/Lib/MonitoringServices/OutputServerMS/ganga_output_server.py [port] 
  2. If the server runs on the same machine as your ganga session and uses the default port, skip this step. Otherwise specify the server address like this:
    % export GANGA_OUTPUTSERVERMS_URL=http://server.host.address:port 
  3. Enable the service in ganga
    % ganga
    >>> config['MonitoringServices']['Executable'] =  'Ganga.Lib.MonitoringServices.OutputServerMS.OutputServerMS'
    >>> Job().submit()
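
The lookup described in step 2 could be sketched roughly as below. This is only an illustration of the "environment variable first, local default otherwise" rule; the function name and the fallback URL are assumptions, since the actual default host and port are not given on this page.

```python
import os

# Hypothetical fallback; the real default host and port are not stated on this page.
ASSUMED_DEFAULT_URL = "http://localhost:8080"


def resolve_output_server_url(environ=None, default=ASSUMED_DEFAULT_URL):
    """Return the output server URL, preferring GANGA_OUTPUTSERVERMS_URL."""
    if environ is None:
        environ = os.environ
    # An unset or empty variable falls back to the assumed local default.
    url = environ.get("GANGA_OUTPUTSERVERMS_URL", "").strip()
    return url or default
```

Passing the environment as an argument keeps the helper easy to test without touching the real session environment.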
    

Archive

Work so far (27 Jul 2006)

  • B.G.:
    • implemented Dashboard monitoring inside Athena wrapper script
    • proposed a monitoring interface class
    • use case:
      • Athena exit code is meaningless (always 0)
      • when Athena terminates it produces a log file which is then analyzed by a special tool which produces an xml file
      • the xml file is parsed and the 'proper' exit code is transmitted to Dashboard
  • B.K.:
    • implemented Gaudi Service which produces xml files with event information while Gaudi runs
    • modified Localhost and LSF handler to read the xml files and transmit the information to Monalisa
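
As a rough illustration of the "parse the xml file, then transmit" step in both use cases, the sketch below reads an exit code and an event count from a summary document. The XML layout, the element names and the function name are all assumptions made for this example; the real files produced by the log analysis tool and the Gaudi service are not shown on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical summary layout; the real schema is not documented here.
SAMPLE_SUMMARY = """<summary>
  <exitcode>65</exitcode>
  <events processed="1200"/>
</summary>"""


def read_job_summary(xml_text):
    """Extract the 'proper' exit code and the processed event count."""
    root = ET.fromstring(xml_text)
    # A missing <exitcode> element is treated as success (0) in this sketch.
    exit_code = int(root.findtext("exitcode", default="0"))
    events = root.find("events")
    # Compare with None: an Element without children is falsy.
    processed = int(events.get("processed", "0")) if events is not None else 0
    return {"exit_code": exit_code, "events_processed": processed}
```

The returned dictionary is what a handler would then pass on to Dashboard or Monalisa; the transmission itself is left out because it depends on the monitoring system.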

Implementation strategy

  • Monitoring interface defines send() methods to forward the information and it also defines the list of files which must be shipped to the worker node (e.g. monalisa modules)
  • This interface is used on the client and on the worker node; one monitoring object connects to one monitoring system
  • Core framework automatically extracts the additional files and puts them in _python/GangaWN/Monitoring directory on the worker node
  • There is an implementation of monitoring interface which aggregates monitoring objects and serves as a fan-out to these monitoring objects
    • so from the point of view of the framework and wrapper scripts there is only one monitoring object!
  • For the moment we decided that all backend scripts must be modified to call the monitoring object's send() methods; however, these lines are completely generic, like this:
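 import sys
 # make the modules shipped in _python importable on the worker node
 sys.path.append('./_python')
 from GangaWN.Monitoring import getMonitoringObject
 monit = getMonitoringObject()

 # report that the job has started running
 monit.send_running(jobid,...)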
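
The fan-out idea from the list above might look roughly like the following. It is a minimal sketch under stated assumptions, not the planned implementation: only send_running() is taken from the snippet above, while the class names, get_sandbox_files() and the file names are hypothetical.

```python
class RecordingMonitor:
    """Stand-in for one monitoring system; it only records what it is sent."""

    def __init__(self, name, files):
        self.name = name
        self.files = list(files)
        self.events = []

    def send_running(self, jobid, **info):
        self.events.append(("running", jobid, info))

    def get_sandbox_files(self):
        return list(self.files)


class CompositeMonitor:
    """Fan-out: forwards every call to each wrapped monitoring object."""

    def __init__(self, monitors):
        self.monitors = list(monitors)

    def send_running(self, jobid, **info):
        for monitor in self.monitors:
            try:
                monitor.send_running(jobid, **info)
            except Exception:
                # Assumption: a failing monitoring system must not break the job.
                pass

    def get_sandbox_files(self):
        # Union of the files needed by all systems, duplicates dropped, order kept.
        seen, result = set(), []
        for monitor in self.monitors:
            for name in monitor.get_sandbox_files():
                if name not in seen:
                    seen.add(name)
                    result.append(name)
        return result
```

Because the composite exposes the same methods as a single monitoring object, the framework and the wrapper scripts do not need to know how many monitoring systems are configured.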

Actions (in order)

  1. B.G. sends to this page his monitoring interface

I have attached two files (IMonitoring.py and ARDADashboard.py) to the page. IMonitoring.py is the definition of the methods to be implemented by a monitoring object. Currently, the number of methods is very small:

  • one is called by the JobManager when a job is submitted (it may need to be called from another place). For the dashboard, it needs to know already some application info (like dataset, application version), the Grid Job Id and this kind of things.
  • one is called when building the sandbox. The method returns the list of files to be added to the sandbox for the monitoring to work properly. So far, it was called in the Athena (application) LCG handler in order not to fiddle too much with other applications. However, it sounds more consistent to append files to the sandbox in a common place for all applications.
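
To make the two methods concrete, here is a minimal sketch of what such an interface and a common sandbox hook could look like. It is not the attached IMonitoring.py: the class, method and file names are all hypothetical and chosen only for this illustration.

```python
from abc import ABC, abstractmethod


class IMonitoringSketch(ABC):
    """Hypothetical shape of the two-method interface described above."""

    @abstractmethod
    def job_submitted(self, job_info):
        """Called once at submission with application and Grid job details."""

    @abstractmethod
    def sandbox_files(self):
        """Return the files the worker node needs for monitoring to work."""


class InMemoryMonitor(IMonitoringSketch):
    """Toy implementation that keeps submissions in memory."""

    def __init__(self):
        self.submitted = []

    def job_submitted(self, job_info):
        self.submitted.append(dict(job_info))

    def sandbox_files(self):
        return ["monitoring_helper.py"]  # hypothetical file name


def build_sandbox(application_files, monitor):
    """Common place to append monitoring files, independent of the application."""
    extra = [f for f in monitor.sandbox_files() if f not in application_files]
    return list(application_files) + extra
```

Appending the files in one helper such as build_sandbox(), rather than in each application handler, is the "common place for all applications" option mentioned above.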

There is nothing like a method to be called by the job when it runs on the worker node (however, we definitely need it). This comes from the fact that the Athena/LCG job is a shell script, to which I have added a couple of lines that call the monitoring python script (executable). This script is originally distributed by the dashboard guys for CMS and is not using the interface I have defined. In principle, if we stick to a shell script for Athena/LCG, the python monitoring script should at least be based on the interface so that everything is consistent. I did not do it just because the script exists already.

I believe when the LCG wrapper is less shell script and more python, monitoring will "come for free" thanks to the hooks we will have put in.

  2. B.K. tries to use it in his wrapper scripts (possible revisions of the interface)

For now I am only going to work on the Localhost handler since LSF is playing up. The job wrapper scripts are very similar for both so this shouldn't be too much of a problem except for testing.

  3. We have to make sure HC has introduced the job wrapper script on the LCG (otherwise B.G. cannot move it out of the Athena script)
  4. Somebody will develop the aggregated monitoring object (editor's note: Andrew already volunteered, cool!)
  5. K.M. will make appropriate changes to the core

-- JakubMoscicki - 27 Jul 2006

Topic attachments
Attachment               History  Size   Date              Who          Comment
ARDADashboard.py.txt     r1       2.2 K  2006-07-31 09:35  UnknownUser  Dashboard implementation of IMonitoring interface
DL.py.txt                r1       1.7 K  2006-08-31 11:50  UnknownUser  The main monitoring that imports others depending on application
IMonitoring.py.txt       r1       0.9 K  2006-07-31 09:34  UnknownUser  Interface to implement for monitoring
ListenerResults.py.txt   r1       8.3 K  2006-08-31 11:52  UnknownUser  Parses GAUDI XML log files and within a loop sends to MonALISA (other actions possible)
Topic revision: r6 - 2007-07-02 - JakubMoscicki
 