Octopus Monitoring

GangaOctopus is currently not working

Please note that at the present time (Sep. 2009) GangaOctopus is not functional. We keep the documentation below for reference, but please do not try to use it. Other solutions will hopefully be available in the near future.

Introduction

The Octopus monitoring service allows you to peek into files within the sandbox of a running job, such as the standard output or stderr, but also log files that grow while the job runs. It works via an Octopus server to which both the job and your local client need to connect. The job-wrapper can then send the requested information via the Octopus server to the client. The Octopus server can serve many thousands of concurrent data streams. Your local client and your job rendezvous via a 64-bit channel number that is generated randomly when you submit the job and sent with the job to the worker node. The current (4.4.1) Ganga implementation does not encrypt the data you send via the monitoring service; however, an attacker would need to guess the right channel number.
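To give a feel for the size of the channel space, here is a minimal sketch of drawing a random 64-bit channel number in Python. It is not the Ganga code: the function name `new_channel_id` is hypothetical, and the use of the standard `secrets` module is an assumption, since this page only says that the number is generated randomly.

```python
import secrets

CHANNEL_BITS = 64  # channel width stated in the text above


def new_channel_id():
    """Return a random channel number in the range [0, 2**64)."""
    # Hypothetical helper: the actual Ganga generator is not shown on this page.
    return secrets.randbits(CHANNEL_BITS)


if __name__ == "__main__":
    channel = new_channel_id()
    print("channel id: %d" % channel)
    # Chance that one blind guess hits one particular channel number.
    print("single-guess probability: %.3g" % (1.0 / 2 ** CHANNEL_BITS))
```

Since the data itself is not encrypted, the channel number is the only secret in this scheme and should be treated like a password.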

The Octopus service can buffer some amount (default: 150KB) of output internally, so that your local client can connect to the Octopus server after the job has started to run and still read at least this buffered amount of output. The service also leaves the channel open for some minutes (default: 15) so that a job that crashes quickly can be inspected post-mortem. The pre-production phase is intended to determine optimal values for the buffering time and the buffer size.
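The buffering behaviour can be pictured as a bounded buffer that keeps only the most recent output. The sketch below is an illustration under assumptions, not the Octopus server code: the class name `TailBuffer` and its methods are hypothetical, and 150KB is taken to mean 150 * 1024 bytes.

```python
class TailBuffer:
    """Keep only the most recent max_bytes of a byte stream."""

    def __init__(self, max_bytes=150 * 1024):  # assumed meaning of "150KB"
        self.max_bytes = max_bytes
        self._data = bytearray()

    def append(self, chunk):
        """Add new output and drop the oldest bytes beyond the limit."""
        self._data.extend(chunk)
        overflow = len(self._data) - self.max_bytes
        if overflow > 0:
            del self._data[:overflow]

    def snapshot(self):
        """Return what a late-connecting client would still be able to read."""
        return bytes(self._data)


if __name__ == "__main__":
    buf = TailBuffer(max_bytes=5)
    buf.append(b"abc")
    buf.append(b"defg")
    print(buf.snapshot())
```

A client that connects late sees only the tail of the output, which is why the buffer size matters for jobs that print a lot before the client attaches.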

The intention is to set up one dedicated Octopus service for Atlas and another one for LHCb. For now only a development server is available.

How to use the service

During the current pre-production phase you need to point Ganga manually to the right Octopus server; in the future this could be done in the experiment or application settings. Before starting Ganga, set the following environment variable:
export GANGA_OCTOPUS_SERVER=ganga-o.cern.ch

The port number can be set with the environment variable GANGA_OCTOPUS_PORT, but the default (8882) is usually sufficient.
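As an illustration of how a client might resolve these two settings, the following sketch reads both variables and falls back to the default port. The function name `octopus_endpoint` and the error raised for a missing server are assumptions; only the variable names and the default port 8882 come from this page.

```python
import os

DEFAULT_OCTOPUS_PORT = 8882  # default port quoted above


def octopus_endpoint(environ=None):
    """Return (host, port) from GANGA_OCTOPUS_SERVER and GANGA_OCTOPUS_PORT."""
    # Hypothetical helper, not part of Ganga; environ can be injected for testing.
    env = os.environ if environ is None else environ
    host = env.get("GANGA_OCTOPUS_SERVER")
    if not host:
        raise RuntimeError("GANGA_OCTOPUS_SERVER is not set")
    port = int(env.get("GANGA_OCTOPUS_PORT", DEFAULT_OCTOPUS_PORT))
    return host, port


if __name__ == "__main__":
    print(octopus_endpoint({"GANGA_OCTOPUS_SERVER": "ganga-o.cern.ch"}))
```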

To enable the Octopus monitoring in the job-wrapper, you need to set in .gangarc:

[MonitoringServices]
Executable/* = Ganga.Lib.MonitoringServices.Octopus.OctopusMS
or
[MonitoringServices]
Executable/LCG = Ganga.Lib.MonitoringServices.Octopus.OctopusMS

Now you can start Ganga. After submitting the job, there are two ways to read a file remotely: via the external oreader.py tool, or from within Ganga. The oreader is located under install/ganga/python/Ganga/Lib/MonitoringServices/Octopus and is started as

oreader <channel-id> [file in sandbox]
The default file read remotely is stdout. The reader contacts the Octopus server and waits for output written to the requested file. It terminates when everything has been read and the job has finished. The channel number is printed by the job handler when the job is started and must be copied and pasted manually. This is a temporary solution.

The second method is an extension of the peek command written by Hung-Chung. It is described in the section "Peeking the stdout of a running job on LCG".

How to use the service with an Athena job

Before starting Ganga set the environment variable:

export GANGA_OCTOPUS_SERVER=ganga-o.cern.ch

After starting Ganga add the Octopus monitoring plugin to the global configuration of the Athena MonitoringService with:

config['MonitoringServices']['Athena/LCG']='Ganga.Lib.MonitoringServices.ARDADashboard.LCG.ARDADashboardLCGAthena.ARDADashboardLCGAthena, Ganga.Lib.MonitoringServices.Octopus.OctopusMS'
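The value assigned above is a comma-separated list of plugin class names. If other plugins are already configured, a small helper can append the Octopus plugin without duplicating it. This is a sketch only: `add_monitoring_service` is a hypothetical helper, not a Ganga function, and it assumes the comma-separated format shown above.

```python
DASHBOARD = ("Ganga.Lib.MonitoringServices.ARDADashboard.LCG."
             "ARDADashboardLCGAthena.ARDADashboardLCGAthena")
OCTOPUS = "Ganga.Lib.MonitoringServices.Octopus.OctopusMS"


def add_monitoring_service(current, plugin):
    """Append plugin to a comma-separated plugin list unless already present."""
    # Hypothetical helper: splits on commas, ignores empty entries and spaces.
    names = [name.strip() for name in current.split(",") if name.strip()]
    if plugin not in names:
        names.append(plugin)
    return ", ".join(names)


if __name__ == "__main__":
    print(add_monitoring_service(DASHBOARD, OCTOPUS))
```

In a Ganga session the returned string would be assigned to config['MonitoringServices']['Athena/LCG'] in the same way as the line above.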

Submit your jobs as usual.

Peeking the stdout of a running job on LCG

The peek() method of the LCG handler is implemented as an alternative client of the Octopus monitoring service, allowing users to peek at a job's standard output while the job is running on a remote worker node. To enable this feature, follow the configuration steps described above.

If the Octopus monitoring service is enabled at job submission time, the LCG backend handler keeps the service information, including the channel id needed to pick up the job's stdout cached on the Octopus server. The user does not need to record the channel id separately.

To peek the stdout of a running LCG job:

j.peek()

The default implementation of the job's peek method triggers the backend-specific implementation only when the job is in the running state. Because the job may already be running and producing output even though its status is still reported as submitted (due to late information updates in the LCG LB), a backend method is exposed that allows users to look at the stdout before the job reaches the running state:

j.backend.inspect()

-- BirgerKoblitz - 19 Sep 2007

Topic revision: r6 - 2009-09-16 - BjornS
 