How to plug your application into DIANE?

This manual covers the python application adapters. For C++ application adapters, have a look at the G4Analysis application for an example.

How the framework works

The easiest way is to start from an existing application such as simpleTest (browse the source code). Copy $DIANE_TOP/dev/applications/simpleTest into $DIANE_TOP/dev/applications/YourApp and modify the file names accordingly. An alternative location for application plugins is the ~/diane.workspace/applications directory; this location is controlled by the DIANE_USER_WORKSPACE variable.

An application adapter is a python module which defines Planner, Integrator and Worker classes. DIANE calls the appropriate methods on these objects to run the application-specific functionality. One instance of Planner and Integrator lives inside the Master agent, while an instance of Worker lives inside every Worker agent.

DIANE automatically transmits the data messages between application objects (the arguments and results of the method calls). The data messages can be any python structure which can be pickled (pickle is a standard python persistency utility). So they can be simple types, lists, dictionaries, standard python objects, etc.
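For example, a task's data message could simply be a dictionary (the key names below are purely illustrative):

# purely illustrative: any picklable python structure works as a data message
taskData = {'inputFile': 'run_042.dat', 'firstEvent': 0, 'numberOfEvents': 1000}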

When a job is started, the job data (description) is loaded from a .job file (examples are in $DIANE_TOP/dev/workspace) and passed to the Planner.env_createPlan(self,jobData,...) and Integrator.env_init(jobData) methods.
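Assuming the .job file is a small python fragment (as the InputFiles example further down suggests), a hypothetical sketch might look like this; all key names are placeholders chosen by your adapter:

# hypothetical MyApp.job -- the JobData structure is entirely application-specific
JobData = {'inputFiles': ['run_001.dat', 'run_002.dat'],
           'eventsPerTask': 1000}
InputFiles = []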

Planner.env_createPlan(self,jobData) defines how the job should be split into tasks. A task is a single unit of work which a worker executes. This method returns a tuple (workerInitData, taskDataList), where taskDataList is a list with one entry per task.
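A minimal Planner sketch; only the method name and the shape of the returned tuple come from the description above, the jobData keys are hypothetical:

class Planner:
    def env_createPlan(self, jobData):
        # Split the job into independent tasks; here (purely as an
        # illustration) every input file becomes one task.
        workerInitData = {'application': 'MyApp'}
        taskDataList = [{'inputFile': f} for f in jobData['inputFiles']]
        return workerInitData, taskDataList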

Worker.env_init(self,workerInitData) is called exactly once by each Worker agent when it comes alive (the workerInitData produced by Planner.env_createPlan is transparently transmitted over the network and passed on). Return False if there is an initialization error. DIANE will also handle any exceptions, so you do not need to worry about them.

Whenever a worker agent is free, the master agent assigns a new task to it. This is done by calling Worker.env_performWork(self, taskData), where taskData has been produced by the Planner. This method returns a tuple (taskOutputData, status), where status should be False if you want to report errors in a graceful way.
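A matching Worker sketch; the "work" done in env_performWork and the data keys are placeholders:

class Worker:
    def env_init(self, workerInitData):
        # Called once per worker agent; return False on an initialization error.
        self.application = workerInitData['application']   # hypothetical key
        return True

    def env_performWork(self, taskData):
        # Process one task and return (taskOutputData, status).
        try:
            data = open(taskData['inputFile']).read()       # placeholder "work"
            return {'inputFile': taskData['inputFile'], 'size': len(data)}, True
        except IOError:
            return None, False                              # graceful error report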

The taskOutputData is passed to Integrator.env_addPartialOutput(self, taskOutputData), which saves the task output or merges it with the previous results, whatever your application needs.

Integrator.env_getResult() is called once, when all processing has finished. Change: in DIANE 1.4.7 Integrator.env_getResult() was called every time a new task result arrived, which was pretty useless.
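And a matching Integrator sketch; the merging logic is only an illustration:

class Integrator:
    def env_init(self, jobData):
        # Called once on the Master when the job data is loaded.
        self.results = []

    def env_addPartialOutput(self, taskOutputData):
        # Save or merge the output of every finished task.
        self.results.append(taskOutputData)

    def env_getResult(self):
        # Called once, when all processing has finished.
        return self.results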

Customizing the worker environment

You can customize the environment of the worker process in a BOOT class in your application adapter (__init__.py file). The following methods may be used to modify the worker agent wrapper script:

  • preconfig returns a csh-script text which is pasted into the worker wrapper before the DIANE configuration scripts are sourced
  • postconfig returns a csh-script text which is executed just before the worker agent process is started
  • postworker returns a csh-script text which is executed after the worker agent process has terminated

The BOOT class may be used for many things like:

  • setting network parameters (e.g. the ORBgiopMaxMsgSize environment variable controls the size of the data message buffer)
  • fixing a wrong environment (or emulating another environment)
  • installing a private copy of the application adapter

An example from the G4Production application's __init__.py file:

class BOOT:
    def preconfig(self,backend,spec):
        # WE ARE STREAMING LARGER FILES, SO INCREASE THE MESSAGE BUFFER
        text = """setenv ORBgiopMaxMsgSize 35000000"""
        text += '\n'

        # IMITATE A GRID WORKER NODE IF RUNNING LOCALLY
        if backend.lower() in ['lsf','localhost']:
            text += """
if (${?LD_LIBRARY_PATH} == 0) then
setenv LD_LIBRARY_PATH
endif

setenv LD_LIBRARY_PATH ${PWD}:${LD_LIBRARY_PATH}

# HACK: make a fake VO sw variable so that 'grid-sw' installation mode (worker.init) will work
# in the same way as in the Grid WNs
setenv VO_GEAR_SW_DIR /afs/cern.ch/sw/arda/install/ITU/
"""

        if backend.lower() in ['lcg','glite']:
            text += """# nothing needed
""" % config

            text += '\n'

        return text

    def postconfig(self,backend,spec):
        return ""

    def postworker(self,backend,spec):
        return ""

boot = BOOT()

The BOOT object is always created and used on the Master host before the worker agent submission. The spec parameter corresponds to the original .job file which contains the JobData, InputFiles, etc.

Uploading your locally modified application to the worker node

Currently, the management of application adapters is pretty much centralized. The DIANE worker agent installs itself automatically from a CERN web server, and typically the application adapters contained in the distribution are used. However, if you want to customize this process, you can do it in the BOOT class of your application adapter (__init__.py file). Below is a complete recipe in which we send a tarball containing the application adapter to the worker node in the input sandbox.

1. Make a private copy of the application adapter on your local machine

The new application adapter has to be copied to the directory defined in $DIANE_USER_WORKSPACE (default is ~/diane.workspace). The tree structure of the adapter in this directory must be $DIANE_USER_WORKSPACE/applications/<ApplName>, where <ApplName> is the application directory (e.g. G4Production, xmipp).

2. Modify the BOOT class in the __init__.py file

In order to extract the .tgz file at the worker node, and to store it in the appropriate location, the application adapter must include the following modification in the BOOT class in the __init__.py file:

        #***********************************************
        if backend.lower() in ['lcg','glite']:
            text = """
## HERE MIGHT BE SOME OTHER COMMANDS
tar xfzv <ApplName>.tgz
"""
        #***********************************************
where <ApplName>.tgz is the gzipped tar archive of the application directory, created as explained below.

3. Create a tarball containing your private application adapter

cd ~
tar cfzvh <ApplName>.tgz diane.workspace/

The tarball should contain the following directory structure:

diane.workspace
diane.workspace/<ApplName>

4. Specify the application tarball in the *.job file

In the job data file (*.job) the new application adapter is included as follows:

InputFiles = ['/my_home_dir/<ApplName>.tgz']

You should specify the absolute path to the <ApplName>.tgz file.

How to stream the files

Small files may be streamed inside the data messages. If your files get larger, you will have to increase the ORBgiopMaxMsgSize parameter to the same value on both sides of the wire (Master and Worker). Typically you read the content of the file and ship it together with its name in a python tuple. For convenience you could also use the helper classes:

from DIANE.user import File, Base64BinaryFile

You simply do File.read(fn) and return it from Worker.env_performWork(). The Integrator.env_addPartialOutput() can then do f.write(), which will recreate the file on the Master host.
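For example (a sketch only; 'histograms.root' is a placeholder name for a file produced by the application):

from DIANE.user import File

class Worker:
    def env_performWork(self, taskData):
        # ... run the application, which writes 'histograms.root' locally ...
        return File.read('histograms.root'), True

class Integrator:
    def env_addPartialOutput(self, taskOutputData):
        # Recreate the file in the current directory on the Master host.
        taskOutputData.write()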

These classes are not magic. They are roughly equivalent to this code:

class File:
    def __init__(self,name):
        # Keep the file name and load the whole content into memory,
        # so the object can be pickled and shipped in a data message.
        self.name = name
        self.text = open(name,'r').read()

    def write(self):
        # Recreate the file in the current directory on the receiving side.
        import os
        dest = os.path.basename(self.name)
        open(dest,'w').write(self.text)
        self.name = dest

    def read(name):
        # Convenience constructor: File.read(fn) returns a File object.
        return File(name)

    read = staticmethod(read)

-- JakubMoscicki - 18 Oct 2006
