DIANE Frequently Asked Questions and Cookbook

Installation

Which platforms are supported?

See DIANE2Installation.

The diane-run command gives: ImportError: No module named omniORB

Make sure you followed the installation instructions and set up the environment. See DIANEEnvironmentScripts for more details. If things still do not work, see DIANEPlatformDoctor.

WorkerAgent or RunMaster do not start: ImportError: _omnipymodule.so:undefined symbol: PyUnicodeUCS4_DecodeUTF16

This happens when the Python interpreter on your system was compiled with a different Unicode option (a different width of Unicode characters, UCS2 versus UCS4) than the interpreter used to build the omniORB libraries. The precompiled libraries for our default platform are therefore not compatible with the Python on your operating system.

See DIANEPlatformDoctor.
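
To see which kind of interpreter you are running, you can inspect sys.maxunicode. The snippet below is a small standalone check, not part of DIANE; the helper name unicode_build is made up for this example. A missing PyUnicodeUCS4_* symbol suggests that the library expects a UCS4 (wide) interpreter, so a "UCS2" answer here would be consistent with the error above.

```python
import sys


def unicode_build(maxunicode=None):
    """Return 'UCS4' for wide-unicode interpreters and 'UCS2' for narrow ones."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Wide builds can represent every code point up to U+10FFFF;
    # narrow builds stop at U+FFFF.
    if maxunicode > 0xFFFF:
        return "UCS4"
    return "UCS2"


print("This interpreter is a %s build" % unicode_build())
```

Run it with the same python executable that DIANE uses. Note that Python 3.3 and later always report the wide range, so the check is only meaningful for older interpreters.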

Cannot install from the central web repository at cern.ch (e.g. a cluster without web access)

You may install DIANE from alternative locations (installation mirrors); see DIANEInstallationMirroring.

Environment

ImportError: No module named diane.PACKAGE

These problems are probably caused by a shell configuration which resets PATH and other environment variables whenever a new shell is started. Try setting the environment again by running $(/.../diane-env) in the new shell. See DIANEEnvironmentScripts for more details.
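
A quick way to see what the current shell actually provides is to look for the DIANE commands in PATH and to try importing the diane package. This is only a diagnostic sketch; the helper names find_in_path and diane_importable are invented here and are not DIANE tools.

```python
import os


def find_in_path(program, path=None):
    """Return the full path of `program` if it is an executable in PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def diane_importable():
    """Return True if the 'diane' package can be imported by this interpreter."""
    try:
        import diane  # noqa: F401
    except ImportError:
        return False
    return True


for tool in ("diane-env", "diane-run"):
    print("%-10s %s" % (tool, find_in_path(tool) or "NOT FOUND in PATH"))
print("diane package importable: %s" % diane_importable())
```

If the commands are found in your login shell but not in a freshly started one, the shell start-up files are the likely culprit.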

ERROR: cannot set up DIANE environment (diane-env not in PATH)

See above.

diane-run: command not found

See above.

Connectivity

Is inbound/outbound connectivity required?

The Master host must have inbound connectivity. The worker hosts (farm nodes, Grid nodes, ...) need only outbound connectivity. On the Master host you should therefore open at least one port in the firewall. If you want to use the Directory Service, then another open port is needed.

In addition, your jobs are monitored by an external service (http://gangamon.cern.ch). It requires outbound TCP connections from the Master and from the Workers.
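
To test outbound connectivity from a worker node or from the Master host, a plain TCP connection attempt is usually enough. The sketch below is a generic check and not part of DIANE; can_connect is a name invented for this example, and the host and port in the comment are placeholders (which port the monitoring service uses is not stated on this page).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, unresolved host name or timed out.
        return False
    sock.close()
    return True


# Example (placeholders): run on a worker node to test the route to the master.
# print(can_connect("master.host.example", 22000))
```

A False result from a worker node towards the Master port typically points to a firewall between the two hosts.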

How to specify the port number for the Master?

Example: suppose that port 22000 is open on the master host for inbound traffic and that your run file is called run.py.

Choose any of the options below.

Option one:

> export ORBendPoint=giop:tcp::22000
> diane-run run.py

Option two:

> env ORBendPoint=giop:tcp::22000 diane-run run.py

Option three:

Add the following lines to the run.py file:

default_master = """
endPoint = giop:tcp::22000
"""

And then:

> diane-run run.py
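
If you start the master from a script, the second option can be reproduced by passing a modified environment to the child process. The following is a minimal sketch: the helper names are invented for this example, the host argument assumes the usual omniORB endpoint form giop:tcp:<host>:<port>, and port_is_free only tells you that nothing else on the master host is bound to the port (it says nothing about the firewall).

```python
import os
import socket


def endpoint(port, host=""):
    """Build an omniORB endpoint string such as 'giop:tcp::22000'."""
    return "giop:tcp:%s:%d" % (host, port)


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        return False
    finally:
        sock.close()


def master_environment(port, base=None):
    """Return a copy of the environment with ORBendPoint set for `port`."""
    env = dict(os.environ if base is None else base)
    env["ORBendPoint"] = endpoint(port)
    return env


# Usage, equivalent to option two above:
#   import subprocess
#   subprocess.call(["diane-run", "run.py"], env=master_environment(22000))
```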

How to set options for transport layer (omniORB)?

Any option for the omniORB transport layer may be set as in the example above.

Full description is here: DIANE2OmniOrbConfiguration

More information on other options available from the DIANE transport layer (omniORB): http://omniorb.sourceforge.net/omni41/omniORB/

Connectivity problems?

If you cannot connect to the master, you may have a firewall problem or a private IP address encoded in the Master Object Id. This happens on NAT networks.

Use the catior utility provided by omniORB to see what information is published in the Master Object Id:

catior `cat ~/diane/runs/NNN/MasterOID`

If the IP address is something like 192.168.8.7 then you are likely on a private network. Define your public IP address using the ORBendPoint parameter.
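
Whether an address reported by catior is private can be checked against the RFC 1918 ranges. The snippet below is a standalone sketch that also treats loopback addresses as private; the function name is_private is chosen for this example. On Python 3 the standard ipaddress module offers a similar is_private attribute.

```python
import socket
import struct

# RFC 1918 private ranges plus loopback, as (network, prefix length) pairs.
PRIVATE_NETWORKS = (
    ("10.0.0.0", 8),
    ("172.16.0.0", 12),
    ("192.168.0.0", 16),
    ("127.0.0.0", 8),
)


def _to_int(address):
    """Convert a dotted-quad IPv4 address to an unsigned 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(address))[0]


def is_private(address):
    """Return True if the IPv4 address lies in one of PRIVATE_NETWORKS."""
    value = _to_int(address)
    for network, prefix in PRIVATE_NETWORKS:
        mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
        if value & mask == _to_int(network):
            return True
    return False


print(is_private("192.168.8.7"))
```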

Clusters without any internet connectivity

You probably need to disable the monitoring which reports to gangamon.cern.ch.

  • Disable monitoring in your run file

def run(input,config):
    ....
    config.MSGMonitoring.MSG_MONITORING_ENABLED = False

  • Disable monitoring in diane_install_dir/etc/ganga.ini

[MonitoringServices]
# comment out all lines here

The exact location of this file is normally defined in the $GANGA_CONFIG_PATH environment variable.
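
Commenting out the lines of one section by hand is easy, but on many installations a small script may be more convenient. This is a sketch that works on the text of an ini file; the function name comment_out_section is invented, and the option line in the sample is a made-up placeholder, not a real Ganga setting.

```python
def comment_out_section(text, section="MonitoringServices"):
    """Return `text` with every option line of [section] commented out."""
    result = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section header: remember whether it is the target one.
            inside = (stripped[1:-1] == section)
        elif inside and stripped and not stripped.startswith("#"):
            line = "#" + line
        result.append(line)
    return "\n".join(result) + "\n"


# Placeholder content, for illustration only.
sample = "[MonitoringServices]\nSomeOption = SomeValue\n\n[Other]\nKeep = this\n"
print(comment_out_section(sample))
```

To apply it, read the ganga.ini file, pass its content through the function and write the result back, keeping a backup copy of the original file.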

Runtime problems

Termination of master or workers: TypeError: 'NoneType' object is not callable

Because of the threading semantics of Python 2.2, you may sometimes get this exception when the master or a worker terminates. It is harmless and you may ignore it if the traceback looks like this:

Unhandled exception in thread:
Traceback (most recent call last):
 File "/usr/lib/python2.2/threading.py", line 429, in __bootstrap
   self.__stop()
 File "/usr/lib/python2.2/threading.py", line 438, in __stop
   self.__block.notifyAll()
 File "/usr/lib/python2.2/threading.py", line 242, in notifyAll
   self.notify(len(self.__waiters))
 File "/usr/lib/python2.2/threading.py", line 224, in notify
   me = currentThread()
TypeError: 'NoneType' object is not callable 

I get error: MARSHAL: CORBA.MARSHAL(omniORB.MARSHAL_PassEndOfMessage, CORBA.COMPLETED_YES)

This error means that the size of the data passed between the master and a worker exceeds the internal message buffer of omniORB. You should increase the buffer size by setting giopMaxMsgSize to a larger value (in bytes). For example, to use a 10 MB buffer, modify the run file:

default_master = """
giopMaxMsgSize = 10000000
"""
default_worker = """
giopMaxMsgSize = 10000000
"""
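
Since giopMaxMsgSize is given in bytes, it is easy to be off by a factor of ten. Assuming that the run file is ordinary Python and that computed strings are accepted for default_master and default_worker, the value can be derived from a size in megabytes. The helper omniorb_config is invented for this sketch.

```python
def omniorb_config(max_msg_megabytes):
    """Return an omniORB option block with giopMaxMsgSize expressed in bytes."""
    size_in_bytes = int(max_msg_megabytes * 1000 * 1000)
    return "\ngiopMaxMsgSize = %d\n" % size_in_bytes


# 10 MB corresponds to 10000000 bytes.
default_master = omniorb_config(10)
default_worker = omniorb_config(10)

print(default_master)
```

Use the same value for the master and for the workers, as in the example above.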

Running master on a different host

This section explains how to submit worker agents on one host (e.g. an LCG UI, the "submitter host") and run the master on a different host (the "master host"). If the submitter host and the master host share a filesystem, then it is enough to define $DIANE_USER_WORKSPACE so that it points to the same location on both hosts.

Starting master without submitting workers

On the master host, start the DIANE master without submitting workers, i.e. use the diane-run command.

Example: diane-run python/diane/test/testOK.py

Submitting workers to the Grid using shared file system

diane-env -d ganga LCGSubmitter.py --diane-run-file python/diane/test/testOK.py --diane-worker-number N

Submitting workers without shared file system

If you do not have a shared file system, then the MasterOID file must be made available on the submitter host. The easiest way is to copy the whole master directory (it is small at this point and contains only a few files).

The directory of the last started master (current) may be obtained in this way: RUNDIR=`diane-env -d python -c 'import diane.workspace; print diane.workspace.getRundir()'`

Example: scp -r $RUNDIR submitter.host:MASTERDIR

diane-env -d ganga LCGSubmitter.py --diane-run-file python/diane/test/testOK.py --diane-master=MASTERDIR/MasterOID --diane-worker-number N

Note: not all of the content of the run file is needed for worker submission; however, the following pieces of information may be used:

  • worker transport layer configuration (static) as defined by the default_worker variable (see DIANE2OmniOrbConfiguration)
  • submission callbacks (worker_submit and initialize_submitter)
  • the run() function may contain user-defined actions which may affect the submission.
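
When workers are submitted repeatedly, the two submission commands shown above can be assembled by a small helper. This sketch only builds the argument list from the options used on this page; the function name submit_command is invented and nothing is executed until you pass the list to subprocess.

```python
def submit_command(run_file, workers, master_oid=None):
    """Build the LCGSubmitter.py command line as a list of arguments."""
    cmd = ["diane-env", "-d", "ganga", "LCGSubmitter.py",
           "--diane-run-file", run_file]
    if master_oid is not None:
        # Needed only when there is no shared file system.
        cmd.append("--diane-master=%s" % master_oid)
    cmd += ["--diane-worker-number", str(workers)]
    return cmd


print(" ".join(submit_command("python/diane/test/testOK.py", 5,
                              "MASTERDIR/MasterOID")))

# To run it on the submitter host:
#   import subprocess
#   subprocess.check_call(submit_command("python/diane/test/testOK.py", 5))
```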

Enabling GSI secure mode

In order to start the DIANE master in secure mode (i.e. incoming connections from workers are authenticated using Grid credentials), you need to follow the steps described in DIANE2GSI.

How to override the default Ganga version used by DIANE?

These instructions assume that you installed DIANE in the standard place in your home directory (otherwise just adjust the paths).

When DIANE is installed, it also installs a default Ganga version, as you can see in the installation log ~/diane/ganga/install.log. To override this installation, run the Ganga installer like this:

cd ~/diane
python ganga-install --prefix=$HOME/diane/ganga VERSION

Now update the Ganga version in the file ~/diane/packages/GANGA_VERSION.

Source the environment in a fresh shell and you should be done!

Development

diane-run or other diane command gives: ImportError: No module named DIANE_CORBA

Compile the project first by running diane-env -d make in the DIANE install directory.

Writing code: extending the framework, new applications

How to access the WorkerAgent object from application Worker class?

Every application worker object has an _agent attribute, which may be used to navigate back to its container (the WorkerAgent object).

This may be needed for example, to use the built-in file transfer client (see Executable application for details):

class ExecutableWorker(diane.IApplicationWorker):
    ...
    def do_work(self, task_data):
        ftc = self._agent.ftc
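
The navigation pattern can be tried out without DIANE using stand-in classes. Everything in the sketch below is hypothetical except the _agent and ftc attribute names and the do_work method taken from the fragment above: the Stub classes are not the DIANE classes, and the fetch method is made up for the illustration.

```python
class StubFileTransferClient(object):
    """Hypothetical stand-in for the built-in file transfer client."""

    def __init__(self):
        self.fetched = []

    def fetch(self, name):
        # Made-up method: only records which file was requested.
        self.fetched.append(name)


class StubWorkerAgent(object):
    """Stand-in for WorkerAgent: contains the worker and exposes `ftc`."""

    def __init__(self, worker):
        self.ftc = StubFileTransferClient()
        self.worker = worker
        worker._agent = self  # back-reference from the worker to its container


class DemoWorker(object):
    def do_work(self, task_data):
        ftc = self._agent.ftc  # navigate back to the containing agent
        ftc.fetch(task_data)
        return "processed %s" % task_data


agent = StubWorkerAgent(DemoWorker())
result = agent.worker.do_work("input.dat")
print(result)
```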

How to access the RunMaster object from Scheduler or ApplicationManager class?

Every scheduler object has a job_master attribute which is set by the constructor when RunMaster creates the scheduler. Note: if you implement your own scheduler then you are required to call the constructor of the base class (see SimpleTaskScheduler for an example).

The application manager objects have a scheduler attribute which may be used to go back to the containing scheduler object.

For example, to access the default file server object defined by the RunMaster from the ApplicationManager, you may do this:

self.scheduler.job_master.file_server

In this particular case you may also be interested in getting the local servant object of the file transfer server (for example to call setAuthorizedDirs() or to access other server implementation details): self.scheduler.job_master.file_server.servant
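
The chain of attributes, and the reason why a custom scheduler must call the base class constructor, can be illustrated with stand-in classes. This is a sketch under assumptions: the Stub classes are not the DIANE classes, the constructor signature (job_master, appmgr) is assumed for the example, and who sets the scheduler attribute in the real framework is not specified on this page. Only the attribute names job_master, scheduler, file_server and servant come from the text above.

```python
class StubFileServer(object):
    def __init__(self):
        self.servant = "local servant"  # placeholder for the servant object


class StubRunMaster(object):
    def __init__(self):
        self.file_server = StubFileServer()


class StubBaseScheduler(object):
    """Stand-in base class: its constructor stores the back-references."""

    def __init__(self, job_master, appmgr):
        self.job_master = job_master
        self.appmgr = appmgr
        appmgr.scheduler = self


class DemoScheduler(StubBaseScheduler):
    def __init__(self, job_master, appmgr):
        # Without this call, job_master would never be set on the scheduler.
        StubBaseScheduler.__init__(self, job_master, appmgr)
        self.completed = 0


class DemoApplicationManager(object):
    def servant(self):
        return self.scheduler.job_master.file_server.servant


master = StubRunMaster()
manager = DemoApplicationManager()
scheduler = DemoScheduler(master, manager)
print(manager.servant())
```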

-- JakubMoscicki - 19 Apr 2007

Topic revision: r24 - 2011-03-07 - JakubMoscicki
 