DIANE Frequently Asked Questions and Cookbook
Installation
Which platforms are supported?
See
DIANE2Installation.
The diane-run command gives: ImportError: No module named omniORB
Make sure you followed the installation instructions and set up the environment. See
DIANEEnvironmentScripts for more details.
If things still do not work, see DIANEPlatformDoctor.
This problem occurs when your Python interpreter was compiled with different options for handling Unicode characters (a different Unicode character width) than the precompiled libraries. In that case the libraries compiled for our default platform are not compatible with your operating system.
See
DIANEPlatformDoctor.
Cannot install from the central web repository at cern.ch (e.g. a cluster without web access)
You may install DIANE from alternative locations (installation mirrors), see DIANEInstallationMirroring.
Environment
ImportError: No module named diane.PACKAGE
These problems are most likely caused by a shell setup which resets PATH and other environment variables whenever a new shell is started. Try setting up the environment again in that shell:
$(/.../diane-env)
See DIANEEnvironmentScripts for more details.
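To confirm that the environment really is in place, you can check from Python whether the DIANE commands are found on PATH. This is a minimal sketch using only the standard library (it needs Python 3.3 or newer for shutil.which); the helper name missing_commands is hypothetical and not part of DIANE.

```python
import os
import shutil


def missing_commands(names, path=None):
    """Return the commands from `names` that cannot be found on `path`.

    `path` defaults to the PATH of the current process.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_commands(["diane-env", "diane-run"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("DIANE commands found on PATH")
```

If either command is reported as missing, the environment was not set (or was reset) in the shell that started Python.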
ERROR: cannot set up DIANE environment (diane-env not in PATH)
See above.
diane-run: command not found
See above.
Connectivity
Is inbound/outbound connectivity required?
The Master host must have inbound connectivity. The worker hosts (farm nodes, Grid nodes, ...) only need outbound connectivity. On the Master host you should therefore open at least one port in the firewall for inbound traffic. If you want to use the Directory Service, a second open port is needed.
In addition, your jobs are monitored through http://gangamon.cern.ch. This requires outbound TCP connections from the Master and from the Workers.
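A quick way to test these requirements is to try opening a TCP connection from a worker host to the Master port. The sketch below uses only the Python 3 standard library; the function name can_connect is hypothetical, and the check only succeeds while a master is actually listening on that port.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For example, running can_connect("master.example.org", 22000) on a worker node (the host name is a placeholder) tells you whether the firewall lets the connection through.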
How to specify the port number for the Master?
Example: suppose that port 22000 is open on the master host for inbound traffic and that your run file is called run.py.
Choose any of the options below.
Option one:
> export ORBendPoint=giop:tcp::22000
> diane-run run.py
Option two:
> env ORBendPoint=giop:tcp::22000 diane-run run.py
Option three:
Add the following lines to the run.py file:
default_master = """
endPoint = giop:tcp::22000
"""
And then:
> diane-run run.py
How to set options for transport layer (omniORB)?
Any option of the omniORB transport layer may be set in the same way as the port number in the example above.
A full description is in DIANE2OmniOrbConfiguration.
More information on the other options available in the DIANE transport layer (omniORB):
http://omniorb.sourceforge.net/omni41/omniORB/
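When the master is launched from a script, the environment-variable form can be generated for several options at once. The sketch below assumes that every omniORB option NAME can be passed as an environment variable ORBNAME, as ORBendPoint is in the example above; the helper name orb_environment is hypothetical.

```python
import os


def orb_environment(options, base_env=None):
    """Return a copy of the environment with omniORB options added.

    Each option NAME is exported as the variable ORBNAME, following
    the ORBendPoint example. `base_env` defaults to os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in options.items():
        env["ORB" + name] = str(value)
    return env
```

The result can be passed to subprocess.call(["diane-run", "run.py"], env=orb_environment({"endPoint": "giop:tcp::22000"})), which is equivalent to prefixing the command with env on the shell command line.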
Connectivity problems?
If you cannot connect to the master, you may have a firewall problem or a private IP address encoded in the Master Object Id. The latter happens on NAT networks.
Use the catior utility provided by omniORB to see what information is published in the Master Object Id:
catior `cat ~/diane/runs/NNN/MasterOID`
If the IP address is something like 192.168.8.7 then you are likely on a private network. Define your public IP address using the ORBendPoint parameter.
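The address printed by catior can also be checked in a script. The sketch below uses the Python 3 ipaddress module; the function names are hypothetical, and the endpoint string assumes the giop:tcp:HOST:PORT form of omniORB endpoints (the port example earlier uses an empty host). Whether the master can actually listen on the public address depends on your network setup.

```python
import ipaddress


def is_private_address(address):
    """Return True if `address` is in a private or otherwise non-public range."""
    return ipaddress.ip_address(address).is_private


def public_end_point(public_host, port):
    """Build an endpoint string of the form giop:tcp:HOST:PORT."""
    return "giop:tcp:%s:%d" % (public_host, port)
```

For the address from the example, is_private_address("192.168.8.7") is true, so the master would need an ORBendPoint value built from its public host name or address.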
Clusters without any internet connectivity
You probably need to disable the monitoring which reports to gangamon.cern.ch:
- Disable monitoring in your run file
def run(input,config):
    # ... the rest of your run() function ...
    config.MSGMonitoring.MSG_MONITORING_ENABLED = False
- Disable monitoring in diane_install_dir/etc/ganga.ini
[MonitoringServices]
# comment out all lines here
The exact location of this file is normally defined in the $GANGA_CONFIG_PATH environment variable.
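Commenting out the section by hand is easy to get wrong in a long file, so here is a minimal sketch of doing it with a script. It assumes that ganga.ini uses plain INI syntax with [Section] headers and # comments; the function name comment_out_section is hypothetical. Keep a backup copy of the file before overwriting it.

```python
def comment_out_section(text, section="MonitoringServices"):
    """Comment out every setting inside [section], keeping the header itself."""
    header = "[%s]" % section
    inside = False
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section starts; remember whether it is the target one.
            inside = (stripped == header)
        elif inside and stripped and not stripped.startswith("#"):
            line = "#" + line
        result.append(line)
    return "\n".join(result) + "\n"
```

The function works on the file content as a string, so the caller decides which file to read and where to write the result.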
Runtime problems
Termination of master or workers: TypeError: 'NoneType' object is not callable
Due to the threading semantics of Python 2.2 you may sometimes get this exception when the master or a worker terminates. It is harmless and may be ignored if the traceback looks like this:
Unhandled exception in thread:
Traceback (most recent call last):
  File "/usr/lib/python2.2/threading.py", line 429, in __bootstrap
    self.__stop()
  File "/usr/lib/python2.2/threading.py", line 438, in __stop
    self.__block.notifyAll()
  File "/usr/lib/python2.2/threading.py", line 242, in notifyAll
    self.notify(len(self.__waiters))
  File "/usr/lib/python2.2/threading.py", line 224, in notify
    me = currentThread()
TypeError: 'NoneType' object is not callable
I get error: MARSHAL: CORBA.MARSHAL(omniORB.MARSHAL_PassEndOfMessage, CORBA.COMPLETED_YES)
This error means that the size of the data passed between the master and a worker exceeds the internal memory buffer of omniORB. Increase the buffer size by setting giopMaxMsgSize to a larger value (in bytes). For example, to use a 10 MB buffer, modify the run file:
default_master = """
giopMaxMsgSize = 10000000
"""
default_worker = """
giopMaxMsgSize = 10000000
"""
Running master on a different host
The instructions below describe how to submit worker agents on one host (e.g. an LCG UI, the "submitter host") and run the master on a different host (the "master host").
If the submitter host and the master host share a file system, it is enough to set $DIANE_USER_WORKSPACE to point to the same location on both hosts.
Starting master without submitting workers
On the master host, start the DIANE master without submitting workers, i.e. use the diane-run command.
Example:
diane-run python/diane/test/testOK.py
Submitting workers to the Grid using shared file system
diane-env -d ganga LCGSubmitter.py --diane-run-file python/diane/test/testOK.py --diane-worker-number N
Submitting workers without shared file system
If you do not have a shared file system, the MasterOID file must be made available on the submitter host. The easiest way is to copy the whole master directory (it is small at this point and contains only a few files).
The directory of the last started master (current) may be obtained in this way:
RUNDIR=`diane-env -d python -c 'import diane.workspace; print diane.workspace.getRundir()'`
Example:
scp -r $RUNDIR submitter.host:MASTERDIR
diane-env -d ganga LCGSubmitter.py --diane-run-file python/diane/test/testOK.py --diane-master=MASTERDIR/MasterOID --diane-worker-number N
Note: not all of the content of the run file is needed for worker submission; however, the following pieces of information may be used:
- the worker transport layer configuration (static) as defined by the default_worker variable (see DIANE2OmniOrbConfiguration)
- the submission callbacks (worker_submit and initialize_submitter)
- the run() function, which may contain user-defined actions that affect the submission.
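If workers are submitted from a script, the two submission commands shown above can be assembled by one helper. This is a sketch only: the function name submit_command is hypothetical, and it uses nothing beyond the options that appear in the commands above.

```python
import posixpath


def submit_command(run_file, workers, master_dir=None):
    """Build the LCGSubmitter.py command line as a list of arguments.

    If `master_dir` is given, the MasterOID file inside it is passed with
    --diane-master (the case without a shared file system).
    """
    cmd = ["diane-env", "-d", "ganga", "LCGSubmitter.py",
           "--diane-run-file", run_file]
    if master_dir is not None:
        cmd.append("--diane-master=" + posixpath.join(master_dir, "MasterOID"))
    cmd += ["--diane-worker-number", str(workers)]
    return cmd
```

The returned list can be handed to subprocess.call() on the submitter host. Leave master_dir unset when both hosts share a file system.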
Enabling GSI secure mode
In order to start the DIANE master in secure mode (i.e. incoming connections from workers are authenticated using Grid credentials), follow the steps described in DIANE2GSI.
How to override the default Ganga version used by DIANE?
These instructions assume that you installed DIANE in the standard place in your home directory (otherwise just adjust the paths).
When DIANE is installed, it also installs a default Ganga version, as you can see in the installation log ~/diane/ganga/install.log. To override this installation, run the Ganga installer like this:
cd ~/diane
python ganga-install --prefix=~/diane/ganga VERSION
Now update the Ganga version in the file ~/diane/packages/GANGA_VERSION.
Source the environment in a fresh shell and you should be done!
Development
diane-run or other diane command gives: ImportError: No module named DIANE_CORBA
Compile the project first by running the following command in the DIANE install directory:
diane-env -d make
Writing code: extending the framework, new applications
How to access the WorkerAgent object from application Worker class?
Every application worker object has an _agent attribute which may be used to navigate back to its container (the WorkerAgent object).
This may be needed, for example, to use the built-in file transfer client (see the Executable application for details):
class ExecutableWorker(diane.IApplicationWorker):
    ...
    def do_work(self,task_data):
        ftc = self._agent.ftc
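To exercise such a worker outside a running DIANE job, the container can be replaced by a stand-in. In this sketch everything except the _agent and ftc attribute names is hypothetical: the stub classes are not DIANE classes, and the worker does not derive from diane.IApplicationWorker, so that the example runs without a DIANE installation.

```python
class StubFileTransferClient(object):
    """Hypothetical stand-in for the built-in file transfer client."""


class StubWorkerAgent(object):
    """Hypothetical stand-in for the WorkerAgent container."""

    def __init__(self):
        self.ftc = StubFileTransferClient()


class MyWorker(object):
    """Application worker sketch with the _agent attribute set by hand."""

    def __init__(self, agent):
        self._agent = agent

    def do_work(self, task_data):
        # Navigate back to the container to reach its file transfer client.
        ftc = self._agent.ftc
        return ftc
```

A unit test can then construct MyWorker(StubWorkerAgent()) and call do_work() directly.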
How to access the RunMaster object from Scheduler or ApplicationManager class?
Every scheduler object has a job_master attribute which is set by the constructor when RunMaster creates the scheduler. Note: if you implement your own scheduler, you must call the constructor of the base class (see SimpleTaskScheduler for an example).
The application manager objects have a scheduler attribute which may be used to go back to the containing scheduler object.
For example, to access the default file server object defined by the RunMaster from the ApplicationManager you may do this:
self.scheduler.job_master.file_server
In this particular case you may also be interested in getting the local servant object of the file transfer server (for example to call setAuthorizedDirs() or to access other server implementation details):
self.scheduler.job_master.file_server.servant
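The navigation chain, and the reason the base class constructor must be called, can be illustrated with stand-in classes. This is a sketch under stated assumptions: all class names and constructor signatures below are hypothetical and do not reproduce the DIANE classes. Only the attribute names job_master, scheduler, file_server and servant come from the text above.

```python
class StubFileServer(object):
    """Stand-in for the file server; `servant` is its local servant object."""

    def __init__(self):
        self.servant = object()


class StubRunMaster(object):
    """Stand-in for RunMaster, owning the default file server."""

    def __init__(self):
        self.file_server = StubFileServer()


class StubBaseScheduler(object):
    """Stand-in base scheduler: its constructor stores job_master."""

    def __init__(self, job_master, appmgr):
        self.job_master = job_master
        appmgr.scheduler = self


class MyScheduler(StubBaseScheduler):
    def __init__(self, job_master, appmgr):
        # Without this call job_master would never be set.
        StubBaseScheduler.__init__(self, job_master, appmgr)
        self.my_state = {}


class MyApplicationManager(object):
    scheduler = None

    def get_file_server(self):
        return self.scheduler.job_master.file_server
```

With these stand-ins, creating MyScheduler(StubRunMaster(), MyApplicationManager()) makes the file server reachable from the application manager through the same attribute chain as in DIANE.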
--
JakubMoscicki - 19 Apr 2007