Introduction
DIANE is a tool for managing a large number of small independent tasks (typically for parametric studies). It is based on a master-worker scheme, which can improve application execution time and provide partial fault tolerance (in comparison with the built-in gLite parametric job).
DIANE is very flexible and power users may customize almost any aspect of it, including scheduling and application wrappers.
This tutorial just gives the basics: how to run simple executable applications from the point of view of a simple user.
If you want to know more here is more reading. But you don't have to read it now to continue the tutorial:
The basics
A DIANE run consists of a master and worker agents. The master is a mini-server which is started automatically when the processing starts and goes away when the processing terminates. The worker agents typically run as jobs (on the Grid or a batch system). The master supervises the processing of your tasks, making sure that all tasks are completed. The worker agents provide the CPU power to do the tasks. If tasks fail or workers die for any reason, the master automatically reassigns the tasks to other workers. As a user you may easily control the master's policies in this respect.
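The reassignment of failed tasks can be illustrated with a small, self-contained simulation. This is only a sketch of the general master-worker idea and not DIANE's actual implementation: the function name `run_master`, the `max_attempts` parameter and the failure model (`fails_on`) are all hypothetical.

```python
from collections import deque


def run_master(tasks, workers, fails_on, max_attempts=3):
    """Toy master loop: hand out tasks round-robin, requeue failed ones.

    tasks        -- list of task identifiers
    workers      -- list of worker names
    fails_on     -- set of (worker, task) pairs that fail (hypothetical failure model)
    max_attempts -- give up on a task after this many assignments (assumed policy)
    """
    queue = deque((task, 1) for task in tasks)
    completed = {}   # task -> worker that finished it
    failed = []      # tasks that exhausted their attempts
    turn = 0
    while queue:
        task, attempt = queue.popleft()
        worker = workers[turn % len(workers)]
        turn += 1
        if (worker, task) in fails_on:
            if attempt < max_attempts:
                queue.append((task, attempt + 1))  # reassign later
            else:
                failed.append(task)
        else:
            completed[task] = worker
    return completed, failed


completed, failed = run_master(
    tasks=list(range(5)),
    workers=["w1", "w2"],
    fails_on={("w1", 0), ("w2", 3)},
)
print(sorted(completed), failed)
```

In this toy run, tasks 0 and 3 fail on their first worker and are completed by the other one on the second attempt, so the user never has to resubmit anything by hand.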
Worker agents are submitted as jobs to the EGEE Grid using the Ganga interface (
http://cern.ch/ganga
). Using Ganga is not mandatory but it is very convenient and flexible because it allows you to easily use your batch system or other local resources. We will come back to this later.
To run DIANE you must have:
- at least one open network port which accepts incoming TCP connections (talk to your system administrator if needed)
- if you want to submit to the Grid then the Grid job submission commands must be available (EDG or gLite). This is the so-called Grid User Interface (UI).
- if you want to submit to a batch system then the batch system commands (bsub, qsub, etc.) must be available (LSF, PBS or SGE)
Of course if you just want to use your local batch system then you do not need to have the Grid UI and vice-versa.
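A quick way to see which of the batch commands named above are available on your machine is to look them up on the PATH. The following is a minimal sketch using only the Python standard library; `find_commands` is a hypothetical helper name, and the check says nothing about whether the batch system itself is configured correctly.

```python
import shutil


def find_commands(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# bsub (LSF) and qsub (PBS/SGE) are the batch commands named in the text.
batch_commands = find_commands(["bsub", "qsub"])
for name, path in batch_commands.items():
    print(f"{name}: {path or 'not found'}")
```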
Installation
Follow the instructions at
http://cern.ch/diane/install.php
Initial configuration
Ganga (submitter interface)
The
diane-submitter
command encapsulates the Ganga interface. Before you submit any job you must configure Ganga correctly. Ganga is installed in the background by the
DIANE installation script.
Note: the
diane-submitter
script is new in version 2.4; for older versions you should use
diane-env -d ganga
instead.
First time users: run
diane-submitter -i
to enter the Ganga prompt. Type
^D
(Control-D) to exit.
Then edit the configuration file
~/.gangarc
To run on the Grid you should define at least the following parameters:
- [LCG]VirtualOrganisation=YourVO
- [LCG]GLITE_ENABLE=True if you want to use gLite middleware
- [LCG]EDG_ENABLE=True if you want to use EDG middleware
To use the batch system you typically do not need to configure anything, unless your batch system is installed in a non-standard way. See the corresponding section of the
~/.gangarc
file [LSF,PBS,SGE].
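Before submitting to the Grid it can be handy to check that the [LCG] parameters listed above are actually set. The sketch below assumes that the configuration file is INI-style and parses a hypothetical fragment with Python's standard `configparser`; the helper name `read_lcg_settings` and the sample values are illustrative only.

```python
import configparser

# Hypothetical ~/.gangarc fragment using the parameters named above.
SAMPLE = """
[LCG]
VirtualOrganisation = YourVO
GLITE_ENABLE = True
"""


def read_lcg_settings(text):
    """Return the [LCG] options of an INI-style configuration as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return dict(parser["LCG"]) if parser.has_section("LCG") else {}


settings = read_lcg_settings(SAMPLE)
# Report required options that are absent from the [LCG] section.
missing = [key for key in ("VirtualOrganisation",) if key not in settings]
print(settings, missing)
```

To check a real file, read its contents and pass the text to `read_lcg_settings` instead of `SAMPLE`.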
Simple Example
Here is an example of a simple executable application.
Suppose that you have a hello script in your current working directory and it looks like this:
#!/usr/bin/env bash
rm -f message.out
echo hello $* > message.out
echo "I said hello $* and saved it in message.out"
After changing the executable permission bits (
chmod u+x hello
) you may simply run it like this:
./hello 123
.
Now suppose that you want to run the "hello" executable script 20 times, changing its arguments every time. So you have 20 almost identical tasks. In DIANE you define the work to be done using a run file, which is a simple Python file.
File
hello.run:
# tell DIANE that we are just running executables
# the ExecutableApplication module is a standard DIANE test application
from diane_test_applications import ExecutableApplication as application
# the run function is called when the master is started
# input.data stands for run parameters
def run(input,config):
    d = input.data.task_defaults # this is just a convenience shortcut
    # all tasks will share the default parameters (unless set otherwise in individual task)
    d.input_files = ['hello']
    d.output_files = ['message.out']
    d.executable = 'hello'
    # here are tasks differing by arguments to the executable
    for i in range(20):
        t = input.data.newTask()
        t.args = [str(i)]
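To see which tasks a run file would define without starting a master, you can call its run function with stand-in objects. The sketch below is an assumption-laden test harness, not part of DIANE: `FakeData` is a hypothetical test double that mimics only the two attributes used in hello.run (`task_defaults` and `newTask`), and the import of the DIANE application module is left out so the example runs anywhere.

```python
from types import SimpleNamespace


class FakeData:
    """Stand-in for input.data: records defaults and created tasks."""

    def __init__(self):
        self.task_defaults = SimpleNamespace()
        self.tasks = []

    def newTask(self):
        task = SimpleNamespace()
        self.tasks.append(task)
        return task


def run(input, config):
    # Same body as the run function in hello.run.
    d = input.data.task_defaults
    d.input_files = ['hello']
    d.output_files = ['message.out']
    d.executable = 'hello'
    for i in range(20):
        t = input.data.newTask()
        t.args = [str(i)]


fake_input = SimpleNamespace(data=FakeData())
run(fake_input, config=None)
print(len(fake_input.data.tasks), fake_input.data.tasks[-1].args)
```

This only checks the logic of the run file itself; how the real master merges the defaults into each task is not modelled here.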
Now you can start the master using the run file:
$ diane-run hello.run
The master will start in its own run directory (this information is printed by the master - check the output). The rundir is typically located in
~/diane/runs/nnn
. The default location may be changed with the
$DIANE_USER_WORKSPACE
environment variable.
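If you need to locate the most recent run directory from a script, something along the following lines may help. It is a sketch under stated assumptions: run directories are taken to have purely numeric names (the nnn above), the highest number is taken to be the latest run, and the workspace defaults to ~/diane unless `$DIANE_USER_WORKSPACE` is set. The helper name `latest_run_dir` is hypothetical.

```python
import os
import tempfile


def latest_run_dir(workspace=None):
    """Return the highest-numbered directory under <workspace>/runs, or None.

    Assumes run directories have purely numeric names.
    """
    if workspace is None:
        workspace = os.environ.get("DIANE_USER_WORKSPACE",
                                   os.path.expanduser("~/diane"))
    runs = os.path.join(workspace, "runs")
    if not os.path.isdir(runs):
        return None
    numbered = [name for name in os.listdir(runs)
                if name.isdigit() and os.path.isdir(os.path.join(runs, name))]
    if not numbered:
        return None
    # Compare numerically so that "10" sorts after "2".
    return os.path.join(runs, max(numbered, key=int))


# Demonstration on a throwaway workspace instead of the real ~/diane.
with tempfile.TemporaryDirectory() as demo:
    for n in ("1", "2", "10"):
        os.makedirs(os.path.join(demo, "runs", n))
    demo_latest = os.path.basename(latest_run_dir(demo))
print(demo_latest)
```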
Note: If you do not specify the port then the master will be started on a random port (selected by the operating system). This may not work if you have a firewall, in which case you may be required to use only certain ports. Check
DIANEQuestionsAndAnswers on how to set the master's port number.
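The note above says that the operating system picks a random port when none is specified. The sketch below shows the general mechanism with Python's standard `socket` module: binding to port 0 asks the OS for a free port. It does not use DIANE; `pick_free_port` and `can_bind` are hypothetical helper names, and a successful local bind does not prove that a firewall lets incoming connections through.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Bind to port 0 so the OS chooses a free TCP port, then report it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def can_bind(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to the given port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


port = pick_free_port()
print(port, can_bind(port))
```

If your site only allows certain ports, `can_bind` can at least tell you whether one of them is already taken locally before you configure the master to use it.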
You may now start a couple of worker agents:
$ diane-submitter Local --diane-worker-number=2
This command will start 2 worker agents locally on your computer. You will see the master producing quite a lot of output. After a while the processing should terminate and you can look at the results. All results are stored by the master in the run directory (this behaviour may be customized and depends on the application plugins).
diane-submitter
uses the Ganga tool to submit and run worker agent jobs. Each of the worker agent jobs can process multiple DIANE tasks. If you have many worker agent jobs the run completion time will be shorter. If you have fewer worker agent jobs, or if some of the worker jobs crash for some reason, then the only noticeable effect will be a slowdown of the run; everything will continue to run without your intervention. You may also add new worker agents at any time.
Running
diane-submitter -i
enters interactive mode and is equivalent to running
diane-env -d ganga
without any arguments. In this mode you can inspect the status of your worker agent jobs, kill them if you like, inspect the stdout/stderr when they have terminated, and so on. More in the Ganga tutorial:
http://cern.ch/ganga/user/html/GangaIntroduction
Quick recipe to get the stderr of the worker job:
- start
diane-submitter -i
and wait until the worker job status is completed
- then use the
j.peek()
method or ls -l $j.outputdir
.
Querying the master
The full index of commands is provided in the Related Pages of the Reference Manual. Check the -h and --help options.
Every time you execute diane-run a new run directory is created. In this way you may start a number of masters which will not clash with one another.
Here are some commands which directly talk to the master. Unless you specify otherwise the commands always apply to the last started master.
- diane-master-ping : checks if the master is alive,
- diane-master-ping getStatusReport : gets the summary of the master status,
- diane-master-ping getStatusReport : gets more detailed information about the master status,
- diane-master-ping kill : kills the master.
Use the -f option to select a different master (if you have many of them running concurrently).
If you have started a number of masters and you are lost, you may use the diane-ls command, which gives you a summary of all locally started masters.
Submitting more workers (adding resources to the master)
Submission of worker agent jobs is easily handled with Ganga submitter scripts.
List all available submitters with
diane-submitter -l
Note: the
diane-submitter
script is new in version 2.4; for older versions you should use
diane-env -d ganga
instead. You will not be able to use the -l option.
A few predefined submitters are distributed with the release:
here is the list in SVN
.
User-defined submitter scripts may be placed in
~/diane/submitters
.
Here are some examples:
- Submitting 1 more worker to local batch system (LSF):
- Submitting 1 more worker locally:
- Submitting 5 more workers on the EGEE/EGI/LCG Grid which will connect to the last started master (corresponding to the latest directory in
$DIANE_WORKSPACE/runs
):
-
diane-submitter LCG --diane-worker-number 5
- Submitting 5 workers which will connect to the master number XXX (corresponding to $DIANE_WORKSPACE/runs/XXX):
-
diane-submitter LCG --diane-worker-number 5 --diane-master=workspace:XXX
- Starting a worker on an arbitrary host (e.g. selected node of a private cluster)
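When worker submission is driven from a script, the command lines shown above can be assembled programmatically. The following is a minimal sketch: it only builds the argument list from the options that appear in this tutorial (`--diane-worker-number` and `--diane-master=workspace:XXX`) and does not run anything. The helper name `submitter_command` and the run number 12 are hypothetical.

```python
import shlex


def submitter_command(backend, workers, master_run=None):
    """Build the argument list for a diane-submitter call.

    backend    -- submitter name as used in the text, e.g. "Local" or "LCG"
    workers    -- number of worker agents to submit
    master_run -- optional run number; None means the last started master
    """
    if workers < 1:
        raise ValueError("at least one worker is required")
    argv = ["diane-submitter", backend, f"--diane-worker-number={workers}"]
    if master_run is not None:
        argv.append(f"--diane-master=workspace:{master_run}")
    return argv


command = submitter_command("LCG", 5, master_run=12)
print(shlex.join(command))
```

On a machine where DIANE is installed, the resulting list could be passed to `subprocess.run(command, check=True)`.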
In some cases you may want to use the
--diane-run-file
option which will pass additional configuration parameters into the worker agent jobs. Examples:
- you may want both master and workers to be started in the authenticated mode (GSI). You can manage the configuration parameters from a single place: the run file.
- your application may require customization of the worker agent job submission - such customization is made available via the run file.
You may also easily write your own submitter scripts to customize the system to your needs. Look at
LocalSubmitter.py
for an easy example or at
LCGSubmitter.py
for a more structured implementation.
Using the worker agent factory
The
AgentFactory is a special submitter script which automates the submission of the worker jobs. If you submit a bunch of worker jobs then over a longer period of time you will see that some of these worker jobs have terminated (for different reasons). You may, however, be interested in always keeping a certain number of worker agents in the pool. The worker factory will do exactly that on your behalf: if the number of worker agents drops, the worker factory will automatically submit some more. The worker factory may be run from cron or directly from the command line.
The current implementation of the agent factory works with LCGSubmitter.py and uses heuristics to choose the best possible Computing Elements.
Try:
diane-env -d ganga AgentFactory.py --help
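The core idea of keeping a fixed number of workers in the pool can be sketched in a few lines. This is not the AgentFactory code and says nothing about its Computing Element heuristics; the function name `workers_to_submit` and the parameters `pending` and `batch_limit` are assumptions made for illustration.

```python
def workers_to_submit(target, alive, pending=0, batch_limit=None):
    """How many new worker jobs are needed to get back to the target pool size.

    target      -- desired number of worker agents in the pool
    alive       -- workers currently registered with the master
    pending     -- jobs already submitted but not yet running (assumed input)
    batch_limit -- optional cap on submissions per cycle (assumed policy)
    """
    deficit = max(0, target - alive - pending)
    if batch_limit is not None:
        deficit = min(deficit, batch_limit)
    return deficit


# One simulated cycle per entry: the pool shrinks as worker jobs terminate.
alive_history = [10, 7, 7, 3]
plan = [workers_to_submit(target=10, alive=n, batch_limit=5) for n in alive_history]
print(plan)
```

Counting pending jobs avoids submitting the same replacement workers again on every cycle while earlier submissions are still waiting in the queue.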
Task monitoring
Since release 2.1 there is a possibility to add extra monitoring information to tasks. For example you may add some application-specific details or labels to easily keep track of work done by the tasks. This new functionality is described on a separate
DIANETaskMonitoring page.
Configure runtime parameters and job scheduling policies
Most applications use the SimpleTaskScheduler, which allows you to set several simple scheduling policies. Core framework runtime configuration parameters define other advanced settings.
Scheduler policies and configuration parameters are set in the run file. A canonical example:
def run(input,config):
    input.scheduler.policy.STOP_IF_FAILED_TASKS = True
    input.scheduler.policy.FAILED_TASK_MAX_ASSIGN = 1
    config.WorkerAgent.HEARTBEAT_DELAY = 10
    # ....
Several examples are provided in the
test directory
Advanced scheduling with job requirements: matching task and worker capabilities
DIANERequirementsCapabilityScheduling
OUTDATED: If you want your master to join the directory service, then you should specify an additional --ds
option. Read more on DIANEDirectoryService
--
JakubMoscicki - 08 Jun 2007