Ganga EGEE Tutorial Package
Ganga is an easy-to-use Grid job submission framework (homepage). It is used in several activities across different application domains, supporting large user communities (as in High-Energy Physics) and steering large productions (such as the H5N1 bird flu drug searches or the International Telecommunication Union digital broadcasting frequency definition).
Ganga supports multiple execution back ends (Grid, Batch, Local) and infrastructures (EGEE Production Grid, GILDA Testbed). Ganga should be installed on a machine that is able to submit jobs to the corresponding back end (for grid submission, this machine is called a Grid User Interface). To use a local batch system (e.g. LSF, PBS), the batch system commands must be available on the same machine.
Installation
Skip this step if Ganga is already installed on the system.
Follow the instructions at:
http://cern.ch/ganga/download
The tutorial plugins are located in the GangaTutorial directory and are shipped with the Ganga core itself. Use the --extern=GangaTutorial option of the installer.
You should configure Ganga by specifying your Virtual Organization, type of proxy, UI, etc. On GILDA UI machines the GANGA_CONFIG_PATH environment variable is already set to point to the site-specific configuration files:
export GANGA_CONFIG_PATH=/usr/local/ganga/prefix/etc/Gilda.ini:GangaTutorial/Tutorial.ini
The Gilda.ini file sets the 'gilda' VO to be used when acquiring a VOMS proxy certificate and submitting grid jobs. The GangaTutorial/Tutorial.ini file is shipped with Ganga itself and enables the tutorial applications.
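To check which configuration files a session will be pointed at, the colon-separated value of GANGA_CONFIG_PATH can be split into its entries. The following is a minimal sketch in plain Python; the helper name split_config_path is hypothetical, and how Ganga itself resolves relative entries or merges the files is not covered here.

```python
import os


def split_config_path(value):
    """Split a colon-separated config path into its entries, skipping empty ones.

    Hypothetical helper for illustration; it is not part of Ganga.
    """
    return [entry for entry in value.split(":") if entry]


# Fall back to the value quoted in this tutorial if the variable is not set.
config_path = os.environ.get(
    "GANGA_CONFIG_PATH",
    "/usr/local/ganga/prefix/etc/Gilda.ini:GangaTutorial/Tutorial.ini",
)

for entry in split_config_path(config_path):
    print(entry)
```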
Submitting jobs in Ganga: quick overview
Refer to the manual for more complete documentation:
Introduction to Ganga
First, start the Ganga CLIP environment:
bash> ganga
*** Welcome to Ganga ***
Version: Ganga-5-0-0
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.
In [1]: type ctrl-D to exit Ganga CLIP
Do you really want to exit ([y]/n)?

When starting Ganga, you will be asked for the passphrase of your grid certificate if the grid modules of Ganga are enabled and there is no valid grid proxy in your environment.
Getting help
You can consult the interactive help from within a Ganga session by typing help('index'), or read the online reference manual on the web.
You may also look at the Ganga Introduction.
First Ganga job: running an arbitrary shell script
Running a "HelloWorld" job is straightforward in Ganga. You can try it by typing the following commands in the Ganga CLIP:
In [1]: Job().submit()
In [2]: jobs
Here we go one step further and run an arbitrary user script using one of Ganga's built-in applications, the Executable application.
Preparing your shell script
Any shell command can be called from within the Ganga CLIP by prefixing it with '!', so you can create a shell script like this:
In [1]: !pico myscript.sh
The following example script takes one argument from the command line and prints the hostname, the CPU model and the total memory of the machine on which it is executed.
#!/bin/sh
echo "Hello ${1} !"
echo $HOSTNAME
cat /proc/cpuinfo | grep 'model name'
cat /proc/meminfo | grep 'MemTotal'
echo "Run on `date`"
When you have finished editing the script, close the editor and you will be back in the Ganga CLIP. Then make the script executable:
In [2]: !chmod +x myscript.sh
Running the shell script on your local machine in interactive mode
Type the following commands in the Ganga CLIP to launch your first Ganga job, which runs a user-specified shell script interactively:
In [4]: j = Job()
In [5]: j.application = Executable()
In [6]: j.application.exe = File('myscript.sh')
In [7]: j.application.args = ['Vietnam']
In [8]: j.backend=Interactive()
In [9]: j.submit()
In [10]: j
In [11]: jobs
Running the shell script on your local machine in batch mode (background)
In [11]: j1 = j.copy()
In [12]: j1.backend = Local()
In [13]: j1.submit()
In [13]: jobs
In [14]: j1.peek()
In [15]: cat $j1.outputdir/stdout
Running the shell script on LCG (and gLite)
We have tested our script locally through Ganga. Now we want to run it on a production grid environment, the LCG. All we have to do in Ganga is assign the LCG backend to the job. In the following example, the new job is created by copying the previous job, which saves us from repeating the earlier steps.
In [16]: j2 = j.copy()
In [17]: j2.backend = LCG()
In [18]: j2.application.args = ['Europe']
In [19]: j2.submit()
For jobs submitted to LCG, the job's logging information can be queried from the LCG logging & bookkeeping system from within Ganga.
In [20]: cat $j2.backend.loginfo(verbosity=1)
Checking out the final outputs
Once the jobs are in the completed state, you can check the output in the following ways:
In [21]: j.peek()                       # list files in the job's output dir
In [22]: !zcat $j.outputdir/stdout.gz   # print the stdout on the terminal
Note: depending on the backend, the stdout and stderr files may currently be gzipped automatically (the backends do not behave uniformly in this respect).
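Because the stdout file may or may not be gzipped, a small helper that handles both cases can be convenient. The following is a minimal sketch in plain Python; the function name read_job_stdout is hypothetical, and it assumes only that the output directory contains either a stdout or a stdout.gz file, as described above.

```python
import gzip
import os
import tempfile


def read_job_stdout(outputdir):
    """Return the text of a job's stdout, whether or not it was gzipped.

    Hypothetical helper for illustration; it is not part of Ganga.
    """
    plain = os.path.join(outputdir, "stdout")
    zipped = plain + ".gz"
    if os.path.exists(zipped):
        # Gzipped variant: decompress and decode as text.
        with gzip.open(zipped, "rt") as f:
            return f.read()
    with open(plain, "r") as f:
        return f.read()


# Small self-contained demo using a temporary directory in place of a real
# job output directory.
demo_dir = tempfile.mkdtemp()
with gzip.open(os.path.join(demo_dir, "stdout.gz"), "wt") as f:
    f.write("Hello Europe !\n")
print(read_job_stdout(demo_dir), end="")
```

Inside a Ganga session the same helper could be called as read_job_stdout(j.outputdir) once the job has completed.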
Exercise: Prime number factorization
The task is to find the prime factors of a given integer. For example, 1925 = 5*5*7*11, so 5, 7 and 11 are the (prime) factors of 1925.
Finding very large prime factors is computationally very expensive and quickly becomes impractical. In this exercise we can factorize any number, but only if its factors are among the first 15 million prime numbers. We have 15 tables of 1 million prime numbers each, and we scan the tables in search of the factors. A collection of tables is called a PrimeTableDataset. The tables are used by the PrimeFactorizer application.
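The table-scanning idea can be illustrated with a short sketch in plain Python. This is not the PrimeFactorizer implementation, which is not shown here; the functions sieve and factorize_with_table are hypothetical, and a small sieve stands in for a real prime table. A remainder greater than 1 means that some factor lies outside the table that was scanned.

```python
def sieve(limit):
    """Return all primes <= limit (a small stand-in for a prime table)."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def factorize_with_table(n, primes):
    """Divide n by every prime in the table.

    Returns (factors, remainder). The remainder is 1 when n was fully
    factorized using only primes from the table.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    for p in primes:
        while n % p == 0:
            factors.append(p)
            n //= p
        if n == 1:
            break
    return factors, n


table = sieve(100)
factors, remainder = factorize_with_table(1925, table)
print(factors, remainder)  # [5, 5, 7, 11] 1
```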
Exercises
- Factorize the number 1925 on your local machine using the table containing the first million prime numbers
- Factorize the number 118020903911855744138963610 by:
  - using the EGEE infrastructure (LCG backend)
  - using all 15 tables of prime numbers
  - splitting the workload into 5 subjobs
Hints
- create a Ganga job
- associate the job with an application
- configure the application
- associate the job with an input dataset
- configure the input dataset
- (optional) associate the job with a splitter
- configure the splitter
- use the check_prime_job(j) function to see the outcome of a selected job
- the application object: PrimeFactorizer
- the input dataset object: PrimeTableDataset
- the splitter object: PrimeFactorizerSplitter
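The idea behind splitting the workload can be pictured as distributing the 15 tables over 5 subjobs, so that each subjob scans only its own share. The sketch below is plain Python and only illustrates that partitioning; how PrimeFactorizerSplitter actually divides the tables is not described here, and both the function name split_tables and the table numbering 1 to 15 are assumptions.

```python
def split_tables(table_ids, n_subjobs):
    """Distribute table ids over n_subjobs contiguous, near-equal chunks.

    Hypothetical helper for illustration; it is not part of Ganga.
    """
    table_ids = list(table_ids)
    if n_subjobs < 1:
        raise ValueError("n_subjobs must be at least 1")
    base, extra = divmod(len(table_ids), n_subjobs)
    chunks = []
    start = 0
    for i in range(n_subjobs):
        # The first `extra` chunks receive one additional table each.
        size = base + (1 if i < extra else 0)
        chunks.append(table_ids[start:start + size])
        start += size
    return chunks


# 15 tables (assumed to be numbered 1..15) shared among 5 subjobs.
for subjob, tables in enumerate(split_tables(range(1, 16), 5)):
    print(subjob, tables)
```

With 15 tables and 5 subjobs each subjob receives 3 tables; the factors found by the individual subjobs then have to be combined to obtain the full factorization.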
Solution
Advanced application: BLAST
Resources
-- JakubMoscicki - 01 Mar 2007
-- AdrianMuraru - 12 Jun 2007