Spåtind Grid Tutorial 2010

Introduction

Welcome to the Spåtind Grid Tutorial 2010, part of the Spaatind 2010 Nordic Conference in Particle Physics. The aim of this short course is to give novice grid users a feel for what the grid is about, and also let you send some simple first jobs, while still giving some relevant information and more complex exercises for experienced users.

The tutorial consists of two parts:

  1. A series of introductory lectures. The official programme can be found here.
  2. A set of exercises, which can be found on this page.

Prerequisites

To follow the exercise part of this tutorial, you need:

  • A Linux computer with web access (or the possibility to ssh from Windows to a Linux box somewhere)
  • A grid certificate issued by the NorduGrid Certificate Authority (CA), or from elsewhere if you're already part of the ATLAS Virtual Organization (VO). (This is to get access to the resources made available for the tutorial. If you do not have a grid certificate, a test certificate can be provided. The test certificate, however, is not a member of the ATLAS VO and therefore cannot access any ATLAS data.)

Programme

The course programme is as follows:

  1. Lectures (1h20')
    1. Introduction (5') - Peter
    2. Overview of grid computing and NorduGrid (15') - Alex
    3. The ARC middleware (15') - Martin
    4. Ganga (15') - Bjørn
    5. Grids in ATLAS (15') - Bjørn
    6. Tutorial overview - Peter
  2. Coffee (15')
  3. Exercises (1h) - P/M/B

Exercises

The exercises given here are meant for users at various levels, though everything except part (4) can be followed by anyone. For part (4), you need a membership in the ATLAS VO. We propose four ways through the material, depending on what level you feel you're at:

  • "I've never really heard about or used the grid": Parts 1,2
  • "I know about the grid, but haven't really gotten into it": Parts 1,2,3
  • "I'm an ATLAS physicist and have a grid account, but don't regularly use the grid": Parts 3,4
  • "I'm an experienced ATLAS physicist and casual grid user": Part 4, then try to run your own analysis on the grid

If your level is above this, you're welcome to help us teach others to use the grid.

1) Getting a certificate, making a voms proxy

Getting a certificate

In order to use the grid you need a certificate provided by a Certificate Authority. The general instructions for requesting a grid certificate can be found at the NorduGrid Certificate Authority.

However, since getting a certificate normally takes a couple of days, we have provided some temporary certificates.

To get a temporary certificate

  • Download and unpack the certificate tar-ball
  • Follow the instructions given in the spaatind-certificates/README.

Logging in to the grid

First, log in to the grid by creating a proxy, which will be used to identify you when using the grid. The proxy is time limited; the default lifetime is 24 hours.

Create your proxy with

grid-proxy-init

followed by your grid password. If you are using one of the test certificates the password is given in the certificate/README file in the tarball.

You can view information about the validity of your proxy with

grid-proxy-info
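
If you want to check the remaining proxy lifetime from a script, the following standalone Python 3 sketch may help (run it outside ganga). It assumes that grid-proxy-info accepts the -timeleft option and prints the remaining lifetime in seconds; the function names are made up for this example.

```python
import shutil
import subprocess


def format_seconds(seconds):
    """Render a number of seconds as hh:mm:ss."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


def proxy_seconds_left():
    """Return the remaining proxy lifetime in seconds, or None if unknown."""
    if shutil.which("grid-proxy-info") is None:
        return None  # the grid tools are not on the PATH
    # Assumption: "-timeleft" prints the remaining lifetime in seconds.
    result = subprocess.run(
        ["grid-proxy-info", "-timeleft"],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        universal_newlines=True)
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None  # no proxy, or output in an unexpected format


if __name__ == "__main__":
    left = proxy_seconds_left()
    if left is None or left <= 0:
        print("No valid proxy found; run grid-proxy-init first.")
    else:
        print("Proxy valid for another " + format_seconds(left))
```

A check like this is handy at the top of a submission script, so that a long list of jobs is not sent with a proxy that is about to expire.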

If you need to create a proxy with a different validity time you can do this with

 grid-proxy-init -valid hh:mm

However you will not need this for these exercises.

Once you have finished communicating with the grid you can destroy your proxy (log out) by typing

grid-proxy-destroy

If you are participating in the ATLAS specific part of these exercises you should also create a voms proxy

voms-proxy-init -voms atlas

2) The ARC middleware, direct job submission

This part of the tutorial will guide you through how to obtain and use the ARC middleware. For the usage we will focus on the ARC command line client. For more information please see the ARC User Manual.

Getting the ARC middleware client

If you do not have a working copy of the ARC middleware, it can be obtained from the NorduGrid download page. Just click on the "Standalone client package" button.

  • Download and unpack the standalone client
  • Set up the package from inside the unpacked directory
    cd nordugrid-arc-standalone-...
    source setup.sh

Submitting a simple job

Consider the script hellogrid.sh provided in the examples directory of the examples tarball. In order to create a grid job that will execute this script we need a job description of the following form

& (executable=hellogrid.sh) 
(stdout=hello.out) 
(stderr=hello.err) 
(gmlog=gridlog) 
(cputime=10) 

Now save the above job description in hellogrid.xrsl and you are ready to send it to a cluster.
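
If you prefer to generate job descriptions from a script rather than typing them, a minimal standalone Python 3 sketch could look like this. The helper name make_xrsl is an assumption made for this example and is not part of ARC; it only reproduces the simple attribute=value form shown above.

```python
def make_xrsl(attributes):
    """Build an xRSL job description from (name, value) pairs."""
    lines = ["(%s=%s)" % (name, value) for name, value in attributes]
    return "& " + "\n".join(lines) + "\n"


# The same attributes as in the hellogrid job description above.
hello_job = [
    ("executable", "hellogrid.sh"),
    ("stdout", "hello.out"),
    ("stderr", "hello.err"),
    ("gmlog", "gridlog"),
    ("cputime", 10),
]

if __name__ == "__main__":
    # Write the description to the file name used in this tutorial.
    with open("hellogrid.xrsl", "w") as handle:
        handle.write(make_xrsl(hello_job))
    print(make_xrsl(hello_job))
```

Keeping the attributes in a list makes it easy to produce several variants of the same job, for example with different cputime values.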

To submit the job without specifying a cluster, run
ngsub hellogrid.xrsl
If you are using one of the test certificates, you should instead specify one of the following clusters:
ngsub -c morpheus.dcgc.dk hellogrid.xrsl
ngsub -c grid.uio.no hellogrid.xrsl

If the submission is successful you will be provided with a job-id of the form gsiftp://morpheus.dcgc.dk:2811/jobs/225321262768030254571653.
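
The job-id is an ordinary URL, so it can be taken apart with standard tools. As a small illustration (standalone Python 3, not an ARC feature; the function name is hypothetical), this splits a job-id into the cluster it was sent to, the port and the job number:

```python
from urllib.parse import urlparse


def describe_job_id(job_id):
    """Split a job-id URL into protocol, cluster host, port and job number."""
    parts = urlparse(job_id)
    return {
        "protocol": parts.scheme,
        "cluster": parts.hostname,
        "port": parts.port,
        # The job number is the last component of the path.
        "job_number": parts.path.rsplit("/", 1)[-1],
    }


info = describe_job_id(
    "gsiftp://morpheus.dcgc.dk:2811/jobs/225321262768030254571653")
print(info["cluster"], info["port"], info["job_number"])
```

This can be useful when you have many jobs and want to group them by the cluster they ended up on.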

To see the status of your jobs you can either

  • check the status of a specific job
    ngstat "job-id"
  • check the status of all your jobs
    ngstat -a

If you get an error message saying that the job was not found just wait a bit and try again.

Once a job has finished, its status will be either FINISHED (if successful) or FAILED (if an error happened).
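
The "wait a bit and try again" pattern can also be scripted. The sketch below (standalone Python 3) polls until one of the two final states named above is reached. The get_status callable is hypothetical: it stands in for whatever you use to read the status (for instance a wrapper around ngstat), and the intermediate state names in the demonstration are only illustrative.

```python
import time

FINAL_STATES = ("FINISHED", "FAILED")


def wait_for_job(get_status, interval=30.0, max_checks=120):
    """Poll get_status() until the job reaches a final state.

    get_status is a hypothetical callable returning the current status
    string, or None while the job is not yet known to the cluster.
    """
    for _ in range(max_checks):
        status = get_status()
        if status in FINAL_STATES:
            return status
        # Job not found yet, or still queued/running: wait and retry.
        time.sleep(interval)
    return None  # gave up without seeing a final state


# Demonstration with a canned sequence instead of a real cluster.
canned = iter([None, "ACCEPTED", "INLRMS:R", "FINISHED"])
print(wait_for_job(lambda: next(canned), interval=0))
```

The max_checks limit keeps the loop from running forever if a job gets stuck; in that case the function returns None and you can decide whether to kill the job.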

  • To retrieve the output from a specific job
    ngget "job-id"
  • To retrieve the output from all jobs
    ngget -a

Once the output from a job has been retrieved, all information about the job is removed from the cluster unless the --keep option is given to ngget. Try to resubmit a job and retrieve the output with
ngget --keep "job-id"
Then try to clean the job with
ngclean "job-id"
This command is also very useful if you want to clean up information about failed jobs.

In some cases it can be useful to kill a running job. This can be done with the ngkill command.

  • Extend the sleep command in hellogrid.sh and submit a job.
  • Then while the job is running try to kill it by executing
    ngkill "job-id"

More advanced jobs

In the previous job no files other than the job description were transferred to or from the cluster. In general, jobs will make use of both input and output data. A simple example of this is given in the following job description

& (executable=hellogrid2.sh) 
(inputfiles=("myinputfile.txt" ""))
(outputfiles=("myoutputfile.txt" ""))
(jobname=hellogrid2) 
(stdout=hello.out) 
(stderr=hello.err) 
(gmlog=gridlog) 
(cputime=10) 

In this example the job expects an input file called myinputfile.txt and produces an output file called myoutputfile.txt. For the input files, the second parameter is the path to the input file as seen from where the job submission is done ("" means that the input file should be found in the current directory).

For the output files, the second parameter is the destination of the output file, i.e. once the job has finished the file named by the first parameter is copied to the location given by the second. If the second parameter is "", no transfer is done until you run the ngget command.

Try to submit this job and see that you get back the output files once you do the ngget command.
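
The (name, location) pairs used by inputfiles and outputfiles are easy to generate when a job has many files. A minimal standalone Python 3 sketch, where file_list is a helper name invented for this example:

```python
def file_list(attribute, pairs):
    """Format an xRSL file attribute from (name, location) pairs."""
    entries = ['("%s" "%s")' % (name, location) for name, location in pairs]
    return "(%s=%s)" % (attribute, " ".join(entries))


# "" as location: take the input from the submission directory, or keep
# the output on the cluster until ngget is run.
print(file_list("inputfiles", [("myinputfile.txt", "")]))
print(file_list("outputfiles", [("myoutputfile.txt", "")]))
```

The two printed lines match the corresponding lines of the hellogrid2 job description above.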

Instead of sending your whole program as input files, you can also make use of the software already installed on the cluster. The installed software (Run Time Environments) will of course depend on the cluster.

Upon job submission a cluster matching the Run Time Environments specified in the job description will be searched for and if no match is found the submission will not be successful.
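
The matching rule can be pictured as a simple subset test. The following standalone Python 3 sketch is a simplified model and not the actual ARC implementation; the cluster names and the second environment name are hypothetical, while APPS/GRAPH/POVRAY-3.6 is the environment used in the POVRAY example of this tutorial.

```python
def matching_clusters(required, clusters):
    """Return the names of clusters that offer every required environment."""
    needed = set(required)
    return sorted(name for name, installed in clusters.items()
                  if needed.issubset(installed))


# Hypothetical inventory; real clusters publish their own lists.
clusters = {
    "cluster-a.example.org": {"APPS/GRAPH/POVRAY-3.6", "ENV/EXAMPLE-1.0"},
    "cluster-b.example.org": {"ENV/EXAMPLE-1.0"},
}

print(matching_clusters(["APPS/GRAPH/POVRAY-3.6"], clusters))
print(matching_clusters(["APPS/MISSING-0.1"], clusters))
```

The second call returns an empty list, which corresponds to the case where the submission is not successful because no cluster provides the requested environment.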

In the following example we will try to use the program POVRAY. The input file skyvase.pov and the povray.sh script can be found in the examples/ directory. Try to submit the following job to morpheus.dcgc.dk

& (executable=povray.sh) 
(inputfiles=(skyvase.pov ""))
(outputfiles=(skyvase.png ""))
(runtimeenvironment="APPS/GRAPH/POVRAY-3.6")
(jobname=povray) 
(stdout=pov.out) 
(stderr=pov.err) 
(gmlog=gridlog) 
(cputime=10) 

3) The ganga job submission toolkit

This part of the tutorial will introduce you to ganga, a tool for defining and submitting grid jobs. If you don't know what ganga is, see the talk 'Introduction to ganga' at Spåtind 2010.

Most of the tutorial text will be found on another page. This page will refer you to the relevant sections of the full ganga tutorial, also available as a PDF on the internal Spåtind page.

Installing ganga

The first step is to install ganga. This should take less than a minute, depending on the network connection.

If you have a Linux system that is compatible with either RedHat Linux 4 or 5 (RHEL4/5), or equivalently Scientific Linux 4 or 5 (SLC4/5), then we recommend that you install ganga on your local machine. (Other Linux flavours are not supported at the moment, because ganga comes with a number of external dependencies that are only available for these systems.)

If not, ssh into a compatible system, e.g. lxplus at CERN if you have an account there.

Then, follow these steps:

  1. Make a directory for ganga, e.g. ~/ganga/ and cd into it.
  2. Get the ganga install script, like this:
    wget http://cern.ch/ganga/download/ganga-install
  3. Make the script executable: chmod u+x ganga-install
  4. If you're on RHEL5 or SLC5, run the following command:
     ./ganga-install --extern=GangaAtlas,GangaNG --platf=i686-slc5-gcc43-opt --prefix=${PWD} 5.4.3 
  5. If you're on a RHEL4 or SLC4 machine, the platform string (--platf) can be skipped. Run this instead:
     ./ganga-install --extern=GangaAtlas,GangaNG --prefix=${PWD} 5.4.3 

This should install ganga on your system.
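
If you install ganga on several machines, the choice between the two commands above can be wrapped in a small helper. This is only a sketch in standalone Python 3; the function name and the slc5 flag are assumptions made for this example, and the options are copied from the steps above.

```python
import os


def ganga_install_command(version, prefix, slc5):
    """Return the ganga-install argument list for an SLC4 or SLC5 host."""
    command = ["./ganga-install", "--extern=GangaAtlas,GangaNG"]
    if slc5:
        # Only RHEL5/SLC5 needs the explicit platform string.
        command.append("--platf=i686-slc5-gcc43-opt")
    command += ["--prefix=" + prefix, version]
    return command


# Print the command for an SLC5 machine, installing into the current directory.
print(" ".join(ganga_install_command("5.4.3", os.getcwd(), slc5=True)))
```

The returned list can be passed directly to subprocess.call if you want the script to run the installation as well.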

Things to note:

  1. 5.4.3 is the latest version as of Spåtind 2010. Check the ganga homepage to see the latest version at any given time.
  2. The install option
    --extern=GangaAtlas,GangaNG
    ensures you get all the external software you need to control ARC jobs via ganga. One of these is the ARC middleware itself, so if you use ganga you don't need to install it separately.

Finally, the install script ends by telling you the path to the ganga executable. Make a note of this - in the following we will call it /path/to/ganga. Feel free to add an alias, like this:

alias ganga="/path/to/ganga" (for bash)

or

alias ganga /path/to/ganga (for csh)

Setting ganga up for submitting ARC jobs

Before running Ganga properly, we will create a configuration script that you can use to alter the way Ganga behaves:

ganga -g

This creates a file '.gangarc'. The options given in here will override anything else (except the command line).

To activate GangaNG and ARC support, you need to make one change to this file:

Close to the top of the file there's a line like this:

#RUNTIME_PATH =

Replace this line with

RUNTIME_PATH = GangaAtlas:GangaNG

Your ganga is now set up for ARC usage.
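
The same edit can be made with a short script instead of a text editor. The sketch below is standalone Python 3 and works on the file contents as a string; the function name and the sample text are made up for this example, and a real .gangarc contains many more lines.

```python
def enable_ganga_ng(config_text):
    """Uncomment RUNTIME_PATH and point it at GangaAtlas and GangaNG."""
    updated = []
    for line in config_text.splitlines():
        if line.strip().startswith("#RUNTIME_PATH"):
            line = "RUNTIME_PATH = GangaAtlas:GangaNG"
        updated.append(line)
    return "\n".join(updated) + "\n"


# Hypothetical two-line excerpt standing in for the real file.
sample = "#RUNTIME_PATH =\nSOME_OTHER_OPTION = 1\n"
print(enable_ganga_ng(sample))
```

To apply it to your real configuration, read the contents of .gangarc, pass them through enable_ganga_ng and write the result back, preferably after making a backup copy of the file.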

In addition, to safeguard against some potential grid certificate problems, please do the following before starting ganga (in the terminal where you plan to start it):

cd /where/you/installed/ganga/external/nordugrid-arc-standalone/0.6.5/*/
source setup.sh

First steps with ganga

NOTE This section is only designed to give a brief overview of basic Ganga functionality. For more information, see the Ganga website!

Starting Ganga

We now assume you have a working ganga available.

NB: If you are an ATLAS physicist working on lxplus at CERN, there is an automatic script that sets the correct environment and finds the latest stable release of Ganga. To set up Ganga on lxplus just type the following command from a clean shell:

source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh  # (or .csh)

You can also specify which version of Ganga to use by issuing this command instead:

source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh 5.4.3

Now please start ganga:

/path/to/ganga

(or if you've put it in your path, just ganga)

This should present you with something similar to:


*** Welcome to Ganga ***
Version: Ganga-5-3-6
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.


ATLAS Distributed Analysis Support is provided by the "Distributed Analysis Help" HyperNews forum. You can find the forum at
    https://hypernews.cern.ch/HyperNews/Atlas/get/distAnalysisHelp.html
or you can send an email to hn-atlas-dist-analysis-help@cern.ch

GangaAtlas                         : INFO     Found 0 tasks
Ganga.GPIDev.Lib.JobRegistry       : INFO     Found 0 jobs in "jobs", completed in 0 seconds
Ganga.GPIDev.Lib.JobRegistry       : INFO     Found 0 jobs in "templates", completed in 0 seconds

********************************************************************
New in 5.2.0: Change the configuration order w.r.t. Athena.prepare()
              New Panda backend schema - not backwards compatible
For details see the release notes or the wiki tutorials
********************************************************************

In [1]:

You can quit Ganga at any point using Ctrl-D.

Getting Help

Ganga is based completely on Python and so the usual Python commands can be entered at the IPython prompt. For the specific Ganga related parts, however, there is an online help system that can be accessed using:

In [1]: help() 
************************************

*** Welcome to Ganga ***
Version: Ganga-5-3-5
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.


This is an interactive help based on standard pydoc help.

Type 'index'  to see GPI help index.
Type 'python' to see standard python help screen.
Type 'interactive' to get online interactive help from an expert.
Type 'quit'   to return to Ganga.
************************************

help>

Type 'index' at the prompt to see the Class list available. Then type the name of the particular object you're interested in to see the associated help. You can use 'q' to quit the entry you're currently viewing (though there is currently a bug that displays help on a 'NoneType' object!). You can also do this directly from the IPython prompt using:

In [1]: help(Job)

You might find it useful at this point to have a look in the help system about the following classes that we will be using:

Job
Athena
AthenaMC
ATLASLocalDataset
ATLASOutputDataset
AthenaJobSplitter
DQ2Dataset
DQ2OutputDataset
DQ2JobSplitter
LCG
Panda
NG

Your First Job

We will start with a very basic Hello World job that will run on the machine you are currently logged in on. This will hopefully start getting you used to the way Ganga works. Create a basic job object with default options and view it:

In [1]: j = Job()
In [2]: j

This should give an output similar to:

Out[6]: Job (
 status = 'new' ,
 name = '' ,
 inputdir = '/home/slater/gangadir/workspace/mws/LocalAMGA/0/input/' ,
 outputdir = '/home/slater/gangadir/workspace/mws/LocalAMGA/0/output/' ,
 outputsandbox = [] ,
 id = 0 ,
 info = JobInfo (
    submit_counter = 0
    ) ,
 inputdata = None ,
 merger = None ,
 inputsandbox = [] ,
 application = Executable (
    exe = 'echo' ,
    env = {} ,
    args = ['Hello World']
    ) ,
 outputdata = None ,
 splitter = None ,
 subjobs = 'Job slice:  jobs(0).subjobs (0 jobs)
' ,
 backend = Local (
    actualCE = '' ,
    workdir = '' ,
    nice = 0 ,
    id = -1 ,
    exitcode = None
    )
 )

Note that by just typing the job variable ('j'), IPython tries to print the information regarding it. For the job object, this is a summary of the object that Ganga uses to manage your job. It includes the following parts:

  • application - The type of application to run
  • backend - Where to run
  • inputsandbox/outputsandbox - The files required for input and output that will be sent with the job
  • inputdata/outputdata - The required dataset files to be accessed by the job
  • splitter - How to split the job up into several subjobs
  • merger - How to merge the completed subjobs

For this job, we will be using a basic 'Executable' application ('echo') with the arguments 'Hello World'. There is no input or output data, so these are not set. We'll now submit the job:

In [3]: j.submit()

If all is well, the job will be submitted and you can then check its progress using the following:

In [4]: jobs

This will show a summary of all the jobs currently running. Your basic Hello World job will go through the following stages: 'submitted', 'running', 'completing' and 'completed'. When your job has reached the completed state, the standard output and error output are transferred to the output directory of the job (as listed in the job object). There are several ways to check this output. First, we will use the 'peek' function of the job object:

In [5]: j.peek()

This function can also be used to look at specific files:

In [6]: j.peek("stdout")

The shell command 'less' is used by default. To use other programs, you can specify them as a second argument:

In [7]: j.peek("stdout", "emacs")

You can also use the exclamation mark (!) to directly access shell commands and the dollar sign to use 'python' variables. The above commands could also be carried out using:

In [8]: !ls $j.outputdir
In [9]: !emacs $j.outputdir/stdout

Using either of these two methods, view the stdout file. With any luck, you will see the Hello World message. Congratulations, you've run your first job!

Creating Submission Scripts

Clearly, it would be very tedious if you had to keep typing out the same text to submit a job and so there is scripting available within Ganga. To test this, let's try three types of Hello World job in one go. Create a file called 'first_job.py' and copy the following into it (Mini-test: can you see what's going on in each case?):

j = Job()
j.submit()

j = Job()
j.application.args=['Hello Another World', '42', 'My aunt is a Neptunian giant hamster']
j.submit()

j = Job()
j.application.exe='python'
j.application.args=['-c','print "Hello pythons"']
j.submit()

Then, from within Ganga, you can use the 'execfile' command to execute the script:

In [1]: execfile('first_job.py')

You can also run Ganga in batch mode by doing the following:

ganga first_job.py

At this point, just to show the persistency of your jobs, quit and restart Ganga. Your jobs will be preserved just as you left them!

More Advanced Job Manipulation

To finish off, we will cover some useful features of managing jobs. This is a fairly brief overview and a more complete list can be found at: http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/

Copying Jobs

You can copy a job regardless of its status using the following:

j = Job()
j2 = j.copy()

The copied job can be submitted regardless of the original job's status. Consequently, you can do the following:

j = jobs(3).copy()
j.submit()

Job Status

Jobs can be killed and then resubmitted using the following:

j.kill()
j.resubmit()

The status of a job can be forced (e.g. if you think it has hung and you want to set it to failed) using the following:

j.force_status('failed')

Removing Jobs

To clean up your job repository, you can remove jobs using the 'remove' method:

j.remove()

Configuration Options

You can supply different configuration options for Ganga at startup through the .gangarc file. If you wish to change things on the fly however, you can use (there are examples in the next section):

config[section][parameter] = value

To show what the current settings are, just use the straight config value as with jobs:

config[section]

Interactively, the 'config' object behaves as a class so you can do the above using:

config.section.parameter = value

This also allows you to tab-complete your commands.

A simple ARC job

You should now have a basic feel for what ganga can do. The next step is to send a job to the grid. This requires only one further line of code in your job definition - try the following:

j = Job()
j.backend=NG()
j.submit()

You will notice that job submission now takes a bit longer. This is because ganga now uses the ARC middleware to submit the job to the set of grid resources that you are authorized to use. After a few seconds, your job should be accepted by some grid site, and will be queued there for execution. After a while it should finish, and the output will be copied back to the computer where you're running ganga.

NB: You have to keep ganga running for this to happen. You can turn it off and it will pick up all done jobs when you restart it, but if you keep it off for >24h the grid site will regard the job as orphaned and delete the output.

A slightly more complex ARC job

Finally, let's try a more advanced job that uses a few more features of ganga. For this you need something for your job to do - start by downloading these two files: bgwrapper.sh and BattleGrid.tar.gz.

This is the source code for a simplistic gladiator fight simulator on a square arena. To see what it does, try the following command:

./bgwrapper.sh -B -x 25 -y 25 -f 100 --printfield -w 0.2

The goal now is to run a number of such simulations on grid sites. Create a file with the following ganga job description:

# Make a job
j = Job()

# Give the job a reasonable name
j.name = 'BattleGrid'

# Set the executable application...
j.application=Executable()
# ...and in this case specify it to be 'source'
j.application.exe='source'

# Add some input files to the job
j.inputsandbox=['./bgwrapper.sh','./BattleGrid.tar.gz']

# Set up a job splitter, which will create a number of subjobs
j.splitter=ArgSplitter()
j.splitter.args.append('bgwrapper.sh -B -x 100 -y 100 -f 100 -N 10')
j.splitter.args.append('bgwrapper.sh -B -x 200 -y 200 -f 200 -N 10')
j.splitter.args.append('bgwrapper.sh -B -x 100 -y 100 -f 100 -N 50')
j.splitter.args.append('bgwrapper.sh -B -x 1000 -y 1000 -f 1000 -N 1')

# Specify the ARC/NorduGrid backend
j.backend=NG()

# Submit the job
j.submit()

Now execute this file (execfile('jobfile.py')) in ganga, and you should get four simulations running with various parameters.
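
For larger parameter scans it quickly becomes tedious to write one append line per subjob. As a sketch (plain Python 3, runnable outside ganga; the helper name bgwrapper_args is invented for this example), the argument strings can be generated from a list of values:

```python
def bgwrapper_args(x, y, f, n):
    """Build one bgwrapper.sh argument string for the given option values."""
    return "bgwrapper.sh -B -x %d -y %d -f %d -N %d" % (x, y, f, n)


# (x, y, f, N) values copied from the job description above.
scan = [(100, 100, 100, 10), (200, 200, 200, 10),
        (100, 100, 100, 50), (1000, 1000, 1000, 1)]

subjob_args = [bgwrapper_args(*point) for point in scan]
for line in subjob_args:
    print(line)
```

Inside a ganga job file the same list could then be fed to the splitter with a loop over subjob_args that calls j.splitter.args.append for each entry, in place of the four explicit append lines.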

This concludes the basic part of the tutorial. The next section is most relevant to ATLAS physicists because of restrictions to data files etc., but feel free to read through the tutorial linked there to get further ideas about what ganga can do. We've only scratched the surface in this short course.

4) ATLAS jobs with ganga

If you're an ATLAS physicist and want to try using the grid, please refer to the official tutorial which is here:

Ganga tutorial within the context of the ATLAS experiment

This takes you through every step required to do grid analysis within ATLAS, from setting things up on lxplus to advanced ganga configurations.

Links and references

Contact info

Name                   Email                                    Main area
Peter Rosendahl        Peter.Rosendahl@iftNOSPAMPLEASE.uib.no   ARC
Martin Skou Andersen   skou@nbiNOSPAMPLEASE.dk                  ARC
Bjorn H. Samset        b.h.samset@fysNOSPAMPLEASE.uio.no        Ganga, ATLAS analysis

-- BjornS - 29-Dec-2009

Topic revision: r11 - 2010-01-08 - WolfgangWalkowiak
 