Difference: GangaTutorial1 (1 vs. 14)

Revision 14 2015-01-28 - NikitaKazeev

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

Line: 48 to 48
 In [1]:
Changed:
<
<
You should now see the Ganga prompt! Check to make sure that the application for this tutorial was loaded (we need PrimeFactorizeer):
>
>
You should now see the Ganga prompt! Check to make sure that the application for this tutorial was loaded (we need PrimeFactorizer):
 
In [1]:plugins('applications')
Out[1]: ['GaudiPython', 'Executable', 'Brunel', 'Moore', 'DaVinci', 'Panoptes', 'Gauss', 'Boole', 'Gaudi', 'Vetra', 'Root', 'Euler', 'PrimeFactorizer']

Revision 13 2013-02-25 - PatrickOwen

Line: 186 to 186
 For LHCb jobs, similar splitters are provided to split jobs up which run on multiple data files, etc.

We also want to add a merger to merge the output from each of the 5 subjobs:

Changed:
<
<
In [7]: j.merger = TextMerger(files=['factors-118020903911855744138963610.dat'])

>
>
In [7]: j.postprocessors = TextMerger(files=['factors-118020903911855744138963610.dat'])

 

When all 5 subjobs are complete, the merger will merge the contents of each of the 5 factors-118020903911855744138963610.dat files into a single file in the master job's output directory (we'll look at what this means below).
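
The split-then-merge pattern described here can be sketched in plain Python, outside Ganga. This is only an illustration: split_table_ids and merge_text are hypothetical helpers, and neither the way PrimeFactorizerSplitter assigns tables to subjobs nor the exact layout TextMerger gives the merged file is stated in this tutorial.

```python
def split_table_ids(lower, upper, numsubjobs):
    """Divide table ids lower..upper (inclusive) into contiguous chunks, one per subjob."""
    ids = list(range(lower, upper + 1))
    size, extra = divmod(len(ids), numsubjobs)
    chunks = []
    start = 0
    for i in range(numsubjobs):
        # The first `extra` chunks take one additional id when the split is uneven.
        end = start + size + (1 if i < extra else 0)
        chunks.append(ids[start:end])
        start = end
    return chunks


def merge_text(outputs):
    """Concatenate per-subjob text outputs in subjob order, one per line."""
    return "\n".join(text.strip() for text in outputs) + "\n"


# Hypothetical assignment of the 15 tables to 5 subjobs.
chunks = split_table_ids(1, 15, 5)

# Per-subjob results as quoted later in this tutorial.
subjob_outputs = [
    "[(2, 1), (3, 1), (5, 1), (7, 1), (15485867, 1)]",
    "[]",
    "[(141650963, 1)]",
    "[]",
    "[(256203221, 1)]",
]
merged = merge_text(subjob_outputs)
```

Each subjob only ever sees its own chunk of tables, which is why some subjobs report an empty list of factors.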

Line: 246 to 246
 As an example of running on the grid, we'll run the same set of jobs we ran above but using a different backend. We could retype all of the required information from the previous job definition or, better yet, use the TAB and arrow-UP functionality to re-enter it. An easier way is to copy the previous Job object and then change the backend so that the jobs run on Dirac:
In [14]: j = j.copy()  # we could've also used Job(j), etc.
In [15]: j.backend = Dirac()

Changed:
<
<
In [16]: j.outputsandbox = ['factors-118020903911855744138963610.dat']
>
>
In [16]: j.outputfiles = [SandboxFile('factors-118020903911855744138963610.dat')]
 
Changed:
<
<
Notice that we had to add the factors file to the outputsandbox. This is to tell ganga that we want this file returned to us after the grid job is completed.
>
>
Notice that we had to add a SandboxFile to the outputfiles. This tells Ganga that we want this file returned to us after the grid job has completed.
  Now just submit the jobs the same way as before (since this is the 1st time we've done something that requires a grid proxy, you'll be asked for your grid password if you don't currently have a valid grid proxy on this machine):
In [17]: j.submit()

Revision 12 2012-12-17 - JackWimberley

Line: 239 to 239
 

Running Ganga Jobs on the Grid

Added:
>
>
NOTE: As of 17 Dec 2012 this appears to be broken: Ganga complains that is_prepared is not set for PrimeFactorizer. This error may be why there is an attached document with a corrected PrimeFactorizer.py file, which is supposed to be an updated version for Ganga > 5.7 (the current version is 5.8).
 For many LHCb jobs (which often involve processing large amounts of data), running concurrently on one machine isn't enough. When a job requires a large number of CPUs, we need the grid! Specifically, we want to run on the LHC Computing Grid (LCG). For LHCb jobs, this involves the DIRAC workload manager.

As an example of running on the grid, we'll run the same set of jobs we ran above but using a different backend. We could retype all of the required information from the previous job definition or, better yet, use the TAB and arrow-UP functionality to re-enter it. An easier way is to copy the previous Job object and then change the backend so that the jobs run on Dirac:

Revision 11 2012-12-13 - JackWimberley

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

Added:
>
>
 We start with a very simple problem which doesn't require knowledge of any other LHCb software: finding the prime factors of integers. Be careful what you copy and paste from this wiki, as Python doesn't like stray whitespace (it's best if you don't copy and paste anything).

Line: 8 to 9
 

Ganga Setup

First, set the environment for Ganga in a fresh terminal:

Changed:
<
<
SetupProject Ganga

>
>
SetupProject Ganga

 
Changed:
<
<
which gives you the latest version (5.4.3 at time of writing). If you have last used ganga before version 5.4 you may need to do
ganga -g

>
>
which gives you the latest version (5.4.3 at time of writing). If you have last used ganga before version 5.4 you may need to do
ganga -g

 
Added:
>
>
 to update your .gangarc file to include some of the newer options.
Changed:
<
<
For this tutorial, we need to add an extra Ganga configuration file, so do
setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:/afs/cern.ch/user/j/jwilliam/public/GangaTutorial/Tutorial.ini
>
>
For this tutorial, we need to add an extra Ganga configuration file, so depending on your shell do
setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:/afs/cern.ch/user/j/jwilliam/public/GangaTutorial/Tutorial.ini

or for bash-like shells

export GANGA_CONFIG_PATH=${GANGA_CONFIG_PATH}:/afs/cern.ch/user/j/jwilliam/public/GangaTutorial/Tutorial.ini
 If you don't have access to /afs/cern.ch/user/j/jwilliam/public/GangaTutorial (e.g. you're using a machine that doesn't have afs or if something is wrong w/ my afs account), then you can do everything in this tutorial except submitting to the grid by simply doing this instead:
Changed:
<
<
setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:GangaTutorial/Tutorial.ini
You will not need to do either of these for standard LHCb running.
>
>
setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:GangaTutorial/Tutorial.ini

or for bash-like shells

export GANGA_CONFIG_PATH=${GANGA_CONFIG_PATH}:GangaTutorial/Tutorial.ini

You will not need to do either of these for standard LHCb running.

  Now start an interactive session:
Changed:
<
<
ganga

>
>
ganga

 
Changed:
<
<
*** Welcome to Ganga *
>
>
*** Welcome to Ganga *
 Version: Ganga-5-4-3 Documentation and support: http://cern.ch/ganga Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it under certain conditions; type license() for details.

Changed:
<
<
>
>
 In [1]:
Added:
>
>
 You should now see the Ganga prompt! Check to make sure that the application for this tutorial was loaded (we need PrimeFactorizeer):
Changed:
<
<
In [1]:plugins('applications')
Out[1]: ['GaudiPython', 'Executable', 'Brunel', 'Moore', 'DaVinci', 'Panoptes', 'Gauss', 'Boole', 'Gaudi', 'Vetra', 'Root', 'Euler', 'PrimeFactorizer']

>
>
In [1]:plugins('applications')
Out[1]: ['GaudiPython', 'Executable', 'Brunel', 'Moore', 'DaVinci', 'Panoptes', 'Gauss', 'Boole', 'Gaudi', 'Vetra', 'Root', 'Euler', 'PrimeFactorizer']

 
Added:
>
>
 Ignore the warning if you don't have a valid Grid proxy (you should only see this once; we'll create the proxy when we need it below). You can check which plugins are available to you in each category in your current Ganga session using plugins. Try using the help utility to see if you can figure out how to list all of the available plugins in all categories:
Changed:
<
<
In [2]:help(plugins)

>
>
In [2]:help(plugins)

 
Added:
>
>
 This opens the help text in the pager less, so type q to exit. Ganga provides help information on just about every object, method, etc. Try this first if you get stuck.

A Local GangaTutorial

Line: 61 to 67
  The Tutorial.ini file should be edited to look something like this.
Changed:
<
<
[Configuration]

>
>
[Configuration]

 RUNTIME_PATH = /afs/cern.ch/user/a/auser/public/GangaTutorial
Line: 70 to 75
 
setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:/afs/cern.ch/user/a/auser/public/GangaTutorial/Tutorial.ini
Changed:
<
<
Or for bash-like shells
>
>
or for bash-like shells
 
export GANGA_CONFIG_PATH=${GANGA_CONFIG_PATH}:/afs/cern.ch/user/a/auser/public/GangaTutorial/Tutorial.ini
Line: 82 to 86
 

Running a Factorization Job using Ganga

Let's start with a small example. The goal is to find the prime factors of the integer 1925. For such a small number, we (clearly) only need the first prime number data table (recall that each table contains 1 million prime numbers). At the Ganga prompt, type the following:

Changed:
<
<
In [1]: j = Job()

>
>
In [1]: j = Job()

 In [2]: j.application = PrimeFactorizer(number=1925)
 In [3]: j.inputdata = PrimeTableDataset(table_id_lower=1, table_id_upper=1)

At this point, we've created a Job object but we haven't run anything yet. We're free to edit its attributes as much as we like prior to submitting the job. For actual LHCb jobs, the application might be DaVinci while the inputdata could be a list of LHCb data files. The idea and most of the syntax are the same though as in this simple example. To see all of the job's attributes, do

Changed:
<
<
In [4]:j
Out[4]: Job (

>
>
In [4]:j
Out[4]: Job (

  status = 'new' ,
  name = '' ,
  ...
  backend = Local ( ... )
Changed:
<
<
)
>
>
)
 
Added:
>
>
 Notice that the backend is set to Local (which is the default value since we didn't specify where we wanted the job to run). This means that the job will run in the background on the local machine.

OK, let's submit the job:

Line: 101 to 104
 Notice that the backend is set to Local (which is the default value since we didn't specify where we wanted the job to run). This means that the job will run in the background on the local machine.

OK, let's submit the job:

Changed:
<
<
In [5]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 0

>
>
In [5]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 0

 Ganga.GPIDev.Adapters : INFO submitting job 0 to Local backend
Changed:
<
<
Ganga.GPIDev.Lib.Job : INFO job 0 status changed to "submitted"
>
>
Ganga.GPIDev.Lib.Job : INFO job 0 status changed to "submitted"
 

We can check the status of the job by doing

Changed:
<
<
In [6]:j.status

>
>
In [6]:j.status

 
Added:
>
>
 This will either be submitted, running or completed. If the job hasn't finished yet, wait for a few seconds and check again (for such a small number, the job should finish very quickly).
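
Rather than rechecking by hand, the wait can be written as a small polling loop. This is a sketch under assumptions: wait_until_finished is a hypothetical helper, it relies only on the job object exposing the status attribute shown above, and it assumes Ganga keeps updating that attribute while the loop runs.

```python
import time


def wait_until_finished(job, poll_seconds=5.0, timeout_seconds=300.0):
    """Poll job.status until it leaves 'submitted'/'running' or the timeout expires."""
    deadline = time.time() + timeout_seconds
    while job.status in ("submitted", "running"):
        if time.time() >= deadline:
            break  # give up waiting; the caller sees the last known status
        time.sleep(poll_seconds)
    return job.status
```

At the Ganga prompt this could be used as wait_until_finished(j); the returned string is whatever status the job had when the loop stopped.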

We can see what files were output by the job by doing

Line: 115 to 117
 This will either be submitted, running or completed. If the job hasn't finished yet, wait for a few seconds and check again (for such a small number, the job should finish very quickly).

We can see what files were output by the job by doing

Changed:
<
<
In [7]:j.peek()
total 8.0K

>
>
In [7]:j.peek()
total 8.0K

 -rw-r--r-- 1 jwilliam z5 0 Jan 9 14:24 syslog
 -rw-r--r-- 1 jwilliam z5 232 Jan 9 14:24 stdout
 -rw-r--r-- 1 jwilliam z5 4.9K Jan 9 14:24 stderr
 -rw-r--r-- 1 jwilliam z5 86 Jan 9 14:24 jobstatus
Changed:
<
<
-rw-r--r-- 1 jwilliam z5 26 Jan 9 14:24 factors-1925.dat
>
>
-rw-r--r-- 1 jwilliam z5 26 Jan 9 14:24 factors-1925.dat
 
Added:
>
>
 All Ganga jobs return the standard output and error in the files stdout and stderr. This job has also produced the file factors-1925.dat. We can view the contents of this file using
Changed:
<
<
In [8]:j.peek('factors-1925.dat')

>
>
In [8]:j.peek('factors-1925.dat')

 
Added:
>
>
 which opens the file using less (use standard less commands to scroll etc., type q to quit).

The file should contain the factors [(5, 2), (7, 1), (11, 1)], let's check if this is correct:

Line: 131 to 133
 which opens the file using less (use standard less commands to scroll etc., type q to quit).

The file should contain the factors [(5, 2), (7, 1), (11, 1)], let's check if this is correct:

Changed:
<
<
In [9]:(5**2)*7*11 == 1925
Out[9]: True

>
>
In [9]:(5**2)*7*11 == 1925
Out[9]: True

 
Added:
>
>
 Remember, standard python syntax works at the Ganga prompt!
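
The same check can be made directly from the text of the factors file instead of retyping the numbers. A minimal sketch, assuming the file holds a single Python list literal of (prime, exponent) pairs as shown above; product_of_factors is a hypothetical helper name.

```python
import ast


def product_of_factors(text):
    """Parse a list of (prime, exponent) pairs and multiply the prime powers together."""
    factors = ast.literal_eval(text.strip())
    result = 1
    for prime, exponent in factors:
        result *= prime ** exponent
    return result


# Contents of factors-1925.dat as quoted in this tutorial.
contents = "[(5, 2), (7, 1), (11, 1)]\n"
is_correct = product_of_factors(contents) == 1925
```

ast.literal_eval only accepts literals, so it is a safer choice than eval for text read from a file.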

OK, so we've run a job and checked the output using Ganga's magic but for a real analysis you'll often want direct access to the file. So, where is factors-1925.dat? It's in the job's output directory. You can obtain the full path of this directory via

Line: 138 to 140
 Remember, standard python syntax works at the Ganga prompt!

OK, so we've run a job and checked the output using Ganga's magic but for a real analysis you'll often want direct access to the file. So, where is factors-1925.dat? It's in the job's output directory. You can obtain the full path of this directory via

Changed:
<
<
In [10]:j.outputdir
Out[10]: /afs/cern.ch/user/j/jwilliam/gangadir/workspace/jwilliam/LocalAMGA/0/output/

>
>
In [10]:j.outputdir
Out[10]: /afs/cern.ch/user/j/jwilliam/gangadir/workspace/jwilliam/LocalAMGA/0/output/

 

This is a normal directory that you own; thus, you have permission to access the files there from a process independent of Ganga. So, you could exit Ganga and examine factors-1925.dat using, e.g., cat on the Linux command line...or, you could do this from Ganga. You can access shell commands from the Ganga prompt using ! as follows:

Changed:
<
<
In [11]:!ls ~/.globus
usercert.pem  userkey.pem

>
>
In [11]:!ls ~/.globus
usercert.pem  userkey.pem

  In [12]:!cat $j.outputdir/factors-1925.dat
Changed:
<
<
[(5, 2), (7, 1), (11, 1)]
>
>
[(5, 2), (7, 1), (11, 1)]
 
Added:
>
>
 Notice that you can use the $ character to access python variables when using the ! to access shell!

A few other basic convenience features which you can play around with involve scrolling through the history and using the TAB completion. Try using the arrow-UP to scroll through the history of the Ganga commands you've executed so far (works the same as when in a shell). You can use TAB completion on keywords, variables, objects, etc. Try the following (where TAB and arrow-UP mean hit those keys, don't type it out):

Line: 155 to 156
  A few other basic convenience features which you can play around with involve scrolling through the history and using the TAB completion. Try using the arrow-UP to scroll through the history of the Ganga commands you've executed so far (works the same as when in a shell). You can use TAB completion on keywords, variables, objects, etc. Try the following (where TAB and arrow-UP mean hit those keys, don't type it out):
Changed:
<
<
In [13]:j.app<TAB>

>
>
In [13]:j.app<TAB>

  In [13]:j.application
Line: 172 to 171
  Now that you've seen some of the basics of Ganga, let's try something a little more interesting - factorizing a very large integer. For this we'll need a PrimeTableDataset which contains all 15 tables of prime numbers. To speed things up, we will also split the job into 5 local subjobs which will run concurrently.
Deleted:
<
<
 First, define a job as before but w/ a larger number and using all 15 prime number tables (feel free to use the arrow-UP and TAB keys to do this instead of typing it all out!):
Changed:
<
<
In [1]: j = Job()

>
>
In [1]: j = Job()

 In [2]: j.application = PrimeFactorizer(number=118020903911855744138963610)
 In [3]: j.inputdata = PrimeTableDataset()
 In [4]: j.inputdata.table_id_lower = 1
Line: 183 to 180
 

Now add a splitter to divide up the task of finding all the prime factors (here we'll make 5 subjobs):

Changed:
<
<
In [6]: j.splitter = PrimeFactorizerSplitter(numsubjobs=5)

>
>
In [6]: j.splitter = PrimeFactorizerSplitter(numsubjobs=5)

 
Added:
>
>
 For LHCb jobs, similar splitters are provided to split jobs up which run on multiple data files, etc.

We also want to add a merger to merge the output from each of the 5 subjobs:

Line: 189 to 186
 For LHCb jobs, similar splitters are provided to split jobs up which run on multiple data files, etc.

We also want to add a merger to merge the output from each of the 5 subjobs:

Changed:
<
<
In [7]: j.merger = TextMerger(files=['factors-118020903911855744138963610.dat'])

>
>
In [7]: j.merger = TextMerger(files=['factors-118020903911855744138963610.dat'])

 
Added:
>
>
 When all 5 subjobs are complete, the merger will merge the contents of each of the 5 factors-118020903911855744138963610.dat files into a single file in the master job's output directory (we'll look at what this means below).

OK, now submit the job (actually, the 5 jobs) just like we did above:

Line: 195 to 192
 When all 5 subjobs are complete, the merger will merge the contents of each of the 5 factors-118020903911855744138963610.dat files into a single file in the master job's output directory (we'll look at what this means below).

OK, now submit the job (actually, the 5 jobs) just like we did above:

Changed:
<
<
In [8]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 1

>
>
In [8]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 1

 Ganga.GPIDev.Adapters : INFO submitting job 1.0 to Local backend
 Ganga.GPIDev.Lib.Job : INFO job 1.0 status changed to "submitted"
 Ganga.GPIDev.Adapters : INFO submitting job 1.1 to Local backend
Line: 207 to 203
 Ganga.GPIDev.Adapters : INFO submitting job 1.3 to Local backend
 Ganga.GPIDev.Lib.Job : INFO job 1.3 status changed to "submitted"
 Ganga.GPIDev.Adapters : INFO submitting job 1.4 to Local backend
Changed:
<
<
Ganga.GPIDev.Lib.Job : INFO job 1.4 status changed to "submitted"
>
>
Ganga.GPIDev.Lib.Job : INFO job 1.4 status changed to "submitted"
 

You can check the status of all 5 jobs by simply doing:

Changed:
<
<
In [9]:j.status

>
>
In [9]:j.status

 
Added:
>
>
 If any of the jobs is still running, the status of the master job will be listed as running. If all 5 jobs are completed, the master job's status will also be completed. Wait until all 5 jobs are done (should take less than a minute) before moving on (in the mean time you can play around with help, e.g. try help(j.submit)...remember, type q to quit).

Once the jobs are complete, let's look at the output of one of the subjobs (do exactly what we did above):

Line: 217 to 213
 If any of the jobs is still running, the status of the master job will be listed as running. If all 5 jobs are completed, the master job's status will also be completed. Wait until all 5 jobs are done (should take less than a minute) before moving on (in the mean time you can play around with help, e.g. try help(j.submit)...remember, type q to quit).

Once the jobs are complete, let's look at the output of one of the subjobs (do exactly what we did above):

Changed:
<
<
In [10]:j.subjobs[2].peek()
total 18K

>
>
In [10]:j.subjobs[2].peek()
total 18K

 -rw-r--r-- 1 jwilliam z5 0 Jan 9 18:08 syslog
 -rw-r--r-- 1 jwilliam z5 564 Jan 9 18:08 stdout
 -rw-r--r-- 1 jwilliam z5 15K Jan 9 18:08 stderr
 -rw-r--r-- 1 jwilliam z5 86 Jan 9 18:08 jobstatus
Changed:
<
<
-rw-r--r-- 1 jwilliam z5 17 Jan 9 18:08 factors-118020903911855744138963610.dat
>
>
-rw-r--r-- 1 jwilliam z5 17 Jan 9 18:08 factors-118020903911855744138963610.dat
  In [11]:j.subjobs[2].peek('factors-118020903911855744138963610.dat')
Line: 231 to 227
 The file should contain the factor [(141650963, 1)]. Each of the j.subjobs is itself a Job (try printing it), so you can do anything you would do on an independent job on the subjobs.

Now examine the merged output of all the jobs:

Changed:
<
<
In [12]:j.peek()
total 2.0K

>
>
In [12]:j.peek()
total 2.0K

 -rw-r--r-- 1 jwilliam z5 653 Jan 9 18:08 factors-118020903911855744138963610.dat.merge_summary
Changed:
<
<
-rw-r--r-- 1 jwilliam z5 869 Jan 9 18:08 factors-118020903911855744138963610.dat
>
>
-rw-r--r-- 1 jwilliam z5 869 Jan 9 18:08 factors-118020903911855744138963610.dat
  In [13]:j.peek('factors-118020903911855744138963610.dat')
Deleted:
<
<
The file should contain the factors [(2, 1), (3, 1), (5, 1), (7, 1), (15485867, 1)] [] [(141650963, 1)] [] [(256203221, 1)] (some of the prime number tables don't contain any factors of this particular number). You can check if they're right on the Ganga prompt like we did above. Notice that the master job doesn't have the stdout and stderr files since itself was never actually run. In fact, had we not added the merger to the job there would be no output in the master job's directory.
 
Added:
>
>
The file should contain the factors [(2, 1), (3, 1), (5, 1), (7, 1), (15485867, 1)] [] [(141650963, 1)] [] [(256203221, 1)] (some of the prime number tables don't contain any factors of this particular number). You can check whether they're right at the Ganga prompt like we did above. Notice that the master job doesn't have the stdout and stderr files, since the master job itself was never actually run. In fact, had we not added the merger to the job, there would be no output in the master job's directory.
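
For a number this large the check is easier to write as a loop over the per-subjob lists. The sketch below hard-codes the five lists quoted above rather than reading the merged file, because the exact layout of the merged file is not shown here.

```python
# Factor lists reported by the five subjobs, as (prime, exponent) pairs.
merged = [
    [(2, 1), (3, 1), (5, 1), (7, 1), (15485867, 1)],
    [],
    [(141650963, 1)],
    [],
    [(256203221, 1)],
]

product = 1
for factors in merged:
    for prime, exponent in factors:
        product *= prime ** exponent

matches = product == 118020903911855744138963610
```

Python integers have arbitrary precision, so the 27-digit product is computed exactly.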
 

Running Ganga Jobs on the Grid

For many LHCb jobs (which often involve processing large amounts of data), running concurrently on one machine isn't enough. When a job requires a large number of CPUs, we need the grid! Specifically, we want to run on the LHC Computing Grid (LCG). For LHCb jobs, this involves the DIRAC workload manager.

As an example of running on the grid, we'll run the same set of jobs we ran above but using a different backend. We could retype all of the required information from the previous job definition or, better yet, use the TAB and arrow-UP functionality to re-enter it. An easier way is to copy the previous Job object and then change the backend so that the jobs run on Dirac:

Changed:
<
<
In [14]: j = j.copy()  # we could've also used Job(j), etc.

>
>
In [14]: j = j.copy()  # we could've also used Job(j), etc.

 In [15]: j.backend = Dirac()
 In [16]: j.outputsandbox = ['factors-118020903911855744138963610.dat']
Line: 255 to 250
 Notice that we had to add the factors file to the outputsandbox. This is to tell ganga that we want this file returned to us after the grid job is completed.

Now just submit the jobs the same way as before (since this is the 1st time we've done something that requires a grid proxy, you'll be asked for your grid password if you don't currently have a valid grid proxy on this machine):

Changed:
<
<
In [17]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 2

>
>
In [17]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 2

 Enter Certificate password:
 Ganga.GPIDev.Adapters : INFO submitting job 2.0 to Dirac backend
 Ganga.GPIDev.Lib.Job : INFO job 2.0 status changed to "submitted"
Line: 268 to 262
 Ganga.GPIDev.Adapters : INFO submitting job 2.3 to Dirac backend
 Ganga.GPIDev.Lib.Job : INFO job 2.3 status changed to "submitted"
 Ganga.GPIDev.Adapters : INFO submitting job 2.4 to Dirac backend
Changed:
<
<
Ganga.GPIDev.Lib.Job : INFO job 2.4 status changed to "submitted"
>
>
Ganga.GPIDev.Lib.Job : INFO job 2.4 status changed to "submitted"
 
Added:
>
>
 Congratulations! You've just submitted 5 jobs to the LCG grid via Dirac.

Let's check the status of the jobs:

Line: 273 to 268
 Congratulations! You've just submitted 5 jobs to the LCG grid via Dirac.

Let's check the status of the jobs:

Changed:
<
<
In [17]: j.subjobs
Out[2]: 

>
>
In [17]: j.subjobs
Out[2]: 

 Job slice: jobs(2).subjobs (5 jobs)
# fqid status name subjobs application backend backend.actualCE
Line: 284 to 278
 # 2.2 completed PrimeFactorizer Dirac LCG.GRIDKA.de
 # 2.3 completed PrimeFactorizer Dirac LCG.PIC.es
 # 2.4 completed PrimeFactorizer Dirac LCG.USC.es
Changed:
<
<
>
>
 
Added:
>
>
 Notice that the hostname of the computer which ran (or is running if the job hasn't finished yet) the job is displayed along with the current status. Hopefully your jobs will start soon, but it's possible (depending on where the job is running) that some of your jobs will stay in the submitted state for a while. If all the jobs are finished, go ahead and check the master's output. If some are still running, check some of the subjobs output and check that it matches what was output by the same subjob when run locally. Once any of the jobs is running or completed, you've run on The Grid!

Running a Later Version of Ganga

Line: 294 to 289
  Inside the Lib directory of the GangaTutorial folder, download the corrected version (you may need to specify the -k flag to ignore the unknown CERN certificate).
Changed:
<
<
curl https://twiki.cern.ch/twiki/pub/LHCb/GangaTutorial1/PrimeFactorizer.py.txt -o PrimeFactorizer.py

>
>
curl https://twiki.cern.ch/twiki/pub/LHCb/GangaTutorial1/PrimeFactorizer.py.txt -o PrimeFactorizer.py

 

This converts the PrimeFactorizer application to a prepared application.

Line: 307 to 301
 

The Job Registry

All of the jobs you've ever run (and not deleted) are contained in the list jobs:

Changed:
<
<
In [1]: jobs
Out[1]: 

>
>
In [1]: jobs
Out[1]: 

 Job slice: jobs (3 jobs)
# fqid status name subjobs application backend backend.actualCE
# 0 completed PrimeFactorizer Local lxplus242.cern.ch
# 1 completed 5 PrimeFactorizer Local
# 2 completed 5 PrimeFactorizer Dirac
Changed:
<
<
>
>
 
Added:
>
>
 If we wanted to rerun the first job, we could do the following:
Changed:
<
<
In [2]: j = jobs(0).copy() 

>
>
In [2]: j = jobs(0).copy() 

 In [3]: j.submit()
Added:
>
>
 The most recent job can always be accessed with standard Python list indexing: jobs[-1].

Job Templates

Line: 328 to 322
 

Job Templates

Often times when running LHCb jobs you will want to rerun a type of job (e.g. Monte Carlo production jobs). Rather than always copying a previous job, you could set up a template of it. To template the first job we ran, do

Changed:
<
<
In [1]:t = JobTemplate(jobs(0))

>
>
In [1]:t = JobTemplate(jobs(0))

  In [2]:t.name = 'small-prime-factorizer'
Line: 336 to 330
 You don't have to name it, but this will be useful later on to help you find the template you're looking for. The list of all your job templates is stored in the python list templates (the same way jobs are stored in jobs). Try printing it.

Now, create a new job from the template and run it:

Changed:
<
<
In [3]:j = Job(t) # or j = Job(templates(0)),...

>
>
In [3]:j = Job(t) # or j = Job(templates(0)),...

  In [4]:j.submit()
Line: 346 to 340
 

Removing Jobs

If you want to remove a job to save disk space or just because it's obsolete, simply do (try it):

Changed:
<
<
In [1]: jobs
Out[1]: 

>
>
In [1]: jobs
Out[1]: 

 Job slice: jobs (3 jobs)
# fqid status name subjobs application backend backend.actualCE
Line: 356 to 349
 # 1 completed 5 PrimeFactorizer Local
 # 2 completed 5 PrimeFactorizer Dirac
 ... plus whatever other jobs you've submitted so far ...
Changed:
<
<
>
>
  In [2]:jobs(0).remove()
Changed:
<
<
Ganga.GPIDev.Lib.Job : INFO removing job 0
>
>
Ganga.GPIDev.Lib.Job : INFO removing job 0
  In [3]: jobs
Changed:
<
<
Out[3]:
>
>
Out[3]:
 Job slice: jobs (2 jobs)
# fqid status name subjobs application backend backend.actualCE
# 1 completed 5 PrimeFactorizer Local
# 2 completed 5 PrimeFactorizer Dirac
... plus whatever other jobs you've submitted so far ...
Changed:
<
<
>
>
 
Added:
>
>
 This removes the job workspace (i.e. the output directory and all output files) and all traces of the job in Ganga's registries....so be careful when doing this!

The GANGA Box

Line: 376 to 370
 

The GANGA Box

You can persist (store) any GANGA object in the GANGA box. E.g., you could create a bookkeeping query object:

Changed:
<
<
In[1]: bkq = BKQuery()

>
>
In[1]: bkq = BKQuery()

 In[2]: bkq.path = '/LHCb/Collision09/Beam450GeV-VeloOpen-MagDown/Real Data + RecoToDST-07/90000000/DST'
Added:
>
>
 which can be used at any time to get an up-to-date list of LHCb data files obtained by this query by doing:
Changed:
<
<
In[3]: data = bkq.getDataset()

>
>
In[3]: data = bkq.getDataset()

 
Added:
>
>
 This data could then be used as the input data for Ganga jobs. To store this object so that you don't need to recreate it every time you want to update the query, simply do:
Changed:
<
<
In[4]: box.add(bkq,'example bk query')

>
>
In[4]: box.add(bkq,'example bk query')

 
Added:
>
>
 You can then access this object at any time. E.g., try quitting and restarting GANGA and then do:
Changed:
<
<
In[1]: data = box['example bk query'].getDataset()

>
>
In[1]: data = box['example bk query'].getDataset()

 In[2]:data[0]
Changed:
<
<
Out[2]: LogicalFile (
>
>
Out[2]: LogicalFile (
  name = '/lhcb/data/2009/DST/00005842/0000/00005842_00000194_1.dst' )
Changed:
<
<
>
>
 

Writing Your Own Functions

You can also write your own functions and load them into Ganga. Exit Ganga and create the file ~/.ganga.py:

Changed:
<
<
def foo(): print 'bar'

>
>
def foo(): print 'bar'

 
Added:
>
>
 Ganga will automatically load this file, so restart it and try the following:
Changed:
<
<
In [1]:foo()
bar

>
>
In [1]:foo()
bar

 
Added:
>
>
 As an exercise, try and write your own function that creates a job from your "small prime numbers" template, submits it and returns a reference to the Job object.
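
One possible shape for such a function is sketched below. It is not the only solution and it makes assumptions: submit_from_template is a hypothetical name, the template list and the job constructor are passed in as arguments so the function does not depend on which names are visible inside ~/.ganga.py, and it assumes that iterating over the template list yields objects with a name attribute.

```python
def submit_from_template(template_name, templates, job_factory):
    """Create a job from the named template, submit it, and return the job."""
    for template in templates:
        if template.name == template_name:
            job = job_factory(template)  # e.g. Job(t) at the Ganga prompt
            job.submit()
            return job
    raise ValueError("no template named %r" % template_name)
```

At the Ganga prompt this might be called as j = submit_from_template('small-prime-factorizer', templates, Job), using the template name set in the Job Templates section.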

Etc...

Revision 10 2012-09-13 - AlexPearce

Line: 28 to 28
  You will not need to do either of these for standard LHCb running.
Deleted:
<
<
If you would like to run the files locally, copy them to a directory you own and change the contents of the Tutorial.ini file.

cp -R /afs/cern.ch/user/j/jwilliam/public/GangaTutorial ~/public/

The Tutorial.ini file should be edited to look something like this.

[Configuration]
RUNTIME_PATH = /afs/cern.ch/user/a/auser/public/GangaTutorial

Now add your Tutorial.ini file to GANGA_CONFIG_PATH.

setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:/afs/cern.ch/user/a/auser/public/GangaTutorial/Tutorial.ini

Or for bash-like shells

export GANGA_CONFIG_PATH=${GANGA_CONFIG_PATH}:/afs/cern.ch/user/a/auser/public/GangaTutorial/Tutorial.ini
 Now start an interactive session:
ganga

Line: 78 to 53
  This runs less, so type q to exit. Ganga provides help information on just about every object, method, etc. Try this first if you get stuck.
Added:
>
>

A Local GangaTutorial

If you would like to run the files locally, copy them to a directory you own and change the contents of the Tutorial.ini file.

cp -R /afs/cern.ch/user/j/jwilliam/public/GangaTutorial ~/public/

The Tutorial.ini file should be edited to look something like this.

[Configuration]
RUNTIME_PATH = /afs/cern.ch/user/a/auser/public/GangaTutorial

Now add your Tutorial.ini file to GANGA_CONFIG_PATH.

setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:/afs/cern.ch/user/a/auser/public/GangaTutorial/Tutorial.ini

Or for bash-like shells

export GANGA_CONFIG_PATH=${GANGA_CONFIG_PATH}:/afs/cern.ch/user/a/auser/public/GangaTutorial/Tutorial.ini
 

Prime Number Factorization

In this tutorial, our task is to find the prime factors of a given integer. Finding very large prime factors requires a lot of CPU time. This tutorial provides code that can factorize any number whose prime factors are among the first 15 million known prime numbers. We have 15 tables of 1 million prime numbers each and we can scan the table in search of the factors. The python modules we will use (which are already written for you) include a collection of prime number tables called a PrimeTableDataset which are used by the PrimeFactorizer application.
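
The table-scanning idea can be sketched in a few lines of plain Python. This is an illustration only, not the tutorial's PrimeFactorizer code: factorize_with_table is a hypothetical function, and the tiny list of primes stands in for a real table of 1 million primes.

```python
def factorize_with_table(number, primes):
    """Scan a table of primes and return (remaining, factors).

    factors is a list of (prime, exponent) pairs found in the table;
    remaining is what is left of the number after dividing them out.
    """
    factors = []
    remaining = number
    for prime in primes:
        exponent = 0
        while remaining % prime == 0:
            remaining //= prime
            exponent += 1
        if exponent:
            factors.append((prime, exponent))
        if remaining == 1:
            break  # fully factorized, no need to scan the rest of the table
    return remaining, factors


# A stand-in for the start of the first prime number table.
small_table = [2, 3, 5, 7, 11, 13]
remaining, factors = factorize_with_table(1925, small_table)
```

If remaining is still greater than 1 after a scan, the number has factors that are not in that table, which is why a large number may need all 15 tables.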

Line: 291 to 288
  Notice that the hostname of the computer which ran (or is running if the job hasn't finished yet) the job is displayed along with the current status. Hopefully your jobs will start soon, but it's possible (depending on where the job is running) that some of your jobs will stay in the submitted state for a while. If all the jobs are finished, go ahead and check the master's output. If some are still running, check some of the subjobs output and check that it matches what was output by the same subjob when run locally. Once any of the jobs is running or completed, you've run on The Grid!
Added:
>
>

Running a Later Version of Ganga

If you are running a version of Ganga that is at least v5.7.0, you may encounter problems when submitting the jobs to the Grid. To resolve these, the application needs to be converted to a prepared application. This requires creating a local version of GangaTutorial, as described above.

Inside the Lib directory of the GangaTutorial folder, download the corrected version (you may need to specify the -k flag to ignore the unknown CERN certificate).

curl https://twiki.cern.ch/twiki/pub/LHCb/GangaTutorial1/PrimeFactorizer.py.txt -o PrimeFactorizer.py

This converts the PrimeFactorizer application to a prepared application.

Recreate the job in Ganga (from scratch, not using j.copy()) and try submitting the job to the Grid (Dirac) again.

 

A Few Other Features

The Job Registry

Line: 408 to 419
 

-- MikeWilliams - 09 Jan 2009

Added:
>
>
META FILEATTACHMENT attachment="PrimeFactorizer.py.txt" attr="" comment="A corrected version of jwilliam's PrimeFactorizer to support Ganga v5.7.0+" date="1347537522" name="PrimeFactorizer.py.txt" path="PrimeFactorizer.py.txt" size="5370" user="apearce" version="1"

Revision 92012-09-13 - AlexPearce

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you copy and paste from this wiki as python doesn't like random white space (it's best if you don't copy and paste anything).
Line: 28 to 28
  You will not need to do either of these for standard LHCb running.
Added:
>
>
If you would like to run the files locally, copy them to a directory you own and change the contents of the Tutorial.ini file.

cp -R /afs/cern.ch/user/j/jwilliam/public/GangaTutorial ~/public/

The Tutorial.ini file should be edited to look something like this.

[Configuration]
RUNTIME_PATH = /afs/cern.ch/user/a/auser/public/GangaTutorial

Now add your Tutorial.ini file to GANGA_CONFIG_PATH.

setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:/afs/cern.ch/user/a/auser/public/GangaTutorial/Tutorial.ini

Or for bash-like shells

export GANGA_CONFIG_PATH=${GANGA_CONFIG_PATH}:/afs/cern.ch/user/a/auser/public/GangaTutorial/Tutorial.ini
 Now start an interactive session:
ganga

Revision 82011-01-05 - MichaelWilliams

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you copy and paste from this wiki as python doesn't like random white space (it's best if you don't copy and paste anything).
Line: 337 to 337
  This removes the job workspace (i.e. the output directory and all output files) and all traces of the job in Ganga's registries....so be careful when doing this!
Added:
>
>

The GANGA Box

You can persist (store) any GANGA object in the GANGA box. E.g., you could create a bookkeeping query object:

In[1]: bkq = BKQuery()
In[2]: bkq.path = '/LHCb/Collision09/Beam450GeV-VeloOpen-MagDown/Real Data + RecoToDST-07/90000000/DST'
which can be used at any time to get an up-to-date list of LHCb data files obtained by this query by doing:
In[3]: data = bkq.getDataset()
This data could then be used as the input data for Ganga jobs. To store this object so that you don't need to recreate it every time you want to update the query, simply do:
In[4]: box.add(bkq,'example bk query')
You can then access this object at any time. E.g., try quitting and restarting GANGA and then do:
In[1]: data = box['example bk query'].getDataset()
In[2]:data[0]
Out[2]: LogicalFile (
 name = '/lhcb/data/2009/DST/00005842/0000/00005842_00000194_1.dst' 
 ) 

 

Writing Your Own Functions

You can also write your own functions and load them into Ganga. Exit Ganga and create the file ~/.ganga.py:

Revision 72009-12-14 - MichaelWilliams

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you copy and paste from this wiki as python doesn't like random white space (it's best if you don't copy and paste anything).
Line: 9 to 9
  First, set the environment for Ganga in a fresh terminal:

Changed:
<
<
GangaEnv
>
>
SetupProject Ganga
 
Changed:
<
<
and take the latest version (5.1.3 at time of writing). If you last used ganga before version 5.1, you may need to do
>
>
which gives you the latest version (5.4.3 at time of writing). If you last used ganga before version 5.4, you may need to do
 
ganga -g
Line: 26 to 26
 
setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:GangaTutorial/Tutorial.ini
Changed:
<
<
You will not need to do either of these for standard LHCb running.
>
>
You will not need to do either of these for standard LHCb running.
  Now start an interactive session:
ganga

*** Welcome to Ganga ***

Changed:
<
<
Version: Ganga-5-1-3
>
>
Version: Ganga-5-4-3
 Documentation and support: http://cern.ch/ganga
 Type help() or help('index') for online help.
Line: 42 to 42
  In [1]:
Changed:
<
<
You will be prompted for your grid password (type it and press ENTER). You should now see the Ganga prompt! Check to make sure that the application for this tutorial was loaded (we need PrimeFactorizeer):
>
>
You should now see the Ganga prompt! Check to make sure that the application for this tutorial was loaded (we need PrimeFactorizeer):
 
In [1]:plugins('applications')
Out[1]: ['GaudiPython', 'Executable', 'Brunel', 'Moore', 'DaVinci', 'Panoptes', 'Gauss', 'Boole', 'Gaudi', 'Vetra', 'Root', 'Euler', 'PrimeFactorizer']
Changed:
<
<
You can check which plugins are available to you in each category in your current Ganga session using plugins. Try using the help utility to see if you can figure out how to list all of the available plugins in all categories:
>
>
Ignore the warning if you don't have a valid Grid proxy (you should only see this once; we'll create the proxy when we need it below). You can check which plugins are available to you in each category in your current Ganga session using plugins. Try using the help utility to see if you can figure out how to list all of the available plugins in all categories:
 
In [2]:help(plugins)
Line: 106 to 106
 
In [8]:j.peek('factors-1925.dat')
Changed:
<
<
which opens a separate terminal which displays the contents of the file using less (use standard less commands to scroll etc., type q to quit).
>
>
which opens the file using less (use standard less commands to scroll etc., type q to quit).
  The file should contain the factors [(5, 2), (7, 1), (11, 1)], let's check if this is correct:

Line: 192 to 192
 
In [9]:j.status
Changed:
<
<
If any of the jobs is still running, you'll get a status report on all 5 jobs with their current states. If all 5 jobs are completed, you'll simply be told that the master job is done. Wait until all 5 jobs are done (should take less than a minute) before moving on (in the mean time you can play around with help, e.g. try help(j.submit)...remember, type q to quit).
>
>
If any of the jobs is still running, the status of the master job will be listed as running. If all 5 jobs are completed, the master job's status will also be completed. Wait until all 5 jobs are done (should take less than a minute) before moving on (in the mean time you can play around with help, e.g. try help(j.submit)...remember, type q to quit).
  Once the jobs are complete, let's look at the output of one of the subjobs (do exactly what we did above):

Line: 232 to 232
  Notice that we had to add the factors file to the outputsandbox. This is to tell ganga that we want this file returned to us after the grid job is completed.
Changed:
<
<
Now just submit the jobs the same way as before:
>
>
Now just submit the jobs the same way as before (since this is the 1st time we've done something that requires a grid proxy, you'll be asked for your grid password if you don't currently have a valid grid proxy on this machine):
 
In [17]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 2

Added:
>
>
Enter Certificate password:
 Ganga.GPIDev.Adapters : INFO submitting job 2.0 to Dirac backend
 Ganga.GPIDev.Lib.Job : INFO job 2.0 status changed to "submitted"
 Ganga.GPIDev.Adapters : INFO submitting job 2.1 to Dirac backend
Line: 338 to 339
 

Writing Your Own Functions

Changed:
<
<
You can also write your own functions and load them into Ganga. Exit Ganga and create the following file:
>
>
You can also write your own functions and load them into Ganga. Exit Ganga and create the file ~/.ganga.py:
 

Changed:
<
<
cd ~/ && mkdir myganga
cd myganga && cat Foo.py
def bar():
    print 'bar is a simple example...try to write your own functions!'
Now, set your PYTHONPATH so Ganga can find Foo.py:
setenv PYTHONPATH ~/myganga/

>
>
def foo():
    print 'bar'
 
Changed:
<
<
Restart Ganga and try the following:
>
>
Ganga will automatically load this file, so restart it and try the following:
 

Changed:
<
<
In [1]:from Foo import bar

In [2]:bar()
bar is a simple example...try to write your own functions!

>
>
In [1]:foo()
bar
  As an exercise, try and write your own function that creates a job from your "small prime numbers" template, submits it and returns a reference to the Job object.

Revision 62009-10-05 - MichaelWilliams

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you copy and paste from this wiki as python doesn't like random white space (it's best if you don't copy and paste anything).
Line: 20 to 20
  For this tutorial, we need to add an extra Ganga configuration file, so do

Added:
>
>
setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:/afs/cern.ch/user/j/jwilliam/public/GangaTutorial/Tutorial.ini
If you don't have access to /afs/cern.ch/user/j/jwilliam/public/GangaTutorial (e.g. you're using a machine that doesn't have afs or if something is wrong w/ my afs account), then you can do everything in this tutorial except submitting to the grid by simply doing this instead:

 setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:GangaTutorial/Tutorial.ini
Changed:
<
<
You will not need to do this for standard LHCb running.
>
>
You will not need to do either of these for standard LHCb running.
  Now start an interactive session:

Line: 218 to 222
 

Running Ganga Jobs on the Grid

Changed:
<
<
For many LHCb jobs (which often involve processing large amounts of data), running concurrently isn't enough. When a large number of CPUs is required for a job, we need the grid! Specifically, we want to run on the LHC Computing Grid (LCG). For actual LHCb jobs, this will involve the DIRAC workload manager; however, for this simple example we'll access the LCG directly. For actual LHCb jobs which process data or Monte Carlo etc., you MUST submit to the LCG via DIRAC!
>
>
For many LHCb jobs (which often involve processing large amounts of data), running concurrently isn't enough. When a large number of CPUs is required for a job, we need the grid! Specifically, we want to run on the LHC Computing Grid (LCG). For LHCb jobs, this involves the DIRAC workload manager.
 
Changed:
<
<
As an example of running on the grid, we'll run the same set of jobs we ran above but using a different backend. We could retype all of the required info from the previous job definition or, better yet, we could use the TAB and arrow-UP functionality to re-enter the info. An easier way is to just copy the previous Job object, then change the backend so that the jobs run on the LCG:
>
>
As an example of running on the grid, we'll run the same set of jobs we ran above but using a different backend. We could retype all of the required info from the previous job definition or, better yet, we could use the TAB and arrow-UP functionality to re-enter the info. An easier way is to just copy the previous Job object, then change the backend so that the jobs run via the Dirac backend:
 
In [14]: j = j.copy()  # we could've also used Job(j), etc.

Changed:
<
<
In [15]: j.backend = LCG()
>
>
In [15]: j.backend = Dirac()
In [16]: j.outputsandbox = ['factors-118020903911855744138963610.dat']
 
Changed:
<
<
Again, for LHCb jobs you need to access the LCG using DIRAC. In Ganga, this means using the Dirac backend instead of LCG.
>
>
Notice that we had to add the factors file to the outputsandbox. This is to tell ganga that we want this file returned to us after the grid job is completed.
  Now just submit the jobs the same way as before:

Changed:
<
<
In [16]: j.submit()
>
>
In [17]: j.submit()
 Ganga.GPIDev.Lib.Job : INFO submitting job 2
Changed:
<
<
Ganga.GPIDev.Adapters : INFO submitting job 2.0 to LCG backend
>
>
Ganga.GPIDev.Adapters : INFO submitting job 2.0 to Dirac backend
 Ganga.GPIDev.Lib.Job : INFO job 2.0 status changed to "submitted"
Changed:
<
<
Ganga.GPIDev.Adapters : INFO submitting job 2.1 to LCG backend
>
>
Ganga.GPIDev.Adapters : INFO submitting job 2.1 to Dirac backend
 Ganga.GPIDev.Lib.Job : INFO job 2.1 status changed to "submitted"
Changed:
<
<
Ganga.GPIDev.Adapters : INFO submitting job 2.2 to LCG backend
>
>
Ganga.GPIDev.Adapters : INFO submitting job 2.2 to Dirac backend
 Ganga.GPIDev.Lib.Job : INFO job 2.2 status changed to "submitted"
Changed:
<
<
Ganga.GPIDev.Adapters : INFO submitting job 2.3 to LCG backend
>
>
Ganga.GPIDev.Adapters : INFO submitting job 2.3 to Dirac backend
 Ganga.GPIDev.Lib.Job : INFO job 2.3 status changed to "submitted"
Changed:
<
<
Ganga.GPIDev.Adapters : INFO submitting job 2.4 to LCG backend
>
>
Ganga.GPIDev.Adapters : INFO submitting job 2.4 to Dirac backend
 Ganga.GPIDev.Lib.Job : INFO job 2.4 status changed to "submitted"
Changed:
<
<
Congratulations! You've just submitted 5 jobs to the LCG grid.
>
>
Congratulations! You've just submitted 5 jobs to the LCG grid via Dirac.
  Let's check the status of the jobs:

Line: 251 to 256
 Job slice: jobs(2).subjobs (5 jobs)
# fqid status name subjobs application backend backend.actualCE
Changed:
<
<
#2.0 completed PrimeFactorizer LCG polgrid1.in2p3.fr:2119/jobmanager-pbs-lhcb
#2.1 completed PrimeFactorizer LCG mars-ce0.mars.lesc.doc.ic.ac.uk:2119/jobmanag
#2.2 completed PrimeFactorizer LCG ce01.cat.cbpf.br:2119/jobmanager-lcgpbs-lhcb
#2.3 completed PrimeFactorizer LCG dgc-grid-35.brunel.ac.uk:2119/jobmanager-lcgp
#2.4 completed PrimeFactorizer LCG ce02.grid.acad.bg:2119/jobmanager-pbs-lhcb
>
>
# 2.0 completed PrimeFactorizer Dirac LCG.Glasgow.uk
# 2.1 completed PrimeFactorizer Dirac LCG.PDC.se
# 2.2 completed PrimeFactorizer Dirac LCG.GRIDKA.de
# 2.3 completed PrimeFactorizer Dirac LCG.PIC.es
# 2.4 completed PrimeFactorizer Dirac LCG.USC.es
  Notice that the hostname of the computer which ran (or is running if the job hasn't finished yet) the job is displayed along with the current status. Hopefully your jobs will start soon, but it's possible (depending on where the job is running) that some of your jobs will stay in the submitted state for a while. If all the jobs are finished, go ahead and check the master's output. If some are still running, check some of the subjobs output and check that it matches what was output by the same subjob when run locally. Once any of the jobs is running or completed, you've run on The Grid!
Line: 273 to 278
 # fqid status name subjobs application backend backend.actualCE
 # 0 completed PrimeFactorizer Local lxplus242.cern.ch
 # 1 completed 5 PrimeFactorizer Local
Changed:
<
<
# 2 completed 5 PrimeFactorizer LCG
>
>
# 2 completed 5 PrimeFactorizer Dirac
  If we wanted to rerun the first job, we could do the following:
Line: 312 to 317
 # fqid status name subjobs application backend backend.actualCE
 # 0 completed PrimeFactorizer Local lxplus242.cern.ch
 # 1 completed 5 PrimeFactorizer Local
Changed:
<
<
# 2 completed 5 PrimeFactorizer LCG
>
>
# 2 completed 5 PrimeFactorizer Dirac
 ... plus whatever other jobs you've submitted so far ...
Line: 325 to 330
 
# fqid status name subjobs application backend backend.actualCE
# 1 completed 5 PrimeFactorizer Local
Changed:
<
<
# 2 completed 5 PrimeFactorizer LCG
>
>
# 2 completed 5 PrimeFactorizer Dirac
 ... plus whatever other jobs you've submitted so far ...

Revision 52009-01-12 - MikeWilliams

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you copy and paste from this wiki as python doesn't like random white space (it's best if you don't copy and paste anything).
Line: 262 to 262
 

A Few Other Features

Added:
>
>

The Job Registry

 All of the jobs you've ever run (and not deleted) are contained in the list jobs:
In [1]: jobs

Line: 281 to 283
  The last job can always be accessed directly with python list indexing, i.e. jobs[-1].
Added:
>
>

Job Templates

 Often times when running LHCb jobs you will want to rerun a type of job (e.g. Monte Carlo production jobs). Rather than always copying a previous job, you could set up a template of it. To template the first job we ran, do
In [1]:t = JobTemplate(jobs(0))

Line: 297 to 301
  Job templates are quite useful due to the fact that they're easy and fast to search through.
Added:
>
>

Removing Jobs

 If you want to remove a job to save disk space or just because it's obsolete, simply do (try it):
In [1]: jobs

Line: 325 to 331
  This removes the job workspace (i.e. the output directory and all output files) and all traces of the job in Ganga's registries....so be careful when doing this!
Added:
>
>

Writing Your Own Functions

You can also write your own functions and load them into Ganga. Exit Ganga and create the following file:

cd ~/ && mkdir myganga
cd myganga && cat Foo.py
def bar():
    print 'bar is a simple example...try to write your own functions!'

Now, set your PYTHONPATH so Ganga can find Foo.py:
setenv PYTHONPATH ~/myganga/
Restart Ganga and try the following:
In [1]:from Foo import bar

In [2]:bar()
bar is a simple example...try to write your own functions!
As an exercise, try and write your own function that creates a job from your "small prime numbers" template, submits it and returns a reference to the Job object.

Etc...

 There are many more features in Ganga which I don't have time to cover here. Remember to use the help function if you're unsure about something (try help() if you're unsure about everything!). There are also answers to common questions on the FAQ wiki page and the user's guides are also quite useful.

Good luck and happy grid-ing!

Revision 42009-01-12 - MikeWilliams

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

Changed:
<
<
We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you cut and paste from this wiki as python doesn't like random white space (it's best if you don't cut and paste anything).
>
>
We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you copy and paste from this wiki as python doesn't like random white space (it's best if you don't copy and paste anything).
 
Line: 51 to 51
 

Prime Number Factorization

Changed:
<
<
In this tutorial, our task task is to find the prime factors of a given integer. Finding very large prime factors requires a lot of CPU time. This tutorial provides code that can factorize any number whose prime factors are among the first 15 million known prime numbers. We have 15 tables of 1 million prime numbers each and we can scan the table in the search of the factors. The python modules we will use (which are already written for you) include a collection of prime number tables called a PrimeTableDataset which are used by the PrimeFactorizer application.
>
>
In this tutorial, our task is to find the prime factors of a given integer. Finding very large prime factors requires a lot of CPU time. This tutorial provides code that can factorize any number whose prime factors are among the first 15 million known prime numbers. We have 15 tables of 1 million prime numbers each and we can scan the table in search of the factors. The python modules we will use (which are already written for you) include a collection of prime number tables called a PrimeTableDataset which are used by the PrimeFactorizer application.
 

Running a Factorization Job using Ganga

Line: 98 to 98
 -rw-r--r-- 1 jwilliam z5 86 Jan 9 14:24 jobstatus
 -rw-r--r-- 1 jwilliam z5 26 Jan 9 14:24 factors-1925.dat
Changed:
<
<
All Ganga jobs return the standared output and error in the files stdout and stderr. This job has also produced the file factors-1925.dat. We can view the contents of this file using
>
>
All Ganga jobs return the standard output and error in the files stdout and stderr. This job has also produced the file factors-1925.dat. We can view the contents of this file using
 
In [8]:j.peek('factors-1925.dat')
Line: 188 to 188
 
In [9]:j.status
Changed:
<
<
If any of the jobs is still running, you'll get a status report on all 5 jobs with their current states. If all 5 jobs are completed, you'll simply be told that the master job is done. Wait until all 5 jobs are done (should take less than a minute) before moving on (in the mean time you can play around with help, e.g. try help(j.submit)).
>
>
If any of the jobs is still running, you'll get a status report on all 5 jobs with their current states. If all 5 jobs are completed, you'll simply be told that the master job is done. Wait until all 5 jobs are done (should take less than a minute) before moving on (in the mean time you can play around with help, e.g. try help(j.submit)...remember, type q to quit).
  Once the jobs are complete, let's look at the output of one of the subjobs (do exactly what we did above):

Line: 231 to 231
 
In [16]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 2

Changed:
<
<
Ganga.GPIDev.Adapters : INFO submitting job 2.0 to LCGl backend
>
>
Ganga.GPIDev.Adapters : INFO submitting job 2.0 to LCG backend
 Ganga.GPIDev.Lib.Job : INFO job 2.0 status changed to "submitted"
 Ganga.GPIDev.Adapters : INFO submitting job 2.1 to LCG backend
 Ganga.GPIDev.Lib.Job : INFO job 2.1 status changed to "submitted"
Line: 239 to 239
 Ganga.GPIDev.Lib.Job : INFO job 2.2 status changed to "submitted"
 Ganga.GPIDev.Adapters : INFO submitting job 2.3 to LCG backend
 Ganga.GPIDev.Lib.Job : INFO job 2.3 status changed to "submitted"
Changed:
<
<
Ganga.GPIDev.Adapters : INFO submitting job 2.4 to LCGl backend
>
>
Ganga.GPIDev.Adapters : INFO submitting job 2.4 to LCG backend
 Ganga.GPIDev.Lib.Job : INFO job 2.4 status changed to "submitted"
Added:
>
>
Congratulations! You've just submitted 5 jobs to the LCG grid.
  Let's check the status of the jobs:

Line: 257 to 258
 #2.4 completed PrimeFactorizer LCG ce02.grid.acad.bg:2119/jobmanager-pbs-lhcb
Changed:
<
<
Notice that the hostname of the computer which ran (or is running if the job hasn't finished yet) the job is displayed along with the current status.
>
>
Notice that the hostname of the computer which ran (or is running if the job hasn't finished yet) the job is displayed along with the current status. Hopefully your jobs will start soon, but it's possible (depending on where the job is running) that some of your jobs will stay in the submitted state for a while. If all the jobs are finished, go ahead and check the master's output. If some are still running, check some of the subjobs output and check that it matches what was output by the same subjob when run locally. Once any of the jobs is running or completed, you've run on The Grid!
 

A Few Other Features

Line: 281 to 281
  The last job can always be accessed directly with python list indexing, i.e. jobs[-1].
Changed:
<
<
Add templates and remove and kill...
>
>
Often times when running LHCb jobs you will want to rerun a type of job (e.g. Monte Carlo production jobs). Rather than always copying a previous job, you could set up a template of it. To template the first job we ran, do
In [1]:t = JobTemplate(jobs(0))

In [2]:t.name = 'small-prime-factorizer'
You don't have to name it, but this will be useful later on to help you find the template you're looking for. The list of all your job templates is stored in the python list templates (the same way jobs are stored in jobs). Try printing it.

Now, create a new job from the template and run it:

In [3]:j = Job(t) # or j = Job(templates(0)),...

In [4]:j.submit() 
Job templates are quite useful because they're easy and fast to search through.

If you want to remove a job to save disk space or just because it's obsolete, simply do (try it):

In [1]: jobs
Out[1]: 
Job slice:  jobs (3 jobs)
--------------
# fqid      status        name   subjobs      application          backend                               backend.actualCE  
#  0     completed                        PrimeFactorizer            Local                              lxplus242.cern.ch  
#  1     completed                     5  PrimeFactorizer            Local                                                 
#  2     completed                     5  PrimeFactorizer              LCG       
... plus whatever other jobs you've submitted so far ...


In [2]:jobs(0).remove()
Ganga.GPIDev.Lib.Job               : INFO     removing job 0

In [3]: jobs
Out[3]: 
Job slice:  jobs (2 jobs)
--------------
# fqid      status        name   subjobs      application          backend                               backend.actualCE  
#  1     completed                     5  PrimeFactorizer            Local                                                 
#  2     completed                     5  PrimeFactorizer              LCG       
... plus whatever other jobs you've submitted so far ...

This removes the job workspace (i.e. the output directory and all output files) and all traces of the job in Ganga's registries....so be careful when doing this!

There are many more features in Ganga which I don't have time to cover here. Remember to use the help function if you're unsure about something (try help() if you're unsure about everything!). There are also answers to common questions on the FAQ wiki page and the user's guides are also quite useful.

Good luck and happy grid-ing!

 

-- MikeWilliams - 09 Jan 2009

Revision 32009-01-12 - MikeWilliams

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

Changed:
<
<
We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you cut and paste from this wiki as python doesn't like random white space.
>
>
We start with a very simple problem which doesn't require knowledge about any other LHCb software - factorizing prime numbers. Be careful what you cut and paste from this wiki as python doesn't like random white space (it's best if you don't cut and paste anything).
 
Line: 102 to 102
 
In [8]:j.peek('factors-1925.dat')
Changed:
<
<
which opens a separate terminal which displays the contents of the file using less (use standard less commands to scroll etc., type q to quite).
>
>
which opens a separate terminal which displays the contents of the file using less (use standard less commands to scroll etc., type q to quit).
  The file should contain the factors [(5, 2), (7, 1), (11, 1)], let's check if this is correct:

Line: 144 to 144
 

Splitting a Ganga Job into Multiple Concurrent Jobs

Changed:
<
<
Now that you've seen some of the basics of Ganga, let's try something a little more interesting - factorizing a very large integer. For this we'll need a PrimeTableDataset which contains all 15 tables of prime numbers. To speed things up, we will also split the job into 5 local subjobs, which will run in concurrently.
>
>
Now that you've seen some of the basics of Ganga, let's try something a little more interesting - factorizing a very large integer. For this we'll need a PrimeTableDataset which contains all 15 tables of prime numbers. To speed things up, we will also split the job into 5 local subjobs which will run concurrently.
 

First, define a job as before but w/ a larger number and using all 15 prime number tables (feel free to use the arrow-UP and TAB keys to do this instead of typing it all out!):

Line: 156 to 156
 In [5]: j.inputdata.table_id_upper = 15
Changed:
<
<
Now add a splitter to divide up the task of finding all the prime factors (here will make 5 subjobs):
>
>
Now add a splitter to divide up the task of finding all the prime factors (here we'll make 5 subjobs):
 
In [6]: j.splitter = PrimeFactorizerSplitter(numsubjobs= 5)
For LHCb jobs, similar splitters are provided to split jobs up which run on multiple data files, etc.
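
To get a feeling for what a splitter does, here is a minimal sketch in plain Python that divides a range of prime table ids into contiguous chunks, one per subjob. It assumes that the tables are numbered 1 to 15 and that each subjob scans a contiguous block of tables; the function name split_table_range is invented for this example, and the real PrimeFactorizerSplitter may divide the work differently:

```python
def split_table_range(lower, upper, numsubjobs):
    """Divide table ids lower..upper (inclusive) into contiguous ranges.

    Returns a list of (first_id, last_id) pairs, one per subjob.
    Ranges differ in size by at most one table; subjobs that would
    receive no table at all are left out.
    """
    total = upper - lower + 1
    base, extra = divmod(total, numsubjobs)
    ranges = []
    start = lower
    for i in range(numsubjobs):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(split_table_range(1, 15, 5))
# [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15)]
```

With 15 tables and 5 subjobs, each subjob in this sketch scans 3 tables, so the subjobs can run independently and their partial results can be combined afterwards.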
Changed:
<
<
We also want to add a merger to merge the output from each of the 5 subjobs:
>
>
We also want to add a merger to merge the output from each of the 5 subjobs:
 
In [7]: j.merger = TextMerger(files=['factors-118020903911855744138963610.dat'])
Line: 188 to 188
 
In [9]:j.status
Changed:
<
<
If any of the jobs is still running, you'll get a status report on all 5 jobs with their current states. If all 5 jobs are completed, you'll simply be told that the master job is done. Wait until all 5 jobs are done (should take less than a minute) before moving on (in the mean time you can play around with help).
>
>
If any of the jobs is still running, you'll get a status report on all 5 jobs with their current states. If all 5 jobs are completed, you'll simply be told that the master job is done. Wait until all 5 jobs are done (should take less than a minute) before moving on (in the mean time you can play around with help, e.g. try help(j.submit)).
  Once the jobs are complete, let's look at the output of one of the subjobs (do exactly what we did above):

Line: 218 to 218
 

Running Ganga Jobs on the Grid

Added:
>
>
For many LHCb jobs (which often involve processing large amounts of data), running concurrently isn't enough. When a large number of CPUs is required for a job, we need the grid! Specifically, we want to run on the LHC Computing Grid (LCG). For actual LHCb jobs, this will involve the DIRAC workload manager; however, for this simple example we'll access the LCG directly. For actual LHCb jobs which process data or Monte Carlo etc., you MUST submit to the LCG via DIRAC!

As an example of running on the grid, we'll run the same set of jobs we ran above but using a different backend. We could retype all of the required info from the previous job definition or, better yet, we could use the TAB and arrow-UP functionality to re-enter the info. An easier way is to just copy the previous Job object, then change the backend so that the jobs run on the LCG:

In [14]: j = j.copy()  # we could've also used Job(j), etc.
In [15]: j.backend = LCG()
Again, for LHCb jobs you need to access the LCG using DIRAC. In Ganga, this means using the Dirac backend instead of LCG.

Now just submit the jobs the same way as before:

In [16]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 2
Ganga.GPIDev.Adapters              : INFO     submitting job 2.0 to LCGl backend
Ganga.GPIDev.Lib.Job               : INFO     job 2.0 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 2.1 to LCG backend
Ganga.GPIDev.Lib.Job               : INFO     job 2.1 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 2.2 to LCG backend
Ganga.GPIDev.Lib.Job               : INFO     job 2.2 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 2.3 to LCG backend
Ganga.GPIDev.Lib.Job               : INFO     job 2.3 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 2.4 to LCGl backend
Ganga.GPIDev.Lib.Job               : INFO     job 2.4 status changed to "submitted"

Let's check the status of the jobs:

In [17]: j.subjobs
Out[17]: 
Job slice:  jobs(2).subjobs (5 jobs)
--------------
# fqid      status        name   subjobs      application          backend                               backend.actualCE  
#2.0   completed                        PrimeFactorizer              LCG     polgrid1.in2p3.fr:2119/jobmanager-pbs-lhcb  
#2.1   completed                        PrimeFactorizer              LCG  mars-ce0.mars.lesc.doc.ic.ac.uk:2119/jobmanag  
#2.2   completed                        PrimeFactorizer              LCG   ce01.cat.cbpf.br:2119/jobmanager-lcgpbs-lhcb  
#2.3   completed                        PrimeFactorizer              LCG  dgc-grid-35.brunel.ac.uk:2119/jobmanager-lcgp  
#2.4   completed                        PrimeFactorizer              LCG     ce02.grid.acad.bg:2119/jobmanager-pbs-lhcb  

Notice that the hostname of the computing element that ran the job (or is running it, if the job hasn't finished yet) is displayed along with the current status.

A Few Other Features

All of the jobs you've ever run (and not deleted) are contained in the list jobs:

 

Changed:
<
<
In [8]: j = jobs[-1].copy()  # we could've also used j.copy(), jobs(1).copy(), Job(jobs[-1]), etc.
In [9]: j.backend = LCG()
In [10]: j.submit()
>
>
In [1]: jobs
Out[1]:
Job slice:  jobs (3 jobs)
# fqid      status        name   subjobs      application          backend     backend.actualCE
#   0    completed                         PrimeFactorizer            Local    lxplus242.cern.ch
#   1    completed                     5   PrimeFactorizer            Local
#   2    completed                     5   PrimeFactorizer              LCG

 
Added:
>
>
If we wanted to rerun the first job, we could do the following:
In [2]: j = jobs(0).copy() 
In [3]: j.submit()
The last job can always be accessed directly with Python list indexing, using jobs[-1].
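The two access styles used above differ: jobs(0) looks a job up by its id, while jobs[-1] indexes by position in the list. The following is a toy sketch of that distinction in plain Python. The JobRegistry class and its add/remove methods are hypothetical stand-ins written for illustration, not Ganga's actual implementation.

```python
class JobRegistry:
    """Toy stand-in for the `jobs` object (hypothetical, not Ganga code)."""

    def __init__(self):
        self._jobs = []     # jobs kept in creation order
        self._next_id = 0   # ids are never reused in this sketch

    def add(self, name):
        job = {"id": self._next_id, "name": name}
        self._jobs.append(job)
        self._next_id += 1
        return job

    def remove(self, job_id):
        self._jobs = [j for j in self._jobs if j["id"] != job_id]

    def __getitem__(self, index):
        # jobs[i]: positional access, like a normal Python list
        return self._jobs[index]

    def __call__(self, job_id):
        # jobs(i): look up by job id
        for job in self._jobs:
            if job["id"] == job_id:
                return job
        raise KeyError(job_id)


jobs = JobRegistry()
for name in ("first", "second", "third"):
    jobs.add(name)
jobs.remove(0)          # deleting a job shifts positions but not ids

print(jobs[0]["id"])    # 1
print(jobs(2)["name"])  # third
print(jobs[-1]["id"])   # 2
```

Once a job has been deleted, position and id no longer coincide, which is why the id-based form is the safer way to refer to a specific earlier job.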

Add templates and remove and kill...

  -- MikeWilliams - 09 Jan 2009

Revision 2 2009-01-09 - MikeWilliams

Line: 1 to 1
 
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

We start with a very simple problem which doesn't require knowledge about any other LHCb software: finding the prime factors of integers. Be careful what you cut and paste from this wiki, as Python doesn't like random white space.
Line: 27 to 27
 Now start an interactive session:
ganga

Added:
>
>
*** Welcome to Ganga ***
Version: Ganga-5-1-3
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it under certain conditions; type license() for details.

In [1]:

You will be prompted for your grid password (type it and press ENTER). You should now see the Ganga prompt! Check to make sure that the application for this tutorial was loaded (we need PrimeFactorizer):

In [1]:plugins('applications')
Out[1]: ['GaudiPython', 'Executable', 'Brunel', 'Moore', 'DaVinci', 'Panoptes', 'Gauss', 'Boole', 'Gaudi', 'Vetra', 'Root', 'Euler', 'PrimeFactorizer']

 
Changed:
<
<
You will be prompted for your grid password (type it and press ENTER). You should now see the Ganga prompt!
>
>
You can check which plugins are available to you in each category in your current Ganga session using plugins. Try using the help utility to see if you can figure out how to list all of the available plugins in all categories:
In [2]:help(plugins)
This runs less, so type q to exit. Ganga provides help information on just about every object, method, etc. Try this first if you get stuck.
 

Prime Number Factorization

Line: 127 to 146
  Now that you've seen some of the basics of Ganga, let's try something a little more interesting - factorizing a very large integer. For this we'll need a PrimeTableDataset which contains all 15 tables of prime numbers. To speed things up, we will also split the job into 5 local subjobs, which will run concurrently.
Added:
>
>
First, define a job as before but with a larger number and using all 15 prime number tables (feel free to use the arrow-UP and TAB keys to do this instead of typing it all out!):
 
In [1]: j = Job()
In [2]: j.application = PrimeFactorizer(number=118020903911855744138963610)
In [3]: j.inputdata = PrimeTableDataset()
In [4]: j.inputdata.table_id_lower = 1
In [5]: j.inputdata.table_id_upper = 15

Added:
>
>

Now add a splitter to divide up the task of finding all the prime factors (here we'll make 5 subjobs):


 In [6]: j.splitter = PrimeFactorizerSplitter(numsubjobs= 5)
Deleted:
<
<
In [7]: j.submit()
 
Added:
>
>
For LHCb jobs, similar splitters are provided to split jobs up which run on multiple data files, etc.
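To make the idea of splitting concrete, here is a minimal sketch of how 15 prime tables could be divided among 5 subjobs. It assumes the splitter hands each subjob a contiguous range of table ids; that is an assumption for illustration, and split_table_range is a hypothetical helper, not part of Ganga or of PrimeFactorizerSplitter.

```python
def split_table_range(lower, upper, num_subjobs):
    """Divide the inclusive id range [lower, upper] into contiguous chunks.

    Hypothetical helper: returns one (first_id, last_id) pair per subjob.
    """
    total = upper - lower + 1
    base, extra = divmod(total, num_subjobs)
    chunks = []
    start = lower
    for i in range(num_subjobs):
        # the first `extra` subjobs take one additional table each
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more subjobs requested than tables available
        chunks.append((start, start + size - 1))
        start += size
    return chunks


print(split_table_range(1, 15, 5))
# [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15)]
```

With table_id_lower = 1, table_id_upper = 15 and 5 subjobs, each subjob in this sketch scans three tables.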

We also want to add a merger to merge the output from each of the 5 subjobs:

In [7]: j.merger = TextMerger(files=['factors-118020903911855744138963610.dat'])
When all 5 subjobs are complete, the merger will merge the contents of each of the 5 factors-118020903911855744138963610.dat files into a single file in the master job's output directory (we'll look at what this means below).

OK, now submit the job (actually, the 5 jobs) just like we did above:

In [8]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 1
Ganga.GPIDev.Adapters              : INFO     submitting job 1.0 to Local backend
Ganga.GPIDev.Lib.Job               : INFO     job 1.0 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 1.1 to Local backend
Ganga.GPIDev.Lib.Job               : INFO     job 1.1 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 1.2 to Local backend
Ganga.GPIDev.Lib.Job               : INFO     job 1.2 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 1.3 to Local backend
Ganga.GPIDev.Lib.Job               : INFO     job 1.3 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 1.4 to Local backend
Ganga.GPIDev.Lib.Job               : INFO     job 1.4 status changed to "submitted"

You can check the status of all 5 jobs by simply doing:

In [9]:j.status
If any of the jobs is still running, you'll get a status report on all 5 jobs with their current states. If all 5 jobs are completed, you'll simply be told that the master job is done. Wait until all 5 jobs are done (this should take less than a minute) before moving on (in the meantime, you can play around with help).

Once the jobs are complete, let's look at the output of one of the subjobs (do exactly what we did above):

In [10]:j.subjobs[2].peek()
total 18K
-rw-r--r--  1 jwilliam z5   0 Jan  9 18:08 __syslog__
-rw-r--r--  1 jwilliam z5 564 Jan  9 18:08 stdout
-rw-r--r--  1 jwilliam z5 15K Jan  9 18:08 stderr
-rw-r--r--  1 jwilliam z5  86 Jan  9 18:08 __jobstatus__
-rw-r--r--  1 jwilliam z5  17 Jan  9 18:08 factors-118020903911855744138963610.dat

In [11]:j.subjobs[2].peek('factors-118020903911855744138963610.dat')
The file should contain the factor [(141650963, 1)]. Each of the j.subjobs is itself a Job (try printing it), so you can do anything you would do on an independent job on the subjobs.

Now examine the merged output of all the jobs:

In [12]:j.peek()
total 2.0K
-rw-r--r--  1 jwilliam z5 653 Jan  9 18:08 factors-118020903911855744138963610.dat.merge_summary
-rw-r--r--  1 jwilliam z5 869 Jan  9 18:08 factors-118020903911855744138963610.dat

In [13]:j.peek('factors-118020903911855744138963610.dat')
The file should contain the factors [(2, 1), (3, 1), (5, 1), (7, 1), (15485867, 1)] [] [(141650963, 1)] [] [(256203221, 1)] (some of the prime number tables don't contain any factors of this particular number). You can check if they're right on the Ganga prompt like we did above. Notice that the master job doesn't have the stdout and stderr files, since the master job itself was never actually run. In fact, had we not added the merger to the job, there would be no output in the master job's directory.
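One way to check the merged result is to parse the five factor lists and multiply everything back together. The sketch below does this in plain Python, so it can be typed at the Ganga prompt or in any Python session. The helper names combine_factors and product_of are hypothetical, and the input strings are copied from the factor lists quoted above rather than read from the real merged file, which may contain extra lines.

```python
import ast

# Per-subjob factor lists, copied from the merged output quoted above.
subjob_outputs = [
    "[(2, 1), (3, 1), (5, 1), (7, 1), (15485867, 1)]",
    "[]",
    "[(141650963, 1)]",
    "[]",
    "[(256203221, 1)]",
]


def combine_factors(lines):
    """Parse each line as a list of (prime, exponent) pairs and concatenate."""
    combined = []
    for line in lines:
        combined.extend(ast.literal_eval(line))
    return combined


def product_of(factors):
    """Multiply prime**exponent over all pairs."""
    result = 1
    for prime, exponent in factors:
        result *= prime ** exponent
    return result


all_factors = combine_factors(subjob_outputs)
print(product_of(all_factors) == 118020903911855744138963610)
```

Empty lists contribute nothing to the product, so subjobs whose tables held no factors are handled without special cases.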
 

Running Ganga Jobs on the Grid

Revision 1 2009-01-09 - MikeWilliams

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="LHCbSoftwareTutorials"

Ganga Tutorial 1

We start with a very simple problem which doesn't require knowledge about any other LHCb software: finding the prime factors of integers. Be careful what you cut and paste from this wiki, as Python doesn't like random white space.

Ganga Setup

First, set the environment for Ganga in a fresh terminal:

GangaEnv
and take the latest version (5.1.3 at time of writing). If you last used Ganga before version 5.1, you may need to run
ganga -g
to update your .gangarc file to include some of the newer options.

For this tutorial, we need to add an extra Ganga configuration file, so do

setenv GANGA_CONFIG_PATH ${GANGA_CONFIG_PATH}:GangaTutorial/Tutorial.ini
You will not need to do this for standard LHCb running.

Now start an interactive session:

ganga
You will be prompted for your grid password (type it and press ENTER). You should now see the Ganga prompt!

Prime Number Factorization

In this tutorial, our task is to find the prime factors of a given integer. Finding very large prime factors requires a lot of CPU time. This tutorial provides code that can factorize any number whose prime factors are among the first 15 million prime numbers. We have 15 tables of 1 million prime numbers each, and we scan the tables in search of the factors. The Python modules we will use (which are already written for you) include a collection of prime number tables called a PrimeTableDataset, which is used by the PrimeFactorizer application.
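The table-scanning idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: small_primes builds a tiny stand-in table with a sieve, and factorize_with_table is a hypothetical function; neither is the code shipped with the tutorial, whose tables hold 1 million primes each.

```python
def small_primes(limit):
    """Return all primes <= limit (limit >= 2); a stand-in for one prime table."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            for multiple in range(n * n, limit + 1, n):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def factorize_with_table(number, table):
    """Return [(prime, exponent), ...] for each prime in `table` dividing `number`."""
    factors = []
    for prime in table:
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        if exponent:
            factors.append((prime, exponent))
    return factors


table = small_primes(100)  # hypothetical tiny table for the example
print(factorize_with_table(360, table))  # [(2, 3), (3, 2), (5, 1)]
```

Each table can be scanned independently of the others, which is what makes the problem easy to divide among several jobs.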

Running a Factorization Job using Ganga

Let's start with a small example. The goal is to find the prime factors of the integer 1925. For such a small number, we (clearly) only need the first prime number data table (recall that each table contains 1 million prime numbers). At the Ganga prompt, type the following:

In [1]: j = Job()
In [2]: j.application = PrimeFactorizer(number=1925)
In [3]: j.inputdata = PrimeTableDataset(table_id_lower=1, table_id_upper=1)

At this point, we've created a Job object but we haven't run anything yet. We're free to edit its attributes as much as we like prior to submitting the job. For actual LHCb jobs, the application might be DaVinci while the inputdata could be a list of LHCb data files. The idea and most of the syntax are the same as in this simple example, though. To see all of the job's attributes, do

In [4]:j
Out[4]: Job (
 status = 'new' ,
 name = '' ,
     ...
  backend = Local ( ... )
  )
Notice that the backend is set to Local (which is the default value since we didn't specify where we wanted the job to run). This means that the job will run in the background on the local machine.

OK, let's submit the job:

In [5]: j.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 0
Ganga.GPIDev.Adapters              : INFO     submitting job 0 to Local backend
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "submitted"

We can check the status of the job by doing

In [6]:j.status
This will either be submitted, running or completed. If the job hasn't finished yet, wait for a few seconds and check again (for such a small number, the job should finish very quickly).

We can see what files were output by the job by doing

In [7]:j.peek()
total 8.0K
-rw-r--r--  1 jwilliam z5    0 Jan  9 14:24 __syslog__
-rw-r--r--  1 jwilliam z5  232 Jan  9 14:24 stdout
-rw-r--r--  1 jwilliam z5 4.9K Jan  9 14:24 stderr
-rw-r--r--  1 jwilliam z5   86 Jan  9 14:24 __jobstatus__
-rw-r--r--  1 jwilliam z5   26 Jan  9 14:24 factors-1925.dat
All Ganga jobs return the standard output and error in the files stdout and stderr. This job has also produced the file factors-1925.dat. We can view the contents of this file using
In [8]:j.peek('factors-1925.dat')
which opens a separate terminal which displays the contents of the file using less (use standard less commands to scroll etc., type q to quit).

The file should contain the factors [(5, 2), (7, 1), (11, 1)]. Let's check if this is correct:

In [9]:(5**2)*7*11 == 1925
Out[9]: True
Remember, standard python syntax works at the Ganga prompt!

OK, so we've run a job and checked the output using Ganga's magic but for a real analysis you'll often want direct access to the file. So, where is factors-1925.dat? It's in the job's output directory. You can obtain the full path of this directory via

In [10]:j.outputdir
Out[10]: /afs/cern.ch/user/j/jwilliam/gangadir/workspace/jwilliam/LocalAMGA/0/output/

This is a normal directory that you own; thus, you have permission to access the files there from a process independent of Ganga. So, you could exit Ganga and examine factors-1925.dat using, e.g., cat on the Linux command line...or, you could do this from Ganga. You can access shell commands from the Ganga prompt using ! as follows:

In [11]:!ls ~/.globus
usercert.pem  userkey.pem

In [12]:!cat $j.outputdir/factors-1925.dat
[(5, 2), (7, 1), (11, 1)]
Notice that you can use the $ character to access python variables when using the ! to access shell!

A few other basic convenience features which you can play around with involve scrolling through the history and using the TAB completion. Try using the arrow-UP to scroll through the history of the Ganga commands you've executed so far (works the same as when in a shell). You can use TAB completion on keywords, variables, objects, etc. Try the following (where TAB and arrow-UP mean hit those keys, don't type it out):

In [13]:j.app<TAB>

In [13]:j.application

In [13]:j.application<arrow-UP>

In [13]:j.application = PrimeFactorizer(number=1925)

The arrow-UP key scrolls through the history of commands that match what's been typed so far. In this case it scrolls through all commands which start with j.application (which is only 1 command so far, but try it again later on in the tutorial!). This behavior is similar to using ESC-P in tcsh or CTRL-R in bash.

Splitting a Ganga Job into Multiple Concurrent Jobs

Now that you've seen some of the basics of Ganga, let's try something a little more interesting - factorizing a very large integer. For this we'll need a PrimeTableDataset which contains all 15 tables of prime numbers. To speed things up, we will also split the job into 5 local subjobs, which will run concurrently.

In [1]: j = Job()
In [2]: j.application = PrimeFactorizer(number=118020903911855744138963610)
In [3]: j.inputdata = PrimeTableDataset()
In [4]: j.inputdata.table_id_lower = 1
In [5]: j.inputdata.table_id_upper = 15
In [6]: j.splitter = PrimeFactorizerSplitter(numsubjobs= 5)
In [7]: j.submit()

Running Ganga Jobs on the Grid

In [8]: j = jobs[-1].copy()  # we could've also used j.copy(), jobs(1).copy(), Job(jobs[-1]), etc.
In [9]: j.backend = LCG()
In [10]: j.submit()

-- MikeWilliams - 09 Jan 2009

 