Job Indexing and Slices
Goals
- Clarify and extend the selection mechanisms and list-like slices (physical vs. logical indexing)
- Ensure that slices support a subset of the list interface large enough for lists and slices to be used interchangeably
Indexing
List-style indexing of jobs:
- jobs[i] gets the i-th job
- jobs[i:j] gets a slice
- advantages:
  - uniform interface between jobs[i] and j.subjobs[i]
  - jobs may be passed to methods which expect a list (such as merge() or export())
  - allows positional access to jobs, e.g. first, last, last 10 jobs
- disadvantages:
  - the change is not backwards compatible. A warning message for operator [] could help scripts through the transition. With the Ganga 4.3 semantics of operator [] ("lookup by id") the operator is hard to use in scripts, so it is mostly used at the interactive prompt. The impact of the transition should therefore not be overestimated.
Lookup by id is done by jobs(id), which returns the job object or raises JobAccessError if no such job is found.
In Ganga 4.3 list-style indexing is not possible and the operator [] provides a dictionary-style lookup by id.
Example:
Suppose the registry contains jobs with ids 2, 3, 5, 7, 9:
| task | 4.3 | new proposal | result (ids of selected jobs) |
| get first job | N/A | jobs[0] | 2 |
| get last job | N/A | jobs[-1] | 9 |
| get third job | N/A | jobs[2] | 5 |
| get last two jobs | N/A | jobs[-2:] | [7, 9] |
| get first two jobs | N/A | jobs[:2] | [2, 3] |
| get 10th job | N/A | jobs[9] | IndexError |
| get job id 7 | jobs[7] | jobs(7) | 7 |
| get job id 0 | jobs[0] -> None | jobs(0) | JobAccessError |
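The indexing semantics proposed here can be sketched with a toy registry. This is not Ganga code: the `ToyRegistry` class and its internals are hypothetical and store bare ids instead of job objects. Only the `jobs[i]`, `jobs[i:j]` and `jobs(id)` forms and the `JobAccessError` name come from the proposal.

```python
class JobAccessError(Exception):
    """Raised when a lookup by id fails (name taken from the proposal)."""


class ToyRegistry:
    """Hypothetical stand-in for `jobs`; stores bare job ids in order."""

    def __init__(self, ids):
        self._ids = sorted(ids)

    def __getitem__(self, index):
        # Positional (list-style) access: an int gives one job,
        # a slice object gives a new registry-like slice.
        if isinstance(index, slice):
            return ToyRegistry(self._ids[index])
        return self._ids[index]  # IndexError when out of range

    def __call__(self, job_id):
        # Lookup by id, independent of the position in the registry.
        if job_id not in self._ids:
            raise JobAccessError("no job with id %r" % (job_id,))
        return job_id

    def ids(self):
        return list(self._ids)


jobs = ToyRegistry([2, 3, 5, 7, 9])
print(jobs[0], jobs[-1], jobs[2])        # first, last, third job
print(jobs[-2:].ids(), jobs[:2].ids())   # last two, first two jobs
print(jobs(7))                           # lookup by id
```

Keeping positional access in `__getitem__` and id lookup in `__call__` is what lets a slice be handed to code that expects a plain list.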
Access to subjobs
The tuple (i,k) and the string "i.k" (where i and k are integers) are fully-qualified job identifiers (fqid).
The fqid may address any subjob:
- jobs("i.k") and jobs((i,k)) are equivalent to jobs(i).subjobs(k)
JobAccessError is raised if the address is wrong.
The id and the position of a subjob in the j.subjobs slice are identical. Therefore j.subjobs(k) is equivalent to j.subjobs[k].
Note: with the current implementation of the remote repository based on Oracle, this assumption is NOT true.
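Accepting both fqid spellings amounts to normalising them to one form before the lookup. A minimal sketch, assuming a hypothetical helper `parse_fqid` that is not part of Ganga:

```python
def parse_fqid(fqid):
    """Normalise an fqid given as "i.k" or (i, k) to a tuple of two ints."""
    if isinstance(fqid, str):
        parts = fqid.split(".")
    else:
        parts = list(fqid)
    if len(parts) != 2:
        raise ValueError("expected 'i.k' or (i, k), got %r" % (fqid,))
    return int(parts[0]), int(parts[1])


print(parse_fqid("7.2"))
print(parse_fqid((7, 2)))
```

With such a helper, jobs(fqid) could unpack the result and delegate to jobs(i).subjobs(k). Raising ValueError for a malformed identifier is a choice made for this sketch; the proposal itself only names JobAccessError for a wrong address.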
Slices
A slice may be created as follows:
- jobs[i:j]
- select(...): cut on job attributes
- a jobtree folder is a slice
- j.subjobs is a slice (?)
Main characteristics:
- A jobs slice is a logical view on a group of jobs.
- Slices have the same pretty, tabular display.
- Collective operations are provided for slices: kill, submit, resubmit (see GangaJobOperations)
Cutting on job attributes addresses the typical use cases without requiring a full-blown SQL equivalent ;-).
Selections by job id:
- select(id1,id2): return a slice of all jobs with ids in the given range, or an empty slice if the range is empty
Selections by attribute:
- select(name='x'): return all jobs with name=='x'
- select(status='failed'): return all failed jobs
- select(backend="LCG"): return all LCG jobs
- Not implemented yet: select(time='yesterday'): return all jobs which were created yesterday (this interface is not fixed yet and depends on GangaTimestamps)
- Not implemented yet: select(backend=LCG(actualCE='CERN')): return all jobs which use the LCG backend and run at CERN
Multiple basic selections may be specified together (AND logic):
- select(id1,id2,backend=LCG, status='completed'): get completed LCG jobs from the id1,id2 range
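The combination of an id range with attribute cuts can be sketched as a plain filter. This is an illustration only: `ToyJob`, the free function `select`, the parameter names `minid` and `maxid`, and the sample data are hypothetical, and treating the id range as inclusive at both ends is an assumption that the proposal does not state.

```python
from dataclasses import dataclass


@dataclass
class ToyJob:
    """Hypothetical job record with only the attributes used below."""
    id: int
    name: str
    status: str
    backend: str


def select(jobs, minid=None, maxid=None, **attrs):
    """Return the jobs in the id range whose attributes all match."""
    selected = []
    for j in jobs:
        # Assumption: both ends of the id range are inclusive.
        if minid is not None and j.id < minid:
            continue
        if maxid is not None and j.id > maxid:
            continue
        # AND logic: every keyword must equal the job attribute.
        if all(getattr(j, key) == value for key, value in attrs.items()):
            selected.append(j)
    return selected


all_jobs = [
    ToyJob(2, "x", "completed", "LCG"),
    ToyJob(3, "y", "failed", "LCG"),
    ToyJob(5, "x", "completed", "Local"),
    ToyJob(7, "z", "completed", "LCG"),
    ToyJob(9, "x", "failed", "Local"),
]

completed_lcg = select(all_jobs, 2, 7, backend="LCG", status="completed")
print([j.id for j in completed_lcg])
```

An empty id range or a cut that matches nothing simply yields an empty result, mirroring the "empty slice" behaviour described for select(id1,id2).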
Not implemented yet: Arbitrary selections:
- select([id1,id2,...,idN]): return a slice as specified by the list of job ids (this allows arbitrary slices to be created in a loop):
    ids = []
    for j in jobs:
        if CUT(j):
            ids.append(j.id)
    s = jobs.select(ids)
- passing a list of job ids, in conjunction with the ids() method of the slice (which already exists in Ganga 4.3), allows arbitrary list operations, such as summing (ORing):
    s1 = jobs.select(...)
    s2 = jobs.select(...)
    s12 = jobs.select(s1.ids()+s2.ids())
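When two slices overlap, concatenating their id lists repeats the shared ids. The proposal does not say whether select() would drop such duplicates, so a caller may prefer to merge the lists first. A minimal sketch, where `merge_ids` is a hypothetical helper and the id lists are made-up examples:

```python
def merge_ids(*id_lists):
    """Concatenate id lists, keeping first occurrences in original order."""
    seen = set()
    merged = []
    for ids in id_lists:
        for job_id in ids:
            if job_id not in seen:
                seen.add(job_id)
                merged.append(job_id)
    return merged


s1_ids = [2, 3, 5]   # e.g. what s1.ids() might return
s2_ids = [5, 7]      # e.g. what s2.ids() might return
print(merge_ids(s1_ids, s2_ids))
```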
Not implemented yet: Extended selections (more questionable):
- select(cut): all jobs for which cut(j) is True are selected, for example: jobs.select(lambda j: j.backend.actualCE=='CERN')
- alternative syntax for cutting on arbitrary attributes: select(('backend.actualCE','CERN')): return all jobs for which backend.actualCE == 'CERN'
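Both extended forms reduce to the same filter, since a dotted attribute path can be turned into a cut function. A sketch under stated assumptions: `get_path`, `select_cut` and `select_path` are hypothetical names, the jobs are simple stand-in objects, and the site name "RAL" is made-up sample data.

```python
from functools import reduce
from types import SimpleNamespace


def get_path(obj, dotted):
    """Follow a dotted attribute path such as 'backend.actualCE'."""
    return reduce(getattr, dotted.split("."), obj)


def select_cut(jobs, cut):
    """Keep the jobs for which cut(j) is true."""
    return [j for j in jobs if cut(j)]


def select_path(jobs, path, value):
    """Keep the jobs whose attribute at `path` equals `value`."""
    return select_cut(jobs, lambda j: get_path(j, path) == value)


jobs = [
    SimpleNamespace(id=2, backend=SimpleNamespace(actualCE="CERN")),
    SimpleNamespace(id=3, backend=SimpleNamespace(actualCE="RAL")),
    SimpleNamespace(id=5, backend=SimpleNamespace(actualCE="CERN")),
]

by_lambda = select_cut(jobs, lambda j: j.backend.actualCE == "CERN")
by_path = select_path(jobs, "backend.actualCE", "CERN")
print([j.id for j in by_lambda], [j.id for j in by_path])
```

The tuple form is easier to store or print than a lambda, while the callable form can express cuts that are not simple equality tests.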
Access to objects by name
The names of jobs and templates are not required to be unique. However, it may be desirable (especially for templates) to allow easy access by name if the names are kept unique:
j = Job(templates['MyUniqueName'])
templates["name"] returns the job if the "name" is unique, or raises a JobAccessError if no job is found.
Note that templates["name"] is different from templates("i.k") and templates.select(name="name").
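Access by name can be sketched as a lookup that insists on exactly one match. `get_by_name` and the dictionary records are hypothetical. Raising JobAccessError for a duplicated name is an assumption of this sketch, since the proposal only specifies the error for a missing name.

```python
class JobAccessError(Exception):
    """Raised when a name lookup fails (name taken from the proposal)."""


def get_by_name(templates, name):
    """Return the single template called `name`."""
    matches = [t for t in templates if t["name"] == name]
    if not matches:
        raise JobAccessError("no template named %r" % name)
    if len(matches) > 1:
        # Assumption: an ambiguous name is reported as an error.
        raise JobAccessError("name %r is not unique" % name)
    return matches[0]


templates = [
    {"id": 1, "name": "MyUniqueName"},
    {"id": 2, "name": "shared"},
    {"id": 3, "name": "shared"},
]
print(get_by_name(templates, "MyUniqueName")["id"])
```

This differs from select(name=...), which would return a slice holding every match instead of a single object.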
Further improvements
- Not implemented yet: obsolete clean
- Not implemented yet: confirm on remove (by default)
- print the fqid as a tuple for subjobs (for copy/paste), but keep a single int at the master job level
- Not implemented yet: keep advanced ops (list) at id level
- Not implemented yet: slice.redo()
--
JakubMoscicki - 13 Apr 2007