Job Indexing and Slices

Goals

  • Clarify and extend the selection mechanisms and list-like slices (physical vs. logical indexing)
  • Ensure that slices support a subset of the list interface large enough for lists and slices to be interchanged easily

Indexing

List-style indexing of jobs:

  • jobs[i] gets the i-th job
  • jobs[i:j] gets a slice
  • advantages:
    • uniform interface between jobs[i] and j.subjobs[i]
    • jobs may be passed to methods which expect a list (such as merge() or export())
    • allows positional access to jobs e.g. first, last, last 10 jobs
  • disadvantages:
    • the change is not backwards compatible; a warning message on operator [] could help scripts through the transition. With the Ganga 4.3 semantics of operator [] ("lookup by id") the operator is hard to use in scripts, so it is mostly used at the interactive prompt. The impact of the transition should therefore not be overestimated.

Lookup by id is done with jobs(id), which returns the job object or raises JobAccessError if no job with that id is found.

In Ganga 4.3 list-style indexing is not possible; operator [] provides a dictionary-style lookup by id.

Example: Consider that the registry contains jobs with ids 2,3,5,7,9

task                 4.3               new proposal   result (ids of selected jobs)
get first job        N/A               jobs[0]        2
get last job         N/A               jobs[-1]       9
get third job        N/A               jobs[2]        5
get last two jobs    N/A               jobs[-2:]      [7, 9]
get first two jobs   N/A               jobs[:2]       [2, 3]
get 10th job         N/A               jobs[10]       IndexError
get job id 7         jobs[7]           jobs(7)        7
get job id 0         jobs[0] -> None   jobs(0)        JobAccessError
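The difference between positional indexing and lookup by id can be sketched with a small stand-in for the jobs object. This is only a toy model under stated assumptions: ToyJob and ToyRegistry are hypothetical names, and the code is not Ganga's implementation; only JobAccessError and the example ids come from the text above.

```python
class JobAccessError(Exception):
    """Raised when no job with the requested id exists (name taken from the text)."""


class ToyJob:
    """Hypothetical minimal job object carrying only an id."""

    def __init__(self, id):
        self.id = id


class ToyRegistry:
    """Hypothetical stand-in for the `jobs` object; not Ganga's implementation."""

    def __init__(self, ids):
        # Jobs are kept in ascending id order, so position and id differ in general.
        self._jobs = [ToyJob(i) for i in sorted(ids)]

    def __getitem__(self, index):
        # List-style access: jobs[i] returns one job, jobs[i:j] a list of jobs.
        # An out-of-range integer index raises IndexError, as for a Python list.
        return self._jobs[index]

    def __call__(self, id):
        # Lookup by id: jobs(id).
        for job in self._jobs:
            if job.id == id:
                return job
        raise JobAccessError("no job with id %r" % (id,))


jobs = ToyRegistry([2, 3, 5, 7, 9])
first_id = jobs[0].id                     # positional: first job
last_two = [j.id for j in jobs[-2:]]      # positional: last two jobs
first_two = [j.id for j in jobs[:2]]      # positional: first two jobs
by_id = jobs(7).id                        # lookup by id
```

In this sketch a plain slice returns a Python list; a real slice object would additionally carry the display and collective operations described under Slices.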

Access to subjobs

The tuple (i,k) and the string "i.k" (where i and k are integers) are fully-qualified job identifiers (fqid).

The fqid may address any subjob:

  • jobs("i.k") and jobs((i,k)) are equivalent to jobs(i).subjobs(k)

JobAccessError is raised if the address is wrong.

The id and the position of a subjob are identical in the j.subjobs slice. Therefore j.subjobs(k) is equivalent to j.subjobs[k].

Note: with the current implementation of the remote repository based on Oracle, this assumption does NOT hold.
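The equivalence of the two fqid forms can be illustrated by normalizing both to the same tuple. This is a minimal sketch: parse_fqid is a hypothetical helper, not part of Ganga, and raising ValueError for a malformed identifier is an assumption made here for simplicity.

```python
def parse_fqid(fqid):
    """Normalize an fqid given as "i.k" or (i, k) into a tuple of two ints.

    Hypothetical helper for illustration only.
    """
    if isinstance(fqid, str):
        parts = fqid.split(".")
    else:
        parts = list(fqid)
    if len(parts) != 2:
        # Assumption: malformed identifiers are rejected with ValueError.
        raise ValueError("expected a (job id, subjob id) pair, got %r" % (fqid,))
    return (int(parts[0]), int(parts[1]))


from_string = parse_fqid("3.1")
from_tuple = parse_fqid((3, 1))
same_address = from_string == from_tuple
```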

Slices

A slice may be created as follows:

  • jobs[i:j]
  • select(...) : cut on jobs attributes
  • jobtree folder is a slice
  • j.subjobs is a slice (?)

Main characteristics:

  • A jobs slice is a logical view on a group of jobs.
  • Slices have the same pretty, tabular display.
  • Collective operations are provided for slices: kill, submit, resubmit (see GangaJobOperations)

Cutting on job attributes addresses the typical use cases without requiring a full-blown SQL equivalent ;-).

Selections by job id:

  • select(id1,id2) : return a slice of all jobs with ids in the given range; the slice is empty if the range is empty

Selections by attribute:

  • select(name='x') : return all jobs with name=='x'
  • select(status='failed') : return all failed jobs
  • select(backend="LCG") : return all LCG jobs
  • Not implemented yet: select(time='yesterday') : return all jobs which were created yesterday (this interface is not fixed yet and depends on GangaTimestamps)
  • Not implemented yet: select(backend=LCG(actualCE='CERN')) : return all jobs with the LCG backend which ran at CERN

Multiple basic selections may be specified together (AND logic):

  • select(id1,id2,backend=LCG, status='completed') : get completed LCG jobs from the id1,id2 range
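The AND logic of combined selections can be sketched as a filter over a list of job-like objects. This is an illustration under assumptions, not Ganga code: the free function select, the inclusive treatment of the id range, and the use of plain strings for the backend are all choices made for the sketch.

```python
from types import SimpleNamespace


def select(jobs, minid=None, maxid=None, **attrs):
    """Return the jobs that satisfy every given criterion (AND logic).

    Assumption: the id range [minid, maxid] is inclusive at both ends.
    """
    selected = []
    for job in jobs:
        if minid is not None and job.id < minid:
            continue
        if maxid is not None and job.id > maxid:
            continue
        # Every keyword must match the attribute of the same name.
        if all(getattr(job, name, None) == value for name, value in attrs.items()):
            selected.append(job)
    return selected


# Made-up sample data; backends are plain strings in this sketch.
jobs = [
    SimpleNamespace(id=2, status="completed", backend="LCG"),
    SimpleNamespace(id=3, status="failed", backend="LCG"),
    SimpleNamespace(id=5, status="completed", backend="Local"),
    SimpleNamespace(id=7, status="completed", backend="LCG"),
    SimpleNamespace(id=9, status="new", backend="LCG"),
]

picked_ids = [j.id for j in select(jobs, 2, 7, backend="LCG", status="completed")]
empty = select(jobs, 10, 20)  # empty range gives an empty result
```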

Not implemented yet: Arbitrary selections:

  • select([id1,id2,...,idN]) : return a slice as specified by the list of job ids (this allows arbitrary slices to be built in a loop)
    • ids = [j.id for j in jobs if CUT(j)]
    • slice = jobs.select(ids)
  • passing a list of job ids, in conjunction with the ids() method on a slice (which already exists in Ganga 4.3), allows arbitrary list operations such as summing (ORing):
    • s1 = jobs.select(...)
    • s2 = jobs.select()
    • s12 = jobs.select(s1.ids()+s2.ids())
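ORing two slices through their id lists can be sketched as follows. The helpers ids and select_ids are hypothetical free functions standing in for the ids() method and the proposed select([id1,...]) form; treating repeated ids as a single selection is an assumption of this sketch.

```python
from types import SimpleNamespace


def ids(slice_):
    """Return the ids of the jobs in a slice (stand-in for slice.ids())."""
    return [j.id for j in slice_]


def select_ids(jobs, id_list):
    """Select jobs whose id appears in id_list, in registry order.

    Assumption: an id listed twice selects the job only once.
    """
    wanted = set(id_list)
    return [j for j in jobs if j.id in wanted]


# Made-up sample data.
jobs = [
    SimpleNamespace(id=2, status="completed"),
    SimpleNamespace(id=3, status="failed"),
    SimpleNamespace(id=5, status="completed"),
    SimpleNamespace(id=7, status="failed"),
    SimpleNamespace(id=9, status="new"),
]

s1 = [j for j in jobs if j.status == "failed"]  # ids 3 and 7
s2 = [j for j in jobs if j.id <= 3]             # ids 2 and 3
s12 = select_ids(jobs, ids(s1) + ids(s2))       # union of both selections
union_ids = ids(s12)
```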

Not implemented yet: Extended selections (more questionable):

  • select(cut) : all jobs for which cut(j) is True are selected, example: jobs.select(lambda j: j.backend.actualCE=='CERN')
  • alternative syntax for cutting on arbitrary attributes: select(('backend.actualCE','CERN')) : return all jobs for which backend.actualCE == 'CERN'
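The two extended forms can be compared in a small sketch. The functions select_by_cut and select_by_path are hypothetical names, and resolving the dotted attribute path with getattr is one possible reading of the alternative syntax, not a confirmed design.

```python
from functools import reduce
from types import SimpleNamespace


def select_by_cut(jobs, cut):
    """Keep the jobs for which the callable cut(j) is true."""
    return [j for j in jobs if cut(j)]


def select_by_path(jobs, path, value):
    """Keep the jobs whose dotted attribute path equals value."""
    def resolve(obj):
        # "backend.actualCE" -> getattr(getattr(obj, "backend"), "actualCE")
        return reduce(getattr, path.split("."), obj)
    return [j for j in jobs if resolve(j) == value]


# Made-up sample data; "ELSEWHERE" is a placeholder value.
jobs = [
    SimpleNamespace(id=2, backend=SimpleNamespace(actualCE="CERN")),
    SimpleNamespace(id=3, backend=SimpleNamespace(actualCE="ELSEWHERE")),
    SimpleNamespace(id=5, backend=SimpleNamespace(actualCE="CERN")),
]

by_cut = [j.id for j in select_by_cut(jobs, lambda j: j.backend.actualCE == "CERN")]
by_path = [j.id for j in select_by_path(jobs, "backend.actualCE", "CERN")]
```

The callable form is more general, while the path form keeps the selection as plain data, which may be easier to store or display.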

Access to objects by name

Names of jobs and templates are not required to be unique. However, it may be desirable (especially for templates) to allow easy access by name when the names are kept unique: j = Job(templates['MyUniqueName'])

templates["name"] returns the job if "name" is unique, or raises JobAccessError if no job is found.

Note that templates["name"] is different from templates("i.k") and templates.select(name="name").
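Access by name can be sketched as a lookup that succeeds only for an unambiguous match. The helper get_by_name is hypothetical. The text does not say what happens when several objects share a name, so raising JobAccessError in that case as well is purely an assumption of this sketch.

```python
from types import SimpleNamespace


class JobAccessError(Exception):
    """Raised when the requested object cannot be returned (name from the text)."""


def get_by_name(templates, name):
    """Return the single template called name (stand-in for templates["name"])."""
    matches = [t for t in templates if t.name == name]
    if not matches:
        raise JobAccessError("no template named %r" % (name,))
    if len(matches) > 1:
        # Assumption: an ambiguous name is also treated as an access error.
        raise JobAccessError("name %r is not unique" % (name,))
    return matches[0]


# Made-up sample data.
templates = [
    SimpleNamespace(id=0, name="MyUniqueName"),
    SimpleNamespace(id=1, name="shared"),
    SimpleNamespace(id=2, name="shared"),
]

found_id = get_by_name(templates, "MyUniqueName").id
```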

Further improvements

  • Not implemented yet: obsolete clean
  • Not implemented yet: confirm on remove (by default)
  • fqid printout as a tuple for subjobs (for copy/paste), but keep a single int at the master job level
  • Not implemented yet: keep advanced ops (list) at id level
  • Not implemented yet: slice.redo()

-- JakubMoscicki - 13 Apr 2007

Topic revision: r10 - 2007-11-07 - JakubMoscicki