Job Operations

Goals of the changes in 5.0

  • Clarify error handling in the job operations API
  • Make collective operations easier

Basic operations on individual jobs

Basic job operations such as j.submit() and j.kill() raise JobError on any failure (including runtime errors and operations attempted on a job in an incorrect status).
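
The sketch below illustrates this convention outside of Ganga. The JobError and Job classes here are stand-ins written for this example, not the Ganga implementation, and the helper try_kill is hypothetical. The state rules are taken from the slice operation descriptions on this page (submit applies to new jobs, kill to running or submitted jobs).

```python
# Stand-in definitions for illustration only; these are NOT the Ganga classes.
class JobError(Exception):
    """Raised when a job operation fails."""


class Job:
    def __init__(self, job_id, status="new"):
        self.id = job_id
        self.status = status

    def submit(self):
        # Assumed rule: only a job in the 'new' state can be submitted.
        if self.status != "new":
            raise JobError(f"job {self.id}: cannot submit in state '{self.status}'")
        self.status = "submitted"

    def kill(self):
        # Assumed rule: only submitted or running jobs can be killed.
        if self.status not in ("submitted", "running"):
            raise JobError(f"job {self.id}: cannot kill in state '{self.status}'")
        self.status = "killed"


def try_kill(job):
    """Hypothetical helper: return the error message if kill() fails, else None."""
    try:
        job.kill()
    except JobError as err:
        return str(err)
    return None
```

With exceptions, the caller decides explicitly where a failure is handled, instead of relying on a return value being checked.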

Rationale

Returning True/False is not good for scripting:
  • Problematic in loops: a systematic error (such as a misconfigured LCG User Interface) should instead break the loop
  • Problematic for error checking and reporting: users never check the return code
  • In Python, any operation may raise an exception anyway

Collective operations on groups of jobs

Existing usage patterns in Ganga 4 are based on "keep_going" behavior in loops, which ignores any errors and attempts to perform the operation on all jobs in the selected group:

 for j in jobs.select(100,110): 
   j.kill()

In Ganga 5 a jobs slice may be used for performing collective operations: jobs.select(100,110).kill()

An optional keep_going argument may be passed to this (and any other) collective operation:

  • keep_going is True : ignore any errors, this is the default
  • keep_going is False: raise exception on the first error
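
The two keep_going modes can be sketched as follows. This is a minimal illustration, not the Ganga slice code: JobError, FakeJob and the function collective are hypothetical names introduced for this example, and returning the list of ignored errors is an assumption made here so that the effect is visible.

```python
class JobError(Exception):
    """Stand-in for the error raised by a failed job operation."""


class FakeJob:
    """Hypothetical job whose kill() can be made to fail on demand."""

    def __init__(self, job_id, fails=False):
        self.id = job_id
        self.fails = fails
        self.killed = False

    def kill(self):
        if self.fails:
            raise JobError(f"job {self.id}: kill failed")
        self.killed = True


def collective(jobs, method_name, keep_going=True):
    """Call the named method on every job and return the errors that were ignored."""
    ignored = []
    for job in jobs:
        try:
            getattr(job, method_name)()
        except JobError as err:
            if not keep_going:
                raise  # keep_going is False: stop at the first error
            ignored.append(err)  # keep_going is True: continue with the next job
    return ignored
```

With keep_going=True every job is attempted even if some fail; with keep_going=False the jobs after the first failing one are left untouched.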

The operations on slices have the following meaning:

  • submit : submit all jobs in the new state
  • resubmit : resubmit all jobs in the failed or killed state
  • kill : kill all jobs in the running or submitted state
  • remove : remove all jobs (this operation may fail if subjobs are represented as slices)
  • fail : fail all jobs. Not implemented yet, because the fail method does not exist at the job object level.
  • copy : copy all jobs and return a slice containing the copies; a copy of a subjob is promoted to a master job
  • peek : perform peek on the jobs in the slice one by one. Not implemented yet (for example, would running less on a large number of files kill the terminal?).
  • prune : remove workspace files. Not implemented yet; an equivalent method should be added at the job level (see GangaJobWorkspaceInterface).
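
One possible reading of the state rules for submit, resubmit and kill is that a slice operation selects only the jobs whose state allows it. The sketch below encodes that reading; the table ELIGIBLE_STATES and the function eligible are hypothetical names for this example, and whether Ganga filters jobs this way or reports an error for jobs in other states is not specified here.

```python
# Operation -> job states it applies to, as listed above (illustration only).
ELIGIBLE_STATES = {
    "submit": {"new"},
    "resubmit": {"failed", "killed"},
    "kill": {"running", "submitted"},
}


def eligible(job_states, operation):
    """Return the ids of the jobs whose state allows the given operation.

    job_states maps job id -> state name; ids are returned in mapping order.
    """
    allowed = ELIGIBLE_STATES[operation]
    return [job_id for job_id, state in job_states.items() if state in allowed]
```

For example, given jobs 100 to 104 in the states new, running, failed, submitted and killed, the kill operation would select jobs 101 and 103.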

Not implemented yet: jobs.clean() should no longer be available. Equivalent functionality should be provided by adding an optional argument nokill=False (default) to the jobs.remove() operation.

-- JakubMoscicki - 13 Apr 2007

Topic revision: r10 - 2008-02-22 - JakubMoscicki
 