Ganga Job Status

Issues remaining to be solved

  • resource leak: Ganga should protect users from leaking resources. It should not report jobs as finished or killed while they are actually still occupying a slot on the Grid. Two "modes" should therefore be foreseen:
    • strict mode: Ganga does not allow silent leaks of resources
    • relaxed mode: for user convenience (if a user does not care about resource leaks) it should be easy to take jobs out of the monitoring loop
  • resubmission problem: consider a master job containing several running subjobs and one "done" subjob. Resubmitting the "done" subjob is currently not allowed, for the following reason. The master job's status is initially "running". After the "done" subjob is resubmitted, the master's status would have to change from "running" to "submitted", following the rule that the master job takes the worst status among its subjobs. That transition is not allowed at the moment. However, the fix is not as simple as enabling the transition from "running" to "submitted": it is tied to the logic of how a job can be resubmitted and how Ganga can avoid resource leaks.
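
The "worst status among the subjobs" rule can be sketched as follows. This is a minimal illustration, not Ganga code: the function name master_status and the numeric ranking are assumptions, and only the relative order submitted < running < completed is implied by the text above ("completed" stands in for the "done" subjob).

```python
# Hypothetical ranking: a lower number means a "worse" status.
# Only the relative order of these three states is taken from the text.
STATUS_RANK = {"submitted": 0, "running": 1, "completed": 2}


def master_status(subjob_statuses):
    """Return the worst status found among the subjobs."""
    if not subjob_statuses:
        raise ValueError("a master job needs at least one subjob")
    return min(subjob_statuses, key=STATUS_RANK.__getitem__)


# Before resubmission: several running subjobs and one finished subjob.
before = master_status(["running", "running", "completed"])
# After resubmitting the finished subjob it is "submitted" again.
after = master_status(["running", "running", "submitted"])
```

Here before is "running" and after is "submitted", so the master job would need exactly the running to submitted transition that is currently forbidden.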

The design of the system

Job status is controlled by a state machine. All changes to job.status are made via job.updateStatus(), which performs a number of functions:

  • checks if a transition is allowed and the state exists;
  • adds logging and timestamp information;
  • commits the job immediately to the repository (the conditional commit may fail, which is a way to implement critical sections in the monitoring loop, such as running -> completing -> completed).
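
The three steps above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the actual Job class: the class name StatusJob, the exception names, the partial transition table and the dict standing in for the repository are all hypothetical. The conditional commit is modelled as a compare-and-set on the stored status, which is one possible reading of the text, and transient states are not modelled.

```python
import time


class TransitionError(Exception):
    """Raised for an unknown state or a forbidden transition."""


class CommitError(Exception):
    """Raised when the conditional commit to the repository fails."""


# Hypothetical, partial transition table (not the real Ganga graph).
TRANSITIONS = {
    "new": {"submitting"},
    "submitting": {"submitted", "new"},
    "submitted": {"running", "killed"},
    "running": {"completing", "killed", "failed"},
    "completing": {"completed", "failed"},
}
ALL_STATES = set(TRANSITIONS) | set().union(*TRANSITIONS.values())


class StatusJob:
    def __init__(self, repository):
        # The repository is a plain dict standing in for persistent storage.
        self._repository = repository
        self.status = repository.setdefault("status", "new")
        self.history = []  # (timestamp, old_status, new_status)

    def updateStatus(self, new_status):
        old_status = self.status
        # 1. Check that the state exists and the transition is allowed.
        if new_status not in ALL_STATES:
            raise TransitionError("unknown state: %s" % new_status)
        if new_status not in TRANSITIONS.get(old_status, set()):
            raise TransitionError("%s -> %s not allowed" % (old_status, new_status))
        # 2. Conditional commit: succeed only if the repository still holds
        #    the state this object started from.
        if self._repository.get("status") != old_status:
            raise CommitError("repository changed since last read")
        self._repository["status"] = new_status
        # 3. Record timestamp information and update the in-memory status.
        self.history.append((time.time(), old_status, new_status))
        self.status = new_status
```

With two StatusJob objects sharing one repository dict, only the first to call updateStatus("submitting") succeeds and the second gets a CommitError, which is how a failing conditional commit can guard a critical section.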

The state machine is defined inside the Job class. The automatically created state graph is shown below.

Internal Ganga code (Ganga.GPIDev packages) may change job.status directly. There are a few exceptional cases in which this is necessary for internal purposes.

Notes

  • the ability to manually move jobs to the failed state is not yet taken into account (see the resource conservation argument below);
  • resource conservation strategy (changed with respect to Ganga-4-1-*)
    • users should be protected from creating accidental resource leaks, i.e. losing the connection to jobs without releasing computing resources
    • the states completed, failed and killed all imply that the resource has been released (as far as the backend can tell)
    • under normal circumstances jobs must be killed before they are removed: j.remove() will fail if the job cannot be killed successfully
    • under exceptional circumstances a user may force job removal (but this needs to be done explicitly), e.g. job.remove(force=1)
  • grayed-out states are transient, i.e. never stored in the repository.
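
The removal rule in the resource conservation strategy could look roughly like the sketch below. It is only an illustration: the class RemovableJob, the exception ResourceLeakError and the kill_succeeds flag (a stand-in for the backend) are hypothetical, and treating "new" as a state that holds no resource is an assumption not stated in the notes.

```python
# States in which no Grid resource is held. completed, failed and killed
# come from the notes; "new" is an assumption (nothing was submitted yet).
RELEASED_STATES = {"new", "completed", "failed", "killed"}


class ResourceLeakError(Exception):
    """Raised when removing a job would silently leak a resource."""


class RemovableJob:
    def __init__(self, status, kill_succeeds=True):
        self.status = status
        self.kill_succeeds = kill_succeeds  # stand-in for the backend
        self.removed = False

    def kill(self):
        # Returns True if the backend confirmed the kill.
        if self.kill_succeeds:
            self.status = "killed"
        return self.kill_succeeds

    def remove(self, force=False):
        # A job that may still hold a resource must be killed first.
        if self.status not in RELEASED_STATES:
            if not self.kill() and not force:
                raise ResourceLeakError("kill failed; use force to remove anyway")
        self.removed = True
```

In this sketch a running job whose kill fails can only be removed with remove(force=True), which keeps the leak explicit rather than silent.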

ganga_job_stat.gif

Revised state transitions

The following suggestions have been taken into account in addition to the paragraph above:

  • the job.fail() method puts a job into the failed state if it is currently in the submitted, running, killed, completing or completed state.
  • job.resubmit() changes the job from any of the states running, killed, completed, completing and failed into submitting.
  • If the job is running, Ganga will try to kill it first and the operation fails if the kill does not succeed. This behaviour can be overridden by adding a force=1 argument.
  • A template state has been added
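
The revised fail() and resubmit() rules can be written down as a short sketch. The class ResubmittableJob, the exception StateError and the kill_succeeds flag are hypothetical, and the sketch assumes that the kill-before-proceeding rule and the force argument apply to resubmit(); the source states for each method are the ones listed above.

```python
# Source states taken from the list above.
FAIL_FROM = {"submitted", "running", "killed", "completing", "completed"}
RESUBMIT_FROM = {"running", "killed", "completed", "completing", "failed"}


class StateError(Exception):
    """Raised when an operation is not allowed in the current state."""


class ResubmittableJob:
    def __init__(self, status, kill_succeeds=True):
        self.status = status
        self.kill_succeeds = kill_succeeds  # stand-in for the backend

    def fail(self):
        if self.status not in FAIL_FROM:
            raise StateError("cannot fail a job in state %s" % self.status)
        self.status = "failed"

    def resubmit(self, force=False):
        if self.status not in RESUBMIT_FROM:
            raise StateError("cannot resubmit a job in state %s" % self.status)
        # Assumption: a running job is killed first, and a failed kill
        # blocks the resubmission unless force is given.
        if self.status == "running" and not self.kill_succeeds and not force:
            raise StateError("running job could not be killed")
        self.status = "submitting"
```

For example, ResubmittableJob("running", kill_succeeds=False).resubmit() raises StateError, while the same call with force=True moves the job to submitting.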

ganga_job_stat.gif ganga_job_stat_full.gif

-- JakubMoscicki - 19 Jul 2006

Topic revision: r5 - 2006-11-30 - JakubMoscicki
 