Job State

Introduction

The job state algorithm provides a job recovery mechanism for the Panda pilot. Whenever the pilot updates the Panda server with the progress of a job, it also creates or updates a job state file containing all the information needed to recover the job later. If there is a crash and the job is lost, the job state file remains in the job work directory, where a different pilot can later find it and attempt to recover the lost job. If the payload of the lost job had finished but, for example, the data and log files were not transferred, the job recovery algorithm moves the log to the local SE and registers the data file.
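The discovery step, in which a new pilot looks for job state files left behind by lost jobs, might look roughly like the sketch below. It is only an illustration: the function name `find_lost_job_states`, the `search_root` argument, and the recursive directory scan are assumptions, not the pilot's actual code. Only the file name pattern `jobState-<jobId>.pickle` comes from the File format section of this page.

```python
import glob
import os
import re

# File name pattern from the "File format" section: jobState-<jobId>.pickle
_STATE_FILE_RE = re.compile(r"^jobState-(?P<job_id>.+)\.pickle$")


def find_lost_job_states(search_root):
    """Return sorted (job_id, path) pairs for job state files under search_root.

    Hypothetical helper: the real pilot may locate work directories differently.
    """
    found = []
    # "**" with recursive=True also matches files directly in search_root.
    pattern = os.path.join(search_root, "**", "jobState-*.pickle")
    for path in sorted(glob.glob(pattern, recursive=True)):
        match = _STATE_FILE_RE.match(os.path.basename(path))
        if match:
            found.append((match.group("job_id"), path))
    return found
```

Each returned path could then be loaded and inspected to decide whether the payload had finished and the remaining transfer and registration steps should be attempted.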

Algorithm

Figure: Activity diagram for the job state algorithm.

File format

File name: jobState-<jobId>.pickle

The job state file contains a container object of the JobState class, serialized in pickle format.

File object information

[coming soon; basically everything needed to recover the job]

The job state object can write itself to file and read itself back, and it can delete both its own file and the old work directory.
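A minimal sketch of such a self-managing object is shown below. The class name `JobStateSketch`, its method names, and the `info` dictionary are hypothetical and chosen for illustration; the fields actually stored by the pilot are not listed on this page. For simplicity the sketch pickles a plain dictionary of fields rather than the object itself, which differs from the container object described above.

```python
import os
import pickle
import shutil


class JobStateSketch:
    """Illustrative stand-in for the JobState container (not the pilot's class)."""

    def __init__(self, job_id, work_dir, info=None):
        self.job_id = job_id
        self.work_dir = work_dir
        self.info = info or {}  # hypothetical recovery information

    @property
    def filename(self):
        # Naming convention from the "File format" section.
        return os.path.join(self.work_dir, "jobState-%s.pickle" % self.job_id)

    def store(self):
        """Write the state to the job work directory and return the file path."""
        fields = {"job_id": self.job_id, "work_dir": self.work_dir, "info": self.info}
        with open(self.filename, "wb") as handle:
            pickle.dump(fields, handle)
        return self.filename

    @classmethod
    def load(cls, path):
        """Read a state file back into a new object."""
        with open(path, "rb") as handle:
            fields = pickle.load(handle)
        return cls(fields["job_id"], fields["work_dir"], fields["info"])

    def cleanup(self):
        """Delete the state file and the old work directory."""
        if os.path.exists(self.filename):
            os.remove(self.filename)
        shutil.rmtree(self.work_dir, ignore_errors=True)
```

A recovering pilot would call `load` on a state file it found, finish whatever steps remain, and then call `cleanup` so that the same job is not picked up again.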


Major updates:
-- PaulNilsson - 06 Oct 2006

Topic revision: r2 - 2006-10-06 - PaulNilssonSecondary
 