Job State

Introduction

The job state algorithm provides a job recovery mechanism for the Panda pilot. Whenever the pilot updates the Panda server with the progress of a job, it also creates or updates a job state file containing all the information needed for later job recovery. If there is a crash and the job is lost, the job state file remains in the job work directory, where a different pilot can later find it and attempt to recover the lost job. If the payload of the lost job had finished (but, for example, the transfer of the data and log files failed), the job recovery algorithm moves the log to the local SE and registers the data file.
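The recovery step described above can be sketched roughly as follows. This is only an illustration, not the pilot's actual code: the function names, the `payload_finished` key, and the action labels are hypothetical, and the recursive search is an assumption made so that the sketch does not depend on exactly where under the site work directory the state files live.

```python
import glob
import os
import pickle


def find_lost_job_states(site_workdir):
    """Return all job state files found anywhere under the site work directory."""
    pattern = os.path.join(site_workdir, "**", "jobState-*.pickle")
    return sorted(glob.glob(pattern, recursive=True))


def load_job_state(path):
    """Read a job state file back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


def recovery_actions(state):
    """Decide what a recovering pilot should do for one lost job.

    `payload_finished` is a hypothetical key; the real file layout is
    not specified here.
    """
    if state.get("payload_finished"):
        # The payload ran to completion but the transfer failed:
        # move the log to the local SE and register the data file.
        return ["move_log_to_local_se", "register_data_file"]
    # The text above does not say what happens otherwise, so the
    # sketch takes no action in that case.
    return []
```

A recovering pilot would loop over `find_lost_job_states(...)`, load each file, and carry out the returned actions.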

Algorithm

[Figure: activity diagram for the job state algorithm]

Correction to the diagram: the job state file is in fact re-created on every Panda server update. There is no check for whether the file already exists.
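A minimal sketch of the corrected behaviour, in which every server update rewrites the state file unconditionally. The function name, the `send_update` callback, and the contents of `info` are assumptions made for illustration; only the file name pattern comes from this page.

```python
import os
import pickle


def update_server_and_state(directory, job_id, info, send_update):
    """Report progress to the server, then re-create the job state file.

    `send_update` is a hypothetical callable standing in for the real
    Panda server update.
    """
    send_update(job_id, info)
    path = os.path.join(directory, "jobState-%s.pickle" % job_id)
    # Mode "wb" truncates any existing file, so no existence check is
    # needed: the file is written from scratch on every update.
    with open(path, "wb") as f:
        pickle.dump(info, f)
    return path
```

Calling the function twice for the same job leaves a single file that holds only the most recent information.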

Additional details

The job state object knows how to store itself, how to read itself back, and how to delete itself together with the old work directory. If no job state files remain in the site work directory, that directory is deleted as well. When a job finishes successfully and all output files and the log have been copied to the SE, the job state file is removed from the site work directory.
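The responsibilities listed above could be arranged along the following lines. This is a sketch under stated assumptions rather than the pilot's real class: the class and method names are hypothetical, and the state file is assumed to sit directly in the site work directory.

```python
import glob
import os
import pickle
import shutil


class JobState:
    """Hypothetical job state object that can store, load and clean up itself."""

    def __init__(self, job_id, site_workdir, job_workdir, info=None):
        self.job_id = job_id
        self.site_workdir = site_workdir
        self.job_workdir = job_workdir
        self.info = info if info is not None else {}

    @property
    def path(self):
        return os.path.join(self.site_workdir, "jobState-%s.pickle" % self.job_id)

    def put(self):
        """Store the state on disk."""
        with open(self.path, "wb") as f:
            pickle.dump(self.info, f)

    def get(self):
        """Read the state back from disk."""
        with open(self.path, "rb") as f:
            self.info = pickle.load(f)
        return self.info

    def cleanup(self):
        """Delete the state file and the old work directory.

        The site work directory is removed too once it holds no more
        job state files.
        """
        if os.path.exists(self.path):
            os.remove(self.path)
        shutil.rmtree(self.job_workdir, ignore_errors=True)
        remaining = glob.glob(os.path.join(self.site_workdir, "jobState-*.pickle"))
        if not remaining:
            shutil.rmtree(self.site_workdir, ignore_errors=True)
```

In this sketch, `cleanup()` would be called both after a successful job and after a completed recovery.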

File format

File name:
jobState-<jobId>.pickle
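
As a small illustration of the naming convention, the helpers below build a file name from a job ID and recover the ID from a file name. The helper names are hypothetical, and because the format of the job ID is not specified here, the pattern accepts any non-empty string.

```python
import re

# Matches jobState-<jobId>.pickle and captures the job ID.
_JOB_STATE_RE = re.compile(r"^jobState-(?P<job_id>.+)\.pickle$")


def job_state_filename(job_id):
    """Build the job state file name for a given job ID."""
    return "jobState-%s.pickle" % job_id


def parse_job_id(filename):
    """Return the job ID encoded in a job state file name, or None."""
    match = _JOB_STATE_RE.match(filename)
    return match.group("job_id") if match else None
```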

File object information

The job state file is a container for all information necessary for job recovery. The file contains several objects stored in pickle format. There are a few overlaps between these objects, but they are insignificant. A typical job state file is about 4 kB in size.


Major updates:
-- PaulNilsson - 06 Oct 2006



Responsible: PaulNilsson

Topic attachments

updateJobState.jpg.jpeg (r1, 23.7 K, 2006-10-06 17:56, UnknownUser): Activity diagram for job state algorithm
updateJobState.png (r1, 21.0 K, 2006-10-06 18:02, UnknownUser): job state algorithm
Topic revision: r5 - 2006-12-22 - PaulNilssonSecondary