Bookkeeping for Panda analysis jobs

Introduction

pbook is the next-generation bookkeeping application for all Panda analysis jobs. It has several advantages over pathena_util:

  • the central job repository

  • robust database management using SQLite

  • capability to retry old jobs that were submitted in the last 30 days

  • dual user-interfaces

  • clean packaging in the panda-client

In the new bookkeeping scheme, all job information is stored in the central repository on the server side, and each user makes a local copy on her/his computer. Previously, the information for each job was stored on the local computer from which the user submitted the job. So if a job was submitted from lxplus, the user was not able to kill the job from another computer that doesn't share the home directory, and vice versa. This problem has been solved because the user always has consistent job information, which is kept on the server.

Two user interfaces are available: the graphical user interface (GUI) and the command-line user interface (CUI). The former lets users browse the local job repository more conveniently. The latter invokes an interactive Python session where users can manipulate the job repository easily. pbook internally runs multiple threads in parallel for better performance, so the local job repository needs to be updated transactionally. SQLite guarantees that parallel sessions access the local repository cleanly.
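The transactional update mentioned above can be illustrated with Python's standard sqlite3 module. This is a minimal sketch and not pbook's actual code: the table name jobs, its two columns, and the helper names update_job and demo are assumptions made for illustration only. Each thread opens its own connection and writes inside a transaction, so concurrent writers wait for each other instead of interleaving partial updates.

```python
import os
import sqlite3
import tempfile
import threading


def update_job(db_path, job_id, status):
    """Write one job record inside a transaction (hypothetical schema)."""
    # Each thread uses its own connection; 'timeout' makes a writer wait
    # for a competing lock instead of failing immediately.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT OR REPLACE INTO jobs (JobID, jobStatus) VALUES (?, ?)",
                (job_id, status),
            )
    finally:
        conn.close()


def demo():
    """Run several writer threads against one temporary database file."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        with conn:
            conn.execute(
                "CREATE TABLE jobs (JobID INTEGER PRIMARY KEY, jobStatus TEXT)"
            )
        conn.close()

        threads = [
            threading.Thread(target=update_job, args=(db_path, i, "frozen"))
            for i in range(1, 9)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        conn = sqlite3.connect(db_path)
        rows = conn.execute(
            "SELECT JobID, jobStatus FROM jobs ORDER BY JobID"
        ).fetchall()
        conn.close()
        return rows
    finally:
        os.remove(db_path)
```

Calling demo() should return one row per writer thread, regardless of the order in which the threads happened to run.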

pbook.png


Getting Started

Installation and Setup

pbook is included in the panda-client package. See Installation and Setup for instructions.

One noteworthy thing is that sqlite3 is required for the local job repository. In general, sqlite3 is already available on SL(C)4/5 machines. Try

$ which sqlite3
If the above command shows a proper path name like
/usr/bin/sqlite3
you can skip the following section and proceed to How to run. Otherwise, the additional setup described below is required.
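The same check can be done from Python, which may be handy in a setup script. The sketch below is only an illustration: the function names are hypothetical, and this page does not state whether pbook relies on the sqlite3 command-line tool, the Python sqlite3 module, or both, so the sketch reports each separately.

```python
import shutil


def find_sqlite3_executable():
    """Return the full path of the sqlite3 command, or None if not on PATH."""
    return shutil.which("sqlite3")


def python_sqlite_version():
    """Return the SQLite library version seen by Python, or None if missing."""
    try:
        import sqlite3
    except ImportError:
        return None
    return sqlite3.sqlite_version


if __name__ == "__main__":
    print("sqlite3 executable:", find_sqlite3_executable())
    print("SQLite library used by Python:", python_sqlite_version())
```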

Additional setup when sqlite3 is missing

When sqlite3 is unavailable on your computer, there are several solutions:

  • install sqlite3. e.g.,
      $ yum install sqlite3
  • run pbook after setting up Athena rel-14 or higher. e.g.,
      $ source setup.sh -tag=14.2.24,32,setup
      $ source someYourDirectory/etc/panda/panda_setup.sh
  • set PATH and LD_LIBRARY_PATH to include sqlite3. e.g.,
      $ export PATH=/afs/cern.ch/sw/lcg/external/sqlite/3.4.0/slc4_ia32_gcc34/bin:$PATH
      $ export LD_LIBRARY_PATH=/afs/cern.ch/sw/lcg/external/sqlite/3.4.0/slc4_ia32_gcc34/lib:$LD_LIBRARY_PATH
      $ source someYourDirectory/etc/panda/panda_setup.sh

How to run

To run pbook, just type
$ pbook
or
$ pbook --gui
if you prefer the GUI.


Usage

The command-line interface

When pbook starts, it tries to retrieve your job information from the central repository to make a local copy. This may take a few minutes the first time you run pbook.
$ pbook
INFO : Synchronizing local repository ...
...
Once copying is done, you should get an interactive prompt. Autocompletion is bound to the TAB key, and the up-arrow key recalls previous commands.
Start pBook 0.1.5
>>> 
Try
>>> help()
to see available commands.

The show() command prints all job information

>>> show()
...
======================================
          JobID : 687
           type : prun
        PandaID : 21053668
          nJobs : 1
           site : ANALY_BNL_ATLAS_1
          cloud : US
           inDS : 
          outDS : user08.TadashiMaeno.5d5390b2-5a14-40f9-b628-144ce30cb051
          libDS : user08.TadashiMaeno.lib._1227119249.41.lib.tgz
        retryID : 689
   provenanceID : 685
   creationTime : 2008-12-10 16:31:48
     lastUpdate : 2008-12-11 14:50:11
         params : 
      jobStatus : frozen
           finished : 1

======================================
          JobID : 688
           type : pathena
        PandaID : 21053684
          nJobs : 1
           site : ANALY_BNL_ATLAS_1
          cloud : US
           inDS : data08_cvalid.00000001.SitesValidation.daq.RAW.04
          outDS : user08.TadashiMaeno.93bd05c5-26fa-4eca-aeb1-534505ec622a
          libDS : user08.TadashiMaeno.lxplus214_89.lib._000679.lib.tgz
        retryID : 0
   provenanceID : 684
   creationTime : 2008-12-10 16:39:54
     lastUpdate : 2008-12-11 14:50:12
         params : 
      jobStatus : frozen
             failed : 1
or the information for a single job when a JobID is specified
>>> show(676)
======================================
          JobID : 676
           type : pathena
        PandaID : 20544807-20544808
          nJobs : 1 + 1(build)
           site : ANALY_MWT2_SHORT
          cloud : US
           inDS : 
          outDS : user08.TadashiMaeno.7b07de49-6424-49ff-b02d-40c6d2a4a7d2
          libDS : user08.TadashiMaeno.lxplus232_12.lib._000676.lib.tgz
        retryID : 0
   provenanceID : 0
   creationTime : 2008-11-28 14:12:57
     lastUpdate : 2008-12-11 14:50:48
         params : 
      jobStatus : frozen
           finished : 2
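For scripting, a record in the format printed above can be turned into a dictionary. The sketch below is a hypothetical helper, not part of pbook: it assumes the printed text has already been captured into a string, and it keeps every value as a string. The sample is an abridged copy of the record for JobID 676 shown above.

```python
def parse_show_record(text):
    """Parse one printed 'key : value' record into a dict of strings."""
    record = {}
    for line in text.splitlines():
        # The first colon separates key and value; later colons
        # (for example in timestamps) stay in the value.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key:
            continue  # separator lines ('====', '...') and blank lines
        record[key] = value.strip()
    return record


SAMPLE = """\
======================================
          JobID : 676
           type : pathena
        PandaID : 20544807-20544808
          nJobs : 1 + 1(build)
           inDS : 
   creationTime : 2008-11-28 14:12:57
      jobStatus : frozen
           finished : 2
"""

if __name__ == "__main__":
    job = parse_show_record(SAMPLE)
    print(job["JobID"], job["jobStatus"], job["creationTime"])
```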
If you want to kill all sub-jobs in JobID=123:
>>> kill(123)
It takes at most ~30 min to kill sub-jobs. The kill command is propagated to the pilot when the pilot contacts the Panda server, which it does every 30 min.

To retry failed sub-jobs in JobID=123

>>> retry(123)
Once sub-jobs are retried, they get a new JobID, which is recorded in the retryID of the original job. For example,
>>> show(123)
======================================
          JobID : 123
           ...
        retryID : 126
   provenanceID : 0
In this case, 126 is the new JobID. Conversely, JobID=126 has provenanceID=123, so users can trace the history.
>>> show(126)
======================================
          JobID : 126
           ...
        retryID : 0
   provenanceID : 123
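The retryID and provenanceID links shown above form a chain that can be followed in either direction. The following sketch illustrates the idea on plain dictionaries; the function names and the in-memory jobs mapping are assumptions for illustration and do not correspond to any pbook API. A value of 0 is treated as "no link", as in the output above.

```python
def retry_chain(jobs, job_id):
    """Follow retryID links forward, returning all JobIDs from job_id on."""
    chain = [job_id]
    seen = {job_id}
    while True:
        next_id = jobs[chain[-1]]["retryID"]
        # Stop at the end of the chain, at an unknown job, or on a loop.
        if next_id == 0 or next_id not in jobs or next_id in seen:
            break
        chain.append(next_id)
        seen.add(next_id)
    return chain


def original_job(jobs, job_id):
    """Follow provenanceID links backward to the first submission."""
    seen = {job_id}
    current = job_id
    while True:
        prev_id = jobs[current]["provenanceID"]
        if prev_id == 0 or prev_id not in jobs or prev_id in seen:
            return current
        seen.add(prev_id)
        current = prev_id


# The two records from the example above.
JOBS = {
    123: {"retryID": 126, "provenanceID": 0},
    126: {"retryID": 0, "provenanceID": 123},
}

if __name__ == "__main__":
    print(retry_chain(JOBS, 123))   # [123, 126]
    print(original_job(JOBS, 126))  # 123
```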
Sometimes you may want to synchronize the job repository manually. In this case,
>>> sync()
Press Ctrl-D to exit.

The graphical interface

The GUI is rather self-explanatory.

pbookGUI.png

The left window shows the list of jobs in the local repository. The top-right window shows a summary of the job selected in the left window. Users can invoke any command (kill, retry, sync, etc.) using the toolbar buttons.

FAQ

How to show running jobs

>>> show('running')

How to show PandaIDs in particular states

>>> show(123,showPandaIDinState='activated,failed')

How to restore the pbook database

When an error such as 'database disk image is malformed' is shown, you can restore the local DB:
$ pbook --restoreDB
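Before restoring, it can be useful to confirm that a SQLite file is really damaged. The sketch below uses SQLite's PRAGMA integrity_check through Python's standard sqlite3 module. It is only an illustration: the function name is hypothetical, and the location of pbook's local database file is not given on this page, so the path must be supplied by the user.

```python
import os
import sqlite3


def is_database_healthy(db_path):
    """Return True if db_path is an SQLite file that passes integrity_check."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.isfile(db_path):
        return False
    try:
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised, for example, for 'database disk image is malformed'
        # or 'file is not a database'.
        return False
    # A healthy database yields a single row containing 'ok'.
    return rows == [("ok",)]
```

If the function returns False for the local repository file, restoring the database as described above is the next step.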


Contact Email Address: hn-atlas-dist-analysis-help@cern.ch




Topic revision: r14 - 2012-05-29 - ElenaOliverGarcia
 