Long term future of job repository

Desired functionality

  • Keep all relevant information for current and previous jobs
  • Scalable to 100k jobs or more
  • Easy to relocate
  • Easy to send a single job to support for debugging
  • Human-readable?
  • Archiving functionality

Local repository

  • Working (but not well understood by present team?)
  • Based on binary files:
    /home/user/gangadir/repository/user/LocalAMGA/2.2> ls
    jobs/  templates/
    /home/user/gangadir/repository/user/LocalAMGA/2.2> ls *
    jobs:
    Attr-0_1234370948.7450650440  Entr-0_1234432060.1406100145  jobstree/  SEQjobseq_1234371023.5307240678
    
    templates:
    Attr-0_1234370948.7643530040  jobstree/  SEQjobseq_1234370948.7668550508
    /home/user/gangadir/repository/user/LocalAMGA/2.2> ls */jobstree
    jobs/jobstree:
    Attr-0_1234370948.7522061407  Entr-0_1234432061.0458601289
    
    templates/jobstree:
    Attr-0_1234370948.7723031580
    
  • Have users with up to 50k jobs
  • XML repository also exists, used e.g. by GangaTasks

Archiving

  • Reading/building a full 100k job repository on boot is heavy
  • This calls for some kind of archiving functionality
    • j.archive()
      • Moves job out of current repository, into archive repo?
      • Tar up job directories?
    • auto-archive, specified in .gangarc ?
      • Archive any job that went into FINISHED N days ago?
      • Archive any job not looked at for N days?
      • Monthly check: Archive current jobs to archived-jobs-X?
    • generalise to multiple repos:
      • current-jobs
      • archived-jobs-jan09
      • archived-jobs-dec08
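
The archiving options above can be sketched in Python as follows. This is only an illustration under stated assumptions, not existing Ganga code: the function names, the `finished_at` bookkeeping dictionary and the `jobinfo.tar.gz` file name are hypothetical, and the sketch assumes each job lives in its own directory inside a repository directory such as current-jobs.

```python
import shutil
import tarfile
import time
from pathlib import Path


def jobs_to_archive(finished_at, max_age_days, now=None):
    """Return ids of jobs that went into FINISHED more than max_age_days ago.

    finished_at maps job id -> Unix timestamp of the FINISHED transition.
    This mapping is hypothetical bookkeeping, not an existing Ganga structure.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(jid for jid, ts in finished_at.items() if ts < cutoff)


def archive_job(gangadir, job_id, archive_name, source="current-jobs"):
    """Tar up one job directory and move it into an archive repository."""
    src = Path(gangadir) / source / str(job_id)
    dest = Path(gangadir) / archive_name / str(job_id)
    dest.mkdir(parents=True)
    # Pack the whole job directory into a single tarball (assumed file name).
    with tarfile.open(str(dest / "jobinfo.tar.gz"), "w:gz") as tar:
        tar.add(str(src), arcname=str(job_id))
    # Remove the original only after the tarball has been written and closed.
    shutil.rmtree(str(src))
    return dest
```

An auto-archive pass configured with N days would then call `jobs_to_archive(...)` and feed each returned id to `archive_job(...)`, for example with `archive_name="archived-jobs-jan09"`. Removing the source only after the tarball is closed means a failed archive leaves the job in the current repository.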

Remote repository

  • Functionality exists, not maintained
  • Not used?
  • Do we really want to maintain this?
    • Risk of maintaining a multi-TB file server on behalf of users, who would consider it permanent job storage...
  • Initiative: virtual machine, 'repo on a stick'
  • All of this can be achieved by making the repo fully relocatable and easily splittable, and by including archiving

XML-based idea

  • One job = one XML file and one dir
    > ls gangadir/
    current-jobs/ archived-jobs-jan09/ archived-jobs-dec08/
    > ls gangadir/current-jobs/
    42/ 43/ 44/ 67/ 68/ current-jobs.xml
    > ls gangadir/current-jobs/42/
    job42.xml input output
    > ls gangadir/current-jobs/43/
    job43.xml input output 0/ 1/ 2/
    > ls gangadir/archived-jobs-jan09/
    45/ 46/ 47/ archived-jobs-jan09.xml
    > ls gangadir/archived-jobs-jan09/45/
    job45.xml jobinfo.tar.gz
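
A minimal sketch of the "one job = one XML file and one dir" idea, using Python's standard xml.etree.ElementTree module. The XML schema (a `job` element with `attribute` children) and the function names `write_job` and `read_job` are assumptions made for illustration; the notes above do not define a schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_job(repo_dir, job_id, attributes):
    """Create <repo_dir>/<id>/job<id>.xml plus input/ and output/ directories."""
    job_dir = Path(repo_dir) / str(job_id)
    (job_dir / "input").mkdir(parents=True, exist_ok=True)
    (job_dir / "output").mkdir(exist_ok=True)
    # Hypothetical schema: one <attribute name="..."> element per job attribute.
    root = ET.Element("job", id=str(job_id))
    for name, value in sorted(attributes.items()):
        ET.SubElement(root, "attribute", name=name).text = str(value)
    xml_path = job_dir / f"job{job_id}.xml"
    ET.ElementTree(root).write(str(xml_path), encoding="utf-8", xml_declaration=True)
    return xml_path


def read_job(xml_path):
    """Load the attributes of a single job without touching any other job."""
    root = ET.parse(str(xml_path)).getroot()
    return {a.get("name"): a.text for a in root.findall("attribute")}
```

Because each job is self-contained in its own directory, relocating the repository or sending a single job to support amounts to copying one directory, and a job can be read back with `read_job(...)` without building the full repository.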
    

-- BjornS - 12 Feb 2009

Topic revision: r2 - 2009-02-12 - BjornS
 