Goals of the Refactoring

  • Integrated backend for jobs, tasks and the GangaBox
  • Lazy loading / unloading (this involves indexing using metadata)
  • (Re)consider and (re)implement intersession locking
  • Enable archiving of older jobs
  • Support multiple synchronized Ganga instances
  • Keep open the possibility of reintroducing remote and possibly SQLite/SQL backends


Summary of Current Structure

  • VStreamer streams GangaObjects to files
  • JobRepositoryXML and JobRepository are backends to the JobRegistry
  • GangaTasks uses VStreamer directly, using a different method than JobRepositoryXML

Summary of Proposed Structure

  • Introduce the GangaRepository base class as an abstract, named repository of Ganga objects, with the derived class GangaXMLRepository as the actual backend that interacts with XML and files
  • Introduce an index of top-level objects in every repository to enable lazy (un)loading and to facilitate repository sharing
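
The split between an abstract repository and a concrete backend could look roughly like the sketch below. It is only an illustration: register() is taken from the requirements on this page, while load(), the constructor arguments and the InMemoryRepository stand-in (used here instead of a real GangaXMLRepository) are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class GangaRepository(ABC):
    """Abstract, named repository of objects (interface sketch only)."""

    def __init__(self, name, location):
        self.name = name
        self.location = location

    @abstractmethod
    def register(self, obj):
        """Store a new root-level object and return its unique ID."""

    @abstractmethod
    def load(self, obj_id):
        """Return the fully loaded object for the given ID (hypothetical method)."""


class InMemoryRepository(GangaRepository):
    """Hypothetical stand-in for GangaXMLRepository: keeps objects in a dict, not in XML files."""

    def __init__(self, name, location="memory"):
        super().__init__(name, location)
        self._objects = {}
        self._next_id = 0

    def register(self, obj):
        obj_id = self._next_id
        self._next_id += 1
        self._objects[obj_id] = obj
        return obj_id

    def load(self, obj_id):
        return self._objects[obj_id]


repo = InMemoryRepository("jobs")
first_id = repo.register({"status": "new"})
```

A registry would then talk only to the GangaRepository interface, so an XML, remote or SQL backend could be swapped in without changing registry code.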

Older discussions and proposals


CVS Branch: Ganga-XML-Refactoring-branch

Modified files:

  • Core/GangaRepository/GangaRepository.py - defines the new interface for GangaRepository
  • GPIDev/Base/Objects.py



  • Introduce a per-object _index_cache, filled by the GangaRepository using the per-class _index_cache_parameters. Make the job list read its values from the index cache via o.getIndexValue(string)
  • GangaRepository can save the index cache to enable lazy loading on startup
  • Objects can be unloaded and replaced by empty objects that carry only the _index_cache, which frees up memory
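
The index cache idea can be sketched as follows. The names _index_cache, _index_cache_parameters and getIndexValue come from the list above; the Job class, its _data dictionary and the helper functions are assumptions for illustration. As a simplification, the sketch empties the object in place rather than substituting a new empty object.

```python
class Job:
    # Per-class list of fields that are copied into the index cache
    # (the field names here are made up for the example).
    _index_cache_parameters = ("status", "name")

    def __init__(self, **data):
        self._data = dict(data)   # full object state; None while unloaded
        self._index_cache = {}

    def getIndexValue(self, key):
        # Use live data when loaded, otherwise fall back to the cache.
        if self._data is not None:
            return self._data[key]
        return self._index_cache[key]


def fill_index_cache(obj):
    """What the repository would do: copy the indexed fields into the cache."""
    obj._index_cache = {key: obj._data[key] for key in obj._index_cache_parameters}


def unload(obj):
    """Drop the full state but keep the index cache so listings still work."""
    fill_index_cache(obj)
    obj._data = None


job = Job(status="completed", name="analysis", inputdata=list(range(1000)))
unload(job)
status_after_unload = job.getIndexValue("status")
```

A job listing that only calls getIndexValue() never needs the full object, which is what makes lazy loading on startup possible.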

XML details

  • Split the index (and the subdirectories) into chunks of 1000 to keep index loading/saving fast. Benchmark on lxplus: cPickle of a dummy index with 1000 jobs, each with 10 string entries of 10-30 characters, using protocol 1: write 0.3 s, load 27 ms. Protocol 2 is slightly slower
  • Timestamp update timings: AFS < 10 ms; NFS Munich 1.5 s; SSHFS CERN-Munich 5 s. Note that laptop clocks can be off by quite a lot (~10 s)
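
A minimal sketch of a chunked, pickled index is shown below. The chunk size of 1000 and pickle protocol 1 come from the notes above; the "Nxxx" directory naming, the index.pkl file name and the function names are assumptions for the example. It uses the Python 3 pickle module in place of cPickle.

```python
import os
import pickle
import tempfile

CHUNK_SIZE = 1000  # index and subdirectories are split into chunks of 1000


def chunk_name(obj_id):
    """Hypothetical directory name for the chunk containing obj_id, e.g. '2xxx' for 2345."""
    return "%dxxx" % (obj_id // CHUNK_SIZE)


def save_chunk_index(root, chunk, index):
    """Pickle the index of one chunk into <root>/<chunk>xxx/index.pkl."""
    path = os.path.join(root, "%dxxx" % chunk)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "index.pkl"), "wb") as f:
        pickle.dump(index, f, protocol=1)


def load_chunk_index(root, chunk):
    """Load the index of one chunk without touching any other chunk."""
    with open(os.path.join(root, "%dxxx" % chunk, "index.pkl"), "rb") as f:
        return pickle.load(f)


# Round trip for the chunk holding IDs 2000-2999.
with tempfile.TemporaryDirectory() as root:
    index = {i: {"status": "new", "name": "job%d" % i} for i in range(2000, 2003)}
    save_chunk_index(root, 2, index)
    restored = load_chunk_index(root, 2)
```

Because each chunk has its own index file, a session only reads and rewrites the index of the chunk it is working in.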


  • Initialized with (unique) name, type and location. Type defaults to XML. Location defaults to "gangadir/User.Name/LocalXML/6.0/" (could also be a server address)
  • If __init__ returns without throwing Exceptions, it must have created "target" objects for all encountered root-level objects in the repository. If _data is not filled, _index_cache of the corresponding object must be filled.
  • If any object or parts of it cannot be initialized due to missing plug-ins or errors, the corresponding root object must not be destroyed in the persistency, and should be loadable again once the error is fixed. The root object must be set read-only, just as when a root object is locked by another Ganga session
  • Any Registry object must notify the GangaRepository on object change. Suggestion: identify consistent sets of modifications at Registry level and submit them together. Submissions should then be atomic if possible.
  • On root object registration (GangaRepository.register(obj)) the GangaRepository must return a unique ID associated with that object. The Registry is responsible for setting any fields with this ID or exposing this ID to the user.
  • All objects that are fully loaded into memory and not set read-only must be locked in the backend
  • The checkout of objects should be fully transparent: after __init__ at least the index must be loaded; if any non-indexed field is accessed, the full object must be loaded and checked for a lock (if no global lock is held)
  • The Registries should suggest inactive jobs to be retired/unloaded from memory (?)
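
The transparent checkout described above could be sketched like this. Only register() returning a unique ID, the _index_cache and the lock-on-full-load rule come from the requirements; LazyObject, SketchRepository, load(), load_count and the in-memory "lock" set are hypothetical names that stand in for real files and real intersession locks.

```python
class SketchRepository:
    """Hypothetical repository: a dict stands in for the files on disk."""

    def __init__(self):
        self._stored = {}      # id -> full data dict
        self.locked = set()    # IDs locked by this session
        self.load_count = 0    # number of full loads, for illustration

    def register(self, data):
        # Return a unique ID for the new root object.
        obj_id = max(self._stored, default=-1) + 1
        self._stored[obj_id] = data
        return obj_id

    def load(self, obj_id):
        # Fully loaded objects must be locked in the backend.
        self.locked.add(obj_id)
        self.load_count += 1
        return self._stored[obj_id]


class LazyObject:
    """Root object that holds only its index cache until a non-indexed field is read."""

    def __init__(self, repo, obj_id, index_cache):
        self._repo = repo
        self._id = obj_id
        self._index_cache = index_cache
        self._data = None

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for data fields.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._data is None and name in self._index_cache:
            return self._index_cache[name]      # served from the index, no load
        if self._data is None:
            self._data = self._repo.load(self._id)  # full load plus lock
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)


repo = SketchRepository()
job_id = repo.register({"status": "new", "inputdata": [1, 2]})
job = LazyObject(repo, job_id, {"status": "new"})
```

Reading job.status is answered from the index cache alone, while reading job.inputdata triggers the full load and takes the lock.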

-- JohannesEbke - 03 Jun 2009 -- JakubMoscicki - 04 Jun 2009

Topic revision: r5 - 2009-06-04 - JakubMoscicki