Support for lazy loading

LL = lazy loading, MD = metadata

  • main use cases:
    • level 0: improve the startup time
    • level 1: partially loaded objects (load metadata only), e.g. do not load all 600 subjobs of a given master job
    • level 2: partially load an object on request at the attribute level (TOO FANCY)

  • assumptions:
    • the repository gets fixed and no longer loads the whole table into memory
    • metadata loading is fast (to be checked) and metadata is small

  • we should mark MD items in the schema
  • if the set of MD items changes, the table in the repository must be updated accordingly
    • in principle this could be done generically
    • but in practice it is done manually (with the current implementation)
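In principle the table update can be derived from the MD marking. A minimal sketch, assuming a hypothetical schema representation where each item carries an `md` flag and the repository table is called `jobs` (`Item`, `md_columns` and `columns_to_add` are illustrative names, not Ganga code): it compares the MD items with the columns the table already has and produces the `ALTER TABLE` statements that would still be needed. Removing or renaming columns is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical schema item; md=True marks a metadata (MD) attribute."""
    name: str
    sql_type: str = "TEXT"
    md: bool = False


def md_columns(schema):
    """Return {column_name: sql_type} for every MD-marked item."""
    return {item.name: item.sql_type for item in schema if item.md}


def columns_to_add(schema, existing_columns):
    """ALTER TABLE statements for MD items the table does not have yet."""
    wanted = md_columns(schema)
    return [
        f'ALTER TABLE jobs ADD COLUMN "{name}" {sql_type}'
        for name, sql_type in sorted(wanted.items())
        if name not in existing_columns
    ]


job_schema = [
    Item("id", "INTEGER", md=True),
    Item("status", md=True),
    Item("backend.actualCE", md=True),
    Item("inputdata"),  # non-MD: stored only in the blob
]
print(columns_to_add(job_schema, {"id", "status"}))
# ['ALTER TABLE jobs ADD COLUMN "backend.actualCE" TEXT']
```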

  • LL is completely transparent at the GPI level
    • dir(j) always gives all attributes
    • accessing an unloaded attribute triggers a full load of the whole object tree
  • LL is NOT transparent at the object level
    • dir(j._impl) will give all attributes as well
    • accessing an unloaded attribute raises an exception
    • TODO: identify and document the internal plugin code (e.g. monitoring method) which is possibly affected by this
  • we also want to support lazy loading at the object level
    • e.g. j._impl.backend.actualCE should trigger a full load of the job (unless this attribute is MD)
  • a metadata schema is available here: jobs._impl.repository.schema
    • this schema should be created from the object schema (dynamically) based on MD marking
    • however, creating new MD columns is currently a manual step
  • object attributes may be mapped to metadata attributes as follows:
    • j.id -> jobs._impl.repository.schema['id']
    • j.application._name -> jobs._impl.repository.schema['application._name']
    • j.backend.id -> jobs._impl.repository.schema['backend.id']
    • len(j.subjobs) != 0 -> jobs._impl.repository.schema['split']
  • the name mapping to SQL columns should be checked for special characters (e.g. the '.' in 'backend.id')
  • should an MD attribute be plugin-class specific or category specific (e.g. LCG.actualCE or backend.actualCE)?
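The mapping above flattens each attribute path into a dotted MD name, with `split` as the one derived entry. A sketch under assumptions: plain `SimpleNamespace` objects stand in for job objects, and SQL column names are formed by replacing every non-word character with `_`; `MD_PATHS`, `extract_metadata` and `sql_column` are hypothetical helpers.

```python
import re
from types import SimpleNamespace

# Hypothetical list of dotted MD names, taken from the mapping above.
MD_PATHS = ["id", "application._name", "backend.id"]


def resolve(obj, dotted):
    """Follow a dotted path such as 'backend.id' through nested attributes."""
    for part in dotted.split("."):
        obj = getattr(obj, part)
    return obj


def extract_metadata(job):
    """Build the MD record of one job, keyed by MD name."""
    md = {path: resolve(job, path) for path in MD_PATHS}
    md["split"] = len(job.subjobs) != 0  # derived entry, not a plain attribute
    return md


def sql_column(md_name):
    """Naive mapping of an MD name to a SQL column name ('.' -> '_')."""
    return re.sub(r"\W", "_", md_name)


job = SimpleNamespace(
    id=42,
    application=SimpleNamespace(_name="Executable"),
    backend=SimpleNamespace(id="https://example.org/job/1"),
    subjobs=[],
)
record = extract_metadata(job)
print({sql_column(k): v for k, v in record.items()})
```

This naive rule already shows why the names need checking: both `a._b` and `a__b` would map to the column `a__b`.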

  • a component object may be partially (MD only) or fully loaded
  • "fully" means that the object and all its unmanaged subobjects have been fully checked out (blob)
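The difference between the GPI level and the object level can be illustrated with a small stand-in. This is a sketch, not Ganga's proxy machinery: `Impl`, `Proxy`, `NotLoadedError` and `full_load` are hypothetical names, and the blob loader is a plain function. The implementation object knows every schema attribute, so `dir()` lists them all, but raises on an unloaded one; the proxy catches that exception, triggers a full load and retries.

```python
class NotLoadedError(AttributeError):
    """Object level: the attribute exists but has not been loaded yet."""


class Impl:
    """Hypothetical implementation object: MD attributes set at checkout."""

    def __init__(self, all_names, metadata, loader):
        self._all_names = list(all_names)
        self._loader = loader          # returns the non-MD part (the blob)
        self.loadstate = "partially"
        for name, value in metadata.items():
            setattr(self, name, value)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for unloaded attributes.
        if name in self.__dict__.get("_all_names", ()):
            raise NotLoadedError(name)
        raise AttributeError(name)

    def __dir__(self):
        # dir(j._impl) still lists every schema attribute.
        return sorted(set(super().__dir__()) | set(self._all_names))

    def full_load(self):
        for name, value in self._loader().items():
            setattr(self, name, value)
        self.loadstate = "fully"


class Proxy:
    """Hypothetical GPI proxy: an unloaded attribute triggers a full load."""

    def __init__(self, impl):
        self._impl = impl

    def __getattr__(self, name):
        impl = self._impl
        try:
            return getattr(impl, name)
        except NotLoadedError:
            impl.full_load()
            return getattr(impl, name)

    def __dir__(self):
        return dir(self._impl)


impl = Impl(["id", "status", "inputdata"], {"id": 7, "status": "running"},
            loader=lambda: {"inputdata": ["a.dst"]})
j = Proxy(impl)
print("inputdata" in dir(j), j.status, impl.loadstate)  # True running partially
try:
    impl.inputdata                      # object level: raises
except NotLoadedError as exc:
    print("not loaded:", exc)           # not loaded: inputdata
print(j.inputdata, impl.loadstate)      # ['a.dst'] fully
```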

  • an operation on j.subjobs (e.g. a slice, or printing a non-MD attribute of the subjobs) could optimize the loading by requesting a full load of all subjobs at once
  • however, monitoring a subjob should not trigger a full load of the other subjobs (e.g. completed ones)
  • a managed object is an object which has its own, distinct representation in the repository (a row in the table)
    • a managed object has the following property: hasattr(obj, 'registry') and obj.registry is not None
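The batching idea and the managed-object test can be sketched together. Assumptions: `Registry`, `SubJob`, `SubJobList` and `ensure_loaded` are hypothetical stand-ins, and a "request" is simply recorded in a list instead of querying a repository. Slicing loads every selected subjob that is not loaded yet in one request, while the monitoring path loads a single subjob.

```python
class Registry:
    """Hypothetical registry that records each full-load request."""

    def __init__(self):
        self.requests = []

    def full_load(self, objs):
        # One request for a batch of objects (e.g. one repository query).
        self.requests.append([o.sid for o in objs])
        for o in objs:
            o.loadstate = "fully"


class SubJob:
    def __init__(self, sid, registry):
        self.sid = sid
        self.registry = registry       # not None => managed object
        self.loadstate = "partially"


def is_managed(obj):
    """Managed objects have their own row in the repository."""
    return hasattr(obj, "registry") and obj.registry is not None


def ensure_loaded(subjob):
    """Monitoring path: fully load this subjob only."""
    if subjob.loadstate != "fully":
        subjob.registry.full_load([subjob])


class SubJobList:
    """Hypothetical j.subjobs container."""

    def __init__(self, subjobs):
        self._subjobs = list(subjobs)

    def __getitem__(self, key):
        if isinstance(key, slice):
            selected = self._subjobs[key]
            pending = [s for s in selected if s.loadstate != "fully"]
            if pending:                 # bulk path: one request for the slice
                pending[0].registry.full_load(pending)
            return selected
        return self._subjobs[key]       # single subjob: no load yet


reg = Registry()
subjobs = SubJobList(SubJob(i, reg) for i in range(5))
ensure_loaded(subjobs[3])   # monitoring one running subjob
subjobs[0:5]                # e.g. printing a non-MD attribute of all subjobs
print(reg.requests)         # [[3], [0, 1, 2, 4]]
```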

  • examples (subjobs are managed objects):
    • master job partially loaded, subjobs partially loaded (normal checkout)
    • master job fully loaded, subjobs partially loaded (e.g. j.nonMDattr accessed)
    • master job partially loaded, certain subjobs fully loaded (monitoring example)
    • if master job is partially loaded, then j.subjobs[i].nonMDattr triggers full load of the subjob only
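The examples above amount to each managed object keeping its own load state. A minimal sketch, assuming a hypothetical `Managed` class whose MD values are set at checkout and whose blob is fetched by a callable on the first non-MD access; since subjobs are managed, a full load of one object never touches the others.

```python
class Managed:
    """Hypothetical managed object (job or subjob) with its own repository row."""

    def __init__(self, md, fetch_blob, subjobs=()):
        self.__dict__.update(md)           # MD: available right after checkout
        self.loadstate = "partially"
        self._fetch_blob = fetch_blob      # non-MD data, fetched on demand
        self.subjobs = list(subjobs)       # managed children, loaded separately

    def __getattr__(self, name):
        # Reached only for attributes not set yet, i.e. non-MD ones.
        if name.startswith("_") or self.__dict__.get("loadstate") == "fully":
            raise AttributeError(name)
        self.__dict__.update(self._fetch_blob())  # full load of THIS object only
        self.loadstate = "fully"
        return getattr(self, name)


def make_job(n_subjobs):
    subjobs = [
        Managed({"id": i, "status": "running"},
                lambda i=i: {"outputdir": f"/tmp/sub{i}"})
        for i in range(n_subjobs)
    ]
    return Managed({"id": 0, "status": "running"},
                   lambda: {"inputdata": []}, subjobs)


def states(job):
    return job.loadstate, [s.loadstate for s in job.subjobs]


j = make_job(3)
print(states(j))        # ('partially', ['partially', 'partially', 'partially'])
j.subjobs[1].outputdir  # monitoring: loads subjob 1 only
print(states(j))        # ('partially', ['partially', 'fully', 'partially'])
j.inputdata             # non-MD master attribute: loads the master only
print(states(j))        # ('fully', ['partially', 'fully', 'partially'])
```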

Ideas for the object-level implementation:

  • use an implobj.loadstate attribute to represent "partially", "fully" and [? "failed" (which is status == 'incomplete' now)]
    • the state may be kept for child attributes in the parent
  • "partially" means that metadata for the object and all its sub-objects has been loaded
    • still to be checked for performance
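One possible representation of the load state, assuming a Python `Enum` (the notes only name the three values; `LoadState` and `ImplObj` are hypothetical). It also shows the alternative of keeping the children's state in the parent:

```python
from enum import Enum


class LoadState(Enum):
    """Hypothetical values for implobj.loadstate."""
    PARTIALLY = "partially"  # MD of the object and all its sub-objects loaded
    FULLY = "fully"          # object and unmanaged sub-objects checked out (blob)
    FAILED = "failed"        # open question: would replace status == 'incomplete'


class ImplObj:
    """Keeps its own state plus, optionally, the state of its child attributes."""

    def __init__(self, child_names):
        self.loadstate = LoadState.PARTIALLY
        self.child_loadstate = {name: LoadState.PARTIALLY for name in child_names}

    def mark_child_loaded(self, name):
        self.child_loadstate[name] = LoadState.FULLY

    def mark_fully_loaded(self):
        # "fully" covers all unmanaged sub-objects, so children follow the parent.
        self.loadstate = LoadState.FULLY
        for name in self.child_loadstate:
            self.child_loadstate[name] = LoadState.FULLY


j = ImplObj(["application", "backend"])
j.mark_child_loaded("backend")
print(j.loadstate.value, {k: v.value for k, v in j.child_loadstate.items()})
# partially {'application': 'partially', 'backend': 'fully'}
```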

Implementation:

  • gangaObjectFactory should create objects in the partially loaded state
  • GangaObjectDescriptors: modify getters and setters
  • decide where the loadstate flag is kept (in every subobject or only in the root object)
  • decide how to map attributes to metadata
    • proposal 1: have a dictionary in the repository and generate schema information for each class automatically
    • proposal 2: have a dictionary in each class and generate the global dictionary for repository
    • proposal 3 (OK): extra flag in the schema
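With proposal 3, the list of MD attributes can be derived from the schema itself, and the factory only needs the MD row to build a partially loaded object. A sketch under assumptions: `SchemaItem`, `md_names` and `ganga_object_factory` are hypothetical (the real factory is gangaObjectFactory, whose interface is not described here), and the MD row is a plain dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaItem:
    """Hypothetical schema item; md=True is the extra flag of proposal 3."""
    md: bool = False


class JobImpl:
    _schema = {
        "id": SchemaItem(md=True),
        "status": SchemaItem(md=True),
        "inputdata": SchemaItem(),     # non-MD: only in the blob
    }


def md_names(cls):
    """MD attributes of a class, derived from the schema flags."""
    return [name for name, item in cls._schema.items() if item.md]


def ganga_object_factory(cls, md_row):
    """Create an instance in the partially loaded state from one MD row."""
    obj = cls.__new__(cls)  # skip __init__: non-MD attributes stay unset
    for name in md_names(cls):
        setattr(obj, name, md_row[name])
    obj.loadstate = "partially"
    return obj


row = {"id": 12, "status": "completed"}
job = ganga_object_factory(JobImpl, row)
print(vars(job))  # {'id': 12, 'status': 'completed', 'loadstate': 'partially'}
```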

  • any setattr first triggers a full load
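The descriptor changes and the setattr rule can be sketched with a plain Python descriptor. This is a hypothetical stand-in for the GangaObject descriptors, not their actual code: the getter loads only for non-MD attributes, while the setter always loads first, presumably so that the object written back later is complete.

```python
class LazyAttribute:
    """Hypothetical stand-in for a GangaObject descriptor with lazy loading."""

    def __init__(self, md=False):
        self.md = md

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if not self.md and obj.loadstate != "fully":
            obj.full_load()                # getter: only non-MD triggers a load
        return obj.data[self.name]

    def __set__(self, obj, value):
        if obj.loadstate != "fully":
            obj.full_load()                # setter: any write loads first
        obj.data[self.name] = value


class Job:
    id = LazyAttribute(md=True)
    status = LazyAttribute(md=True)
    inputdata = LazyAttribute()

    def __init__(self, md_row, fetch_blob):
        self.data = dict(md_row)           # MD values from the repository row
        self.fetch_blob = fetch_blob       # returns the non-MD values
        self.loadstate = "partially"
        self.full_loads = 0

    def full_load(self):
        self.data.update(self.fetch_blob())
        self.loadstate = "fully"
        self.full_loads += 1


j = Job({"id": 1, "status": "running"}, lambda: {"inputdata": ["x.dst"]})
print(j.status, j.full_loads)               # running 0
j.status = "killed"                         # MD attribute, but the write still loads
print(j.status, j.inputdata, j.full_loads)  # killed ['x.dst'] 1
```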

Some questions:

  • if a subjob is fully loaded and turns out to be in the 'incomplete' status, what should we do?
    • make the subjob 'incomplete' (YES, and all subjobs)
    • make the master job 'incomplete' (YES)
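A sketch of the agreed answers, assuming a hypothetical `full_load` that catches a `BlobError` raised when a subjob blob cannot be read. It reads "(YES, and all subjobs)" as marking every subjob of the same master incomplete; that reading is an interpretation and should be confirmed.

```python
class BlobError(Exception):
    """Hypothetical error: a stored blob cannot be read back."""


class Job:
    def __init__(self, fetch_blob=None, subjobs=()):
        self.status = "running"
        self.loadstate = "partially"
        self.fetch_blob = fetch_blob
        self.master = None
        self.subjobs = list(subjobs)
        for sub in self.subjobs:
            sub.master = self


def full_load(sub):
    """Fully load a subjob; on failure apply the answers listed above."""
    try:
        sub.blob = sub.fetch_blob()
        sub.loadstate = "fully"
    except BlobError:
        sub.loadstate = "failed"
        master = sub.master
        for s in master.subjobs:      # assumed reading of "YES, and all subjobs"
            s.status = "incomplete"
        master.status = "incomplete"  # the master job as well


def broken_blob():
    raise BlobError("corrupted blob")


subs = [Job(lambda: {"outputdir": "/tmp/a"}), Job(broken_blob), Job(lambda: {})]
master = Job(subjobs=subs)
full_load(subs[0])
full_load(subs[1])
print(master.status, [s.status for s in subs], [s.loadstate for s in subs])
```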

Extra flaws we have seen:

Support for the incomplete status: currently, an object-level assignment to an 'incomplete' job succeeds. In the future it should probably fail with an exception.
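The proposed behaviour could look like this minimal sketch, where a hypothetical `IncompleteJobError` is raised by `__setattr__` on the implementation object (names are illustrative; today the assignment succeeds):

```python
class IncompleteJobError(Exception):
    """Hypothetical exception for writes to an 'incomplete' job."""


class JobImpl:
    def __init__(self, status):
        object.__setattr__(self, "status", status)

    def __setattr__(self, name, value):
        # Proposed behaviour: refuse any object-level assignment once incomplete.
        if self.status == "incomplete":
            raise IncompleteJobError(f"cannot set {name!r} on an incomplete job")
        object.__setattr__(self, name, value)


ok = JobImpl("running")
ok.name = "analysis"              # allowed
bad = JobImpl("incomplete")
try:
    bad.name = "analysis"
except IncompleteJobError as exc:
    print(exc)                    # cannot set 'name' on an incomplete job
```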

-- JakubMoscicki - 20 Jun 2006

Topic revision: r3 - 2007-07-12 - JakubMoscicki