Internal Junk (to be reviewed)

Other job properties

Subjobs inherit the master job name unless the splitter modifies it.

The application and the backend type are the same for all subjobs and the master job. I think there is an implicit assumption about this in IBackend.updateMonitoringInformation().
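
A rough sketch of what that implicit assumption might allow: since the master job and all of its subjobs share one backend type, monitoring can group them for a single bulk update call per backend. The names below (`Job`, `group_for_monitoring`) are illustrative, not the real Ganga interface.

```python
# Illustrative sketch, not the real Ganga API: monitoring may group the
# master job and its subjobs into one bulk call per backend type, which
# is safe only because subjobs inherit the master's backend type.
from collections import defaultdict

class Job:
    def __init__(self, backend, subjobs=None):
        self.backend = backend          # backend type name, e.g. 'LSF'
        self.subjobs = subjobs or []

def group_for_monitoring(jobs):
    """Group master jobs together with their subjobs by backend type,
    for one updateMonitoringInformation()-style call per backend."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job.backend].append(job)
        groups[job.backend].extend(job.subjobs)
    return dict(groups)
```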

The splitter objects and their interactions

The splitter is a pluggable object (in the same way as other GPI objects).

The splitter creates a list of subjob objects using the master job as a template and some predefined splitting strategy (controlled by splitter properties).

In the general case the splitter may mutate the application object. In the case of pure dataset-based splitting this is not necessary, because inputdata is a property of the job. It is up to the application to define which of its properties may be mutated by a splitter (a "mutable convention"), and the splitter must ensure that this convention is respected.
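
A minimal sketch of pure dataset-based splitting under this convention: the master job is used as a template and only inputdata is touched, so no application property is mutated at all. The class and attribute names here are assumptions for illustration, not the real splitter interface.

```python
# Sketch of a dataset-based splitter (illustrative names, not the real
# Ganga interface).  Pure dataset splitting only mutates job.inputdata,
# so the application's "mutable convention" is trivially respected.
import copy

def chunk(files, n):
    """Split a file list into chunks of at most n files."""
    return [files[i:i + n] for i in range(0, len(files), n)]

class DatasetSplitter:
    def __init__(self, files_per_subjob):
        self.files_per_subjob = files_per_subjob

    def split(self, master):
        subjobs = []
        for files in chunk(master.inputdata, self.files_per_subjob):
            j = copy.deepcopy(master)   # master job as template
            j.inputdata = files         # only inputdata is mutated
            subjobs.append(j)
        return subjobs
```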

The backend object may also be mutated during splitting. Some example use cases:

  • if data chunks are not of equal size, the time requirement for the subjobs may need to change;
  • the user may want explicit control over the subjob destination (defined as part of the splitting strategy).

So, the splitter may depend on the type of inputdata, application or backend. The submission may fail if the splitter is not compatible with one of these components.

The job.submit() pseudocode:

    assert job.subjobs == []
    assert job.master is None

    if job.splitter:
        job.subjobs = job.splitter.split()

    repository._add_subjobs_and_commit()

    # mc, sc: master and subjob configurations (the shared and specific
    # aspects described below); mod indicates whether the job was modified
    mod, mc, sc = appmgr.configure(job)

    if mod:
        job._commit()

    if jobmgr.submit(job, mc, sc):
        return 1

    # on errors
    return 0

Application, Backend and RuntimeHandler interface

Base classes for applications, backends and runtime handlers are mandatory (this is a change with respect to Ganga 4.0.x).

In general there are two flavours of the configuration and submission methods which have different aspects:

  • the shared/master job aspect,
  • the specific/subjob aspect.

During job submission the shared/master aspect is considered only once, while the specific/subjob aspect is considered as many times as the splitting implies: once per subjob, or exactly once if there is no splitting.
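
The two-flavour flow can be sketched as follows. The function and method names (`configure`, `configure_master`, `configure_subjob`) are assumptions chosen for illustration; the real interface is only promised further down.

```python
# Sketch of the two-flavour configuration flow: the shared/master aspect
# runs once, the specific/subjob aspect runs once per subjob, or once in
# total when there is no splitting (illustrative names).
def configure(job):
    master_config = job.application.configure_master(job)
    # With no splitting, the job itself plays the role of its only "subjob".
    targets = job.subjobs if job.subjobs else [job]
    subjob_configs = [job.application.configure_subjob(t, master_config)
                      for t in targets]
    return master_config, subjob_configs
```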

The interfaces have been designed in a backwards-compatible way: existing applications and backends may be used with the new base classes without changes.

By default splitting is enabled for all "old" applications, but it will be inefficient (the application will be configured separately for each subjob). Bulk submission is emulated by submitting individual jobs in a loop.
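
The emulated bulk submission could look like this hedged sketch; `emulated_bulk_submit` and the `submit` method are illustrative names, not the real backend interface.

```python
# Sketch of emulated bulk submission (illustrative, not the real backend
# API): when a backend has no native bulk submit, submit the subjob
# configurations one by one in a loop and report overall success.
def emulated_bulk_submit(backend, subjob_configs):
    ok = True
    for sc in subjob_configs:
        # Keep submitting the remaining subjobs, but remember any failure.
        ok = backend.submit(sc) and ok
    return ok
```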

A prototype of the base classes and their interface will follow soon.

File Workspaces

The structure of the workspace is as follows:

JOBDIR
  -- input        : masterjob config (shared)
  -- output       : masterjob output (merged)

                  ** if splitting enabled **
  -- SUBJOB_i     : subjob dir
      -- input    : subjob config (specific to i)
      -- output   : subjob output

With splitting enabled, the files generated while configuring the specific aspect of subjobs are stored in the subjob directories.

With splitting disabled, the master directory is used to store files generated while configuring both master and specific aspects of the job.

This means that filenames must be unique across the master directory combined with any single subjob directory.
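
A small helper illustrating the layout above (directory names are taken from this page; the function itself is an assumption, not an existing API):

```python
# Sketch of a path helper for the workspace layout described above:
# JOBDIR/input for the master (shared) config, JOBDIR/SUBJOB_i/input
# for subjob i when splitting is enabled.
import os

def input_dir(jobdir, subjob=None):
    """Return the input directory for the master job, or for subjob
    number `subjob` when splitting is enabled."""
    if subjob is None:
        return os.path.join(jobdir, 'input')
    return os.path.join(jobdir, 'SUBJOB_%d' % subjob, 'input')
```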

Interaction with the repository

It should be possible to modify the status of individual subjobs quickly. There should be an add() method to allow adding a range of subjobs in one go. It should also be possible to add additional subjobs at a later time.
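
These requirements could be sketched as the following toy repository; the class, its storage layout, and the method names other than add() are assumptions for illustration only.

```python
# Toy sketch of the repository requirements above (illustrative, not the
# real repository): bulk add() of subjobs, later incremental additions,
# and fast status updates on individual subjobs.
class Repository:
    def __init__(self):
        self._subjobs = {}   # master job id -> list of subjob objects

    def add(self, master_id, subjobs):
        """Add a range of subjobs in one go; may be called again later
        to append additional subjobs to the same master job."""
        self._subjobs.setdefault(master_id, []).extend(subjobs)

    def set_status(self, master_id, index, status):
        """Fast status update of one individual subjob."""
        self._subjobs[master_id][index].status = status
```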

Ideas for generic splitters

Under development

I like the idea of a generic ArgSplitter (ExeSplitter is specific to Executable applications). Currently ArgSplitter depends on the application having an 'args' attribute. We could make an even more generic GenericSplitter which could do something like this:

          for arg in self.args:
              # Copy the master job in a naive way. Needs revision.
              j = Job()
              j.copyFrom(job)
              # Set the new attribute values on the subjob's application
              j.application = job.application
              for k in arg:
                  setattr(j.application, k, arg[k])
              logger.debug('Arguments for split job: ' + str(arg))
              subjobs.append(j)
And it could be used like ArgSplitter:
   j.splitter = GenericSplitter(args={'args':[['AAA',1],['BBB',2],['CCC',3]]})
Or like this, as ExeSplitter:

   j.splitter = GenericSplitter(args=....)

-- JakubMoscicki - 14 Nov 2007
