The issue

A sandbox can be more complex than a simple set of files placed in the CWD on the worker node. In particular, both the input and the output sandbox can contain files in subdirectories. It is proposed to provide a utility module that performs the correct operations on the sandbox, both when used directly by Ganga and on the worker node.

Proposal

Both the input and the output sandbox contain a list of File objects supplied by the user. In addition, files might be added by the application runtime handler.

For input

In the input directory of the job's workspace, the input sandbox is created as a compressed tar file of everything the input sandbox should contain. In addition, the input directory will contain the backend wrapper script, the tarfile Python module (which is not part of Python 2.2) and the sandbox module, which takes care of the packing and unpacking. On arrival at the WN, the wrapper script will call the unpack method of the sandbox module; the input sandbox will be unpacked and control handed over to the runtime wrapper.
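A minimal sketch of the pack/unpack pair, written in Python 3 against the standard tarfile module rather than for the Python 2.2 setup described here. The function names and the (path, subdir) pair standing in for a File object are hypothetical; each file keeps its basename and is placed under its subdir inside the tarball, and the unpack step refuses absolute or parent-relative member names.

```python
import os
import posixpath
import tarfile


def arcname_for(path, subdir=""):
    """Name of a sandbox file inside the tarball: its basename, under subdir if given."""
    base = os.path.basename(os.path.expanduser(path))
    return posixpath.join(subdir, base) if subdir else base


def pack_input_sandbox(files, tar_path):
    """Pack (path, subdir) pairs into a gzip-compressed tarball at tar_path."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for path, subdir in files:
            tar.add(os.path.expanduser(path), arcname=arcname_for(path, subdir))
    return tar_path


def unpack_input_sandbox(tar_path, dest="."):
    """Unpack into dest (the CWD on the WN by default) and return the member names."""
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            name = os.path.normpath(member.name)
            # Reject members that would land outside dest.
            if os.path.isabs(name) or name == ".." or name.startswith(".." + os.sep):
                raise ValueError("unsafe path in sandbox: %s" % member.name)
        tar.extractall(dest)  # parent directories such as input/ are created here
        return tar.getnames()
```

Because subdirectories are encoded in the member names, the WN side needs no extra bookkeeping: extracting the tarball recreates input/run10.txt relative to the working directory.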

KUBA the sandbox module and the other utility modules that are part of Ganga should be put into a directory with a reserved name, e.g. __lib__, in order to minimize name clashes with regular input files.

KUBA in principle we do not have to replicate the utility modules in the inputdir of each job; we could specify an absolute path back to the release. This could save some space if in the future we want to ship more modules.
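The two options above differ only in where the wrapper finds the utility modules. A Python 3 sketch that covers both, assuming a shipped __lib__ directory is preferred and an absolute release path (assumed to be readable from the WN) is the fallback; the function name and the fallback order are hypothetical.

```python
import os
import sys

# Reserved directory name, following the __lib__ suggestion above.
RESERVED_LIB_DIR = "__lib__"


def setup_sandbox_lib_path(workdir=".", release_path=None):
    """Put Ganga's utility modules on sys.path on the worker node.

    Tries <workdir>/__lib__ first, then release_path (hypothetical fallback).
    Returns the directory that was added, or None if neither exists.
    """
    candidates = [os.path.join(workdir, RESERVED_LIB_DIR)]
    if release_path:
        candidates.append(release_path)
    for candidate in candidates:
        if os.path.isdir(candidate):
            path = os.path.abspath(candidate)
            if path not in sys.path:
                sys.path.insert(0, path)  # take precedence over same-named user files
            return path
    return None
```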

KUBA alternatively, for simplicity, we could ship the entire Ganga core (it is small) directly to the worker node via wget or a similar mechanism?

For output

When the runtime wrapper has finished, the backend wrapper will call the packing method of the sandbox module with the requested files as an argument. The files will be packed and a status file written, which contains the list of requested files as well as the status of the packing. The files returned from the WN will then be the packed file, the status file, and stdout and stderr as returned by a higher-level wrapper. If no files are requested for the output sandbox, empty files are returned. The status file contains one line for each file in the output sandbox, with a numeric code indicating whether the file was present. The sandbox module will provide a high-level interface for reading this file.
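A Python 3 sketch of the WN-side packing step, assuming requested files are given as paths relative to the job's working directory and that the codes are 1 for present and 0 for missing (the value 0 is an assumption; the proposal only shows 1). The function name and defaults are hypothetical; the file names and the line layout follow the example further down.

```python
import os
import tarfile

FOUND, MISSING = 1, 0  # numeric codes written to the status file (0 is assumed)


def pack_output_sandbox(requested, workdir=".",
                        tar_name="outputsandbox.tar.gz",
                        status_name="outputsandbox.status"):
    """Pack the requested files and write one status line per requested file.

    Both files are always created, so an empty request still yields a tarball
    with no members and an empty status file.
    """
    tar_path = os.path.join(workdir, tar_name)
    status_path = os.path.join(workdir, status_name)
    with tarfile.open(tar_path, "w:gz") as tar, open(status_path, "w") as status:
        for rel in requested:
            full = os.path.join(workdir, rel)
            if os.path.isfile(full):
                tar.add(full, arcname=rel)  # keep the subdirectory in the member name
                code = FOUND
            else:
                code = MISSING
            status.write("%d     %s\n" % (code, rel))
    return tar_path, status_path
```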

KUBA I am not sure what the function of the status file is: with a tarball it is easy to tell whether a file was on the WN or not. Unless the aim is to prevent loss of information in case somebody deletes a file in the local workspace (but what if they delete the status file)?

Whether the packed file is unpacked in the output workspace or just left there can be decided by an overall configuration parameter in the .gangarc file.
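The proposal leaves the name of this parameter open. Assuming a hypothetical [Output] section with an UnpackOutputSandbox option in INI-style .gangarc text, the decision could be read with the standard configparser module as in this Python 3 sketch, which defaults to leaving the tarball packed when the option is absent (also an assumption).

```python
import configparser

# Hypothetical section and option names; the proposal does not fix them.
EXAMPLE_GANGARC = """
[Output]
UnpackOutputSandbox = True
"""


def should_unpack_output(config_text):
    """Return True if the .gangarc-style text asks for the output tarball to be unpacked."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    # fallback covers both a missing section and a missing option
    return cfg.getboolean("Output", "UnpackOutputSandbox", fallback=False)
```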

Use

The use of the utility module should be mandatory for all backends that support the execution of user-supplied scripts on the WN (all but Dirac?).

Example

A Root job runs the script angular.C and produces the file angular.root as output. While running, it reads the steering file run10.txt from the subdirectory input, and the ROOT file is created in the subdirectory output.
j = Job(
  application = Root(
    version = '5.11.06',
    script = '~/RooFit/angular.C'),
  inputsandbox = [File(name = '~/control/run10.txt', subdir = 'input')],
  outputsandbox = [File(name = 'angular.root', subdir = 'output')])

When this job is submitted, a compressed tar file inputsandbox.tar.gz containing the files angular.C and input/run10.txt will be created in the input workspace.

The backend and runtime wrappers, inputsandbox.tar.gz, sandbox.py (containing the utility functions for packing and unpacking) and tarfile.py (the tarfile module, which is not part of Python 2.2) will be transferred to the WN.

After the job has finished, the output workspace will contain the files std.out, std.err, outputsandbox.tar.gz and outputsandbox.status. The outputsandbox.tar.gz file contains output/angular.root, and outputsandbox.status looks like

1     output/angular.root
where the number 1 indicates that the requested file was there.
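The high-level interface for reading this file could be as small as the following Python 3 sketch. It assumes the layout shown above (a numeric code, whitespace, then the path) and treats any code other than 1 as missing; the function name is hypothetical.

```python
def read_sandbox_status(path):
    """Map each requested file in an outputsandbox.status file to True (present) or False."""
    status = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            code, name = line.split(None, 1)  # split once, so the path keeps internal spaces
            status[name] = int(code) == 1
    return status
```

With this, the file list in std.out or the tarball never has to be parsed; a caller can simply ask which requested files are False.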

KUBA we should have a naming convention so that all automatically generated files are clearly distinguishable from user files and there is no risk of name clashes.
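One possible convention, sketched in Python 3: reserve the generated file names used in this proposal and, hypothetically, every top-level name starting with a double underscore, then reject user sandbox entries that would collide. Both the prefix rule and the function name are assumptions, not an agreed convention.

```python
# Generated names mentioned in this proposal, plus the suggested __lib__ directory.
RESERVED_NAMES = {"__lib__", "inputsandbox.tar.gz", "outputsandbox.tar.gz",
                  "outputsandbox.status", "sandbox.py", "tarfile.py"}
RESERVED_PREFIX = "__"  # hypothetical: all generated top-level names start with "__"


def clashes_with_generated(arcname):
    """True if a user file's top-level name could collide with a generated file."""
    top = arcname.split("/", 1)[0]  # only the top-level component can clash
    return top in RESERVED_NAMES or top.startswith(RESERVED_PREFIX)
```

Files placed in a subdirectory (such as input/run10.txt) never clash, which is one argument for keeping all generated files at the top level.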

Possible extensions

The utility module could be extended to accept file locations on an SE for the input sandbox. This would mean that the files would be copied to the WN before control is transferred to the runtime wrapper.
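A Python 3 sketch of how such an extension might dispatch on the location, assuming SE locations are written as URLs with a scheme (e.g. srm://...) and plain paths are local. The actual grid copy is left to a caller-supplied remote_copy function, a hypothetical hook for whatever copy tool the WN provides; no specific grid command is implied.

```python
import shutil
from urllib.parse import urlparse


def is_remote(location):
    """True for URL-like locations with a non-file scheme, False for plain local paths."""
    return urlparse(location).scheme not in ("", "file")


def fetch_input_files(locations, dest, remote_copy):
    """Copy every input file into dest before the runtime wrapper is started.

    remote_copy(location, dest) must copy one SE file into dest and return its
    local path (hypothetical hook). Returns the local paths in input order.
    """
    fetched = []
    for loc in locations:
        if is_remote(loc):
            fetched.append(remote_copy(loc, dest))
        else:
            path = urlparse(loc).path if loc.startswith("file:") else loc
            fetched.append(shutil.copy(path, dest))  # returns the destination path
    return fetched
```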


-- UlrikEgede - 23 Jun 2006

Topic revision: r2 - 2006-06-23 - JakubMoscicki
 