Ganga Job Workspace Interface
The job workspace can be imported as a GPI object. It will be an attribute of a Ganga job, so that
j.workspace
gives access to the job workspace object. The workspace class will have the following interface:
Methods
-
create()
create job workspace,
-
remove()
delete job workspace completely,
-
mkdir(dirname)
create directory “dirname”,
-
rmdir(dirname)
delete directory “dirname”,
These methods need not actually be exported to the GPI, but they are necessary for the internal implementation.
Ulrik: What happens if I create a workspace object myself? Kuba: You should not be allowed to do so. The class (constructor) should not be exported to the GPI.
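The four methods above could look roughly like the following. This is only a minimal sketch, not the Ganga implementation: the class name Workspace, the constructor argument root and the use of a plain local directory are assumptions made for illustration.

```python
import os
import shutil
import tempfile


class Workspace:
    """Hypothetical stand-in for the proposed workspace class."""

    def __init__(self, root):
        # Assumption: the workspace maps onto one local directory.
        self._root = root

    def create(self):
        """Create the job workspace."""
        os.makedirs(self._root, exist_ok=True)

    def remove(self):
        """Delete the job workspace completely."""
        shutil.rmtree(self._root, ignore_errors=True)

    def mkdir(self, dirname):
        """Create directory `dirname` inside the workspace and return its path."""
        path = os.path.join(self._root, dirname)
        os.makedirs(path, exist_ok=True)
        return path

    def rmdir(self, dirname):
        """Delete directory `dirname` inside the workspace."""
        shutil.rmtree(os.path.join(self._root, dirname), ignore_errors=True)


# Usage sketch in a throwaway location; in Ganga the object would be
# created internally and reached as j.workspace, never by the user.
base = tempfile.mkdtemp()
ws = Workspace(os.path.join(base, "job_0"))
ws.create()
out = ws.mkdir("output")
```

Since the constructor is not meant to be exported, the direct instantiation above only stands in for what the internal job code would do.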
Data
-
dir
local directory associated with the job workspace. This can in fact be a method that dynamically discovers the directory location. This matters for a remote workspace, where the local workspace directory of a job does not necessarily stay the same but depends on the logon computer. Kuba: so this dir would be a path to the local cache of the remote workspace, right?
-
input
an object representing input workspace,
-
output
an object representing output workspace.
The input and output workspace classes can be derived from the workspace class, so that they have all the methods of the workspace class listed above (as well as “dir”). They will also have the following methods, which have to be exported to the GPI:
-
listfiles()
list content of the input/output workspace,
-
peek(…)
series of peek commands,
and the methods which can stay “private”, but are necessary for the internal implementation:
-
writefile(fileobj, executable=None)
write File or FileBuffer objects to the workspace,
-
readfile(filename)
read file into FileBuffer object.
Note: if the peek() method stays defined at the job level as it is now, then the peek() methods of the input/output workspaces can be made “private” as well (i.e. not exported to the GPI).
Ulrik: I think they should be exported. If a job is in the completed or failed state, then j.peek() will simply refer to j.workspace.output.peek()
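A rough sketch of the input/output workspace methods, and of the delegation Ulrik describes, is given below. Everything here is an assumption for illustration: FakeFileBuffer is a hypothetical stand-in for Ganga's FileBuffer (reduced to a name and text contents), IOWorkspace and FakeJob are invented names, and peek() simply returns the file listing because its real arguments and output format are not fixed in this note.

```python
import os
import stat
import tempfile
from types import SimpleNamespace


class FakeFileBuffer:
    """Hypothetical stand-in for FileBuffer: a file name plus text contents."""

    def __init__(self, name, contents):
        self.name = name
        self.contents = contents


class IOWorkspace:
    """Hypothetical input/output workspace backed by a local directory."""

    def __init__(self, root):
        self.dir = root
        os.makedirs(root, exist_ok=True)

    def listfiles(self):
        """List the content of the workspace."""
        return sorted(os.listdir(self.dir))

    def peek(self):
        # Assumption: the simplest possible peek, identical to listfiles().
        return self.listfiles()

    def writefile(self, fileobj, executable=None):
        """Write a file-like object to the workspace and return its path."""
        path = os.path.join(self.dir, fileobj.name)
        with open(path, "w") as f:
            f.write(fileobj.contents)
        if executable:
            os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        return path

    def readfile(self, filename):
        """Read a file from the workspace into a buffer object."""
        with open(os.path.join(self.dir, filename)) as f:
            return FakeFileBuffer(filename, f.read())


class FakeJob:
    """Hypothetical job whose peek() delegates to the output workspace."""

    def __init__(self, status, output):
        self.status = status
        self.workspace = SimpleNamespace(output=output)

    def peek(self):
        if self.status in ("completed", "failed"):
            return self.workspace.output.peek()
        return None  # behaviour in other states is not specified in this note


output = IOWorkspace(os.path.join(tempfile.mkdtemp(), "output"))
output.writefile(FakeFileBuffer("stdout", "hello\n"))
job = FakeJob("completed", output)
```

With this arrangement j.peek() stays a thin convenience wrapper, while j.workspace.output.peek() remains usable directly from the GPI.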
Examples:
-
job.workspace.dir
location of the local job workspace
-
job.workspace.output.dir
location of the local output job workspace (in the current implementation it will be job.workspace.dir + "\output")
-
job.workspace.output.listfiles()
list files in the output workspace
-
job.workspace.output.peek()
examine the contents of the directory (in contrast to listfiles(), the peek() method may also return the list of subjob directories, and the output format may be different)
Ulrik: We might in the future create another subdir here like status which will be accessed in the same way.
Kuba:
Local cache for remote workspace
- we need to consider whether the local cache is updated automatically or manually; should this be configurable? Ulrik: I would tend to go for manual update.
- if it is updated automatically then we must avoid that this happens as a side effect of, for example, printing a job
- when should the automatic update take place (when job enters one of the final states?)
- Ulrik: We need a GPI method to delete the local cache as well.
- Ulrik: If we go for some automatic update we might need garbage collection as well, to operate within a fixed-size cache. Sounds like too much work to me. Kuba: We need this in any case if the job is deleted when connected to the repository from a different machine.
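To make the manual-update option concrete, here is a minimal sketch under strong assumptions: the remote workspace is simulated by an ordinary directory, and the names LocalCache, update() and clear() are hypothetical, not an agreed interface. The point is only that the cache is filled by an explicit call, that printing the object has no side effects, and that there is a method to delete the local copy.

```python
import os
import shutil
import tempfile


class LocalCache:
    """Hypothetical local cache of a remote workspace, updated manually."""

    def __init__(self, remote_dir, local_dir):
        self.remote_dir = remote_dir  # stand-in for the remote workspace
        self.local_dir = local_dir

    def update(self):
        """Refresh the local copy; only ever called explicitly."""
        self.clear()
        shutil.copytree(self.remote_dir, self.local_dir)

    def clear(self):
        """Delete the local cache."""
        shutil.rmtree(self.local_dir, ignore_errors=True)

    def __repr__(self):
        # Printing must not trigger a transfer, so no update() here.
        return "LocalCache(local_dir=%r)" % self.local_dir


# Simulated remote workspace holding one output file.
remote = tempfile.mkdtemp()
with open(os.path.join(remote, "stdout"), "w") as f:
    f.write("done\n")

cache = LocalCache(remote, os.path.join(tempfile.mkdtemp(), "cache"))
text = repr(cache)  # no files are copied by this
```

An automatic policy (for example, updating when the job enters a final state) could call update() from the job state transition instead, at the cost of the garbage-collection question raised above.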
--
AlexanderSaroka - 11 Oct 2007
--
JakubMoscicki - 08 Nov 2007
--
UlrikEgede - 08 Nov 2007