Introduction

This page is designed as a simple user walkthrough of one of the latest features in Ganga 6.

Ganga 6 introduces many changes relative to the previous version, both visible to the user and behind the scenes. This page focuses on the change to how job output is specified in Ganga.

As of Ganga 6, the system for defining output has been completely reworked, offering a more scalable and powerful approach. The somewhat limiting distinction between outputsandbox and outputdata has been removed: both attributes are now deprecated and superseded by outputfiles. With outputfiles, users specify directly where they would like their output to go by using distinct 'File' type objects (see below). For a full discussion of the new feature, including implementation details, see the GangaOutput page.

Pre Ganga 6

Before Ganga 6, users specified the files they wanted returned to their local machine with the 'outputsandbox' attribute of the job object, and the files they wanted uploaded to mass storage with the 'outputdata' attribute. Where the outputdata files ended up depended on the backend the job ran on; for example, a job run on the Grid through the DIRAC WMS/DMS would have them uploaded to a DIRAC grid SE.

Here is an example of specifying different kinds of output in pre-Ganga 6 releases:

In [1]: j = Job()
In [2]: j.outputsandbox = [ 'mylog.txt' ]
In [3]: j.outputdata = [ 'myhistos.root' ]

Ganga 6 outputfiles

This situation is limiting: a user might, for example, want to run a job locally but still have the output files uploaded to a DIRAC SE for later use. Before Ganga 6 this meant uploading them manually. The solution is to decouple the location of the outputdata from the backend on which the job runs.

The outputdata location is now specified through distinct 'File' objects, one for each possible destination. Since retrieving output locally is just another type of destination, it too has its own File object; there is therefore no need for separate outputsandbox and outputdata attributes, hence their deprecation.

File objects

As of 10 January 2013 there are four types of File object:
  • OutputSandboxFile : returns the file to the output workspace
  • MassStorageFile : uploads the file to mass storage
  • LCGStorageElementFile : uploads the file to an LCG storage element
  • DiracFile : uploads the file to a DIRAC storage element

As you might expect, each of these objects has its own methods and attributes, but they also share the common ones outlined below:

attributes:

  • namePattern : The 'local' name of the file or, if a wildcard is used, the pattern itself
  • localDir : The 'local' directory in which the file is located. This is needed when using put() and get()
  • failureReason : If a file operation fails, this attribute should be filled with the reason for the failure

methods:

  • get() : Retrieve the file to the local directory specified by the localDir attribute
  • put() : Uploads the file to its destination, taking it from job.outputdir if the object is attached to a job, or from localDir otherwise
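The shared interface can be modelled in a few lines of Python. This sketch is hypothetical and is not Ganga's code: `SketchFile`, `SandboxSketch`, `DiracSketch`, `job_outputdir` and `finalise()` are invented names, only put() is modelled, and put() merely reports what it would transfer instead of moving any data. It illustrates two points from this page: put() reads from the job's output directory when the object is attached to a job and from localDir otherwise, and a job can call put() on every entry of its outputfiles without knowing which backend it ran on.

```python
from abc import ABC, abstractmethod


class SketchFile(ABC):
    """Hypothetical model of attributes shared by all File types (not Ganga code)."""

    def __init__(self, namePattern, localDir=""):
        self.namePattern = namePattern  # file name or wildcard pattern
        self.localDir = localDir        # where put() reads from when standalone
        self.failureReason = ""         # would hold the reason if an operation failed
        self.job_outputdir = None       # hypothetical: set when attached to a job

    def source_dir(self):
        # Attached to a job: read from the job's output dir; otherwise localDir.
        return self.job_outputdir if self.job_outputdir is not None else self.localDir

    @abstractmethod
    def destination(self):
        """Description of where put() would send the file."""

    def put(self):
        # A real File type would transfer data here; the sketch only reports
        # what would be sent where, so it runs without Ganga or grid access.
        return "%s/%s -> %s" % (self.source_dir(), self.namePattern, self.destination())


class SandboxSketch(SketchFile):
    def destination(self):
        return "output workspace"


class DiracSketch(SketchFile):
    def destination(self):
        return "DIRAC storage element"


def finalise(outputfiles, outputdir):
    """Hypothetical job step: attach each file to the job and call put()."""
    results = []
    for f in outputfiles:
        f.job_outputdir = outputdir
        results.append(f.put())  # same call whatever the File type or backend
    return results
```

In this model, supporting a new destination means adding one subclass; finalise() does not change, which mirrors the decoupling of output location from backend that motivates outputfiles.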

These objects can be used standalone as well as attached to jobs, which also means they can be stored in the box for persistence.

Example

The pre-Ganga 6 example above can now be rewritten with the new system:

In [1]: j = Job()
In [2]: j.outputfiles = [ OutputSandboxFile( 'mylog.txt' ), DiracFile( 'myhistos.root' ) ]

This can be simplified further by using the automatic file type detection system:

In [1]: j = Job()
In [2]: j.outputfiles = [ 'dead.txt', 'parrot.joke' ]
In [3]: j.outputfiles
Out[3]: [OutputSandboxFile (
             namePattern = 'dead.txt' ,
             compressed = False ,
             localDir = ''
             ),
         OutputSandboxFile (
             namePattern = 'parrot.joke' ,
             compressed = False ,
             localDir = ''
             )
        ]

By default, any namePattern that is not recognised is made into an OutputSandboxFile; however, a user can specify a different type, e.g. DiracFile, in the ~/.gangarc file. See the GangaOutput page for more details.
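The detection step can be thought of as matching each plain string against wildcard patterns and falling back to OutputSandboxFile when nothing matches. Below is a minimal sketch using the standard fnmatch module. The pattern table `EXAMPLE_PATTERNS` and the helpers `detect_file_type()` and `to_outputfiles()` are hypothetical; the real ~/.gangarc option names and format are described on the GangaOutput page and are not reproduced here.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table (NOT the real ~/.gangarc syntax):
# File type name -> wildcard patterns that should become that type.
EXAMPLE_PATTERNS = {
    "DiracFile": ["*.root", "*.dst"],
}

# Unrecognised names fall back to this type, as described above.
DEFAULT_TYPE = "OutputSandboxFile"


def detect_file_type(namePattern, patterns=EXAMPLE_PATTERNS, default=DEFAULT_TYPE):
    """Return the File type name a plain string would be converted to."""
    for type_name, globs in patterns.items():  # first matching type wins
        if any(fnmatchcase(namePattern, g) for g in globs):
            return type_name
    return default


def to_outputfiles(names, patterns=EXAMPLE_PATTERNS, default=DEFAULT_TYPE):
    """Convert plain strings into (type name, namePattern) pairs."""
    return [(detect_file_type(n, patterns, default), n) for n in names]
```

With this table, to_outputfiles(['dead.txt', 'parrot.joke']) gives two OutputSandboxFile entries, matching the session above, while 'myhistos.root' would be classified as DiracFile.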

Finally, File objects can be used standalone, as in the example below:

In [1]: import os
In [2]: d = DiracFile( '*.root' )
In [3]: d.localDir = os.getcwd()
In [4]: d.put()
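For a wildcard such as '*.root', put() has to work out which files in localDir the pattern covers. The sketch below shows one way to do that expansion with the standard glob module; `resolve_pattern` is a hypothetical helper, and whether Ganga expands patterns exactly like this is an assumption.

```python
import glob
import os


def resolve_pattern(namePattern, localDir):
    """Hypothetical helper: regular files in localDir matching namePattern.

    An empty localDir makes the search relative to the current working
    directory; that is how os.path.join and glob behave, not a claim about
    what Ganga does with an empty localDir.
    """
    search = os.path.join(localDir, namePattern)
    matches = [p for p in glob.glob(search) if os.path.isfile(p)]  # skip directories
    return sorted(os.path.abspath(p) for p in matches)  # deterministic order
```

Sorting the result keeps the order of uploads reproducible, since glob itself returns matches in arbitrary filesystem order.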
Topic revision: r1 - 2013-01-10 - AlexanderRichards