Motivation

The LCG RB (both EDG and gLite WMS) limits the size of a job's input sandbox (10 MB by default). Jobs with an oversized input sandbox fail at submission. The current workaround for ATLAS is to configure the RBs to accept input sandboxes of up to 50 MB.

To overcome this limitation, the concept of an input sandbox cache is introduced in the LCG backend handler. The idea is to pre-upload oversized input sandboxes to SEs when preparing the job, and to have the job wrapper download them on the fly before launching the real executable on the worker node.

Implementation

A loop has been added in the jobprepare() method of the LCG handler. In the loop, the size of each input sandbox is checked before composing the job wrapper and JDL.

If the sandbox doesn't exceed the limit defined by config['LCG']['BoundSandboxLimit'], it is attached to the job and shipped through the resource broker; otherwise the sandbox is uploaded to a remote storage element before job submission. In the latter case, a reference to the pre-uploaded sandbox (e.g. GUID or URI) is given to the job. The reference is also logged in job.inputdir/__iocache__ for other management purposes (e.g. cleanup). If the oversized sandbox is shared among sub-jobs, it is uploaded only once.
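The decision described above can be sketched as follows. This is a minimal illustration, not the handler's actual code: the function names partition_sandboxes and upload_once, the upload callable and the in-memory cache dict are all hypothetical, and the real handler records references in job.inputdir/__iocache__ rather than in a dict.

```python
import os
import tempfile

def partition_sandboxes(paths, limit_bytes):
    """Split sandbox files into those shipped via the RB and those to pre-upload.

    A sandbox is oversized only if it strictly exceeds the limit.
    """
    shipped, oversized = [], []
    for path in paths:
        if os.path.getsize(path) > limit_bytes:
            oversized.append(path)
        else:
            shipped.append(path)
    return shipped, oversized

def upload_once(paths, upload, cache):
    """Upload each path at most once and return the references.

    `upload` is a hypothetical callable returning a reference (GUID or URI);
    `cache` maps a local path to the reference obtained for it, so a sandbox
    shared among sub-jobs is transferred only on first use.
    """
    refs = []
    for path in paths:
        if path not in cache:
            cache[path] = upload(path)
        refs.append(cache[path])
    return refs
```

Calling upload_once() for every sub-job with the same cache dict is what keeps a shared sandbox from being transferred repeatedly.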

To transfer the oversized input sandboxes, a GridCache class is introduced that provides three basic methods: upload(), download() and delete(). All three methods are implemented with a retry mechanism. The current implementation wraps the lcg-utility commands to perform file transfers on the grid.
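The shape of such a class might look like the sketch below. It is an assumption-laden illustration: the class name GridCacheSketch, the injected transfer callable, and the max_retries and delay parameters are hypothetical, and the real GridCache calls the grid transfer commands instead of a Python callable.

```python
import time

class GridCacheSketch:
    """Hypothetical stand-in for GridCache: three operations, each retried."""

    def __init__(self, transfer, max_retries=3, delay=0.0):
        # transfer(action, *args) does the real work; in the actual handler
        # this role is played by the wrapped grid transfer commands.
        self._transfer = transfer
        self.max_retries = max_retries
        self.delay = delay

    def _with_retry(self, action, *args):
        """Run one transfer action, retrying on any exception."""
        last_error = None
        for attempt in range(1, self.max_retries + 1):
            try:
                return self._transfer(action, *args)
            except Exception as err:
                last_error = err
                if attempt < self.max_retries:
                    time.sleep(self.delay)
        raise RuntimeError(
            "%s failed after %d attempts" % (action, self.max_retries)
        ) from last_error

    def upload(self, local_path):
        return self._with_retry("upload", local_path)

    def download(self, ref, local_path):
        return self._with_retry("download", ref, local_path)

    def delete(self, ref):
        return self._with_retry("delete", ref)
```

Keeping the retry loop in one private helper means the three public methods cannot drift apart in how they handle transient failures.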

Usage

Although this feature is applied automatically whenever the LCG handler detects an oversized input sandbox, there are ways to configure (or disable) it:

  • to force the feature on or off: set config['LCG']['BoundSandboxLimit'] to a very small or very large value (in bytes)
  • to force the use of a particular storage element: set config['LCG']['DefaultSE'] or j.backend.iocache (j.backend.iocache takes precedence)
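The effect of these two settings can be illustrated with a small sketch. A plain dict stands in for the Ganga configuration object, the helper names use_cache and select_storage_element are hypothetical, and the storage element host names are made up.

```python
# Stand-in for the Ganga configuration object (a plain dict here).
config = {'LCG': {'BoundSandboxLimit': 0, 'DefaultSE': 'se.example.org'}}

def use_cache(size_bytes, config):
    """True if a sandbox of this size would go through the sandbox cache."""
    return size_bytes > config['LCG']['BoundSandboxLimit']

def select_storage_element(backend_iocache, config):
    """Pick the SE: j.backend.iocache wins over config['LCG']['DefaultSE']."""
    return backend_iocache or config['LCG']['DefaultSE']
```

With the limit set to 0, every non-empty sandbox is treated as oversized; with a very large limit, none is.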

As the current implementation uses LFC, the LFC_HOST environment variable is detected automatically using the lcg-infosites command. config['LCG']['DefaultLFC'] can be given as a backup setting in case LFC_HOST cannot be obtained. The value used to upload the oversized input sandbox is set as an "environment variable" in the JDL, to ensure that the WN uses the same LFC to download the sandbox.

In addition, a method for cleaning up the uploaded files is exposed to users. For a job in a final state (e.g. completed, failed), one can call j.backend.cleanup() to manually remove all the uploaded input sandboxes associated with the job. Note that the LCG handler does not clean them up automatically.
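The fallback order for the LFC host can be sketched as below. The function names and the detect callable are hypothetical; in the handler, detection is done through lcg-infosites, which is replaced here by an injected function so the sketch runs anywhere. The JDL line shows the usual Environment attribute form and is an assumption about what the handler emits.

```python
def resolve_lfc_host(detect, config):
    """Return the detected LFC host, else config['LCG']['DefaultLFC'], else None.

    `detect` is a hypothetical callable standing in for the automatic
    detection; it may return an empty value or raise on failure.
    """
    try:
        host = detect()
    except Exception:
        host = None
    return host or config['LCG'].get('DefaultLFC') or None

def jdl_environment(lfc_host):
    """Format the LFC host as a JDL Environment attribute (assumed form)."""
    return 'Environment = {"LFC_HOST=%s"};' % lfc_host
```

Passing the resolved value through the JDL, rather than letting the worker node detect it again, is what guarantees that upload and download talk to the same catalogue.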

SRMv2 space token

As SRMv2 is now adopted by HEP experiments for managing data stored on distributed storage elements, a specific SRMv2 space token needs to be specified when uploading an oversized input sandbox, to avoid misusing the storage technology (e.g. you don't want the input sandbox to be staged to tape). Starting from Ganga 4.4.10, the space token can be specified in either of the following two ways:

  • set config['LCG']['DefaultSE'] in the syntax: token:<TOKEN_NAME>:<SE_NAME>
  • or set config['LCG']['DefaultSRMToken']

If both are set, the first setting takes precedence.
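The precedence rule can be made concrete with a short parser for the token:<TOKEN_NAME>:<SE_NAME> syntax. This is a sketch only: the function name resolve_se_and_token is hypothetical, and the token and host names in the usage lines are made up.

```python
def resolve_se_and_token(default_se, default_token=''):
    """Return (se_name, space_token) from the two settings.

    default_se    -- value of config['LCG']['DefaultSE']
    default_token -- value of config['LCG']['DefaultSRMToken']
    A token embedded in default_se takes precedence over default_token.
    """
    if default_se.startswith('token:'):
        # Split at most twice so that an SE name containing ':' stays intact.
        parts = default_se.split(':', 2)
        if len(parts) == 3 and parts[1] and parts[2]:
            return parts[2], parts[1]
        raise ValueError("expected token:<TOKEN_NAME>:<SE_NAME>, got %r" % default_se)
    return default_se, (default_token or None)
```

For example, resolve_se_and_token('token:SOMETOKEN:se.example.org', 'OTHERTOKEN') selects SOMETOKEN, while resolve_se_and_token('se.example.org', 'OTHERTOKEN') falls back to OTHERTOKEN.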

Known issues

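* For gLite bulk jobs, the WMS applies the sandbox size restriction to the job collection as a whole, whereas the LCG handler checks the sandbox size only for each individual job. A collection can therefore exceed the WMS limit even when every individual job passes the handler's check.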

-- Main.hclee - 24 Jan 2007

Topic revision: r3 - 2008-03-31 - HurngChunLee
 