Motivation

The LCG RB (and likewise the gLite WMS) imposes an upper bound on the inputsandbox size (10 MB by default). Jobs with an oversized inputsandbox fail at submission. The current workaround for ATLAS Athena jobs is to configure the RBs to accept inputsandboxes of up to 50 MB.

To overcome this limitation, the concept of an inputsandbox cache is introduced in the LCG backend handler. The idea is to pre-upload oversized inputsandboxes to SEs when preparing the job, and to have the job wrapper download them on the fly before launching the real executable on the worker node.

Implementation

A loop has been added in the jobprepare() method of the LCG class. In this loop, the size of each inputsandbox is checked before composing the job wrapper and JDL.

Inputsandboxes that do not exceed the size limit (defined by config['LCG']['BoundSandboxLimit']) are dealt with in the usual way, while oversized inputsandboxes are pre-uploaded to the iocache and the corresponding URIs are used in composing the job wrapper and the JDL. The iocache can be defined by j.backend.iocache at the job level or by config['LCG']['DefaultSE'] at the session level.
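The size check described above can be sketched as follows. This is only an illustration, not the code in the LCG class: the function name split_by_size and the 50-byte limit in the demo are made up for the example, and the choice that a file exactly at the limit counts as regular is an assumption.

```python
import os
import tempfile


def split_by_size(paths, limit_bytes):
    """Return (regular, oversized) lists of file paths."""
    regular, oversized = [], []
    for path in paths:
        # Assumption: only files strictly larger than the limit are oversized.
        if os.path.getsize(path) > limit_bytes:
            oversized.append(path)
        else:
            regular.append(path)
    return regular, oversized


# Demo with two throwaway files and a 50-byte limit.
with tempfile.TemporaryDirectory() as demo_dir:
    small = os.path.join(demo_dir, "small.tgz")
    big = os.path.join(demo_dir, "big.tgz")
    for path, size in ((small, 10), (big, 100)):
        with open(path, "wb") as handle:
            handle.write(b"\0" * size)
    regular, oversized = split_by_size([small, big], limit_bytes=50)
```

In the real backend the limit would come from config['LCG']['BoundSandboxLimit'], and the oversized list is what would be handed to the iocache upload step.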

The current implementation uses "lcg-utils" for file transfer. Because the implementation details are factored out into the GridCache class, other file transfer mechanisms can be adopted easily if needed.
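One way to picture this separation is a cache object that delegates to an interchangeable transfer object. The sketch below is hypothetical and does not reproduce the GridCache interface: the class names SandboxCache and LocalCopyTransfer, their methods, and the "file://" URIs are all assumptions, and a plain local copy stands in for a real Grid transfer.

```python
import os
import shutil
import tempfile


class LocalCopyTransfer:
    """Hypothetical stand-in: 'uploads' by copying into a local directory."""

    def __init__(self, store_dir):
        self.store_dir = store_dir

    def upload(self, path):
        target = os.path.join(self.store_dir, os.path.basename(path))
        shutil.copyfile(path, target)
        return "file://" + target

    def delete(self, uri):
        os.remove(uri[len("file://"):])


class SandboxCache:
    """Knows nothing about the transfer tool; it only calls upload/delete."""

    def __init__(self, transfer):
        self.transfer = transfer

    def upload_all(self, paths):
        return {path: self.transfer.upload(path) for path in paths}

    def remove_all(self, uris):
        for uri in uris:
            self.transfer.delete(uri)


# Demo: upload one file into a scratch "store", then remove it again.
with tempfile.TemporaryDirectory() as work_dir:
    store_dir = os.path.join(work_dir, "store")
    os.mkdir(store_dir)
    source = os.path.join(work_dir, "input.tgz")
    with open(source, "wb") as handle:
        handle.write(b"payload")
    cache = SandboxCache(LocalCopyTransfer(store_dir))
    uris = cache.upload_all([source])
    uploaded_ok = os.path.exists(os.path.join(store_dir, "input.tgz"))
    cache.remove_all(uris.values())
    removed_ok = not os.path.exists(os.path.join(store_dir, "input.tgz"))
```

Swapping the transfer tool then means supplying another object with the same two methods, leaving the caching logic untouched.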

Since some oversized inputsandboxes may be shared among subjobs (e.g. the inputsandbox of the master job), it is more efficient to upload the shared files only once. To implement this, inputsandboxes that have been successfully uploaded are logged in job.inputdir/__iocache__. Only files not present in this list are uploaded (md5sum is used to check file identity); otherwise the corresponding URI is taken from the list.
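The upload-once logic can be sketched as below. The on-disk format of the __iocache__ log is not described here, so the one-line-per-file "md5 URI" layout is an assumption, as are the function names and the fake_upload callable that stands in for the real transfer.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_log(log_path):
    """Read the assumed 'md5 URI' log into a dict; a missing log is empty."""
    entries = {}
    if os.path.exists(log_path):
        with open(log_path) as handle:
            for line in handle:
                checksum, uri = line.rstrip("\n").split(" ", 1)
                entries[checksum] = uri
    return entries


def upload_once(path, log_path, upload):
    """Return the logged URI if the content was uploaded before, else upload."""
    entries = load_log(log_path)
    checksum = md5_of(path)
    if checksum in entries:
        return entries[checksum]
    uri = upload(path)
    # Log only after the upload call has returned without raising.
    with open(log_path, "a") as handle:
        handle.write(f"{checksum} {uri}\n")
    return uri


# Demo: two files with identical content trigger a single upload.
with tempfile.TemporaryDirectory() as work_dir:
    log_path = os.path.join(work_dir, "__iocache__")
    calls = []

    def fake_upload(path):
        calls.append(path)
        return "fake://se/" + os.path.basename(path)

    first = os.path.join(work_dir, "a.tgz")
    second = os.path.join(work_dir, "b.tgz")
    for path in (first, second):
        with open(path, "wb") as handle:
            handle.write(b"same content")
    uri_first = upload_once(first, log_path, fake_upload)
    uri_second = upload_once(second, log_path, fake_upload)
```

Keying the log on the checksum rather than the file name is what lets subjobs share one upload even when each holds its own copy of the file.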

Based on the same log file, a method for cleaning up the uploaded files is also exposed to users. For jobs in a final state (e.g. completed), j.backend.cleanup() can be called manually to remove all the uploaded inputsandboxes associated with the job.

Features

  • User-defined SE: use backend.iocache for job-level control or config['LCG']['DefaultSE'] for session-level control

  • User-defined limit on the inputsandbox size: defined by config['LCG']['BoundSandboxLimit'], in bytes

  • The oversized inputsandboxes shared among subjobs will be uploaded only once.

  • Flexible to adopt other file transfer protocols

Known issues

  • "lcg-utils" may not be available in a pure gLite UI environment (e.g. the gLite UI installation on AFS). The current workaround is to run the "lcg-utils" commands within an "EDG"-based shell environment.

  • Performance drawback: uploading files to the Grid SEs adds overhead, which is especially significant in bulk submission. A possible solution is a multi-threaded job preparation loop.
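As a rough idea of what a multi-threaded upload step could look like, the sketch below runs the uploads on a thread pool. It is not part of the current implementation: the function upload_in_parallel, the worker count, and the fake_upload stand-in are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_in_parallel(paths, upload, max_workers=4):
    """Call upload(path) for each path on worker threads; return {path: uri}.

    An exception raised by any upload is re-raised when the results are
    collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input paths.
        uris = list(pool.map(upload, paths))
    return dict(zip(paths, uris))


def fake_upload(path):
    # Stand-in for a real transfer to an SE.
    return "fake://se/" + path


result = upload_in_parallel(["a.tgz", "b.tgz", "c.tgz"], fake_upload)
```

Threads suit this case because each upload mostly waits on the network; a shared upload log would additionally need locking, which this sketch leaves out.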

-- Main.hclee - 24 Jan 2007

Topic revision: r1 - 2007-01-24 - HurngChunLee
 