Batch System Support and Coordination

Batch System Support is a community-driven effort by those who require CE adaptation for a specific batch system. The aim of this page is to be a focal point for these activities. It links to information for system administrators trying to grid-enable their farm, and also contains a general "How To" for adapting the CE to a specific batch system.

Supported CEs

There is support for LCG-CE and CREAM. There is a transition plan to replace LCG-CE with CREAM, which includes certain acceptance criteria related to batch system integration. See the status page of batch system support on CREAM.

Notes on parameter passing

See the plans for ParameterPassing to the batch system.

Torque batch system

Torque integration in general is maintained by NIKHEF within SA3.

Torque integration with blah is maintained by INFN within SA3.

Condor batch system

The e-mail list of the Condor batch system group is

project-eu-egee-batchsystem-condor@cernNOSPAMPLEASE.ch

You can subscribe through the SIMBA interface (http://simba.cern.ch).

Condor integration is maintained by IFAE (PIC) within SA3.

lcg-CE or creamCE

Installation instructions for condor

Queue simulation

Queue simulation instructions for Condor

SGE batch system

Check the "Current status of the implementation of SGE" wiki page.

SGE integration is maintained by CESGA within SA3.

LSF batch system

Workplan for LSF batch system testing at PIC:

  1. Research the possibility of running an LSF server on a virtual machine
  2. Installation and configuration of server and clients
  3. Possible enhancements of blah for LSF
  4. Accounting verification (APEL/DGAS)
  5. Information system
  6. Testing scheduler scripts
  7. Possibility of running different clusters for just one computing element (in particular SLC4 testing)
  8. Stress test for more than a few nodes
  9. SAM test script implementation

Support for blah with LSF is maintained by INFN within SA3.

Testing

Details of community based testing should be put here.

Information for LRMS integrators

Information on how to integrate your LRMS for CREAM will appear in the CreamLRMSCookBook. To add a batch system to the gLite release, check the following points:

  1. Nodetypes supporting the batch system
    • lcg-CE SL3, SL4 (when available)
    • CREAM SL4 (as soon as it is available, you may leverage work from glite-CE as both use BLAH)
  2. Jobmanager on lcg-CE
  3. BLAH plugin for CREAM
  4. Information Provider
  5. Accounting
    • APEL on lcg-CE
    • APEL on CREAM (take glite-CE work as BLAH is the same)

RPMs are needed for Jobmanager, BLAH plugin, Information Provider and APEL (specific part for the batch system).

Information providers

For each batch system, there should be a backend command or set of backend commands that produces a representation of the queue state in a prescribed format. This output is taken by the lcg-info-dynamic-scheduler to calculate, among other values, the EstimatedResponseTime.
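
As a rough illustration of what such a backend feeds into, the sketch below parses a queue-state listing and derives a naive per-queue waiting estimate. The line format (`<queue> <state> <seconds>`), the function names and the averaging rule are all assumptions made for illustration. They are not the prescribed format, and they are not the algorithm used by the lcg-info-dynamic-scheduler.

```python
from collections import defaultdict


def parse_queue_state(text):
    """Parse lines of the hypothetical form '<queue> <state> <seconds>'."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        queue, state, seconds = line.split()
        jobs.append({"queue": queue, "state": state, "seconds": int(seconds)})
    return jobs


def naive_response_time(jobs):
    """Per queue: 0 if nothing is queued, else the mean time that
    queued jobs have waited so far (an illustrative rule only)."""
    waits = defaultdict(list)
    queues = set()
    for job in jobs:
        queues.add(job["queue"])
        if job["state"] == "queued":
            waits[job["queue"]].append(job["seconds"])
    return {
        q: (sum(waits[q]) // len(waits[q]) if waits[q] else 0)
        for q in sorted(queues)
    }
```

A real backend would emit whatever format the scheduler prescribes. The point of the sketch is only the division of labour: the batch-specific script reports state, and a batch-independent step turns that state into published values.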

In the current setup there need to be two (possibly three) scripts:

  • lcg-info-dynamic-provider-{pbs,lsf,sge,condor,...}
  • lrmsinfo-{pbs,lsf,sge,condor,...} : this script is called by the lcg-info-dynamic-scheduler
  • vomaxjobs-{maui,lsf,sge,condor,...} : this optional script is called by the lcg-info-dynamic-scheduler

This is depicted in the diagram GIP-evolution.png.

However, in the near future the lcg-info-dynamic-scheduler will incorporate the functionality of the lcg-info-dynamic-* scripts, as depicted in GIP-evolution-V2.png.

This transition will not be a 'big-bang' upgrade but will be phased, e.g., for the batch system 'pbs':

  1. a new version of the 'lcg-info-dynamic-scheduler' will be rolled out, with a flag 'use_old_style_output' set. Test until satisfied;
  2. new versions of the 'lrmsinfo-pbs' and 'vomaxjobs-maui' scripts will be rolled out, but they will still produce old-style output (using a configuration setting). Again, test until satisfied;
  3. the 'lrmsinfo-pbs' and 'vomaxjobs-maui' scripts will be configured to produce 'new style' output (Protocol_V2). Again, test until satisfied;
  4. as the 'lcg-info-dynamic-scheduler' script still has its configuration setting 'use_old_style_output' set, the GIP will not see anything different;
  5. the 'lcg-info-dynamic-pbs' script is stopped;
  6. the 'use_old_style_output' flag is set to 'false' in the 'lcg-info-dynamic-scheduler' script and the GIP now receives all information from only the 'lcg-info-dynamic-scheduler' script. Do a final test to verify that the GIP is still happy.

Thus a phased upgrade can be done for each batch system.
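
The role of the 'use_old_style_output' flag in such a phased rollout can be sketched as follows. This is a minimal illustration, not the scheduler's code: both output formats and all function names are hypothetical, and the "new style" shown here is not Protocol_V2. Only the flag name is taken from the steps above.

```python
def old_style(record):
    # Hypothetical old-style line, standing in for what the consumer expects today.
    return "queue={queue} waiting={waiting}".format(**record)


def new_style(record):
    # Hypothetical new-style line, standing in for the future format.
    return "{queue};{waiting}".format(**record)


def scheduler_output(records, use_old_style_output=True):
    """Choose the output format with a single flag, so the producer can be
    upgraded first and the consumer-visible format switched later."""
    formatter = old_style if use_old_style_output else new_style
    return [formatter(record) for record in records]
```

With the flag left at its default, a consumer sees no change after the new code is deployed. Flipping it is then a configuration change that can be tested and reverted on its own.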

Configuration

YAIM configuration is needed for the Jobmanager, BLAH plugin, Information Provider and APEL, but not necessarily for the batch system itself.

For meta-rpms and configuration targets, please follow the model adopted for Torque:

  • glite-TORQUE_server - what you need to install on your HEAD
  • glite-TORQUE_client - what you need to install on your WN
  • glite-TORQUE_utils - what you need to install on your CE or BDII_site (this will include submitter stuff, info providers, accounting etc).

Anticipated installation scenarios:

  • CE with own torque server - CE + TORQUE_server + TORQUE_utils
  • CE with separate torque server - CE + TORQUE_utils
  • Standalone TORQUE server - TORQUE_server + TORQUE_utils
  • WN for torque - glite-WN + TORQUE_client
  • BDII_site - glite-BDII + TORQUE_utils

From this you can see that the <BATCH>_utils configuration target will have to detect the node-type in order to know what to configure.
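
The scenarios above can be written down as a lookup table, which makes the detection problem concrete: TORQUE_utils shows up next to different node types, so it cannot assume what else is on the host. The sketch below is illustrative only. The function name is hypothetical, and a real YAIM configuration target would be implemented as shell functions rather than Python.

```python
# Installation scenarios as listed above, mapped to their configuration targets.
SCENARIOS = {
    "CE with own torque server": ["CE", "TORQUE_server", "TORQUE_utils"],
    "CE with separate torque server": ["CE", "TORQUE_utils"],
    "Standalone TORQUE server": ["TORQUE_server", "TORQUE_utils"],
    "WN for torque": ["glite-WN", "TORQUE_client"],
    "BDII_site": ["glite-BDII", "TORQUE_utils"],
}


def utils_context(targets):
    """Return the targets sharing a host with TORQUE_utils,
    or None if TORQUE_utils is not part of the scenario."""
    if "TORQUE_utils" not in targets:
        return None
    return [target for target in targets if target != "TORQUE_utils"]


def distinct_utils_contexts(scenarios):
    """Collect every distinct context in which TORQUE_utils must work."""
    contexts = []
    for targets in scenarios.values():
        context = utils_context(targets)
        if context is not None and context not in contexts:
            contexts.append(context)
    return contexts
```

Listing the distinct contexts shows how many cases the utils target has to distinguish. What it configures in each case is left open here, since that depends on the batch system.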

For other batch systems, gLite will not distribute the batch system software itself, so you would expect (for example) the following.

configuration targets

  • SGE_server
  • SGE_clients
  • SGE_utils

These could be implemented by glite-yaim-sge or a separate yaim rpm could be produced in each case.

meta-packages

  • SGE_utils only

Topic attachments

  • GIP-evolution-V2.png (36.0 K, 2009-04-24, JanJustKeijser): GIP diagram, new situation
  • GIP-evolution.png (29.7 K, 2009-04-24, JanJustKeijser): GIP diagram, current situation
Topic revision: r26 - 2010-02-10 - unknown
 