Parameter Passing
Passing job parameters from the JDL, such as requirements on memory or available CPU time, benefits both users and sites. See the presentation by Douglas McNab at the GDB of 2 December 2009.
Before they reach the batch system, these requirements have to pass through several other systems, such as the WMS and the CE. The specifics for CREAM are described below.
CREAM passes JDL requirements to the batch system through the CERequirements field; a batch-system-specific filter script then translates selected requirements into batch system submission statements.
An example filter script (or BLAH hook) has been developed for Torque and will be included by default in an upcoming release of the TORQUE_utils meta-package.
Currently (13-Jan-2010) only Torque and LSF support this hook mechanism; Condor and SGE should add it as well (the developers have been contacted).
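To illustrate what such a filter script does, here is a minimal sketch of a Torque hook that handles a single requirement, the memory size. It assumes that BLAH exports a lower bound from CERequirements (for example `other.GlueHostMainMemoryRAMSize >= 2048`) as a shell variable named `GlueHostMainMemoryRAMSize_Min`, and that whatever the hook prints is added to the generated PBS submit script. Both the variable naming and the function name `emit_memory_directive` are assumptions for illustration; check them against the BLAH version actually deployed.

```shell
#!/bin/sh
# Minimal sketch of a pbs_local_submit_attributes.sh-style hook.
# ASSUMPTION: BLAH exports CERequirements lower bounds as *_Min shell variables
# and copies everything this script prints into the PBS submit script.
# emit_memory_directive is a hypothetical name.

emit_memory_directive() {
    ram=${GlueHostMainMemoryRAMSize_Min:-}

    # Nothing requested: print nothing, so the site defaults apply.
    [ -n "$ram" ] || return 0

    # Only pass on plain integers; anything else is ignored rather than
    # copied verbatim into the batch script.
    case $ram in
        *[!0-9]*) return 0 ;;
    esac

    # Glue expresses RAM in MB; Torque's mem resource accepts an "mb" suffix.
    echo "#PBS -l mem=${ram}mb"
}

emit_memory_directive
```

With `GlueHostMainMemoryRAMSize_Min=2048` in the environment the hook prints `#PBS -l mem=2048mb`; without it, the hook prints nothing.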
Parameters to pass along
The following parameters from the Glue 1.3 schema are considered both generic and useful. With the arrival of Glue 2.0, the names will change somewhat.
| Glue parameter | Description | Glue unit | Torque field | Torque unit |
| MainMemoryRAMSize | The amount of RAM | MB | mem | MB |
| MaxWallClockTime | The default maximum wall-clock time allowed to each job by the batch system if no limit is requested. Once this time has expired, the job will most likely be killed or removed from the queue. | minutes | walltime | seconds |
| MaxObtainableWallClockTime | The maximum obtainable wall-clock time that can be granted to the job upon user request | minutes | walltime | seconds |
| MaxCPUTime | The default maximum CPU time allowed to each job by the batch system | minutes | cput | seconds |
| MaxObtainableCPUTime | The maximum obtainable CPU time that can be granted to the job upon user request | minutes | cput | seconds |
| SMPGranularity | A special parameter (not actually a Glue parameter) indicating how many processes per node an MPI job wants | count | ppn | count |
| WholeNodes | Indicates that the job wants exclusive access to the node(s) it is scheduled on | boolean | ? | |
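The time rows in the table are the ones where the units differ: Glue publishes minutes, while Torque's walltime and cput resources take seconds. The following sketch shows that conversion for the four time parameters. The helper name `glue_time_to_torque` is hypothetical, it uses the short parameter names from the table rather than the fully prefixed Glue attribute names, and it assumes the value arrives as a plain integer number of minutes.

```shell
#!/bin/sh
# Sketch: translate a Glue time limit (minutes) into a Torque directive (seconds).
# Usage: glue_time_to_torque <Glue parameter> <minutes>
# Hypothetical helper; not part of BLAH or TORQUE_utils.

glue_time_to_torque() {
    param=$1
    minutes=$2

    # Map the Glue parameter from the table to the matching Torque resource.
    case $param in
        MaxWallClockTime|MaxObtainableWallClockTime) resource=walltime ;;
        MaxCPUTime|MaxObtainableCPUTime)             resource=cput ;;
        *)
            echo "not a time parameter: $param" >&2
            return 1
            ;;
    esac

    # Accept only a non-negative integer number of minutes.
    case $minutes in
        ''|*[!0-9]*)
            echo "not an integer number of minutes: $minutes" >&2
            return 1
            ;;
    esac

    # Strip leading zeros so the shell does not read the value as octal.
    while [ "${#minutes}" -gt 1 ] && [ "${minutes#0}" != "$minutes" ]; do
        minutes=${minutes#0}
    done

    # Glue publishes minutes; Torque's walltime and cput accept plain seconds.
    echo "#PBS -l ${resource}=$((minutes * 60))"
}
```

For example, `glue_time_to_torque MaxWallClockTime 90` prints `#PBS -l walltime=5400`. Rejecting non-numeric input keeps arbitrary text out of the generated batch script.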
SMPGranularity and WholeNodes come from the MPI working group recommendations; see also bug #58968 and bug #58878.
Deployment
For Torque, pbs_local_submit_attributes.sh will be packaged in an RPM and included in the next TORQUE_utils patch.
The same should be done for the other batch systems, for example in LSF_utils.
A new YAIM variable, e.g. INCLUDE_BLAH_HOOK=yes/no, will toggle the installation of a symbolic link /opt/glite/bin/pbs_local_submit_attributes.sh, which pbs_submit.sh picks up automatically.
It is still under discussion whether this variable should default to yes or no.
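The proposed YAIM toggle amounts to creating or removing one symbolic link. A sketch of such a configuration step follows; the function name `config_blah_hook` and its arguments are hypothetical, and the location of the packaged hook script is passed in because the final RPM layout is not fixed here. The sketch treats an unset INCLUDE_BLAH_HOOK as `no`, which is only one of the two defaults under discussion.

```shell
#!/bin/sh
# Sketch of a YAIM-style step driven by INCLUDE_BLAH_HOOK=yes/no.
# Hypothetical: function name and arguments. The packaged hook location is
# not fixed by this page and must be supplied by the caller.
# Usage: config_blah_hook <packaged hook script> <link path>
#   e.g. link path /opt/glite/bin/pbs_local_submit_attributes.sh

config_blah_hook() {
    hook_src=$1
    hook_link=$2

    # Unset is treated as "no" here; the real default is still undecided.
    case ${INCLUDE_BLAH_HOOK:-no} in
        yes)
            if [ ! -f "$hook_src" ]; then
                echo "hook script $hook_src not found" >&2
                return 1
            fi
            # -f replaces an existing link from a previous run.
            ln -sf "$hook_src" "$hook_link"
            ;;
        no)
            # Remove only a symlink; never delete a locally edited regular file.
            if [ -L "$hook_link" ]; then
                rm -f "$hook_link"
            fi
            ;;
        *)
            echo "INCLUDE_BLAH_HOOK must be yes or no, got '$INCLUDE_BLAH_HOOK'" >&2
            return 1
            ;;
    esac
}
```

Only removing the link when it is a symbolic link means a site that has replaced it with its own regular file keeps that file when the variable is set to `no`.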
The same needs to be done for the other LRMSs (LSF, Condor and SGE). Savannah feature requests have been submitted or will be submitted shortly.
Open issues
For direct submission to CREAM, the two points above are not an issue.
- Condor has no hook mechanism (bug #57307, fixed with CREAM 1.6) and no condor_local_submit_attributes.sh (bug #61359, ready for test).
- SGE has no hook mechanism (bug #61355) and no sge_local_submit_attributes.sh (bug #61353).
- LSF has no lsf_local_submit_attributes.sh (bug #61358). It should be relatively easy to add, as the examples in pbs_submit.sh and lsf_submit.sh show.
- With Glue 2.0 the list of parameters to pass will change; it is not yet clear how to handle this.
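The caller side of the hook mechanism is small, which is why adding it for another LRMS should be straightforward. The sketch below shows what a submit script might do: look for a readable `<lrms>_local_submit_attributes.sh` and append its output to the batch script being generated. It is an illustration under these assumptions, not the actual code of pbs_submit.sh or lsf_submit.sh; the function name `append_local_submit_attributes` and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch of the caller side of the hook mechanism, e.g. in an <lrms>_submit.sh.
# Hypothetical: function name and arguments; not copied from BLAH.
# Usage: append_local_submit_attributes <hook dir> <lrms> <batch script>

append_local_submit_attributes() {
    hook_dir=$1
    lrms=$2
    batch_script=$3
    hook="$hook_dir/${lrms}_local_submit_attributes.sh"

    # No hook installed: submit without extra directives.
    [ -r "$hook" ] || return 0

    # Run the hook through /bin/sh so it need not be executable. It inherits
    # the environment (including any requirement variables) and prints
    # batch-system directives. This must happen while the script header is
    # being written: qsub and bsub ignore directives after the first command.
    /bin/sh "$hook" >> "$batch_script"
}
```

A missing hook is not an error, so sites that do not install one keep their current submission behaviour.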
--
DVanDok - 13-Jan-2010