Introduction
The Short Jobs Working Group has developed a draft proposal addressing the problem of short jobs. The proposal defines a class of jobs called ShortDeadlineJobs and describes the steps necessary for implementation and deployment.
Short Deadline Jobs in EGEE
An EGEE job is a Short Deadline Job (SDJ) when:
- It has a deadline constraint.
- It is guaranteed to complete in a short time.
- It is unexpected, i.e. there is no prior reservation.
Apart from the above it is a normal job, submitted and scheduled through standard EGEE mechanisms.
A Torque/MAUI configuration supporting SDJ's has been implemented and tested at LAL. Production-grade deployment of this configuration depends on small changes to the middleware that are not yet included in gLite; they will be included in release 3.2.
SDJ's at EGEE-SEE-CERT
Since the next release of gLite will include support for Short Deadline Jobs, the LAL configuration was recently deployed at the EGEE-SEE-CERT testbed on an experimental basis. This document describes the steps taken.
Requirements
As far as the batch system is concerned, the Short Jobs Working Group identifies the following requirements:
- SDJ's should be scheduled without queueing.
- SDJ's should not be delayed by other jobs.
- SDJ's should either be scheduled for execution or be rejected.
- The delay incurred on normal jobs by SDJ's should be bounded.
As far as the middleware is concerned, the following requirement is worth pointing out: non-SDJ jobs should not be treated as SDJ's. This cannot be enforced with the current gLite release and JDL, but it will be possible in the next release.
Architecture
The solution implemented at LAL consists of a separate queue for SDJ's with special resource constraints, and a separate MAUI scheduling class with corresponding reservations on the worker nodes.
Middleware
For reasons relating to the routing of jobs by the middleware, the SDJ queues must correspond to GlueCEUniqueID's matching the regular expression .*sdj$ . This will permit users to add the proper requirements to their JDL's and satisfies the requirement that non-SDJ jobs are not scheduled as SDJ's.
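The naming rule can be checked from the shell before a queue is published. The following is only a sketch: is_sdj_ce is a hypothetical helper, the CE identifiers in the usage note are made-up examples, and the only thing taken from the text is the .*sdj$ rule itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a GlueCEUniqueID satisfies the .*sdj$
# naming rule described above, fail otherwise.
is_sdj_ce() {
    printf '%s\n' "$1" | grep -Eq '.*sdj$'
}
```

For example, is_sdj_ce "ce.example.org:2119/jobmanager-pbs-dteam.sdj" succeeds, while the same identifier without the .sdj suffix fails.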
Torque queue
As was just mentioned, the queues should be named according to the regular expression .*sdj$ , for example dteam.sdj .
In order to minimize the delay incurred on normal jobs by SDJ's, the queue must impose smaller CPU and wall time limits (resources_max.cput and resources_max.walltime respectively) on SDJ's. Optionally, the queue can also limit the total number of running SDJ's (max_running), in order to minimize the impact on the site.
As per the requirements, jobs entering this queue should not be queued when slots are not available but should be rejected immediately. This is achieved by passing the NOQUEUE flag to MAUI via the SUBMITFILTER mechanism in torque.
In order to achieve concurrent execution of normal batch jobs and SDJ's, each worker node must have extra 'virtual' processors, in addition to those already configured. These extra slots will be reserved for SDJ's as is described next.
Maui policy
Jobs in the SDJ queue, apart from having the NOQUEUE flag, must also not be delayed by other batch jobs. The SDJ queue corresponds to a MAUI class of the same name, so MAUI will be configured to reserve extra 'virtual' slots on each worker node (SRCFG) for jobs in the SDJ class. Also MAUI may optionally be configured to restrict the total jobs for the class.
WMS issues
In gLite 3.2 the JDL will gain a boolean attribute called ShortDeadlineJob, which tells the WMS whether a job should go to an SDJ queue or not (by matching the GlueCEUniqueID against the regexp pattern .*sdj$ ). Until then, matching must be done manually in the Requirements clause of the JDL, and access to an SDJ queue should be restricted to a specific VO for testing.
EGEE-SEE-CERT configuration
In this section we outline the steps taken to configure our site for Short Deadline Jobs. We assume that our queue will be tied to the dteam VO and that the queue will be named dteam.sdj .
Torque
The following shell commands create and configure the SDJ queue:
qmgr -c "create queue dteam.sdj queue_type=execution"
qmgr -c "set queue dteam.sdj resources_max.cput=00:10:00"
qmgr -c "set queue dteam.sdj resources_max.walltime=00:30:00"
qmgr -c "set queue dteam.sdj enabled=true"
qmgr -c "set queue dteam.sdj started=true"
Optionally, we may limit the total running SDJ jobs, although at EGEE-SEE-CERT we did not opt for it:
qmgr -c "set queue dteam.sdj max_running=8"
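A site serving more than one VO may want to repeat the same steps for other queue names. As a sketch only, the hypothetical helper below prints (it does not run) the same five qmgr commands for an arbitrary queue name and limits, so they can be reviewed first. The helper name and its arguments are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the qmgr commands that create
# an SDJ queue. Arguments: queue name, CPU time limit, wall time limit,
# both limits in HH:MM:SS form.
sdj_queue_commands() {
    queue=$1 cput=$2 walltime=$3
    for setting in \
        "create queue $queue queue_type=execution" \
        "set queue $queue resources_max.cput=$cput" \
        "set queue $queue resources_max.walltime=$walltime" \
        "set queue $queue enabled=true" \
        "set queue $queue started=true"
    do
        printf 'qmgr -c "%s"\n' "$setting"
    done
}
```

Running sdj_queue_commands dteam.sdj 00:10:00 00:30:00 prints the commands listed above; piping the output to sh on the Torque server would apply them.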
The following shell commands create a Torque SUBMITFILTER that adds the MAUI flag NOQUEUE if the job is in the SDJ queue:
cat > /var/spool/pbs/submit_filter <<'EOF'
#!/usr/bin/perl -n
s/^(\#PBS\s+-q.+\.sdj\s*$)/\1\#PBS -W x="FLAGS:NOQUEUE"\n/;
print;
EOF
chmod 0755 /var/spool/pbs/submit_filter
echo "SUBMITFILTER /var/spool/pbs/submit_filter" >> /var/spool/pbs/torque.cfg
Lastly, we add the 'virtual' processors to each node, say 2 per node, and restart Torque (this assumes CE_SMPSIZE is set in the current shell, for example by sourcing site-info.def):
sed -i -e "s/np=$CE_SMPSIZE/np=$((CE_SMPSIZE+2))/" /var/spool/pbs/server_priv/nodes
/etc/init.d/pbs_server restart
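The in-place edit above rewrites the nodes file directly, so a preview can be useful. The following sketch uses a hypothetical helper, preview_extra_slots, that prints the nodes file with the slot count raised and leaves the file untouched. Unlike the sed command above, it raises every np=<n> entry it finds, whatever its current value; that difference is a choice made for this illustration, not part of the deployed configuration.

```shell
#!/bin/sh
# Hypothetical helper: print a nodes file with every np=<n> raised by the
# given number of extra slots. The file itself is not modified.
# Arguments: path to a nodes file, number of extra slots per node.
preview_extra_slots() {
    awk -v extra="$2" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^np=[0-9]+$/) {
                split($i, kv, "=")
                $i = "np=" (kv[2] + extra)
            }
        print
    }' "$1"
}
```

For example, preview_extra_slots /var/spool/pbs/server_priv/nodes 2 shows what the file would look like with 2 extra slots per node.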
MAUI
For each node a resource reservation must be made so that the 2 extra slots are reserved for the dteam.sdj class:
for node in `sed -e 's/\s.*//' /var/spool/pbs/server_priv/nodes` ; do
    echo "SRCFG[$node] HOSTLIST=$node " >> /var/spool/maui/maui.cfg
    echo "SRCFG[$node] PERIOD=INFINITY " >> /var/spool/maui/maui.cfg
    echo "SRCFG[$node] ACCESS=DEDICATED " >> /var/spool/maui/maui.cfg
    echo "SRCFG[$node] TASKCOUNT=1 " >> /var/spool/maui/maui.cfg
    echo "SRCFG[$node] RESOURCES=PROCS:2 " >> /var/spool/maui/maui.cfg
    echo "SRCFG[$node] CLASSLIST=dteam.sdj " >> /var/spool/maui/maui.cfg
done
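Since the loop appends directly to maui.cfg, it can be convenient to generate the reservation lines on standard output first and inspect them. The sketch below does that with a hypothetical helper, gen_sdj_reservations. It emits the same six SRCFG parameters per node as the loop above, and skipping blank lines and lines starting with # is a defensive assumption about the nodes file.

```shell
#!/bin/sh
# Hypothetical helper: print the SRCFG lines for every node in a nodes file.
# Arguments: nodes file, MAUI class name, number of reserved slots per node.
gen_sdj_reservations() {
    awk -v class="$2" -v procs="$3" '
        NF == 0 || /^#/ { next }    # skip blank lines and comments
        {
            node = $1
            printf "SRCFG[%s] HOSTLIST=%s\n", node, node
            printf "SRCFG[%s] PERIOD=INFINITY\n", node
            printf "SRCFG[%s] ACCESS=DEDICATED\n", node
            printf "SRCFG[%s] TASKCOUNT=1\n", node
            printf "SRCFG[%s] RESOURCES=PROCS:%s\n", node, procs
            printf "SRCFG[%s] CLASSLIST=%s\n", node, class
        }' "$1"
}
```

Once the output of gen_sdj_reservations /var/spool/pbs/server_priv/nodes dteam.sdj 2 looks right, it can be appended to /var/spool/maui/maui.cfg with >>.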
If a max_running attribute has been set for the queue, a corresponding parameter is also required in the MAUI configuration:
echo "CLASSCFG[dteam.sdj] MAXPROC=8" >> /var/spool/maui/maui.cfg
Lastly one must restart MAUI to effect the new configuration:
/etc/init.d/maui restart
lcg-CE
The dteam.sdj queue can be published like any other queue. However, because gLite 3.1 lacks a proper JDL attribute, access to the queue should be restricted to a specific VO. The site-info.def file should be modified to contain something to the following effect:
QUEUES="$VOS dteam.sdj"
DTEAM_SDJ_GROUP_ENABLE="dteam /VO=dteam/GROUP=/dteam/ROLE=production /VO=dteam/GROUP=/dteam/ROLE=lcgadmin"
and of course the CE software should be reconfigured accordingly.
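The variable name DTEAM_SDJ_GROUP_ENABLE above appears to be derived from the queue name by upper-casing it and replacing the dot with an underscore. Assuming that convention also holds for other queue names (dashes are treated the same way here, which is an assumption), a hypothetical helper such as the following could compute the name:

```shell
#!/bin/sh
# Hypothetical helper: derive the <QUEUE>_GROUP_ENABLE variable name from a
# queue name by upper-casing it and mapping '.' and '-' to '_'.
group_enable_var() {
    printf '%s_GROUP_ENABLE\n' \
        "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')"
}
```

With this sketch, group_enable_var dteam.sdj yields the variable name used in the site-info.def fragment above.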
Job Submission
The following JDL clause was used to send jobs to the SDJ queue:
Requirements = RegExp(".*jobmanager.*\.sdj$",other.GlueCEUniqueID) &&
(other.GlueCEPolicyMaxCPUTime > 5) &&
(other.GlueCEPolicyMaxWallClockTime > 15) &&
(other.GlueCEPolicyMaxRunningJobs == 0 || other.GlueCEStateRunningJobs <= other.GlueCEPolicyMaxRunningJobs);
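For testing, the clause above has to sit inside a complete JDL file. The sketch below uses a hypothetical helper, make_sdj_jdl, to print a minimal one. The executable and the sandbox file names are made-up placeholders; only the Requirements expression comes from this page.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal test JDL that targets SDJ queues.
# Everything except the Requirements expression is a placeholder.
make_sdj_jdl() {
    cat <<'EOF'
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
Requirements = RegExp(".*jobmanager.*\.sdj$", other.GlueCEUniqueID) &&
    (other.GlueCEPolicyMaxCPUTime > 5) &&
    (other.GlueCEPolicyMaxWallClockTime > 15) &&
    (other.GlueCEPolicyMaxRunningJobs == 0 || other.GlueCEStateRunningJobs <= other.GlueCEPolicyMaxRunningJobs);
EOF
}
```

The output can be saved with make_sdj_jdl > sdj-test.jdl and then submitted through the WMS in the usual way, for instance with glite-wms-job-submit -a sdj-test.jdl on a gLite 3.1 UI.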
The following JDL clause was used to keep jobs away from the SDJ queue:
Requirements = (other.GlueCEPolicyMaxCPUTime > 10) &&
(other.GlueCEPolicyMaxWallClockTime > 30);