Queue simulation in Condor
This recipe explains how to mimic PBS-style queues when using Condor as the batch system. There are multiple approaches to simulate this behaviour:
A) Deploying multiple schedulers
A Condor CE is, in fact, a scheduler with a single queue. We can exploit that and set up more than one scheduler, each on a machine of its own. The queue part of the CEID (<resource>:<port>/cream-<batchsystem>-<queue>) is then used to redirect jobs to the specified scheduler/queue. This is already implemented and works fine.
For instance, we can set up 3 schedulers (queue1, queue2, queue3), each on a machine whose hostname matches the queue name. Then we can send jobs to any of them like this:
glite-ce-job-submit -r vce03.pic.es:8443/cream-condor-queue1 [...] or
glite-ce-job-submit -r vce03.pic.es:8443/cream-condor-queue2 [...] or
glite-ce-job-submit -r vce03.pic.es:8443/cream-condor-queue3
CREAM and the condor_submit.sh script will call condor_submit, passing along the name of the queue as the -name argument. For instance:
condor_submit job.submit -n queue1
That command will look for a scheduler on the host queue1 and send it the job.
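As a quick sanity check (not part of the recipe itself), the collector can be asked which schedulers it knows about:
# List the name of every scheduler daemon registered in the pool;
# queue1, queue2 and queue3 should all show up
condor_status -schedd -format "%s\n" Name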
What's more, we don't even have to use the real hostname of the machines. There is a configuration attribute called SCHEDD_NAME that defines the name of the scheduler. For example, we can implement queue1 on top of a machine called machine1. We only have to define
SCHEDD_NAME = "queue1"
and then send the jobs as
glite-ce-job-submit -r vce03.pic.es:8443/cream-condor-queue1@machine1
This way, every scheduler/queue can implement different policies regarding its jobs, users, etc.
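As a rough sketch, the local Condor configuration on machine1 could contain something like the following (the DAEMON_LIST and job-limit values are illustrative assumptions; only SCHEDD_NAME comes from the recipe):
# machine1: run a dedicated scheduler that answers to the name "queue1"
SCHEDD_NAME = "queue1"
# Daemons needed on a scheduler/submit node
DAEMON_LIST = MASTER, SCHEDD
# Example of a per-queue policy local to this scheduler
MAX_JOBS_RUNNING = 2000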
B) Implement «virtual queues» using attributes (recommended)
A better way of doing this, and one more in line with the philosophy of Condor, is to use attributes. Condor attributes are the pillar of the entire batch system: they provide a powerful framework to implement extensible, flexible and very fine-grained policies. For this method to work, we have to patch CREAM's condor_submit.sh script. In that script, as described in the previous approach, the queue part of the CEID is used as the scheduler name. Here we override that behaviour and instead add the queue of the CEID as a custom attribute of the job; the job is then submitted to the default scheduler (usually on localhost). For instance, when a job is submitted with this command:
glite-ce-job-submit -r vce03.pic.es:8443/cream-condor-queue1 [...]
condor_submit.sh gets called and generates a submit file that is handed to Condor. In order to let the scheduler know which queue the job comes from, we need to insert a custom attribute. This attribute specifies the virtual queue and allows the scheduler to apply the corresponding policies. Before we get to that, condor_submit.sh has to be modified. These are the required modifications:
[...]
# Hang around for 1 day (86400 seconds) ?
# Hang around for 30 minutes (1800 seconds) ?
leave_in_queue = JobStatus == 4 && (CompletionDate =?= UNDEFINED || CompletionDate == 0 || ((CurrentTime - CompletionDate) < 1800))
EOF
# Add the custom queue attribute
if [ ! -z "$queue" ]; then
    echo "+BatchQueue=\"$queue\"" >> $submit_file
fi
cat >> $submit_file << EOF
queue 1
EOF
[...]
#echo $queue | grep "/" >&/dev/null
## If there is a "/" we need to split out the pool and queue
#if [ "$?" == "0" ]; then
#    pool=${queue#*/}
#    queue=${queue%/*}
#fi
#if [ -z "$queue" ]; then
target=""
#else
#    if [ -z "$pool" ]; then
#        target="-name $queue"
#    else
#        target="-pool $pool -name $queue"
#    fi
#fi
[...]
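With these changes, a job sent to queue1 ends up with a submit file roughly like the following sketch (the executable and file names are illustrative; only the leave_in_queue expression and the +BatchQueue line come from the patched script):
universe       = vanilla
executable     = /path/to/cream_job_wrapper.sh
output         = job.out
error          = job.err
log            = job.log
leave_in_queue = JobStatus == 4 && (CompletionDate =?= UNDEFINED || CompletionDate == 0 || ((CurrentTime - CompletionDate) < 1800))
+BatchQueue = "queue1"
queue 1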
At this point, every job is labelled in its submit file with an attribute that specifies its «virtual queue». Using this attribute (BatchQueue) we can, for example, enforce a walltime limit for each virtual queue: Short (2 hours), Medium (12 hours) and Long (48 hours). This is the configuration added to Condor to apply these policies:
# Queue simulation
IsLongJob = ( TARGET.BatchQueue =?= "long" )
IsMediumJob = ( TARGET.BatchQueue =?= "medium" )
IsShortJob = ( ( $(IsLongJob) == FALSE ) && ( $(IsMediumJob) == FALSE ) )
LongJobWallTimeLimit = ( 48 * 60 * 60 )
MediumJobWallTimeLimit = ( 12 * 60 * 60 )
ShortJobWallTimeLimit = ( 2 * 60 * 60 )
RemoveLongJob = ( TARGET.RemoteWallClockTime > $(LongJobWallTimeLimit) )
RemoveMediumJob = ( TARGET.RemoteWallClockTime > $(MediumJobWallTimeLimit) )
RemoveShortJob = ( TARGET.RemoteWallClockTime > $(ShortJobWallTimeLimit) )
SYSTEM_PERIODIC_REMOVE = ( ( $(IsLongJob) && $(RemoveLongJob) ) \
|| ( $(IsMediumJob) && $(RemoveMediumJob) ) \
|| ( $(IsShortJob) && $(RemoveShortJob) ) )
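A quick way to test these limits without going through CREAM is to hand the local scheduler a submit file with the attribute set explicitly; a minimal sketch:
# test.submit -- hypothetical job placed in the "long" virtual queue
universe    = vanilla
executable  = /bin/sleep
arguments   = 600
output      = test.out
error       = test.err
log         = test.log
+BatchQueue = "long"
queue 1
After condor_submit test.submit, the job falls under the 48-hour limit defined above instead of the default Short one.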
Note that jobs which do not specify a BatchQueue attribute are treated as belonging to the Short queue. Finally, there is one issue not resolved in this recipe: publishing all this information to LDAP. That still has to be worked on. For any questions, send me an email.
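As a final sketch, the virtual queues can also be inspected at runtime through the same attribute:
# Idle/running jobs currently assigned to the "long" virtual queue
condor_q -constraint 'BatchQueue == "long"'
# Jobs already removed (e.g. by SYSTEM_PERIODIC_REMOVE) appear as removed in the history
condor_history -constraint 'JobStatus == 3'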
--
PauTallada - 04-Nov-2009