Batch Job Management with Torque/OpenPBS
The batch system on titan uses OpenPBS, a free, customizable batch system. Users submit jobs with qsub from titan.physics.umass.edu, and the jobs are scheduled by a fair-share algorithm that prioritizes them based on all users' recent CPU utilization. This means, for example, that a user who wants to run a few quick jobs will not have to wait behind another user who already has hundreds of 10-hour jobs in the system.
Submitting jobs with qsub
Specify job resource requirements (e.g. time and memory) with the '-l' option.
To submit a job script myjob.csh requesting 8 hours of CPU time, use:
>
qsub -l cput=08:00:00 myjob.csh
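Resource requests can also live inside the job script as #PBS directive lines, which qsub reads like command-line options. The following is a minimal sketch, not the actual myjob.csh: it assumes /bin/sh rather than csh, the job name is hypothetical, and the fallback values are there only so that the script also runs outside the batch system.

```shell
#!/bin/sh
#PBS -S /bin/sh
#PBS -N myjob
#PBS -l cput=08:00:00
#PBS -j oe

# PBS starts a job in the user's home directory, so change to the
# directory the job was submitted from. Outside the batch system
# PBS_O_WORKDIR is unset, so fall back to the current directory.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_JOBID is set by the batch system; "interactive" is a placeholder
# (an assumption of this sketch) for runs outside of it.
job_id="${PBS_JOBID:-interactive}"
echo "Job $job_id running on $(uname -n)"

# The real work of the job would go here.
```

With the directives in the script, a plain "qsub myjob.sh" would request the same 8 hours of CPU time. Options given on the qsub command line take precedence over the directives.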
List job identifiers with qselect
Print the job identifiers of all jobs in the system owned by user bbrau:
>
qselect -u bbrau
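Because qselect prints one job identifier per line, its output is easy to feed to other tools. As a small sketch, the helper below counts the identifiers it reads on standard input. The name count_ids is made up for this example, and the function is written for /bin/sh.

```shell
# Count the job identifiers read from standard input, one per line.
# Blank lines are ignored; prints 0 when there is no input.
count_ids() {
    awk 'NF { n++ } END { print n + 0 }'
}
```

For example, "qselect -u bbrau -s R | count_ids" would print how many of bbrau's jobs are currently running, since the -s option restricts the selection to the given job states.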
Query the system with qstat
List all jobs:
>
qstat
and which nodes they're running on:
>
qstat -n
Show the full details of every job:
>
qstat -f
or of just one job, using its job identifier:
>
qstat -f 1908.titan.physics.umass.edu
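The full listing prints one "name = value" line per job attribute, so a single attribute can be picked out with a short filter. The sketch below assumes that layout. The helper name job_field is hypothetical, and the sketch does not rejoin long values that qstat wraps onto continuation lines.

```shell
# Print the value of one attribute from "qstat -f" output on standard input.
# Usage: qstat -f JOBID | job_field ATTRIBUTE
job_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)        # drop the leading indentation
            n = index(line, " = ")          # split at the first " = "
            if (!found && n > 0 && substr(line, 1, n - 1) == key) {
                print substr(line, n + 3)   # value of the first match only
                found = 1
            }
        }'
}
```

For example, "qstat -f 1908.titan.physics.umass.edu | job_field job_state" would print just the state letter of that job, and the attribute resources_used.cput gives the CPU time used so far.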
Learn about the batch system with qmgr
Print the server configuration:
>
qmgr -c 'print server'
Find out about node titan12:
>
qmgr -c 'print node titan12'
Node Status with qnodes
List them all:
>
qnodes
or just the ones that aren't up:
>
qnodes -l
Delete jobs with qdel
Delete one job using its job identifier:
>
qdel 1908.titan.physics.umass.edu
or use qselect to pick out and delete all of your jobs:
>
qdel `qselect -u bbrau`
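If qselect matches nothing, qdel is called with no arguments and complains. The sketch below guards against that and also narrows the selection to queued jobs with the -s option, leaving running jobs alone. The function name delete_queued_jobs is hypothetical, and the code is written for /bin/sh rather than csh.

```shell
# Delete all queued (not yet running) jobs belonging to one user.
# Usage: delete_queued_jobs USERNAME
delete_queued_jobs() {
    dq_user="$1"
    # -s Q restricts the selection to jobs in the queued state.
    dq_ids=$(qselect -u "$dq_user" -s Q)
    if [ -z "$dq_ids" ]; then
        echo "No queued jobs for $dq_user"
        return 0
    fi
    # Left unquoted on purpose so that each identifier becomes
    # a separate argument to qdel.
    qdel $dq_ids
}
```

Calling "delete_queued_jobs bbrau" would then remove only the jobs of bbrau that are still waiting in the queue.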
All the gory details are in:
[[http://www.doesciencegrid.org/public/pbs/pbs.v2.3_admin.pdf][The OpenPBS Administrator's Guide]]
And of course on titan, you can read the man pages for most of the commands:
>
man qstat
--
BenjaminBrau - 25-Mar-2010