Using the PBS Batch System

Last update on Thursday, 24-Jan-2008 16:47:04 CST.

Batch, or batch processing, is the capability of running jobs outside of the interactive login session. In this document, batch implies a complex subsystem that provides control over job scheduling and resource contention. On k2, the batch system is the Portable Batch System (PBS). PBS defines various queues, which are collections of ordered jobs lined up for execution. The use of the term "queue," however, does not imply that the ordering is "first in, first out." Each queue is defined by a set of attributes such as queue name, queue priority, queue resource limits, and job count limits. The batch system allows users to overcome the resource limits imposed on interactive (sometimes referred to as "command-line") processing, and it regulates the execution flow of jobs evenly and efficiently.

The interactive limit (CPU time) per login session on all systems is 20 minutes. Any violation of this limit will result in process termination. A user may use at most two processors simultaneously for interactive processing, and is expected to use fewer under heavy system loads. Exceptions to this policy will be considered by the staff on a case-by-case basis. Work that exceeds these limits must be submitted in "batch mode" as described below.

PBS Job Files

A PBS batch job script is a text file with PBS directives and Unix commands. The PBS directives are always at the beginning of the file and are specified in lines that start with the #PBS keyword and continue with other job specifications. These typically describe the job's characteristics (e.g. job name, job shell, etc.) and the resources (e.g., number of cpus, memory, etc.) it needs. There are also several PBS environment variables that you should be aware of.

The following is a sample batch job file for the PBS batch system on k2:

#PBS -N myjob 
#PBS -S /bin/tcsh 
#PBS -j oe
#PBS -l cput=4:00:00
#PBS -l ncpus=2,mem=1gb

cd $TMPDIR

cp $PBS_O_WORKDIR/inputfile1 .
cp $PBS_O_WORKDIR/inputfile2 .
cp $PBS_O_WORKDIR/myprog .

./myprog

cp outputfile $PBS_O_WORKDIR
qstat -f $PBS_JOBID

The explanation of each line is listed below:

Line                             Explanation
#PBS -N myjob                    The name of the batch job will be myjob.
#PBS -S /bin/tcsh                The tcsh shell will be used to interpret the batch job script.
#PBS -j oe                       The standard output and error streams will be merged into the standard output stream file. The standard output stream will be implicitly stored in $PBS_O_WORKDIR/jobname.oNNN where jobname is the name of the job and NNN is the job identifier.
#PBS -l cput=4:00:00
#PBS -l ncpus=2,mem=1gb          This job requests 4 hours of cpu time, 2 cpus, and 1 GB of physical memory.
cd $TMPDIR                       Make $TMPDIR the job's working directory.
cp $PBS_O_WORKDIR/inputfile1 .
cp $PBS_O_WORKDIR/inputfile2 .
cp $PBS_O_WORKDIR/myprog .       Copy the files to be used for the job from the job submission directory, $PBS_O_WORKDIR, to the $TMPDIR directory.
./myprog                         Execute the program myprog.
cp outputfile $PBS_O_WORKDIR     Copy the output file generated by the execution of myprog to the $PBS_O_WORKDIR directory.
qstat -f $PBS_JOBID              Print job information and statistics. You should examine the resources_used lines to understand the resource usage of your job.

The number of cpus specified for a job in a #PBS directive (-l ncpus=##) MUST be the same as that specified for the running of a program through its interface. Specifically, for MPI program the -np parameter of the mpirun command must be set equal to the value of ncpus above. Similarly, for OpenMP programs the value of the $OMP_NUM_THREADS environment variable must be set to the same value as ncpus. This requirement also applies for commercial application programs, such as Gaussian and ABAQUS. Two sample batch job files below illustrate the point.

Sample Batch Job File for Gaussian

#PBS -N sample -j oe
#PBS -S /bin/tcsh
#PBS -l cput=10:00:00,mem=500mb,ncpus=2

# Initialize environment 
setenv g03root /usr/local/g03
source $g03root/g03/bsd/g03.login

set echo # Show issued commands in output

# Copy input files to $TMPDIR
cp sample.com $TMPDIR

# Run Gaussian 03
cd $TMPDIR
g03 < sample.com

# Copy output file to home directory
cp sample.log $HOME

# Get CPU time and other info about job
qstat -f $PBS_JOBID

The Gaussian input and/or the Default.route file must specify the same number of cpus as the PBS ncpus argument. The job output will go to sample.oNNN where NNN is the job ID.

Sample Batch Job for ABAQUS

#PBS -N test_axi1 -S /bin/tcsh -j oe
#PBS -l ncpus=1,cput=22:00:00,mem=500mb,vmem=5gb

cd $TMPDIR
cp $PBS_O_WORKDIR/axi1.inp .

abaqus job=test_axi1 cpus=1 input=axi1.inp

cp test_axi1.* $PBS_O_WORKDIR

The ABAQUS cpus argument must match the PBS ncpus argument. The job output will go to test_axi1.oNNN where NNN is the job ID.
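The same matching rule applies to OpenMP programs, where $OMP_NUM_THREADS must equal ncpus. The following is a minimal sketch, not a tested k2 job: it writes a sample job file to a temporary location and checks that the two values agree before anything is submitted. The job name omp_test, the program name my_omp_prog, the /bin/bash path, and the resource values are all assumptions made for illustration.

```shell
# Write a sample OpenMP job file; omp_test and my_omp_prog are hypothetical names.
jobfile=${TMPDIR:-/tmp}/omp_test.job
cat > "$jobfile" <<'EOF'
#PBS -N omp_test -S /bin/bash -j oe
#PBS -l ncpus=4,cput=8:00:00,mem=1gb

# The thread count must equal the ncpus value requested above.
export OMP_NUM_THREADS=4

cd $TMPDIR
cp $PBS_O_WORKDIR/my_omp_prog .

./my_omp_prog

cp outputfile $PBS_O_WORKDIR
EOF

# Extract both values from the job file and compare them before submitting.
ncpus=$(sed -n 's/^#PBS .*ncpus=\([0-9][0-9]*\).*/\1/p' "$jobfile")
threads=$(sed -n 's/^export OMP_NUM_THREADS=\([0-9][0-9]*\).*/\1/p' "$jobfile")
if [ "$ncpus" = "$threads" ]; then
    echo "ncpus and OMP_NUM_THREADS match: $ncpus"
else
    echo "mismatch: ncpus=$ncpus OMP_NUM_THREADS=$threads"
fi
```

Once the check passes, the file would be submitted with qsub in the usual way. For an MPI program the same idea applies to the -np argument of mpirun.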

Job Submission: The qsub command

Use the qsub command to submit a job as shown below:

k2% qsub myjob
1234.k2

One of the first things that happens when a job is submitted is that PBS assigns it a unique job id, which qsub prints (1234.k2 in the example above).

Job Submission Options

The more commonly used options for submitting batch jobs are listed below:

Option Description
-e path Defines the path to be used for the standard error stream of the batch job.
-j join A join argument of oe merges the standard output and standard error streams into standard output. A join argument of eo merges the two streams into standard error. If the join argument is n or the option is not specified, the two streams will be written to two separate files.
-l resource_list Specifies resources and the maximum level of each that the job may use. Commonly used resources are ncpus, cput, walltime, mem, vmem, and file. Resources that are not explicitly specified take the default values in effect for each queue. Additional sources of information are the listings of the qlimit command, the qstat -Qf queue command and the pbs_resources man page.
-m mail_options Specifies the conditions under which the server will send an email message about the job.
-N name Declares a name for the job.
-o path Defines the path to be used for the standard output stream of the batch job.
-S shell Declares the shell that interprets the job script. We strongly recommend that you use the bash shell.
-v variable_list Any environment variables specified in this list will be exported from the qsub command's environment to the job's environment.
-V All environment variables will be exported from the qsub command's environment to the job's environment. We recommend that you use the -v variable_list option instead to export only the necessary environment variables.
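The options above can be given on the qsub command line as well as in #PBS directives. The following is a minimal sketch that uses only options from the table. By default it only prints the command (a dry run) so that it can be checked first. The SUBMIT variable is a hypothetical convenience and not part of PBS, and the resource values are examples.

```shell
# Dry run by default: prints the command. Set SUBMIT=qsub to really submit.
SUBMIT=${SUBMIT:-echo qsub}

# Name the job, merge stdout and stderr, and request 4 hours of cpu time,
# 2 cpus and 1 GB of memory for the job file "myjob".
cmd_output=$($SUBMIT -N myjob -j oe -l cput=4:00:00 -l ncpus=2,mem=1gb myjob)
echo "$cmd_output"
```

When SUBMIT is set to qsub, the printed value is the job id returned by PBS rather than the command line.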

Queue Structure

A queue is a software structure through which PBS manages the processing of jobs. Batch queues are defined by a number of parameters, of which the most important are resource limits. There are several such "execution" queues from which PBS schedules jobs for execution. Jobs are routed to the appropriate queue based, for the most part, on a job's resource limit specifications. Some queues can be used only by special permission. These are generally the high-cpu queues (bench8, bench16, bench32, bench48), but xlt_xlm8 and special also require special permission. The special-access queues must be used only for jobs that match the queue's special characteristics. You can view the output of the qlimit command at this link, which is updated every five minutes.

PBS Resources

The following resources are the more commonly used in the PBS batch system on k2. Additional sources of information are the listings of the qlimit command, the qstat -Qf queue command and the pbs_resources man page.

Resource Explanation
WALLTIME Maximum wall-clock time that the job may run, measured from the beginning of execution. The format is hh:mm:ss.
CPUTIME Maximum amount of cpu time that the job can consume. The format is hh:mm:ss. ALL jobs should specify cpu time. Failure to specify cpu time will cause PBS to assign a job just 2 hours, the default value.
MEM Maximum amount of physical resident memory that the job can occupy.
PMEM Maximum amount of physical resident memory that any single process belonging to the job can occupy.
VMEM Maximum virtual memory per job.
PVMEM Maximum virtual memory per process in the job.
NCPUS Maximum number of cpus allowed per job.
FILE Maximum size a file can attain per job.
MAXR Maximum number of jobs that can be executing concurrently in a given queue.
USERR Maximum number of jobs a user may run concurrently in a given queue.

Job Monitoring and PBS commands

The following commands are for common tasks involving the PBS batch system on k2 and titan. More information about batch processing can be obtained from the following man pages: pbs, qsub, qstat, and qdel.

Command                     Task
qsub jobfile                Submit a job
qstat -r                    Show running jobs
qstat -rs                   Show running jobs and when they began executing
qstat -i                    Show the jobs that are not running
qstat -is                   Show the jobs that are not running and why they are not running
qstat -q                    Show the status of all the queues
qaccess                     Show which queues you have access to
qstat -f jobid              Show detailed information for a given job
qstat -Qf [queue_name]      Show detailed information for all queues or a specific queue
qstat -a                    Show all jobs
qstat -u user               Show all jobs for a given user
p_qstat jobid               Show the processes under a given running job
bmonitor                    Show the status of the batch system in a manner like top. Has a built-in help screen for available commands.
qdel jobid                  Delete a given job
qlimit                      Show the job and queue limits of the various execution queues
findjobs -n N -u username   Find all jobs over the last N days for a given user
tracejob -n N -w 80 jobid   Show the job history over the last N days for a given job. Format the output to 80 columns. Note that the -w flag is necessary when the output is sent to a pipe or a file.

Job Accounting

The qstat -f jobid command

The qstat -f command provides detailed information about a job. Some notable fields in the output are resources_used, queue, qtime, comment, and etime.

k2% qstat -f 80326
Job Id: 80326.k2
    Job_Name = Ir-OL-start
    Job_Owner = gooduser@k2.tamu.edu
    resources_used.cpupercent = 394
    resources_used.cput = 31:44:48
    resources_used.mem = 757008kb
    resources_used.ncpus = 4
    resources_used.vmem = 1665776kb
    resources_used.walltime = 08:16:22
    job_state = R
    queue = mt_mm
    server = k2
    Checkpoint = u
    ctime = Thu Aug 26 11:37:11 2004
    Error_Path = k2.tamu.edu:/home/gooduser/errfile
    exec_host = k2/0*4
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Fri Aug 27 01:54:51 2004
    Output_Path = k2:/home/gooduser/outfile
    Priority = 0
    qtime = Thu Aug 26 11:37:11 2004
    Rerunable = True
    Resource_List.cput = 50:00:00
    Resource_List.file = 20gb
    Resource_List.mem = 2gb
    Resource_List.ncpus = 4
    Resource_List.nice = 15
    Resource_List.pcput = 50:00:00
    Resource_List.pmem = 1gb
    Resource_List.pvmem = 128gb
    Resource_List.vmem = 128gb
    session_id = 2451749
    substate = 42
    Variable_List = PBS_O_HOME=/home/gooduser,PBS_O_LOGNAME=gooduser,
        PBS_O_PATH=/usr/sbin:/usr/bsd:/sbin:/usr/bin:/usr/bin/X11:.:/usr/local
        /bin:/usr/freeware/bin:.:/usr/local/bin:/usr/free
        ware/bin:,PBS_O_MAIL=/usr/mail/gooduser,
        PBS_O_SHELL=/bin/tcsh,PBS_O_TZ=CST6CDT,PBS_O_HOST=k2.tamu.edu,
        PBS_O_WORKDIR=/home/gooduser/,PBS_O_SYSTEM=IRIX64,
        PBS_O_QUEUE=shared
    euser = gooduser
    egroup = user
    hashname = 80326.k2
    queue_rank = 69374
    queue_type = E
    comment = Job run on node k2 - at Fri Aug 27 at 01:54
    alt_id = jid=0x4ca7000000006aae,ash=0x4ca7ffff000079b8
    etime = Thu Aug 26 11:37:11 2004
    run_count = 1
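
The resources_used fields can be turned into a rough figure for how well a job used its cpus. The following is a minimal sketch, assuming times in the hh:mm:ss format shown above. The helper name to_seconds is hypothetical, and the values are copied from the sample output rather than read from a live qstat call. Dividing cpu time by wall-clock time gives the average number of cpus kept busy, which can be compared with Resource_List.ncpus.

```shell
# Convert hh:mm:ss to seconds (to_seconds is a hypothetical helper name).
# awk reads "08" as decimal 8, which avoids octal errors in shell arithmetic.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

cput=$(to_seconds 31:44:48)        # resources_used.cput
walltime=$(to_seconds 08:16:22)    # resources_used.walltime
ncpus=4                            # Resource_List.ncpus

# Average cpu usage as a percentage of one cpu, then per requested cpu
# (integer arithmetic, so results are rounded down).
busy_percent=$(( cput * 100 / walltime ))
per_cpu_percent=$(( busy_percent / ncpus ))

echo "cpu seconds: $cput, wall seconds: $walltime"
echo "average usage: ${busy_percent}% of one cpu, ${per_cpu_percent}% per requested cpu"
```

For the sample job this gives 383% of one cpu, or about 95% of each of the four requested cpus. A much lower per-cpu figure would suggest that the job requested more cpus than it actually used.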

Common PBS Environment Variables

PBS Environment Variable Description
$PBS_O_WORKDIR The absolute path of the directory from which the job was submitted.
$PBS_JOBID The job identifier assigned to the job by the batch system. The job identifier will typically be nnn.k2 where nnn is a positive non-zero integer.
$PBS_JOBNAME The job name supplied by the user.
$PBS_QUEUE The name of the queue from which the job is executed.
$TMPDIR A job's default working directory is $HOME. That is frequently undesirable because of space limitations and lower I/O performance. Going to $TMPDIR (=/work/$PBS_JOBID), which is created at a job's start and deleted at its end, affords a large disk area and, typically, better I/O performance. You must explicitly save any files you need before job completion. Preferably, in batch jobs you should save such files on local disk areas, such as /scratch/$USER or $HOME. File transfers in a batch job involving the tape archive or a remote host should be strongly avoided because of the possible long delays they can cause.
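
Because $TMPDIR is deleted when the job ends, results must be copied out before the script finishes. The following is a minimal sketch of such a stage-out step for job scripts that run under a Bourne-type shell such as bash (it will not work as written under tcsh). The helper name save_results and the destination /scratch/$USER/myjob are assumptions made for illustration.

```shell
# Stage-out sketch: copy results from the job's work area to a permanent area.
# save_results is a hypothetical helper; pass the destination directory first,
# followed by the files to keep.
save_results() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp "$f" "$dest"/ || return 1
        else
            echo "warning: $f not found, skipped" >&2
        fi
    done
}

# In a job script, after the program finishes and before the script ends:
#   cd $TMPDIR
#   save_results /scratch/$USER/myjob outputfile
```

A missing file produces a warning instead of stopping the copy, so one absent output does not prevent the remaining files from being saved.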

Policies and Best Practices

Batch system policies are approved by the Steering Committee, review@sc.tamu.edu, and may on occasion change to reflect changing needs and load conditions. Your adherence to the policies below will be appreciated. A little care on your part in doing certain things right will go a long way toward keeping k2 and titan running efficiently and fairly for everyone. Very reluctantly, in order to maintain fairness and efficiency, we will on occasion terminate jobs prematurely. The subsection Abnormal Job Termination lists common reasons for which the staff may terminate a job.

Setting Appropriate Job Resource Limits

You should not, as a matter of practice, set resource levels for your job to maximal queue values unless you actually need to. Larger settings are harder to satisfy and, hence, will delay your job's execution on a busy system. This is particularly true when the resource is memory and/or the number of CPUs. Set job resource limits to the lowest possible level consistent with a successful completion. On this point, for example, you need to make sure that if you run commercial code, say, Gaussian, ABAQUS, or FLUENT, the native/internal resource limits which you specify for them and the resource limits you specify in the #PBS -l directive MUST match. If you need help in setting the latter, please contact the Help Desk for assistance.

Invalid Parallel Batch Jobs

Jobs requesting multiple cpus must use multiple cpus simultaneously from a single command. Running multiple independent commands in the background in a batch job script is NOT parallel processing and is not permitted. So that there is NO misunderstanding, the following example illustrates invalid parallel processing, which is NOT permitted.

command1 &
command2 &

Abnormal Job Termination

The SC staff reserves the right to terminate batch jobs when one or a combination of the following occurs:

  1. Use by your program of a larger number of cpus than its parallel efficiency warrants.
  2. Use by your program of a smaller number of cpus than that specified through PBS (-l ncpus=##). This is a particularly unacceptable practice since it wastes resources that might otherwise be used by others. When you request, say, four cpus by setting -l ncpus=4 in the #PBS directives, PBS sets aside four cpu slots. It knows nothing about the actual number of cpus that your program will use.
  3. Use/abuse of a special access queue (e.g., xlt_xlm8, bench8) to run a job that could very well run in one of the common queues.
  4. Excessive I/O with large files, which in turn overwhelms memory due to excessive file caching.
  5. Any use of large amounts of disk and/or memory that causes a significant disruption to the smooth operation of the system.
  6. Use of a smaller or larger number of cpus than the number specified through PBS (-l ncpus=##).
  7. Delayed file transfers with source or destination hosts that are remote.

Queued Jobs Not Executing

The batch system has limits on the total number of resources a user may use and the total number of jobs a user can run. Also, each queue has limits on the total number of jobs it can run and a limit on the number of jobs it can run per user.

You may find that resources (e.g., cpus) are available but your job is still queued because of one of these limits. Please use the 'qstat -s jobid' command to see why your job is still queued.

Jobs Using Files From the Tape Archive

If your job requires files from the tape archive (or a remote host), we recommend that you first manually copy these files to a local area, such as your /scratch directory on k2 or the /tmp directory on titan, before you submit your job. The objective here is to avoid possible delays during batch processing.

Starving Jobs and Backfilling

Queued jobs may become "starved" when delayed by other long-running jobs for some time. If the batch system cannot schedule starving jobs due to a resource or queue limit, it will attempt to schedule other non-starving queued jobs given the available resources. This is known as backfilling.

Additional Information

More information about batch processing can be obtained from the following man pages: pbs, qsub, qstat, qdel, and pbs_resources.