Batch, or batch processing, is the capability of running jobs outside of the
interactive login session. In this document, batch implies a complex subsystem
that provides control over job scheduling and resource contention. On
k2, the batch system is part of the Portable Batch System (PBS). PBS
defines various queues, which are collections of ordered jobs lined up for
execution. The term "queue", however, does not imply that the ordering is
"first in, first out." Each queue is defined by a set of attributes such as
queue name, queue priority, queue resource limits, and job count limits. The
batch system allows users to overcome the resource limits imposed on interactive
(sometimes referred to as "command-line") processing, and it regulates the
execution flow of jobs evenly and efficiently.

The interactive limit on CPU time per login session is 20 minutes on all
systems. Any process that violates this limit will be terminated. A
user may use at most two processors simultaneously for interactive
processing and is expected to use fewer under heavy system loads.
Exceptions to this policy will be considered by the staff on a case-by-case basis.
Work that exceeds these limits must be submitted in "batch mode" as
described below.
PBS Job Files
A PBS batch job script is a text file containing PBS directives and Unix
commands. The PBS directives always come at the beginning of the file, on
lines that start with the #PBS keyword followed by job specifications.
They typically describe the job's characteristics (e.g., job name, job shell)
and the resources it needs (e.g., number of cpus, memory). There are also
several PBS environment variables that you should be aware of (see Common PBS
Environment Variables).
The following is a sample batch job file for the PBS batch system on k2:
#PBS -N myjob
#PBS -S /bin/tcsh
#PBS -j oe
#PBS -l cput=4:00:00
#PBS -l ncpus=2,mem=1gb
cd $TMPDIR
cp $PBS_O_WORKDIR/inputfile1 .
cp $PBS_O_WORKDIR/inputfile2 .
cp $PBS_O_WORKDIR/myprog .
./myprog
cp outputfile $PBS_O_WORKDIR
qstat -f $PBS_JOBID
Each line is explained below:
| #PBS -N myjob | The name of the batch job will be myjob. |
| #PBS -S /bin/tcsh | The tcsh shell will be used to interpret the batch job script. |
| #PBS -j oe | The standard output and error streams will be merged into the standard output stream file. By default, the standard output stream is stored in $PBS_O_WORKDIR/jobname.oNNN, where jobname is the name of the job and NNN is the job identifier. |
| #PBS -l cput=4:00:00 and #PBS -l ncpus=2,mem=1gb | This job requests 4 hours of cpu time, 2 cpus, and 1 GB of physical memory. |
| cd $TMPDIR | Make $TMPDIR the job's working directory. |
| cp $PBS_O_WORKDIR/inputfile1 . (and the two cp lines that follow it) | Copy the files to be used for the job from the job submission directory, $PBS_O_WORKDIR, to the $TMPDIR directory. |
| ./myprog | Execute the program myprog. |
| cp outputfile $PBS_O_WORKDIR | Copy the output file generated by the execution of myprog to the $PBS_O_WORKDIR directory. |
| qstat -f $PBS_JOBID | Print job information and statistics. You should examine the resources_used lines to understand the resource usage of your job. |
The number of cpus specified for a job in a #PBS directive (-l ncpus=##)
MUST be the same as the number the program is told to use through its own
interface. Specifically, for an MPI program the -np parameter of the mpirun
command must equal the value of ncpus. Similarly, for OpenMP
programs the $OMP_NUM_THREADS environment variable must be
set to the same value as ncpus. This requirement also applies to commercial
application programs, such as Gaussian and ABAQUS. The two sample batch job files
below illustrate the point.
Sample Batch Job File for Gaussian
#PBS -N sample -j oe
#PBS -S /bin/tcsh
#PBS -l cput=10:00:00,mem=500mb,ncpus=2
# Initialize environment
setenv g03root /usr/local/g03
source $g03root/g03/bsd/g03.login
set echo # Show issued commands in output
# Copy input files to $TMPDIR
cp sample.com $TMPDIR
# Run Gaussian 03
cd $TMPDIR
g03 < sample.com > sample.log
# Copy output file to home directory
cp sample.log $HOME
# Get CPU time and other info about job
qstat -f $PBS_JOBID
The Gaussian input and/or the Default.route file must specify the same number
of cpus as the PBS ncpus argument. The job output will go to sample.oNNN, where
NNN is the job ID.
Sample Batch Job for ABAQUS
#PBS -N test_axi1 -S /bin/tcsh -j oe
#PBS -l ncpus=1,cput=22:00:00,mem=500mb,vmem=5gb
cd $TMPDIR
cp $PBS_O_WORKDIR/axi1.inp .
abaqus job=test_axi1 cpus=1 input=axi1.inp
cp test_axi1.* $PBS_O_WORKDIR
The ABAQUS cpus argument must match the PBS ncpus argument. The job output
will go to test_axi1.oNNN, where NNN is the job ID.
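The same matching rule applies to MPI jobs. As a rough aid, the sketch below checks that a job file's ncpus request and its mpirun -np value agree before the job is submitted. The function name check_ncpus is hypothetical and not part of PBS. The patterns assume one ncpus=N request on a #PBS line and one "mpirun -np N" command in the file, and OpenMP settings are not checked.

```shell
# Print "ok" if the ncpus value in the #PBS directives equals the -np value
# given to mpirun in the same job file; otherwise report the mismatch.
check_ncpus() {
    jobfile=$1
    # First ncpus=N found on a #PBS line.
    pbs_n=$(sed -n '/^#PBS/s/.*ncpus=\([0-9][0-9]*\).*/\1/p' "$jobfile" | head -n 1)
    # First "-np N" found on a line that mentions mpirun.
    mpi_n=$(sed -n '/mpirun/s/.*-np  *\([0-9][0-9]*\).*/\1/p' "$jobfile" | head -n 1)
    if [ -n "$pbs_n" ] && [ "$pbs_n" = "$mpi_n" ]; then
        echo "ok"
    else
        echo "mismatch (PBS ncpus=${pbs_n:-unset}, mpirun -np ${mpi_n:-unset})"
        return 1
    fi
}
```

One possible use is to submit only when the check passes, for example check_ncpus jobfile && qsub jobfile.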
Job Submission: The qsub command
Use the qsub command to submit a job, for example qsub jobfile, where jobfile
is the name of the batch job file. One of the first things that happens when a
job is submitted is that PBS assigns it a unique job id.
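Because qsub writes the new job id to standard output, a script can capture it for later use with qstat or qdel. The following is a minimal sketch; the function name submit_job is hypothetical.

```shell
# Submit a job file and report the identifier that qsub prints on success.
submit_job() {
    jobfile=$1
    # qsub prints the job id on stdout; a non-zero exit status means failure.
    jobid=$(qsub "$jobfile") || { echo "qsub failed for $jobfile" >&2; return 1; }
    echo "Submitted $jobfile as job $jobid"
}
```

After a call such as submit_job jobfile, the variable jobid holds the identifier and can be passed to qstat -f.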
Job Submission Options
The more commonly used options for submitting batch jobs are listed below:
| -e path | Defines the path to be used for the standard error stream of the batch job. |
| -j join | A join argument of oe merges the standard output and standard error streams into standard output. A join argument of eo merges the two streams into standard error. If the join argument is n, or the option is not specified, the two streams are written to two separate files. |
| -l resource_list | Specifies the resources the job needs and the maximum amount of each that it may use. Commonly used resources are ncpus, cput (cpu time), walltime, mem, vmem, and file. Resources that are not explicitly specified take the default values in effect for the queue. Additional sources of information are the output of the qlimit command, the qstat -Qf queue command, and the pbs_resources man page. |
| -m mail_options | Specifies the conditions under which the server will send an email message about the job. |
| -N name | Declares a name for the job. |
| -o path | Defines the path to be used for the standard output stream of the batch job. |
| -S shell | Declares the shell that interprets the job script. We strongly recommend that you use the bash shell. |
| -v variable_list | Exports the environment variables named in this list from the qsub command's environment to the job's environment. |
| -V | Exports all environment variables from the qsub command's environment to the job's environment. We recommend that you use the -v variable_list option instead, so that only the necessary environment variables are exported. |
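The same options can be given on the qsub command line instead of in #PBS lines. The sketch below submits a job file with the settings of the earlier sample job and exports a single variable with -v. The wrapper name submit_sample, the variable INPUT_DIR, and its default value are hypothetical and serve only as illustration.

```shell
# Hypothetical wrapper: submit a job file with the sample job's settings
# passed as qsub options rather than as #PBS directives.
submit_sample() {
    # INPUT_DIR is an assumed variable that the job script would read.
    INPUT_DIR=${INPUT_DIR:-$HOME/inputs}
    export INPUT_DIR
    qsub -N myjob -j oe \
         -l cput=4:00:00 -l ncpus=2,mem=1gb \
         -v INPUT_DIR \
         "$1"
}
```

Naming the variable in -v, rather than using -V, keeps the job's environment limited to what the script actually needs.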
Queue Structure
A queue is a software structure through which PBS manages the processing of
jobs. Batch queues are defined by a number of parameters, of which the most
important are resource limits. There are several such "execution" queues from
which PBS schedules jobs for execution. Jobs are routed to the appropriate
queue based, for the most part, on a job's resource limit specifications.
Some queues can be used only by special permission. These are generally the
high-cpu queues (bench8, bench16, bench32, and bench48), but xlt_xlm8 and special
also require special permission. The special-access
queues must be used only for jobs that match the queues' special
characteristics. The output of the qlimit command is also available online
and is updated every five minutes.
PBS Resources
The following resources are the more commonly used ones in the PBS batch
system on k2. Additional sources of information are the output
of the qlimit command, the qstat -Qf queue command, and the
pbs_resources man page.
| WALLTIME | Maximum wall-clock time the job may run, measured from the beginning of execution. The format is hh:mm:ss. |
| CPUTIME | Maximum amount of cpu time that the job can consume. The format is hh:mm:ss. ALL jobs should specify cpu time. A job that does not specify cpu time is assigned the default value of just 2 hours. |
| MEM | Maximum amount of physical resident memory that the job can occupy. |
| PMEM | Maximum amount of physical resident memory that any single process belonging to the job can occupy. |
| VMEM | Maximum virtual memory per job. |
| PVMEM | Maximum virtual memory per process in the job. |
| NCPUS | Maximum number of cpus allowed per job. |
| FILE | Maximum size a file can attain per job. |
| MAXR | Maximum number of jobs that can be executing concurrently in a given queue. |
| USERR | Maximum number of jobs a user may run concurrently in a given queue. |
Job Monitoring and PBS commands
The following commands cover common tasks involving the PBS batch
system on k2 and titan. More information about batch processing can be obtained
from the following man pages: pbs, qsub, qstat, and qdel.
| Submit a job | qsub jobfile |
| Show running jobs | qstat -r |
| Show running jobs and when they began executing | qstat -rs |
| Show the jobs that are not running | qstat -i |
| Show the jobs that are not running and why they are not running | qstat -is |
| Show the status of all the queues | qstat -q |
| Show which queues you have access to | qaccess |
| Show detailed information for a given job | qstat -f jobid |
| Show detailed information for all queues or a specific queue | qstat -Qf [queue_name] |
| Show all jobs | qstat -a |
| Show all jobs for a given user | qstat -u user |
| Show the processes under a given running job | p_qstat jobid |
| Show the status of the batch system in a manner like top. Has a built-in help screen for available commands. | bmonitor |
| Delete a given job | qdel jobid |
| Show the job and queue limits of the various execution queues | qlimit |
| Find all jobs over the last N days for a given user | findjobs -n N -u username |
| Show the job history over the last N days for a given job, formatted to 80 columns. Note that the -w flag is necessary when the output is sent to a pipe or a file. | tracejob -n N -w 80 jobid |
Job Accounting
The qstat -f jobid command
The qstat -f command provides detailed information about a job. Some notable fields
in the output are resources_used, queue, qtime, comment, and etime.
k2% qstat -f 80326
Job Id: 80326.k2
Job_Name = Ir-OL-start
Job_Owner = gooduser@k2.tamu.edu
resources_used.cpupercent = 394
resources_used.cput = 31:44:48
resources_used.mem = 757008kb
resources_used.ncpus = 4
resources_used.vmem = 1665776kb
resources_used.walltime = 08:16:22
job_state = R
queue = mt_mm
server = k2
Checkpoint = u
ctime = Thu Aug 26 11:37:11 2004
Error_Path = k2.tamu.edu:/home/gooduser/errfile
exec_host = k2/0*4
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = a
mtime = Fri Aug 27 01:54:51 2004
Output_Path = k2:/home/gooduser/outfile
Priority = 0
qtime = Thu Aug 26 11:37:11 2004
Rerunable = True
Resource_List.cput = 50:00:00
Resource_List.file = 20gb
Resource_List.mem = 2gb
Resource_List.ncpus = 4
Resource_List.nice = 15
Resource_List.pcput = 50:00:00
Resource_List.pmem = 1gb
Resource_List.pvmem = 128gb
Resource_List.vmem = 128gb
session_id = 2451749
substate = 42
Variable_List = PBS_O_HOME=/home/gooduser,PBS_O_LOGNAME=gooduser,
PBS_O_PATH=/usr/sbin:/usr/bsd:/sbin:/usr/bin:/usr/bin/X11:.:/usr/local
/bin:/usr/freeware/bin:.:/usr/local/bin:/usr/free
ware/bin:,PBS_O_MAIL=/usr/mail/gooduser,
PBS_O_SHELL=/bin/tcsh,PBS_O_TZ=CST6CDT,PBS_O_HOST=k2.tamu.edu,
PBS_O_WORKDIR=/home/gooduser/,PBS_O_SYSTEM=IRIX64,
PBS_O_QUEUE=shared
euser = gooduser
egroup = user
hashname = 80326.k2
queue_rank = 69374
queue_type = E
comment = Job run on node k2 - at Fri Aug 27 at 01:54
alt_id = jid=0x4ca7000000006aae,ash=0x4ca7ffff000079b8
etime = Thu Aug 26 11:37:11 2004
run_count = 1
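The resources_used lines can be turned into a rough utilization figure: cpu time divided by wall-clock time times the number of cpus. The sketch below reads qstat -f output on standard input and prints that ratio as a whole percent. The function name cpu_utilization is hypothetical, and the figure is only an approximate indicator, not a measurement of true parallel efficiency.

```shell
# Reads "qstat -f" output on stdin and prints
# 100 * cput / (walltime * ncpus), rounded to a whole percent.
cpu_utilization() {
    awk '
        # Convert an hh:mm:ss string to seconds.
        function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        $1 == "resources_used.cput"     { cput  = secs($3) }
        $1 == "resources_used.walltime" { wall  = secs($3) }
        $1 == "resources_used.ncpus"    { ncpus = $3 }
        END {
            if (wall > 0 && ncpus > 0)
                printf "%.0f\n", 100 * cput / (wall * ncpus)
            else
                print "n/a"
        }
    '
}
```

For the listing above (cput 31:44:48, walltime 08:16:22, 4 cpus) the result is about 96. Inside a job script the function could be fed with qstat -f $PBS_JOBID | cpu_utilization.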
Common PBS Environment Variables
| $PBS_O_WORKDIR | The absolute path of the directory from which the job was submitted. |
| $PBS_JOBID | The job identifier assigned to the job by the batch system. The job identifier will typically be nnn.k2, where nnn is a positive integer. |
| $PBS_JOBNAME | The job name supplied by the user. |
| $PBS_QUEUE | The name of the queue from which the job is executed. |
| $TMPDIR | A job's default working directory is $HOME. That is frequently undesirable because of space limitations and lower I/O performance. Changing to $TMPDIR (=/work/$PBS_JOBID), which is created at a job's start and deleted at its end, affords a large disk area and, typically, better I/O performance. Because $TMPDIR is deleted when the job ends, you must explicitly save any files you need before job completion. Preferably, batch jobs should save such files to local disk areas, such as /scratch/$USER or $HOME. File transfers in a batch job that involve the tape archive or a remote host should be avoided because of the long delays they can cause. |
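Since $TMPDIR disappears when the job ends, the last step of a job script is usually to copy results to a persistent local area. The sketch below shows one way to do that. The function name save_results is hypothetical; it takes a destination directory followed by the files to keep, and it skips files that were never produced.

```shell
# Copy result files from the job's temporary directory to a persistent one.
save_results() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ -e "$f" ]; then
            # -p preserves timestamps and permissions of the copied file.
            cp -p "$f" "$dest"/ || return 1
        else
            echo "warning: $f not found, skipping" >&2
        fi
    done
}
```

A job script might call it after the program finishes, for example save_results /scratch/$USER/$PBS_JOBID outputfile.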
Policies and Best Practices
Batch system policies are approved by the Steering Committee (review@sc.tamu.edu) and may on occasion
change to reflect changing needs and load conditions. Your adherence to the
policies below will be appreciated. Our aim is to convince you that a
little care on your part in doing certain things right will go a long way toward
keeping k2 and titan running efficiently and fairly for everyone. Very reluctantly, in
order to maintain fairness and efficiency, we will on occasion prematurely
terminate jobs. The subsection Abnormal Job Termination lists common reasons
for which the staff may terminate a job.
Setting Appropriate Job Resource Limits
You should not, as a matter of practice, set resource levels for your job to
the maximal queue values unless you actually need them. Larger requests are harder
to satisfy and, hence, will delay your job's execution on a busy system. This
is particularly true when the resource is memory and/or the number of CPUs. Set
job resource limits to the lowest level consistent with successful
completion. Relatedly, if you run commercial code such as Gaussian, ABAQUS, or
FLUENT, the native/internal resource limits that you specify for the application
and the resource limits you specify in the
#PBS -l directive MUST match. If you need help in setting the latter,
please contact the Help Desk for assistance.
Invalid Parallel Batch Jobs
Jobs requesting multiple cpus must use multiple cpus simultaneously from a
single command. Running multiple independent commands in the background in a
batch job script is NOT parallel processing and is NOT permitted.
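As an illustration, a job script of the following kind would be invalid. This is a hypothetical sketch: the job name, the resource values, and the programs prog_a through prog_d are made up for the example. It requests four cpus but only launches four unrelated serial programs in the background.

```shell
#PBS -N invalid_example
#PBS -l ncpus=4,cput=4:00:00

# NOT PERMITTED: four independent serial runs pushed into the background.
# prog_a through prog_d are hypothetical serial programs.
cd $TMPDIR
./prog_a &
./prog_b &
./prog_c &
./prog_d &
# Wait for all background runs to finish before the job ends.
wait
```

A valid multi-cpu job instead starts one command that itself uses all the requested cpus, such as a single mpirun invocation or a single OpenMP program.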
Abnormal Job Termination
The SC staff reserves the right to terminate batch jobs when one or more
of the following occur:
- Use by your program of a larger number of cpus than its parallel
efficiency warrants.
- Use by your program of a smaller number of cpus than that specified
through PBS (-l ncpus=##). This is a particularly unacceptable practice
since it wastes resources that might otherwise be used
by others. When you request, say, four cpus by setting -l ncpus=4 in the
#PBS directives, PBS sets aside four cpu slots. It knows nothing about
the actual number of cpus that your program will use.
- Use/abuse of a special-access queue (e.g., xlt_xlm8, bench8) to run a job
that could very well run in one of the common queues.
- Excessive I/O with large files, which in turn overwhelms memory
due to excessive file caching.
- Any use of large amounts of disk and/or memory that causes a significant
disruption to the smooth operation of the system.
- Use of a smaller or larger number of cpus than the number specified
through PBS (-l ncpus=##).
- Delayed file transfers with source or destination hosts that are remote.
Queued Jobs Not Executing
The batch system limits the total amount of resources a user may
use and the total number of jobs a user can run. In addition, each queue limits
the total number of jobs it can run and the number of jobs it
can run per user.
As a result, resources (e.g., cpus) may be available while your job
remains queued because of one of these limits. Use the
'qstat -s jobid' command to see why your job is still queued.
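The comment field of qstat -f output, listed earlier among the notable fields, can also be pulled out on its own. The sketch below assumes that the scheduler records its reason for not running a job in that field, which is an assumption rather than something this document states. The function name job_comment is hypothetical.

```shell
# Reads "qstat -f" output on stdin and prints the text of the comment field.
# Limitation: if qstat wraps a long comment, only its first line is printed.
job_comment() {
    awk -F' = ' '$1 ~ /^[ \t]*comment$/ { print $2 }'
}
```

It could be used as qstat -f jobid | job_comment.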
Jobs Using Files From the Tape Archive
If your job requires files from the tape archive (or a remote host), we
recommend that you first manually copy these files to a local area, such as your
/scratch directory on k2 or the /tmp directory on titan, before you submit your
job. The objective is to avoid possible delays during batch processing.
Starving Jobs and Backfilling
Queued jobs may become "starved" when delayed by other long-running jobs
for some time. If the batch system cannot schedule starving
jobs due to a resource or queue limit, it will attempt to schedule other
non-starving queued jobs given the available resources. This is known as
backfilling.
Additional Information
More information about batch processing can be obtained from the
following man pages: pbs, qsub, qstat, qdel, and pbs_resources.