Texas A&M Supercomputing Facility, Texas A&M University

Debugging

This subsection gives only a brief summary of core-file debugging. IBM's "General Programming Concepts: Writing and Debugging Programs" guide is a good resource for these and other topics.

dbx, pdbx

The serial command-line IBM debugger is called dbx. The parallel command-line IBM debugger is called pdbx.

Debugging in Batch Jobs

If the problem arises only after the program has run for a long time, or only when it uses a large amount of memory, interactive debugging may not be practical. In that case, capture the core file and run the debugger in a batch job.

Saving the Core File

To save the core file, check whether a file named "core" exists once your program terminates and, if it does, copy it out of the job's working directory, as the example below does.

Example Job File (captures the core file)

#@ shell                = /bin/ksh
#@ comment              = Core Debug
#@ initialdir           = $(home)/tests/project1/
#@ job_name             = progDebug
#@ error                = $(job_name).o$(schedd_host).$(jobid).$(stepid)
#@ output               = $(job_name).o$(schedd_host).$(jobid).$(stepid)
#@ resources            = ConsumableCpus(1) ConsumableMemory(500mb)
# Specify 50 minutes of wallclock time for the duration of the job
#@ wall_clock_limit     = 00:50:00
#@ node                 = 1
#@ tasks_per_node       = 1
#@ notification         = always
#@ queue
cd $TMPDIR
cp $HOME/prog.exe .
./prog.exe
if [ -f core ] ; then
  cp core $HOME
fi
cp prog.out $HOME
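
When several runs may each leave a core file, copying every dump to the same name overwrites the earlier ones. The following is a minimal sketch of a helper that could stand in for the "if [ -f core ]" test above. The function name "save_core" and the "core.<tag>" naming scheme are assumptions made for illustration and are not part of the facility's instructions. The "ulimit" line assumes that the default core-size limit might truncate or suppress the dump, which may not apply on your system.

```shell
# Hypothetical helper: copy a core file, if one exists, to a destination
# directory under a name that includes a caller-supplied tag.
# Usage: save_core <run_dir> <dest_dir> <tag>
# Returns 0 if a core file was saved, 1 if none was found, 2 if the copy failed.
save_core() {
    run_dir=$1
    dest_dir=$2
    tag=$3
    if [ -f "$run_dir/core" ]; then
        cp "$run_dir/core" "$dest_dir/core.$tag" || return 2
        echo "core file saved as $dest_dir/core.$tag"
        return 0
    fi
    echo "no core file found in $run_dir"
    return 1
}

# Try to lift the core-size limit before the program runs (assumption: the
# default limit may be too small). A failure here is ignored on purpose.
ulimit -c unlimited 2>/dev/null || true
```

In the job script, a call such as save_core . "$HOME" "$(date +%Y%m%d%H%M%S)" placed right after "./prog.exe" would then keep one dump per run. A later debugging job would need to copy the tagged file name instead of plain "core".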

Running the Debugger in Batch Mode

Create a debugger command file (call it "dbx.commands").

Example "dbx.commands" File

  where
  dump .
  print xsect, temp, pres, vel, coeff
  quit
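
The command file can also be written by the job script itself, which keeps the debugger commands next to the job that uses them. The following is a minimal sketch using a here-document. The function name "write_dbx_commands" is hypothetical; the dbx subcommands and variable names are the ones from the example above.

```shell
# Hypothetical helper: write the example dbx command file to the given path.
# Usage: write_dbx_commands <file>
#   where     - show the call stack at the point of failure
#   dump .    - display all active variables
#   print ... - display the listed variables (names from the example above)
#   quit      - leave dbx so that the batch job can finish
write_dbx_commands() {
    cat > "$1" <<'EOF'
where
dump .
print xsect, temp, pres, vel, coeff
quit
EOF
}
```

Calling write_dbx_commands dbx.commands in the working directory before the debugger starts produces the same file as the example.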

The batch job should be similar to the following:

#@ shell                = /bin/ksh
#@ comment              = Core Debug
#@ initialdir           = $(home)/tests/project1/
#@ job_name             = progDebug
#@ error                = $(job_name).o$(schedd_host).$(jobid).$(stepid)
#@ output               = $(job_name).o$(schedd_host).$(jobid).$(stepid)
#@ resources            = ConsumableCpus(1) ConsumableMemory(500mb)
# Specify 50 minutes of wallclock time for the duration of the job
#@ wall_clock_limit     = 00:50:00
#@ node                 = 1
#@ tasks_per_node       = 1
#@ notification         = always
#@ queue
cd $TMPDIR
cp $HOME/prog.exe $HOME/core $HOME/dbx.commands .
dbx -c dbx.commands prog.exe core

The job's output file (named by the "output" keyword in the job file) will contain the results of the "where", "dump", and "print" commands given to "dbx". As you learn more about the failure, you can add commands to the "dbx.commands" file to pinpoint the cause of the error.
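
A missing core file or command file is easier to diagnose if the job checks for it before starting the debugger. The following is a minimal sketch of such a check wrapped around the "dbx -c" call used in the job above. The function name "run_dbx_batch", the "DBX" override variable, and the separate output file argument are assumptions made for illustration.

```shell
# Hypothetical wrapper around the dbx call used in the batch job.
# Usage: run_dbx_batch <executable> <core> <command_file> <output_file>
# DBX may name an alternative debugger command; it defaults to dbx.
run_dbx_batch() {
    exe=$1
    core=$2
    cmds=$3
    out=$4
    # Stop early, with a clear message, if any input file is missing.
    for f in "$exe" "$core" "$cmds"; do
        if [ ! -f "$f" ]; then
            echo "run_dbx_batch: missing file: $f" >&2
            return 1
        fi
    done
    # Same invocation as in the job file; stdout and stderr both go to $out.
    "${DBX:-dbx}" -c "$cmds" "$exe" "$core" > "$out" 2>&1
}
```

In the job script this could replace the bare dbx line, for example run_dbx_batch prog.exe core dbx.commands debug.out followed by cp debug.out $HOME, so that the debugger output is kept in a file separate from the rest of the job output.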

Trapping XL Fortran Floating Point Exceptions

See the HOWTO on trapping XL Fortran floating-point exceptions.