Mathematical Libraries

Last update on Friday, 01-Feb-2008 15:54:13 CST.

Introduction to the Math Libraries on cosmos

Using numerical libraries is the easiest way to achieve high performance in your application. cosmos.tamu.edu provides a variety of libraries that have been designed and tuned specifically for the Altix/Itanium-2® architecture and that deliver considerable performance improvements over freely available or general-purpose numerical codes.

Some of the libraries have been extended with OpenMP directives to provide parallel execution, so the benefits of multiprocessing can be obtained simply by linking with the SMP library and setting the OMP_NUM_THREADS environment variable to the number of processors required. Note, however, that SMP libraries have limited scalability on NUMA platforms like the Altix because of the unavoidably higher latencies of remote memory accesses. In general, setting the number of threads higher than about 12 to 16 will not give further performance improvements and may even increase the run time.
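As a rough illustration of the thread-count advice above, the following shell sketch caps a requested thread count before exporting OMP_NUM_THREADS. The helper name cap_threads and its default ceiling of 16 are assumptions made for this example (16 is simply the upper end of the range quoted above); the best value depends on the application and is best found by timing a few runs.

```shell
# Hypothetical helper (not part of any library on cosmos): cap a requested
# OpenMP thread count at a ceiling. The default ceiling of 16 is an assumption
# taken from the upper end of the range mentioned in the text.
cap_threads() {
    requested=$1
    limit=${2:-16}
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

# Ask for 32 threads; the helper reduces this to the ceiling.
OMP_NUM_THREADS=$(cap_threads 32)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

A program linked against one of the SMP libraries and started from the same shell would then pick up the capped value.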

If you need a special purpose numerical library not listed here, you can email us your request and we will consider installing it on cosmos.tamu.edu.

Intel Math Kernel Library (MKL)

The Math Kernel Library for the Itanium architecture provides the following groups of routines:

  • BLAS (Basic Linear Algebra Subprograms) : All the kernels from Level 1, 2 and 3 BLAS.
  • LAPACK : All the LAPACK version 3 routines.
  • FFT : Two interfaces are provided for Fast Fourier Transforms. The recommended one is the Fortran 90 DFT interface, which provides enough functionality and flexibility to cover most FFT needs.
  • Sparse solver : For sparse systems of equations.
  • Vector Math Library : Applies certain mathematical functions (e.g., tan) to whole vectors.
  • Vector Statistical Library : For random number generation of whole arrays.

Choosing a MKL version with modules

First, you will need to choose and load an MKL module. Run the "module avail" command to list all the available environment modules (including MKL). To load the latest MKL module:

$ module load mkl-latest

Please consult the modules page for more information about using modules for environment management.

Linking MKL with the Intel Compiler

After loading an MKL module, you need to link your program with the MKL libraries:

$ ifort program.f90 -lmkl_subset -lmkl -L$MKL_PATH

Here, subset refers to a specific part of the library, e.g., -lmkl_lapack or -lmkl_solver, and $MKL_PATH is defined by the MKL module that you loaded previously. Note that there is no separate subset for the FFT routines. You may need to add "-lguide -lpthread" to take advantage of the OpenMP-threaded subroutines.
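The link line described above can be assembled mechanically. The sketch below only prints the command rather than running it, so it can be tried without a loaded module; the function name mkl_link_cmd and its yes/no threading argument are hypothetical conveniences for this example, and the flags are exactly those quoted in this section.

```shell
# Hypothetical helper: print the ifort link command for a given MKL subset.
# Arguments: source file, subset name (e.g. lapack, solver), and an optional
# "yes" to append the OpenMP-related libraries mentioned in the text.
mkl_link_cmd() {
    src=$1
    subset=$2
    threaded=${3:-no}
    # \$MKL_PATH is kept literal so the printed command can be pasted into a
    # shell where the MKL module has been loaded.
    cmd="ifort $src -lmkl_$subset -lmkl -L\$MKL_PATH"
    if [ "$threaded" = "yes" ]; then
        cmd="$cmd -lguide -lpthread"
    fi
    echo "$cmd"
}

mkl_link_cmd program.f90 lapack
mkl_link_cmd program.f90 solver yes
```

Printing the command first makes it easy to check which subset and threading libraries are being requested before invoking the compiler.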

Additional Information

Additional MKL documentation is available, with a detailed description of the library and examples of use. Product features and further documentation are available from Intel.

SGI Scientific Computing Software Library (SCSL)

The Scientific Computing Software Library (SCSL; see man scsl) from SGI has been ported to and optimized for the Itanium-2 architecture. It delivers performance similar to, and in some cases superior to, that of the Intel MKL. The SCSL covers the following areas:

Available SCSL Routines

  • Signal Processing Routines
    • Fast-Fourier Transforms,
    • Convolution, and
    • Correlation.
    For more information see the intro_fft man pages on cosmos.tamu.edu.
  • Direct and Iterative Linear Equation Solvers

    Direct linear equation solvers for real and complex sparse systems with symmetric non-zero structure, and iterative solvers for real sparse systems with arbitrary structure. For more information see the intro_solvers man pages on cosmos.tamu.edu.

  • Basic Linear Algebra (BLAS) Routines
    • Level 1 BLAS : Vector-vector linear algebra subprograms. See intro_blas1.
    • Level 2 BLAS : Matrix-vector linear algebra subprograms. See intro_blas2.
    • Level 3 BLAS : Matrix-matrix linear algebra subprograms. See intro_blas3.
  • LAPACK routines : All of LAPACK 3. See intro_lapack.
  • Parallel Random Number Generators : 64-bit thread-safe parallel random number generators. See srand64.
  • Distributed Shared Memory Routines : Including ScaLAPACK, Parallel BLAS (PBLAS) and BLACS. See SDSM below.

To use the SCSL routines, add either the -lscs or the -lscs_mp option when linking:

$ ifort program.f90 -lscs
$ ifort program.f90 -lscs_mp

The second option (-lscs_mp) gives you access to the OpenMP multi-processor (i.e., multi-threaded) version of the SCSL library.

Note that you must use version 7.x or later of the Intel Compilers to link against the latest release (1.5.1) of SCSL on cosmos.tamu.edu.

Note: When linking to SCSL with -lscs, the default integer size is 4 bytes (32 bits). Another version of SCSL is available in which integers are 8 bytes (64 bits). This version gives users access to larger memory sizes. It can be linked by using the -lscs_i8 or the -lscs_i8_mp option. A program can use only one of the two versions; 4-byte integer and 8-byte integer library calls cannot be mixed.
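The four SCSL link options named in this section follow a simple pattern: a base name, an optional _i8 suffix for 8-byte integers, and an optional _mp suffix for the OpenMP version. The sketch below encodes that pattern; the function name scsl_flag and its argument convention are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: choose the SCSL link flag from the integer size
# (4 or 8 bytes) and whether the OpenMP version is wanted ("yes" or "no").
scsl_flag() {
    int_bytes=$1
    threaded=$2
    flag="-lscs"
    case "$int_bytes" in
        4) ;;                              # default 4-byte integer library
        8) flag="${flag}_i8" ;;            # 8-byte integer library
        *) echo "unsupported integer size: $int_bytes" >&2
           return 1 ;;
    esac
    if [ "$threaded" = "yes" ]; then
        flag="${flag}_mp"                  # OpenMP multi-threaded version
    fi
    echo "$flag"
}

# Print a link command for the threaded, 8-byte integer library.
echo "ifort program.f90 $(scsl_flag 8 yes)"
```

Because the two integer sizes cannot be mixed, a program linked with an _i8 variant must pass 8-byte integers to every SCSL call. Compiling with the Intel compiler's -i8 option is one common way to arrange that, but check the compiler documentation for the version installed on cosmos.tamu.edu.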

Additional Information

For further reference you can access the SCSL documentation resources at SGI.

SGI Scientific Computing Software Library routines for Distributed Shared Memory (SDSM)

The SGI Scientific Computing Software Library for Distributed Shared Memory (SDSM) is the multi-processor version of SCSL. SDSM contains the following routines:

  • Basic Linear Algebra Communication Subprograms (BLACS). See intro_blacs.
  • Scalable LAPACK (ScaLAPACK). See intro_scalapack.
  • Parallel BLAS (PBLAS). See intro_scalapack.

To use the SDSM routines, add the -lsdsm or the -lsdsm_mp option when linking your application. The required SCSL and MPI libraries are included automatically:

$ ifort program.f90 -lsdsm
$ ifort program.f90 -lsdsm_mp

Linking with -lsdsm enables the distributed shared memory routines (e.g., ScaLAPACK) and links with the SCSL library and the Message Passing Toolkit (SGI's implementation of MPI) as needed.

The second method of linking (-lsdsm_mp) differs from the first in that PBLAS calls to BLAS routines are made to the OpenMP parallel version of the library (libscs_mp.so). This allows hybrid parallelism, which may reduce time to solution for some applications. Users of the hybrid approach are encouraged to carefully review the Message Passing Toolkit (MPT) documentation (see man mpi) to determine the best mechanism for launching such hybrid jobs.
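For a hybrid job, the number of MPI processes times the number of OpenMP threads per process should normally equal the number of processors requested. The following sketch does only that arithmetic and prints a candidate launch line; the helper name hybrid_layout, the example figures of 32 processors and 4 threads, and the basic "mpirun -np" form are assumptions for illustration, so consult man mpi for the placement options appropriate to MPT on cosmos.tamu.edu.

```shell
# Hypothetical helper: given a total processor count and the number of OpenMP
# threads per MPI process, print the number of MPI processes to start.
hybrid_layout() {
    total_cpus=$1
    threads_per_rank=$2
    if [ $((total_cpus % threads_per_rank)) -ne 0 ]; then
        echo "total CPUs must be a multiple of threads per rank" >&2
        return 1
    fi
    echo $((total_cpus / threads_per_rank))
}

# Example figures (assumed): 32 processors, 4 OpenMP threads per MPI process.
ranks=$(hybrid_layout 32 4)
echo "export OMP_NUM_THREADS=4"
echo "mpirun -np $ranks ./a.out"
```

The commands are printed rather than executed so the layout can be checked before a job is submitted.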

Note: When linking to SDSM with -lsdsm, the default integer size is 4 bytes (32 bits). There is currently no version of SDSM available with a default integer size of 8 bytes (64 bits).

Note that you must use version 7.x or later of the Intel Compilers to link against the default version of SDSM and SCSL on cosmos.tamu.edu.