Assignment 3: Linear Algebra HPC libraries

Krister Dackland, Erik Elmroth, Robert Granat and Bo Kågström

In this assignment you will learn and practice how to use the ScaLAPACK library.
You will develop a program that solves an overdetermined system by computing its least squares solution.
You will implement the solution in Fortran 77 (take some time to learn it), measure the performance, and check the correctness of the computed components.

Note: The assignment will be carried out as a joint exercise together with one or two of the teachers on Monday, 23 April. More information about time and location will follow.

Remember to use the account TDBD08-VT07 in your submit file, and do not try to allocate more than 64 nodes in it.

Assignment 3.1: Introduction to Seth at HPC2N
Log on, compile, and run a ScaLAPACK program. If you already know how to do this, skip this section and go on to 3.2.

log on to Seth at HPC2N
make a directory (make sure you are at /kfs$HOME): mkdir ScaLAPACK

change directory: cd ScaLAPACK
download a makefile: cp ~granat/Public/ScaLAPACK/myfirst/Makefile .
download the source code: cp ~granat/Public/ScaLAPACK/myfirst/myfirst.f .
download the submit file (you might need to change it to suit your directories): cp ~granat/Public/ScaLAPACK/myfirst/submit .

make the executable: make
run the program: qsub submit
check the batch queue: showq | more
the result shows up in your PBS_O_WORKDIR (that is, the directory from which you submitted the job).

Assignment 3.2: Write a main program that initializes ScaLAPACK
Write a program TryScaLAPACK (here is a template Fortran77 code with make- and submitfiles) that performs the following tasks:

initialize BLACS (see the BLACS QRef)
node 0 reads an input file containing three matrix dimensions (M, N, K) and a block size (NB)
node 0 distributes the information from the input file to all other nodes
all nodes initialize a descriptor for a matrix A of size MxN (M > N) using the ScaLAPACK routine DESCINIT
all nodes initialize a descriptor for a matrix B of size MxK (M > K) using the ScaLAPACK routine DESCINIT
all nodes generate their part of the distributed random matrices A and B using the FORTRAN routine DRAND48()

      DOUBLE PRECISION DRAND48, X
      EXTERNAL         DRAND48
      X = DRAND48()

To compute the local number of rows/columns of a distributed matrix, use for example the ScaLAPACK tool routine NUMROC (numroc.f)
release the process grid and terminate BLACS
For further information about the descriptor and other related issues study this
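
The block-cyclic bookkeeping behind NUMROC can be checked with a few lines of serial code before writing the Fortran 77 version. The sketch below is a Python transcription of the arithmetic NUMROC performs, not the ScaLAPACK routine itself. The helper owner and the example sizes are assumptions made for illustration, and indices are 0-based here, unlike in Fortran.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of a dimension of length n, split into
    blocks of size nb and dealt out cyclically over nprocs processes,
    that end up on process iproc. isrcproc owns the first block."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # blocks every process receives
    extrablks = nblocks % nprocs                   # leftover complete blocks
    if mydist < extrablks:
        count += nb                                # one extra complete block
    elif mydist == extrablks:
        count += n % nb                            # the trailing partial block
    return count


def owner(g, nb, isrcproc, nprocs):
    """Hypothetical helper: map a 0-based global index g to
    (owning process coordinate, 0-based local index)."""
    block = g // nb
    proc = (block + isrcproc) % nprocs
    local = (block // nprocs) * nb + g % nb
    return proc, local


if __name__ == "__main__":
    m, nb, nprow = 10, 3, 2                        # example sizes (assumed)
    for myrow in range(nprow):
        rows = numroc(m, nb, myrow, 0, nprow)
        print("process row", myrow, "holds", rows, "rows")
```

With M = 10, NB = 3 and two process rows, the row blocks 0-2, 3-5, 6-8 and 9 are dealt out alternately, so process row 0 holds 6 rows and process row 1 holds 4. The local row count (at least 1) is also what is needed as the local leading dimension when calling DESCINIT.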

Assignment 3.3: Solve the overdetermined system AX = B
Assumptions: A is MxN, B is MxK, where K is the number of right hand sides.
Therefore, the requested solution X is NxK.

One way to solve an overdetermined system is to compute the least squares solution.
That is, to find the solution that minimizes ||AX - B|| (2-norm).
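
The reason a QR factorization solves this minimization can be written out in a few lines. The derivation below is a standard argument, given here under the assumption that A has full column rank. The symbols R_1 (the top NxN triangle of R), C_1 (the top NxK part of Q'B) and C_2 (the remaining M-N rows of Q'B) are names introduced only for this sketch.

```latex
\begin{aligned}
A &= Q \begin{pmatrix} R_1 \\ 0 \end{pmatrix}, \qquad
Q^{T} B = \begin{pmatrix} C_1 \\ C_2 \end{pmatrix}, \\
\|AX - B\|_F^2 &= \|Q^{T}(AX - B)\|_F^2
  = \|R_1 X - C_1\|_F^2 + \|C_2\|_F^2 .
\end{aligned}
```

The first term vanishes exactly when R_1 X = C_1, so the minimizer is obtained by a triangular solve and the smallest achievable residual norm is ||C_2||_F. Because the squared F-norm is the sum of the squared 2-norms of the columns, this also minimizes the 2-norm of the residual for each right hand side separately.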

Perform the following steps to solve the overdetermined system:

QR-factorize A, that is, compute A = QR, where Q is an orthogonal matrix of size MxM and R is an upper-trapezoidal matrix of size MxN with non-zero elements only in its top NxN submatrix (use the ScaLAPACK Users Guide, SLUG, or the list of double scalapack routines to find an appropriate QR factorization routine).
Apply Q' (the transpose of Q) to B (use PDORMQR).
Now you may use the PBLAS routine PDTRSM to compute X <- inv(R)*B, where R is the top NxN part of the factor and B is the top NxK part of Q'B; X is stored in the top NxK part of B.
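
The three steps above can be tried out serially on a tiny problem before moving to the distributed routines. The following is a minimal Python sketch using Householder reflections. It mirrors the structure factorize, apply Q', triangular solve, but it is not ScaLAPACK code, and all function names are hypothetical. It assumes A has full column rank.

```python
import math


def householder_qr(A):
    """Step 1: reduce a copy of A (m x n, m >= n) to upper-trapezoidal R.
    Returns R and the Householder vectors that implicitly define Q."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    reflectors = []
    for j in range(n):
        x = [R[i][j] for i in range(j, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])        # v = x + sign(x0)*||x||*e1
        vv = sum(t * t for t in v)
        reflectors.append((v, vv))
        if vv == 0.0:
            continue                               # column already zero
        for c in range(j, n):                      # R <- (I - 2vv'/v'v) R
            s = 2.0 * sum(v[i] * R[j + i][c] for i in range(m - j)) / vv
            for i in range(m - j):
                R[j + i][c] -= s * v[i]
    return R, reflectors


def apply_qt(reflectors, B):
    """Step 2: return Q'B by applying the reflectors in factorization order."""
    m, k = len(B), len(B[0])
    C = [row[:] for row in B]
    for j, (v, vv) in enumerate(reflectors):
        if vv == 0.0:
            continue
        for c in range(k):
            s = 2.0 * sum(v[i] * C[j + i][c] for i in range(m - j)) / vv
            for i in range(m - j):
                C[j + i][c] -= s * v[i]
    return C


def back_substitute(R, C, n):
    """Step 3: solve R[:n,:n] X = C[:n,:] for the n x k solution X."""
    k = len(C[0])
    X = [[0.0] * k for _ in range(n)]
    for c in range(k):
        for i in range(n - 1, -1, -1):
            s = C[i][c] - sum(R[i][j] * X[j][c] for j in range(i + 1, n))
            X[i][c] = s / R[i][i]
    return X


def least_squares(A, B):
    R, reflectors = householder_qr(A)
    return back_substitute(R, apply_qt(reflectors, B), len(A[0]))


if __name__ == "__main__":
    # Fit a line c0 + c1*t through four points, for two right hand sides.
    A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    B = [[1.0, 0.0], [3.0, 1.0], [5.0, 1.0], [7.0, 2.0]]
    print(least_squares(A, B))
```

Here M = 4, N = 2 and K = 2, so the solution X is 2x2. The first right hand side lies exactly on the line 1 + 2t, while the second one does not lie on any line and is only fitted in the least squares sense.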

Assignment 3.4: Write your own "ScaLAPACK" routine
Write a routine PDFROB that computes the Frobenius norm (F-norm) of a distributed matrix A.
Frobenius norm = sqrt(sum(abs(a(i,j))^2)), i = 1..M, j = 1..N
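
One possible structure for such a routine is: every process accumulates a scaled sum of squares over its local part of A, the partial results are combined across the process grid, and the square root is taken once at the end. The Python sketch below simulates this serially. The scaled accumulation follows the idea used in the LAPACK routine DLASSQ (avoiding overflow when squaring large entries); the functions combine and local_entries, the 2x2 grid and the block size are assumptions for illustration, and the loop over processes stands in for a global reduction.

```python
import math


def lassq(values, scale=0.0, sumsq=1.0):
    """Update (scale, sumsq) so that scale**2 * sumsq equals the previous
    value plus the sum of squares of `values`, without squaring large numbers."""
    for x in values:
        if x != 0.0:
            absx = abs(x)
            if scale < absx:
                sumsq = 1.0 + sumsq * (scale / absx) ** 2
                scale = absx
            else:
                sumsq += (absx / scale) ** 2
    return scale, sumsq


def combine(a, b):
    """Merge two (scale, sumsq) pairs, as a reduction step would."""
    (s1, q1), (s2, q2) = a, b
    if s1 < s2:
        s1, q1, s2, q2 = s2, q2, s1, q1            # keep the larger scale first
    if s1 == 0.0:
        return 0.0, 1.0                            # both parts are all zero
    return s1, q1 + q2 * (s2 / s1) ** 2


def local_entries(A, nb, prow, pcol, nprow, npcol):
    """Entries of A owned by process (prow, pcol) in a 2D block-cyclic
    layout with nb x nb blocks and source process (0, 0)."""
    return [A[i][j]
            for i in range(len(A)) if (i // nb) % nprow == prow
            for j in range(len(A[0])) if (j // nb) % npcol == pcol]


def frobenius_norm(A, nb=2, nprow=2, npcol=2):
    total = (0.0, 1.0)
    for prow in range(nprow):                      # serial stand-in for the grid
        for pcol in range(npcol):
            part = lassq(local_entries(A, nb, prow, pcol, nprow, npcol))
            total = combine(total, part)
    scale, sumsq = total
    return scale * math.sqrt(sumsq)


if __name__ == "__main__":
    print(frobenius_norm([[3.0, 4.0], [0.0, 0.0]], nb=1))
```

A simpler alternative is to sum the plain squares locally and add the partial sums with a global sum; the scaled form only matters when the entries are so large or so small that their squares overflow or underflow.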

Assignment 3.5: Check result and measure performance
Use all building blocks from earlier assignments to:

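measure the performance of the QR-factorization routine in Mflops for different matrix and grid sizes (#flops = 4N^3/3 for a square NxN matrix; for a general MxN matrix with M > N the Householder QR count is 2MN^2 - 2N^3/3),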
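use your PDFROB and the PBLAS routine PDGEMM to compute the norm of the residual AX - B (we use the F-norm since it is easy to compute; keep copies of the original A and B, because the factorization and the solve overwrite them),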
report the performance and the residual norms for the different grid and problem sizes.
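
The arithmetic behind these measurements is small enough to check by hand. The Python sketch below shows, under stated assumptions, how a flop count and a wall-clock time turn into Mflops and how the residual can be formed with a single matrix-multiply update of the form C <- alpha*A*X + beta*C, which is the kind of update PDGEMM performs. The function names are hypothetical, the code is serial, and the flop formula used is the Householder QR count 2MN^2 - 2N^3/3, of which 4N^3/3 is the square case M = N.

```python
import math


def qr_flops(m, n):
    """Approximate flop count of a Householder QR factorization of an m x n matrix."""
    return 2.0 * m * n * n - 2.0 * n ** 3 / 3.0


def mflops(flops, seconds):
    """Millions of floating point operations per second."""
    return flops / (seconds * 1.0e6)


def gemm_update(alpha, A, X, beta, C):
    """Return alpha*A*X + beta*C for dense matrices stored as lists of rows."""
    m, n, k = len(A), len(X), len(X[0])
    return [[alpha * sum(A[i][p] * X[p][j] for p in range(n)) + beta * C[i][j]
             for j in range(k)]
            for i in range(m)]


def frobenius(C):
    return math.sqrt(sum(x * x for row in C for x in row))


if __name__ == "__main__":
    # Residual of a line fit: A is 4x2, X is 2x1, B is 4x1 (example data, assumed).
    A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    B = [[0.0], [1.0], [1.0], [2.0]]
    X = [[0.1], [0.6]]                             # least squares solution for this B
    residual = gemm_update(1.0, A, X, -1.0, B)     # AX - B, with B used as the copy C
    print("residual F-norm:", frobenius(residual))

    # Turning a measured time into Mflops; the 2.0 seconds is a made-up timing.
    print("Mflops:", mflops(qr_flops(300, 300), 2.0))
```

In the parallel program the elapsed time would come from two calls to MPI_WTIME() placed around the factorization, and the residual norm would be computed with PDFROB on the distributed result of PDGEMM.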

Tips, Tricks and Links

ScaLAPACK Users Guide (SLUG)
Print a distributed matrix: PDLAPRNT
MPI routines
double scalapack routines
PDORMQR
numroc.f
infog2l.f
dlassq.f
A Clock: MPI_WTIME()
Mixing C/C++ and Fortran77 routines
A Fortran77 tutorial

HPC2N and Department of Computing Science

Umeå University, S-901 87 Umeå, Sweden
Email: larsk@cs.umu.se
Last updated 060314 by Lars Karlsson