Assignment 3: Linear Algebra
HPC libraries
Krister Dackland, Erik Elmroth,
Robert Granat and Bo Kågström
In this assignment you will practice and learn how to use the ScaLAPACK library.
You will develop a program that solves an overdetermined system by computing the least squares solution.
You will implement the solution in Fortran 77 (take some time to learn it), measure the performance, and check the correctness of the computed components.
Notice:
The assignment will be performed as a common exercise together with one or two of the teachers on Monday, April 23. More information about time and location will follow.
Remember to use the account TDBD08-VT07 in your submit file, and do not try to allocate more than 64 nodes in it.
Assignment 3.1: Introduction to Seth at HPC2N
Log on, compile, and run a ScaLAPACK program. If you already know how to do this, skip this section and go on to 3.2.
- log on to Seth at HPC2N
- make a directory (make sure you are in /kfs$HOME): mkdir ScaLAPACK
- change directory: cd ScaLAPACK
- copy a makefile: cp ~granat/Public/ScaLAPACK/myfirst/Makefile .
- copy the source code: cp ~granat/Public/ScaLAPACK/myfirst/myfirst.f .
- copy the submit file (you might need to change it to suit your directories): cp ~granat/Public/ScaLAPACK/myfirst/submit .
- make the executable: make
- run the program: qsub submit
- check the batch queue: showq | more
- the result shows up in your PBS_O_WORKDIR (that is, the directory from which you submitted the job).
Assignment 3.2: Write a main program that initializes ScaLAPACK
Write a program TryScaLAPACK (here is a template Fortran 77 code with makefile and submit file) that performs the following tasks:
- initialize BLACS (see the BLACS QRef)
- node 0 reads an input file containing three matrix dimensions (M, N, K) and a block size (NB)
- node 0 distributes the information from the input file to all other nodes
- all nodes initialize a descriptor of a matrix A of size MxN (M > N), using the ScaLAPACK routine DESCINIT
- all nodes initialize a descriptor of a matrix B of size MxK (M > K), using the ScaLAPACK routine DESCINIT
- all nodes generate their part of the distributed random matrices A and B, using the Fortran routine DRAND48():
· DOUBLE PRECISION DRAND48, X
· EXTERNAL DRAND48
· X = DRAND48()
- to compute the local number of rows/columns of a distributed matrix, use for example the ScaLAPACK tool routine NUMROC (numroc.f)
- release the process grid and terminate BLACS
- for further information about the descriptor and other related issues, study this
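The local sizes mentioned in the list above follow from the block-cyclic distribution. As a rough illustration only, the sketch below reproduces the arithmetic of NUMROC in Python (not Fortran 77, and not part of the assignment template) so that the values your program computes can be checked by hand. The sizes in the demo are hypothetical.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-long dimension owned by process iproc.

    Mirrors the arithmetic of the ScaLAPACK tool routine NUMROC for a
    block-cyclic distribution with block size nb over nprocs processes,
    where process isrcproc owns the first block.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of full blocks
    count = (nblocks // nprocs) * nb               # full rounds that every process gets
    extra = nblocks % nprocs                       # leftover full blocks
    if mydist < extra:
        count += nb                                # one more full block
    elif mydist == extra:
        count += n % nb                            # the final partial block, if any
    return count


if __name__ == "__main__":
    # Hypothetical sizes: M = 10 rows, NB = 3, 2 process rows.
    for p in range(2):
        print(p, numroc(10, 3, p, 0, 2))
```

With these hypothetical sizes, process row 0 holds 6 rows and process row 1 holds 4. The local row count is also what bounds the leading dimension passed to DESCINIT, which must be at least max(1, local number of rows).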
Assignment 3.3: Solve the overdetermined system AX = B
Assumptions: A is MxN with M > N, and B is MxK, where K is the number of right-hand sides.
Therefore, the requested solution X is NxK.
One way to solve an overdetermined system is to compute the least squares solution, that is, to find the X that minimizes ||AX - B|| (2-norm).
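A short derivation may help to see why an orthogonal factorization A = QR is a natural tool for this minimization. It is a sketch that assumes A has full column rank. The names R_1, C_1 and C_2 are introduced here for illustration and do not appear in the assignment text. The F-norm is used so that all K right-hand sides are treated at once; this is equivalent to minimizing the 2-norm of each column separately.

```latex
\begin{align*}
A &= QR = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix},
\qquad
Q^{T} B = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix},
\qquad
R_1 \in \mathbb{R}^{N \times N},\;
C_1 \in \mathbb{R}^{N \times K},\;
C_2 \in \mathbb{R}^{(M-N) \times K} \\
\|AX - B\|_F^2 &= \|Q^{T}(AX - B)\|_F^2
               = \|R_1 X - C_1\|_F^2 + \|C_2\|_F^2 \\
X &= R_1^{-1} C_1,
\qquad
\min_X \|AX - B\|_F = \|C_2\|_F
\end{align*}
```

The last identity gives an independent check of a computed solution: up to rounding, the norm of the residual AX - B should equal the F-norm of the bottom M-N rows of Q'B.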
Perform the following steps to solve the overdetermined system:
- QR factorize the matrix A. This gives an orthogonal matrix Q of size MxM and an upper-trapezoidal matrix R of size MxN with non-zero elements only in the top NxN submatrix. (Use the ScaLAPACK Users' Guide (SLUG) or the list of double precision ScaLAPACK routines to find an appropriate QR factorization routine.)
- Apply Q' to B, that is, overwrite B with Q'*B (use PDORMQR).
- Now you may use the PBLAS routine PDTRSM to compute X <- inv(R)*B, where R is the top NxN triangle of the factor and B is the top NxK part of Q'*B. Store X in the top NxK part of B.
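To check small cases by hand, the three steps above can be mirrored in a serial reference. The sketch below is plain Python on nested lists, not ScaLAPACK code; it assumes A has full column rank, and the function name qr_least_squares is made up for this illustration. It uses Householder reflections, applies them to B as they are generated (the serial analogue of forming Q'*B), and finishes with back substitution on the top NxN triangle.

```python
import math


def qr_least_squares(a, b):
    """Solve min ||A X - B|| for dense lists a (M x N) and b (M x K), M >= N."""
    m, n, k = len(a), len(a[0]), len(b[0])
    r = [row[:] for row in a]  # overwritten by R (upper trapezoidal)
    c = [row[:] for row in b]  # overwritten by Q' * B
    for j in range(n):
        # Householder vector v that zeroes column j below the diagonal.
        norm = math.sqrt(sum(r[i][j] ** 2 for i in range(j, m)))
        alpha = -norm if r[j][j] >= 0 else norm
        v = [r[i][j] for i in range(j, m)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        # Apply H = I - 2 v v' / (v' v) to the trailing part of R and to Q' * B.
        for mat, cols in ((r, range(j, n)), (c, range(k))):
            for col in cols:
                s = 2.0 * sum(v[i - j] * mat[i][col] for i in range(j, m)) / vtv
                for i in range(j, m):
                    mat[i][col] -= s * v[i - j]
    # Back substitution: X = inv(R1) * C1, using only the top N rows.
    x = [[0.0] * k for _ in range(n)]
    for col in range(k):
        for i in range(n - 1, -1, -1):
            s = c[i][col] - sum(r[i][p] * x[p][col] for p in range(i + 1, n))
            x[i][col] = s / r[i][i]
    return x


if __name__ == "__main__":
    # Fitting a constant to the values 1, 2, 6 gives their mean.
    print(qr_least_squares([[1.0], [1.0], [1.0]], [[1.0], [2.0], [6.0]]))
```

Running the same small input through both this reference and the ScaLAPACK program on a 1x1 process grid is one way to separate algorithmic mistakes from distribution mistakes.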
Assignment 3.4: Write your own "ScaLAPACK" routine
Write a routine PDFROB that computes the Frobenius norm (F-norm) of a distributed matrix A:
Frobenius norm = sqrt(sum(abs(a(i,j))^2)), i = 1..M, j = 1..N
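One possible structure for such a routine is sketched below in Python rather than Fortran 77: every process sums the squares of its own local elements, the partial sums are added across the process grid, and the square root is taken once at the end. The list of local blocks merely stands in for the processes, and the function names are made up for this illustration. In the real routine the global sum would be a BLACS combine operation such as DGSUM2D.

```python
import math


def local_sum_of_squares(local_block):
    """Sum of squares of the elements one process owns."""
    return sum(x * x for row in local_block for x in row)


def frobenius_from_parts(local_blocks):
    """Combine per-process partial sums and take the square root once."""
    total = sum(local_sum_of_squares(blk) for blk in local_blocks)  # global sum
    return math.sqrt(total)


if __name__ == "__main__":
    # The 2x2 matrix [[3, 0], [0, 4]] split row-wise over two "processes".
    print(frobenius_from_parts([[[3.0, 0.0]], [[0.0, 4.0]]]))
```

Note that the square root must be applied to the global sum, not to each partial sum, since the norms of the local parts do not simply add up.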
Assignment 3.5: Check result and measure performance
Use all building blocks from the earlier assignments to:
- measure the performance of the QR factorization routine in Mflops for different matrix and grid sizes (#flops = 4N^3/3 for M = N; for M > N the count is approximately 2MN^2 - 2N^3/3),
- use your PDFROB and the PBLAS routine PDGEMM to compute the norm of the residual AX - B (we use the F-norm since it is easy to compute); keep copies of the original A and B, since the factorization and the solve overwrite them,
- report the performance and the residual norms for the different grid and problem sizes.
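The Mflops figure is simple arithmetic once the wall-clock time of the factorization is known. The sketch below, in Python for illustration only, uses the standard Householder QR operation count 2MN^2 - 2N^3/3, which reduces to the 4N^3/3 quoted above when M = N. The function names and the timing value in the demo are hypothetical.

```python
def qr_flops(m, n):
    """Approximate flop count of a Householder QR factorization of an M x N matrix."""
    return 2.0 * m * n**2 - 2.0 * n**3 / 3.0


def mflops(m, n, seconds):
    """Performance in Mflops given the measured wall-clock time in seconds."""
    return qr_flops(m, n) / (seconds * 1.0e6)


if __name__ == "__main__":
    # Hypothetical measurement: a 1000 x 1000 factorization that took 2.0 seconds.
    print(mflops(1000, 1000, 2.0))
```

When comparing grid sizes, time only the factorization call itself and use the slowest process's time, since the factorization is not finished until every process has returned.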
Tips, Tricks and Links
HPC2N and Department of Computing Science
Umeå University, S-901 87 Umeå, Sweden
Email: larsk@cs.umu.se
Last updated 060314 by Lars Karlsson