Beautiful Code [199]
Fortran 90 compilers use the LDA parameter to avoid copying data into a contiguous buffer when calling external routines such as the BLAS. Without LDA, the compiler has to assume the worst case: the input matrix a might not be contiguous and would need to be copied to a temporary contiguous buffer so that the BLAS call does not make an out-of-bounds memory access. With LDA, the compiler passes array pointers to the BLAS without any copies.
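The reason LDA suffices is Fortran's column-major layout: element a(i, j) of an array declared with leading dimension lda sits, in a 0-based flat view, at index i + j*lda, so any submatrix is just an offset into the parent's storage with the same column stride. A small Python sketch of that addressing (the idx helper and the values are our illustration, not from the chapter):

```python
# Column-major (Fortran-order) addressing with an explicit leading
# dimension: a submatrix keeps the parent's stride, so no copy is needed.

def idx(i, j, lda):
    """Flat index of element (i, j), 0-based, column stride lda."""
    return i + j * lda

lda = 4
parent = [0] * (lda * 4)          # a 4x4 matrix stored column by column
for j in range(4):
    for i in range(4):
        parent[idx(i, j, lda)] = 10 * i + j   # value encodes (row, col)

# The trailing 2x2 submatrix starting at (2, 2) -- the analogue of
# a(nleft+1, nleft+1) in the Fortran code -- is just an offset into the
# same flat storage, still addressed with the parent's lda:
offset = idx(2, 2, lda)
sub = lambda i, j: parent[offset + idx(i, j, lda)]

print(sub(0, 0), sub(1, 1))  # -> 22 33, i.e. elements (2,2) and (3,3)
```

This is exactly what passing a(nleft+1, nleft+1) together with the original lda does in the calls below: the callee sees a smaller matrix but walks the same memory.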
Example 14-4 shows recursive LU factorization.
Example 14-4. Recursive variant (Fortran 90 coding)
      recursive subroutine rdgetrf(m, n, a, lda, ipiv, info)
      implicit none

      integer, intent(in) :: m, n, lda
      double precision, intent(inout) :: a(lda,*)
      integer, intent(out) :: ipiv(*)
      integer, intent(out) :: info

      integer :: mn, nleft, nright, i
      double precision :: tmp

      double precision :: pone, negone, zero
      parameter (pone=1.0d0)
      parameter (negone=-1.0d0)
      parameter (zero=0.0d0)

      intrinsic min

      integer idamax
      external dgemm, dtrsm, dlaswp, idamax, dscal

      mn = min(m, n)

      if (mn .gt. 1) then
         nleft = mn / 2
         nright = n - nleft

         call rdgetrf(m, nleft, a, lda, ipiv, info)

         if (info .ne. 0) return

         call dlaswp(nright, a(1, nleft+1), lda, 1, nleft, ipiv, 1)

         call dtrsm('L', 'L', 'N', 'U', nleft, nright, pone, a, lda,
     $              a(1, nleft+1), lda)

         call dgemm('N', 'N', m-nleft, nright, nleft, negone,
     $              a(nleft+1, 1), lda, a(1, nleft+1), lda, pone,
     $              a(nleft+1, nleft+1), lda)

         call rdgetrf(m-nleft, nright, a(nleft+1, nleft+1), lda,
     $                ipiv(nleft+1), info)

         if (info .ne. 0) then
            info = info + nleft
            return
         end if

         do i = nleft+1, m
            ipiv(i) = ipiv(i) + nleft
         end do

         call dlaswp(nleft, a, lda, nleft+1, mn, ipiv, 1)

      else if (mn .eq. 1) then

         i = idamax(m, a, 1)
         ipiv(1) = i
         tmp = a(i, 1)

         if (tmp .ne. zero .and. tmp .ne. -zero) then

            call dscal(m, pone/tmp, a, 1)

            a(i, 1) = a(1, 1)
            a(1, 1) = tmp

         else

            info = 1

         end if

      end if

      return
      end
There is a certain degree of elegance in the recursive variant. No loops are exposed in the routine. Instead, the algorithm is driven by the recursive nature of the method (see the paper by F. G. Gustavson listed under "Further Reading").
The recursive LU algorithm consists of four basic steps, illustrated in Figure 14-2:

1. Split the matrix into two rectangles (m × n/2); if the left part ends up being only a single column, scale it by the reciprocal of the pivot and return.

2. Apply the LU algorithm to the left part.

3. Apply transformations to the right part (perform the triangular solve A12 = L^-1 * A12 and the matrix multiplication A22 = A22 - A21 * A12).

4. Apply the LU algorithm to the right part.
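The four steps map directly onto the structure of rdgetrf. Here is a pure-Python sketch of the same recursion (our illustration, not the chapter's code): list-of-lists storage with 0-based indices, where the inner loops stand in for the dlaswp, dtrsm, dgemm, and dscal calls, and (r0, c0) plays the role of the submatrix pointers:

```python
# Recursive LU factorization with partial pivoting, mirroring rdgetrf.
# (r0, c0) is the top-left corner of the current m-by-n block.
# ipiv[r0+i] records, 0-based and relative to the block, which row the
# block's row i was swapped with.  Returns 0 on success, or (as in
# LAPACK) a positive index when a zero pivot is found.

def rdgetrf(a, ipiv, r0, c0, m, n):
    mn = min(m, n)
    if mn > 1:
        nleft = mn // 2
        nright = n - nleft

        # Factor the left m-by-nleft panel recursively.
        info = rdgetrf(a, ipiv, r0, c0, m, nleft)
        if info != 0:
            return info

        # Apply the panel's row swaps to the right part (dlaswp).
        for i in range(nleft):
            p = ipiv[r0 + i]
            if p != i:
                for j in range(c0 + nleft, c0 + n):
                    a[r0 + i][j], a[r0 + p][j] = a[r0 + p][j], a[r0 + i][j]

        # Triangular solve A12 = L^-1 * A12 (dtrsm, unit lower triangular).
        for j in range(c0 + nleft, c0 + n):
            for i in range(1, nleft):
                a[r0 + i][j] -= sum(a[r0 + i][c0 + k] * a[r0 + k][j]
                                    for k in range(i))

        # Schur complement A22 = A22 - A21 * A12 (dgemm).
        for i in range(nleft, m):
            for j in range(c0 + nleft, c0 + n):
                a[r0 + i][j] -= sum(a[r0 + i][c0 + k] * a[r0 + k][j]
                                    for k in range(nleft))

        # Factor the trailing (m-nleft)-by-nright part recursively.
        info = rdgetrf(a, ipiv, r0 + nleft, c0 + nleft, m - nleft, nright)
        if info != 0:
            return info + nleft

        # Make the trailing pivots relative to this block, then apply
        # their swaps back to the left columns (dlaswp).
        for i in range(nleft, mn):
            ipiv[r0 + i] += nleft
        for i in range(nleft, mn):
            p = ipiv[r0 + i]
            if p != i:
                for j in range(c0, c0 + nleft):
                    a[r0 + i][j], a[r0 + p][j] = a[r0 + p][j], a[r0 + i][j]

    elif mn == 1:
        # Single column: pick the largest pivot (idamax), swap it to the
        # top, and scale the rest of the column by 1/pivot (dscal).
        i = max(range(m), key=lambda k: abs(a[r0 + k][c0]))
        ipiv[r0] = i
        tmp = a[r0 + i][c0]
        if tmp == 0.0:
            return 1
        a[r0 + i][c0] = a[r0][c0]
        a[r0][c0] = tmp
        for k in range(1, m):
            a[r0 + k][c0] /= tmp
    return 0

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
ipiv = [0] * 3
rdgetrf(A, ipiv, 0, 0, 3, 3)
print(ipiv)  # -> [2, 2, 2]: rows 0 and 1 each swap with row 2 in turn
```

After the call, A holds the unit lower triangle of L below its diagonal and U on and above it, exactly as dgetrf leaves its argument.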
Figure 14-2. Recursive LU factorization
Most of the work is performed in the matrix multiplications, which operate on successive matrices of size n/2, n/4, n/8, etc. The implementation in Example 14-4 can show about a 10 percent improvement in performance over the LAPACK implementation given in Example 14-3.
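The dominance of the multiplications can be made concrete by summing approximate flop counts over the recursion tree. The sketch below is our illustration, using the standard estimates (about 2mnk flops for a dgemm of those dimensions, about nleft² flops per column for a unit-triangular dtrsm), not figures from the chapter:

```python
# Approximate flop totals for rdgetrf(m, n), split by BLAS routine.

def flops(m, n):
    mn = min(m, n)
    if mn <= 1:
        return (0, 0, m)                      # (gemm, trsm, other); base
                                              # case idamax + dscal is O(m)
    nleft = mn // 2
    nright = n - nleft
    g1, t1, o1 = flops(m, nleft)              # left panel
    gemm = 2 * (m - nleft) * nright * nleft   # Schur complement update
    trsm = nleft * nleft * nright             # triangular solve
    g2, t2, o2 = flops(m - nleft, nright)     # trailing part
    return (g1 + g2 + gemm, t1 + t2 + trsm, o1 + o2)

g, t, o = flops(1024, 1024)
print(f"gemm fraction of total work: {g / (g + t + o):.2f}")
```

For a square 1024 × 1024 factorization roughly three quarters of all flops land in dgemm, with dtrsm accounting for most of the rest; the proportions stay essentially fixed as n grows, because each recursive level splits the work the same way.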
In a sense, any of the previous renditions of the LU algorithm could be considered a step backward in terms of code elegance. But divide-and-conquer recursion was a tremendous leap forward (even setting aside the modest performance gains). The recursive algorithm for matrix factorization can now be taught to students alongside other recursive algorithms, such as various kinds of sorting methods.
By changing just the size of matrix parts, it is possible to achieve the same memory access pattern as in LINPACK or LAPACK. Setting nleft to 1 makes the code operate on vectors, just as in LINPACK, whereas setting nleft to NB>1 makes it behave like LAPACK's blocked code. In both cases, the original recursion