Beautiful Code
standardize vector operations for use in scientific computations. The idea was to define some simple, frequently used operations and implement them on various systems to achieve portability and efficiency. This package came to be known as the Level-1 Basic Linear Algebra Subprograms (BLAS) or Level-1 BLAS.

The term Level-1 denotes vector-vector operations. As we will see, Level-2 (matrix-vector operations) and Level-3 (matrix-matrix operations) play important roles as well.
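The distinction between the three levels can be made concrete with plain C loops. These are illustrative sketches only, not the actual BLAS interfaces (the real routines also take stride and leading-dimension arguments); the function names here echo the BLAS vocabulary but are simplified to unit strides and square matrices, with Fortran-style column-major storage A(i,j) = a[i + j*n]:

```c
#include <stddef.h>

/* Level-1 (vector-vector): y = a*x + y.  O(n) work on O(n) data. */
void axpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Level-2 (matrix-vector): y = A*x + y.  O(n^2) work on O(n^2) data.
   A is n-by-n, column-major as in Fortran: A(i,j) = a[i + j*n].     */
void gemv(size_t n, const double *a, const double *x, double *y) {
    for (size_t j = 0; j < n; j++)       /* column-oriented traversal */
        for (size_t i = 0; i < n; i++)
            y[i] += a[i + j*n] * x[j];
}

/* Level-3 (matrix-matrix): C = A*B + C.  O(n^3) work on O(n^2) data. */
void gemm(size_t n, const double *a, const double *b, double *c) {
    for (size_t j = 0; j < n; j++)
        for (size_t k = 0; k < n; k++)
            for (size_t i = 0; i < n; i++)
                c[i + j*n] += a[i + k*n] * b[k + j*n];
}
```

The ratio of work to data moved grows with the level, which is why the higher levels matter so much on machines with memory hierarchies.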

In the 1970s, the algorithms of dense linear algebra were implemented in a systematic way by the LINPACK project. LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems. The package solves linear systems whose matrices are general, banded, symmetric indefinite, symmetric positive definite, triangular, and tridiagonal square. In addition, the package computes the QR and singular value decompositions of rectangular matrices and applies them to least-squares problems.

LINPACK uses column-oriented algorithms, which increase efficiency by preserving locality of reference. By column orientation, we mean that the LINPACK code always references arrays down columns, not across rows. This is important since Fortran stores arrays in column-major order. This means that as one proceeds down a column of an array, the memory references proceed sequentially through memory. Thus, if a program references an item in a particular block, the next reference is likely to be in the same block.
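The column-major rule can be stated as an address calculation: the 1-based Fortran element A(i,j) of an array declared a(lda,*) lives at linear offset (i-1) + (j-1)*lda. A minimal C sketch (C itself is row-major, so the Fortran mapping is written out by hand; the function name is ours, for illustration):

```c
#include <stddef.h>

/* Linear offset of the 1-based Fortran element A(i,j) in an array
   declared a(lda,*).  Consecutive values of i (moving down a column)
   are adjacent in memory; stepping j (across a row) jumps by lda.    */
size_t fortran_offset(size_t i, size_t j, size_t lda) {
    return (i - 1) + (j - 1) * lda;
}
```

With lda = 5, the elements A(1,1), A(2,1), A(3,1) land at offsets 0, 1, 2, while A(1,1) and A(1,2) are lda apart; this is why column-oriented loops walk memory sequentially.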

The software in LINPACK was kept machine-independent partly through the introduction of the Level-1 BLAS routines. Almost all of the computation was done by calling Level-1 BLAS. For each machine, the set of Level-1 BLAS would be implemented in a machine-specific manner to obtain high performance.

Example 14-2 shows the LINPACK implementation of the factorization: Gaussian elimination with partial pivoting.

Example 14-2. LINPACK variant (Fortran 66 coding)


      subroutine dgefa(a,lda,n,ipvt,info)
      integer lda,n,ipvt(1),info
      double precision a(lda,1)
      double precision t
      integer idamax,j,k,kp1,l,nm1
c
c
c     gaussian elimination with partial pivoting
c
      info = 0
      nm1 = n - 1
      if (nm1 .lt. 1) go to 70
      do 60 k = 1, nm1
         kp1 = k + 1
c
c        find l = pivot index
c
         l = idamax(n-k+1,a(k,k),1) + k - 1
         ipvt(k) = l
c
c        zero pivot implies this column already triangularized
c
         if (a(l,k) .eq. 0.0d0) go to 40
c
c           interchange if necessary
c
            if (l .eq. k) go to 10
               t = a(l,k)
               a(l,k) = a(k,k)
               a(k,k) = t
   10       continue
c
c           compute multipliers
c
            t = -1.0d0/a(k,k)
            call dscal(n-k,t,a(k+1,k),1)
c
c           row elimination with column indexing
c
            do 30 j = kp1, n
               t = a(l,j)
               if (l .eq. k) go to 20
                  a(l,j) = a(k,j)
                  a(k,j) = t
   20          continue
               call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1)
   30       continue
         go to 50
   40    continue
            info = k
   50    continue
   60 continue
   70 continue
      ipvt(n) = n
      if (a(n,n) .eq. 0.0d0) info = n
      return
      end

The Level-1 BLAS subroutines DAXPY, DSCAL, and IDAMAX are used in the routine DGEFA. The main difference between Example 14-1 and Example 14-2 (other than the programming language and the interchange of loop indexes) is the use of routine DAXPY to encode the inner loop of the method.
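The semantics of the three Level-1 routines DGEFA relies on can be sketched in C. These are hypothetical unit-stride simplifications, not the real BLAS signatures (the actual routines take stride arguments incx and incy), written only to show what each call in the listing computes:

```c
#include <stddef.h>
#include <math.h>

/* idamax: 1-based index of the first element of largest magnitude
   -- used to pick the pivot row.                                   */
size_t idamax1(size_t n, const double *x) {
    size_t imax = 1;
    for (size_t i = 2; i <= n; i++)
        if (fabs(x[i-1]) > fabs(x[imax-1]))
            imax = i;
    return imax;
}

/* dscal: x = a*x -- scales the pivot column to form multipliers.   */
void dscal1(size_t n, double a, double *x) {
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

/* daxpy: y = a*x + y -- one call performs the whole elimination
   update a(k+1:n,j) = a(k+1:n,j) + t * a(k+1:n,k) for a column j.  */
void daxpy1(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Phrasing the inner loop as a single daxpy call is exactly what lets a machine-specific BLAS replace the loop with hardware vector instructions without touching the DGEFA source.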

It was presumed that the BLAS operations would be implemented in an efficient, machine-specific way suitable for the computer on which the subroutines were executed. On a vector computer, this could translate into a simple, single vector operation. This approach avoids leaving the optimization to the compiler by explicitly exposing each performance-critical operation.

In a sense, then, the beauty of the original code was regained with the use of a new vocabulary to describe the algorithms: the BLAS. Over time, the BLAS became a widely adopted standard and were most likely the first to enforce two key aspects of software: modularity and portability. Again, these are taken for granted today, but at the time they were not. One could have the cake of compact algorithm representation and eat it, too, because the resulting Fortran code was portable.
