so the matrix algorithms had to follow suit yet again. It was quickly discovered that good local performance has to be combined with good global partitioning of the matrices and vectors.

Any trivial divisions of matrix data quickly uncovered scalability problems dictated by so-called Amdahl's Law: the observation that the time taken by the sequential portion of a computation provides the minimum bound for the entire execution time, and therefore limits the gains achievable from parallel processing. In other words, unless most computations can be done independently, the point of diminishing returns is reached, and adding more processors to the hardware mix will not result in faster processing.
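
In its usual form, Amdahl's Law says that if a fraction p of the work can be spread across N processors while the remaining 1 - p must run sequentially, the achievable speedup is:

S(N) = 1 / ((1 - p) + p / N)

so even with arbitrarily many processors the speedup can never exceed 1 / (1 - p). For example, if 90 percent of the work parallelizes perfectly, adding hardware can never yield more than a tenfold speedup.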

For the sake of simplicity, the class of multi-core architectures includes both symmetric multiprocessing (SMP) and single-chip multi-core machines. This is probably an unfair simplification, as the SMP machines usually have better memory systems. But when applied to matrix algorithms, both yield good performance results with very similar algorithmic approaches: these combine local cache reuse and independent computation with explicit control of data dependences.


14.2. A Decompositional Approach

At the basis of solutions to dense linear systems lies a decompositional approach. The general idea is the following: given a problem involving a matrix A, one factors or decomposes A into a product of simpler matrices from which the problem can easily be solved. This divides the computational problem into two parts: first determine an appropriate decomposition, and then use it in solving the problem at hand.

Consider the problem of solving the linear system:

Ax = b

where A is a nonsingular matrix of order n. The decompositional approach begins with the observation that it is possible to factor A in the form:

A = LU

where L is a lower triangular matrix (a matrix that has only zeros above the diagonal) with ones on the diagonal, and U is upper triangular (with only zeros below the diagonal). During the decomposition process, diagonal elements of A (called pivots) are used to divide the elements below the diagonal. If matrix A has a zero pivot, the process breaks down with a division-by-zero error. Small pivot values also excessively amplify the numerical errors of the process. So for numerical stability, the method needs to interchange rows of the matrix so that the pivots are as large (in absolute value) as possible. This observation leads to a row permutation matrix P and modifies the factored form to:

PᵀA = LU
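
For example, for the 2×2 matrix whose rows are (0, 1) and (2, 3), the first pivot is zero and elimination cannot even begin; interchanging the two rows gives the rows (2, 3) and (0, 1), which is already upper triangular, so the factorization goes through with L equal to the identity, U equal to the permuted matrix, and P recording the single row swap.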

The solution can then be written in the form:

x = A⁻¹b = U⁻¹L⁻¹Pᵀb

which then suggests the following algorithm for solving the system of equations:

1. Factor A.

2. Solve the system Ly = Pᵀb.

3. Solve the system Ux = y.

This approach to matrix computations through decomposition has proven very useful for several reasons. First, it separates the computation into two stages: the computation of a decomposition, followed by the use of the decomposition to solve the problem at hand. This matters, for example, when the system must be solved for several right-hand sides that become available at different points in the process: the matrix needs to be factored only once and can then be reused for each right-hand side. This is particularly important because the factorization of A, step 1, requires O(n³) operations, whereas the solutions, steps 2 and 3, require only O(n²) operations. Another strength of the algorithm lies in storage: the L and U factors do not require extra storage, but can take over the space occupied initially by A.
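
To make this cost split concrete, the following is a minimal C sketch of the O(n²) solve phase, steps 2 and 3, assuming that the factorization of step 1 has already overwritten the column-major array lu with the packed L and U factors (the unit diagonal of L is implicit) and has recorded the row interchanges in the vector ipiv. The function name and the storage conventions are illustrative assumptions, not code from the chapter.

/* Hypothetical helper: solve Ax = b given packed LU factors and the pivot
   vector produced by a factorization with partial pivoting.
   lu is n-by-n, column-major; ipiv[k] is the row interchanged with row k
   at step k; b holds the right-hand side on entry and the solution x on exit. */
void lu_solve(int n, const double lu[], const int ipiv[], double b[])
{
    /* Apply the recorded row interchanges to b, forming P^T b. */
    for (int k = 0; k < n; k++) {
        int p = ipiv[k];
        double t = b[k]; b[k] = b[p]; b[p] = t;
    }
    /* Forward substitution: solve Ly = P^T b using the unit lower triangle. */
    for (int k = 0; k < n; k++)
        for (int i = k + 1; i < n; i++)
            b[i] -= lu[i + k * n] * b[k];
    /* Back substitution: solve Ux = y using the upper triangle. */
    for (int k = n - 1; k >= 0; k--) {
        b[k] /= lu[k + k * n];
        for (int i = 0; i < k; i++)
            b[i] -= lu[i + k * n] * b[k];
    }
}

Note that the interchanges are applied to b in the same order in which they were generated during the factorization, and that both triangular solves sweep the factors by columns, which matches the assumed column-major storage.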

For the discussion of coding this algorithm, we present only the computationally intensive part of the process, which is step 1, the factorization of the matrix.


14.3. A Simple Version

For the first version, we present a straightforward implementation of LU factorization. It consists
