Online Book Reader

Home Category

Beautiful Code [200]

By Root 5083 0
deteriorates from divide-and-conquer to the tail kind. The behavior of such variations of the recursive algorithm can be studied alongside a Quicksort algorithm with various partitioning schemes of the sorted array.

Finally, we leave as an exercise to the reader to try to mimic the recursive code without using recursion and without explicitly handling the recursive call stack—an important problem to solve if the Fortran compiler cannot handle recursive functions or subroutines.

How Elegant Code Evolves with Hardware The Case of Gaussian Elimination > ScaLAPACK PDGETRF

14.7. ScaLAPACK PDGETRF

LAPACK is designed to be highly efficient on vector processors, high-performance "super-scalar" workstations, and shared-memory multiprocessors. LAPACK can also be used sat-isfactorily on all types of scalar machines (PCs, workstations, and mainframes). However, LAPACK in its present form is less likely to give good performance on other types of parallel architectures—for example, massively parallel Single Instruction Multiple Data (SIMD) machines, or Multiple Instruction Multiple Data (MIMD) distributed-memory machines. The ScaLAPACK effort was intended to adapt LAPACK to these new architectures.

By creating the ScaLAPACK software library, we extended the LAPACK library to scalable MIMD, distributed-memory, concurrent computers. For such machines, the memory hierarchy includes the off-processor memory of other processors, in addition to the hierarchy of registers, cache, and local memory on each processor.

Like LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. The fundamental building blocks of the ScaLAPACK library are distributed-memory versions of the Level-2 and Level-3 BLAS, and a set of Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks that arise frequently in parallel linear algebra computations. In the ScaLAPACK routines, all interprocessor communication occurs within the distributed BLAS and the BLACS, so the source code of the top software layer of ScaLAPACK looks very similar to that of LAPACK.

The ScaLAPACK solution to LU factorization is shown in Example 14-5.

Example 14-5. ScaLAPACK variant (Fortran 90 coding)

Code View: Scroll / Show All

SUBROUTINE PDGETRF( M, N, A, IA, JA, DESCA, IPIV, INFO )

INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,

$ LLD_, MB_, M_, NB_, N_, RSRC_

PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,

$ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,

$ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )

DOUBLE PRECISION ONE

PARAMETER ( ONE = 1.0D+0 )

CHARACTER COLBTOP, COLCTOP, ROWBTOP

INTEGER I, ICOFF, ICTXT, IINFO, IN, IROFF, J, JB, JN,

$ MN, MYCOL, MYROW, NPCOL, NPROW

INTEGER IDUM1( 1 ), IDUM2( 1 )

EXTERNAL BLACS_GRIDINFO, CHK1MAT, IGAMN2D, PCHK1MAT, PB_TOPGET,

$ PB_TOPSET, PDGEMM, PDGETF2, PDLASWP, PDTRSM, PXERBLA

INTEGER ICEIL

EXTERNAL ICEIL

INTRINSIC MIN, MOD

* Get grid parameters

ICTXT = DESCA( CTXT_ )

CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )

* Test the input parameters

INFO = 0

IF( NPROW.EQ.-1 ) THEN

INFO = -(600+CTXT_)

ELSE

CALL CHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, INFO )

IF( INFO.EQ.0 ) THEN

IROFF = MOD( IA-1, DESCA( MB_ ) )

ICOFF = MOD( JA-1, DESCA( NB_ ) )

IF( IROFF.NE.0 ) THEN

INFO = -4

ELSE IF( ICOFF.NE.0 ) THEN

INFO = -5

ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN

INFO = -(600+NB_)

END IF

END IF

CALL PCHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, 0, IDUM1, IDUM2, INFO )

END IF

IF( INFO.NE.0 ) THEN

CALL PXERBLA( ICTXT, 'PDGETRF', -INFO )

RETURN

END IF

IF( DESCA( M_ ).EQ.1 ) THEN

IPIV( 1 ) = 1

RETURN

ELSE IF( M.EQ.0 .OR. N.EQ.0 ) THEN

RETURN

END IF

* Split-ring topology for the communication along process rows

CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP )

CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP )

CALL PB_TOPGET( ICTXT, 'Combine', 'Columnwise', COLCTOP )

CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', 'S-ring' )

CALL PB_TOPSET( ICTXT,

Return Main Page Previous Page Next Page

®Online Book Reader