Beautiful Code [200]
Finally, we leave as an exercise to the reader to try to mimic the recursive code without using recursion and without explicitly handling the recursive call stack—an important problem to solve if the Fortran compiler cannot handle recursive functions or subroutines.
How Elegant Code Evolves with Hardware The Case of Gaussian Elimination > ScaLAPACK PDGETRF
14.7. ScaLAPACK PDGETRF
LAPACK is designed to be highly efficient on vector processors, high-performance "super-scalar" workstations, and shared-memory multiprocessors. LAPACK can also be used sat-isfactorily on all types of scalar machines (PCs, workstations, and mainframes). However, LAPACK in its present form is less likely to give good performance on other types of parallel architectures—for example, massively parallel Single Instruction Multiple Data (SIMD) machines, or Multiple Instruction Multiple Data (MIMD) distributed-memory machines. The ScaLAPACK effort was intended to adapt LAPACK to these new architectures.
By creating the ScaLAPACK software library, we extended the LAPACK library to scalable MIMD, distributed-memory, concurrent computers. For such machines, the memory hierarchy includes the off-processor memory of other processors, in addition to the hierarchy of registers, cache, and local memory on each processor.
Like LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. The fundamental building blocks of the ScaLAPACK library are distributed-memory versions of the Level-2 and Level-3 BLAS, and a set of Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks that arise frequently in parallel linear algebra computations. In the ScaLAPACK routines, all interprocessor communication occurs within the distributed BLAS and the BLACS, so the source code of the top software layer of ScaLAPACK looks very similar to that of LAPACK.
The ScaLAPACK solution to LU factorization is shown in Example 14-5.
Example 14-5. ScaLAPACK variant (Fortran 90 coding)
Code View: Scroll / Show All
SUBROUTINE PDGETRF( M, N, A, IA, JA, DESCA, IPIV, INFO )
INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
$ LLD_, MB_, M_, NB_, N_, RSRC_
PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
$ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
$ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
DOUBLE PRECISION ONE
PARAMETER ( ONE = 1.0D+0 )
CHARACTER COLBTOP, COLCTOP, ROWBTOP
INTEGER I, ICOFF, ICTXT, IINFO, IN, IROFF, J, JB, JN,
$ MN, MYCOL, MYROW, NPCOL, NPROW
INTEGER IDUM1( 1 ), IDUM2( 1 )
EXTERNAL BLACS_GRIDINFO, CHK1MAT, IGAMN2D, PCHK1MAT, PB_TOPGET,
$ PB_TOPSET, PDGEMM, PDGETF2, PDLASWP, PDTRSM, PXERBLA
INTEGER ICEIL
EXTERNAL ICEIL
INTRINSIC MIN, MOD
* Get grid parameters
ICTXT = DESCA( CTXT_ )
CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
* Test the input parameters
INFO = 0
IF( NPROW.EQ.-1 ) THEN
INFO = -(600+CTXT_)
ELSE
CALL CHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, INFO )
IF( INFO.EQ.0 ) THEN
IROFF = MOD( IA-1, DESCA( MB_ ) )
ICOFF = MOD( JA-1, DESCA( NB_ ) )
IF( IROFF.NE.0 ) THEN
INFO = -4
ELSE IF( ICOFF.NE.0 ) THEN
INFO = -5
ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN
INFO = -(600+NB_)
END IF
END IF
CALL PCHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, 0, IDUM1, IDUM2, INFO )
END IF
IF( INFO.NE.0 ) THEN
CALL PXERBLA( ICTXT, 'PDGETRF', -INFO )
RETURN
END IF
IF( DESCA( M_ ).EQ.1 ) THEN
IPIV( 1 ) = 1
RETURN
ELSE IF( M.EQ.0 .OR. N.EQ.0 ) THEN
RETURN
END IF
* Split-ring topology for the communication along process rows
CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP )
CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP )
CALL PB_TOPGET( ICTXT, 'Combine', 'Columnwise', COLCTOP )
CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', 'S-ring' )
CALL PB_TOPSET( ICTXT,