      CALL PB_TOPSET( ICTXT, 'Combine', 'Columnwise', ' ' )
*
*     Handle the first block of columns separately.
*
      MN = MIN( M, N )
      IN = MIN( ICEIL( IA, DESCA( MB_ ) )*DESCA( MB_ ), IA+M-1 )
      JN = MIN( ICEIL( JA, DESCA( NB_ ) )*DESCA( NB_ ), JA+MN-1 )
      JB = JN - JA + 1
*
*     Factor diagonal and subdiagonal blocks and test for exact
*     singularity.
*
      CALL PDGETF2( M, JB, A, IA, JA, DESCA, IPIV, INFO )
*
      IF( JB+1.LE.N ) THEN
*
*        Apply interchanges to columns JN+1:JA+N-1.
*
         CALL PDLASWP( 'Forward', 'Rows', N-JB, A, IA, JN+1, DESCA,
     $                 IA, IN, IPIV )
*
*        Compute block row of U.
*
         CALL PDTRSM( 'Left', 'Lower', 'No transpose', 'Unit', JB,
     $                N-JB, ONE, A, IA, JA, DESCA, A, IA, JN+1,
     $                DESCA )
*
         IF( JB+1.LE.M ) THEN
*
*           Update trailing submatrix.
*
            CALL PDGEMM( 'No transpose', 'No transpose', M-JB, N-JB,
     $                   JB, -ONE, A, IN+1, JA, DESCA, A, IA, JN+1,
     $                   DESCA, ONE, A, IN+1, JN+1, DESCA )
*
         END IF
      END IF
*
*     Loop over the remaining blocks of columns.
*
      DO 10 J = JN+1, JA+MN-1, DESCA( NB_ )
         JB = MIN( MN-J+JA, DESCA( NB_ ) )
         I = IA + J - JA
*
*        Factor diagonal and subdiagonal blocks and test for exact
*        singularity.
*
         CALL PDGETF2( M-J+JA, JB, A, I, J, DESCA, IPIV, IINFO )
*
         IF( INFO.EQ.0 .AND. IINFO.GT.0 )
     $      INFO = IINFO + J - JA
*
*        Apply interchanges to columns JA:J-1.
*
         CALL PDLASWP( 'Forward', 'Rowwise', J-JA, A, IA, JA, DESCA,
     $                 I, I+JB-1, IPIV )
*
         IF( J-JA+JB+1.LE.N ) THEN
*
*           Apply interchanges to columns J+JB:JA+N-1.
*
            CALL PDLASWP( 'Forward', 'Rowwise', N-J-JB+JA, A, IA,
     $                    J+JB, DESCA, I, I+JB-1, IPIV )
*
*           Compute block row of U.
*
            CALL PDTRSM( 'Left', 'Lower', 'No transpose', 'Unit',
     $                   JB, N-J-JB+JA, ONE, A, I, J, DESCA, A, I,
     $                   J+JB, DESCA )
*
            IF( J-JA+JB+1.LE.M ) THEN
*
*              Update trailing submatrix.
*
               CALL PDGEMM( 'No transpose', 'No transpose',
     $                      M-J-JB+JA, N-J-JB+JA, JB, -ONE, A, I+JB,
     $                      J, DESCA, A, I, J+JB, DESCA, ONE, A,
     $                      I+JB, J+JB, DESCA )
*
            END IF
         END IF
*
   10 CONTINUE
*
*     Combine the local INFO values: MN+1 stands in for "no zero
*     pivot found", so the global minimum is the index of the first
*     zero pivot if any process detected one.
*
      IF( INFO.EQ.0 )
     $   INFO = MN + 1
      CALL IGAMN2D( ICTXT, 'Rowwise', ' ', 1, 1, INFO, 1, IDUM1,
     $              IDUM2, -1, -1, MYCOL )
      IF( INFO.EQ.MN+1 )
     $   INFO = 0
*
*     Restore the communication topologies saved on entry.
*
      CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP )
      CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP )
      CALL PB_TOPSET( ICTXT, 'Combine', 'Columnwise', COLCTOP )
*
      RETURN
*
      END
In order to simplify the design of ScaLAPACK, and because the BLAS have proven to be very useful tools outside LAPACK, we chose to build a Parallel BLAS, or PBLAS (described in the paper by Choi et al. listed under "Further Reading"), whose interface is as similar to the BLAS as possible. This decision has permitted the ScaLAPACK code to be quite similar, and sometimes nearly identical, to the analogous LAPACK code.
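That similarity is visible in the code above. For comparison, the trailing-submatrix update in LAPACK's serial DGETRF reads essentially as follows (quoted, reindented, from LAPACK's dgetrf.f):
      IF( J+JB.LE.M ) THEN
*
*        Update trailing submatrix.
*
         CALL DGEMM( 'No transpose', 'No transpose', M-J-JB+1,
     $               N-J-JB+1, JB, -ONE, A( J+JB, J ), LDA,
     $               A( J, J+JB ), LDA, ONE, A( J+JB, J+JB ),
     $               LDA )
      END IF
Apart from the routine name, the parallel version changes only the operand descriptions: each (subarray, leading dimension) pair such as A( J+JB, J ), LDA becomes a quadruple such as A, I+JB, J, DESCA, naming the distributed matrix, a global row and column index, and the matrix's descriptor.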
It was our aim that the PBLAS would provide a distributed memory standard, just as the BLAS provided a shared memory standard. This would simplify and encourage the development of high-performance, portable parallel numerical software, and would give manufacturers only a small set of routines to optimize. The acceptance of the PBLAS requires reasonable compromises between the competing goals of functionality and simplicity.
The PBLAS operate on matrices distributed in a two-dimensional block cyclic layout. Because such a data layout requires many parameters to fully describe the distributed matrix, we have chosen a more object-oriented approach and encapsulated these parameters in an integer array called an array descriptor (a sketch of initializing one follows the list below). An array descriptor includes:
The descriptor type
The BLACS context (a virtual space for messages that is created to avoid collisions between logically distinct messages)
The number of rows in the distributed matrix
The number of columns in the distributed matrix
The row block size
The column block size
The process row over which the first row of the matrix is distributed
The process column over which the first column of the matrix is distributed
The leading dimension of the local array storing the local blocks
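To make the descriptor concrete, here is a minimal sketch, not taken from the chapter, of how one might be initialized with the ScaLAPACK TOOLS routine DESCINIT; the matrix size (1000 x 1000), block size (64 x 64), and 2 x 3 process grid are illustrative assumptions:
*     Illustrative sketch: build an array descriptor with DESCINIT.
*     Run on at least six processes so the 2 x 3 grid can be formed.
      PROGRAM DESCEG
      INTEGER            DESCA( 9 ), ICTXT, INFO, LLDA, MYCOL,
     $                   MYROW, NPCOL, NPROW
      INTEGER            NUMROC
      EXTERNAL           NUMROC
      INTRINSIC          MAX
*
*     Create a BLACS context holding a 2 x 3 process grid.
*
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', 2, 3 )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     The local leading dimension is the number of matrix rows that
*     land on this process row, given by the TOOLS function NUMROC.
*
      LLDA = MAX( 1, NUMROC( 1000, 64, MYROW, 0, NPROW ) )
*
*     DESCINIT validates its arguments and stores the nine entries
*     enumerated above; a negative INFO flags the faulty argument.
*
      CALL DESCINIT( DESCA, 1000, 1000, 64, 64, 0, 0, ICTXT, LLDA,
     $               INFO )
*
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
      END
DESCINIT does little more than check its arguments for consistency and store them in the nine descriptor entries, so a descriptor can also be filled in by hand.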
By using this descriptor, a call to a PBLAS routine is very similar to a call to the corresponding BLAS routine:
      CALL DGEMM ( TRANSA, TRANSB, M, N, K, ALPHA,
     $             A( IA, JA ), LDA, B( IB, JB ), LDB,
     $             BETA, C( IC, JC ), LDC )
becomes:
      CALL PDGEMM( TRANSA, TRANSB, M, N, K, ALPHA,
     $             A, IA, JA, DESCA, B, IB, JB, DESCB,
     $             BETA, C, IC, JC, DESCC )
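The difference is confined to how the matrix operands are described: the pair A( IA, JA ), LDA, which locates a submatrix within a local two-dimensional array, becomes the quadruple A, IA, JA, DESCA, in which the global indices IA and JA locate the submatrix within the distributed matrix and the descriptor carries the remaining distribution and storage details.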