High Performance Computing - Charles Severance [124]
While HPF, PVM, and MPI are all mature and popular technologies, it’s not clear whether any of these technologies will be the long-term solution that we will use 10 years from now. One possibility is that we will use FORTRAN 90 (or FORTRAN 95) without any data layout directives or that the directives will be optional. Another interesting possibility is simply to keep using FORTRAN 77. As scalable, cache-coherent, non-uniform memory systems become more popular, they will evolve their own data allocation primitives. For example, the HP/Exemplar supports the following data storage attributes: shared, node-private, and thread-private. As dynamic data structures are allocated, they can be placed in any one of these classes. Node-private memory is shared across all the threads on a single node but not shared beyond those threads. Perhaps we will only have to declare the storage class of the data but not the data layout for these new machines.
PVM and MPI still need the capability of supporting a fault-tolerant style of computing that allows an application to complete even as resources fail or new resources become available. The amount of compute power available to applications that can tolerate some unreliability in their resources will be very large. There have been a number of moderately successful attempts in this area, such as Condor, but none have really caught on in the mainstream.
When you are running the most powerful computers in the world at their absolute maximum performance levels, the need for portability is somewhat reduced. Making your particular application go ever faster and scale to ever higher numbers of processors is a fascinating activity. May the FLOPS be with you!
[65] There is another red-black approach that computes the even elements of the rod first and then the odd elements, in two passes. This approach has no data dependencies within each pass. The ROD array never holds all its values from the same time step: either the odd or the even values are one time step ahead of the other. The approach ends up accessing memory with a stride of two, which doubles the memory bandwidth required, but it does not double the memory storage needed to solve the problem.
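A minimal Python sketch of the two-pass scheme follows. It assumes the neighbor-averaging update rule and a hypothetical ten-cell rod whose end cells are held fixed at 100.0 and 0.0; the function name `red_black_step` is illustrative only. Both loops walk the array with a stride of two, and within each loop no iteration reads a value that the same loop writes, which is what makes each pass safe to split across processors.

```python
def red_black_step(rod):
    """One red-black sweep over a 1-D rod, updating the list in place.

    rod[0] and rod[-1] are fixed boundary values. Each interior cell is
    replaced by the average of its two neighbors (an assumed update rule).
    """
    n = len(rod)
    # Pass 1: even interior cells, stride two. Every read is of an odd cell
    # or a fixed end cell, so no iteration depends on another in this pass.
    for i in range(2, n - 1, 2):
        rod[i] = (rod[i - 1] + rod[i + 1]) / 2.0
    # Pass 2: odd interior cells, stride two. They read the even cells that
    # pass 1 just advanced, so the evens are one time step ahead here.
    for i in range(1, n - 1, 2):
        rod[i] = (rod[i - 1] + rod[i + 1]) / 2.0


# Hypothetical setup: ten cells, left end held at 100.0, right end at 0.0.
rod = [100.0] + [0.0] * 9
for _ in range(2000):
    red_black_step(rod)
print([round(x, 2) for x in rod])
```

Only one ROD array is needed; after enough sweeps the interior values settle onto a straight line between the two fixed end values.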
[66] There are other algorithmic approaches to solving partial differential equations, such as the "fast multipole method" that accelerates convergence "legally." Don't assume that the brute-force approach used here is the only method to solve this particular problem. Programmers should always look for the best available algorithm (parallel or not) before trying to scale up the "wrong" algorithm. For folks other than computer scientists, time to solution is more important than linear speed-up.
[67] One of the more significant efforts in the area of completely new languages is Streams and Iteration in a Single Assignment Language (SISAL). It's a data flow language that can easily integrate FORTRAN and C modules. The most interesting aspects of SISAL are the number of large computational codes that were ported to SISAL and the fact that the SISAL proponents generally compared their performance to the FORTRAN and C performance of the same applications.
[68] Just in case you are wondering, A*B gives you an element-wise multiplication of array members, not matrix multiplication. That is covered by a FORTRAN 90 intrinsic function, MATMUL.
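To make the distinction concrete, here is a small Python stand-in for what the two FORTRAN 90 operations compute on 2x2 arrays: A*B multiplies matching elements, while the MATMUL intrinsic forms row-by-column sums. Plain lists are used so no numeric library is assumed, and the function names are hypothetical.

```python
def elementwise_product(a, b):
    """What FORTRAN 90's A*B computes: C(i,j) = A(i,j) * B(i,j).

    Assumes both arrays have the same shape, as FORTRAN requires.
    """
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def matrix_product(a, b):
    """What MATMUL(A, B) computes: C(i,j) = sum over k of A(i,k) * B(k,j)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(elementwise_product(A, B))  # [[5, 12], [21, 32]]
print(matrix_product(A, B))       # [[19, 22], [43, 50]]
```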
[69] The distributed RC5 key-cracking effort was coordinated in this fashion. Each processor would check out a block of keys and begin testing those keys. At some point, if the processor was not fast enough or had crashed, the central system would reissue the block to another processor. This allowed the system to recover from problems on individual computers.
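The checkout-and-reissue scheme can be sketched as a small central coordinator. This is a minimal Python sketch under stated assumptions: the class `KeyBlockServer`, the fixed timeout, and the caller-supplied clock value `now` are all hypothetical and are not how the distributed RC5 effort actually implemented its server.

```python
class KeyBlockServer:
    """Hypothetical central coordinator that leases blocks of keys.

    A block not reported done within `timeout` time units is put back
    in the queue and reissued to the next processor that asks for work.
    """

    def __init__(self, num_blocks, timeout):
        self.num_blocks = num_blocks
        self.timeout = timeout
        self.pending = list(range(num_blocks))  # blocks waiting to be handed out
        self.leased = {}                        # block -> (worker, checkout time)
        self.done = set()

    def _reclaim_expired(self, now):
        # A processor that crashed or is too slow never reports back;
        # once its lease expires, its block becomes available again.
        for block, (_worker, start) in list(self.leased.items()):
            if now - start > self.timeout:
                del self.leased[block]
                self.pending.append(block)

    def check_out(self, worker, now):
        self._reclaim_expired(now)
        if not self.pending:
            return None
        block = self.pending.pop(0)
        self.leased[block] = (worker, now)
        return block

    def report_done(self, block):
        # A late result from the original processor is still accepted;
        # testing the same keys twice wastes time but does no harm.
        self.leased.pop(block, None)
        if block in self.pending:
            self.pending.remove(block)
        self.done.add(block)

    def finished(self):
        return len(self.done) == self.num_blocks


server = KeyBlockServer(num_blocks=3, timeout=10)
first = server.check_out("fast-pc", now=0)    # block 0
second = server.check_out("slow-pc", now=0)   # block 1; this machine then crashes
third = server.check_out("fast-pc", now=1)    # block 2
server.report_done(first)
server.report_done(third)
retry = server.check_out("fast-pc", now=20)   # block 1 reissued after the timeout
server.report_done(retry)
print(server.finished())  # True
```

The design keeps all recovery logic on the server: workers never need to coordinate with each other, and a lost block costs at most one timeout period before someone else picks it up.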
[70] As we shall soon see.
[71] Always a safe assumption.
[72] Notice I said “not that much more effort.”
[73] Note: the exact compilation commands may differ on your system.
[74] One should not diminish the positive contributions of PVM, however. PVM was the first widely available portable message-passing environment. PVM pioneered the idea of heterogeneous distributed computing with built-in format conversion.