High Performance Computing - Charles Severance [110]
!HPF$ DISTRIBUTE PLATE(*,BLOCK)
!HPF$ ALIGN SCALE(I) WITH PLATE(*,I)
Or:
DIMENSION PLATE(200,200),SCALE(200)
!HPF$ DISTRIBUTE PLATE(*,BLOCK)
!HPF$ ALIGN SCALE(:) WITH PLATE(*,:)
In both examples, the SCALE variable is allocated to the same processors as the corresponding columns of PLATE. The * and : syntax communicate the same information. When * is used, that dimension is collapsed and does not participate in the distribution. When : is used, that dimension follows the corresponding dimension of the variable that has already been distributed.
You could also specify the layout of the SCALE variable and have the PLATE variable "follow" the layout of the SCALE variable:
DIMENSION PLATE(200,200),SCALE(200)
!HPF$ DISTRIBUTE SCALE(BLOCK)
!HPF$ ALIGN PLATE(J,I) WITH SCALE(I)
You can put simple arithmetic expressions into the ALIGN directive subject to some limitations. Other directives include:
PROCESSORS Allows you to create a shape of the processor configuration that can be used to align other data structures.
REDISTRIBUTE and REALIGN Allow you to dynamically reshape data structures at runtime as the communication patterns change during the course of the run.
TEMPLATE Allows you to create an array that uses no space. Instead of distributing one data structure and aligning all the other data structures, some users will create and distribute a template and then align all of the real data structures to that template.
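As a sketch of the TEMPLATE approach (the template name FRAME is invented for illustration), the template is distributed once and the real arrays are aligned to it:

DIMENSION PLATE(200,200),SCALE(200)
!HPF$ TEMPLATE FRAME(200,200)
!HPF$ DISTRIBUTE FRAME(*,BLOCK)
!HPF$ ALIGN PLATE(J,I) WITH FRAME(J,I)
!HPF$ ALIGN SCALE(I) WITH FRAME(*,I)

Because FRAME occupies no storage, additional arrays can later be aligned to it without singling out one "real" array as the anchor of the layout.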
The use of directives can range from very simple to very complex. In some situations, you distribute the one large shared structure, align a few related structures and you are done. In other situations, programmers attempt to optimize communications based on the topology of the interconnection network (hypercube, multi-stage interconnection network, mesh, or toroid) using very detailed directives. They also might carefully redistribute the data at the various phases of the computation.
Hopefully your application will yield good performance without too much effort.
HPF control structures
While the HPF designers were in the midst of defining a new language, they set about improving on what they saw as limitations in FORTRAN 90. Interestingly, many of these modifications are being considered as part of the new FORTRAN 95 standard.
The FORALL statement allows the user to express simple iterative operations that apply to the entire array without resorting to a do-loop (remember, do-loops force order). For example:
FORALL (I=1:100, J=1:100) A(I,J) = I + J
This can be expressed in native FORTRAN 90 but it is rather ugly, counterintuitive, and prone to error.
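For comparison, one way to express the same computation in native FORTRAN 90, assuming A is a 100x100 array, is with array constructors and the SPREAD intrinsic:

      A = SPREAD( (/ (I, I=1,100) /), 2, 100 ) +
     &    SPREAD( (/ (J, J=1,100) /), 1, 100 )

The first SPREAD builds a 100x100 array whose element (I,J) is I, the second one whose element (I,J) is J; their sum matches the FORALL above, but the intent is far less obvious.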
Another control structure is the ability to declare a function as "PURE." A PURE function has no side effects other than through its parameters. The programmer is guaranteeing that a PURE function can execute simultaneously on many processors with no ill effects. This allows HPF to assume that it will only operate on local data and does not need any data communication during the duration of the function execution. The programmer can also declare which parameters of the function are input parameters, output parameters, and input-output parameters.
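A sketch of such a declaration (the function name and arguments are invented for illustration): the PURE keyword promises no side effects, and the INTENT attribute marks each parameter as input, output, or input-output:

      PURE FUNCTION DAMPEN(X, FACTOR)
        REAL, INTENT(IN) :: X, FACTOR
        REAL :: DAMPEN
        DAMPEN = X * FACTOR
      END FUNCTION DAMPEN

Because DAMPEN touches nothing but its INTENT(IN) arguments, every processor can invoke it on its local data with no communication.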
HPF intrinsics
The companies who marketed SIMD computers needed to come up with significant tools to allow efficient collective operations across all the processors. A perfect example of this is the SUM operation. To SUM the value of an array spread across N processors, the simplistic approach takes N steps. However, it is possible to accomplish it in log(N) steps using a technique called parallel-prefix-sum. By the time HPF was in development, a number of these operations had been identified and implemented. HPF took the opportunity to define standardized syntax for these operations.
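The idea behind the log(N) summation is pairwise combining: at each step, elements a stride apart are added, and the stride doubles. A serial sketch of the pattern (routine and variable names are invented; the array is overwritten as scratch space):

      SUBROUTINE TREESUM(X, N, TOTAL)
        INTEGER N, STRIDE, I
        REAL X(N), TOTAL
        STRIDE = 1
        DO WHILE (STRIDE .LT. N)
          ! EVERY ITERATION OF THIS INNER LOOP IS INDEPENDENT,
          ! SO EACH PASS CAN RUN IN PARALLEL ACROSS PROCESSORS
          DO I = 1, N - STRIDE, 2*STRIDE
            X(I) = X(I) + X(I+STRIDE)
          END DO
          STRIDE = 2 * STRIDE
        END DO
        TOTAL = X(1)
      END

The outer loop executes roughly log2(N) times, so with one processor per element the whole reduction finishes in log(N) parallel steps rather than N.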
A sample of these operations includes:
SUM_PREFIX Performs various types of parallel-prefix summations.
ALL_SCATTER Distributes a single value to a set of processors.
GRADE_DOWN Sorts into decreasing order.
IANY Computes the bitwise OR of a set of integer values.
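A sketch of how two of these intrinsics might be invoked (the HPF library routines are accessed through the HPF_LIBRARY module; the array names here are invented):

      USE HPF_LIBRARY
      INTEGER A(8), RUNNING(8), ORDER(8)
      RUNNING = SUM_PREFIX(A)   ! RUNNING(I) = A(1) + ... + A(I)
      ORDER = GRADE_DOWN(A, 1)  ! INDEX PERMUTATION SORTING A IN DECREASING ORDER

On a distributed array, the implementation is free to use the log-step techniques described above rather than a naive N-step scan.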
While