This paper is concerned with the automatic mapping of array computation to processors efficiently. One of the major overheads associated with the mapping of computation is load imbalance. An optimising compiler should find a mapping such that this overhead is minimised. This paper describes formally the mapping of loop iterations to processors so as to minimise load imbalance. The class of perfectly load balanced affine loops is defined whereby using unimodular transformations, it is shown that a large class of loops are equivalent. An algorithm is detailed which can determine both whether a loop structure may be load balanced, and the necessary transformations to do so. This algorithm has a worst case complexity of O(m^3) where m is the dimensionality of the iteration space. The analysis is extended to the case when many loops are to be partitioned where it is shown that a transformation may be constructed which simultaneously balances all appropriate loops. Finally it is shown that if a loop is perfectly load balanced, then there exists a transformation such that it may be placed outermost so as to aid partitioning.
|Title of host publication||Parallel Computing: From Theory to Sound Practice EWPC '92: European Workshops on Parallel Computing, Barcelona, March 1992.|
|Number of pages||12|
|Publication status||Published - 22 Mar 1992|
- Program Transformation