TY - GEN
T1 - MARS: A Distributed Memory Approach to Shared Memory Compilation
AU - O'Boyle, Michael F. P.
PY - 1998/9/24
Y1 - 1998/9/24
N2 - This paper describes an automatic parallelising compiler, MARS, targeted for shared memory machines. It uses a data partitioning approach, traditionally used for distributed memory machines, in order to globally reduce overheads such as communication and synchronisation. Its high-level linear algebraic representation allows direct application of, for instance, unimodular transformations and global application of data transformation. Although a data-based approach allows global analysis and in many instances outperforms local, loop-orientated parallelisation approaches, we have identified two particular problems when applying data parallelism to sequential Fortran 77 as opposed to data parallel dialects tailored to distributed memory targets. This paper describes two techniques to overcome these problems and evaluates their applicability. Preliminary results, on two SPECfp92 benchmarks, show that with these optimisations, MARS outperforms existing state-of-the-art loop-based auto-parallelisation approaches.
AB - This paper describes an automatic parallelising compiler, MARS, targeted for shared memory machines. It uses a data partitioning approach, traditionally used for distributed memory machines, in order to globally reduce overheads such as communication and synchronisation. Its high-level linear algebraic representation allows direct application of, for instance, unimodular transformations and global application of data transformation. Although a data-based approach allows global analysis and in many instances outperforms local, loop-orientated parallelisation approaches, we have identified two particular problems when applying data parallelism to sequential Fortran 77 as opposed to data parallel dialects tailored to distributed memory targets. This paper describes two techniques to overcome these problems and evaluates their applicability. Preliminary results, on two SPECfp92 benchmarks, show that with these optimisations, MARS outperforms existing state-of-the-art loop-based auto-parallelisation approaches.
U2 - 10.1007/3-540-49530-4_19
DO - 10.1007/3-540-49530-4_19
M3 - Conference contribution
SN - 978-3-540-65172-7
T3 - Lecture Notes in Computer Science
SP - 259
EP - 274
BT - Languages, Compilers, and Run-Time Systems for Scalable Computers
PB - Springer
ER -