Edinburgh Research Explorer

Compiling and Optimizing for Decoupled Architectures

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: Supercomputing, 1995. Proceedings of the IEEE/ACM SC95 Conference
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 40-40
Number of pages: 1
ISBN (Print): 0-89791-816-9
Publication status: Published - 1995

Abstract

Decoupled architectures provide a key to the problem of sustained supercomputer performance through their ability to hide large memory latencies. When a program executes in a decoupled mode, the perceived memory latency at the processor is zero; effectively, the entire physical memory has an access time equivalent to that of the processor's register file, and latency is completely hidden. However, the asynchronous functional units within a decoupled architecture must occasionally synchronize, incurring a high penalty. The goal of compiling and optimizing for decoupled architectures is to partition the program between the asynchronous functional units in such a way that latencies are hidden but synchronization events are executed infrequently. This paper describes a model for decoupled compilation and explains the effectiveness of compilation for decoupled systems. A number of new compiler optimizations are introduced and evaluated quantitatively using the Perfect Club scientific benchmarks. We show that, with a suitable repertoire of optimizations, it is possible to hide large latencies most of the time for most of the programs in the Perfect Club.
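
The trade-off described in the abstract, hidden latency versus costly synchronization, can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions and not the model from the paper: the function names `coupled_cycles` and `decoupled_cycles` are hypothetical, each operation is assumed to need one memory operand and one compute cycle, memory is assumed to be fully pipelined (one load issued per cycle), and a synchronization event is modelled as the access unit waiting for the execute unit before it issues the next load.

```python
def coupled_cycles(n_ops, latency):
    """Blocking processor (assumption): every operation waits the full
    memory latency for its operand, then computes for one cycle."""
    return n_ops * (latency + 1)


def decoupled_cycles(n_ops, latency, sync_every=None):
    """Toy decoupled access/execute model (assumption, not the paper's model).

    The access unit runs ahead and issues one load per cycle; the execute
    unit consumes operands in order, one cycle each. If sync_every is set,
    then after every sync_every-th operation the access unit must wait for
    the execute unit to finish before issuing the next load.
    """
    issue_time = 0  # cycle at which the access unit can issue the next load
    exec_free = 0   # cycle at which the execute unit is next free
    for i in range(n_ops):
        arrival = issue_time + latency      # operand arrives from memory
        start = max(arrival, exec_free)     # execute unit may have to wait
        exec_free = start + 1               # one compute cycle
        if sync_every and (i + 1) % sync_every == 0:
            issue_time = exec_free          # synchronization: run-ahead is lost
        else:
            issue_time += 1                 # access unit keeps running ahead
    return exec_free


if __name__ == "__main__":
    n, lat = 1000, 100
    print("coupled:               ", coupled_cycles(n, lat))
    print("decoupled, no sync:    ", decoupled_cycles(n, lat))
    print("decoupled, sync / 100: ", decoupled_cycles(n, lat, sync_every=100))
    print("decoupled, sync / 1:   ", decoupled_cycles(n, lat, sync_every=1))
```

In this toy model, a fully decoupled run pays the memory latency once, each synchronization pays it again, and synchronizing on every operation degenerates to the blocking case. That is the intuition behind partitioning a program so that synchronization events are executed infrequently.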

Research areas

  • Benchmarks, Compiling, Decoupled architecture, Optimization, Performance, Quantitative analysis, Computer architecture, Computer science, Costs, Delay, Distributed computing, Frequency synchronization, Optimizing compilers, Performance analysis, Registers, Supercomputers

ID: 18691069