A fast and accurate method for determining a lower bound on execution time

Grigori Fursin, Michael F. P. O'Boyle, Olivier Temam, G. Watts

Research output: Contribution to journal › Article › peer-review

Abstract

In performance-critical applications, memory latency is frequently the dominant overhead. In many cases, automatic compiler-based optimizations to improve memory performance are limited, and programmers frequently resort to manual optimization techniques. However, this process is tedious and time-consuming. Furthermore, as the potential benefit from optimization is unknown, there is no way to judge the amount of effort worth expending, nor when the process can stop, i.e., when optimal memory performance has been achieved or sufficiently approached. Architecture simulators can provide such information, but designing an accurate model of an existing architecture is difficult and simulation times are excessively long. In this article, we propose and implement a technique that is both fast and reasonably accurate for estimating a lower bound on execution time for scientific applications. This technique has been tested on a wide range of programs from the SPEC benchmark suite and two commercial applications, where it has been used to guide a manual optimization process and iterative compilation. We compare our technique with a simulator with ideal memory behaviour and demonstrate that our technique provides comparable information on memory performance and yet is over two orders of magnitude faster. We further show that our technique is considerably more accurate than hardware counters.
Original language: English
Pages (from-to): 271-292
Number of pages: 22
Journal: Concurrency and Computation: Practice and Experience
Volume: 16
Issue number: 2-3
Publication status: Published - 7 Jan 2004
