Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Nikolas Ioannou, Marcelo Cintra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Multi-core and many-core systems are the norm in contemporary processor technology and are expected to remain so for the foreseeable future. Programs using parallel programming primitives like PThreads or OpenMP often exploit coarse-grain parallelism, because it offers a good trade-off between programming effort and performance gain. Some parallel applications show limited or no scaling beyond a certain number of cores. Given the abundant number of cores expected in future many-cores, several cores would remain idle in such cases while execution performance stagnates. This paper proposes using cores that do not contribute to performance improvement for running implicit fine-grain speculative threads. In particular, we present a many-core architecture and protocol that allow applications with coarse-grain explicit parallelism to further exploit implicit speculative parallelism within each thread. Implicit speculative parallelism frees the programmer from the additional effort of explicitly partitioning the work into finer and properly synchronized tasks. Our results show that, for a many-core comprising 128 cores and supporting implicit speculative parallelism in clusters of 2 or 4 cores, performance improves on top of the highest scalability point by 41% on average for the 4-core cluster and by 27% on average for the 2-core cluster. These performance improvements come with an energy consumption that is close to -- and sometimes better than -- the baseline. This approach often leads to better performance and energy efficiency compared to existing alternatives such as Core Fusion and Frequency Boosting. We also investigate the trade-offs between explicit and implicit threads as input dataset sizes vary. Finally, we present a dynamic mechanism to choose the number of explicit and implicit threads, which performs within 6% of the static oracle selection of threads.
Original language: English
Title of host publication: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Place of Publication: New York, NY, USA
Publisher: ACM
Pages: 284-295
Number of pages: 12
ISBN (Print): 978-1-4503-1053-6
Publication status: Published - 2011

Publication series

Name: MICRO-44 '11
Publisher: ACM
