Abstract
Branch prediction is crucial for modern high-performance processors, ensuring efficient execution by anticipating branch outcomes. Despite decades of research, achieving high prediction accuracy remains challenging, particularly in server workloads, where large branch working sets and hard-to-predict branches are prevalent. State-of-the-art predictors, such as the 64KB TAGE-SC-L design, experience high misprediction rates on server workloads, with 3.6-20% (9.2% on average) of execution cycles wasted due to mispredictions on a modern server CPU. While more predictor capacity can reduce mispredictions by up to 36% in the limit (with infinite storage), realizing meaningful gains in practice requires hundreds of KBs of storage, which is infeasible for a latency- and area-sensitive in-core predictor.
This work introduces the Last-Level Branch Predictor (LLBP), a microarchitectural approach that improves branch prediction accuracy through additional high-capacity storage backing the baseline TAGE predictor. LLBP leverages the insight that branches requiring longer histories tend to span multiple program contexts – notionally, function calls. A given program context, which can be thought of as a call chain, localizes the branch prediction state, affording a small number of patterns per context even for hard-to-predict branches. LLBP predicts upcoming contexts and prefetches the associated branch metadata into a small in-core buffer, which is accessed in parallel with the unmodified TAGE predictor. Our results show that a 512KB LLBP backing 64KB TAGE-SC-L reduces MPKI by 0.5-25.9% (avg. 8.9%) over the baseline without LLBP.
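The idea of localizing prediction state by program context can be illustrated with a toy software model. The sketch below is not the LLBP hardware design: the class `ContextPatternStore`, its parameters `buffer_contexts` and `call_depth`, and the choice to form a context from the last few call sites are all assumptions made for illustration. It also simplifies the prefetching step by loading a context's patterns when the call chain changes, rather than predicting upcoming contexts ahead of time as the abstract describes.

```python
from collections import OrderedDict


class ContextPatternStore:
    """Toy model (hypothetical): a large per-context pattern store
    fronted by a small buffer, consulted alongside a baseline predictor."""

    def __init__(self, buffer_contexts=4, call_depth=2):
        self.backing = {}            # context id -> {(pc, history): taken}
        self.buffer = OrderedDict()  # small LRU buffer of loaded contexts
        self.buffer_contexts = buffer_contexts
        self.call_depth = call_depth
        self.call_chain = []         # call-site addresses currently active

    def context_id(self):
        # Assumption: a context is the most recent `call_depth` call sites.
        return tuple(self.call_chain[-self.call_depth:])

    def on_call(self, call_site):
        self.call_chain.append(call_site)
        self.prefetch(self.context_id())

    def on_return(self):
        if self.call_chain:
            self.call_chain.pop()
        self.prefetch(self.context_id())

    def prefetch(self, ctx):
        # Bring the context's patterns into the small buffer (LRU eviction).
        patterns = self.backing.setdefault(ctx, {})
        self.buffer[ctx] = patterns
        self.buffer.move_to_end(ctx)
        while len(self.buffer) > self.buffer_contexts:
            self.buffer.popitem(last=False)

    def predict(self, pc, history, baseline_prediction):
        # Use a buffered pattern if one matches; otherwise fall back to
        # the baseline predictor's answer (standing in for TAGE here).
        patterns = self.buffer.get(self.context_id())
        if patterns is not None and (pc, history) in patterns:
            return patterns[(pc, history)]
        return baseline_prediction

    def update(self, pc, history, taken):
        # Record the resolved outcome under the current context.
        ctx = self.context_id()
        self.backing.setdefault(ctx, {})[(pc, history)] = taken
```

In this model the same branch address and history can map to different outcomes under different call chains, which is the sense in which a context keeps the number of patterns it must hold small.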
Original language | English |
---|---|
Title of host publication | 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) |
Editors | Lisa O'Conner |
Place of Publication | Piscataway, NJ, USA |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 464-479 |
Number of pages | 16 |
ISBN (Electronic) | 9798350350579 |
ISBN (Print) | 9798350350586 |
Publication status | Published - 3 Dec 2024 |
Event | The 57th IEEE/ACM International Symposium on Microarchitecture, AT&T Hotel and Conference Center, Austin, United States; Duration: 2 Nov 2024 → 6 Nov 2024; Conference number: 57; https://microarch.org/micro57/index.php |
Publication series
Name | Proceedings of the Annual International Symposium on Microarchitecture |
---|---|
Publisher | IEEE |
ISSN (Print) | 1072-4451 |
ISSN (Electronic) | 2379-3155 |
Symposium
Symposium | The 57th IEEE/ACM International Symposium on Microarchitecture |
---|---|
Abbreviated title | MICRO 2024 |
Country/Territory | United States |
City | Austin |
Period | 2 Nov 2024 → 6 Nov 2024 |