Abstract
Many contemporary applications feature multimegabyte instruction footprints that overwhelm the capacity of branch target buffers (BTBs) and instruction caches (L1-I), causing frequent front-end stalls that inevitably hurt performance. BTB capacity is crucial for performance, as a sufficiently large BTB enables the front-end to accurately resolve the upcoming execution path and steer instruction fetch appropriately. It also enables highly effective fetch-directed instruction prefetching that can eliminate a large portion of L1-I misses. For these reasons, commercial processors allocate vast amounts of storage capacity to BTBs.
This work aims to reduce BTB storage requirements by optimizing the organization of BTB entries. Our key insight is that storing branch target offsets, instead of full or compressed targets, can drastically reduce BTB storage cost, as the vast majority of dynamic branches have short offsets requiring just a handful of bits to encode. Based on this insight, we size the ways of a set-associative BTB to hold different numbers of target offset bits such that each way stores offsets within a particular range. Doing so enables a dramatic reduction in storage for target addresses. Our final design, called BTB-X, uses an 8-way set-associative BTB with differently sized ways that enables it to track about 2.24x more branches than a conventional BTB and 1.3x more branches than a storage-optimized state-of-the-art BTB organization, called PDede, with the same storage budget.
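The idea of sizing each way for a particular offset range can be illustrated with a minimal Python sketch. It is not the BTB-X implementation: the abstract does not state the per-way widths or the exact offset encoding, so the widths in `WAY_OFFSET_BITS`, the signed two's-complement distance used as the offset, and the helper names `offset_bits_needed` and `eligible_ways` are all assumptions made for illustration.

```python
# Hypothetical offset-field widths (in bits) for the 8 ways of one set.
# These values are illustrative assumptions, not the widths used by BTB-X.
WAY_OFFSET_BITS = (4, 6, 8, 10, 12, 16, 24, 32)


def offset_bits_needed(branch_pc: int, target: int) -> int:
    """Bits needed to hold (target - branch_pc) as a two's-complement value.

    Assumption: the offset is a signed distance; the paper's encoding may differ.
    """
    offset = target - branch_pc
    if offset >= 0:
        # Magnitude bits plus one sign bit.
        return offset.bit_length() + 1
    # For negative values, ~offset is the non-negative magnitude pattern.
    return (~offset).bit_length() + 1


def eligible_ways(branch_pc: int, target: int) -> list[int]:
    """Indices of the ways whose offset field is wide enough for this branch."""
    need = offset_bits_needed(branch_pc, target)
    return [way for way, bits in enumerate(WAY_OFFSET_BITS) if bits >= need]


if __name__ == "__main__":
    # A short forward branch fits in most ways; a very distant target fits in none.
    print(eligible_ways(0x401000, 0x401010))
    print(eligible_ways(0x401000, 0x401000 + (1 << 40)))
```

Under these assumed widths, a short branch can be placed in almost any way, while longer offsets are restricted to the wider ways. That is the property that lets most entries be narrow without giving up coverage of distant targets.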
Original language | English |
---|---|
Title of host publication | Proceedings of The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1153-1167 |
Publication status | Published - 24 Mar 2023 |
Event | The 29th IEEE International Symposium on High-Performance Computer Architecture, 2023, Montreal, Canada; Duration: 25 Feb 2023 → 1 Mar 2023; Conference number: 29; https://hpca-conf.org/2023/ |
Publication series
Name | IEEE Symposium on High-Performance Computer Architecture |
---|---|
Publisher | IEEE |
ISSN (Print) | 1530-0897 |
ISSN (Electronic) | 2378-203X |
Conference
Conference | The 29th IEEE International Symposium on High-Performance Computer Architecture, 2023 |
---|---|
Abbreviated title | HPCA 2023 |
Country/Territory | Canada |
City | Montreal |
Period | 25/02/23 → 1/03/23 |