Prefetched Address Translation

Artemiy Margaritov, Dmitrii Ustiugov, Edouard Bugnion, Boris Grot

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

With explosive growth in dataset sizes and increasing machine memory capacities, per-application memory footprints are commonly reaching into hundreds of GBs. Such huge datasets pressure the TLB, resulting in frequent misses that must be resolved through a page walk – a long-latency pointer chase through multiple levels of the in-memory radix tree-based page table.

Anticipating further growth in dataset sizes and their adverse effect on TLB hit rates, this work seeks to accelerate page walks while fully preserving existing virtual memory abstractions and mechanisms – a must for software compatibility and generality. Our idea is to enable direct indexing into a given level of the page table, thus eliding the need to first fetch pointers from the preceding levels. A key contribution of our work is in showing that this can be done by simply ordering the pages containing the page table in physical memory to match the order of the virtual memory pages they map to. Doing so enables direct indexing into the page table using base-plus-offset arithmetic.
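The base-plus-offset arithmetic can be sketched as follows. This is a minimal illustration under our own assumption that the pages holding the leaf level of the page table are laid out contiguously in physical memory in the same order as the virtual pages they map; the function and variable names are ours, not the paper's.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages */
#define PTE_SIZE   8   /* one 64-bit entry per mapped page */

/* Hypothetical sketch: if leaf page-table pages are contiguous and
 * ordered to match the virtual pages they map, the physical address
 * of a VA's leaf entry follows directly from the virtual page number,
 * with no pointer fetches from the upper levels. */
static uint64_t leaf_pte_addr(uint64_t leaf_base, uint64_t va) {
    uint64_t vpn = va >> PAGE_SHIFT;    /* virtual page number */
    return leaf_base + vpn * PTE_SIZE;  /* base plus offset */
}
```

For example, with a (made-up) leaf base of 0x100000, the entry for virtual address 0x5000 (virtual page 5) would sit at 0x100000 + 5 * 8 = 0x100028 – a single computed address rather than a multi-level chase.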

We introduce Address Translation with Prefetching (ASAP), a new approach for reducing the latency of address translation to a single access to the memory hierarchy. Upon a TLB miss, ASAP launches prefetches to the deeper levels of the page table, bypassing the preceding levels. These prefetches happen concurrently with a conventional page walk, which observes a latency reduction due to prefetching while guaranteeing that only correctly predicted entries are consumed. ASAP requires minimal extensions to the OS and trivial microarchitectural support. Moreover, ASAP is fully legacy-preserving, requiring no modifications to the existing radix tree-based page table, TLBs, and other software and hardware mechanisms for address translation. Our evaluation on a range of memory-intensive workloads shows that under SMT colocation, ASAP is able to reduce page walk latency by an average of 25% (42% max) in native execution, and 45% (55% max) under virtualization.
Original language: English
Title of host publication: The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), October 12–16, 2019, Columbus, OH, USA
Publisher: Association for Computing Machinery
Number of pages: 14
ISBN (Electronic): 978-1-4503-6938-1
Publication status: Published - 12 Oct 2019
Event: 52nd IEEE/ACM International Symposium on Microarchitecture - Columbus, United States
Duration: 12 Oct 2019 – 16 Oct 2019


Conference: 52nd IEEE/ACM International Symposium on Microarchitecture
Abbreviated title: MICRO 2019
Country/Territory: United States

Keywords

  • virtual memory
  • microarchitecture
  • virtualization

