Abstract
Serverless computing has emerged as a widely used paradigm for running services in the cloud. In serverless, developers organize their applications as a set of functions, which are invoked on demand in response to events, such as an HTTP request. To avoid the long start-up delays of launching a new function instance, cloud providers tend to keep recently triggered instances idle (or warm) for some time after the most recent invocation in anticipation of future invocations. Thus, at any given moment on a server, there may be thousands of warm instances of various functions whose executions are interleaved in time based on incoming invocations.
This paper observes that (1) there is a high degree of interleaving among warm instances on a given server; (2) individual warm functions are invoked relatively infrequently, often at the granularity of seconds or minutes; and (3) many function invocations complete within a few milliseconds. Interleaved execution of rarely invoked functions on a server leads to thrashing of each function's microarchitectural state between invocations. Meanwhile, the short execution time of a function impedes amortization of the warm-up latency of the cache hierarchy, causing a 31-114% increase in CPI compared to execution with warm microarchitectural state. We identify on-chip misses for instructions as a major contributor to the performance loss. In response, we propose Jukebox, a record-and-replay instruction prefetcher specifically designed to reduce the start-up latency of warm function instances. Jukebox requires just 32KB of metadata per function instance and boosts performance by an average of 18.7% for a wide range of functions, which translates into a corresponding throughput improvement.
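To illustrate the record-and-replay idea described in the abstract, here is a minimal sketch (not the paper's actual hardware design): during one invocation, the prefetcher records the sequence of instruction-cache block addresses that miss, deduplicated and capped by a fixed metadata budget; on the next invocation of the same warm instance, that log is replayed as prefetches. The 64-byte block size, 4-byte entry size, and in-order replay are illustrative assumptions, chosen only so the 32KB-per-instance budget from the abstract maps to a concrete entry count.

```python
BLOCK_SIZE = 64            # cache-block size in bytes (assumption)
METADATA_BYTES = 32 * 1024  # per-instance budget, as stated in the abstract
ENTRY_BYTES = 4             # compressed block-address entry (assumption)
MAX_ENTRIES = METADATA_BYTES // ENTRY_BYTES  # 8192 recordable blocks


class RecordReplayPrefetcher:
    """Toy model of a record-and-replay instruction prefetcher."""

    def __init__(self):
        self.log = []        # ordered list of recorded block addresses
        self.seen = set()    # dedup filter so each block is logged once

    def record_miss(self, fetch_addr):
        """Record an instruction-fetch miss, subject to the metadata cap."""
        block = (fetch_addr // BLOCK_SIZE) * BLOCK_SIZE
        if block not in self.seen and len(self.log) < MAX_ENTRIES:
            self.seen.add(block)
            self.log.append(block)

    def replay(self):
        """Return the block addresses to prefetch at the next invocation."""
        return list(self.log)


# Usage: misses from a first invocation are logged once per block...
pf = RecordReplayPrefetcher()
for addr in [0x1000, 0x1004, 0x2040, 0x1008]:  # 0x1004/0x1008 share 0x1000's block
    pf.record_miss(addr)
print(pf.replay())  # [4096, 8256]
```

The cap matters: once `MAX_ENTRIES` blocks are logged, further misses are dropped rather than evicting earlier entries, keeping the per-instance metadata at a fixed 32KB regardless of the function's footprint.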
Original language | English |
---|---|
Title of host publication | Proceedings of ACM/IEEE International Symposium on Computer Architecture (ISCA) 2022 |
Publisher | ACM |
Pages | 757-770 |
Number of pages | 14 |
ISBN (Print) | 978-1-4503-8610-4 |
DOIs | |
Publication status | Published - 18 Jun 2022 |
Event | The 49th International Symposium on Computer Architecture (ISCA) - New York City, United States. Duration: 18 Jun 2022 → 22 Jun 2022. Conference number: 49. https://iscaconf.org/isca2022/ |
Conference
Conference | The 49th International Symposium on Computer Architecture (ISCA) |
---|---|
Abbreviated title | ISCA 2022 |
Country/Territory | United States |
City | New York City |
Period | 18/06/22 → 22/06/22 |
Internet address | |
Keywords
- Serverless
- characterization
- microarchitecture
- instruction prefetching