Abstract
As AI models grow exponentially in size, memory has emerged as a critical bottleneck for inference at scale. While hardware solutions like Compute Express Link (CXL) promise to solve the problem of memory capacity and sharing, they require capital investment and are not widely available. This paper presents RMAI, an in-kernel remote shared memory framework tailored for AI inference workloads, offering a transparent, scalable, and cost-effective software alternative to hardware-based memory expansion and sharing solutions. By leveraging the operating system’s capabilities, RMAI introduces dynamic virtual memory regions that reduce page faults, minimize the overhead of user-kernel transitions, and optimize data locality for inference workloads. We focus in particular on Mixture-of-Experts (MoE) models. In an initial evaluation, we demonstrate that RMAI achieves performance comparable to CXL-like architectures, with up to 10x faster expert switching and reduced memory management overhead across large-scale inference tasks compared to disk-based solutions. This work redefines the role of remote shared memory in AI systems, positioning it as a practical and high-performance solution for memory capacity and sharing in modern data centers.
Original language | English |
---|---|
Title of host publication | The 5th Workshop on Machine Learning and Systems (EuroMLSys ’25) |
Place of Publication | New York, NY, USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1-10 |
Number of pages | 10 |
DOIs | |
Publication status | Accepted/In press - 25 Feb 2025 |
Event | The 5th Workshop on Machine Learning and Systems - Rotterdam, Netherlands; Duration: 31 Mar 2025 → 31 Mar 2025; Conference number: 5; https://euromlsys.eu/# |
Workshop
Workshop | The 5th Workshop on Machine Learning and Systems |
---|---|
Abbreviated title | EuroMLSys 2025 |
Country/Territory | Netherlands |
City | Rotterdam |
Period | 31/03/25 → 31/03/25 |
Internet address |
Keywords / Materials (for Non-textual outputs)
- memory disaggregation
- distributed shared memory
- mixture of experts
- kernel-level memory management
- RDMA
- Compute Express Link