NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-Kamran, B. Grot, B. Falsafi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Scale-out server workloads benefit from many-core processor organizations that enable high throughput thanks to abundant request-level parallelism. A key characteristic of these workloads is the large instruction footprint that exceeds the capacity of private caches. While a shared last-level cache (LLC) can capture the instruction working set, it necessitates a low-latency interconnect fabric to minimize the core stall time on instruction fetches serviced by the LLC. Many-core processors with a mesh interconnect sacrifice performance on scale-out workloads due to NOC-induced delays. Low-diameter topologies can overcome the performance limitations of meshes through rich inter-node connectivity, but at a high area expense. To address the drawbacks of existing designs, this work introduces NOC-Out, a many-core processor organization that affords low LLC access delays at a small area cost. NOC-Out is tuned to accommodate the bilateral core-to-cache access pattern, characterized by minimal coherence activity and lack of inter-core communication, that is dominant in scale-out workloads. Optimizing for the bilateral access pattern, NOC-Out segregates cores and LLC banks into distinct network regions and reduces costly network connectivity by eliminating the majority of inter-core links. NOC-Out further simplifies the interconnect through the use of low-complexity tree-based topologies. A detailed evaluation targeting a 64-core CMP and a set of scale-out workloads reveals that NOC-Out improves system performance by 17% and reduces network area by 28% over a tiled mesh-based design. Compared to a design with a richly-connected flattened butterfly topology, NOC-Out reduces network area by 9× while matching the performance.
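To make the mesh versus low-diameter contrast concrete, the following is a minimal sketch that compares average hop counts for core-to-LLC traffic on a 64-node chip. It is not the paper's evaluation methodology. It assumes an 8×8 tiled layout in which every tile holds one core and one LLC bank, uniformly interleaved addresses (so every bank is an equally likely destination), and a flattened butterfly without concentration in which each node links directly to every node in its row and column. The function names are hypothetical, and hop count is only a rough proxy for latency; it says nothing about area.

```python
from itertools import product


def mesh_hops(src, dst):
    """Hops in a 2D mesh with minimal routing: the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def fbfly_hops(src, dst):
    """Hops when each node links to all nodes in its row and column
    (assumed flattened butterfly, no concentration): one hop per
    dimension in which the coordinates differ."""
    return int(src[0] != dst[0]) + int(src[1] != dst[1])


def average_hops(hop_fn, side=8):
    """Average hops over all (core tile, LLC bank tile) pairs, assuming
    every bank is an equally likely destination for every core."""
    nodes = list(product(range(side), repeat=2))
    total = sum(hop_fn(s, d) for s in nodes for d in nodes)
    return total / (len(nodes) ** 2)


if __name__ == "__main__":
    print("mesh average hops:", average_hops(mesh_hops))
    print("flattened butterfly average hops:", average_hops(fbfly_hops))
```

Under these assumptions the mesh averages 5.25 hops per LLC access and the flattened butterfly 1.75, which illustrates why a low-diameter fabric shortens instruction-fetch round trips; the extra links that buy this reduction are the source of the area expense noted above.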
Original language: English
Title of host publication: Microarchitecture (MICRO), 2012 45th Annual IEEE/ACM International Symposium on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 11
ISBN (Print): 978-1-4673-4819-5
Publication status: Published - 1 Dec 2012


  • cache storage
  • instruction sets
  • network topology
  • network-on-chip
  • LLC access delays
  • LLC banks
  • NOC-Out
  • NOC-induced delays
  • bilateral access pattern
  • bilateral core-to-cache access pattern
  • coherence activity
  • core stall time
  • instruction working set
  • intercore communication
  • intercore links
  • internode connectivity
  • low-complexity tree-based topologies
  • low-diameter topologies
  • low-latency interconnect fabric
  • many-core processor organizations
  • mesh interconnect
  • network connectivity
  • private cache capacity
  • request-level parallelism
  • richly-connected flattened butterfly topology
  • scale-out processor
  • scale-out server workloads
  • shared LLC
  • shared last-level cache
  • tiled mesh-based design


