Transmuter: Bridging the Efficiency Gap using Memory and Dataflow Reconfiguration

Subhankar Pal, Siying Feng, Dong-hyeon Park, Sung Kim, Aporva Amarnath, Chi-Sheng Yang, Xin He, Jonathan Beaumont, Kyle May, Yan Xiong, Kuba Kaszyk, John Magnus Morton, Jiawen Sun, Michael O'Boyle, Murray Cole, Chaitali Chakrabarti, David Blaauw, Hun-Seok Kim, Trevor Mudge, Ronald Dreslinski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

With the end of Dennard scaling and Moore’s law, it is becoming increasingly difficult to build hardware for emerging applications that meet power and performance targets, while remaining flexible and programmable for end users. This is particularly true for domains that have frequently changing algorithms and applications involving mixed sparse/dense data structures, such as those in machine learning and graph analytics. To overcome this, we present a flexible accelerator called Transmuter, in a novel effort to bridge the gap between General-Purpose Processors (GPPs) and Application-Specific Integrated Circuits (ASICs). Transmuter adapts to changing kernel characteristics, such as data reuse and control divergence, through the ability to reconfigure the on-chip memory type, resource sharing and dataflow at run-time within a short latency. This is facilitated by a fabric of light-weight cores connected to a network of reconfigurable caches and crossbars. Transmuter addresses a rapidly growing set of algorithms exhibiting dynamic data movement patterns, irregularity, and sparsity, while delivering GPU-like efficiencies for traditional dense applications. Finally, in order to support programmability and ease-of-adoption, we prototype a software stack composed of low-level runtime routines, and a high-level language library called TransPy, that cater to expert programmers and end-users, respectively.

Our evaluations with Transmuter demonstrate average throughput (energy-efficiency) improvements of 5.0× (18.4×) and 4.2× (4.0×) over a high-end CPU and GPU, respectively, across a diverse set of kernels predominant in graph analytics, scientific computing and machine learning. Transmuter achieves energy-efficiency gains averaging 3.4× and 2.0× over prior FPGA and CGRA implementations of the same kernels, while remaining on average within 9.3× of state-of-the-art ASICs.
Original language: English
Title of host publication: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
Publisher: ACM Association for Computing Machinery
Number of pages: 16
ISBN (Electronic): 9781450380751
Publication status: Published - 30 Sept 2020
Event: 29th International Conference on Parallel Architectures and Compilation Techniques - Virtual conference
Duration: 3 Oct 2020 to 7 Oct 2020


Conference: 29th International Conference on Parallel Architectures and Compilation Techniques
Abbreviated title: PACT 2020
City: Virtual conference

Keywords / Materials (for Non-textual outputs)

  • reconfigurable architectures
  • memory reconfiguration
  • dataflow reconfiguration
  • hardware acceleration
  • general-purpose acceleration

