High-Performance Generalized Tensor Operations: A Compiler-Oriented Approach

Roman Gareev, Tobias Grosser, Michael Kruse

Research output: Contribution to journal › Article › peer-review


The efficiency of tensor contraction is of great importance. Compilers cannot optimize it well enough to come close to the performance of expert-tuned implementations. All existing approaches that provide competitive performance require optimized external code. We introduce a compiler optimization that reaches the performance of optimized BLAS libraries without the need for an external implementation or automatic tuning. Our approach provides competitive performance across hardware architectures and can be generalized to deliver the same benefits for algebraic path problems. By making fast linear algebra kernels available to everyone, we expect productivity increases when optimized libraries are not available.
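To make the abstract's generalization concrete, the following is a minimal sketch of the computational pattern it refers to, not the paper's compiler optimization. It assumes that the algebraic path problems in question can be written as a matrix product in which addition and multiplication are replaced by another pair of operators, such as min and + for shortest paths. The function name `generalized_matmul` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]


def generalized_matmul(
    a: Matrix,
    b: Matrix,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
    zero: float,
) -> Matrix:
    """Naive triple loop: C[i][j] = add-reduction over k of mul(A[i][k], B[k][j]).

    `zero` must be the identity element of `add` (hypothetical helper,
    written for illustration; no blocking, packing, or vectorization).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[zero] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = zero
            for k in range(inner):
                acc = add(acc, mul(a[i][k], b[k][j]))
            c[i][j] = acc
    return c


# Ordinary matrix-matrix multiplication: (+, *) with identity 0.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
plain = generalized_matmul(a, b, lambda x, y: x + y, lambda x, y: x * y, 0.0)

# Shortest paths using at most two edges: (min, +) with identity infinity.
INF = float("inf")
dist = [
    [0.0, 4.0, INF],
    [INF, 0.0, 1.0],
    [2.0, INF, 0.0],
]
two_hop = generalized_matmul(dist, dist, min, lambda x, y: x + y, INF)
```

Both calls run the same loop nest and differ only in the operator pair, which is why an optimization that recognizes the loop structure of matrix multiplication could, in principle, carry over to the second case. A straightforward loop nest like this one is the kind of code that typically falls well short of a tuned BLAS routine.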
Original language: English
Article number: 34
Pages (from-to): 34:1-34:27
Number of pages: 27
Journal: ACM Transactions on Architecture and Code Optimization
Issue number: 3
Publication status: Published - 4 Sep 2018


  • Tensor contractions
  • high-performance computing
  • matrix-matrix multiplication

