DelayRepay: Delayed Execution for Kernel Fusion in Python

John Magnus Morton, Kuba Kaszyk, Lu Li, Jiawen Sun, Christophe Dubach, Michel Steuwer, Murray Cole, Michael F P O'Boyle

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Python is a popular, dynamic language for data science and scientific computing. To ensure efficiency, significant numerical libraries are implemented in static native languages. However, performance suffers when switching between native and non-native code, especially if data has to be converted between native arrays and Python data structures. As GPU accelerators are increasingly used, this problem becomes particularly acute: data and control have to be repeatedly transferred between the accelerator and the host.

In this paper, we present DelayRepay, a delayed execution framework for numeric Python programs. It avoids excessive switching and data transfer by using lazy evaluation and kernel fusion. Using DelayRepay, operations on NumPy arrays are executed lazily, allowing multiple calls to accelerator kernels to be fused together dynamically. DelayRepay is available as a drop-in replacement for existing Python libraries. This approach enables significant performance improvement over the state of the art and is invisible to the application programmer. We show that our approach provides a maximum 377× speedup over NumPy, a 409% increase over the state of the art.
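
The idea of lazy evaluation followed by fusion can be illustrated with a minimal sketch. This is not DelayRepay's implementation or API: the `Lazy` class, the `array` helper, and the `evaluate` method are hypothetical names chosen for illustration, plain Python lists stand in for NumPy arrays, and a generated Python lambda stands in for a GPU kernel. Arithmetic operators only record an expression tree; when a result is requested, the whole tree is compiled into one function and applied in a single pass, so no intermediate arrays are created.

```python
class Lazy:
    """Node in a deferred elementwise expression (hypothetical, for illustration)."""

    def __init__(self, op, args=(), data=None):
        self.op, self.args, self.data = op, args, data

    def _binary(self, symbol, other):
        # Wrap plain Python scalars so every operand is a tree node.
        if not isinstance(other, Lazy):
            other = Lazy("const", data=other)
        return Lazy(symbol, (self, other))

    # Operators record the operation instead of computing it.
    def __add__(self, other): return self._binary("+", other)
    def __sub__(self, other): return self._binary("-", other)
    def __mul__(self, other): return self._binary("*", other)

    def evaluate(self):
        """Fuse the whole tree into one function and run it in a single loop."""
        inputs = []
        body = _emit(self, inputs)
        params = ", ".join(f"x{i}" for i in range(len(inputs)))
        # Dynamic compilation: one "kernel" for the entire expression.
        kernel = eval(f"lambda {params}: {body}")
        return [kernel(*row) for row in zip(*inputs)]


def array(values):
    """Create a leaf node holding concrete data."""
    return Lazy("array", data=list(values))


def _emit(node, inputs):
    """Return source text for `node`, collecting leaf arrays as kernel inputs."""
    if node.op == "array":
        inputs.append(node.data)
        return f"x{len(inputs) - 1}"
    if node.op == "const":
        return repr(node.data)
    left, right = (_emit(arg, inputs) for arg in node.args)
    return f"({left} {node.op} {right})"


a = array([1.0, 2.0, 3.0])
b = array([4.0, 5.0, 6.0])
expr = a * b + a * 2      # builds a tree only; no arithmetic happens yet
result = expr.evaluate()  # one fused pass over the inputs
```

An eager library would run three separate elementwise operations here and allocate two temporaries; in the sketch, the expression is evaluated by a single generated function. A real system would additionally need to handle shapes, broadcasting, data types, and device memory, none of which this sketch attempts.
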
Original language: English
Title of host publication: Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages (DLS ’20)
Publisher: ACM Association for Computing Machinery
Number of pages: 14
Publication status: Accepted/In press - 21 Aug 2020
Event: 16th ACM SIGPLAN International Symposium on Dynamic Languages - Chicago, United States
Duration: 17 Nov 2020 → 17 Nov 2020


Conference: 16th ACM SIGPLAN International Symposium on Dynamic Languages
Abbreviated title: DLS 2020
Country/Territory: United States


  • delayed evaluation
  • code fusion
  • dynamic compilation
  • GPU

