Abstract
The simplicity of Python and its rich set of libraries have made it the most popular language for data science. Moreover, Python's interpreted nature offers developers an easy debugging experience. However, this comes at the price of poor performance compared to compiled code. In this paper, we adopt and extend state-of-the-art research in query compilers to propose an efficient query engine embedded in Python. Our open-source framework enables developers to debug in Python while easily building a compiled version of the code for deployment. Our benchmark results on the entire set of TPC-H queries show that our approach covers different types of relational workloads and is competitive with state-of-the-art in-memory engines in both single- and multi-threaded settings.
Original language | English |
---|---|
Title of host publication | Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction |
Editors | Clark Verbrugge, Ondrej Lhotak, Xipeng Shen |
Publisher | ACM |
Pages | 180-190 |
Number of pages | 11 |
ISBN (Print) | 9798400700880 |
Publication status | Published - 17 Feb 2023 |
Event | 32nd ACM SIGPLAN International Conference on Compiler Construction - Montreal, Canada. Duration: 25 Feb 2023 → 26 Feb 2023 |
Conference
Conference | 32nd ACM SIGPLAN International Conference on Compiler Construction |
---|---|
Abbreviated title | CC 2023 |
Country/Territory | Canada |
City | Montreal |
Period | 25/02/23 → 26/02/23 |
Keywords / Materials (for Non-textual outputs)
- data science
- Python
- query compilation