Micro-core architectures combine many low memory, low power computing cores together in a single package. These are attractive for use as accelerators but due to limited on-chip memory and multiple levels of memory hierarchy, the way in which programmers offload kernels needs to be carefully considered. In this paper we use Python as a vehicle for exploring the semantics and abstractions of higher level programming languages to support the offloading of computational kernels to these devices. By moving to a pass by reference model, along with leveraging memory kinds, we demonstrate the ability to easily and efficiently take advantage of multiple levels in the memory hierarchy, even ones that are not directly accessible to the micro-cores. Using a machine learning benchmark, we perform experiments on both Epiphany-III and MicroBlaze based micro-cores, demonstrating the ability to compute with data sets of arbitrarily large size. To provide context of our results, we explore the performance and power efficiency of these technologies, demonstrating that whilst these two micro-core technologies are competitive within their own embedded class of hardware, there is still a way to go to reach HPC class GPUs.
- Parallel programming languages
- Runtime environments
- Hardware accelerators
- Neural networks