TY - GEN
T1 - Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap
AU - Khasanov, Robert
AU - Goens, Andrés
AU - Castrillon, Jeronimo
PY - 2018/1/23
Y1 - 2018/1/23
N2 - Modern embedded systems are rapidly increasing in complexity, both in the number of cores and in heterogeneity. To generate efficient code for these systems, it is common to leverage formal models of computation. Among these, the dataflow model of Kahn Process Networks (KPN) is widespread because it is expressive but guarantees a deterministic execution. However, the KPN model is ill-suited to expose data-level parallelism, since this has to be made explicit in the process network. This is aggravated by the fact that its most common execution model, Kahn-MacQueen, imposes restrictive conditions on the scheduling of data-parallel processes, leading to an inefficient execution. In this paper we present a novel extension to the KPN model and a relaxed execution strategy that addresses this problem, while keeping the deterministic KPN semantics. It improves run-time adaptivity in a malleable way and provides implicit parallelism. We evaluate our approach on two architectures, improving the performance of a benchmark by up to 25.6% on an Intel chip with hyper-threading, and by up to 78.0% on a heterogeneous embedded ARM big.LITTLE architecture.
AB - Modern embedded systems are rapidly increasing in complexity, both in the number of cores and in heterogeneity. To generate efficient code for these systems, it is common to leverage formal models of computation. Among these, the dataflow model of Kahn Process Networks (KPN) is widespread because it is expressive but guarantees a deterministic execution. However, the KPN model is ill-suited to expose data-level parallelism, since this has to be made explicit in the process network. This is aggravated by the fact that its most common execution model, Kahn-MacQueen, imposes restrictive conditions on the scheduling of data-parallel processes, leading to an inefficient execution. In this paper we present a novel extension to the KPN model and a relaxed execution strategy that addresses this problem, while keeping the deterministic KPN semantics. It improves run-time adaptivity in a malleable way and provides implicit parallelism. We evaluate our approach on two architectures, improving the performance of a benchmark by up to 25.6% on an Intel chip with hyper-threading, and by up to 78.0% on a heterogeneous embedded ARM big.LITTLE architecture.
KW - MPSoC
KW - Streaming applications
KW - adaptivity
KW - process networks
KW - heterogeneous
U2 - 10.1145/3183767.3183790
DO - 10.1145/3183767.3183790
M3 - Conference contribution
SN - 9781450364447
T3 - PARMA-DITAM '18
SP - 20
EP - 25
BT - Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms
PB - Association for Computing Machinery (ACM)
CY - New York, NY, USA
T2 - 9th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures & 7th Workshop on Design Tools and Architectures For Multicore Embedded Computing Platforms
Y2 - 23 January 2018 through 23 January 2018
ER -