Parallel skeletons are a structured parallel programming abstraction that provides programmers with a predefined set of algorithmic templates that can be combined, nested and parametrized with sequential code to produce complex programs. The implementation of these skeletons is currently a manual process, requiring human expertise to choose suitable implementation parameters that provide good performance. This paper presents an empirical exploration of the optimization space of the FastFlow parallel skeleton framework. We performed this exploration using a Monte Carlo search of a random subset of the space, for a representative set of platforms and programs. The results show that the space is program and platform dependent and non-linear, and that automatic search achieves a significant average speedup in program execution time of 1.6× over a human expert. An exploratory data analysis of the results shows a linear dependence between two of the parameters, and that another two parameters have little effect on performance. These properties are then used to reduce the size of the space by a factor of 6, reducing the cost of the search. This provides a starting point for automatically optimizing parallel skeleton programs without the need for human expertise, and with a large improvement in execution time compared to that achievable using human expert tuning.
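To make the idea of a Monte Carlo search over a tuning space concrete, the following is a minimal sketch in Python. It is not the paper's implementation and does not use the FastFlow API: the parameter names (`workers`, `buffer_size`, `batch`), their value ranges, the hand-picked baseline, and the `synthetic_runtime` cost model are all hypothetical stand-ins for running and timing a real skeleton program.

```python
import itertools
import random

# Hypothetical tuning parameters; names and ranges are illustrative
# assumptions, not the parameters studied in the paper.
SPACE = {
    "workers": [1, 2, 4, 8, 16],
    "buffer_size": [64, 256, 1024],
    "batch": [1, 8, 64],
}


def synthetic_runtime(cfg):
    """Stand-in for building, running and timing a real program.

    The cost model is made up purely so the sketch is runnable.
    """
    work = 100.0 / cfg["workers"]           # parallel part shrinks with workers
    overhead = 0.5 * cfg["workers"]         # coordination cost grows with workers
    buffering = 50.0 / cfg["buffer_size"]   # small buffers are penalized
    batching = abs(cfg["batch"] - 8) * 0.1  # arbitrary sweet spot at batch = 8
    return work + overhead + buffering + batching


def monte_carlo_search(space, measure, samples, seed=0):
    """Measure `samples` configurations drawn uniformly at random
    (without replacement) from the space and return the fastest one."""
    rng = random.Random(seed)
    names = list(space)
    all_points = list(itertools.product(*(space[n] for n in names)))
    chosen = rng.sample(all_points, min(samples, len(all_points)))

    best_cfg, best_time = None, float("inf")
    for point in chosen:
        cfg = dict(zip(names, point))
        elapsed = measure(cfg)
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time


# Hypothetical "hand-tuned" configuration used as the comparison point.
baseline = {"workers": 4, "buffer_size": 64, "batch": 1}

best_cfg, best_time = monte_carlo_search(SPACE, synthetic_runtime, samples=20)
speedup = synthetic_runtime(baseline) / best_time
print(best_cfg, round(best_time, 2), round(speedup, 2))
```

The sketch enumerates the full Cartesian product before sampling, which is only practical for a small space like this one; for a large space one would instead draw each parameter value independently. Shrinking the space, for example by dropping parameters that have little effect, means a fixed number of samples covers a larger share of it.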
|Title of host publication||Proceedings of High-Level Programming for Heterogeneous and Hierarchical Parallel Systems|
|Number of pages||7|
|Publication status||Published - 1 Jan 2012|