Parallel skeletons are a structured parallel programming abstraction that provide programmers with a predefined set of algorithmic templates that can be combined, nested and parameterized with sequential code to produce complex programs. The implementation of these skeletons is currently a manual process, requiring human expertise to choose suitable implementation parameters that provide good performance. This paper presents an empirical exploration of the optimization space of the FastFlow parallel skeleton framework. We performed this using a Monte Carlo search of a random subset of the space, for a representative set of platforms and programs. The results show that the space is program and platform dependent, non-linear, and that automatic search achieves a significant average speedup in program execution time of 1.6× over a human expert. An exploratory data analysis of the results shows a linear dependence between two of the parameters, and that another two parameters have little effect on performance. These properties are then used to reduce the size of the space by a factor of 6, reducing the cost of the search. This provides a starting point for automatically optimizing parallel skeleton programs without the need for human expertise, and with a large improvement in execution time compared to that achievable using human expert tuning.
Read More: http://www.worldscientific.com/doi/abs/10.1142/S0129626412400051