LU Factorisation on Xeon and Xeon Phi Processors

William Jackson, Mateusz Iwo Dubaniowski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper outlines the parallelisation and vectorisation methods we used to port an LU decomposition library to the Xeon Phi co-processor. We ported an LU factorisation algorithm, which uses Gaussian elimination to perform the decomposition, using Intel LEO directives, OpenMP 4.0 directives, Intel's Cilk array notation, and vectorisation directives. We compare the performance achieved with these different methods, investigate the effect of data transfer costs on the overall time to solution, and analyse the impact of these optimisation and parallelisation techniques on code running on the host processors as well. The results show that performance on the Xeon Phi can be improved by optimising the memory operations, and that Cilk array notation can benefit this benchmark on standard processors but does not have the same impact on the Xeon Phi co-processor. We also demonstrate cases where the Xeon Phi computes our implementations faster than we can run them on a node of an HPC system, and show that our implementations are not as efficient as the LU factorisation implemented in the MKL library.
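
As a rough illustration of the kind of kernel the abstract describes, the following is a minimal sketch of in-place LU factorisation by Gaussian elimination, annotated with OpenMP directives. It is not the authors' code: the function name `lu_factorise`, the row-major layout, the absence of pivoting, and the choice of `parallel for` and `simd` directives are all assumptions made for this example, and offload to the co-processor is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): in-place LU factorisation of an
 * n x n row-major matrix by Gaussian elimination, without pivoting.
 * On return, the strict lower triangle holds the multipliers of L (unit
 * diagonal implied) and the upper triangle, including the diagonal, holds U.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int lu_factorise(double *a, size_t n)
{
    assert(a != NULL);

    for (size_t k = 0; k < n; ++k) {
        double pivot = a[k * n + k];
        if (pivot == 0.0)
            return -1; /* no pivoting in this sketch, so give up */

        /* Rows below the pivot row are updated independently of one
         * another, so the row loop can be shared between threads. */
        #pragma omp parallel for schedule(static)
        for (size_t i = k + 1; i < n; ++i) {
            double m = a[i * n + k] / pivot;
            a[i * n + k] = m; /* store the multiplier as an entry of L */

            /* Contiguous row update: a candidate for vectorisation. */
            #pragma omp simd
            for (size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= m * a[k * n + j];
        }
    }
    return 0;
}
```

Compiled without OpenMP support, the pragmas are ignored and the routine runs serially. The outer loop over `k` stays sequential because each elimination step depends on the result of the previous one.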
Original language: English
Title of host publication: Parallel Computing: On the Road to Exascale
Editors: Gerhard R. Joubert, Hugh Leather, Mark Parsons, Frans Peters, Mark Sawyer
Pages: 591-599
Volume: 27
Edition: 2016
ISBN (Electronic): 978-1-61499-621-7
Publication status: Published - 31 Mar 2016

Publication series

Name: Advances in Parallel Computing
Publisher: IOS Press Ebooks
Volume: 27
