Optimising PLINK: MSc in High Performance Computing

Weronika Filinger, Alan Gray (Editor), Mairead Bermingham (Editor)

Research output: Other contribution

Abstract

Every year the amount of genetic data increases greatly, creating a need for tools capable of analysing large data sets quickly and efficiently. One such software package, providing a wide range of functionality required in whole-genome association studies, is PLINK. Although it does not limit the size of the data sets, the time needed to process them is often a bottleneck. This master's project focused on improving the performance of two PLINK options: epistasis analysis and haplotype block estimation. The g++ compiler with the -O2 flag was found to give the best performance for both options. The epistasis analysis was parallelised using OpenMP with the parallel for directive; a dynamic schedule with a chunk size of 128 provided the best scaling. When executed on 12 threads, the epistasis analysis was 10.5 times faster than when executed on 1 thread. The haplotype blocks option was optimised serially, and the optimisations introduced improved the execution time by about 30%.
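The scheduling choice described in the abstract can be illustrated with a minimal sketch. This is not code from PLINK or from the thesis: the function names `pair_statistic`, `count_significant_pairs` and `parallel_efficiency`, the dot-product statistic, and the triangular pairwise loop are all assumptions made for illustration. Only the `parallel for` directive and the `schedule(dynamic, 128)` clause come from the abstract. A triangular loop over SNP pairs is one plausible case where a dynamic schedule helps, since the amount of work per outer iteration shrinks as the index grows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-pair epistasis statistic.
// PLINK's real test is different; a dot product keeps the sketch short.
double pair_statistic(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        sum += a[k] * b[k];
    }
    return sum;
}

// Counts SNP pairs (i < j) whose statistic exceeds the threshold.
// The outer loop is shared among threads in chunks of 128 iterations,
// handed out dynamically as threads become free. Without OpenMP enabled
// (e.g. no -fopenmp), the pragma is ignored and the loop runs serially.
long count_significant_pairs(const std::vector<std::vector<double>>& snps,
                             double threshold) {
    const long n = static_cast<long>(snps.size());
    long hits = 0;
    #pragma omp parallel for schedule(dynamic, 128) reduction(+ : hits)
    for (long i = 0; i < n; ++i) {
        for (long j = i + 1; j < n; ++j) {
            if (pair_statistic(snps[i], snps[j]) > threshold) {
                ++hits;
            }
        }
    }
    return hits;
}

// Parallel efficiency: measured speedup divided by the number of threads.
double parallel_efficiency(double speedup, int threads) {
    return speedup / threads;
}
```

With the figures reported in the abstract, `parallel_efficiency(10.5, 12)` evaluates to 0.875, i.e. roughly 87.5% parallel efficiency on 12 threads. Compiling with something like `g++ -O2 -fopenmp` would match the compiler and optimisation flag named in the abstract; the `-fopenmp` switch is an addition needed to enable the directive.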

Original language: English
Type: Co-supervised MSc project
Media of output: Thesis
Publisher: University of Edinburgh
Publication status: Published - 2 Sep 2013
