The volume of genetic data grows substantially every year, creating a need for tools capable of analysing large data sets quickly and efficiently. One such software package, providing a wide range of the functionality required in whole-genome association studies, is PLINK. Although it does not limit the size of the data sets, the time needed to process them is often a bottleneck. This master's project focused on improving the performance of two PLINK options: epistasis analysis and haplotype block estimation. The g++ compiler with the -O2 flag was found to give the best performance for both options. The epistasis analysis was parallelised using OpenMP with the parallel for directive and a schedule clause; a dynamic schedule with a chunk size of 128 gave the best scaling. When executed on 12 threads, the epistasis analysis was 10.5 times faster than when executed on 1 thread. The haplotype blocks option was optimised serially, and the optimisations introduced reduced its execution time by about 30%.
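The scheduling choice described in the abstract can be illustrated with a minimal C++ sketch. It is not PLINK's code: the function names `pair_score` and `score_all_pairs`, the genotype encoding (0, 1 or 2 minor alleles per sample) and the toy per-pair score are all assumptions made for illustration, and the sketch assumes the parallel loop runs over SNP pairs, which the abstract does not state. What it does show is a `parallel for` loop with `schedule(dynamic, 128)`: because row `i` of a pairwise scan contains `n - 1 - i` pairs, the rows carry unequal work, and handing them out in chunks of 128 lets idle threads pick up the remaining chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-pair test; this is NOT PLINK's epistasis
// statistic. It counts the samples in which both SNPs carry a minor allele.
static long pair_score(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // both SNPs must cover the same samples
    long score = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        if (a[k] > 0 && b[k] > 0) {
            ++score;
        }
    }
    return score;
}

// Sums the toy score over every unordered SNP pair (i < j).
// The outer loop is triangular, so iterations differ in cost; the dynamic
// schedule assigns 128 outer iterations at a time to whichever thread is free.
long score_all_pairs(const std::vector<std::vector<int>>& snps) {
    const long n = static_cast<long>(snps.size());
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 128) reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        for (long j = i + 1; j < n; ++j) {
            total += pair_score(snps[i], snps[j]);
        }
    }
    return total;
}
```

Built with `g++ -O2 -fopenmp`, the loop runs in parallel; without `-fopenmp` the pragma is ignored and the same code runs serially, which makes it easy to compare one-thread and multi-thread timings. The `reduction` clause gives each thread a private partial sum, so no locking is needed in the inner loop.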
|Type|Co-supervised MSc project|
|Media of output|Thesis|
|Publisher|University of Edinburgh|
|Publication status|Published - 2 Sep 2013|