Abstract
Bootstrapping is a popular and computationally demanding resampling method used for measuring the accuracy of sample estimates and assisting with statistical inference. R is a freely available language and environment for statistical computing that is popular with biostatisticians for genomic data analyses. A survey of such R users highlighted its implementation of bootstrapping as a prime candidate for parallelization to overcome computational bottlenecks. The Simple Parallel R Interface (SPRINT) is a package that allows R users to exploit high performance computing on multicore desktops and supercomputers without expert knowledge of such systems. This paper describes the parallelization of bootstrapping for inclusion in the SPRINT R package. Depending on the complexity of the bootstrap statistic and the number of resamples, this implementation achieves close to optimal speed-up on up to 16 nodes of a supercomputer and a speed-up close to 100 on 512 nodes. This performance in a multinode setting compares favourably with an existing parallelization option in the native R implementation of bootstrapping.
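As a rough illustration of why bootstrapping lends itself to parallelization, the sketch below splits the resamples into chunks that are computed separately and then concatenated. It is not the SPRINT implementation or the R `boot` package: it is written in Python, it uses threads as a stand-in for the processes of a multinode system, and the names `bootstrap_chunk` and `parallel_bootstrap`, the per-chunk seeding scheme, and the sample data are all assumptions made for this example.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_chunk(data, statistic, n_resamples, seed):
    """Evaluate `statistic` on `n_resamples` resamples drawn with replacement."""
    rng = random.Random(seed)  # one reproducible generator per chunk (assumed scheme)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def parallel_bootstrap(data, statistic, n_resamples, n_workers=4, seed=0):
    """Split the resamples across workers and concatenate the replicates."""
    base, extra = divmod(n_resamples, n_workers)
    # Chunk sizes differ by at most one and always sum to n_resamples.
    sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(bootstrap_chunk, data, statistic, size, seed + i)
            for i, size in enumerate(sizes)
        ]
        replicates = []
        for future in futures:  # collect in submission order for reproducibility
            replicates.extend(future.result())
    return replicates


# Hypothetical sample; the statistic here is the mean.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
replicates = parallel_bootstrap(data, statistics.mean, n_resamples=1000)

# The spread of the replicates estimates the standard error of the sample mean.
std_error = statistics.stdev(replicates)
print(len(replicates), round(std_error, 3))
```

Because each replicate depends only on its own resample, the chunks need no communication until the final concatenation. In a sketch like this, the work done per replicate relative to the cost of distributing and collecting results is what would determine how far the approach scales.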
Original language  English
Publisher  ArXiv
Publication status  Published, 24 Jan 2014
Keywords
 stat.CO
 J.2; J.3; D.1.3
Title  Parallel Optimisation of Bootstrapping in R
Projects
 2 Finished

The SPRINT approach to network biology
Ghazal, P., Sloan, T., Cebamanos, L., Forster, T., Mitchell, L., Robertson, K. & Troup, E.
1/10/12 → 30/09/14
Project: Research

Bootstrapping and support vector machines with R and SPRINT
Sloan, T., Mewissen, M., Forster, T., Piotrowski, M. & Cebamanos, L.
1/01/12 → 30/06/12
Project: Awarded Facility Time