Parallel Optimisation of Bootstrapping in R

T. M. Sloan, M. Piotrowski, T. Forster, P. Ghazal

Research output: Working paper

Abstract

Bootstrapping is a popular and computationally demanding resampling method used for measuring the accuracy of sample estimates and assisting with statistical inference. R is a freely available language and environment for statistical computing, popular with biostatisticians for genomic data analyses. A survey of such R users highlighted its implementation of bootstrapping as a prime candidate for parallelization to overcome computational bottlenecks. The Simple Parallel R Interface (SPRINT) is a package that allows R users to exploit high-performance computing on multi-core desktops and supercomputers without expert knowledge of such systems. This paper describes the parallelization of bootstrapping for inclusion in the SPRINT R package. Depending on the complexity of the bootstrap statistic and the number of resamples, this implementation achieves close to optimal speed-up on up to 16 nodes of a supercomputer and a speed-up close to 100 on 512 nodes. This performance in a multi-node setting compares favourably with an existing parallelization option in the native R implementation of bootstrapping.
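
The general idea of parallel bootstrapping can be sketched as follows. This is a minimal illustration in Python, not the SPRINT implementation described in the paper: the function names (`bootstrap_chunk`, `split_counts`, `parallel_bootstrap`), the use of a local process pool in place of supercomputer nodes, and the simple per-worker seeding scheme are all assumptions made for the example. Each worker draws its share of resamples with replacement, evaluates the statistic on each, and the replicates are gathered at the end.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor


def bootstrap_chunk(args):
    """Compute the statistic on `count` resamples drawn with replacement."""
    data, statistic, count, seed = args
    rng = random.Random(seed)  # each worker gets its own seeded generator
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(count)]


def split_counts(n_resamples, n_workers):
    """Divide the resamples as evenly as possible across workers."""
    base, extra = divmod(n_resamples, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def parallel_bootstrap(data, statistic, n_resamples, n_workers=4, seed=0,
                       executor_cls=ProcessPoolExecutor):
    """Return the replicates of `statistic` gathered from all workers.

    `statistic` must be picklable when a process pool is used.
    Seeding worker i with `seed + i` is a simplification for illustration.
    """
    counts = split_counts(n_resamples, n_workers)
    tasks = [(data, statistic, c, seed + i) for i, c in enumerate(counts)]
    with executor_cls(max_workers=n_workers) as pool:
        chunks = pool.map(bootstrap_chunk, tasks)
        return [rep for chunk in chunks for rep in chunk]
```

Under these assumptions, a call such as `parallel_bootstrap(sample, statistics.mean, 10000)` returns 10000 bootstrap replicates of the mean, and their standard deviation estimates the standard error of the sample mean. Because the resamples are independent of one another, the work divides cleanly across workers, which is why the attainable speed-up depends mainly on the cost of the statistic and the number of resamples.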
Original language: English
Publisher: ArXiv
Publication status: Published - 24 Jan 2014

Keywords / Materials (for Non-textual outputs)

  • stat.CO
  • J.2; J.3; D.1.3

Projects

  • The SPRINT approach to network biology

    Ghazal, P. (Principal Investigator), Sloan, T. (Co-investigator), Cebamanos, L. (Researcher), Forster, T. (Researcher), Mitchell, L. (Researcher), Robertson, K. (Researcher) & Troup, E. (Researcher)

    BBSRC

    1/10/12 → 30/09/14

    Project: Research

  • Bootstrapping and support vector machines with R and SPRINT

    Sloan, T. (Principal Investigator), Mewissen, M. (Co-investigator), Forster, T. (Co-investigator), Piotrowski, M. (Researcher) & Cebamanos, L. (Researcher)

    1/01/12 → 30/06/12

    Project: Awarded Facility Time
