Project Details
Description
This Distributed Computational Science and Engineering (dCSE) project optimised SPRINT for use on HECToR, the UK's national supercomputing service. SPRINT (Simple Parallel R INTerface) is an add-on package for R, the language and environment for statistical computing and graphics. It offers both a library of parallel functions and an interface for adding parallel functions to R.
Key findings
- An installation guide describing how to compile SPRINT on HECToR was produced.
- The performance of the parallel correlation function (pcor) now scales up to 512 processes. Originally, all results were gathered on, and written by, the master process. The results are now distributed among all processes and written into the results file with MPI-I/O, taking advantage of the underlying high-performance Lustre filesystem.
- The permutation testing function (mt.maxT) was parallelised to give pmaxT. Parallelism is introduced by dividing the permutation count equally among the available processes. Each process computes the observations for its share of the permutations; at the end, all partial observations are reduced on the master process, which uses them to compute the p-values.
- Benchmarks performed on the HECToR XT4 system show that both functions now scale close to optimally for process counts up to 512. Statisticians can use the parallel versions of these functions to process large data sets and get results within reasonable run times.
- The work performed under this dCSE project was presented at HPDC 2010 and useR! 2010.
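The pmaxT approach described above (divide the permutation count among the processes, let each accumulate partial counts, reduce on the master, then compute p-values) can be illustrated with a minimal Python sketch. This is not SPRINT's implementation: the processes are simulated by a plain loop instead of MPI, the statistic is a simple absolute difference in group means standing in for the t-statistic, and a single-step maxT adjustment is assumed. All function names (`split_count`, `partial_counts`, `pmaxt_sketch`) and the per-rank seeding scheme are hypothetical.

```python
import random
from statistics import mean


def split_count(total, nprocs):
    """Divide `total` permutations as evenly as possible among `nprocs` ranks."""
    base, extra = divmod(total, nprocs)
    return [base + (1 if rank < extra else 0) for rank in range(nprocs)]


def stat(row, labels):
    """Absolute difference in group means (stand-in for a t-statistic)."""
    g0 = [x for x, lab in zip(row, labels) if lab == 0]
    g1 = [x for x, lab in zip(row, labels) if lab == 1]
    return abs(mean(g1) - mean(g0))


def partial_counts(data, labels, observed, n_perm, seed):
    """Work of one (simulated) process: for each row, count how often the
    maximum permuted statistic reaches that row's observed statistic."""
    rng = random.Random(seed)  # assumption: each rank gets its own seed
    counts = [0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_stat = max(stat(row, perm) for row in data)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                counts[i] += 1
    return counts


def pmaxt_sketch(data, labels, total_perm, nprocs, seed=0):
    """Return one adjusted p-value per row of `data`."""
    observed = [stat(row, labels) for row in data]
    shares = split_count(total_perm, nprocs)
    # Each list element stands in for the result of one MPI process.
    partials = [
        partial_counts(data, labels, observed, n, seed + rank)
        for rank, n in enumerate(shares)
    ]
    # Reduction on the master: element-wise sum of the partial counts.
    totals = [sum(col) for col in zip(*partials)]
    return [count / total_perm for count in totals]


if __name__ == "__main__":
    data = [
        [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],  # strong group difference
        [1.0, 2.0, 1.5, 1.2, 1.1, 1.9, 1.4, 1.3],  # no group difference
    ]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(pmaxt_sketch(data, labels, total_perm=200, nprocs=4))
```

Because only per-row counts are summed, the amount of data reduced on the master depends on the number of rows and not on the number of permutations. In a real MPI program the list of partial results would be replaced by a sum reduction across ranks.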
| Status | Finished |
|---|---|
| Effective start/end date | 1/10/09 → 31/03/10 |
| Links | http://www.hector.ac.uk/cse/distributedcse/reports/sprint/ |
Research output
- Optimization of a parallel permutation testing function for the SPRINT R package. Petrou, S., Sloan, T., Mewissen, M., Forster, T., Piotrowski, M., Dobrzelecki, B., Ghazal, P., Trew, A. & Hill, J., 10 Dec 2011, In: Concurrency and Computation: Practice and Experience. 23, 17, p. 2258-2268 (11 p.). Research output: Contribution to journal › Article › peer-review. Open Access.
- SPRINT: Parallel computing with R on HECToR: HECToR Training Course presented at NAG, Oxford, 1st Dec 2011. Piotrowski, M., Mewissen, M., Sloan, T., Forster, T., Mitchell, L. & Ghazal, P., 1 Dec 2011. Research output: Other contribution.
- SPRINT: data analysis in minutes, not days. Sloan, T., Oct 2010, 1 p. Edinburgh: EPCC, University of Edinburgh. Research output: Other contribution.