Abstract / Description of output
Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly
popular field of study as graphics processing units (GPUs)
continue to be proposed as high performance and relatively
low cost implementation platforms for scientific computing
applications. Among these applications figure astrophysical
N-body simulations, which form one of the most challenging problems in computational science. However, in most
reported studies, a simple O(N2) algorithm was used for
GPGPUs, and the resulting performances were not observed to be better than those of conventional CPUs that were
based on more optimized O(N log N) algorithms such as
the tree algorithm or the particle-particle particle-mesh algorithm. Because of the difficulty in getting efficient implementations of such algorithms on GPUs, a GPU cluster had no practical advantage over general-purpose PC
clusters for N-body simulations. In this paper, we report
a new method for efficient parallel implementation of the
tree algorithm on GPUs. Our novel tree code allows the
realization of an N-body simulation on a GPU cluster at
a much higher performance than that on general PC clusters.
We practically performed a cosmological simulation with
562 million particles on a GPU cluster using 128 NVIDIA
GeForce 8800GTS GPUs at an overall cost of 168 172 $.
We obtained a sustained performance of 20.1 Tflops, which
when normalized against a general-purpose CPU implementation leads to a performance of 8.50 Tflops. The achieved
cost/performance was hence a mere $19.8 /Gflops which
shows the high competitiveness of GPGPUs.
popular field of study as graphics processing units (GPUs)
continue to be proposed as high performance and relatively
low cost implementation platforms for scientific computing
applications. Among these applications figure astrophysical
N-body simulations, which form one of the most challenging problems in computational science. However, in most
reported studies, a simple O(N2) algorithm was used for
GPGPUs, and the resulting performances were not observed to be better than those of conventional CPUs that were
based on more optimized O(N log N) algorithms such as
the tree algorithm or the particle-particle particle-mesh algorithm. Because of the difficulty in getting efficient implementations of such algorithms on GPUs, a GPU cluster had no practical advantage over general-purpose PC
clusters for N-body simulations. In this paper, we report
a new method for efficient parallel implementation of the
tree algorithm on GPUs. Our novel tree code allows the
realization of an N-body simulation on a GPU cluster at
a much higher performance than that on general PC clusters.
We practically performed a cosmological simulation with
562 million particles on a GPU cluster using 128 NVIDIA
GeForce 8800GTS GPUs at an overall cost of 168 172 $.
We obtained a sustained performance of 20.1 Tflops, which
when normalized against a general-purpose CPU implementation leads to a performance of 8.50 Tflops. The achieved
cost/performance was hence a mere $19.8 /Gflops which
shows the high competitiveness of GPGPUs.
Original language | English |
---|---|
Article number | Volume 24, Numbers 1-2 |
Pages (from-to) | 21 |
Number of pages | 31 |
Journal | Computer Science - Research and Development |
Volume | 24 |
Issue number | 12 |
Early online date | 20 May 2009 |
Publication status | Published - 20 May 2009 |
Keywords / Materials (for Non-textual outputs)
- GPU · N-body simulation · Tree algorithm