Abstract
Lattice Boltzmann Methods (LBM) are used for the computational simulation of Newtonian fluid dynamics. LBM-based simulations are readily parallelizable; they have been implemented on general-purpose processors [1][2][3], field-programmable gate arrays (FPGAs) [4], and graphics processing units (GPUs) [5][6][7]. Of the three platforms, the GPU implementations achieved the highest simulation performance per chip. With memory bandwidth of up to 141 GB/s and a theoretical maximum floating-point performance of over 600 GFLOPS [8], CUDA-ready GPUs from NVIDIA provide an attractive platform for a wide range of scientific simulations, including LBM. This paper improves upon prior single-precision GPU LBM results for the D3Q19 model [7] in two ways: by increasing GPU multiprocessor occupancy, which raises maximum performance by 20%, and by introducing a space-efficient storage method, which reduces GPU RAM requirements by 50% at a slight cost in performance. Both GPU implementations are over 28 times faster than a single-precision quad-core CPU version utilizing OpenMP.
Original language | English |
---|---|
Title of host publication | Proceedings of The 38th International Conference on Parallel Processing |
Subtitle of host publication | ICPP-2009 |
Place of Publication | USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 550-557 |
Number of pages | 8 |
ISBN (Print) | 9780769538020 |
Publication status | Published - 1 Dec 2009 |
Externally published | Yes |
Event | International Conference on Parallel Processing 2009, Vienna University of Technology, Vienna, Austria; Duration: 22 Sept 2009 → 25 Sept 2009; Conference number: 38th |
Conference
Conference | International Conference on Parallel Processing 2009 |
---|---|
Abbreviated title | ICPP 2009 |
Country/Territory | Austria |
City | Vienna |
Period | 22/09/09 → 25/09/09 |