High-performance tomographic reconstruction using graphics processing units

Ya I. Nesterets, T. E. Gureyev

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Computed Tomography (CT) has become a routine tool for three-dimensional (3D) visualization of the internal structure of objects that are opaque to visible light. X-ray CT is widely used in materials science, biomedical applications and elsewhere. Different CT reconstruction algorithms have been developed, including the well-known Filtered-Back-Projection (FBP) algorithm for parallel-beam geometry and the Feldkamp-Davis-Kress (FDK) algorithm, which applies to cone-beam geometry with a circular trajectory of a small X-ray source and a flat two-dimensional (2D) detector. Recent progress in detector technology has resulted in the availability of commercial 2D charge-coupled devices (CCDs) with linear dimensions of the order of 4k pixels or more. The amount of memory required to store a 3D volume of floating-point data of such linear size is 256 GB or more, which significantly exceeds the typical amount of RAM found not only in high-end desktop computers but also in small computer clusters. The most computationally intensive step in the FBP and FDK CT reconstruction algorithms is the so-called back-projection operation, which takes up to 99% of the total reconstruction time in a typical CPU-based implementation. The 3D computer graphics capabilities of general-purpose graphics processing units (GPGPUs), utilized e.g. via OpenGL or DirectX, have been used for the back-projection operation for the last ten to fifteen years. With the recent increase in the size of reconstructed volumes mentioned above, standard approaches, which usually store the whole reconstructed volume in GPU memory, have become problematic. Different algorithmic approaches are required for the effective use of GPUs for CT reconstruction of large data volumes. We have developed new CPU-based and GPU-based implementations of the FBP and FDK algorithms for X-ray CT.
These implementations take into account the following principles:
  • Use as little RAM and/or GPU memory as possible for the reconstruction of each axial slice of the object. For example, the GPU-based back-projection code allocates GPU memory for only a single reconstructed slice; this allows the reconstruction of volumes with linear dimensions of up to 16k using top-end GPUs (with more than 1 GB of onboard memory).
  • Make the reconstruction of each axial slice as independent of the others as possible. This allows different slices to be reconstructed in parallel using CPU/GPU multithreading capabilities, so the total reconstruction time decreases as the number of CPU cores and/or GPUs increases.
According to our tests, the GPU-based implementations of the back-projection operation give a speed-up of up to two orders of magnitude for the back-projection itself, and of more than an order of magnitude for the total CT reconstruction, compared with the corresponding results for a single CPU core.
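The memory figures and the slice-by-slice principle in the abstract can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: it assumes 4-byte (single-precision) voxels, which is consistent with the quoted 256 GB and 1 GB figures, it uses plain parallel-beam back-projection with linear interpolation on the CPU, and it uses a thread pool only to show that slices can be processed independently. The function names (`volume_bytes`, `slice_bytes`, `backproject_slice`, `reconstruct`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

BYTES_PER_VOXEL = 4  # assumption: single-precision floating point


def volume_bytes(n):
    """Bytes needed to hold a full n x n x n volume."""
    return n ** 3 * BYTES_PER_VOXEL


def slice_bytes(n):
    """Bytes needed to hold one n x n axial slice."""
    return n ** 2 * BYTES_PER_VOXEL


def backproject_slice(sinogram, angles):
    """Parallel-beam back-projection of ONE axial slice.

    sinogram: array of shape (n_angles, n_det), assumed already filtered.
    angles:   projection angles in radians, shape (n_angles,).
    Only one n_det x n_det output slice is held in memory.
    """
    n_det = sinogram.shape[1]
    centre = (n_det - 1) / 2.0
    coords = np.arange(n_det) - centre
    x, y = np.meshgrid(coords, coords)
    det = np.arange(n_det)
    recon = np.zeros((n_det, n_det), dtype=np.float32)
    for row, theta in zip(sinogram, angles):
        # Detector coordinate hit by each pixel at this projection angle.
        t = x * np.cos(theta) + y * np.sin(theta) + centre
        # Linear interpolation along the detector row; zero outside it.
        recon += np.interp(t.ravel(), det, row, left=0.0, right=0.0).reshape(n_det, n_det)
    return recon * (np.pi / len(angles))


def reconstruct(sinograms, angles, workers=4):
    """Reconstruct slices independently; each worker handles whole slices."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: backproject_slice(s, angles), sinograms))
```

Under the 4-byte assumption, `volume_bytes(4096)` is 2^38 bytes (256 GiB), whereas `slice_bytes(16384)` is 2^30 bytes (1 GiB), which is why holding a single slice at a time fits on a GPU with slightly more than 1 GB of memory while the full volume does not fit in typical RAM.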

Original language: English
Title of host publication: 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation: Interfacing Modelling and Simulation with Mathematical and Computational Sciences, Proceedings
Pages: 1045-1051
Number of pages: 7
Publication status: Published - 2009
Externally published: Yes
Event: International Congress on Modelling and Simulation 2009: Interfacing Modelling and Simulation with Mathematical and Computational Sciences - Cairns, Australia
Duration: 13 Jul 2009 to 17 Jul 2009
Conference number: 18th
https://www.mssanz.org.au/modsim09/

Conference

Conference: International Congress on Modelling and Simulation 2009
Abbreviated title: MODSIM 2009
Country/Territory: Australia
City: Cairns
Period: 13/07/09 to 17/07/09

Keywords

  • Computed Tomography (CT)
  • Graphics Processing Unit (GPU)
  • High-Performance Computing (HPC)
