Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA
10.1118/1.3661998
http://aip.metastore.ingenta.com/content/aapm/journal/medphys/38/12/10.1118/1.3661998

Figures

FIG. 1.

Illustration of the caching mechanism of the proposed GPU-CUDA method.

FIG. 2.

By slicing the image volume orthogonal to the predominant TOR direction, the area of the intersection between the TOR and the slice is bounded. Here, a y-direction TOR and an x-z slice are shown as an example.
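
The slicing idea in this caption can be illustrated with a short sketch. It is not the authors' code: the function names `predominant_axis` and `slice_crossings` and the point coordinates are hypothetical, and the TOR is reduced to its center line. The sketch picks the axis along which the line extends most and intersects the line with planes orthogonal to that axis. Because the largest component of a unit direction vector is at least 1/sqrt(3), a tube cut by such a plane is stretched by at most a factor of sqrt(3), which is why the intersection area stays bounded.

```python
import numpy as np

def predominant_axis(p0, p1):
    """Return 0, 1, or 2: the axis along which the line p0 -> p1 extends most."""
    d = np.abs(np.asarray(p1, dtype=float) - np.asarray(p0, dtype=float))
    return int(np.argmax(d))

def slice_crossings(p0, p1, slice_coords):
    """Intersect the line p0 -> p1 with planes orthogonal to its predominant axis.

    slice_coords holds the plane positions along that axis (hypothetical units).
    Returns the chosen axis and one 3D crossing point per plane.
    """
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    axis = predominant_axis(p0, p1)
    # Line parameter t at which the line reaches each plane.
    t = (np.asarray(slice_coords, dtype=float) - p0[axis]) / (p1[axis] - p0[axis])
    return axis, p0 + t[:, None] * (p1 - p0)

# A mostly y-directed line, so the slices are x-z planes (axis 1).
axis, points = slice_crossings((0, -10, 0), (1, 10, 2), [0.0])
```

For the example line, the y = 0 plane is crossed halfway along the line, at (0.5, 0, 1).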

FIG. 3.

Number of collisions per voxel for backprojecting a random set of 1 million lines. White means no collision, gray means one collision, and black means two collisions.
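
A collision map of this kind could be tallied offline as sketched below. This is an assumption-laden illustration, not the procedure used in the paper: it assumes a collision means an extra write to a voxel already written by another line in the same concurrently processed batch, and the name `collisions_per_voxel` and the flat voxel indexing are hypothetical.

```python
import numpy as np

def collisions_per_voxel(voxel_ids, n_voxels):
    """Count writes beyond the first to each voxel within one batch.

    voxel_ids: flat voxel indices written by the lines of one batch (assumed).
    Returns an array of length n_voxels: 0 = no collision, 1 = one, 2 = two, ...
    """
    hits = np.bincount(np.asarray(voxel_ids), minlength=n_voxels)
    return np.maximum(hits - 1, 0)

# Voxel 2 is written twice and voxel 5 three times in this toy batch.
counts = collisions_per_voxel([0, 2, 2, 5, 5, 5], n_voxels=6)
```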

FIG. 4.

With the ToF information, fewer LOR-slice pairs need to be processed. Four LORs are shown intersecting the current slice. The dots on each line denote the ToF center, and the bell shapes denote the ToF kernels. Only LORs 2 and 3 contribute to the current slice significantly.
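
The culling described in this caption can be sketched as follows. The sketch assumes a Gaussian ToF kernel and a 3-sigma cutoff; neither is stated in the caption, and the function names, distances, and kernel width are hypothetical values chosen only to mirror the four-LOR example.

```python
import math

def tof_weight(dist_mm, sigma_mm):
    """Gaussian ToF kernel value at a signed distance from the ToF center (assumed shape)."""
    return math.exp(-0.5 * (dist_mm / sigma_mm) ** 2)

def contributes(dist_mm, sigma_mm, cutoff_sigmas=3.0):
    """True if the slice lies within the assumed cutoff of the ToF center."""
    return abs(dist_mm) <= cutoff_sigmas * sigma_mm

# Hypothetical distances (mm) along each LOR from its ToF center to the current slice.
distances = {1: -200.0, 2: 10.0, 3: -25.0, 4: 180.0}
sigma = 40.0  # hypothetical kernel width in mm

# Only LOR-slice pairs inside the cutoff need to be processed.
kept = [lor for lor, d in distances.items() if contributes(d, sigma)]
```

With these numbers only LORs 2 and 3 pass the test, so the other two LOR-slice pairs would be skipped.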

FIG. 5.

Cumulative contributions of different optimization strategies to the overall speedup of the GPU-CUDA method compared to the CPU-based code. Simple GPU implementation refers to the method that directly maps the computation to the GPU hardware without using the subsequent optimizations listed in this figure.

FIG. 6.

Number of randomly generated LORs that can be processed per second, as a function of the number of thread blocks for the GPU-CUDA method. Due to hardware limitations, the block size on the GTX 285 cannot be set to 1024.

FIG. 7.

Hot rod phantom (a) acquired on a preclinical PET scanner and reconstructed with 2 iterations and 40 subsets of list-mode OSEM, using (c) the CPU method and (d) the GPU-CUDA method. (b) The normalization map. Profiles of (c) and (d) through the centers of the hot rods, depicted in (a), are shown in (e). Contrast between the two ROIs in (a) and noise, as functions of the number of iterations, are shown in (f). The methods for computing contrast and noise are explained in Sec. ???. The processing times for the GPU and the CPU are 7.0 s and 23 min, respectively.

FIG. 8.

Mouse PET scan (maximum intensity projection), reconstructed with three iterations and five subsets of list-mode OSEM using the CPU method (a) and the GPU-CUDA method (b). The processing times for the GPU and the CPU are 8.0 s and 28 min, respectively.

FIG. 9.

Transaxial image taken through slice 30 of the liver of the reconstructed patient data from a Philips Gemini TF PET/CT scanner. A CT image at the same slice location is shown in (e) with a soft tissue window and inverse gray scale to provide an anatomical frame of reference. The (cropped) normalization image is shown in (f). The lesion is visualized with higher contrast for the ToF data. For non-ToF, the lesion contrasts for the CPU and GPU methods are 2.6 and 2.7, respectively. For ToF, the values are 3.0 and 3.1, respectively. The processing times for the GPU and the CPU are 7.7 s and 42 min, respectively.

Tables

TABLE I.

Execution time (ms) for processing varying numbers of randomly generated LORs with the GPU-CUDA method.

TABLE II.

Effect of using fast math for one iteration of 1 million random ToF LORs in a 75 × 75 × 26 image.

TABLE III.

Execution time for processing 1 million random events in an image matrix of L × L × L with TOR width T_w increasing simultaneously with L.

TABLE IV.

Execution time for processing 1 million random events in an image matrix of L × L × L with a fixed TOR width of 3 × 3.

TABLE V.

Execution time for processing 1 million random LORs in an image matrix of 75 × 75 × 26 for different TOR widths T_w. The table also lists the maximum number of voxels in a TOR-slice intersection.

2011-12-01
2014-04-16