Figure 5: Processing time for different implementations of
rigid registration in relation to size of Fourier transformed
volume.
Additionally, Figure 5 shows the processing time as a
function not of the input size, but of the size of the volume
that is actually Fourier transformed, which differs between
the methods because they pad the data differently. The
well-aligned CUDA implementation is the fastest in this
comparison as well.
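The gap between input size and transformed size can be illustrated with a minimal sketch. It assumes that each axis is padded to hold a full linear correlation (search size plus template size minus one) and then rounded up to a multiple of some alignment. The function names, the alignment value of 32, and the template size used in the example are hypothetical and are not taken from any of the compared implementations.

```python
def padded_size(search, template, multiple=32):
    """Smallest multiple of `multiple` that can hold a linear correlation.

    The default alignment of 32 is an assumed, illustrative value.
    """
    needed = search + template - 1  # length that avoids wrap-around
    return -(-needed // multiple) * multiple  # round up to the next multiple


def transformed_voxels(search, template, multiple=32):
    """Voxel count of the cubic volume that is actually Fourier transformed."""
    return padded_size(search, template, multiple) ** 3
```

For a search volume of side 100 and a hypothetical template of side 30, `padded_size(100, 30)` gives 160, whereas an unaligned method (`multiple=1`) would transform a volume of side 129. Two methods given the same input therefore transform volumes of different sizes.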
Table 1: Processing time in seconds for different sizes of
the search volume.

Volume size (voxels)    50^3    100^3   150^3
Matlab                  273     1354    3577
Matlab JACKET           26      167     513
Colores 1               46      2451    2757
Colores 2               26      623     815
CUDA 1                  13      47      156
CUDA 2                  4.5     12      65
6 DISCUSSION
We describe an efficient CUDA implementation of 3D rigid
template registration based on the normalized cross-correlation
measure. To the best of our knowledge, this has not been
implemented on a GPU before. We compare our implementation
with the Colores software and two Matlab implementations, one
of which uses the GPU-accelerated JACKET library. We report
processing times both for equal input volume sizes and for
equal sizes of the volume that is actually Fourier transformed
by each method. The CUDA implementation with well-aligned
padding is the fastest method in the comparison; for large
volumes it is roughly a factor of 10 faster than the Matlab
JACKET and Colores implementations.
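The speed-up factors can be recomputed directly from Table 1. The short sketch below does this, under the assumption that the "CUDA 2" row is the variant with well-aligned padding (it is the fastest row, but the row labels are not defined in this section). The names `TIMES` and `speedups` are illustrative.

```python
# Processing times in seconds from Table 1,
# for search volumes of 50^3, 100^3 and 150^3 voxels.
TIMES = {
    "Matlab": (273, 1354, 3577),
    "Matlab JACKET": (26, 167, 513),
    "Colores 1": (46, 2451, 2757),
    "Colores 2": (26, 623, 815),
    "CUDA 1": (13, 47, 156),
    "CUDA 2": (4.5, 12, 65),
}


def speedups(baseline, reference="CUDA 2"):
    """Ratio baseline time / reference time for each volume size."""
    return [round(b / r, 1) for b, r in zip(TIMES[baseline], TIMES[reference])]
```

With these numbers, `speedups("Matlab JACKET")` gives `[5.8, 13.9, 7.9]` and `speedups("Colores 2")` gives `[5.8, 51.9, 12.5]`, so for the largest volume the factor lies between about 8 and 13.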
Implementing rigid registration for MET images on the GPU
gives a substantial performance increase, providing a basis
for advanced interactive exploration and analysis of this
kind of data in real time. The registration method itself
requires no input from the user. In the planned scenarios,
the user interacts with the volumes by choosing templates and
marking regions where the registration should be performed.
The rigid search is then performed, and within seconds it
will be possible to explore a 3D visualization of the
registration parameter space, the so-called score volume.
This is an important step towards the interactive analysis we
have in mind for these volumes. Although the method and
implementation can only handle marked subvolumes because of
memory constraints, larger volumes can be subdivided and
processed sequentially, or another layer of parallelisation
can be added using multiple GPUs.
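One way the subdivision of a larger volume could look is sketched below. This is an assumption-laden illustration, not the planned implementation: the function names, the block size, and the use of an overlap between neighbouring blocks (so that a template straddling a block boundary is still covered) are all hypothetical.

```python
from itertools import product


def block_ranges(length, block, overlap):
    """(start, stop) pairs covering [0, length) with the given overlap."""
    if not 0 <= overlap < block:
        raise ValueError("overlap must be non-negative and smaller than block")
    step = block - overlap
    ranges = []
    start = 0
    while True:
        stop = min(start + block, length)
        ranges.append((start, stop))
        if stop == length:  # the last block reaches the end of the axis
            break
        start += step
    return ranges


def subvolumes(shape, block, overlap):
    """Yield one tuple of three (start, stop) pairs per subvolume."""
    per_axis = [block_ranges(n, block, overlap) for n in shape]
    yield from product(*per_axis)
```

For example, `list(subvolumes((300, 300, 300), 150, 0))` yields eight blocks of 150 voxels per side. The blocks could be registered one after another on a single GPU, or handed out to several GPUs.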
REFERENCES
Berman, H., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.,
Weissig, H., Shindyalov, I., and Bourne, P. (2000).
The protein data bank. Nucleic Acids Research,
28:235–242.
Chacón, P. and Wriggers, W. (2002). Multi-resolution
contour-based fitting of macromolecular structures.
Journal of Molecular Biology, 317(3):375–384.
Eaton, D. (2011). www.cs.ubc.ca/~deaton/. Accessed on
Oct. 18, 2011.
Harris, L. J., Larson, S. B., Hasel, K. W., and McPherson,
A. (1997). Refined structure of an intact IgG2a mon-
oclonal antibody. Biochemistry, 36(7):1581–1597.
Lewis, J. P. (1995). Fast normalized cross-correlation.
www.scribblethink.org/Work/nvisionInterface/nip.pdf.
Pittet, J.-J., Henn, C., Engel, A., and Heymann, J. B. (1999).
Visualizing 3D data obtained from microscopy on the
internet. Journal of Structural Biology, 125:123–132.
Sandin, S., Öfverstedt, L.-G., Wikström, A.-C., Wrange, O.,
and Skoglund, U. (2004). Structure and flexibility of
individual immunoglobulin G molecules in solution.
Structure, 12:409–415.
Svensson, L., Brun, A., Nyström, I., and Sintorn, I.-M.
(2011). Registration parameter spaces for molecu-
lar electron tomography images. In Image Analysis
and Processing – ICIAP 2011, volume 6978 of LNCS,
pages 403–412.
Weber, R., Gothandaraman, A., Hinde, R., and Peterson, G.
(2011). Comparing hardware accelerators in scientific
applications: A case study. Parallel and Distributed
Systems, IEEE Transactions on, 22(1):58–68.
Wriggers, W. (2010). Using Situs for the integration
of multi-resolution structures. Biophysical Reviews,
2:21–27.
VISAPP 2012 - International Conference on Computer Vision Theory and Applications