Parallel Version n-Dimensional Fast Fourier Transform Algorithm
Analog of the Cooley-Tukey Algorithm
M. V. Noskov and V. S. Tutatchikov
Institute of Space and Information Technology, Siberian Federal University, Kirenskogo Street 26, Krasnoyarsk, Russia
Keywords: Multi-dimensional Discrete Fourier Transform, Cooley-Tukey FFT, Parallel Algorithm.
Abstract: One-, two- and three-dimensional fast Fourier transform (FFT) algorithms have been widely used in digital
signal processing. Because the complexity and the amount of computation grow with the dimension of the signal,
the multi-dimensional discrete Fourier transform is usually reduced to a combination of one-dimensional
FFTs over all coordinates. This article presents a general analog of the Cooley-Tukey algorithm that
requires fewer complex addition and multiplication operations than the standard method and runs about 1.5
times faster than its counterpart in Matlab.
1 INTRODUCTION
One-, two- and three-dimensional fast Fourier
transform (FFT) algorithms have been widely used in
digital signal processing (Dudgeon, 1983; Blahut, 1985).
Because the complexity and the amount of computation
grow with the dimension of the signal, the
multi-dimensional discrete Fourier transform is
usually reduced to a combination of one-dimensional
FFTs over all coordinates. This article presents a
general analog of the Cooley-Tukey algorithm that
requires fewer complex addition and multiplication
operations than the standard method
(Tutatchikov, 2013). The resulting algorithm is
tested in two and three dimensions against the
standard algorithm in Matlab (Gonzalez, 2009).
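The standard method mentioned above (a combination of one-dimensional FFTs over all coordinates) can be sketched for the two-dimensional case as follows. This is a minimal illustration in Python, not the implementation used in the tests; the function names `fft1d` and `fft2d_row_column` are hypothetical, the signal is assumed to be a square list of lists whose side is a power of two, and the forward transform is assumed to use the negative exponent sign.

```python
import cmath

def fft1d(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft1d(a[0::2])          # transform of even-indexed samples
    odd = fft1d(a[1::2])           # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2d_row_column(f):
    """2D FFT as 1D FFTs over every row, then over every column."""
    rows = [fft1d(row) for row in f]             # transform along the second coordinate
    cols = [fft1d(col) for col in zip(*rows)]    # transform along the first coordinate
    return [list(r) for r in zip(*cols)]         # transpose back to F[y1][y2]

if __name__ == "__main__":
    print(fft2d_row_column([[1, 2], [3, 4]]))
```

Each coordinate is processed in a separate pass over the whole array, which is the cost the multi-dimensional analog aims to reduce.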
2 THE ALGORITHM DESCRIPTION
Consider a signal $f$, an n-dimensional periodic signal with period $2^s$ in each of the n coordinates and with complex values. The samples are given as $f(x_1, \ldots, x_n)$, where $x_i$, $i = 1, \ldots, n$, take values $0, 1, \ldots, 2^s - 1$.
The discrete Fourier transform (DFT) $F(y_1, \ldots, y_n)$ of the signal $f(x_1, \ldots, x_n)$ is given by the formula:

$$F(y_1, \ldots, y_n) = \sum_{x_1=0}^{2^s-1} \cdots \sum_{x_n=0}^{2^s-1} f(x_1, \ldots, x_n)\, e^{-\frac{2\pi i}{2^s}(x_1 y_1 + \ldots + x_n y_n)} \tag{1}$$

where $y_i$, $i = 1, \ldots, n$, take values $0, \ldots, 2^s - 1$.
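To make the definition concrete, the DFT of formula (1) can be evaluated by brute force for n = 2. The sketch below is only a reference implementation for checking faster algorithms on small inputs; the name `dft2d_direct` is hypothetical, and the negative sign in the exponent is an assumption (it matches the convention of Matlab's `fft`).

```python
import cmath

def dft2d_direct(f):
    """Evaluate the 2D DFT directly from its definition: O(N^4) operations."""
    N = len(f)                      # the signal is assumed to be N x N
    F = [[0j] * N for _ in range(N)]
    for y1 in range(N):
        for y2 in range(N):
            total = 0j
            for x1 in range(N):
                for x2 in range(N):
                    phase = -2j * cmath.pi * (x1 * y1 + x2 * y2) / N
                    total += f[x1][x2] * cmath.exp(phase)
            F[y1][y2] = total
    return F

if __name__ == "__main__":
    print(dft2d_direct([[1, 2], [3, 4]]))
```

The four nested loops show why the direct evaluation is impractical for the signal sizes considered later and why a fast algorithm is needed.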
2.1 n-Dimensional FFT
Transform the formula (1) as follows:

$$F(y_1, \ldots, y_n) = F(y_1^1 + 2^{s-1} b_1, \ldots, y_n^1 + 2^{s-1} b_n) = \sum_{a_1=0}^{1} \cdots \sum_{a_n=0}^{1} e^{-\frac{2\pi i}{2^s}(a_1 y_1 + \ldots + a_n y_n)}\, g^1_{a_1, \ldots, a_n}(y_1^1, \ldots, y_n^1) \tag{2}$$

where $y_i = y_i^1 + 2^{s-1} b_i$ with $b_i = 0 : 1$, the coordinates $y_i^1$ of the samples of the subsignals $g^1_{a_1, \ldots, a_n}$ run over $2^{s-1}$ values, $x_i^1, y_i^1 = 0 : 2^{s-1} - 1$, $i = 1 : n$, and $F^1$ is the FFT of the source signal $f$.

Noskov M. and Tutatchikov V.
Parallel Version n-Dimensional Fast Fourier Transform Algorithm - Analog of the Cooley-Tukey Algorithm.
DOI: 10.5220/0005461401140117
In Proceedings of the 5th International Workshop on Image Mining. Theory and Applications (IMTA-5-2015), pages 114-117
ISBN: 978-989-758-094-9
Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)

For convenience, denote $F^0 = f$:

$$F^1(y_1, \ldots, y_n) = \sum_{a_1=0}^{1} \cdots \sum_{a_n=0}^{1} e^{-\frac{2\pi i}{2^s}(a_1 y_1 + \ldots + a_n y_n)} \sum_{x_1^1=0}^{2^{s-1}-1} \cdots \sum_{x_n^1=0}^{2^{s-1}-1} F^0(2x_1^1 + a_1, \ldots, 2x_n^1 + a_n)\, e^{-\frac{2\pi i}{2^s}\, 2\,(x_1^1 y_1^1 + \ldots + x_n^1 y_n^1)} \tag{3}$$

Here the inner sums over $x_1^1, \ldots, x_n^1$ form the subsignal $g^1_{a_1, \ldots, a_n}(y_1^1, \ldots, y_n^1)$.
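A single decomposition step of this kind can be checked numerically for n = 2. The sketch below assumes the negative exponent sign and a square signal of even side; the names `dft2d` and `split_step` are hypothetical. It computes the four half-size transforms of the subsignals f(2x_1 + a_1, 2x_2 + a_2) by brute force and combines them with the factors exp(-2πi(a_1 y_1 + a_2 y_2)/N).

```python
import cmath

def dft2d(f):
    """Brute-force 2D DFT of a square signal, used here as a building block."""
    N = len(f)
    return [[sum(f[x1][x2] * cmath.exp(-2j * cmath.pi * (x1 * y1 + x2 * y2) / N)
                 for x1 in range(N) for x2 in range(N))
             for y2 in range(N)] for y1 in range(N)]

def split_step(f):
    """One decomposition step for n = 2: combine four half-size DFTs."""
    N = len(f)                      # N must be even
    h = N // 2
    # g[a1][a2] is the DFT of the subsignal f(2*x1 + a1, 2*x2 + a2)
    g = [[dft2d([row[a2::2] for row in f[a1::2]]) for a2 in (0, 1)]
         for a1 in (0, 1)]
    F = [[0j] * N for _ in range(N)]
    for y1 in range(N):
        for y2 in range(N):
            total = 0j
            for a1 in (0, 1):
                for a2 in (0, 1):
                    tw = cmath.exp(-2j * cmath.pi * (a1 * y1 + a2 * y2) / N)
                    # the subsignal transforms are periodic with period N/2
                    total += tw * g[a1][a2][y1 % h][y2 % h]
            F[y1][y2] = total
    return F

if __name__ == "__main__":
    print(split_step([[1, 2], [3, 4]]))
```

Comparing `split_step(f)` with `dft2d(f)` on a small signal is a quick way to confirm the index and sign conventions before applying the step recursively.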
Continue the same procedure for each $g^1_{a_1, \ldots, a_n}$, that is, represent the signal $g^1_{a_1, \ldots, a_n}$ as a sum of subsignals:

$$g^1_{a_1, \ldots, a_n} \to g^2_{a_1, \ldots, a_n} \tag{4}$$

where the coordinates of the samples of the subsignals $g^2_{a_1, \ldots, a_n}$ run over $2^{s-2}$ values.
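Continuing this process, $F^1(y_1, \ldots, y_n)$ can be represented as a sum of DFTs of signals in which each of the n coordinates runs over only two values. We obtain the following formula for calculating $F^v(y_1, \ldots, y_n)$:

$$F^v(y_1, \ldots, y_n) = F^v(2^{s-v} y_1^v + 2^{s-1} b_1 + c_1, \ldots, 2^{s-v} y_n^v + 2^{s-1} b_n + c_n) = \sum_{a_1=0}^{1} \cdots \sum_{a_n=0}^{1} e^{-\frac{2\pi i}{2^s}\, 2^{s-v} \left( a_1 (y_1^v + 2^{v-1} b_1) + \ldots + a_n (y_n^v + 2^{v-1} b_n) \right)}\, g^v_{a_1, \ldots, a_n}(2^{s-v+1} y_1^v + c_1, \ldots, 2^{s-v+1} y_n^v + c_n) \tag{5}$$

where $v = 1 : s$ is the step number of the partition of $F(x_1, \ldots, x_n)$ into subsignals, $b_i = 0 : 1$, $c_i = 0 : 2^{s-v} - 1$, and $g^v_{a_1, \ldots, a_n}(2^{s-v+1} y_1^v + c_1, \ldots, 2^{s-v+1} y_n^v + c_n) = F^{v-1}(2^{s-v+1} y_1^v + 2^{s-v} a_1 + c_1, \ldots, 2^{s-v+1} y_n^v + 2^{s-v} a_n + c_n)$ is the subsignal obtained at the previous step.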
Consider in more detail the formula (5):

$$F^v(2^{s-v} y_1^v + 2^{s-1} b_1 + c_1, \ldots, 2^{s-v} y_n^v + 2^{s-1} b_n + c_n) = \sum_{a_1=0}^{1} \cdots \sum_{a_n=0}^{1} e^{-\frac{2\pi i}{2^s}\, 2^{s-v} \left( a_1 (y_1^v + 2^{v-1} b_1) + \ldots + a_n (y_n^v + 2^{v-1} b_n) \right)} \sum_{x_1^v=0}^{2^{v-1}-1} \cdots \sum_{x_n^v=0}^{2^{v-1}-1} F^0(2^{s-v+1} x_1^v + 2^{s-v} a_1 + c_1, \ldots, 2^{s-v+1} x_n^v + 2^{s-v} a_n + c_n)\, e^{-\frac{2\pi i}{2^s}\, 2^{s-v+1} (x_1^v y_1^v + \ldots + x_n^v y_n^v)} \tag{6}$$

where $x_i^v, y_i^v = 0 : 2^{v-1} - 1$, $i = 1 : n$, and $F^s(y_1, \ldots, y_n) = F(y_1, \ldots, y_n)$ is the discrete Fourier transform of $f$.
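The complete decomposition can be illustrated for n = 2 by a short recursive routine. This is a sketch under stated assumptions, not the authors' C++ program: it is a recursive formulation rather than the step-by-step form with index v, the name `fft2d_vector_radix` is hypothetical, the signal is assumed to be a square list of lists whose side is a power of two, and the negative exponent sign is assumed.

```python
import cmath

def fft2d_vector_radix(f):
    """2D FFT that splits both coordinates at once (four subsignals per step)."""
    N = len(f)
    if N == 1:
        return [[complex(f[0][0])]]
    h = N // 2
    # g[a1][a2] is the transform of the subsignal f(2*x1 + a1, 2*x2 + a2)
    g = [[fft2d_vector_radix([row[a2::2] for row in f[a1::2]])
          for a2 in (0, 1)] for a1 in (0, 1)]
    w = [cmath.exp(-2j * cmath.pi * k / N) for k in range(h)]   # twiddle factors
    F = [[0j] * N for _ in range(N)]
    for y1 in range(h):
        for y2 in range(h):
            # three of the four inputs are twiddled (2^n - 1 = 3 for n = 2)
            g00 = g[0][0][y1][y2]
            g01 = g[0][1][y1][y2] * w[y2]
            g10 = g[1][0][y1][y2] * w[y1]
            g11 = g[1][1][y1][y2] * (w[y1] * w[y2])
            # outputs for (b1, b2) = (0,0), (0,1), (1,0), (1,1)
            F[y1][y2] = g00 + g01 + g10 + g11
            F[y1][y2 + h] = g00 - g01 + g10 - g11
            F[y1 + h][y2] = g00 + g01 - g10 - g11
            F[y1 + h][y2 + h] = g00 - g01 - g10 + g11
    return F

if __name__ == "__main__":
    print(fft2d_vector_radix([[1, 2], [3, 4]]))
```

The signs in the four output lines are the factors (-1)^(a_1 b_1 + a_2 b_2) that arise because shifting an output coordinate by N/2 changes only the sign of the twiddled terms.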
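2.2 Parallel FFT Algorithm
The calculation of $F(y_1, \ldots, y_n)$ can be parallelized over independent computation flows. With $2^q$ flows, $0 \le q \le s$, formula (6) takes the form:

$$F^v(y_1, \ldots, y_n) = F^v(2^{s-v} y_1^v + 2^{s-1} b_1 + 2^{s-v-q} t + c_1,\; 2^{s-v} y_2^v + 2^{s-1} b_2 + c_2,\; \ldots,\; 2^{s-v} y_n^v + 2^{s-1} b_n + c_n) \tag{7}$$

where $t = 0 : 2^q - 1$ is the flow number, $c_1 = 0 : 2^{s-v-q} - 1$, $c_i = 0 : 2^{s-v} - 1$ for $i = 2 : n$, and $b_i = 0 : 1$ (for the steps with $s - v \ge q$). Each flow $t$ computes the samples that correspond to its own value of $t$.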
Consider in more detail the formula (7):

$$F^v(2^{s-v} y_1^v + 2^{s-1} b_1 + 2^{s-v-q} t + c_1,\; 2^{s-v} y_2^v + 2^{s-1} b_2 + c_2,\; \ldots,\; 2^{s-v} y_n^v + 2^{s-1} b_n + c_n) = \sum_{a_1=0}^{1} \cdots \sum_{a_n=0}^{1} e^{-\frac{2\pi i}{2^s}\, 2^{s-v} \left( a_1 (y_1^v + 2^{v-1} b_1) + \ldots + a_n (y_n^v + 2^{v-1} b_n) \right)}\, g^v_{a_1, \ldots, a_n}(2^{s-v+1} y_1^v + 2^{s-v-q} t + c_1,\; 2^{s-v+1} y_2^v + c_2,\; \ldots,\; 2^{s-v+1} y_n^v + c_n) \tag{8}$$
Subsignals $g^v_{a_1, \ldots, a_n}$ may be described as follows:

$$g^v_{a_1, \ldots, a_n}(2^{s-v+1} y_1^v + 2^{s-v-q} t + c_1,\; 2^{s-v+1} y_2^v + c_2,\; \ldots,\; 2^{s-v+1} y_n^v + c_n) = \sum_{x_1^v=0}^{2^{v-1}-1} \cdots \sum_{x_n^v=0}^{2^{v-1}-1} F^0(2^{s-v+1} x_1^v + 2^{s-v} a_1 + 2^{s-v-q} t + c_1,\; 2^{s-v+1} x_2^v + 2^{s-v} a_2 + c_2,\; \ldots,\; 2^{s-v+1} x_n^v + 2^{s-v} a_n + c_n)\, e^{-\frac{2\pi i}{2^s}\, 2^{s-v+1} (x_1^v y_1^v + \ldots + x_n^v y_n^v)} \tag{9}$$
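The independence of the subsignal computations can be illustrated with a small sketch for n = 2. It is not the scheme of formulas (7)-(9) itself: as a simplifying assumption, only the four top-level subsignals are handed to separate workers, and Python threads are used purely to show the structure (in CPython they do not speed up this arithmetic because of the global interpreter lock). The names `combine`, `fft2d_vr` and `fft2d_parallel` are hypothetical.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def combine(g00, g01, g10, g11):
    """Merge four half-size transforms into the full-size one."""
    h = len(g00)
    N = 2 * h
    w = [cmath.exp(-2j * cmath.pi * k / N) for k in range(h)]
    F = [[0j] * N for _ in range(N)]
    for y1 in range(h):
        for y2 in range(h):
            p = g00[y1][y2]
            q = g01[y1][y2] * w[y2]
            r = g10[y1][y2] * w[y1]
            s = g11[y1][y2] * (w[y1] * w[y2])
            F[y1][y2] = p + q + r + s
            F[y1][y2 + h] = p - q + r - s
            F[y1 + h][y2] = p + q - r - s
            F[y1 + h][y2 + h] = p - q - r + s
    return F

def subsignals(f):
    """The four subsignals f(2*x1 + a1, 2*x2 + a2), ordered by (a1, a2)."""
    return [[row[a2::2] for row in f[a1::2]] for a1 in (0, 1) for a2 in (0, 1)]

def fft2d_vr(f):
    """Sequential recursive 2D FFT with a simultaneous split of both coordinates."""
    if len(f) == 1:
        return [[complex(f[0][0])]]
    return combine(*[fft2d_vr(sub) for sub in subsignals(f)])

def fft2d_parallel(f, workers=4):
    """Compute the four top-level subsignal transforms in separate workers."""
    if len(f) == 1:
        return [[complex(f[0][0])]]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(fft2d_vr, subsignals(f)))   # order is preserved
    return combine(*parts)

if __name__ == "__main__":
    print(fft2d_parallel([[1, 2], [3, 4]]))
```

Because the workers never write to shared data, no synchronization is needed until the final combination step.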
3 THE OBTAINED RESULTS
To test the algorithm, a program was written in
C++ for two- and three-dimensional signals. The
testing was conducted on a PC with the following
characteristics:
Processor: AMD FX-4170 4.2 GHz;
RAM: 8 GB;
Operating system: Windows 7.
The algorithm was compared with the standard
discrete Fourier transform routine of
Matlab 7.5.0 (R2007b) in the two- and
three-dimensional cases. Run times are given in
seconds in the tables.
Table 1 compares the run time in seconds of the
two-dimensional FFT computed by the Cooley-Tukey
algorithm analog with that of the standard
two-dimensional FFT in Matlab.
Table 2 compares the run time in seconds of the
three-dimensional FFT computed by the Cooley-Tukey
algorithm analog with that of the standard
three-dimensional FFT in Matlab.
Table 3 compares the run time in seconds of the
parallel version of the two-dimensional FFT by the
Cooley-Tukey algorithm analog with that of the
parallel standard algorithm, which computes the
two-dimensional FFT as a combination of
one-dimensional FFTs.
Table 1: Calculating 2D FFT.

Signal size    2D FFT, Matlab (s)   2D FFT, Cooley-Tukey algorithm analog (s)   Speedup, C++
128*128        0.001                0.001                                       ~1
256*256        0.005                0.004                                       ~1
512*512        0.027                0.017                                       ~1.6
1024*1024      0.125                0.087                                       ~1.4
2048*2048      0.620                0.389                                       ~1.6
4096*4096      2.634                1.637                                       ~1.6
8192*8192      13.609               6.904                                       ~2
16384*16384    -                    20.383
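The speedup column can be recomputed from the two timing columns. The sketch below assumes that the speedup is the ratio of the Matlab run time to the run time of the C++ program; the timings are copied from Table 1, and the names `TABLE_1` and `speedup` are hypothetical.

```python
# (signal size, Matlab time in s, C++ time in s), copied from Table 1
TABLE_1 = [
    ("128*128", 0.001, 0.001),
    ("256*256", 0.005, 0.004),
    ("512*512", 0.027, 0.017),
    ("1024*1024", 0.125, 0.087),
    ("2048*2048", 0.620, 0.389),
    ("4096*4096", 2.634, 1.637),
    ("8192*8192", 13.609, 6.904),
]

def speedup(t_reference, t_new):
    """Ratio of the reference run time to the run time being evaluated."""
    return t_reference / t_new

if __name__ == "__main__":
    for size, t_matlab, t_cpp in TABLE_1:
        print(f"{size:>12}  {speedup(t_matlab, t_cpp):.2f}")
```

For the smallest signals the timings have only one significant digit, so the ratio is known only roughly there.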
Table 2: Calculating 3D FFT.

Signal size    3D FFT, Matlab (s)   3D FFT, Cooley-Tukey algorithm analog (s)   Speedup, C++
32*32*32       0.002                0.002                                       ~1.0
64*64*64       0.028                0.020                                       ~1.4
128*128*128    0.282                0.188                                       ~1.5
256*256*256    2.546                1.660                                       ~1.5
512*512*512    -                    14.736
IMTA-52015-5thInternationalWorkshoponImageMining.TheoryandApplications
116
Table 3: Parallel calculating 2D FFT.

Signal size   Number of processes   Combination of 1D FFTs (s)   2D FFT, Cooley-Tukey algorithm analog (s)   Speedup, Cooley-Tukey
1024*1024     1                     0.112                        0.057                                       ~1.6
              2                     0.142                        0.070                                       ~1.0
              4                     0.154                        0.099                                       ~0.8
              8                     0.257                        0.092                                       ~0.7
              16                    0.330                        0.088                                       ~0.5
2048*2048     1                     0.516                        0.275                                       ~1.7
              2                     0.512                        0.396                                       ~1.2
              4                     0.596                        0.407                                       ~1.1
              8                     1.045                        0.345                                       ~0.9
              16                    1.195                        0.453                                       ~0.8
4096*4096     1                     2.193                        1.355                                       ~1.7
              2                     2.399                        1.194                                       ~1.4
              4                     2.393                        2.098                                       ~1.2
              8                     4.412                        1.946                                       ~1.1
              16                    3.946                        1.912                                       ~1.1
8192*8192     1                     12.538                       4.957                                       ~1.7
              2                     10.509                       5.245                                       ~1.4
              4                     11.753                       7.848                                       ~1.2
              8                     18.551                       8.162                                       ~1.1
              16                    18.196                       8.907                                       ~1.2
Figure 1: Example of two-dimensional signal.
4 CONCLUSIONS
The modified algorithm of the n-dimensional fast
Fourier transform by analogue of the Cooley-Tukey
algorithm requires $\frac{2^n - 1}{2^n} N^n \log_2 N$ complex
multiplications and $n N^n \log_2 N$ complex
additions, where $N = 2^s$ is the number of samples
along each coordinate (Starovoitov, 2010).
The standard algorithm requires $n N^n \log_2 N$ complex
multiplications and $n N^n \log_2 N$ complex
additions. The modified algorithm therefore requires
fewer complex multiplications than the standard
method, and runs about 1.5 times faster than its
counterpart in Matlab.
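The operation counts quoted above can be tabulated with a few lines of code. The sketch takes the formulas exactly as stated in the text, (2^n - 1)/2^n · N^n log2 N multiplications for the modified algorithm and n N^n log2 N for the other counts, with N = 2^s; the function names are hypothetical.

```python
def mults_modified(n, s):
    """(2^n - 1) / 2^n * N^n * log2(N) with N = 2^s, computed in integers (s >= 1)."""
    N = 2 ** s
    return (2 ** n - 1) * N ** n * s // 2 ** n

def mults_standard(n, s):
    """n * N^n * log2(N) with N = 2^s, as quoted for the standard algorithm."""
    N = 2 ** s
    return n * N ** n * s

def additions(n, s):
    """n * N^n * log2(N) complex additions, quoted for both algorithms."""
    N = 2 ** s
    return n * N ** n * s

if __name__ == "__main__":
    for n, s in [(2, 10), (3, 8)]:
        ratio = mults_modified(n, s) / mults_standard(n, s)
        print(n, 2 ** s, mults_modified(n, s), mults_standard(n, s), ratio)
```

With these formulas the ratio of multiplication counts is (2^n - 1)/(n 2^n), independent of N: 3/8 for n = 2 and 7/24 for n = 3.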
ACKNOWLEDGEMENTS
This work was performed under the state order of the
Ministry of Education and Science of the Russian
Federation to the Siberian Federal University for
R&D in 2014 (Task No 1.1462.2014/K).
Project title: “Algebraic and analytic methods for
creating algorithms for solving differential and
polynomial systems: factorization, resolution of
singularities and the optimal lattice”.
REFERENCES
Dudgeon, D. E., Mersereau, R. M., 1983.
Multidimensional Digital Signal Processing, Prentice
Hall.
Blahut, R. E., 1985. Fast Algorithms for Digital Signal
Processing, Addison-Wesley Press.
Tutatchikov, V. S., Kiselev, O. I., Noskov, M. V., 2013.
“Calculating the n-Dimensional Fast Fourier
Transform”, Pattern Recognition and Image Analysis,
vol. 23, no. 3, pp. 429-433.
Gonzalez, R. C., Woods, R. E., Eddins, S. L., 2009.
Digital Image Processing Using MATLAB, Gatesmark
Publishing, Knoxville.
Starovoitov, A. V., 2010. “On multidimensional analog of
Cooley-Tukey algorithm”, Reporter Siberian State
Aerospace University named after academician
M. F. Reshetnev, no. 1 (27), pp. 69-73.
ParallelVersionn-DimensionalFastFourierTransformAlgorithm-AnalogoftheCooley-TukeyAlgorithm
117