this section we present the most common claims and
try to restore the balance.
3.1 GPUs Do Not Cost as Much as
CELL
This first claim arises from one of the few papers
available that draw a direct comparison between the
two architectures (Baker, Gokhale, & Tripp, 2007).
That work shows that the GPU has a lower price
and a higher Speedup/$K rate. This may appear
obvious on reading the paper, but some things
need to be pointed out.
First of all, looking at the raw data, the difference
in performance and cost is not so large: although the
CELL costs three times as much as the nVidia 7900
GTX, it is also three times faster. In fact, the
Speedup/$K rates are almost the same, differing by
only 0.34. More importantly, the benchmark pits a
single graphics card against a CELL blade system
that mounts two processors. To make the results
comparable, only one of the blade's processors is
used: we thus pay the cost of a whole blade but
exploit only half the power it could provide. If we
instead price the single CELL of a PS3, we discover
that a single video card costs the same as an entire
PlayStation, which includes far more than just the
CELL processor. This small price gap is even more
evident if we compare the nVidia Deskside Tesla
(sold at $7,500) with the CELL QS21 Blade (almost
$8,000). What Baker, Gokhale, & Tripp (2007)
really show is how important it would be to be able
to buy a single-CELL solution without the rest of the
PS3 environment.
The last thing to point out is that the code used for
the benchmarks in the paper is not optimized. This
penalizes CELL performance more than the GPU's,
as we will discuss further on.
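The Speedup/$K reasoning above can be sketched in a few lines of code. The prices and speedup values below are illustrative placeholders chosen to reproduce the 3x/3x ratio discussed in the text, not the raw data from Baker, Gokhale, & Tripp (2007):

```python
# Illustrative sketch of the Speedup/$K metric. The figures are
# hypothetical placeholders, not the paper's measured values.

def speedup_per_k_dollars(speedup, price_dollars):
    """Speedup obtained per thousand dollars spent."""
    return speedup / (price_dollars / 1000.0)

# Assumed figures: CELL at ~3x the GPU's price and ~3x its speedup.
gpu_price, gpu_speedup = 500.0, 10.0
cell_price, cell_speedup = 1500.0, 30.0

gpu_rate = speedup_per_k_dollars(gpu_speedup, gpu_price)
cell_rate = speedup_per_k_dollars(cell_speedup, cell_price)
```

With a strict 3x-price/3x-speedup ratio the two rates coincide exactly, which is why the measured gap in the paper is as small as 0.34 Speedup/$K.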
3.2 GPUs Have a Faster Learning
Curve
As a matter of fact, GPUs do have a faster learning
curve if your aim is just to write a “Hello world”
program. If your goal is to use the GPU for small
algorithms with no high-performance requirements,
you will be able to do so after a short while. If, on
the contrary, your goal is to develop an optimized
solution to a problem where performance really
matters, then you will have to learn graphics
programming and OpenGL (or DirectX), and this
will make your learning curve considerably slower.
Some good news comes from nVidia with the
announcement that a C compiler will be available
for CUDA. Learning graphics will then no longer be
necessary, but you will still need to know how your
code is executed on the GPU. This is the very
problem that makes the CELL learning curve so
slow.
In considering learning curves, the only
difference worth pointing out is that the GPU makes
parallel programming transparent to users (nVidia
CUDA, 2007). However, it has not yet been
demonstrated that this is an advantage in specific
contexts where optimizations matter.
3.3 GPUs Are Specific to Graphics and
Provide Better Performance
This claim is often made when presenting
benchmarks between GPUs and CPUs, and it is
obviously true. Given an algorithm, the closer it is to
the graphics context, the faster its GPU port will
run. The common example is image filtering, where
we can obtain an incredible speedup with respect to
CPU implementations. What is never said, but often
implied, is that GPUs are the best tool available for
parallel computing, both at the graphics and at the
general-purpose level. This is not true.
One of the most significant results achieved by
the CELL over GPGPU architectures concerns
matrix multiplication, a problem that has long been
used to demonstrate the GPU's abilities. The nVidia
Quadro 4600 performs single-precision matrix
multiplication with a throughput of 90 GFLOPS
(GPU-Tech, 2007); the same operation performed
on a CELL processor with 8 SPUs runs at 140
GFLOPS (Barcelona Supercomputing Center,
2007). This result is highly significant, as matrix
multiplication has always been GPU computing's
greatest achievement. We do not claim that the
CELL should be used for graphics rendering. Our
purpose is just to show that, if this processor is
valuable even in a context where the GPU has
always been the top solution, its flexibility probably
makes it a better choice for general-purpose parallel
computing.
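The GFLOPS figures quoted above follow the standard accounting for dense matrix multiplication: an n-by-n product performs about 2n³ floating-point operations, and throughput is that operation count divided by the elapsed time. A minimal sketch (the matrix size and timing below are hypothetical, chosen only to land near the Quadro 4600's quoted 90 GFLOPS):

```python
# Standard flop accounting for dense matrix multiplication:
# an n x n product costs ~2*n^3 floating-point operations
# (n^3 multiplies and n^3 adds).

def matmul_gflops(n, seconds):
    """GFLOPS achieved by an n x n matrix multiply taking `seconds`."""
    flops = 2.0 * n ** 3
    return flops / seconds / 1e9

# Hypothetical example: a 1024 x 1024 multiply completing in 24 ms
# corresponds to roughly 89.5 GFLOPS, close to the quoted GPU figure.
rate = matmul_gflops(1024, 0.024)
```

The same formula applied to the CELL's 140 GFLOPS figure simply implies a proportionally shorter execution time for the same matrix size.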
It might be argued that, on paper, the nVidia G80
offers a higher GFLOPS rate than the CELL (500
against 208). This is true if you compare only the
raw computation rates, assuming full utilization of
both technologies, which is just an ideal case. In
real applications, code optimization is extremely
difficult on the GPU, and even more so if we
consider the C compiler layer introduced by the
CUDA architecture. In short, in real applications
such as real-time ray tracing, the CELL benefits
from code optimization more than the GPU does and
provides higher performance even with the single six-core CELL
GRAPP 2008 - International Conference on Computer Graphics Theory and Applications