a similar manner as for the Relevance Vector Machine (RVM) (Tipping, 2001; Tipping and Lawrence, 2003). This prior prunes out kernels that do not fit the data. Second, the model is made robust to large errors of the imaging model. This is achieved by assuming that the errors are non-Gaussian and modeling them with a pdf with heavier tails. Specifically, the Student-t pdf is used to model both the PSF and the image model errors. This pdf can be viewed as an infinite mixture of Gaussians with different variances (Bishop, 2006) and provides both sparse models and robust representations of large errors (Peel and McLachlan, 2000; Tipping and Lawrence, 2003).
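As a rough numerical illustration (ours, not the paper's; the unit scale and the three degrees of freedom are arbitrary choices), the following Python sketch compares the penalty that each pdf assigns to a large residual:

    # Gaussian vs. Student-t penalties for residuals of growing size.
    import numpy as np
    from scipy.stats import norm, t

    residuals = np.array([0.1, 1.0, 5.0, 20.0])  # hypothetical model errors
    nll_gauss = -norm.logpdf(residuals, scale=1.0)
    nll_student = -t.logpdf(residuals, df=3, scale=1.0)
    for r, g, s in zip(residuals, nll_gauss, nll_student):
        print(f"residual {r:5.1f}: Gaussian NLL {g:7.2f}, Student-t NLL {s:6.2f}")

The Gaussian penalty grows quadratically with the residual, while the Student-t penalty grows only logarithmically, so gross model errors dominate a Gaussian fit but barely influence a Student-t fit.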
Since the proposed Bayesian model cannot be solved exactly, we resort to the variational approximation. This methodology (Jordan et al., 1998) considers a class of approximate posterior distributions and searches within this class for the best approximation of the true posterior. It has been applied with success to many Bayesian inference problems.
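In its standard form (made explicit here for concreteness; this formulation is generic and not specific to this paper), the approach restricts the posterior to a tractable family Q and selects

q*(θ) = argmin_{q∈Q} KL(q(θ) ∥ p(θ|g)),

that is, the member of Q closest in Kullback-Leibler divergence to the true posterior p(θ|g), where θ collectively denotes all unknown quantities.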
The rest of this paper is organised as follows. In
section 2 we explain in detail the proposed model.
Then in section 3 we present a brief introduction of
variational methods and in section 4 we aply the vari-
ational methodology to infer the proposed model. In
section 5 we present experiments, first on artificially
blurred images and then on real astronimical images.
Finally, in section 6 we conclude and provide direc-
tions for future work.
2 STOCHASTIC MODEL
We assume that the observed image g has been generated by convolving an unknown image f with a PSF h that is also unknown, and then adding independent Gaussian noise n with inverse variance β:

g = f ∗ h + n. (1)

Here, g, f, h and n are N × 1 vectors containing, in lexicographic order, the intensities of the observed degraded image, the unknown original image, the blurring PSF and the additive noise, respectively, and ∗ denotes two-dimensional circular convolution between the images.
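As a concrete sketch of the degradation model (1) (our own illustration; the image size, Gaussian PSF width and noise level are arbitrary), circular convolution can be implemented with the FFT:

    import numpy as np

    def circ_conv(a, b):
        # Two-dimensional circular convolution via the FFT,
        # matching the * operator in equation (1).
        return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

    rng = np.random.default_rng(0)
    M = 64                                       # image side, so N = M*M pixels
    f = rng.uniform(size=(M, M))                 # stand-in for the unknown image
    d = np.arange(M); d = np.minimum(d, M - d)   # circularly wrapped distances
    h = np.exp(-(d[:, None]**2 + d[None, :]**2) / (2 * 2.0**2))
    h /= h.sum()                                 # Gaussian PSF with unit DC gain
    beta = 1e4                                   # inverse noise variance
    g = circ_conv(f, h) + rng.normal(scale=beta**-0.5, size=(M, M))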
The blind deconvolution problem is severely ill-posed: the number of unknowns in h and f is twice the number of observations in g, so these parameters can be reliably estimated only by exploiting prior knowledge of the
characteristics of the unknown quantities. Following
the Bayesian framework, the unknown parameters are
treated as random variables and prior knowledge is
expressed by assuming that they have been sampled
from specific prior distributions.
2.1 PSF Model
We model the PSF as a linear combination of a fixed
set of kernel basis functions and specifically there is
one kernel function K(x) centered at each pixel of the
image. This kernel function is then evaluated at all the
pixels of the image to give the N × 1 basis vector φ.
We denote by Φ the N × N matrix Φ = (φ_1, ..., φ_N), the block-circulant matrix whose first column is φ_1 = φ, so that Φw = φ ∗ w. Each column φ_i can also be viewed as the kernel function shifted to the corresponding pixel, φ_i = K(x − x_i). The PSF h is then modeled as:

h = ∑_{i=1}^{N} w_i φ_i = Φw. (2)
Thus, the data generation model (1) can be written as:

g = (Φw) ∗ f + n = FΦw + n = ΦWf + n. (3)

The matrices F and W are defined analogously to Φ; they are the block-circulant matrices generated by f and w respectively, so that Fw = f ∗ w and Wf = w ∗ f.
In this paper Gaussian kernel functions are con-
sidered, which produce smooth estimates of the PSF.
However, any other type of kernel could be used as well; in fact, many different types of kernels can be used simultaneously, at small additional computational cost (Tzikas et al., 2006a).
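To make the construction concrete, the following sketch (ours; the kernel width is a hypothetical choice) builds a Gaussian basis vector φ, forms h = Φw = φ ∗ w as in equation (2), and checks numerically that the two factorizations of equation (3) agree:

    import numpy as np

    def circ_conv(a, b):  # 2-D circular convolution, as in equation (1)
        return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

    rng = np.random.default_rng(1)
    M = 32
    d = np.arange(M); d = np.minimum(d, M - d)   # wrapped pixel distances
    phi = np.exp(-(d[:, None]**2 + d[None, :]**2) / (2 * 1.5**2))  # kernel K
    w = rng.normal(size=(M, M))                  # kernel weights
    f = rng.uniform(size=(M, M))                 # stand-in image

    h = circ_conv(phi, w)                        # h = Φw = φ * w
    lhs = circ_conv(h, f)                        # (Φw) * f = FΦw
    rhs = circ_conv(circ_conv(w, f), phi)        # ΦWf = φ * (w * f)
    assert np.allclose(lhs, rhs)                 # both orderings coincide

The agreement is simply the associativity and commutativity of convolution, which is what allows equation (3) to be written in either order.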
A hierarchical prior that enforces sparsity is then imposed on the weights w:

p(w|α) = ∏_{i=1}^{N} N(w_i | 0, α_i^{-1}). (4)
Each weight w_i is assigned a separate inverse variance parameter α_i, which is treated as a random variable that follows a Gamma distribution:

p(α) = ∏_{i=1}^{N} Γ(α_i | a_α, b_α). (5)
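Sampling from this hierarchy (a sketch of ours; the values of a_α and b_α are arbitrary) shows the pruning behaviour that the prior induces: most weights are driven towards zero while a few remain large:

    import numpy as np

    rng = np.random.default_rng(2)
    N, a_alpha, b_alpha = 10000, 0.1, 0.1    # small a, b: broad hyperprior
    alpha = rng.gamma(shape=a_alpha, scale=1.0 / b_alpha, size=N)  # eq. (5)
    w = rng.normal(scale=alpha**-0.5)        # eq. (4): w_i ~ N(0, 1/alpha_i)
    # Most |w_i| are negligible while a few are very large: the
    # signature of a sparsity-inducing prior.
    print(np.percentile(np.abs(w), [50, 90, 99]))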
This two-level hierarchical prior is equivalent to a Student-t prior distribution. This can be seen by integrating out the parameters α_i to obtain the marginal weight prior p(w):

p(w) = ∫ p(w|α) p(α) dα = ∏_{i=1}^{N} St(w_i | 0, a_α/b_α, 2a_α), (6)

where St(w|0, λ, ν) denotes a zero-mean Student-t distribution with precision λ and ν degrees of freedom (Bishop, 2006); here the precision is λ = a_α/b_α and the degrees of freedom are ν = 2a_α.
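The equivalence in equation (6) is easy to check by simulation; the sketch below (ours; the values of a_α and b_α are arbitrary) compares samples from the two-level hierarchy against the closed-form Student-t, whose scale in scipy's parameterization is the inverse square root of the precision, sqrt(b_α/a_α):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    a_alpha, b_alpha, n = 2.0, 3.0, 200000
    alpha = rng.gamma(shape=a_alpha, scale=1.0 / b_alpha, size=n)  # eq. (5)
    w = rng.normal(scale=alpha**-0.5)                              # eq. (4)
    student = stats.t(df=2 * a_alpha, scale=np.sqrt(b_alpha / a_alpha))
    print(stats.kstest(w, student.cdf))  # large p-value: same distribution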