Empirical Bayesian Models of L1/L2 Mixed-norm Constraints
Deirel Paz-Linares, Mayrim Vega-Hernández and Eduardo Martínez-Montes
Neuroinformatics Department, Cuban Neuroscience Center, Havana, Cuba
1 OBJECTIVES
Inverse problems are common in neuroscience and neurotechnology, where usually only a small amount of data is available relative to the large number of parameters needed for modelling brain activity. Classical examples are EEG/MEG source localization and the estimation of effective brain connectivity. Many kinds of constraints or prior information have been proposed to regularize these inverse problems. The combination of smoothness (L2 norm-based penalties) and sparseness (L1 norm-based penalties) seems to be a promising approach due to its flexibility, but the estimation of optimal weights for balancing these constraints becomes a critical issue (Vega-Hernández et al., 2008). Two important examples of constraints that combine L1/L2 norms are the Elastic Net (Vega-Hernández et al., 2008) and the Mixed-Norm L12 (MxN, Gramfort et al., 2012). The latter imposes each of these properties along a different dimension of a matrix inverse problem. In this work, we formulate an empirical Bayesian model based on an MxN prior distribution. The objective is to pursue sparse learning along the first dimension (along rows) while preserving smoothness in the second dimension (along columns), by estimating both the parameters and the hyperparameters (regularization weights).
2 METHODS
The matrix linear Inverse Problem consists of inferring an S×T parameter matrix $\mathbf{J}$ in the model $\mathbf{V} = \mathbf{K}\mathbf{J} + \mathbf{E}$, where $\mathbf{V}$ (data) and $\mathbf{E}$ (noise) are N×T, $\mathbf{K}$ is N×S, and N << S, making it an ill-posed problem due to its non-uniqueness. One approach to address this problem is Tikhonov regularization, which uses a penalty function $P(\mathbf{J})$ to find the inverse solution through a penalized least-squares (PLS) regression, $\hat{\mathbf{J}} = \arg\min_{\mathbf{J}} \left\{ \|\mathbf{V} - \mathbf{K}\mathbf{J}\|_2^2 + \lambda\, P(\mathbf{J}) \right\}$, where $\lambda > 0$ is the regularization parameter.
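For illustration only, the following is a minimal NumPy sketch of the PLS estimator in the particular case of an L2 penalty, $P(\mathbf{J}) = \|\mathbf{J}\|_2^2$, which admits a closed-form solution; the variable names K, V, J and lam mirror the symbols above and are purely illustrative, not part of the original formulation.

import numpy as np

def pls_l2(K, V, lam):
    # Penalized least squares with an L2 (Tikhonov) penalty:
    # minimizes ||V - K J||^2 + lam * ||J||^2 over J, solved through
    # the normal equations (K'K + lam I) J = K'V.
    S = K.shape[1]
    return np.linalg.solve(K.T @ K + lam * np.eye(S), K.T @ V)

# Toy example with N = 10 << S = 50 sources and T = 20 time points
rng = np.random.default_rng(0)
K = rng.standard_normal((10, 50))
J_true = np.zeros((50, 20))
J_true[:3] = rng.standard_normal((3, 20))  # only a few active rows
V = K @ J_true + 0.1 * rng.standard_normal((10, 20))
J_hat = pls_l2(K, V, lam=1.0)
print(J_hat.shape)  # (50, 20)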
Another approach is Bayesian theory, where the solution maximizes the posterior probability density function (pdf), given by the Bayes equation: $p(\mathbf{J}, \lambda, \sigma^2 \mid \mathbf{V}) \propto p(\mathbf{V} \mid \mathbf{J}, \sigma^2)\, p(\mathbf{J} \mid \lambda, \sigma^2)$,
which is largely equivalent to the PLS model if we set the likelihood of the data to $p(\mathbf{V} \mid \mathbf{J}, \sigma^2) \propto \exp\!\left(-\frac{1}{2\sigma^2}\|\mathbf{V} - \mathbf{K}\mathbf{J}\|_2^2\right)$, and the prior distribution of the parameters as an exponential function $p(\mathbf{J} \mid \lambda, \sigma^2) = \frac{1}{Z}\exp\!\left(-\frac{\lambda}{2\sigma^2} P(\mathbf{J})\right)$, where Z is a normalizing constant.
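Indeed, with the Gaussian likelihood and exponential prior written above, the negative logarithm of the posterior is, up to additive constants,

$-\log p(\mathbf{J} \mid \mathbf{V}, \lambda, \sigma^2) = \frac{1}{2\sigma^2}\left(\|\mathbf{V} - \mathbf{K}\mathbf{J}\|_2^2 + \lambda\, P(\mathbf{J})\right) + \mathrm{const},$

so the maximum a posteriori estimate of $\mathbf{J}$ coincides with the PLS solution for the same value of $\lambda$.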
The first approach has led to the development of fast and efficient algorithms for a wide range of solvers, but $\lambda$ is determined heuristically using information criteria, which often do not provide optimal values. On the other hand, the Bayesian approach allows inference on the hyperparameters $\lambda$ and $\sigma^2$, but frequently involves numerical Monte Carlo calculations that make it very slow and computationally intensive. However, recent developments of approximate methods, such as Variational and Empirical Bayes, allow for fast computation of complex models.
In this work, we propose to use the squared Mixed-Norm penalty for the parameters, which is defined as the squared L2 norm of the vector formed by the L1 norms of the columns of $\mathbf{J}$ (Gramfort et al., 2012) and can be written as

$\|\mathbf{J}\|^2_{1,2;\mathbf{W}} = \sum_{t=1}^{T} \|\mathbf{W}\mathbf{j}_t\|_1^2 = \sum_{t=1}^{T}\left(\sum_{i=1}^{S} w_i\,|j_{it}|\right)^{2},$

where $\mathbf{W} = \mathrm{diag}(w_1, \ldots, w_S)$ is the diagonal matrix of positive weights and $\mathbf{j}_t$ denotes the t-th column of $\mathbf{J}$. The prior pdf for this penalty represents a Markov Random Field (MRF) in which the states of the variables $j_{it}$ are not separable:

$p(\mathbf{J} \mid \lambda, \sigma^2) \propto \exp\!\left(-\frac{\lambda}{2\sigma^2}\,\|\mathbf{J}\|^2_{1,2;\mathbf{W}}\right)$    (1)
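As a numerical check of this definition, the short NumPy sketch below computes the weighted squared mixed norm; the array names J and w are illustrative, with w holding the diagonal of $\mathbf{W}$.

import numpy as np

def squared_mixed_norm(J, w):
    # ||J||^2_{1,2;W} = sum_t ( sum_i w_i * |J[i, t]| )^2 :
    # weighted L1 norm of each column, then squared L2 norm over columns.
    col_l1 = np.abs(J).T @ w      # length-T vector of weighted column L1 norms
    return float(np.sum(col_l1 ** 2))

rng = np.random.default_rng(1)
J = rng.standard_normal((50, 20))
w = np.ones(50)
print(squared_mixed_norm(J, w))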
Using Empirical Bayes, we first transform this MRF into a Bayesian network, arriving at a hierarchical model (Figure 1), by reformulating the pdf of each element $j_{it}$ as the product of a Gaussian kernel, coming from its squared term $w_i^2 j_{it}^2$, and a Laplace kernel, coming from its cross products with the other elements of the same column. In this way, the information received by $j_{it}$ from the rest of its column is contained in an auxiliary magnitude $a_{it} = \sum_{i' \neq i} w_{i'}\,|j_{i't}|$, leading to a Normal-Laplace joint pdf:

$p(j_{it} \mid a_{it}, \lambda, \sigma^2) \propto \exp\!\left(-\frac{\lambda\, w_i^2\, j_{it}^2}{2\sigma^2}\right) \times \exp\!\left(-\frac{\lambda\, w_i\, a_{it}\, |j_{it}|}{\sigma^2}\right)$    (2)
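Under the column-wise definition of $a_{it}$ used above, the auxiliary magnitudes can be computed for all elements at once. The sketch below is only an illustration of that computation; the names J, w and auxiliary_magnitudes are not from the original text.

import numpy as np

def auxiliary_magnitudes(J, w):
    # a_it = sum over i' != i of w_i' * |J[i', t]| : the weighted L1 norm of
    # column t excluding element i, i.e. the information j_it receives from
    # the other elements of its column.
    col_l1 = np.abs(J).T @ w                         # (T,) full column sums
    return col_l1[None, :] - w[:, None] * np.abs(J)  # (S, T) leave-one-out sums

rng = np.random.default_rng(2)
J = rng.standard_normal((5, 3))
w = np.ones(5)
print(auxiliary_magnitudes(J, w))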
Then, using the scale mixture of Gaussians representation of the Normal-Laplace pdf (Li and Lin, 2010), a hyper-