2.7 Review of Relevant Work and Our Contribution
This section reviews relevant work in order to show how our proposal differs from and extends previous studies.
It is known that the Lasso cannot handle situations where predictors are highly correlated or groups of predictors are linearly dependent. To deal with such situations, various algorithms have been proposed that either cluster the variables first and then pursue variable selection, or perform variable clustering and model selection simultaneously.
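As a small illustration of this behaviour, the following sketch (a hypothetical setup using scikit-learn's Lasso on synthetic data; all constants are arbitrary) builds a group of three nearly identical predictors and shows that the fitted model typically assigns a nonzero coefficient to only one member of the group:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 100

# One latent signal replicated as three nearly identical predictors,
# plus two independent noise predictors.
z = rng.standard_normal(n)
X = np.column_stack(
    [z + 0.01 * rng.standard_normal(n) for _ in range(3)]
    + [rng.standard_normal(n) for _ in range(2)]
)
y = z + 0.1 * rng.standard_normal(n)

coef = Lasso(alpha=0.1).fit(X, y).coef_
print(np.round(coef, 2))
# Typically only one of the first three (correlated) columns gets a
# nonzero coefficient, although all three carry the same signal.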
The methods that perform clustering and model fitting simultaneously include the Elastic Net (Zou and Hastie, 2005), the Fused Lasso (Tibshirani et al., 2005), OSCAR (octagonal shrinkage and clustering algorithm for regression) (Bondell and Reich, 2008) and Mnet (Huang et al., 2010). The Elastic Net uses a combination of the $L_1$ and $L_2$ penalties, OSCAR uses a combination of the $L_1$ norm and the $L_\infty$ norm, and Mnet uses a combination of the MCP (minimax concave penalty) and the $L_2$ penalty. As these methods are based on combinations of penalties, they do not use any specific information on the correlation pattern among the predictors; hence they cannot handle the linear dependency problem. Moreover, the Fused Lasso is applicable only when the variables have a natural ordering and is not suitable for automated clustering of unordered features.
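For reference, the three penalties can be written out explicitly. The following is a standard rendering with tuning parameters $\lambda_1, \lambda_2 > 0$ (and $\gamma > 1$ for the MCP); the exact parameterizations vary slightly across the cited papers:

$$P_{\text{ENet}}(\beta) = \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2,$$
$$P_{\text{OSCAR}}(\beta) = \lambda_1 \sum_{j} |\beta_j| + \lambda_2 \sum_{j<k} \max\{|\beta_j|, |\beta_k|\},$$
$$P_{\text{Mnet}}(\beta) = \sum_{j} \rho_{\text{MCP}}(|\beta_j|; \lambda_1, \gamma) + \lambda_2 \|\beta\|_2^2,$$

where $\rho_{\text{MCP}}$ denotes the MCP function used in (Huang et al., 2010). The pairwise $\max$ terms in OSCAR are the $L_\infty$ component that encourages exactly equal coefficient magnitudes within a correlated group.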
We list a few methods that perform clustering and model fitting at different stages: principal component regression (Kendall, 1957), Tree Harvesting (Hastie et al., 2001), the Cluster Group Lasso (Bühlmann et al., 2012), the Cluster Representative Lasso (CRL) (Bühlmann et al., 2012) and the sparse Laplacian shrinkage (SLS) (Huang et al., 2011). All these methods have been proven to be consistent variable selection techniques, but they fail to control the false positive rate.
Since the Lasso tends to select only one variable from a group of strongly correlated variables (even if many or all of these variables are important), stability feature selection using the Lasso does not choose any variable from such a group: on each subsample the Lasso picks an essentially arbitrary member of the group, so the correlated variables split the vote and no single member attains a high selection frequency. To overcome this problem, we propose to cluster the variables first and then perform stability feature selection using the Lasso on the cluster representatives. Our work can thus be viewed as an extension of the CRL (Bühlmann et al., 2012) and an application of stability feature selection (Meinshausen and Bühlmann, 2010). We compare our algorithm with the CRL in terms of variable selection in Section 4. Our simulation studies show that our method outperforms the CRL.
3 STABILITY FEATURE SELECTION USING CRL
We consider a high-dimensional setting where groups of variables are strongly correlated or there exists near linear dependency among a few variables. It is known that the Lasso tends to select only one variable from a group of highly correlated or linearly dependent variables, even though many or all of them belong to the active set $S_0$. Various techniques based on clustering in combination with sparse estimation have been proposed in the past for variable selection or, in more mathematical terms, to infer the true active set $S_0$, but they mostly fail to control the selection of false positives. In this article, our aim is to identify the true active set and to control false positives simultaneously. We use the concept of clustering the correlated or linearly dependent variables and then selecting or dropping whole groups instead of single variables, similar to the CRL method proposed in (Bühlmann et al., 2012). Stability feature selection has been proven to identify the most stable features and to provide control of the family-wise error rate; we refer to (Meinshausen and Bühlmann, 2010) for the theoretical proofs. In order to reduce the selection of false positive groups, we propose to combine the CRL with sub-sampling; we call the resulting procedure SCRL, stability feature selection using CRL.
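For completeness, the error control of (Meinshausen and Bühlmann, 2010) takes the following form (their Theorem 1, stated here under its exchangeability and better-than-random-guessing assumptions). If $V$ denotes the number of falsely selected variables, $p$ the total number of variables, $q$ the average number of variables selected by the base procedure on a subsample, and $\pi_{\text{thr}} \in (1/2, 1)$ the selection-frequency threshold, then

$$\mathbb{E}[V] \le \frac{1}{2\pi_{\text{thr}} - 1} \cdot \frac{q^2}{p}.$$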
The proposed SCRL method can be seen as an application of stability feature selection where the base selection procedure is the Lasso and the Lasso is applied to the reduced design matrix of cluster representatives. The advantages of using the reduced design matrix of cluster representatives are as follows (a sketch of the full procedure is given after this list):

(a) Plain stability feature selection, where the base selection procedure is the Lasso applied to the whole design matrix X (with no pre-processing step of clustering the variables), is a special case of SCRL in which the group sizes are all one. Plain stability feature selection is not suitable when the variables are highly correlated: the underlying Lasso tends to select a single, essentially arbitrary variable per cluster, so the vote gets split within each cluster and no variable from a correlated group is selected. Using the reduced design matrix instead ensures that the most stable groups are selected.

(b) The Lasso is repeatedly applied to the reduced design matrix of representatives rather than to the full design matrix, so the SCRL method is computationally fast as well.
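A minimal sketch of the SCRL pipeline is given below. It is an illustration only, not a reference implementation: the cluster assignments come from correlation-based hierarchical clustering, each cluster is represented by the mean of its standardized members as in the CRL, and the stability score of a cluster is the fraction of subsamples on which the Lasso assigns its representative a nonzero coefficient. The function name scrl and all constants (number of clusters, subsample size n/2, threshold 0.6) are illustrative choices, not prescriptions from the text.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.linear_model import Lasso

def scrl(X, y, n_clusters=5, n_subsamples=100, alpha=0.1, pi_thr=0.6, seed=0):
    """Illustrative SCRL: cluster the variables, run the Lasso on the
    cluster representatives over random subsamples, keep stable clusters."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)  # standardize the columns

    # Step 1: cluster the variables, using 1 - |correlation| as dissimilarity.
    d = np.clip(1.0 - np.abs(np.corrcoef(Xs.T)), 0.0, None)
    links = linkage(d[np.triu_indices(p, 1)], method="average")
    labels = fcluster(links, n_clusters, criterion="maxclust")

    # Step 2: reduced design matrix, one representative (cluster mean) per cluster.
    reps = np.column_stack([Xs[:, labels == k].mean(axis=1)
                            for k in range(1, n_clusters + 1)])

    # Step 3: stability selection, i.e. selection frequencies of the
    # representatives over subsamples of size n/2.
    freq = np.zeros(n_clusters)
    for _ in range(n_subsamples):
        idx = rng.choice(n, n // 2, replace=False)
        freq += (Lasso(alpha=alpha).fit(reps[idx], y[idx]).coef_ != 0)
    freq /= n_subsamples

    # Step 4: select or drop whole clusters, never single variables.
    stable = np.nonzero(freq >= pi_thr)[0] + 1
    return {k: np.nonzero(labels == k)[0] for k in stable}

Selecting or dropping whole clusters in the last step mirrors the group-selection behaviour described in point (a), and because the Lasso in step 3 only ever sees the reduced matrix of representatives, the per-subsample cost reflects point (b).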