Authors: Niharika Gauraha (1), Tatyana Pavlenko (2) and Swapan K. Parui (1)
Affiliations: (1) Indian Statistical Institute, India; (2) KTH Royal Institute of Technology, Sweden
Keyword(s): Lasso, Weighted Lasso, Variable Selection, Stability Selection, High Dimensional Data.
Related Ontology Subjects/Areas/Topics: Applications; Bioinformatics and Systems Biology; Feature Selection and Extraction; Pattern Recognition; Regression; Software Engineering; Theory and Methods
Abstract:
Lasso and sub-sampling based techniques (e.g. Stability Selection) are nowadays the most commonly used methods for detecting the set of active predictors in high-dimensional linear models. The consistency of Lasso-based variable selection requires the strong irrepresentable condition on the design matrix to be fulfilled, and the repeated sampling procedures over large feature sets make Stability Selection slow in terms of computation time. Alternatively, two-stage procedures (e.g. thresholding or the adaptive Lasso) are used to achieve consistent variable selection under weaker conditions (sparse eigenvalue). Such two-step procedures involve choosing several tuning parameters, which seems easy in principle but is difficult in practice. To address these problems efficiently, we propose a new two-step procedure, called Post Lasso Stability Selection (PLSS). At the first step, Lasso screening is applied with a small regularization parameter to generate a candidate subset of active features. At the second step, Stability Selection using the weighted Lasso is applied to recover the most stable features from the candidate subset. We show that under a mild (generalized irrepresentable) condition, this approach yields a consistent variable selection method that is computationally fast even for a very large number of variables. Promising performance of the proposed PLSS technique is also demonstrated numerically using both simulated and real data examples.
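The two-step idea described in the abstract can be sketched as follows. This is a minimal illustrative example only, not the authors' implementation: it assumes scikit-learn, uses an ordinary Lasso in place of the paper's weighted Lasso at the second step, and the regularization parameter, subsample count, and stability threshold are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Simulated high-dimensional data: n = 100 samples, p = 200 features,
# of which only the first 5 are truly active.
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# Step 1: Lasso screening with a small regularization parameter to
# generate a candidate superset of the active features.
screen = Lasso(alpha=0.05, max_iter=10_000).fit(X, y)
candidates = np.flatnonzero(screen.coef_ != 0)

# Step 2: Stability Selection restricted to the candidate subset --
# refit the Lasso on random subsamples and keep features whose
# selection frequency exceeds a threshold. (The paper uses a weighted
# Lasso here; a plain Lasso is substituted purely for illustration.)
n_subsamples, threshold = 50, 0.6
counts = np.zeros(len(candidates))
for _ in range(n_subsamples):
    idx = rng.choice(n, size=n // 2, replace=False)
    fit = Lasso(alpha=0.05, max_iter=10_000).fit(X[np.ix_(idx, candidates)], y[idx])
    counts += (fit.coef_ != 0)

stable = candidates[counts / n_subsamples >= threshold]
print(sorted(stable.tolist()))
```

Screening first shrinks the problem so that the repeated subsampling in the second step runs over a small candidate set rather than all p features, which is where the computational saving over plain Stability Selection comes from.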