Stability Feature Selection using Cluster Representative LASSO
Niharika Gauraha
2016
Abstract
Variable selection in high-dimensional regression problems with strongly correlated variables, or with near-linear dependence among a few variables, remains one of the most important issues. We propose to cluster the variables first and then perform stability feature selection using the Lasso on cluster representatives. The first step generates groups of variables according to some criterion; the second step performs group selection while controlling the number of false positives. Our primary emphasis is thus on controlling the type-I error for group variable selection in the high-dimensional regression setting. We illustrate the method using simulated and pseudo-real data, and we show that the proposed method finds an optimal and consistent solution.
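The two steps described in the abstract can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not specify the clustering criterion, the choice of representative, or the tuning parameters, so the correlation threshold, the cluster mean as representative, the single fixed penalty `lam`, the selection threshold `pi_thr`, and all function names are assumptions made for this example.

```python
import numpy as np

def correlation_clusters(X, threshold=0.8):
    """Group variables linked by a pairwise correlation above `threshold` (assumed criterion)."""
    p = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    labels = -np.ones(p, dtype=int)
    k = 0
    for j in range(p):
        if labels[j] >= 0:
            continue
        labels[j] = k
        stack = [j]
        while stack:  # flood fill over the thresholded correlation graph
            i = stack.pop()
            for m in np.flatnonzero((corr[i] > threshold) & (labels < 0)):
                labels[m] = k
                stack.append(m)
        k += 1
    return labels

def cluster_representatives(X, labels):
    """Represent each cluster by the mean of its standardised variables (assumed choice)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.column_stack([Z[:, labels == k].mean(axis=1)
                            for k in range(labels.max() + 1)])

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # holds y - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def stability_selection(X, y, lam, n_subsamples=50, pi_thr=0.6, seed=0):
    """Selection frequency of each column over random half-samples of the data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        counts += lasso_cd(Xs, ys, lam) != 0
    freq = counts / n_subsamples
    return np.flatnonzero(freq >= pi_thr), freq

# Toy data (hypothetical): three clusters of four near-duplicate variables,
# with only the first cluster driving the response.
rng = np.random.default_rng(1)
n = 100
factors = rng.standard_normal((n, 3))
X = np.repeat(factors, 4, axis=1) + 0.1 * rng.standard_normal((n, 12))
y = 2.0 * factors[:, 0] + 0.5 * rng.standard_normal(n)

labels = correlation_clusters(X)              # step 1: group the variables
reps = cluster_representatives(X, labels)     # one column per cluster
selected, freq = stability_selection(reps, y, lam=0.3)  # step 2: select groups
print("cluster labels:", labels)
print("selection frequencies:", freq)
print("selected clusters:", selected)
```

Selecting a representative column amounts to selecting its whole cluster, which is how group selection arises in this sketch. A fuller treatment would typically scan a grid of penalties rather than a single `lam` and choose `pi_thr` with the desired bound on false positives in mind.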
Paper Citation
in Harvard Style
Gauraha N. (2016). Stability Feature Selection using Cluster Representative LASSO. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 381-386. DOI: 10.5220/0005827003810386
in Bibtex Style
@conference{icpram16,
author={Niharika Gauraha},
title={Stability Feature Selection using Cluster Representative LASSO},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={381-386},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005827003810386},
isbn={978-989-758-173-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Stability Feature Selection using Cluster Representative LASSO
SN - 978-989-758-173-1
AU - Gauraha N.
PY - 2016
SP - 381
EP - 386
DO - 10.5220/0005827003810386