Efficient Online Feature Selection based on ℓ1-Regularized Logistic Regression

Kengo Ooi, Takashi Ninomiya

Abstract

Selecting features for classifiers is an important concern in fields such as information retrieval, speech recognition, bioinformatics, and natural language processing, because good features improve classifier prediction performance. Online grafting is one solution for finding useful features in an extremely large feature set: given a sequence of features, it selects or discards each feature one at a time. Online grafting is attractive because it selects features incrementally and is defined as an optimization problem based on ℓ1-regularized logistic regression. However, its learning is inefficient because the parameters are re-optimized frequently. We propose two more efficient methods that approximate the original online grafting by testing multiple features simultaneously. Experiments show that our methods significantly improve the efficiency of online grafting. Although the methods are approximations, the deterioration in prediction performance was negligibly small.
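The select-or-discard step can be illustrated with the gradient test usually associated with grafting under an ℓ1 penalty: a feature whose weight is currently zero is worth admitting only if the magnitude of the loss gradient with respect to that weight exceeds the regularization strength. The sketch below is a minimal illustration under that assumption, not the authors' implementation. The function name `gradient_test`, the use of the mean negative log-likelihood, the {0, 1} label encoding, and the value of `lam` are all hypothetical choices made for the example. Testing several candidate columns against the same residuals is shown only to convey the idea of checking multiple features at once; the abstract does not specify how the two proposed methods do this.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def gradient_test(rows, weights, candidates, labels, lam):
    """Return one boolean per candidate feature column.

    rows       -- per-example values of the currently selected features
    weights    -- current weights of the selected features
    candidates -- candidate feature columns, each of length len(labels)
    labels     -- class labels in {0, 1}
    lam        -- l1 regularization strength (assumed value, chosen by the user)

    A candidate passes when |dL/dw_j| > lam, where L is the mean
    negative log-likelihood evaluated with the candidate weight at zero.
    """
    n = len(labels)
    # Residuals p_i - y_i under the current model; computed once and
    # reused for every candidate, since the candidates have zero weight.
    residuals = [
        sigmoid(sum(w * x for w, x in zip(weights, row))) - y
        for row, y in zip(rows, labels)
    ]
    passed = []
    for column in candidates:
        grad = sum(r * x for r, x in zip(residuals, column)) / n
        passed.append(abs(grad) > lam)
    return passed


# Usage: an empty model, one informative and one uninformative candidate.
labels = [1, 1, 0, 0]
rows = [[], [], [], []]           # no features selected yet
candidates = [[1, 1, 0, 0],       # matches the labels
              [1, 0, 1, 0]]       # unrelated to the labels
print(gradient_test(rows, [], candidates, labels, lam=0.1))
```

In a full online grafting loop, a feature that passes the test would be added to the model and the weights re-optimized. That re-optimization is the frequent step the abstract identifies as the source of inefficiency.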

Paper Citation


in Harvard Style

Ooi K. and Ninomiya T. (2013). Efficient Online Feature Selection based on ℓ1-Regularized Logistic Regression. In Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-8565-39-6, pages 277-282. DOI: 10.5220/0004255902770282


in Bibtex Style

@conference{icaart13,
author={Kengo Ooi and Takashi Ninomiya},
title={Efficient Online Feature Selection based on ℓ1-Regularized Logistic Regression},
booktitle={Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2013},
pages={277-282},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004255902770282},
isbn={978-989-8565-39-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Efficient Online Feature Selection based on ℓ1-Regularized Logistic Regression
SN - 978-989-8565-39-6
AU - Ooi K.
AU - Ninomiya T.
PY - 2013
SP - 277
EP - 282
DO - 10.5220/0004255902770282