The experimental results are listed in Tables 2, 3, 4, and 5. They indicate that, of the two proposed methods, the multiplicative division method achieved the smaller cumulative number of optimized weights and the shorter training time; that is, it was the more effective at reducing computational cost. Compared with original online grafting, both proposed methods dramatically reduced training time as well as the cumulative number of optimized weights. The difference in training time between LR+L1 and the multiplicative division method was rather small, but the multiplicative division method was slightly faster, and its cumulative number of optimized weights was smaller than that of LR+L1. The tables also show that the differences in precision among the proposed methods, original online grafting, and LR+L1 were negligibly small on a9a, w8a, and IJCNN1. These results indicate that our methods are good approximations of LR+L1 in terms of precision.
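To make the efficiency comparison concrete, the sketch below shows our reading of the cumulative-number-of-optimized-weights metric: each time the optimizer is invoked, the number of weights it optimizes is added to a running total. The function name and this interpretation are ours, not taken verbatim from the paper.

```python
def cumulative_optimized_weights(active_set_sizes):
    """Sum the number of weights optimized at each re-optimization (sketch).

    `active_set_sizes[k]` is assumed to be the number of weights passed
    to the optimizer at the k-th re-optimization; the sum is a rough
    proxy for the total optimization cost of a training run.
    """
    return sum(active_set_sizes)

# e.g. three re-optimizations over 10, 25, and 40 active weights
assert cumulative_optimized_weights([10, 25, 40]) == 75
```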
6 CONCLUSIONS
We proposed two methods that improve the efficiency of online grafting. Online grafting is an incremental, gradient-based method for feature selection: it incrementally identifies features that should be assigned exactly zero weights in ℓ1-regularized logistic regression and eliminates them one at a time. Online grafting is attractive as a feature selection method, but its learning was inefficient because the parameters were optimized very frequently. We approximated original online grafting by testing multiple features per optimization, i.e., several features are tested in succession before the parameters are re-optimized.
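As a rough illustration of the per-feature test (our own sketch, not the authors' code), a candidate feature is worth keeping only if the magnitude of the logistic-loss gradient with respect to its currently zero weight exceeds the ℓ1 penalty; otherwise its optimal weight is exactly zero and the feature can be eliminated. All names (X, y, w, active, lam) are illustrative.

```python
import numpy as np

def grafting_test(X, y, w, active, j, lam):
    """Return True if candidate feature j passes the grafting test.

    Assumes labels y in {-1, +1} and logistic loss.  Feature j is worth
    adding only if the loss gradient w.r.t. its (currently zero) weight
    exceeds the l1 penalty lam; otherwise its optimal weight is exactly
    zero and the feature can be discarded.
    """
    margins = y * (X[:, active] @ w[active])                   # current margins y_i * x_i . w
    grad_j = -np.sum(y * X[:, j] / (1.0 + np.exp(margins)))    # dL/dw_j evaluated at w_j = 0
    return abs(grad_j) > lam
```

Original online grafting re-optimizes the parameters after every feature that passes this test; our methods instead postpone the re-optimization until several features have been tested.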
We evaluated our two methods, which re-optimize the parameters only after multiple features, determined multiplicatively or by a constant, have been tested. Although they trade some prediction accuracy for efficiency, the experimental results showed that they worked efficiently with a negligibly small loss of prediction accuracy, and in some cases their prediction accuracy was even better than that of original online grafting and ℓ1-regularized logistic regression.
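The two re-optimization schedules can be sketched as follows. This is a minimal sketch under our own assumptions: we take "multiplicative" to mean that the group of features tested between re-optimizations grows geometrically (here by a factor `ratio`), and "constant" to mean fixed-size groups; the exact grouping used in the paper may differ, and the function and parameter names are hypothetical.

```python
def optimization_points(n_features, mode="multiplicative", step=100, ratio=2):
    """Feature counts after which the parameters are re-optimized (sketch).

    'multiplicative': the group of features tested between
    re-optimizations grows geometrically by `ratio` (an assumption);
    'constant': re-optimize after every `step` features.
    """
    points, nxt, size = [], 0, 1
    while nxt < n_features:
        if mode == "multiplicative":
            nxt += size
            size *= ratio
        else:  # constant-size groups
            nxt += step
        points.append(min(nxt, n_features))
    return points

print(optimization_points(1000))                             # [1, 3, 7, 15, ..., 1000]
print(optimization_points(1000, mode="constant", step=10))   # [10, 20, ..., 1000]
```

Under these assumptions the multiplicative schedule triggers only logarithmically many re-optimizations in the number of tested features, while the constant schedule triggers linearly many, which illustrates why the former tends to be cheaper.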
ACKNOWLEDGEMENTS
We would like to thank the InTrigger team for operating and providing the computing resources, which consist of more than 1,900 CPU cores at 14 sites. This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (C), Grant Number 22500121.