validation on five datasets of increasing size.
The results show that our methodology allows filtering
methods to achieve an average precision very close to
that obtained using all features, and higher than that of
SVM. In particular, our methodology reaches and even
surpasses 90% accuracy. This remarkable accuracy is
paired with a dramatic reduction in classification
time compared with PCA.
Panel of Attribute Selection Methods to Rank Features Drastically Improves Accuracy in Filtering Web-pages Suitable for Education