Authors:
Igor Santos; Carlos Laorden and Pablo G. Bringas
Affiliation:
University of Deusto, Spain
Keyword(s):
Security, Computer viruses, Data-mining, Malware detection, Machine learning.
Related Ontology Subjects/Areas/Topics:
Data and Application Security and Privacy; Data Protection; Information and Systems Security; Intrusion Detection & Prevention
Abstract:
Malware is any type of computer software harmful to computers and networks. The amount of malware is
increasing every year and poses a serious global security threat. Signature-based detection is the most
broadly used commercial antivirus method; however, it fails to detect new and previously unseen malware.
Supervised machine-learning models have been proposed to solve this issue, but supervised learning is far
from perfect because it requires a significant amount of malicious code and benign software to be
identified and labelled beforehand. In this paper, we propose a new method that adopts a collective
learning approach to detect unknown malware. Collective classification is a type of semi-supervised
learning that presents an interesting method for optimising the classification of partially-labelled data.
We propose, for the first time, using collective classification algorithms to build different
machine-learning classifiers from a set of labelled (as malware and legitimate software) and unlabelled
instances. Our empirical validation shows that this approach requires less labelling effort than
supervised learning while maintaining high accuracy rates.