Authors:
Xichen Zhang; Arash Habibi Lashkari and Ali A. Ghorbani
Affiliation:
University of New Brunswick (UNB), Canada
Keyword(s):
Advertisement Detection, Machine Learning, Characterization, Lexical Features.
Related Ontology Subjects/Areas/Topics:
Data and Application Security and Privacy; Data Protection; Information and Systems Security; Privacy; Security and Privacy in Web Services; Security in Information Systems; Security Metrics and Measurement
Abstract:
With the rapid growth of online advertising, malicious advertisements have become a major channel for
distributing scams, click fraud, and malware. Most current approaches block online advertisements using
filter lists, which do not scale and require manual maintenance. This paper presents a lightweight online
advertisement classification system that uses lexical features of URLs as an alternative. To imitate
real-world cases, three scenarios are generated from three different URL sources. A set of URL lexical
features is then selected from previous research to train and test the proposed model. Results show that,
using lexical features, advertisement detection accuracy is about 97% in certain scenarios.
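
To make the idea of URL lexical features concrete, the following is a minimal sketch of how such features might be computed from a URL string alone, without fetching the page. The abstract does not list the features the paper actually uses, so the function name `lexical_features`, the specific features, and the example URL are all illustrative assumptions, not the authors' feature set.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    Hypothetical feature set: chosen for illustration only, not taken
    from the paper.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is None when the URL has no network location
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Count non-empty "key=value" chunks in the query string.
        "num_query_params": len([p for p in parts.query.split("&") if p]),
        # Count non-empty path segments between slashes.
        "path_depth": len([s for s in parts.path.split("/") if s]),
    }


if __name__ == "__main__":
    # Hypothetical URL used only to exercise the function.
    print(lexical_features("http://ads.example.com/banner/300x250?id=42&src=news"))
```

A dictionary like this could be converted to a numeric vector and passed to any standard classifier. Which classifier and which features give the reported accuracy is a matter for the paper itself.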