Authors:
Yao Jean Marc Pokou; Philippe Fournier-Viger and Chadia Moghrabi
Affiliation:
Université de Moncton, Canada
Keyword(s):
Authorship Attribution, Stylometry, Part-of-Speech Tags, Variable Length Sequential Patterns.
Related Ontology Subjects/Areas/Topics:
Agents; Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Privacy, Safety and Security; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Identifying the author of a book or document is a research topic with numerous real-life applications. A number of algorithms have been proposed for the automatic authorship attribution of texts. However, it remains an important challenge to find distinct and quantifiable features for accurately identifying, or narrowing the range of, likely authors of a text. In this paper, we propose a novel approach for authorship attribution, which relies on the discovery of variable-length sequential patterns of parts of speech to build signatures representing each author’s writing style. An experimental evaluation was carried out using 10 authors and 30 books from Project Gutenberg, totaling 2,615,856 words. Results show that the proposed approach can accurately classify texts most of the time using a very small number of variable-length patterns. The proposed approach is also shown to perform better using variable-length patterns than with fixed-length patterns (bigrams or trigrams).
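To make the idea of a part-of-speech signature concrete, the following is a minimal Python sketch and not the authors' algorithm. It assumes that texts are already tagged (each sentence is a list of Penn Treebank-style tags), that contiguous tag sequences of 1 to 4 tags stand in for the variable-length sequential patterns, and that signatures are compared with a simple Jaccard overlap. The function names, the parameter `k`, and the toy data are all hypothetical; the abstract does not specify the mining or matching procedure.

```python
from collections import Counter

def pos_patterns(tags, min_len=1, max_len=4):
    """Yield every contiguous POS-tag pattern of length min_len..max_len."""
    for n in range(min_len, max_len + 1):
        for i in range(len(tags) - n + 1):
            yield tuple(tags[i:i + n])

def build_signature(sentences, k=5, min_len=1, max_len=4):
    """Return the k patterns with the highest sentence-level support.

    Support is the fraction of sentences that contain the pattern
    (an assumed measure). Ties are broken by pattern order so the
    result is deterministic.
    """
    support = Counter()
    for tags in sentences:
        # set(): count each pattern at most once per sentence
        support.update(set(pos_patterns(tags, min_len, max_len)))
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return {pattern: count / len(sentences) for pattern, count in ranked[:k]}

def similarity(sig_a, sig_b):
    """Jaccard overlap between the pattern sets of two signatures."""
    a, b = set(sig_a), set(sig_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def attribute(sentences, author_signatures, k=5):
    """Return the author whose signature best overlaps the text's signature."""
    sig = build_signature(sentences, k)
    return max(author_signatures,
               key=lambda name: similarity(sig, author_signatures[name]))

# Toy, hand-made tag sequences (illustrative only, not data from the paper).
author_a = [["DT", "NN", "VBZ"], ["DT", "NN", "VBD"]]
author_b = [["PRP", "VBP", "RB"], ["PRP", "VBP", "JJ"]]
signatures = {"A": build_signature(author_a), "B": build_signature(author_b)}

unknown = [["DT", "NN", "VBZ", "RB"]]
print(attribute(unknown, signatures))
```

Keeping only the top `k` patterns per author mirrors the abstract's claim that a very small number of patterns can suffice; setting `min_len` and `max_len` to the same value reduces the sketch to the fixed-length (bigram or trigram) baseline mentioned there.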