Data Collection and Analysis of Print and Fan Fiction Classification

Channing Donaldson, James Pope

2022

Abstract

Fan fiction has provided opportunities for genre enthusiasts to produce their own story lines from existing print fiction. It has also raised concerns, including intellectual property issues, for traditional print publishers. An interesting and difficult problem is determining whether a given segment of text is fan fiction or print fiction. Classifying unstructured text remains a critical step for many intelligent systems. In this paper, we detail how a significant volume of print and fan fiction was obtained. The data is processed using a proposed pipeline and then analysed using various supervised machine learning classifiers. Our results show that, given segments of 5 to 10 sentences, an accuracy of 80-90% can be achieved using traditional approaches. To our knowledge, this is the first study to explore this type of fiction classification problem.
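
As a rough illustration of the task described in the abstract, the sketch below labels a short text segment as print or fan fiction using a bag-of-words multinomial naive Bayes classifier. Everything in it is an assumption made for illustration: the abstract does not name the classifiers or features the authors used, the class and function names are hypothetical, and the four training segments are made-up toy text, not data from the paper's corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes over word counts (add-one smoothing)."""

    def fit(self, segments, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.n_docs = len(labels)
        # Per-class word frequencies gathered from the training segments.
        self.word_counts = {c: Counter() for c in self.classes}
        for segment, label in zip(segments, labels):
            self.word_counts[label].update(tokenize(segment))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict(self, segment):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(segment) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / self.n_docs)
            denom = self.total_words[c] + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy, made-up training segments (NOT taken from the paper's data set).
train_segments = [
    "The rain fell softly over the harbour as the captain studied his charts.",
    "She folded the letter and placed it in the drawer beside the lamp.",
    "omg so this is my first fic please review, harry and hermione go to the mall",
    "author note: i do not own these characters, please review and enjoy the fic",
]
train_labels = ["print", "print", "fan", "fan"]

model = NaiveBayesTextClassifier().fit(train_segments, train_labels)
fan_prediction = model.predict("please review my new fic about harry")
print_prediction = model.predict("the captain folded his charts beside the lamp")
```

A real experiment would need far more data, segments of 5 to 10 sentences as in the abstract, and a held-out test set to estimate accuracy; this toy model only shows the shape of the classification step.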

Paper Citation


in Harvard Style

Donaldson C. and Pope J. (2022). Data Collection and Analysis of Print and Fan Fiction Classification. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-549-4, pages 511-517. DOI: 10.5220/0010774100003122


in Bibtex Style

@conference{icpram22,
author={Channing Donaldson and James Pope},
title={Data Collection and Analysis of Print and Fan Fiction Classification},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2022},
pages={511-517},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010774100003122},
isbn={978-989-758-549-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Data Collection and Analysis of Print and Fan Fiction Classification
SN - 978-989-758-549-4
AU - Donaldson C.
AU - Pope J.
PY - 2022
SP - 511
EP - 517
DO - 10.5220/0010774100003122