Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence
Yi Sheng Heng, James Pope
2024
Abstract
The emergence of fan fiction websites, where fans write their own stories about a topic or genre, has resulted in serious content rating issues. The websites are accessible to general audiences but often include explicit content. Authors can rate their own fan fiction stories, but this is not required, and many stories are unrated. This motivates automatically predicting the content rating using recent natural language processing techniques. The length of fan fiction texts, ambiguity in rating schemes, self-annotated (weak) labels, and writing style all make automatic content rating prediction very difficult. In this paper, we propose several embedding techniques and classification models to address these problems. Using a dataset from a popular fan fiction website, we show that binary classification outperforms multiclass classification and can achieve nearly 70% accuracy with a transformer-based model. When computation time is considered, we show that a traditional word embedding technique combined with Logistic Regression produces the best results, with 66% accuracy and 0.1 seconds of computation (approximately 15,000 times faster than DistilBERT). We further show that many of the labels are incorrect and require subsequent preprocessing to correct them. We propose an Active Learning approach; while its results are not conclusive, they suggest directions for further work.
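The abstract names the ingredients (a word-based representation, Logistic Regression on a binary target, and an Active Learning step for noisy labels) but not their details, so the following is only a minimal, dependency-free sketch under stated assumptions. Bag-of-words counts stand in for the unspecified "traditional word embedding technique"; the rating names and the binary split in `BINARY_RATING` are hypothetical; the toy texts are invented; and uncertainty sampling (`most_uncertain`, a hypothetical helper) is one common Active Learning query strategy, not necessarily the one used in the paper.

```python
import math
from collections import Counter

# HYPOTHETICAL rating labels and binary split: the abstract does not name the
# site's rating levels or say where the binary cut falls.
BINARY_RATING = {"General": 0, "Teen": 0, "Mature": 1, "Explicit": 1}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def build_vocab(texts):
    """Map each distinct token to a feature index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text, vocab):
    """Sparse bag-of-words counts {index: count}; out-of-vocabulary tokens are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: c for tok, c in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """P(label = 1) for one sparse feature dict under a logistic model."""
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))


def train_logreg(X, y, n_features, lr=0.5, epochs=200):
    """Unregularized logistic regression fitted by batch gradient descent on mean log-loss."""
    weights = [0.0] * n_features
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            err = predict_proba(x, weights, bias) - target
            for i, v in x.items():
                grad_w[i] += err * v
            grad_b += err
        for i in range(n_features):
            weights[i] -= lr * grad_w[i] / n
        bias -= lr * grad_b / n
    return weights, bias


def most_uncertain(pool, weights, bias, k):
    """Uncertainty sampling: indices of the k pool items with P(1) closest to 0.5."""
    scored = [(abs(predict_proba(x, weights, bias) - 0.5), idx) for idx, x in enumerate(pool)]
    return [idx for _, idx in sorted(scored)[:k]]


# Toy, invented data for illustration only.
texts = [
    "a gentle story about friendship",
    "a sweet tale of friendship",
    "explicit adult scene",
    "graphic explicit adult content",
]
ratings = ["General", "Teen", "Mature", "Explicit"]

vocab = build_vocab(texts)
X = [featurize(t, vocab) for t in texts]
y = [BINARY_RATING[r] for r in ratings]
weights, bias = train_logreg(X, y, len(vocab))

# Active Learning query step: pick the least certain unlabeled item for manual review.
pool = [featurize(t, vocab) for t in ["explicit content", "unseen words only", "a friendship story"]]
print(most_uncertain(pool, weights, bias, k=1))
```

Plain batch gradient descent keeps the sketch self-contained; a library implementation such as scikit-learn's `LogisticRegression` would be the usual practical choice. Uncertainty sampling sends the examples the model is least sure about to a human annotator, which is one way to target the incorrect self-annotated labels the abstract describes.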
Paper Citation
in Harvard Style
Heng Y. and Pope J. (2024). Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-684-2, SciTePress, pages 224-231. DOI: 10.5220/0012313400003654
in BibTeX Style
@conference{icpram24,
author={Yi Sheng Heng and James Pope},
title={Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2024},
pages={224-231},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012313400003654},
isbn={978-989-758-684-2},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI  - Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence
SN  - 978-989-758-684-2
AU  - Heng Y.
AU  - Pope J.
PY  - 2024
SP  - 224
EP  - 231
DO  - 10.5220/0012313400003654
PB  - SciTePress
ER  -