
did not help improve the classification performance of the LR model. These include the existence of noisy labels in the initial pool of labelled fan fiction data, the selection of the uncertainty measure for the AL strategy, and the quality of the new annotations.
7 CONCLUSION AND FUTURE WORK
Beyond the content rating classification results obtained in this research, several improvements could be made to this project in future work. For instance, future work could include the summary of each fan fiction in addition to the main story, which may help the model generalise better.
In addition, future studies should also consider adopting different AL strategies, which could involve experimenting with uncertainty measures other than entropy, such as least confidence, margin sampling, ratio sampling, or other uncertainty measurement techniques. Moreover, although employing a professional annotator might be costly, it would help ensure consistent annotations across the fan fiction dataset and keep the quality of the labels under control.
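To make the alternatives to entropy more concrete, the sketch below scores a pool of unlabelled samples with each uncertainty measure and selects the most uncertain ones for annotation. It is a minimal, pure-Python illustration that assumes the classifier already outputs a class-probability vector per sample; the function names and example probabilities are hypothetical, and the formulations of least confidence, margin and ratio follow one common convention (variants such as a normalised least confidence also exist).

```python
import math
from typing import Callable, Dict, List, Sequence

# Each measure maps a class-probability vector to a score where
# a HIGHER value means the model is MORE uncertain about the sample.


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log); zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def least_confidence(probs: Sequence[float]) -> float:
    """1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    """1 minus the gap between the two most likely classes (needs >= 2 classes)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def ratio(probs: Sequence[float]) -> float:
    """Second-highest probability divided by the highest (needs >= 2 classes)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top2 / top1  # top1 >= 1/n > 0 for a valid probability vector


MEASURES: Dict[str, Callable[[Sequence[float]], float]] = {
    "entropy": entropy,
    "least_confidence": least_confidence,
    "margin": margin,
    "ratio": ratio,
}


def select_queries(pool_probs: List[Sequence[float]], k: int,
                   measure: str = "entropy") -> List[int]:
    """Return the indices of the k most uncertain samples in the unlabelled pool."""
    score = MEASURES[measure]
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical predicted probabilities for three unlabelled stories
    # over three illustrative rating classes.
    pool = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.5, 0.5, 0.0]]
    for name in MEASURES:
        print(name, select_queries(pool, k=1, measure=name))
```

On this toy pool, entropy and least confidence both select the second sample, whereas margin and ratio select the third, whose two most likely classes are tied. The choice of measure can therefore change which stories are sent for annotation.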
Last but not least, ML and AI models with global explanations, which provide a high-level overview of how these models reach their decisions, could also be explored. One example is the SHAP technique, in which the impact of each feature on the model output is computed using Shapley values (Lundberg and Lee, 2017). Other XAI tools that support both local and global explanations, such as Dalex or Shapash, could also be considered and might bring additional value to the content rating classification task.
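As a rough illustration of the Shapley-value idea behind SHAP, the sketch below computes exact Shapley values by enumerating every feature coalition, then averages their absolute values across samples to obtain a simple global importance ranking. It assumes a hypothetical three-feature logistic regression explained on the log-odds scale, with "absent" features replaced by a single baseline vector; the weights, features and function names are illustrative only. This is not the authors' pipeline, and the SHAP library itself averages over a background dataset and relies on model-specific or sampling-based approximations rather than full enumeration.

```python
import itertools
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature of x for the model output f(x).

    A feature outside a coalition takes its baseline value. All coalitions
    are enumerated, so the cost grows as 2^n: suitable for small n only.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for coalition in itertools.combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Weighted marginal contribution of feature i to this coalition.
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def global_importance(f: Model, samples: Sequence[Sequence[float]],
                      baseline: Sequence[float]) -> List[float]:
    """Mean absolute Shapley value per feature over a set of samples."""
    per_sample = [shapley_values(f, x, baseline) for x in samples]
    return [sum(abs(p[i]) for p in per_sample) / len(per_sample)
            for i in range(len(baseline))]


# Hypothetical logistic-regression weights over three illustrative features.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = 0.1


def log_odds(z: Sequence[float]) -> float:
    """Log-odds output of the hypothetical linear classifier."""
    return BIAS + sum(w * v for w, v in zip(WEIGHTS, z))


if __name__ == "__main__":
    baseline = [0.0, 0.0, 0.0]
    print(shapley_values(log_odds, [1.0, 2.0, 4.0], baseline))
    print(global_importance(log_odds, [[1.0, 2.0, 4.0], [0.0, -1.0, 2.0]], baseline))
```

For a linear model, each value reduces to w_i(x_i - b_i), so the first call returns approximately [2.0, -2.0, 2.0], and the values sum to f(x) - f(baseline), which is the efficiency property of Shapley values. The global ranking here is approximately [1.0, 1.5, 1.5].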
REFERENCES
Al-Tamimi, A.-K., Bani-Isaa, E., and Al-Alami, A. (2021). Active learning for Arabic text classification. In 2021 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), pages 123–126.
Archive of Our Own (2023). A fan-created, fan-run, nonprofit, noncommercial archive for transformative fanworks, like fanfiction, fanart, fan videos, and podfic. https://archiveofourown.org/.
Barfian, E., Iswanto, B. H., and Isa, S. M. (2017). Twitter pornography multilingual content identification based on machine learning. Procedia Computer Science, 116:129–136. Discovery and innovation of computer science technology in artificial intelligence era: The 2nd International Conference on Computer Science and Computational Intelligence (ICCSCI 2017).
Donaldson, C. and Pope, J. (2022). Data collection and analysis of print and fan fiction classification. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, pages 511–517. INSTICC, SciTePress.
Glazkova, A. (2020). Text age rating methods for digital libraries. In Proceedings of the International Scientific Conference “Digitalization of Education: History, Trends and Prospects” (DETP 2020), pages 364–368. Atlantis Press.
Knorr, C. (2017). Inside the racy, nerdy world of fanfiction. https://edition.cnn.com/2017/07/05/health/kids-teens-fanfiction-partner/index.html.
Lundberg, S. and Lee, S.-I. (2017). A unified approach to interpreting model predictions.
Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., and Gao, J. (2021). Deep learning based text classification: A comprehensive review.
Mohamed, E. and Ha, L. A. (2020). A first dataset for film age appropriateness investigation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1311–1317, Marseille, France. European Language Resources Association.
Murat, I. (2023). A smart movie suitability rating system based on subtitle. Gazi University Journal of Science Part C: Design and Technology, 11(1):252–262.
Ofcom (2022). Children and parents: media use and attitudes report 2022. Annual report, Ofcom.
Qiao, Y. and Pope, J. (2022). Content rating classification for fan fiction.
Ridzuan, F. and Wan Zainon, W. M. N. (2019). A review on data cleansing methods for big data. Procedia Computer Science, 161:731–738. The Fifth Information Systems International Conference, 23-24 July 2019, Surabaya, Indonesia.
Shafaei, M., Samghabadi, N. S., Kar, S., and Solorio, T. (2019). Rating for parents: Predicting children suitability rating for movies based on language of the movies.
Tae, K. H., Roh, Y., Oh, Y. H., Kim, H., and Whang, S. E. (2019). Data cleaning for accurate, fair, and robust models: A big data - AI integration approach.
Ul Haque, M. A., Rahman, A., and Hashem, M. M. A. (2021). Sentiment analysis in low-resource Bangla text using active learning. In 2021 5th International Conference on Electrical Information and Communication Technology (EICT), pages 1–6.
Vazquez-Calvo, B., Zhang, L.-T., Pascual, M., and Cassany, D. (2019). Fan translation of games, anime, and fanfiction. Language, Learning and Technology, 23(1):49–71.
Zang, T. (2021). Active learning approach for spam filtering. In 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture (AIAM), pages 366–370.
Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence