Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents

Juris Rats, Inguna Pede

2022

Abstract

The volume of documents that organisations receive daily is constantly increasing, forcing them to hire more people to index and route those documents properly. This article proposes a machine-learning-based model for automating the indexing of incoming documents. The overall automation process is described, and two methods for supporting trainset annotation are analysed and compared. Experts are supported during annotation by grouping the stream of documents into sets of similar documents, which is expected to improve both topic selection and document annotation. The grouping is performed in two ways: first, by clustering the documents and selecting the next document from the same cluster; second, by retrieving the next document with an Elasticsearch More Like This (MLT) query. Experimental results show that the MLT query outperforms clustering.
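The MLT-based selection of the next document can be illustrated with a minimal sketch. It is not the authors' implementation: the index name, the `content` field, the helper name `build_mlt_query`, and the parameter values are assumptions made for illustration. The sketch only builds the search body, combining the Elasticsearch `more_like_this` query with an `ids` exclusion so that documents already annotated are not proposed again.

```python
from typing import Any


def build_mlt_query(
    index: str,
    seed_doc_id: str,
    annotated_ids: list[str],
    fields: tuple[str, ...] = ("content",),  # hypothetical text field name
    size: int = 1,
) -> dict[str, Any]:
    """Build a search body that looks for unannotated documents similar to the seed.

    The seed is the document the expert has just annotated; it is referenced
    by index and id, so Elasticsearch extracts the representative terms itself.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {
                        "more_like_this": {
                            "fields": list(fields),
                            "like": [{"_index": index, "_id": seed_doc_id}],
                            # Illustrative values, not taken from the paper.
                            "min_term_freq": 1,
                            "max_query_terms": 25,
                        }
                    }
                ],
                # Skip documents that already carry an annotation.
                "must_not": [{"ids": {"values": list(annotated_ids)}}],
            }
        },
    }


body = build_mlt_query("incoming", "doc-1", ["doc-1", "doc-7"])
```

The resulting dictionary could then be sent to the search endpoint of an Elasticsearch cluster. The top hit would be offered to the expert as the next document to annotate.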



Paper Citation


in Harvard Style

Rats J. and Pede I. (2022). Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents. In Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-583-8, pages 211-218. DOI: 10.5220/0011113000003269


in Bibtex Style

@conference{data22,
author={Juris Rats and Inguna Pede},
title={Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents},
booktitle={Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2022},
pages={211-218},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011113000003269},
isbn={978-989-758-583-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents
SN - 978-989-758-583-8
AU - Rats J.
AU - Pede I.
PY - 2022
SP - 211
EP - 218
DO - 10.5220/0011113000003269