Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents
Juris Rats, Inguna Pede
2022
Abstract
The volume of documents that organisations receive daily is constantly increasing, which forces them to hire more people to index and route those documents properly. This article proposes a machine learning based model for automating the indexing of incoming documents. The overall automation process is described, and two methods for supporting trainset annotation are analysed and compared. Experts are supported during annotation by grouping the stream of documents into clusters of similar documents, which is expected to improve both topic selection and document annotation. The document stream is grouped in two ways: first, by clustering the documents and selecting the next document from the same cluster; second, by retrieving the next document with an Elasticsearch More Like This (MLT) query. The experimental results show that the MLT query outperforms clustering.
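To illustrate the second grouping method, the following is a minimal sketch of how a request for the "next similar document" might be expressed as an Elasticsearch More Like This query. It only builds the search body and does not contact a cluster. The index name, field names, helper name `build_mlt_query`, and the parameter values (`min_term_freq`, `max_query_terms`) are assumptions made for illustration; the abstract does not state the configuration the authors used.

```python
from typing import Any, Dict, List, Optional


def build_mlt_query(
    index: str,
    doc_id: str,
    fields: List[str],
    exclude_ids: Optional[List[str]] = None,
    size: int = 1,
) -> Dict[str, Any]:
    """Build a search body asking for documents similar to `doc_id`.

    Hypothetical helper: `exclude_ids` is meant to hold documents the
    expert has already annotated, so they are not offered again.
    """
    bool_query: Dict[str, Any] = {
        "must": [
            {
                "more_like_this": {
                    "fields": fields,
                    # Use an already indexed document as the example to match.
                    "like": [{"_index": index, "_id": doc_id}],
                    # Assumed tuning values, not taken from the paper.
                    "min_term_freq": 1,
                    "max_query_terms": 25,
                }
            }
        ]
    }
    if exclude_ids:
        # Filter out documents that were already annotated.
        bool_query["must_not"] = [{"ids": {"values": exclude_ids}}]
    return {"size": size, "query": {"bool": bool_query}}


if __name__ == "__main__":
    import json

    body = build_mlt_query(
        index="incoming_docs",      # assumed index name
        doc_id="42",                # the document just annotated
        fields=["title", "body"],   # assumed text fields
        exclude_ids=["42", "7"],
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary could be passed as the body of a search request; the top hit would then be offered to the expert as the next document to annotate.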
Paper Citation
in Harvard Style
Rats J. and Pede I. (2022). Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents. In Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-583-8, pages 211-218. DOI: 10.5220/0011113000003269
in Bibtex Style
@conference{data22,
author={Juris Rats and Inguna Pede},
title={Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents},
booktitle={Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2022},
pages={211-218},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011113000003269},
isbn={978-989-758-583-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents
SN - 978-989-758-583-8
AU - Rats J.
AU - Pede I.
PY - 2022
SP - 211
EP - 218
DO - 10.5220/0011113000003269