Chat Language Normalisation using Machine Learning Methods
Daiga Deksne
2019
Abstract
This paper reports on the development of a chat language normalisation module for the Latvian language. The model is trained using a random forest classifier algorithm that learns to rate normalisation candidates for every word. Candidates are generated using pre-trained word embeddings, N-gram lists, a spelling checker module and some other modules. Using several different means of generating normalisation candidates makes it possible to cover a wide spectrum of errors. We plan to use this normalisation module in the development of intelligent virtual assistants. We have performed tests to determine whether the results of the intent detection module improve when the text is pre-processed with the normalisation module.
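The generate-then-rate idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The toy English vocabulary, the abbreviation table and the function names are hypothetical. The `score` function is a hand-written stand-in for the trained random forest classifier. A `difflib` close-match lookup stands in for the spelling checker, and the embedding and N-gram generators are omitted.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical toy resources. The paper works with Latvian and uses
# pre-trained word embeddings, N-gram lists and a spelling checker.
VOCABULARY = ["please", "thanks", "tomorrow", "tonight", "people"]
ABBREVIATIONS = {"pls": "please", "thx": "thanks", "ppl": "people"}


def generate_candidates(token):
    """Collect normalisation candidates for one token from several generators."""
    candidates = {token}  # leaving the token unchanged is always an option
    if token in ABBREVIATIONS:
        candidates.add(ABBREVIATIONS[token])
    # Stand-in for a spelling checker: close matches from the vocabulary.
    candidates.update(get_close_matches(token, VOCABULARY, n=3, cutoff=0.6))
    return candidates


def score(token, candidate):
    """Rate a candidate; a hand-written stand-in for the trained classifier."""
    rating = SequenceMatcher(None, token, candidate).ratio()
    if candidate in VOCABULARY:
        rating += 1.0  # prefer known words
    if ABBREVIATIONS.get(token) == candidate:
        rating += 1.0  # prefer a listed expansion
    return rating


def normalise(text):
    """Replace every token with its highest-rated candidate."""
    result = []
    for token in text.lower().split():
        # Sorting first makes tie-breaking deterministic.
        ranked = sorted(generate_candidates(token))
        result.append(max(ranked, key=lambda c: score(token, c)))
    return " ".join(result)
```

With these toy resources, `normalise("thx ppl")` yields `"thanks people"`. In a setup closer to the one the abstract describes, `score` would be replaced by a classifier trained on features of each token and candidate pair.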
Paper Citation
in Harvard Style
Deksne D. (2019). Chat Language Normalisation using Machine Learning Methods. In Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: NLPinAI, ISBN 978-989-758-350-6, pages 965-972. DOI: 10.5220/0007693509650972
in Bibtex Style
@conference{nlpinai19,
author={Daiga Deksne},
title={Chat Language Normalisation using Machine Learning Methods},
booktitle={Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: NLPinAI},
year={2019},
pages={965-972},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007693509650972},
isbn={978-989-758-350-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: NLPinAI
TI - Chat Language Normalisation using Machine Learning Methods
SN - 978-989-758-350-6
AU - Deksne D.
PY - 2019
SP - 965
EP - 972
DO - 10.5220/0007693509650972