Chat Language Normalisation using Machine Learning Methods

Daiga Deksne

2019

Abstract

This paper reports on the development of a chat language normalisation module for the Latvian language. The model is trained using a random forest classifier algorithm that learns to rate normalisation candidates for every word. Candidates are generated using pre-trained word embeddings, N-gram lists, a spelling checker module and some other modules. Generating the normalisation candidates from several different sources makes it possible to cover a wide spectrum of errors. We plan to use this normalisation module in the development of intelligent virtual assistants. We have tested whether the results of the intent detection module improve when the text is pre-processed with the normalisation module.


Paper Citation


in Harvard Style

Deksne D. (2019). Chat Language Normalisation using Machine Learning Methods. In Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: NLPinAI, ISBN 978-989-758-350-6, pages 965-972. DOI: 10.5220/0007693509650972


in Bibtex Style

@conference{nlpinai19,
author={Daiga Deksne},
title={Chat Language Normalisation using Machine Learning Methods},
booktitle={Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: NLPinAI},
year={2019},
pages={965-972},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007693509650972},
isbn={978-989-758-350-6},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: NLPinAI
TI - Chat Language Normalisation using Machine Learning Methods
SN - 978-989-758-350-6
AU - Deksne D.
PY - 2019
SP - 965
EP - 972
DO - 10.5220/0007693509650972