Performance Evaluation of Context-Dependent Lexical Information Normalization Using Word Embedding Techniques

Amit Shukla, Rajendra Gupta

2023

Abstract

Word embedding is a type of word representation that allows machine learning algorithms to recognise words with similar meanings. The majority of lexical normalization approaches work at the character level. While character-level models use far less memory than word-level models, they tend to predict slightly erroneous character sequences, resulting in lower accuracy. Word-level models are rarely employed for lexical normalization because misspelt words have no corresponding word embedding vectors unless the embedding model is trained on the training corpus itself, which is often much smaller than the corpora used for embedding training. The usefulness of cutting-edge embedding models for lexical normalization of small text data has yet to be determined. Furthermore, practically all lexical normalization research is focused on social media applications. This paper presents a performance evaluation of context-dependent lexical information normalization using word embedding techniques and finds that the word-level model is better at predicting a word that needs to be normalized. The results show an accuracy of around 75 percent, which is about 2 percent better than earlier proposed normalization methods.


Paper Citation


in Harvard Style

Shukla A. and Gupta R. (2023). Performance Evaluation of Context-Dependent Lexical Information Normalization Using Word Embedding Techniques. In Proceedings of the 1st International Conference on Artificial Intelligence for Internet of Things: Accelerating Innovation in Industry and Consumer Electronics - Volume 1: AI4IoT; ISBN 978-989-758-661-3, SciTePress, pages 615-619. DOI: 10.5220/0012604000003739


in Bibtex Style

@conference{ai4iot23,
author={Amit Shukla and Rajendra Gupta},
title={Performance Evaluation of Context-Dependent Lexical Information Normalization Using Word Embedding Techniques},
booktitle={Proceedings of the 1st International Conference on Artificial Intelligence for Internet of Things: Accelerating Innovation in Industry and Consumer Electronics - Volume 1: AI4IoT},
year={2023},
pages={615-619},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012604000003739},
isbn={978-989-758-661-3},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Artificial Intelligence for Internet of Things: Accelerating Innovation in Industry and Consumer Electronics - Volume 1: AI4IoT
TI - Performance Evaluation of Context-Dependent Lexical Information Normalization Using Word Embedding Techniques
SN - 978-989-758-661-3
AU - Shukla A.
AU - Gupta R.
PY - 2023
SP - 615
EP - 619
DO - 10.5220/0012604000003739
PB - SciTePress