SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection
Binh Dang, Tran-Thai Dang, Le-Minh Nguyen
2022
Abstract
Topic information has proven useful for semantic similarity detection. In this paper, we present a novel and efficient method, called the Sub-word Latent Topic and Sentence Transformer (SubTST), that incorporates topic information into Transformer-based models. The proposed model inherits the advantages of the SBERT (Reimers and Gurevych, 2019) architecture and learns latent topics at the sub-word level, rather than at the document or word level as in previous work. The experimental results show that the proposed method significantly outperforms SBERT and tBERT (Peinelt et al., 2020), two state-of-the-art methods for semantic similarity detection, on most of the benchmark datasets.
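The abstract does not specify how SubTST fuses topic information with the sentence encoder, so the following Python sketch only illustrates the general idea of pairing a sentence-embedding similarity with a similarity over topic distributions computed at the sub-word level. Everything in it is an assumption rather than the paper's method. This includes the toy table `SUBWORD_TOPICS`, the WordPiece-style `##` tokens, the averaging of per-token topic distributions, the helper names and the fixed blending weight `alpha`. In a real system, the token embeddings would come from an SBERT-style encoder and the topic distributions from a topic model trained on sub-word tokens.

```python
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL toy table: per-sub-word topic distributions (3 topics),
# standing in for what a topic model trained on sub-word tokens might output.
SUBWORD_TOPICS: Dict[str, List[float]] = {
    "play": [0.8, 0.1, 0.1],
    "##ing": [0.4, 0.3, 0.3],
    "foot": [0.7, 0.2, 0.1],
    "##ball": [0.9, 0.05, 0.05],
    "stock": [0.1, 0.8, 0.1],
    "market": [0.1, 0.7, 0.2],
}
NUM_TOPICS = 3


def topic_vector(subwords: Sequence[str]) -> List[float]:
    """Average the topic distributions of known sub-words; uniform if none are known."""
    known = [SUBWORD_TOPICS[t] for t in subwords if t in SUBWORD_TOPICS]
    if not known:
        return [1.0 / NUM_TOPICS] * NUM_TOPICS
    return [sum(col) / len(known) for col in zip(*known)]


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """SBERT-style mean pooling over a non-empty list of token embeddings."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def combined_similarity(emb_a, subwords_a, emb_b, subwords_b, alpha: float = 0.5) -> float:
    """HYPOTHETICAL fixed blend of embedding similarity and sub-word topic similarity."""
    semantic = cosine(mean_pool(emb_a), mean_pool(emb_b))
    topical = cosine(topic_vector(subwords_a), topic_vector(subwords_b))
    return alpha * semantic + (1 - alpha) * topical


if __name__ == "__main__":
    emb = [[0.2, 0.4], [0.6, 0.0]]  # toy token embeddings, shared by both sentences
    sports = combined_similarity(emb, ["foot", "##ball"], emb, ["play", "##ing"])
    finance = combined_similarity(emb, ["foot", "##ball"], emb, ["stock", "market"])
    print(f"football vs playing: {sports:.3f}")
    print(f"football vs stock market: {finance:.3f}")
```

With identical toy embeddings, the topic term alone separates the sports-related pair from the finance-related one. The fixed `alpha` is chosen only for illustration. It is not the combination used in SubTST, where topic and sentence representations would be combined inside a trained model.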
Paper Citation
in Harvard Style
Dang B., Dang T. and Nguyen L. (2022). SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-547-0, pages 91-97. DOI: 10.5220/0010775100003116
in Bibtex Style
@conference{icaart22,
author={Binh Dang and Tran-Thai Dang and Le-Minh Nguyen},
title={SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2022},
pages={91-97},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010775100003116},
isbn={978-989-758-547-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection
SN - 978-989-758-547-0
AU - Dang B.
AU - Dang T.
AU - Nguyen L.
PY - 2022
SP - 91
EP - 97
DO - 10.5220/0010775100003116