SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection

Binh Dang, Tran-Thai Dang, Le-Minh Nguyen

2022

Abstract

Topic information has proven useful for semantic similarity detection. In this paper, we present a novel and efficient method for incorporating topic information into Transformer-based models, called the Sub-word Latent Topic and Sentence Transformer (SubTST). The proposed model inherits the advantages of the SBERT architecture (Reimers and Gurevych, 2019) and learns latent topics at the sub-word level rather than at the document or word level, as in previous work. Experimental results show that the proposed method significantly outperforms SBERT and tBERT (Peinelt et al., 2020), two state-of-the-art methods for semantic textual similarity detection, on most of the benchmark datasets.
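As a rough illustration of attaching topic information at the sub-word level, the sketch below pairs each sub-word token's encoder vector with a topic distribution, pools the result into a sentence vector, and compares two sentences by cosine similarity. The abstract does not say how SubTST fuses the two signals, so the concatenation, the mean pooling, the vector sizes, and the helper names (`combine_and_pool`, `toy_inputs`) are all assumptions made for this example, and the inputs are random stand-ins rather than outputs of a real encoder or topic model.

```python
import math
import random


def combine_and_pool(token_vecs, topic_vecs):
    """Concatenate each sub-word's encoder vector with its topic vector,
    then mean-pool over sub-words (an assumed fusion, not the paper's)."""
    combined = [tok + top for tok, top in zip(token_vecs, topic_vecs)]
    dim = len(combined[0])
    return [sum(row[i] for row in combined) / len(combined) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_inputs(rng, n_tokens, enc_dim=8, n_topics=3):
    """Random stand-ins: hypothetical encoder vectors and per-sub-word
    topic distributions (each distribution sums to 1)."""
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(enc_dim)] for _ in range(n_tokens)]
    topics = []
    for _ in range(n_tokens):
        raw = [rng.random() + 1e-9 for _ in range(n_topics)]
        total = sum(raw)
        topics.append([x / total for x in raw])
    return tokens, topics


rng = random.Random(0)
tok_a, top_a = toy_inputs(rng, n_tokens=5)  # sentence A: 5 sub-word tokens
tok_b, top_b = toy_inputs(rng, n_tokens=7)  # sentence B: 7 sub-word tokens

emb_a = combine_and_pool(tok_a, top_a)
emb_b = combine_and_pool(tok_b, top_b)
score = cosine_similarity(emb_a, emb_b)
print(len(emb_a), score)
```

In a real setting the token vectors would come from a Transformer encoder and the topic vectors from a topic model trained over the same sub-word vocabulary, so that the two sequences align token by token.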


Paper Citation


in Harvard Style

Dang B., Dang T. and Nguyen L. (2022). SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-547-0, pages 91-97. DOI: 10.5220/0010775100003116


in Bibtex Style

@conference{icaart22,
author={Binh Dang and Tran-Thai Dang and Le-Minh Nguyen},
title={SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2022},
pages={91-97},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010775100003116},
isbn={978-989-758-547-0},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection
SN - 978-989-758-547-0
AU - Dang B.
AU - Dang T.
AU - Nguyen L.
PY - 2022
SP - 91
EP - 97
DO - 10.5220/0010775100003116