EmBoost: Embedding Boosting to Learn Multilevel Abstract Text Representation for Document Retrieval
Tolgahan Cakaloglu, Xiaowei Xu, Roshith Raghavan
2022
Abstract
Learning hierarchical representations has been vital in natural language processing and information retrieval. Recent advances have underscored the importance of learning the context of words. In this paper we propose EmBoost, i.e., Embedding Boosting of word or document vector representations that have been learned from multiple embedding models. The advantage of this approach is that the resulting higher-order word embedding represents documents at multiple levels of abstraction. The performance gain from this approach is demonstrated by comparing it with various existing text embedding strategies on retrieval and semantic similarity tasks using the Stanford Question Answering Dataset (SQuAD) and Question Answering by Search And Reading (QUASAR). The multilevel abstract word embedding is consistently superior to existing solo strategies, including GloVe, FastText, ELMo, and BERT-based models. Our study shows that further gains can be made when a deep residual neural model is specifically trained for document retrieval.
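The general idea of combining vectors from several embedding models and using the result for retrieval can be illustrated with a minimal sketch. The abstract does not specify how EmBoost combines the vectors, so the weighted concatenation below is an assumption made for illustration, not the authors' method. The names `combine_embeddings`, `rank_documents`, `toy_vowel_embedder`, and `toy_length_embedder` are hypothetical, and the toy embedders are stand-ins for real models such as GloVe or BERT.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Embedder = Callable[[str], Vector]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_embeddings(
    text: str,
    embedders: Sequence[Embedder],
    weights: Optional[Sequence[float]] = None,
) -> Vector:
    """Concatenate the normalized outputs of several embedders.

    ASSUMPTION: weighted concatenation is only one possible way to merge
    embeddings; the source does not state which combination EmBoost uses.
    """
    if weights is None:
        weights = [1.0] * len(embedders)
    combined: Vector = []
    for embed, weight in zip(embedders, weights):
        combined.extend(weight * x for x in l2_normalize(embed(text)))
    return l2_normalize(combined)


def rank_documents(
    query: str,
    documents: Sequence[str],
    embedders: Sequence[Embedder],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return document indices sorted by cosine similarity to the query."""
    q = combine_embeddings(query, embedders, weights)
    scores = []
    for doc in documents:
        d = combine_embeddings(doc, embedders, weights)
        # Both vectors are unit length, so the dot product is the cosine.
        scores.append(sum(a * b for a, b in zip(q, d)))
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy embedders standing in for real embedding models.
def toy_vowel_embedder(text: str) -> Vector:
    lowered = text.lower()
    return [float(lowered.count(c)) for c in "aeiou"]


def toy_length_embedder(text: str) -> Vector:
    words = text.lower().split()
    return [float(len(words)), float(sum(len(w) for w in words))]


if __name__ == "__main__":
    docs = ["deep residual model", "embedding boosting", "xyz"]
    order = rank_documents(
        "embedding boosting", docs, [toy_vowel_embedder, toy_length_embedder]
    )
    print([docs[i] for i in order])
```

Normalizing each model's output before concatenation keeps a model with large raw magnitudes from dominating the similarity score; the optional weights are one simple way to adjust each model's contribution under this assumed scheme.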
Paper Citation
in Harvard Style
Cakaloglu T., Xu X. and Raghavan R. (2022). EmBoost: Embedding Boosting to Learn Multilevel Abstract Text Representation for Document Retrieval. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-547-0, pages 352-360. DOI: 10.5220/0010822900003116
in Bibtex Style
@conference{icaart22,
author={Tolgahan Cakaloglu and Xiaowei Xu and Roshith Raghavan},
title={EmBoost: Embedding Boosting to Learn Multilevel Abstract Text Representation for Document Retrieval},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2022},
pages={352--360},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010822900003116},
isbn={978-989-758-547-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - EmBoost: Embedding Boosting to Learn Multilevel Abstract Text Representation for Document Retrieval
SN - 978-989-758-547-0
AU - Cakaloglu T.
AU - Xu X.
AU - Raghavan R.
PY - 2022
SP - 352
EP - 360
DO - 10.5220/0010822900003116