Documents as Intelligent Agents: An Approach to Optimize Document Representations in Semantic Search
Oliver Strauß, Holger Kett
2023
Abstract
Finding good representations for documents in the context of semantic search is a relevant problem with applications in domains like medicine, research or data search. In this paper, we propose to represent each document in a search index by a number of different contextual embeddings. We define and evaluate eight different strategies to combine embeddings of document title, document passages and relevant user queries by means of linear combinations, averaging, and clustering. In addition, we apply an agent-based approach to search whereby each data item is modeled as an agent that tries to optimize its metadata and presentation over time by incorporating information received via the users’ interactions with the search system. We validate the document representation strategies and the agent-based approach in the context of a medical information retrieval dataset and find that a linear combination of the title embedding, mean passage embedding and the mean over the clustered embeddings of relevant queries offers the best trade-off between search performance and index size. We further find that incorporating embeddings of relevant user queries can significantly improve the performance of representation strategies based on semantic embeddings. The agent-based system performs slightly better than the other representation strategies but comes with a larger index size.
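The combination strategy highlighted in the abstract (title embedding, mean passage embedding, and the mean over clustered query embeddings, merged by a linear combination) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the equal default weights, the number of clusters `k`, the final L2 normalization, and the small deterministic k-means routine standing in for whatever clustering method the authors used.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centroids(vectors: Sequence[Sequence[float]], k: int,
                     iterations: int = 10) -> List[Vector]:
    """Tiny k-means (assumed stand-in for the paper's clustering step).

    Initialization is deterministic: the first k vectors seed the centroids.
    """
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters: List[List[Sequence[float]]] = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(v, centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_vector(members) if members else centroids[i]
                     for i, members in enumerate(clusters)]
    return centroids


def combine_embeddings(title: Sequence[float],
                       passages: Sequence[Sequence[float]],
                       queries: Sequence[Sequence[float]],
                       weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
                       k: int = 2) -> Vector:
    """Linear combination of title, mean passage, and clustered-query mean.

    The weights and k are hypothetical defaults, not values from the paper.
    """
    passage_mean = mean_vector(passages)
    # Cluster the query embeddings, then average the cluster centroids so
    # that a frequently repeated query does not dominate the representation.
    query_mean = mean_vector(kmeans_centroids(queries, min(k, len(queries))))
    w_title, w_passage, w_query = weights
    combined = [w_title * t + w_passage * p + w_query * q
                for t, p, q in zip(title, passage_mean, query_mean)]
    norm = sqrt(sum(x * x for x in combined))
    return [x / norm for x in combined] if norm > 0 else combined
```

Calling `combine_embeddings(title_vec, passage_vecs, query_vecs)` yields a single vector per document, which is what keeps the index small compared with storing every passage and query embedding separately. Averaging cluster centroids rather than raw query embeddings is one plausible reading of "the mean over the clustered embeddings"; the paper itself should be consulted for the exact definition.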
Paper Citation
in Harvard Style
Strauß O. and Kett H. (2023). Documents as Intelligent Agents: An Approach to Optimize Document Representations in Semantic Search. In Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-672-9, SciTePress, pages 164-175. DOI: 10.5220/0012239200003584
in Bibtex Style
@conference{webist23,
author={Oliver Strauß and Holger Kett},
title={Documents as Intelligent Agents: An Approach to Optimize Document Representations in Semantic Search},
booktitle={Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2023},
pages={164-175},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012239200003584},
isbn={978-989-758-672-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Documents as Intelligent Agents: An Approach to Optimize Document Representations in Semantic Search
SN - 978-989-758-672-9
AU - Strauß O.
AU - Kett H.
PY - 2023
SP - 164
EP - 175
DO - 10.5220/0012239200003584
PB - SciTePress