Documents as Intelligent Agents: An Approach to Optimize Document Representations in Semantic Search

Oliver Strauß, Holger Kett

2023

Abstract

Finding good representations for documents in the context of semantic search is a relevant problem with applications in domains like medicine, research, or data search. In this paper, we propose to represent each document in a search index by a number of different contextual embeddings. We define and evaluate eight different strategies to combine embeddings of the document title, document passages, and relevant user queries by means of linear combinations, averaging, and clustering. In addition, we apply an agent-based approach to search in which each data item is modeled as an agent that tries to optimize its metadata and presentation over time by incorporating information received via the users’ interactions with the search system. We validate the document representation strategies and the agent-based approach on a medical information retrieval dataset and find that a linear combination of the title embedding, the mean passage embedding, and the mean over the clustered embeddings of relevant queries offers the best trade-off between search performance and index size. We further find that incorporating embeddings of relevant user queries can significantly improve the performance of representation strategies based on semantic embeddings. The agent-based system performs slightly better than the other representation strategies but comes with a larger index size.
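
The best-performing combination named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal default weights, the final length normalization, and the assumption that query embeddings arrive with precomputed cluster labels are all hypothetical choices made for illustration.

```python
from math import sqrt


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def cluster_centroids(vectors, labels):
    """Return one centroid per cluster label (labels are assumed to be given)."""
    clusters = {}
    for vector, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vector)
    return [mean_vector(members) for members in clusters.values()]


def combine(title, passages, queries, query_labels, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linear combination of the title embedding, the mean passage embedding,
    and the mean over the centroids of the clustered query embeddings.

    The weights are hypothetical; the abstract does not state the values used.
    """
    w_title, w_passage, w_query = weights
    passage_mean = mean_vector(passages)
    # Averaging centroids (not raw queries) keeps a large cluster of
    # near-duplicate queries from dominating the query component.
    query_mean = mean_vector(cluster_centroids(queries, query_labels))
    combined = [
        w_title * t + w_passage * p + w_query * q
        for t, p, q in zip(title, passage_mean, query_mean)
    ]
    return normalize(combined)


# Toy two-dimensional embeddings, for illustration only.
doc_vector = combine(
    title=[1.0, 0.0],
    passages=[[0.0, 1.0], [0.0, 1.0]],
    queries=[[1.0, 1.0], [1.0, 1.0], [3.0, 3.0]],
    query_labels=[0, 0, 1],
    weights=(1.0, 1.0, 1.0),
)
```

Storing a single combined vector per document keeps the index at one entry per document, which is consistent with the trade-off between search performance and index size that the abstract reports for this strategy.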



Paper Citation


in Harvard Style

Strauß O. and Kett H. (2023). Documents as Intelligent Agents: An Approach to Optimize Document Representations in Semantic Search. In Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-672-9, SciTePress, pages 164-175. DOI: 10.5220/0012239200003584


in Bibtex Style

@conference{webist23,
author={Oliver Strauß and Holger Kett},
title={Documents as Intelligent Agents: An Approach to Optimize Document Representations in Semantic Search},
booktitle={Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2023},
pages={164-175},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012239200003584},
isbn={978-989-758-672-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Documents as Intelligent Agents: An Approach to Optimize Document Representations in Semantic Search
SN - 978-989-758-672-9
AU - Strauß O.
AU - Kett H.
PY - 2023
SP - 164
EP - 175
DO - 10.5220/0012239200003584
PB - SciTePress