Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets
Simon Knollmeyer, Oğuz Caymazer, Leonid Koval, Muhammad Akmal, Saara Asif, Selvine Mathias, Daniel Großmann
2024
Abstract
Despite rapid advances in the field of Large Language Models (LLMs), traditional benchmarks have proven inadequate for assessing the performance of Retrieval Augmented Generation (RAG) systems. Therefore, this paper presents a comprehensive systematic literature review of evaluation dimensions, metrics, and datasets for RAG systems. The review identifies key evaluation dimensions such as context relevance, faithfulness, answer relevance, correctness, and citation quality. For each evaluation dimension, several metrics and evaluators are proposed for assessing it. The paper synthesizes the findings from 12 relevant papers and presents a concept matrix that categorizes each evaluation approach. The results provide a foundation for the development of robust evaluation frameworks and suitable datasets that are essential for the effective implementation and deployment of RAG systems in real-world applications.
Paper Citation
in Harvard Style
Knollmeyer S., Caymazer O., Koval L., Akmal M., Asif S., Mathias S. and Großmann D. (2024). Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS; ISBN 978-989-758-716-0, SciTePress, pages 137-148. DOI: 10.5220/0013065700003838
in Bibtex Style
@conference{kmis24,
author={Simon Knollmeyer and Oğuz Caymazer and Leonid Koval and Muhammad Akmal and Saara Asif and Selvine Mathias and Daniel Großmann},
title={Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets},
booktitle={Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS},
year={2024},
pages={137--148},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013065700003838},
isbn={978-989-758-716-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS
TI - Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets
SN - 978-989-758-716-0
AU - Knollmeyer S.
AU - Caymazer O.
AU - Koval L.
AU - Akmal M.
AU - Asif S.
AU - Mathias S.
AU - Großmann D.
PY - 2024
SP - 137
EP - 148
DO - 10.5220/0013065700003838
PB - SciTePress