Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets

Simon Knollmeyer, Oğuz Caymazer, Leonid Koval, Muhammad Akmal, Saara Asif, Selvine Mathias, Daniel Großmann

2024

Abstract

Despite rapid advances in Large Language Models (LLMs), traditional benchmarks have proven inadequate for assessing the performance of Retrieval Augmented Generation (RAG) systems. This paper therefore presents a comprehensive systematic literature review of evaluation dimensions, metrics, and datasets for RAG systems. The review identifies key evaluation dimensions such as context relevance, faithfulness, answer relevance, correctness, and citation quality. For each dimension, several metrics and evaluators are proposed for assessing it. The paper synthesizes the findings from 12 relevant papers and presents a concept matrix that categorizes each evaluation approach. The results provide a foundation for developing robust evaluation frameworks and suitable datasets, which are essential for the effective implementation and deployment of RAG systems in real-world applications.


Paper Citation


in Harvard Style

Knollmeyer S., Caymazer O., Koval L., Akmal M., Asif S., Mathias S. and Großmann D. (2024). Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS; ISBN 978-989-758-716-0, SciTePress, pages 137-148. DOI: 10.5220/0013065700003838


in Bibtex Style

@conference{kmis24,
author={Simon Knollmeyer and Oğuz Caymazer and Leonid Koval and Muhammad Akmal and Saara Asif and Selvine Mathias and Daniel Großmann},
title={Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets},
booktitle={Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS},
year={2024},
pages={137-148},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013065700003838},
isbn={978-989-758-716-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS
TI - Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets
SN - 978-989-758-716-0
AU - Knollmeyer S.
AU - Caymazer O.
AU - Koval L.
AU - Akmal M.
AU - Asif S.
AU - Mathias S.
AU - Großmann D.
PY - 2024
SP - 137
EP - 148
DO - 10.5220/0013065700003838
PB - SciTePress