Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D. [Danqi], & Yih, W. (2020). Dense Passage Retrieval for Open-Domain Question Answering. https://doi.org/10.48550/arXiv.2004.04906
Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M.-W., Dai, A. M., Uszkoreit, J., Le, Q., & Petrov, S. (2019). Natural Questions: A Benchmark for Question Answering Research. Transactions of the Association for Computational Linguistics, 7, 453–466. https://doi.org/10.1162/tacl_a_00276
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. https://doi.org/10.48550/arXiv.2005.11401
Ma, X., Gong, Y., He, P., Zhao, H., & Duan, N. (2023). Query Rewriting for Retrieval-Augmented Large Language Models. https://doi.org/10.48550/arXiv.2305.14283
Mallen, A., Asai, A., Zhong, V., Das, R., Khashabi, D., & Hajishirzi, H. (2022). When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. https://doi.org/10.48550/arXiv.2212.10511
Min, S., Krishna, K., Lyu, X., Lewis, M., Yih, W., Koh, P., Iyyer, M., Zettlemoyer, L., & Hajishirzi, H. (2023). FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 12076–12100). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.741
OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., . . . Zoph, B. (2023). GPT-4 Technical Report. https://doi.org/10.48550/arXiv.2303.08774
Paré, G., Tate, M., Johnstone, D., & Kitsiou, S. (2016). Contextualizing the twin concepts of systematicity and transparency in information systems literature reviews. European Journal of Information Systems, 25(6), 493–508. https://doi.org/10.1057/s41303-016-0020-3
Rackauckas, Z., Câmara, A., & Zavrel, J. (2024). Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework. https://doi.org/10.48550/arXiv.2406.14783
Rajpurkar, P., Jia, R., & Liang, P. (2018). Know What You Don't Know: Unanswerable Questions for SQuAD. https://doi.org/10.48550/arXiv.1806.03822
Rau, D., Déjean, H., Chirkova, N., Formal, T., Wang, S., Nikoulina, V., & Clinchant, S. (2024). BERGEN: A Benchmarking Library for Retrieval-Augmented Generation. https://doi.org/10.48550/arXiv.2407.01102
Ravi, S. S., Mielczarek, B., Kannappan, A., Kiela, D., & Qian, R. (2024). Lynx: An Open Source Hallucination Evaluation Model. https://doi.org/10.48550/arXiv.2407.08488
Saad-Falcon, J., Khattab, O., Potts, C., & Zaharia, M. (2023). ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems. https://doi.org/10.48550/arXiv.2311.09476
Wang, C., Liu, X., Yue, Y., Tang, X., Zhang, T., Jiayang, C., Yao, Y., Gao, W., Hu, X. [Xuming], Qi, Z., Wang, Y. [Yidong], Yang, L., Wang, J. [Jindong], Xie, X., Zhang, Z. [Zheng], & Zhang, Y. [Yue]. (2023). Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity. https://doi.org/10.48550/arXiv.2310.07521
Webster, J., & Watson, R. T. (2002). Analyzing the Past to Prepare for the Future: Writing a Literature Review. MIS Quarterly, 26(2), xiii–xxiii. https://www.jstor.org/stable/4132319
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. https://doi.org/10.48550/arXiv.1809.09600
Yu, H., Gan, A., Zhang, K., Tong, S., Liu, Q., & Liu, Z. (2024). Evaluation of Retrieval-Augmented Generation: A Survey. https://doi.org/10.48550/arXiv.2405.07437
Zhang, Z. [Zihan], Fang, M., & Chen, L. (2024). RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering. https://doi.org/10.48550/arXiv.2402.16457
Zhang, Z. [Zihan], Fang, M., Chen, L., Namazi-Rad, M.-R., & Wang, J. [Jun]. (2023). How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances. https://doi.org/10.48550/arXiv.2310.07343