
Aftar, S., Gagliardelli, L., El Ganadi, A., Ruozzi, F., and Bergamaschi, S. (2024b). A novel methodology for topic identification in hadith. In Proceedings of the 20th Conference on Information and Research Science Connecting to Digital and Library Science (formerly the Italian Research Conference on Digital Libraries).
Almazrouei, E., Alobeidli, H., Alshamsi, A., Cappelli, A., Cojocaru, R., Debbah, M., Goffinet, É., et al. (2023). The Falcon series of open language models. arXiv preprint arXiv:2311.16867.
Alnefaie, S., Atwell, E., and Alsalka, M. A. (2023). Is GPT-4 a good Islamic expert for answering Quran questions? In Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023).
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Chelli, M., Descamps, J., Lavoué, V., Trojani, C., Azar, M., Deckert, M., Raynier, J.-L., Clowez, G., Boileau, P., and Ruetsch-Chelli, C. (2024). Hallucination rates and reference accuracy of ChatGPT and Bard for systematic reviews: Comparative analysis. Journal of Medical Internet Research, 26:e53164.
El Ganadi, A., Vigliermo, R. A., Sala, L., Vanzini, M., Ruozzi, F., and Bergamaschi, S. (2023). Bridging Islamic knowledge and AI: Inquiring ChatGPT on possible categorizations for an Islamic digital library (full paper). In CEUR Workshop Proceedings, volume 3536, pages 21–33.
Gallifant, J., Fiske, A., Strekalova, Y. A. L., Osorio-Valencia, J. S., Parke, R., Mwavu, R., Martinez, N., Gichoya, J. W., Ghassemi, M., Demner-Fushman, D., et al. (2024). Peer review of GPT-4 technical report and systems card. PLOS Digital Health, 3(1):e0000417.
Gao, L., Dai, Z., Pasupat, P., Chen, A., Chaganty, A. T., Fan, Y., Zhao, V. Y., Lao, N., Lee, H., Juan, D.-C., et al. (2022). RARR: Researching and revising what language models say, using language models. arXiv preprint arXiv:2210.08726.
Ilyas, M. (2018). A multilingual datasets repository of the hadith content. International Journal of Advanced Computer Science and Applications, 9(2):165–172.
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., and Fung, P. (2023a). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38.
Ji, Z., Liu, Z., Lee, N., Yu, T., Wilie, B., Zeng, M., and Fung, P. (2023b). RHO: Reducing hallucination in open-domain dialogues with knowledge grounding. In Rogers, A., Boyd-Graber, J., and Okazaki, N., editors, Findings of the Association for Computational Linguistics: ACL 2023, pages 4504–4522, Toronto, Canada. Association for Computational Linguistics.
Kamali, M. H. (2014). A Textbook of Hadith Studies: Authenticity, Compilation, Classification and Criticism of Hadith. Kube Publishing Ltd.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35.
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B. P., Hermann, K., Welleck, S., Yazdanbakhsh, A., and Clark, P. (2023). Self-refine: Iterative refinement with self-feedback.
Mghari, M., Bouras, O., and Hibaoui, A. E. (2022). Sanadset 650k: Data on hadith narrators. Data in Brief, 44:108540.
Mündler, N., He, J., Jenko, S., and Vechev, M. (2023). Self-contradictory hallucinations of large language models: Evaluation, detection and mitigation. arXiv preprint arXiv:2305.15852.
Qiu, Y., Embar, V., Cohen, S. B., and Han, B. (2023). Think while you write: Hypothesis verification promotes faithful knowledge-to-text generation. arXiv preprint arXiv:2311.09467.
Reddy, G. P., Kumar, Y. V. P., and Prakash, K. P. (2024). Hallucinations in large language models (LLMs). In 2024 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream). IEEE.
Rizqullah, M. R., Purwarianti, A., and Aji, A. F. (2023). QASiNa: Religious domain question answering using Sirah Nabawiyah. In 2023 10th International Conference on Advanced Informatics: Concept, Theory and Application (ICAICTA). IEEE.
Siino, M. and Tinnirello, I. (2024). GPT hallucination detection through prompt engineering. In Working Notes of CLEF.
Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., et al. (2024). Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet. Transformer Circuits Thread.
Varshney, N., Yao, W., Zhang, H., Chen, J., and Yu, D. (2023). A stitch in time saves nine: Detecting and mitigating hallucinations of LLMs by validating low-confidence generation. arXiv preprint.
Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N. A., Khashabi, D., and Hajishirzi, H. (2023). Self-instruct: Aligning language models with self-generated instructions.
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence