Design Principles and a Software Reference Architecture for Big Data Question Answering Systems

Leonardo Moraes, Pedro Jardim, Cristina Dutra Aguiar

2023

Abstract

Companies continuously produce many documents containing valuable information for users. However, querying these documents is challenging, mainly because of their heterogeneity and volume. In this work, we investigate the challenge of developing a Big Data Question Answering system, i.e., a system that provides a unified, reliable, and accurate way to query documents through naturally asked questions. We define a set of design principles and introduce BigQA, the first software reference architecture to meet these design principles. The architecture consists of high-level layers and is independent of programming language, technology, and querying and answering algorithms. BigQA was validated through a pharmaceutical case study managing over 18k documents from Wikipedia articles and FAQs about Coronavirus. The results demonstrated the applicability of BigQA to real-world applications. In addition, we conducted 27 experiments on three open-domain datasets and compared the recall results of the well-established BM25, TF-IDF, and Dense Passage Retriever algorithms to find the most appropriate generic querying algorithm. According to the experiments, BM25 provided the highest overall performance.
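
To make the retrieval comparison concrete, the following is a minimal sketch of BM25 ranking and a recall-at-k measure. It is not the authors' implementation: the function names, the whitespace tokenization, the three toy documents, the parameter values k1=1.5 and b=0.75, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 and b are illustrative values, not settings taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant (assumed here); it is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes longer-than-average documents.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Hypothetical toy corpus, tokenized by whitespace (no stemming).
docs = [
    "coronavirus spreads through respiratory droplets".split(),
    "aspirin is a common pain reliever".split(),
    "vaccines reduce the risk of severe coronavirus disease".split(),
]
query = "how does coronavirus spread".split()

scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Here `recall_at_k(ranking, {0, 2}, k)` reports how many of the two documents assumed relevant are retrieved within the top k. Without stemming, "spread" does not match "spreads", so only the term "coronavirus" contributes to the scores in this toy example.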

Paper Citation


in Harvard Style

Moraes L., Jardim P. and Dutra Aguiar C. (2023). Design Principles and a Software Reference Architecture for Big Data Question Answering Systems. In Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-648-4, SciTePress, pages 57-67. DOI: 10.5220/0011842700003467


in Bibtex Style

@conference{iceis23,
author={Leonardo Moraes and Pedro Jardim and Cristina Dutra Aguiar},
title={Design Principles and a Software Reference Architecture for Big Data Question Answering Systems},
booktitle={Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2023},
pages={57-67},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011842700003467},
isbn={978-989-758-648-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Design Principles and a Software Reference Architecture for Big Data Question Answering Systems
SN - 978-989-758-648-4
AU - Moraes L.
AU - Jardim P.
AU - Dutra Aguiar C.
PY - 2023
SP - 57
EP - 67
DO - 10.5220/0011842700003467
PB - SciTePress