V. (2019). RoBERTa: A robustly optimized BERT pretraining
approach. arXiv preprint arXiv:1907.11692.
Lydia, E. L., Satyanarayan, S., Kumar, K. V., and Ramya,
D. (2020). Indexing documents with reliable indexing
techniques using Apache Lucene in Hadoop. International
Journal of Intelligent Enterprise, 7(1-3):203–
Misra, S., Kumar, V., Kumar, U., Fantazy, K., and
Akhter, M. (2012). Agile software development practices:
evolution, principles, and criticisms. International
Journal of Quality & Reliability Management,
Möller, T., Reina, A., Jayakumar, R., and Pietsch, M.
(2020). COVID-QA: A question answering dataset
for COVID-19. In Proceedings of the 1st Workshop on
NLP for COVID-19 at Association for Computational
Linguistics, page 1.
Müller, M., Vorraber, W., and Slany, W. (2019). Open
principles in new business models for information systems.
Journal of Open Innovation: Technology, Market, and
Complexity, 5(6):1–13.
Nielsen, R. D., Masanz, J., Ogren, P., Ward, W., Martin,
J. H., Savova, G., and Palmer, M. (2010). An architecture
for complex clinical question answering. In
Proceedings of the 1st ACM International Health
Informatics Symposium, pages 395–399.
Novo-Loures, M., Pavon, R., Laza, R., Ruano-Ordas, D.,
and Mendez, J. R. (2020). Using natural language
preprocessing architecture (NLPA) for big data text
sources. Hindawi Scientific Programming, 2020:1–13,
article ID 2390941.
Petroni, F., Rocktäschel, T., Riedel, S., Lewis, P., Bakhtin,
A., Wu, Y., and Miller, A. (2019). Language models
as knowledge bases? In Proceedings of the Conference
on Empirical Methods in Natural Language Processing
and the 9th International Joint Conference on
Natural Language Processing, pages 2463–2473.
Poniszewska-Marańda, A. and Czechowska, E. (2021).
Kubernetes cluster for automating software production
environment. Sensors Journal, 21(5): article number
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P.
(2016). SQuAD: 100,000+ Questions for Machine
Comprehension of Text. arXiv e-prints, page
Robertson, S. E. and Jones, K. S. (1976). Relevance weighting
of search terms. Journal of the American Society
for Information Science, 27(3):129–146.
Romualdo, A., Real, L., and Caseli, H. (2021). Measuring
Brazilian Portuguese product titles similarity using
embeddings. In Proceedings of XIII Brazilian Symposium
on Information Technology and Human Language,
pages 121–132. SBC.
Saha, A., Aralikatte, R., Khapra, M. M., and
Sankaranarayanan, K. (2018). DuoRC: Towards complex
language understanding with paraphrased reading
comprehension. CoRR, abs/1804.07927.
Sammut, C. and Webb, G. I., editors (2010). TF-IDF, pages
986–987. Springer Science & Business Media.
Schaffer, N., Weking, J., and Stähler, O. (2020).
Requirements and design principles for business model tools.
In Proceedings of Americas Conference on Information
Systems Proceedings, pages 1–10.
Shvachko, K., Kuang, H., Radia, S., and Chansler, R.
(2010). The Hadoop distributed file system. In
Proceedings of IEEE 26th Symposium on Mass Storage
Systems and Technologies, pages 1–10.
Sucunuta, M. E. and Riofrio, G. E. (2010). Architecture of
a question-answering system for a specific repository
of documents. In Proceedings of 2nd International
Conference on Software Technology and Engineering,
pages V2-12–V2-16.
Yousfi, S., Rhanoui, M., and Chiadmi, D. (2021).
Towards a generic multimodal architecture for batch
and streaming big data integration. arXiv preprint
Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S.,
and Stoica, I. (2010). Spark: Cluster computing with
working sets. In Proceedings of 2nd USENIX Workshop
on Hot Topics in Cloud Computing, pages 1–7.
Zhang, G., Jiang, T., Bie, R., Liu, X., Wang, Z., and Rao, J.
(2013). The architecture of ProMe instant question
answering system. In Proceedings of International
Conference on Cyber-Enabled Distributed Computing and
Knowledge Discovery, pages 237–242.
Zhu, J. Y., Tang, B., and Li, V. O. (2019). A five-layer
architecture for big data processing and analytics.
International Journal of Big Data Intelligence, 6(1):38–49.
Design Principles and a Software Reference Architecture for Big Data Question Answering Systems