comprehensive studies specific to the business
domain. To address these gaps, this paper conducts a
review of existing research on business IE and
associated tasks, shedding light on the potential
applications of RAG with LLMs. Furthermore, it
introduces a novel real-world business IE application
that demonstrates how this integration can be
implemented in practice.
Because this application is still under development,
a thorough evaluation remains to be done and
constitutes the future work of this study. The primary
aim is to illustrate the adaptability and efficiency of
RAG with LLMs within the realm of business
operations.
DATASET AVAILABILITY
https://drive.google.com/file/d/18UB-TamXvCFpq0edfH7EVPjBl9Ec34dC/view?usp=sharing
ACKNOWLEDGEMENTS
The authors thank the French government for funding
through the National Research Agency (ANR) and
Cyril Nguyen Van (company: FirstEco) for generously
providing the dataset.