InfoGenie: A Chatbot that Enhances Information Extraction Using Modern Natural Language Processing Techniques

Yerram Deekshith Kumar, Manash Pratim Lahkar, Aditya Kumar Singh, Biki Dey, Utpal Sharma

2024

Abstract

Information extraction and question-answering systems struggle to extract information efficiently from large document repositories, particularly when the documents are Portable Document Format (PDF) files. This paper addresses these challenges with an application of natural language processing (NLP) techniques: an intelligent chatbot tailored for streamlined information extraction. The solution builds on established language models and embeddings, using the Hugging Face Transformers library and Sentence Transformer models, and integrates with the Chroma vector store. We outline a data ingestion process that covers PDF parsing, text segmentation, and the creation of document embeddings. These embeddings form the foundation of the vector store, improving information extraction efficiency. The chatbot's underlying model is fine-tuned for sequence-to-sequence learning, enabling it to generate coherent responses to user queries. The chatbot is deployed through a web interface built with Streamlit, where users can upload PDF documents and ask questions about their contents. Evaluation on a crowdsourced dataset that we collected shows a cosine similarity of 95% between generated and ground-truth answers. This research advances NLP-based information extraction systems and offers practical solutions and insights for future enhancements.
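
The ingestion steps named in the abstract (text segmentation, embedding creation, a vector store) and the cosine-similarity measure used in the evaluation can be pictured with a small sketch. This is not the authors' implementation. It assumes plain word-count vectors in place of Sentence Transformer embeddings and an in-memory list in place of Chroma, and it omits PDF parsing and the sequence-to-sequence model. The function names (`split_text`, `embed`, `cosine_similarity`, `retrieve`) and the chunking parameters are hypothetical, and the retrieval step is an assumption about how such a store might be queried.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 6, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts, not a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str]) -> str:
    """Return the stored chunk most similar to the query."""
    query_vec = embed(query)
    return max(chunks, key=lambda c: cosine_similarity(query_vec, embed(c)))


if __name__ == "__main__":
    text = (
        "The invoice total is 420 dollars. Payment is due within thirty days. "
        "Contact the billing office for refunds."
    )
    store = split_text(text)
    print(retrieve("When is payment due?", store))
```

The same `cosine_similarity` form, applied to embeddings of a generated answer and a ground-truth answer, is the kind of score the evaluation reports. With learned sentence embeddings the score also reflects paraphrases, which word counts cannot capture.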


Paper Citation


in Harvard Style

Kumar Y., Lahkar M., Singh A., Dey B. and Sharma U. (2024). InfoGenie: A Chatbot that Enhances Information Extraction Using Modern Natural Language Processing Techniques. In Proceedings of the 1st International Conference on Cognitive & Cloud Computing - Volume 1: IC3Com; ISBN 978-989-758-739-9, SciTePress, pages 239-247. DOI: 10.5220/0013312500004646


in Bibtex Style

@conference{ic3com24,
author={Yerram Deekshith Kumar and Manash Pratim Lahkar and Aditya Kumar Singh and Biki Dey and Utpal Sharma},
title={InfoGenie: A Chatbot that Enhances Information Extraction Using Modern Natural Language Processing Techniques},
booktitle={Proceedings of the 1st International Conference on Cognitive \& Cloud Computing - Volume 1: IC3Com},
year={2024},
pages={239--247},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013312500004646},
isbn={978-989-758-739-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Cognitive & Cloud Computing - Volume 1: IC3Com
TI - InfoGenie: A Chatbot that Enhances Information Extraction Using Modern Natural Language Processing Techniques
SN - 978-989-758-739-9
AU - Kumar Y.
AU - Lahkar M.
AU - Singh A.
AU - Dey B.
AU - Sharma U.
PY - 2024
SP - 239
EP - 247
DO - 10.5220/0013312500004646
PB - SciTePress