
We are currently training our model to process queries in French. For French text, a pretrained French encoder such as CamemBERT-large (Martin et al., 2019) would be suitable. However, our early tests show that it does not perform as well as the English encoder model on information retrieval tasks; therefore, further fine-tuning is required.
Currently, the question answering algorithm is retrieval-based, meaning that every response it gives comes directly from a policy. While this ensures that the chatbot does not hallucinate, some of the responses it returns can be long, since they come from policy documents where text length is not a concern. To address this, we could use a large language model to summarize the chatbot's responses. Text summarization is an NLP task in which a model attempts to shorten the input text as much as possible while preserving all of the vital information. The resulting responses would be much shorter and easier for users to read.
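As a lightweight, dependency-free stand-in for the LLM-based summarization described above, the sketch below scores sentences by word frequency and keeps only the top-ranked ones. It is extractive rather than abstractive, and the example text is invented:

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences, scored by word frequency."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sentences) <= n_sentences:
        return text.strip()
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Rank sentence indices by the total frequency of the words they contain.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())),
        reverse=True,
    )
    keep = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)

# Invented policy-style text for illustration.
policy = ("Members must submit leave requests through the approved system. "
          "Leave requests are reviewed by the member's supervisor. "
          "Historical background on the forms is provided in Annex B.")
print(extractive_summary(policy, 2))
```

An LLM would instead rewrite the passage in its own words, but the goal is the same: a shorter response that keeps the vital information.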
We also intend to fine-tune both the bi-encoder and cross-encoder models derived from the BERT encoder to strengthen the chatbot's performance. Recognizing the potential of specialization, we aim to adapt the models to policy- and military-specific language and contexts. Central to this enhancement strategy is user feedback, which we plan to harness for both stages of fine-tuning. First, we will fine-tune the bi-encoder model using techniques from Wolf et al. (2020) on data collected from user feedback. Subsequently, the cross-encoder model will be fine-tuned for optimal answer re-ranking (Thakur et al., 2020).
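Feedback-driven fine-tuning presupposes turning raw feedback records into (query, answer) training pairs. A minimal sketch, assuming a hypothetical log format in which each record carries the user's query, the answer shown, and a helpfulness flag (all field names and values are assumptions, not the system's actual schema):

```python
# Hypothetical feedback log; in the real system these records would come
# from the web application's thumbs-up/thumbs-down widget.
feedback_log = [
    {"query": "How many leave days do members get?",
     "answer": "Members accrue 20 days of annual leave.", "helpful": True},
    {"query": "What is the dress code?",
     "answer": "Members accrue 20 days of annual leave.", "helpful": False},
    {"query": "Who approves travel claims?",
     "answer": "Claims are approved by the unit comptroller.", "helpful": True},
]

def to_training_pairs(log):
    """Keep only positively rated (query, answer) pairs.

    Positive pairs of this shape are what contrastive objectives such as
    multiple-negatives ranking loss expect when fine-tuning a bi-encoder.
    """
    return [(r["query"], r["answer"]) for r in log if r["helpful"]]

pairs = to_training_pairs(feedback_log)
print(len(pairs))  # two positive pairs survive from the toy log
```

Negatively rated records need not be discarded: the same (query, rejected answer) pairs can later serve as hard negatives when fine-tuning the cross-encoder re-ranker.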
5 CONCLUSIONS
In conclusion, recognizing the complexity of long military policy documents, we introduced a scalable, automated NLP-based system to extract information from web documents. We developed a chatbot that leverages two BERT-based encoder models: a bi-encoder for initial retrieval of the most relevant answers and a cross-encoder for re-ranking them by relevance. The chatbot is housed within a user-friendly web application built with Flask and React, eliminating the need for separate installations.
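The two-stage retrieve-then-re-rank design can be sketched in miniature; here toy vectors stand in for bi-encoder embeddings and a word-overlap count stands in for the cross-encoder's joint relevance score (all values are invented for illustration):

```python
import math
import re

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy document embeddings standing in for bi-encoder outputs (made-up values).
doc_vecs = {
    "leave policy overview": [0.9, 0.1, 0.0],
    "dress code directive": [0.1, 0.8, 0.1],
    "travel claim procedure": [0.2, 0.2, 0.9],
}
query_text = "what is the leave policy"
query_vec = [0.85, 0.15, 0.05]  # made-up embedding of the query

# Stage 1: bi-encoder retrieval -- rank every document by cosine similarity
# against the query embedding and keep the top candidates.
candidates = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                    reverse=True)[:2]

# Stage 2: cross-encoder re-ranking -- in the real system a BERT cross-encoder
# scores each (query, document) pair jointly; word overlap stands in here.
def cross_score(query, doc):
    return len(set(re.findall(r"\w+", query)) & set(re.findall(r"\w+", doc)))

best = max(candidates, key=lambda d: cross_score(query_text, d))
print(best)
```

The split matters for cost: the bi-encoder compares precomputed document vectors cheaply across the whole corpus, while the more expensive pairwise scorer only sees the short candidate list.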
While our evaluations highlight the chatbot's ability to answer fundamental questions on military policy topics with approximately 88.46% accuracy, there is room for improvement. In particular, more precise responses could be elicited when users pose vague questions. We have also identified areas for future work, including fine-tuning the retrieval models on user feedback and combining large language models with semantic search to generate more accurate responses.
REFERENCES
Adamopoulou, E. and Moussiades, L. (2020). Chatbots: History, technology, and applications. Machine Learning with Applications, 2:100006.
Agrawal, P., Menon, T., Kam, A., Naim, M., Chouragade, C., Singh, G., Kulkarni, R., Suri, A., Katakam, S., Pratik, V., Bansal, P., Kaur, S., Duggal, A., Chalabi, A., Choudhari, P., Satti, S. R., Nayak, N., and Rajput, N. (2020). QnAMaker: Data to bot in 2 minutes. In Companion Proceedings of the Web Conference 2020, WWW '20, pages 131–134, New York, NY, USA. Association for Computing Machinery.
Aliannejadi, M., Zamani, H., Crestani, F., and Croft, W. B. (2019). Asking clarifying questions in open-domain information-seeking conversations. CoRR, abs/1907.06554.
Amati, G. (2009). BM25, pages 257–260. Springer US, Boston, MA.
Bajaj, P., Campos, D., Craswell, N., Deng, L., Gao, J., Liu, X., Majumder, R., McNamara, A., Mitra, B., Nguyen, T., et al. (2016). MS MARCO: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268.
Abu Shawar, B. and Atwell, E. (2007). Chatbots: Are they really useful? Journal for Language Technology and Computational Linguistics, 22(1):29–49.
Biswas, M. (2018). Microsoft Bot Framework. Beginning AI Bot Frameworks: Getting Started with Bot Development, pages 25–66.
Canada (2023). Policies, standards, orders, directives and regulations from the Department of National Defence and the Canadian Armed Forces. https://www.canada.ca/en/department-national-defence/corporate/policies-standards.html. [Online; accessed 05-July-2023].
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding.
Gupta, S., Borkar, D., de Mello, C. S. B., and Patil, S. S. (2015). An e-commerce website based chatbot.
Hugging Face (2021). cross-encoder/ms-marco-TinyBERT-L-4. https://huggingface.co/cross-encoder/ms-marco-TinyBERT-L-4. [Online; accessed 05-October-2023].
Humeau, S., Shuster, K., Lachaux, M., and Weston, J. (2019). Real-time inference in multi-sentence tasks with deep pretrained transformers. CoRR, abs/1905.01969.
Kim, M. and Kim, D. (2022). A suggestion on the LDA-based topic modeling technique based on Elasticsearch for indexing academic research results. Applied Sciences, 12(6):3118.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Information Retrieval Chatbot on Military Policies and Standards