encoder. The reranking stage also drastically improves the results, reaching up to 61% Top-1 accuracy in automatic evaluation and 85% Top-1 in human evaluation.
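Top-1 accuracy of the kind reported above can be computed with a small helper. This is a minimal sketch only; the ranked passage lists and gold labels below are hypothetical stand-ins for the actual evaluation data:

```python
def top_k_accuracy(ranked_results, gold, k=1):
    """Fraction of queries whose gold passage appears among the
    top-k entries of its (re)ranked result list."""
    hits = sum(1 for ranking, g in zip(ranked_results, gold) if g in ranking[:k])
    return hits / len(gold)

# Toy rankings for three queries (hypothetical passage ids).
rankings = [["p1", "p2"], ["p3", "p1"], ["p2", "p3"]]
gold = ["p1", "p1", "p3"]
top1 = top_k_accuracy(rankings, gold, k=1)  # only the first query hits at rank 1
```

Raising `k` relaxes the criterion, which is why Top-k numbers for larger k are always at least as high as Top-1.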
6 CONCLUSIONS
This paper presented our method for rapidly creating LFQA datasets from existing data sources in business environments. The method trains a Siamese-BERT model on noise-added artificial data to retrieve supporting document passages, and uses the retrieved passages to generate ⟨question, answer, document passage⟩ triplets that serve as an LFQA training dataset. We proposed a noise function that produces altered versions of document passages and trained a Siamese-BERT encoder on these altered passages together with the originals. The encoder then creates the triplets from existing help-desk logs that contain links to supporting documents.
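The noise function itself is specified earlier in the paper; purely as an illustration, the sketch below applies two generic corruption operations (random token dropping and local shuffling) to produce altered passages and pairs each one with its original as a positive Siamese training example. All function names and parameters here are hypothetical, not the paper's actual implementation:

```python
import random

def add_noise(passage, drop_prob=0.1, shuffle_window=3, seed=None):
    """Return a noise-added variant of a passage by randomly dropping
    tokens and locally shuffling the remainder (illustrative only)."""
    rng = random.Random(seed)
    tokens = passage.split()
    kept = [t for t in tokens if rng.random() > drop_prob]
    noisy = []
    # Shuffle tokens within small windows to perturb local word order.
    for i in range(0, len(kept), shuffle_window):
        window = kept[i:i + shuffle_window]
        rng.shuffle(window)
        noisy.extend(window)
    return " ".join(noisy)

def make_training_pairs(passages, n_variants=2, seed=0):
    """Pair each original passage with noise-added copies; such
    (altered, original) pairs act as positives for Siamese training."""
    pairs = []
    for idx, passage in enumerate(passages):
        for k in range(n_variants):
            noisy = add_noise(passage, seed=seed + idx * n_variants + k)
            pairs.append((noisy, passage))
    return pairs
```

In such a setup, the encoder learns to map a corrupted passage close to its clean source, which is what later lets it match noisy help-desk text against the original documentation.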
We applied the proposed method to create such a dataset in a real-world business setting, alongside baseline retrieval methods such as BM25 and base-BERT indexing. To evaluate the approaches, we trained a DPR-based question answering system on each of the created datasets. Both automatic and human evaluation results showed that the DPR model trained on the dataset generated by our methodology outperformed the others; the proposed Noise-Added Siamese-BERT model produced better-quality LFQA results from fewer training samples.
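For context, a DPR-based system ranks candidate passages by the inner product between a question embedding and each passage embedding. The sketch below uses toy vectors in place of the BERT encoder outputs; the embeddings and dimensionality are illustrative assumptions:

```python
def dpr_scores(question_vec, passage_vecs):
    """DPR-style relevance: inner product between the question
    embedding and each candidate passage embedding."""
    return [sum(q * p for q, p in zip(question_vec, pv)) for pv in passage_vecs]

# Toy 3-dimensional embeddings standing in for encoder outputs.
question = [0.2, 0.9, 0.1]
passages = [
    [0.1, 0.8, 0.0],  # close to the question embedding
    [0.9, 0.0, 0.3],  # far from the question embedding
]
scores = dpr_scores(question, passages)
best = max(range(len(scores)), key=scores.__getitem__)  # index 0
```

At production scale the same inner-product search is typically served by an approximate-nearest-neighbour index such as FAISS (Johnson et al., 2017) rather than an exhaustive loop.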
REFERENCES
Fan, A., Jernite, Y., Perez, E., Grangier, D., Weston, J., and
Auli, M. (2019). ELI5: Long form question answer-
ing. In Proceedings of the 57th Annual Meeting of
the Association for Computational Linguistics, pages
3558–3567, Florence, Italy. Association for Computa-
tional Linguistics.
Gu, N., Gao, Y., and Hahnloser, R. H. R. (2021). Local citation recommendation with hierarchical-attention text encoder and SciBERT-based reranking.
Guu, K., Lee, K., Tung, Z., Pasupat, P., and Chang, M.-
W. (2020). REALM: Retrieval-augmented language
model pre-training. arXiv preprint arXiv:2002.08909.
Johnson, J., Douze, M., and Jégou, H. (2017). Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734.
Joshi, M., Choi, E., Weld, D., and Zettlemoyer, L. (2017).
TriviaQA: A large scale distantly supervised chal-
lenge dataset for reading comprehension. In Proceed-
ings of the 55th Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Pa-
pers), pages 1601–1611, Vancouver, Canada. Associ-
ation for Computational Linguistics.
Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L.,
Edunov, S., Chen, D., and Yih, W.-t. (2020). Dense
passage retrieval for open-domain question answer-
ing. In Proceedings of the 2020 Conference on
Empirical Methods in Natural Language Processing
(EMNLP), pages 6769–6781, Online. Association for
Computational Linguistics.
Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M.,
Parikh, A., Alberti, C., Epstein, D., Polosukhin, I.,
Kelcey, M., Devlin, J., Lee, K., Toutanova, K. N.,
Jones, L., Chang, M.-W., Dai, A., Uszkoreit, J., Le,
Q., and Petrov, S. (2019). Natural Questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics.
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mo-
hamed, A., Levy, O., Stoyanov, V., and Zettlemoyer,
L. (2020). BART: Denoising sequence-to-sequence
pre-training for natural language generation, trans-
lation, and comprehension. In Proceedings of the
58th Annual Meeting of the Association for Compu-
tational Linguistics, pages 7871–7880, Online. Asso-
ciation for Computational Linguistics.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. (2021). Retrieval-augmented generation for knowledge-intensive NLP tasks.
Petroni, F., Piktus, A., Fan, A., Lewis, P., Yazdani, M., Cao,
N., Thorne, J., Jernite, Y., Karpukhin, V., Maillard, J.,
et al. (2021). KILT: a benchmark for knowledge intensive language tasks. In NAACL-HLT, pages 2523–2544. Association for Computational Linguistics.
Qi, P., Lee, H., Sido, O. T., and Manning, C. D. (2021).
Answering open-domain questions of varying reasoning steps from text. In Empirical Methods in Natural Language Processing (EMNLP).
Rajpurkar, P., Jia, R., and Liang, P. (2018). Know what you
don’t know: Unanswerable questions for SQuAD. In
Proceedings of the 56th Annual Meeting of the Associ-
ation for Computational Linguistics (Volume 2: Short
Papers), pages 784–789, Melbourne, Australia. Asso-
ciation for Computational Linguistics.
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016).
SQuAD: 100,000+ questions for machine comprehen-
sion of text. In Proceedings of the 2016 Conference
on Empirical Methods in Natural Language Process-
ing, pages 2383–2392, Austin, Texas. Association for
Computational Linguistics.
Reimers, N. and Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
Yang, Y., Yih, W.-t., and Meek, C. (2015). WikiQA: A challenge dataset for open-domain question answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.