Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT

Tolga Çekiç, Yusufcan Manav, Batu Helvacıoğlu, Enes Burak Dündar, Onur Deniz, Gülşen Eryiğit

2022

Abstract

In business settings, there is an increasing need for automated long form question answering (LFQA) systems that answer from business documents; however, data for training such systems is not easily obtainable. Developing such datasets requires a costly human annotation stage in which <question, answer, related document passage> triplets must be created. In this paper, we present a method to rapidly develop an LFQA dataset from existing logs of help-desk data without the need for a manual human annotation stage. This method first creates a Siamese-BERT encoder to relate recorded answers to business documents' passages. For this purpose, the Siamese-BERT encoder is trained over a synthetically created dataset that imitates paraphrased document passages using a noise model. The encoder is then used to create the necessary triplets for LFQA from business documents. We train a Dense Passage Retrieval (DPR) system using a bi-encoder architecture for the retrieval stage and a cross-encoder for re-ranking the retrieved document passages. The results show that the proposed method is successful at rapidly developing LFQA systems for business use cases, yielding 85% recall of the correct answer at the top-1 returned result.
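The two-stage flow named in the abstract (retrieve candidates, then re-rank them, then measure recall at the top 1) can be illustrated with a small sketch. This is not the authors' implementation: the neural bi-encoder and cross-encoder are replaced here by simple token-overlap scores so the example runs with the Python standard library alone, and every function name, passage, and question below is a hypothetical chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def embed(text):
    # Assumption: a bag-of-words count vector stands in for a bi-encoder embedding.
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, k):
    # Stage 1: question and passages are encoded separately, then compared.
    q = embed(question)
    order = sorted(range(len(passages)),
                   key=lambda i: cosine(q, embed(passages[i])),
                   reverse=True)
    return order[:k]

def rerank(question, passages, candidates):
    # Stage 2: a joint score over the (question, passage) pair.
    # Assumption: Jaccard overlap stands in for a cross-encoder score.
    q = set(tokenize(question))
    def joint_score(i):
        p = set(tokenize(passages[i]))
        return len(q & p) / len(q | p)
    return sorted(candidates, key=joint_score, reverse=True)

def recall_at_k(ranked_lists, gold, k):
    # Fraction of questions whose gold passage appears in the top k results.
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)

# Hypothetical toy data: three passages and one question per passage.
passages = [
    "reset your password from the account settings page",
    "invoices are sent by email at the start of each month",
    "contact support to close your account permanently",
]
questions = [
    "how do i reset my password",
    "when are invoices sent",
    "how can i close my account",
]
gold = [0, 1, 2]

ranked = [rerank(q, passages, retrieve(q, passages, k=2)) for q in questions]
print("recall@1:", recall_at_k(ranked, gold, k=1))
```

In a real system the first stage would typically use precomputed passage embeddings so that only the question is encoded at query time, while the more expensive joint scoring is applied only to the short candidate list.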

Paper Citation


in Harvard Style

Çekiç T., Manav Y., Helvacıoğlu B., Dündar E., Deniz O. and Eryiğit G. (2022). Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 1: KDIR; ISBN 978-989-758-614-9, SciTePress, pages 75-82. DOI: 10.5220/0011550900003335


in Bibtex Style

@conference{kdir22,
author={Tolga Çekiç and Yusufcan Manav and Batu Helvacıoğlu and Enes Burak Dündar and Onur Deniz and Gülşen Eryiğit},
title={Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT},
booktitle={Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 1: KDIR},
year={2022},
pages={75-82},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011550900003335},
isbn={978-989-758-614-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 1: KDIR
TI - Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT
SN - 978-989-758-614-9
AU - Çekiç T.
AU - Manav Y.
AU - Helvacıoğlu B.
AU - Dündar E.
AU - Deniz O.
AU - Eryiğit G.
PY - 2022
SP - 75
EP - 82
DO - 10.5220/0011550900003335
PB - SciTePress