Authors: Tolga Çekiç 1 ; Yusufcan Manav 1 ; Batu Helvacıoğlu 1 ; Enes Burak Dündar 1 ; Onur Deniz 1 and Gülşen Eryiğit 2

Affiliations: 1 Yapı Kredi Teknoloji, Istanbul, Turkey ; 2 Istanbul Technical University, Department of AI&Data Eng., Istanbul, Turkey

Keyword(s): Long Form Question Answering, Siamese-BERT, Dense Passage Retrieval.

Abstract: In business settings, there is an increasing need for automated long form question answering (LFQA) systems that work from business documents; however, data for training such systems is not easily obtainable. Developing such datasets requires a costly human annotation stage in which <> triplets must be created. In this paper, we present a method to rapidly develop an LFQA dataset from existing logs of help-desk data without the need for a manual human annotation stage. This method first creates a SiameseBert encoder to relate recorded answers to passages of business documents. For this purpose, the SiameseBert encoder is trained on a synthetically created dataset that imitates paraphrased document passages using a noise model. The encoder is then used to create the necessary triplets for LFQA from business documents. We train a Dense Passage Retrieval (DPR) system using a bi-encoder architecture for the retrieval stage and a cross-encoder for re-ranking the retrieved document passages. The results show that the proposed method is successful at rapidly developing LFQA systems for business use cases, yielding 85% recall of the correct answer at the top 1 of the returned results.
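The retrieve-then-re-rank flow and the top-1 recall metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function and the token-overlap `toy_pair_score` are hypothetical stand-ins for the learned bi-encoder and cross-encoder, and all function names and sample texts are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy stand-in for a learned encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, k):
    """Bi-encoder style: question and passages are embedded independently,
    then compared by vector similarity. Returns indices of the top-k passages."""
    q = embed(question)
    order = sorted(range(len(passages)),
                   key=lambda i: cosine(q, embed(passages[i])),
                   reverse=True)
    return order[:k]


def toy_pair_score(question, passage):
    """Hypothetical joint scorer: fraction of question tokens found in the passage."""
    q_tokens = set(question.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(question, passages, candidates, pair_score=toy_pair_score):
    """Cross-encoder style: each (question, passage) pair is scored jointly."""
    return sorted(candidates,
                  key=lambda i: pair_score(question, passages[i]),
                  reverse=True)


def recall_at_k(ranked_lists, gold, k):
    """Share of questions whose gold passage index appears in the top k results."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


# Illustrative data only; not taken from the paper.
passages = [
    "you can reset your card pin at any atm",
    "wire transfers are processed within one business day",
    "to close an account visit the nearest branch",
]
questions = ["how can i reset my card pin", "how long do wire transfers take"]
gold = [0, 1]  # index of the correct passage for each question

ranked = [rerank(q, passages, retrieve(q, passages, k=2)) for q in questions]
print(recall_at_k(ranked, gold, k=1))
```

In a real system the two scoring functions would be replaced by trained models; the split matters because independent embedding lets passages be indexed ahead of time, while joint scoring is applied only to the short candidate list.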

CC BY-NC-ND 4.0

Paper citation in several formats:
Çekiç, T., Manav, Y., Helvacıoğlu, B., Dündar, E. B., Deniz, O. and Eryiğit, G. (2022). Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - KDIR; ISBN 978-989-758-614-9; ISSN 2184-3228, SciTePress, pages 75-82. DOI: 10.5220/0011550900003335

@conference{kdir22,
author={Tolga \c{C}eki\c{c} and Yusufcan Manav and Batu Helvacıoğlu and Enes Burak Dündar and Onur Deniz and Gülşen Eryiğit},
title={Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT},
booktitle={Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - KDIR},
year={2022},
pages={75-82},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011550900003335},
isbn={978-989-758-614-9},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - KDIR
TI - Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT
SN - 978-989-758-614-9
IS - 2184-3228
AU - Çekiç, T.
AU - Manav, Y.
AU - Helvacıoğlu, B.
AU - Dündar, E. B.
AU - Deniz, O.
AU - Eryiğit, G.
PY - 2022
SP - 75
EP - 82
DO - 10.5220/0011550900003335
PB - SciTePress