Authors: Pavel Král 1 and Christophe Cerisara 2

Affiliations: 1 University of West Bohemia, Czech Republic ; 2 LORIA UMR 7503, France

Keyword(s): Automatic labeling, Corpus, Dialog act, Internet.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Enterprise Information Systems ; Human-Computer Interaction ; Intelligent User Interfaces ; Machine Perception: Vision, Speech, Other

Abstract: This work presents two complementary tools dedicated to the task of textual corpus creation for linguistic research. The chosen application domain is automatic dialog act recognition, but the proposed tools might also be applied to any other research area concerned with dialog processing. The first tool captures relevant dialogs from freely available resources on the World Wide Web. Filtering and parsing of these web pages are performed by a set of hand-crafted rules. A second set of rules is then applied to achieve automatic segmentation and dialog act tagging. The second tool is finally used as a post-processing step to manually check and correct tagging errors when needed. In this paper, both tools are presented, and the performance of automatic tagging is evaluated on a dialog corpus extracted from an online Czech journal. We show that reasonably good dialog act labeling accuracy may be achieved, hence greatly reducing the cost of building such corpora.
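
The rule-based segmentation and tagging step mentioned in the abstract could look roughly like the following Python sketch. The abstract does not list the actual rules or the label set, so everything here is a hypothetical illustration rather than the authors' method: the labels (`QUESTION`, `EXCLAMATION`, `STATEMENT`), the punctuation-based patterns, and the function names `segment`, `tag` and `tag_dialog` are all assumptions.

```python
import re

# Hypothetical hand-crafted rules: (pattern, dialog act label).
# These are illustrative only and are not taken from the paper.
RULES = [
    (re.compile(r"\?\s*$"), "QUESTION"),
    (re.compile(r"!\s*$"), "EXCLAMATION"),
]
DEFAULT_LABEL = "STATEMENT"  # assumed fallback when no rule matches


def segment(text):
    """Split text into sentence-like segments after '.', '?' or '!'."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def tag(segment_text):
    """Return the label of the first rule that matches the segment."""
    for pattern, label in RULES:
        if pattern.search(segment_text):
            return label
    return DEFAULT_LABEL


def tag_dialog(text):
    """Segment a dialog turn and attach a dialog act label to each segment."""
    return [(s, tag(s)) for s in segment(text)]


if __name__ == "__main__":
    for segment_text, label in tag_dialog("Where are you going? Home. Stop!"):
        print(f"{label}\t{segment_text}")
```

In a workflow like the one the abstract describes, output of this kind would then be reviewed by hand, since simple surface rules will mislabel some segments.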

CC BY-NC-ND 4.0

Paper citation in several formats:
Král, P. and Cerisara, C. (2010). AUTOMATIC DIALOG ACT CORPUS CREATION FROM WEB PAGES. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-8425-08-9; ISSN 2184-4992, SciTePress, pages 198-203. DOI: 10.5220/0003019501980203

@conference{iceis10,
author={Pavel Král and Christophe Cerisara},
title={AUTOMATIC DIALOG ACT CORPUS CREATION FROM WEB PAGES},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2010},
pages={198-203},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003019501980203},
isbn={978-989-8425-08-9},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - AUTOMATIC DIALOG ACT CORPUS CREATION FROM WEB PAGES
SN - 978-989-8425-08-9
IS - 2184-4992
AU - Král, P.
AU - Cerisara, C.
PY - 2010
SP - 198
EP - 203
DO - 10.5220/0003019501980203
PB - SciTePress