CONCEPT DISCOVERY FOR LANGUAGE UNDERSTANDING IN AN INFORMATION-QUERY DIALOGUE SYSTEM
Nathalie Camelin, Boris Detienne, Stéphane Huet, Dominique Quadri, Fabrice Lefevre
2011
Abstract
Most recent efficient statistical approaches to natural language understanding require a segmental annotation of the training data. Such an annotation involves both determining the concepts in a sentence and linking them to their corresponding word segments. In this paper we propose a two-step alternative to fully manual annotation of data: an initial unsupervised concept discovery, based on latent Dirichlet allocation, is followed by an automatic segmentation using integer linear optimisation. The relation between discovered topics and task-dependent concepts is evaluated on a spoken dialogue task for which a reference annotation is available. Topics and concepts are shown to be close enough to potentially halve the manual annotation cost.
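To make the second step more concrete, the segmentation idea can be sketched as follows. This is only an illustration under stated assumptions, not the formulation used in the paper: the abstract does not give the integer linear programme, so the sketch swaps in a simple dynamic-programming (Viterbi-style) search. The function name `segment`, the `switch_penalty` parameter, and the per-word topic probabilities are all hypothetical; in practice the probabilities would come from a topic model such as latent Dirichlet allocation.

```python
import math


def segment(word_topic_probs, switch_penalty=1.0):
    """Label each word with one topic and merge equal neighbours into segments.

    word_topic_probs: list of dicts, one per word, mapping topic -> probability
                      (hypothetical input, e.g. derived from an LDA model).
    switch_penalty:   cost paid each time the topic changes between two
                      consecutive words (assumed parameter; favours long segments).
    Returns a list of (topic, start, end) tuples, with end exclusive.
    """
    if not word_topic_probs:
        return []
    topics = sorted({t for probs in word_topic_probs for t in probs})
    floor = 1e-6  # avoids log(0) for topics missing from a word's distribution

    def score(i, topic):
        return math.log(max(word_topic_probs[i].get(topic, 0.0), floor))

    def transition(prev, topic):
        return 0.0 if prev == topic else switch_penalty

    # best[t] = best total score of a labelling ending with topic t
    best = {t: score(0, t) for t in topics}
    back = []  # back[i - 1][t] = best previous topic when word i has topic t
    for i in range(1, len(word_topic_probs)):
        new_best, pointers = {}, {}
        for t in topics:
            prev = max(topics, key=lambda p: best[p] - transition(p, t))
            new_best[t] = best[prev] - transition(prev, t) + score(i, t)
            pointers[t] = prev
        back.append(pointers)
        best = new_best

    # Follow the back-pointers from the best final topic.
    labels = [max(topics, key=lambda t: best[t])]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()

    # Collapse runs of identical labels into (topic, start, end) segments.
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


# Hypothetical three-word sentence whose middle word is weakly attracted to topic "B".
example = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.4, "B": 0.6},
    {"A": 0.9, "B": 0.1},
]
print(segment(example, switch_penalty=1.0))
print(segment(example, switch_penalty=0.0))
```

With a positive penalty the weakly attracted middle word is absorbed into the surrounding segment, whereas with no penalty each word simply takes its most probable topic. An integer linear programme can express richer constraints than this sketch, for instance a limit on the number of segments per sentence.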
Paper Citation
in Harvard Style
Camelin N., Detienne B., Huet S., Quadri D. and Lefevre F. (2011). CONCEPT DISCOVERY FOR LANGUAGE UNDERSTANDING IN AN INFORMATION-QUERY DIALOGUE SYSTEM. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 24-29. DOI: 10.5220/0003640500240029
in Bibtex Style
@conference{kdir11,
author={Nathalie Camelin and Boris Detienne and Stéphane Huet and Dominique Quadri and Fabrice Lefevre},
title={CONCEPT DISCOVERY FOR LANGUAGE UNDERSTANDING IN AN INFORMATION-QUERY DIALOGUE SYSTEM},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={24-29},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003640500240029},
isbn={978-989-8425-79-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - CONCEPT DISCOVERY FOR LANGUAGE UNDERSTANDING IN AN INFORMATION-QUERY DIALOGUE SYSTEM
SN - 978-989-8425-79-9
AU - Camelin N.
AU - Detienne B.
AU - Huet S.
AU - Quadri D.
AU - Lefevre F.
PY - 2011
SP - 24
EP - 29
DO - 10.5220/0003640500240029