CONCEPT DISCOVERY FOR LANGUAGE UNDERSTANDING IN AN INFORMATION-QUERY DIALOGUE SYSTEM

Nathalie Camelin, Boris Detienne, Stéphane Huet, Dominique Quadri, Fabrice Lefevre

Abstract

Most recent efficient statistical approaches to natural language understanding require a segmental annotation of the training data. Producing such an annotation requires both determining the concepts in a sentence and linking them to their corresponding word segments. In this paper we propose a two-step alternative to the fully manual annotation of data: an initial unsupervised concept discovery, based on latent Dirichlet allocation, is followed by an automatic segmentation using integer linear optimisation. The relation between discovered topics and task-dependent concepts is evaluated on a spoken dialogue task for which a reference annotation is available. Topics and concepts are shown to be close enough to potentially halve the manual annotation cost.
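
To make the second step more concrete, the following is a minimal sketch of segmenting a sentence once each word has a per-topic score. It is not the formulation used in the paper: the integer linear programme is replaced here by a small dynamic programme, and the function `segment`, the `switch_penalty` parameter, and the score values are all hypothetical, chosen only for illustration. In practice the scores might come from a topic model such as latent Dirichlet allocation.

```python
def segment(scores, switch_penalty=1.0):
    """Assign one topic per word, then merge equal neighbours into segments.

    scores[i][t] is a hypothetical affinity of word i for topic t.
    Changing topic between adjacent words costs switch_penalty, which
    discourages very short segments. This dynamic programme is a
    stand-in for an integer linear programme, not the paper's model.
    Returns a list of (start, end_exclusive, topic) tuples.
    """
    n, k = len(scores), len(scores[0])
    best = [list(scores[0])]   # best[i][t]: best total score ending at word i in topic t
    back = []                  # back[i-1][t]: topic of word i-1 on that best path
    for i in range(1, n):
        prev = best[-1]
        row, ptr = [], []
        for t in range(k):
            # Either stay in topic t (no cost) or switch from another topic.
            value, source = max(
                (prev[s] - (0.0 if s == t else switch_penalty), s)
                for s in range(k)
            )
            row.append(value + scores[i][t])
            ptr.append(source)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final topic to recover one label per word.
    t = max(range(k), key=lambda j: best[-1][j])
    labels = [t]
    for ptr in reversed(back):
        t = ptr[t]
        labels.append(t)
    labels.reverse()

    # Merge runs of identical labels into contiguous segments.
    segments, start = [], 0
    for i in range(1, n + 1):
        if i == n or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


# Hypothetical scores for a six-word utterance and two topics.
scores = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.4, 0.5],  # ambiguous word, slight preference for topic 1
    [0.9, 0.1],
    [0.1, 0.9],
    [0.2, 0.8],
]
print(segment(scores, switch_penalty=0.5))  # [(0, 4, 0), (4, 6, 1)]
```

With a penalty of 0.5 the ambiguous third word is absorbed into the surrounding topic-0 segment; with a penalty of 0 each word simply takes its highest-scoring topic and the sentence splits into four segments.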

Paper Citation


in Harvard Style

Camelin N., Detienne B., Huet S., Quadri D. and Lefevre F. (2011). CONCEPT DISCOVERY FOR LANGUAGE UNDERSTANDING IN AN INFORMATION-QUERY DIALOGUE SYSTEM. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 24-29. DOI: 10.5220/0003640500240029


in Bibtex Style

@conference{kdir11,
author={Nathalie Camelin and Boris Detienne and Stéphane Huet and Dominique Quadri and Fabrice Lefevre},
title={CONCEPT DISCOVERY FOR LANGUAGE UNDERSTANDING IN AN INFORMATION-QUERY DIALOGUE SYSTEM},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={24-29},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003640500240029},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - CONCEPT DISCOVERY FOR LANGUAGE UNDERSTANDING IN AN INFORMATION-QUERY DIALOGUE SYSTEM
SN - 978-989-8425-79-9
AU - Camelin N.
AU - Detienne B.
AU - Huet S.
AU - Quadri D.
AU - Lefevre F.
PY - 2011
SP - 24
EP - 29
DO - 10.5220/0003640500240029