Clustering by Tree Distance for Parse Tree Normalisation

Martin Emms



The application of tree-distance to clustering is considered. Previous work identified some parameters which favourably affect the use of tree-distance in question-answering tasks. Some evidence is given that the same parameters also favourably affect cluster quality. A potential application is in the creation of systems that transform interrogative sentences into indicative ones, a first step in a question-answering system. It is argued that clustering provides a means to navigate the space of parses assigned to large question sets. A tree-distance analogue of the vector-space notion of centroid is proposed, which derives from a cluster a kind of pattern tree summarising the cluster.
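To make the notion of tree-distance concrete, the following is a minimal sketch, not the method of the paper: it assumes unit costs for deletion, insertion and relabelling, uses the textbook forest recursion for ordered trees, and represents a tree as a hypothetical `(label, children)` tuple. The `medoid` function is only a simple stand-in for a cluster representative (the member with the smallest total distance to the others); the centroid analogue proposed in the paper derives a pattern tree instead and is not reproduced here.

```python
from functools import lru_cache

# Assumption: a tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. All names below are illustrative only.

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not g:
        # Delete the rightmost root of f; its children join the forest.
        _, v_children = f[-1]
        return forest_distance(f[:-1] + v_children, g) + 1
    if not f:
        # Insert the rightmost root of g.
        _, w_children = g[-1]
        return forest_distance(f, g[:-1] + w_children) + 1
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v_children, g) + 1
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map the two rightmost roots onto each other (relabel if they differ).
    match = (forest_distance(v_children, w_children)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))

def medoid(cluster):
    """Member of the cluster with the smallest summed distance to all members."""
    return min(cluster, key=lambda t: sum(tree_distance(t, u) for u in cluster))

# Three toy parse trees (invented for illustration).
t1 = ("S", (("NP", ()), ("VP", ())))
t2 = ("S", (("NP", ()), ("VP", (("V", ()),))))
t3 = ("SQ", (("NP", ()), ("VP", ())))

d12 = tree_distance(t1, t2)  # 1: insert V
d13 = tree_distance(t1, t3)  # 1: relabel S as SQ
d23 = tree_distance(t2, t3)  # 2: relabel S as SQ, delete V
representative = medoid([t1, t2, t3])  # t1
```

The memoised recursion is adequate for small trees such as these; for large parse sets a dedicated algorithm such as Zhang and Shasha's would be the usual choice.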



Paper Citation

in Harvard Style

Emms M. (2006). Clustering by Tree Distance for Parse Tree Normalisation. In Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2006) ISBN 978-972-8865-50-4, pages 91-100. DOI: 10.5220/0002502400910100
