Authors:
Martin Emms and Hector-Hugo Franco-Penya
Affiliation:
Trinity College, Ireland
Keyword(s):
Similarity, Distance, Tree, Sequence.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Case-Based Reasoning; Classification; Clustering; Enterprise Information Systems; Graphical and Graph-Based Models; Pattern Recognition; Similarity and Distance Learning; Symbolic Systems; Theory and Methods
Abstract:
Both 'distance' and 'similarity' measures, based on scoring mappings, have been proposed for the comparison of sequences and for the comparison of trees; this paper concerns whether or not the two are equivalent. These measures are usually parameterised by an atomic 'cost' table, defining label-dependent values for swaps, deletions and insertions. We look at the question of whether orderings induced by a 'distance' measure, with some cost-table, can be dualized by a 'similarity' measure, with some other cost-table, and vice-versa. Three kinds of orderings are considered: alignment-orderings, for a fixed source S and target T; neighbour-orderings, where for a fixed S, varying candidate neighbours T_i are ranked; and pair-orderings, where for varying S_i and varying T_j, the pairings ⟨S_i, T_j⟩ are ranked. We show that (1) alignment-orderings by distance can be dualized by similarity, and vice-versa; (2) neighbour-ordering and pair-ordering by distance can be dualized by similarity; (3) neighbour-ordering and pair-ordering by similarity cannot always be dualized by distance. A consequence of this is that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via distance.
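The fixed-pair case for sequences can be illustrated with a minimal sketch. It is not the paper's construction: the function names, the unit-cost distance table, and the particular conversion to a similarity table (swap score = deletion cost + insertion cost - swap cost, with zero deletion and insertion entries) are all assumptions made for illustration. Under that conversion every alignment of a fixed S and T has similarity K minus its distance, where K is the summed deletion costs of S plus the summed insertion costs of T, so the two measures rank alignments in opposite order.

```python
def best_alignment_score(s, t, swap, delete, insert, pick):
    """Best total score over alignments of s to t, by dynamic programming.

    Table entries are summed along an alignment; pick=min yields a
    distance, pick=max yields a similarity.
    """
    rows, cols = len(s) + 1, len(t) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + delete(s[i - 1])
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + insert(t[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = pick(
                table[i - 1][j - 1] + swap(s[i - 1], t[j - 1]),
                table[i - 1][j] + delete(s[i - 1]),
                table[i][j - 1] + insert(t[j - 1]),
            )
    return table[-1][-1]


# Assumed distance cost-table: unit costs (illustrative, not from the paper).
def d_swap(x, y):
    return 0 if x == y else 1

def d_del(x):
    return 1

def d_ins(y):
    return 1


# Assumed similarity table derived from the distance table above.
def s_swap(x, y):
    return d_del(x) + d_ins(y) - d_swap(x, y)

def s_none(_label):
    return 0  # deletions and insertions contribute nothing to similarity


def distance(s, t):
    return best_alignment_score(s, t, d_swap, d_del, d_ins, min)

def similarity(s, t):
    return best_alignment_score(s, t, s_swap, s_none, s_none, max)


def offset(s, t):
    """K for a fixed pair: all deletion costs of s plus all insertion costs of t."""
    return sum(d_del(x) for x in s) + sum(d_ins(y) for y in t)


source = "abc"
for target in ["abc", "abd", "abcxxxx"]:
    print(target, distance(source, target), similarity(source, target),
          offset(source, target))
```

Because K depends on T, this particular conversion need not preserve a neighbour-ordering: in the run above, "abd" is nearer to "abc" than "abcxxxx" by distance, yet scores lower by similarity. The sketch therefore covers only the alignment-ordering case and says nothing about how the paper treats neighbour-orderings or pair-orderings.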