triangle inequality and re-arranging again gives an upper bound which might be used to accelerate a similarity range query: $\Theta(S,T_2) \le \Theta(S,T_1) + \Theta(T_2,T_2) - \Theta(T_2,T_1)$.
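As a minimal sketch of how such a bound might prune a similarity range query, the following uses a toy similarity, the size of a set intersection, for which the bound provably holds. The toy similarity, the pivot choice and all function names are assumptions made for illustration; they are not the alignment-based scores discussed in this paper.

```python
# Toy similarity (an assumption for illustration): theta(A, B) = |A & B|.
# For this choice the bound
#   theta(S, T2) <= theta(S, T1) + theta(T2, T2) - theta(T2, T1)
# always holds, because S & T2 is contained in (S & T1) | (T2 - T1).

def theta(a, b):
    """Toy similarity: number of shared elements."""
    return len(a & b)

def upper_bound(theta_s_t1, theta_t2_t2, theta_t2_t1):
    """Upper bound on theta(S, T2) given the three other scores."""
    return theta_s_t1 + theta_t2_t2 - theta_t2_t1

def range_query(query, items, pivot, threshold):
    """Return (hits, evaluated): indices i with theta(query, items[i]) >= threshold,
    and the number of items whose similarity to the query was actually computed."""
    # These two lists do not depend on the query and could be precomputed.
    self_sims = [theta(t, t) for t in items]       # theta(T2, T2)
    pivot_sims = [theta(t, pivot) for t in items]  # theta(T2, T1), with T1 = pivot
    theta_q_pivot = theta(query, pivot)            # theta(S, T1)
    hits, evaluated = [], 0
    for i, t in enumerate(items):
        if upper_bound(theta_q_pivot, self_sims[i], pivot_sims[i]) < threshold:
            continue  # the bound shows theta(query, t) < threshold: skip it
        evaluated += 1
        if theta(query, t) >= threshold:
            hits.append(i)
    return hits, evaluated

ITEMS = [frozenset("abcd"), frozenset("xyz"), frozenset("abx"),
         frozenset("cdef"), frozenset("pq")]
PIVOT = frozenset("xyzpq")
QUERY = frozenset("abcde")

if __name__ == "__main__":
    print(range_query(QUERY, ITEMS, PIVOT, threshold=3))
```

In this small example three of the five candidates are discarded by the bound alone; how much pruning would be obtained with alignment-based similarities and real data is not something this sketch can indicate.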
Though again this similarity-to-distance conversion is not sought in the context of finding P- or N-duals, Stojmirov et al.’s equation in (6) does make the derived distance an N-dual of the similarity. This is not, however, inconsistent with the example in section 3.2 of a similarity with no N-dualizing distance. Stojmirov et al.’s conversion generates asymmetric insertion and deletion entries in the distance cost-table $C^\Delta$, whereas the proof in section 3.2 concerned the impossibility of an N-dualizing distance with symmetric insertion and deletion entries.
Our findings on the various order-relating conjectures concern notions with specific, though widely used, definitions (Defs. 1, 2, 3 and 4). There are other closely related notions, and the corresponding questions concerning these have not been addressed. One variant is stochastic: in a stochastic similarity, probabilities are assigned to aspects of a mapping and multiplied. We conjecture that these will be A-, N- and P-dualisable to distance. This is because, under a logarithmic mapping, it seems such stochastic variants can be exactly simulated by a similarity as we have defined it. In the resulting table, all $C^\Theta(x,y) \le 0$, allowing the (ii) conversion of Lemma 1 to define a $C^\Delta$ choosing $\delta = 0$. There are also normalised variants, which we have not considered. Throwing the net very much more widely, Chen et al. (2009) study relationships between distance and similarity measures in a very general setting, not restricted to measures based on sequence or tree alignment. Parallel to the well-known axioms of a distance measure, they propose a set of similarity axioms, and they define conversions from similarity to distance and in the other direction, showing that the derived score satisfies the relevant axioms if the score that is input to the conversion does. Their work, however, does not address the question of whether the conversions give N- or P-duals, that is, whether they preserve relevant orderings.
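The logarithmic mapping mentioned above for stochastic similarities can be illustrated with a small sketch. It shows only that a product of probabilities and a sum of log-probabilities order mappings identically, and that every log-probability is at most 0; it does not implement the conversion of Lemma 1. The probability values and function names are hypothetical, and probabilities are assumed to lie in (0, 1].

```python
import math

def stochastic_score(probs):
    """Stochastic similarity of one mapping: the product of its probabilities."""
    return math.prod(probs)

def log_similarity(probs):
    """Additive counterpart: the sum of log-probabilities.
    Each term is <= 0 because each probability is assumed to be in (0, 1]."""
    return sum(math.log(p) for p in probs)

# Hypothetical probabilities attached to the parts of two alternative mappings.
MAPPING_A = [0.9, 0.5, 0.8]
MAPPING_B = [0.6, 0.6, 0.7]

if __name__ == "__main__":
    for name, probs in (("A", MAPPING_A), ("B", MAPPING_B)):
        print(name, stochastic_score(probs), log_similarity(probs))
```

Since the logarithm is strictly increasing, whichever mapping has the larger product also has the larger sum of logarithms, which is the sense in which the additive score simulates the stochastic one in this sketch.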
Concerning directions for further work, the empirical investigation in section 4 was quite preliminary. For the Kendall-tau comparison of distance and similarity neighbourhoods, we looked at just one particular baseline distance and one particular baseline similarity, and compared only to A-duals as given by Lemma 1, so clearly there are other possibilities one could consider here. One is Spiro and Macura’s relation in (5). The Appendix notes some further A-dualizing conversions, from distance to similarity and from similarity to distance, which might be considered. We also applied the Kendall-tau comparison only to full rankings, and it would be of interest to look also at top-k rankings, as has been done for vector- and set-based measures (Lesot and Rifqi, 2010).
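A Kendall-tau comparison of the kind referred to above, together with one possible top-k restriction, might be sketched as follows. The tau-b form with tie correction is used here. The top-k variant simply restricts the comparison to the k items ranked highest by similarity, which is an assumption made for illustration and not necessarily the measure of Lesot and Rifqi (2010). The scores and function names are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau_b(xs, ys):
    """Kendall tau-b between two equal-length score lists, with tie correction."""
    n = len(xs)
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one list is constant")
    return (concordant - discordant) / denom

def neighbourhood_tau(dists, sims):
    """Agreement between a distance ordering and a similarity ordering.
    Distances are negated so that 'closer' means 'larger' in both lists."""
    return kendall_tau_b([-d for d in dists], sims)

def top_k_tau(dists, sims, k):
    """Hypothetical top-k variant: compare only the k most similar items."""
    top = sorted(range(len(sims)), key=lambda i: -sims[i])[:k]
    return neighbourhood_tau([dists[i] for i in top], [sims[i] for i in top])

# Hypothetical scores of five neighbours of one query item.
DISTS = [1.0, 2.0, 3.0, 4.0, 5.0]
SIMS = [9.0, 8.0, 5.0, 6.0, 1.0]

if __name__ == "__main__":
    print(neighbourhood_tau(DISTS, SIMS), top_k_tau(DISTS, SIMS, 2))
```

In this example the two orderings disagree on a single pair of neighbours outside the top two, so the full-ranking score is below 1 while the top-2 score is exactly 1.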
ACKNOWLEDGEMENTS
This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at Trinity College Dublin.
REFERENCES
Alves, C. E. R., Cáceres, E. N., and Dehne, F. (2002). Parallel dynamic programming for solving the string editing problem on a CGM/BSP. In Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, SPAA ’02, pages 275–281. ACM.
Batagelj, V. and Bren, M. (1995). Comparing resemblance measures. Journal of Classification, 12(1):73–90.
Bernard, M., Boyer, L., Habrard, A., and Sebban, M. (2008). Learning probabilistic models of tree edit distance. Pattern Recogn., 41(8):2611–2629.
Bose, R. P. J. C. and van der Aalst, W. M. P. (2009). Context aware trace clustering: Towards improving process mining results. In SIAM International Conference on Data Mining, SDM, pages 401–412.
Chen, S., Ma, B., and Zhang, K. (2009). On the similarity metric and the distance metric. Theoretical Computer Science, 410(24-25):2365–2376.
Emms, M. (2010). Trainable tree distance and an application to question categorisation. In KONVENS 2010.
Emms, M. and Franco-Penya, H. (2011). Data-set used in Kendall-Tau experiments. www.scss.tcd.ie/Martin.Emms/SimVsDistData.
Gusfield, D. (1997). Algorithms on strings, trees, and sequences. Cambridge Univ. Press.
Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Meyers, A., Nivre, J., Surdeanu, M., Xue, N., and Zhang, Y. (2009). The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL-2009).
Herrbach, C., Denise, A., Dulucq, S., and Touzet, H. (2006). Alignment of RNA secondary structures using a full set of operations. Technical Report 145, LRI.
Kendall, M. G. (1945). The treatment of ties in ranking problems. Biometrika, 33(3):239–251.
Kondrak, G. (2003). Phonetic alignment and similarity. Computers and the Humanities, 37.
Kuboyama, T. (2007). Matching and Learning in Trees. PhD thesis, Graduate School of Engineering, University of Tokyo.
Lesot, M.-J. and Rifqi, M. (2010). Order-based equivalence degrees for similarity and distance measures. In Proceedings of the Computational intelligence