5 FUTURE RESEARCH
It was shown in (Coifman and Lafon, 2006) that any
positive semi-definite kernel may be used for dimensionality
reduction. A rigorous analysis of families of kernels,
facilitating the derivation of an optimal kernel for a
given dataset Γ, remains an open problem.
The parameter η(δ) determines the dimensionality
of the diffusion space. A rigorous method for
choosing η(δ) would facilitate an automatic embedding
of the data. Naturally, η(δ) is data driven (similarly
to ε), i.e. it depends on the dataset Γ at hand.
Finally, various applications of the diffusion bases
scheme are currently being investigated by the authors,
namely video segmentation and the construction of
ensembles of classifiers.
REFERENCES
Bourgain, J. (1985). On Lipschitz embedding of finite metric
spaces in Hilbert space. Israel Journal of Mathematics,
52:46–52.
Candes, E., Romberg, J., and Tao, T. (2006). Robust
uncertainty principles: Exact signal reconstruction
from highly incomplete frequency information. IEEE
Transactions on Information Theory, 52(2):489–509.
Chung, F. R. K. (1997). Spectral Graph Theory. AMS
Regional Conference Series in Mathematics, 92.
Coifman, R. R. and Lafon, S. (2006). Diffusion maps. Ap-
plied and Computational Harmonic Analysis: special
issue on Diffusion Maps and Wavelets, 21:5–30.
Coifman, R. R., Lafon, S., Lee, A., Maggioni, M., Nadler,
B., Warner, F., and Zucker, S. (2005). Geometric diffusions
as a tool for harmonic analysis and structure
definition of data: Diffusion maps. In Proceedings of
the National Academy of Sciences, volume 102, pages
7432–7437.
Donoho, D. (2006). Compressed sensing. IEEE Transac-
tions on Information Theory, 52(4):1289–1306.
Fowlkes, C., Belongie, S., Chung, F., and Malik, J. (2004).
Spectral grouping using the Nyström method. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
26(2):214–225.
Hein, M. and Audibert, Y. (2005). Intrinsic dimensionality
estimation of submanifolds in Euclidean space. In
Proceedings of the 22nd International Conference on
Machine Learning, pages 289–296.
Johnson, W. B. and Lindenstrauss, J. (1984). Extensions
of Lipschitz mappings into a Hilbert space. Contemporary
Mathematics, 26:189–206.
Lafon, S., Keller, Y., and Coifman, R. R. (2006). Data fusion
and multi-cue data matching by diffusion maps. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
28(11):1784–1797.
Linial, M., Linial, N., Tishby, N., and Yona, G. (1997).
Global self-organization of all known protein se-
quences reveals inherent biological signatures. Jour-
nal of Molecular Biology, 268(2):539–556.
Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multi-
variate Analysis. Academic Press, London.
Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimension-
ality reduction by locally linear embedding. Science,
290:2323–2326.
Schclar, A., Averbuch, A., Hochman, K., Rabin, N., and
Zheludev, V. (2010). A diffusion framework for detection
of moving vehicles. Digital Signal Processing,
20(1):111–122.
Schclar, A. and Rokach, L. (2009). Random projection
ensemble classifiers. In Proceedings of the 11th International
Conference on Enterprise Information Systems (ICEIS 2009),
Lecture Notes in Business Information Processing.
Schclar, A., Rokach, L., and Amit, A. (2012). Diffusion
ensemble classifiers. In Proceedings of the 4th Inter-
national Conference on Neural Computation Theory
and Applications (NCTA 2012), Barcelona, Spain.
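Singer, A. (2006). From graph to manifold Laplacian: The
convergence rate. Applied and Computational Harmonic
Analysis, 21(1):128–134.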
Tenenbaum, J. B., de Silva, V., and Langford, J. C. (2000).
A global geometric framework for nonlinear dimen-
sionality reduction. Science, 290:2319–2323.
APPENDIX: CHOOSING ε
The choice of ε is critical for achieving optimal performance
of the DM and DB algorithms, since it defines
the size of the local neighborhood of each point.
On one hand, a large ε produces a coarse analysis
of the data as the neighborhood of each point will
contain a large number of points. In this case, the
similarity weight will be close to one for most pairs
of points. On the other hand, a small ε might pro-
duce neighborhoods that contain only one point. In
this case, the similarity will be zero for most pairs of
points. Clearly, an adequate choice of ε lies between
these two extreme cases and should be derived from
the data.
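Before turning to the derivation, we note, as a concrete illustration, two common data-driven heuristics for choosing ε under a Gaussian weight function: the max-min rule, which selects the smallest ε for which every point has at least one neighbor with a non-negligible weight, and the coarser median rule. The sketch below is our own illustration (the function name epsilon_heuristics and the use of NumPy are our choices), not the derivation that follows.

import numpy as np

def epsilon_heuristics(X):
    # X: (m, n) array whose rows are the points of the dataset Gamma.
    sq_norms = (X ** 2).sum(axis=1)
    # Squared Euclidean distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>.
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(sq_dists, np.inf)  # exclude self-distances
    # Max-min rule: the smallest eps for which every point has at least one
    # neighbor whose Gaussian weight exp(-||x - y||^2 / eps) is non-negligible.
    eps_maxmin = sq_dists.min(axis=1).max()
    # Median rule: coarser, but robust to outliers.
    eps_median = np.median(sq_dists[np.isfinite(sq_dists)])
    return eps_maxmin, eps_median

Either value lies between the two extremes described above: large enough that no point is isolated, yet small enough that the weights do not saturate to one.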
In the following, we derive the range from which ε
should be chosen when a Gaussian weight function is
used and the dataset Γ lies near a low-dimensional
manifold $\mathcal{M}$. We denote by d the intrinsic
dimension of $\mathcal{M}$. Let $L = I - P = I - D^{-1}W$ be the
normalized graph Laplacian (Chung, 1997), where P
was defined in Eq. (4) and I is the identity matrix.
The matrices L and P share the same eigenvectors.
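For concreteness, the following minimal sketch builds these matrices, assuming the Gaussian weight function $w_\varepsilon(x_i, x_j) = \exp(-\|x_i - x_j\|^2/\varepsilon)$; the helper name normalized_graph_laplacian is our own choice.

import numpy as np

def normalized_graph_laplacian(X, eps):
    # Returns W, P = D^{-1} W and L = I - P for the rows of X.
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-sq_dists / eps)           # symmetric Gaussian affinities
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic Markov matrix P = D^{-1} W
    L = np.eye(len(X)) - P                # normalized graph Laplacian L = I - P
    return W, P, L

Indeed, since $L = I - P$, the shared eigenvectors are immediate, with eigenvalues related by $\lambda_L = 1 - \lambda_P$.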
Furthermore, Singer (2006) proved that if the points
in Γ are independently and uniformly distributed over
$\mathcal{M}$, then with high probability
$$\frac{1}{\varepsilon}\sum_{j=1}^{m} L_{ij}\, f(x_j) = \frac{1}{2}\,\Delta_{\mathcal{M}} f(x_i) + O\!\left(\frac{1}{m^{1/2}\,\varepsilon^{1/2+d/4}},\; \varepsilon\right) \qquad (7)$$
where $f : \mathcal{M} \to \mathbb{R}$ is a smooth function and $\Delta_{\mathcal{M}}$ is the
continuous Laplace-Beltrami operator of the manifold $\mathcal{M}$.
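As a numerical sanity check of Eq. (7), the following sketch, under assumptions of our own, takes $\mathcal{M}$ to be the unit circle (so d = 1) and f(θ) = sin θ, for which $\Delta_{\mathcal{M}} f = f$ under the positive-semidefinite sign convention; we use the kernel exp(−||x − y||²/(2ε)), the normalization under which the one-dimensional Taylor expansion of P reproduces the constant 1/2 in Eq. (7). The deviation shrinks as m grows with ε kept in the intermediate regime discussed above.

import numpy as np

rng = np.random.default_rng(0)
m, eps = 2000, 0.05

# m points drawn independently and uniformly from the unit circle (d = 1).
theta = rng.uniform(0.0, 2.0 * np.pi, m)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Gaussian affinities; with the kernel exp(-||x - y||^2 / eps) instead of
# exp(-||x - y||^2 / (2 * eps)), the constant in Eq. (7) would be 1/4, not 1/2.
sq_norms = (X ** 2).sum(axis=1)
sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
W = np.exp(-sq_dists / (2.0 * eps))
P = W / W.sum(axis=1, keepdims=True)  # P = D^{-1} W
L = np.eye(m) - P                     # L = I - P

f = np.sin(theta)                     # smooth test function on M
lhs = (L @ f) / eps                   # left-hand side of Eq. (7)
rhs = 0.5 * np.sin(theta)             # (1/2) Delta_M f, since Delta_M sin = sin
print("mean abs deviation:", np.abs(lhs - rhs).mean())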