NEIGHBORHOOD FUNCTION DESIGN FOR EMBEDDING IN
REDUCED DIMENSION
Jiun-Wei Liou and Cheng-Yuan Liou
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Keywords:
Dimension reduction, Local linear embedding, K-nearest neighbors, Epsilon distance.
Abstract:
LLE (local linear embedding) is a widely used approach for dimension reduction. Neighborhood selection
is an important issue for LLE. This paper introduces the ε-distance approach and a slightly modified version
of the k-nn method. Different types of datasets call for different approaches in order to have a higher
chance of obtaining a better representation. For some datasets with complex structure, the proposed ε-distance
approach can obtain better representations. The different neighborhood selection approaches are compared by
applying them to different kinds of datasets.
1 INTRODUCTION
LLE (Roweis and Saul, 2000) is a well-known approach
for showing the structure of high-dimensional data
within low-dimensional embeddings. The first step of
the LLE algorithm is to find the neighborhood of every
point. Traditionally, the k-nearest neighbor (k-nn)
approach is the most widely used one. It has several
advantages: it is easy to implement, it suits most
cases in which the dataset is distributed uniformly
enough and has no complex structure, and it is fast
and can be parallelized and further accelerated
(Yeh et al., 2010).
For some other types of datasets, however, the k-nn
approach runs into difficulty. Because the number of
selected nearest neighbors must be a single fixed
integer over the full dataset, the possible LLE
embeddings are limited when the dataset is not very
large but contains complex structure. If k is small,
the structure is hard to extract; if k is large, the
complex structure may be destroyed by erroneous
connections from one sub-structure to another. The
selection of k may also be a problem under non-uniform
sampling. For these kinds of problems, the ε-distance
approach is suggested as an attempt to get better
embeddings.
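The contrast between the two selection rules can be illustrated with a minimal sketch. It assumes that an ε-distance neighborhood means "all other points within Euclidean distance ε", which is the usual reading but is not yet defined at this point in the paper. The function names `knn_neighbors` and `epsilon_neighbors` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_neighbors(X, k):
    """Indices of the k nearest other points for each point (fixed count)."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)           # a point is not its own neighbor
    order = np.argsort(D, axis=1)[:, :k]
    return [[int(j) for j in row] for row in order]

def epsilon_neighbors(X, eps):
    """Indices of all other points within distance eps (variable count).

    Assumption: this is the plain 'ball of radius eps' rule, not
    necessarily the exact variant proposed in the paper.
    """
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)
    return [[int(j) for j in np.flatnonzero(row <= eps)] for row in D]

# Toy data: two well-separated groups on a line, sampled non-uniformly.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])

knn = knn_neighbors(X, k=3)
eps_nb = epsilon_neighbors(X, eps=0.5)

print("k-nn neighbors:     ", knn)
print("epsilon neighbors:  ", eps_nb)
```

In this toy case, k = 3 forces the points at 5.0 and 5.1 to take neighbors from the other group, which is the kind of erroneous connection described above, while the ε rule keeps the two groups separate at the cost of an unequal number of neighbors per point.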
There have already been attempts to modify the
neighborhood function, such as weighted neighborhoods
(Chang and Yeung, 2006; Pan et al., 2009; Wen et al.,
2009; Zuo et al., 2008), clustering approaches (Wen
et al., 2006), and the inclusion of k-means (Wei et
al., 2010; Wen et al., 2006). However, these modified
approaches are mostly analyzed on, and based on, the
original k-nn method only. In this paper, the
ε-distance is the main consideration, as a method
conceptually different from k-nn, to try to deal with
more complex datasets.
Changing the neighborhood selection approach does not
significantly affect the weight computation of the
original LLE algorithm. The step of finding the
minimum eigenvalues, however, needs some modification:
since the neighborhood selection is no longer balanced
across all points, the matrix is more likely to have
additional zero eigenvalues, so the original way of
finding the smallest eigenvalues may not work
properly. The modification details for finding the
minimal eigenvalues will be discussed later.
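One way to see why extra zero eigenvalues can appear is sketched below. It is an illustration only, not the modification proposed in the paper. It builds the standard LLE matrix M = (I - W)^T (I - W) from a neighbor list, using uniform weights as a stand-in for the real reconstruction weights (an assumption; the argument only needs each row of W to sum to one). The helper names `lle_matrix` and `count_zero_eigenvalues` and the two neighbor lists are hypothetical.

```python
import numpy as np

def lle_matrix(neighbors):
    """Build M = (I - W)^T (I - W) from a list of neighbor index lists.

    Assumption: uniform weights 1/len(neighbors) replace the actual LLE
    reconstruction weights; both satisfy the sum-to-one constraint.
    """
    n = len(neighbors)
    W = np.zeros((n, n))
    for i, nbrs in enumerate(neighbors):
        if nbrs:                          # a point may have no neighbors
            W[i, nbrs] = 1.0 / len(nbrs)
    I_minus_W = np.eye(n) - W
    return I_minus_W.T @ I_minus_W

def count_zero_eigenvalues(M, tol=1e-10):
    """Number of eigenvalues of the symmetric matrix M below tol."""
    return int(np.sum(np.linalg.eigvalsh(M) < tol))

# Five points; the k-nn lists (k = 3) link the two groups {0,1,2} and {3,4},
# while the epsilon-style lists leave the groups disconnected.
knn_lists = [[1, 2, 3], [0, 2, 3], [1, 0, 3], [4, 2, 1], [3, 2, 1]]
eps_lists = [[1, 2], [0, 2], [0, 1], [4], [3]]

zeros_knn = count_zero_eigenvalues(lle_matrix(knn_lists))
zeros_eps = count_zero_eigenvalues(lle_matrix(eps_lists))

print("zero eigenvalues with k-nn lists:   ", zeros_knn)
print("zero eigenvalues with epsilon lists:", zeros_eps)
```

Because every row of W sums to one, the indicator vector of each disconnected group lies in the null space of I - W, so a neighborhood graph that splits into several groups yields at least that many zero eigenvalues. A procedure that discards only the single smallest eigenvector would then pick up another degenerate direction instead of a useful embedding coordinate.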
The rest of the paper is organized as follows.
Section 2 introduces the details of the ε-distance
approach. Section 3 discusses the experiments on
different sets of data; before the experiments, some
further details and minor modifications for LLE are
addressed. Section 4 gives final thoughts on the
comparison.
2 METHOD
2.1 Neighborhood Selection
In this paper, we focus on the nearest neighbor
approaches used in LLE. The original approach used in
LLE is k-nearest neighbors, which just looks into full
Liou J. and Liou C..
NEIGHBORHOOD FUNCTION DESIGN FOR EMBEDDING IN REDUCED DIMENSION.
DOI: 10.5220/0003681201900195
In Proceedings of the International Conference on Neural Computation Theory and Applications (NCTA-2011), pages 190-195
ISBN: 978-989-8425-84-3
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)