Figure 6: Relationship between data density and success
rate of estimating the 2nd Betti number.
in Figure 1, using the conventional method. The ma-
jor radius and the minor radius of the torus were 2.5
and 1, respectively. We then calculated, for each trial, the
fraction of 100 data sets for which the conventional method
correctly estimated the 2nd Betti number of the underlying
manifold. Varying the number of data points per data set from
300 to 350 in steps of 10, we conducted this trial five times
at each data density. Each black circle in Figure 6 indicates
the success rate of one trial. For example, when
data density is 31/π², that is, when 310 data points lie on the
torus, the conventional method estimated correctly for 32%,
35%, 38%, 40%, and 39% of the 100 data sets in the first
through fifth trials, respectively. The solid black line
connects the mean success rates at each data density, and the
black dashed lines connect the mean plus and the mean minus
one standard deviation of the success rates.
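The density value in this example is consistent with dividing the number of points by the surface area of the torus, 4π²Rr = 10π² for R = 2.5 and r = 1. The following sketch checks that arithmetic and computes the mean and standard deviation of the five success rates quoted above. Whether the dashed lines in Figure 6 use the population or the sample standard deviation is not stated, so the population form used here is an assumption, and the function name `torus_density` is illustrative.

```python
import math
import statistics

MAJOR_RADIUS = 2.5  # R, from the text
MINOR_RADIUS = 1.0  # r, from the text


def torus_density(n_points, major=MAJOR_RADIUS, minor=MINOR_RADIUS):
    """Points per unit surface area of a torus with area 4 * pi^2 * R * r."""
    return n_points / (4.0 * math.pi ** 2 * major * minor)


# Success rates (%) of the five trials at 310 points, quoted in the text.
rates_310 = [32, 35, 38, 40, 39]

density = torus_density(310)            # equals 31 / pi^2
mean_rate = statistics.mean(rates_310)  # 36.8
# Assumption: population standard deviation; the source does not say which.
std_rate = statistics.pstdev(rates_310)

print(f"density * pi^2 = {density * math.pi ** 2:.1f}")
print(f"mean = {mean_rate:.1f}%")
print(f"band = [{mean_rate - std_rate:.1f}%, {mean_rate + std_rate:.1f}%]")
```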
Figure 6 shows that estimation accuracy gets worse as data
density decreases. When data density is low, data points are
distributed sparsely. The 3-dimensional balls centered at the
data points therefore take longer to intersect one another;
that is, birth times of 2-dimensional holes become later when
data density is low. On the other hand, death times of
2-dimensional holes are about the same as when data density is
high. When birth times are delayed, the persistence of
2-dimensional holes becomes smaller and detecting
2-dimensional cycles becomes more difficult.
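To make the argument concrete, the persistence of a hole is its death time minus its birth time. The sketch below uses made-up birth and death values (not measurements from the experiment) to show how a later birth with an unchanged death shrinks persistence.

```python
def persistence(birth, death):
    """Lifetime of a topological feature in a filtration."""
    return death - birth


# Hypothetical values for illustration only; not taken from the paper.
death_time = 1.0                      # assumed similar at both densities
birth_dense, birth_sparse = 0.4, 0.8  # sparse data: balls meet later

p_dense = persistence(birth_dense, death_time)
p_sparse = persistence(birth_sparse, death_time)

# A smaller persistence is harder to separate from noise.
print(p_dense > p_sparse)
```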
Figure 7: Sequence of proposed method.
3 PROPOSED METHOD
3.1 Method to Estimate the Betti Number using Interpolation
The estimation accuracy of 2nd Betti numbers using
the conventional method gets worse as data density
decreases. Late birth times of 2-dimensional holes make the
persistence of 2-dimensional cycles smaller and make it more
difficult to discriminate between cycles and noise. This
difficulty in detecting cycles causes the poor estimation
accuracy of 2nd Betti numbers. To solve this problem, we
propose a method to estimate 2nd Betti numbers using
interpolation that adds points in sparse areas on an
underlying manifold. Using interpolation to make the detection
of cycles easier improves the estimation accuracy of 2nd Betti
numbers.
In our proposed method, we add points that lie close to both a
tangent space of the underlying manifold and its point of
tangency. According to Zomorodian (2005), intuitively, a
manifold is a topological space that locally looks like R^n.
We approximate a tangent space using this property, namely
that an N-dimensional manifold is locally similar to
N-dimensional Euclidean space. Points within a tiny range on
an N-dimensional underlying manifold can be regarded as lying
in an N-dimensional space, and this space can in turn be
regarded as approximating a tangent space. Therefore, we
approximate tangent spaces using points within a tiny range on
an underlying manifold. Adding points in the approximated
tangent spaces increases data density while retaining the
topological features of the underlying manifold. In our
proposed method, we use principal component analysis (PCA) to
approximate tangent spaces.
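The idea of approximating a tangent space by PCA on nearby points can be sketched as follows. This is a minimal illustration under stated assumptions, not the interpolation procedure of the proposed method: the neighborhood is taken to be the k nearest neighbors, new points are drawn uniformly in a small square in the tangent plane, and the helper names (`sample_torus`, `tangent_basis`, `add_points_near`) and parameter values are hypothetical.

```python
import numpy as np


def sample_torus(n, major=2.5, minor=1.0, rng=None):
    """Sample n points on a torus in R^3 (uniform in angle, not in area)."""
    rng = np.random.default_rng(rng)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)  # angle around the main axis
    phi = rng.uniform(0.0, 2.0 * np.pi, n)    # angle around the tube
    x = (major + minor * np.cos(phi)) * np.cos(theta)
    y = (major + minor * np.cos(phi)) * np.sin(theta)
    z = minor * np.sin(phi)
    return np.column_stack([x, y, z])


def tangent_basis(points, index, k=10, dim=2):
    """Approximate the tangent space at points[index] by PCA on its
    k nearest neighbours (assumption: k-NN defines the 'tiny range')."""
    dists = np.linalg.norm(points - points[index], axis=1)
    neighbours = points[np.argsort(dists)[: k + 1]]  # includes the point itself
    centred = neighbours - neighbours.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:dim]


def add_points_near(points, index, n_new=3, radius=0.2, k=10, rng=None):
    """Hypothetical helper: place n_new points in the approximated
    tangent plane, close to the point of tangency points[index]."""
    rng = np.random.default_rng(rng)
    basis = tangent_basis(points, index, k=k)
    coeffs = rng.uniform(-radius, radius, size=(n_new, basis.shape[0]))
    return points[index] + coeffs @ basis


cloud = sample_torus(300, rng=0)
basis0 = tangent_basis(cloud, 0)
new_points = add_points_near(cloud, index=0, rng=1)
print(new_points.shape)  # (3, 3)
```

Keeping `radius` small relative to the minor radius keeps the added points near the surface, since the tangent plane departs from the manifold as the distance from the point of tangency grows.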
However, if too many points are added, the computation time of
persistent homology becomes too long for estimating the Betti
numbers of an underlying manifold in most practical
applications. Therefore, as far as possible, we add points
only in comparatively sparse areas of the underlying manifold.
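One way to restrict interpolation to sparse areas is to score each point by the distance to its k-th nearest neighbor and interpolate only around the highest-scoring points. The sketch below illustrates that idea on a toy point cloud; the scoring rule, the `fraction` threshold, and the function names are assumptions for illustration and are not taken from the proposed method.

```python
import numpy as np


def kth_neighbour_distance(points, k=5):
    """Distance from each point to its k-th nearest neighbour
    (a larger value suggests a sparser neighbourhood)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Column 0 of each sorted row is the point itself (distance 0).
    return np.sort(dists, axis=1)[:, k]


def sparse_indices(points, k=5, fraction=0.2):
    """Indices of the `fraction` of points with the largest
    k-th neighbour distance (assumed sparsity criterion)."""
    scores = kth_neighbour_distance(points, k)
    n_select = max(1, int(round(fraction * len(points))))
    return np.argsort(scores)[-n_select:]


# Toy data: 40 tightly packed points and 10 widely spread points.
rng = np.random.default_rng(0)
dense_part = rng.normal(0.0, 0.05, size=(40, 3))
sparse_part = rng.normal(0.0, 1.0, size=(10, 3)) + 10.0
toy_cloud = np.vstack([dense_part, sparse_part])

picked = sparse_indices(toy_cloud, k=5, fraction=0.2)
print(sorted(picked.tolist()))
```

The pairwise distance matrix costs O(n²) memory, which is acceptable for a few hundred points as in the experiment above; a spatial index would be preferable for larger data sets.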
The proposed method then applies the conventional method to
the data set whose density has been increased. Figure 7 shows
the sequence of the proposed method. We describe the
interpolation method employed in the proposed method in
Sec. 3.2.
ICAART 2020 - 12th International Conference on Agents and Artificial Intelligence