section, we review results of random matrix theory related to eigenvalue distributions.
The formulation of this paper is C = X^T X / n,
where X is an n × p random data matrix. The eigenvalues
of the matrix C are denoted λ_k (k = 1, ..., p),
ranked in descending order. Based on Random
Matrix Theory, the asymptotic eigenvalue distribution
can be calculated by letting the matrix size grow to
infinity.
In general, the eigenvalue distribution of random data
agrees with the distribution predicted by Random
Matrix Theory, which is called the
Marchenko-Pastur (MP) distribution. Under the
condition that n, p → ∞ with p/n → c, the
asymptotic distribution of eigenvalues of a matrix with random
entries, P(λ), is described as follows:
P(λ) = ((λ_p − λ)(λ − λ_n))^0.5 / (2πcλ)    (1)

where λ_p = (1 + c^0.5)^2 and λ_n = (1 − c^0.5)^2. As an
approximation in real cases, it can be applied to finite
but large covariance matrices.
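As a minimal numerical sketch of equation (1) (with assumed parameters, not the paper's data), the eigenvalues of a purely random C = X^T X / n can be compared against the MP bulk:

```python
import numpy as np

def mp_density(lam, c):
    # MP density (1): sqrt((lam_p - lam)(lam - lam_n)) / (2 pi c lam) on the bulk
    lam_p = (1 + np.sqrt(c)) ** 2
    lam_n = (1 - np.sqrt(c)) ** 2
    lam = np.asarray(lam, dtype=float)
    dens = np.zeros_like(lam)
    inside = (lam > lam_n) & (lam < lam_p)
    dens[inside] = np.sqrt((lam_p - lam[inside]) * (lam[inside] - lam_n)) / (
        2 * np.pi * c * lam[inside])
    return dens

rng = np.random.default_rng(0)
n, p = 2000, 500                      # aspect ratio c = p/n = 0.25 (illustrative choice)
X = rng.standard_normal((n, p))
C = X.T @ X / n
eigvals = np.linalg.eigvalsh(C)

c = p / n
lam_p = (1 + np.sqrt(c)) ** 2         # upper edge of the MP bulk
grid = np.linspace(0.0, 3.0, 30001)
mass = mp_density(grid, c).sum() * (grid[1] - grid[0])   # density should integrate to ~1
print(round(mass, 2), bool(eigvals.max() <= lam_p * 1.05))
```

For purely random entries, all sample eigenvalues fall (up to finite-size fluctuations) inside [λ_n, λ_p], which is what the check against the upper edge illustrates.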
As seen in Figure 1, the red curve shows the MP
distribution, which fits the histogram of the random
bulk.
Figure 1: Example of the MP distribution. Horizontal axis:
eigenvalue; vertical axis: density.
However, many empirical studies indicate that the
eigenvalue distribution of an actual data matrix has
a dominant set of random eigenvalues (the bulk) and a small
number of large eigenvalues (signals or spikes) that
are not associated with randomness (Plerou 2002,
Baik 2006). This is shown in Figure 1.
Studies of these phenomena
(Martin 2019, Martin 2021) describe that eigenvalue
distributions can be classified into several types,
such as Random-like, Bulk-Spikes, Bulk-Decay,
Heavy-tailed, and so on. Figure 1 is an
example of a Bulk-Decay-Spikes type distribution, with
the random bulk on the left side and other signal
eigenvalues on the right side.
The MP distribution can thus be used as a method to
distinguish whether eigenvalues have random or
signal characteristics. In other words, the
eigenvalues included in the red distribution are assumed
to be random, while the eigenvalues on the right that
are not included in the MP distribution are assumed to have
signal characteristics.
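This separation rule can be sketched with a planted example (illustrative, not the paper's data): two columns are given strong variance ("spikes"), and sample eigenvalues above the MP upper edge λ_p are flagged as signal candidates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 500
c = p / n
lam_p = (1 + np.sqrt(c)) ** 2         # MP upper edge: (1 + sqrt(c))^2

scales = np.ones(p)
scales[:2] = [8.0, 5.0]               # two planted spiked population variances
X = rng.standard_normal((n, p)) * np.sqrt(scales)
C = X.T @ X / n
eigvals = np.linalg.eigvalsh(C)

# eigenvalues beyond the MP edge are treated as signal candidates
signal_candidates = eigvals[eigvals > lam_p]
print(len(signal_candidates))
```

The two planted spikes separate clearly from the bulk; as the text notes, however, eigenvalues only slightly beyond the edge are ambiguous, which is exactly why a sharper discrimination method is needed.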
However, in actual eigenvalue distributions, the
boundary or separation point between the random part
and the signal part is not necessarily clear. Therefore,
an appropriate method for discriminating or extracting
signals from the eigenvalue distribution is important.
Another question is whether all eigenvalues that deviate
to the right of the red MP distribution are really signals.
It is true that such eigenvalues contain properties other
than randomness, but not all of them
necessarily carry important meaning.
Regarding this issue, the author's initial
experiments have confirmed that there are cases in
which the separation based on the MP distribution
and the separation suggested by statistical tests almost
match. In general, however, this may not always be
the case. In this paper, we consider this issue in depth.
3 DISCRIMINATION METHOD
In this paper, a signal discrimination method is
developed that is based on the eigenvalue
distribution of random matrix theory but does not
simply depend on the MP distribution. Focusing in
particular on Bulk-Spikes and Bulk-Decay type
distributions, we investigate a method for discriminating
signals from the eigenvalue distribution of a large
covariance matrix.
First, we perform singular value decomposition of
the data matrix. Next, we consider a method for
identifying signals by reconstructing the data matrix
using only the important singular values and performing
a statistical test on the reconstructed matrix. The final
signal discrimination is determined comprehensively by
combining the indication of the statistical test with other
considerations. This method provides an appropriate
indication of the separation point for signal eigenvalues.
Data matrix reconstruction in the above process
means ‘sparsification’, which extracts and utilizes
only the useful eigenvalues.
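The sparsification step can be sketched as a rank-k truncated SVD reconstruction (a minimal illustration with assumed sizes, not the paper's exact code):

```python
import numpy as np

def reconstruct_top_k(X, k):
    # X = U diag(s) V^T, with s in descending order (numpy's convention);
    # keep only the k largest singular values for the reconstruction
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 40))    # illustrative data matrix
X_k = reconstruct_top_k(X, 5)
print(np.linalg.matrix_rank(X_k))     # reconstruction has rank at most k
```

The reconstructed matrix keeps only the components corresponding to the retained singular values, which is the sense in which the method "extracts and utilizes only the useful eigenvalues."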
[Discrimination method]
Set: Target data matrix X.
Process: Singular value decomposition of the
target data matrix, X = U diag(s) V^T, where s is the
list of singular values (in descending order), ‘diag’
denotes a diagonal matrix, U is the matrix of
left-singular vectors, and V is the matrix of
right-singular vectors.
while not appropriate separation do