industry sectors, and the relationship between
industry sectors is changeable at different stages.
Liang Ye (2014) (Liang, 2014) conducted an empirical
study of 32 industry sectors in the Shanghai and
Shenzhen stock markets using multidimensional
scaling, and found that the industry-sector
phenomenon was obvious.
In addition, a few scattered studies hold that other
factors, such as capital flows and hype by major
market players ("bankers"), can also explain the
phenomenon of hot-spot switching and sector rotation.
2.2 Sector Rotation as Investment Strategies
Some studies also analyze sector rotation as an
investment strategy, from both qualitative and
quantitative angles.
The first angle is qualitative. For example, Zhang
Wei (2001) divides stocks into high-priced,
medium-priced and low-priced stocks from the
perspective of technical investment, and holds that
in a rising market the high-priced stocks rise first,
followed by the medium-priced stocks and finally by
the low-priced stock sector.
The second angle is the quantitative analysis of
sector rotation strategies. For example, Huang Yin
(2019) (Huang, 2019) processed China's A-share data
to obtain sector data along different dimensions, and
trained a neural network on each data set to obtain
an optimal quantitative investment strategy based on
a Recurrent Neural Network. Yu Zeqi (2019) (Yu,
2011) studied a quantitative investment strategy for
industry sector rotation based on a regression model.
Taking the market itself as the research object, the
regression model is used to examine quantitatively
whether the sector rotation strategy has real
investment value.
A review of the literature shows that although many
researchers have studied sector rotation, none has
identified sector rotation by building a model from
stock data. This paper therefore builds a model from
a large amount of stock data, examines whether
sectors and sector rotation exist, and uses the data
to illustrate the findings.
3 IDENTIFICATION OF SECTOR
Previous studies assume by default that sectors exist,
but whether sectors really exist has not been verified.
In this section we use stock data, together with
mathematical models and algorithms, to verify that
sectors actually exist.
3.1 Model Preparation
1) Python packages such as Glob, Pandas and
Openpyxl are used to read the file names, filter the
data, and store the rise and fall of each stock in a
new xlsx file.
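The preparation step above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact script used in this paper: it assumes one CSV file per stock, named after the stock code, with hypothetical column names `date` and `close`, and it takes the "rise and fall" to be the daily percentage change of the closing price. The function names are likewise hypothetical.

```python
import glob
import os

import pandas as pd


def collect_daily_changes(pattern, date_col="date", close_col="close"):
    """Build a date-by-stock table of daily percentage changes.

    Assumption: one CSV per stock, file name = stock code, with a date
    column and a closing-price column (names are hypothetical).
    """
    series = {}
    for path in sorted(glob.glob(pattern)):
        # Use the file name without its extension as the stock code.
        code = os.path.splitext(os.path.basename(path))[0]
        prices = pd.read_csv(path, parse_dates=[date_col])
        # Filter: drop rows with a missing close, then order by date.
        prices = prices.dropna(subset=[close_col]).sort_values(date_col)
        close = prices.set_index(date_col)[close_col]
        # Daily rise/fall in percent relative to the previous close.
        series[code] = close.pct_change().dropna() * 100.0
    # Series are aligned on their date index when combined.
    return pd.DataFrame(series)


def save_changes(changes, out_path="changes.xlsx"):
    # pandas delegates .xlsx writing to openpyxl, which must be installed.
    changes.to_excel(out_path, engine="openpyxl")
```

A call such as `save_changes(collect_daily_changes("data/*.csv"))` would then produce one column of daily changes per stock; the directory name here is a placeholder.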
2) The Pearson correlation coefficient measures
whether two data sets lie on a line, that is, it
measures the linear relationship between
interval-scale variables. When both variables are
normally distributed continuous variables and their
relationship is linear, the Pearson correlation
coefficient is commonly used to describe the degree
of correlation between them. It is calculated as
follows:
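$$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2} \cdot \sqrt{\sum_i (Y_i - \bar{Y})^2}} \qquad (1)$$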
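As a minimal sketch, the Pearson coefficient can be computed directly from its definition with NumPy. The function name `pearson_r` is an illustrative choice; in practice `numpy.corrcoef` returns the same value.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations of each observation from its sample mean.
    dx = x - x.mean()
    dy = y - y.mean()
    # Sum of cross-deviations over the product of the root sums of squares.
    denom = np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())
    return float((dx * dy).sum() / denom)
```

Applied to the daily changes of two stocks, a value close to 1 indicates that the two stocks tend to rise and fall together in a nearly linear way.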
3) The Spearman rank correlation coefficient is
used to estimate the correlation between two
variables whose relationship can be described by a
monotonic function. The formula is as follows:
$$\rho = 1 - \frac{6 \sum_i d_i^2}{N (N^2 - 1)} \qquad (2)$$
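The Spearman coefficient can be sketched in the same way, taking d as the difference between the ranks of paired observations and N as the number of pairs. This rank-difference formula is exact only when there are no tied values, so the sketch below assumes no ties; the function name `spearman_rho` is illustrative.

```python
import numpy as np


def spearman_rho(x, y):
    """Spearman rank correlation, assuming no tied values in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # argsort of argsort gives each value's rank (1..N) when there are no ties.
    rank_x = np.argsort(np.argsort(x)) + 1
    rank_y = np.argsort(np.argsort(y)) + 1
    # d: rank difference of each pair.
    d = rank_x - rank_y
    return 1.0 - 6.0 * float((d ** 2).sum()) / (n * (n ** 2 - 1))
```

Because it works on ranks, this measure reaches 1 for any strictly increasing relationship, linear or not, which is the sense in which it captures monotonic dependence.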
4) The K-means clustering algorithm is an
iterative clustering algorithm. It proceeds in the
following steps:
The data are to be divided into K groups. K objects
are randomly selected as the initial cluster centers,
and the distance between each object and each
cluster center is calculated. Each object is then
assigned to its nearest cluster center. A cluster
center together with the objects assigned to it
represents one cluster. Every time a sample is
assigned, the center of its cluster is recalculated
from the objects currently in the cluster. This
process is repeated until a termination condition is
met.
The termination condition can be that no (or only a
minimal number of) objects are reassigned to
different clusters, that no (or only a minimal number
of) cluster centers change, or that the sum of
squared errors reaches a local minimum.
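The clustering procedure can be sketched as below. This is an illustrative implementation rather than the exact code used in this paper, and it makes two simplifying assumptions: cluster centers are recomputed once per pass over all objects (the common batch variant) instead of after every single assignment, and iteration stops when no object changes cluster or when a hypothetical `max_iter` limit is reached. The function name and the `seed` parameter are also illustrative.

```python
import numpy as np


def k_means(points, k, max_iter=100, seed=0):
    """Batch K-means; returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Randomly pick K distinct objects as the initial cluster centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Euclidean distance from every object to every center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Assign each object to its nearest center.
        new_labels = dist.argmin(axis=1)
        # Termination: no object was reassigned to a different cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each center as the mean of the objects assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers
```

For sector identification, each row of `points` could be one stock's vector of daily changes, so that stocks moving together fall into the same cluster; the result depends on the random initial centers, so several seeds are usually worth trying.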