methods, we also use linear regression and PCA as
benchmark models to complete the comparison.
The remainder of this paper is organized as follows.
Section 2 reviews related work on each category of
star type prediction. Section 3 describes our methods
in detail, including our reasons for choosing them and
the theory behind them. Section 4 presents and
analyzes the experimental results. Finally, Section 5
concludes the study, and the references are listed at
the end.
2 RELATED WORK
Early work used a single parameter to classify stars.
Michael and Meghar first classified objects by
differences in quality, grouping them by order of
magnitude, which is a very traditional approach
(Swedenborg, 1973; See, 1909). Fischer et al. then
classified stars of different densities by taking
differences in composition into account, but the
result was still imperfect (Fischer et al., 2014). After
that, Chen and Kipping analyzed the mass-radius
relationship of planets and classified them accordingly
(Chen and Kipping, 2017). Furthermore, Marley et al.
proposed a composition-based classification method
through the study of spectra (Marley et al., 1999).
However, none of these methods produced very good
results.
Classifying by a single variable is not as accurate as
considering several variables at the same time. Many
later researchers therefore considered multiple
variables simultaneously in the problem of star
classification.
Many factors need to be considered when predicting
star type, and the more factors are considered, the
easier it is to conduct the research. First, Stern and
Levinson proposed a classification method based on
quality and composition in 2002. They also showed
that such classification methods are imperfect and
proposed seven requirements that a classification
framework should meet. After that, the classification
method introduced by Russell took into account three
properties of a planet: composition, mass, and orbit.
The combination of these three aspects determines
the final classification. Later, the classification
framework in FANDOM's introduction to planet
classification took into account the planet's mass,
orbit, surface state, and composition.
Various solutions have also been used for planet type
classification and prediction in the past, including
machine learning methods and deep learning models.
First, Dieleman et al. trained convolutional neural
networks on galaxy images and built a model that
achieves fine-grained galaxy morphology
classification with very high accuracy (Dieleman et
al., 2015). Second, Huertas-Company et al. used deep
convolutional neural networks to morphologically
classify a catalog of 50,000 galaxies in the H-band
(Huertas-Company, 2015). Kim and Brunner trained a
deep CNN to build image classification models for
star-galaxy classification (Kim and Brunner, 2017).
Domínguez Sánchez et al. then used convolutional
neural networks to provide two classifications:
Hubble sequence T-type and Galaxy Zoo 2
morphological classification (Domínguez Sánchez et
al., 2018). Lukic and Brüggen also applied deep
neural networks to train classification models on data
sets (Lukic, 2017). Moreover, Aniyan and Thorat used
a convolutional neural network, improving on other
models, for morphological classification (Aniyan and
Thorat, 2017).
In this research, we first conduct exploratory data
analysis on the dataset and preprocess the input data.
We then construct and train several machine learning
models, including Linear Regression (LR), Principal
Component Analysis (PCA), Linear Support Vector
Machine (SVM), Random Forest (RF), XGBoost
Regression (XGB), and Artificial Neural Network
(ANN), and obtain their results for further analysis.
Figure 1 shows the workflow of this study.
Figure 1: Research Workflow (Picture credit: Original).
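The workflow described above (preprocess the data, train
several models, compare their results) can be sketched in a
few lines. This is only an illustrative, dependency-free
sketch and not the implementation used in this study: the
data are synthetic, all function and class names are
hypothetical, and a simple nearest-centroid classifier stands
in for the LR, SVM, RF, XGB and ANN models, which would
normally come from a machine learning library.

```python
import random
import statistics


def make_toy_data(n_per_class=50, seed=0):
    """Synthetic two-feature, two-class data (NOT the paper's dataset)."""
    rng = random.Random(seed)
    centers = {0: (3000.0, 1.0), 1: (9000.0, 5.0)}  # arbitrary toy values
    X, y = [], []
    for label, (c1, c2) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(c1, 500.0), rng.gauss(c2, 0.5)])
            y.append(label)
    return X, y


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the indices and split them into train and test parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def fit_scaler(rows):
    """Per-feature mean and standard deviation, from training rows only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against 0
    return means, stds


def apply_scaler(rows, scaler):
    """Standardize each feature to zero mean and unit variance."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]


class NearestCentroid:
    """Stand-in model: predict the class whose centroid is closest."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids = {
            label: [statistics.fmean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def closest(row):
            return min(
                self.centroids,
                key=lambda lab: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids[lab])
                ),
            )
        return [closest(row) for row in X]


def run_workflow(models, seed=0):
    """Preprocess, train every model, and return test accuracy per model."""
    X, y = make_toy_data(seed=seed)
    X_tr, y_tr, X_te, y_te = train_test_split(X, y, seed=seed)
    scaler = fit_scaler(X_tr)
    X_tr, X_te = apply_scaler(X_tr, scaler), apply_scaler(X_te, scaler)
    scores = {}
    for name, model in models.items():
        preds = model.fit(X_tr, y_tr).predict(X_te)
        scores[name] = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
    return scores


scores = run_workflow({"nearest_centroid": NearestCentroid()})
print(scores)
```

Any model object exposing `fit` and `predict` in the same way
could be added to the dictionary passed to `run_workflow`, so
that all models are compared on the same split and the same
preprocessing.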
3 METHOD
3.1 Exploratory Data Analysis
The first part, exploratory data analysis, aims to
provide valuable insights into the data set. The
analysis covers data distributions,