For this problem, we also consider generalised
linear models (GLMs). We fit models with different
families and link functions and compare them to
select the best one. Because a logarithmic link
function is among the links considered, the
HEALTHEXPEND variable is not log-transformed
in this part.
We fit GLMs of LIFEEXP on
PUBLICEDUCATION + HEALTHEXPEND + FERTILITY + REGION.
The results are shown in Table 6 below.
Table 6: Results of the generalized linear models.

EDM               g(μ)       β̂           β̂           β̂           β̂           AIC
Gaussian          Identity   -0.451      0.004       -4.246      -1.369      950.2
Gamma             Identity   -0.608      0.005       -4.043      -1.531      982.9
Inverse Gaussian  Identity   -0.699      0.005       -3.928      -1.625      1004
Gaussian          Log        -5.092e-03  5.255e-05   -7.115e-02  -2.012e-02  954.4
Gamma             Log        -7.395e-03  6.002e-05   -6.988e-02  -2.269e-02  985.6
Inverse Gaussian  Log        -8.779e-03  6.465e-05   -6.884e-02  -2.416e-02  1006
From Table 6, the Gaussian response with the
identity link seems most appropriate, since the
regression parameters are significant for modelling
LIFEEXP and its AIC (950.2) is the smallest among
all the models.
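The AIC comparison can be made explicit with a short script. This is only an illustrative sketch: the AIC values are copied from Table 6, while the names `aic_by_model` and `rank_by_aic` are hypothetical and not part of the original analysis.

```python
# AIC values copied from Table 6, keyed by (family, link).
aic_by_model = {
    ("Gaussian", "Identity"): 950.2,
    ("Gamma", "Identity"): 982.9,
    ("Inverse Gaussian", "Identity"): 1004.0,
    ("Gaussian", "Log"): 954.4,
    ("Gamma", "Log"): 985.6,
    ("Inverse Gaussian", "Log"): 1006.0,
}


def rank_by_aic(aic_table):
    """Return (model, AIC, difference from the best AIC), lowest AIC first."""
    best = min(aic_table.values())
    ranked = sorted(aic_table.items(), key=lambda item: item[1])
    return [(model, aic, aic - best) for model, aic in ranked]


ranking = rank_by_aic(aic_by_model)
for (family, link), aic, delta in ranking:
    print(f"{family:<17} {link:<9} AIC={aic:<7} dAIC={delta:.1f}")
```

The Gaussian model with the log link is the closest competitor, about 4.2 AIC units behind the Gaussian model with the identity link. Every Gamma and Inverse Gaussian model is more than 30 units behind.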
Interestingly, the Gaussian distribution is the
normal distribution, so a Gaussian GLM with the
identity link is equivalent to an ordinary multiple
linear regression. That said, for the above variables
a multiple linear regression model might work better
than the other generalised linear models, as not every
variable is suitable for modelling with a logarithmic
link function. Whether there is a more appropriate
generalised linear model deserves further research
and investigation.
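A Gaussian GLM with the identity link gives the same coefficient estimates as ordinary least squares, which is why a linear model is the natural comparison here. The sketch below illustrates this with a single predictor and iteratively reweighted least squares (IRLS). It is not the code used for Table 6. The data values are made up for illustration (hypothetical fertility and life expectancy pairs), and the function names are assumptions.

```python
import math


def weighted_least_squares(x, z, w):
    """Solve the weighted normal equations for z ~ b0 + b1 * x."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sz = sum(wi * zi for wi, zi in zip(w, z))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    b1 = (sw * sxz - sx * sz) / (sw * sxx - sx * sx)
    b0 = (sz - b1 * sx) / sw
    return b0, b1


def fit_gaussian_glm(x, y, link="identity", n_iter=50):
    """Fit a one-predictor Gaussian GLM by IRLS (variance function V(mu) = 1)."""
    mu = list(y)  # start from the observed responses (must be > 0 for the log link)
    b0 = b1 = 0.0
    for _ in range(n_iter):
        if link == "identity":
            # g'(mu) = 1, so the working response is y and all weights are 1.
            z = list(y)
            w = [1.0] * len(y)
        elif link == "log":
            # g'(mu) = 1/mu, so z = log(mu) + (y - mu)/mu and w = mu^2.
            z = [math.log(m) + (yi - m) / m for yi, m in zip(y, mu)]
            w = [m * m for m in mu]
        else:
            raise ValueError("link must be 'identity' or 'log'")
        b0, b1 = weighted_least_squares(x, z, w)
        eta = [b0 + b1 * xi for xi in x]
        mu = eta if link == "identity" else [math.exp(e) for e in eta]
    return b0, b1


def fit_ols(x, y):
    """Closed-form ordinary least squares for y ~ b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical illustration data, NOT taken from the paper's data set.
fertility = [1.5, 2.0, 2.5, 3.0, 4.0, 5.0]
life_exp = [80.0, 78.0, 75.0, 72.0, 66.0, 60.0]

glm_identity = fit_gaussian_glm(fertility, life_exp, link="identity")
glm_log = fit_gaussian_glm(fertility, life_exp, link="log")
ols = fit_ols(fertility, life_exp)
print("GLM (identity):", glm_identity)
print("OLS:           ", ols)
print("GLM (log):     ", glm_log)
```

With the identity link, the working response equals the observed response and all weights are 1, so every IRLS step is an ordinary least squares fit. The log link changes both the weights and the scale on which the coefficients are interpreted.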
4 CONCLUSION AND DISCUSSION
In this paper, we first conducted a descriptive
analysis of the data, observing the missing value
characteristics of some variables. The correlation
matrix of the data was then derived, and a simple
linear model was developed and analyzed for the
most highly correlated variables. We then built a
multiple regression model using stepwise regression
to explore which potential variables had a more
significant effect on life expectancy and test the
model’s feasibility and plausibility. After analyzing
this model, we added the region variable, a
categorical variable with significantly different
means across regions. After building a new model
using stepwise regression, we found that region,
fertility rate, healthcare costs, and public education
expenditure significantly affected national life
expectancy. Finally, generalized linear models with
different link functions were developed for
comparison and further analysis.
However, this study still has some
shortcomings. Variables were selected partly by
deleting columns with many missing values, and how
best to impute the missing values deserves further
discussion. In addition, no clearly better choice of
link function was found for the generalized linear
model, so the form of the link function still needs
further investigation. Finally, we believe that the
R² of the established multiple linear regression
model can still be improved, and we may pursue this
in future research.