distributed, indicating that the homoscedasticity
(constant variance) assumption is satisfied.
As the figure below shows, three observations have
outlying values of Cook’s D. These extreme data points
may have a slight influence on the fit of the model.
However, because the sample size in this paper is
large, their influence is expected to be negligible.
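As an illustration of the Cook’s D diagnostic, the following is a minimal sketch for a regression with a single predictor, where the leverage has a closed form. The data, the function name `cooks_distance` and the 4/n cutoff (a common rule of thumb) are assumptions made for this example. They are not the data or the threshold used in this paper.

```python
# Minimal sketch: Cook's distance for a one-predictor least-squares fit.
# The data below are made up for illustration; they are not the paper's data.

def cooks_distance(x, y):
    """Return Cook's distance for each observation of the fit y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)  # residual variance estimate
    # Leverage of each point (diagonal of the hat matrix) in closed form.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# The last response value is deliberately extreme.
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2, 17.9, 35.0]

d = cooks_distance(x, y)
threshold = 4 / len(x)  # assumed rule-of-thumb cutoff, not a formal test
flagged = [i for i, di in enumerate(d) if di > threshold]
print("indices flagged as influential:", flagged)
```

With a large sample, a few flagged points carry little weight in the fit, which is consistent with the argument above. For a model with several categorical predictors the same quantity would normally be obtained from a statistics package rather than computed by hand.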
Overall, the diagnostics of the multiple linear
regression analysis show no sign of model
misspecification.
Finally, ANOVA was used to check whether any
variable in the above multiple linear regression model
could be removed. The results of the analysis of
variance are as follows.
Figure 12: ANOVA results of multiple linear regression.
The figure above shows that the P-values of the
F-tests for removing each variable are all less than
0.05. Each variable therefore contributes
significantly to the model, and each reduced model
differs significantly from the original model.
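The comparison of the full model with a model that has one variable removed can be sketched as a partial F-test. The example below is only a sketch under stated assumptions: the eight observations, the two dummy-coded factors (`male`, `region_b`) and all function names are hypothetical, and the least-squares fit is done with a small hand-written solver so that the block has no dependencies. It returns the F statistic and its degrees of freedom. A P-value would additionally require the F distribution from a statistics library.

```python
# Minimal sketch of a drop-one-term F-test for nested linear models.
# All data and factor names are hypothetical, not the paper's.

def solve(a, b):
    """Solve the square system a @ z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on the design rows."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def partial_f(design_full, design_reduced, y):
    """F statistic and degrees of freedom for dropping terms from the full model."""
    df_num = len(design_full[0]) - len(design_reduced[0])
    df_den = len(y) - len(design_full[0])
    rss_full = rss(design_full, y)
    rss_reduced = rss(design_reduced, y)
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# Hypothetical observations: (male, region_b, deaths per 100,000)
data = [(0, 0, 100), (0, 0, 104), (0, 1, 110), (0, 1, 113),
        (1, 0, 150), (1, 0, 147), (1, 1, 160), (1, 1, 158)]
y = [row[2] for row in data]
full = [[1, row[0], row[1]] for row in data]   # intercept + sex + region
without_sex = [[1, row[1]] for row in data]    # sex removed
without_region = [[1, row[0]] for row in data]  # region removed

f_sex, df_num, df_den = partial_f(full, without_sex, y)
f_region, _, _ = partial_f(full, without_region, y)
print(f"drop sex:    F({df_num}, {df_den}) = {f_sex:.2f}")
print(f"drop region: F({df_num}, {df_den}) = {f_region:.2f}")
```

A large F statistic means that removing the variable increases the residual sum of squares by far more than the noise level would explain, which is the sense in which a variable "cannot be deleted".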
4 DISCUSSION
The above analysis shows that sex, race and region
each had a significant effect on the number of heart
disease deaths (per 100,000). The average number of
deaths for men was nearly 1.55 times higher than that
for women. One possible explanation, which the data
cannot confirm, is that women take better care of
their health.
In terms of race, the number of heart disease
deaths among American Indians and Black Americans
was higher than among white Americans, with Black
Americans having the highest number. Asians and
Hispanics were below the average for white Americans.
This may be related to the high cost of medical
treatment in the United States, which is unaffordable
for many people without medical insurance. The income
of American Indians and Black Americans is generally
lower than that of white Americans, which may result
in fewer people buying health insurance and thus more
deaths. Asians and Hispanics who move to America tend
to have been wealthy in their home countries, have
better health insurance, and die less often from
heart disease. The result could also be related to
the smaller sample sizes for Asians and Hispanics.
In terms of region, most regions have similar
distributions. Only the District of Columbia,
Massachusetts, Minnesota and Rhode Island have
lower mean values, which may be related to the
environment, income and expenditure of each
region.
In the multiple linear regression, gender, race and
region were all found to be independent variables
that cannot be ignored. In the ANOVA, deleting any
one of these variables had a significant impact on
the multiple linear regression model.
Previous analyses mainly used such data either for
trend analysis, studying how the number of deaths
from heart disease changes over time, or to study the
causes of death related to heart disease and see which
causes account for more deaths. This paper does not
stop at describing and summarizing the data. Analysis
of variance, multiple linear regression and other
methods are further used to reveal the relationships
in the data from a statistical perspective.
5 CONCLUSIONS
This study found that the number of heart disease
deaths (per 100,000 people) differed significantly
by gender, race and region. Men were more likely to
die from heart disease than women, possibly because
women take better care of themselves. American
Indians and Black Americans had higher rates of death
from heart disease than other races, which is
probably related to their lower income. Regionally,
people in the District of Columbia, Massachusetts,
Minnesota and Rhode Island died less frequently from
heart disease than people in other regions.
Owing to the limitations of the data, many
conjectures remain unverified. For example, the
reasons for the higher heart disease death rate among
American Indians and Black Americans, and for the
higher death rate of men than women, remain to be
revealed.
REFERENCES
Benjamin EJ, Muntner P, Alonso A, Bittencourt MS,
Callaway CW, Carson AP, et al. Heart disease and
stroke statistics—2019 update: a report from the