Figure 11: Testing set prediction outcomes (Picture credit:
Original).
This paper fitted a regression of the median home
price (MEDV) in the test data on the values predicted
by the model. The fit has a residual standard error of
3.033 and an adjusted R-squared of 0.859. In the
figure above, most points cluster around the blue
fitted line and only a small portion lie farther from
it, which indicates that the random forest model's
predictions are generally accurate.
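The two diagnostics quoted above can be reproduced from a set of predicted and observed values. The following is a minimal sketch, assuming the check is an ordinary one-predictor least-squares fit of the observed prices on the predicted prices; the function name fit_diagnostics and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def fit_diagnostics(predicted, observed):
    """Regress observed on predicted (one predictor plus intercept).

    Returns a dict with the slope, intercept, residual standard error,
    and adjusted R-squared of that fit.
    """
    n = len(predicted)
    if n != len(observed) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")

    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(predicted, observed))
    tss = sum((y - mean_y) ** 2 for y in observed)
    if tss == 0:
        raise ValueError("observed values must not all be identical")

    r2 = 1 - rss / tss
    # One predictor plus an intercept leaves n - 2 residual degrees of freedom.
    rse = math.sqrt(rss / (n - 2))
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {"slope": slope, "intercept": intercept, "rse": rse, "adj_r2": adj_r2}


# Made-up illustrative numbers, not the paper's test set.
demo = fit_diagnostics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 2.0])
print(demo)
```

A slope close to 1 and an intercept close to 0 in such a fit would suggest that the predictions are not systematically biased, which complements the two summary numbers.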
5 DISCUSSION
First, additional factors affect the cost of housing,
such as a home's size, type, height, and condition.
Second, variables and outliers that affected the
model's fit were eliminated during the data analysis
stage (Wang, Bah and Hammad, 2019).
6 CONCLUSION
6.1 Comparison between Different
Models
The multiple linear regression prediction has a
residual standard error of 4.402 and a coefficient of
determination (R-squared) of 0.7927. The random
forest regression prediction has a residual standard
error of 3.033 and an R-squared of 0.86. These results
show that the random forest model not only fits the
data better than the linear regression model but also
predicts more accurately.
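To make the size of the gap concrete, the reported figures can be compared directly. This is a small arithmetic sketch using only the four numbers quoted above; the helper name relative_reduction is hypothetical, and the comparison is rough if the two R-squared values were computed differently (for example, adjusted versus unadjusted).

```python
# Metrics as reported in the text for the two models.
linear = {"rse": 4.402, "r2": 0.7927}
forest = {"rse": 3.033, "r2": 0.86}


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


rse_reduction = relative_reduction(linear["rse"], forest["rse"])
r2_gain = forest["r2"] - linear["r2"]

print(f"RSE reduction: {rse_reduction:.1%}")   # prints "RSE reduction: 31.1%"
print(f"R-squared gain: {r2_gain:.4f}")        # prints "R-squared gain: 0.0673"
```

On these figures, the random forest lowers the residual standard error by roughly a third relative to the linear model.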
6.2 Different Factors' Effects on Home
Prices
The regression coefficients and scatter plots of the
model show that the percentage of the population with
lower socioeconomic status (LSTAT) and the number of
rooms per dwelling (RM) have the largest effects on
housing prices. Home prices in an area rise markedly
as the number of rooms increases. Likewise, as the
lower-status share of the population rises, average
disposable income falls, which in turn lowers home
values. Home prices decrease as the weighted distance
to five Boston employment centers (DIS) increases,
but prices are higher in areas with low nitric oxide
concentration (NOX), where housing is more dispersed.
Housing prices decrease as the pupil-to-teacher ratio
(PTRATIO) increases. High property taxes have a
negative effect on home prices, but this effect is
less pronounced in some areas.
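The direction and relative strength of such effects can be screened with a simple correlation ranking. The sketch below uses the Pearson correlation as a simpler stand-in for the regression coefficients discussed above; the function name pearson and all toy values are invented for illustration and are not taken from the Boston data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("each sequence must contain at least two distinct values")
    return sxy / math.sqrt(sxx * syy)


# Toy values invented for illustration; NOT taken from the Boston data.
toy_features = {
    "RM": [5.0, 6.0, 6.5, 8.0],
    "LSTAT": [25.0, 12.0, 14.0, 4.0],
    "PTRATIO": [21.0, 18.0, 19.0, 15.0],
}
toy_medv = [14.0, 22.0, 21.0, 45.0]

# Correlation of each feature with price, ranked by absolute strength.
effects = {name: pearson(values, toy_medv) for name, values in toy_features.items()}
ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)

for name in ranked:
    direction = "raises" if effects[name] > 0 else "lowers"
    print(f"{name}: r = {effects[name]:+.3f} ({direction} price as it increases)")
```

A correlation only captures a pairwise linear association, so it does not replace the multivariate coefficients; it is a quick check on the sign of each effect.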
6.3 Outlook
The above conclusions are drawn from a macro point of
view and are therefore only general. Researchers
interested in a particular house need to analyze it
according to the actual local situation. In addition,
due to time constraints, this study only built and
trained linear regression and random forest models.
Future studies could add more models for analysis and
comparison and select the optimal model for better
prediction.
REFERENCES
S. Y. Chen. "Study of the new juvenile housing safety in
our large metropolis: Take Shenzhen for example."
Shanghai Real Estate, vol. 1, 2023, pp. 41-45.
D. Harrison and D. L. Rubinfeld. "Hedonic housing prices
and the demand for clean air." Journal of
Environmental Economics and Management, vol. 5,
Mar. 1978, pp. 81-102.
Z. K. Chen and X. R. Cheng. "Regression analysis and
prediction of housing price based on gradient descent
algorithm." Information Technology and
Informatization, vol. 5, 2020, pp. 10-13.
W. W. Yin. "Research on verification methods of variable
coefficient error model of Boston housing data."
Journal of Chongqing Technology and Business
University (Natural Science Edition), vol. 3, 2018,
pp. 26-29.
K. Horn and M. Merante. "Is Home Sharing Driving up
Rents? Evidence from Airbnb in Boston." Journal of
Housing Economics, vol. 38, Dec. 2017, pp. 14-24.
D. Makowski, M. S. Ben-Shachar, I. Patil, et al. "Methods
and Algorithms for Correlation Analysis in R." The
Journal of Open Source Software, vol. 51, Jul. 2020,
pp. 2306.