ing algorithms make them difficult to comprehend.
Although the algorithms provide information on the importance of the covariates, they do not provide any information on the direction (positive or negative) of their contribution in the model. Thus, for studying k-inflated count data sets, the corresponding regression models are appropriate for interpretation, while the machine learning algorithms give superior predictions. It is therefore recommended to study both approaches. Our future work involves applying the approach to a larger data set from a different domain, and we plan to extend the comparative study by including various artificial neural network approaches.
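To make the interpretation-versus-prediction contrast concrete, the following is a minimal sketch in Python, assuming statsmodels and scikit-learn are available; it is not the models fitted in this study. A plain Poisson regression stands in for the k-inflated regression models and a random forest for the machine learning side: the regression fit returns signed coefficients that indicate the direction of each covariate's contribution, whereas the forest returns only unsigned importances.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from sklearn.ensemble import RandomForestRegressor

    # Simulated count data: x1 increases the rate, x2 decreases it.
    rng = np.random.default_rng(0)
    n = 500
    X = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
    mu = np.exp(0.8 * X["x1"] - 0.5 * X["x2"])
    y = rng.poisson(mu)

    # Regression view: coefficients carry both magnitude and sign.
    glm = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
    print(glm.params)

    # Machine learning view: importances carry magnitude only, no sign.
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    print(dict(zip(X.columns, rf.feature_importances_)))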