The visual approach already achieves high accuracy and precision,
though not 100%, and it relies on visual rather than semantic
characteristics. The textual approach therefore acts as a
consolidator: it addresses textual characteristics and ensures 100%
precision in the classification of the documents. In this case
study, high precision is more beneficial than high accuracy. This
result demonstrates the practical contribution of this paper: the
mobile insurance company that is the focus of our case study will be
able to invest in an aggressive marketing strategy without having to
double its call center resources to meet the new demand. Adding the
visual approach also helps guard against fraudulent attempts, which
is fundamental in our mobile insurance company scenario.
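To make the distinction between precision and accuracy concrete, the following minimal sketch computes both measures from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not results from this study. The sketch also assumes that a rejected document falls back to manual review, so a false positive (a wrong document accepted automatically) costs more than a false negative.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all documents that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of accepted documents that were truly valid."""
    accepted = tp + fp
    return tp / accepted if accepted else 0.0


# Hypothetical counts (assumption, not measured data): visual approach alone.
visual_only = {"tp": 90, "fp": 5, "tn": 95, "fn": 10}

# Hypothetical counts (assumption): a stricter textual check removes every
# false positive but also rejects a few valid documents.
hybrid = {"tp": 85, "fp": 0, "tn": 100, "fn": 15}

visual_accuracy = accuracy(**visual_only)
visual_precision = precision(visual_only["tp"], visual_only["fp"])
hybrid_accuracy = accuracy(**hybrid)
hybrid_precision = precision(hybrid["tp"], hybrid["fp"])
```

With these illustrative counts, both configurations reach the same accuracy, but only the second accepts no invalid document, which is the property the case study prioritizes.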
As a limitation of this work, we point out that other technologies,
such as Natural Language Processing, could be implemented to
increase the accuracy of the textual approach. We will address this
limitation in future work. Further future work will involve
collecting actual fraudulent data to evaluate our hybrid framework.
Personal Documents Classification using a Hybrid Framework at a Mobile Insurance Company: A Case Study