the customer requires and whether omitting
categories with only a few member products is
acceptable. In order to investigate the real-world
relevance more closely, we suggest using the model
first in a semi-automated process where categories are
proposed to the user. Based on the user decisions, the
model can then be further optimized and the degree
of automation can be increased. Apart from
classifying new products, our approach can also be
used for reclassifying an already classified product
database into a different classification system.
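The semi-automated process suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in this work: the function name route_prediction, the threshold of 0.9, the number of proposals (3), and the example scores are all hypothetical. The sketch assumes that some classifier has already produced one probability per category for a product name.

```python
from typing import Dict, List, Tuple


def route_prediction(
    probabilities: Dict[str, float],
    auto_threshold: float = 0.9,  # assumed value, not taken from the paper
    top_k: int = 3,               # assumed number of proposals shown to the user
) -> Tuple[str, List[str]]:
    """Decide whether to assign a category automatically or to propose candidates."""
    if not probabilities:
        raise ValueError("at least one category probability is required")
    # Rank categories from most to least probable.
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    best = ranked[0]
    if probabilities[best] >= auto_threshold:
        # Confident enough: assign without user interaction.
        return "auto", [best]
    # Otherwise propose the top candidates and let the user decide.
    return "propose", ranked[:top_k]


# Hypothetical model output for a single product name.
scores = {"Dairy": 0.55, "Beverages": 0.30, "Bakery": 0.10, "Frozen": 0.05}
decision, categories = route_prediction(scores)
print(decision, categories)
```

Under these assumptions, lowering auto_threshold over time would be one way to increase the degree of automation once the user decisions indicate that the proposals are reliable.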
In summary, our results have shown that the
classification of food products can be carried out
during the initial product data generation step using
only the product name. Standard algorithms are
capable of achieving satisfactory results without the
need for hyper-specialized, difficult-to-optimize
models. Our work can be extended to products from
other segments such as clothing or consumer electronics.
Further research is needed to determine
whether a model covering products from all segments
is better than a compartmentalized approach with one
separate model for each segment.
ACKNOWLEDGEMENTS
The authors would like to thank the German Federal
Ministry of Education and Research for supporting
their work through the KMU-innovativ programme
under grant number 01—S18018.