In future work, we would like to apply different machine-learning-based
algorithms for feature matching to find more suitable document
classifications. We would also like to use these algorithms to determine
which features have more or less impact on the optimal way of processing.
In parallel, we would like to improve the detectors that currently do not
meet the quality criteria required for their corresponding features to be
considered.
ICSOFT 2022 - 17th International Conference on Software Technologies