vide an approach to implement information extraction
for arbitrary documents or data formats, e.g., tables.
Additionally, ARTIFACT addresses more complex IE problems because it
includes tasks such as converting non-machine-readable documents into
machine-readable ones, e.g., PDF to text.
By applying the pattern in a real-world project, we have shown that it
helps companies automate their information extraction process
incrementally and gain business value.
7 OUTLOOK
In future work, we would like to add a classification mechanism to the
information extraction process. We assume that there are several
document classes with different characteristics, so a given pipeline
may perform differently on each document class.
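One way such a class-aware choice could look is sketched below. It assumes that each processed document already carries a class label and a quality score per pipeline; the function name `best_pipeline_per_class`, the class labels, and the pipeline names are hypothetical and not part of ARTIFACT.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One observation: (document_class, pipeline_name, score), higher score is better.
Observation = Tuple[str, str, float]


def best_pipeline_per_class(observations: List[Observation]) -> Dict[str, str]:
    """Map each document class to the pipeline with the highest mean score."""
    scores: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for doc_class, pipeline, score in observations:
        scores[(doc_class, pipeline)].append(score)

    best: Dict[str, Tuple[str, float]] = {}
    for (doc_class, pipeline), values in scores.items():
        mean = sum(values) / len(values)
        # Keep the pipeline with the highest mean score seen so far for this class.
        if doc_class not in best or mean > best[doc_class][1]:
            best[doc_class] = (pipeline, mean)
    return {doc_class: pipeline for doc_class, (pipeline, _) in best.items()}


# Hypothetical observations for two document classes and two pipelines.
observations = [
    ("invoice", "table_pipeline", 0.9),
    ("invoice", "ocr_pipeline", 0.6),
    ("letter", "table_pipeline", 0.4),
    ("letter", "ocr_pipeline", 0.8),
    ("letter", "ocr_pipeline", 0.7),
]
routing = best_pipeline_per_class(observations)
```

With such a routing table, a newly classified document could be sent directly to the pipeline that has performed best for its class so far.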
Additionally, we would like to add caching mechanisms for the pipeline
runs. As shown in Section 5, some pipeline parts recur when processing
a specific document. For performance reasons, the system could cache
the intermediate results of these steps.
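A minimal sketch of such step-level caching follows. It assumes that every pipeline step is a deterministic function of its input and has a unique name; `StepCache`, `run_pipeline`, and the example steps are hypothetical illustrations, not the ARTIFACT implementation.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[str], str]]  # (step name, transformation)


class StepCache:
    """In-memory cache keyed by document content and the step prefix executed so far."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def key(document: str, prefix: Tuple[str, ...]) -> str:
        digest = hashlib.sha256(document.encode("utf-8")).hexdigest()
        return digest + "|" + ">".join(prefix)

    def lookup(self, key: str) -> Tuple[bool, str]:
        if key in self._store:
            self.hits += 1
            return True, self._store[key]
        return False, ""

    def put(self, key: str, value: str) -> None:
        self._store[key] = value


def run_pipeline(document: str, steps: List[Step], cache: StepCache) -> str:
    """Run the steps in order, reusing cached results of shared step prefixes."""
    result = document
    prefix: Tuple[str, ...] = ()
    for name, func in steps:
        prefix = prefix + (name,)
        key = cache.key(document, prefix)
        found, cached = cache.lookup(key)
        if found:
            result = cached
        else:
            result = func(result)
            cache.put(key, result)
    return result


# Two hypothetical pipelines that share their first step.
cache = StepCache()
lowered = run_pipeline("  Invoice 42  ", [("strip", str.strip), ("lower", str.lower)], cache)
uppered = run_pipeline("  Invoice 42  ", [("strip", str.strip), ("upper", str.upper)], cache)
```

Keying on the document hash plus the step prefix means that two pipelines reuse a result only when they ran the same steps in the same order on the same document, which is the recurring case described above.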
Beyond that, we would like to optimize the choice among candidate
pipelines when their results are nearly equal. In that case, the system
should be able to take other metrics, such as the expected pipeline
performance, into account.
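The tie-breaking idea can be sketched as follows, assuming that each candidate pipeline reports a quality score and an expected runtime. The names `PipelineResult` and `choose_pipeline`, the tolerance `epsilon`, and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineResult:
    name: str
    quality: float             # extraction quality, higher is better
    expected_runtime_s: float  # expected runtime in seconds, lower is better


def choose_pipeline(results: List[PipelineResult], epsilon: float = 0.01) -> PipelineResult:
    """Pick the fastest pipeline among those within epsilon of the best quality."""
    if not results:
        raise ValueError("no pipeline results to choose from")
    best_quality = max(r.quality for r in results)
    # Candidates whose quality is nearly equal to the best one.
    near_best = [r for r in results if best_quality - r.quality <= epsilon]
    # Among those, prefer the lowest expected runtime, then the highest quality.
    return min(near_best, key=lambda r: (r.expected_runtime_s, -r.quality))


# Hypothetical candidates: two nearly equal in quality, one clearly worse but fast.
candidates = [
    PipelineResult("ocr_heavy", 0.91, 12.0),
    PipelineResult("text_layer", 0.905, 1.5),
    PipelineResult("fast_regex", 0.80, 0.2),
]
chosen = choose_pipeline(candidates)
```

Here the secondary metric only decides between candidates that are already nearly equal in quality, so a fast but clearly worse pipeline is never preferred.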
ICEIS 2022 - 24th International Conference on Enterprise Information Systems