Authors: Michael Sildatke 1; Hendrik Karwanni 1; Bodo Kraft 1 and Albert Zündorf 2
Affiliations: 1 FH Aachen, University of Applied Sciences, Germany; 2 University of Kassel, Germany
Keyword(s):
Architectural Design, Refactoring and Patterns, Model-driven Software Engineering, Process Modeling, Quality Management, Software and Systems Modeling, Enterprise Information Systems, Information Extraction, Document Classification, Feature Detection, Software Metrics and Measurement.
Abstract:
Information Extraction (IE) processes are often business-critical but very hard to automate because the underlying data is heterogeneous. Specific document characteristics, also called features, influence the optimal way of processing. The Architecture for Automated Generation of Distributed Information Extraction Pipelines (ARTIFACT) supports businesses in successively automating their IE processes by finding optimal IE pipelines. However, ARTIFACT treats every document the same way and does not enable document-specific processing. Single solution strategies can perform extraordinarily well for documents with particular traits. Although manual approvals are superfluous for these documents, ARTIFACT does not provide the opportunity for Fully Automatic Processing (FAP). Therefore, we introduce an enhanced pattern that integrates an extensible and domain-independent concept of feature detection based on microservices. This yields two fundamental benefits. First, document-specific processing increases the quality of automatically generated IE pipelines. Second, the system enables FAP, which eliminates superfluous approval efforts.
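The idea of feature-based, document-specific processing can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Pipeline`, `detect_features`, and `route`, the quality figures, and the approval threshold are all assumptions made for illustration, and the detectors are plain functions standing in for what the abstract describes as microservices.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Tuple

Document = Dict[str, Any]
Detector = Callable[[Document], bool]


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical IE pipeline with a quality score per feature combination."""
    name: str
    quality: Dict[FrozenSet[str], float]  # assumed, illustrative measurements
    default_quality: float                # used when the combination is unknown

    def expected_quality(self, features: FrozenSet[str]) -> float:
        return self.quality.get(features, self.default_quality)


def detect_features(doc: Document, detectors: Dict[str, Detector]) -> FrozenSet[str]:
    """Run every detector and collect the names of the features that are present."""
    return frozenset(name for name, detect in detectors.items() if detect(doc))


def route(doc: Document, detectors: Dict[str, Detector],
          pipelines: List[Pipeline], fap_threshold: float) -> Tuple[str, bool]:
    """Pick the best pipeline for this document's features.

    Returns the pipeline name and whether manual approval can be skipped
    (fully automatic processing) because expected quality meets the threshold.
    """
    features = detect_features(doc, detectors)
    best = max(pipelines, key=lambda p: p.expected_quality(features))
    return best.name, best.expected_quality(features) >= fap_threshold


# Hypothetical detectors and pipelines, for illustration only.
detectors: Dict[str, Detector] = {
    "scanned": lambda d: bool(d.get("scanned", False)),
    "has_table": lambda d: "table" in str(d.get("text", "")),
}
pipelines = [
    Pipeline("ocr_generic", {}, 0.80),
    Pipeline("table_parser", {frozenset({"has_table"}): 0.99}, 0.60),
]

tabular = route({"text": "table: a|b", "scanned": False}, detectors, pipelines, 0.95)
scanned = route({"text": "free text", "scanned": True}, detectors, pipelines, 0.95)
```

Under these assumed numbers, the document with a table is routed to the specialised pipeline and qualifies for processing without approval, while the scanned document falls back to the generic pipeline and still requires manual approval.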