Paper

Authors: Michael Sildatke 1; Hendrik Karwanni 1; Bodo Kraft 1 and Albert Zündorf 2

Affiliations: 1 FH Aachen, University of Applied Sciences, Germany; 2 University of Kassel, Germany

Keyword(s): Architectural Design, Refactoring and Patterns, Model-driven Software Engineering, Process Modeling, Quality Management, Software and Systems Modeling, Enterprise Information Systems, Information Extraction, Document Classification, Feature Detection, Software Metrics and Measurement.

Abstract: Information Extraction (IE) processes are often business-critical but very hard to automate because the underlying data is heterogeneous. Specific document characteristics, also called features, influence the optimal way of processing. The Architecture for Automated Generation of Distributed Information Extraction Pipelines (ARTIFACT) supports businesses in successively automating their IE processes by finding optimal IE pipelines. However, ARTIFACT treats every document the same way and does not enable document-specific processing. Single solution strategies can perform extraordinarily well for documents with particular traits. Although manual approvals are superfluous for these documents, ARTIFACT does not provide the opportunity for Fully Automatic Processing (FAP). Therefore, we introduce an enhanced pattern that integrates an extensible and domain-independent concept of feature detection based on microservices. This yields two fundamental benefits. First, document-specific processing increases the quality of automatically generated IE pipelines. Second, the system enables FAP to eliminate superfluous approval efforts.
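
The idea of feature-based, document-specific processing can be illustrated with a minimal sketch. It is not the authors' implementation: every name below (`Route`, `detect_features`, `process`, the toy detector and pipelines) is hypothetical, plain Python functions stand in for the microservices mentioned in the abstract, and the rule that a matching feature set permits fully automatic processing is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Hypothetical types; none of these names come from the paper.
Detector = Callable[[str], bool]             # does the text show a given feature?
Pipeline = Callable[[str], Dict[str, str]]   # extracts fields from the text


@dataclass(frozen=True)
class Route:
    pipeline: Pipeline
    fully_automatic: bool  # assumption: True means manual approval is skipped


def detect_features(text: str, detectors: Dict[str, Detector]) -> FrozenSet[str]:
    """Run every registered detector and collect the names of those that fire."""
    return frozenset(name for name, detect in detectors.items() if detect(text))


def process(text: str,
            detectors: Dict[str, Detector],
            routes: Dict[FrozenSet[str], Route],
            default: Route) -> dict:
    """Pick a pipeline from the detected feature set, falling back to a default."""
    features = detect_features(text, detectors)
    route = routes.get(features, default)
    return {
        "features": features,
        "fields": route.pipeline(text),
        "needs_approval": not route.fully_automatic,
    }


# Toy setup: one detector, one specialised pipeline, one generic fallback.
DETECTORS = {
    "has_invoice_number": lambda text: "Invoice No:" in text,
}


def invoice_pipeline(text: str) -> Dict[str, str]:
    # Takes the first whitespace-separated token after the marker.
    number = text.split("Invoice No:", 1)[1].split()[0]
    return {"invoice_number": number}


def generic_pipeline(text: str) -> Dict[str, str]:
    return {"raw": text.strip()}


ROUTES = {
    frozenset({"has_invoice_number"}): Route(invoice_pipeline, fully_automatic=True),
}
DEFAULT = Route(generic_pipeline, fully_automatic=False)

result = process("Invoice No: 4711 Total: 10 EUR", DETECTORS, ROUTES, DEFAULT)
print(result["fields"], result["needs_approval"])
```

In this sketch, a document whose detected features match a registered route is handled by the specialised pipeline without approval, while any other document falls back to the generic pipeline and is flagged for manual review. New detectors can be registered without touching the routing logic, which loosely mirrors the extensibility the abstract describes.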

CC BY-NC-ND 4.0

Paper citation in several formats:
Sildatke, M.; Karwanni, H.; Kraft, B. and Zündorf, A. (2022). FUSION: Feature-based Processing of Heterogeneous Documents for Automated Information Extraction. In Proceedings of the 17th International Conference on Software Technologies - ICSOFT; ISBN 978-989-758-588-3; ISSN 2184-2833, SciTePress, pages 250-260. DOI: 10.5220/0011351100003266

@conference{icsoft22,
author={Michael Sildatke and Hendrik Karwanni and Bodo Kraft and Albert Zündorf},
title={FUSION: Feature-based Processing of Heterogeneous Documents for Automated Information Extraction},
booktitle={Proceedings of the 17th International Conference on Software Technologies - ICSOFT},
year={2022},
pages={250-260},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011351100003266},
isbn={978-989-758-588-3},
issn={2184-2833},
}

TY - CONF

JO - Proceedings of the 17th International Conference on Software Technologies - ICSOFT
TI - FUSION: Feature-based Processing of Heterogeneous Documents for Automated Information Extraction
SN - 978-989-758-588-3
IS - 2184-2833
AU - Sildatke, M.
AU - Karwanni, H.
AU - Kraft, B.
AU - Zündorf, A.
PY - 2022
SP - 250
EP - 260
DO - 10.5220/0011351100003266
PB - SciTePress