Authors:
Alejandro Gabriel Villanueva Zacarias; Laura Kassner and Bernhard Mitschang
Affiliation:
Graduate School of Excellence Advanced Manufacturing Engineering, Germany
Keyword(s):
Data Analytics, Unstructured Data, Text Data, Classification Algorithms, Text Classification.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Industrial Applications of Artificial Intelligence; Natural Language Interfaces to Intelligent Systems; Performance Evaluation and Benchmarking; Problem Solving; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Automated Text Classification (ATC) is an important technique for supporting expert workers in industry, e.g. in product quality assessment based on part failure reports. To be useful, an ATC classifier must incur reasonable costs for a given accuracy level and processing time. However, there is little clarity on how to customize the constituent elements of a classifier for this purpose. In this paper, we highlight the need to configure an ATC classifier according to the properties of both the algorithm and the dataset at hand. In this context, we make three contributions: (1) the notion of an ATC Configuration, which arranges the relevant design choices for building an ATC classifier; (2) a feature selection technique named Smart Feature Selection; and (3) a visualization technique, called the ATCC Performance Cube, which translates the technical configuration aspects into a performance visualization. With the help of this Cube, business decision-makers can easily understand the variability in performance and cost that different ATC Configurations exhibit in their specific application scenarios.