and criteria is crucial for successful communication
in the project.
The first definition of usage goals was based on in-
complete information about the characteristics of doc-
uments and features. The discussion of intermediate
results with the participants led to the development
of suitable feature definitions and usage goals. Based
on this, we were able to refine the evaluation criteria
and the rating functions for extracted values. The
customers confirmed that the approach is suitable and
helpful in the described setting.
In the end, we achieved a common understanding
of the influencing factors of feature extraction
scenarios for text, how these factors affect the
evaluation results, and which solutions (or kinds of
solutions) are suitable for extracting which kinds of
features.
In the following, we give an outlook on improvements
that can be made, especially during the evaluation
phase of text mining projects.
First of all, introducing a form of weighted F1
score could prove useful. Depending on the application
scenario, it might be more important to find fewer but
more reliable results. This means accepting a higher
rate of FNs (as for the feature date of incident in our
scenario 1). In contrast, for use cases similar to
scenario 2, it might be wiser to allow a higher rate of
FPs in the document rather than lose important
information.
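
One common way to realize such a weighting is the F-beta score, which generalizes F1 with a parameter beta. This is only an illustrative sketch under that assumption; the function name `f_beta` and the example counts are hypothetical and are not taken from our evaluation. A value of beta below 1 emphasizes precision (fewer but more reliable results, as in scenario 1), while a value above 1 emphasizes recall (tolerating more FPs, as in scenario 2).

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score computed from true positives, false positives, false negatives.

    beta < 1 weights precision more strongly, beta > 1 weights recall more
    strongly, and beta == 1 gives the ordinary F1 score.
    """
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    if denominator == 0:
        # No positives predicted and none expected: return 0.0 by convention
        # (an assumption of this sketch).
        return 0.0
    return (1 + b2) * tp / denominator


# Hypothetical extractor that is precise but misses many values:
# precision = 8 / 10 = 0.8, recall = 8 / 16 = 0.5.
tp, fp, fn = 8, 2, 8
score_precision_focus = f_beta(tp, fp, fn, beta=0.5)  # scenario-1-like weighting
score_balanced = f_beta(tp, fp, fn, beta=1.0)         # ordinary F1
score_recall_focus = f_beta(tp, fp, fn, beta=2.0)     # scenario-2-like weighting
```

For the same extractor, the precision-focused score is higher than F1 and the recall-focused score is lower, so the choice of beta would have to be fixed per feature together with the usage goals.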
Although this does not concern the three features
selected from our industrial use case in this work,
highly imbalanced occurrences of features within the
document sets cause well-known challenges. For
example, for features with a small set of target
classes (like yes/no), always predicting ’yes’ may
achieve a high evaluation score when the target values
are very unevenly distributed. Here, using Cohen’s kappa
(Cohen, 1960) instead of the F1 score seems to be a good
option and will be investigated in the future.
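
The effect can be illustrated with a small sketch of Cohen's kappa, defined as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the ten-document example are hypothetical and serve only as an illustration; they are not results from our data.

```python
from collections import Counter


def cohen_kappa(gold, pred):
    """Cohen's kappa between gold labels and predicted labels."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    # Observed agreement: fraction of documents with identical labels.
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    # Chance agreement: product of the marginal label frequencies per class.
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        # Only one class occurs on both sides, so kappa is undefined;
        # returning 0.0 is a convention assumed for this sketch.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical imbalanced feature: 9 of 10 documents are 'yes'.
gold = ["yes"] * 9 + ["no"]
always_yes = ["yes"] * 10
kappa_always_yes = cohen_kappa(gold, always_yes)
```

In this example, always predicting 'yes' yields an F1 score of about 0.95 for the class 'yes' (9 TPs, 1 FP, no FNs), whereas kappa is 0, because the observed agreement of 0.9 is exactly what the label frequencies alone would produce.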
