THE CHALLENGE OF AUTOMATICALLY ANNOTATING
SOLUTION DOCUMENTS
Comparing Manual and Automatic Annotation of Solution Documents
in the Field of Mechanical Engineering
Andreas Kohn, Udo Lindemann
Institute of Product Development, Technische Universität München, Boltzmannstraße 15, 85748 Garching, Germany
Gerhard Peter
Festo GmbH & Co. KG, Plieninger Straße 50, 73760 Ostfildern-Scharnhausen, Germany
Keywords: Solution knowledge, Manual annotation, Evaluation of annotation.
Abstract: This paper presents part of the current research in the use case PROCESSUS of the German research program THESEUS. A case study comparing manual and automatic annotation of solution documents in the field of mechanical engineering is described. A set of six solution documents was annotated manually
by four users. Then, the same set of documents was annotated automatically by an ontology-based system.
The two annotations are compared by means of proposed ranking numbers. These ranking numbers give the
weighting of annotations according to the overall and merged manual annotations. Therewith, they serve as
a reference for the expected result of the automatic annotation. Comparing the automated and the manual
annotation can not only reveal limitations of the automatic annotation process but also raise interesting questions as to what extent domain-specific knowledge has to be represented in the ontology.
1 INTRODUCTION
In product development, access to existing knowledge about previous solutions may reduce the number of development cycles and the amount of conceptual rework, and thereby save time and costs. In principle, various sources exist that provide this knowledge. In a study in the German automation industry, sources for the search of existing solution knowledge were identified (Ponn et al., 2006). Besides direct personal communication, organisation-internal knowledge sources (e.g. project folders or databases), construction catalogues, internet portals, and publicly available marketing documents were identified as the most frequently used.
However, an engineer who wants to retrieve
existing solution knowledge may face several
barriers (see Figure 1). First of all, solution
knowledge is mostly unstructured and the access to
unstructured data is often insufficient (Blumberg et
al., 2003). Secondly, the involved developers use different wordings (Dylla, 1990), which hinders access via a standard full-text search (Pocsai, 2000). Furthermore, varying taxonomies and classifications due to different viewpoints in sales, marketing, and engineering (Hepp, 2003) contribute to the barriers that hinder access to the needed solutions.
Figure 1: Barriers in the process of retrieving technical
solutions.
Improving this process of retrieving existing
solutions is one of the main goals of the use case
PROCESSUS as one part of the German research
project THESEUS. Within PROCESSUS, an
ontology has been developed that is used to capture the knowledge of technical solutions
(Gaag et al., 2009). The instances of the ontology
and the modelled relations can also be used as a
vocabulary for automated annotation of solution
documents. This annotation should help in the later
retrieval of the documents.
This paper focuses on improving the process of
annotating unstructured text data stored in publicly
available solution documents of the automation
industry. In these documents, companies provide
information about previously installed solutions (e.g.
a bottling and filling line for beverages). They are
mostly used for marketing purposes to give
references of previous work. Furthermore, they are
useful for generating first ideas on how to approach an engineering task.
In engineering design theory, technical solutions can be described by their functions, typically composed of an object and an operation performed on the object (Ponn et al., 2008). Given a solution document with a certain number of different functions, an annotation tool that identifies most of these functions but misses the really important ones is surely not the best one. Due to these uncertainties, generally applied ranking methods like term frequency or the evaluation of annotations with precision and recall can hardly be applied here.
To evaluate and improve annotations, it is
necessary to get a deeper insight into the content of
the existing solution documents. For this purpose,
solution documents are analysed by comparing
manual annotations made by different persons.
These manual annotations are merged, and by applying ranking numbers the most relevant content of the document concerning the technical functions of the solution is identified. Subsequently, these rankings can be used to evaluate the automated annotation. This procedure is tested exemplarily with six solution documents and applied to the annotation tool of our developed prototype.
The paper is organised as follows: First we will
provide a short overview of the ontology (the main
concepts and their relations) and its use in the
developed prototype. The technical functionality of the prototype will be described only as far as it is relevant for this work. Second, we
describe our methodology. Then, we will describe
the case study and results in detail. This is followed
by a review of related work. We will conclude the
paper with a discussion and summary of our findings
and provide an outlook on the next steps to take.
2 USAGE OF THE ONTOLOGY
IN THE PROTOTYPE
A prototype was implemented that uses the
developed ontology (as an OWL ontology) to
support the automated annotation and the subsequent
search for solution documents. For the automated
annotation, the ontology serves as a vocabulary and
provides the needed information about the existing
relations of elements belonging to technical
solutions. The base structure with the core concepts
of this ontology is shown in Figure 2. The function
has the central position. It is realised by a technical solution, used in a specific industrial sector, executed by a function owner, and performs a certain operation on a given object. Existing solutions can
be described by instantiating these concepts with the
appropriate instances.
Figure 2: Base structure of the domain-specific ontology
to support solution retrieval.
For the automated annotation, the prototype uses
the label property of the instances in the ontology to
recognise the appropriate words and attach the corresponding concept to the document. Linguistic features such as word stemming and the inflection of words are considered. In addition, linguistic algorithms are intended to analyse the syntax of a sentence and to determine relations between the words in a sentence. The
annotation process will be illustrated by a simple
exemplary sentence “The conveyor belt transports
the boxes” taken from one of the solution
documents. “Conveyor belt” is the function owner
which performs the operation “transports” on the
object “box”. If these instances are available in the
ontology, the corresponding concepts are annotated.
With the help of the linguistic algorithms, the
combination of “transport” and “box” in one
sentence leads to the annotation of the function
“transport box”.
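To make this step concrete, the following minimal sketch (Python) imitates the described label matching and function combination. The toy vocabulary, the crude stemmer, and the combination rule are simplified assumptions for illustration, not the prototype's actual implementation:

    # Minimal sketch of label-based annotation (simplified assumption,
    # not the prototype's actual implementation).
    from collections import namedtuple

    Instance = namedtuple("Instance", ["label", "concept"])

    # Toy vocabulary, assumed to come from the ontology's label properties.
    VOCABULARY = [
        Instance("conveyor belt", "function owner"),
        Instance("transport", "operation"),
        Instance("box", "object"),
    ]

    def stem(word):
        # Crude stand-in for a real stemming component.
        for suffix in ("es", "s", "ing", "ed"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[:-len(suffix)]
        return word

    def annotate(sentence):
        """Return matched instances and combined functions for one sentence."""
        stemmed = " ".join(stem(w) for w in sentence.lower().split())
        hits = [i for i in VOCABULARY
                if " ".join(stem(w) for w in i.label.split()) in stemmed]
        # An operation and an object in the same sentence are combined
        # into a function, mimicking the described linguistic step.
        ops = [i.label for i in hits if i.concept == "operation"]
        objs = [i.label for i in hits if i.concept == "object"]
        return hits, [f"{op} {obj}" for op in ops for obj in objs]

    hits, functions = annotate("The conveyor belt transports the boxes")
    print(functions)  # ['transport box']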
Figure 3 shows a screenshot of the prototype with
an exemplary result of an automated annotation. On
the left side, the annotated instances of a document
are listed according to the concepts chosen as
annotation filter (property of a solution, industrial
sector, etc.). On the right side, a graph browser
offers the possibility to navigate through the ontology and to add further annotations manually.
Figure 3: Screenshot of the prototype.
3 ANALYSING MANUAL
ANNOTATIONS
This section describes the procedure of manually
annotating the documents and merging these
annotations. Afterwards, the applied ranking
numbers for the annotated instances and their use for
the evaluation of automatic annotation are explained.
3.1 Manual Annotation of Documents
To get a deeper insight into the content of the
solution documents, they are analysed by a
comparison of manual annotations. Test participants
were asked to identify all function owners and the
corresponding functions. There was no limit on the maximum number of function owners or functions annotated in each document. It was also
allowed to annotate only function owners without a
corresponding function or vice versa.
As a result of each single manual annotation, a set of function owners and corresponding functions
(operation and object) emerges. Additionally, the
position of the source for the annotation in the
document was marked in order to identify where the
annotation stems from.
3.2 Merging of the Manual
Annotations
Subsequently, the manual annotations of one
document are merged to give an overview of the
similarities and differences of the single manual
annotations. The manual annotations are merged
according to their appearance in the document.
Figure 4 gives a short overview of the merging
procedure.
Figure 4: Merging of the manual annotations.
When, for example, the same part of the document is annotated by more than one person (in this example the objects “bottle” and “box” in lines 1 and 6 by persons 1 and 2), it is only added once to the merged annotation. While merging the annotations, both the number of overall different annotations of one concept and the number of equal annotations across the single manual annotations become evident. In the example, the instance “box” was annotated once by two persons, while the instance “bottle” was annotated twice (one time by two persons, the other time by one person). Therewith, the merged annotations can be interpreted as a very comprehensive annotation, as annotations possibly missed by one person can be found in the annotation of another.
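To make the merging step concrete, a minimal sketch is given below (Python). The data layout with (instance, location) pairs mirrors the example of Figure 4; all names are illustrative assumptions:

    from collections import defaultdict

    # Each manual annotation is assumed to be a list of
    # (instance, location-in-document) pairs, as in Figure 4.
    annotation_1 = [("bottle", "line 1"), ("box", "line 6")]
    annotation_2 = [("bottle", "line 1"), ("bottle", "line 2"), ("box", "line 6")]

    def merge(annotations):
        """Compute N(io) (overall annotations of an instance) and
        N(im) (annotations remaining after merging by position)."""
        by_position = defaultdict(set)
        for person, annotation in enumerate(annotations, start=1):
            for instance, location in annotation:
                by_position[(instance, location)].add(person)
        n_io, n_im = defaultdict(int), defaultdict(int)
        for (instance, _), persons in by_position.items():
            n_io[instance] += len(persons)  # how often annotated overall
            n_im[instance] += 1             # one entry per merged position
        return dict(n_io), dict(n_im)

    n_io, n_im = merge([annotation_1, annotation_2])
    print(n_io)  # {'bottle': 3, 'box': 2}
    print(n_im)  # {'bottle': 2, 'box': 1}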
3.3 Application of Ranking Numbers
The number of equal annotations within the single manual annotations gives a first impression of more and less important instances. When an
instance of a concept or two instances of two related
concepts are annotated by a high number of people,
they can be interpreted as important for the
document. In the example presented above, the
instance “bottle” is annotated by two people in line 1 but only by one person in line 2. This indicates that, in the second line, the other person did not interpret the
“bottle” in this sentence as an important object for
this solution.
In addition to this simple measure, two more ranking numbers are proposed. These ranking numbers show similarities to the term frequency used in information retrieval (Salton et al., 1986). In contrast to term frequency, the focus here is not on the importance of a word in a document, but on the importance of an annotation relative to all annotations of the respective document. Furthermore,
the ranking numbers combine the number of overall
annotations of an instance within all manual
annotations with the number of annotations of an
instance after the merging of the manual
annotations. By this combination, the error rate of a
single manual annotation is decreased while the
“overall intelligence” of several manual annotations
is increased.
The first ranking number R(i) considers the
annotation of instances of single concepts. It is calculated by multiplying the overall number of similarly annotated instances of a single concept, N(io), with the number of similarly annotated instances of that concept after merging the manual annotations, N(im). To normalise the number, the product is divided by the product of the maxima of N(io) and N(im) over all instances (equation 1).

R(i) = \frac{N(io) \cdot N(im)}{\max N(io) \cdot \max N(im)}    (1)
In the example in Figure 4, the ranking numbers
are calculated as shown in Table 1.
Table 1: Calculation of the ranking numbers.

Object    N(io)   N(im)   R(i)
bottle      3       2     1.00
box         2       1     0.33
The instance “bottle” was annotated twice in line 1 and once in line 2, so the overall number of annotations N(io) is 3. It is annotated in lines 1 and 2, which makes N(im) equal to 2.
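Continuing the sketch from Section 3.2, equation (1) translates directly into a few lines (illustrative only):

    def ranking_numbers(n_overall, n_merged):
        """R = (N_o * N_m) / (max N_o * max N_m), cf. equations (1) and (2)."""
        norm = max(n_overall.values()) * max(n_merged.values())
        return {k: n_overall[k] * n_merged[k] / norm for k in n_overall}

    r = ranking_numbers({"bottle": 3, "box": 2}, {"bottle": 2, "box": 1})
    print({k: round(v, 2) for k, v in r.items()})
    # {'bottle': 1.0, 'box': 0.33}, matching Table 1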
The second ranking number R(r), from the ontological point of view the more interesting one, considers the annotation of instances of related concepts. Similar to R(i), it is calculated by multiplying the number of overall annotations with the number of annotations after merging the manual annotations. This time, the numbers are only counted when the annotation contains a pair of instances belonging to concepts that are related in the ontology. Once again, it is normalised by the maxima of these numbers, N(ro) and N(rm), as shown in equation 2.

R(r) = \frac{N(ro) \cdot N(rm)}{\max N(ro) \cdot \max N(rm)}    (2)
Therewith, the ranking number R(r) provides
information about the mutual annotation of instances
that are related according to the ontology.
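Since equation (2) has the same form as equation (1), the ranking_numbers() function sketched above can be reused with counts of related pairs instead of single instances. The pair counts below are invented purely for illustration:

    # Counting pairs of related instances, e.g. the function "transport box"
    # as the pair (operation "transport", object "box"); counts are assumed.
    pairs_overall = {("transport", "box"): 4, ("sort", "bottle"): 2}
    pairs_merged  = {("transport", "box"): 2, ("sort", "bottle"): 1}
    r = ranking_numbers(pairs_overall, pairs_merged)
    print({k: round(v, 2) for k, v in r.items()})
    # {('transport', 'box'): 1.0, ('sort', 'bottle'): 0.25}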
3.4 Interpretation
The ranking numbers take values between 0 and 1.
These ranking numbers, applied to each instance or
related instances, give the weighting according to
the overall and merged manual annotations and
therewith the reference for the expected result of the
automatic annotation. The automatic annotation has
to identify at least the highest ranked instances.
Especially R(r) can be used for evaluating the
quality of the annotation of related instances.
With the help of the ranking numbers, precision and recall measures for the evaluation of the automatic annotation can be calculated at a finer granularity: it is more important to find higher ranked instances than lower ranked ones.
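The paper does not fix a concrete formula for this graded evaluation; the following sketch shows one possible, assumed way of weighting recall by the ranking numbers (the values anticipate Table 5 in Section 4):

    def weighted_recall(rankings, found):
        """Recall weighted by R(i): missing a highly ranked instance hurts
        more than missing a lowly ranked one. This weighting scheme is an
        assumption, not taken from the paper."""
        total = sum(rankings.values())
        if total == 0:
            return 0.0
        return sum(r for k, r in rankings.items() if k in found) / total

    # R(i) values of the function owners later shown in Table 5:
    rankings = {"robot": 1.00, "machine": 0.20, "conveyor belt": 0.03,
                "barcode reader": 0.03, "operator": 0.03}
    found = {"robot", "conveyor belt", "barcode reader"}
    print(round(weighted_recall(rankings, found), 2))  # 0.82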
4 CASE STUDY
This section shows the application of the above
described steps of annotating and merging the
documents. The annotated documents and the results
of the annotations are presented and finally
compared with the automatic annotation of the
developed prototype. Four persons with different backgrounds (marketing, computer science, and mechanical engineering) were asked to manually
annotate six solution documents concerning the
contained function owners and their corresponding
functions.
The documents describe technical solutions in
the field of automation technology (see Table 2 for a
short overview of the content of the documents).
Their length varies between 2 and 8 DIN A4 pages
and their number of words lies between 343 and
912.
Table 2: Overview of the used documents.
Content Pages Words
Packaging of medical tablets 2 877
Separation of small components 4 750
Sorting of empty bottles 2 343
Bottling of bottles 2 741
Palletizing of bread 2 752
Packaging of drink crates 8 912
4.1 Merging of Manual Annotations
Using one document (packaging of medical tablets) as an example, the results of the manual annotation and the merging shall be explained. Table 3 shows
the numbers of annotations of instances of the four
concepts. The first column shows the number of
overall annotations after merging; the following
columns show the number of annotations of the
individual manual annotations.
Table 3: Number of annotations.

Concept          All    1    2    3    4
Operation         30   28   14   12   14
Object            27   24   14    9   14
Function owner    16   13   11    9    8
Function          27   24   14    9   14
The four participants differ in the number of
annotations made. In subsequent interviews, it was found that this can be explained by the different professional backgrounds. A mechanical
engineer did not consider every function as
“important”. He focused on the core functions. In a
subsequent search, he expects these functions to be
ranked higher than other functions.
By merging these annotations the number of
different annotations of instances of the four
concepts can be identified. In this document, 26
different operations, 14 objects, 7 function owners,
and 26 different functions were identified.
Table 4 gives an exemplary overview of the
instances of the concept “function owner”. The
corresponding values of N(io) and N(im) are
presented and the resulting R(i)-values shown.
Table 4: Annotations of the concept “function owner”.

Function owner    N(io)   N(im)   R(i)
Robot               17      7     1.00
Machine              8      3     0.20
Conveyor belt        4      1     0.03
Barcode reader       4      1     0.03
Operator             2      2     0.03
The instance “robot” was annotated at 7 different positions in the document and was mentioned 17 times altogether by the four annotators. This identifies this instance as the most relevant one for the annotation of function owners.
4.2 Evaluation of the Automatic
Annotation
With the help of these ranking numbers, the automatic annotation can be evaluated. Table 5 shows, as an example, which terms have been annotated as function owners in the document by the automated annotation process. As illustrated, the
most important function owner (R(i) = 1) has been
identified. Nevertheless, some instances have not
been automatically annotated.
Table 5: Comparison with the automated annotation.

Function owner    R(i)    Autom. annotation
Robot             1.00    found
Machine           0.20    not found
Conveyor belt     0.03    found
Barcode reader    0.03    found
Operator          0.03    not found
The results of the evaluation of function owners, operations, and objects over the six documents were quite similar. Only the annotation of technical functions did not achieve the expected results. This can be explained by the fact that the linguistic algorithms do not properly recognise when an object and an operation constitute a technical function.
5 RELATED WORK AND
DISCUSSION
An overview of general methods and tools for
semantic annotation is given by Uren et al. (2006).
Uren et al. proposed seven requirements for
ontology-supported annotation and evaluated
twenty-seven annotation tools. In particular, automatic annotation was mentioned as an important field for further improvement. Corcho (2006) compared
different annotation approaches (ontology, thesauri
and controlled vocabulary) for supporting the
process of creating metadata. He identified
ontology-based annotation as the most powerful
annotation approach concerning the annotation of
relations between the instances of a document and
also emphasised the importance of improving
automated annotation. A domain ontology as
knowledge base for information retrieval is used to
improve search over large document repositories by
Vallet et al. (2005). In their approach, Vallet et al.
also used a label property to identify potential
occurrences of instances in the annotated documents.
The high amount of work for manual annotation and the subsequent merging makes this approach of only limited applicability to a larger number of documents and questionable concerning its statistical validity. Furthermore, the influence of
the personal background has to be considered when
interpreting the results of the manual annotations.
Nevertheless, in addition to the identification of
ranked instances for the annotation, this approach is useful in two ways: First, by analysing and
verifying the manual annotations, linguistic and
syntactic properties of the solution documents can be
identified. In a next step, these can be used to
deduce typical linguistic schemes (e.g. the syntax of
sentences) of solution documents for improving the
automated annotation. Secondly, the merging of the
manual annotation and its later validation is useful
for obtaining a set of well-annotated documents for
further evaluation of automatic annotations.
The findings of this work can be used in other
domains of knowledge where unstructured data has
to be annotated using a domain-specific ontology. In
this context, it has to be considered where the needed knowledge is stored. If only instances are used for the annotation, the ontology could become huge.
For example, if every function owner should be part
of the ontology, huge classifications or standards
have to be integrated. For instance, transferring the products and services categorisation standard eCl@ss into OWL yielded 75,000 ontology classes plus more than 5,000 properties (Hepp, 2006).
Alternatively, a combination of ontological knowledge and linguistic patterns (or rules) may be used for annotation. For example, modelling only the (technical) operations in the ontology and defining patterns that annotate a technical function as the combination of such an operation with an identified noun in the sentence would decrease the size of the ontology, as the number of technical operations is limited. However, the number of rules to be defined will increase.
What works best has to be judged considering the
relevant domain and the complexity of the modelled
knowledge.
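As an illustration of the second option, a minimal pattern rule might look as follows (Python). The operation list and the naive noun detection are assumptions; a real system would use a part-of-speech tagger:

    import re

    # Assumed, deliberately small set of modelled technical operations.
    OPERATIONS = {"transport", "sort", "palletize", "fill", "separate"}

    def functions_by_pattern(sentence):
        """Annotate 'operation + noun' pairs without modelling every
        object in the ontology. The word after the operation (skipping
        a determiner) is naively taken as the object."""
        words = [re.sub(r"[.,;]$", "", w.lower()) for w in sentence.split()]
        found = []
        for i, w in enumerate(words):
            op = re.sub(r"(es|s)$", "", w)          # crude stemming
            if op in OPERATIONS and i + 1 < len(words):
                j = (i + 2 if words[i + 1] in ("the", "a", "an")
                     and i + 2 < len(words) else i + 1)
                noun = re.sub(r"(es|s)$", "", words[j])
                found.append(f"{op} {noun}")
        return found

    print(functions_by_pattern("The robot transports the boxes"))
    # ['transport box']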
6 CONCLUSIONS AND
OUTLOOK
The analysis of solution documents done in this
research permits an insight into the content of
solution documents in the field of automation
technology. With the help of the proposed ranking
numbers, important instances can be identified
according to the manual annotations made by
different persons. These ranking numbers can subsequently be used for the evaluation of an automated annotation. The evaluation of the used prototype showed a need for improvement concerning
the annotation of related instances in the ontology.
To improve this annotation, further work will focus on the interpretation of the analyses made, in order to identify patterns in the syntax or layout of solution documents. Furthermore, the personal background of the manual annotators will be considered for the purpose of identifying individual requirements on the annotation. This will improve the automatic annotation and may also be instrumental in identifying the “core functions” of a technical solution.
ACKNOWLEDGEMENTS
This work has been funded by the German Federal Ministry of Economics and Technology (BMWi) through THESEUS. The authors wish to express their gratitude and appreciation to all the
during the development of various ideas and
concepts presented in this paper.
REFERENCES
Blumberg, R., and Atre, S. (2003). The Problem with Unstructured Data. DM Review, 13(2).
Corcho, O. (2006). Ontology based document annotation:
trends and open research problems. Int. J. of Metadata,
Semantics and Ontologies, 1(1), 47–57.
Dylla, N. (1990). Denk- und Handlungsabläufe beim
Konstruieren, PhD thesis, Technische Universität
München.
Gaag, A., Kohn, A., and Lindemann, U., (2009). Function-
based Solution Retrieval and Semantic Search in
Mechanical Engineering. In ICED’09, 17th
International Conference on Engineering Design,
Stanford, California, USA.
Hepp, M. (2003). Güterklassifikation als semantisches
Standardisierungsproblem. Wiesbaden: Deutscher
Universitäts-Verlag.
Hepp, M. (2006). Products and Services Ontologies: A
Methodology for Deriving OWL Ontologies from
Industrial Categorization Standards. Int. J. on
Semantic Web & Information Systems, 2 (1), 72-99.
Ponn, J., and Lindemann, U. (2008). Konzeptentwicklung
und Gestaltung technischer Produkte. Berlin:
Springer.
Pocsai, Z. (2000). Ontologiebasiertes Wissensmanagement
für die Produktentwicklung. PhD thesis, Technische
Universität Karlsruhe.
Ponn, J., Deubzer, F., and Lindemann, U., (2006).
Intelligent Search for Product Development
Information - an Ontology-based Approach. In:
DESIGN’06, 9th International Design Conference,
Dubrovnik, Croatia.
Salton, G. and McGill, M. J. (1986). Introduction to
Modern Information Retrieval. New York: McGraw-
Hill.
Uren, V., Cimiano, P., Iria, J., Handschuh, S., Vargas-Vera, M., Motta, E., and Ciravegna, F. (2006).
Semantic annotation for knowledge management:
Requirements and a survey of the state of the art.
Journal of Web Semantics, 4(1), 14-28.
Vallet, D., Fernández, M., and Castells, P., (2005). An
Ontology-Based Information Retrieval Model. In
ESWC’05, The Semantic Web: Research and
Applications, Second European Semantic Web
Conference. Heraklion, Crete, Greece.