Towards an Ontology-based Software Documentation Management

A Case Study

Anna Goy and Diego Magro

Dipartimento di Informatica, Università di Torino, C. Svizzera 185, Torino, Italy

Keywords: Software Documentation, Documentation Management, Ontology, Semantics.

Abstract: One of the main issues that a company has to face is the generation and maintenance of product documentation. In particular, several software houses must cope with the frequent need to rapidly update software applications, and the corresponding technical documentation, as a consequence of changes in administrative rules and laws. In order to support documentation generation and maintenance processes, we performed an ontological analysis of these processes in a large Italian software house that produces and sells enterprise applications for small-to-medium sized enterprises. The goal of this domain analysis was to build a conceptual model enabling a formal characterization of the main elements involved in software documentation. Such a formalization represents the “competence” of a system supporting documentation processes, since it enables the system to answer competency questions representing the information needs of the documentation writers (e.g., “In which technical sheets/application manuals/operating manuals is a given concept used?”; “Which technical sheets belonging to a given operating manual mention a given functionality/screenful/form field?”; “Which functionalities/screenfuls/technical sheets are potentially impacted by the change of a given software module/file?”).

1 INTRODUCTION

In this paper we present the outcomes of the

ontological analysis we performed on the

documentation processes of a large Italian software

house that produces and sells enterprise applications

for small-to-medium sized enterprises. One of the

main problems of the software house is the

generation and maintenance of product

documentation. In particular, the company has to

face the need to modify software applications,

and the corresponding software manuals, as a

consequence of the change of administrative rules

and laws (e.g., accounting rules). These kinds of

updates, concerning both applications and manuals,

must be rapid and occur frequently. For example, if

a new information item is required within an

administrative process, e.g., due to a change in the

law, a new field must be added in the application

user interface and in the database; moreover, the

update must be reported in the proper places within

the documentation.

Another process that impacts the generation and

maintenance of software documentation is software

localization, which requires the translation of the

corresponding instruction manuals. Moreover, the

different linguistic versions must be kept aligned in

order to avoid mismatches between, for instance, the

English manual and the Italian one.

Currently, the documentation and maintenance

processes are completely manual: company experts

write the documentation, update it, and know “by

heart” the relationships between software

modules, functionalities, and document parts, as well

as the relationships between different parts of the

documentation itself (e.g., the same functionality can

be mentioned in different parts of an instruction

manual, or in different manuals). The process is

quite complex, also because many writers take part

in it. A partial automation would help writers and

would reduce the introduction of errors within the

documentation (e.g., missing or outdated parts).

In order to study the feasibility of a tool

supporting the mentioned processes, we performed a

domain analysis aimed at identifying the main

concepts and relationships that characterize the

documentation itself and the software products,

considered as subjects of the documentation. The

domain analysis was conducted through interviews

and informal discussions with the experts from the

Goy A. and Magro D..

Towards an Ontology-based Software Documentation Management - A Case Study.

DOI: 10.5220/0004124001250131

In Proceedings of the International Conference on Knowledge Management and Information Sharing (KMIS-2012), pages 125-131

ISBN: 978-989-8565-31-0

© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

company, coupled with a detailed analysis of the

existing software documentation.

The goal of this paper is to show that a company's

documentation generation and maintenance

processes can be effectively supported by exploiting

a semantic model of the concepts and relationships

involved in them. In particular, the explicit and

formal representation of such a semantic model

represents a “documentation map” which can be

browsed by a software tool in order to provide

documentation writers with a guide based on the

relationships between products and documentation

items describing them. In other words, the semantic

model represents the system “competence” and

enables it to answer competency questions

expressing the information needs of the

documentation writers. Some examples of such

questions are the following: “Which concepts are used within glossary definitions?”; “In which technical sheets (application manuals/operating manuals) is a given concept used?”; “Which concepts (not) defined in the glossary are used within a given technical sheet (application/operating manual)?”; “Which technical sheets mention a given functionality?”; “Which functionalities (screenfuls/technical sheets) are potentially impacted by the change of a given software module (file)?”; and so on. A complete list of the competency questions can be found in the Appendix.

In the following, we will present the ontological

analysis (Section 2), by explaining the role of

general and domain-specific concepts and

relationships; then we will show how the semantic

model can be applied to a concrete case (Section 3),

and exploited to model documentation-related

concepts/relations and to perform useful inferences.

Finally, we briefly survey some related work

(Section 4) and conclude the paper (Section 5).

2 ONTOLOGICAL ANALYSIS

Several definitions for the notion of ontology in

Computer Science have been proposed. Here,

following Studer, Benjamins and Fensel (1998), we

consider an ontology as a “formal and explicit

specification of a shared conceptualization”. Thus,

an ontology is an artifact represented in a machine-

understandable way (“formal” in the terms of Studer

and colleagues), which accounts for a set of

concepts, relations and other entities providing a -

possibly simplified - view of some area of interest

(“conceptualization”) and which makes explicit the

assumptions and constraints on the usage of the

above-said concepts, relations, and entities

(“explicit”). Moreover, the conceptualization

accounted for should be accepted by a group of

people (“shared”).

The ontological analysis that we conducted on

the considered domain was mainly aimed at

identifying and explicitly representing the concepts,

relations and entities involved in software

documentation generation and maintenance

processes, as well as at providing an explicit account

for their main characteristics.

The derived ontology can be represented in a

machine-understandable language and can be used

to build and maintain an explicit characterization of

the elements involved in the documentation-related

processes. Such a characterization and the ontology

itself can then be exploited by the software

applications that support the documentation-related

processes, in order to answer the set of competency

questions listed in the Appendix and, ultimately, to

support people involved in the documentation

generation and maintenance processes.

In the following, we will illustrate the outcome

of our ontological analysis by discussing the main

features of the resulting ontology. We will depict

concepts and relations of the ontology by means of

Entity/Relationship diagrams, since this approach is

well-known within Italian software companies,

which are still more familiar with

Entity/Relationship models and relational databases than

with formal ontology languages based on XML.

The concepts and relationships that the domain

analysis has identified as relevant for the task at

hand can be grouped into two categories. First, there

are “general” concepts/relationships, representing

more general entities. These notions are directly

related to conceptual frameworks provided by upper

level core ontologies (Oberle, 2006), and are

characterized in the upper level of the produced

ontology. Second, there are domain “specific”

concepts/relationships representing notions relevant

to the company products and product

documentation. The latter are characterized in a

lower (domain) level of the produced ontology.

2.1 General Concepts

Among the concepts belonging to the more general

level of the ontology, a major role is played by those

representing information items. In order to account

for these concepts, and the relationships among

them, we exploited the semantic model defined by

the Ontology of Information Objects (OIO)

(Gangemi et al., 2005), developed at the Laboratory

KMIS2012-InternationalConferenceonKnowledgeManagementandInformationSharing

126

of Applied Ontology (ISTC-CNR, www.loa-cnr.it),

and by O-CREAM (Ontology for Customer

Relationship Management) (Magro and Goy, 2012),

developed at the Computer Science Department of

the University of Torino. In the conceptual

framework proposed by these two models (suitably

adapted to fit the modeling needs of the considered

domain), every Information Item has three aspects:

(a) Its meaning, i.e., the Information Content itself;

(b) The Language(s) in which the meaning is

expressed; (c) The support that physically realizes

the information object (Information Physical

Realization). Figure 1 shows some of the upper level

concepts and their taxonomic relations.

Figure 1: Partial view of the taxonomy of upper level

concepts.

2.2 Domain Specific Concepts

Domain specific concepts are those concepts

specifically characterizing software documentation

or software products.

The software documentation produced by the

company can be classified into four main categories:

(1) Application manuals (where the main

functionality of the software application is

described); (2) Operating manuals (where

instructions about how to operate on the user

interfaces are provided in detail); (3) Update notes

(explaining software updates); (4) Glossary

(explaining the business and technical terminology;

currently including more than 1,600 entries).

Figure 2 shows a fragment of the taxonomy

related to documentation concepts. A

Documentation Resource, which is a top level

concept representing a specific type of Information

Item, besides being internal (Internal Documentation

Resource) or public (Public Documentation

Resource), can belong to different typologies

(represented by its subclasses): Product

Documentation, Glossary, Glossary Entry, Technical

Sheet. Moreover, Product Documentation is the

superclass of Application Manual, Operating

Manual, and Update Note, while a Technical Sheet

can belong to any of these, thus being an

Application Manual Technical Sheet, an Operating

Manual Technical Sheet, an Update Note Technical

Sheet, or a Glossary Technical Sheet.

Figure 2: A fragment of the taxonomy of documentation-

related concepts.
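
As a minimal sketch (not part of the system described in this paper), the taxonomic links listed above can be encoded as a Python dictionary and traversed to obtain all the superclasses of a given class. The single-superclass encoding and the function name `ancestors` are illustrative assumptions; the internal/public distinction is omitted for brevity.

```python
# Direct subclass -> superclass links, transcribed from the taxonomy described
# in the text. Assumption: each class has a single direct superclass here.
SUPERCLASS = {
    "Documentation Resource": "Information Item",
    "Product Documentation": "Documentation Resource",
    "Glossary": "Documentation Resource",
    "Glossary Entry": "Documentation Resource",
    "Technical Sheet": "Documentation Resource",
    "Application Manual": "Product Documentation",
    "Operating Manual": "Product Documentation",
    "Update Note": "Product Documentation",
    "Application Manual Technical Sheet": "Technical Sheet",
    "Operating Manual Technical Sheet": "Technical Sheet",
    "Update Note Technical Sheet": "Technical Sheet",
    "Glossary Technical Sheet": "Technical Sheet",
}


def ancestors(cls):
    """Return every superclass of cls, nearest first (hypothetical helper)."""
    chain = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        chain.append(cls)
    return chain


print(ancestors("Glossary Technical Sheet"))
# ['Technical Sheet', 'Documentation Resource', 'Information Item']
```

A real ontology language would allow multiple superclasses per class; the dictionary above is only meant to make the subsumption chain concrete.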

The main concepts modeling the company

products are those related to software, as partially

shown in Figure 3. In particular, a Software Element

is a specific type of Information Item (according to

the notion of software specified in Oberle et al.,

2006) and can be a Software Module or a Software

Module Suite, i.e., an integrated set of software

modules. Moreover, both the concepts of Source

Code and Executable Code are particular types of

Software Elements.

Figure 3: A fragment of the taxonomy of product-related

concepts, showing concepts modeling software.

This part of the ontology also includes concepts

modeling software functionality and user interfaces,

which play a major role in software documentation.

Figure 4 shows the main concepts involved in the

user interface model, together with their taxonomic

relationships. User Interface Element, which is a

type of Information Physical Realization, has two

subclasses: Screenful, which can be simple (Simple

Screenful) or complex (Complex Screenful), and

TowardsanOntology-basedSoftwareDocumentationManagement-ACaseStudy

127

Screenful Element, which has several subclasses

referring to user interface elements, such as form

fields (Field).

Figure 4: A fragment of the taxonomy of user interface-

related concepts.

As far as the functionality is concerned, the most

important concept is Functionality, which represents

the general functions of a software product (e.g.,

complaints management) and is distinct from the

actual activities involved in the provision of such a

functionality. The role of the Functionality concept

within some ontological relations can be seen in

Figures 5 and 6.

2.3 General Relationships

The most general relationship is the has_part

relation, modeling the relationship between an object

and its parts. Moreover, there are relationships

connecting information items to their meanings

(expresses), to the languages used to express them

(is_encoded, is_completely_encoded), and to their

physical realizations (is_realized,

is_completely_realized). In addition, there is a

relationship linking semantically equivalent

information items (semantically_equivalent), and

there are relationships linking meanings

(Information Content) to the elements they are about

(talks_about), to the elements they identify

(identifies), and to the concepts they use (uses,

defines, characterizes).

These general relations can obviously be used to

characterize items belonging to more specific

classes; for instance, it is possible to specify the

parts of: an application manual, a technical sheet, a

glossary entry, a software suite, a single software

module, a functionality, a screenful, and so on.
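
To make the three aspects of an Information Item more concrete, the following minimal Python sketch models an item together with the expresses, is_encoded and is_realized relations. It is an illustration under stated assumptions, not the formal ontology: languages and physical realizations are reduced to plain strings, the file names are hypothetical, and semantic equivalence is read as "expressing the same Information Content".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class InformationContent:
    """The meaning carried by an information item."""
    label: str


@dataclass
class InformationItem:
    name: str
    expresses: InformationContent                          # the meaning
    is_encoded: List[str] = field(default_factory=list)    # languages
    is_realized: List[str] = field(default_factory=list)   # physical realizations


def semantically_equivalent(a: InformationItem, b: InformationItem) -> bool:
    """Assumed reading: two items are equivalent if they express the same content."""
    return a.expresses == b.expresses


# Two linguistic versions of one glossary share a single content object.
content = InformationContent("glossary content")
glossary_it = InformationItem("glossary_it", content, ["Italian"], ["glossary_it.pdf"])
glossary_en = InformationItem("glossary_en", content, ["English"], ["glossary_en.pdf"])

print(semantically_equivalent(glossary_it, glossary_en))  # True
```

Keeping the content in a separate object is what allows different linguistic versions to differ only in their language and realization.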

2.4 Domain Specific Relationships

The most relevant specific relationships are those

connecting each Software Element to elements of a

user interface (User Interface Element) and to the

functionality implemented by that software element

(specifies_UI_element, implements_functionality), as

shown in Figure 5.

Figure 5: Relationships linking software to functionality

and user interface elements.

3 APPLYING THE

ONTOLOGICAL MODEL

In this section we present some examples exploiting

the ontological conceptualization presented in

Section 2, in order to characterize elements

belonging to the company documentation system.

3.1 Modeling

The first example we would like to introduce shows

how the model supports the distinction between

linguistic representations and information contents.

This distinction enables us to represent both

common aspects and different features

characterizing distinct information items.

For example, consider the glossary, which is

composed of technical sheets. The ontology enables

us to state that a Glossary Technical Sheet is part of

a Glossary, which is expressed in some Language

(e.g., Italian or English). Now, let’s imagine that the

company decides to produce an Italian and an

English version of the glossary: the Languages in which the two Glossary instances are expressed are different (i.e., Italian and English), but all the other involved concepts and relations characterizing information contents are shared. To examine these common aspects in more detail, let’s

consider the “Studi di settore” technical sheet. The

expression “Studi di settore” (business sector

analysis) refers to a methodology used to estimate

company and self-employed worker receipts, by

KMIS2012-InternationalConferenceonKnowledgeManagementandInformationSharing

128

taking into consideration several parameters,

including their kind of business. By means of the

ontology, we can formally specify that the

considered technical sheet contains a Glossary Entry

which is composed of two parts, i.e., Glossary Entry

Name and Glossary Entry Definition. The Glossary

Entry Name identifies a specific Concept (i.e.,

business sector analysis) and the Glossary Entry

Definition expresses a Concept Definition, which

defines the named concept by using other elements

defined in the ontology, such as company, self-

employed worker, receipt, etc. By separately

representing linguistic and content information, the

ontology enables us to characterize the two instances

(the Italian and the English glossaries) by exploiting

the same semantic elements (concepts, relations, and

instances).

Another important aspect that is worth pointing

out is how the proposed ontology enables us to

represent the connections between software modules

and functionalities, as well as between software

modules and the files containing the corresponding

code. These connections enable us to link files to

functionalities, and this is an issue of major

importance: for example, if some code files are

modified, the mentioned relations tell writers

which functionalities are impacted, and

thus the manual parts that need to be updated. In the

following, we illustrate this case with an example.

In Figure 6, each node represents an instance and

is labeled by an instance name (in boldface),

followed by the name of the ontology class it

belongs to (in uppercase). In particular, the figure

shows two software modules, ma_base_mod and

ma_cost_mod; both are instances of the

Management Accounting Module class: the first one

represents the basic management accounting

software module, while the second one represents

the cost accounting module. Analogously, the figure

shows two functionality instances, ma_base_func

and ma_cost_func; both are instances of the

Management Accounting Functionality class, and

represent, respectively, the basic management

accounting and the cost accounting functionalities.

The relationships linking the software modules with

the implemented functionalities are instances of the

implements_functionality relation. Moreover, the

figure shows two file blocks (file_block_01 and

file_block_02), each one composed of some files;

the relationships linking the software modules with

the corresponding file blocks are instances of the

is_completely_realized relation, i.e., the relation that

links each information item to the entities providing

complete physical realizations for it.

Figure 6: Characterization of the relationships between

software modules, functionalities and files.

The semantic model described in Section 2

enables us to provide both a “physical”

characterization (which files realize the mentioned

software modules) and a “logical” characterization

(which functionalities are provided by the mentioned

modules). From both these perspectives, we can

specify sub-parts, i.e., parts of the file block, and

parts of the overall functionality.

The main functionality implemented by the

module ma_base_mod is the ma_base_func

functionality (see Figure 6), which is composed of

some sub-parts, including, for instance,

doc_mgm_mabase, which is an instance of

Management Accounting Functionality, and

represents the management of documents related to

the basic functions of management accounting;

analogously, the ma_cost_func functionality is

composed of some sub-parts, including, for instance,

doc_mgm_macost, which is an instance of

Management Accounting Functionality, and

represents the management of documents related to

the cost accounting functions.

Both doc_mgm_mabase and doc_mgm_macost

have, in turn, sub-parts (sub-functionalities), among

which doc_mgm_gen (referring to those aspects of

doc_mgm_mabase and doc_mgm_macost which can

be considered generic document management

functions). Finally, doc_mgm_gen includes, as a

part, the upload functionality.

As we will see in the next section, this characterization enables us to answer competency questions such as “Which functionalities are implemented by the x module?”, “In which software modules is the y functionality implemented?”, and “Which functionalities are potentially impacted by a change in the z file?”. Moreover, the ontology enables us to express the links between the documentation resources and the functionalities they are about; therefore, the system can also answer questions such as “Which technical sheets are potentially impacted by a change in the z file?”. These questions (and others, all listed in the Appendix) are of major relevance in documentation generation and maintenance processes.
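
The chain from a modified file to the impacted technical sheets can be sketched in a few lines of Python. This is only an illustration under several assumptions: the file names and technical sheet names are hypothetical, the pairing of each module with its file block is assumed, the talks_about links are attached directly to sheets rather than to their information contents, and sub-functionalities are ignored for brevity.

```python
# Parts of each file block (file names are hypothetical).
FILE_BLOCK_PARTS = {
    "file_block_01": {"base_01.src", "base_02.src"},
    "file_block_02": {"cost_01.src"},
}
# is_completely_realized: module -> file block (assumed pairing).
REALIZED_BY = {"ma_base_mod": "file_block_01", "ma_cost_mod": "file_block_02"}
# implements_functionality: module -> directly implemented functionalities.
IMPLEMENTS = {"ma_base_mod": {"ma_base_func"}, "ma_cost_mod": {"ma_cost_func"}}
# talks_about: technical sheet -> functionalities (sheet names are hypothetical).
TALKS_ABOUT = {
    "ts_base_overview": {"ma_base_func"},
    "ts_cost_overview": {"ma_cost_func"},
    "ts_summary": {"ma_base_func", "ma_cost_func"},
}


def impacted_functionalities(file_name):
    """Functionalities of every module whose file block contains file_name."""
    modules = {m for m, block in REALIZED_BY.items()
               if file_name in FILE_BLOCK_PARTS[block]}
    return set().union(*(IMPLEMENTS[m] for m in modules))


def impacted_sheets(file_name):
    """Technical sheets that talk about at least one impacted functionality."""
    funcs = impacted_functionalities(file_name)
    return {sheet for sheet, about in TALKS_ABOUT.items() if about & funcs}


print(sorted(impacted_sheets("cost_01.src")))
# ['ts_cost_overview', 'ts_summary']
```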

3.2 Inferences

Typically, only a part of an information system's

knowledge is explicitly represented: other

knowledge, in fact, can usually be inferred, and

made explicit, by applying reasoning mechanisms to

the explicit knowledge. In the following, we will

describe, with the help of some examples, the role

that such inferential mechanisms can play within a

documentation system.

First of all, for each individual in the

characterization, a reasoning process can infer all the

classes that individual is an instance of. Moreover,

some important inferences concern the part-of

relationships between elements. For example, from

the characterization of the concepts and relationships

concerning the glossary (see Section 2) and from the

specific features of the has_part relation itself (in

particular, the fact that it is reflexive and transitive)

the reasoner can infer that a Glossary Entry (a direct

part of a Glossary Technical Sheet, which is a direct

part of a Glossary in its turn), a Glossary Entry

Name, and a Glossary Entry Definition (both direct

parts of the considered Glossary Entry) are also parts

of the mentioned Glossary.
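
The part-of inference just described amounts to computing the reflexive and transitive closure of the direct has_part links. The Python sketch below illustrates this on the glossary example; the instance names and the helper `parts_of` are hypothetical, and a production system would delegate this step to an ontology reasoner.

```python
# Direct has_part links for the glossary example (instance names are hypothetical).
HAS_PART = {
    "glossary": {"studi_di_settore_sheet"},
    "studi_di_settore_sheet": {"studi_di_settore_entry"},
    "studi_di_settore_entry": {"entry_name", "entry_definition"},
}


def parts_of(whole):
    """All parts of `whole`, treating has_part as reflexive and transitive."""
    found = {whole}          # reflexivity: every object is a part of itself
    stack = [whole]
    while stack:             # transitivity: follow direct links repeatedly
        for part in HAS_PART.get(stack.pop(), ()):
            if part not in found:
                found.add(part)
                stack.append(part)
    return found


print(sorted(parts_of("glossary")))
```

The entry name and the entry definition appear in the result even though neither is directly linked to the glossary.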

Another example of useful inference supported

by the semantic model presented is the one enabling

us to answer competency questions like the

following (see Figure 6):

(a) “Which functionalities are implemented by the module ma_cost_mod?”. Given the characterization of the implements_functionality relation, if a module implements a functionality f, then it also implements all the sub-parts of f. Thus, from the presented representation, a reasoner can infer the answer: all the functionalities directly linked to the mentioned module by the implements_functionality relation, together with all their sub-parts, including, for instance, the upload function.

(b) “In which software modules is the upload functionality implemented?”. Again, from the ontology and the characterization of the items in terms of the ontology, a reasoner can infer the answer, which will include the module ma_cost_mod.
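
Questions (a) and (b) can be illustrated with a small Python sketch over the instances of Figure 6. It is a hand-written approximation of what a reasoner would derive, under the rule stated above that a module implementing a functionality also implements all its sub-parts; the helper names are hypothetical.

```python
# implements_functionality: direct links between modules and functionalities.
IMPLEMENTS = {"ma_base_mod": {"ma_base_func"}, "ma_cost_mod": {"ma_cost_func"}}
# has_part: direct sub-functionality links described in Section 3.1.
HAS_PART = {
    "ma_base_func": {"doc_mgm_mabase"},
    "ma_cost_func": {"doc_mgm_macost"},
    "doc_mgm_mabase": {"doc_mgm_gen"},
    "doc_mgm_macost": {"doc_mgm_gen"},
    "doc_mgm_gen": {"upload"},
}


def sub_parts(functionality):
    """All direct and indirect sub-parts of a functionality."""
    found, stack = set(), [functionality]
    while stack:
        for part in HAS_PART.get(stack.pop(), ()):
            if part not in found:
                found.add(part)
                stack.append(part)
    return found


def functionalities_of(module):
    """Question (a): direct functionalities plus all their sub-parts."""
    result = set()
    for f in IMPLEMENTS.get(module, ()):
        result |= {f} | sub_parts(f)
    return result


def modules_implementing(functionality):
    """Question (b): modules whose inferred functionalities include the given one."""
    return {m for m in IMPLEMENTS if functionality in functionalities_of(m)}


print(sorted(functionalities_of("ma_cost_mod")))
# ['doc_mgm_gen', 'doc_mgm_macost', 'ma_cost_func', 'upload']
print(sorted(modules_implementing("upload")))
# ['ma_base_mod', 'ma_cost_mod']
```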

Inferences like these are very important. For

instance, if I am writing an update note concerning a

given functionality, I need to know in which

modules it is implemented, in order to update all the

corresponding manual parts. Analogously, if

developers modify a given file, I need to know

which functionalities are potentially involved, in

order to properly document the changes.

4 RELATED WORK

The idea of using semantic technologies to support

software documentation processes has been

investigated by focusing on different aspects of the

problem. For instance, Kleiber, Sabol, Kern, Muhr

and Granitzer (2009) address the problem of keeping

the software documentation up to date, given the

reduced time-to-market development cycles, and

propose a support to the software documentation

process based on an underlying ontology modeling

software documentation activities and the structure

of the software itself. Hepp and Wechselberger

(2008) exploit ontologies modeling SAP-related

business and technical concepts in order to improve

the accessibility of ERP documentation search

systems.

There is some work on (semi)automatically

deriving ontologies from software documentation;

see (Sabou, 2004). Moreover, some approaches aim

at supporting software maintenance by exploiting

semantic technologies. For instance, Witte, Zhang

and Rilling (2007) propose to use ontologies in order

to connect and integrate knowledge about source

code and software documentation; the resulting

knowledge can be exploited to support software

maintenance. Similarly, Ambrósio, Santos, Lucena

and Silva (2004) present a tool that exploits

ontologies to integrate domain and software

engineering knowledge in order to help keep

software documentation up to date.

5 CONCLUSIONS

In this paper we presented an ontological analysis

aimed at developing a semantic model useful to

support the generation and maintenance of software

documentation. In particular, by describing

examples from a concrete use case, the paper shows

that providing a company documentation system

with an explicit semantic model of the concepts and

relationships involved in documentation generation

KMIS2012-InternationalConferenceonKnowledgeManagementandInformationSharing

130

and maintenance processes means increasing the

system “competence” about those processes; this

competence, in turn, enables the system to satisfy

the information needs of the documentation writers.

Such needs can be represented by the competency

questions in the Appendix.

ACKNOWLEDGEMENTS

This work has been partially funded by CELI s.r.l.

(www.celi.it).

REFERENCES

Ambrósio, A.P., Santos, D.C.d., Lucena, F.N.d., Silva,

J.C.d. (2004). Software Engineering Documentation:

an Ontology-based Approach. WebMedia & LAWeb

Joint Conference. Washington: IEEE Press, 38-40.

Gangemi, A., Borgo, S., Catenacci, C., Lehmann, J.

(2005). Task Taxonomies for Knowledge Content.

Metokis, Deliverable D07.

Hepp, M. and Wechselberger, A. (2008). OntoNaviERP:

Ontology-Supported Navigation in ERP Software

Documentation. International Semantic Web

Conference - ISWC2008, LNCS 5318. Heidelberg:

Springer, 764-776.

Kleiber, W., Sabol, V., Kern, R., Muhr, M., Granitzer, M.

(2009). Using Ontologies For Software

Documentation. Malaysian Joint Conference on

Artificial Intelligence - MJCAI2009. Kuala Lumpur,

Malaysia.

Magro, D., Goy, A. (2012). A core reference ontology for

the customer relationship domain. Applied Ontology,

7(1), 1-48.

Oberle, D. (2006). Semantic Management of Middleware.

Heidelberg: Springer.

Oberle, D., Lamparter, S., Grimm, S., Vrandečić, D.,

Staab, S., Gangemi, A. (2006). Towards Ontologies

for Formalizing Modularization and Communication

in Large Software Systems. Applied Ontology, 1(2),

163-202.

Sabou, M. (2004). Extracting Ontologies from Software

Documentation: a Semi-Automatic Method and its

Evaluation. In Workshop on Ontology Learning and

Population at ECAI 2004, Valencia, Spain.

Studer, R., Benjamins, V. R., Fensel, D. (1998).

Knowledge Engineering: Principles and Methods.

Data and Knowledge Engineering, 25(1-2), 161-197.

Witte, R., Zhang, Y., Rilling, J. (2007). Empowering

Software Maintainers with Semantic Web

Technologies. European Semantic Web Conference -

ESWC2007, LNCS 4519. Heidelberg: Springer, 37-52.

APPENDIX

A system based on a knowledge base such as the one

described in this paper can support the processes

devoted to the production and maintenance of the

company product documentation, by answering the

following competency questions:

• Which are the glossary elements?
• Which concepts are defined within the glossary?
• Which concepts are used within glossary definitions?
• Among the concepts used in glossary definitions, which are (not) defined in the glossary?
• Which glossary entries refer to other glossary entries?
• Which are the synonyms in the glossary?
• In which technical sheets (application manuals/operating manuals) is a given concept used?
• Which concepts is a given technical sheet (manual) about?
• Which concepts (not) defined in the glossary are used within a given technical sheet (application/operating manual)?
• Which documentation resources are (partially/completely) represented in Italian (English/…)?
• Which technical sheets mention a given functionality?
• Which functionalities are never mentioned in any manual?
• Which technical sheets belonging to a given operating manual mention a given screenful (form field)?
• Which screenfuls of a given software module are never mentioned in any operating manual?
• Which functionalities (screenfuls/technical sheets) are potentially impacted by the change of a given software module (file)?
• Which software modules could be involved in the change of a given technical sheet?