Extracting API Structures from Documentation to Create Virtual Knowledge Graphs

Maximilian Weigand [1, a], Felix Gehlhoff [1, b] and Alexander Fay [2, c]

[1] Institute of Automation Technology, Helmut-Schmidt-University / University of the Federal Armed Forces Hamburg, Germany

[2] Chair of Automation Technology, Ruhr University Bochum, Germany

Keywords:

Application Programming Interface (API), Virtual Knowledge Graph, Ontology, Software, Documentation.

Abstract:

Semantic Web technologies and standards have emerged as effective solutions for data exchange, including in engineering contexts. They provide a standardized way to exchange data between different software applications and facilitate interoperability. Within this work, we introduce a workflow to systematically analyze the structure of application programming interfaces (APIs) of software, enabling the efficient transformation of information available from the API into information models that are structured according to Semantic Web standards. Our goal is to create a reusable interface for engineering software on top of its API. The approach leverages shared concepts between object-oriented programming and knowledge graphs to abstract components of the API into a knowledge graph. The workflow allows developers to selectively extract relevant API components and automates the generation of the necessary code. To demonstrate the approach, we created an application that implements the workflow and applied it to the Java-based API of a modeling software, showcasing the reduction of manual effort.

1 INTRODUCTION

In modern engineering, engineers rely on a wide range of software applications. However, data exchange between these applications often poses a challenge due to limited interoperability.

Many of these software applications offer application programming interfaces (APIs), which make the software more open by providing a systematic and versatile means to extract information (Fay et al., 2013). However, implementing such an extraction is often time-consuming and tends to yield a specialized, non-reusable solution.

Ontologies and knowledge graphs (KGs), originating from the Semantic Web, have emerged as effective means for accurate information modeling and efficient data exchange in engineering fields, e.g. automation engineering (Dotoli et al., 2018; Ekaputra et al., 2017; Sabou et al., 2020). Their standardized formats make them ideal for defining and exchanging technical information.

In a previous publication (Weigand and Fay, 2022), we addressed the challenge of data exchange and interoperability between various software applications in engineering. We proposed an approach leveraging Semantic Web technologies to enhance data exchange by abstracting data available from the API of a software into a KG.

[a] https://orcid.org/0009-0000-1602-4873

[b] https://orcid.org/0000-0002-8383-5323

[c] https://orcid.org/0000-0002-1922-654X

In recent years, the concept of Digital Twins (DTs) has gained significant attention in engineering. A DT is essentially a digital representation of a physical asset. Ontologies and KGs have been discussed as potential solutions for DTs, as one of the main challenges remains creating interoperable and semantically unambiguous DTs from various data sources (Vogel-Heuser et al., 2021). Our approach can therefore support the creation of DTs in engineering by providing a structured method for extracting and representing data from engineering software in a standardized and reusable way.

In our previous publication (Weigand and Fay, 2022), the interface on top of APIs of engineering software was described generically and was adapted for an exemplary software to demonstrate its functionality. The adaption requires tasks such as specifying a set of relevant components of the API, i.e. components that represent information that should be abstracted into the KG. As the adaption proved to be quite labor-intensive, we show in this publication how this process can be structured into a workflow, implemented with tool support, and automated as much as possible.

Weigand, M., Gehlhoff, F. and Fay, A. Extracting API Structures from Documentation to Create Virtual Knowledge Graphs. DOI: 10.5220/0013083100003838. Paper published under CC license (CC BY-NC-ND 4.0). In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2024) - Volume 2: KEOD, pages 287-294. ISBN: 978-989-758-716-0; ISSN: 2184-3228.

In Section 2, we evaluate approaches that utilize the shared concepts of APIs and KGs by integrating them. Section 3 introduces requirements for the proposed workflow, which is presented in detail in Section 4. Section 5 demonstrates several steps of the workflow through an implementation, which is showcased by using it to analyze the API of an exemplary systems engineering software. Finally, Section 6 summarizes and assesses insights from the demonstrated use case.

2 FUNDAMENTALS AND RELATED WORK

2.1 Fundamentals

An API generally serves as a mechanism for a client software component to interact with a supplier software component by offering key functions of the supplier software while concealing their underlying implementation details (Reddy, 2011). In its simplest form, an API exposes a subset of the supplier software's source code, typically in the form of a library, and is therefore often called a library API. Today, the term API is more commonly used to refer to web APIs. In the context of this publication, however, the term API specifically refers to library APIs of engineering software, such as systems engineering or mechanical design software. Such software often provides APIs to enable developers to extend its functionality. Our focus is on APIs within the object-oriented programming (OOP) paradigm, as it is the prevailing approach in software development.

A KG is a graph-based data structure intended to represent real-world knowledge by representing entities of that knowledge as nodes and relations between entities as edges (Hogan et al., 2021). The Resource Description Framework (RDF) (Cyganiak et al., 2014) is a standard model for representing data on the Semantic Web using subject-predicate-object triples. RDF Schema (RDFS) (Brickley and Guha, 2014) extends RDF by providing a basic vocabulary for describing relationships and hierarchies, while the Web Ontology Language (OWL) (Hitzler et al., 2012) builds on RDF and RDFS to enable more complex semantic relationships within the knowledge. Typically, a KG is an RDF graph using RDFS or OWL vocabulary. Within this publication, the term KG specifically refers to RDF graphs using RDFS vocabulary.

2.2 Shared Concepts of Knowledge Graphs and Object-Oriented Data

KGs and data models within the OOP paradigm share several foundational concepts, primarily focused on representing information. Examples are classes in OOP and RDFS (rdfs:Class), objects in OOP and resources in RDFS (rdfs:Resource), or methods in OOP and properties in RDF (rdf:Property).

Within a previous publication (Weigand and Fay, 2022), we created an interface that utilizes these shared concepts to abstract the data available through an API of a software as a KG. The KG resulting from this abstraction retains the advantages of KGs, e.g. it can be queried using SPARQL (Harris and Seaborne, 2013), a standardized query language to retrieve information from KGs systematically. The metadata, e.g. the class structure of the API, is also contained in the KG, making it possible to query what data is available from the API. In our approach, the KG is not materialized, i.e. access to the API data and the abstraction into a KG occur only at the time of the query, making it a virtual KG (VKG). In summary, the VKG proposed in our previous publication performs the following abstractions:

• Classes of the API are abstracted as rdfs:Class.

• Class hierarchies are abstracted as rdfs:subClassOf properties.

• Definitions of public methods with no arguments (get-methods) are abstracted as rdf:Property, including the associated rdfs:domain and rdfs:range properties.

• Instances of classes are abstracted as rdfs:Resource, including an rdf:type property to the associated class and its superclasses.

• Instances that are returned when get-methods of other instances are executed are abstracted as an rdf:Property pointing to an rdfs:Resource or to a literal.
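As a minimal sketch of the first three abstractions listed above, the following Python snippet maps a class list and a get-method list to RDFS triples. The `ex:` prefix follows Listing 1; the `GetMethod` record, the `LITERAL_TYPES` table, and the treatment of CallBehaviorAction as a direct subclass of NamedElement are illustrative assumptions and not part of the published VKG interface.

```python
from dataclasses import dataclass

# Hypothetical mapping of Java return types to literal datatypes.
LITERAL_TYPES = {"String": "xsd:string", "boolean": "xsd:boolean", "int": "xsd:integer"}


@dataclass(frozen=True)
class GetMethod:
    name: str      # method name, e.g. "getName"
    owner: str     # API class that defines the method
    returns: str   # returned API class or Java literal type


def tbox_triples(superclasses, methods):
    """Map API classes and get-methods to (subject, predicate, object) triples."""
    triples = set()
    for cls, supers in superclasses.items():
        triples.add((f"ex:{cls}", "rdf:type", "rdfs:Class"))
        for sup in supers:
            triples.add((f"ex:{cls}", "rdfs:subClassOf", f"ex:{sup}"))
    for m in methods:
        prop = f"ex:{m.name}"
        rng = LITERAL_TYPES.get(m.returns, f"ex:{m.returns}")
        triples.add((prop, "rdf:type", "rdf:Property"))
        triples.add((prop, "rdfs:domain", f"ex:{m.owner}"))
        triples.add((prop, "rdfs:range", rng))
    return triples


# Simplified example data: the real hierarchy contains intermediate classes.
api_classes = {"NamedElement": [], "CallBehaviorAction": ["NamedElement"]}
api_methods = [GetMethod("getName", "NamedElement", "String")]
for triple in sorted(tbox_triples(api_classes, api_methods)):
    print(*triple)
```

The instance-level abstractions (the last two bullets) are not covered here, since in a virtual KG they are only evaluated at query time.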

In the previous publication, the concept of the VKG abstraction was explained, and we introduced a generic version of the VKG interface. This generic version includes key components such as the logic to execute a query over the VKG and is intended to be adapted to a specific software. While some adaptions must be done manually, others can be largely automated, though they were implemented manually in the initial demonstration in our previous publication. For instance, as shown above, each class of the API is abstracted as a class in the KG. Our goal within this work is to develop a structured approach to automate the extraction of such information from the API to reduce manual implementation effort.

DTO 2024 - Special Session on Ontologies for Digital Twin


2.3 Related Work

The shared concepts of OOP and KGs have been leveraged by other approaches, which are summarized in this section.

Object Triple Mapping (OTM) libraries integrate KGs into the data models of OOP applications. There are very mature OTM solutions, such as Alibaba, which offers a range of features and good performance (Ledvinka and Křemen, 2020). To use OTM libraries, a mapping between the OOP data model and the KG has to be defined within the source code of the application, for example by using code annotations. The library can then be used to store the application data in the form of a KG in a triple store. This allows the application to store data persistently, while also enabling other applications or end-users to access it independently. OTMs are deeply integrated into the application and must therefore be implemented during the development of the application.

If parts of the source code are intended for publication as an API, the code is typically documented. A popular built-in tool of Java Development Kits is Javadoc. Javadoc converts parts of the code, including information given in comments, into documentation, typically in HTML format, though the format can be chosen by selecting a specific module called a doclet. Similar tools exist for other OOP languages, as well as multi-language tools like Doxygen (www.doxygen.nl).

The doclet ontlet leverages this principle to extract the semantics of a Java library as an OWL ontology (Ancona et al., 2012). This ontology facilitates understanding the semantic similarities between different libraries, aiding in code migration efforts. While the approach does not require modifying the source code, it does require the source code to be available.

An alternative approach to create an ontology for a Java project is parsing it to generate an Abstract Syntax Tree (AST) and then converting the AST into RDF triples (Atzeni and Atzori, 2017). Unlike the previous approach, this approach can handle Java bytecode, meaning it accepts compiled code as input.

If an API to extend a software exists, the API documentation is usually available. Although API documentation is primarily created to be human-readable, it is also machine-readable due to its standardized structure. Consequently, a viable solution is using a mapping language like the RDF Mapping Language (RML) (De Meester et al., 2024), which maps various structured data sources (such as CSV or JSON) into RDF-formatted data. Using this rule-based approach, the complete API structure can be mapped.


3 REQUIREMENTS

The following requirements outline considerations and constraints for the steps of a structured approach to support and automate creating a VKG interface.

R1 (Brownfield Development Context): As outlined in Section 2.3, various OTM approaches exist that enable mapping an OOP data model within the source code of an API to a KG. These approaches are well-established, having been the subject of extensive previous research, and there are multiple mature implementations available. However, these existing approaches are generally designed for greenfield software development processes, where the source code is under development, i.e. it is accessible and modifiable. In contrast, the proposed approach (Weigand and Fay, 2022) is tailored for use with an existing library API, placing it in a brownfield software development context. In this context, the source code is not accessible and cannot be modified. Only a subset of components, specifically the API, is usable, but it is also not accessible as source code. It is, however, documented, for example with tools like Javadoc. R1 therefore restricts the input of the developed workflow to a structured documentation of the API.

R2 (Selective Extraction of Relevant Components): As the API documentation is a structured data source, it can be transformed fully automatically using a rule-based approach, for example by mapping each class of the API to an rdfs:Class. However, given that the API likely includes a large number of components, including some that represent irrelevant information, it is crucial to focus on extracting only specific components that are relevant for exchange via the VKG interface. In the context of engineering, e.g. mechanical design, a relevant class of the API might be one that represents geometrical features of a part, while an irrelevant class might be one that handles notifications of the user interface of the software. As the relevance of API components cannot be determined automatically, the selection of relevant API components must be carried out manually. However, selection steps in the workflow should be supported and facilitated as much as possible.

R3 (Automation): In contrast to the steps that must be performed manually due to R2, the remainder of the extraction process should be automated to the greatest extent possible. Automation serves two key purposes: first, to reduce implementation effort, such as automatically establishing subclass relationships when the developer adds a class and its subclass to the set of relevant classes; and second, to mitigate potential errors, such as verifying that when a developer adds a method, the class returned by the method is within the set of relevant classes.

R4 (Usability): During the manual or guided steps of the workflow, the developer should be able to work within a familiar environment. This necessitates reusing the graphical representation of the documentation, with minimal modifications to its appearance. Modifications should only introduce the required functions, such as the ability to add a class to the set of relevant classes.

4 DEVELOPED WORKFLOW

The workflow described within this section is a systematic approach to develop a VKG interface for an API. Its steps are displayed in Figure 1 using BPMN 2.0 (Object Management Group, 2011) and are explained in detail below.

S1 (Analyze Class Structure): In this initial step, the complete class structure of the API is analyzed. The input for this step is the structured documentation of the API (R1). Since the documentation is structured, this step can be fully automated and is modeled as a BPMN Service Task. A comprehensive analysis of the inheritance structure of all classes described in the API documentation is performed. The output of this step is a list of all classes, each with references to its superclasses. Inheritance relations are crucial information and part of the VKG; they are added to it in S2.2.
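The output of S1 can be pictured with a short sketch. It assumes that the direct superclasses of each class have already been read from the documentation into a dictionary; parsing the Javadoc HTML itself is omitted, and the hierarchy excerpt below is simplified and hypothetical.

```python
def all_superclasses(direct):
    """Return, for every documented class, its direct and indirect superclasses."""
    resolved = {}

    def visit(cls):
        if cls not in resolved:
            found = set()
            for sup in direct.get(cls, []):
                found.add(sup)
                found |= visit(sup)  # inherit the superclasses of the superclass
            resolved[cls] = found
        return resolved[cls]

    return {cls: visit(cls) for cls in direct}


# Hypothetical, simplified excerpt of a class hierarchy read from the documentation.
direct_superclasses = {
    "Element": [],
    "NamedElement": ["Element"],
    "ActivityNode": ["NamedElement"],
    "CallAction": ["ActivityNode"],
}
hierarchy = all_superclasses(direct_superclasses)
print(sorted(hierarchy["CallAction"]))
```

Lists are used for the direct superclasses because a Java class or interface may have several supertypes.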

S2 (Add Components from API Documentation): In this step, the developer is assisted in selecting relevant classes and methods of the API. The entire step is a BPMN Loop Sub-Process, meaning the contained sub-steps (S2.1 to S2.4) can be repeated as often as needed.

S2.1 (Select Relevant Classes): In this step, the developer selects relevant API classes from the entire class structure. The relevance of a class is determined based on whether it describes information that should be exchangeable via the new VKG interface. As detailed in R2, the relevance of a class must be determined by the developer. The developer is therefore assisted by a tool during this step, which is modeled as a BPMN User Task. According to R4, the tool reuses the graphical representation of the API documentation, e.g. the HTML Javadoc of a Java API. Only minimal modifications are made to introduce controls for adding classes to the set of relevant classes.

S2.2 (Add Superclass Relation): In this step, superclass relations are automatically added. A superclass relation originates from one class and points to its superclass. Subclass relations are not explicitly extracted, as they can be inferred from existing superclass relations. Superclass relations are added when a class is added and a superclass or subclass of that class has already been added. If the relation spans multiple classes and the intermediate classes have not been added, the relation is created directly between the added classes. These indirect superclass relations are corrected if an intermediate class is added later. The entire class hierarchy, extracted in S1, serves as the input for this process, which is executed each time a class is added. Consequently, this step is fully automatable (R3) and is modeled as a BPMN Service Task.
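One way to realize the behavior described in S2.2 is to recompute, after every addition, the nearest selected ancestors of each selected class. The sketch below is an assumption about how this could be done and not the implementation of our tool; the class names form a simplified, hypothetical hierarchy.

```python
def superclass_relations(direct, selected):
    """For each selected class, find its nearest selected ancestors."""

    def nearest(cls):
        found = set()
        for sup in direct.get(cls, []):
            if sup in selected:
                found.add(sup)          # relation to a selected superclass
            else:
                found |= nearest(sup)   # skip the unselected intermediate class
        return found

    return {cls: nearest(cls) for cls in selected}


direct = {
    "NamedElement": [],
    "ActivityNode": ["NamedElement"],
    "CallAction": ["ActivityNode"],
}

# Intermediate class not selected: CallAction is linked directly to NamedElement.
before = superclass_relations(direct, {"NamedElement", "CallAction"})
# Intermediate class added later: recomputation replaces the indirect relation.
after = superclass_relations(direct, {"NamedElement", "ActivityNode", "CallAction"})
print(before["CallAction"], after["CallAction"])
```

Because the relations are derived from the full hierarchy each time, no separate correction step is needed. With multiple supertypes, this simple version may also emit relations that are already implied by others; pruning them is left out here.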

S2.3 (Select Relevant Methods): Similar to classes in S2.1, public methods of classes are now selected and added by the developer. The selection is restricted to methods without arguments, as they can be mapped directly to an RDF triple. A triple represents a directed relationship between two entities and is the fundamental building block of an RDF graph. A method with arguments would imply a parametrization of the relationship, which cannot easily be represented in RDF. Therefore, methods with arguments cannot be selected. As with S2.1 and in accordance with R2, the developer must determine which methods represent relevant relationships. Therefore, this step is also modeled as a BPMN User Task. According to R4, the functionality for adding a method must also be integrated into the existing graphical representation of the API documentation.
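The restriction to argument-free methods can be illustrated as follows. The `MethodDoc` record, the `Action` stand-in class, and the identifier `ex:action1` are hypothetical and serve only to show why a get-method corresponds to exactly one triple per returned value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodDoc:
    name: str
    parameters: tuple       # parameter types as listed in the documentation
    is_public: bool = True


def selectable_methods(methods):
    """Keep only public methods without arguments (get-methods)."""
    return [m for m in methods if m.is_public and not m.parameters]


def to_triple(subject_id, instance, method_name):
    """Call a get-method and express the result as one triple."""
    value = getattr(instance, method_name)()
    return (subject_id, f"ex:{method_name}", value)


class Action:  # stand-in for an API object, purely illustrative
    def getName(self):
        return "open gripper"


docs = [MethodDoc("getName", ()), MethodDoc("setName", ("String",))]
allowed = [m.name for m in selectable_methods(docs)]
triple = to_triple("ex:action1", Action(), "getName")
print(allowed, triple)
```

A method such as `setName` has no place in this scheme: its argument would have to be encoded somewhere between subject, predicate, and object.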

S2.4 (Add Missing Classes): If the method added in the previous step returns an instance of a class that is not within the set of selected classes, that class is added automatically in this step. This step is based on information available from previous steps and is executed fully automatically each time a method is added (R3). Consequently, this step is modeled as a BPMN Service Task.
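A minimal sketch of S2.4, assuming that methods are stored as (owner, name, return type) tuples and that literal return types are known in advance; both are assumptions made for illustration. The `getActivity` example is taken from the method table in Figure 2.

```python
LITERAL_TYPES = {"String", "boolean", "int"}  # assumed set of literal return types


def add_method(classes, methods, owner, name, returns):
    """Record a selected method and add its return class if it is missing.

    Returns True if a class had to be added automatically.
    """
    methods.append((owner, name, returns))
    if returns in LITERAL_TYPES or returns in classes:
        return False
    classes.add(returns)
    return True


classes = {"NamedElement", "ActivityNode"}
methods = []
added_for_activity = add_method(classes, methods, "ActivityNode", "getActivity", "Activity")
added_for_name = add_method(classes, methods, "NamedElement", "getName", "String")
print(sorted(classes))
```

Collection-valued return types, such as the reference lists in Figure 2, would additionally require extracting the element class; this is not shown here.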

S3 (TBox Generation): Once S2 has been completed, the TBox of the VKG, i.e. its terminological part consisting of classes and properties, is fully defined. Although the TBox can later be queried via the VKG interface, an optional early export is possible at this stage. The TBox can be exported in standard formats such as RDF/XML. The export process relies entirely on the information defined in the previous steps. While the developer decides whether to initiate the export, the remainder of this step is executed automatically. Consequently, this step is modeled as a BPMN Service Task.
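To illustrate such an export without external libraries, the following sketch serializes TBox triples as N-Triples, a simpler line-based RDF syntax, instead of the RDF/XML format mentioned above. The namespace `http://example.org/api#` is a placeholder assumption; the RDF and RDFS namespaces are the standard W3C ones.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/api#",  # hypothetical namespace for API terms
}


def expand(term):
    """Turn a prefixed name such as 'rdfs:Class' into an absolute IRI in angle brackets."""
    prefix, local = term.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "".join(f"{expand(s)} {expand(p)} {expand(o)} .\n" for s, p, o in sorted(triples))


tbox = [
    ("ex:NamedElement", "rdf:type", "rdfs:Class"),
    ("ex:CallBehaviorAction", "rdf:type", "rdfs:Class"),
    ("ex:CallBehaviorAction", "rdfs:subClassOf", "ex:NamedElement"),
]
print(to_ntriples(tbox), end="")
```

In practice, an RDF library would typically be used to write RDF/XML or Turtle from the same triples.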

S4 (Code Generation): In this step, parts of the code for the VKG interface are generated. The VKG interface operates using two lists. One list includes all classes and is primarily created in S2.1. Each class specifies its direct superclasses as established in S2.2. The other list contains the methods selected in S2.3. Each method indicates the class it originates from and the class or the type of literal it returns. This information is transformed into a specific code format, as defined by the implementation of the generic version of the VKG interface, which is publicly available on GitHub (github.com/mxweigand/vmax_core). Since the required information is provided by the previous steps and the existing code, this step is fully automated and is therefore modeled as a BPMN Service Task.

Figure 1: Steps of the developed workflow.
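The principle of S4 is template-based generation from the two lists. The sketch below emits Java-like statements; the names `ApiClass`, `ApiMethod`, `classes`, and `methods` in the generated text are purely hypothetical and do not reproduce the code format expected by the generic VKG interface.

```python
def generate_code(classes, methods):
    """Render one Java-like statement per class and per method (hypothetical format)."""
    lines = []
    for name in sorted(classes):
        supers = ", ".join(f'"{s}"' for s in sorted(classes[name]))
        lines.append(f'classes.add(new ApiClass("{name}", List.of({supers})));')
    for owner, name, returns in methods:
        lines.append(f'methods.add(new ApiMethod("{owner}", "{name}", "{returns}"));')
    return "\n".join(lines)


# Class list with direct superclasses (S2.1, S2.2) and method list (S2.3).
selected_classes = {"NamedElement": [], "CallBehaviorAction": ["NamedElement"]}
selected_methods = [("NamedElement", "getName", "String")]
code = generate_code(selected_classes, selected_methods)
print(code)
```

Sorting the class names keeps the generated output stable across runs, which makes it easier to review changes between generations.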

S5 (Code Completion): In this step, the developer completes parts of the code that cannot be automatically generated. Completion involves creating implementations of abstract classes defined by the generic version of the VKG interface. The required completions, such as the functionality to identify the class of an instance, are therefore specified by the generic version of the VKG interface. However, the implementation of these completions is highly dependent on the specific software being used. As a result, this step cannot be automated or supported by a tool and relies entirely on the developer's input. Therefore, it is modeled as a BPMN Manual Task.

S6 (Code Compilation): The code now consists of the generic version of the VKG interface, the automatically generated code from S4, and the manually added code from S5. The compilation is performed automatically by a compiler within this step, which is modeled as a BPMN Service Task.

S7 (Deployment): This final step involves deploying the compiled VKG interface. This may include tasks such as copying the VKG interface to a dedicated add-on folder or creating additional descriptive files that enable the software to recognize the VKG interface as an add-on. The specific actions required are highly dependent on the software in question. It is therefore modeled as a BPMN Manual Task.

5 IMPLEMENTATION

To demonstrate our approach, we implemented a tool that automates and supports several steps of the developed workflow. This includes steps S1, S2 (with all its sub-steps S2.1 to S2.4), and S4. Step S3 was omitted as it is optional. Step S5 was omitted because it involves manual input and is intended to be completed by the developer working directly on the source code of the VKG interface. Step S6, which is fully automated and handled by a compiler, was also excluded, as we did not integrate the compiler into our tool. Step S7 was also omitted because it is a manual task. Blue borders in Figure 1 indicate the implemented steps, while grey borders represent the steps that were not implemented.

The tool is built around the HTML API documentation of a Java API, which was generated by Javadoc. Our implementation utilizes NestJS (docs.nestjs.com), a Node.js (nodejs.org) framework typically used to build server-side applications, particularly for the backends of web applications, using TypeScript. NestJS can also be used to serve HTML documents as a frontend, in our case the Javadoc HTML documents of the API documentation. Before serving each document, the tool intercepts and modifies it by injecting additional HTML components and JavaScript functions. These modifications enhance the document with interactive elements needed for the workflow. After these changes, the document is sent to the web browser, which displays it with the new components integrated. In addition to the frontend, which serves the modified HTML documents, we also set up a backend. The backend handles tasks such as executing S1, which involves analyzing the entire API documentation, or, when the developer selects a class in S2.1, adding the class to a class list and automatically executing subsequent steps of the workflow, such as S2.2.

[Figure 2 is a screenshot of a modified Javadoc page: the navigation bar with the added SAVE, CLEAR and GENERATE controls, a method table (e.g. getActivity(), getIncoming(), getOutgoing()) with added C and M buttons and +/- toggles, and an information dialog for the class CallAction listing ActivityNode as an added superclass.]

Figure 2: Method section of the Javadoc API documentation of a class with additional control elements.
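The interception step can be illustrated independently of NestJS. The following Python sketch, which is not the TypeScript code of our tool, appends two buttons after every link whose `title` attribute starts with "class in" or "interface in"; this attribute pattern is an assumption about the Javadoc HTML output, and the `data-class` attribute is a hypothetical hook for the injected JavaScript.

```python
import re

# Assumed pattern of Javadoc links to class pages: <a href="..." title="class in pkg">Name</a>
CLASS_LINK = re.compile(r'(<a\s[^>]*title="(?:class|interface) in [^"]*"[^>]*>([^<]+)</a>)')


def inject_controls(html, selected):
    """Append an info button (C) and an add/remove button (+ or -) after each class link."""

    def with_buttons(match):
        link, name = match.group(1), match.group(2)
        sign = "-" if name in selected else "+"   # "-" removes an already added class
        return (
            f'{link}<button data-class="{name}">C</button>'
            f'<button data-class="{name}">{sign}</button>'
        )

    return CLASS_LINK.sub(with_buttons, html)


page = '<td><a href="Activity.html" title="class in pkg">Activity</a></td>'
print(inject_controls(page, selected=set()))
print(inject_controls(page, selected={"Activity"}))
```

A regular expression keeps the sketch short; a production implementation would more robustly use an HTML parser, since attribute order and whitespace may vary between Javadoc versions.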

For the demonstration, we used an example software for which we had already developed and tested the VKG interface in a previous publication (Weigand and Fay, 2022). In that earlier work, the interface was created manually, which proved to be a time-consuming process. The software used in the prior study was Cameo Systems Modeler by Dassault Systèmes. In this demonstration, we utilized a very similar software: Magic Systems of Systems Architect (MSOSA). MSOSA is a multi-purpose modeling software that supports various modeling languages, including SysML. This software provides a suitable environment to showcase our tool's ability to streamline the VKG interface creation process, improving efficiency compared to the fully manual approach previously employed. MSOSA features a Java API that allows for modification of the software's functionality. This API can also be leveraged for our purpose: extracting engineering information. The API is documented using Javadoc, which forms the basis for our tool. We developed a prototypical tool (github.com/mxweigand/vmax_api_doc_browser) that implements the previously outlined steps (S1, S2 and S4), using the Javadoc API documentation of MSOSA to automate parts of the process and simplify the extraction of relevant information.

Pages of Javadoc documentation typically display information about a particular class, including its public methods. Figure 2 shows an exemplary page of the API documentation of MSOSA as modified by our tool. The original API documentation is publicly available online (jdocs.nomagic.com/2024x). At the top right of the page, within the navigation bar, general tools are available, e.g. to generate code for the interface based on the currently added API components (GENERATE). After each link that leads to a page detailing another class, two buttons are added. The first button, marked with C, indicates that the link directs to a class. Clicking this button opens an informational dialog, as illustrated on the right side of Figure 2. The second button (+ or -) allows developers to add the class to or remove it from the list of relevant classes. After each link leading to a method of the class, a similar button, marked with M, is added. The informational dialog shows whether the class has been added to the list, along with all subclasses and superclasses, including implicit ones. Additionally, the dialog displays all methods of the class that have been added, along with their return types. Links to the corresponding methods or classes are provided, allowing easy navigation within the Javadoc documentation.

Figure 3: Example robotic process modeled in a SysML Activity Diagram (element types indicated in grey boxes).

Figure 4: Extracted API structure.

To demonstrate the effectiveness of the tool, we selected a specific set of classes and methods using the tool and then generated the corresponding code for the VKG interface. In our previous publication (Weigand and Fay, 2022), we introduced a generic version of the VKG interface, which is available on GitHub (github.com/mxweigand/vmax_core). This generic version includes key components such as the logic to execute a query over the VKG and is intended to be adapted to a specific software. The adaption to MSOSA is also available on GitHub (github.com/mxweigand/vmax_plugin_msosa). As an example, we used a SysML Activity Diagram created in MSOSA, shown in Figure 3, in which the diagram element types are indicated using grey boxes. The diagram shows a simple pick-and-place process for a robotic manipulator on a movable platform. The objective was to extract all information represented by the diagram elements via the VKG interface. Information contained in the description of this pick-and-place process might be relevant in further engineering phases, such as programming the robotic manipulator that executes the shown process. The resulting UML class diagram, shown in Figure 4, illustrates the TBox of the VKG. The TBox was created by browsing the Javadoc with the developed tool and adding the 10 classes and 3 methods shown in Figure 4, which took only a few minutes. The generated code amounted to 6 lines of code per class and 7 lines of code per method, i.e. a total of 10 × 6 + 3 × 7 = 81 lines of code.

Subsequently, additional steps were taken to complete and compile the entire VKG interface. A sample SPARQL query (see Listing 1) was created. The query retrieves the names (?name) of all resources that are of type ex:CallBehaviorAction and have a property ex:getName. As shown in Figure 4, the getName method of the API is defined by the class NamedElement. The class CallBehaviorAction of the API is a subclass of this class and inherits the method. The results of the query, also shown in Listing 1, correctly included the names of all 6 CallBehaviorAction instances shown in Figure 3. Instances of other subclasses, e.g. the InitialNode named init, were not returned by the query.

    SELECT ?name WHERE {
      ?action a ex:CallBehaviorAction .
      ?action ex:getName ?name .
    }

    RESULTS
    -----------------------------------
    | name                            |
    ===================================
    | "move arm to target position"   |
    | "move platform"                 |
    | "move arm to object position"   |
    | "detect object"                 |
    | "open gripper"                  |
    | "close gripper"                 |
    -----------------------------------

Listing 1: Exemplary SPARQL query to extract names of CallBehaviorActions. Prefixes were omitted to save space.
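The query result follows from two of the abstractions in Section 2.2: an instance carries an rdf:type for its class and all superclasses, and an inherited get-method is available as a property. The sketch below mimics this with plain Python classes; the two-level hierarchy and the helper names are simplifying assumptions, not the MSOSA API or the query engine of the VKG interface.

```python
class NamedElement:
    def __init__(self, name):
        self._name = name

    def getName(self):
        return self._name


class CallBehaviorAction(NamedElement):  # simplified: intermediate classes omitted
    pass


class InitialNode(NamedElement):  # simplified: intermediate classes omitted
    pass


def rdf_types(obj):
    """Class of the instance plus all its superclasses, as in the VKG abstraction."""
    return {cls.__name__ for cls in type(obj).__mro__ if cls is not object}


def query_names(instances, class_name):
    """Counterpart of: ?x a ex:<class_name> . ?x ex:getName ?name ."""
    return [obj.getName() for obj in instances if class_name in rdf_types(obj)]


model = [InitialNode("init"), CallBehaviorAction("move platform"), CallBehaviorAction("open gripper")]
print(query_names(model, "CallBehaviorAction"))
print(query_names(model, "NamedElement"))
```

Querying for NamedElement instead of CallBehaviorAction would also return the InitialNode, since both classes share this superclass.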

6 CONCLUSION AND FUTURE WORK

The objective of this publication was to develop a workflow to automate and support the creation of a VKG interface for the API of an engineering software. Several steps of the developed workflow were implemented in a tool that significantly reduced the implementation effort for the VKG interface. The process was streamlined by displaying the documentation as true to the original as possible, with the additional required features added in a minimalist and intuitive way. The navigational functions of the Javadoc were preserved, ensuring a familiar environment for the developer. Automated steps reduced manual effort while minimizing errors, ensuring an efficient and accurate workflow for the VKG interface creation. Several steps not covered in the tool's implementation but included in the developed workflow were carried out manually. As discussed earlier, these steps are not automated due to their complexity and the need for customization. However, creating guidelines and defining requirements for these manual steps would still be beneficial. Such guidelines would assist in checking prerequisites ahead of time to determine whether the workflow is applicable to a particular API. They would also help streamline the manual steps, making them more efficient, while reducing the likelihood of errors during the process. This will be a focus of future research aimed at enhancing the workflow.

ACKNOWLEDGEMENTS

This research is part of the project iMOD which is funded by dtec.bw – Digitalization and Technology Research Center of the Bundeswehr. dtec.bw is funded by the European Union – NextGenerationEU.

REFERENCES

Ancona, D., Mascardi, V., and Pavarino, O. (2012). Ontology-based documentation extraction for semi-automatic migration of Java code. In The 27th Annual ACM Symposium on Applied Computing.

Atzeni, M. and Atzori, M. (2017). CodeOntology: RDF-ization of source code. In The Semantic Web – ISWC 2017.

Brickley, D. and Guha, R. (2014). RDF Schema 1.1. W3C Recommendation. http://www.w3.org/TR/rdf11-schema/.

Cyganiak, R., Wood, D., and Lanthaler, M. (2014). RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation. http://www.w3.org/TR/rdf-concepts/.

De Meester, B., Heyvaert, P., and Delva, T. (2024). RDF Mapping Language (RML) Unofficial Draft 20 June 2024. https://rml.io/specs/rml/.

Dotoli, M., Fay, A., Miśkowicz, M., and Seatzu, C. (2018). An overview of current technologies and emerging trends in factory automation. International Journal of Production Research, 57(15-16).

Ekaputra, F. J., Sabou, M., Serral, E., Kiesling, E., and Biffl, S. (2017). Ontology-Based Data Integration in Multi-Disciplinary Engineering Environments: A Review. Open Journal of Information Systems, 4(1).

Fay, A., Biffl, S., Winkler, D., Drath, R., and Barth, M. (2013). A method to evaluate the openness of automation tools for increased interoperability. In IECON 2013 - 39th Annual Conference of the IEEE Industrial Electronics Society.

Harris, S. and Seaborne, A. (2013). SPARQL 1.1 Query Language. W3C Recommendation. https://www.w3.org/TR/sparql11-query/.

Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P. F., and Rudolph, S. (2012). OWL 2 Web Ontology Language Primer (Second Edition). W3C Recommendation. https://www.w3.org/TR/owl2-primer/.

Hogan, A., Blomqvist, E., Cochez, M., d'Amato, C., Melo, G. D., Gutierrez, C., Kirrane, S., Gayo, J. E. L., Navigli, R., Neumaier, S., et al. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4).

Ledvinka, M. and Křemen, P. (2020). A comparison of object-triple mapping libraries. Semantic Web, 11(3).

Object Management Group (2011). Business Process Model and Notation (BPMN) Version 2.0. https://www.omg.org/spec/BPMN/2.0/PDF.

Reddy, M. (2011). API Design for C++. Morgan Kaufmann.

Sabou, M., Biffl, S., Einfalt, A., Krammer, L., Kastner, W., and Ekaputra, F. J. (2020). Semantics for Cyber-Physical Systems: A Cross-Domain Perspective. Semantic Web, 11(1).

Vogel-Heuser, B., Ocker, F., Weiß, I., Mieth, R., and Mann, F. (2021). Potential for combining semantics and data analysis in the context of digital twins. Phil. Trans. R. Soc. A, 379:20200368.

Weigand, M. and Fay, A. (2022). Creating Virtual Knowledge Graphs from Software-Internal Data. In IECON 2022 - 48th Annual Conference of the IEEE Industrial Electronics Society.

DTO 2024 - Special Session on Ontologies for Digital Twin
