Knowledge Graph Enrichments for Credit Account Prediction

Michael Schulze

1,2

and Andreas Dengel

1,2

Computer Science Department, Rheinland-Pf

alzische Technische Universit

at Kaiserslautern-Landau (RPTU), Germany

Smart Data & Knowledge Services Departm., Deutsches Forschungszentrum f

ur K

unstliche Intelligenz (DFKI), Germany

{ﬁrstname.lastname}@dfki.de

Keywords:

Knowledge Graphs, RDF, OWL, Rule-Based Link Prediction, Case-Based Reasoning.

Abstract:

For the problem of credit account prediction on the basis of received invoices, this paper presents a pipeline

consisting of 1) construction of an accounting knowledge graph, 2) enrichment algorithms, and 3), prediction

of credit accounts with methods of a) rule-based link prediction, b) case-based reasoning, and c) a combination

of both. Explainability and traceability have been key requirements. While preserving the order of invoices

in cross-fold validation, key ﬁndings in our scenario are: 1) using all enrichments from the pipeline increases

prediction performance up to 12.45 percent points, 2) single enrichments are useful on their own, 3) case-

based reasoning beneﬁts most from having enrichments available, and 4), the combination of link prediction

and case-based reasoning yields best prediction results in our scenario.

Paper page: https://git.opendfki.de/michael.schulze/account-prediction.

1 INTRODUCTION

A typical task of accountants is the assignment of

accounts based on received invoice data. For exam-

ple, on an invoice where employees visited a restau-

rant for a business meeting, the accountant needs

to ﬁrst assess the invoicing case correctly, and then

needs to ﬁnd and choose the correct ﬁnancial account.

Through interviews with accountants from our busi-

ness partner, we learnt that accountants in such situ-

ations face two major challenges: On the one hand,

as also reported by Bardelli et al. (Bardelli et al.,

2020), the search space regarding ﬁnding and choos-

ing accounts can be very high. For example, when ac-

countants are responsible for multiple company sub-

sidiaries, accountants need to tap into multiple ac-

counting handbooks where each handbook can be

several hundred pages long. On the other hand, to

assess the invoicing case correctly, accountants usu-

ally need to tap into several heterogeneous sources

of information. For instance, in the example of the

restaurant visit, the accountant may check a separate

participation list to decide whether guests have been

present or not. This would then result in different ac-

counts. This list may come from another ERP system

or, for example, by e-mail or some other intranet re-

source. Other sources of information can be, among

others, organizational policies (e.g., regarding han-

dling restaurant visits), accounting charts, tax regu-

lations, or other attachments. These challenges can

lead to long search times, especially for non-standard

cases, for novice accountants, and for new employees.

To tackle these challenges, this paper focuses on the

outlined problem of credit account prediction on the

basis of received invoices. The term credit account

is used here in accordance with related work, such

as in (Belskis et al., 2021) and (Belskis et al., 2020).

In line with insights gained from interviews with ac-

countants, and underlined by experiments reported in

Bardelli et al. (Bardelli et al., 2020), throughout ac-

count prediction literature (e.g. (Panichi and Lazzeri,

2023; Bardelli et al., 2020)), credit account predic-

tion is considered as especially challenging. How-

ever, current research focuses mainly on the goal of

automatizing account prediction (Bardelli et al., 2020;

Panichi and Lazzeri, 2023; Belskis et al., 2021); also

with the explicit goal to exclude knowledge work-

ers (e.g. (Belskis et al., 2021)). In contrast, but in

line with forecasts from Jain and Woodcock (Jain and

Woodcock, 2017), fully automatization is not seen

as realistic. This assessment is also underlined by

reported performance numbers in related work (e.g.

(Panichi and Lazzeri, 2023; Bardelli et al., 2020; Bel-

skis et al., 2021)), where results do not meet the level

of accuracy an automated system would require in

practice. Furthermore, through interviews with ac-

countants where guidelines from Lazar et al. (Lazar

et al., 2017) have been followed, explainability and

Schulze, M. and Dengel, A.

Knowledge Graph Enrichments for Credit Account Prediction.

DOI: 10.5220/0013181000003890

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 17th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2025) - Volume 2, pages 441-452

ISBN: 978-989-758-737-5; ISSN: 2184-433X

441

traceability have been identiﬁed as key requirements.

While focusing on automatization, state-of-the-art lit-

erature does not consider these aspects as require-

ments in their approaches so far.

Therefore, a different perspective on account pre-

diction research is proposed: Instead of focusing

on the goal of full automatization, we propose fo-

cusing on the goal of assisting accountants in their

daily decision-making for the problem of account as-

signment. For credit account prediction, knowledge

graphs (Ehrlinger and W

oß, 2016) can be on the one

hand useful to put prediction results as well as enti-

ties occurring in their explanations in the context of an

existing accounting knowledge graph (Schulze et al.,

2022), thus integrating them with other heterogenous

accounting resources, and on the other hand useful

to leverage the structure of knowledge graphs for ma-

chine learning approaches to predict accounts, such as

link prediction as proposed for future work by Panichi

and Lazzeri (Panichi and Lazzeri, 2023).

This paper aims to continue these lines of thought

in related work and addresses therefore the follow-

ing research questions: 1) how to predict credit ac-

counts using knowledge graphs with approaches that

can provide explanations and traceable predictions,

and 2), how to improve predictions by enriching such

knowledge graphs based on internal graph data avail-

able. Regarding explainability, the scope of the pa-

per is to consider these requirements in the prediction

approaches. Because the paper explores the feasibil-

ity and performance of such approaches, data-driven

evaluation will be conducted. Therefore, the quali-

tative assessment on the daily usability of different

explanation styles is beyond the paper’s scope.

Section 2 presents related work on credit account

prediction and Section 3 the pipeline to predict ac-

counts. Section 4 presents the evaluation setup and

evaluation results which are then compared and dis-

cussed in Section 5. Finally, Section 6 draws conclu-

sions and gives an outlook for future work.

2 RELATED WORK

The research ﬁeld of credit account prediction is get-

ting more and more traction. One reason is the in-

creasing adoption of electronic invoices (Koch, 2019),

which is in Europe also politically enforced, and with

that the availability of invoice data in a digitized form.

With the goal of automatization, Bardelli et al.

(Bardelli et al., 2020) predicted accounts based on

electronic invoices. Available to the authors were two

datasets from accounting ﬁrms consisting of 32k and

34k invoice lines (not invoices) where 61% of invoice

lines have been used for account prediction. The rest

of the lines could not be assigned back to the respec-

tive entries in the account ledger. By considering ac-

count prediction for sent and received invoices, identi-

fying the latter as the more difﬁcult case, best results

have been generated with decision-tree-based meth-

ods combined with Word2Vec (Mikolov et al., 2013).

Panichi and Lazzeri (Panichi and Lazzeri, 2023)

focused on the scenario of account prediction on re-

ceived invoices and presented a semi-supervised ap-

proach which manages to increase accuracy by 4%

compared to the approach of Bardelli et al. (Bardelli

et al., 2020) (with different data in a different setting).

The performance increase could be achieved by ap-

plying the A* search algorithm (Hart et al., 1968), re-

sulting in a loss of training data (because of the same

linkage problem) of only 14% compared to 39% as in

Bardelli et al. (Bardelli et al., 2020). Further charac-

terizations of the data used in the experiments were

not given except that it was much smaller than the

data used in Bardelli et al. (Bardelli et al., 2020).

In contrast, Belskis et al. (Belskis et al., 2021;

Belskis et al., 2020) built their approaches not on the

basis of invoice data but on postings in ledger ac-

counts. They employed natural language processing

classiﬁcation algorithms and showed that the use of

comments in postings, which were manually created

by accountants, increased the performance of preci-

sion by 2.56 percent (Belskis et al., 2021) compared

to not using them (Belskis et al., 2020). The main goal

was also automatization without the consideration of

explainability.

Schulze et al. (Schulze et al., 2022) conducted

in the context of a knowledge service for accountants

credit account prediction as a side analysis. The ap-

proach was based on an accounting knowledge graph.

However, it was restricted to the scope of food- and

beverage-scenarios and required manually creation of

vocabulary for this particular scope. Still, by learning

decision trees, results indicated that knowledge graph

enrichments may increase prediction performance.

Table 1 summarizes the comparison of related

work. All the approaches use real-world data, which

is important for applying insights generated in ac-

count prediction research. The downside is that re-

sults are not reproducible with the very same data

because such data can reasonably not be published.

Therefore, no public benchmarks exist (Panichi and

Lazzeri, 2023). Given this situation, it is even more

important to provide all resources to enable reproduc-

tion of presented approaches and evaluations on own

invoice data. However, to the best of our knowledge,

none of the related work does this so far.

Furthermore, all approaches shufﬂe the training

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

442

Table 1: Comparison of related work on credit account prediction (? = not explicitly mentioned).

real in- domain number number consider source time-series

world voice inde- of of explain- code preserving

data based pendent invoices accounts ability public evaluation

(Belskis et al., 2021) yes no yes - ? no no no

(Bardelli et al., 2020) yes yes yes ?, 66k inv. lines 130-166 no no no

(Panichi and Lazzeri, 2023) yes yes yes ?, < 10k inv. lines ? no no no

(Schulze et al., 2022) yes yes no 1267 6 yes no no

(This paper) yes yes yes 7k 161 yes yes yes

and test data. This can be problematical because in-

voices are time dependent and thus can be seen as

time-series data as described in Bergmeir and Ben

ıtez

(Bergmeir and Ben

ıtez, 2012). For example, an event

such as a celebration may spawn particular types of

invoices that may have not been received in such

amounts in the training period when there was no

similar celebration before. Even when having invoice

data over longer periods, then still are business mod-

els and supplier relationships respect to change and

with this also received invoices. Also, the accounts to

predict are adapted over time.

Therefore, we argue that preserving the order of

received invoices is important when evaluating pre-

dictive models. This also means that reported num-

bers in related work on predictive performance might

be too optimistic. For example, on our data, we

reach an accuracy increase of 16.41 percent points

when not preserving the order but shufﬂing the data

in cross-validation. Without preserving the order in

evaluation, reported performance numbers in related

work range from 74% to 89% for invoice-based ap-

proaches. However, comparison of prediction perfor-

mances across approaches is difﬁcult because of dif-

ferent used datasets and different numbers of accounts

to predict. The unavailability of open-source code to

use an account prediction approach as a baseline also

hinders the comparison. Finally, related work focuses

on automatization so that the aspects of explainability

and traceability are not considered or discussed so far.

This paper aims to address these aspects by 1)

focusing on explainable approaches by design using

knowledge graphs, 2) preserving the order of invoices

in evaluations, and 3) providing resources for apply-

ing the presented approach.

3 PIPELINE FOR CREDIT

ACCOUNT PREDICTION

Figure 1 summarizes the overall process for account

prediction with enriched knowledge graphs. Section

3.1 covers the construction of an accounting knowl-

edge graph from invoice data (Step 1). Section 3.2

elaborates on how the knowledge graph is enriched

with internal graph data (Step 2), and Section 3.3

elaborates on account prediction approaches (Step 3).

3.1 Accounting Knowledge Graph

Construction from Invoice Data

3.1.1 Invoice Data and Postings

Data used in this study comes from a holding com-

pany located in Germany and covers basic informa-

tion from received invoices and their related pro-

cesses. To illustrate how we come from invoice data

to account predictions with the proposed pipeline in

Figure 1, we will use the following ﬁctive example

throughout the rest of the paper. The data process-

ing steps with this example including inputs and out-

puts can also be followed in the demo folder on the

paper page. Consider an accountant wants to pro-

cess an invoice to assign credit accounts. The start-

ing point is the invoice data already represented in an

ERP system with which the accountant works, so that

for the approach the data is available in the CSV for-

mat. This setting is comparable with experiment set-

tings reported in Bardelli et al. (Bardelli et al., 2020)

and Panichi and Lazzeri (Panichi and Lazzeri, 2023).

For every invoice, the ERP system creates one process

and attaches a process number to it. Therefore, when-

ever in the rest of the paper we speak of a process,

we mean this process where data from one invoice

is attached to. The booking area can be automati-

cally derived from the recipient listed on the invoice.

Let the example process number be 193224 and the

booking area 7467. Furthermore, our example invoice

comes from Chimney Sweep Mrs. Happy, which

is the invoice issuer. The service description on the in-

voice is Maintenance service baker street 4.

This is all the data required from the received invoice.

However, the approach requires also historical data

of booked invoices in form of postings, which is also

comparable to experiment settings in related work.

Knowledge Graph Enrichments for Credit Account Prediction

443

1. Construction of

Accounting KG

2. Enrichment of

Accounting KG

3. Credit Account

Prediction

Ontology for Account Prediction

(following the NeOn Methodology

(Su

arez-Figueroa et al., 2015))

Apache Jena

Semantic Layer:

Topic Recognition

(Algorithm 1)

KG Enrichment

(Figure 2)

Rule-based link prediction

Prediction with case-based-reasoning

Prediction with a combination of both

Figure 1: Overview of the pipeline for credit account prediction based on enriched accounting knowledge graphs.

There may be multiple similar invoices, however, for

illustration, for the rest of the paper consider the fol-

lowing ﬁctive historic invoice with an assigned ac-

count (coming from the posting CSV). Process Num-

ber: 45564, Booking Area: 9874, Invoice Issuer:

Clean Chimney, Service Description: Maintenance

Service Stand. TI, Account: 5912.

The total data used in the experiments amounts to

7k invoices with 12k postings which have been re-

ceived in a couple of months within a year. In the

scope of this study, the 7k invoices represent the usual

scenario at the holding company where each invoice

has one account as the solution. By following a long-

tail distribution, 161 accounts could possibly be pre-

dicted. The service description typically consists of

one to seven words in German, comes usually from

a person who entered the service description manu-

ally, and often consists of organization- and business

domain-speciﬁc vocabulary including abbreviations.

3.1.2 Knowledge Graph Construction

RDF (World Wide Web Consortium, 2014b) was

used as the underlying data model of the knowledge

graph, SPARQL (Valle and Ceri, 2011) was used as

the query language, and OWL (Antoniou and van

Harmelen, 2004) was used as the ontology language.

This knowledge graph setup is in line with the notion

of a knowledge graph provided by Ehrlinger and

oß (Ehrlinger and W

oß, 2016), which also includes

an ontology. For the ontology development, the

NeOn Methodology Framework (Su

arez-Figueroa

et al., 2015) was followed which encompasses

the phases of specifying requirements, identifying

resources for the ontology, as well as restructuring

and evaluating the ontology (Su

arez-Figueroa et al.,

2015). The main requirement for the ontology

was to enable account prediction by downstream

prediction approaches. Examples for ’competency

questions’ (Gr

uninger and Fox, 1995) serving this

goal which also cannot be answered without such

an ontology in place are: Show all processes

that have 5912 as an account and have at

least one service description topic in

common or Show all organizations that have

‘Chimney’ as an invoice issuer topic and

where the processes(invoices) where these

organizations are listed on as sellers

have been booked on the account 5912. For

the ontology development, non-ontological resources

were used. These were columns of the CSV ﬁle

where invoice data was contained (invoice issuer,

service description etc.), columns of the posting

CSV ﬁle (e.g., account number), and headings in

accounting handbooks (e.g., account number, short-

and long-description). In the restructuring phase,

existing vocabulary has been reused, for example the

P2P-O ontology (Schulze et al., 2021) for electronic

invoices and the Organization Ontology (World Wide

Web Consortium, 2014a). Finally, the ontology

was checked for consistency, and it was evaluated

regarding pitfall patterns from the Oops! (OntOlogy

Pitfall Scanner!) (Poveda-Villal

on et al., 2014)

(which resulted, for example, in more detailed license

information). Also, the pre-formulated ’competency

questions’ (Gr

uninger and Fox, 1995) have been

tested. The ontology and accompanying resources

are available on our paper page for reuse.

Technically, by using this ontology, the account-

ing knowledge graph was constructed with the Java-

based framework Apache Jena

. With our input data,

the accounting knowledge graph had 102k triples.

Demo CSVs of the input data are also available on

the paper page to facilitate the preprocessing of own

invoice data.

The black subgraphs a) and b) in Figure 2 il-

lustrate an excerpt of the constructed knowledge

graph with vocabulary from the developed ontol-

ogy. Subgraph a) shows parts of the example

process (invoice) at hand and subgraph b) rele-

vant parts of a similar historic process, which also

has a reference to the account :a5912. Note

that in this excerpt the black subgraphs a) and b)

are not connected. In fact, they are connected,

for example, with the expression that the instances

:oChimneySweepMrsHappy and :oCleanChimney

https://jena.apache.org

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

444

are both of the type org:FormalOrganization.

However, it is not shown in Figure 2 because every

organization has this relation. Consequently, down-

stream prediction approaches cannot make use of this

non-differentiating information.

3.2 Accounting Knowledge Graph

Enrichment

The goal of the enrichment step is to create the kind

of additional information in the knowledge graph that

downstream prediction approaches can use to make

better predictions. In the enrichment approach, such

information will be topics about service descriptions

of invoices and topics about invoice issuers. Further-

more, the enrichment approach should meet two ma-

jor requirements: First, the requirements of explain-

ability and traceability should also apply here to meet

these requirements throughout the pipeline. Second,

the enrichment process should be applicable across

domains and across invoice topics. Consequently,

the approach should be either unsupervised or semi-

supervised in order to manage the wide-ranging topics

of received invoices. The latter requirement is con-

trary to the requirements reported in the study con-

ducted by Schulze et al. (Schulze et al., 2022). There,

the domain was restricted to a food- and beverage-

scenario so that domain-speciﬁc vocabulary could be

created upfront. This is not feasible (or too costly)

when not restricting to a speciﬁc domain within a par-

ticular company.

First experimentations have been done with apply-

ing classical string-based similarity measures on ser-

vice descriptions which are implemented in the case-

based reasoning framework ProCAKE (Bergmann

et al., 2019), such as the Levenshtein-Distance (Lev-

enshtein, 1966). Groups of similar processes were

created which would then have been enriched to re-

spective processes. However, analysis of such groups

revealed that they had the tendency of being too big

with containing many unsimilar processes. One rea-

son may be that for this scenario, semantic similar-

ity cannot be implied by the similarity of the raw

strings. Also named entity recognition systems such

as RoBERTa (Liu et al., 2019) have been explored;

however, at least for German, only broad named entity

labels are supported (person, organization, location

and misc) which turned out not to be well suited for

the very domain speciﬁc language in received busi-

ness invoices.

The presented algorithm in Listing 1 for topic

recognition and the enrichment step illustrated with

the subgraphs c1) and c2) in Figure 2 are based on

the idea of the isolated mapping similarity introduced

in ProCAKE (Bergmann et al., 2019)

. However,

instead of directly comparing tokens of, for exam-

ple, two service descriptions with each other and then

computing a similarity value, as in the isolated map-

ping similarity, tokens are ﬁrst gathered (Algorithm

1) and then matched and enriched on the accounting

knowledge graph (subgraphs c1) and c2) in Figure 2).

In this way, the similarity calculation is pushed down-

stream to the account prediction approaches.

Algorithm 1 shows how topics are gathered from

a set of given service description strings SD(AKG)

and invoice issuer names II(AKG) from an account-

ing knowledge graph AKG. Service descriptions

sd ∈ SD(AKG) and invoice issuer names ii ∈ II(AKG)

are ﬁrst tokenized (including bi- and trigrams). Next,

they are added and counted into a set of tuples M

SDT

(for service descriptions) and M

IIT

(for invoice issuer

names), where M

SDT

= {⟨t

i⟩ : t ∈ T

, i ∈ N with

i > 0} where T

represents a set of tokens, and

where M

IIT

= {⟨t

′

⟩ : t

′

∈ T

, i

′

∈ N with i

′

> 0}

where T

also represents a set of tokens. However,

a tuple ⟨t

∈ T

, 1⟩ or a tuple ⟨t

∈ T

, 1⟩ is only

added when t

meets given quality criteria Q

and t

quality criteria Q

, which are given as an

input. By exploring meaningful topics in our data,

topics meet quality criteria Q

and Q

when a)

they have more than one character because two

characters could already represent important business

speciﬁc abbreviations, b) they have more than three

digits when a topic consists only of numbers because

this could be an identiﬁer, and ﬁnally c), they are

not a general language speciﬁc stop word. Finally,

Algorithm 1 returns with M

SDT

a set of counted

topics for service descriptions and returns with M

IIT

a set of counted topics for invoice issuer names.

Considering the service descriptions of the two

example processes (see also Figure 2), then topics in

SDT

would be {maintenance, service, baker,

str, maintenance service, service baker,

baker str, maintenance service baker,

service baker str, stand., ti, service

stand., stand. ti, service stand. ti}

(an excerpt is shown in Figure 2), and topics in

IIT

would be {chimney, sweep, mrs.,happy,

chimney sweep, sweep mrs., mrs. happy,

chimney sweep mrs., sweep mrs. happy,

clean, clean chimney}. An advantage of having

such separate sets is that they can be maintained

separately. For example, they can be sorted and

further reﬁned regarding inappropriate topics. Also,

for experimentations with a ﬁxed set, all topics that

occur only once can be removed because they do not

see https://procake.pages.gitlab.rlp.net/procake-

wiki/sim/collections/

Knowledge Graph Enrichments for Credit Account Prediction

445

a) b)

c1)

c2)

:p193224

:Process

’193224’

:oChimneySweepMrsHappy

:bk7467

’Maintenance service baker str 4’

:p45564

:a5912

’Misc. Service’

’Maintenance Service Stand. TI’

rdf:type

p2p-o:hasSeller

rdfs:label

p2p-o:serviceDescription

:hasBookingArea

:hasAccount

p2p-o:serviceDescription

:longDescription

:hasAccount

:oCleanChimney

p2p-o:hasSeller

’Chimney Sweep Mrs. Happy’ ’Clean Chimney’

rdfs:label

’maintenance’

’maintenance service’

’baker’ ’ti’

’chimney’’chimney sweep’ ’clean chimney’

:serviceDescriptionTopic

:invoiceIssuerTopic

Figure 2: Excerpt of the accounting knowledge graph with ﬁctive example data. The black subgraph a) represents the knowl-

edge graph constructed from the example invoice data at hand, the black subgraph b) represents the knowledge graph con-

structed from a prior invoice which has a reference to an account, the blue subgraph c1) represents examples for service

description enrichments, and the purple subgraph c2) represents examples for invoice issuer enrichments.

connect subgraphs.

The enrichment with the gathered topics from Al-

gorithm 1 is illustrated with the subgraphs c1) and

c2) in Figure 2: First, every service description string

and invoice issuer name in the accounting knowledge

graph is tokenized the same way as described in Al-

gorithm 1. Next, if a token equals a topic in M

SDT

or M

IIT

, it is added to the knowledge graph with

the predicate :serviceDescriptionTopic for ser-

vice descriptions and :invoiceIssuerTopic for in-

voice issuer names. With our data, 6k of such triples

have been enriched resulting in an enriched knowl-

edge graph with 108k triples.

3.3 Account Prediction

As depicted in Figure 1, Section 3.3.1 describes ac-

count prediction using a rule-based link prediction

(rule-based-LP) approach, Section 3.3.2 presents a

case-based reasoning (CBR) approach, and Section

3.3.3 elaborates on how expert knowledge from a

CBR System can be combined with rule-based-LP.

3.3.1 Account Prediction with Link Prediction

To enable the application of link prediction algo-

rithms for account prediction, the problem of account

prediction is ﬁrst interpreted and formalized as a link

prediction problem: Given is an accounting knowl-

edge graph AKG

Train

, a test knowledge graph KG

Test

a set of nodes N, a set of edges E, and the spe-

ciﬁc edge {: hasAccount} that denotes which process

has which account as the solution. Then, AKG

Train

consists of a set of tuples ⟨n, e, n

′

⟩ where n, n

′

∈

N, e ∈ E and ∃⟨n, e, n

′

⟩|e = {: hasAccount}. Fur-

ther, KG

Test

= {⟨n

′′

, {: hasAccount}, n

′′′

⟩ : n

′′

, n

′′′

∈

N }. Then, AKG

′

Train

= AKG

Train

\ KG

Test

. With

AKG

′

Train

, the goal is to predict tuples of KG

Test

that AKG

Predicted

= AKG

′

Train

∪ KG

Test

The enrichment subgraphs c1) and c2) in Figure

2 enable new kinds of rules that rule learners can

calculate. With enrichments from the prior step,

the dashed and dotted edges can be leveraged by a

rule learner for generating new rules which connect

subgraphs a) and b). For example, in Figure 2, a

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

446

Data: SD(AKG),II(AKG), Q

, Q

Result: M

SDT

, M

IIT

0; T

0; M

SDT

0; M

IIT

Service description strings:

foreach sd ∈ SD(AKG) do

= tokenize(sd)

foreach t

∈ T

if Q

) then

if ∃{⟨t, i⟩} ∈ M

SDT

with t = t

then

{⟨t, i⟩} = {⟨t, i + 1⟩}

else

SDT

= M

SDT

∪ {⟨t

, 1⟩}

end

Invoice issuer names:

foreach ii ∈ II(AKG) do

= tokenize(ii)

foreach t

∈ T

if Q

) then

if ∃{⟨t

′

, i

′

⟩} ∈ M

IIT

with t

′

= t

then

{⟨t

′

, i

′

⟩} = {⟨t

′

, i

′

+ 1⟩}

else

IIT

= M

IIT

∪ {⟨t

, 1⟩}

end

Algorithm 1: Topic recognition given a set of service de-

scriptions, invoice issuer names, and quality criteria.

simple rule can be: :hasAccount(X,:a5912) ⇐=

:serviceDescriptionTopic(X,maintenance).

So, when a process X has the service description

topic maintenance, it is implied that process X has

the account :a5912, which is illustrated with the

green edge in Figure 2. Imagine that this was the case

in 9 out of 10 cases in AKG

′

Train

, then this particular

rule would have a conﬁdence value of .90.

To implement this formalization, the bottom-up

rule learning algorithm of AnyBURL (Meilicke et al.,

2024) was used, which will be more detailed in the

Evaluation Setup Section 4.1.1. This state-of-the-

art algorithm was chosen because of its good perfor-

mance recently shown (Meilicke et al., 2024), and be-

cause it fulﬁlls the requirements of explainability by

generating traceable rules with a symbolic approach.

3.3.2 Account Prediction with CBR

Because historic invoices can be seen as solved cases,

and because accountants often use historic cases as

the basis for a solution for a current task, case-based

reasoning (CBR) approaches (Kolodner, 2014) can

provide account predictions based on historic cases

where similarity functions can be used for explaining

and tracing back such predictions. As a CBR frame-

work, we used ProCAKE (Bergmann et al., 2019)

because this framework offers with its focus on pro-

cesses a suitable integration into the process-oriented

accounting domain. A process (with invoice data)

is seen as a case which is deﬁned in an aggregate

class with the following set of attributes (which is

denoted as A): a process number p ∈ A, an invoice

issuer name iin ∈ A, a booking area ba ∈ A, an in-

voice service description sd ∈ A, a set of enriched

service description topics SD

enriched

∈ A, a set of en-

riched invoice issuer topics IIN

enriched

∈ A, and an

account solution acc ∈ A. Considering local cal-

culated similarities between attributes of two cases

local

(iin, iin

′

) ∈ R, s

local

(sd, sd

′

) ∈ R, . . . , the aggre-

gate similarity function s

aggregate

: A → R was deﬁned

as follows:

aggregate

local

(iin, iin

′

), s

local

(sd, sd

′

), s

local

(ba, ba

′

local

(SD

enriched

, SD

′

enriched

local

(IIN

enriched

, IIN

′

enriched

), w

1,2...5

∈ R : 0 ≤

1,2...5

≤ 3)

7→

local

(iin, iin

′

) × w

) + (s

local

(sd, sd

′

) × w

) +

local

(ba, ba

′

) × w

)

+(s

local

(IIN

enriched

, IIN

′

enriched

) × w

) +

local

(SD

enriched

, SD

′

enriched

) × w

)

Note that the attributes process number p and ac-

count solution acc are not part of S

aggregate

. This

is because in the former case, it has no inherent

information value, and in the latter case, this is the

attribute to be predicted. However, when retrieving

similar cases, these attributes are of high value

for the accountant. Local similarity calculations

of s

local

(iin, iin

′

), s

local

(sd, sd

′

) and s

local

(ba, ba

′

)

were conducted using the string-equals similarity

measure in ProCAKE (Bergmann et al., 2019),

and calculations of s

local

(SD

enriched

, SD

′

enriched

) and

local

(IIN

enriched

, IIN

′

enriched

) using ProCAKES’s

(Bergmann et al., 2019) isolated mapping similarity

measure. Concrete weights w

1,2...5

∈ R were found by

applying hyperparameter optimization with ’random

search’ (Rastrigin, 1963), which will be covered in

more detail in the Evaluation Setup Section 4.1.2.

With respect to the example processes, by following

this approach, the attributes (invoice issuer name,

service descriptions (as well their topics), etc.) will

be compared as reported, and then an aggregate

similarity value will be calculated by applying the

weights from S

aggregate

. This similarity value indi-

cates the similarity between the process at hand (here

Knowledge Graph Enrichments for Credit Account Prediction

447

:p193224) and the historic process :p45564. These

calculations are conducted for every historic process

in the knowledge graph, and the accounts of the most

similar processes (here e.g. :a5912) are predicted

according to their aggregate similarity values.

3.3.3 Account Prediction with the Combination

of Expert Knowledge from CBR and

Rule-Based Link Prediction

A strength of rule-learners is that they are unsuper-

vised without hyperparameter optimization required

in case of AnyBURL (Meilicke et al., 2024). A

strength of CBR is the consideration of human-in-the

loop in the reﬁnement phase bringing additional ex-

pert knowledge in (Kolodner, 2014). The basic idea

of combining these approaches is, on the one hand,

to make use of the predictive performance of rule-

learners, and on the other hand, to reﬁne such pre-

dictions with expert knowledge from a CBR system.

Subsequently, in a reﬁnement phase, the target ac-

counts where additional expert knowledge from CBR

should be tailored to, are the accounts where Any-

BURL (Meilicke et al., 2024) underperforms. For

each such account, the basis for analysis is, on the

one hand, the long description of the account in ac-

counting handbooks, and on the other hand, the reach-

able closure of all invoices in the training knowledge

graph AKG

′

Train

that should have yielded the correct

tuple in KG

Test

. On this basis, the expert deﬁnes ad-

ditional topics M

′

SDT

and M

′

IIT

as well as a respective

case which can be used for deriving the right account

solution. When now the service description or invoice

issuer name contains such topics in M

′

SDT

and M

′

IIT

the respective case and account solution is retrieved

and ranks higher than the predictions calculated with

aggregate

. Further, in the course of account predic-

tion with AnyBURL (Meilicke et al., 2024), when ex-

pert knowledge is available, then the account solution

coming from the CBR system is put at the ﬁrst posi-

tion of the suggested predictions, and all other predic-

tions of AnyBURL (Meilicke et al., 2024) are moved

down by one position.

Imagine that in the example service description

Maintenance Service Stand. TI, the TI stands

for an abbreviation well-known within the organiza-

tion meaning a particular type of maintenance (e.g.,

with dust measurement) which is only typical for the

account :a5912. When this information is added in

the reﬁnement phase of a CBR system, and when TI

occurs in a service description at hand, then :a5912

is suggested as the most likely account followed by

the predictions of AnyBURL.

1 2

5 6

11 12

5 6

11 1212

11 12

15 16

11 12

Figure 3: Illustration of how 17 folds have been used

for cross-fold validation inspired by Bergmeir and Ben

ıtez

(Bergmeir and Ben

ıtez, 2012) considering the time-series

nature of invoice data where rectangles represent folds for

training, light circles folds for validation, and darker ﬁlled

circles folds for testing, resulting in eight datasets.

4 EVALUATION

Section 4.1 demonstrates the applied evaluation setup

and developed resources which make the different ap-

proaches comparable. Section 4.2 reports computing

performances, and Section 4.3 presents the results.

4.1 Evaluation Setup

Although rule-based-LP and CBR do not require mass

data for learning, 7k invoices and the resulting 108k

triples in the knowledge graph are still a medium-

sized sample for evaluation purposes. Therefore, in

line with best-practices in related work (e.g.(Panichi

and Lazzeri, 2023)), cross-fold validation was ap-

plied. However, as introduced in Related Work, in-

voices can relate to each other and can be time de-

pendent. Therefore, we argue to preserve the received

order of invoices in evaluations, also within the folds.

Figure 3 demonstrates in more detail how 17 folds of

the dataset have been used for training, validation, and

testing. As also depicted, this procedure resulted in

eight training-, valid- and test-datasets.

The code to enable cross-fold validation on in-

voice data where the order is preserved and the output

is consumable by AnyBURL (Meilicke et al., 2024)

and ProCAKE (Bergmann et al., 2019) is published

on the paper page. To enable comparative evaluation,

it is also considered that training-, valid- and test-

datasets can be built with different degrees of seman-

tic enrichment: 1) without any enrichment, 2) with

invoice issuer name enrichment, 3) with service de-

scription enrichment, and 4) with both, invoice issuer

name and service description enrichment (denoted as

full enrichment for short).

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

448

4.1.1 Evaluation Setup AnyBURL

Accordingly, with AnyBURL (Meilicke et al., 2024),

32 (4×8) models have been trained which then have

been evaluated on the 32 test datasets. Because Any-

BURL does not require any hyperparameter optimiza-

tion (Meilicke et al., 2024), and in line with the orig-

inal AnyBURL evaluation (Meilicke et al., 2019), the

valid datasets were also not used here. For experi-

ments, the latest version 2023 was applied

. Rules

have been learnt for 100 seconds because longer time

spans did not result in increased performance. All

default parameters have been applied except for the

maximum length of cyclic and acyclic rules. This

parameter was increased from 2 to 3 because enrich-

ments are this one hop deeper in the graph which oth-

erwise would not have been considered by the rule

learner. Furthermore, because we only want to pre-

dict :hasAccount relations, rules have been learnt

for this single relation. Finally, rules have been ap-

plied to predict hits @1 to hits @10 for each tuple

⟨n

′′

, {: hasAccount}, n

′′′

⟩ in KG

Test

4.1.2 Evaluation Setup ProCAKE

For ProCAKE (Bergmann et al., 2019) however, the

valid datasets have been used for optimizing hyper-

parameters. Here, hyperparameters are the weights

1,2...5

∈ R of the similarity function S

aggregate

as de-

scribed in Section 3.3.2. For optimization, ’random

search’ (Rastrigin, 1963) was applied where for each

training set models have been learnt and then have

been evaluated on the respective valid dataset. The

hyperparameter conﬁguration with the best perfor-

mance on the valid dataset was then used for evaluat-

ing the trained model on the test dataset. Depending

on the enrichment level, weights of other enrichments

have been set to zero. Account solutions of the top 1

to top 10 most similar retrieved cases have then been

used to calculate hits @1 to hits @10.

4.1.3 Evaluation Setup Combination of

ProCAKE and AnyBURL

The evaluation setup of the approach where expert

knowledge from the CBR system is combined with

AnyBURL (Meilicke et al., 2024) was conducted

as follows. First, with custom evaluation code, the

performance of the AnyBURL-model of every fold

was evaluated on the valid dataset so that the per-

formance for each account was evaluated separately.

Then, the top 5 most frequently underperforming ac-

counts have been considered for increasing the per-

formance by integrating expert knowledge into Pro-

https://web.informatik.uni-mannheim.de/AnyBURL/

CAKE (Bergmann et al., 2019). This was done with

the procedure described in Section 3.3.3. Next, CBR

with additional expert knowledge was iteratively eval-

uated on the valid datasets, and with the ﬁnal set of ex-

pert knowledge evaluated on the test datasets. Finally,

the predictions of AnyBURL on the test datasets have

been evaluated again but with consideration of expert

knowledge coming from the CBR system. Because

we were interested whether the combined approach is

on the one hand feasible and on the other hand is able

to increase the overall performance, and because of

higher manual effort in providing expert knowledge,

the presented procedure was conducted on the fully

enriched scenario but on all eight training,- valid- and

test-datasets. Also with this procedure, hits @1 to hits

@10 have been calculated.

4.2 Performance

All experiments were conducted with a laptop having

an Intel(R) Core(TM) i7-10750H CPU @2.60GHz

and 32GB RAM. As reported, rules have been learnt

by AnyBURL for 100 seconds. Applying the learnt

models which contained 34420 rules on average on

one test fold, which contained ca. 400 :hasAccount

relations, took on average 37211ms (full enrichment

scenario). For CBR, applying models with concrete

weights on one fold took on average 1257ms (also full

enrichment scenario). Enriching the whole account-

ing knowledge graph with service description topics

(as illustrated in Figure 2) took 3957ms and with in-

voice issuer topics 775ms. Applying Algorithm 1 for

topic extraction took 401ms for the whole knowledge

graph (351ms for service descriptions and 50ms for

invoice issuer names).

4.3 Evaluation Results

Table 2 summarizes the evaluation results. As shown

in Figure 3, eight models have been learnt and vali-

dated for each level of semantic enrichment (32 mod-

els AnyBURL (Meilicke et al., 2024) and 32 mod-

els ProCAKE (Bergmann et al., 2019)). Accordingly,

means for hits @1 to hits @5 and hits @10 are de-

picted. For the full enrichment scenario, results for in-

cluding expert knowledge from ProCAKE into Any-

BURL are depicted as well as the increase of percent

points when using full enrichments compared to no

enrichments.

Knowledge Graph Enrichments for Credit Account Prediction

449

Table 2: Evaluation results for credit account prediction with four different degrees of semantic enrichment.

hits@1 hits@2 hits@3 hits@4 hits@5 hits@10

Without Enrichment

ProCAKE (Bergmann et al., 2019) .4143 .4913 .5362 .5627 .5970 .6690

standard deviation .0686 .0561 .0464 .0470 .0428 .0380

AnyBURL (Meilicke et al., 2024) .4827 .5789 .6277 .6621 .6863 .7877

standard deviation .0284 .0291 .0379 .0361 .0376 .0403

Invoice Issuer Enrichment

ProCAKE (Bergmann et al., 2019) .4722 .5538 .5910 .6200 .6477 .7200

standard deviation .0660 .0544 .0498 .0508 .0451 .0329

AnyBURL (Meilicke et al., 2024) .5027 .6079 .6579 .6885 .7118 .8062

standard deviation .0414 .0306 .0329 .0435 .0487 .0435

Service Description Enrichment

ProCAKE (Bergmann et al., 2019) .5111 .5802 .6193 .6445 .6652 .7270

standard deviation .0516 .0455 .0465 .0444 .0446 .0408

AnyBURL (Meilicke et al., 2024) .5273 .6321 .6803 .7083 .7313 .8087

standard deviation .0368 .0438 .0448 .0478 .0486 .0419

Invoice Issuer Enrichment and

Service Description Enrichment

ProCAKE (Bergmann et al., 2019) .5387 .6145 .6464 .6786 .6977 .7528

standard deviation .0518 .0413 .0439 .0405 .0383 .0360

AnyBURL (Meilicke et al., 2024) .5333 .6462 .6956 .7243 .7446 .8231

standard deviation .0382 .0428 .0448 .0511 .0525 .0445

Expert Knowledge from ProCAKE

(Bergmann et al., 2019) combined with

AnyBURL (Meilicke et al., 2024) .5400 .6522 .7013 .7300 .7504 .8285

∆ percent points between no to

full enrichment for ProCAKE

(Bergmann et al., 2019) 12.45% 12.32% 11.02% 11.59% 10.07% 8.38%

∆ percent points between no to

full enrichment for AnyBURL

(Meilicke et al., 2024) 5.06% 6.73% 6.79% 6.21% 5.83% 3.54%

5 DISCUSSION

5.1 Comparison Between no and Full

Enrichment

According to Table 2, one key-ﬁnding is that for both,

ProCAKE (Bergmann et al., 2019) and AnyBURL

(Meilicke et al., 2024), applying both enrichments in-

creases the performance for hits @1 to hits @5 com-

pared to not having enrichments available. Overall,

the effect of using both enrichments is for ProCAKE

double as high as for AnyBURL. This may be due to

the comparable low performance of the CBR System

when having no enrichments available, and the bet-

ter capability of the rule learner to deal with less use-

ful information by generating still more ﬁne-grained

rules. By looking at concrete predictions generated

by ProCAKE and AnyBURL in the no-enrichment

scenario, it is observable that big groups of three to

six accounts with the same conﬁdence values are pre-

dicted. When enrichments come in, these big groups

disappear, and both approaches successfully handle to

make more ﬁne-grained distinctions. This may also

explain why the effect of enrichments is less for hits

@10 compared to hits @1 to hits @5. However, con-

sidering our rationale of assisting accountants by sug-

gesting a range of predictions, especially hits @3 to

hits @5 are relevant.

5.2 Comparison Between Invoice Issuer

and Service Description Enrichment

As depicted in Table 2, another ﬁnding is that for both,

CBR and AnyBURL (Meilicke et al., 2024), also sin-

gle enrichments of invoice issuer names and service

descriptions result in a performance increase. This

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

450

indicates that these enrichments on their own are use-

ful. However, it is notable that service description en-

richments have a bigger impact on the performance

increase than invoice issuer enrichments. This is in-

teresting because during hyperparameter optimization

for the similarity function S

aggregate

in CBR (see Sec-

tion 4.1), we observed that when having no enrich-

ments available, then the invoice issuer name was

typically weighted higher than the service descrip-

tion, which implies that in this case the invoice issuer

name was more useful. However, in line with the ob-

served results, when having both enrichments avail-

able, then the service description enrichments typi-

cally have been weighted higher than invoice issuer

enrichments. The reason may lie on the one hand

in the higher variance of service description strings

compared to invoice issuer names, which implies that

service descriptions can be better used for making dis-

tinctions, and on the other hand, in the higher number

of tokens in service descriptions which results in more

enrichments. This indicates that when applying the

enrichments on invoice data, the usefulness may be

dependent on the variance of the set of strings, which

is typically higher for service descriptions and less

for invoice issuer names. In our data, there are twice

as many distinct service descriptions as invoice issuer

names. To conclude, considering the special charac-

teristics of such strings described in Section 3.1, re-

sults in Table 2 indicate that in such cases knowledge

graph enrichments may have an impact on predictive

performance.

5.3 Combining Expert Knowledge from

CBR with Rule-Based LP

By following the procedure described in Section

3.3.3, including expert knowledge in CBR and com-

bining this with AnyBURL (Meilicke et al., 2024)

yields overall the best results. With this, one ﬁnd-

ing is that it is feasible to enhance rule-based-LP by

including expert knowledge in a way so that intended

account solutions are returned, and at the same time

false positives can be mostly avoided.

The following extensions may be beneﬁcial to

increase the performance on KG

Test

further. First,

the integration of more ﬁne-grained expert knowl-

edge which is contained in accounting handbooks. In

this way, more expert knowledge would be available

which may trigger on unseen invoicing cases. Sec-

ond, the consideration of not only top 5 underper-

forming accounts but more. Analogously, more ac-

counts would be covered by expert knowledge which

increases the chance of coverage on unseen data. We

restricted to top 5 underperforming accounts because

for cross-fold evaluation, the procedure of bringing in

expert knowledge needed to be manually executed 40

times (5 × 8 because of top ﬁve underperforming ac-

counts and eight training-, valid-, and test- datasets).

Third, the consideration of more dimensions from the

invoice data, such as individual invoice position lines,

in order to use them as information for applying ex-

pert knowledge.

6 CONCLUSION AND OUTLOOK

The research questions of this paper were how to pre-

dict credit accounts using knowledge graphs with ap-

proaches that can provide explanations and traceable

predictions, and how to improve such predictions by

enriching accounting knowledge graphs with internal

data. With a three main-step pipeline, it was shown

how accounting knowledge graphs can be constructed

and leveraged for predicting accounts by using rule-

based link prediction, case-based reasoning, and a

combination of both. Such approaches have in com-

mon that they can explain and trace back predictions.

Furthermore, it was shown how accounting knowl-

edge graphs can be enriched from internal data al-

ready available in the knowledge graph so that pre-

diction performance increases. Regarding evaluation,

it was highlighted to preserve the order of received in-

voices: on the one hand because of the time-series na-

ture of invoices, and on the other hand, to gain more

realistic insights when aiming to deploy such a sys-

tem. Furthermore, to apply the pipeline and the eval-

uation on own invoice data, all resources are available

at our paper page. Because related work did not pub-

lish source code of their approaches so far, with this

contribution it is now possible to use the provided re-

sources as a ﬁrst baseline for future work. The usage

of basic invoice information and with that little pre-

processing necessary may facilitate this.

For future work, research on the perceived useful-

ness of different explanation styles provided by CBR

and rule-based-LP approaches can be fruitful. This is

particularly interesting considering the ﬁndings that

CBR and rule-based-LP produce similar prediction

results when having both enrichments available. Also,

while explanations in form of similar cases and corre-

sponding similarity values are in the context of CBR

conceptually rather straight forward, in the context of

rule learners it poses another challenge to aggregate

and represent a set of rules from a prediction in a

way so that they are meaningful and useful for daily

knowledge work.

Knowledge Graph Enrichments for Credit Account Prediction

451

ACKNOWLEDGEMENTS

We would like to thank our business partner for

the fruitful cooperation and all the support. We

thank Christian Zeyen for CBR-related discussions

and valuable feedback. We also thank Mahta

Bakhshizadeh, Desiree Heim, Christian Jilek, Heiko

Maus, and Markus Schr

oder for fruitful discussions

and inspiration. Finally, we would like to thank

anonymous reviewers for their valuable feedback.

REFERENCES

Antoniou, G. and van Harmelen, F. (2004). Web ontology

language: OWL. In Handbook on Ontologies, Interna-

tional Handbooks on Information Systems, pages 67–

92. Springer.

Bardelli, C., Rondinelli, A., Vecchio, R., and Figini, S.

(2020). Automatic electronic invoice classiﬁcation us-

ing machine learning models. Mach. Learn. Knowl.

Extr., 2(4):617–629.

Belskis, Z., Zirne, M., and Pinnis, M. (2020). Features

and methods for automatic posting account classiﬁ-

cation. In Databases and Information Systems - 14th

Intern. Baltic Conf., volume 1243 of Communications

in Computer and Information Science, pages 68–81.

Springer.

Belskis, Z., Zirne, M., Slaidins, V., and Pinnis, M. (2021).

Natural language based posting account classiﬁcation.

Balt. J. Mod. Comput., 9(2).

Bergmann, R., Grumbach, L., Malburg, L., and Zeyen, C.

(2019). Procake: A process-oriented case-based rea-

soning framework. In Workshops Proc. for (ICCBR

2019), Otzenhausen, Germany, September 8-12, 2019,

volume 2567 of CEUR Workshop Proceedings, pages

156–161. CEUR-WS.org.

Bergmeir, C. and Ben

ıtez, J. M. (2012). On the use of

cross-validation for time series predictor evaluation.

Inf. Sci., 191:192–213.

Ehrlinger, L. and W

oß, W. (2016). Towards a deﬁnition of

knowledge graphs. In Joint Proc. of the Posters and

Demos Track of SEMANTiCS2016, Leipzig, Germany,

September 12-15, 2016, volume 1695 of CEUR Work-

shop Proceedings. CEUR-WS.org.

uninger, M. and Fox, M. S. (1995). The role of compe-

tency questions in enterprise engineering. pages 22–

31. Springer US, Boston, MA.

Hart, P. E., Nilsson, N. J., and Raphael, B. (1968). A formal

basis for the heuristic determination of minimum cost

paths. IEEE Trans. Syst. Sci. Cybern., 4(2):100–107.

Jain, K. and Woodcock, E. (2017). A road map for digitiz-

ing source-to-pay. McKinsey.

Koch, B. (2019). The e-invoicing journey 2019-2025.

billentis GmbH.

Kolodner, J. L. (2014). Case-Based Reasoning. Morgan

Kaufmann.

Lazar, J., Feng, J., and Hochheiser, H. (2017). Research

Methods in Human-Computer Interaction, 2nd Edi-

tion. Morgan Kaufmann.

Levenshtein, V. I. (1966). Binary codes capable of cor-

recting deletions, insertions, and reversals. In Soviet

physics doklady, volume 10, pages 707–710.

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,

Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov,

V. (2019). Roberta: A robustly optimized BERT pre-

training approach. CoRR, abs/1907.11692.

Meilicke, C., Chekol, M. W., Betz, P., Fink, M., and

Stuckenschmidt, H. (2024). Anytime bottom-up rule

learning for large-scale knowledge graph completion.

VLDB J., 33(1):131–161.

Meilicke, C., Chekol, M. W., Rufﬁnelli, D., and Stucken-

schmidt, H. (2019). Anytime bottom-up rule learn-

ing for knowledge graph completion. In Proceedings

of IJCAI 2019, Macao, China, August 10-16, 2019,

pages 3137–3143. ijcai.org.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and

Dean, J. (2013). Distributed representations of words

and phrases and their compositionality. In Proc. of

27th Annual Conference on Neural Information Pro-

cessing Systems 2013. Lake Tahoe, Nevada, United

States, pages 3111–3119.

Panichi, B. and Lazzeri, A. (2023). Semi-supervised classi-

ﬁcation with a*: A case study on electronic invoicing.

Big Data Cogn. Comput., 7(3):155.

Poveda-Villal

on, M., G

omez-P

erez, A., and Su

arez-

Figueroa, M. C. (2014). Oops! (ontology pitfall scan-

ner!): An on-line tool for ontology evaluation. Int. J.

Semantic Web Inf. Syst., 10(2):7–34.

Rastrigin, L. A. (1963). The convergence of the ran-

dom search method in the extremal control of a many

parameter system. Automaton & Remote Control,

24:1337–1342.

Schulze, M., Pelzer, M., Schr

oder, M., Jilek, C., Maus, H.,

and Dengel, A. (2022). Towards knowledge graph

based services in accounting use cases. In Proceed-

ings of Poster and Demo Track of SEMANTiCS 2022,

Vienna, Austria, September 13th to 15th, 2022, vol-

ume 3235 of CEUR. CEUR-WS.org.

Schulze, M., Schr

oder, M., Jilek, C., Albers, T., Maus,

H., and Dengel, A. (2021). P2P-O: A purchase-to-

pay ontology for enabling semantic invoices. In The

Semantic Web, ESWC 2021, Virt. Event, June 6-10,

2021, Proc., volume 12731 of LNCS, pages 647–663.

Springer.

arez-Figueroa, M. C., G

omez-P

erez, A., and Fern

andez-

opez, M. (2015). The neon methodology framework:

A scenario-based methodology for ontology develop-

ment. Appl. Ontology, 10(2):107–145.

Valle, E. D. and Ceri, S. (2011). Querying the semantic

web: SPARQL. In Handbook of Semantic Web Tech-

nologies, pages 299–363. Springer.

World Wide Web Consortium (2014a). The organization

ontology, http://www.w3.org/tr/vocab-org/.

World Wide Web Consortium (2014b). Rdf 1.1 primer,

http://www.w3.org/tr/2014/note-rdf11-primer-

20140624/.

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

452