Semiotic Knowledge Models for Personal Knowledge Repositories
Stefano Casadei
Memorilog, Cambridge, MA, U.S.A.
Keywords: Knowledge Representation, Personal Knowledge Representation, Semiotic Model.
Abstract: Knowledge graphs have been used successfully to represent and acquire general knowledge and have also
been proposed for personal knowledge representations. While general knowledge data can be modelled
statistically as being a noisy projection of universal (and crisp) entities, categories, and relationships, personal
knowledge data requires a more refined model: each user’s peculiarities and fluctuations in associating words
with meanings and meanings with words should be tracked and analysed instead of being treated as noise and
averaged out. This position paper describes a semiotic knowledge model whose primitives are the
signification events which occur when symbols such as words and linguistic expressions are associated with
an instantaneous meaning. Semiotic structures, constructed from these primitives with users’ active participation, enable users to create, update, modify, organize, re-organize, and curate detailed and
comprehensive representations of their own personal knowledge by means of their own personal
terminologies, taxonomies, and organizational schemes.
1 INTRODUCTION
Computer-implemented representations of
knowledge and information of different kinds are
widespread and underlie many of our everyday
activities, from doing a search on the internet to
booking an airline ticket. While technologies for
creating repositories of general knowledge have
advanced significantly during the last few years, the
problem of representing and exploiting personal
information lacks an equally successful solution.
Knowledge graphs have been proposed as a
means for representing personal information (Balog, Mirza, & Skjaeveland, 2022; Montoya et al., 2018) but, as pointed out by Balog and Kenter (2019), the particular nature of personal information
presents unique challenges still in search of definitive
solutions: 1) Entities of personal interest are typically
mentioned only a few times and information about
them can be sparse. This makes it difficult to apply
statistical and deep-learning methods commonly used
for general knowledge. 2) Relations in a personal
knowledge scenario may be short-lived and common
relation extraction methods may not be applicable. 3)
Users may be more inclined to use their own
organization schemes based on freely defined
“semantic categories” rather than external ontologies
designed by somebody else.
To address these issues and to develop user-friendly interfaces for knowledge repositories, this
position paper describes a knowledge model whose
symbolic primitives are more refined than those used
in knowledge graphs, making it possible to build
more powerful and more flexible symbolic
constructs.
The constituents of typical knowledge graphs
(named entities, types, attributes, relationships, etc.)
and of many other types of knowledge
representations, are crisp and universal: their
meanings are assumed to be well defined, constant
over time and invariant across all users. Data is
modeled as a noisy projection of these underlying
universal crisp constituents and the goal of
knowledge acquisition (harvesting) is traditionally
viewed as one of removing this noise and of
“cleaning” the data to recover the underlying
constituents. Consequently, many knowledge
acquisition methods are of a statistical nature and
require large amounts of data to counter the noise
(Ilyas & Chu, 2019; Weikum, Dong, Razniewski, &
Suchanek, 2021; Kejriwal, Knoblock, & Szekely,
2021).
It is well-known, however, that entities, attributes,
relationships, and especially types and categories are
not always crisp and universal. For example, users
think more in terms of “natural kinds” than rigid
categories (Russell, Norvig, & Davis, 2010). This is
particularly important in a scenario where knowledge
is produced by users describing what is important to
them. Consider for example two users, Alia and
Barouk, who maintain a list of their friends. Alia may
want to include co-workers in the same category as
friends, while Barouk may prefer to maintain two
distinct categories, one for friends and one for co-
workers. Even worse, users change their minds about how they use vague categories such as “friends”, resulting
in inconsistencies even within a knowledge
repository created by a single user; for example, Alia
may decide, down the road, that her list of friends
(which initially includes co-workers) has grown too
big and that she wants to redefine the meaning of
“friends” to exclude co-workers.
2 SEMIOTIC MODELS
Semiotic fluctuations due to the erratic semiotic
behavior of users, illustrated by the example above,
should be tracked instead of being smoothed out. A
system that represents personal knowledge should
adapt to the idiosyncrasies of each user rather than
imposing an average, universal, same-for-everybody
interpretation of vague terms such as “friends”.
2.1 Signification Events
This motivates the adoption of a more refined
knowledge model, which we call a semiotic
knowledge model, whose primitives are the
signification events which occur when a symbol is
paired up with a particular interpretation or
instantaneous meaning. Knowledge is intimately
related to representations: known things, facts,
events, situations, rules and laws are those for which
an agent possesses an internal representation. Internal representations lie dormant in some repository, providing static knowledge, until they are recruited by a
signification event yielding a fragment of dynamic
knowledge, which is the manifestation of the
representational activity of the agent.
A signification event (SE) is somewhat related to
what semioticians call a sign, which comprises a signifier (or symbol) that stands for something else, the signified or represented entity (Chandler, 2007).
Speaking results in signification events. Consider
the following example. John is at his desk chatting
with his friend Mary over the internet. Suddenly, a
mouse jumps on John’s desk and John tells Mary:
“There is a mouse on my desk!”. Mary replies:
“What’s new, there is always a mouse on your desk!”.
The word “mouse”, a symbol, yields (at least) four
SEs in this exchange; a first one, which is an efferent
SE, occurs when John maps an internal mental
representation of the rodent he has just seen to the
word “mouse”; a second SE, which is an afferent SE,
occurs when Mary hears “mouse” and maps this word
to an internal mental representation of a computer
device; a third one (efferent) occurs when Mary utters
“mouse” and a fourth one (afferent) occurs when John
hears “mouse”. The instantaneous meaning in the first
SE is a rodent, whereas it is a computer device in the
last three (assuming John understood the intended
meaning of Mary’s sentence).
Note that the utterance of a sentence involves a
burst of signification events corresponding to the
grammatical components of the sentence: “desk”,
“my desk”, “on my desk”, “a mouse on my desk”, and
the whole sentence “There is a mouse on my desk” all
yield signification events.
To better visualize a semiotic model of
knowledge, it may be useful to assign space-time
coordinates to SEs which identify the location of the
agent at which the SE occurs (and perhaps even the
specific location within an agent where the
representation of a symbol is stored) and the time at
which the SE occurs. An SE then becomes a semiotic
point, where the term point indicates, in addition to
its space-time embedding, its primitive and atomic
nature as a constituent of signification and dynamic
knowledge: the instantaneous co-presence of a
signifier and a signified is the minimum requirement
to establish a representation and a fragment of
knowledge. The ensemble of semiotic points yields
the semiotic field.
The semiotic knowledge model described here
represents semiotic points by immutable symbols
called semiotic point representations (SPR) and uses
these to build dynamic and adaptive semiotic
structures, which can represent all types of
information elements (categories, named entities,
lists, properties, relationships, facts, facts about facts,
etc.) by adopting and tracking over time each user’s
terminology and organizational schemes.
In the context of personal knowledge
representations where several people (e.g., the
members of a family) share the same database and
contribute information to it, a basic SPR can be
constructed by concatenating (1) an identifier for the
symbol involved in the SE; (2) an identifier of the
information contributor; (3) a timestamp of when the
information was entered. This basic SPR can be
enriched by any kind of available contextual
information to yield a more informative SPR (e.g., the
place where the contributor was when he/she entered
the information, his/her mood, etc.).
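As an illustration, the basic SPR just described might be realized as an immutable record concatenating the symbol identifier, the contributor identifier and a timestamp, plus optional contextual enrichment. The Python sketch below is only indicative; the names (SemioticPointRepresentation, key) are invented for this example:

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Mapping

@dataclass(frozen=True)                     # SPRs are immutable once created
class SemioticPointRepresentation:
    symbol_id: int                          # identifier of the symbol involved in the SE
    contributor_id: str                     # identifier of the information contributor
    timestamp: datetime                     # when the information was entered
    context: Mapping[str, str] = field(default_factory=dict)   # optional enrichment

    def key(self) -> str:
        """Concatenated form of the basic SPR (symbol, contributor, time)."""
        return f"{self.symbol_id}|{self.contributor_id}|{self.timestamp.isoformat()}"

# Example: contributor "alia" enters the symbol with identifier 42 from home.
spr = SemioticPointRepresentation(
    symbol_id=42, contributor_id="alia",
    timestamp=datetime.now(timezone.utc),
    context={"place": "home", "mood": "relaxed"})
print(spr.key())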
2.2 Modes of Operation
Users enter information into the system in one of
several ways: (1) Direct entry method: users speak or
type information such as primitive linguistic symbols
(words, names, etc.), primal symbols (numbers, dates,
physical quantities) and simple natural language
expressions; (2) Compositional entry method: users
select existing symbols by browsing the repository and
compose them into new composite symbols, for
example, by manipulating widgets displayed on a
screen via a drag-and-drop interface. The
compositional method, which produces structured
linguistic expressions (SLEs), described later,
minimizes and simplifies the recognition step since
the user uses symbols already known to the
system. (3) Analytical method: Unstructured
linguistic expressions, such as “It rains today”,
entered directly into the system, must be converted
into SLEs, either manually, or with the assistance of
a suitable natural language processing module.
By combining the above entry methods, users can
introduce their own terminology into the system and
reuse it. In addition to these entry methods, users can
operate at the level of semiotic structures to update,
revise, organize and re-organize their information.
Users can also import (portions of) external
dictionaries and ontologies and adapt them to their
personal ones.
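A minimal sketch of the three entry modes, viewed as operations on a repository object, is given below; the class and method names (PersonalRepository, direct_entry, compositional_entry, analytical_entry) are hypothetical and symbols are simplified to strings:

from typing import List

class PersonalRepository:
    """Illustrative facade for the three entry modes described above."""

    def __init__(self) -> None:
        self.symbols: List[str] = []          # stored symbols (simplified as strings)

    def direct_entry(self, text: str) -> int:
        """Direct entry: store a primitive or primal symbol typed/spoken by the user."""
        self.symbols.append(text)
        return len(self.symbols) - 1          # identifier of the new symbol

    def compositional_entry(self, template_id: int, *constituent_ids: int) -> int:
        """Compositional entry: combine symbols already known to the system."""
        composite = (template_id, constituent_ids)
        self.symbols.append(str(composite))   # no recognition step needed
        return len(self.symbols) - 1

    def analytical_entry(self, sentence: str) -> int:
        """Analytical entry: an unstructured expression to be converted into an SLE,
        here left unparsed as a placeholder for an NLP module or manual conversion."""
        return self.direct_entry(sentence)

repo = PersonalRepository()
t = repo.direct_entry("There is #1 on #2")    # a parametric symbol
a = repo.direct_entry("a mouse")
b = repo.direct_entry("my desk")
print(repo.compositional_entry(t, a, b))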
2.3 Ingestion, Matching, Inference
Balog and Kenter (2019) point out that entity linking,
population of the repository and detection of new
entities (nil-detection) are intertwined in personal
knowledge representations: the semiotic model
proposed here comprises a symbol matching
component which carries out these three steps in a
unified fashion every time the repository system
ingests a packet of information delivered by the user.
The repository acts like an active memory and
attempts to recognize every symbol occurring in the
burst of signification events (SEs) produced by the
input data; every detected input symbol is matched
against the symbols stored in memory. Different
methods and data structures are used for symbol
matching: K-nearest neighbours algorithms to search
primal symbols embedded in metric spaces;
“vertical” compositional hypotheses lists, obtained by
tracking and recording the usage of symbols in
composite symbols, to detect potential matches;
“horizontal” relaxation, to deal with noise by
extending the search to neighbouring symbols; and
rule-based reasoning, which attempts to infer
searched symbols from existing symbols.
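The following sketch illustrates, under strong simplifications, how compositional hypotheses lists and horizontal relaxation might interact in a single matching step; the data structures and helper names (usage_index, relaxed_matches, etc.) are assumptions made for illustration, not the actual pipeline:

from typing import Dict, List

usage_index: Dict[str, List[int]] = {"mouse": [10, 11]}   # word -> composite symbols using it
synonyms: Dict[str, List[str]] = {"sofa": ["couch"]}      # topological neighbours for relaxation

def compositional_hypotheses(word: str) -> List[int]:
    # "Vertical" hypotheses: composite symbols in which the word has already been used.
    return usage_index.get(word, [])

def relaxed_matches(word: str) -> List[int]:
    # "Horizontal" relaxation: extend the search to neighbouring (e.g. synonymous) symbols.
    hits = list(compositional_hypotheses(word))
    for neighbour in synonyms.get(word, []):
        hits.extend(compositional_hypotheses(neighbour))
    return hits

def match_symbol(word: str) -> List[int]:
    # Unified step: try exact usage first, then relaxation; rule-based inference
    # (not shown) would be attempted last to complete partial matches.
    return compositional_hypotheses(word) or relaxed_matches(word)

print(match_symbol("mouse"))   # -> [10, 11]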
It is well known that there is a trade-off between
the expressive power of a representation language and
the mathematical and computational properties of its
reasoning capabilities (Suchanek, 2020). The
representation language adopted here, defined by
parametric symbols and the structured linguistic
expressions (SLEs) described later, does not place
any restrictions on the kind of information that users
can try to communicate to the knowledge repository.
Reasoning is viewed here more as a sequence of
internal signification events than a crisp logical
computation; its grounding is more of a statistical
nature than an axiomatic one, so that issues of
plausibility and defeasibility of the derived results
take precedence over decidability. As in many
knowledge models based for example on modal logic
or fuzzy logic, truth is not an absolute value, and the
goal of reasoning is more one of growing large
regions of coherence (both logical and semiotic
coherence) than one of deducing true facts missing
from the repository. An inference graph method,
which complements the search method by using rules
to complete partial matches detected by the search
module, will be described elsewhere.
2.4 The Symbol Abstraction Hierarchy
The term “symbol” is used here in a quite broad sense
and includes both “concrete” and “abstract” symbols.
Three abstraction levels for symbols are considered:
materialized (or concrete) symbols, and two levels of
abstract symbols: symbol forms and multiform
symbols.
Materialized Symbols are physical embodiments of
symbols and are physically connected to the
signification events (SEs) which they trigger, or by
which they are created or “activated”. Materialized
symbols occupy a region in space-time. For example,
a road sign positioned near a road intersection triggers
signification events whenever someone sees it; a
neuron or a pool of neurons in the visual cortex of a
primate is a materialized symbol which is activated
when a particular visual stimulus is presented; a
chunk of memory cells in a computer holding a digital
representation of a word triggers signification events
whenever a computer program reads the memory
cells and acts on it based on a meaning assigned to the
word. Distributed representations also have
materialized embodiments; however, they are not
uniquely identified by space-time coordinates since
multiple distributed representations share the same
region in space-time. Mental representations (which
are arguably distributed representations) also have
materialized embodiments, even though it is not
always clear what they are.
Symbol Forms. It is convenient to (conceptually)
group a collection of materialized symbols which are
invariant under some transformation into a symbol
form. Symbol forms are abstract symbols, where
“abstract” means here non-physical, that is, not
localized in space-time. Examples of symbol forms:
(1) the equivalence class of all instances of the
character string “mouse”, across all printed
documents. (2) The equivalence class of all stored bit
sequences representing the string “mouse” in a
computer; (3) the equivalence class of all visual
representations of the string “mouse” on a computer
screen.
Multiform Symbols. As illustrated by the previous
example, what is commonly called a symbol, such as
the word “mouse”, is typically associated with
multiple forms: printed form, digital form, displayed
form, etc. Hence it is called a multiform symbol.
Multiform symbols are a useful concept to build
efficient software implementations of knowledge
representations. Specifically, with an object-oriented
programming language (OOPL) such as Java, it is
possible to construct objects which provide detailed
representations of any kind of entity. These objects
can be quite complex, and their size can be very large;
think for example of the list of all filenames of
someone’s digital photos. It makes sense then to
introduce more compact representations of these
objects.
The most compact representation form of a
symbol is arguably an integer which identifies the
memory address of a record which defines the symbol
(this is roughly how Java represents the values held
by variables). A composite symbol built from constituents can be represented by the array of integer identifiers of its constituents, called the compositional code of the multiform symbol. For example, a large
list of file names can be represented more compactly
by representing each file name with an integer rather
than a string containing the file name. The full-
fledged form of the object, given by an object wherein
all nested identifiers have been expanded, is called the
exploitable form of the symbol. Hybrid
compositional codes contain some constituents in
identifier form and some in exploitable form.
Optimized variable-complexity representations are
obtained by cleverly managing the computational
forms of a symbol so that only those components that
are needed in exploitable form are expanded.
The different forms of a multiform symbol all have the same putative (assigned) meaning although they
do not have the same intelligibility and exploitability:
form conversions, which are semantically invariant,
are needed to enable efficient computation and
communication.
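A possible (and deliberately simplified) realization of multiform symbols with compositional codes and lazily expanded exploitable forms is sketched below; the class name MultiformSymbol and the global symbol_table are illustrative assumptions:

from typing import Dict, List, Union

symbol_table: Dict[int, "MultiformSymbol"] = {}

class MultiformSymbol:
    """One symbol, several forms: integer identifier, compositional code, exploitable form."""

    def __init__(self, identifier: int, text: str = "", code: List[int] = None) -> None:
        self.identifier = identifier
        self.text = text                     # exploitable form for atomic symbols
        self.code = code or []               # compositional code: identifiers of constituents
        symbol_table[identifier] = self

    def exploitable_form(self) -> Union[str, list]:
        """Expand nested identifiers only when needed (variable-complexity representation)."""
        if not self.code:
            return self.text
        return [symbol_table[i].exploitable_form() for i in self.code]

# Example: a list of file names stored as a compact compositional code [1, 2].
MultiformSymbol(1, "photos/2021/beach.jpg")
MultiformSymbol(2, "photos/2021/hike.jpg")
album = MultiformSymbol(3, code=[1, 2])
print(album.code)                # compact form: [1, 2]
print(album.exploitable_form())  # expanded form: the actual file names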
2.5 Primal Symbols
Semioticians distinguish between three types of
symbols: indexical, iconic and conventional
(Chandler, 2007). Indexical ones are those which are
physically or causally linked to their signifieds,
whereas for conventional symbols the link is
established via conventions (iconic symbols are not
discussed here). An analogous distinction can be
made between primal and linguistic symbols.
Whereas linguistic symbols are plagued with all the
issues due to the conventional nature of their meaning
(ambiguities, redundancies, context-dependence of
the meaning, vagueness), primal symbols are those
which are assumed to be free from these problems.
Every symbol of a mathematical nature (numbers,
real vectors, etc.) or issued from a formal language is
a primal symbol. Physical quantities, such as 3kg, and
standard-defined entities, such as GPS coordinates,
are primal symbols. Character strings, stripped of any
linguistic meaning, can also be treated as primal
symbols.
Primal symbols usually belong to metric spaces
whose metric structure is important for search and
matching. For example, K-nearest-neighbours
algorithms can be used when searching for a match to
a primal symbol, and ad-hoc clustering algorithms
can be used for organizing and grouping occurrences
of primal symbols. It seems appropriate to store and
maintain primal symbols in ad-hoc memory slices or
databases where these operations can be carried out
more efficiently.
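For instance, a k-nearest-neighbours search over primal symbols stored in an ad-hoc memory slice might look as follows; the GPS coordinates and the helper names are purely illustrative:

import math
from typing import List, Tuple

# Primal symbols stored in an ad-hoc memory slice: (symbol id, GPS coordinate).
gps_slice: List[Tuple[int, Tuple[float, float]]] = [
    (101, (42.3601, -71.0589)),   # Boston
    (102, (42.3736, -71.1097)),   # Cambridge
    (103, (40.7128, -74.0060)),   # New York
]

def k_nearest(query: Tuple[float, float], k: int = 2) -> List[int]:
    """Return the identifiers of the k primal symbols closest to the query point."""
    ranked = sorted(gps_slice, key=lambda item: math.dist(item[1], query))
    return [identifier for identifier, _ in ranked[:k]]

# Matching a newly entered coordinate against the stored primal symbols.
print(k_nearest((42.36, -71.06)))   # -> [101, 102]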
2.6 Structured Linguistic Expressions
Natural languages are arguably the most powerful
representation systems and can be used to represent
all types of entities. The term “entity” is used here in
the broadest sense possible: anything which can be
referred to by a linguistic expression is an entity,
whether it “exists” or not. Things, stuff, living beings,
abstract concepts, relationships, statements, facts,
facts about facts, rules, etc., are entities which can be
represented (signified) by a symbol, notably by a
linguistic expression, which is encoded here as a
Structured Linguistic Expression (SLE).
One simple way to obtain an SLE is to begin with
an unstructured linguistic expression, that is, a
character string containing a natural language
expression, and to build a parametric linguistic
symbol representing an n-ary predicate or relation, by
replacing one or more fragments of this character
string with “fillable slots”. For example, from: “There
is a mouse on my desk”, one can build the parametric
symbol “There is #1 on #2”, obtained by replacing “a
mouse” and “my desk” with fillable slots denoted #1
and #2.
A parametric symbol can be viewed as a lambda
expression or a function which maps tuples of
symbols to a symbol. We use the notation <*| to
indicate such a function, and |*> to denote the
arguments passed to the function. We further
associate an integer identifier with these symbols so that
the construction of an SLE encoding for our example
can be represented by the following “script”:
$0< There is a mouse on my desk>
$1< There is #1 on #2|
$2|a mouse>
$3|my desk>
$4[1,2,3]< There is #1 on #2|a mouse, my desk>
The multiform symbol with identifier $4 has a
compositional code form [1,2,3] and an exploitable
form <There is #1 on #2|a mouse, my desk> as
indicated by the last line of the script.
The symbol $4 is an SLE encoding of the original
unstructured expression $0; note that even though
they can be declared to be semantically equivalent, $0
and $4 are two distinct multiform symbols.
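A toy re-implementation of this script, with invented helper names (define, apply_parametric, exploitable) and symbols simplified to character strings, could look as follows:

from typing import Dict, List

store: Dict[int, object] = {}   # identifier -> symbol

def define(identifier: int, symbol: object) -> int:
    store[identifier] = symbol
    return identifier

def apply_parametric(template_id: int, *argument_ids: int) -> List[int]:
    """Build the compositional code of an SLE: [template, arg1, arg2, ...]."""
    return [template_id, *argument_ids]

def exploitable(code: List[int]) -> str:
    """Expand a compositional code into its exploitable (readable) form."""
    template = str(store[code[0]])
    for position, arg_id in enumerate(code[1:], start=1):
        template = template.replace(f"#{position}", str(store[arg_id]))
    return template

define(0, "There is a mouse on my desk")          # $0: unstructured expression
define(1, "There is #1 on #2")                    # $1: parametric symbol
define(2, "a mouse")                              # $2
define(3, "my desk")                              # $3
code_4 = apply_parametric(1, 2, 3)                # $4: compositional code [1, 2, 3]
print(code_4, "->", exploitable(code_4))          # -> "There is a mouse on my desk"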
Unary predicates, such as <#1 is a mouse|, can be
converted into category symbols denoted, for
example, {#1 is a mouse|. An equivalent symbol for
this category is {mice|. Conversely, a category can be
converted to a predicate, for example <mice| is a
predicate equivalent to <#1 is a mouse|.
Samples of categories are obtained by specifying
N arguments which are typically the names associated
with the items in the sample; for example, two mice
named “Jerry” and “Billy” can be represented as {#1 is a mouse|Jerry, Billy} or {mice|Jerry, Billy}. A
category sample can also be built by specifying an
integer which indicates the number of (unnamed)
items in the sample; for example, “a mouse” is
encoded as {mice|1}. Category samples can also be
used to denote quantities of “stuff”, for example
{water|1 litre} denotes one litre of water.
Categories can be used to restrict the allowed slot
fillers of a parametric symbol, for example:
<The capital of <states|#1> is <cities|#2> |
is a restricted parametric symbol.
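A sketch of category samples and of slot-filler checking against restricting categories is given below; the helper names and the sample categories are illustrative assumptions:

from typing import Dict, List

categories: Dict[str, List[str]] = {
    "states": ["Texas", "Vermont"],
    "cities": ["Austin", "Montpelier"],
}

def category_sample(category: str, *names: str) -> dict:
    """Encode a sample such as {mice|Jerry, Billy} as a category plus named items."""
    return {"category": category, "items": list(names)}

def fill_restricted(template: str, slots: Dict[int, str], restrictions: Dict[int, str]) -> str:
    """Fill a restricted parametric symbol, checking each filler against its category."""
    for position, filler in slots.items():
        if filler not in categories[restrictions[position]]:
            raise ValueError(f"'{filler}' is not a known member of {restrictions[position]}")
        template = template.replace(f"#{position}", filler)
    return template

print(category_sample("mice", "Jerry", "Billy"))
print(fill_restricted("The capital of #1 is #2",
                      {1: "Texas", 2: "Austin"},
                      {1: "states", 2: "cities"}))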
Note that SLEs can contain other SLEs as
constituents, which gives users practically unlimited
expressive power; facts about facts, beliefs, etc. can be easily expressed as an SLE; for example, “João
believes that it will rain today” can be encoded as:
$0<it will rain today>
$1<#1 believes that #2 | João, $0>.
On the other hand, unless constraints are
introduced, it is possible to build nonsensical SLEs
and statements such as $0<$0 is false>, which cannot
be assigned a truth value.
2.7 Symbol Groupings
In addition to linguistic symbol composition
(obtained via parametric symbols) and category
samples (normally obtained via lists of items),
symbols are grouped according to the following
organizing principles. (1) Topological groupings are
obtained by grouping symbols related by proximity or
similarity. These groupings provide relaxation
regions used when searching for symbol matches; for
example, when searching for a word, a synonym
found in a relaxation region can be returned as a valid
potential match. Clusters in primal metric spaces are
also topological groupings. (2) Inferential groupings
are obtained by grouping symbols which participate
in an inferential derivation (e.g., a syllogism). These
groupings provide a justification for an inferred
symbol and can be useful to assign plausibility scores
and to compare alternative or incompatible states of
affairs. (3) Descriptive aggregates enhance the
description of some entity by combining, for
example, SLEs which refer to a common entity. For
example, $1<$0<Li> lives in Boston> and
$2<$0<Li> likes movies> can be grouped into a
descriptive aggregate {$1,$2}. As another example, a
category symbol can be aggregated with restricted
parametric symbols which use that category as a
restricting category, so that a user can be presented
with a list of available statement builders applicable
to members of that category. Note that from the point
of view of knowledge graphs, a descriptive aggregate
can be viewed as the reification of a star-shaped
subgraph centered at the described entity.
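As a rough sketch, a descriptive aggregate may be pictured as a set of SLE identifiers gathered under the identifier of the described entity; the names below are invented for illustration:

from typing import Dict

sles: Dict[int, str] = {
    1: "<$0<Li> lives in Boston>",
    2: "<$0<Li> likes movies>",
}

def descriptive_aggregate(entity_symbol_id: int, *sle_ids: int) -> Dict[str, object]:
    """Group SLEs describing a common entity under a unifying symbol identifier,
    i.e. the reification of a star-shaped subgraph centered at the described entity."""
    return {"entity": entity_symbol_id, "descriptions": set(sle_ids)}

aggregate = descriptive_aggregate(0, 1, 2)       # {$1, $2} gathered around $0<Li>
print(aggregate)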
Taming Complexity. Symbol groupings play an
important role for (1) building a flexible and
incremental representation system; (2) representing
the “topology” of symbols (which, in turn, is essential
to make the system robust) and (3) making search and
matching with relaxed compositional hypotheses lists
more efficient by gathering an entity’s features under
a unifying symbol identifier. However, they also
introduce a great deal of complexity if they are
allowed to proliferate in an uncontrolled way. One
answer to this problem is to organize the multiple
manifestations of an entity (including symbol
groupings), which emerge from the interactions with
users and their data, into semiotic structures, as
described later in more detail.
A second difficulty arises from the potentially
very large size of the computational objects needed to
represent complex symbolic constructs, such as
symbol groupings. Think for example to a computer
program that maintains a table to represent all the files
or all the emails of a user: it is clearly impossible to
hold such objects permanently in live memory.
Multiform symbols and variable-complexity
representations play a role here by converting each
symbol to the most appropriate computationally
optimized symbol form, ranging from simple integer
identifiers to full-fledged exploitable forms (with
hybrid compositional code in between).
2.8 Contextualized Symbols
In the analysis of documents, a distinction is made
between word types and word tokens. We generalize
this distinction by considering word types a subclass
of de-contextualized or context-free symbols and
word tokens a subclass of contextualized symbols.
Words such as <mouse> and sentences such as
<There is a mouse on my desk> are context-free
symbols because no context is specified.
A contextualized symbol is one for which some
contextual information is specified.
Plucked symbols are one type of contextualized
symbols. They are obtained by “plucking” a
constituent symbol from a composite symbol. For
example, the occurrence of $0<a mouse> in
$1<#1 ran across my desk | $0<a mouse>>,
denoted by the symbol coordinate $1.1 ($n.k denotes
the k-th constituent of $n) is a contextualized symbol
because a context has been specified for the symbol
$0, namely a sentence in which it occurs. Note that $0
and $1.1 are quite different symbols: $0 can be either a
rodent or a device, whereas $1.1 is (most likely) a
rodent.
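A symbol coordinate such as $1.1 can be sketched as a pair (composite identifier, position), as in the following illustrative fragment (the store layout and the pluck helper are assumptions):

from typing import Dict, List, Tuple, Union

store: Dict[int, Union[str, List[int]]] = {
    0: "a mouse",                    # $0
    100: "#1 ran across my desk",    # parametric symbol used as template
    1: [100, 0],                     # $1: compositional code [template, $0]
}

def pluck(composite_id: int, position: int) -> Tuple[int, int]:
    """Return the symbol coordinate $n.k: the k-th constituent of composite $n.
    Unlike $0 on its own, the pair keeps the enclosing context of the occurrence."""
    constituents = store[composite_id][1:]      # constituents follow the template id
    if not 1 <= position <= len(constituents):
        raise IndexError("no such constituent")
    return (composite_id, position)

print(pluck(1, 1))   # (1, 1), i.e. $1.1: the occurrence of $0<a mouse> inside $1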
Semiotic Point Representations (SPR). Semiotic
point representations, which represent signification
events, are contextualized symbols. Recall that a
signification event occurs when a materialized
symbol gets connected to a referent or signified. The
sequence of signification events triggered by a
materialized symbol corresponds to a sequence of
time samples of the spatial region occupied by it. For
example, a road sign planted at a crossroad is
“sampled” every time someone sees it and
understands its meaning (and also when the sign is
misinterpreted).
There are different types of SPR which differ in
the amount of information they convey about the
signification event. A bare semiotic point identifier
simply identifies a signification event by providing
the space-time coordinates that uniquely identify it
but does not convey any information about the
meaning conveyed. For example, specifying the GPS
coordinates of a road sign and the times at which it
has been seen identifies a sequence of signification
events, but does not provide any information about
what the sign meant to those who viewed it.
An informative SPR is one which does provide
useful information to recover the instantaneous
meaning of the represented signification event. A
typical informative SPR specifies the symbol
involved in the signification event plus some
contextual information that restricts its possible
interpretations. In the context of information
extraction, word mentions, which are often associated
with a few surrounding words, can be viewed as
informative SPRs. A word along with the sentence in
which it appears does not, however, specify a unique
semiotic point since the word and the enclosing
sentence are created once, yielding an initial semiotic
point; and then read multiple times by different
readers, yielding many additional semiotic points.
2.9 Semiotic Structures
All multiform symbols discussed in detail up to now,
that is, primal symbols, parametric symbols, SLEs (including categories and category samples), symbol
groupings of various kinds, SPRs, are immutable
symbols. Once created, they can be stored
permanently in a repository and assigned an identifier
which can be used confidently to refer to the symbol,
with a guarantee that the symbol does not change,
except for semantically invariant form conversions.
For this reason, they can be called stock symbols.
It should be noted that an immutable symbol is not
one whose meaning is necessarily immutable (unless
it is a primal symbol). For example, the symbols
<today> and <you> are immutable but their meaning
clearly depends on the context in which they are used.
Semiotic structures are mutable symbols. A
semiotic structure designates a representative symbol
which can be replaced by another representative
symbol when the semiotic structure needs to be
updated. An identifier of a semiotic structure
identifies a symbol which can change over time, unlike identifiers of immutable symbols.
There is a crisp and fundamental distinction between
immutable and mutable symbols.
We now describe some of the ways a semiotic
structure evolves over time, usually with the active
intervention of a user.
State Updates. The list of my friends must be
updated every time I meet a new friend. To enact this
update, a new immutable symbol representing the
new list of friends is created and the semiotic
structure representing “my friends” is updated with
the new list, which becomes the new representative
symbol of the semiotic structure (the new list can of
course be represented more efficiently as an edit to
the old list if, for example, only one friend has been
added). This type of update is called a state update
because it reflects a state change in the underlying
entity represented by the semiotic structure. As
another example, a semiotic structure representing the
GPS coordinates of my car must undergo a state
update every time the car moves.
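A minimal sketch of a semiotic structure supporting state updates, assuming an invented SemioticStructure class that keeps the history of its representative symbols, is given below:

from datetime import datetime, timezone
from typing import List, Tuple

class SemioticStructure:
    """Mutable symbol: a designated representative that can be replaced over time,
    with the full history of representatives retained."""

    def __init__(self, representative: object) -> None:
        self.history: List[Tuple[datetime, object]] = []
        self._set(representative)

    def _set(self, representative: object) -> None:
        self.history.append((datetime.now(timezone.utc), representative))

    @property
    def representative(self) -> object:
        return self.history[-1][1]

    def state_update(self, new_representative: object) -> None:
        """The underlying entity changed (e.g. a new friend was met)."""
        self._set(new_representative)

my_friends = SemioticStructure(("Chen", "Pablo"))          # immutable list symbol
my_friends.state_update(("Chen", "Pablo", "Li"))           # a new friend was met
print(my_friends.representative, len(my_friends.history))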
Informative Updates. State updates should be
contrasted with informative updates, which occur
when new information is provided about the
represented entity (and the underlying entity does not
change). For example, suppose a user Xiu, while
tagging pictures with people’s names, has quickly
introduced a new person into the database by a simple
keyword such as $0<Chen>. Later, Xiu realizes that
she knows two Chens, so that she needs to provide a
more informative representation of the first Chen, for
example, by means of the more informative symbol
$1<coworkers|Chen>. This informative update is also
a refinement update because not only does it provide
additional information about the entity (that is, that
Chen is a co-worker), but it also reduces the
ambiguity of the symbol, hence refining the set of
objects it may refer to.
A purely informative update which does not
reduce ambiguity is referred to as a descriptive
update. For example, suppose Maria knows only one
Pablo but she wishes to enhance her representation of
Pablo by including his phone number. In a knowledge
graph this would be done by adding one edge; in the
currently proposed model this could be done by creating
an SLE such as $2<The phone number of #1 is #2 |
$0< Pablo >, (617)-123-4567> and then by plucking
“Pablo” from this SLE, to yield the plucked symbol
$2.1. An alternative way is to create a descriptive
aggregate which annotates $0 with $2.
Renaming updates are used for renaming an
entity; for example, if Maria decides to rename Pablo
to “Pablito”, the current representative of the semiotic
structure, say <Pablo>, is replaced with <Pablito>.
Consolidation Updates. Suppose that after using the
repository for a while, Maria has mentioned Pablo
multiple times so that the repository now contains
multiple SLEs having <Pablo> as a constituent.
While browsing through the repository and seeing
multiple mentions of Pablo, Maria decides to gather
all the information about Pablo in one place and
creates a descriptive aggregate about Pablo: this new
descriptive aggregate becomes the new representative
of the semiotic structure representing Pablo. Note that
a consolidation update is an informative update.
Consolidation updates play a crucial role in taming
the proliferation of symbols referring to the same
entity and ought to be triggered automatically by the
system when necessary.
Splitting. As shown earlier, fluctuations arising from
the use of vague symbols such as <friends> may lead
to inconsistencies. Nested semiotic structures can be
used to track these semiotic fluctuations and to
represent the structure of vague entities. Specifically,
an initial semiotic structure representing a vague
symbol such as <friends> can spawn two nested
semiotic structures by creating two refinements of
<friends>, one which includes co-workers and one
which excludes co-workers. These two refined
symbols become the initial representatives of the two
nested semiotic structures, each of which inherits one
portion of the semiotic history of the original vague
symbol.
Merging. Two semiotic structures whose meanings
are similar or overlapping can be combined into a
coarse semiotic structure. For example, suppose that
a group of family members has independently kept
lists of <people> but they have (unconsciously)
assigned slightly different meanings to the symbol
<people>. For example, Alia has excluded fictional characters, such as Harry Potter, from her list (she has
a separate category for them); Elif has also excluded
people he has never met (he has a special category
“celebrities” for famous people he has never met);
Canan has instead included all speaking entities into
his list including cartoon characters such as Tom the
cat and Jerry the mouse, just because he was too lazy
to create a special category for them. It is finally
decided to consolidate the databases of all family
members and for doing so, a coarse semiotic structure
named <people> is created and the three <people>
categories appear as nested categories within the
coarse <people> category. The flavors of <people> “invented” by each family member are not lost and are now organized under a vague category symbol that acknowledges each of the three interpretations.
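Splitting and merging can be sketched as operations that populate the nested structures of a parent; the SemioticStructure stand-in and the helper names below are illustrative only:

from typing import List

class SemioticStructure:                      # minimal stand-in for the earlier sketch
    def __init__(self, representative: object) -> None:
        self.representative = representative
        self.nested: List["SemioticStructure"] = []

def split(structure: SemioticStructure, *refinements: object) -> List[SemioticStructure]:
    """Spawn nested structures, one per refinement of a vague symbol such as <friends>."""
    structure.nested = [SemioticStructure(r) for r in refinements]
    return structure.nested

def merge(name: object, *structures: SemioticStructure) -> SemioticStructure:
    """Create a coarse structure whose nested members keep each contributor's flavor."""
    coarse = SemioticStructure(name)
    coarse.nested = list(structures)
    return coarse

friends = SemioticStructure("<friends>")
split(friends, "<friends, incl. co-workers>", "<friends, excl. co-workers>")
people = merge("<people>", SemioticStructure("<people (Alia)>"),
               SemioticStructure("<people (Elif)>"), SemioticStructure("<people (Canan)>"))
print([s.representative for s in friends.nested], len(people.nested))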
Extension-by-Reference and Rewinds. A semiotic
structure not only keeps track of the updates of its
representative symbol but also of the references to its
current representative symbol from other symbols. In
other words, a semiotic structure maintains a historical record of all the signification events in
which it participates. A user may use this history to
fine-tune the meaning of a symbol. For example, the
meaning of a symbol may drift over time and a user
may realize that a past usage of the symbol represents its current intended meaning better than the last occurrence of the symbol: a rewind operation is
then executed.
2.10 Reflection
Semiotic structures are meta-symbols: the history of
signification events they contain (in the form of
SPRs) represents symbols representing something.
The elementary operations just described, occurring
at the semiotic structure level: state updates,
information updates, refinements, splits, merges,
extension-by-reference, rewinds, consolidations, etc.,
accompany every population step and curation step.
To carry out any of these operations, an agent
(human or machine) must engage in reflection, which
involves recollecting or reconstructing the past
meanings of symbols. Reflection results in a semiotic
link being created between a current signification
event and a past signification event which have been
assessed by an agent to pertain to the same underlying
entity. The two linked signification events then become part of the same semiotic structure.
A linear semiotic structure is obtained when an
agent always determines that the current meaning of
a symbol is the same as its previous occurrence.
Bifurcations occur when rewinds are necessary due,
for example, to semiotic drift.
3 CONCLUSIONS
A knowledge representation model has been
described which enables users to enter, modify, organize, re-organize, and curate their own personal
information by leveraging their own terminologies
and organizational schemes. We believe that ideas
discussed in this position paper can be used to
develop an interactive repository with a user-friendly
interface to store and recall personal information. One
should also explore the possibility of using semiotic
structures (and other symbolic representations
described here) to enhance knowledge graphs in
general, for example, to represent knowledge
extracted from large amounts of text.
REFERENCES
Balog, K., & Kenter, T. (2019). Personal Knowledge Graphs: A Research Agenda. ICTIR 2019 (pp. 217-220). Santa Clara, CA, USA.
Balog, K., Mirza, P., & Skjaeveland, M. G. (2022, June).
Report on the Workshop on Personal Knowledge
Graphs (PKG 2021) at AKBC 2021. ACM SIGIR
Forum, Vol. 56, No. 1.
Chandler, D. (2007). Semiotics (2nd ed.). London and New York: Routledge.
Ilyas, I. F., & Chu, X. (2019). Data Cleaning. ACM.
Kejriwal, M., Knoblock, C. A., & Szekely, P. (2021).
Knowledge Graphs. Cambridge, Massachusetts.
London, England: The MIT Press.
Montoya, D., Pellissier Tanon, T., Abiteboul, S., Senellart,
P., & Suchanek, F. M. (2018, April). A Knowledge Base
for Personal Information Management. LDOW 2018.
Lyon, France.
Russell, S., Norvig, P., & Davis, E. (2010). Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Suchanek, F. (2020). The Need to Move Beyond Triples. Proceedings of the Text2Story'20 Workshop. Lisbon, Portugal: R. Campos, A. Jorge, S. Bhatia (eds.).
Weikum, G., Dong, X., Razniewski, S., & Suchanek, F.
(2021, August 23). Machine Knowledge: Creation and
Curation of Comprehensive Knowledge Bases.
Foundations and Trends in Databases.