Personalized Semantic Resources
The SemComp Project Presentation and Preliminary Works
Alexandre Labadié¹, Stéphane Ferrari¹ and Thibault Roy²
¹ GREYC, Caen, France
² Noopsis, Caen, France
Keywords:
Personalized Semantic Resources, Componential Semantics, RDF.
Abstract:
This paper presents the computational aspects of the SemComp project, a multidisciplinary collaboration that aims to observe how interacting with documents affects knowledge acquisition. The project is based on a model for personalized semantic resources inspired by componential linguistics. The paper describes progress on both the definition of the computational model and its implementation in a Web-oriented application. Functionalities and technical choices are presented with regard to the planned experiments.
1 INTRODUCTION
In this paper, we present the SemComp project, an ongoing research project funded by the French region of Basse-Normandie. It aims to experiment with a model for semantic representation in the applicative context of enhancing personal access to documents. The experiments will be carried out through a Web-oriented application, in teaching environments or for cultural tourism purposes. SemComp is a multidisciplinary collaborative project involving linguists, psychologists and computer scientists. We thus pursue several goals: testing a new implementation of a linguistic model in which personal interpretation is central, collecting data to observe how interacting with documents affects lexical and semantic knowledge acquisition, and testing a method for personalized access to information.
In section 2, we briefly present the model for semantic representation. It is a simplification of a linguistic approach to componential semantics for applicative purposes: casual Web users with or without linguistic knowledge can use it. In section 3, we detail some aspects of the implementation of Personalized Semantic Resources (PSR in the following) using Semantic Web tools and standards. To conclude, we present future work and the planned evolution of the project.
2 SEMANTIC MODEL
SemComp stands for Sémantique Componentielle, “Componential Semantics”.
2.1 Motivation
The model for semantic representation and, as a
consequence, for semantic analysis, is inspired by
the structural linguistic approach. It relies on the
main notions of the “Componential Semantics” and
the “Interpretative Semantics” as proposed in, e.g.,
(Greimas, 1966; Pottier, 1992; Rastier, 1987). These
notions are detailed in section 2.2.
Over the last decade, NLP work based on a similar approach has already been carried out. In (Beust et al., 2003), a componential model was tested for semantic analysis, and more specifically for metaphor detection and interpretation in domain-specific corpora. In (Roy and Ferrari, 2008), the same model and graphical tools are used to provide a user with personal access to textual information. In both cases, the applications were designed for experts rather than casual Web users. Other works, such as (Valette and Slodzian, 2008; Kanellos and Mauceri, 2008), also proposed the use of “Interpretative Semantics” for information retrieval applications.
In the SemComp project, we propose a modified, simplified version of this model for use by casual Web users. Once some structural constraints are relaxed, we consider that the model has strong similarities with Web 2.0 tools such as folksonomies, tags and social bookmarking.
Labadié A., Ferrari S. and Roy T..
Personalized Semantic Resources - The SemComp Project Presentation and Preliminary Works.
DOI: 10.5220/0004539501640169
In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2013), pages 164-169
ISBN: 978-989-8565-81-5
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)
2.2 Linguistic Model
Sèmes. The central notion of the “Componential Semantics” approach is to describe lexical units with semantic features. These features, called “sèmes”, are theoretically supposed to describe the possible interpretations of a lexical unit.
Some sèmes are generic: they represent parts of meaning shared by a lexical unit and other units with close meanings. For instance, chair and sofa share sèmes because both can mean “a sort of seat”. One may use sèmes such as /physical object/, /crafted object/, /piece of furniture/ and /seat/ to explain these meanings.
Other sèmes are specific: they are used to distinguish between close meanings. Chair could be differentiated from sofa using a sème such as /seat without arms/, while chair and sofa could themselves be differentiated from stool using a sème such as /seat without a back/.
Previous works such as (Beust et al., 2003; Roy and Ferrari, 2008) proposed to represent specific sèmes using attributes and values that encode their differential role: /seat’s back: yes/ for chair and sofa, /seat’s back: no/ for stool; /seat’s arms: yes/ for sofa, /seat’s arms: no/ for chair and stool. These constraints are strong, and describing a whole domain this way requires a high level of expertise.
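The attribute/value encoding of specific sèmes can be illustrated with a minimal sketch. The dictionary layout and the function name `differentiating_attributes` are assumptions made for illustration; they are not taken from the cited works.

```python
# Hypothetical encoding of the attribute/value sèmes given in the text.
SPECIFIC_SEMES = {
    "chair": {"seat's back": "yes", "seat's arms": "no"},
    "sofa": {"seat's back": "yes", "seat's arms": "yes"},
    "stool": {"seat's back": "no", "seat's arms": "no"},
}


def differentiating_attributes(unit_a, unit_b, semes=SPECIFIC_SEMES):
    """Return the attributes whose values differ between two lexical units."""
    a, b = semes[unit_a], semes[unit_b]
    return sorted(attr for attr in a.keys() & b.keys() if a[attr] != b[attr])


print(differentiating_attributes("chair", "sofa"))
print(differentiating_attributes("chair", "stool"))
```

With this encoding, chair and sofa differ only on the arms attribute, while chair and stool differ only on the back attribute, which is the differential role the attributes are meant to play.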
Sèmes in SemComp: Free Semantic Features. In the SemComp project, we propose to let users describe semantic features freely. We hypothesize that casual users will mostly describe generic sèmes in order to retrieve documents related to their hobbies and interests. The planned application (see section 2.3) will allow a user to build queries using sèmes rather than lexical (or graphical) units only. For casual users, we assume that sèmes will be used as tags for text classification: rather than tagging texts, users will tag the words themselves.
It will nevertheless remain possible for an expert user to propose sets of differential sèmes if necessary. In the application, this will appear only as an advanced functionality. With the previous examples, a user will be able to build a differential set including the sèmes /seat without a back/ and /seat with a back/, in order to improve the results of a query using the sèmes /seat/ and /seat without a back/: the application will automatically consider the sème /seat with a back/ as irrelevant, even though it describes units with close meanings.
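The effect of a differential set on a query can be sketched as follows. This is only one possible reading of the behavior described above: the matching rule (a unit matches if it carries at least one queried sème and no excluded sème) and all function names are assumptions.

```python
def irrelevant_semes(query, differential_sets):
    """Sèmes excluded by a query: the other members of any differential set
    that has at least one member in the query."""
    excluded = set()
    for diff_set in differential_sets:
        if diff_set & query:
            excluded |= diff_set - query
    return excluded


def matching_units(query, lexicon, differential_sets):
    """Keep units carrying at least one queried sème and no excluded sème."""
    excluded = irrelevant_semes(query, differential_sets)
    return sorted(
        unit for unit, semes in lexicon.items()
        if semes & query and not (semes & excluded)
    )


lexicon = {
    "chair": {"/seat/", "/seat with a back/"},
    "sofa": {"/seat/", "/seat with a back/"},
    "stool": {"/seat/", "/seat without a back/"},
}
differential_sets = [{"/seat with a back/", "/seat without a back/"}]
query = {"/seat/", "/seat without a back/"}

print(irrelevant_semes(query, differential_sets))
print(matching_units(query, lexicon, differential_sets))
```

Without the differential set, chair and sofa would also match through the generic sème /seat/; with it, they are filtered out because they carry /seat with a back/.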
Isotopies. The second central notion of “Interpretative Semantics” is that of “isotopy” for contextual interpretation. An isotopy is the redundancy of a sème in a textual zone. It is closely related to the notion of topic. When numerous sèmes are used to describe a domain, as in the experiment on metaphors in (Beust et al., 2003), isotopies have been shown to help activate or deactivate sèmes: in the context of economic news, meteorological terms such as thermometer and barometer can be interpreted as measuring or forecasting tools for stock markets, which deactivates the sèmes specific to meteorology (the units they use, the phenomenon they measure, etc.). French linguists propose the terms actualisation (the action of activating a sème in a specific context) and virtualisation (the action of deactivating a sème in a specific context) to describe these interpretation processes.
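As a rough illustration of these notions, the sketch below treats an isotopy as a sème carried by at least two word occurrences of a textual zone, then splits the sèmes of a word into actualised and virtualised ones. The threshold, the toy lexicon and the sème labels are all assumptions; the linguistic processes are considerably richer than this counting heuristic.

```python
from collections import Counter


def isotopies(zone, lexicon, min_count=2):
    """Sèmes carried by at least `min_count` word occurrences in the zone."""
    counts = Counter()
    for word in zone:
        counts.update(lexicon.get(word, set()))
    return {seme for seme, n in counts.items() if n >= min_count}


def interpret(word, zone, lexicon, min_count=2):
    """Return (actualised sèmes, virtualised sèmes) of a word in a zone."""
    active = isotopies(zone, lexicon, min_count)
    semes = lexicon.get(word, set())
    return semes & active, semes - active


# Toy data: labels are hypothetical, chosen to mirror the economic news example.
lexicon = {
    "barometer": {"/measuring tool/", "/meteorology/"},
    "index": {"/measuring tool/", "/economy/"},
    "market": {"/economy/"},
}
zone = ["market", "index", "barometer"]

actualised, virtualised = interpret("barometer", zone, lexicon)
print(actualised, virtualised)
```

In this zone, /measuring tool/ and /economy/ are redundant and form isotopies, so the meteorological sème of barometer is virtualised.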
2.3 Application in View
Based on this simplified linguistic model, we plan to develop an application allowing users to create, manipulate and use their personal points of view on different domains when consulting documents. Several psychological and NLP experiments are planned in the SemComp project. The use of Personalized Semantic Resources (PSR) will be tested in the following applicative contexts: (1) students consulting teachers’ online courses; (2) students consulting the Web for a class project; (3) tourists looking for cultural activities in Normandy.
Experiments (1) and (2) are scheduled in the short term (one year), while (3), which requires the inclusion of other NLP tools, is scheduled in the longer term (two years). In (1) and (2), we expect to observe how students acquiring new knowledge of a domain modify their PSR. In (3), we intend to experiment with casual users and to include PSR sharing, in order to test whether this model can lead to real Web applications.
In (1), the collection of documents is closed, limited to the documents provided by a teacher in the scope of a course. In (2), the collection must first be retrieved from the Web using a search engine, which requires translating the user request from sèmes to written forms. The next section presents the first developments centered on the PSR, as well as more details on the application functionalities that will be used in the first experiments (1) and (2).
3 PERSONALIZED SEMANTIC
RESOURCES (PSR)
Based on the model presented above, we are currently developing a set of resources and an associated application, with three goals:
PersonalizedSemanticResources-TheSemCompProjectPresentationandPreliminaryWorks
165
1. to propose an exhaustive, yet flexible, representation for PSR;
2. to allow model implementation in an applicative context;
3. to track how users build their semantic resources, for experimentation purposes.
3.1 UML Model
The simplified class diagram (Figure 1) can be divided into three parts corresponding to our three goals: the semantic resources (PSR), their connections with documents (applicative context) and the interactions (tracked for experiments).
The center of the diagram represents the semantic resources a user can build. The heart of the model is the lexical entry, which is composed of one or more written forms. One of these written forms is the representative of the lexical entry. A lexical entry can be linked to one or more features groups, each representing a different meaning. These groups are composed of semantic features (sèmes). Semantic features are not limited to textual representations and can be images, sounds, etc. Features groups can be of different types. The most basic type is a meaning of a lexical entry, but the model allows other types of groups to be created to identify specific properties of some semantic features (for instance, a “differential set” of sèmes as illustrated in the previous section). Following the same idea, features groups can be linked in pairs to represent lexical or semantic relations (hyperonymy, synonymy, etc.). Lastly, lexical entries, semantic features and features groups are linked to the viewpoint of a specific user on a domain.
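The central part of the model can be approximated with a few plain data classes. This is a sketch under assumptions, not the project's implementation: class and field names are adapted from the description above, groups relations are left out, and treating the first written form as the representative is a simplification chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SemanticFeature:
    label: str  # textual sème; the model also allows images, sounds, etc.


@dataclass
class FeaturesGroup:
    title: str
    group_type: str = "meaning"  # another type could be "differential set"
    features: List[SemanticFeature] = field(default_factory=list)


@dataclass
class LexicalEntry:
    written_forms: List[str]  # at least one written form is required
    groups: List[FeaturesGroup] = field(default_factory=list)

    def __post_init__(self):
        if not self.written_forms:
            raise ValueError("a lexical entry needs at least one written form")

    @property
    def representative(self) -> str:
        # Assumption: the first written form is the representative one.
        return self.written_forms[0]


@dataclass
class Viewpoint:
    user: str
    domain: str
    entries: List[LexicalEntry] = field(default_factory=list)


# The Boeing meaning described in section 3.3, as one user's viewpoint.
plane = FeaturesGroup(
    "/L'avion Boeing/",
    features=[SemanticFeature("/vole/"), SemanticFeature("/transporte/")],
)
entry = LexicalEntry(["Boeing"], [plane])
viewpoint = Viewpoint("Alexandre", "aeronautics", [entry])
print(entry.representative, [f.label for f in plane.features])
```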
The top (and top right) of the class diagram links the PSR model to documents. A query is composed by selecting semantic features. The user can also activate or deactivate features when she meets an occurrence of a written form in a document. The main purpose is to let a user create queries using her own personalized semantic representation of a domain (using semantic features to create and expand queries). The model also allows the returned collection of documents to be reordered: by activating and deactivating semantic features linked to word occurrences in the documents, the user refines her search.
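One simple way to picture this reordering is a score per document that rewards activated sèmes and penalizes deactivated ones. The scoring formula below is an assumption made for illustration; the paper does not specify how the application ranks documents, and the toy documents are hypothetical.

```python
def score(document_words, lexicon, activated, deactivated):
    """Sum over word occurrences: activated sèmes carried minus deactivated ones."""
    total = 0
    for word in document_words:
        semes = lexicon.get(word, set())
        total += len(semes & activated) - len(semes & deactivated)
    return total


def reorder(documents, lexicon, activated, deactivated=frozenset()):
    """Sort document ids by decreasing score; ties keep their original order."""
    return sorted(
        documents,
        key=lambda doc_id: score(documents[doc_id], lexicon, activated, deactivated),
        reverse=True,
    )


lexicon = {"Boeing": {"/vole/", "/transporte/"}, "train": {"/transporte/"}}
documents = {
    "d1": ["train", "station"],
    "d2": ["Boeing", "train"],
    "d3": ["station"],
}
print(reorder(documents, lexicon, activated={"/vole/"}))
```

Activating /vole/ moves the document mentioning Boeing to the top, while documents with no matching occurrence keep their initial relative order.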
Lastly, the bottom of the class diagram allows us to track each interaction the user has with her PSR. This will be used in psycho-linguistic experiments to uncover how a semantic representation of a domain is built and used. The tracking process will only take place during experiments, and users will be aware of it.
3.2 Application Expected
Functionalities
For Users. In addition to user-friendly manipulation of our model, the application will allow the user to use her own PSR to improve her searches on the Web or in closed collections of documents. The user will be presented with the semantic features of her own PSR and will compose queries with them. By expanding these “sème-formulated queries”, the application should retrieve documents closer to the user’s point of view on a domain than a more classic approach would. We also intend to allow users to share parts of their PSR and to expand their own with parts shared by other users.
For Experimenters. As described in section 3.1, our model allows us to track the user’s interactions while she is building and using her PSR. The final application should show us the whole process by which a user builds a PSR while browsing documents. It should also allow us to navigate between the steps of this process: when tracking is active, the application keeps everything in the RDF repository (see the following section for technical choices), archiving any modified or deleted instance. We also intend to use graph similarity measures to compare different users, domains, etc.
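The archiving behavior can be sketched with a small in-memory store that never overwrites an instance. It only illustrates the idea of an archive flag and a creation date, which appear as attributes in the class diagram; the class name `TrackedStore` and its methods are hypothetical and do not reflect how the RDF repository is actually used.

```python
from datetime import datetime, timezone


class TrackedStore:
    """Keeps every version of an instance instead of overwriting it."""

    def __init__(self):
        self._versions = {}  # instance id -> list of version records

    def save(self, instance_id, value):
        history = self._versions.setdefault(instance_id, [])
        if history:
            history[-1]["archive"] = True  # previous version is archived, not lost
        history.append({
            "value": value,
            "archive": False,
            "created": datetime.now(timezone.utc),
        })

    def delete(self, instance_id):
        # Deleting only flags the current version as archived.
        self._versions[instance_id][-1]["archive"] = True

    def current(self, instance_id):
        last = self._versions[instance_id][-1]
        return None if last["archive"] else last["value"]

    def history(self, instance_id):
        return [version["value"] for version in self._versions[instance_id]]


store = TrackedStore()
store.save("seme_1", "/vole/")
store.save("seme_1", "/can fly/")
print(store.current("seme_1"), store.history("seme_1"))
```

Because nothing is physically removed, an experimenter can replay every step of the construction of a PSR from the stored history.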
3.3 Technological Choices,
Implementation and Preliminary
Results
We chose to use Semantic Web tools and standards to implement our PSR. We turned to RDF¹, whose graph-like representation is closer to our model than classical relational ones. Figure 2 is an example of semantic data organized as a multi-user PSR. It shows two users sharing one domain, with common and distinct semantic features, lexical entries, and so on (these are simplified data for test and example purposes). Purple triangles are classes from our model and green labels are their implementation. In Figure 2, the aeronautics domain is linked to two points of view from two different users (Alexandre and Stéphane). Each point of view has lexical entries, semantic features (i.e., sèmes), written forms, etc. For instance, Boeing is the written form graphie 00002 representing the lexical entry lexie 00002. One of its meanings (a plane) is described by the features group groupe semes 00004, entitled /L’avion Boeing/ (the Boeing plane; using the name of the company for one of its planes is a common metonymy in French). This features group contains the two sèmes /vole/ (can fly) and /transporte/ (can carry). The meaning of Boeing as a company is not described in this graph.

¹ W3C RDF reference site: http://www.w3.org/RDF/

[Figure 1 shows the following classes: Query, Semantic Feature, Group Type, Features Group, Groups Relation, Lexical Entry, Written form, Viewpoint, User, Domain, Collection, Document, Occurrence, Actualization, Session, Session Boundary, Interaction, and the interaction classes Features Group Interaction, Feature Interaction, Query Interaction, Lexical Entry Interaction, Actualization Interaction, Domain Interaction and Relation Interaction. Most classes carry an "Archive : Boolean" attribute; other attributes are "Shared : Boolean" (Viewpoint), "Trace : Boolean" (Session), "Start : Boolean" (Session Boundary), "Position : Integer" (Occurrence), "Type : String" (Groups Relation) and "Creation date" (Interaction). Associations are labeled: To contain, To be composed of, To be part of, To characterize, To use, To create, To belong to, To describe, To compose, To represent, To interact with.]
Figure 1: PSR UML class diagram.
Some of these data are common to both points of view; others are specific to one of them. These test data illustrate the advanced form of the PSR, where users can share parts of their PSR. The first set of experiments² will not allow users to share their PSR: they will only be able to access their personal resources and some common resources extracted from a dictionary. Resource sharing will come in a second experimental phase³.

² Scheduled in September 2013 at Caen University, France, for online course access, and at Louis Liard high school, Falaise, France, for Web access.
³ Scheduled in March 2014 for cultural tourism in the Basse-Normandie region.

The prototype is being implemented as both a Java Web service and a front-end Web application. It is based on the open source RDF store OpenRDF Sesame⁴. In due time, it should be integrated into a larger Web framework including other NLP applications interacting with each other. First, we intend to provide a simple Web interface to experiment directly on the model. Later, as other NLP applications are integrated into our framework, we plan to develop a more complete Web gateway offering extended services.

⁴ OpenRDF Web site: http://www.openrdf.org/

At the time of writing, the model has been implemented and deployed in an OpenRDF Sesame repository, while the Java Web service and the Web interface are still being developed. The application prototype can be tested here: https://semcomp.info.unicaen.fr/.
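To make the graph representation more concrete, the Boeing example of this section can be written as subject-predicate-object triples and traversed with plain Python. The namespace, the predicate names, the underscores in identifiers and the sème identifiers are all assumptions: the paper does not give the actual RDF vocabulary, and a real deployment would query the triple store rather than a Python set.

```python
# Hypothetical namespace and predicates; not the project's actual vocabulary.
NS = "http://example.org/semcomp#"

triples = {
    (NS + "graphie_00002", NS + "label", "Boeing"),
    (NS + "graphie_00002", NS + "represents", NS + "lexie_00002"),
    (NS + "lexie_00002", NS + "hasFeaturesGroup", NS + "groupe_semes_00004"),
    (NS + "groupe_semes_00004", NS + "label", "/L'avion Boeing/"),
    (NS + "groupe_semes_00004", NS + "contains", NS + "seme_vole"),
    (NS + "groupe_semes_00004", NS + "contains", NS + "seme_transporte"),
    (NS + "seme_vole", NS + "label", "/vole/"),
    (NS + "seme_transporte", NS + "label", "/transporte/"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def semes_of_written_form(form):
    """Follow written form -> lexical entry -> features group -> sème labels."""
    labels = set()
    forms = {s for s, p, o in triples if p == NS + "label" and o == form}
    for written_form in forms:
        for lexical_entry in objects(written_form, NS + "represents"):
            for group in objects(lexical_entry, NS + "hasFeaturesGroup"):
                for seme in objects(group, NS + "contains"):
                    labels |= objects(seme, NS + "label")
    return labels


print(semes_of_written_form("Boeing"))
```

The traversal is a chain of joins on shared nodes, which is why a graph store fits the model more naturally than a set of relational tables.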
4 FUTURE WORKS
In this paper, we presented some aspects of the ongoing SemComp project. It aims at observing how users build their lexical and semantic knowledge while interacting with documents. For this purpose, we proposed a simplified model for Personalized Semantic Resources based on the linguistic approach to Componential Semantics. The main idea is to associate lexical entries with features called sèmes, which represent parts of their meaning or interpretation. We presented the current implementation of these Personalized Semantic Resources in a Web-oriented application using one of the latest Semantic Web tools at our disposal (RDF triple stores). We are currently developing the Web client interface to provide users with two main functionalities: defining their PSR, and building requests using sèmes to search for information in documents.

Figure 2: Part of the graph / PSR representation of the aeronautics domain for a pair of users.
In order to allow casual Web users to define their PSR, the underlying linguistic notions are hidden in this interface: the user is just asked to define her own tags (the sèmes) and to tag words she finds relevant for her task, the way she would tag relevant documents in a Web 2.0 application. The user can also build requests using sèmes only to search for specific information. A back-office module is dedicated to translating such requests into lists of written forms to query a search engine. A second back-office module validates and sorts the retrieved documents with regard to the user’s PSR and the initial sème request. We expect users to refine their PSR while discovering information relevant to their task in the retrieved documents, e.g. new lexical entries or new meaning features. In the psycho-linguistic experiments, the interactions with both the PSR and the documents will be tracked in order to observe the knowledge acquisition process.
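The first back-office module, which translates a sème request into written forms, can be sketched as follows. The PSR layout, the function names and the OR query syntax are assumptions for illustration; apart from Boeing, the entries are invented, and the actual module may select and combine written forms differently.

```python
def written_forms_for(seme_request, psr):
    """Written forms of every lexical entry carrying at least one requested sème."""
    forms = []
    for entry in psr:
        if entry["semes"] & seme_request:
            for form in entry["written_forms"]:
                if form not in forms:
                    forms.append(form)
    return forms


def search_engine_query(seme_request, psr):
    """Build an OR query string; this syntax is assumed, not tied to a real engine."""
    return " OR ".join(f'"{form}"' for form in written_forms_for(seme_request, psr))


# Toy PSR: only the Boeing entry comes from the paper's example.
psr = [
    {"written_forms": ["Boeing"], "semes": {"/vole/", "/transporte/"}},
    {"written_forms": ["helicopter", "chopper"], "semes": {"/vole/"}},
    {"written_forms": ["train"], "semes": {"/transporte/"}},
]
print(search_engine_query({"/vole/"}, psr))
```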
The first two experiments will involve students searching the Web for a class project or accessing their teacher’s online courses. In both cases, the users are expected to acquire knowledge. A long-term experiment is also planned, in which PSR would be integrated into a larger “cultural tourism” NLP application for Web users. It will help the user find which cultural events are taking place during her stay in a specific location. This latter experiment aims to test the model in a real application context, and not only for psycho-linguistic experiments. We intend to use the PSR to help the user describe her interests and to improve the matching between the events found and her interests.
KEOD2013-InternationalConferenceonKnowledgeEngineeringandOntologyDevelopment
168
REFERENCES
Beust, P., Ferrari, S., and Perlerin, V. (2003). NLP model and tools for detecting and interpreting metaphors in domain-specific corpora. In Archer, D., Rayson, P., Wilson, A., and McEnery, T., editors, Proceedings of the Corpus Linguistics 2003 conference, volume 16 of UCREL technical papers, pages 114–123, Lancaster, U.K.
Greimas, A. J. (1966). Sémantique structurale : recherche et méthode. Larousse.
Kanellos, I. and Mauceri, C. (2008). Une conscience interprétative face à un univers de textes. arguments en faveur d’une analyse de données interprétative. Syntaxe & sémantique, (9). Textes, documents numériques, corpus. Pour une science des textes instrumentée. Etudes publiées sous la direction de Mathieu Valette.
Pottier, B. (1992). Sémantique générale. Presses Universitaires de France.
Rastier, F. (1987). Sémantique interprétative. Presses Universitaires de France.
Roy, T. and Ferrari, S. (2008). User preferences for access to textual information: Model, tools and experiments. In Wallace, M., Angelides, M., and Mylonas, P., editors, Advances in Semantic Media Adaptation and Personalization, pages 285–306. Springer.
Valette, M. and Slodzian, M. (2008). Sémantique des textes et recherche d’information. Revue française de linguistique appliquée, XIII(1):119–133.
PersonalizedSemanticResources-TheSemCompProjectPresentationandPreliminaryWorks
169