ONTOLOGY-BASED RECOMMENDER SYSTEM
OF ECONOMIC ARTICLES
David Werner, Christophe Cruz and Christophe Nicolle
LE2I Laboratory, UMR CNRS 5158, BP 47870, 21078 Dijon Cedex, France
Keywords: Recommender Systems, Multi-ontologies, Information Extraction, OBIE, Ontology-based, Knowledge-
based.
Abstract: Decision makers need economical information to drive their decisions. The Company Actualis SARL is
specialized in the production and distribution of a press review about French regional economic actors. This
economic review represents for a client a prospecting tool on partners and competitors. To reduce the
overload of useless information, the company is moving towards a customized review for each customer.
Three issues appear to achieve this goal. First, how to identify the elements in the text in order to extract
objects that match with the recommendation's criteria presented? Second, How to define the structure of
these objects, relationships and articles in order to provide a source of knowledge usable by the extraction
process to produce new knowledge from articles? The latter issue is the feedback on customer experience to
identify the quality of distributed information in real-time and to improve the relevance of the
recommendations. This paper presents a new type of recommendation based on the semantic description of
both articles and user profile.
1 INTRODUCTION
The decision-making process in the economic field
requires the centralization and the consummation of
a large amount of information. This aims at keeping
abreast with current market trends. The Company
Actualis SARL is specialized in the production and
distribution of press reviews about French regional
economic actors. This economic review represents
for a client a prospecting tool on partners and
competitors. The reviews sent are the same for each
customer, which does not necessarily correspond to
its needs. From the result an opinion surveys on
clients and the knowledge on company's business
from the company Actualis SARL, criteria for
relevant review customization were identified. These
criteria are economic events, economic sectors,
major transverse projects, temporal and localization
data about each element underlined. To reduce the
overload of useless information, the company is
moving towards a customized review for each
customer. To achieve these goals, a recommender
system is being developed (e.g. fig 1.). This system
is regularly supplied with articles by the company
librarians. It produces a magazine per customer
composed of a subset of daily produced articles
according to the client’s profile. This system is
composed by a couple of layers. The first, the
Intelligence Layer aim to manage information
extraction tasks, it contains several mechanisms. The
second, Semantic Layer is composed of ontologies.
This Layer allows to manage general or field
specialized knowledge, to model profiles and articles
representation. However, three issues appear to
achieve this goal. The first challenge is to identify
the elements in the text in order to extract objects
that match with the recommendation's criteria
presented previously. Additionally, the links
between these objects have to be also extracted from
the articles, because they represent valuable
information. The second issue lies in the definition
of the structure of these objects, relationships and
articles in order to provide a source of knowledge
usable by the extraction process to produce new
knowledge from articles. This recommended system
permits the economic watch on potential clients and
eventually to send appropriate alerts to customers
about important and new information or knowldege.
The latter issue is the feedback on customer
experience to identify the quality of distributed
information in real-time and to improve the
relevance of the recommendations. This is
materialized by the evolution of the customer profile
725
Werner D., Cruz C. and Nicolle C..
ONTOLOGY-BASED RECOMMENDER SYSTEM OF ECONOMIC ARTICLES.
DOI: 10.5220/0003933307250728
In Proceedings of the 8th International Conference on Web Information Systems and Technologies (WEBIST-2012), pages 725-728
ISBN: 978-989-8565-08-2
Copyright
c
2012 SCITEPRESS (Science and Technology Publications, Lda.)
in real time through the review reading, article by
article.
The ontology plays a center role because it is
used to model the knowledge of the domain, and to
drive the extraction process and the recommendation
process.
The following section presents an overview on
knowledge extraction, which is related to the first
issue. Section 3 presents the recommender systems
which allows us to identify the relevant architecture
of the recommended system for the articles and
knowledge recommendation. Section 4 focuses on
the architecture of the recommendation system
which is set up especially on the four ontologies.
These ontologies will be used by all business
processes, which are the knowledge extraction
process, the article indexing process, the article
annotation process, the recommendation process.
2 KNOWLEDGE EXTRACTION
In this section, we present an overview of
information extraction systems based on ontologies,
and we focus on their main functions and
architecture. The information Extraction (IE) is
intended to extract specific data elements as entities,
relationships or events from a set of textual records.
The approach generally used is based on the use of
rules, patterns applied to texts by transducers or
finite state automata. The presentation of
information retrieval architecture is available in
(Daya C. Wimalasuriya, 2009). KIM (B. Popov and
all, 2003) can be seen as an OBIE (Ontology-Based
Information Extraction) system, it used the KIMO
(www.ontotext.com/) ontology the predecessor of
PROTON (proton.semanticweb.org) to manage the
necessary knowledge for the annotation task. During
text analyses, patterns and gazetteers based approach
is used to extract information, like organizations
names, persons or dates. New information extracted,
detected by patterns are used to populate the
ontology. Information extracted during previous text
analyses are used to perform future text analyses.
Texts and ontologies can be both seen as sources of
knowledge, and they are complementary, because
each one providing information to the other.
On the one hand, the text is a source of
information to ontologies. The first case consists in
reaching the automation of the ontology creation
regarding a domain defined by a set of documents.
Therefore, treatment aims to detect in the text, key
concepts and their properties and relationships. This
task is called ontology learning. The second case
Figure 1: The architecture of our recommender system.
focuses on the automation of the individual
detection, properties and relationships in the
documents in order to supply the knowledge base.
This task is called ontology population.
On the other hand, where ontologies are used to
provide information to the texts, the idea is to
describe the knowledge contained in texts with
known information contained the knowledge bases.
Groups of words in texts can be labeled according to
the ontology to highlight instances of concepts or
relationships. Therefore, ontologies seem to be the
fundamental tool to meet our needs of indexing and
annotating of the economic articles. In addition,
OBIE is a tool able to index, which is a second
important tool for our purpose. Moreover, by its
nature, the ontology contains the relevant criteria of
recommendation. Consequently, after the indexing
and annotating tasks executed over articles, the
ontology provides a good support for our
recommender system. The following section
presents a brief state of the art of recommender
systems focuses on the needs of our recommender
system.
3 RECOMMENDER SYSTEMS
Recommender systems (RS) are tools and pieces of
software, which aims are to provide suggested items
to users (Burk, R., 2007). The simplest task of RS is
the ranking of items (e.g. Books, CDs, Travels, and
so on); it tries to predict the importance each item
for the user to ordering each other. The computing
task can be based on explicit user's preferences, (e.g.
the user give a score to product), or implicit,
preferences are inferred by the system, depending on
user comportment (e.g. choices of browsing on an
online shopping site).
The increasing need of right information in a
WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies
726
right time requires the development of recommender
systems to drive users. An overview and taxonomy
of the different kinds of systems can be found in
(Burk, R., 2007), both main approaches are
distinguished, content-based and collaborating
filtering based.
Content-based approach: This kind of RS tries
to recommend similar items of already known
relevant items, or compare items and users profiles.
All items and profiles are characterized by different
kind of attributes. I.e. if a user evaluates positively a
thriller book, then the system trends to recommend
other thriller books.
Collaborating filtering based approach: This
Kind of RS tries to predict automatically the interest
of a user with the help of taste information from
many other users. An implantation of this approach
(Schafer, Frankowski, Herlocker, Sen 2007) consist
in recommend items appreciated by users with close
profiles. This similarity of profiles is computed in
relation to previous item's grade by each profile. I.e.
if close users (users with profiles close to user
profile) mostly enjoy « le discours de la methode »,
then the system trends to recommend this book to
users.
The approach developed here for the
recommendation is based on the information, and
the knowledge extracted from each economical
article. Each article contains a set of information
pieces, and each of these pieces is used like an
attribute characterizing the articles. The set of
properties and the set of known possible values of
each allow the definition of a structured
representation of each article. In most content-based
filtering systems, items are described by textual
properties, by words. Some properties of natural
language create trouble in the use of the word in
matching task, like polysemy (matching profile with
no really relevant items) or synonymy (no matching
profile with yet relevant items). In order to remedy,
semantic-based techniques were developed.
Keyword vectors: Most of RS use simple
methods like key words or vector space models
(VSM), with or without moderation process like TF-
IDF (Salton, 1989). VSM is spatial representations
of documents. Each text is represented by an n-
dimensioned vector; n is the number of words
selected by generally statistical processes to
representing the document. The similarity between
tow vectors (item’s vector/item’s vector or item’s
vector/profile’s vector) can be measured with the
computation of cosine similarity.
Semantics: There are lots of strategies to
introduce semantics in the task of recommendation.
These strategies are generally based on ontologies.
Different kinds of ontologies are used, like Wordnet
(A lots of different words can be used to referring
the same concept in natural language) or like domain
specifics ontologies (to create a structured and
controlled representation of domain, which can be
used to describe items).
Vector realizes the qualification of articles and
profiles. A set of criteria corresponding to
information extracted from articles forms the vector
(i.e. locations, economic events, transversal projects
and economic sectors).
Figure 2: The architecture overview.
Moreover, the extraction of relevant knowledge
extends the use of the knowledge base where the
knowledge drives the recommendation. Thus, not
only articles are recommended but also the
knowledge contains in a set of articles. From this
knowledge, articles can be combined to present a
global view of a specific domain.
4 THE ECONOMIC
RECOMMENDER SYSTEM
The section deals with the component of the
architecture and the aims of these components.
4.1 The Architecture
Figure 2 depicts the architecture of processes of our
recommender system. Annotation and indexing
processes bind articles and knowledge using the
OBIE system. These processes make possible the
creation and the management of the semantic
descriptions of articles. Next, profiling process
makes the semantic representation of user’s profiles
with vectors of concepts. These vectors link users
and knowledge base.
The user profile can be determined in various
ways, including active and passive feedback,
ONTOLOGY-BASEDRECOMMENDERSYSTEMOFECONOMICARTICLES
727
allowing us to know what information is relevant to
each user.
The last process, generate a customized review
for each user made of articles according to his
profile.
Profiles and knowledge extracted from articles
are described using an ontology. Using this ontology
articles can be recommended.
4.2 The Knowledge Representation
Description logics (DLs) are a family of logics that
are decidable fragments of first-order logic with
attractive and well-understood computational
properties. DLs have been in use for over two
decades to formalize knowledge and notably quality
ontologies. Ontology languages like OWL DL and
OWL Lite semantics are based on DLs (Horrocks,
2009). For example, OWL DL corresponds to the
SHOIN (D) description logic, while OWL 2
corresponds to the SROIQ(D) logic (Hitzler and al,
2009). Our work deals with OWL DL ontologies so
we chose the SHOIN(D) expressivity level to
formalize ontology inconsistency. In DL, a
distinction is drawn between the so-called TBox
(terminological box) and the ABox (assertional box)
(Gruber, 1993). In general, the TBox contains
sentences describing concept hierarchies (i.e.,
relations between concepts) while the ABox
contains ground sentences stating where in the
hierarchy individuals belong (i.e., relations between
individuals and concepts). In OWL DL ontologies,
TBox corresponds to the intension and ABox to the
extension. Ontologies are knowledge representation,
a description understandable bye the machine. The
indexing task based on an ontology allow the
definition of the knowledge structure which limits
the ambiguities inherent in the use of simple words.
Ontology is a representation of a context, which
permits a formal interpretation of the information
contained herein. Our knowledge base consists of
four ontologies: The upper-level ontology, the
domain ontology, the lexical resource ontology and
the corpus ontology. The first two ontologies intend
to distinguish the knowledge specific to an
application domain (domain ontology) from those
which transcend all areas (upper-level ontology).
The Lexical resource ontology is inspired by
PROTONS. It is used in the management of objects
required by NLP tools. These tools are used to
perform the information extraction task. The corpus
ontology manages the items to be indexed. In our
case these are articles.
This model aims to make the system less
dependent on a given area. It allows us to change the
domain ontology in order to move from one area to
another. The ability to switch the domain ontology
with another one makes our system flexible.
5 CONCLUSIONS
This work presents a new approach for
recommender systems based on a set of four
ontologies. This generic proposal has been applied
in the field of economic reviews. The system built
aims at providing to company's customers a set of
economic articles, which contain information
relevant to their business needs.
In the work presented, the bias is to propose the
recommendations based on the knowledge included
in the articles. Information extraction systems were
presented including those based on ontologies. They
allow both to evolve the index (populating the
knowledge base) and to index articles.
ACKNOWLEDGEMENTS
This project if founded by the company Actualis
SARL and the financing CIFRE research grant from
the French agency ANRT.
REFERENCES
Burke, R., 2007, Hybrid web recommender systems, The
AdaptiveWeb, 377-408.
Grüber, T. R., 1993. A translation approach to portable
ontology specification. Knowledge acquisition
5(2):199-220.
Hitzler, P., Krötzsch, M., Rudolph, S., 2009. Foundations
of Semantic Web Technologies. CRCPress. ISBN
142009050X.
Horrocks, I., Patel-Schneider, P., F., 2009. Reducing OWL
Entailment to Description Logic Satisfiability
Popov, B., Kiryakov, A., Kirilov, A., Manov, D.,
Ognyanoff, D. Goranov, M., 2003, KIM – semantic
annotation platform. In Proceedings of the 2nd
International Semantic Web Conference, (Springer-
Verlag, Berlin, 2003).
Salton, G., 1989, Automatic Text Processing, Addison-
Wesley.
Schafer, J., Frankowski, D., Herlocker, J., & Sen, S. 2007,
Collaborative filtering recommender systems. The
AdaptiveWeb, 291-324.
Wimalasuriya, D. C., Dou. D., 2010, Ontology-based
information extraction: An introduction and a surevey
of current approaches. In Journal of Information
Science, vol 36, no.3 pp. 306-323.
WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies
728