A Method to Manage the Difference of Precision between Profiles and

Items for Recommender System

Applied Upon a News Recommender System using SVM Approach

David Werner and Christophe Cruz

LE2I Laboratory, UMR CNRS 6306, Université de Bourgogne, BP 47870, 21078 Dijon Cedex, France

Keywords: Recommender System, News, Domain Ontology, Ontologies, Knowledge Base, Indexing,

Recommendation, Vector Space Model.

Abstract: Contractors, commercial and business decision-makers need economical information to drive their

decisions. The production and distribution of a press review about French regional economic actors

represents a prospecting tool on partners and competitors for the businessman. Our goal is to propose a

customized review for each user, thus reducing the overload of useless information. Some systems for

recommending news items already exist. The usefulness of external knowledge to improve the process has

already been explained in information retrieval. The system’s knowledge base includes the domain

knowledge used during the recommendation process. Our recommender system architecture is standard, but

during the indexing task, the representations of content of each article and interests of users’ profiles created

are based on this domain knowledge. Articles and Profiles are semantically defined in the Knowledge base

via concepts, instances and relations. This paper deals with the similarity measure, a critical subtask in

recommendation systems. The Vector Space Model is a well-known model used for relevance ranking. The

problematic exposed here is the utilization of the standard VSM method with our indexing method.

1 INTRODUCTION

The decision-making process in the economic field

requires the centralization and intake of a large

amount of information. The aim is to keep abreast

with current market trends. Thus, contractors,

business men and sales persons need to continuously

be aware of the market conditions. This means to be

up-to-date regarding ongoing information and

projects undergoing development. With the help of

economic monitoring, prospects can be easily

identified, so as to establish new contracts. Our tool

is specialized in the production and distribution of

press reviews about French regional economic

actors.

The reviews sent are the same for each user, but

personalized according to a geographic area. All

articles sent do not necessarily correspond to a

person’s needs, and can be a waste of time. To

reduce the overload of useless information, we are

moving towards a customized review for each user.

Therefore, an opinion survey on magazine readers

that covers a broad array of subjects, including news

services, was undertaken. Criteria for a relevant

customization of the review were identified as a

result of this survey as well as expert domain

knowledge. These criteria are economic themes (i.e.

main economic events), economic sectors, major

transverse projects, temporal and localization data

about each element underlined. Therefore, the

complete production process of the review was

redesigned to produce and to automatically

distribute a customized review for each user.

Another drawback in the existing process is the

produced information storage. News articles are

stored as PDF file reviews (i.e. the same format that

is sent to customers, natural language), but this

unstructured format is hard to handle and reuse.

The aim of the architecture is to manage all news

produced, and provide the most relevant article for

each customer, using our domain knowledge. The

overload of news information is a particular case of

information overload, which is a well-known

problem, studied by Information Retrieval and

Recommender Systems research fields. News

recommender systems already exist (Middleton et al.

2004), (Getahun et al. 2009), (Liu et al. 2010),

465

Werner D. and Cruz C..

A Method to Manage the Difference of Precision between Proﬁles and Items for Recommender System - Applied Upon a News Recommender System

using SVM Approach.

DOI: 10.5220/0004373404650470

In Proceedings of the 9th International Conference on Web Information Systems and Technologies (WEBIST-2013), pages 465-470

ISBN: 978-989-8565-54-9

 2013 SCITEPRESS (Science and Technology Publications, Lda.)

(Tanev et al. 2008) SCENE (Li et al. 2011)

NewsJunkie (Gabrilovich et al. 2004) Athena

(Ijntema et al. 2010) GroupLens (Resnick et al.

1994) News Dude (Billsus, Pazzani. 1999) et

YourNews (Brusilosky et al. 2007). Some of these

systems use domain knowledge to improve the

recommendation task (Ijntema et al. 2010),

(Middleton et al. 2004).

To achieve this goal, a content-based

recommender system is being developed. A

recommender system is necessary for the item

ranking. And content-base is required to analyze the

content of each article to structure and preserve

information content. The results of the analysis

enable to link the domain knowledge to the articles.

Because the domain knowledge can be reused to

improve the recommendation task (Ijntema et al.

2010), (Middleton et al. 2004).

This paper is organized as follows: Section 2

presents a brief review of related work. In section 3,

we outline the proposed solution of a recommender

system. Section 4 presents the similarity measure

task, and our implementation of VMS. Section 5 is

the conclusion and future work.

2 RELATED WORK

Our work is related to several works in News

recommender systems, SCENE (Li et al. 2011)

NewsJunkie (Gabrilovich et al. 2004) Athena

(Ijntema et al. 2010) News Dude (Billsus, Pazzani.

1999) et YourNews (Brusilosky et al. 2007). The

survey of K. Nagewara Rao (Nageswara, Talwar,

2008) proposed a general comparison of the main

advantages and drawbacks of each kind of

Recommender System (e.g. content based or

collaborative filtering). The advantages of content-

based recommender systems for news

recommendation are also explained in (Liu et al.

2010) to improve the Google news platform. The

main drawback of collaborative filtering systems is

the recommendation of new items. Novelty is very

important in the particular case of news; each new

article must be quickly recommended. Waiting for

enough users to have read it before recommending is

a big waste of time. Furthermore, we need to be able

to recommend for very particular user profiles, as

some customer needs are unique.

There are many systems that work without

Knowledge (Liu et al. 2010), (Billsus, Pazzani.

1999), (Resnick et al. 1994). The advantages of

using exterior knowledge for enhancing the

recommendation were exposed by Ijntema (Ijntema

et al. 2010). He uses the name semantic-based

recommender system to distinguish standard

content-based systems from the systems using

external knowledge (e.g. domain ontologies or

lexical knowledge as WordNet (Fellbaum 1998)).

Lexical knowledge is used by (Getahun et al. 2009)

and domain knowledge by (Middleton et al. 2004).

Athena uses both (Ijntema et al. 2010). Ontologies

used by these systems already exist or are created by

hand, and maintained. Unlike previous systems, the

Knowledge base containing domain Knowledge is

used as an index for articles and profiles, as it is

explained in section 3. To compare profiles and

articles, classic VSM is not directly usable, so we

have adapted it, as presented in section 4.

3 IMPLEMENTATION

Our system is an ontology drive content based

recommender system. An ontology schema is

created and populated with the help of company

experts, in order to model the domain knowledge in

a knowledge base. In a classic content-based

recommender system, we distinguish two main

tasks. The first is indexing. The task is to create a

representation of the users’ needs, and item content.

The Knowledge base will be populated during this

task. The quality of content analysis is important for

the knowledge base population and for indexing, so

our system is semi-supervised by an expert. The

second task is comparison. This task is the

comparison with item representation so as to

measure the degree of relevance for each profile.

Items are ranked with the help of the similarity

measure, after being provided to the user. These

subjects are developed as follows.

3.1 The Knowledge Base

The knowledge base  used for this system is

composed of several ontologies (Werner et al. 2012).

An Upper level ontology is used to manage

information shared by all application areas (in the

case of an extension of the system to new fields of

application). High level concepts like location,

geospatial information, temporality, events, agents,

etc. Domain ontology is used to manage domain-

specific knowledge. Concepts of this ontology are

mainly specialization of concepts from the upper

level. Other ontologies are used to manage articles,

profiles, and lexical resources used by a gazetteer.

The lexical resource ontology is based on the

ontology PROTON used on the KIM platform

WEBIST2013-9thInternationalConferenceonWebInformationSystemsandTechnologies

466

(Popov et al. 2003). In this paper the knowledge

base model defined by Ehrig et al. (Ehrig et al.

2004) and based on the Karlsruhe Ontology

(SOURCE) Model is used.

Definition 1. Ontology:

Ο,,



,



, , 



,



,



,





Wherein,,,are disjoint sets of concepts,

data types, relations and

attributes,



,



,



,



are hierarchy of classes,

data types, relations and attributes, and 



,



are

function that provide the signature for each 



: →

   attribute and 



: → relation.

Definition 2. Knowledge Base:





,



,



,



,,,



,



,



,





Wherein 



,



,



,



,, are disjoint sets

of concepts, data types, relation attributes, instances

and data values.



is the classes instantiation

function 



:



→2



. 



is the data type

instantiation function 



:



→2



. 



is the relation

instantiation function 



:



→2



. 



is the

attribute instantiation function 



:



→2



3.2 Indexing

To archive the recommendation of articles to

customers, the system needs are a representation of

the content of each article, and representation of the

needs of each customer. The index used in our

system is the same for articles and profiles. The

knowledge base used has an index. Articles and

profiles are represented by instances in our

knowledge base. Some relations in the system

ontologies are used to model of article content, and

users’ interests.

3.2.1 Article Indexing

The ambition is to create a machine understandable

representation of the content of each article, so as to

compare with profiles. The unstructured information

contained in articles is analyzed. Two kinds of

information can be distinguished: explicit pieces of

information (e.g. places, persons, organizations and

so on) and implicit pieces of information (e.g. the

theme of each article). In the system, the theme is

one criterion, corresponding to the main event

related by the article. Company experts have

predefined a hierarchical list of themes wherein each

article must be classified. The tasks of information

extraction, annotation, indexing are done with the

help of the GATE platform (Cunningham 2002). A

web interface was developed. It enables the writers

to create articles. Results of the analysis are

presented in it. They can be validated/corrected by

the writer.

Analysis

Some post processes are applied into articles, such

as tokenizers, sentence splitters, POS taggers,

gazetteers (which use the knowledge base lexical

resources as a dictionary) before JAPE patterns

matching engine. In the first prototype, we used

handmade lexico-semantic patterns. The aim is to

extract important entities (explicit pieces of

information like Persons, Organizations, places and

dates). Results of analysis are hand-checked,

corrected and validated. Implicit pieces of

information are specifically handmade.

Population

For each article analyzed, an instance of the concept

article is created in the knowledge base, so as to

represent the article. Automatic analysis, correction

done by hand and specifications of not automatically

funded criteria are used to characterize the content

of each article. Relations are created in the

knowledge base between the article’s instance, and

criterion’ instances (result of the analysis). Instances

of criteria and relations with the instances of articles

permit to index the article and create a semantic

representation understandable by the machine.

3.2.2 Profiles Indexing

In the company, sellers are in charge of

understanding the needs of each customer. Several

phone calls are necessary so as to acquire future

customers. During the phone call the seller proposes

a free trial period. This helps to create a first

handmade profile for each customer by an expert,

and avoids the problem of a cold start, common to

all content-based recommender systems.

The profile indexing process is the same as the

articles. A profile instance is created in the

knowledge base. Relations are created between the

profile instance and criteria instances. A web

interface was developed to enable sellers to define /

change the user profile. The choices are reflected in

the Knowledge base that permits to index the profile,

and create a semantic representation understandable

by the machine.

3.3 Recommendation

The recommendation task is mainly based on the

AMethodtoManagetheDifferenceofPrecisionbetweenProfilesandItemsforRecommenderSystem-AppliedUpona

NewsRecommenderSystemusingSVMApproach

467

comparison between the profile and available items.

The knowledge base is used by the system like an

index, and profiles and articles are presented by a set

of instances and relations. For each article validated

by writers, the full content is inserted in the database

and a representation of the article is created in the

knowledge base. An instance of the concept article

and an instance of each relation “isAbout” is created

to link the article with its criteria. For each profile

made by the sellers, the representation is created in

the knowledge base. An instance of the concept

profile is created, and each “isInterestedIn” relation

between the profile instance and its criteria are

instanced. We present the comparison method used

in the following section.

4 COMPARISON

The comparison task enables to evaluate the

pertinence of an article for a profile, via a similarity

measure between them.

Definition 3 similarity: SIM



x, y



:II→0,1 is a

function to measure the degree of similarity between

x and . The similarity function can satisfy some

properties:

∀x, y ∈ ISIM



x, y



0Positiveness

∀x, y ∈ ISIM



x, x



0 Reflexivity

The last one is the symmetry,∀x, y ∈

ISIM



x, y



SIM



y, x



but in our context we want

an asymmetric function, because we consider that

comparing profiles and articles is not the same as

comparing two articles.

4.1 VSM

An approach based on the vector space model

(Salton 1970) was used in the prototype. Articles

and profiles are represented by vectors on a space

wherein each dimension is a potential instance of

criteria. Several methods can be used to compare

vectors; the most common is the cosine similarity.

SIM







,





  cos  





.













(1)

An article can be defined like a vector of

instances of Entities and criteria. For the

recommendation task in the prototype only instances

of criteria are used. 



,



,…





Wherein, an article is represented by a vector 

composed by a set of instances



. 



∈



Instances 



are instances of concepts belonging

to the set of concepts C



defined by indexing criteria.

The set 



contains all instances of all concepts in the

setC



. 



⊆

The set 



is a subset of the set of all instances 

of the Knowledge Base.

A profile can be defined as a vector of instances

of criteria. 



,



,…





Where, a profile is represented by a vector

composed of a set of instances



In our implementation, one vector for each

criterion is used. This enables to weigh or use

different similarity measures (e.g. cosine, jacquard,

Euclidian) for each criterion. For example, location,

theme and sectors are much more important than

project and organization and so highly weighed.

SIMF







,







∑





SIM













,









∑





(2)

SIM













,









Is the similarity between profile







and article 







and 



the coefficient for a specific

criterion. ∀

,

∈















,

,

,

,…

,

 And











,

,

,

,…

,

. One or more concepts are

defined for each criterion.

Methods from the information retrieval fields can

be used to enhance the recommendation. One of the

first systems using external knowledge to improve

the understanding of user needs is (Voorhees 1994).

The Voorhees approach used WordNet (Fellbaum

1998) to provide a query expansion. We can

translate this kind of method to recommender

systems, instead of query expansion; we can name

this method ‘profile expansion’. Middleton

(Middleton et al. 2004) uses this method without

naming it. Ijntema (Ijntema et al. 2010) also uses it,

but unlike Middleton, he uses more powerful

ontology relations (not just is_a) to expand the user

profile. In our system, the profile expansion takes

the form of adding instances, e.g. if the user’ U

profile shows an interest for company Co and in the

knowledge base a symmetric relation like

is_aSubsidiaryOf is instanced with another company

Co’, it is possible to add the other company to his

profile. The Expanded VSM is developed in the

following section.

4.2 Expanded Vectors

The 4.1 section presents an implementation of the

similarity measure between two instances, by the

creation of vectors of related instances. It was

explained by Voorhees (Voorhees 1994) in the VSM

that all dimensions are orthogonal and so, all

elements of each vector are considered as

independent. That is not really the case for the

lexical used by Voorhees, and instances used in our

WEBIST2013-9thInternationalConferenceonWebInformationSystemsandTechnologies

468

system. In Voorhees’s method, the metronomic and

synonymic relations defined in WorldNet are used to

add related lexical to the vector. In our case,

relations between individuals exist, because we used

a Knowledge base to manage the domain

knowledge, and our criteria are hierarchically

defined in it.

Meronomic relations exist between instances of

Location. With Voorhees’s method, it seems logical

to expand the profile (the query for Voorhees) with

all sub instances enclosed by the most general

instance. For example, if the profile is interested in

Bourgogne (Region, biggest Administrative

division), it seems logical to add Cote d’Or and

Yonne (Departement, sub administration division of

Region) which are two departments within the

Bourgogne region, and Dijon, Beaune, Chenove,

Auxerre, which are cities within these departments.

But, in the real case with this method, it is necessary

to add four departments and its 2047 cities to the

vector. The similarity between a profile interested in

Bourgogne and an article about Dijon will be very

low with this method.

So our method is to expand profile and articles,

not only the profile and limiting the size of vector,

instances added are including instances, not included

(by meronomic relations). So our method is

analogue to graph-based methods which search the

common ancestor. For example, if the profile is

interested by Dijon, Cote d’Or and Bourgogne, these

are added to the vector. But if the profile is

interested just by Bourgogne, nothing is added.

With this method the first drawback is solved,

synonymic and meronomic relations between

instances are managed, but the second still is not.

Our similarity function is symmetric, but we want to

compare a profile and an article to different things.

The precision of an article must not have the same

consequences as the precision of a profile. This

problem is the subject of the following section.

4.3 Expanded Vectors

The previous section looked at how to take into

account the relations between instances in the VSM.

This section is about the managing of the difference

of precision between profiles and articles. To solve

this problem we used an intermediate vector for each

criterion, a sub vector 







composed of common

instances of the article 







and profile 







vectors for

the criteria.

Definition 4. Precision: In the hierarchy of concepts,

more generals concepts enclosed more specifics one.

In the hierarchy of instances it is the same. Instances

Figure 1: Examples of profiles and articles for one

criterion.

from the top of the hierarchy are less specific than

instances from the bottom.

If the Article is about an instance form the

bottom and the Profile is interested in top instance

(of the same branch), the similarity must be higher

than if the Article is about a top instance and the

Profile is inserted in an instance from the bottom of

the hierarchy because there is a loss of precision if

the profile is more specific than the article, so the

article is less relevant. 





,



∩

,







is the sub set of common elements of the set

of instances related to the profile 

,



and the article



,



. ∀

,

∈S













,

,

,

,…

,



The vector 







is composed by elements of the set





.













,











,















,











,















,











,



,

(3)

It is possible to weigh differently the precision of

the profile and the precision of the article with this

method. In our implementation we used ′

,

1

and ′

,

4, because we consider that the

difference of precision of the profile mustn’t

influence the final note over 20%. However, it is

possible to change the value, and it is also possible

to use different values according to the criterion.

SIMF







,







∑



















,









∑





(4)

The final similaritySIMF



,



value is the sum

of the similarity measure for each criterion. This

measure is used for the ranking of articles proposed

to the user according to his profile.

With the method the similarity value for the case

A.2 (figure 1.) is higher than the case A.1, because

in the case A.2, a and p have more common

ancestors than in the case A.2. Moreover, the cases

B.1 and B.2 figure the problem of precision. With

our asymmetric method the value of similarity

AMethodtoManagetheDifferenceofPrecisionbetweenProfilesandItemsforRecommenderSystem-AppliedUpona

NewsRecommenderSystemusingSVMApproach

469

between a and p in the case B.1 is higher than in the

case B.2 because the user needs are specific and the

article information (about this criterion) very general

relative to the user needs. So the article is less

relevant for the user. The following section presents

the conclusion and future work.

5 CONCLUSIONS

In this paper we have presented the adaptation of a

standard VSM recommender system to our specific

method of indexing (e.g. articles and profiles are

semantically defined in the knowledge base via

relations with the domain knowledge already

defined in it). We first presented the context, our

goal, and the existing approaches, then we explained

the architecture in detail. Finally we explained the

specific task of comparison that we adapted to our

case.

ACKNOWLEDGEMENTS

This project is founded by the company Actualis

SARL and the financing CIFRE research grant from

the French agency ANRT.

REFERENCES

Billsus, D., Pazzani, M.J., 1999. A Personal News Agent

that Talks, Learns and Explains. In: The Third Annual

Conference on Autonomous Agents, ACM, pp. 268–

275.

Ahn, J., Brusilovsky, P., Grady, J., He, D., Syn, S.Y.,

2007. Open User Profiles for Adaptive News Systems:

Help or Harm? In: 16th international conference on

World Wide Web, ACM, pp. 11–20

Cunningham, H., 2002 GATE, A General Architecture for

Text Engineering. Computers and the Humanities 36

pp. 223–254

Ehrig, M., Haase, P., Stojanovic, N., Hefke, M., 2005

Similarity for Ontologies A Comprehensive

Framework. ECIS. Regensburg, Germany.

Fellbaum, C., ed., 1998. WordNet: An Electronic Lexical

Database. MIT Press, Cambridge, MA

Gabrilovich, E., Dumais, S., Horvitz, E., 2004.

Newsjunkie: providing personalized newsfeeds via

analysis of information novelty. In WWW, pp. 482–

490.

Getahun, F., Tekli, J., Richard, C., Viviani, M.,

Yetongnon, K., 2009. Relating RSS News/Items. In:

9th International Conference on Web Engineering,

Springer, pp. 442–452

IJntema, W., Goossen, F., Frasincar, F., Hogenboom, F.,

2010 Ontology-Based News Recommendation. In:

International Workshop on Business intelligence and

the Web, BEWEB, EDBT/ICDT Workshops. pp. 16:1

16:6. ACM, New York, USA

Li, L., Wang, D., Li, T., Knox, D., Padmanabhan, B.,

2011. SCENE: A scalable two-stage personalized

news recommendation system. In Proc. the 34th

Annual International ACM SIGIR Conference on

Research and Development in Information Retrieval,

Beijing, China, Jul. 25-29, 2011, pp.124-134.

Liu, J., Dolan, P., Pedersen, E.R., 2010. Personalized news

recommendation based on click behavior. In

Proceedings of the 15th International Conference on

Intelligent User Interfaces, pp. 31–40. ACM.

Middleton, S.E., Shadbolt, N.R., Roure, D.C.D., 2004.

Ontological User Profiling in Recommender Systems.

ACM Transactions on Information Systems 22, pp.

54–88

Nageswara Rao, K., and Talwar, V.G., 2008. Application

Domain and Functional Classification of

Recommender Systems A Survey, Journal of Library

& Information Technology, Vol. 28, No. 3: 17-35.

Piskorski, J., Tanev, H., Wennerberg, P.O., 2007.

Extracting Violent Events From OnLine News for

Ontology Population. In: 10th International

Conference on Business Information Systems, BIS.

Lecture Notes in Computer Science, vol. 4439, pp.

287-300. Springer-Verlag Berlin Heidelberg.

Popov, B., Kiryakov, A., Kirilov, A., Manov, D.,

Ognyanoff, D., Goranov, M., 2003. KIM Semantic

Annotation Platform.

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P.,

Riedl, J., 1994. GroupLens: An open architecture for

collaborative filtering of netnews. In Proceedings of

the 1994 ACM Conference on Computer Supported

Cooperative Work, Chapel Hill, North Carolina,

United States. ACM Press, New York, pp. 175-186.

Salton, G., 1970. The SMART retrieval system :

Experiments in automatic document processing.

Prentice Hall

Stumme, G., Ehrig, M., Handschuh, S., Hotho, A.,

Maedche, A., Motik, B., Oberle, D., Schmitz, C.,

Staab, S., Stojanovic, L., Stojanovic, N., Studer, R.,

Sure, Y., Volz, R., Zacharias, V., 2003. The Karlsruhe

view on ontologies. Technical report, University of

Karlsruhe, Institute AIFB.

Tanev, H., Piskorski, J., Atkinson, M., 2008. Real-Time

News Event Extraction for Global Crisis Monitoring.

In: 13th International Conference on Natural

Language and Information Systems: Applications of

Natural Language to Information Systems, NLDB.

Lecture Notes in Computer Science, vol. 5039, pp.

207-218. Springer-Verlag Berlin Heidelberg

Voorhees, E., 1994. Query Expansion using Lexical-

Semantic Relations.

Werner, D., Cruz, C., Nicolle, C., 2012. Ontology-based

Recommender system of economic articles; In

Proceedings of the 2012 Webist Conference, Porto,

Portugal.

WEBIST2013-9thInternationalConferenceonWebInformationSystemsandTechnologies

470