EVOLUTIVE CONTENT-BASED SEARCH SYSTEM

Semantic Search System based on Case-based-Reasoning

and Ontology Enrichment

Manel Elloumi-Chaabene, Nesrine Ben Mustapha, Hajer Baazaoui-Zghal

Riadi Lab., ENSI Campus Universitaire de la Manouba, 2010 Tunis, Tunisie

Antonio Moreno, David Sánchez

ITAKA Research Group, Departament d’Enginyeria Informàtica i Matemàtiques

Universitat Rovira i Virgili, Av. Països Catalans, 26. 43007, Tarragona, Spain

Keywords: Information Retrieval, Semantic search, Ontology, Case-based-Reasoning.

Abstract: The exponential growth of the data available on the Web requires more efficient search tools to find relevant information. In the context of the Semantic Web, ontologies improve the exploitation of Web resources by adding consensual knowledge. However, automating ontology building is a hard problem, and exploiting the users’ search feedback can help to address it. To apply this solution, we present a semantic search system based on case-based reasoning (CBR) and ontologies that automatically enriches the ontologies by using previous queries. In this paper we describe the contribution of CBR and ontologies to Semantic Web search. Some experiments on the system are presented, and the obtained results show an improvement in the precision of the Web search and in ontology enrichment.

1 INTRODUCTION

Because of the enormous size of the textual information available on the Web, traditional search engines do not always provide the most appropriate and relevant results. The problem of Information Retrieval (IR) consists in finding the right information corresponding to the user query. However, the information available on a Web page is not always well structured, which makes it difficult to extract knowledge. Thus, we are confronted with problems of interpretation of human language and treatment of ambiguities (polysemy, anaphora, metaphor), which are difficult to automate. The inclusion of the semantics provided by ontologies in the IR process can help to improve the quality of the search results.

In order to avoid the necessity of manually

constructing ontologies, in this work we address the

automatic construction of multi-domain ontologies

for Semantic Web search. The basic idea is to use

the Web search feedback provided by users to

construct ontologies from the consulted Web

documents (Ben Mustapha et al., 2009a). The

challenge behind this work is to find the relevant

documents from which the ontology should be

constructed. Case-based reasoning (CBR) is used to

find appropriate documents related to a new query

according to a case base. In this way, ontologies can

be automatically enriched as new search feedback is

obtained, whereas the Web search can be improved

by the semantics provided by the ontologies.

This work presents an implementation of a generic approach (Ben Mustapha, 2010) which allows any search engine to develop its semantic layer by building ontologies and indexing the document base. Our contribution consists in integrating CBR with ontologies to improve both the semantic search system and automatic ontology learning.

The rest of this paper is organized as follows. Section 2 presents an overview of related work in the area of semantic search and ontology learning. Section 3 describes the architecture of the proposed content-based search engine. Section 4 illustrates the proposed approach with a use case, and Section 5 describes some evaluation tests. Finally, we conclude and discuss directions for future research.

Elloumi-Chaabene M., Ben Mustapha N., Baazaoui-Zghal H., Moreno A. and Sánchez D..

EVOLUTIVE CONTENT-BASED SEARCH SYSTEM - Semantic Search System based on Case-based-Reasoning and Ontology Enrichment .

DOI: 10.5220/0003067600240034

In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR-2010), pages 24-34

ISBN: 978-989-8425-28-7

 2010 SCITEPRESS (Science and Technology Publications, Lda.)

2 SEMANTIC WEB SEARCH

Semantic search has been one of the motivations of the Semantic Web (Berners-Lee, 2001), which was envisioned as an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The Semantic Web is emerging as the next-generation Web, which aims to give information a semantic representation that is accessible and understandable by machines. It is based on the deployment of ontologies, semantic annotations and the formal representation of the knowledge underlying these ontologies and annotations. The role of semantic search is to use the underlying ontologies and the metadata available in Semantic Web portals to provide better performance than keyword-based search.

2.1 Ontologies

Ontologies (Guarino, 1998) are used to represent a

shared conceptualization of a knowledge domain,

adding a semantic layer to computer systems. In

other words, an ontology is a formal representation

of the main concepts in a domain and their

interrelationships. An ontology can take the simple

form of a taxonomy (i.e., knowledge encoded in a

minimal hierarchical structure) or a vocabulary with

standardized machine interpretable terminology

supplemented with natural language definitions.

Ontologies are often specified in a declarative form by using semantic markup languages such as XML (Bradley, 2001), RDF (Lassila, 1999), OIL (Decker, 2000), XOL (Karp, 1999) and OWL (McGuinness, 2004).

Several types of ontologies exist:

• Generic or Common Ontologies: they model

the common sense knowledge reusable

between domains. These ontologies include

vocabulary related to things, events, time,

space, causality, behaviour, function, etc.

• Application Ontologies: they model the knowledge required for specific applications. Application ontologies often specialize the vocabulary of domain ontologies and task ontologies.

• Knowledge Representation Ontologies: they

model the primitive representations used to

formalize knowledge.

• Domain Ontologies: ontologies focused on a specific area.

• Modular Ontologies: an ontology module can

be considered as a reusable component of a

large or more complex ontology, which is

self-contained but bears definite relationships

to other ontology modules (Doran, 2006).

Ontology building tools provide automatic or

semi-automatic support for the construction of

ontologies. Ontology Learning methods extract

ontological elements (conceptual knowledge) from a

corpus of documents and build ontologies from

them. Ontology learning integrates many

complementary techniques, including machine

learning, natural language processing, and data

mining. Methods that apply ontology learning to texts extract ontologies by applying natural language processing techniques. The best-known approaches are pattern-based extraction, association rules, conceptual clustering, ontology pruning and concept learning.

2.2 Classification of Semantic Search Systems

The use of ontologies and metadata is the most

important aspect of the Semantic Web. Ontologies

have contributed to the emergence of semantic

search engines. Among these, we include the contextual search engines based on domain ontologies, which restrict the search to a well-delimited area. We can distinguish two main categories of search engines: ontology search engines and semantic search engines (Mangold, 2007). We are especially interested in semantic search engines, which can be divided into three groups:

• Context-based Search Engines: the final

purpose of these engines is to enhance the

performance of traditional search engines

(especially precision and recall), by

understanding the context of documents and

queries. One of their most important parts is

the annotator, which is responsible for

generating metadata from the crawled pages.

These systems are the most practical ones; in

fact, they are the next generation of current

search engines.

• Evolutionary Search Engines: These search engines are an answer to the well-known problem of how to automatically gather information about a specific topic. The main distinguishing behaviour of these engines consists in using external metadata. They normally use an ordinary search engine and display augmented information near the original results.

• Semantic Association Discovery Engines: The

goal of these systems is to find various

semantic relations between input terms and

then rank the results based on semantic

distance metrics. An upper ontology like WordNet or OpenCyc can be used to evaluate this kind of search engine.

The main problems of ontology-based IR are the reformulation of the users’ queries and the difficulty of building and managing ontologies. Many ontologies are available on the Web, but it is still difficult to find an ontology for every domain associated with a user’s query. Moreover, building ontologies for all domains, so that they can be exploited by semantic search engines, is difficult. Using the previous experiences of users can help to accomplish this task.

2.3 Semantic Search based on CBR

Case-based Reasoning (CBR) is a methodology for

problem solving by reusing previous experience

(Watson, 1999) and collecting new experiences,

since every new problem case, once solved, becomes

a new case that may be stored and reused. The tf-idf

(term frequency–inverse document frequency)

weighting scheme is used in the vector space model

together with cosine similarity to determine the

similarity between two textual cases (Rosina, 2005).

Information retrieval based on CBR allows the management of user experiences and offers some flexibility to adapt and customize the system to the search. It provides a strong collaboration that could simulate learning during the search. Moreover, the incremental acquisition of knowledge on search instances can make the system scalable, thanks to long-term learning. CBR has the advantage of being easy to exploit in comparison with other learning methods. However, it presents the difficulties of case modelling and representation. Many information retrieval approaches attempt to improve the retrieval process (text indexing, query formulation) by using ontologies.

The combination of ontologies (as semantic

background) and CBR mechanism (to enrich the

ontology from search feedback) can improve the

performance of Semantic Web search. However, it

faces the difficulties of knowledge modelling from

textual data and the representation of search cases.
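As an illustration of the tf-idf weighting and cosine similarity mentioned above for comparing textual cases, the following minimal Python sketch may help. It assumes whitespace tokenization and the common tf × log(N/df) variant of tf-idf; the function names and the three sample texts are hypothetical and are not taken from the system described here.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build one sparse tf-idf vector (term -> weight) per text."""
    counts = [Counter(text.lower().split()) for text in texts]
    n_texts = len(counts)
    # Document frequency: number of texts in which each term occurs.
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_texts / doc_freq[term]) for term, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical textual cases (not from the paper's case base).
cases = [
    "online meeting tools",
    "electronic meeting system online",
    "express mail service",
]
vectors = tfidf_vectors(cases)
sim_related = cosine(vectors[0], vectors[1])    # the texts share "online" and "meeting"
sim_unrelated = cosine(vectors[0], vectors[2])  # the texts share no term
```

In this toy setting the first two cases obtain a positive similarity, while the third, which shares no term with the first, obtains zero.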

3 CONTENT-BASED SEARCH ENGINE BASED ON CBR AND ONTOLOGIES

To address the problem of information retrieval on the Web, we have developed a system based on ontologies and case-based reasoning. In this section, we present an implementation of a generic approach (Ben Mustapha, 2010) which allows any search engine to develop its semantic layer by building ontologies and indexing a document base.

Our contribution integrates case-based reasoning with ontologies to improve the semantic search system and to add an automatic ontology learning component.

3.1 Repository of Modular Ontologies

We have proposed in previous works the extensive

use of domain ontologies (Ben Mustapha et al.,

2009a). Our choice is motivated by the fact that

domain ontologies are restricted to the concepts

related to a specific knowledge field. This has the

advantage of limiting the ambiguity of terms defined

in the ontology, facilitating their detection in

documents. We can represent a relationship between

domain ontologies and modular ontologies as shown

in Figure 1. For example, we can define two

domains such as “Tourism” and “Computer

Science”, which can contain shared ontology

modules like “Conference”.

Figure 1: Relationship between domain ontologies and

modular ontologies.

KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval

3.2 Architecture

In this section, we describe the architecture of the proposed system, which allows users to search for Web documents. It has four modules (see Figure 2). The first module is responsible for the treatment and reformulation of the users’ queries, and the second one manages the case base, using ontologies. After these two processes, the third module filters and classifies the documents selected by the users. Finally, the fourth module performs an ontology enrichment process based on the analysis of the text of the selected Web documents.

The first step consists in processing the user’s search query in order to match it with a topic covered in the ontology (module). The choice of the module from the ontology can help to retrieve the exact meaning of the search using WordNet.

Then, the user query is processed by the query treatment module. If the search topic is new, the system provides an initial list of Web documents. If a similar case (search) is found in the case base, the system displays a list of already retrieved documents containing relevant results and adds the list from the search engine. Then, these documents are classified by means of the filtering process. Finally, the documents that are most similar to the query are displayed to the user. After this, the query reformulation module uses the case base (more precisely, the semantic signature) to provide a new query. The reformulated query launches a new search instance to refine the results. The final task of the system is the enrichment of the ontology from the Web documents assessed as relevant, using text-based ontology learning techniques.

Figure 2: Global architecture of the system.

3.3 User Query Reformulation

The Query Reformulation step has the aim of

extending the original query with additional

information to obtain more accurate results. This

process can help to solve the difficulties faced by the

users to express their queries. Our approach is based

on the enrichment of the query using the information

available from past cases. The submitted query is

also extended on the basis of the concepts and

relations of the domain ontology. More precisely,

two tasks are performed to reformulate the query:

• Morphological Analysis: It allows the recognition of different forms of words from a lexicon (dictionary or thesaurus). Lemmatization transforms conjugated or inflected forms into their canonical form or lemma. In our approach, the lemmatization of the query is performed by TreeTagger.

• Adding Terms from Semantic Signature: Each

query is referenced by the semantic signature

of a case, whose terms can be used to enrich

the query. They represent the concepts used

more often for the search topic and the given

query. Each term is described by a weight

calculated from the relevant documents of the

case.
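The two reformulation tasks above can be sketched in a few lines of Python. This is only an illustrative sketch: a small hand-written lemma table stands in for the tagger, the signature weights are invented, and the function name `reformulate_query` and the `max_extra_terms` parameter are assumptions introduced for the example.

```python
def reformulate_query(query, lemma_of, signature, max_extra_terms=3):
    """Lemmatize the query terms, then append the highest-weighted
    semantic-signature terms that are not already in the query."""
    # Morphological analysis: map each word to its lemma (unchanged if unknown).
    lemmas = [lemma_of.get(word.lower(), word.lower()) for word in query.split()]
    # Rank signature terms by decreasing weight; break ties alphabetically.
    ranked = sorted(signature.items(), key=lambda item: (-item[1], item[0]))
    extra = [term for term, _ in ranked if term not in lemmas][:max_extra_terms]
    return lemmas + extra


# Hypothetical lemma table standing in for a real tagger.
LEMMAS = {"meetings": "meeting", "making": "make"}
# Hypothetical semantic signature of a past case: term -> weight.
SIGNATURE = {"conference": 0.8, "online": 0.6, "meeting": 0.9, "collaborative": 0.4}

new_query = reformulate_query("Making online meetings", LEMMAS, SIGNATURE, max_extra_terms=2)
print(new_query)
```

Capping the number of added terms is a design choice of this sketch; it keeps the enriched query close to the user's original intent.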

3.4 Ontology-based CBR

Case-based reasoning (CBR) is the act of solving a problem by using past experiences, which are stored in the case base.

Case Modelling: Typically, a case contains at least

two parts: a description of a situation representing a

"problem", and a "solution" used to remedy this

situation. We have modelled a case with the

structure shown in Figure 3.

Problem Modelling: A problem is composed of five

parts (Ben Mustapha et al., 2009b). The first one

represents the search goal. We can distinguish two

types of search goals (Rose and Levinson, 2004).

The navigational goal is a desire by the user to be

taken to the home page of an institution or

organization. To be considered navigational, the

query must have a single authoritative Web site that

the user already has in mind. For this reason, most

queries consisting of names of companies,

universities, or well-known organizations are

considered navigational. Also for this reason, most

queries for people – including celebrities – are not.

The informational goal includes the desire to locate

something in the real world, or simply to get a list of

suggestions for further research. Informational

queries are focused on the user goal of obtaining

information about the query topic. This category

includes goals for answering questions that the user

has in mind.

The second part is the search domain. It is referenced by a topic concept from the domain ontology, which defines the topic of the search. We have also modelled a pivotal concept for the chosen ontological module and a set of queries classified by a record. The last part is the semantic signature of the pivotal concept, which is a list of terms that appear frequently with the chosen concept.

Solution Modelling: The case solution contains a module vector, used for filtering documents of the current search and the similar cases. This vector is represented as an n-tuple $((c_1, W_1), (c_2, W_2), \ldots, (c_n, W_n))$, where each pair $(c_i, W_i)$ represents a concept of the ontology module and its associated weight. The case solution also holds a domain vector, similar to the module vector, which contains the domain concepts and their weights. Finally, we have modelled a results part to collect the search results which are selected by the users. These results could be Web documents, ontological classes or text fragments, depending on the search goal.

Figure 3: Case structure.
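The case structure described above (a five-part problem and a three-part solution) could be represented, for instance, with Python dataclasses. This is a sketch under assumptions: the class and field names are hypothetical, the vectors are stored as concept-to-weight dictionaries, and the weights in the example are placeholders rather than values computed by the system.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    search_goal: str                 # "navigational" or "informational"
    domain: str                      # topic concept of the domain ontology
    pivotal_concept: str             # concept of the chosen ontology module
    queries: list = field(default_factory=list)             # past queries of the case
    semantic_signature: dict = field(default_factory=dict)  # term -> weight


@dataclass
class Solution:
    module_vector: dict = field(default_factory=dict)  # module concept -> weight
    domain_vector: dict = field(default_factory=dict)  # domain concept -> weight
    results: list = field(default_factory=list)        # results selected by users


@dataclass
class Case:
    index: str          # identifier of the ontology module indexing the case
    problem: Problem
    solution: Solution


# Example instance with placeholder weights (illustrative only).
example_case = Case(
    index="ontology#EMS",
    problem=Problem(
        search_goal="navigational",
        domain="Computer_Science",
        pivotal_concept="EMS",
        queries=["EMS", "EMS electronic meeting system"],
        semantic_signature={"meeting": 1.0, "conference": 1.0, "online": 1.0},
    ),
    solution=Solution(),
)
```

Keeping the problem and the solution as separate objects mirrors the two-part case description and lets the solution start empty until the user has selected relevant results.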

Procedures Applied to a Case. We present in the

following paragraphs the different procedures that

allow the enrichment of the cases (see Figure 4).

Search for Similar Cases: The submission of a

query by a user is treated as a case search. The

system starts by searching in the case base. Two

requests are called similar if they belong to the same

search area for the same search goal, they are both

connected to the same ontological concept, and they

include a similar set of terms (i.e., the terms are

identical to the keywords of the original query, or

synonyms of them). If a similar case exists in the

database, the system executes two processes. The

first is to collect all relevant documents in the case

base related to similar queries. The second is to

collect all documents from the Web related to the

user query after treating and enriching the query

with the semantic signature. After this document

collecting phase, a classification process is triggered

to classify all the selected documents and display

them to the user. Then, these documents are

evaluated by the user and an ontology learning tool

analyzes them in order to improve the ontological

module with new concepts and new relationships

with other modules. Finally, the system updates the

case to enrich it.

Figure 4: Procedures applied to a case.
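The similarity conditions used when searching the case base (same search area, same search goal, same ontological concept, and matching or synonymous terms) can be sketched as a simple predicate. The sketch below is one possible reading, in which every query term must equal a case term or one of its synonyms; the dictionary keys, the synonym table and the sample cases are hypothetical.

```python
def terms_match(query_terms, case_terms, synonyms):
    """True if every query term is a case term or a synonym of a case term."""
    expanded = set(case_terms)
    for term in case_terms:
        expanded |= synonyms.get(term, set())
    return all(term in expanded for term in query_terms)


def is_similar(request, case, synonyms):
    """Apply the four similarity conditions to a request and a stored case."""
    return (
        request["domain"] == case["domain"]
        and request["goal"] == case["goal"]
        and request["concept"] == case["concept"]
        and terms_match(request["terms"], case["terms"], synonyms)
    )


def find_similar_cases(request, case_base, synonyms):
    """Return the stored cases that satisfy all similarity conditions."""
    return [case for case in case_base if is_similar(request, case, synonyms)]


# Hypothetical case base and synonym table (illustrative only).
case_base = [
    {"id": "A", "domain": "Computer_Science", "goal": "navigational",
     "concept": "EMS", "terms": {"meeting", "online", "conference"}},
    {"id": "B", "domain": "Tourism", "goal": "navigational",
     "concept": "EMS", "terms": {"meeting", "online"}},
]
synonyms = {"conference": {"workshop"}}
request = {"domain": "Computer_Science", "goal": "navigational",
           "concept": "EMS", "terms": {"online", "workshop"}}

matches = find_similar_cases(request, case_base, synonyms)
```

In practice the synonym table could be derived from a lexical resource such as WordNet, which the system already uses for disambiguation.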

Cases Update: Once an existing case in the case base is selected, it is updated in two steps. The first one is the addition of new relevant documents to the case. After processing the user’s query, a new class of terms is added to the semantic signature. The next step is to update the module vector and the domain vector, by recalculating the weights of the terms of the semantic signature related to the case. Finally, the system saves the newly adapted case into the case base.

Insertion of New Cases: This step involves the

insertion of the case problem, which includes the

addition of the search goal, the user-selected search

field, the pivotal concept and the semantic signature

(i.e., a list of terms obtained through the extraction

of keywords from the user query).

After inserting the case problem, the system returns a set of URLs from the search engine. The user’s task is to mark those that he/she considers relevant. The module vector and the domain vector are added as soon as the user’s selection is finished.

Similarity Measures: The next step is crucial for

the integration and update of the module and domain

vectors.

Weighting Concepts by CF-ICF Method: The CF-

ICF measure is a good approximation to determine

the importance of a concept in a document,

particularly in a uniform size corpus. Here are some

formulas to calculate CF-ICF.

CF: (Concept Frequency) is a value proportional

to the occurrence and the frequency of a concept in

the ontological module and so in the semantic

signature.

ICF (Inverse Concept Frequency):

$$ICF_i = \log\left(\frac{N}{n_i}\right) \qquad (1)$$

N: total number of concepts in the ontology

$n_i$: number of documents that contain the concept $c_i$

The formula below (to obtain $W_i$) is the combination of CF-ICF:

$$CF_i = \sum_{j} f(cf_{ij}), \quad \text{with} \quad W_i = (1 + \log CF_i) \ast ICF_i \qquad (2)$$

where $f$ is a normalization function that reduces the differences between the values associated with the concepts of a document. It is defined by the following formula:

$$f(cf_{ij}) = 0.5 + 0.5 \times \frac{cf_{ij}}{\max_{c_k \in D_j} (cf_{kj})} \qquad (3)$$

where $cf_{ij}$ is the number of occurrences of the concept $c_i$ in document $D_j$.
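To make the weighting step more concrete, here is a hedged Python sketch of a CF-ICF computation. The normalization 0.5 + 0.5 × cf / max(cf) and the inverse frequency log(N / n) follow the definitions given above; the way they are combined, namely summing the normalized frequencies over documents into CF and taking (1 + log CF) × ICF as the weight, is an assumption of this sketch and only one plausible reading of the combination formula. The total N is passed in by the caller as `n_total`, and all function names and sample counts are hypothetical.

```python
import math


def normalized_frequency(cf, max_cf):
    """Normalized concept frequency: 0.5 + 0.5 * cf / max_cf."""
    return 0.5 + 0.5 * cf / max_cf


def icf(n_total, n_containing):
    """Inverse concept frequency: log(N / n_i)."""
    return math.log(n_total / n_containing)


def concept_weights(doc_counts, n_total):
    """doc_counts holds one {concept: occurrences} dict per document,
    listing only the concepts that occur at least once in that document."""
    cf_sum = {}    # concept -> sum of normalized frequencies over documents
    doc_freq = {}  # concept -> number of documents containing the concept
    for counts in doc_counts:
        if not counts:
            continue  # skip documents without any recognized concept
        max_cf = max(counts.values())
        for concept, cf in counts.items():
            cf_sum[concept] = cf_sum.get(concept, 0.0) + normalized_frequency(cf, max_cf)
            doc_freq[concept] = doc_freq.get(concept, 0) + 1
    # ASSUMPTION: weight = (1 + log CF) * ICF; other combinations are possible.
    return {
        c: (1 + math.log(cf_sum[c])) * icf(n_total, doc_freq[c])
        for c in cf_sum
    }


# Hypothetical concept counts for two relevant documents.
docs = [{"meeting": 4, "online": 2}, {"meeting": 1}]
weights = concept_weights(docs, n_total=10)
```

The resulting dictionary has the same concept-to-weight shape as the module and domain vectors, so it could be stored directly in a case solution.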

Calculation of Module Vector: This involves the

calculation of the weight of each term using the

vector model, from the base of relevant documents

related to a specific domain stored in the case base.

Calculation of Domain Vector: This involves the calculation of the weights of all concepts of the domain ontology using the vector model, also from the base of documents stored in the case base.

It can be noticed that CBR is used for many

tasks: reformulation of new queries on the basis of

previous ones, proposal of recommendations that are

represented by similar queries and their results,

document classification and enrichment of

ontological modules.

3.5 Filtering and Classification of Web Documents

The vector model of Salton (Salton et al., 1983) is used to retrieve the most relevant documents. We replace the terms by concepts to make it more suitable for our application. More precisely, we use the domain vector as a filter to eliminate documents outside the area, which do not belong to the same module; in this way, the vector reduces the search noise.

In this vector model, each document is represented by a vector of the form $D_j = (d_{1j}, \ldots, d_{Nj})$, where $d_{ij}$ is the weight of the concept $c_i$ in the document $D_j$ and $N$ is the number of concepts in the semantic signature. The query is represented by the vector $Q = (q_1, \ldots, q_N)$, where $q_i$ is the weight of the concept $c_i$ in the semantic signature. The similarity between a document and a query is calculated with the cosine formula:

$$Sim(D_j, Q) = \frac{\sum_{i=1}^{N} d_{ij} \cdot q_i}{\sqrt{\sum_{i=1}^{N} d_{ij}^2} \; \sqrt{\sum_{i=1}^{N} q_i^2}} \qquad (4)$$
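The filtering and ranking step based on the cosine measure can be illustrated with the following Python sketch. Vectors are stored as sparse concept-to-weight dictionaries; the similarity threshold, the function names and the sample documents are assumptions made for the example, since no cut-off value is stated here.

```python
import math


def cosine_similarity(doc_vector, query_vector):
    """Cosine of the angle between two sparse concept -> weight vectors."""
    shared = set(doc_vector) & set(query_vector)
    dot = sum(doc_vector[c] * query_vector[c] for c in shared)
    norm_d = math.sqrt(sum(w * w for w in doc_vector.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vector.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)


def filter_and_rank(documents, query_vector, threshold=0.2):
    """Drop documents below the (assumed) threshold and sort the rest
    by decreasing similarity, breaking ties by document name."""
    scored = [(cosine_similarity(vec, query_vector), name)
              for name, vec in documents.items()]
    kept = [(score, name) for score, name in scored if score >= threshold]
    return [name for score, name in sorted(kept, key=lambda p: (-p[0], p[1]))]


# Hypothetical concept vectors (illustrative only).
query = {"meeting": 1.0, "online": 1.0}
documents = {
    "ems_tool": {"meeting": 2.0, "online": 1.0},
    "mail_service": {"mail": 3.0, "express": 1.0},
}
ranking = filter_and_rank(documents, query)
```

Here the document about mail services shares no concept with the query, so its similarity is zero and it is filtered out, which mirrors the noise-reduction role of the domain vector.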

3.6 Ontology Enrichment from Web Documents

The final task of this system is the enrichment of the ontology fragments (associated with an old or new case) from the Web documents making up the solution of this case. Each document is the input of the ontology learning phase of the process proposed in previous work (Baazaoui-Zghal et al., 2007) (Baazaoui-Zghal et al., 2008). Text mining techniques (syntactic patterns (Hearst, 1992) and verb-based patterns) are used to discover new concepts and new relations between the concepts of the ontology fragment and the topic signature (Aguirre et al., 2004).

It is possible to extract new terms which do not have a stable relation with the ontology fragment. In this case, these terms are added to the topic signature of the case (instead of being added to the ontology) in order to be used in a later iteration.
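As a rough illustration of pattern-based extraction, the sketch below applies one Hearst-style lexico-syntactic pattern ("X such as A, B and C") with a regular expression. It is a deliberate simplification and not the learning process of the cited work: it treats the single word before "such as" as the broader concept, assumes the enumeration ends the clause, and uses an invented example sentence. A real pipeline would rely on part-of-speech tagging and noun-phrase chunking.

```python
import re

# One Hearst-style pattern: "<hypernym> such as <item>, <item> and <item>".
SUCH_AS = re.compile(r"(\w+),? such as ([^.;]+)")
# Separators inside the enumeration: commas, "and", "or".
SEPARATORS = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def hearst_such_as(sentence):
    """Return (hyponym, hypernym) candidate pairs found by the pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1).lower()
        for item in SEPARATORS.split(match.group(2)):
            item = item.strip().lower()
            if item:
                pairs.append((item, hypernym))
    return pairs


# Invented example sentence (not taken from the evaluated documents).
sentence = "An EMS supports collaborative tools such as brainstorming, voting and discussion."
candidates = hearst_such_as(sentence)
```

Each candidate pair would still need validation before being added to the ontology fragment; unvalidated terms could instead go to the topic signature, as described above.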

4 ILLUSTRATING SCENARIO AND USE CASE

4.1 Illustration

This section illustrates a simple scenario using the search system. We present an example in which the user wants to search for “a notebook HP”. As shown in the previous section, our system is composed of four modules, so the first step consists in the treatment of the query. Figure 5 presents the first iteration, including the primitive query built by the system.

The system begins by reformulating the user’s

query to refine the search. On the one hand, the

system searches for similar cases to send their

documents (if they exist) to the user, and inserts the

module and domain vectors in the case base. Then,

the search engine obtains a list of Web documents.

This search tries to construct a primitive ontology.

On the other hand, after selecting the relevant

documents, the system uses text mining techniques

to enrich the primitive ontology, by extracting

concepts that are relevant to the ontological module

from the selected Web documents; in this case the

ontological module is the “notebook”.

Figure 5: An example: HP Notebook search.

After the completion of this scenario, new

concepts and instances are added to the module and

the query of the user is reformulated and enriched

with related terms and concepts as represented in

Figure 6.

4.2 Use Cases Demonstration

In this section, we show two use cases representing

the semantic relation between two queries Q1 and

Q2 that appear as two independent requests

submitted by two users but, according to the domain

knowledge, the result of one is the answer to the

other. The query Q1 is “What is an EMS in Computer Science?” and Q2 is “How to make an online meeting or workshop?”.

Figure 6: Module enrichment.

We suppose that the first user submits the query

Q1. This first iteration can be done with any search

engine, given that there is no learning at this stage.

Concerning this query, we cannot find any related

ontology module nor similar cases as the query

keywords are not clear. A graph corresponding to

the query is extracted using lexico-syntactic patterns.

Then, the corresponding keywords are sent to a

search engine. The user selects the relevant

documents to be saved and a new case C1 is inserted

in the case base (Table 1). Both the insertion of a

new case and the calculation of the two vectors are

done automatically. After the case insertion, the

construction of the ontology module related to the

concept “electronic meeting system” is done by

fulfilling the following tasks:

• Extraction of sentences containing the term

“EMS” from selected Web documents

associated to case C1 (figure 7).

• Application of lexico-syntactic patterns and

syntactic frames to discover new related

concepts and relations.

• Validation of the ontology elements

discovered for the set of Web documents.

The resulting ontology module is illustrated by

Figure 8.

A second user submits the next query Q2. This second iteration is fulfilled by learning from similar queries. The difference between the two queries is that the first user knows exactly the concept he is searching for (“EMS”), while the second user only knows the role of this concept (without knowing that an EMS is a tool to make online meetings or workshops). With a traditional search engine, it would be difficult for the second user to find that EMSs are used to make online meetings or workshops, as “EMS” does not appear as a term in the second query. However, according to the proposed approach, the ontological index of cases can provide the second user with a similar case existing in the case base.

Table 1: Case representation of C1.

Case Index: http://www.SemSearch.com/Computer_Science/ontology#EMS.owl
Problem:
    GS = navigational
    D = http://www.SemSearch.com/Computer_Science
    PC: EMS
    TopicSign = meeting, conference, online, collaborative, …
    Set_Q = (EMS, (EMS + electronic meeting system))
Solution:
    Module Vector
    Domain Vector
    Set_response = http://en.wikipedia.org/wiki/Electronic_meeting_system

Figure 7: Extracted sentences containing the term “EMS”

from selected Web documents associated to case C1.

By applying this approach, the system notices that there is a common result between the two queries. By using only the search engine, we would not be able to find the same result. In fact, the retrieval of cases similar to query Q2 provides the top-ranked recommended cases (C1 and many other cases, such as C2, shown in Table 2).

For this second user, we can provide a summary of what an online meeting is, using the ontology module (concepts in the neighborhood of online meeting), and give a set of recommended documents. In this context, we suppose that a previous user submitted queries related to meetings. The explanation of this common result is that the ontology module indexing C1 is strongly related to the ontology module indexing C2 (Figure 8).

The relation between the two ontology modules

was discovered in the step of ontology enrichment of

the case C1. The two ontology modules were

automatically created in the first iteration by

applying text mining techniques on the documents of

the C1 case.

Table 2: Case representation of C2.

Case Index: http://www.SemSearch.com/Computer_Science/ontology#online_meeting.owl
Problem:
    GS = navigational
    D = http://www.SemSearch.com/Computer_Science
    PC: EMS
    TopicSign = meeting, online, tools, ...
    Set_Q = (make online meeting, workshop making tools, interactive meeting tools, ...)
Solution:
    Module Vector
    Domain Vector
    Set_response = http://www.online-tech-tips.com/cool-websites/free-online-meeting-software/

So, EMS is one of the available online meeting tools. There are other answers such as “Web Conferencing Roundup” (C2). The selection of one of the proposed cases narrows the search process. This implies the use of the module vector. In this example, we suppose that the second user selects the case C1 among the recommended cases. Then, a set of queries is submitted to import other Web documents whose similarity with the module vector is high. Then, we apply the document filtering step by using the domain vector. We observe that the majority of the removed Web documents include the term EMS, but not with the meaning of “electronic meeting system”. In fact, there are many other senses of the expression “EMS”, such as “Express Mail Service” and “European Mathematical Society”.

Figure 8: Relation between two ontology modules indexing two similar cases.

5 SYSTEM EVALUATION

We have just shown how CBR and ontologies are used in the search. To evaluate our system, we first computed the precision of the search results and then measured the influence of CBR on ontology enrichment.

5.1 Search Results Evaluation

We measured the precision of our tool for finding

information online. Several scenarios have been

tested. The first scenario represents the initial

search, which is a keyword search on Google. The

second scenario represents the situation where there

are similar cases in the database. The search is based

on the vector model to filter the results. The third

scenario represents the search for information based

on the learning of cases, the query reformulation and

the use of the vector model for classification.

Precision is the proportion of returned documents that a human domain expert manually assesses as relevant. Its values are calculated for several queries. Figure 9 shows that the relevance of the returned information improves progressively as concepts are introduced in the experiments. This indicates that the implemented ideas contribute to improving the relevance of search results.
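For reference, precision over the first k returned documents can be computed as in the short Python sketch below. The function name, the document identifiers and the relevance judgments are hypothetical; they only illustrate the measure and are not the experimental data.

```python
def precision_at_k(ranked_docs, relevant_docs, k):
    """Fraction of the first k returned documents judged relevant."""
    top_k = ranked_docs[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant_docs) / len(top_k)


# Hypothetical ranking and expert judgments (illustrative only).
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}

p_at_1 = precision_at_k(ranked, relevant, 1)
p_at_2 = precision_at_k(ranked, relevant, 2)
p_at_4 = precision_at_k(ranked, relevant, 4)
```

Evaluating the measure at several cut-offs k corresponds to plotting precision against the number of Web documents considered.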

Figure 9: Average precision of the search system.

5.2 Ontology Enrichment Evaluation

In Figure 10 we can observe that the obtained results improve markedly across the considered scenarios. In the first scenario, the added concepts come from WordNet, so this is the contribution of disambiguation. In the second scenario, the added concepts come from similar cases in the case base, so this is the contribution of the CBR system. Finally, in the last scenario, the added concepts come from the text, so this is the contribution of enrichment from text.

Figure 10: Average number of added concepts for each

scenario.

In the first scenario, we use only WordNet. The

proportion of relevant concepts among all added

concepts is high, but the number of concepts is too

low to describe the user’s search (see Figure 11).

In the second scenario we can extract concepts

from similar cases. We observe an increase in the

number of relevant concepts in comparison with

the first scenario, despite a decrease in the

proportion of relevant concepts among all added

concepts. This highlights the contribution of

case-based reasoning in our work (see Figure 12).
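The difference between the number of relevant concepts and their proportion can be made concrete with a small sketch. The concept lists below are invented solely to show the arithmetic and do not reproduce the experimental results; the function name and the returned fields are hypothetical.

```python
def concept_relevance(added_concepts, relevant_concepts):
    """Count total, relevant and non-relevant added concepts and their ratio."""
    added = set(added_concepts)
    relevant = added & set(relevant_concepts)
    total = len(added)
    ratio = len(relevant) / total if total else 0.0
    return {
        "total": total,
        "relevant": len(relevant),
        "non_relevant": total - len(relevant),
        "ratio": ratio,
    }


# Invented example: few added concepts, most of them relevant.
few_added = ["meeting", "conference", "session", "bank"]
few_relevant = ["meeting", "conference", "session"]

# Invented example: more added concepts, a smaller share of them relevant.
many_added = ["c%d" % i for i in range(10)]
many_relevant = ["c%d" % i for i in range(6)]

first = concept_relevance(few_added, few_relevant)
second = concept_relevance(many_added, many_relevant)
print(first)
print(second)
```

In this toy case the second set contains twice as many relevant concepts as the first (6 against 3) although its proportion of relevant concepts is lower (0.6 against 0.75), which is the pattern described for the second scenario.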

[Plot: precision rate (y-axis) against number of Web documents (x-axis) for Scenario 1, Scenario 2 and Scenario 3.]

[Plot: number of concepts (y-axis) against number of cases (x-axis) for Scenario 1, Scenario 2 and Scenario 3.]

KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval

Figure 11: Concepts relevance for the first scenario.

Figure 12: Concepts relevance for the second scenario.

The third scenario represents the concepts

extracted from text (from relevant Web documents).

The proposed system retrieves a large number of

concepts. This number of concepts, especially the

relevant ones, illustrates the contribution of

ontology enrichment in our work (see Figure 13).

Our system highlights the contribution of

case-based reasoning and ontology enrichment.

However, it has some limitations. The return of

documents for the user’s query and the ontology

enrichment are not synchronised: the two

processes cannot run at the same time. The

ontology enrichment process takes time because of

the volume of Web documents. That is why, in our

future work, we plan to use other ontology learning

techniques based on Web metrics and on data

returned by search engines to parallelize the search

and the enrichment processes.

Figure 13: Concepts relevance for the third scenario.

6 CONCLUSIONS AND FUTURE WORK

The challenge addressed by this paper is to find a

solution to improve the contextualization of requests

based on past users’ queries and the enrichment of

ontologies from the query context (documents

selected by the user). In this paper, we have

presented related work that uses CBR technology in

IR systems and Ontology Learning for IR. Since

intelligent retrieval is one of the main applications

of the case-based reasoning (CBR) paradigm,

semantic formalization in CBR systems is a

growing research area. In CBR systems, semantics

are the main source of reasoning, similarity

calculation and case adaptation. Our underlying

hypothesis is that case-based reasoning supported

by ontologies is a promising approach for achieving

semantics-aware search and ontology enrichment.

In this work, we discussed an evolutive semantic

search based on case-based reasoning, in which

traditional information retrieval, ontologies and

CBR can be integrated. Our recommender approach

uses CBR, with ontologies expressed in a Semantic

Web markup language for case indexing.

In this paper we presented a system that supports

the approach proposed in (Ben Mustapha et al., 2010)

for semantic search based on ontologies and CBR.

The main contribution of this work is the building

and enrichment of ontology fragments within a

semantic search engine, and the linking of user

queries with ontology fragments by means of

case-based reasoning.

The ontology enrichment is based on our

previous work, which proposes to analyse the whole

Web page to learn new concepts and relations. This

approach is therefore too time consuming to run as

an online task. In future work we intend to study, in

particular, ontology learning techniques based on

Web metrics (Sánchez and Moreno, 2008a, 2008b).

[Three plots: number of concepts (y-axis) against number of cases (x-axis), each with the series Total concepts, Relevant concepts and Non-relevant concepts.]

Moreover, since modularization is very important

for facilitating the enrichment, maintenance and

reuse of ontologies, the main aim of our future

work is to adopt Web-based ontology learning

techniques to build modular ontologies by

composing extracted ontology modules.

ACKNOWLEDGEMENTS

This work has been supported by the Spanish-

Tunisian AECID project A/023175/09, “A

framework for the integration of Ontology Learning

and Semantic Search”.

REFERENCES

Agirre, E., and Lacalle, O., 2004. Publicly available topic

signatures for all WordNet nominal senses. In the 4th

International Conference on Language Resources and

Evaluation (LREC) (Lisbon, Portugal, 2004).

Baazaoui Zghal, H., Aufaure, MA., Ben Mustapha, N.,

2007. Extraction of Ontologies from Web Pages:

conceptual modeling and tourism, Journal of Internet

Technologies.

Baazaoui-Zghal, H., Aufaure, MA., Ben Mustapha, N.,

2008. A Model-Driven approach of ontological

components for on-line semantic web information

retrieval, Journal on Web Engineering, Special Issue

on Engineering the Semantic Web, Rinton Press, vol.

6, n°4, pp 309-336.

Ben Mustapha N., Baazaoui Zghal H., Aufaure MA. and

Ben Ghezala H., 2010. Enhancing Semantic Search

using Case-Based Modular Ontology, In 25th

Symposium On Applied Computing (SAC) , track on

“The Semantic Web and Applications” (SWA), 22 –

26, Sierre, Switzerland.

Ben Mustapha N., Baazaoui Zghal, H., Aufaure, MA. and

Ben Ghezala, H., 2009a. Combining Semantic Search

and Ontology Learning for Incremental Web Ontology

Engineering, International Workshop on Web

Information Systems Modeling (WISM 2009),

Amsterdam.

Ben Mustapha, N., Baazaoui Zghal, H., Aufaure, MA. and

Ben Ghezala, H., 2009b. Ontology learning from

Web : survey and framework based on semantic

search, Second International conference on Web and

Information Technologies, ICWIT2009, Kerkena,

Tunisia.

Berners-Lee, T., Hendler, J. and Lassila, O., 2001. The

Semantic Web. Scientific American, May, pp. 29-37.

Bradley N., 2001. The XML Companion, Addison-Wesley

Professional Publisher.

Charlet, J., Szulman, S., Pierra, G., Nadah, N., Teguiak, H.

V., Aussenac-Gilles, N., et al. (2008). DAFOE: a

multimodel and multimethod platform for building

domain ontologies. in D. Benslimane, (Ed.), 2èmes

Journées Francophones sur les Ontologies, JFO’08,

ACM.

Decker, S., Erdmann, M., Fensel, D., Horrocks, I.,

Klein, M., van Harmelen, F., 2000. OIL in a nutshell, In

Proceedings of the 12th European Knowledge

Acquisition Workshop (EKAW'00).

Doran, P., 2006. Ontology Reuse via Ontology Modularisation.

In Proceedings of KnowledgeWeb PhD Symposium

2006 (KWEPSY2006), 17th June 2006. Budva,

Montenegro.

Guarino, N., 1998. Formal Ontology in Information

Systems. In 1st International Conference on Formal

Ontology in Information Systems (FOIS’98). Trento,

Italy, pp. 3-15.

Karp, R., Chaudhri, V., Thomer, J., 2002. An XML-based

ontology exchange language.

Hearst, M. A., 1992. Automatic acquisition of hyponyms

from large text corpora. In 14th International

Conference on Computational Linguistics (COLING-

92). Nantes, France, pp. 539–545.

Lassila, O., Swick, R. R., 1999. Resource Description

Framework (RDF) model and syntax specification, W3C

Recommendation 22.

Mangold, C., 2007. A survey and classification of

semantic search approaches, In Journal of Metadata,

Semantics and Ontology, Vol. 2, No. 1, pp.23–34.

McGuinness, D. L., van Harmelen, F., 2004. OWL Web

Ontology Language Overview.

Rose, D. E. and Levinson, D., 2004. Understanding User

Goals in Web Search. In International World Wide

Web Conference, New York, USA, pp. 13-19.

Rosina O.Weber, Kevin D. Ashley and Stefanie

Brüninghaus, 2005. Textual case-based reasoning.

The Knowledge Engineering Review, Vol. 20:3, 255–

260.

Salton, G., Fox, E. A., Rocchio, J., 1983. Extended Boolean

information retrieval system. CACM 26(11),

pp. 1022-1036.

Sánchez, D. and Moreno, A., 2008a. Learning non-

taxonomic relationships from web documents for

domain ontology construction, Data & knowledge

Engineering, Vol. 63, No. 3, pp. 600-623.

Sánchez, D. and Moreno, A., 2008b. Pattern-based

automatic taxonomy learning from the Web, AI

Communications, Vol. 21, No. 1, pp. 27-48.

Watson, I., 1999. Case-based reasoning is a methodology

not a technology, Knowledge-Based Systems, Vol. 12,

No. 5, pp. 303-308.
