These schemes rely solely on the keywords of the query sent by the user and return the documents containing the query terms. However, this is not always the best way to perform a search: if the user does not choose appropriate keywords for the query, the server will not return the most pertinent documents. Indeed, the server ignores every document that does not contain at least one query term, even if its meaning is close to that of the query. Consequently, the search is not optimal. To overcome this problem, a semantic search over encrypted cloud data is needed.
Few works in the literature have tried to address this problem by proposing semantic search approaches. (Sun et al., 2013) and (Yang, 2015) have proposed approaches that expand the query (a single-term query) by inserting synonyms of the query term. These approaches do not solve the problems described above. Their limitation is that they do not use external resources such as ontologies and thesauri. In addition, apart from synonymy, they do not exploit relationships between terms (associative relations, homonymy, instance-of relations, related terms, etc.).
In this paper, we present our proposed scheme. Its goal is to solve the problems mentioned above by performing a semantic search over encrypted cloud data that exploits an external resource (the Wikipedia ontology). In addition, we introduce an improved version of our approach based on a new weighting formula. Finally, an experimental study validates the proposed approach.
2 PROBLEM FORMULATION
2.1 Toward a Semantic Search
Most searchable encryption schemes over cloud data proposed in the literature perform a keyword-based search. Indeed, during the search process, when the server receives a query, it looks for documents containing the query terms. Documents that do not contain any query term are not returned, even though they may be relevant.
Therefore, to get the most relevant documents, the user must choose the right keywords when formulating the query. However, this is not always easy, especially for an inexperienced user. Consequently, searching may become a tedious task for the users of the system. In addition, many relevant documents that do not contain any query term will not be returned to the user.
To illustrate the problem, let us take the following example. Assume we have two short documents¹: the first deals with the London Stock Exchange², whereas the second is about the England football team³.
Document 1. The London Stock Exchange is a stock
exchange located in the City of London in the United
Kingdom. As of December 2014, the Exchange had
a market capitalization of US$6.06 trillion, making it
the third-largest stock exchange in the world by this
measurement.
Document 2. The England national football team
represents England and the Crown Dependencies of
Jersey, Guernsey and the Isle of Man for football
matches as part of FIFA-authorised events, and is
controlled by The Football Association, the governing
body for football in England.
If a user sends the query "Economy of England", the server will search the document collection for documents containing the terms "Economy" and/or "England". The server will find that the first document contains neither of these terms, so it ignores this document. Conversely, it will find that the second document contains the term "England", so it returns it. However, if we analyze the content of the two documents, we notice that the first document is relevant, since its meaning is close to that of the query: it talks about the London Stock Exchange, which is strongly related to the economy of England. The second document, in contrast, talks about football in England and has no relationship with the economy. Therefore, it should not be considered relevant, even though it shares terms with the query.
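The keyword-based behaviour described above can be reproduced with a minimal plaintext sketch. It assumes simple lower-case tokenization and a small hypothetical stop-word list (so that "of" is not treated as a query term), and it ignores the encryption layer entirely; it only illustrates why a purely syntactic match returns the second document and misses the first.

```python
import re

# Hypothetical stop-word list, so that "of" in the query is not a search term.
STOP_WORDS = {"a", "and", "in", "is", "of", "the"}

# The two example documents from this section.
DOCUMENTS = {
    1: ("The London Stock Exchange is a stock exchange located in the City of "
        "London in the United Kingdom. As of December 2014, the Exchange had a "
        "market capitalization of US$6.06 trillion, making it the third-largest "
        "stock exchange in the world by this measurement."),
    2: ("The England national football team represents England and the Crown "
        "Dependencies of Jersey, Guernsey and the Isle of Man for football "
        "matches as part of FIFA-authorised events, and is controlled by The "
        "Football Association, the governing body for football in England."),
}


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens minus stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def keyword_search(query, documents):
    """Return the ids of documents sharing at least one term with the query."""
    query_terms = tokenize(query)
    return [doc_id for doc_id, text in documents.items()
            if query_terms & tokenize(text)]


print(keyword_search("Economy of England", DOCUMENTS))  # prints [2]
```

Only the football document is returned, because it is the only one containing the literal term "england"; the stock-exchange document is never considered.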
To solve the problem encountered with syntactic search, the information retrieval (IR) community has turned to techniques used in natural language processing. In particular, it has exploited external resources such as thesauri and ontologies in order to understand the meaning of the queries sent by users. The goal is to improve the precision and recall of the search by returning documents whose meaning is close to that of the query, rather than relying on syntax alone. This area of research is called semantic information retrieval.
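The idea of exploiting an external resource can be sketched on the same example. In this plaintext toy, a hand-written map of related terms (hypothetical, standing in for a thesaurus or ontology; it is not the Wikipedia ontology used in our scheme) expands each query term, and a document's score is the number of query terms whose expansion it matches. This is a generic concept-matching illustration under those assumptions, not the proposed scheme or its weighting formula, and it ignores encryption.

```python
import re

# Hypothetical stop-word list, as in a plain keyword search.
STOP_WORDS = {"a", "and", "in", "is", "of", "the"}

DOCUMENTS = {
    1: ("The London Stock Exchange is a stock exchange located in the City of "
        "London in the United Kingdom. As of December 2014, the Exchange had a "
        "market capitalization of US$6.06 trillion, making it the third-largest "
        "stock exchange in the world by this measurement."),
    2: ("The England national football team represents England and the Crown "
        "Dependencies of Jersey, Guernsey and the Isle of Man for football "
        "matches as part of FIFA-authorised events, and is controlled by The "
        "Football Association, the governing body for football in England."),
}

# Hypothetical, hand-written related-term map standing in for an external
# semantic resource (thesaurus or ontology). Each query term maps to itself
# plus terms considered semantically related to it.
RELATED_TERMS = {
    "economy": {"economy", "stock", "exchange", "market", "capitalization"},
    "england": {"england", "london", "united", "kingdom"},
}


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens minus stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def semantic_search(query, documents, related_terms):
    """Rank documents by how many query terms they match after expansion."""
    # A term absent from the map is expanded to itself only.
    expanded = [related_terms.get(term, {term}) for term in tokenize(query)]
    scores = {}
    for doc_id, text in documents.items():
        doc_terms = tokenize(text)
        score = sum(1 for concept in expanded if concept & doc_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(semantic_search("Economy of England", DOCUMENTS, RELATED_TERMS))
# prints [(1, 2), (2, 1)]
```

With the expansion, the stock-exchange document matches both query concepts and is ranked first, while the football document matches only "england". How well this works depends entirely on the quality of the external resource, which is why a structured resource such as an ontology is preferable to a hand-made list.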
To the best of our knowledge, very few studies (Sun et al., 2013), (Yang, 2015) have applied semantic information retrieval to encrypted cloud data. These works rely on the query expansion technique, adding the synonyms of the query term.
The drawback of these schemes is that except the syn-
¹ Extracted from Wikipedia.
² https://en.wikipedia.org/wiki/London_Stock_Exchange
³ https://en.wikipedia.org/wiki/England_national_football_team
WEBIST 2016 - 12th International Conference on Web Information Systems and Technologies