has introduced constructs to evaluate fuzzy sets over
JSON documents, promises to provide this capability,
but we will see that queries become harder to write.
Therefore, a higher-level approach is necessary, based on
a simpler soft-querying model that is specific to GeoJSON
documents but still relies on fuzzy J-CO-QL queries,
with J-CO-QL acting as the underlying engine.
In this paper, we present a high-level soft querying
model for selecting features in GeoJSON documents,
in such a way that, given an input GeoJSON docu-
ment, the query generates a new GeoJSON document
containing only the selected features (i.e., data items
contained in the document). Using linguistic predicates
in the soft query makes it possible to rank features
on the basis of their membership degrees in
fuzzy sets. The paper will show how such
“simple” soft queries can be automatically translated
into complex fuzzy J-CO-QL queries, which actually
disassemble the GeoJSON documents, apply fuzzy
querying to individual features and re-assemble them into
a single GeoJSON document, in which features are
ordered in reverse order of importance with respect to
the linguistic condition expressed in the soft query.
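
The disassemble, evaluate and re-assemble steps described above can be sketched in plain Python. This is only an illustrative sketch, not the J-CO-QL translation presented in the paper: the helper names `ramp` and `soft_select`, the `population` property and the thresholds are all hypothetical, and features are sorted by decreasing membership degree, which is one possible reading of the ordering mentioned above.

```python
def ramp(low, high):
    """Return an increasing membership function: 0 below `low`, 1 above `high`."""
    def membership(value):
        if value <= low:
            return 0.0
        if value >= high:
            return 1.0
        return (value - low) / (high - low)
    return membership


def soft_select(collection, property_name, membership, threshold=0.0):
    """Keep features whose membership degree exceeds `threshold`, best first."""
    ranked = []
    # Disassemble: consider each feature of the input document on its own.
    for feature in collection.get("features", []):
        # GeoJSON allows "properties" to be null, hence the `or {}`.
        value = (feature.get("properties") or {}).get(property_name)
        if value is None:
            continue  # the feature lacks the queried property
        degree = membership(value)
        if degree > threshold:
            ranked.append((degree, feature))
    # Rank by decreasing membership degree (assumed ordering, see lead-in).
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    # Re-assemble: wrap the selected features in a new FeatureCollection.
    return {"type": "FeatureCollection", "features": [f for _, f in ranked]}


def point(name, population):
    """Build a toy GeoJSON Point feature (hypothetical data)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        "properties": {"name": name, "population": population},
    }


cities = {
    "type": "FeatureCollection",
    "features": [point("A", 5_000), point("B", 55_000), point("C", 200_000)],
}

# Hypothetical linguistic predicate "highly populated".
highly_populated = ramp(10_000, 100_000)
result = soft_select(cities, "population", highly_populated)
print([f["properties"]["name"] for f in result["features"]])  # ['C', 'B']
```

Feature A is discarded because its membership degree is 0, while C (degree 1.0) precedes B (degree 0.5) in the output document.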
The remainder of the paper is organized as fol-
lows. Section 2 briefly discusses relevant related
work. Section 3 presents the background of our work,
i.e., basic notions on fuzzy sets and the GeoJSON for-
mat. Section 4 introduces the main research idea of
the paper, i.e., applying soft querying to GeoJSON
documents. Section 5 shows how the J-CO-QL lan-
guage can manipulate GeoJSON documents to select
features. Then, Section 6 shows how fuzzy concepts
previously added to J-CO-QL in (Psaila and Marrara,
2019) can perform soft querying on GeoJSON docu-
ments (Sections 6.1 and 6.2), while a rewriting tech-
nique to derive fuzzy J-CO-QL queries on GeoJSON
features is presented in Section 6.3. Finally, Section 7
concludes the paper.
2 RELATED WORK
Most relational database management systems adopt
Boolean logic to formalize queries. This means that a
query condition can be either satisfied or not satisfied.
By using Boolean logic, it is not possible to have a
flexible semantics of relational operations, to express
preferences and to rank query results. In many real-world
situations, humans express queries by means of
imprecise words. The user's intention is not
merely to find the items that satisfy a given query;
rather, the user may wish to estimate how much
each item satisfies the conditions in the query (its
satisfaction degree), in order to rank items (if possible).
There are several approaches to represent impre-
cise and vague concepts in Information Retrieval (IR).
A first approach defines similarity or proximity rela-
tions between pairs of imprecise and vague items. In
the Vector Space Model, for instance, documents and
queries are represented as points in a space of terms
and the distances between the points representing the
query and the documents are used to quantify their
similarity (Salton et al., 1994). Another category of
approaches adopts the notion of Fuzzy Set. Fuzzy Set
Theory is an extension to classical set theory (Zadeh,
1975). The notion of fuzzy set has been used to repre-
sent vague concepts expressed in a flexible query for
specifying soft selection conditions (Blair, 1979). The
objective here is to quantify how close the information
carried by the proposition is to the considered
reality. Possibility Theory (Fuhr, 1989; Zadeh,
1965), together with the concept of linguistic variable
defined within fuzzy set theory (Zadeh, 1975),
provides a complete formal framework to manage
imprecise, vague and uncertain information (Buell, 1985).
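
As an illustration of the first family of approaches, the following minimal sketch ranks documents against a query in a space of terms. It assumes raw term counts as vector components and cosine similarity as the closeness measure, which is one common choice and not necessarily the one used in the cited work; the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of raw term counts."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_u * norm_v)


documents = {
    "d1": "fuzzy sets for flexible query",
    "d2": "relational database query",
    "d3": "xml retrieval",
}
query = term_vector("fuzzy query")

# Rank documents by decreasing similarity to the query.
scores = {name: cosine_similarity(query, term_vector(text))
          for name, text in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']
```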
There are two alternative ways to model the re-
trieval activity. (i) One possibility is to model the
query evaluation mechanism as an uncertain decision
process. In this case, relevance is defined as a
binary (crisp) condition, and the query evaluation
mechanism computes the probability that a document
d is relevant to a query q. This approach,
which models the uncertainty of the retrieval process,
was introduced and developed through
probabilistic IR models (Ramer, 1989; Herrera and
Herrera-Viedma, 1997; Waller and Kraft, 1979). (ii)
Another way is to interpret the query as the speci-
fication of soft constraints that the representation of
a document can satisfy to a certain degree, and to
consider term relevance as a gradual (vague) concept.
This is the approach adopted in fuzzy IR models (Bor-
dogna and Pasi, 1995). In this latter case, the decision
process performed by the query evaluation mechanism
computes the degree to which the query is
satisfied by the representation of each document. A very
good survey of the adoption of Fuzzy Sets in
IR can be found in (Kraft et al., 2015).
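
The second way of modeling retrieval can be illustrated with a small sketch in which each document is represented as a fuzzy set of index terms and a query is a soft constraint evaluated to a degree in [0, 1]. The sketch assumes the standard min/max operators for conjunction and disjunction; the cited fuzzy IR models may use other aggregation operators, and the documents, weights and function names are hypothetical.

```python
def term_degree(document, term):
    """Degree to which `term` describes `document` (0.0 if absent)."""
    return document.get(term, 0.0)


def fuzzy_and(*degrees):
    """Conjunction of soft constraints, assuming the min operator."""
    return min(degrees)


def fuzzy_or(*degrees):
    """Disjunction of soft constraints, assuming the max operator."""
    return max(degrees)


# Each document is a fuzzy set of index terms (hypothetical weights).
documents = {
    "d1": {"fuzzy": 0.9, "json": 0.6},
    "d2": {"fuzzy": 0.4, "xml": 0.8},
    "d3": {"xml": 1.0},
}


def satisfaction(document):
    """Degree to which a document satisfies: fuzzy AND (xml OR json)."""
    return fuzzy_and(
        term_degree(document, "fuzzy"),
        fuzzy_or(term_degree(document, "xml"), term_degree(document, "json")),
    )


degrees = {name: satisfaction(doc) for name, doc in documents.items()}
ranking = sorted(degrees, key=degrees.get, reverse=True)
print(degrees)  # {'d1': 0.6, 'd2': 0.4, 'd3': 0.0}
print(ranking)  # ['d1', 'd2', 'd3']
```

Unlike a Boolean evaluation, which would only accept or reject each document, every document receives a satisfaction degree that can be used directly for ranking.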
XML Retrieval is a well-defined context in which
dealing with uncertainty and vagueness is quite common.
When defining query languages for XML
documents, the problem of querying data collections
without a well-defined structure, or with a heterogeneous
structure, soon became evident.
To tackle this problem, Fuzzy Set Theory was
a fairly immediate choice. In (Damiani and Tanca,
2000) the authors presented an approach in which
XML documents are modeled as labeled graphs; their
structure is selectively extended by computing the im-
WEBIST 2020 - 16th International Conference on Web Information Systems and Technologies