FUZZY XML MODEL FOR REPRESENTING FUZZY
RELATIONAL DATABASES IN FUZZY XML FORMAT
Alnaar Jiwani
, Yasin Alimohamed
, Krista Spence
, Tansel Özyer
, Reda Alhajj
‡,
Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Department of Computer Science, Global University, Beirut, Lebanon
Keywords: XML schema, fuzzy Data, fuzzy relational database, database reengineering.
Abstract: This paper describes a fuzzy XML schema model for representing a fuzzy relational database in XML
format. It outlines a simple translation algorithm to include fuzzy relations and similarity matrices with their
associated conventional relation. We also describe an example implementation of a fuzzy relational
database and the XML document resulting from the translation according to our schema.
1 INTRODUCTION
In the past two decades, there has been extensive
research examining how imprecise and uncertain
data can be represented in databases given that it is
pervasive in most real-world applications. Examples
of imprecise data include subjective opinions and
judgments in areas such as personnel evaluation,
policy preferences and economic forecasting. A
particular vein of research that is immediately
applicable to many applications is how the
conventional relational model can be extended to
incorporate this fuzzy data.
Another highly researched area focuses on how
relational data can be represented in the Extensible
Markup Language (XML). Lee et al (2002) and
Turowski and Weng (2002) describe examples of
XML representation for fuzzy data modeling.
However, they do not describe how to incorporate
fuzzy XML with data from conventional relations.
The approach described by Lee et al applies to the
object-oriented paradigm, and not simple relational
data. Turowski’s approach is more general;
however, it does not utilize the currently accepted
technique of XML schemas in defining the XML
document class (it instead uses DTD format).
This paper presents a novel approach to
incorporate fuzziness in the XML model and
outlines one approach for transforming fuzzy
relational data into fuzzy XML.
The rest of this paper is organized as follows.
Section 2 reviews the existing literature in the area
of fuzzy sets and fuzzy relational databases. Section
3 presents an overview of the representation of fuzzy
data in XML. In section 4, we describe a basic
algorithm for translation from fuzzy relational
database to fuzzy XML. Section 5 is conclusions.
2 FUZZY RELATIONAL
DATABASES
Zadeh (1965) formally introduced the fuzzy set
concept to represent imprecise data. He identified
that objects encountered in physical world do not
have precisely defined criteria of membership; this
imprecision is natural to human thinking. For
example, a teacher may want to find a list of all
‘good’ students. The teacher may define the term
‘good’ based upon attendance, grades, behavior or
many other criteria. The teacher may also want to
include students who do not exactly fit any crisp
definition of being ‘good’. Conventional relational
databases rely on crisp representations of data and
do not have immediately obvious ways to
incorporate such imprecision.
2.1 Overview of Fuzzy Set Theory
A classical set is a set with a crisp boundary, such
that any object in the domain either belongs to the
set, or does not belong to the set. A classical set C
may be: C = {x | x 25, for all real x}. In contrast, a
fuzzy set has a continuum of grades of membership.
There is a gradual transition from “belonging to a
163
Jiwani A., Alimohamed Y., Spence K., Özyer T. and Alhajj R. (2006).
FUZZY XML MODEL FOR REPRESENTING FUZZY RELATIONAL DATABASES IN FUZZY XML FORMAT.
In Proceedings of the Eighth International Conference on Enterprise Information Systems - DISI, pages 163-168
DOI: 10.5220/0002465101630168
Copyright
c
SciTePress
set” to “not belonging to a set”. For example, a
fuzzy set could be used to model linguistic
expressions such as “the person is beautiful” or “the
weather is bad”. Fuzziness does not come from the
randomness of members of the set, but from the
uncertain and imprecise nature of abstract concepts.
The construction of a fuzzy set F relies first on
identifying a domain of values that could belong to
the set, generally referred to as the Universe of
Discourse (Jang and Sun, 1995). The set is
characterized by a membership function, denoted
μ
F
(x) that associates each value in the Universe of
Discourse to a real number in interval [0, 1]. The
specification of the membership function is
completely subjective to the person who defines it.
A fuzzy set F with universe of discourse X is
defined as ordered pairs: F = {(x, μ
F
(x)) | x X}
Fuzzy sets may be discrete or continuous. Let X be
the set of possible grades a student may receive on a
paper: X = {A, B, C, D, F}.
The discrete fuzzy set “High grades” (H) could be
represented as: H = {A/1.0, B/0.7, C/ 0.2, D/ 0, F/ 0}
Table 1: An instance of a Student relation.
FName LName Avg_Marks Attitude
Jeremy Scott A Unhappy
Jenny Wong A Negative
George Yuzwak C Positive
Jose Sanchez B Cheerful
Table 2: Similarity Relation for the Attitude attribute of
the Student relation (Table 1).
Unhappy Negative Positive Cheerful
Unhappy 1 0.8 0.2 0
Ne
g
ative 0.8 1 0 0
Positive 0.2 0 1 0.95
Cheerful 0 0 0.95 1
Alternately, let X be the set of possible ages for a
human being. Then the continuous fuzzy set “about
50 years old” (G) could be represented as: G = {(x,
μ
G
(x)) | x X}, where, μ
G
(x) = 1/(1+((x-50)/5)
4
)
2.2 Fuzzy Data in Relational Databases
Existing literature discusses many different
techniques for representing fuzziness within
relational databases. In general, it seems that the
following ideas are agreed upon: a fuzzy relational
database (FRDB) either allows for queries that let
preferences be expressed instead of exact Boolean
conditions, or allows for the storage and querying of
a new type of data that directly stores fuzzy sets. In
other terms, a FRDB can accommodate two types of
imprecision – impreciseness in the association
among data values or impreciseness in the data
values themselves (Medina et al, 1994). The two
most common techniques used for handling
imprecision are similarity relations and possibility
distributions; or a combination of both.
2.2.1 Similarity-based Techniques
Buckles and Petry (1995) were the first to introduce
the similarity-based relational model. The basis of
this model is the replacement of equality with a
similarity relation. A similarity relation s(x,y) is a
mapping of every pair of elements within the
domain of an attribute to the interval [0,1]. This is
best visualized in the form of a matrix. An example
of this, based on the Attitude attribute of the Student
relation described in Table 1, is given in Table 2.
The matrix illustrates that the similarity relation is
reflexive and symmetric. In this model of FRDB, a
similarity relation is defined over the elements in
each attribute, in each relation. Where a crisp
definition of equality is still desired, the matrix
representation of the similarity relation is reduced to
the identity matrix.
Another feature of the similarity-based FRDB, is
that it allows for non–atomic domain values. In their
model, Buckles and Petry (1995) define that any
member of the power set of the domain may be a
domain value except the null set. This feature allows
uncertainty of data values to be expressed, but is not
in first normal form and suffers the associated
implementation problems. Similarity relations are
best for finite and discrete domains of linguistic sets.
2.2.2 Possibility-based Techniques
Instead of understanding a membership function
μ
F
(x) as the grade of membership of x in F,
possibility-based FRDBs interpret it as a measure of
the possibility that a variable Y has a value x. Such
fuzzy sets are referred to as possibility distributions
and are represented by the symbol . In a
possibility-based FRDB, these possibility
distributions can be used to indicate the possibility
that a tuple has a particular value for an attribute.
For example, if a tuple in a Person table has the
value ‘Young’ for the attribute Age, a possibility
distribution describes the likelihood that such person
has a particular value for the age (Buckles and Petry,
1995):
young
= {1.0/22, 1.0/23, 0.8/24, 0.6/25, ...}
So the likelihood that the Young person is 24
years old is 0.8. This allows the linguistic identifier
to be used as a value in the domain while the actual
possibility distribution is given elsewhere in the
database in the form of a relation having the name of
the linguistic identifier.
ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
164
Raju et al (1988), describe two different ways of
implementing a possibility-based FRDB. Each
represents a fuzzy relation r by a table with
additional column for μ
r
(t), showing the membership
of tuple t in r. The first (Type-1) stipulates that the
domain of each attribute is a fuzzy set (recall that a
classical set is a special case of a fuzzy set). Given
crisp values in a relation, there exist membership
functions that map the values to linguistic terms with
associated possibilities. The second implementation
(Type-2) permits more uncertainty in the data
values. It allows for ranges or possibility
distributions to be the actual values of attributes.
This cannot be implemented given current
commercial frameworks for relational databases
since it allows for different data types in the same
column and/or multiple values.
2.2.3 Hybrid Techniques
Other techniques for representing fuzziness in
relational databases have been proposed that include
characteristics of both the similarity-based and
possibility-based models. This allows them to work
with more than one area of imprecision.
An example is GEFRED – a Generalized Model
of Fuzzy Relational Databases. The model allows
for linguistic terms in a column to be related via a
‘proximity relation’, which is identical to the
similarity relation described by Buckles and Petry
(1995). If no proximity relation exists for the
attribute, it is assumed that the classical definition of
equality applies for values in this domain.
3 XML SCHEMAS AND FUZZY
DATA
XML is an excellent method of transmitting data
between software applications. In order for an
application to interpret the XML, certain constraints
must be placed on an XML document. This can be
accomplished by describing classes of XML
documents through XML schemas or Document
Type Declarations (DTDs). The application can then
use the specified XML schema or DTD to parse the
information contained in a specific XML document.
3.1 XML Schemas
An XML schema defines the structure of an XML
document instance. Unlike DTDs, XML schemas
allow for strong data typing, modularization, and
reuse. The XML schema specification allows a
developer to define new data types (using the
<complexType> tag), and also use built-in data
types provided by the specification. The developer
can also define the structure of an XML document
instance and constrain its contents. As well, the
XML schema language supports inheritance, so that
developers do not have to start from scratch when
defining a new schema.
3.2 Fuzzy Relational Data in XML
There has already been some research completed on
representing fuzzy data in XML.
The fuzzy object-oriented modeling technique
(FOOM) schema proposed by Lee et al (2002) is one
such approach. This method builds upon object-
oriented modeling (OOM) to also capture
requirements that are imprecise in nature and
therefore ‘fuzzy’. The FOOM schema defines a class
of XML document that can describe fuzzy sets,
fuzzy attributes, fuzzy rules, and fuzzy associations.
This method would be useful in representing data
contained in object-oriented databases. However, it
is too specific in terms of its object-oriented nature
to be applied directly to relational databases.
Another more general approach is proposed by
Turowski and Weng (2002). The method described
is aimed at creating a common interchange format
for fuzzy information using XML to reduce
integration problems with collaborating fuzzy
applications. XML tags with a standardized meaning
are used to encapsulate fuzzy information. A formal
syntax for important fuzzy data types is also
introduced.
This technique of using XML to represent fuzzy
information is general enough to be built upon to
apply to relational databases. However, it uses
DTDs, rather than the currently accepted method of
XML schemas to define and constrain the
information held in an XML document. It would be
beneficial to extend this approach to define the XML
document class for holding data from fuzzy
relational databases with an XML schema, rather
than a DTD.
4 FROM FUZZY RDB TO FUZZY
XML
In this section, we describe in detail our
implementation of a fuzzy relational database, XML
schema structure and the algorithm to convert
database content to XML document conforming to
the schema.
FUZZY XML MODEL FOR REPRESENTING FUZZY RELATIONAL DATABASES IN FUZZY XML FORMAT
165
Table 3: Example ‘Student’ relation.
ID FNAME LNAME ATTEND AVG ATTITUDE ADVISOR
1 Jeremy Scott 0.56 3.60 Unhappy 1
2 Jenny Wong 0.98 3.87 Motivated 5
3 George Yuzwak 0.80 2.74 Lazy 3
4 Jose Sanchez 0.9 3.20 Cheerful 1
5 Eliza Reichs 0.35 1.87 Lazy 1
4.1 Database Structure
We chose to create a hybrid-type fuzzy relational
database that incorporates both similarity relations,
to represent fuzzy equality, and possibility relations,
which could be used to translate crisp data based on
a number of linguistic terms or to represent a
possibility distribution. Any attribute in a relation
may have an associated fuzzy relation and/or an
associated similarity relation. Each of these can be
joined into a query to retrieve information based on
imprecise conditions. The results of these queries
themselves can then be considered a sort of fuzzy
relation that has all the attributes requested by the
query as well as an attribute that describes tuple’s
membership in the relation.
An example relation, ‘Student’, is illustrated in
Table 3 (an extension of the relation in Table 1). We
suppose that the information stored in ID, FNAME,
LNAME and ADVISOR columns is crisp. The data
in ATTEND and AVG is also crisp, but fuzzy
relations based on linguistic terms are defined on
each. The values within the domain of ATTITUDE
have an associated similarity relation defined to
provide fuzzy equivalence.
4.1.1 Similarity Relations
In our fuzzy relational database model, we allow any
column to have an associated similarity relation,
which assigns all elements in the domain a degree of
similarity to all other elements in the domain.
Normally, this is visually represented in a matrix,
but to construct this matrix in a relation by naming
attributes after each domain element is very
inflexible and difficult to modify if one wanted to
add another element to the domain. Instead, we
flatten the matrix.
Every similarity matrix is named under the
convention: ‘SM_TABLENAME_COLNAME’,
where TABLENAME and COLNAME are the
relation and attribute the similarity matrix applies to,
respectively. Within the similarity relation there are
three attributes: VALUE1, VALUE2, and MATCH.
VALUE1 and VALUE2 hold the combination of
domain values and will be assigned a type according
to the type of the attribute being compared. MATCH
contains the result of the similarity relation s(x,y) for
the pair described in VALUE1 and VALUE2, and so
will contain a value in the interval [0,1].
4.1.2 Fuzzy Relations
Our model also allows any attribute to have an
associated fuzzy relation, which can contain the data
for a number of fuzzy sets defined over the domain
of the attribute. Each fuzzy set is identified by a
linguistic term. If the set is discrete, the fuzzy
relation will contain each ordered pair within the set.
However, if the set is defined by a continuous
function, the fuzzy relation will contain points along
the graph of the function that can be interpolated to
find the exact value of the membership function.
Fuzzy relations are named under the convention:
‘FR_TABLENAME_COLNAME’, where
TABLENAME and COLNAME are the relation and
attribute the fuzzy relation applies to, respectively.
Within the fuzzy relation there are three attributes:
LINGUISTIC_TERM, COLUMN_VALUE, and
MEMBERSHIP. LINGUISTIC_TERM is the word
that describes the meaning of the fuzzy set.
COLUMN_VALUE and MEMBERSHIP can be
interpreted as the (x,y) values of a point on the graph
of the fuzzy set. COLUMN_VALUE is the same
type as the attribute this relation applies to, and
MEMBERSHIP is the result of the membership
function that maps the COLUMN_VALUE to the
unit interval.
Figure 1: XML Schema – Top View.
4.2 XML Schema Structure
The schema structure we developed for representing
our fuzzy relational database in XML provides a
direct relationship between the database and the
resulting XML document. This implementation
allows for an XML representation of the data that is
simple to interpret and query.
The schema defines the outermost element of the
XML document as the Database element. A
Database element contains a name attribute used to
ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
166
indicate the name of the fuzzy relational database
and a sequence of Table elements representing each
relation in the database. A Table element also has a
name attribute that will be set to the name of the
table. This outer structure of the XML schema is
represented in Figure 1.
Figure 2: Definition of the Row Element for Storing Table
Records.
Figure 3: Definition of the SimilarityMatrix Element.
The schema further defines a Table element from
the database content related to each relation. This
includes the relation’s row and column data
(database records) and any fuzzy relations or
similarity relations associated with its attributes.
Figure 2 illustrates how record information is
stored in the XML document. The schema defines a
Table element as a complex type composed of Row
elements. A Row element is composed of Column
elements. Each Row element in the XML document
holds a record, whose column values are stored as
the value for each Column element. Column
elements are also described by the name, type, and
nullable attributes.
The database structure stores similarity and fuzzy
relations as separate tables. To keep the XML
document simple, our XML schema stores an
attribute’s fuzzy data along with the table that
contains the attribute. The schema defines a Table
element as a complex type with Row elements for
each record (Figure 2), SimilarityMatrix elements
for similarity relations (Figure 3), and FuzzyRelation
elements for fuzzy relations (Figure 4).
Figure 4: Definition of the FuzzyRelation Element.
The flat conversion algorithm we chose to
convert the fuzzy relational database to an XML
document following our schema structure can be
outlined as follows:
Add Database start tag to the XML document with
the name attribute set to the name of the database.
Set the xsi:schemaLocation to point to the location
of the XML Schema this document is to adhere to.
Retrieve all table names from the database
For each table:
Add Table start tag to the XML document with the
name attribute set to the table name
Query the database to get the column data (name,
type, nullable value) for the current table and then
query for the row data using the column names
For each row:
Add Row start tag to the XML document
For each column:
Add Column start tag to the XML document and set
the name, type, and nullable attributes to their
corresponding values
Set the value of the Column element to the value
retrieved for the current row/column
Add Column end tag to the XML document
Add Row end tag to the XML document
Query the database to get all similarity matrix data
for the current table. A similarity matrix belonging
FUZZY XML MODEL FOR REPRESENTING FUZZY RELATIONAL DATABASES IN FUZZY XML FORMAT
167
to a table is identified by appending ‘SM’ to the
table name, followed by the name of the matrix.
For each similarity matrix:
Add SimilarityMatrix start tag to the XML document
and set the assocColumn and type attributes to their
corresponding values
For each cross reference:
Add crossRef start tag to the XML document
Add Value1, Value2, and Match elements and set
their corresponding values
Add crossRef end tag to the XML document
Add SimilarityMatrix end tag to the XML document
Query the database to get all fuzzy relations for the
current table. A fuzzy relation belonging to a table is
identified by appending ‘FR’ to the name of the
table, followed by the name of the relation
For each fuzzy relation:
Add FuzzyRelation start tag to the XML document
For each linguistic term:
Add LinguisticTerm start tag to the XML document
and set the term attribute to its corresponding value
Add FuzzySet start tag to the XML document
For each point in the fuzzy set:
Add the Point start tag to the XML document
Add x_value and membership elements and set their
corresponding values
Add Point end tag to the XML document
Add FuzzySet end tag to the XML document
Add LinguisticTerm end tag to the XML document
Add FuzzyRelation end tag to the XML document
Add the Table end tag to the XML document
Add the Database end tag to the XML document
5 CONCLUSIONS
We have described a fuzzy XML schema to
represent an implementation of a fuzzy relational
database that allows for similarity relations and
fuzzy sets. We have also provided a flat translation
algorithm to translate from the fuzzy database
implementation to a fuzzy XML document that
conforms to the suggested fuzzy XML schema.
REFERENCES
Anvari, M., Rose G. F., 1987. “Fuzzy Relational
Databases,” Analysis of Fuzzy Information, Bezdek ed.,
Vol II, CRC Press.
Bosc P., Galibourg M. and Hamon G., 1988. “Fuzzy
querying with SQL: extensions and implementation
aspects,” Fuzzy Sets and Systems, Vol. 28, pp.333-349.
Buckles B. P. and Petry F. E., 1995. “Fuzzy Databases in
the New Era,” Proc. of ACM SAC, pp.497-502.
Dey D. and Sumit S., 1996. “A Probabilistic Relational
Model and Algebra,” ACM TODS, Vol.21, pp.339-369.
Duta A., Barker K. and Alhajj R., 2004. “Converting
Relationships to XML Nested Structures,” Journal of
Information and Organizational Sciences, Vol.28,
No.1-2.
Wang C., Lo A. Alhajj R. and Barker K., 2005. “Novel
Approach for Reengineering Relational Databases into
XML,” Proc. of XSDM, (in conjunction with ICDE),
Tokyo.
Fernandez M., Tan W.-C., and Suciu D., 2000.
“SilkRoute: Trading between Relations and XML”.
Proc. of WWW, Amsterdam.
Fong J., Pang F. and Bloor C., 2001. “Converting
Relational Database into XML Document,” Proc. of
the International Workshop on Electronic Business
Hubs, pp.61-65.
Jang J. and Sun C., 1995. “Neuro-Fuzzy Modeling and
Control,” Proc. of the IEEE, 83, pp.378-406.
Yang K.Y., Lo A., Özyer T. and Alhajj R., 2005.
“DWG2XML: Generating XML Nested Tree Structure
from Directed Weighted Graph,” Proc. of ICEIS.
Lee, D., et al, 2002. “NeT & CoT: translating relational
schemas to XML schemas using semantic constraints,”
Proc. of ACM CIKM.
Lee, J., et al, 2002. “Modeling Imprecise Requirements
with XML,” Fuzzy Systems, 2, pp.861-866.
Medina J. M., Pons O. and Vila M. A., 1994. “GEFRED:
A Generalized Model of Fuzzy Relational Databases
Version 1.1,” Information Sciences.
Raju. K. V. and Majumdar, A. K., 1988. “Fuzzy
Functional Dependencies and Lossless Join
Decomposition of Fuzzy Relational Database
Systems,” ACM TODS, Vol.13, pp.129-166.
Thompson H. S., et al, 2004. “XML Schema Part 1:
Structures,” W3C Recommendation.
Turowski K. and Weng U., 2002. “Representing and
processing fuzzy information – an XML-based
approach,” Knowledge-Based Systems, Vol.15, pp.67-.
Wang S., et al, 2001. “Incremental Discovery of
Functional Dependencies from Similarity-bases Fuzzy
Relational Databases Using Partitions,” Proc. of the
National Conference on Fuzzy Theory and Its
Applications, pp.629-636.
Zadeh L., 1965. “Fuzzy Sets,” Information and Control, 8,
pp.338-353.
Zvieli A. and Chen P. P., 1986“Entity-relationship
modeling and fuzzy databases,” Proc. of IEEE ICDE,
Los Angeles, pp.320-327.
Zemankova M. and Kandel A.,1984. “Fuzzy Relational
Data Bases – A key to Expert Systems,” Verlag TUV
Rheinland.
ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
168