on different sets of tags and a schema for represent-
ing different kinds of data. In this way XML can be
adopted as a framework for exchanging knowledge
information among data mining systems.
Although some work has already been proposed
in this context, such as PMML (Data Mining Group,
2003), our approach differs in that we concentrate
on the representation of logic formulas. In
doing so, we allow the data mining user
to express data (i.e., facts over a database schema),
database constraints (such as functional dependencies)
and extended association rules (i.e., patterns more
sophisticated than those allowed by the association rule
module of PMML). Moreover, as XRM is built over
XML Schema, it can ensure a certain level of correctness
of the data and patterns by defining integrity constraints.
The following example shows that XRM can
simplify the task of describing sophisticated association
rules.
Example 1.1 Consider the association rule R1: 80% of
the customers who buy bread and butter also buy milk
(i.e., confidence=0.80). Moreover, this holds in 20% of
the transactions (i.e., support=0.20). Rule R1 can be
easily represented in PMML. Now consider the association
rule R2, with confidence=0.50 and support=0.25:
Customers buying dairy products in June are professors
doing local shopping. The representation of R2 in PMML
is extremely complicated, artificial and time-consuming,
since we first need to combine several tables into a
universal table and then transform it into a binary table. The
reason for this is that the DTD proposed by PMML
cannot express predicates, variables or quantifiers. To
express R2 we need a tool capable of expressing the
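logic formula
$$(\forall x, \exists p, \exists v_1, \exists y_1, \exists s_1, \exists v_2, \exists y_2, \exists s_2)$$
$$\bigl(\mathit{Cust}(x, p, v_1) \wedge \mathit{Sale}(x, y_1, s_1, \text{"June"}) \wedge \mathit{Prod}(y_1, \text{"dairy"}) \Rightarrow (\mathit{Cust}(x, \text{"professor"}, v_2) \wedge \mathit{Sale}(x, y_2, s_2, \text{"June"}) \wedge \mathit{Store}(s_2, v_2))\bigr).$$
□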
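To make the reading of the formula for R2 concrete, the following Python sketch evaluates its antecedent and consequent over a few toy relations. It is only an illustration under stated assumptions, not part of XRM: the tuples are invented, the attribute meanings (customer, profession, city; customer, product, store, month; product, category; store, city) are inferred from the example, and counting confidence and support per customer is our assumption, since the measures for such rules are not defined in this part of the paper.

```python
from fractions import Fraction

# Toy relations (invented data; attribute meanings are assumptions):
# Cust(customer, profession, city), Sale(customer, product, store, month),
# Prod(product, category), Store(store, city)
CUST = {
    ("c1", "professor", "Blois"),
    ("c2", "professor", "Blois"),
    ("c3", "student", "Tours"),
    ("c4", "professor", "Tours"),
}
SALE = {
    ("c1", "milk", "s1", "June"),
    ("c2", "cheese", "s2", "June"),
    ("c3", "milk", "s2", "June"),
    ("c4", "bread", "s2", "June"),
}
PROD = {("milk", "dairy"), ("cheese", "dairy"), ("bread", "bakery")}
STORE = {("s1", "Blois"), ("s2", "Tours")}


def antecedent(x):
    """Exists p, v1, y1, s1: Cust(x, p, v1), Sale(x, y1, s1, 'June'), Prod(y1, 'dairy')."""
    is_customer = any(c == x for (c, _p, _v) in CUST)
    buys_dairy_in_june = any(
        c == x and m == "June" and (y, "dairy") in PROD
        for (c, y, _s, m) in SALE
    )
    return is_customer and buys_dairy_in_june


def consequent(x):
    """Exists v2, y2, s2: Cust(x, 'professor', v2), Sale(x, y2, s2, 'June'), Store(s2, v2)."""
    return any(
        c == x and p == "professor" and c2 == x and m == "June" and (s, v) in STORE
        for (c, p, v) in CUST
        for (c2, _y, s, m) in SALE
    )


customers = sorted({c for (c, _p, _v) in CUST})
matching = [x for x in customers if antecedent(x)]    # antecedent holds
satisfying = [x for x in matching if consequent(x)]   # consequent holds as well

# Assumed per-customer measures (not the paper's definitions).
confidence = Fraction(len(satisfying), len(matching))
support = Fraction(len(satisfying), len(customers))
print(f"confidence={confidence}, support={support}")
```

Note that the shared variable v2 in Cust and Store is what encodes "local shopping": the store of the June sale must be located in the customer's own city. A flat item list cannot express such a join between relations.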
The main contributions of this paper are:
• The introduction of an XML-based language to represent
logic formulas. This language allows the representation
of data, constraints and patterns (as logical
formulas). Thus, XRM offers flexibility to systems
that extend association rule mining by considering
not only simple rules over a single database relation
but also queries involving base relations and views. It
is a general framework for any system based on logic
formulas (not only association rules).
• The possibility for mining systems to
present their results in an exchangeable format, making
it easy for other tools to work on them. In fact,
different tasks can be accomplished over the results
of a mining process: their integration with the results
coming from other mining systems, their use in other
mining processes, their graphical representation, their
manipulation by XML query languages, etc.
• The use of XML Schema (W3C, 2001) in the definition
of XRM, which ensures a certain level of correctness
of the data to be mined and allows the automatic verification
of patterns produced by mining systems.
This paper is organized as follows. Section 2 discusses
some related work. Section 3 recalls the data
mining concepts to be represented in our language.
Section 4 presents XRM through a general schema and
discusses some of its details. Section 5 concludes
with some further work.
2 RELATED WORK
The Predictive Model Markup Language (PMML),
proposed by (Data Mining Group, 2003), is a format
for exchanging patterns among systems. It is an XML-based
language, built over a DTD, for describing data
mining models. Part of this DTD concerns rule models
and focuses on association rules. This specification
is, however, restricted to relatively simple rule models,
since only transactions with a single attribute can
be represented. Variables, negation, quantifiers, connectives
and multi-dimensional association rules cannot
be expressed in PMML. The Rule Model
proposed by (Wettschereck and Müller, 2001) modifies
the DTD of PMML for association rules. In this
way, it can describe multi-relational association rules.
In that approach, an association rule consists of a set
of literals, but quantifiers and connectives cannot be
represented.
Contrary to the above approaches, XRM is built
over XML Schema. Thus, XRM can impose constraints
that cannot be expressed with
a DTD. Moreover, as XRM allows the representation
of logic formulas, sophisticated multi-relational
association rules can be treated.
XDM (Meo and Psaila, 2003) uses XML as a unifying
framework for inductive databases (Imielinski and
Mannila, 1996) and, more generally, for knowledge
discovery systems. It is devised to capture the KDD
process and allows the storage of the derivation process
(described by statements). In fact, XRM can be
seen as a complement to XDM: as XDM is independent
of any specific format for data and patterns,
data and patterns represented by XRM
documents could be stored in an inductive database
based on XDM.
3 MINING DATABASES
We assume that the reader is familiar with the basics
of relational databases and first-order logic. We just
recall some definitions and notations used in this paper.
A relation schema is a relation name R and a
database schema is a nonempty finite set R of relation
schemas. In the named perspective, names
of attributes are considered and a tuple u (with sort
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION