Unlike XML Schema and Schematron, XincaML
is designed as a constraint specification language
rather than a schema language. Its constraint
expressions are more descriptive and declarative
than those of Schematron, so business rules that
applications need to check can be mapped to XML
data constraints more easily. XincaML concentrates
on descriptively expressing inter-node constraints
that XML Schema cannot express. Hence, it can be
considered a helpful supplement to XML
Schema.
As a constraint specification language, XincaML
emphasizes descriptiveness. This emphasis not only
makes XincaML read more like a natural language but
also enables XML developers to write more optimized
code for efficient constraint handling and to
manipulate the constraint definition structure itself
when needed. In addition, XincaML gives users the
flexibility to use XPath in XincaML to whatever
extent they like, so that they can strike a balance
between a concise expression and a descriptive one.
A XincaML Processor reference implementation
is already available for download from IBM
Alphaworks [Ying Nan Zuo, 2002]. It is a Java
package that provides APIs for constraint parsing
and checking. Applications can concentrate on data
processing by delegating data validation to the
processor. The processor's violation handling
mechanism, which calls back application-specific
code when a violation occurs, helps application
developers create cleaner program logic.
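The callback-based violation handling described above can be illustrated with a minimal sketch. It is written in Python for brevity and does not reproduce the XincaML Processor's Java API. The names `Constraint`, `validate`, and `record_violation`, the order document, and the total-matches rule are all hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical names throughout: this is NOT the XincaML Processor API,
# only an illustration of delegating validation to a checker that reports
# violations through an application-supplied callback.


@dataclass
class Constraint:
    name: str
    context: str                         # ElementTree path selecting context nodes
    check: Callable[[ET.Element], bool]  # True when the node satisfies the rule


def validate(root: ET.Element,
             constraints: List[Constraint],
             on_violation: Callable[[Constraint, ET.Element], None]) -> int:
    """Check each constraint on its context nodes; call back on every failure."""
    violations = 0
    for constraint in constraints:
        for node in root.findall(constraint.context):
            if not constraint.check(node):
                violations += 1
                on_violation(constraint, node)
    return violations


# Assumed sample data: the second order's Total contradicts Quantity * UnitPrice.
ORDERS_XML = """<Orders>
  <Order id="1"><Quantity>2</Quantity><UnitPrice>5</UnitPrice><Total>10</Total></Order>
  <Order id="2"><Quantity>3</Quantity><UnitPrice>5</UnitPrice><Total>12</Total></Order>
</Orders>"""

# Assumed inter-node rule: Total must equal Quantity times UnitPrice.
total_matches = Constraint(
    name="total-matches",
    context="Order",
    check=lambda order: int(order.findtext("Quantity"))
    * int(order.findtext("UnitPrice")) == int(order.findtext("Total")),
)

reported = []


def record_violation(constraint: Constraint, node: ET.Element) -> None:
    # Application-specific handling lives here, separate from data processing.
    reported.append((constraint.name, node.get("id")))


count = validate(ET.fromstring(ORDERS_XML), [total_matches], record_violation)
```

After the run, `count` holds the number of violations and `reported` holds the entries collected by the callback, so the application's data-processing code contains no validation logic of its own.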
In the rest of this paper, we first introduce
the basic concepts of XML data constraints and then
discuss how XincaML expresses inter-node
constraints and what advantages it offers. The
reference implementation of the XincaML Processor
and several usage scenarios are also introduced to
give a basic idea of how XML developers can integrate
XincaML into their applications. Future work
is presented at the end of the paper.
2 XML DATA CONSTRAINTS
Data constraint handling has been around for quite
some time. In a database, data constraints are mostly
part of the database schema. The schema serves
two purposes. First, it describes the structure or type
of the data; second, it describes certain constraints,
including the assertion of keys and inclusion
dependencies. In general, all constraints on data can
be divided into two groups: integrity constraints and
data validity constraints. Integrity constraints (type
constraints, path constraints, etc.) describe the
semantic integrity of data. Data validity constraints
describe the conditions under which data is valid
[Ekaterina Pavlova, 2000].
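As a brief illustration of constraints that live in a database schema, the following sketch declares a key and an inclusion dependency (a foreign key) with Python's built-in sqlite3 module. The tables `department` and `employee` and their rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# The schema describes both the structure of the data and its constraints:
# a key on each table and an inclusion dependency from employee to department.
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(id)
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Research')")


def try_insert(sql: str) -> str:
    """Run one INSERT and report whether the schema accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


valid_row = try_insert("INSERT INTO employee VALUES (1, 'Alice', 1)")
duplicate_key = try_insert("INSERT INTO department VALUES (1, 'Sales')")
dangling_ref = try_insert("INSERT INTO employee VALUES (2, 'Bob', 99)")
```

The database itself rejects the duplicate key and the dangling reference, so the application does not have to check them by hand.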
Semi-structured data is, in a sense, a generalization
of structured data, so it has integrity constraints and
data validity constraints similar to those of structured
data. XML data is usually treated as semi-structured
data, so the constraints on semi-structured data
mostly apply to XML data as well. In practice, most
real-world logical constraints on data are very
complex and are not purely integrity constraints or
data validity constraints. It is impossible to build a
complete taxonomy of all these constraints. However,
some kinds of constraints are commonly used
by many XML applications, and these are the ones
most worth investigating.
In general, the commonly used XML data
constraints can be classified into the following four
categories:
i. Containment structural constraint
(structures): This kind of constraint describes the
basic structure of XML documents such as element
hierarchies, attributes of an element, inheritance for
elements and attributes, cardinality of elements and
so on.
ii. Lexical structural constraint (data types):
This kind of constraint describes data types and data
formats in order to check the domain range of values
of elements or attributes as well as ensure they
follow certain formats.
iii. Integrity constraint (identity constraint):
This kind of constraint describes the reference
relationship between elements or attributes like the
key/foreign key mechanism in the relational
database.
iv. Inter-node constraint (co-constraint): This
kind of constraint describes the presence/value
dependencies between elements or attributes
belonging to the same or different sub-branches of
an XML document tree. It is usually the most
fundamental part of data semantics.
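To make the fourth category concrete, the sketch below checks one presence dependency by hand with Python's standard ElementTree module. The payment document and the rule (a `Payment` whose `method` is "card" must contain a `CardNumber`) are assumptions chosen for illustration; they are not taken from XincaML.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: payment p2 claims a card payment but has no card number.
PAYMENTS_XML = """<Payments>
  <Payment id="p1" method="card"><CardNumber>4111</CardNumber></Payment>
  <Payment id="p2" method="card"/>
  <Payment id="p3" method="cash"/>
</Payments>"""


def missing_card_numbers(root: ET.Element) -> list:
    """Return ids of Payment elements that break the assumed co-constraint.

    Assumed rule: the value of the method attribute decides whether the
    CardNumber child element must be present.
    """
    return [
        payment.get("id")
        for payment in root.findall("Payment")
        if payment.get("method") == "card" and payment.find("CardNumber") is None
    ]


violating_ids = missing_card_numbers(ET.fromstring(PAYMENTS_XML))
```

The rule relates an attribute value to the presence of a sibling node, which is the kind of dependency that structure, data type, and identity constraints alone do not capture.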
XML Schema today already covers
the first three kinds of constraint, but it lacks the
capability to express inter-node constraints in
an XML document. XincaML is proposed to
complement it. Before we go into detail about
XincaML, let’s take a closer look at inter-node
constraints.
First, a small piece of XML data is presented
below as an example of XML data that has
inter-node constraints.
<Contacts>
  <Person title="Mr">
    <Name> John Smith </Name>
    <Gender>Male </Gender>
  </Person>
  <Person title="Ms">
    <Name> Joan Smith </Name>
ICEIS 2004 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION