XML DATA CONSTRAINT AND XINCAML

Jing Min Xu, Ying Nan Zuo, Shun Xiang Yang, Zhong Tian

IBM China Research Laboratory, 4/F, No.7, 5th Street, Shangdi, Beijing 100085, PRC

Keywords: XML Data Constraint, Constraint Specification Language, XincaML

Abstract: XML is becoming the de facto standard for data exchange. Because it brings structures and semantics to the

conten

ts, it is very important for applications to verify the validity of XML data before further processing.

W3C XML Schema language can specify many of the constraints in XML data, but it lacks of the capability

of expressing application specific inter-node constraints. Therefore XincaML (eXtensible inter-node

constraint Markup Language) is invented as a complement to XML Schema language to specify this kind of

application constraints. XincaML is a descriptive inter-node constraint specification language. XincaML

Processor is a reference Java implementation of the XincaML language parser and constraints checker.

Developers can easily integrate the processor into their applications to handle inter-node constraints besides

validating XML data against XML Schema. XincaML and the processor provide a common mechanism for

applications to describe and process inter-node constraints, thus significantly eliminate the need and labor to

hard-code the constraint handling in applications and speeds up the application development.

1 INTRODUCTION

More and more applications adopt XML [T. Bray,

2000] documents as method of data exchange. The

exchanged XML documents convey not only the

structure of the data but also application specific

semantics of them. It is very important for

applications, especially ones performing critical

commerce transactions, to ensure the validity of the

documents. Invalid data might lead to unexpected

application behavior or even denial of service. Since

XML data structures can be validated already, it is

only natural to be able to validate more XML data

constraints, e.g. relationships between elements or

attributes belonging to different contexts, invariants

over data models, limits over attribute values and so

on.

W3C XML Schema [D. C. Fallside, 2001] [H. S.

ompson, 2001], as a typical grammar-based

schema language [Dongwon Lee, 2000], has taken a

big step in that direction. It allows users to specify

various XML data constraint such as structures of

XML documents and data types of elements or

attributes. However, it is short of the ability to

specify data constraints among elements or attributes

located on different sub-branches of an XML

document tree. We call this kind of constraint as

inter-node constraint. Although such data

relationships in an XML document are almost

everywhere as part of application semantics, they

can’t be specified by current XML Schema language

easily, if at all.

W3C Schema Working Group also noticed the

ter-node constraint problem and considered adding

co-constraints into W3C XML Schema as one of

desiderata in the latest XML Schema requirements

working draft [C. Campbell, 2003]. But we still

think it is too rigid in that sense and diminishes the

flexibility of XML. In addition, it may make the

schema language too difficult to analyze.

In order to specify inter-node constraint, several

con

straint specification languages have been

proposed. Schematron [Rick Jelliffe, 2002] is a

typical one. It is a pattern-based language and more

constraint-oriented than XML Schema [Dongwon

Lee, 2000]. It is able to specify constraint patterns in

XML documents based on the presence, names and

value of elements and attributes along paths.

Schematron completely relies upon XPath [J. Clark,

1999] and XSLT [James Clark, 1999] for defining

context dependent rules. On one hand, it makes

Schematron quite concise. On the other hand, it

makes it more difficult for applications to analyze

the structure and definition of constraints, let alone

optimized code for efficient constraints handling.

This is the main motivation for inventing XincaML

(eXtensible inter-node constraint Markup

Language).

479

Min Xu J., Nan Zuo Y., Xiang Yang S. and Tian Z. (2004).

XML DATA CONSTRAINT AND XINCAML.

In Proceedings of the Sixth International Conference on Enterprise Information Systems, pages 479-485

DOI: 10.5220/0002652604790485

 SciTePress

Unlike XML Schema and Schematron, XincaML

is designed as a constraint specification language

rather than a schema language. Its constraint

expressions are more descriptive and declarative

than those of Schematron, so business rules that

applications need to check can be mapped to XML

data constraints more easily. XincaML concentrates

on descriptively expressing inter-node constraints

that XML Schema can not express. Hence, it is

considered as a helpful supplement of XML

Schema.

As a constraint specification language, XincaML

focuses more on descriptiveness. It not only makes

XincaML more like a natural language but also

enables XML developers to write more optimized

code for efficient constraint handling and to play

with the constraint definition structure itself when

needed. In addition, XincaML also gives users the

flexibility of applying XPath to XincaML to the

extent they like so that they can balance between a

concise expression and a descriptive one.

A XincaML Processor reference implementation

is already available for downloading from IBM

Alphaworks [Ying Nan Zuo, 2002]. It is a Java

package and provides APIs for constraints parsing

and checking. Applications are able to concentrate

on data processing by delegating the data validation

work to the processor. The violation handling

mechanism of the processor, which enables

callbacks of the application specific code for

violation handling, helps application developers

create cleaner program logic.

In the rest of this paper, we’ll first introduce

the basic concepts of XML data constraint, and then

discuss how XincaML expresses the inter-node

constraints and its advantages. The reference

implementation of XincaML Processor and several

usage scenarios are also introduced so as to give a

basic idea of how XML developers integrate

XincaML into their applications. Some future works

are presented in the end of the paper.

2 XML DATA CONSTRAINTS

Handling data constraints has been around for quite

sometime. In a database, data constraints are mostly

part of the database schema. The schema serves for

two purposes. First, it describes the structure or type

of the data; second, it describes certain constraints

including assertion of the keys and inclusion

dependencies. In general, all constraints on data can

be divided into two groups-integrity constraints and

data validity constraints. Integrity constraints (type

constraints, path constraints etc.) describe semantic

integrity of data. Data validity constraints describe

conditions of validity of data. [Ekaterina Pavlova,

2000]

Semi-structured data is a generation of structured

data in a sense, so it has integrity constraints and

data validity constraints similar to those in structured

data. XML data is usually treated as semi-structured

data, thus the constraints in semi-structured data can

mostly be applied to XML data. In practice, most of

real-world logical constraints to data are very

complex and not just pure integrity constraints or

data validity constraints. It is impossible to make a

complete taxonomy of all these constraints. But

some kinds of constraints are most commonly used

by lots of XML applications. It is more valuable to

investigate these kinds of constraints.

In general, the commonly used XML data

constraint can be classified as the following four

categories:

i. Containment structural constraint

(structures): This kind of constraint describes the

basic structure of XML documents such as element

hierarchies, attributes of a element, inheritance for

elements and attributes, cardinality of elements and

so on.

ii. Lexical structural constraint (data types):

This kind of constraint describes data types and data

formats in order to check the domain range of values

of elements or attributes as well as ensure they

follow certain formats.

iii. Integrity constraint (identity constraint):

This kind of constraint describes the reference

relationship between elements or attributes like the

key/foreign key mechanism in the relational

database.

iv. Inter-node constraint (co-constraint): This

kind of constraint describes the presence/value

dependencies between elements or attributes

belonging to the same or different sub-branches of

an XML document tree. It is usually the most

fundamental part of data semantics.

XML Schema as of today has already covered

the first three kinds of constraint, but it lacks of the

capability of expressing the inter-node constraints in

an XML document. XincaML is proposed to

complement it. Before we go into detail about

XincaML, let’s take a closer look at the inter-node

constraints.

First, a small piece of XML data is presented

below serving as an example of XML data that have

inter-node constraints.

<Name> John Smith </Name>

</Person>

<Name> Joan Smith </Name>

ICEIS 2004 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION

480

<Gender>Female </Gender>

</Person>

</Contacts>

At least one can find one inter-node constraint

here that if the title of a person is “Mr”, then the

gender of him must be “Male”. It’s hard to express

this data constraint using XML Schema language.

In order to depict the details of inter-node

constraint more clearly, let us simplify the XML

data model first. In most cases, what applications

care are the values and relationships of elements or

attributes in an XML document. So, in XincaML, we

only examine element nodes (and their values) and

attribute nodes (and their values).

Figure 1: A Visual Representation of Four Types of

Inter-node Constraints

The figure 1 provides a visual representation of

four types of inter-node constraints in an XML

document tree. The element node, attribute node,

element value and attribute value are represented as

EN, AN, EV and AV respectively.

The inter-node constraints are divided into the

following four types:

i. Presence-Presence Constraint (PPC): The

presence of a set of nodes depends on the presence

of another set of nodes.

ii. Presence-Value Constraint (PVC): The

presence of a set of nodes depends on the values of

another set of nodes.

iii. Value-Presence Constraint (VPC): The

values of a set of nodes depend on the presence of

another set of nodes.

iv. Value-Value Constraint (VVC): The values

of a set of nodes depend on the values of another set

of nodes.

Since the nodes involved in an inter-node

constraint do not necessarily have any direct

structural relationships among them, it is difficult for

a grammar-based schema language such as W3C

XML Schema to express these constraints.

It is a general practice for application developers

to validate data against structure definitions. They’d

like to have as much application semantics checked

as possible before the data is further processed. A

common expression of the inter-node constraints

enables application developers to relegate the

constraint checking work to any third party tools so

that they can focus on the application logic.

XincaML is such a common mechanism for

expressing the inter-node constraints.

3 XINCAML LANGUAGE AND

PROCESSOR

First of all, XincaML is in XML syntax, which

brings several benefits:

1) Users do not have to learn another proprietary

syntax.

2) The expressed constraints can be readily

applied to existing XML applications.

3) The expressed constraints can be stored in an

XML storage system along with XML documents

they are applied to.

4) The expressed constraints are extensible.

A XincaML document may contain one or more

constraint declarations. The <constraint> element is

used to declare a constraint. Multiple constraints can

be grouped together with the element <constraints>

as a constraint group. Two attributes are defined for

the <constraint> element. One is the “name”

attribute, which is used to uniquely identify a

constraint within a constraint group. The other is the

“context” attribute, which specifies the scope of a

constraint. The context attribute is required to be

expressed as an XPath path expression. For example,

the following constraints are defined:

<xinca:constraints>

<xinca:constraint name=”titleConstraint”

context=”/Contacts/Person”>

… …

</xinca:constraint>

<xinca:constraint name=”otherConstraint”

context=”/Contacts/Other”>

… …

XML DATA CONSTRAINT AND XINCAML

481

<xinca:constraint>

</xinca:constraints>

The constraint “titleConstraint” and

“otherConstraint” are used to constrain the nodes

located in the different sub-trees of an XML

document: the “/Contacts/Person” sub-tree for the

former and the “/Contacts/Other” sub-tree for the

latter.

Each constraint is expressed as a rule, which

usually contains three segments: <if>, <then> and

<action>. The <if> and <then> segments are both

required to state an assertion. If the assertion in the

<if> segment holds true, the assertion in the <then>

statement is also required to be true. If the assertion

in the <then> statement is not true, the constraint is

violated. However, if the assertion in the <if>

segment holds false, the constraint will be ignored

by the constraint checker. The <action> segment

acts as a placeholder for applications to insert their

particular constraint handling mechanism when the

constraint is violated. XincaML defines the

<message> element as a default violation handling

method: reporting a message.

Each assertion states which nodes are present

(presence assertion) or their values satisfy an

expression (value assertion). The expression should

be any expression which evaluation is a Boolean

value such as a logical expression, a comparison

expression or a logical expression. [Ying Nan Zuo,

2002] For example, the constraint “titleConstraint”

can be expressed as follows:

<constraint name="titleGenderConstraint"

context="/Contacts/Person">

<if>

<eq>

</eq>

</satisfy>

</assert>

</if>

<then>

<eq>

</eq>

</satisfy>

</assert>

</then>

If the title is "Mr." then the gender of the person

must be "Male".

/message>

</action>

</constraint>

This constraint specifies that if the title of a

person is “Mr”, then the gender of the person must

be “Male”. Thus, XincaML is quite straightforward

due to its descriptiveness. However, the side effect

is that it makes XincaML sort of verbose. In order to

resolve this problem and fully leverage the

functionality of XPath, XincaML allows expressing

assertions or rules as XPath expressions. For

examples, the “titleConstraint” can be expressed as

follows:

<constraint name="titleGenderConstraint"

context="/Contacts/Person">

<xpath exp="(@Title = 'Mr' and Gender = 'Male') or

@Title != 'Mr' ” />

the title is "Mr." then the gender of the person

must be "Male".

</message>

</action>

</constraint>

This feature gives users the flexibility of

applying XPath expressions to XincaML to the

extent they feel comfortable. But expressing

assertions or rules as XPath expressions brings the

following two disadvantages:

1) More devious for users to map business rules

to XincaML constraints if they are not comfortable

with XPath.

2) More difficult for applications to analyze the

structure of XincaML constraints themselves.

So users have to balance the advantages of

expressing assertions or rules as XPath expressions

against the disadvantages based on application

scenarios as well as XincaML processor they intend

to use.

XincaML enhanced XML data validation is a

multiple-stage process that begins with W3C XML

Schema validation, followed by XincaML checking

to handle the inter-node constraints. Like XML

parsers, which relieve applications of validating

XML documents based on predefined schemas,

XincaML processors relieve applications of parsing

and checking inter-node constraints expressed by

XincaML.

A reference XincaML processor implementation

is available from IBM Alphaworks. It contains two

Java packages:

1) Constraint Parser, which parses XincaML

constraints into a XincaML object tree so that

ICEIS 2004 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION

482

applications are able to navigate the tree to know the

structures of each constraint.

2) Constraint Checker, which checks XincaML

constraints against an XML document based on the

constructed XincaML object tree.

The architecture of the XincaML Processor is

depicted in the figure 2:

Figure 2: XincaML Processor Architecture

The XincaML Processor has a violation handling

mechanism that allows application developers

registering a violation handler for each constraint by

specifying a <action> element to add custom

application specific processing. For example:

<constraint name="titleGenderConstraint"

context="/Contacts/Person">

… …

<java:class>xincaparser.SampleHandler

</java:class>

<java:method>showMessage</java:method>

</handler>

</action>

</constraint>

For the reference implementation, when a

constraint violation occurs, the Constraint Checker

will call the corresponding violation handler by use

of Java reflection mechanism. The reference

implementation also contains a default violation

handler. An error message will be generated as

return result if the checker does not find a registered

violation handler.

In order to boost performance, the XincaML

processor constructs an in-memory dependency

graph that records which nodes are relevant to which

constraints when parsing XincaML constraints.

Therefore, it allows applications only checking

constraints that are relevant to a specific node when

its value changes. However, the dependency graph

does not include the constraints that express

assertions or rules as XPath expressions because

XincaML does not reveal their structures.

4 USAGE SCENARIOS OF

XINCAML

A typical use of XincaML processor is to use it as a

guardian for XML applications. Any XML data must

be validated first before they are fed into an

application. If the processor tells the application that

everything is ok, then the application goes on.

Otherwise, the application should stop and deal with

the data constraint violations through predefined

violation handlers. The XincaML processor allows

XincaML constraints to be embedded in

xs:annotation / xs:appinfo W3C XML Schema

elements, seamlessly combining W3C XML schema

and XincaML. Thus it makes the XincaML

enhanced XML Schema language compatible with

conventional XML Schema validation, and allowing

applications to put in additional validation and

processing when appropriate.

In some cases, applications may not be willing to

be stopped by the processors. They just want the

processors to go through the input data, collect all

the information about violated elements and

attributes and then report to them. The applications

may handle the constraint violations themselves later.

XincaML Processor provides several violations

reporting APIs [Ying Nan Zuo, 2002] to offer

applications such kind of choices. With these APIs,

an application can exactly locate the elements and

attributes that violate the constraints.

In many visual modeling tools, a lot of

application specific constraints (including syntax

constraints and semantic constraints) exist among

drawing elements. Usually, people use this kind of

tools as thinking tools. They don’t like to be

interfered by such constraint details. It’s better to

deal with these constraints when the main body of

the model is finished. XincaML can help a lot in this

tools. For example, a state machine modeling tool

which represents a state machine as an XML

document, may use XincaML language to specify

various constrains between states in a state machine

and use XincaML processor to assist the well-form

checking of the machine. Application specific

constraints such as “There should be one and only

one starting state in a state machine”, “The transition

target of a state must be an existing state in the state

machine” and so on, are all mapped into XincaML

constraint specification. After drawing a state

machine using the modeling tool, a user can invoke

the well-form checking function. The tool will

XML DATA CONSTRAINT AND XINCAML

483

delegate the checking work to XincaML processor.

Then, all constraints violations are reported through

violations reporting APIs. The modeling tool can

make use of these APIs to locate the violated states

and highlight them to the user. Figure 3 illustrates

the use of XincaML processor in this modeling tool.

Figure 3: The Usage of XincaML Processor in State

Machine Modeling Tool

In other cases, applications may want to know

the detail syntax of each inter-node constraint so that

they can generate their own codes to do the

constraint checking. In these cases, only the

XincaML language parser is needed. XincaML

Processor Java package provides users a lot of

classes, such as XincaParser class, XincaConstraint

class, XincaAssert class, XincaLogicalExpression

class and so on, to manipulate each component of a

constraint definition. [Ying Nan Zuo, 2002]

To demonstrate this kind of usage scenario, we

take D3From [Shun Xiang Yang, 2003] as an

example. D3From is a dynamic Web form generator,

which is used in an adaptive profiling framework for

service provisioning [Shun Xiang Yang, 2003]. The

framework provides flexible profiling mechanism to

dynamically collect relevant information about

service consumers according to diverse profiling

requirements of different services in an e-Commerce

platform. D3Form is used to generate different Web

forms for different services according to their

profiling requirements. The service profiling

requirements are defined by use of XML Schema

plus XincaML language. D3From takes the schema

part as data and HTML form definitions, and takes

XincaML part as constraints among the data

elements that will be collected by the form. Figure 4

shows some details.

The HTML Form Generator is used to map XML

schema into HTML form controls and apply XSL

document on the generated form to control its

appearance. The JavaScript Generator makes use of

XincaML APIs to parse the XincaML constraint

specifications and convert them to JavaScript codes.

The generated scripts serve as inter-node constraint

checker in client side when someone fills in the

forms with data. Further more, if you specify the

dependencies between form controls with PPC

(Presence-Presence Constraint) or VPC (Value-

Presence Constraint) style of constraints, the

generated scripts are able to change the layout of the

form based on users’ input.

Figure 4: The using of XincaML in D3Form Processor

ICEIS 2004 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION

484

5 CONCLUSION AND FUTURE

WORK

XincaML can describe some inter-node constraints

that can’t be expressed by the current XML schema

language and therefore supplement it to capture

application specific data constraints. The purpose of

XincaML is to specify constraints that most often

used by XML applications. For applications that

require other arbitrary or complicated data

constraints, they must perform their own additional

validations.

XincaML Processor can validate XML

documents against constraints specified in XincaML

instances. The advantage of using XincaML in

applications is that many of the constraints that

previously had to be checked in applications can

now be moved out of them and delegated to

XincaML Processor. It can make users concentrate

on the application processing logic itself and thereby

create clearer application logic models.

Currently, we are thinking of adding some new

features to XincaML. For example, supporting more

logic expressions and constraints, checking of

implication and conflicts among several constraints

specified in one XincaML instance and applying

constraints to multiple XML documents.

ACKNOWLEDGEMENT

We would like to thank Bob Schloss

[rschloss@us.ibm.com], of IBM Watson Research

Center, for his great contribution to the concept of

XincaML.

REFERENCES

T. Bray, J. Paoli, C. M. Sperberg-McQueen and E. Maler,

2000. Extensible Markup Language (XML) 1.0

(Second Edition). W3C Recommendation,

http://www.w3.org/TR/REC-xml, World Wide Web

Consortium, Oct. 2000.

D. C. Fallside, 2001. XML Schema Part 0: Primer. W3C

Recommendation, http://www.w3.org/TR/xmlschema-

0, World Wide Web Consortium, May. 2001.

H. S. Thompson, D. Beech, M. Maloney and N.

Mendelsohn, 2001. XML Schema Part 1: Structures.

W3C Recommendation,

http://www.w3.org/TR/xmlschema-1, World Wide

Web Consortium, May. 2001.

Dongwon Lee, Wesley W. Chu, 2000. Comparative

Analysis of Six XML Schema Languages. ACM

SIGMOD Record, Vol. 29, No. 3, Sep. 2000.

C. Campbell, Ashok Malhotra and Priscilla Walmsley,

2003. Requirements for XML Schema 1.1. W3C

Working Draft,

http://www.w3.org/XML/Group/2002/07/xmlschema-

1.1-current-reqs.html#N4000FC, Jan. 2003.

Rick Jelliffe, 2002. The Schematron Assertion Language

1.5.

http://www.ascc.net/xml/resource/schematron/Schema

tron2000.html, Oct. 2002

J. Clark and S. DeRose, 1999. XML Path Language

(XPath) Version 1.0. W3C Recommendation,

http://www.w3.org/TR/xpath, World Wide Web

Consortium, Nov. 1999.

James Clark, 1999. XSL Transformations (XSLT) Version

1.0. W3C Recommendation

http://www.w3.org/TR/xslt, World Wide Web

Consortium, Nov. 1999.

Zuo Ying Nan, Jing Min Xu and Shun Xiang Yang, 2002.

XincaML Technology. IBM Alphaworks,

http://www.alphaworks.ibm.com/tech/xincaml. Dec,

2002.

Ekaterina Pavlova , Igor Nekrestyanov and Boris

Novikov, 2000. Constraints for Semistructured Data.

Proc. of the Russian DL'2000, 214-219, Protvino,

Russia, September 2000.

Shun Xiang Yang, Ying Nan Zuo , Jing Min Xu, Zhong

Tian, 2003. Adaptive Profiling Framework and System

for Service Provisioning. IEEE Conference on

Electronic Commerce, USA 2003

C. Nentwich, W. Emmerich and A. Finkelstein, 2001.

Static Consistency Checking for Distributed

Specifications. Proceedings of the 16th International

Conference on Automated Software Engineering

(ASE), Coronado Island, CA, IEEE Computer Science

Press, pages 115-124. November 2001.

Eric van der Vlist, 2001. Comparing XML Schema

Languages.

http://www.xml.com/pub/a/2001/12/12/schemacompar

e.html, Dec. 2001

Will Provost, 2002. Beyond W3C XML Schema.

http://www.xml.com/pub/a/2002/04/10/beyondwxs.ht

ml, Apr. 2002.

M. H. Jacinto, G. R. Librelotto, J. C. Leite Ramalho and P.

R. Henriques, 2002. Constraint Specification

Languages: comparing XCSL, Schematron and XML-

Schemas. XML EUROPE 2002, May 2002.

Ramalho, José C. and Henriques, Pedro R, 2001.

Constraining Content: Specification and Processing.

XML Europe 2001, International Congress Centrum

(ICC), Berlin, Germany. 2001

XML DATA CONSTRAINT AND XINCAML

485