
implementors alike; this can then be used for im-
provements in planning and implementation. Ide-
ally, the process will result in a feedback loop.
3. Of particular importance, prototyping eases the
integration of subsystems. Since software systems,
and especially Web-based ones, consist of various
logical and physical layers, it is highly desirable
to develop and improve the different subsystems
independently of each other as far as possible.
In this paper we outline a framework for XML
database application development which addresses
these issues. In a stepwise refinement prototyping
process, we move from general-purpose XML tools
deployed in the first step to tools that allow domain
modeling in the refinement steps. The first step in the
development chain consists of using XQuery (Chamberlin
et al., 2001) on a file-based storage backend.
Subsequent steps include switching to a DBMS
with automated XML-to-database mapping, adding
annotations with domain knowledge, and, eventually,
using modeling techniques to ensure both efficient
data access and data integrity through the
specification of constraints. Thus, prototyping goes
through several refinement steps and employs
increasingly specific tools, which is made possible
by drawing more and more on domain knowledge.
The final step is
then a database that no longer consists of automated
mappings with surrogate identifiers but of an
E/R-type data model (Thalheim, 2000). To achieve
this, we have defined a mapping language that en-
sures smooth interaction of XML tools and relational
databases. We have employed this development pro-
cess during development of an example application,
and we give performance numbers and improvements
for the prototype through the steps in the develop-
ment process. The implementation was carried out
in a number of student projects.
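The first step of the development chain described above, querying XML held in plain files, can be illustrated with a minimal sketch. It is not the tooling used in this work: Python's standard library offers no XQuery processor, so the sketch assumes ElementTree traversal as a stand-in for an XQuery FLWOR expression. The sample document, the element names and the function titles_by_author are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. In a file-based prototype it would be
# loaded from disk instead, e.g. with ET.parse(path).getroot().
SAMPLE = """
<library>
  <book year="2001"><title>XQuery Basics</title><author>Doe</author></book>
  <book year="1999"><title>Relational Storage</title><author>Roe</author></book>
  <book year="2002"><title>XML Mapping</title><author>Doe</author></book>
</library>
"""


def titles_by_author(root, author):
    """Return the titles of all books by the given author, in document order."""
    # Roughly corresponds to the XQuery expression
    #   for $b in //book where $b/author = $author return $b/title
    return [
        book.findtext("title")
        for book in root.iter("book")
        if book.findtext("author") == author
    ]


root = ET.fromstring(SAMPLE)
doe_titles = titles_by_author(root, "Doe")
print(doe_titles)
```

Because the query is phrased against the document structure only, the same function could later be served by a different storage backend without changing its callers, which is the kind of decoupling the refinement steps rely on.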
The organization of the rest of this paper is as fol-
lows. In Section 2 we give an overview of related
work. Section 3 describes the general layout of our
framework. In Section 4 we describe in detail the dif-
ferent steps of the development process. In Section 5
we present a number of measurements that reflect the
performance characteristics of the different steps and
hint at some trade-offs to be considered. In Section 6
we discuss the use of the framework in the context
of document-centric XML documents. Finally, in
Section 7, we conclude the paper and outline topics
for future research.
2 RELATED WORK
The general validity of rapid prototyping for XML
applications has been demonstrated in various industry
projects (see, e.g., (e-XMLmedia, )), although usually
only one prototype is developed in order to demonstrate
both the feasibility of the undertaking and the
user interface. In our framework, this is equivalent to
the first step as laid out below.
Figure 1: General setting of our research (diagram labels: Frontend, Backend, WWW).
In (Orsini and Celentano, 2002), Orsini and Celen-
tano propose a development environment that can aid
data engineers in mapping between database schemas
and XML DTDs. This process is essentially bidirec-
tional: it enables data transfer between the two sides,
and the generation of programs and DTDs for executing,
validating and safeguarding the data exchange
process. Furthermore, in (Florescu and A. Grünhagen,
2003), the authors present a language for implementing
middleware functionality like Web services that
could also play a role in the API specification that is
part of our framework.
With respect to databases, there have been several
studies on mapping from XML to relational tables,
and on how to store and query XML data in an RDBMS
based on these mappings. For example, in (Florescu and
Kossmann, 1999), Florescu and Kossmann present
mappings from XML to general relational tables;
in (Schmidt et al., 2000), Schmidt et al. present a
data and an execution model that allow for efficient
storage and retrieval of XML documents in a relational
database based on binary associations. The
main problem of mapping from XML to relational tables
is that, in order to achieve good performance,
different mappings are needed for different data and
workloads. To solve this problem, Bohannon et
al. (Bohannon et al., 2002) developed a cost-based
XML storage mapping engine that, based on models
of XML schema, data statistics and workload, tries
to find the best mapping for a given application
according to a cost model. In (Freire and Siméon, 2002)
the authors propose a framework for implementing
these considerations. In (Shanmugasundaram
et al., 1999), a mapping that ‘imitates’
E/R modeling on top of XML documents is presented;
it is a variation of one of the mappings we also use in
our implementation and performance study.
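To make the idea of a generic XML-to-relational mapping more concrete, the following minimal sketch shreds a document into a single edge-style table and answers a path query with a self-join. It is an illustration under stated assumptions, not the schema of any of the cited approaches and not the mapping language of our framework: the table name edge, its columns and the function shred are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element as one row: (id, parent id, ordinal, tag, text)."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, "
        "ord INTEGER, tag TEXT, value TEXT)"
    )
    next_id = 0

    def visit(elem, parent, ordinal):
        nonlocal next_id
        next_id += 1
        node_id = next_id  # surrogate identifier generated by the mapping
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (node_id, parent, ordinal, elem.tag, text),
        )
        # Children are visited in document order; ord keeps sibling position.
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)


conn = sqlite3.connect(":memory:")
shred(
    "<library><book><title>A</title></book>"
    "<book><title>B</title></book></library>",
    conn,
)

# The path book/title becomes a self-join on the edge table.
rows = conn.execute(
    "SELECT t.value FROM edge b JOIN edge t ON t.parent = b.id "
    "WHERE b.tag = 'book' AND t.tag = 'title' ORDER BY t.id"
).fetchall()
titles = [r[0] for r in rows]
print(titles)
```

Each additional path step costs another join in such a generic layout, which is one reason why the best mapping depends on the data and the workload.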
The reverse process, generating and publishing
XML data from relational sources, is addressed, for
example, in the Agora system (Manolescu et al.,
2000); there, XML is employed as the user interface
format, while relational tuples are used to represent