DATA CONCERN AWARE QUERYING FOR THE
INTEGRATION OF DATA SERVICES
Muhammad Intizar Ali, Reinhard Pichler
Database and Artificial Intelligence Group, Vienna University of Technology, Vienna, Austria
Hong-Linh Truong, Schahram Dustdar
Distributed Systems Group, Vienna University of Technology, Vienna, Austria
Keywords:
Data Services, Database as a Service, Data Services Integration, Data Concerns, XQuery.
Abstract:
There is an increasing trend for organizations to publish data over the web using data services. The published
data is often associated with data concerns like privacy, licensing, pricing, quality of data, etc. This raises
several new challenges. For instance, it must be ensured that data consumers utilize the data in the right
way and are bound to the rules and regulations defined by the data owner and data service provider. Current
Data Integration systems using data services lack the ability to preserve data concerns while querying multiple
services in an integrated environment. In this paper, we design a new querying system which takes data
concerns into account. It supports automatic service selection based on data concerns and perfectly fits into
dynamic data integration applications.
1 INTRODUCTION
More and more data providers are reaping the benefits of Web 2.0 technology and provide their data on
the web either through web services, APIs (REST/SOAP), or data services, also referred to as Data
as a Service (DaaS) (Dan et al., 2007; Hacigümüs et al., 2002). Data services combine the strength
of database systems and query languages on the one hand with the benefits of service-oriented
architecture on the other hand. Many big companies have started to publish their data via query
interfaces (rather than simple forms in HTML) so that the data can be easily reused, composed and
integrated with other data sources. Amazon Public Data Sets on AWS [2], Google Squared [3] and the
UN data APIs are some prominent examples of publicly available data services.
Data services are increasingly used for data integration. Many tools and techniques are available to
dynamically compose, integrate and execute different data services or sources [4] (Mykletun and
Tsudik, 2006). These tools help to create situational applications by composing existing data
services. The data thus published and processed is often associated with data concerns like privacy,
licensing, pricing, quality of data, etc. Hence, data integration tools not only have to mitigate the
heterogeneity in data formats and query languages; they also have to preserve the various data
concerns when data is published and utilized. Moreover, data service selection and data selection
should be based on these data concerns. Consider, for example, a meta-search query (a query that is
posed against many data sources and selects the best possible integrated results among them). A
user query will be executed on multiple data services registered at the integrated application.
After integrating the results of all the data services, usually the top-k results (where k is a
constant value defined by the application) are returned. Now consider that different users have
different priorities for the data selection, e.g., one user may be more interested in quality of
data while another user is more concerned about the pricing. There is a clear need for a system
that (semi-)automatically selects the most appropriate data service as well as data items for each
user according to various data concerns.

This work was supported by the Vienna Science and Technology Fund (WWTF), project ICT08-032.
[2] http://aws.amazon.com/publicdatasets/
[3] http://www.google.com/squared/
[4] http://virtuoso.openlinksw.com
Some data concerns like data quality, privacy, and quality of service (QoS) have long been studied in
their respective domains of databases, data mining and web services. However, data services are
different. Recently, the importance of distinguishing data services from web services has been
recognized (Truong and Dustdar, 2009). For instance, while licensing and quality of data are usually
static for web services, they are dynamic for data services. Indeed, as the data gets “older”, the
licensing and the data quality (of which the up-to-dateness may be an important aspect) will most
probably change. Moreover, data concerns can be dynamically updated. Hence, static information about
usage permission or privacy cannot deal with the requirements of dynamic integration applications
created by composing data services on the fly, and new techniques are required to integrate data
concerns into data services.

Ali M., Pichler R., Truong H. and Dustdar S.
DATA CONCERN AWARE QUERYING FOR THE INTEGRATION OF DATA SERVICES.
DOI: 10.5220/0003548401110119
In Proceedings of the 13th International Conference on Enterprise Information Systems (ICEIS-2011), pages 111-119
ISBN: 978-989-8425-53-9
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)
Among the various data concerns, privacy has re-
ceived most attention. It has been studied in many
different areas like data integration (Bhowmick et al.,
2006; Clifton et al., 2004), data mining (Zhang and
Zhao, 2007), and web services (Kobsa, 2001; Creese
et al., 2009). In (Mrissa et al., 2010) and similarly
in (McSherry, 2009), privacy has been studied in the
context of data services. However, a systematic integration
of data concern awareness into data services is
still missing to date. Moreover, previous approaches
of dealing with data concerns like privacy do not di-
rectly integrate the data concern awareness into the
query language, even though this would be very im-
portant for enabling the querying system to select the
best suited data source for various parts of a given
query. The goal of this work is to design a querying
system (i) which can take arbitrary data concerns into
account, (ii) which integrates the data concern aware-
ness into the query language, and (iii) which automat-
ically selects the appropriate data sources depending
on current context, user requirements and data con-
cerns.
Structure and Summary of Results. The main re-
sults of this paper are as follows.
Roles and Concerns. In Section 2, we lay the
foundations of our study by identifying the actors in-
volved in data services and their specific concerns.
In contrast to common web services, there are nor-
mally three actors involved, namely consumer, ser-
vice provider, and also a data provider.
Models. In Section 3, we present four possible
models of data concern aware querying. We discuss
the characteristics and virtues of each model and se-
lect the best suited one for our system.
System. In Section 4, we come up with our data
concern aware querying system. We describe how
meta-data is organized and stored by this system and
how data concern awareness is integrated directly into
the XQuery language.
Implementation and Evaluation. We report on an
implementation of our data concern aware querying
system and an evaluation on benchmark tests in Sec-
tion 5.
Conclusion. Finally, in Section 6, we give a con-
clusion and point out directions for future work.
Related Work. In (Truong and Dustdar, 2009), the
authors give an overview of data concerns and dis-
cuss the different parties and their roles in data ser-
vice creation and utilization. As mentioned above,
there have been attempts to incorporate privacy con-
cerns into data services (Mrissa et al., 2010; Mc-
Sherry, 2009). Data source selection has been long
discussed in the database and information retrieval
community and different algorithms have been de-
veloped for optimal selection of the database (French
et al., 1999). Different frameworks and techniques
are available for the best service selection in a web
service environment (Maximilien and Singh, 2004).
In (Boulakia et al., 2004), the authors consider user preferences for data source selection, while
(Liu et al., 2004) use QoS attributes for service selection. Selection of data services based on
data concerns has not been considered so far. Query language extension has been used before to
provide additional functionality for integrated applications: a framework for data quality aware
queries is presented by extending the SQL query language (Yeganeh et al., 2009), while a privacy
aware query language has been designed for preserving privacy in distributed query evaluation
(Farnan et al., 2010). To the best of our knowledge,
there is no querying system available for data concern
aware querying to integrate data services.
2 DATA SERVICES: ROLES AND
DATA CONCERNS
The concept of data services is based upon the
service-oriented architecture (SOA), which includes
standardized processes for accessing data “where
it resides” irrespective of the platform. Data ser-
vices take advantage of service-oriented architecture
to offer users a mediator for integrating information
from database systems and other structured or non-
structured data sources. Data services thus realize
a layer of software between the physical, distributed
data sources and the applications or services which
want to access the data. The data is exposed to the
customer via a virtual data model. It is the responsi-
bility of the data service to connect to the back-end
data sources via the available interfaces and to map the physical data schema to the virtual data
model. The applications and services using the data service leverage this virtual data model to
access the required information and the data service software handles the collection and
distribution of the data as needed from the physical instances of the data. Figure 1 gives an
overview of the conceptual architecture of the data services.

Figure 1: Conceptual architecture of the data services.
(Diagram labels: Consumer; Service Provider or Developer; Data Provider; Description; Semantics;
Policy; Agreement; User Query; Query; Data; Results; Parties “become known” to each other;
Parties “agree” to provide data service.)
The traditional approach to the design of data ser-
vices is that the service provider and consumer agree
on how the service shall be used. The terms of this
agreement are usually static and are not altered over
time. Typically, data services in the form of inte-
gration server products have connectors or adaptors
built to connect applications together. A number of
them have added or will add new connectors for data
service applications such as integration adaptors for
salesforce [5]. However, the static nature of agreements
on data concerns is a severe drawback of the existing
technology – in particular in dynamic data integration
scenarios using mashups or cloud strategy. Moreover,
several important aspects of data integration applica-
tions are not addressed in these solutions, such as (1)
automatic updates, (2) changes of regulations, and (3)
changes of the location of the data.
2.1 Roles of the Data Service
As can be seen in Figure 1, there are three actors
involved in a data service architecture, namely Data
Provider (DP), Service Provider (SP), and Consumer.
Below we discuss each of them briefly.
Data Provider. The Data Provider is responsible for providing the actual data for the service. The
DP is not necessarily the owner of the data, since data can also be outsourced. But it is the
responsibility of the DP to ensure the availability of the data for use by the data service. All
the concerns related to the data must also be communicated to the service provider. Some data
concerns of the data provider, like the privacy and permission concerns, can apply at any
granularity from the data item level to the whole data source level.
[5] http://salesforce.com/
Service Provider. The Service Provider entity is the person or organization that provides an
appropriate agent to implement a particular service. The Service Provider is the owner of the data
service. After making the arrangement with the DP, the SP defines the functionality of the service.
It is the responsibility of the SP to make sure that all the concerns defined by the various data
providers are preserved while the service is used.
Consumer. The consumer is a person or organization that wishes to make use of the data service. It
uses a requester agent to exchange messages with the provider agent. The requester entity and
provider entity agree on the service description (e.g. a WSDL document) and the semantics that will
govern the interaction between the requester agent and the provider agent.
2.2 Data Concerns for Data Services
Data services have different requirements and con-
cerns compared with traditional web services. Data
provided through data services is usually associated
with many data concerns (Truong and Dustdar, 2009;
Truong and Dustdar, 2010). These concerns must be
precisely described, organized and stored in such a
way that the SP is capable of preserving these con-
cerns while querying any data source.
For the sake of simplicity we have chosen a few
of the most important concerns as shown in Table 1.

Table 1: Some data concerns associated with data services.

Category            Scope          Data Concern              Description
------------------  -------------  ------------------------  ------------------------------------------------
Data Quality        Data Level     Timeliness                Defines the life time and freshness of the data
                                   Accuracy                  Data correctness and consistency
                                   Completeness              Missing information in terms of null values etc.
                                   Availability              Defines possible access limitations
Quality of Service  Service Level  Performance               Defines performance of the data service,
                                                             e.g. execution time, response time
                                   Reliability/availability  What is the failure probability and what is the
                                                             recovery time in case of failure
                                   Dependability/Trust       Reputation, how trustful is the data service
                                   Service Location          Location of the service execution
License             Data Level     Usage Permission/Rights   How a data service can be used
                                   Data Location             Defines where the data resides
                                   Usage Fee                 Defines the fee associated with the usage of
                                                             the data or data service
                                   Law Enforcement           Defines laws which are used to deal with the
                                                             use of the service

Each concern can be categorized into a particular cat-
egory depending on the type of the concern. Every
category of the data concerns has its scope which will
be either service level or data level (for further details,
see Section 4.1).
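
The categorization above can be made concrete with a small data model. The following is only an illustrative sketch: the class names `Scope` and `Concern` and the helper `by_scope` are our own assumptions and are not part of the system described in this paper; the rows are a hand-copied subset of Table 1.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """Scope of a data concern: the whole service or individual data."""
    SERVICE_LEVEL = "service"
    DATA_LEVEL = "data"


@dataclass(frozen=True)
class Concern:
    category: str  # e.g. "Data Quality" or "Quality of Service"
    scope: Scope
    name: str


# A few rows of Table 1, transcribed by hand for illustration only.
TABLE_1 = [
    Concern("Data Quality", Scope.DATA_LEVEL, "Timeliness"),
    Concern("Data Quality", Scope.DATA_LEVEL, "Completeness"),
    Concern("Quality of Service", Scope.SERVICE_LEVEL, "Performance"),
    Concern("Quality of Service", Scope.SERVICE_LEVEL, "Dependability/Trust"),
]


def by_scope(concerns, scope):
    """Return the names of all concerns that have the given scope."""
    return [c.name for c in concerns if c.scope is scope]
```

Grouping by scope in this way mirrors the split between service level and data level concerns that Section 4.1 relies on.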
2.3 Data Service Systems with Data
Concerns
Figure 2 describes the activities in a data service sys-
tem with data concerns. As an initial step, both DP
and SP mutually agree to provide the facility of the
data service. The data provider is required to provide
accessibility to its data for the service operations. The
DP should provide a mechanism to access its data and
additionally provide data concerns associated with the
data. The user can also contribute to the data concerns
meta-data by providing her preferences. Such infor-
mation can be easily managed using profiling tech-
niques. The service provider describes the service op-
erations to access the data sources. This access can be
realized via a query language, REST/SOAP or a pa-
rameterized query for structured data. The SP has to
make sure that all the concerns defined by the DP are
preserved.
3 MODELS OF DATA CONCERN
AWARE QUERYING
Figure 3 depicts four possible models for querying
data services with concerns. As already discussed in
Section 2 there are three actors/roles involved. For the
sake of simplicity we assume that Data Provider and
Service Provider have already agreed on the storage
format of the data concerns: we assume that this in-
formation is stored in the form of an XML document
whose schema will be described in Section 4.1.
The main difference between the models presented in Figure 3 lies in the way in which the data
concerns are associated with the queries that are posed against the data sources. The models in
Figure 3(a) and (b) use the “Querying With Concerns” paradigm. In this case, the system sends a
pair ⟨Q, C⟩ to the data source, where Q is the query for the data service and C is the collection
of data concerns associated with the query. Each data service must be capable of preserving these
concerns by using its own individual querying system. The models in Figure 3(c) and (d) are based
on “Concern Aware Querying”. The service provider has the concerns either stored locally or
fetched dynamically and is capable of writing queries in such a way that they preserve all the
data concerns associated with the particular query and its relevant data source.
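
The difference between the two paradigms can be illustrated by the shape of what is sent to a data source. This is a minimal sketch under our own assumptions: both function names are hypothetical, and rendering the concerns as an `SLC[...]` clause merely borrows the notation of Listing 1 for illustration; it is not a description of how the system rewrites queries internally.

```python
def query_with_concerns(query, concerns):
    """Models (a)/(b): ship the pair <Q, C>; the data source enforces C."""
    return {"query": query, "concerns": dict(concerns)}


def concern_aware_query(query, concerns):
    """Models (c)/(d): the service provider folds the concerns into the query.

    `concerns` maps a concern name to an (operator, value) pair.
    """
    clause = ", ".join(
        f"{name} {op} {value}" for name, (op, value) in concerns.items()
    )
    return f"{query} SLC[{clause}]" if clause else query
```

In the first case the data source receives two separate objects and must interpret the concerns itself; in the second case only a single query leaves the service provider.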
Below we briefly discuss each of the models
shown in Figure 3.
Model (a): Querying with Data Concerns. Figure
3(a) shows the basic model of querying with data con-
cerns. DP and SP have mutually agreed and have ac-
cumulated their data concerns. These data concerns
are stored by the SP. A user query Q is subdivided into
multiple sub-queries for multiple data sources. The
SP must be capable of retrieving the data concerns
meta-data of a particular data source and attaching it
to the query. The data provider must be capable of
taking care of the supplied concerns while executing
the query and returning the results.

Figure 2: Activities in data concern aware data service systems.
(Diagram labels: entities Consumer, Service Provider, Data Provider, Data Service, Data Source,
Data Service Operations, Data Concerns, Concern Aware Query; relations owns, describes, designs,
constrains, ensures, queries, uses, implements, provides, executes.)

Model (b): Query with Data Concerns with Cen-
tralized Repository. This model deals with the
generic data concerns stored in a centralized repos-
itory. Such a model is best suited for an intra-organization
data integration system, where data concerns
are homogeneous for all of the data sources of
the organization. As shown in Figure 3(b), this model
is very similar to the existing data integration systems.
The SP sends user queries to multiple data sources
and assumes that the data sources are capable of pre-
serving the data concerns by accessing a centralized
repository of data concerns.
Model (c): Concern Aware Querying. The concern
aware querying model of Figure 3(c) is a data service
capable of re-writing user queries in such a way that
all the concerns associated with a particular data ser-
vice are incorporated into the query itself. All the data
sources and their concerns are already registered at
the SP and the required meta-data information is locally stored by the SP. The SP divides a user
query into multiple sub-queries in such a way that all the applicable data concerns (which are
stored in a meta-data file kept locally at the SP) are incorporated into these sub-queries.
Model (d): Concern Aware Querying Model with
Dynamic Discovery. Contrary to the legacy data in-
tegration systems for databases, data services usually
have no prior knowledge of the schema or the data
source. Mostly data sources are discovered dynami-
cally and the most suitable data source has to be se-
lected. Of course, storing meta-data information of
all data sources at internet scale is out of the question.
Figure 3(d) shows the model for concern aware query-
ing with dynamic discovery of data sources and their
associated data concerns. The SP stores data concerns
meta-data as in the concern aware querying model,
but at the same time it is capable of discovering data
services dynamically and creating meta-data of data
concerns by fetching them dynamically. These con-
cerns are then applied to queries for a particular data
source. This model also allows the consumer to pro-
vide her concerns and preferences within the query.
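
The dynamic discovery of model (d) can be pictured as a registry that fetches the concerns of a data source on first use and keeps them locally afterwards. The sketch below is an assumption of ours, not the implementation described in Section 5: the class `ConcernRegistry` and the `fetch` callable are hypothetical, and no refresh policy for changing concerns is modelled.

```python
class ConcernRegistry:
    """Stores concern meta-data locally and fetches missing entries on demand."""

    def __init__(self, fetch):
        # fetch: callable taking a data source URI and returning its concerns
        self._fetch = fetch
        self._cache = {}

    def concerns_for(self, uri):
        """Return the concerns of `uri`, discovering them on first access."""
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]
```

Because data concerns may change over time, a real deployment would also need some way to invalidate cached entries; that aspect is deliberately left out here.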
4 CONCERN AWARE QUERYING
For our system, we have chosen the concern aware
querying model as shown in Figure 3(d) since it is
the most flexible and most powerful of the models
discussed in the previous section. Our system stores
the meta-data information of the concerns of all three
stakeholders of the data service and is capable of
querying data concerns dynamically. All three
actors of a data service have the possibility to con-
tribute their concerns to the meta-data. Each data
provider provides its data concerns associated with
its data along with its data. Similarly each consumer
has the option to add her preferences/concerns with
her query. The service provider can add its concerns
to form an aggregated data concerns meta-data which
will be utilized by the query language to write concern aware queries. A data concern tree (dct) is
generated from the available meta-data and is associated with each of the data sources.

Figure 3: Data concerns aware querying models.
(Panels: (a) Querying with Data Concerns, where the data service sends the pairs Q1,C1 ... Qn,Cn to
the data sources DS1 ... DSn; (b) Query with Data Concerns (centralized concerns repository), where
the queries Q1 ... Qn are sent and the concerns C1 ... Cn are held in a central repository;
(c) Concern Aware Querying, where the rewritten queries Q1c1 ... Qncn are sent; (d) Concern Aware
Querying with dynamic source/concern discovery.)

We have extended the XQuery language to make
it concern aware, with the introduction of special key-
words for mentioning data concerns within the query.
The query parser looks into the data concern trees at-
tached to the available data sources and executes the
query by selecting the most suitable data sources and
data items for the particular user query. Below we
discuss the concern aware querying system in detail.
4.1 Data Concerns Collection
For storing the meta-data information of data con-
cerns of a particular data source, we have decided to
use XML because of its standardization and platform
independence. Figure 4 shows the schema definition
for the meta-data for the concern aware querying. A
data concern tree will be generated by using the meta-
data information available in the attached XML file.
The schema shown in Figure 4 describes the data con-
cerns as shown in Table 1. The data concern collec-
tion file can be expanded to any number of concerns
depending on the type and requirement of the SP or
DP. The data concern tree can be subdivided into two
categories based on the scope of the data concerns.
Service Level Concerns. Data concerns whose scope
is the entire data service are called service level con-
cerns. The data concerns tree of a data service con-
tains exactly one value for any of the service level
concerns. For instance, consider the performance
concern of QoS category of a service. Clearly, a data
service can have only one calculated value for its per-
formance data concern.
Data Level Concerns. Data concerns whose scope
is a single data item or a collection of data items are
called data level concerns. Data level concerns can
have multiple values in the data concern tree attached
to some data source. If a data source returns a node
or a list of nodes of an XML document as a result of
a user query (written in XQuery), then a data level
concern can be described for each of the nodes in this
result and for each element in the subtrees rooted at
these nodes.
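
To illustrate how such a meta-data file could be turned into a data concern tree, the following sketch parses a small XML document with Python's standard library. The element names are taken from the schema in Figure 4, but their exact nesting, the sample values, and the function `load_concern_tree` are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data file; nesting of elements is assumed, not confirmed.
SAMPLE = """
<DataConcern>
  <ServiceLevelConcerns>
    <QoS>
      <Performance>8</Performance>
      <Availability>9</Availability>
    </QoS>
    <Licensing>
      <UsagePermission>
        <NonCommercial>false</NonCommercial>
      </UsagePermission>
    </Licensing>
  </ServiceLevelConcerns>
  <DataLevelConcerns>
    <DataQuality>
      <Completeness>6</Completeness>
      <Timeliness>4</Timeliness>
    </DataQuality>
  </DataLevelConcerns>
</DataConcern>
"""


def load_concern_tree(xml_text):
    """Split the meta-data into service level and data level concerns."""
    root = ET.fromstring(xml_text)
    # Service level: exactly one value per concern, so a flat dict suffices.
    service = {}
    for node in root.find("ServiceLevelConcerns").iter():
        if len(node) == 0 and node.text and node.text.strip():
            service[node.tag] = node.text.strip()
    # Data level: one entry per DataQuality element, since values may repeat.
    data = []
    for dq in root.find("DataLevelConcerns").findall("DataQuality"):
        data.append({child.tag: child.text.strip() for child in dq})
    return {"service": service, "data": data}
```

The flat dictionary for service level concerns reflects the rule that a data service has exactly one value per service level concern, whereas data level concerns are kept as a list because they can occur multiple times.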
The data concern tree is populated by the values
of data concerns depending on the type of each
data concern. There are two possible types of data
concerns based on their value, namely (i) boolean
data concerns, which can have either the value
true or false, e.g. usage permission for commercial
purposes can either be allowed or restricted by the
data provider, and (ii) value based data concerns,
which can have any value within a specific range e.g.
the performance concern of QoS of a data service or
the completeness concern of a data item can have a
specific value inside a pre-described range. Different
algorithms are available for the calculation of QoS
values of a data service or data quality measurements
for a data item, which is outside the scope of this
paper. We assume that after applying any of the
available algorithms and techniques a fixed value
(positive integer) is provided for the value based data
concerns within a range between 1 and 10, where 10 is the highest level.

Figure 4: Data Concerns Tree.
(Schema diagram. Recoverable element names, in the order in which they appear: DataConcern;
ServiceLevelConcerns, annotated “Data Concerns with Service Level Scope”: QoS, Trust, Fee,
Availability, Performance, Licensing, LawEnforcement, UsagePermission, NoLinkage, NoIntegrity,
NoDistribution, NonCommercial, Location; DataLevelConcerns, annotated “Data Concerns with Data
Level Scope”: DataQual... (1..n, annotated “Multiple DataQuality elements will show either
resource level or dataitem level concerns”), DataLocati..., Completeness, Timeliness, Accuracy,
Consistancy, Level, DataItemLevel, ResourceLevel.)

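
The two value types described above, boolean and value based within the range 1 to 10, can be checked when the concern tree is populated. The function below is a hypothetical helper written for illustration; the textual encoding of values ("true", "false", or an integer) is our assumption.

```python
def parse_concern_value(text):
    """Convert the text of a concern into a bool or an int in the range 1..10."""
    raw = text.strip().lower()
    if raw in ("true", "false"):
        return raw == "true"  # boolean data concern
    value = int(raw)  # raises ValueError for non-numeric text
    if not 1 <= value <= 10:
        raise ValueError(f"value based concern out of range 1..10: {value}")
    return value  # value based data concern
```

Rejecting out-of-range values at load time keeps later comparisons in the query processor simple, since every value based concern is then known to be an integer between 1 and 10.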
Algorithm 1: Data Source Selection based on SLC.
input : Concern Aware Query Q, DS = {ds_1, ds_2, ..., ds_n}
output: DS' ⊆ DS
DS' = ∅;
for each ds_i ∈ DS do
    flag = true;
    for each boolean service level concern Q.SLC_b do
        if Q.SLC_b ≠ ds_i.SLC_b then flag = false
        end
    end
    if flag == true then
        for each value based service level concern Q.SLC_v do
            if ds_i.SLC_v does not satisfy the comparison required by Q.SLC_v then flag = false
            end
        end
    end
    if flag == true then DS' = DS' ∪ {ds_i}
    end
end
Return DS';
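
Algorithm 1 can be sketched in a few lines of Python. This is our own rendering under stated assumptions, not the code of the tool described in Section 5: the dictionary layout, the name `select_sources`, and the rule that a data source which does not declare a requested concern is rejected are all assumptions. The boolean and value based passes of Algorithm 1 are merged into one loop, which does not change the result.

```python
import operator

# Comparison operators allowed for value based concerns (assumed set).
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq}


def select_sources(query_slc, sources):
    """Return the names of data sources that assure all requested concerns.

    query_slc: {concern: (op, value)}; bool values denote boolean concerns
    sources:   {source name: {concern: value}}
    """
    selected = []
    for name, slc in sources.items():
        ok = True
        for concern, (op, wanted) in query_slc.items():
            if concern not in slc:
                ok = False  # assumption: undeclared concern is not assured
            elif isinstance(wanted, bool):
                ok = slc[concern] == wanted  # boolean concern: exact match
            else:
                ok = OPS[op](slc[concern], wanted)  # value based concern
            if not ok:
                break
        if ok:
            selected.append(name)
    return selected
```

A call with the service level concerns of Listing 1 might look as follows: `select_sources({"performance": (">", 7), "CommercialUsagePermission": ("=", True)}, sources)`, where `sources` maps each registered data source to the service level part of its data concern tree.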
4.2 Query Processing
If concern aware querying is used and a query con-
tains data concerns, then the SP iterates through the
data concern trees of all available data sources. Using
data concern information provided in the query and
the meta-data tree of data concerns, the data concern
aware querying system is able to select the most suit-
able data service for a particular query. Algorithm 1
shows the data source selection by assuring both types
(i.e., boolean and value based) of service level concerns. A concern aware query and a set of
available data sources are provided as input. After evaluating this algorithm, the most suitable
data source or set of data sources (in case multiple data sources assure the desired concerns) is
returned for data querying. Similarly, data level concerns (DLCs) are evaluated for data selection
using the same technique as in Algorithm 1.
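
The same filtering idea applied at the data level can be sketched as follows. The item layout, the name `select_items`, and the decision to exclude items without a declared value are assumptions made for illustration; only a greater-than comparison is shown.

```python
def select_items(items, concern, minimum):
    """Return the ids of items whose data level concern value exceeds `minimum`.

    Assumption: an item that declares no value for the concern is excluded.
    """
    selected = []
    for item in items:
        value = item["dlc"].get(concern)
        if value is not None and value > minimum:
            selected.append(item["id"])
    return selected


# Hypothetical result nodes with their data level concern values.
ITEMS = [
    {"id": "item0", "dlc": {"completeness": 6}},
    {"id": "item1", "dlc": {"completeness": 3}},
    {"id": "item2", "dlc": {}},
]
```

Here `select_items(ITEMS, "completeness", 4)` keeps only the items whose completeness is above 4, in the spirit of a condition such as `[$ei.completeness > 4]`.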
4.3 Concern Aware XQuery
We have chosen XQuery because it is the de facto
standard language for XML data and because of its
capability to execute distributed queries over heterogeneous data sources (Ali et al., 2009a). Now
consider a dynamic data source integration system as described in Figure 3(d), where a user query
can be executed on multiple data sources which perform the same task and one data concern tree is
attached to each data source. Current query languages for semi-structured data like XQuery or
SPARQL require the URI or location of the data source to be provided manually. In order to select
the most appropriate data source from the available data sources which perform the same task, we
have extended the XQuery syntax
by providing additional keywords to make it concern aware. All the service level concerns can be
described where a data source location or URI is mentioned.

Listing 1: Sample Concern Aware XQuery.

    let $auction :=
      doc("auction.xml") SLC[performance > 7,
                             CommercialUsagePermission = true]
    return let [$ca.timeliness] :=
      $auction/site/closed_auctions/closed_auction return
    let
      [$ei.completeness > 4] := $auction/site/regions/europe/item
    for $p in $auction/site/people/person
    let $a :=
      for $t in $ca
      where $p/@id = $t/buyer/@person
      return
        let $n := for $t2 in $ei
                  where $t/itemref/@item = $t2/@id return $t2
        return <item>{$n/name/text()}</item>
    return <person name="{$p/name/text()}">{$a}</person>

Listing 1 shows a sample query over the XMARK benchmark data of a web application for auctions,
which returns a list of person names and the items bought by them within Europe. A comma separated
list of all the service level concerns is given within square brackets, using additional keywords
defined for the service level concerns, immediately after the URL/location of the data source
mentioned in the XQuery. For the purpose of simple illustration, we allow data level concerns
related to particular data elements to be used within an XQuery in the form [QName.DLC] as part of
concern aware querying. Boolean data concerns can have either a true or false value. Basic
comparison operators can be applied to value based data concerns.
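
A small parser shows how the service level clause could be read out of such a query. This is a sketch based only on the surface syntax visible in Listing 1; the regular expressions, the function `parse_slc`, and the supported operator set are our assumptions and do not describe the actual query parser.

```python
import re

# "SLC[ ... ]" directly after a data source location, as in Listing 1.
SLC_RE = re.compile(r"SLC\s*\[(?P<body>[^\]]*)\]")
# One condition of the form: name operator value
COND_RE = re.compile(
    r"^\s*(?P<name>\w+)\s*(?P<op>>=|<=|=|>|<)\s*(?P<value>\w+)\s*$"
)


def parse_slc(query):
    """Return {concern: (op, value)} for the SLC clause, or {} if none exists."""
    match = SLC_RE.search(query)
    if match is None:
        return {}
    concerns = {}
    for part in match.group("body").split(","):
        cond = COND_RE.match(part)
        if cond is None:
            raise ValueError(f"cannot parse concern: {part!r}")
        raw = cond.group("value")
        # "true"/"false" denote boolean concerns, anything else an integer.
        value = (raw == "true") if raw in ("true", "false") else int(raw)
        concerns[cond.group("name")] = (cond.group("op"), value)
    return concerns
```

The resulting dictionary has exactly the shape that a source selection step needs: one comparison per requested service level concern.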
5 IMPLEMENTATION AND
EVALUATION
Implementation. We have implemented a data concern aware XQuery tool to support the idea of concern
aware querying. To store XML data sources, we use the open source native XML database system
eXist-db [6]. Our concern aware querying tool uses Java on top of the XQuery processing facility
provided by eXist-db. When a concern aware query is submitted to the tool, it applies the methods
described above and selects the most suitable data sources for the particular query. Once the most
suitable data sources have been selected, our tool removes the concern aware querying clauses and
sends standard XQuery to the selected data sources. Our data concern aware XQuery tool is built
upon the DeXIN system (Ali et al., 2009a; Ali et al., 2009b), which is a web based system to
integrate data over heterogeneous data sources. DeXIN extends the XQuery language to support SPARQL
queries inside XQuery, thus facilitating the integration of data modeled in XML, RDF, and OWL. To
build our data concern aware XQuery system, we have incorporated the data concern awareness into
DeXIN. Now DeXIN can integrate heterogeneous distributed data sources while preserving their
individual data concerns.
[6] http://exist.sourceforge.net/
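
The step in which concern aware clauses are removed before standard XQuery is sent to a data source can be approximated with two textual substitutions. The sketch below is an assumption of ours written in Python for illustration (the tool itself is implemented in Java); a real implementation would work on the parsed query rather than on raw text, and the second pattern would misfire on variable names that legitimately contain a dot.

```python
import re


def to_standard_xquery(query):
    """Strip the concern aware extensions of Listing 1 from a query string."""
    # Drop the service level clause that follows a data source location.
    query = re.sub(r"\s*SLC\s*\[[^\]]*\]", "", query)
    # Turn `[$var.concern op value]` or `[$var.concern]` back into `$var`.
    query = re.sub(r"\[\s*(\$\w+)\.\w+[^\]]*\]", r"\1", query)
    return query
```

Ordinary XPath predicates such as `[$p/@id = 1]` are left untouched, because the second pattern requires a dot directly after the variable name.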
Evaluation. As a proof of concept we have evaluated our system on XML benchmark data. We used the
XMARK [7] benchmark data set for experimental analysis. XMARK is a popular XML benchmark and models
an internet auction application. We made 100 copies of a subset of the XMARK auction data and
defined each as a data service. The resulting data services were constructed with the same
functionality but with different concerns. Due to the unavailability of data services which support
data concerns, we randomly generated data concern tree meta-data for each data service and assigned
different values to both service and data level concerns. We used 20 different sample queries
provided with the benchmark and executed each of them with different data concern values. There was
no reported failure in the concern aware query execution and all the provided data concerns were
assured, which supports the suitability of our tool and the potential for its incorporation into
any data service integration application.
[7] http://www.xml-benchmark.org/
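
The random generation of concern meta-data used in such an evaluation can be sketched as follows. The function `make_services`, the two concerns chosen, and the use of a fixed seed are assumptions for illustration; the paper does not state how the random values were drawn.

```python
import random


def make_services(count, seed=0):
    """Generate `count` data services with random service level concerns."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    services = {}
    for i in range(1, count + 1):
        services[f"ds{i}"] = {
            # value based concern in the range 1..10 (see Section 4.1)
            "performance": rng.randint(1, 10),
            # boolean concern
            "CommercialUsagePermission": rng.choice([True, False]),
        }
    return services
```

Calling `make_services(100)` yields one meta-data record per copy of the benchmark data, which can then be fed to a source selection step.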
6 CONCLUSIONS AND FUTURE
WORK
In this work, we have designed a querying system
which is capable of taking several kinds of data con-
cerns into account. We have provided a basic model
in which we concentrate on three concerns, namely
data quality, quality of service, and licensing. How-
ever, our approach is generic in the sense that one can
incorporate arbitrary data concerns. Indeed, one item
on our agenda for future work will be to integrate fur-
ther data concerns like pricing, data security, auditing
model, etc. Another important goal for future work is
the integration of our querying system into a powerful
mash-up tool. So far, our querying system is designed
to access data sources via XQuery. In the future, we
want our system to also access data sources which expose
their data via web services.
REFERENCES
Ali, M. I., Pichler, R., Truong, H. L., and Dustdar, S.
(2009a). DeXIN: An extensible framework for distributed
XQuery over heterogeneous data sources. In
Proc. ICEIS 2009, volume 24 of LNBIP, pages 172–
183. Springer.
Ali, M. I., Pichler, R., Truong, H. L., and Dustdar, S.
(2009b). On using distributed extended XQuery for
web data sources as services. In Proc. ICWE 2009,
volume 5648 of LNCS, pages 497–500. Springer.
Bhowmick, S. S., Gruenwald, L., Iwaihara, M., and
Chatvichienchai, S. (2006). Private-iye: A framework
for privacy preserving data integration. In Proc. ICDE
Workshops 2006, page 91. IEEE Computer Society.
Boulakia, S. C., Lair, S., Stransky, N., Graziani, S., Rad-
vanyi, F., Barillot, E., and Froidevaux, C. (2004).
Selecting biomedical data sources according to user
preferences. In ISMB/ECCB 2004, pages 86–93.
Clifton, C., Kantarcioglu, M., Doan, A., Schadow, G.,
Vaidya, J., Elmagarmid, A. K., and Suciu, D. (2004).
Privacy-preserving data integration and sharing. In
Proc. DMKD 2004, pages 19–26. ACM.
Creese, S., Hopkins, P., Pearson, S., and Shen, Y. (2009).
Data protection-aware design for cloud services. In
Proc. CloudCom 2009, volume 5931 of LNCS, pages
119–130. Springer.
Dan, A., Johnson, R., and Arsanjani, A. (2007). Informa-
tion as a service: Modeling and realization. In Proc.
SDSOA 2007. IEEE Computer Society.
Farnan, N. L., Lee, A. J., and Yu, T. (2010). Investigating
privacy-aware distributed query evaluation. In Proc.
WPES 2010, pages 43–52. ACM.
French, J. C., Powell, A. L., Callan, J. P., Viles, C. L., Em-
mitt, T., Prey, K. J., and Mou, Y. (1999). Comparing
the performance of database selection algorithms. In
Proc. SIGIR 1999, pages 238–245. ACM.
Hacigümüs, H., Mehrotra, S., and Iyer, B. R. (2002). Providing
database as a service. In Proc. ICDE 2002.
IEEE Computer Society.
Kobsa, A. (2001). Tailoring privacy to users’ needs. In Proc.
User Modeling 2001, volume 2109 of LNCS, pages
303–313. Springer.
Liu, Y., Ngu, A. H., and Zeng, L. Z. (2004). QoS computation
and policing in dynamic web service selection.
In Proc. WWW Alt. 2004, pages 66–73. ACM.
Maximilien, E. M. and Singh, M. P. (2004). A frame-
work and ontology for dynamic web services selec-
tion. IEEE Internet Computing, 8(5):84–93.
McSherry, F. (2009). Privacy integrated queries: an exten-
sible platform for privacy-preserving data analysis. In
Proc. SIGMOD 2009, pages 19–30. ACM.
Mrissa, M., Tbahriti, S.-E., and Truong, H.-L. (2010). Privacy
model and annotation for DaaS. In Proc. ECOWS
2010, pages 3–10. IEEE Computer Society.
Mykletun, E. and Tsudik, G. (2006). Aggregation queries
in the database-as-a-service model. In Proc. DBSec
2006, volume 4127 of LNCS, pages 89–103. Springer.
Truong, H. L. and Dustdar, S. (2009). On analyzing and
specifying concerns for data as a service. In Proc.
APSCC 2009, pages 87–94. IEEE.
Truong, H. L. and Dustdar, S. (2010). On evaluating and
publishing data concerns for data as a service. In Proc.
APSCC 2010, pages 363–370. IEEE Computer Soci-
ety.
Yeganeh, N. K., Sadiq, S. W., Deng, K., and Zhou, X.
(2009). Data quality aware queries in collaborative
information systems. In Proc. APWEB/WAIM 2009,
volume 5446 of LNCS, pages 39–50. Springer.
Zhang, N. and Zhao, W. (2007). Privacy-preserving data
mining systems. IEEE Computer, 40(4):52–58.