DATA MANAGEMENT AND INTEGRATION WITHIN COLLABORATIVE WORKING ENVIRONMENTS
Matthias Assel and Alexander Kipp
High Performance Computing Center - HLRS, University of Stuttgart
Nobelstr. 19, Stuttgart, Germany
Keywords: Distributed Data Management, Data Integration, Data Sharing, Collaborative Working Environments, Data Exchange Security, Virtual Laboratory, ViroLab, CoSpaces.
Abstract:
With increasingly distributed and inhomogeneous resources, sharing knowledge, information, or data becomes more and more difficult and less manageable for both end-users and providers. To reduce administrative overheads and ease complicated, time-consuming integration tasks for widely dispersed (data) resources, several solutions for collaborative data sharing and access have been designed and introduced in European research projects, for example CoSpaces and ViroLab. These two projects concentrate on the development of collaborative working environments for different user communities, such as engineering teams and health professionals, with a particular focus on integrating heterogeneous and large data resources into the system’s infrastructure.
In this paper, we present the two approaches realised within CoSpaces and ViroLab to overcome the difficulties of integrating multiple data resources and making them accessible in a user-friendly but also secure way. We start with an analysis of system specifications describing user and provider requirements for appropriate solutions. Finally, we conclude with an outlook and give some recommendations on how such systems can be further enhanced in order to guarantee a certain level of dynamicity, scalability, reliability, and, last but not least, security and trustworthiness.
1 INTRODUCTION
Today’s B2B (Business-to-Business) relationships are no longer limited to local or regional collaborations. The international orientation of enterprises in fact brings together organisations scattered all over the world. Because the business partners involved need to exchange information and/or confidential data, easy, dynamic and, above all, secure ways of sharing data sets and information must be in place before cross-organisational collaborations can take place. This is necessary to prevent abuse by third parties when communicating over untrusted networks such as the Internet.
From an eBusiness perspective, the concept of Virtual Organisations (VO) is widely used to address similar issues, namely to make (data) resources and products available dynamically and on demand. The main purpose of such a concept is to enable ad-hoc collaborations (Schubert et al., 2005) that meet specific business goals or address a (temporary) market niche. To achieve this, VO frameworks (Wilson et al., 2005) allow for the identification of resource providers according to current business needs and goals. They also allow for the integration of these providers so as to enable collaborative workflow execution.
The CoSpaces project (http://www.cospaces.org/) elaborates a framework to support world-wide distributed engineering teams by developing an environment for dynamic, ad-hoc collaborative working sessions. In particular, this framework shall support the on-demand selection of participants, documents, and data for a collaboration session. It shall also support the easy integration of partners by easing access both to and from the partners and their applications.
Within ViroLab (http://www.virolab.org), in turn, a virtual laboratory for HIV (Human Immunodeficiency Virus) research and medication support (Assel and Krammer, 2007) is being developed. It allows experts in this field to share their knowledge and results interactively while working together on the same data and information sets. These data sets are currently widely dispersed over Europe and are not yet shared across national or even institutional boundaries.
Matthias A. and Alexander K. (2008). DATA MANAGEMENT AND INTEGRATION WITHIN COLLABORATIVE WORKING ENVIRONMENTS. In Proceedings of the Tenth International Conference on Enterprise Information Systems - DISI, pages 258-263. DOI: 10.5220/0001704302580263. Copyright © SciTePress
To address real end-user needs and requirements, concrete scenarios have been developed in cooperation with industrial partners in CoSpaces and with hospitals in ViroLab. These scenarios will be evaluated against the defined concepts. Sharing data or documents between partners makes security a matter of the utmost priority (Assel et al., 2007) when developing such collaborative working environments for business partners and/or academic institutions. Security here spans several levels and implies trust among participants, so that collaboration goals can be met without violating legal constraints and while preserving the user’s privacy.
In the following, we present general system specifications that need to be considered when developing environments for dynamic (business) collaborations. These include requirements on both the user and the provider side. We then give two concrete examples of dynamic and secure infrastructures realised within the CoSpaces and ViroLab projects.
2 SYSTEMS’ SPECIFICATIONS
In distributed collaboration environments spanning organisations in different countries, the requirements for integrating and accessing services, applications, and data resources are specific and often dynamic. This applies particularly to security, where they differ from those of “normal” local collaboration federations.
The immense complexity of the technologies involved, the heterogeneity of existing infrastructures and resources, and their spatial distribution together require considerable effort from design and development teams. Much progress has been made in the recent past, and several solutions have been developed in other projects such as Akogrimo (http://www.mobilegrids.org) and TrustCoM (http://www.eu-trustcom.com). The currently running Integrated Project BREIN (http://www.gridsforbusiness.eu) extends the eBusiness approach by merging semantics, agents, and Grid technologies to provide an intelligent, self-manageable infrastructure. However, none of these approaches considers the very specific needs of dynamic collaboration sessions or of distributed data handling.
Such sessions incur management and configuration overheads both during the setup phase and at runtime. To reduce these overheads, several issues need to be taken into account, including usability, performance, scalability, reliability, flexibility and especially security. These issues should be carefully evaluated before choosing specific technologies or designing and developing appropriate systems and/or infrastructures.
The following list briefly highlights the most important requirements for collaborative working environments in general. It builds on recent project results and extends them accordingly.
- The overall infrastructure shall be highly flexible. It should not be limited to one specific operating system, middleware, or technology (Elmroth et al., 2007), but should rather interoperate with different widely used solutions, including standard technologies and specifications, and follow the latest approaches of the so-called Software as a Service (SaaS) paradigm;
- End-user friendliness, including easy-to-deploy and easy-to-run components/services as well as application and resource transparency;
- Virtualisation of services and corresponding components to allow on-the-fly modifications or extensions without affecting the current session(s);
- Easy usage of the collaboration platform, supported by a decentralised authentication and authorisation model (Assel and Kipp, 2007) based on a Single Sign-On (SSO) procedure across and within organisational boundaries;
- Dynamic setup of collaboration partners, including on-demand modification of firewalls, in order to guarantee that only trusted people are allowed to execute the corresponding operations;
- Dynamic management and control of the attribute-based access policies required to authorise users before they access services, applications, and resources;
- Preservation of the user’s privacy and protection of his/her confidentiality by anonymising data or excluding irrelevant information;
- Compliance with the requirements of the Data Protection Act, as well as explicit consent from all parties concerned;
- Secure data transmission based on data encryption at several levels (e.g. encrypted messages versus secure protocols), ensuring the trustworthiness and integrity of exchanged information;
- Additional security mechanisms and policies for the storage of confidential data sets;
- Recording of relevant user interactions for auditing, accounting, and pricing;
- Monitoring of critical infrastructure components as well as services/applications in order to react to sudden, intermittent failures;
- Flexible system(s) for distributing events of interest or pre-defined topics to the intended users/components (notification support);
- Methods for defining and negotiating Service Level Agreements (digital contracts) (Hasselmeyer et al., 2006) based on QoS (Quality of Service) parameters. These guarantee a certain level of reliability, performance, and scalability to customers/users and facilitate the individual pricing of single service capabilities.
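The following minimal Python sketch illustrates the attribute-based access policies listed above by evaluating a request against rules on user attributes. It is not taken from CoSpaces or ViroLab. The `Policy` structure, the attribute names (`organisation`, `role`) and the resource identifiers are hypothetical. In a Shibboleth-based setup, such attributes would typically be released by the user’s home identity provider and evaluated by the resource provider.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    """Hypothetical attribute-based rule: `action` on `resource` is allowed
    for any user whose attributes contain all `required` key/value pairs."""
    resource: str
    action: str
    required: Dict[str, str] = field(default_factory=dict)


def is_authorised(user_attrs: Dict[str, str], resource: str, action: str,
                  policies: List[Policy]) -> bool:
    """Default-deny check: True only if some matching policy is satisfied."""
    for policy in policies:
        if policy.resource != resource or policy.action != action:
            continue
        if all(user_attrs.get(k) == v for k, v in policy.required.items()):
            return True
    return False


# Policies live in a plain list, so they can be added or removed at runtime,
# e.g. when partners join or leave a session.
policies = [
    Policy("design/wing-model", "read", {"organisation": "PartnerA"}),
    Policy("design/wing-model", "write",
           {"organisation": "PartnerA", "role": "engineer"}),
]

alice = {"organisation": "PartnerA", "role": "engineer"}
bob = {"organisation": "PartnerB", "role": "engineer"}

print(is_authorised(alice, "design/wing-model", "write", policies))  # True
print(is_authorised(bob, "design/wing-model", "read", policies))     # False
```

Denying by default keeps a newly shared resource inaccessible until its provider explicitly adds a policy for it.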
3 COSPACES SHARED DATA SPACE
Besides the dynamic setup of a collaboration session, the management of sensitive data is a critical aspect. In industrial collaborations, critical data has to be shared among all collaboration partners. The CoSpaces framework therefore has to provide an infrastructure that supports the secure distribution of this data to the partners intended for a specific collaboration. For industrial partners, one of the most important aspects of data sharing is control: who is allowed to access or modify which specific data sets must remain the decision of the respective data owner. The following practice has been identified in current industrial collaborations:
- If companies want to share data, the data intended for a specific collaboration session is stored in a dedicated shared data space within the Demilitarised Zone (DMZ) of the respective company;
- Access rights are granted only on data within that shared data space;
- The DMZ is protected by an additional firewall and secured with an authentication and authorisation system;
- All access to the data is realised with encryption of the entire data traffic;
- The access rights are fully controlled by the respective data owner.
Since the collaboration partners will use different tools, the framework shall also provide data transformation and integration support. This allows participating users to exchange their local data sets and integrate them with those published by other collaboration partners. As a result, the users involved benefit from additional knowledge gained by combining and integrating the available data sources. Figure 1 illustrates the overall CoSpaces Data Space approach.
Figure 1: CoSpaces Data Space.
To meet the user requirements mentioned above, the CoSpaces approach realises data management as follows:
Every collaboration partner willing to share data with other partners has to provide a Virtual Shared Data Space within its DMZ. Within this Virtual Shared Data Space, each partner can upload data from its Secure Organisational Space and assign access control policies for the corresponding data sets and partners. Within CoSpaces, this Virtual Shared Data Space will be realised through dedicated databases and a modified version of the BSCW shared workspace system (Horstmann and Bentley, 1997).
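The owner-controlled sharing model described above can be sketched as a small in-memory Python class. This is a hypothetical illustration only. CoSpaces realises the Virtual Shared Data Space with dedicated databases and a modified BSCW, and the class and method names here are invented. Authentication (e.g. via Shibboleth) is omitted, and the `caller` argument stands in for an already authenticated identity.

```python
from typing import Dict, Set


class VirtualSharedDataSpace:
    """Hypothetical in-memory model of one partner's shared space in its DMZ.

    Only data explicitly uploaded by the owner is exposed, and only the
    owner can change who may access it.
    """

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._data: Dict[str, bytes] = {}
        self._acl: Dict[str, Dict[str, Set[str]]] = {}  # data set -> partner -> rights

    def _require_owner(self, caller: str) -> None:
        if caller != self.owner:
            raise PermissionError(f"{caller} is not the data owner")

    def upload(self, caller: str, name: str, content: bytes) -> None:
        """Copy a data set from the Secure Organisational Space into the shared space."""
        self._require_owner(caller)
        self._data[name] = content
        self._acl.setdefault(name, {})

    def grant(self, caller: str, name: str, partner: str, right: str) -> None:
        self._require_owner(caller)
        if name not in self._data:
            raise KeyError(name)
        self._acl[name].setdefault(partner, set()).add(right)

    def revoke(self, caller: str, name: str, partner: str, right: str) -> None:
        self._require_owner(caller)
        self._acl.get(name, {}).get(partner, set()).discard(right)

    def read(self, caller: str, name: str) -> bytes:
        allowed = caller == self.owner or "read" in self._acl.get(name, {}).get(caller, set())
        if not allowed:
            raise PermissionError(f"{caller} may not read {name}")
        return self._data[name]


space = VirtualSharedDataSpace(owner="PartnerA")
space.upload("PartnerA", "wing.stp", b"geometry data")
space.grant("PartnerA", "wing.stp", "PartnerB", "read")
print(space.read("PartnerB", "wing.stp"))  # b'geometry data'
```

A partner that has been granted read access still cannot pass that right on, because `grant` and `revoke` are reserved for the owner.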
This modified version supports Shibboleth for authentication and authorisation. Shibboleth’s federated Single Sign-On and attribute exchange framework has been selected to realise the entire authentication and authorisation processing (Assel and Kipp, 2007).
ICEIS 2008 - International Conference on Enterprise Information Systems
To address the need to combine and integrate data artefacts in order to gain additional knowledge, the CoSpaces approach also includes a Shared Business Entity Server. This server provides functionality to convert data into specific formats and to integrate these data sets into a new, global one. It offers the same security, authentication and authorisation infrastructure as the Virtual Shared Data Spaces. Consequently, it can be hosted either by one of the data providers or by an external trusted third party. The access rights to the combined, collaborative models are defined by the data providers themselves.
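The following Python sketch gives a rough idea of what the Shared Business Entity Server does. Two points are assumptions, since the text does not specify them. First, format conversion is reduced to renaming provider-specific field names into a common vocabulary. Second, the access rights of the combined model are taken to be the intersection of the partners each provider admits, so that no provider’s data reaches a partner it has not approved. All names and values are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# provider -> (records, field_map, partners allowed by that provider)
Contribution = Tuple[List[dict], Dict[str, str], Set[str]]


def to_common(record: dict, field_map: Dict[str, str]) -> dict:
    """Rename provider-specific fields to the shared vocabulary."""
    return {field_map.get(key, key): value for key, value in record.items()}


def integrate(contributions: Dict[str, Contribution]) -> Tuple[List[dict], Set[str]]:
    """Merge converted records; access = intersection of all providers' grants."""
    merged: List[dict] = []
    allowed: Optional[Set[str]] = None
    for provider, (records, field_map, partners) in contributions.items():
        for record in records:
            converted = to_common(record, field_map)
            converted["source"] = provider  # keep provenance for the data owner
            merged.append(converted)
        allowed = set(partners) if allowed is None else allowed & partners
    return merged, allowed or set()


contributions = {
    "PartnerA": ([{"len_mm": 1200}], {"len_mm": "length_mm"},
                 {"PartnerA", "PartnerB", "PartnerC"}),
    "PartnerB": ([{"length": 800}], {"length": "length_mm"},
                 {"PartnerA", "PartnerB"}),
}
merged, allowed = integrate(contributions)
print(merged)           # [{'length_mm': 1200, 'source': 'PartnerA'}, {'length_mm': 800, 'source': 'PartnerB'}]
print(sorted(allowed))  # ['PartnerA', 'PartnerB']
```

Other combination rules, such as an explicit per-model agreement, would be equally compatible with the text. The intersection is simply the most conservative choice.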
4 VIROLAB VIRTUAL LABORATORY
The mission of the ViroLab project is to provide researchers and medical doctors in Europe with a virtual laboratory for infectious diseases, focusing on HIV drug resistance.
The virtual laboratory integrates biomedical information on viruses (proteins and mutations), patients (e.g. viral load) and literature (drug resistance experiments) into a rule-based decision support system (Sloot et al., 2006) for drug ranking. In addition, it includes advanced tools for (bio)statistical analysis, visualisation, modelling and simulation. These tools enable the prediction of the temporal virological and immunological response to drug therapy for viruses with complex mutation patterns.
The virtual laboratory is mainly used by medical doctors to review previous results and rankings of recent HIV drug resistance interpretations, and by scientists to conduct new experiments and simulations. Experiments start from pre-defined process flow templates, which allow available bioinformatics applications to be selected interactively and combined into one explicit workflow for analysing individual HIV drug resistance. Furthermore, the virtual environment offers capabilities such as on-demand requests to more closely involved experts and real-time data sharing. These enable easy collaboration with other medical professionals for studying and discussing previous results and experiments.
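The following short Python sketch illustrates template-based workflow composition. The template is hypothetical: its slots and the placeholder steps (`tool_a`, `tool_b`, etc.) stand in for real bioinformatics applications and do not reproduce ViroLab’s actual templates or tools. The template fixes the order of the steps, and the user only chooses one application per slot.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]

# Hypothetical process-flow template: an ordered list of slots, each offering
# the applications that may be selected for that step. Placeholders only.
TEMPLATE: List[Dict[str, Step]] = [
    {"normalise": lambda d: {**d, "normalised": True}},
    {"tool_a": lambda d: {**d, "analysis": "A"},
     "tool_b": lambda d: {**d, "analysis": "B"}},
    {"summary": lambda d: {**d, "report": "analysis by " + d["analysis"]}},
]


def build_workflow(template: List[Dict[str, Step]], choices: List[str]) -> List[Step]:
    """Turn one user choice per slot into an explicit, ordered workflow."""
    if len(choices) != len(template):
        raise ValueError("exactly one choice per slot is required")
    # Raises KeyError if a choice is not offered by its slot.
    return [slot[name] for slot, name in zip(template, choices)]


def run(workflow: List[Step], data: dict) -> dict:
    """Execute the steps in order, feeding each output into the next step."""
    for step in workflow:
        data = step(data)
    return data


workflow = build_workflow(TEMPLATE, ["normalise", "tool_b", "summary"])
print(run(workflow, {"sample": "s1"}))
# {'sample': 's1', 'normalised': True, 'analysis': 'B', 'report': 'analysis by B'}
```

Because the template constrains which applications may fill each slot, users can vary the analysis without being able to assemble an invalid pipeline.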
To achieve a smooth integration of the distributed and heterogeneous resources into the overall laboratory infrastructure, a set of virtualisation services is being developed. These services guarantee access to resources in a consistent, resource-independent, and efficient way and facilitate direct, on-demand interaction with all available biomedical databases, thus enabling collaborative research and workflow execution.
To meet the specific requirements for exchanging confidential biomedical data sets within such a virtual environment, the solution introduced in ViroLab builds on existing Grid technologies. These form the core of the so-called Data Access Services (DAS) (Assel et al., 2008).
These services implement standard interfaces realised as basic Web Service capabilities. This guarantees easy interoperability with different end-user systems and allows various user groups, such as researchers and medical doctors, to access the distributed data in a user-friendly way. Furthermore, the DAS allow several types of data resources to be integrated and exposed within the virtual infrastructure. Since a single central entry point acts as the only “visible” and accessible system, users are unaware that they are dealing with a federation of different data resources rather than a single one.
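The following Python sketch illustrates a single entry point in front of a federation of heterogeneous resources. It is not the DAS implementation, which is built on Grid and Web Service technologies. Here, two toy resources are queried concurrently: an in-memory SQLite table and a semicolon-separated text export, both containing invented patient identifiers and values. Their results are translated into one common record format, and the caller receives a single consolidated list. Security mechanisms are deliberately left out.

```python
import csv
import io
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]


def query_relational(min_load: int) -> List[Record]:
    """Toy resource 1: a relational database (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE samples (pid TEXT, viral_load INTEGER)")
        conn.executemany("INSERT INTO samples VALUES (?, ?)",
                         [("p1", 50000), ("p2", 400)])
        rows = conn.execute(
            "SELECT pid, viral_load FROM samples WHERE viral_load >= ?",
            (min_load,)).fetchall()
    finally:
        conn.close()
    # Translate the local schema into the common record format.
    return [{"patient": pid, "viral_load": load} for pid, load in rows]


CSV_EXPORT = "id;vl\np3;120000\np4;90\n"


def query_flat_file(min_load: int) -> List[Record]:
    """Toy resource 2: a semicolon-separated export with different field names."""
    reader = csv.DictReader(io.StringIO(CSV_EXPORT), delimiter=";")
    return [{"patient": row["id"], "viral_load": int(row["vl"])}
            for row in reader if int(row["vl"]) >= min_load]


RESOURCES: List[Callable[[int], List[Record]]] = [query_relational, query_flat_file]


def federated_query(min_load: int) -> List[Record]:
    """Single entry point: query all resources concurrently, then consolidate."""
    with ThreadPoolExecutor(max_workers=len(RESOURCES)) as pool:
        partial_results = list(pool.map(lambda query: query(min_load), RESOURCES))
    merged = [record for part in partial_results for record in part]
    return sorted(merged, key=lambda record: str(record["patient"]))


print(federated_query(1000))
# [{'patient': 'p1', 'viral_load': 50000}, {'patient': 'p3', 'viral_load': 120000}]
```

Adding a further resource only requires another adapter that returns the common record format. The caller’s view of a single data source is unchanged.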
Consequently, when answering requests for data, the services need to perform several tasks. They must transform and translate heterogeneous data into application-dependent formats, access heterogeneous technologies, consolidate data obtained from several resources simultaneously, and ensure the availability of new/current data, all while observing data confidentiality and ownership. These last two aspects are crucial in an eHealth scenario. Therefore, the DAS are equipped with sophisticated security mechanisms based on established technologies such as Shibboleth and the Grid Security Infrastructure (GSI) provided by the Globus Toolkit (Barton et al., 2006). These mechanisms protect the sensitive data sources and preserve the privacy of individual data sets (patients). Figure 2 depicts the overall data access and integration architecture of ViroLab’s virtual laboratory.
Figure 2: ViroLab’s Data Access and Integration Architecture.
At every data provider site, within the hospital’s security region (behind its firewall), data from the original database(s) are transferred into a private RegaDB installation. RegaDB is an HIV data and analysis management environment (Libin et al., 2007), developed by the Rega Institute of the Katholieke Universiteit Leuven, that enables easy storage and management of biomedical data sets of HIV-treated patients. The transfer is done by exporting data from the original database and converting the extract into the latest RegaDB schema with a custom script. The transfer can be repeated over time at the discretion of the database administrator(s).
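The export-and-convert step can be sketched in Python as follows. The column names of the hypothetical hospital export and the target field names are invented for illustration and do not reflect the actual RegaDB schema. The sketch only shows the typical tasks of such a custom script: mapping columns, normalising formats (here a day.month.year date to ISO 8601), and dropping fields that are not needed.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List

# Hypothetical source column -> target field mapping (NOT the real RegaDB schema).
COLUMN_MAP = {"PatNo": "patient_id", "SampleDate": "sample_date", "VL": "viral_load"}


def convert_row(row: Dict[str, str]) -> Dict[str, object]:
    """Map one exported row to the target format; unmapped columns are dropped."""
    out: Dict[str, object] = {COLUMN_MAP[k]: v.strip()
                              for k, v in row.items() if k in COLUMN_MAP}
    # Normalise a day.month.year date into ISO 8601 (YYYY-MM-DD).
    out["sample_date"] = datetime.strptime(
        str(out["sample_date"]), "%d.%m.%Y").date().isoformat()
    out["viral_load"] = int(str(out["viral_load"]))
    return out


def convert_export(csv_text: str) -> List[Dict[str, object]]:
    """Convert a whole semicolon-separated export into target records."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return [convert_row(row) for row in reader]


export = "PatNo;SampleDate;VL;Ward\n1042;03.11.2007;25000;B2\n"
print(convert_export(export))
# [{'patient_id': '1042', 'sample_date': '2007-11-03', 'viral_load': 25000}]
```

The transfer may be repeated whenever the administrators decide. A production version would therefore likely also need to handle records that were already transferred, for instance by updating existing entries keyed on the patient identifier.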
Data anonymisation can occur either while transferring data from an original database into a private RegaDB or, alternatively, when transferring data from a private RegaDB into a collaborative one.
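The text does not prescribe a particular anonymisation technique. One common option, sketched here in Python under that assumption, has three parts: remove direct identifiers, coarsen quasi-identifiers, and replace the patient identifier with a keyed hash (HMAC-SHA256). Because the key stays with the data provider, the same patient receives the same pseudonym in every transfer, while parties without the key cannot recompute it. The field names and the key are hypothetical.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical site secret. In practice: randomly generated, stored only
# inside the hospital's security region, never shipped with the data.
SITE_KEY = b"replace-with-a-random-secret"

# Hypothetical fields treated as direct identifiers and removed entirely.
DIRECT_IDENTIFIERS = {"name", "address", "insurance_no"}


def pseudonym(patient_id: str, key: bytes = SITE_KEY) -> str:
    """Deterministic keyed pseudonym (first 16 hex digits of HMAC-SHA256)."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymise(record: Dict[str, object]) -> Dict[str, object]:
    """Drop direct identifiers, pseudonymise the ID, coarsen the birth date."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["patient_id"] = pseudonym(str(record["patient_id"]))
    if "birth_date" in out:
        # Coarsen an ISO date (YYYY-MM-DD) to the birth year only.
        out["birth_year"] = str(out.pop("birth_date"))[:4]
    return out


record = {"patient_id": "1042", "name": "Jane Doe",
          "birth_date": "1970-05-01", "viral_load": 25000}
print(anonymise(record))
```

Strictly speaking, this is pseudonymisation rather than full anonymisation, since combinations of the remaining attributes may still allow re-identification. Excluding irrelevant information, as listed among the requirements, therefore remains important.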
There are two alternative scenarios for contributing data to the ViroLab virtual laboratory, both involving the upload of data from a private RegaDB into a collaborative RegaDB. The main difference between them is the physical location of the collaborative RegaDB. In the first scenario, data providers host their own collaborative RegaDB installation within a trusted region outside their institutional firewall (within their Demilitarised Zone, DMZ). In the second, they use one of the “centrally managed” collaborative RegaDBs hosted by trusted third parties, accessed via a secure connection.
Currently, both scenarios are within the scope of ViroLab. The policies of some hospitals prohibit the installation of additional server machines and/or software components within their administered networks. Such hospitals can contribute data to the project’s workspace only via the second option.
5 PRELIMINARY RESULTS
At present, both projects are in their implementation phases, developing and realising the approaches described above. CoSpaces has just finalised the conceptual design of its entire data management infrastructure and recently started working on a first prototype, to be available by the end of this year. ViroLab, in contrast, has already released a first version of the Virtual Laboratory that supports data access to, and integration of, the heterogeneous resources in a limited way. These resources can be treated as a federated data space, which can be queried by submitting multiple concurrent requests to gather any kind of biomedical data set, even though the underlying data sets still reside in an inhomogeneous state. Basic security features are also in place, including the encryption of transferred data messages and user authorisation based on Shibboleth’s authentication and authorisation infrastructure (AAI). These data requests are used within several pre-defined experiments that allow virologists as well as clinicians to estimate the possible drug resistance for a particular virus mutation. A more detailed description of these experiments can be found in (Gubala et al., 2008). Future releases of ViroLab’s Data Access Services will improve the services’ performance and scalability. They will also ease interaction with the services through a dedicated query language based on natural-language terms instead of plain SQL statements.
6 CONCLUSIONS
Dynamic (business) collaboration, a promising field of interdisciplinary cooperation, will provide new working environments that ease cross-organisational data exchange and communication. It has attracted worldwide attention, and several international research projects have already designed and implemented first prototypes of appropriate infrastructures.
In practice, however, today’s systems often remain tied to static environments. They lack flexible and dynamic solutions that enable ad-hoc collaborations with on-demand application and data sharing.
Future developments need to take service and resource virtualisation more into account. Virtualisation can hide the complexity of the underlying technologies from users/customers. It can also allow on-the-fly modifications of internal interfaces and/or enhancements of existing functionality while maintaining the dynamicity, scalability, and performance of the available services/resources. To make these environments suitable for daily business processes, current systems need to be enhanced in two ways. They need reliable models and tools for monitoring user activities, including data requests and service invocations. They also need sophisticated mechanisms for pricing and accounting the corresponding user interactions according to current market fluctuations or unexpectedly changing user requirements.
In this paper, we have presented the approaches of two ongoing European research projects, CoSpaces and ViroLab, which address the complex problem of building dynamic infrastructures for secure data management and data integration. We have identified key features and requirements that should be considered during the design and development of such environments. These may lead to better, easier-to-handle solutions, not only for research but also for the next generation of intercontinental business collaborations.
ACKNOWLEDGEMENTS
The work presented in this paper is partially funded by the European Commission under contract IST-5-034245 through the CoSpaces project and through the ViroLab Project Grant 027446. The authors would like to thank everyone who contributed to this paper, especially the members of both consortia.
REFERENCES
Assel, M. and Kipp, A. (2007). A secure infrastructure for dynamic collaborative working environments. In Proceedings of the 2007 International Conference on Grid Computing & Applications, GCA 2007, June 25-28 2007, Las Vegas, Nevada, USA. CSREA Press.
Assel, M. and Krammer, B. (2007). Towards innovative healthcare grid solutions: ViroLab - a virtual laboratory for infectious diseases. In Proceedings of the German e-Science Conference 2007, May 02-04 2007, Baden-Baden, Germany. Max Planck Digital Library.
Assel, M., Krammer, B., and Loehden, A. (2007). Data access and virtualization within ViroLab. In Proceedings of the 6th Cracow Grid Workshop 2006 (CGW06), Oct. 15-18 2006, Krakow, Poland. ACC-Cyfronet AGH.
Assel, M., Krammer, B., and Loehden, A. (2008). Management and access of biomedical data in a grid environment. In Proceedings of the 7th Cracow Grid Workshop 2007 (CGW07), Oct. 16-18 2007, Krakow, Poland. ACC-Cyfronet AGH.
Barton, T., Basney, J., Freeman, T., Scavo, T., Siebenlist, F., Welch, V., Ananthakrishnan, R., Baker, B., Goode, M., and Keahey, K. (2006). Identity federation and attribute-based authorization through the Globus Toolkit, Shibboleth, GridShib, and MyProxy. In Proceedings of the 5th Annual PKI R&D Workshop, April 04-06 2006, Gaithersburg, USA. NIST Interagency Reports.
Elmroth, E., Gardfjäll, P., Norberg, A., Tordsson, J., and Östberg, P.-O. (2007). Designing general, composable, and middleware independent grid infrastructure tools for multi-tiered job management. In Proceedings of the CoreGrid Symposium, Aug. 28-31 2007, Rennes, France. Springer-Verlag.
Gubala, T., Balis, B., Malawski, M., Kasztelnik, M., Nowakowski, P., Assel, M., Harezlak, D., Bartynski, T., Kocot, J., Ciepiela, E., Krol, D., Wach, J., Pelczar, M., Funika, W., and Bubak, M. (2008). ViroLab virtual laboratory. In Proceedings of the 7th Cracow Grid Workshop 2007 (CGW07), Oct. 16-18 2007, Krakow, Poland. ACC-Cyfronet AGH.
Hasselmeyer, P., Qu, C., Schubert, L., Koller, B., and Wieder, P. (2006). Towards autonomous brokered SLA negotiation. In Proceedings of the eChallenges 2006 (e-2006) Conference, Oct. 25-27 2006, Barcelona, Spain. IOS Press.
Horstmann, T. and Bentley, R. (1997). Distributed authoring on the Web with the BSCW shared workspace system. StandardView, 5(1):9–16.
Libin, P., Deforche, K., Laethem, K. V., Camacho, R., and Vandamme, A.-M. (2007). RegaDB: An open source, community-driven HIV data and analysis management environment. In Proceedings of the Fifth European HIV Drug Resistance Workshop, March 2007, Cascais, Portugal. Reviews in Antiretroviral Therapy.
Schubert, L., Wesner, S., and Dimitrakos, T. (2005). Secure and dynamic virtual organizations for business. In Proceedings of the eChallenges 2005 (e-2005) Conference, Oct. 19-21 2005, Ljubljana, Slovenia. IOS Press.
Sloot, P., Boucher, C., Bubak, M., Hoekstra, A., Plaszczak, P., Posthumus, A., van de Vijver, D., Wesner, S., and Tirado-Ramos, A. (2006). ViroLab - a virtual laboratory for decision support in viral diseases treatment. In Proceedings of the 5th Cracow Grid Workshop 2005 (CGW05), Nov. 20-23 2005, Krakow, Poland. ACC-Cyfronet AGH.
Wilson, M., Chadwick, D., Dimitrakos, T., Doser, J., Giambiagi, A. A. P., Golby, D., Geuer-Pollmann, C., Haller, J., Ketil, S., Mahler, T., Martino, L., Parent, X., Ristol, S., Sairamesh, J., and Schubert, L. (2005). The TrustCoM framework v0.5. In Proceedings of the 6th IFIP Working Conference on Virtual Enterprises (PRO-VE ’05), Sep. 26-28 2005, Valencia, Spain. Springer-Verlag.