Enhancing Collaboration in Big Biomedical Data Settings

Knowledge Visualization, Data Mining and Decision Making Issues

Nikos Karacapilidis, Georgia Tsiliki and Manolis Tzagarakis

Computer Technology Institute and Press "Diophantus" and University of Patras, 26504 Rio Patras, Greece

Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece

Keywords: Big Data, Data Mining, Decision Support Systems, Collaboration, Knowledge Visualization, Data and

Information Quality, Situational Awareness, Biomedical Data.

Abstract: Biomedical researchers need to efficiently and effectively collaborate and make decisions by meaningfully

assembling, mining and analyzing available large-scale volumes of complex multi-faceted data residing in

different sources. Arguing that dealing with data-intensive and cognitively complex settings is not a

technical problem alone, this paper reports on the development and practical use of an innovative web-based

collaboration support service in a biomedical research context. The proposed service builds on the synergy

between machine and human intelligence to facilitate and augment the underlying knowledge management,

data mining and decision making processes. Evaluation results indicate that the service enables stakeholders

to make more informed decisions, by displaying the aggregated information according to their needs.

1 INTRODUCTION

Collaboration support technologies and platforms

are crucial in today’s biomedical research settings,

where multidisciplinary communities ranging from

biologists to bioinformaticians need to assimilate

clinico-genomic research information and scientific

findings and explore diverse associated issues (Ekins

et al., 2011). At the same time, biomedical research

is associated with large-scale amounts of multiple

types of data, obtained from diverse and distributed

sources. An ever-increasing volume of biomedical

resources, including multiple types of data sets and

analysis tools, is available on the web. For

instance, recent technology advances in Next

Generation Sequencing (NGS) platforms entail an

exponential increase in the size and number of

experimental data sets available (Quail et al., 2012).

However, in most cases, the raw information is so

overwhelming that researchers are often at a loss to

even know where to begin to make sense of it.

This paper reports on a web-based collaboration

support service that aims to fully cover the diversity

of requirements in contemporary biomedical

research settings by providing a series of innovative

features. Firstly, the service provides advanced

collaboration support functionalities through

innovative virtual workspaces based on alternative

data visualization schemas. Secondly, it is able to

meaningfully accommodate the outcomes of latent

knowledge mining services in a collaboration

session, thus offering added value concerning

recognition of biomedical data patterns. Thirdly, by

supporting emergent semantics and the incremental

formalization of argumentative collaboration, it

augments individual and collective decision making.

By providing ease-of-use and expressiveness for

users and advanced reasoning by the machine, the

service also provides appropriate recommendation

mechanisms that enable stakeholders to project their

future actions in their dynamic working settings.

The proposed platform has been developed in the

context of an FP7 EU research project, namely

Dicode (http://dicode-project.eu/), which exploits

and builds on prominent high-performance

computing paradigms and large data processing

technologies to facilitate and augment collaboration

and decision making in data-intensive and

cognitively-complex settings.

The remainder of this paper is structured as

follows: Section 2 reports on related work and

highlights existing problems and requirements;

Section 3 sketches the overall approach followed in

the Dicode project, putting emphasis on knowledge

visualization, data mining and decision making

issues; Section 4 provides an illustrative example to

demonstrate the use of the proposed service in a

biomedical setting; finally, Section 5 concludes by

discussing related remarks and outlining evaluation

results.

Karacapilidis N., Tsiliki G. and Tzagarakis M.

Enhancing Collaboration in Big Biomedical Data Settings - Knowledge Visualization, Data Mining and Decision Making Issues.

DOI: 10.5220/0004492100230031

In Proceedings of the 2nd International Conference on Data Technologies and Applications (DATA-2013), pages 23-31

ISBN: 978-989-8565-67-9

© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

2 RELATED WORK

Easy visualization and analysis of big biomedical

data is a highly important requirement in the settings

under consideration. Towards fulfilling it, a series of

applications and web services that link together

bioinformatic tools and databases have recently

emerged. For instance, BioGRID (Stark et al., 2006),

BNDB (Birkl and Yona, 2006) and BioMart

(Guberman et al., 2011) are repositories which store

readily combined data sets and provide platforms to

easily visualize such data. Similarly, the

GenePattern platform provides access to more than

180 tools for genomic analysis to enable

reproducible in silico research (http://

www.broadinstitute.org/cancer/software/genepattern

/). In addition, many collaborative resource sharing

networks have been established, e.g. the eagle-i

consortium (https://www.eagle-i.net/), to address the

researchers’ data sharing needs and accelerate the

discovery of new knowledge. Integration of these

separate systems and resources into a single flexible

infrastructure that streamlines heterogeneous

workloads is a challenging task.

At the same time, a number of projects and

initiatives aim at addressing diverse collaboration

requirements in a variety of biomedical contexts. For

instance, GRANATUM (http://granatum.org) tries to

bridge the information, knowledge and collaboration

gap by providing integrated access to the globally

available data resources needed to perform complex

cancer chemoprevention experiments and conduct

studies on large-scale datasets; Health-e-Child

(http://www.health-e-child.org) gives clinicians a

comprehensive view of a child’s health by

integrating biomedical data, information and

knowledge that spans the entire spectrum from

imaging to genetic to clinical and epidemiological

data; Virolab (http://www.virolab.org) offers a

user-friendly environment to facilitate tasks such as data

archiving, data integration, data mining and

simulation; finally, SIMBioMS (http://

simbioms.org) is a multi-module solution for

biomedical data management that is able to

accommodate experiments requiring non-

conventional data storage solutions.

While certainly helpful in addressing specific

biomedical subjects, the above projects and

initiatives do not deal with big data issues;

moreover, they do not exploit the synergy between

human and machine intelligence in order to

meaningfully accommodate and interpret the results

of the associated data mining services through an

environment that facilitates and enhances

collaboration among stakeholders.

As the number of related Web services is

constantly increasing, their proper integration

becomes a critical issue. A few approaches have

already been launched to facilitate collaboration,

data sharing and decision making among scientists

by providing them with a platform to share

resources. A well known example of this category of

related work is myExperiment (Goble et al., 2010),

an online research environment that supports the

social sharing of bioinformatics workflows, i.e.

procedures consisting of a series of computational

tasks, which others can then reuse according to their

specific requirements. Another representative

example is BioCatalogue (http://

www.biocatalogue.org/), which is a registry of web

services that allows users to annotate and comment

on the available services in order to assist them in

identifying the more suitable ones (services are

presented in terms of their functions, data types and

resources). A third example is MethodBox

(https://www.methodbox.org/), which enables

researchers to browse and download data sets, share

methods and scripts, find fellow researchers with

similar interests and share knowledge. Instead of

workflows, MethodBox users share statistical

methods for epidemiology and public health

research. Finally, the Galaxy Project

(http://galaxy.psu.edu/) offers a web-based platform

allowing researchers to perform and share their

analyses. In any case, approaches of this category

demonstrate a set of limitations, mainly concerning

incorporation of collective intelligence and

flexibility in the integration of services offered.

Moreover, they lack mechanisms for a meaningful

integration of data mining services to appropriately

support tasks such as the discovery of patterns and

dependencies within big data sets, which are very

common in the biomedical research domain.

As follows from the above, collaboration in the

biomedical domain involves assembling and

analyzing big volumes of complex multi-faceted

data. In this context, a holistic approach integrating

collaboration, new knowledge co-production,

decision making and data mining services is

required. Biomedical researchers need a highly

flexible service that enables them to easily and

meaningfully embed data mining in their

collaborative data analysis and decision making

DATA2013-2ndInternationalConferenceonDataManagementTechnologiesandApplications

process. This service should first of all be focused

on improving efficiency; it has to improve current

work practices that are often characterized by

fragmentation of information and multiple disparate

tools in use; it also has to improve the quality and

speed of the current processes, paying much

attention to data and decision provenance issues.

Secondly, such a service should be focused on

improving effectiveness, in that it should enable

stakeholders to figure out how to carry out their daily

tasks better; for instance, how to improve their work

methodologies when elaborating and interpreting big

biomedical data residing in diverse sources. Finally,

such a service should enable stakeholders to transform

their work, thus leading to new processes, innovative

work methodologies, and new insights. The solution

described in the next section is geared towards this

direction.

3 COLLABORATION IN DICODE

Dicode provides a novel Web-based collaboration

support service with advanced knowledge

management, data mining and decision making

functionalities. The service enables the seamless

integration of these functionalities and allows their

interoperation from both a technical and conceptual

point of view. In this regard, semantic techniques

have been exploited to define an ontological

framework for capturing and representing the

diverse stakeholder and associated data perspectives.

3.1 Knowledge Visualization Issues

Collaboration in Dicode brings together two

paradigms: the Web 2.0 paradigm, which builds on

flexible rules favouring ease-of-use and human

interpretable semantics, and the traditional decision

support paradigm, which requires rigid rules that

reduce ease-of-use but render machine interpretable

semantics. To achieve this, our approach builds on a

conceptual framework, where formality and the level

of knowledge structuring during collaboration are not

considered predefined and rigid properties, but

rather adaptable aspects that can be modified to

meet the needs of the tasks at hand. By the term

formality, we refer to the rules enforced by the

system, with which all user actions must comply.

Allowing formality to vary within the collaboration

space makes incremental formalization possible, i.e. a stepwise and

controlled evolution from a mere collection of

individual ideas and resources to the production of

highly contextualized and interrelated knowledge

artifacts and finally decisions

(Shipman and McCall, 1994).

Dicode offers alternative visualizations of the

collaboration space (called ‘Dicode views’), which

comply with the incremental formalization concept.

Each Dicode view provides the necessary

mechanisms to support a particular level of

formality. The more informal a view is, the easier

it is to use: the

actions that users may perform are intuitive and not

time consuming; however, the overall context is

human (and not system) interpretable. On the other

hand, the more formal a view is, the harder

it is to use: the actions permitted are

less intuitive and more time consuming. The

overall context in this case is both human and

system interpretable (Karacapilidis and Tzagarakis,

2012). The views that are particularly interesting in

the context of this paper are:

• Mind-map View: a collaboration space is

displayed as a mind map (Figure 1), where

users can interact with the items uploaded so

far. The map deploys a spatial metaphor

permitting the easy movement and arrangement

of items on the collaboration space. The aim of

this view is to support information triage

(Marshall and Shipman, 1997), i.e. the process

of sorting through numerous

relevant materials and organizing them to meet

the task at hand.

• Formal View: this view enables the posting of

predefined knowledge items, which adhere to a

specific argumentation model (i.e., IBIS (Kunz

and Rittel, 1970)). It invokes a set of dedicated

scoring and reasoning mechanisms aiming to

help users grasp the outcome of a particular

collaborative session and receive support

towards reaching a decision (Figure 2).

In the ‘mind-map view’ of the collaboration

space, stakeholders may organize their collaboration

through dedicated item types such as ‘ideas’, ‘notes’,

‘comments’ and ‘services’. Ideas stand for items that

deserve further exploitation; they may correspond to

an alternative solution to the issue under

consideration and they usually trigger the evolution

of the collaboration. Notes are generally considered

as items expressing one’s knowledge about the

overall issue, an already asserted idea or note.

Comments are items that usually express weaker

statements and are uploaded to provide some

explanatory text or point to some potentially useful

information. Multimedia resources can also be

uploaded into the mind-map view (the content of

which can be displayed upon request or can be

EnhancingCollaborationinBigBiomedicalDataSettings-KnowledgeVisualization,DataMiningandDecisionMaking

Issues

directly embedded in the workspace).

A detailed description of the knowledge

visualization related technologies can be found in

(Karacapilidis et al., 2011).

3.2 Data Mining Issues

In the ‘mind-map view’, service items enable users

to configure, launch and monitor the execution of

external data mining services from within the

collaboration workspace, and allow the automatic

upload of their results into the workspace when the

execution of these services terminates.

As with any other item type, when a service item is

uploaded into the collaboration workspace, users may

specify a title and a content; the latter allows a

longer description to be attached to the item.

Once uploaded, users may configure the item and

specify which data mining service it corresponds to.

The set of available data mining services with which

the service item can be associated is stored centrally

at a dedicated registry (the exploitation of a specific

data mining service, namely Subgroup Discovery, is

discussed in Section 4). Once service items on the

collaboration workspace have been configured, they

can be executed by supplying the required

parameters to the data mining service. Visual cues

indicate the status of service items: whether they

have yet to be executed, are currently executing or

have finished their execution.
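
As an illustration only, the lifecycle of a service item described above (configuration against the registry, execution, and the three visual states) can be sketched as follows. The class, method and parameter names (ServiceItem, configure, start, finish, input_file) and the registry contents are assumptions made for this sketch; they are not taken from the Dicode implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    """The three states signalled by visual cues in the workspace."""
    PENDING = "not yet executed"
    RUNNING = "currently executing"
    FINISHED = "finished"


@dataclass
class ServiceItem:
    title: str
    content: str = ""                       # optional longer description
    service_name: Optional[str] = None      # set when the item is configured
    parameters: dict = field(default_factory=dict)
    status: Status = Status.PENDING
    results: list = field(default_factory=list)

    def configure(self, service_name, registry, **parameters):
        # Only services listed in the central registry can be attached.
        if service_name not in registry:
            raise ValueError(f"unknown service: {service_name}")
        self.service_name = service_name
        self.parameters = dict(parameters)

    def start(self):
        if self.service_name is None:
            raise RuntimeError("configure the item before executing it")
        self.status = Status.RUNNING

    def finish(self, result_files):
        # Results (one or more files) are attached to the producing item.
        self.results = list(result_files)
        self.status = Status.FINISHED


# Hypothetical registry and parameter values, for illustration only.
registry = {"subgroup-discovery"}
item = ServiceItem(title="Apply data mining")
item.configure("subgroup-discovery", registry, input_file="data.csv")
item.start()
item.finish(["subgroups.txt"])
print(item.status.value)  # finished
```

Keeping the status as an explicit enumeration mirrors the way the workspace exposes exactly one of three states per service item.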

Service items appearing on collaboration

workspaces can be explicitly related to other items

in the workspace via relationships or can be spatially

arranged and grouped. When the execution of a

service terminates, the results – consisting of one or

more files – are automatically uploaded into the

collaboration workspace and explicitly connected to

the service item whose execution produced them.

Once the results of service items are available

(i.e., uploaded into the collaboration workspace),

users can treat them as regular items. Furthermore,

users may rate service items in order to indicate

which service they consider as the most useful one

in the context of the discourse.

More details on the data mining technologies

exploited in our approach can be found in (Tsiliki et

al., 2012).

3.3 Decision Making Issues

In the ‘formal view’ of the collaboration space, the

available knowledge item types include ‘issues’,

‘alternatives’, ‘positions’, and ‘preferences’. Issues

correspond to problems to be solved, decisions to be

made, or goals to be achieved. For each issue, users

may propose alternatives (i.e. solutions to the

problem under consideration) that correspond to

potential choices. Positions are asserted in order to

support the selection of a specific course of action

(alternative), or avert the users’ interest from it by

expressing some objection. A position may also

refer to another (previously asserted) position, thus

arguing in favour or against it.

Finally, preferences provide individuals with a

qualitative way to weigh reasons for and against the

selection of a certain course of action. A preference

is a tuple of the form [position, relation, position],

where the relation can be “more important than” or

“of equal importance to” or “less important than”.

The use of preferences results in the assignment of

various levels of importance to the alternatives in

hand. Like the other discourse elements, they are

subject to further argumentative discourse.
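
The preference tuple [position, relation, position] described above can be represented directly as a small data structure. The following is a minimal sketch; the type names (Position, Relation, Preference), the in_favour field and the example texts are assumptions introduced here for illustration and are not part of the Dicode service.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The three relations a preference may use."""
    MORE_IMPORTANT = "more important than"
    EQUAL = "of equal importance to"
    LESS_IMPORTANT = "less important than"


@dataclass(frozen=True)
class Position:
    text: str
    in_favour: bool  # True: supports an alternative; False: objects to it


@dataclass(frozen=True)
class Preference:
    """A tuple of the form [position, relation, position]."""
    left: Position
    relation: Relation
    right: Position

    def describe(self):
        return f'"{self.left.text}" is {self.relation.value} "{self.right.text}"'


# Hypothetical positions, for illustration only.
p1 = Position("larger sample size", in_favour=True)
p2 = Position("additional preprocessing effort", in_favour=False)
pref = Preference(p1, Relation.MORE_IMPORTANT, p2)
print(pref.describe())
# "larger sample size" is more important than "additional preprocessing effort"
```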

The above four semantic types of items enable

users to contribute their knowledge on the particular

problem or need (by entering issues, alternatives and

positions), as well as to express their relevant values,

interests and expectations (by entering positions and

preferences). Moreover, this view continuously

processes the elements entered by the users (by

triggering its reasoning mechanisms each time a new

element is entered), thus helping users to become

aware of the elements for which there is (or there is

not) sufficient (positive or negative) evidence, and

accordingly conduct the discussion in order to reach

consensus.

Alternatives, positions and preferences have an

activation label indicating their current status (they

can be active or inactive). This label is calculated

according to the argumentation underneath and the

type of evidence specified for them. Active elements

are taken into account in a scoring mechanism that

calculates the weight of each alternative expressed

and indicates the one that prevails each time (for

more details on the decision making algorithms

used, see (Karacapilidis and Papadias, 2001)).
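
To convey the general idea of scoring over active elements, the sketch below uses a deliberately simplified stand-in: each active position in favour of an alternative adds one point, each active position against it subtracts one, and inactive positions are ignored. This is an assumption made purely for illustration; it does not reproduce the algorithms of (Karacapilidis and Papadias, 2001), and it omits preferences altogether.

```python
def score(positions):
    """Net support of one alternative.

    `positions` is a list of (in_favour, active) pairs; only active
    positions contribute (+1 if in favour, -1 if against).
    """
    total = 0
    for in_favour, active in positions:
        if active:
            total += 1 if in_favour else -1
    return total


def prevailing(alternatives):
    """Name of the alternative with the highest net support.

    On a tie, the alternative listed first is returned.
    """
    return max(alternatives, key=lambda name: score(alternatives[name]))


# Hypothetical discourse state, for illustration only.
alternatives = {
    "alternative A": [(True, True), (True, True), (False, False)],
    "alternative B": [(True, True), (False, True)],
}
print(score(alternatives["alternative A"]))  # 2
print(score(alternatives["alternative B"]))  # 0
print(prevailing(alternatives))              # alternative A
```

Re-running `prevailing` after every new element corresponds to the view triggering its reasoning mechanisms each time an item is entered.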

Dicode collaboration spaces can be transformed

at any time from one view into another. During such

transformations, the semantically enriched item

types available in one view are transformed into the

respective item types of the desired destination view.

The transformation is rule-based; such rules can be

defined by users participating in a collaboration

session and reflect the evolution of a community’s

collaboration needs. After a transformation into the

desired view occurs, the collaboration may continue

in this view, with the users being able to exploit the

item types available in order to keep conducting the

DATA2013-2ndInternationalConferenceonDataManagementTechnologiesandApplications

discourse at the desired formality level and take

advantage of the provided functionality.
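
A rule-based transformation between views can be thought of as a mapping from item types of the source view to item types of the destination view. The sketch below assumes one possible rule set (idea to alternative, note and comment to position); the actual Dicode rules are not given in this paper, so the mapping, the dictionary layout and the function name are hypothetical.

```python
# Hypothetical rule set: mind-map item type -> formal-view item type.
MINDMAP_TO_FORMAL = {
    "idea": "alternative",
    "note": "position",
    "comment": "position",
}


def transform(items, rules):
    """Return copies of `items` with their type rewritten by `rules`.

    Items whose type has no rule raise an error, so that nothing is
    silently dropped during the transformation.
    """
    transformed = []
    for item in items:
        if item["type"] not in rules:
            raise KeyError(f"no transformation rule for type {item['type']!r}")
        transformed.append({**item, "type": rules[item["type"]]})
    return transformed


workspace = [
    {"type": "idea", "title": "First proposal"},
    {"type": "comment", "title": "Remark on the first proposal"},
]
print(transform(workspace, MINDMAP_TO_FORMAL))
```

Because the rules are plain data, users could in principle edit them during a session, in line with the user-defined rules mentioned above.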

4 AN EXAMPLE OF USE

To better illustrate the use of the proposed Web-

based collaboration support service and in particular

how the available functionalities can be used in the

biomedical context, we present a scenario which is

indicative of the way researchers collaborate in this

field, where research is carried out by

multidisciplinary teams consisting of biologists,

medical doctors, clinical researchers and

statisticians, each of whom contributes from his/her

perspective to the problem being discussed.

Figure 1 shows the collaboration workspace

operated in the ‘mind-map view’, where a team of

three researchers is discussing an issue related to

breast cancer research. In particular, they are

collaborating in order to determine how to augment

existing datasets in order to study how Tamoxifen

(Tam) resistant cells modulate global gene

expression.

Tam is a widely used antagonist of the estrogen

receptor, and resistance to it is a well-known

obstacle to successful breast cancer treatment

(Huber-Keener et al., 2012). While adjuvant therapy

with Tam has been shown to significantly decrease

the rate of disease recurrence and mortality,

recurrent disease occurs in one third of patients

treated with Tam within 5 years of therapy. The

team initially selected and analyzed gene-expression

data from 300 patient samples. These data are

derived from whole human genome expression

arrays (Affy U133A Plus 2.0, see

http://www.affymetrix.com). Although the sample is

relatively large, they believe that augmenting the

data with publicly available data will be a good idea

for obtaining statistically significant results.

All participating researchers may upload into the

collaboration workspace items to express their

opinion on the issue being discussed. In the instance

shown in Figure 1, they have uploaded items of type

‘idea’ to propose additional data sets (“Consider also

Next Generation Sequencing (NGS) data” (Figure 1

- (a)) and “Work with gene-expression data” (Figure

1 - (b))). Participants have responded to the

proposed alternatives (ideas) by uploading items and

connecting them via arrows to other items to which

they refer. Participants may also change an arrow’s

colour to indicate the semantics of the relationship:

green-coloured arrows express arguments in favour,

red-coloured arrows express arguments against,

while grey-coloured arrows indicate neutrality.

Figure 1: Workspace (‘mind-map view’) showing collaboration between biomedical researchers. Service items (d) and (e) have finished their execution and are associated with other collaboration items.

Furthermore, they can aggregate items on the

workspace by drawing coloured rectangles around

them and give a title to the groupings. For example,

the orange-coloured rectangle with title

“Supplementary information” (Figure 1 - (c)) groups

together bibliographic resources that the team has

obtained from external repositories and are relevant

to their research.

As the discussion evolves, the team thinks about

exploiting the Subgroup Discovery (SD) data mining

algorithm (Atzmueller et al., 2005) using both data

sets as input. SD is the task of finding patterns that

describe subsets of a data set that are highly

correlated relative to a target attribute. This is a

popular approach for identifying interesting patterns

in the data, since it combines a sound statistical

methodology with an understandable representation

of patterns. For example, in a group of patients that

did or did not respond to specific treatment, an

interesting subgroup may be that patients who are

older than 60 years and do not suffer from high

blood pressure respond much better to the treatment

than the average.
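
The patient example above can be made concrete with a small calculation. The sketch below scores a candidate subgroup with weighted relative accuracy (WRAcc), a quality function commonly used in the SD literature: the subgroup's coverage multiplied by the difference between its positive rate and the overall positive rate. Which quality function the Dicode SD service uses is not stated here, and the patient records are invented purely for illustration.

```python
def wracc(records, in_subgroup, is_positive):
    """Weighted relative accuracy of a subgroup.

    WRAcc = (n_S / N) * (p_S - p_0), where n_S / N is the subgroup's
    coverage, p_S its share of positives and p_0 the overall share.
    """
    total = len(records)
    members = [r for r in records if in_subgroup(r)]
    if total == 0 or not members:
        return 0.0
    overall_rate = sum(1 for r in records if is_positive(r)) / total
    subgroup_rate = sum(1 for r in members if is_positive(r)) / len(members)
    coverage = len(members) / total
    return coverage * (subgroup_rate - overall_rate)


# Invented toy records (not study data).
patients = [
    {"age": 65, "high_bp": False, "responded": True},
    {"age": 70, "high_bp": False, "responded": True},
    {"age": 62, "high_bp": False, "responded": True},
    {"age": 68, "high_bp": True, "responded": False},
    {"age": 45, "high_bp": False, "responded": False},
    {"age": 50, "high_bp": True, "responded": False},
    {"age": 55, "high_bp": False, "responded": True},
    {"age": 72, "high_bp": False, "responded": False},
]

quality = wracc(
    patients,
    in_subgroup=lambda p: p["age"] > 60 and not p["high_bp"],
    is_positive=lambda p: p["responded"],
)
print(quality)  # 0.125
```

Here the subgroup covers half of the records and responds at a rate of 0.75 against an overall rate of 0.5, giving 0.5 × 0.25 = 0.125; a positive value marks the subgroup as responding better than average.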

To invoke the SD algorithm on the NGS data,

they upload the associated service item into the

workspace (Figure 1 - (e)) and start configuring the

service. Configuring the service includes the

specification of the URI for the REST-based SD

service and specification of parameters such as input

file, number of rules to be used, service ontology,

and minimum number of subgroups to be retrieved

(more details can be found in (Tsiliki et al., 2012)).

After configuring the service, they trigger its

execution. As long as the SD service is executing,

the icon representing the service appears with a

green colour. Upon successful termination of the SD

service, the icon changes its colour to orange and the

results are automatically uploaded into the

collaboration workspace (Figure 1 - (f)). To clearly

indicate the execution of the SD service on the NGS

data and the results it returned, the team

groups together the relevant items and supplies a

descriptive title (“Apply data mining to NGS data”).

The team can now assess the output of the SD

execution by commenting on the results and

connecting them to other items in the collaboration

workspace (Figure 1 - (g)).

The team can follow the same procedure

(invoking the SD service and collectively assessing

its output) for the gene expression data. The three

researchers are able to carefully examine the

commonalities between the two SD runs (on gene

expression and NGS data) and share their insights.

As the collaboration continues and more items

are added to the collaboration space, the team

decides to switch to a different view, in order to

reach a decision. For this, they decide to transform

the ‘mind-map view’ into the ‘formal view’, which

provides elaborated scoring and reasoning

mechanisms that further facilitate the decision

making process. When the ‘mind-map

view’ of the collaboration workspace is transformed, all available

semantic types – including the service items – are

converted into the appropriate types of the ‘formal

view’, based on well-specified rules.

Figure 2: The ‘formal view’ of the collaboration illustrated in Figure 1.

Figure 2 shows the collaboration space in the

‘formal view’. As noted above, the team can

continue the collaboration in this view by adding

more items (each time a new item is added, the

reasoning mechanism is triggered). Furthermore, this view

lets the team see which alternative solution is best supported by

arguments (the ‘winning’ solution) by

highlighting it using visual cues. Based on the

current state of the collaboration, the currently

‘winning’ solution is the alternative “Consider also

NGS data” (item in blue underlined font colour in

Figure 2).

5 DISCUSSION & CONCLUSIONS

5.1 Evaluation Issues

Dicode has already been introduced in three real-life

settings (i.e. the biomedical research assimilator,

decision making on clinical treatment effects, and

opinion mining from unstructured Web 2.0 data) for

a series of pilot experimentations. For the setting

considered in this paper, 61 users from 4 European

countries participated in a detailed evaluation of the

proposed service. The above users had a varying

level of hands-on experience in related technologies

(ranging from ‘early adopters’ to semi-experienced

and novice users); their background was in

disciplines such as Bioinformatics, Biology and

Computer Science. Both quantitative and qualitative

feedback was requested. Answers to the

quantitative questions of the questionnaires were

given as ordinal data on a 1-5 scale (questions

concerning the quality, acceptability and

accessibility of the service), where 1 stands for ‘I

strongly disagree’ and 5 for ‘I strongly agree’, and

as continuous numerical data (scale data) on a 0-10

scale (questions concerning the service’s usability),

where 0 stands for ‘none’ and 10 for ‘excellent’.

As far as the overall quality of the proposed

collaboration support service is concerned (Table 1),

the evaluators agreed that: the objectives of the

service are met (median=4, mode=3), the service is

novel to their knowledge (median=4, mode=4), they

are satisfied with the performance of the service

(median=4, mode=4), and they are overall satisfied

with this service (median=4, mode=4). The

evaluators seemed to be to some extent sceptical as

to whether the service is able to address the data

intensive decision making issues (median=3,

mode=3).

Table 1: Overall Quality Descriptive Statistics for the Dicode Collaboration Support Service.

Question | Median | Median interpretation | Mode | Mode interpretation
Q1: The service is able to address data intensive decision making issues | 3 | neutral | 3 | neutral
Evaluator confidence on | 3 | high | 2 | medium
Q2: The objectives of the service are met | 4 | agree | 3 | neutral
Evaluator confidence on | 3 | high | 3 | high
Q3: The service is novel to my knowledge | 4 | agree | 4 | agree
Evaluator confidence on | 3 | high | 3 | high
Q4: I am satisfied with the performance of the service | 4 | agree | 4 | agree
Evaluator confidence on | 3 | high | 3 | high
Q5: Overall, I am satisfied with this service | 4 | agree | 4 | agree
Evaluator confidence on | 3 | high | 3 | high

With respect to the acceptability of the service,

the evaluators overall agreed that the service has all

the functionality they expected (median=4,

mode=3), the interface of the service is pleasant

(median=4, mode=4) and that they will recommend

this service to their peers/community (median=4,

mode=3).
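
Several of the items above report a median of 4 together with a mode of 3. For ordinal answers on a 1-5 scale this is not contradictory: the mode is the single most frequent answer, while the median is the middle answer once all responses are sorted. The short sketch below shows this with invented responses (not the study's data), using Python's standard statistics module.

```python
from statistics import median, mode

# Invented responses on the 1-5 agreement scale, for illustration only.
responses = [3, 3, 3, 4, 4, 5, 5]

print(median(responses))  # 4  (the middle of the seven sorted answers)
print(mode(responses))    # 3  (the most frequent single answer)
```

In such a case most respondents agree or strongly agree, yet ‘neutral’ remains the most common individual answer.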

The analysis of qualitative evaluation results

showed that, overall, reviewers found the service

“promising”, “easy and intuitive”, as well as “very

useful for a complex use case”. However, a few

technical and documentation issues were raised,

such as: “A bit slow loading time both for the

workspace list and the mind-map view”; “The

arrows’ graphics were not very pleasant for me: they

start from the middle of the icon and not from the

beginning of the square ... the overall idea however,

is quite good”; “I got a bit confused until I fully

understand what I had to do”; “I often missed some

system information”.

Such findings reveal the need for more detailed

documentation of the service, as well as for

provision of help files and system messages.

EnhancingCollaborationinBigBiomedicalDataSettings-KnowledgeVisualization,DataMiningandDecisionMaking

Issues

5.2 Final Remarks

The service described in this paper offers an

innovative environment that allows users to “immerse”

themselves in Web 2.0 interaction paradigms and exploit their

enormous potential to collaborate through

reviewing, commenting on and extending the shared

content. The Dicode environment enables

stakeholders to maintain chains of views and opinions,

accompanied by the supporting data, which may

reflect, at any time, the current collective knowledge

on the issue under consideration, and justify a

particular decision made or action taken.

The proposed service may fully cover the needs

of the three stages of situational awareness needed in

the above settings (Haendel et al., 2012; Kahn,

2011), namely perception (i.e. perceive the status,

attributes, and dynamics of relevant elements in the

setting under consideration), comprehension (i.e.

perform a synthesis of disjointed elements of the

previous stage through the processes of pattern

recognition, interpretation, and evaluation), and

projection (i.e. extrapolate information from

previous stages to find out how it will affect future

instances of the operational setting) (Endsley, 1995).

Moreover, the development of the proposed service

has adopted an agile, analytic and adaptive approach

that enables stakeholders to fully leverage and reap

the benefits of the associated biomedical “big data”.

Such an approach can improve the quality and

effectiveness of decisions in the context under

consideration.

The service described in this paper has been

integrated in the Dicode workbench environment (de

la Calle et al., 2012), which is a web-based

application that integrates - at the level of the user

interface - various data mining and collaboration

support services. The objective is to provide users

with uniform and easy access to the available

Dicode services. The type and number of services

appearing on the Dicode workbench can be easily

configured by end users according to the needs of

the particular context and problem under

consideration. In this way, current work practices

have been improved in terms of

efficiency and effectiveness. The issue of

information fragmentation as well as that of data and

decision provenance are properly addressed.

Moreover, by providing users with useful hints, our

approach helps stakeholders figure out how to

carry out their daily tasks in a more effective way.

Finally, the proposed service enables stakeholders to

follow and adopt innovative work methodologies,

which build on the synergy of human and machine

reasoning.

Future work directions include investigation of

additional services for data-intensive computing

(e.g. services already developed in projects such as

ADMIRE - http://www.admire-project.eu),

considering whether they can be integrated in the

Dicode environment. We also plan a thorough investigation

of the Dataspace concept and the related data

management abstraction (Halevy et al., 2006),

considering its suitability for the purposes of our

approach.

ACKNOWLEDGEMENTS

This publication has been produced in the context of

the EU Collaborative Project “DICODE - Mastering

Data-Intensive Collaboration and Decision”, which

is co-funded by the European Commission under the

contract FP7-ICT-257184. This publication reflects

only the authors’ views and the Community is not

liable for any use that may be made of the

information contained therein.

REFERENCES

Atzmueller, M., Puppe, F. and Buscher, H.P. 2005.

Exploiting background knowledge for knowledge-

intensive subgroup discovery. In Proceedings of

IJCAI’05, 647-652.

Birkland, A. and Yona, G. 2006. BIOZON: a hub of

heterogeneous biological data. Nucleic Acids

Research, 34:D235–D242.

de la Calle, G., Alonso-Martinez, E., Tzagarakis, M. and

Karacapilidis, N. 2012. The Dicode Workbench: A

Flexible Framework for the Integration of Information

and Web Services. In Proceedings of the 14th

International Conference on Information Integration

and Web-based Applications & Services (iiWAS2012),

Bali, Indonesia, December 3-5, 2012, pp. 16-25.

Ekins, S., Hupcey, M., Williams, A. (Eds), 2011.

Collaborative Computational Technologies for

Biomedical Research, John Wiley & Sons, Inc.

Endsley, M. R., 1995. Toward a theory of situation

awareness in dynamic systems. Human Factors 37(1),

32–64.

Guberman, J. M., Ai, J., Arnaiz, O., Baran, J., Blake, A.,

Baldock, R., Chelala, C., Croft, D., Cros, A., Cutts, R.

J. et al. 2011. Biomart central portal: an open database

network for the biological community. Database,

2011:bar041.

Goble, C. A., Bhagat, J., Aleksejevs, S., Cruickshank, D.,

Michaelides, D., Newman, D., Borkum, M.,

Bechhofer, S., Roos, M., Li, P. and De Roure, D.

2010. myExperiment: a repository and social network

DATA2013-2ndInternationalConferenceonDataManagementTechnologiesandApplications

for the sharing of bioinformatics workflows. Nucleic

Acids Research, 38:W677–W682.

Haendel, M. A., Vasilevsky, N. A., Wirz, J. A. 2012.

Dealing with Data: A Case Study on Information and

Data Management Literacy. PLoS Biol. 10, 5,

e1001339.

Halevy, A., Franklin, M. and Maier, D. 2006. Principles of

Dataspace Systems. In Proceedings of the 25th ACM

SIGMOD-SIGACT-SIGART Symposium on Principles

of Database Systems (PODS 2006), Chicago, IL, USA,

pp. 1-9.

Huber-Keener, K. J., Liu, X., Wang, Z. et al. 2012.

Differential Gene Expression in Tamoxifen-Resistant

Breast Cancer Cells Revealed by a New Analytical

Model of RNA-Seq Data. PLoS ONE. 7, 7, e41333.

DOI: 10.1371/journal.pone.0041333

Kahn, S. D. 2011. On the future of genomic data. Science.

331, 728, DOI: 10.1126/science.1197891

Karacapilidis, N. and Papadias, D. 2001. Computer

Supported Argumentation and Collaborative Decision

Making: The HERMES system. Information Systems,

26(4), 259-277.

Karacapilidis, N., Karousos, N., Tzagarakis, M. and

Christodoulou, S. 2011. Mitigating the cognitive

overload of contemporary argumentation-based

collaboration settings. In Proceedings of the 7th

International Conference on Collaborative

Computing: Networking, Applications and

Worksharing (CollaborateCom 2011), Orlando, FL,

USA, October 15-18, 2011, pp. 516-519.

Karacapilidis, N. and Tzagarakis, M. 2012. Towards a

Seamless Integration of Human and Machine

Reasoning in Data-Intensive Collaborative Decision

Making Settings: The Dicode Approach. In Proc. of

the 16th IFIP WG8.3 International Conference on

Decision Support Systems (DSS 2012), IOS Press,

Amsterdam, 223-228.

Kunz, W. and Rittel, H. W. J. 1970. Issues as elements of

information systems. Working Paper 131, Institute of

Urban and Regional Development, University of

California.

Marshall, C. and Shipman, F. 1997. Spatial hypertext and

the practice of information triage. In Proc. of the 8th

ACM Conference on Hypertext, Southampton, UK,

124–133.

Quail, M. A., Smith, M., Coupland, P., Otto, T. D., Harris,

S. R., Connor, T. R., Bertoni, A., Swerdlow, H. P., Gu,

Y. 2012. A tale of three next generation sequencing

platforms: comparison of Ion Torrent, Pacific

Biosciences and Illumina MiSeq sequencers. BMC

Genomics 13, 341, DOI: 10.1186/1471-2164-13-341.

Shipman, F.M. and McCall, R. 1994. Supporting

knowledge-base evolution with incremental

formalization. In Proc. CHI 94 Conference, 285–291.

Stark, C., Breitkreutz, B. J., Reguly, T., Boucher, L.,

Breitkreutz, A. and Tyers, M. 2006. BioGRID: a

general repository for interaction datasets. Nucleic

Acids Research, 34:D535–D539.

Tsiliki, G., Kossida, S., Friesen, N., Rüping, S.,

Tzagarakis, M. and Karacapilidis, N. 2012. Data

mining based collaborative analysis of microarray

data. In Proceedings of the 24th IEEE International

Conference on Tools with Artificial Intelligence

(ICTAI 2012), Athens, Greece, November 7-9, 2012,

pp. 682-689.

EnhancingCollaborationinBigBiomedicalDataSettings-KnowledgeVisualization,DataMiningandDecisionMaking

Issues