Exploring Data Fusion under the Image Retrieval Domain

Nadia P. Kozievitch, Carmem Satie Hara, Jaqueline Nande and Ricardo da S. Torres

Dept. of Informatics, Federal University of Technology, Curitiba, Brazil

Dept. of Informatics, Federal University of Paraná, Curitiba, Brazil

Institute of Computing, University of Campinas, Campinas, Brazil

Keywords:

Content-based Image Retrieval, Data Fusion, XML, Metadata.

Abstract:

Advanced services in data compression, data storage, and data transmission have been developed and are widely used to address the requirements of an assortment of systems across diverse application domains. In order to reuse, integrate, unify, manage, and support heterogeneous resources, a number of works and concepts have emerged with the aim of facilitating the aggregation of content and helping system developers. In particular, images, along with existing Content-Based Image Retrieval services, have the potential to play a key role in information systems, due to the wide availability of images and the need to integrate them with existing collections, metadata, and available image manipulation software and applications. In this work, we explore a data fusion approach for solving data value conflicts in the image retrieval domain. In particular, we target the process of solving the value conflicts that arise when integrating the data produced by the Content-Based Image Retrieval process using different features with image metadata provided by a number of sources and applications. Our approach reduces the need for human intervention in keeping a clean and integrated view of an image repository when new data sources are added to an image management system.

1 INTRODUCTION

The processes of organizing new information and integrating it with data acquired from external sources are usually very time-consuming tasks. Users involved in managing these data are often looking for ways to improve their productivity, and it is thus important to provide them with effective tools to reuse and aggregate content.

Motivated by the need for integration and interoperability, a number of works have proposed combining different pieces of information into a single logical object. The resulting object has been denoted as Aggregation (Williams and Suleman, 2003), Component-Based Object (Santanchè and Medeiros, 2007; Santanchè et al., 2007), Complex Object (Nelson et al., 2001), and Compound Object (Awre, 2009). In particular, images are a representative example of a data source that is generally integrated and combined with different components, such as metadata, links, videos, and image manipulation software and applications.

One common strategy used to support image searches in large datasets is Content-Based Image Retrieval (CBIR) (Torres et al., 2006). Roughly, the process can be divided into three steps: (1) feature vectors that represent image visual properties (such as color, texture, and shape) are extracted; (2) the similarity between images is computed based on the distance between their feature vectors; and (3) the most similar collection images are returned as the search result.
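The three steps above can be sketched in a few lines of Python. This is a toy illustration, not one of the descriptors used in this paper: images are assumed to be flat lists of gray levels, the feature vector is a hypothetical four-bin gray-level histogram, and distances are Euclidean; `extract`, `distance`, and `search` are illustrative names.

```python
import math

# Hypothetical toy "image": a flat list of gray levels in [0, 255].
Image = list[int]


def extract(image: Image, bins: int = 4) -> list[float]:
    """Step 1: encode the image as a normalized gray-level histogram."""
    hist = [0.0] * bins
    for pixel in image:
        hist[min(pixel * bins // 256, bins - 1)] += 1
    total = len(image) or 1
    return [count / total for count in hist]


def distance(fv1: list[float], fv2: list[float]) -> float:
    """Step 2: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fv1, fv2)))


def search(query: Image, collection: dict[str, Image], k: int = 3) -> list[tuple[str, float]]:
    """Step 3: return the k collection images closest to the query."""
    q = extract(query)
    scored = [(name, distance(q, extract(img))) for name, img in collection.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


collection = {
    "dark.jpg": [10, 20, 30, 40],
    "bright.jpg": [200, 220, 240, 250],
    "mixed.jpg": [10, 20, 240, 250],
}
result = search([15, 25, 35, 45], collection, k=2)
```

A real descriptor would extract far richer features, but the extract, compare, and rank structure is the same.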

In fact, there are a number of works (Akbar et al., 2008; Nanni et al., 2011) that propose the integration or parallel use of several feature vectors, but few that propose the integration of a general-purpose data fusion system in the context of CBIR, considering both images and metadata. A fusion process involves both entity resolution and cleaning. Entity resolution refers to the problem of identifying overlapping data in different sources. This problem has been the subject of extensive research on relational (Lim et al., 1996), entity-relationship (Menestrina et al., 2006), and XML (Poggi and Abiteboul, 2005) data models. Cleaning refers to the process of solving attribute value conflicts. In particular, several issues arise when integrating different sources in the CBIR domain: (i) the existence of duplicate images, (ii) the use of different descriptors, (iii) the existence of images with different transformations (crop, resolution, etc.), (iv) the definition of different ranked lists (e.g., defined in terms of different descriptors), and (v) different sizes of ranked lists, among others.

Kozievitch N. P., Satie Hara C., Nande J. and da S. Torres R.

Exploring Data Fusion under the Image Retrieval Domain.

DOI: 10.5220/0004869901710178

In Proceedings of the 16th International Conference on Enterprise Information Systems (ICEIS-2014), pages 171-178

ISBN: 978-989-758-027-7

© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

Most existing systems for data fusion consider data structured in relational format. Nevertheless, given that XML has become the de facto standard for data exchange on the Web, it is natural to also consider this format in the fusion process.

In this paper, we report on the use of a general-purpose data fusion system in the context of CBIR. In particular, we target the process of solving the value conflicts that arise when integrating the data produced by the CBIR process using different features with image metadata provided by a number of sources and applications. The main novelty resides in automatically solving conflicts in CBIR without user intervention, using rules to integrate images, metadata, and sources along the fusion process. Moreover, these formalized rules can also be extended to compare and highlight the differences among different multimedia resources.

1.1 A Motivating Example

Consider an infrastructure (shown in Figure 1) that takes images and metadata as input and produces as output an XML file aggregating the input data with the resources resulting from the CBIR process. For instance, consider Source 1, which stores the CBIR information for Figure 2-a (a parasite image) within an XML file. The XML file is composed of a feature vector FV1 (produced by a color descriptor, BIC), the similarity scores SM1, an image IM1, and the respective metadata M1 (with attributes such as name, path, size, and resolution).

Figure 1: Fusion process integrating CBIR sources.

Now consider Source 2, where the same image (shown in Figure 2-b) was processed by another descriptor (the SASI texture descriptor). The resulting XML file would include the feature vector FV2, with the respective similarity scores SM2, and metadata M2. Assume that both sources need to be integrated and stored in a repository. This requires that existing conflicts among entities be solved. The resulting XML file should automatically integrate the CBIR information and metadata from Source 1 and Source 2 within one XML tree structure.

(a) (b)

Figure 2: CBIR for image 01_ancylostoma.jpg, using the (a) BIC (Stehling et al., 2002) descriptor and (b) SASI (Carkacioglu and Yarman-vural, 2001) descriptor.

In XFusion (Cecchin et al., 2010), the user defines a set of rules for solving these conflicts after merging the information imported from several data sources. In this paper, using basic XML services, we explore XFusion in order to aggregate different CBIR-related data. The novelty resides in the use of fusion rules to automate the decision process regarding conflicting values, considering the aggregation of metadata, images, and CBIR-related data.

1.2 Organization

The remainder of this paper is organized as follows. Section 2 describes related work. Section 3 presents an overview of our solution, illustrated with a case study in the parasite domain. Finally, Section 4 presents our conclusions and outlines future work.

2 RELATED WORK

2.1 Integration of Resources

Multiple definitions have been used (Kozievitch et al., 2011a; Nelson et al., 2001) to name the integration of resources into a single digital object: an Aggregation (Williams and Suleman, 2003), a Component-Based Object (Santanchè and Medeiros, 2007; Santanchè et al., 2007), a Complex Object (Nelson et al., 2001; Lagoze et al., 2006), or a Compound Object (Awre, 2009).

Several integration formats have arisen from different communities (Nelson and de Sompel, 2006; Nelson et al., 2001; Fox and France, 1997; Karpovich et al., 1994; Burnett et al., 2006). In particular, Santanchè applied the idea of integration to the field of software reuse and exchange (Santanchè and Medeiros, 2007; Santanchè et al., 2007) in a component-based technology named Digital Content Component (DCC). A DCC is composed of four distinct subdivisions: (a) content, (b) structure, (c) interfaces, and (d) metadata, which are used to manage different layers of the object aggregations.

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems


2.2 Images and Applications

If we consider image data, new challenges emerge in handling Content-Based Image Retrieval services. A typical solution for this service requires the construction of image descriptors, which are characterized by: (i) an extraction algorithm to encode image features into feature vectors; and (ii) a similarity measure to compare two images based on the distance between their corresponding feature vectors. The similarity measure is a matching function that gives the degree of similarity for a given pair of images represented by their feature vectors. It is often defined as a function of the distance (e.g., Euclidean): the larger the distance value, the less similar the images.
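A descriptor can be modeled as the pair (extraction algorithm, distance function) described above. In the sketch below, the `Descriptor` class, the two toy descriptors, and the mapping similarity = 1 / (1 + distance) are assumptions, chosen only to show that a larger distance yields a lower similarity and that two descriptors may disagree about the same pair of images.

```python
import math
from dataclasses import dataclass
from typing import Callable

FeatureVector = list[float]


@dataclass(frozen=True)
class Descriptor:
    """A descriptor pairs an extraction algorithm with a distance function."""
    name: str
    extract: Callable[[list[int]], FeatureVector]
    distance: Callable[[FeatureVector, FeatureVector], float]

    def similarity(self, img_a: list[int], img_b: list[int]) -> float:
        # Assumed mapping: distance 0 gives similarity 1; larger distances give less.
        d = self.distance(self.extract(img_a), self.extract(img_b))
        return 1.0 / (1.0 + d)


def euclidean(a: FeatureVector, b: FeatureVector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical toy descriptors (stand-ins for, e.g., a color and a texture descriptor).
mean_level = Descriptor("mean", lambda img: [sum(img) / len(img)], euclidean)
contrast = Descriptor(
    "contrast",
    lambda img: [sum(abs(a - b) for a, b in zip(img, img[1:])) / max(len(img) - 1, 1)],
    euclidean,
)

striped, flat = [0, 100, 0, 100], [50, 50, 50, 50]
```

For `striped` and `flat`, `mean_level` reports identical images (similarity 1.0) while `contrast` reports very different ones, which is the kind of disagreement that later surfaces as ranked-list conflicts.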

There are several applications that support services based on image content, allowing integration in distinct domains (Murthy et al., 2010; Achananuparp et al., 2007). Along with these applications, images can also be explored to integrate annotation support into image digital libraries, as in (Jochum et al., 2007). In particular, for the following examples, consider a recently developed component-based CBIR infrastructure aimed at processing and encapsulating images and related data (Kozievitch et al., 2012; Kozievitch et al., 2011b), presented in Figure 3. The bottom layer contains the data sources: the descriptor library, the image collection, and the database. The second layer contains DCCs, which encapsulate, provide access to, and manage several parts of the CBIR process: the descriptors, the descriptor library, the images, the retrieval from the database, and finally a manager for setting up the entire process. The third layer comprises the applications that manipulate the image aggregations by accessing the CBIR process or the data sources.

[Figure 3 diagram. Layer 1: Data (feature vectors, images, videos, DBMS with marks and metadata, descriptor library, XML files). Layer 2: DCCs (ImageCODCC, DescriptorDCC, DescriptorLibraryDCC, ImageDCC, VideoDCC, StoryDCC, CBIRStoryDCC, ExtractStoryDCC, CBIRProcessDCC, RetrievalDCC, AnnotationDCC, PublishingDCC, ManagerDCC). Layer 3: Interface (data insertion, content-based image search, text-based and combined search, image and video indexing, image and video browsing, annotation, publishing).]

Figure 3: Management Layers (Kozievitch et al., 2012).

2.3 Data Integration and Cleaning

Data fusion and cleaning have been studied extensively by the database community (Bhattacharya and Getoor, 2006; Bleiholder and Naumann, 2008). Most previous works consider data in relational format, but the need to investigate the problem of solving conflicts on semi-structured data has recently been stressed. XClean (Weis and Manolescu, 2007) is a system that allows the declarative and modular specification of a cleaning process. It consists of a declarative language with operators that cover not only the fusion process, but also entity identification and the combination of values that refer to the same object. The main goal is to provide a modular system that can be easily extended with new operators. Potter's Wheel (Raman and Hellerstein, 2001) follows a cleaning strategy based on a set of operations to transform data, such as format, drop, copy, merge, split, divide, fold, and select. However, instead of storing the result of a data transformation, the sources are stored along with the definition of the transformation. The transformation is applied on the fly whenever consistent and clean information is required.
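The idea of storing sources together with transformation definitions, rather than transformed data, can be illustrated with a small lazy view. This is a hypothetical sketch of the general technique, not Potter's Wheel's actual operators or interface; the `LazyCleanView` class and the example transformation are illustrative.

```python
from typing import Callable

Record = dict[str, str]
Transform = Callable[[Record], Record]


class LazyCleanView:
    """Stores raw records plus transformation definitions; cleans only on read."""

    def __init__(self, records: list[Record]) -> None:
        self.raw = records                     # sources are kept unchanged
        self.transforms: list[Transform] = []  # only the definitions are stored

    def add_transform(self, transform: Transform) -> None:
        self.transforms.append(transform)

    def read(self) -> list[Record]:
        """Apply every stored transformation, in order, to a copy of each record."""
        cleaned = []
        for record in self.raw:
            current = dict(record)
            for transform in self.transforms:
                current = transform(current)
            cleaned.append(current)
        return cleaned


view = LazyCleanView([{"image_name": " 39_Hnanagravpp5x.JPG "}])
# A "format"-style transformation: trim and lowercase the file name.
view.add_transform(lambda r: {**r, "image_name": r["image_name"].strip().lower()})
```

Because only the transformation is stored, `view.raw` still holds the original value, and `view.read()` recomputes the clean one each time it is called.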

Several systems have been proposed in the literature for (i) the fusion process (such as HumMer (Bilke et al., 2005) and Fusionplex (Motro and Anokhin, 2006)), (ii) sharing structured data (Orchestra (Ives et al., 2008)), and (iii) data-oriented workflows (Panda (Ikeda and Widom, 2010)).

A number of strategies for data fusion have been proposed in the literature (Yin et al., 2008; Dong et al., 2010; Cao et al., 2013; Fan et al., 2013), and a survey can be found in (Bleiholder and Naumann, 2008). The approach presented in this paper allows strategies to be reapplied in subsequent integration processes, and also keeps provenance information for tracing back the origin of the data stored in the repository.

3 OVERVIEW OF OUR SOLUTION

From a formal perspective, an aggregation of Content-Based Image Retrieval components comprises a structure that aggregates the image, the feature vector, and the similarity scores (Kozievitch et al., 2011a). In addition, each digital object might have metadata (such as file name and file size, among others). In this section, we outline the proposed two-step solution, namely the information extraction from images, described in Section 3.1, and the fusion process, detailed in Section 3.2.

ExploringDataFusionundertheImageRetrievalDomain


3.1 Gathering the Information from the CBIR Process

Suppose that, within the parasite domain, a researcher has metadata, several images, and their respective CBIR information for several species (Kozievitch et al., 2010), obtained from different sources. Recall that the CBIR process is responsible for creating the image feature vectors and the similarity scores among the images.

Consider now the CBIR infrastructure presented in Figure 3. As input, consider image 35_hnanagravpp5x.jpg and its metadata. Within Source 1, the image is processed by the BIC descriptor (Stehling et al., 2002), as shown in Figure 4-a. The CBIR infrastructure aggregates all the related information within an XML file. Within Source 2, the same image is processed by the same descriptor. In this case, however, Source 2 considers a different image collection, i.e., the images found within Source 2 are not necessarily the same as those managed within Source 1. Note that, comparing the two XML files, besides the metadata conflicts that might appear (such as different file paths), the two sources also present different ranked lists (the 5th image differs between the two ranked lists, as shown in Figure 8).

(a)

(b)

Figure 4: CBIR for image 35_hnanagravpp5x.jpg, within (a) Source 1 and (b) Source 2.

3.2 Fusion of Image-related Data

The data fusion problem refers to the merging of data provided by two or more sources. In particular, the CBIR information presented in Figure 4-a and Figure 4-b can be summarized within an XML tree representation, as shown in Figure 7. Basically, each image (identified by a name) is processed by an image descriptor (identified by a name), which provides a feature vector and a ranked list of similar images.

    <db>
      <image>
        <image_name>39_Hnanagravpp5x.jpg</image_name>
        <image_path>/tmp/39_Hnanagravpp5x.jpg</image_path>
        <image_feature_vector_name>/tmp/fv/39_Hnanagravpp5x.jpg</image_feature_vector_name>
        <image_descriptor>
          <descriptor>Bic</descriptor>
          <distance><imageId>39_Hnanagravpp5x.jpg</imageId><image_dist_value>0</image_dist_value></distance>
          <distance><imageId>29_dipilmad.jpg</imageId><image_dist_value>38</image_dist_value></distance>
          <distance><imageId>38_dipilgrav.jpg</imageId><image_dist_value>47</image_dist_value></distance>
          <distance><imageId>42_tsoliumgrav.jpg</imageId><image_dist_value>49</image_dist_value></distance>
          <distance><imageId>Taenia_solium_scolex1.jpg</imageId><image_dist_value>49</image_dist_value></distance>
        </image_descriptor>
      </image>
    </db>

Figure 5: XML file for Source 1, representing Figure 4-a.

    <db>
      <image>
        <image_name>39_Hnanagravpp5x.jpg</image_name>
        <image_path>/home/nadiapk/data/uploads/39_Hnanagravpp5x.jpg</image_path>
        <image_feature_vector_name>/data/fv/39_Hnanagravpp5x.jpg</image_feature_vector_name>
        <image_descriptor>
          <descriptor>Bic</descriptor>
          <distance><imageId>39_Hnanagravpp5x.jpg</imageId><image_dist_value>0</image_dist_value></distance>
          <distance><imageId>29_dipilmad.jpg</imageId><image_dist_value>38</image_dist_value></distance>
          <distance><imageId>38_dipilgrav.jpg</imageId><image_dist_value>47</image_dist_value></distance>
          <distance><imageId>42_tsoliumgrav.jpg</imageId><image_dist_value>49</image_dist_value></distance>
          <distance><imageId>03_ancylostoma.jpg</imageId><image_dist_value>53</image_dist_value></distance>
        </image_descriptor>
      </image>
    </db>

Figure 6: XML file for Source 2, representing Figure 4-b.
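Documents such as those in Figures 5 and 6 can be read with Python's standard `xml.etree.ElementTree` module. The sketch below assumes that each ranked image is wrapped in a `distance` element, as the closing `</distance>` tags in the figures suggest, and treats document order as rank order; the function name `ranked_lists` and the trimmed sample document are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Source 1 document (first two ranked images only).
SOURCE_1 = """<db><image>
  <image_name>39_Hnanagravpp5x.jpg</image_name>
  <image_path>/tmp/39_Hnanagravpp5x.jpg</image_path>
  <image_descriptor><descriptor>Bic</descriptor>
    <distance><imageId>39_Hnanagravpp5x.jpg</imageId><image_dist_value>0</image_dist_value></distance>
    <distance><imageId>29_dipilmad.jpg</imageId><image_dist_value>38</image_dist_value></distance>
  </image_descriptor>
</image></db>"""


def ranked_lists(xml_text: str) -> dict[tuple[str, str], list[tuple[str, int]]]:
    """Map (image_name, descriptor) to the ranked list of (imageId, distance)."""
    root = ET.fromstring(xml_text)
    result = {}
    for image in root.findall("image"):
        name = image.findtext("image_name")
        for desc in image.findall("image_descriptor"):
            # Document order of the <distance> elements is taken as the rank.
            ranked = [
                (d.findtext("imageId"), int(d.findtext("image_dist_value")))
                for d in desc.findall("distance")
            ]
            result[(name, desc.findtext("descriptor"))] = ranked
    return result


lists = ranked_lists(SOURCE_1)
```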

Consider now the XML documents presented in Figures 5 and 6. Each element provided by Source 1 has a corresponding one in Source 2. However, some of the values associated with these elements disagree, such as those of the elements image_path and image_feature_vector_name, and the fifth element of the ranked list (imageId and image_dist_value), as illustrated in Figure 8. Note that there are two types of conflicts: metadata conflicts (such as the first two listed above) and CBIR conflicts (such as elements within the ranked list).
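The two kinds of conflicts can be detected with a simple comparison. In this hypothetical sketch, each source has already been reduced to a Python dictionary (the field names mirror the XML elements, while the `ranked` key is an assumption), metadata fields are compared by value, and ranked lists are compared position by position with `itertools.zip_longest`, so that ranked lists of different sizes also surface as conflicts.

```python
from itertools import zip_longest

SOURCE_1 = {
    "image_path": "/tmp/39_Hnanagravpp5x.jpg",
    "image_feature_vector_name": "/tmp/fv/39_Hnanagravpp5x.jpg",
    "ranked": [("39_Hnanagravpp5x.jpg", 0), ("29_dipilmad.jpg", 38), ("38_dipilgrav.jpg", 47),
               ("42_tsoliumgrav.jpg", 49), ("Taenia_solium_scolex1.jpg", 49)],
}
SOURCE_2 = {
    "image_path": "/home/nadiapk/data/uploads/39_Hnanagravpp5x.jpg",
    "image_feature_vector_name": "/data/fv/39_Hnanagravpp5x.jpg",
    "ranked": [("39_Hnanagravpp5x.jpg", 0), ("29_dipilmad.jpg", 38), ("38_dipilgrav.jpg", 47),
               ("42_tsoliumgrav.jpg", 49), ("03_ancylostoma.jpg", 53)],
}
METADATA_FIELDS = ("image_path", "image_feature_vector_name")


def find_conflicts(src1: dict, src2: dict) -> list[tuple[str, str, object, object]]:
    """Return (kind, location, value_in_src1, value_in_src2) for every disagreement."""
    conflicts = []
    for field in METADATA_FIELDS:
        if src1[field] != src2[field]:
            conflicts.append(("metadata", field, src1[field], src2[field]))
    # A missing entry (from the shorter list) is reported as None.
    for rank, (e1, e2) in enumerate(zip_longest(src1["ranked"], src2["ranked"]), start=1):
        if e1 != e2:
            conflicts.append(("cbir", f"distance[{rank}]", e1, e2))
    return conflicts


found = find_conflicts(SOURCE_1, SOURCE_2)
```

On the values of Figures 5 and 6, this reports the two metadata conflicts and a single CBIR conflict at rank 5.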

A cleaning strategy for solving metadata conflicts may determine, for example, that whenever a data item provided by Source 1 disagrees with any other source, Source 1's value should be chosen over the others. As a result of applying this strategy, the data repository keeps a single consistent value for each of these subelements.

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

174

Figure 7: XML tree representation for CBIR-related data.

Figure 8: XML tree representation with conflicting elements (in yellow) from Source 1 and Source 2.

A cleaning strategy for CBIR conflicts may instead determine that all conflicting values should be kept within the repository. As a result of applying this strategy, the union of the ranked lists is kept within the XML representation after cleaning, as shown in Figure 9.
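One possible reading of keeping "the union of the ranked lists" is a position-wise union, in which every rank keeps each distinct entry offered for it. The sketch below implements that reading as an assumption; it is not necessarily how XFusion builds the tree in Figure 9, and `union_ranked` is a hypothetical name.

```python
from itertools import zip_longest

Entry = tuple[str, int]  # (imageId, image_dist_value)


def union_ranked(list1: list[Entry], list2: list[Entry]) -> list[list[Entry]]:
    """Position-wise union of two ranked lists: keep every distinct value per rank."""
    merged = []
    for e1, e2 in zip_longest(list1, list2):
        kept: list[Entry] = []
        for entry in (e1, e2):
            # Skip the None padding of the shorter list and values already kept.
            if entry is not None and entry not in kept:
                kept.append(entry)
        merged.append(kept)
    return merged


# Last two ranks of Sources 1 and 2 (Figures 5 and 6).
tail_1 = [("42_tsoliumgrav.jpg", 49), ("Taenia_solium_scolex1.jpg", 49)]
tail_2 = [("42_tsoliumgrav.jpg", 49), ("03_ancylostoma.jpg", 53)]
merged_tail = union_ranked(tail_1, tail_2)
```

Ranks on which the sources agree keep one entry; the conflicting fifth rank keeps both.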

XFusion is a system that provides the functionality described in this paper. It relies on the concept of XML keys (Buneman et al., 2002) for determining which elements should be merged when integrating one or more data sources, and on a set of cleaning rules for generating a consistent data repository.

Figure 9: XML tree representation after cleaning.

3.2.1 XML Keys

XML keys are used within our context to express the result of the entity matching process. Thus, whenever objects from distinct data sources agree on their keys, they are merged into a single object in the repository. When the values of non-key objects differ, we say that a conflict has been detected.
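Key-based merging and conflict detection can be sketched over flat records. In this hypothetical example, each source is a list of dictionaries, the key is a single field (`image_name`), and every non-key field remembers which source supplied which value; a field with more than one distinct value is a conflict. Real XML keys are path-based and nested, as defined below; `merge_by_key` and `conflicts` are illustrative names.

```python
Record = dict[str, str]
Merged = dict[str, dict[str, dict[str, str]]]  # key value -> field -> source -> value


def merge_by_key(sources: dict[str, list[Record]], key: str) -> Merged:
    """Merge records that agree on `key`, keeping every source's value per field."""
    merged: Merged = {}
    for source, records in sources.items():
        for record in records:
            entity = merged.setdefault(record[key], {})
            for field, value in record.items():
                if field != key:
                    entity.setdefault(field, {})[source] = value
    return merged


def conflicts(merged: Merged) -> list[tuple[str, str]]:
    """List the (key value, field) pairs on which the sources disagree."""
    return [
        (key_value, field)
        for key_value, fields in merged.items()
        for field, by_source in fields.items()
        if len(set(by_source.values())) > 1
    ]


repo = merge_by_key(
    {
        "Source 1": [{"image_name": "39_Hnanagravpp5x.jpg",
                      "image_path": "/tmp/39_Hnanagravpp5x.jpg"}],
        "Source 2": [{"image_name": "39_Hnanagravpp5x.jpg",
                      "image_path": "/home/nadiapk/data/uploads/39_Hnanagravpp5x.jpg"}],
    },
    key="image_name",
)
```

Both records share the key value, so they collapse into one entity, and the differing `image_path` is reported as its only conflict.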

An XML key is defined as (context-path, (target-path, {key-paths})), where the values of the key-paths uniquely identify the nodes reached by following the target-path in the context of each subtree defined by the context-path. In order to generate the merged document in Figure 9, keys such as the following were defined within the CBIR context:

• (ε, (image, {image_name})): in the context of the entire document (ε denotes the root), an image is identified by its name;

• (image, (image_descriptor, {descriptor})): in the context of each subtree rooted at image, the image_descriptor is identified by descriptor;

• (image/image_descriptor, (distance, {rank})): in the context of each subtree rooted at image/image_descriptor, the distance elements are identified by their rank;

• (image, (image_path, {})): in the context of any subtree rooted at an image node, there is at most one image_path.
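Whether a document satisfies keys like these can be checked with `xml.etree.ElementTree`. This is a simplified sketch: context, target, and key paths are assumed to be simple ElementTree path expressions, an empty context stands for ε (the root), and `key_holds` is a hypothetical helper, not XFusion's implementation. With an empty key set, every target node maps to the same empty tuple, which encodes "at most one".

```python
import xml.etree.ElementTree as ET


def key_holds(root: ET.Element, context: str, target: str, key_paths: list[str]) -> bool:
    """Check the XML key (context, (target, {key_paths})) on a parsed document."""
    contexts = [root] if context == "" else root.findall(context)
    for ctx in contexts:
        seen = set()
        for node in ctx.findall(target):
            # With no key paths every node maps to (), so a second node violates the key.
            value = tuple(node.findtext(path) for path in key_paths)
            if value in seen:
                return False
            seen.add(value)
    return True


doc = ET.fromstring(
    "<db>"
    "<image><image_name>a.jpg</image_name><image_path>/x/a.jpg</image_path></image>"
    "<image><image_name>b.jpg</image_name></image>"
    "</db>"
)
```

For example, `key_holds(doc, "", "image", ["image_name"])` checks the first key above, and `key_holds(doc, "image", "image_path", [])` checks the last one.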

Although we have presented keys here following the syntax proposed in (Buneman et al., 2002), in XFusion (Cecchin et al., 2010) the key definitions are stored in XML format. Note that, within the CBIR domain, additional keys can be used to define how many descriptors can be used to characterize the visual properties of an image, how many images will be available in the ranked list fusion, etc.

3.2.2 Rules

Rules define high-level strategies for deciding how value conflicts should be solved. A rule consists of a path expression, which defines its context, and a list of strategies for solving a conflict. A number of strategies for solving value conflicts have been proposed in the literature. XFusion adopts the following subset of those proposed by (Bleiholder and Naumann, 2008):

• Trust Your Friends. This strategy is based on a reliability criterion: the value provided by the source with the highest confidence rate assigned by the user is chosen to be stored in the repository;

• Meet In The Middle. This strategy mediates the conflict by generating a new value that represents an average of all conflicting values;

• Cry With The Wolves. This strategy chooses the value reported by the majority of data sources;

• Roll The Dice. A random value is chosen among the conflicting ones; and

ExploringDataFusionundertheImageRetrievalDomain

175

• Pass It On. All the conﬂicting values are kept in

the repository.
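The five strategies above can be sketched as functions over candidate values. In this hypothetical formulation (an assumption, not XFusion's code), a conflict is a list of `(source, value)` pairs and each strategy returns the candidates it keeps, so that strategies can be chained; the confidence map, the third source, and the seeded random generator are illustrative.

```python
import random
from collections import Counter
from statistics import mean

Candidate = tuple[str, object]  # (source name, conflicting value)


def trust_your_friends(cands: list[Candidate], confidence: dict[str, float]) -> list[Candidate]:
    """Keep the value(s) from the source(s) with the highest user-assigned confidence."""
    best = max(confidence.get(src, 0.0) for src, _ in cands)
    return [(src, v) for src, v in cands if confidence.get(src, 0.0) == best]


def meet_in_the_middle(cands: list[Candidate]) -> list[Candidate]:
    """Replace numeric conflicting values by their average."""
    return [("mediated", mean(v for _, v in cands))]


def cry_with_the_wolves(cands: list[Candidate]) -> list[Candidate]:
    """Keep the value(s) reported by the largest number of sources."""
    counts = Counter(v for _, v in cands)
    top = max(counts.values())
    return [(src, v) for src, v in cands if counts[v] == top]


def roll_the_dice(cands: list[Candidate], rng: random.Random) -> list[Candidate]:
    """Pick one conflicting candidate at random."""
    return [rng.choice(cands)]


def pass_it_on(cands: list[Candidate]) -> list[Candidate]:
    """Keep every distinct conflicting value, in order of first appearance."""
    kept: list[Candidate] = []
    for src, v in cands:
        if all(v != other for _, other in kept):
            kept.append((src, v))
    return kept


# Hypothetical conflict on a distance value, with an invented third source.
conflict = [("Source 1", 49), ("Source 2", 53), ("Source 3", 53)]
```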

As an example, consider the following fusion rule:

(/image[image_name="39_Hnanagravpp5x.jpg"]/image_path, [Trust Your Friends, Pass It On])

The rule determines that whenever image "39_Hnanagravpp5x.jpg" has a conflict on the image_path element, the conflict is solved by first applying the strategy Trust Your Friends, followed by the strategy Pass It On. Considering that the user has assigned a higher confidence rate to Source 1 than to Source 2, the result of the rule's application is illustrated in Figure 9.
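Chaining the strategies of a rule can be sketched as follows, under our assumption that each strategy narrows the list of candidates and that the chain stops as soon as a single value remains. The confidence rates 0.9 and 0.6 are hypothetical; the text only states that Source 1 is trusted more than Source 2. `apply_rule` is an illustrative name.

```python
from functools import partial
from typing import Callable

Candidate = tuple[str, object]
Strategy = Callable[[list[Candidate]], list[Candidate]]


def trust_your_friends(cands: list[Candidate], confidence: dict[str, float]) -> list[Candidate]:
    """Keep the candidate(s) from the most trusted source(s)."""
    best = max(confidence.get(src, 0.0) for src, _ in cands)
    return [(src, v) for src, v in cands if confidence.get(src, 0.0) == best]


def pass_it_on(cands: list[Candidate]) -> list[Candidate]:
    """Keep all remaining conflicting values."""
    return cands


def apply_rule(cands: list[Candidate], strategies: list[Strategy]) -> list[Candidate]:
    """Apply strategies in order until the conflict is down to one value."""
    for strategy in strategies:
        cands = strategy(cands)
        if len({v for _, v in cands}) == 1:
            break
    return cands


confidence = {"Source 1": 0.9, "Source 2": 0.6}  # hypothetical rates
rule = [partial(trust_your_friends, confidence=confidence), pass_it_on]
path_conflict = [
    ("Source 1", "/tmp/39_Hnanagravpp5x.jpg"),
    ("Source 2", "/home/nadiapk/data/uploads/39_Hnanagravpp5x.jpg"),
]
resolved = apply_rule(path_conflict, rule)
```

With these rates, Trust Your Friends already selects Source 1's path. If both sources had the same rate, Trust Your Friends would keep both values and Pass It On would store both of them.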

Note that rules can be defined on larger contexts than single elements. Suppose, for example, that the same strategy described above should be applied to any conflict on image_path elements. This rule can be expressed as:

(/image/image_path, [Trust Your Friends, Pass It On])

With such a rule, future conflicts on this element do not require any user intervention.
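A policy base that reuses rules for future conflicts could look up a rule first by the conflict's exact path and then by the same path with its predicates removed. The sketch below assumes that rules are stored in a dictionary keyed by path and that predicates are written in square brackets; `POLICY`, `generalize`, and `find_rule` are hypothetical names, not XFusion's API.

```python
import re
from typing import Optional

# Policy base: rule context (path) -> ordered list of strategy names.
POLICY = {"/image/image_path": ["Trust Your Friends", "Pass It On"]}


def generalize(path: str) -> str:
    """Drop [...] predicates, e.g. /image[image_name="x"]/image_path -> /image/image_path."""
    return re.sub(r"\[[^\]]*\]", "", path)


def find_rule(policy: dict[str, list[str]], conflict_path: str) -> Optional[list[str]]:
    """Prefer a rule on the exact conflict path, then one on its generalized path."""
    if conflict_path in policy:
        return policy[conflict_path]
    return policy.get(generalize(conflict_path))
```

A new conflict on the path of any image is thus matched by the generalized rule, while paths with no matching rule still return `None` and would need the user's attention.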

3.2.3 XFusion Usage

XFusion has a graphical interface that allows the user to load data sources into the repository and perform cleaning operations through the definition of fusion rules. The tool's main window is shown in Figure 10, after both Sources 1 and 2, presented in Figures 5 and 6, have been uploaded. Note that the right-hand panel shows all uploaded sources, associating a different color with each of them and showing their confidence rate in parentheses.

Figure 10: XFusion main screen.

Figure 11: XFusion conﬂict resolution screen.

The merged document is shown in the main window, where each value conflict is annotated with the sources that provide the conflicting values, represented by the colored squares that precede each value.

When the user selects an existing conflict and clicks on the resolve button, the screen depicted in Figure 11 is shown. Note that the user has three main options for solving a conflict: choose one of the conflicting values, manually insert a new value, or apply one or more of the available strategies (described in Section 3.2.2). Strategies are chosen by clicking on the direction buttons in the middle of the screen, which determine the order in which they should be considered. Below the strategy boxes, the context of the rule is presented. This path is initially set to uniquely identify the conflicting element or attribute, according to the XML keys defined on the repository. Nevertheless, the user can edit the path to apply the list of strategies to larger contexts. Finally, when the user clicks on the Clean button, the new rule is inserted into the policy base and its execution propagates the chosen value to the repository. XFusion allows the user to define rules incrementally. The user can then check whether the strategy has been effective in solving all conflicts within the rule's context. If not, she may decide to extend the rule by defining additional strategies to be applied.

In particular, within the context of the parasite domain presented above (Kozievitch et al., 2010), the fusion of different sources centralizes the data, providing more information about CBIR services and metadata. This strategy can provide parasite experts with an integrated view of the available datasets, which can foster knowledge sharing. The same approach could be explored within other CBIR-related applications.

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

176

4 CONCLUSIONS

Many digital library implementations and applications demand additional and advanced services to effectively specify, reuse, describe, and aggregate different resources. Examples of commonly required services include those related to the support of images and related CBIR tasks.

In this paper, we address the integration of image retrieval with XFusion, a rule-based cleaning tool that stores curated data in an integrated repository. A metamodel is proposed in order to specify the components of the CBIR-related tasks, and it is validated through a case study within the parasite domain. The main novelty resides in automatically solving conflicts in CBIR without user intervention, using rules to integrate images and associated metadata.

A natural direction for future work is the use of rules to guide the design and implementation of image digital libraries that integrate different (and possibly distributed) image collections. One starting point is the use of applications like those proposed in (Gonçalves and Fox, 2002; Zhu et al., 2003).

ACKNOWLEDGEMENTS

We would like to thank CAPES, CNPq, FAPESP, AMD, Microsoft Research, and Fundação Araucária.

REFERENCES

Achananuparp, P., McCain, K. W., and Allen, R. B. (2007). Supporting student collaboration for image indexing. In ICADL'07, pages 24–34, Berlin, Heidelberg. Springer-Verlag.

Akbar, S., Kung, J., and Wagner, R. (2008). Multishape-features and text-feature integration on 3d model similarity retrieval. Int. J. Innov. Comput. Appl., 1(3):171–184.

Awre, C. (2009). Managing compound objects within Fedora, Enhanced E-theses Project Deliverable 9, available at http://igitur-archive.library.uu.nl/DARLIN/2010-0526-200241/UUindex.html. Knowledge Exchange Group.

Bhattacharya, I. and Getoor, L. (2006). Collective entity resolution in relational data. IEEE Data Engineering Bulletin, 29(2):4–12.

Bilke, A., Bleiholder, J., Naumann, F., Böhm, C., and Weis, M. (2005). Automatic data fusion with hummer. In Proc. of the 31st VLDB Conference, pages 1251–1254.

Bleiholder, J. and Naumann, F. (2008). Data fusion. ACM Comput. Surv., 41(1):1:1–1:41.

Buneman, P., Davidson, S., Fan, W., Hara, C., and Tan, W.-C. (2002). Keys for XML. Computer Networks, 39(5):473–487.

Burnett, I. S., Pereira, F., de Walle, R. V., and Koenen, R. (2006). The MPEG-21 Book. John Wiley & Sons.

Cao, Y., Fan, W., and Yu, W. (2013). Determining the relative accuracy of attributes. In SIGMOD'13: Proc. of the ACM SIGMOD International Conference on Management of Data, pages 565–576.

Carkacioglu, A. and Yarman-vural, F. (2001). Sasi: A new texture descriptor for content based image retrieval. IEEE International Conference on Image Processing, 2:137–140.

Cecchin, F., Ciferri, C. D. A., and Hara, C. (2010). XML Data Fusion. In International Conference on Data Warehousing and Knowledge Discovery (DaWaK'2010).

Dong, X., Berti-Equille, L., Hu, Y., and Srivastava, D. (2010). SOLOMON: Seeking the truth via copying detection. PVLDB, 3(2):1617–1620.

Fan, W., Geerts, F., Tang, N., and Yu, W. (2013). Inferring data currency and consistency for conflict resolution. In ICDE'13: Proceedings of the IEEE International Conference on Data Engineering, pages 470–481.

Fox, E. A. and France, R. K. (1997). Architecture of an expert system for composite document analysis, representation, and retrieval. In Readings in Information Retrieval, pages 400–412. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Gonçalves, M. A. and Fox, E. A. (2002). 5SL: A Language for Declarative Specification and Generation of Digital Libraries. In JCDL '02, pages 263–272, New York, NY, USA. ACM.

Ikeda, R. and Widom, J. (2010). Panda: A system for provenance and data. IEEE Data Engineering Bulletin, 33(3):42–49.

Ives, Z. G., Green, T. J., Karvounarakis, G., Taylor, N. E., Tannen, V., Talukdar, P. P., Jacob, M., and Pereira, F. (2008). The Orchestra collaborative data sharing system. SIGMOD Record, 37(3):26–32.

Jochum, W., Kaiser, M., Schellner, K., and Wirl, F. (2007). Living memory annotation tool - image annotations for digital libraries. In Proc. of the 11th European conference on Research and Advanced Technology for Digital Libraries, ECDL '07, pages 549–550, Berlin, Heidelberg. Springer-Verlag.

Karpovich, J. F., Grimshaw, A. S., and French, J. C. (1994). Extensible file system (elfs): an object-oriented approach to high performance file i/o. ACM SIGPLAN Notices, 29(10):191–204.

Kozievitch, N. P., Almeida, J., da S. Torres, R., Santanchè, A., Leite, N. J., Murthy, U., and Fox, E. A. (2012). Reusing a compound-based infrastructure for searching and annotating video stories. International Journal of Multimedia Technology, 2:89–97.

Kozievitch, N. P., Almeida, J., Torres, R. S., Leite, N. A., Gonçalves, M. A., Murthy, U., and Fox, E. A. (2011a). Towards a Formal Theory for Complex Objects and Content-Based Image Retrieval. JIDM, 2(3):321–336.

Kozievitch, N. P., da S. Torres, R., Santanchè, A., Pedronette, D. C. G., Calumby, R. T., and Fox, E. A. (2011b). An infrastructure for searching and harvesting complex image objects. The Information - Interaction - Intelligence (I3) Journal, 11(2):39–68.

Kozievitch, N. P., Torres, R. d. S., Andrade, F., Murthy, U., Fox, E., and Hallerman, E. (2010). A teaching tool for parasitology: enhancing learning with annotation and image retrieval. In ECDL'10, pages 466–469, Berlin, Heidelberg. Springer-Verlag.

Lagoze, C., Payette, S., Shin, E., and Wilper, C. (2006). Fedora: an architecture for complex objects and their relationships. Int. J. Digit. Libr., 6:124–138.

Lim, E., Srivastava, J., Prabhakar, S., and Richardson, J. (1996). Entity identification in database integration. Information Sciences, 89(1).

Menestrina, D., Benjelloun, O., and Garcia-Molina, H. (2006). Generic entity resolution with data confidences. In Proc. of VLDB Work. on Clean Databases.

Motro, A. and Anokhin, P. (2006). Fusionplex: resolution of data inconsistencies in the integration of heterogeneous information sources. Information Fusion, 7(2):176–196.

Murthy, U., Kozievitch, N. P., Leidig, J., da S. Torres, R., Yang, S., Goncalves, M., Delcambre, L., Archer, D., and Fox, E. A. (2010). Extending the 5S Framework of Digital Libraries to support Complex Objects, Superimposed Information, and Content-Based Image Retrieval Services. Technical Report TR-10-05, Virginia Tech, Department of Computer Science.

Nanni, L., Brahnam, S., and Lumini, A. (2011). Combining different local binary pattern variants to boost performance. Expert Syst. Appl., 38(5):6209–6216.

Nelson, L. and de Sompel, H. V. (2006). IJDL special issue on complex digital objects: Guest editors' introduction. International Journal of Digital Libraries, 6(2):113–114.

Nelson, M. L., Argue, B., Efron, M., Denn, S., and Pattuelli, M. C. (2001). A survey of complex object technologies for digital libraries. Technical report, NASA/TM-2001-211426.

Poggi, A. and Abiteboul, S. (2005). XML data integration with identification. In Proc. of DBPL, pages 106–121.

Raman, V. and Hellerstein, J. M. (2001). Potter's wheel: An interactive data cleaning system. In VLDB '01: Proceedings of the 27th International Conference on Very Large Data Bases, pages 381–390.

Santanchè, A. and Medeiros, C. B. (2007). A Component Model and Infrastructure for a Fluid Web. IEEE Transactions on Knowledge and Data Engineering, 19(2):324–341.

Santanchè, A., Medeiros, C. B., and Pastorello Jr, G. Z. (2007). User-author centered multimedia building blocks. Multimedia Systems, 12(4):403–421.

Stehling, R. O., Nascimento, M. A., and Falcão, A. X. (2002). A compact and efficient image retrieval approach based on border/interior pixel classification. In CIKM '02, pages 102–109, New York, NY, USA. ACM.

Torres, R. d. S., Medeiros, C. B., Gonçalves, M., and Fox, E. A. (2006). A Digital Library Framework for Biodiversity Information Systems. International Journal on Digital Libraries, 6(1):3–17.

Weis, M. and Manolescu, I. (2007). Declarative XML data cleaning with XClean. In International Conf. on Advanced Information Systems Engineering (CaiSE), pages 96–110.

Williams, K. and Suleman, H. (2003). A survey of digital library aggregation services. In Scholarship at Penn Libraries, available at http://works.bepress.com/martha_brogan/10.

Yin, X., Han, J., and Yu, P. S. (2008). Truth discovery with multiple conflicting information providers on the web. IEEE Transactions on Knowledge and Data Engineering, 20(6):796–808.

Zhu, Q., Gonçalves, M. A., and Fox, E. A. (2003). 5SGraph demo: a graphical modeling tool for digital libraries. JCDL '03, pages 385–385, Washington, DC, USA. IEEE Computer Society.

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

178