CLOUD COMPUTING KEEPS FINANCIAL METRIC COMPUTATION SIMPLE

Joan Navarro 1, Ainhoa Azqueta-Alzuaz 2, Pablo Murta Baião Albino 3 and José Enrique Armendáriz-Íñigo 2

1 Distributed Systems Research Group, La Salle, Universitat Ramon Llull, 08022 Barcelona, Spain
2 Dpto. Ing. Matemática e Informática, Universidad Pública de Navarra, 31006 Pamplona, Spain
3 Dpto. Gestión de Empresas, Universidad Pública de Navarra, 31006 Pamplona, Spain
Keywords: Cloud computing, Hadoop, MapReduce, Data consistency, SABI database.
Abstract: Cloud computing, implemented by tool suites like Amazon S3, Dynamo, or Hadoop, has been designed to overcome classical constraints of distributed systems (i.e. poor scale-out, low elasticity, and static behaviour) and to provide high scalability when dealing with large amounts of data. This paper proposes the usage of Hadoop functionalities to efficiently (1) process financial data and (2) detect and correct errors in data repositories; in particular, the work focuses on the SABI database. There is a set of operations that, when performed under the distributed computation paradigm, may considerably increase calculation performance.
1 INTRODUCTION
Rapid advances in technology and storage capacity have led to the gathering of huge volumes of data from internal and external processes. In such a scenario, data management has become a crucial component of any data-driven application. Indeed, the concept of data management has evolved and currently refers not only to data storage but also to computation and data aggregation, which pushes traditional relational databases into the background. Thus, cloud storage services take the baton, offering high scalability and availability at low cost (Kraska et al., 2009). Existing commercial services use computer farms of commodity hardware to provide remote storage facilities. Some of them restrict strong consistency to small data sets (e.g. Microsoft SQL Data Services) and others only provide eventual consistency for larger data sets (e.g. Amazon S3). However, there are several applications that require both transactional guarantees and high scalability. Data management then imposes new demands: dealing with (1) a large amount of (2) non-homogeneous data.
This work focuses on the financial database Sistema de Análisis de Balances Ibéricos (SABI) (Bureau van Dijk, 2010), which is considered a research tool by many Spanish universities (Albino, 2008) and is widely used by private companies to perform market analyses. Although this repository constitutes an important source of financial information in Spain, many companies do not properly fill in all the fields, leading to an incomplete data panel. In some works, this issue is roughly solved by excluding those rows that belong to companies with missing values (Hernández-Cánovas and Martínez-Solano, 2010). Reducing the size of the sample set, or even replacing missing values with means, may bias the results in terms of accuracy. Nonetheless, this data repository is targeted to engage researchers in analysing companies' efficiency (Kapelko and Rialp-Criado, 2009; Retolaza and San-Jose, 2008; Guzmán et al., 2009) by computing ratios like indebtedness, availability of idle resources, or capital costs (Martínez-Campillo and Gago, 2009). Thus, to perform such calculations, researchers and users have to follow a two-step procedure: (1) search and filter the data and (2) analyse them with the aid of statistical tools.
The purpose of this paper is to propose the use
of an open-source cloud computing tool to efficiently
store and process large amounts of data. To this end,
we rely on Hadoop (White, Tom, 2009) and its im-
plementation of MapReduce (Dean and Ghemawat,
2010). To manage the storage resources, Hadoop
uses a distributed file system referred to as HDFS
that is written in Java and designed to offer portabil-
ity across heterogeneous hardware and software plat-
forms. We tested the performance of the proposed
approach through a case study based on error detec-
tion and financial metric computation on the afore-
mentioned SABI repository.
The remainder of this paper is organised as fol-
lows. Section 2 reviews the tools provided by the Apache Hadoop project, and Section 3 describes the SABI
repository. Next, Section 4 details the implementa-
tion and presents some experimental results. Finally,
Section 5 summarises the work and outlines some fu-
ture research lines.
2 MASSIVE STORAGE &
COMPUTING
Despite the effort put into designing and developing efficient algorithms, scalability is still a challenge in data management (Brewer, 2000; DeCandia et al., 2007). Traditionally, data management has been carried out by a database management system (DBMS), which is usually enhanced with cluster-based solutions to increase system performance and tolerate site failures. Nevertheless, the more the number of replicas increases, the more the traditional DBMS struggles (Paz et al., 2010).
To overcome this drawback, some approaches (Chang et al., 2006; DeCandia, Giuseppe et al., 2007; Lakshman, Avinash and Malik, Prashant, 2010) attempt to store data in non-relational databases: plain databases with no special features such as fast interfaces or advanced concurrency control algorithms (Brantner et al., 2008), where data are just stored in a non-normalised scheme meeting specific constraints.
This section reviews the storage technologies and
describes the data management model used in our ex-
periments, which offers an efficient cloud-based way
to address the data set described in Section 3.
2.1 Cloud Storage Trends
According to the CAP theorem (Brewer, 2000), in order to deal with a deluge of data spread over thousands of servers, the data consistency, availability, or network partitioning properties have to be relaxed, finding a trade-off among them. Amazon S3 (Palankar et al., 2008), Dynamo (DeCandia et al., 2007), Yahoo! (Cooper et al., 2009) or Hadoop are some of the market technologies that provide high scalability based on the cloud paradigm. However, depending on the aforementioned requirements (consistency, availability, or performance), one technology will be more suitable than the others. For instance, while Amazon S3 implements eventual consistency on its nodes, Hadoop ensures strict consistency by allowing data to be Written Once and Read Many (WORM) times (White, Tom, 2009).
Our specification is subject to two requirements: data must be (1) strictly consistent and (2) written once and read each time we perform calculations. To this end, Hadoop is a suitable technology since it satisfies these needs by providing a good balance between consistency and availability. In addition, it supplies a set of open-source tools which offer reliable and scalable distributed computing:
Hadoop Distributed File System (HDFS). Raw stor-
age container which ensures consistency, scal-
ability, fault tolerance, and replication under a
WORM environment. This implementation has
followed the ideas presented in (Ghemawat et al.,
2003).
HBase. Interface that permits accessing the non-normalised data stored on the HDFS as if it were a structured distributed database.
MapReduce. Software framework able to perform distributed computing operations on the data set stored in HDFS.
In the following, each tool is described in more detail:
HDFS. Storage devices tend to be the bottleneck in
many scenarios such as web services or intensive
computing applications; scenarios where user queries
and network communications are faster than writing
and reading from disks (Paz et al., 2010). However,
HDFS, due to its architecture, behaves as a distributed file system mounted in user space which spreads and replicates data across all the storage servers in a scalable way.
In our case, HDFS automatically splits each file and stores each partition on a different site, enabling parallel distributed computations. The size of these partitions is set by default to 64 MB, though it can be adjusted to obtain different performance trade-offs (Shafer et al., 2010).
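For illustration purposes, the following minimal sketch shows how a Java client could request a different block size when copying a local file into HDFS; the file names and the 128 MB value are merely illustrative, and the property name dfs.block.size corresponds to the Hadoop 0.20 series:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Request 128 MB blocks instead of the 64 MB default
        // ("dfs.block.size" is the property name of the Hadoop 0.20 series).
        conf.setLong("dfs.block.size", 128L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);
        // Files copied through this FileSystem instance are split into blocks
        // of the requested size and replicated by HDFS.
        fs.copyFromLocalFile(new Path("sabi_2006.txt"),
                             new Path("/sabi/sabi_2006.txt"));
        fs.close();
    }
}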
HBase. Once data are stored on the distributed file system, they are ready to be retrieved and processed. Data can be accessed (1) from the command line interface, which gives direct access to the distributed file system via the put and get HDFS directives, suitable for performing small tests and checking whether data have been stored correctly, or (2) from an upper-layer middleware such as HBase, extremely useful when large amounts of data have to be read, processed, and written back to the file system.
HBase allows us to access the non-normalised data stored on the distributed file system as if they were stored in a relational SQL database. Both the standard query language and the HDFS built-in facilities make it easier to formulate queries and retrieve filtered data.
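As an illustration, the following sketch shows how a single company entry could be fetched through the HBase Java client; the table name, row key layout, and column family are hypothetical, since our experiments access the data through MapReduce rather than HBase:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical table "sabi" keyed by "<company> <year>", with a
        // column family "balance" holding the account fields.
        HTable table = new HTable(conf, "sabi");
        Get get = new Get(Bytes.toBytes("Firm1 2006"));
        Result row = table.get(get);
        byte[] totalAssets = row.getValue(Bytes.toBytes("balance"),
                                          Bytes.toBytes("total_assets"));
        System.out.println("Total assets: " + Bytes.toString(totalAssets));
        table.close();
    }
}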
MapReduce. Data stored in distributed file systems can be processed either (1) by centralized computing, i.e. by aggregating all remote data and then processing them from a central node, or (2) by distributed computing, i.e. by first processing the data chunks locally stored on each node and then aggregating the partial results. The latter may perform better than the former when calculations can be solved in parallel, since it makes the most of the computational resources of each distributed site and minimizes network traffic. MapReduce is an Apache Hadoop-compatible distributed computing paradigm which hides the internal distributed file system architecture, allowing distributed data to be processed without knowing their exact location.
HBase and MapReduce both provide an efficient way to access the distributed data stored in the HDFS without compromising reliability or worrying about data partitioning. While HBase is best suited for real-time read/write random access to very large data sets, MapReduce is suitable for performing complex operations on stored data without having to deal with the typical issues of distributed systems such as concurrency control, replication schemes, fault tolerance, and recoverability.
Taking into account that our data have to be not
only stored and retrieved from the file system but
also processed, MapReduce seems to be an appealing
framework to efficiently perform our experiments.
3 SABI DATABASE
This section briefly describes SABI, the database used in our experiments, stresses its relevance, and points out its main drawbacks, which can be addressed with the proposed MapReduce approach.
Distributed online in Spain by INFORMA (In-
forma, 2010), SABI consists of (1) a private repos-
itory that gathers data from 1998 until 2009 of more
than 1.2 million Spanish and Portuguese firms and (2)
a financial analysis system.
As with any other conventional database, the stored data can be accessed through different search criteria such as company name, tax identification number, location, business activity, employees, etc. However, SABI provides additional functionalities that allow the user to (1) perform statistical and comparative analyses of companies taking into account different variables and different time bases, (2) obtain reports in either standard or personalised format, and (3) graphically visualise results from balance accounts, income statements, and other comparisons.
Therefore, SABI's strength lies in its analytical tools applied to finance, marketing, and economics.
Finance/Credit. Users can follow financial
progress, carry out credit analysis, conduct company
comparisons, identify competitors, study companies’
position in the market, detect potential partners,
consider mergers and acquisitions, etc.
Marketing/Commercial. Users can perform strate-
gic corporate planning, examine market situation, de-
tect potential clients, elaborate market strategies, etc.
Economics Research. Users can benefit from a re-
search tool and teaching resource.
SABI has been used in many research works such as (Retolaza and San-Jose, 2008; Hernández-Cánovas and Martínez-Solano, 2010). Nevertheless, some of these studies report the inconsistency of the database and the presence of missing values, which force researchers to remove many items from the database and, as a consequence, shrink the sample set.
In the following, we present a case study that shows how to detect errors in the database and keep the information used consistent.
4 CASE STUDY: SABI
This section (1) describes the problem to solve, (2) explains how data are organised in the SABI repository files, and (3) describes how these data are tuned to apply the MapReduce technique and derive our calculations. As already mentioned, we obtained the data from the SABI DVD of October 2009. These data correspond to information collected from 2001 until 2008, and most of them refer to the old Spanish chart of accounts, which changed in 2007.
The main problem that anyone faces when trying to extract statistics from the SABI repository is the mismatch between the different values contained in the companies' accounts. This is due to the fact that most of these Spanish and Portuguese companies introduce the calculated values manually. Once mismatches are identified, these entries have to be removed from the data panel. Therefore, the goal of our proposal is to efficiently automate the task of identifying and removing these mismatches.
For example, the total assets value shown in Ta-
ble 1 can be computed from the following items:
(1) shareholder contribution receivable, (2) long-term
Table 1: Verified metrics from the SABI repository.

Metric              Operations required
Total assets        Total liabilities
Total assets        Shareholder contribution receivable + long-term investments + deferred
                    charges + current assets
Non-current assets  Start-up costs + intangible assets + tangible assets + financial assets +
                    long-term treasury stock + due on long-term traffic
Current assets      Expenditure required by shareholders + stocks + debtors + short-term
                    investments + short-term treasury shares + treasury + accrual
Total liabilities   Equity + revenue deferred + provisions for liabilities and charges +
                    long-term creditors + short-term creditors
Equity              Subscribed capital + premium + reservations and results for previous
                    exercises + income + interim dividend paid during the year + share for
                    capital reduction
investments (which is computed from the following
items: start-up costs, intangible assets, tangible as-
sets, financial investments, own stock and long-term
investments debtors), (3) deferred charges (which is
computed from the following items: shareholder con-
tribution non-receivable, debtors, temporary financial
investment, short-term own stock, liquid assets and
accrual adjustments), and (4) current assets.
Hence, the total assets field can be checked against the other fields of the same entry. If any field contains incorrect data (either the final total assets value or any of the others), the full entry will have to be removed. We can follow a similar process for the rest of the metrics (economic performance, profitability, financial structure, and short/long-term solvency) shown in Table 1.
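As a minimal sketch of such a check, the second identity of Table 1 (total assets as the sum of its four components) can be coded as follows; the method name and the tolerance are merely illustrative:

public class TotalAssetsCheck {
    // Allow small rounding differences between the reported and computed values.
    private static final double TOLERANCE = 1.0;

    static boolean totalAssetsConsistent(double totalAssets,
                                         double shareholderContribReceivable,
                                         double longTermInvestments,
                                         double deferredCharges,
                                         double currentAssets) {
        double sum = shareholderContribReceivable + longTermInvestments
                   + deferredCharges + currentAssets;
        return Math.abs(totalAssets - sum) <= TOLERANCE;
    }

    public static void main(String[] args) {
        // With the Firm1 2006 figures of Section 4.2:
        // 0 + 2.242.904 + 48.258 + 3.452.272 = 5.743.434, so the check passes.
        System.out.println(
            totalAssetsConsistent(5743434, 0, 2242904, 48258, 3452272));
    }
}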
Next, we describe how data are organised within the SABI data repository and how MapReduce tasks are launched to deal with each entry.
4.1 Data File Format
The aforementioned SABI repository is extracted to a single text file of 10.4 GB, which is not manageable for some file systems. Hence, we split it up by year (from 2001 to 2008), obtaining eight text files of 1.3 GB each. The first row of each file contains the header indicating the content of each field (e.g. the name, address, number of employees, etc.) and the remaining rows contain the information regarding each company (one per row). Each file is written in a fixed-size virtual rectangle, which forces long fields (e.g. name) to be written on multiple lines as follows:
This is     This is     43    Another    And another
a field     another           field      one
            field
In order to ease the map tasks, these files need to be preprocessed to (1) demarcate each field (up to this point there is no unique field separator, i.e. white space or tabulator), (2) transform multi-line fields into single lines, and (3) fill in the empty fields by inserting ''.
After preprocessing the eight files, we obtain eight
new files that are loaded into HDFS. The following
section describes how the MapReduce tasks check
these files.
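The following sketch illustrates the preprocessing step under two simplifying assumptions of our own (the exact export layout is not reproduced here): the column offsets are fixed and known in advance, and a designated anchor column (e.g. a numeric identifier) is filled only on the first physical line of each record, so that continuation lines can be merged into it. Empty fields are filled with '' and fields are joined with a tabulator:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class SabiPreprocessor {
    // Hypothetical column start offsets; the last value marks the line width.
    private static final int[] OFFSETS = {0, 30, 60, 75, 110, 140};
    // Hypothetical column that is non-empty only on the first line of a record.
    private static final int ANCHOR_COL = 2;

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        PrintWriter out = new PrintWriter(args[1]);
        String[] record = null;
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = slice(line);
            if (!cols[ANCHOR_COL].isEmpty()) {      // first line of a new record
                if (record != null) out.println(join(record));
                record = cols;
            } else if (record != null) {            // continuation of long fields
                for (int i = 0; i < cols.length; i++)
                    if (!cols[i].isEmpty()) record[i] += " " + cols[i];
            }
        }
        if (record != null) out.println(join(record));
        in.close();
        out.close();
    }

    // Cuts a physical line into trimmed fixed-width columns.
    private static String[] slice(String line) {
        String[] cols = new String[OFFSETS.length - 1];
        for (int i = 0; i < cols.length; i++) {
            int start = Math.min(OFFSETS[i], line.length());
            int end = Math.min(OFFSETS[i + 1], line.length());
            cols[i] = line.substring(start, end).trim();
        }
        return cols;
    }

    // Joins the columns of a record with '\t', filling empty fields with ''.
    private static String join(String[] cols) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) sb.append('\t');
            sb.append(cols[i].isEmpty() ? "''" : cols[i]);
        }
        return sb.toString();
    }
}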
4.2 The MapReduce Process with the
SABI Data Panel
Once the files are loaded into HDFS, the computation takes a set of input ⟨key_i/value_i⟩ pairs and produces a set of output ⟨key_i/value_i⟩ pairs. The MapReduce process is based on two functions: map and reduce. The map function, written by the user, selects the fields needed to compute the metrics shown in Table 1 from a given company and passes them to the reduce function with an intermediate key_i. The reduce function, also written by the user, accepts this key_i and a set of values for that key ({value_i1, value_i2, ..., value_ij, ..., value_ip}) and computes the desired metric. Its output is 1 if the company passes the check or 0 otherwise. The following summarised code snippet shows the implemented map and reduce functions, written in J2SE:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// getFields() and checkFields() are the user-defined helpers (omitted here)
// that select the accounts of Table 1 and verify them, respectively.
static class myMapper extends Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    String[] items = line.split("\t");  // fields are tab-separated after preprocessing
    // getFields() returns {companyYearKey, selectedAccounts}, or null if any
    // required field is empty, in which case the entry is discarded.
    String[] fields = getFields(items);
    if (fields != null)
      context.write(new Text(fields[0]), new Text(fields[1]));
  }
}

static class myReducer extends Reducer<Text, Text, Text, Text> {
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // checkFields() returns "1" if the Table 1 identities hold, "0" otherwise.
    context.write(key, new Text(checkFields(values)));
  }
}
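For completeness, a minimal driver that wires these two classes into a Hadoop job could look as follows (a sketch based on the Hadoop 0.20 API; the job name and the input/output paths passed as arguments are illustrative, and myMapper/myReducer are assumed to be visible from this class):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SabiVerificationJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "SABI verification");
        job.setJarByClass(SabiVerificationJob.class);
        job.setMapperClass(myMapper.class);      // classes defined above
        job.setReducerClass(myReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // preprocessed file in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // verification results
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}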
Table 2: Verification results (α for tested, β for passed).

        Agriculture   Industry    Energies    Construction & Dwellings   Services
Year     α     β       α     β     α     β     α     β                    α     β
2001     1%   39%     25%   42%    1%   56%   17%   41%                  55%   42%
2002     1%   38%     24%   45%    1%   57%   18%   41%                  55%   44%
2003     1%   42%     24%   44%    1%   56%   18%   43%                  55%   43%
2004     1%   41%     23%   43%    1%   55%   19%   41%                  55%   43%
2005     1%   43%     23%   42%    1%   56%   19%   40%                  55%   43%
2006     1%   40%     23%   43%    1%   55%   19%   40%                  55%   43%
2007     1%   43%     23%   44%    1%   56%   18%   40%                  56%   42%
2008     1%   36%     26%   40%    2%   43%   16%   32%                  54%   43%
MEAN     1%   40%     24%   43%    1%   54%   18%   40%                  55%   43%
After preprocessing the files, a line is arranged as follows: "Company[ ]year [\t] ... [\t] shareholder contribution receivable [\t] fixed assets [\t] multi-year expenses [\t] current assets [\t] ... [\t] total assets [\t] ...".
Assume that a given line contains: "Firm1 2006 [\t] ... [\t] 0 [\t] 2.242.904 [\t] ... [\t] 48.258 [\t] 3.452.272 [\t] ... [\t] 5.743.434 [\t] ...".
The Hadoop JobTracker assigns to the existing TaskTrackers the different blocks into which the files are split so they can carry out their map tasks. Our map task returns, for each company, the four accounts that we have specified per year; all of these accounts are separated using "\t" too. So the result of the map task will be, in this case: ⟨Firm1 2006, 5.743.434 [\t] 0 [\t] 2.242.904 [\t] 48.258 [\t] 3.452.272⟩. Recall that the map task (1) checks if there are empty fields in a given line (if so, it will discard it and will not send it to the reduce task) and (2) removes the unnecessary fields (marked with "..." in the example above).
Then, the reduce tasks will be issued, obtaining: total assets = 5.743.434, shareholder contribution receivable = 0, fixed assets = 2.242.904, multi-year expenses = 48.258, current assets = 3.452.272. As in this case the verification is satisfied (0 + 2.242.904 + 48.258 + 3.452.272 = 5.743.434), the reduce task will return the tuple ⟨Firm1 2006, 1⟩. Thus, at the end of the reduce task we will have a file composed of the following tuples (formatted as text lines with "\t" as the field separator for each tuple): {⟨Firm1, 1⟩, ..., ⟨Firm2, 1⟩, ..., ⟨Firm3, 0⟩, ..., ⟨Firm4, 1⟩}.
As we wanted to extract some knowledge from
firms that have mismatching data, we classified
the total amount of entries in the SABI repository
from 2001 until 2008 (2.131.336 firms distributed in
266.417 entries per year) according to their working
sector: agriculture, industry, energies, construction
and dwellings, and services.
From the output generated by the MapReduce tasks, we built Table 2. Each working sector has two columns: (1) α shows the ratio of firms that had non-empty values in all the fields required to compute the metrics shown in Table 1 to the total amount of entries, and (2) β shows the ratio of those firms that passed all the verifications (e.g. in 2001, 25% of industry firms, i.e. 266.417 × 0,25 ≈ 66.604 firms, had no missing fields and, of these, only 42%, i.e. 66.604 × 0,42 ≈ 27.974 firms, passed the six verifications described in Table 1).
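As a trivial sketch, the α and β ratios of Table 2 are obtained from three counters per sector and year; the figures below reproduce the 2001 industry example:

public class RatioExample {
    public static void main(String[] args) {
        int totalEntries = 266417;  // entries per year in the repository
        int nonEmpty = 66604;       // industry firms in 2001 with no missing fields
        int passed = 27974;         // of those, firms passing the six verifications
        double alpha = 100.0 * nonEmpty / totalEntries;  // ~25%
        double beta = 100.0 * passed / nonEmpty;         // ~42%
        System.out.printf("alpha = %.0f%%, beta = %.0f%%%n", alpha, beta);
    }
}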
This section has described the implementation of the map and reduce tasks to efficiently go through the whole data repository and remove the mismatching entries. Nevertheless, we still need to compare the performance of our method with respect to other tools designed to mine data, such as the R project, a free software environment for statistical computing, or Matlab, a high-level computing language.
5 SUMMARY AND FUTURE
WORK
Data-driven applications are becoming more popular nowadays and the requirements needed to manage them are very stringent; huge volumes of data do not fit well in traditional database management systems. Cloud computing provides us with the proper tools and infrastructure to manage data in a scalable and efficient way. In this paper, we have proposed a method to deal with large data repositories in the financial field, covering not just storage but also computation. This method consists in using the HDFS and MapReduce facilities to detect possible errors and recalculate values in the Spanish/Portuguese data repository and to ease the computation of certain financial metrics.
This work has presented a more everyday application of MapReduce, embracing economic calculations. However, there is still a long way to go until this usage becomes familiar to practitioners, due to the difficulty of decomposing a problem into the map and reduce operations required by the MapReduce distributed computing paradigm. We hope our sketch encourages researchers to work in this direction and provide new insight into the field.
Finally, our future research lines are twofold: (1) to apply the same idea with upper-layer Hadoop products such as HBase or Hive and compare which option is best in terms of coding complexity, and (2) to make a performance comparison against statistical tools such as R and SPSS.
ACKNOWLEDGEMENTS
This work has been partially supported by the Spanish
Government under research grant TIN2009-14460-
C03-02.
REFERENCES
Albino, P. M. B. (2008). Eficiencia y productividad de las cooperativas de crédito españolas frente al desafío de la desintermediación financiera. In INTERNATIONAL, C. E. A. C. (Ed.), Innovation and Management: Answers to the great challenges of public, social economy and cooperative enterprises.
Brantner, M., Florescu, D., Graf, D., Kossmann, D., and
Kraska, T. (2008). Building a database on s3. In
Proceedings of the 2008 ACM SIGMOD international
conference on Management of data, SIGMOD ’08,
pages 251–264, New York, NY, USA. ACM.
Brewer, E. A. (2000). Towards robust distributed systems
(abstract). In PODC Conf., page 7, New York, NY,
USA. ACM.
Bureau van Dijk (2010). SABI. http://sabi.bvdep.com.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wal-
lach, D. A., Burrows, M., Chandra, T., Fikes, A., and
Gruber, R. E. (2006). Bigtable: a distributed stor-
age system for structured data. In Proceedings of the
7th USENIX Symposium on Operating Systems De-
sign and Implementation - Volume 7, pages 15–15,
Berkeley, CA, USA. USENIX Association.
Cooper, B. F., Baldeschwieler, E., Fonseca, R., Kistler, J. J.,
Narayan, P. P. S., Neerdaels, C., Negrin, T., Ramakr-
ishnan, R., Silberstein, A., Srivastava, U., and Stata,
R. (2009). Building a cloud for yahoo! IEEE Data
Eng. Bull., 32(1):36–43.
Dean, J. and Ghemawat, S. (2010). Mapreduce: a flexible
data processing tool. Commun. ACM, 53(1):72–77.
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati,
G., Lakshman, A., Pilchin, A., Sivasubramanian, S.,
Vosshall, P., and Vogels, W. (2007). Dynamo: Ama-
zon’s highly available key-value store. In SOSP Conf.,
pages 205–220. ACM.
DeCandia, Giuseppe, Hastorun, Deniz, Jampani, Madan,
Kakulapati, Gunavardhan, Lakshman, Avinash,
Pilchin, Alex, Sivasubramanian, Swaminathan,
Vosshall, Peter, and Vogels, Werner (2007). Dynamo:
amazon’s highly available key-value store. SIGOPS
Oper. Syst. Rev., 41(6):205–220.
Ghemawat, S., Gobioff, H., and Leung, S.-T. (2003). The
google file system. In Scott, M. L. and Peterson, L. L.,
editors, SOSP, pages 29–43. ACM.
Guzmán, I., Arcas, N., Ghelfi, R., and Rivaroli, S. (2009). Technical efficiency in the fresh fruit and vegetable sector: a comparison study of Italian and Spanish firms. Fruits, 64(4):243–252.
Hernández-Cánovas, G. and Martínez-Solano, P. (2010). Relationship lending and SME financing in the continental European bank-based system. Small Business Economics, 34(4):465–482.
Informa (2010). Informa D&B. http://www.informa.es/informa/index.php/en/.
Kapelko, M. and Rialp-Criado, J. (2009). Efficiency of the textile and clothing industry in Poland and Spain. Fibres & Textiles in Eastern Europe, 17(3):7–10.
Kraska, T., Hentschel, M., Alonso, G., and Kossmann, D.
(2009). Consistency rationing in the cloud: Pay only
when it matters. PVLDB, 2(1):253–264.
Lakshman, Avinash and Malik, Prashant (2010). Cassandra:
a decentralized structured storage system. SIGOPS
Operating Systems Review, 44(2).
Martínez-Campillo, A. and Gago, R. F. (2009). What factors determine the decision to diversify? The case of Spanish firms (1997-2001). Investigaciones Europeas de Dirección y Economía de la Empresa, 15(1):15–28.
Palankar, M. R., Iamnitchi, A., Ripeanu, M., and Garfinkel,
S. (2008). Amazon s3 for science grids: a viable solu-
tion? In DADC ’08: Proceedings of the 2008 interna-
tional workshop on Data-aware distributed comput-
ing, pages 55–64, New York, NY, USA. ACM.
Paz, A., Pérez-Sorrosal, F., Patiño-Martínez, M., and Jiménez-Peris, R. (2010). Scalability evaluation of the replication support of JOnAS, an industrial J2EE application server. In 2010 European Dependable Computing Conference, pages 55–60. IEEE-CS.
Retolaza, J. L. and San-Jose, L. (2008). Efficiency in work insertion social enterprises: a DEA analysis. In Universidad, Sociedad y Mercados Globales, pages 55–64.
Shafer, J., Rixner, S., and Cox, A. L. (2010). The hadoop
distributed filesystem: Balancing portability and per-
formance. In ISPASS, pages 122–133. IEEE Computer
Society.
White, Tom (2009). Hadoop: The Definitive Guide.
O’Reilly Media, 1 edition.