CLOUD STORAGE IN A PRIVATE CLOUD DEPLOYMENT

Lessons for Data Intensive Research

Victor Chang

1, 2

, Robert John Walters

and Gary Wills

School of Electronics and Computer Science, University of Southampton, SO 17 1BJ, Southampton, U.K.

Business School, University of Greenwich, SE10 9LS, London, U.K.

Keywords: Cloud Computing Business Framework (CCBF), Portability for a Private Cloud Deployment, Cloud Stor-

age, Storage Area Network (SAN) Technologies, Data Services (Backup Automation, Data Recovery and

Data Migration), Data Intensive Research.

Abstract: This paper demonstrates portability for a private cloud deployment, which has a detailed case study about

Cloud Storage service developed as part of the Cloud Computing Business Framework (CCBF). Our Cloud

Storage design and deployment is based on Storage Area Network (SAN) technologies, details of which in-

clude functionalities, technical implementation, architecture and user support. Experiments for data services

(backup automation, data recovery and data migration) are performed and results confirm backup automa-

tion is completed swiftly and is reliable for data-intensive research. Data recovery result confirms that exe-

cution time is in proportion to quantity of recovered data, but the failure rate increases in an exponential

manner. Data migration result confirms execution time is in proportion to disk volume of migrated data, but

the failure rate increases in an exponential manner. Issues in data recovery and data migration must be re-

solved prior dealing with petabytes of data. Our Cloud Storage offers cost reduction, time-saving and user

friendliness supported by users and is highly relevant to similar portability of private cloud.

1 INTRODUCTION

Communications between different types of clouds

from different vendors are often difficult to imple-

ment. Often work-arounds require writing additional

layers of APIs, or an interface or portal to allow

communications. This brings interesting research

question such as portability, as portability of some

applications from desktop to cloud is challenging

(Beaty et al., 2009; Armbrust et al., 2009). Portabil-

ity refers to moving enterprise applications and ser-

vices to Clouds from all types, and not just files or

VM over clouds. Beaty et al. (2009) and Chang et

al. (2011 a) identify portability as an organisational

challenge for Cloud adoption and explain their ra-

tionale and demonstration in their paper.

This paper presents portability in Healthcare,

where Cloud Computing Business Framework

(CCBF) is a framework that has been involved from

service strategy to design, development, test and

user support stages. There is a Cloud Storage project

focusing on design, implementation, and user sup-

port and is the focus for this paper. Details for Cloud

Storage include functionalities, technical implemen-

tation, architecture and user support. Experiments

for data services (backup automation, data recovery

and data migration) are performed and results can

help us to meet issues and challenges of data-

intensive research.

1.1 Enterprise Portability

Enterprise portability (portability in short) involves

moving applications and services from desktops to

clouds and between different clouds, including IaaS,

PaaS and SaaS implementations. This is domain

specific as there are different requirements for port-

ability in each domain. This paper describes exam-

ples in Healthcare. These Cloud projects have been

successfully delivered and provide a high level of

user satisfaction.

CCBF aims to help organisations to achieve good

Cloud design, implementation and services (Chang

et al., 2011 b; 2011 c; 2011 d; 2011 e; 2011 f). Port-

ability plays an influential role in two aspects for

Healthcare:

Chang V., John Walters R. and Wills G..

CLOUD STORAGE IN A PRIVATE CLOUD DEPLOYMENT - Lessons for Data Intensive Research.

DOI: 10.5220/0003954000460055

In Proceedings of the 2nd International Conference on Cloud Computing and Services Science (CLOSER-2012), pages 46-55

ISBN: 978-989-8565-05-1

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

• Migrating existing infrastructure, platform and

applications to Cloud environments.

• Developing a new platform and/or application

to allow new development of Cloud services.

Cloud storage is an in-house private cloud initia-

tive and the initial focus is to build a working IaaS

infrastructure to allow storage of medical databases,

images and analysis in a secure and collaborative

environment. After spending a period with smooth

delivery and user support, the focus becomes up-

grading from IaaS to PaaS, which allows better

benefits such as better efficiency and better man-

agement of resources. The structure of this paper is

as follows. Section 2 describes the overview of

Cloud Storage and Section 3 presents its deploy-

ment architecture and user support. Section 4 dis-

cusses its performance results, Section 5 presents

topics of discussion and Section 6 sums up Conclu-

sion and future work.

2 CLOUD STORAGE

Cloud Storage is a crucial project funded and sup-

ported by NHS UK, where Guy’s and St Thomas

NHS Trust (GSTT) and King’s College London

(KCL) have worked together to deliver a service,

with the initial plan for proof of concepts and to see

whether Cloud Storage can be useful. Cloud Storage

development began in September 2008 and com-

pleted in May 2010 to serve cancer researchers.

CCBF is instrumental and influential in the way

Cloud Storage has been developed with the follow-

ing reasons:

• Cloud Storage is a PaaS, and needs careful

planning and a thorough implementation. This

requires using an integrated adoption of multi-

ple vendors’ solutions.

• Cloud Storage is an area to experience rapid

growth in user requirements and disk space

consumption. Therefore, it must be easy to use,

and able to cope with an increasing demand.

• Cloud Storage is a new concept and imple-

mentation for Health domain, as in the past,

private and in-house storage is used. Mainte-

nance of data protection and security is a chal-

lenge. Recommendation, strategy and support

from CCBF can offer useful guidelines and

good services.

The Healthcare Cloud Storage is used in the

Breast Cancer project. Breast cancer is the most

common cancer in women and has a worldwide an-

nual incidence of over 1 million cases. There are

thousands of data about patients (medical records)

and tumours (detailed descriptions and images, and

its relations to the patients). Data growth is rapid

and needs to be carefully used and protected. The

work involves integrating software and cloud tech-

nologies from commercial vendors including Oracle,

VMWare, EMC, Iomega and HP. This is to ensure a

solid infrastructure and platform is available and

robust. There are also uses of third party applica-

tions to allow researchers to be able to access, view

and edit any tumour images from trusted places.

Security has been enforced in terms of data encryp-

tion, SSL and firewall. This project is considered as

a group of Private Clouds, and is not yet to join all

different Private Clouds (distributed in different

areas) together.

Cloud Storage adopts Experiments as the main

method to design and implement a robust Storage

Area Network (SAN). This is based on integrations

of different technologies where experiments are re-

quired. In relation to Cloud portability, better per-

formance in Cloud Storage than outdated storage

service is regarded as a benchmark and measure-

ment for success by executives. A hybrid case study

is used, as it requires both quantitative and qualita-

tive information not covered by experiments. Occa-

sionally checking with users and executives about

their requirements and services they prefer, take

place in the form of interviews, and thus a certain

extent of qualitative method is needed. Users are

very supportive in this project and some of them use

it daily.

2.1 Benefits and Impacts of Adopting

CCBF

The benefits of adopting CCBF allows their IT lead

to understand requirements, technical knowledge,

use cases and issues to be aware of, before and dur-

ing the project development. The Private Cloud

Storage project is divided into four stages summed

up as follows.

• Stage 1: Explore available technologies, un-

derstanding strength and weaknesses for each

key technology. Capture user requirements to

get into technical plans.

• Stage 2: Propose a framework based on the

outcomes in Stage 1 and CCBF, and carry out

plans for building and validating the frame-

work.

CLOUDSTORAGEINAPRIVATECLOUDDEPLOYMENT-LessonsforDataIntensiveResearch

• Stage 3: Propose and implementing service

oriented architecture for Cloud Storage based

on CCBF. Offer services for users and re-

search groups.

• Stage 4 (Current stage): Continue for service

improvements and provide integrations with

other services or new requirements.

The features in the Healthcare Cloud Storage in-

clude the followings:

• Automation of backup services

• Easy backup and archiving

• Snapshots and mirroring

• Replication

• Recovery

• Data Migration

• Test-bed / test environments

• Heterogeneous network and OS support

• Proof of concepts

• Some services are offering user support

It offers a wide range of self- and automated ser-

vices across secure networks. There are also options

for retrieving and viewing data through the Intranet

and Internet but are only accessible if a secure VPN

and Secure Sockets Layer (SSL) certificates are

used. The CCBF positively influences the way the

backup and storage are designed and deployed. This

project involves from the following:

• Building infrastructures in IaaS.

• Implementing, upgrading and testing systems

and resources to PaaS level.

• Resolving existing problems and making im-

provements Cloud Storage services.

This needs the state-of-the-art design and imple-

mentation that the CCBF can offer. There are two

different focuses. Firstly, it must be easy to use and

support several research groups synchronously and

asynchronously. The Cloud Storage must be able to

cope with frequent changes, updates and user activi-

ties. Secondly, the platform must be highly robust

and stable, and allow data to be kept safe, secure and

active for a long period of time, in other words, ten

years and above. This allows data archiving, mirror-

ing and recovery. Both aspects demand for the fol-

lowing four requirements:

• Automated backup.

• Data recovery and emergency services.

Snapshots or disaster recovery are used.

• Quality of services: high availability, reli-

ability and great usability.

• Security.

The Architecture design is decided to build and

support two concurrent platforms. The first is based

on Network Attached Storage (NAS), and the sec-

ond is based on the Storage Area Network (SAN).

The NAS platform supports active user activities,

and provides great usability and accessibility while

maintaining a high level of system architecture, pro-

gramming and maintenance work. The NAS sup-

ports individual backups with manual and automated

options. One option is similar to the Dropbox pat-

tern of backup, and users can copy their files onto

their allocated disk space, without worrying about

complexity because backup is easy to use and user-

friendly. The manual backup option allows users to

backup their resources onto a selected destination,

and offer both compressed and uncompressed ver-

sions of backup. There are options to choose data

encryption to enforce security.

2.2 A Storage Area Network Made Up

of Different Clusters of Network

Attached Storage (NAS)

The Storage Area Network (SAN) is a dedicated and

extremely reliable backup solution offering a highly

robust and stable platform. SAN can consolidate an

organisational backup platform and can improve

capabilities and performance of Cloud Storage. SAN

allows data to be kept safe and archived for a long

period of time, and is a chosen technology. A SAN

can be made up of different NAS, so that each NAS

can focus on a particular function.

The design of SAN focuses on SCSI, which of-

fers dual controllers and dual networking gigabyte

channels. Each SAN server is built on RAID system,

and RAID 10 is a good choice since it can boost the

performance like RAID 0, and it has mirroring ca-

pability like RAID1. A SAN can be built to have

12TB of disk space, and a group of SAN can form a

solid cluster, or a dedicated Wide Area of Network.

There are written and upgraded applications in each

SAN to achieve the following functions:

• Performance improvement and monitoring:

This allows tracking the overall and specific

performance of the SAN cluster, and also en-

hances group or individual performance if

necessary.

• Disk management: When a pool of SAN is es-

tablished, it is important to know which hard

disks in the SAN serve for which servers or

which user groups.

• Advanced backup: Similar functionalities to

those

described in the NAS, such as automa-

CLOSER2012-2ndInternationalConferenceonCloudComputingandServicesScience

tion, data recovery and quality of services, are

available here. The difference is more sophis-

ticated techniques and mechanisms (use of en-

terprise software is optional) are required.

The CCBF approach offers implementation in-

sights such as integration, as it is a challenge to co-

ordinate and to combine different research activities

and repositories into a distributed storage. This leads

to the use of third party applications and services to

improve on the quality of services. Some applications

mainly based on PHP, MySQL and Apache are writ-

ten, to allow researchers to access the digital reposi-

tory containing tumours. Users can access their Cloud

Storage via browsers from trusted offices, and they

need not worry about complexity, and work as if on

their familiar systems. This Healthcare PaaS is a dem-

onstration of enterprise portability. In addition, sev-

eral upgrades have taken place to ensure the standard

of Cloud Storage and quality of services. One exam-

ple is the use of SSL certificates and the enforced

authentication and authorisation of every user to im-

prove on security. There is an automated service to

backup important resources.

3 CLOUD STORAGE

DEPLOYMENT

ARCHITECTURE AND USER

SUPPORT

This section describes how Cloud Storage is set up,

and how its key functionality offers services and

user support. Cloud Storage is a private-cloud SAN

architecture made up of different NAS services,

where each NAS is dedicated for one specific func-

tion. Design and Deployment is based on group re-

quirements and their research focus.

3.1 Design and Deployment to Meet

Challenges for Data Intensive

Research

Design and deployment should meet challenges for

data-intensive research challenges. Moore et al

(1999) and Bryant (2007) point out that data-intensive

research should meet demands for data recovery and

data migration and allows a large number of data to be

recovered and moved quickly and efficiently in ordi-

nary operations and in emergency. This is suitable to

Cloud Storage as the design and deployment must

provide resilient, swift and effective services. Vo,

Chen and Ooi (2010) present their Cloud Storage and

experiment their read, write and transaction through-

put. They demonstrate their solution for data migra-

tion but there is a lack of consideration in data recov-

ery which is important in the event of possible data

loss. Abu-Libdeh, Princehouse and Weatherspoon

(2010) demonstrate their Cloud Storage case study

which presents how “Failure Recovery” that can get

large-scaled data recovery and data migration com-

pleted. Although they demonstrate data migration and

data recovery over months in their in-house develop-

ment, they do not show the execution time for each

data migration and recovery. This is an important

aspect in Cloud Storage to allow each operation of

large-scale data recovery and data migration to run

smoothly and effectively. Design and deployment of

Cloud Storage must meet demands in large-scaled

backup automation, data recovery and data migration.

3.2 Selections of Technology Solutions

Selections of Technology Solutions are essential for

Cloud Storage development as presented in Table 1.

3.3 Deployment Architecture

There are two sites for hosting data, one is jointly at

GSTT and KCL premises distributed in dedicated

server rooms and the other is hosted at University of

London Data Centre to store and backup the most

important data. Figure 1 shows the Deployment Ar-

chitecture.

There are five NAS at GSTT and KCL premises,

and each NAS is provided for a specific function,

where Bioinformatics Group has the most demands.

NAS 1 is used for their secure backup, and NAS 2 is

used for their computational backup, which is then

connected to Bioinformatics services. NAS 3 is used

as an important gateway for backup and archiving

and is an active service connecting with the rest.

NAS 3 is shared and used by Cancer Epidemiology

and BCBG Group. NAS 4 provides mirror services

for different locations and offers an alternative in

case of data loss. NAS 5 is initially used by Digital

Cancer cluster, and helps to back up important files

in NAS 3. There are two digital cancer clusters,

which can back up between each other, and impor-

tant data are backed up to NAS 8 for reliability and

NAS 5 for local version. The reason to do this is

because disaster recovery took place in 2010 and

that took two weeks full time to retrieve and recover

data. Multiple backups ensure if one dataset is lost,

the most recent archive (done daily) can be replaced

without much time spent.

There are three NAS at the University of London

CLOUDSTORAGEINAPRIVATECLOUDDEPLOYMENT-LessonsforDataIntensiveResearch

Table 1: Selections of Technology Solutions.

Technology

selections

What is it used

Vendors

involved

Focus or rationale Benefits or impacts

Network Attached

Storage (NAS)

To store data and

perform automated

and manual/personal

backup.

Iomega/EMC

Lacie

Western Digital

They have a different focus and

set up. HP is more robust but

more time-consuming to config-

ure. The rest is distributed be-

tween RAID 0, 1 and 5.

Each specific function is

assigned with each NAS.

There are 5 NAS at

GSTT/KCL site and 3 at

Data Centre, including 2 for

Archiving. Deployment

Architecture is shown in

Figure 4.

Infrastructure

(networking and

hosting solution)

Collaborator and

in-house

University of

London Data

Centre

Some services need a more secure

and reliable place. University of

London Data Centre offers 24/7

services with around 500 servers

in place, and is ideal for hosting

solution.

Amount of work is reduced

for maintenance of the entire

infrastructure. It stores cru-

cial data and used for archiv-

ing, which backup historical

data and backup the most

important data automatically

and periodically.

Backup applica-

tions

Third party and

in-house

Open Source

Oracle

Vmware

Symantec

In-house

development

There is a mixture of in-house

development and third party

solution. HP software is used for

high availability and reliability.

The rest is to support backup in

between NAS. Vmware is used

for virtual storage and backup.

Some applications are good

in a particular service, and it

is important to identify the

most suitable application for

particular services.

Virtualisation Third party

VMware

VSphere and

Citrix

It consolidates IaaS and PaaS in

private cloud deployment.

Resources can be virtualised

and saves effort such as

replication.

Security

Third party and

in-house

KCL/GSTT

Macafee

Symantec

Security is based on the in-house

solution and vendor solution is

focused on secure firewall and

anti-virus.

Remote access is given to a

list of approved users.

Computing (Data) Centre (ULCC) where there are

about 500 servers hosted for Cloud and HPC ser-

vices. NAS 6 is used as a central backup database to

store and archive experimental data and images. The

other two advanced servers are customised to work

as NAS 7 and 8 to store and archive valuable data.

Performance for backup and archiving services are

excellent and most data can be backed up in a short

and acceptable time frame for not more than 1 hour

to back up thousands of data and images. This out-

come is widely supported by users and executives.

There are additional five high performance comput

ing services based on Cloud technologies: Two are

computational statistics to analyse complex data.

The third one is on Database to store confidential

data and the fourth is on bioinformatics to help bio-

informatics research, and the last one is a virtualisa-

tion service that allows all data and backup to be in

virtual storage format. These five services are not

included in Cloud Storage for this thesis.

3.4 User Support

The entire Cloud Storage Service has automated

capability and is easy to use. This service has been

in use without the presence of Chief Architect for-

six months, without major problems reported. Sec-

ondary level of user support at GSTT and KCL

(such as login, networking and power restoration)

has been excellent. There is a plan to obtain ap-

proval to measure user satisfaction.

4 EXPERIMENTS FOR CLOUD

STORAGE

Design and implement a robust Storage Area Net-

work (SAN) requires integrations of different tech-

nologies. Only minimal modelling and simulations

are needed, since the focus is on building up a ser-

vice from the very beginning. Experiments are the

suitable research method, since it can identify issues

such as performance, technical capabilities such as

recovery, and whether integration of technologies

can deliver services. User and executive require-

ments are important factors for what type of experi-

ments to be performed and measured. Thousands of

files (data and records) are used for performance

tests and the time to complete the same amount of

CLOSER2012-2ndInternationalConferenceonCloudComputingandServicesScience

Figure 1: Cloud Storage Deployment Architecture.

jobs is recorded. Venue of test is between two sites:

ULCC and GSTT/KCL and execution time is used

as the benchmark. There are three data services and

each service is used to perform experiments as fol-

lows:

• Backup Automation

• Data recovery

• Data migration

4.1 Backup Automation

Cloud Storage uses a number of enterprise solutions

such as Iomega/EMC, Lacie, Western Digital and

HP to deliver a fast and reliable automation services.

The experiment performs backup automation

Automation execution time

12345678910

1 unit = 1,000 files

Time taken (sec)

Execution

time

Figure 2: Automation execution time for Cloud Storage.

between 1,000 and 10,000 files, which are available

in the existing system for user support. The bench-

mark is dependent on execution time. Each set of

experiments is performed three times with the aver-

age time obtained. Time taken was recorded to pre-

sent all results as shown in Figure 2.

4.2 Data Recovery

Data recovery is an important service to recover lost

data due to accidents or emergency services. In the

previous experience, it took two weeks to recover 5

TB of data for disaster recovery as it requires differ-

ent skills and systems to retrieve data and restore-

good quality data back to Cloud services. Data archi-

Data recovery

200

400

600

800

1000

1200

1400

12345678910

1 unit = 1,000 files

Time taken (sec)

Execution

time

Figure 3: Data Recovery.

CLOUDSTORAGEINAPRIVATECLOUDDEPLOYMENT-LessonsforDataIntensiveResearch

ved as Virtual Machines or Virtual Storage speeds

up recovery process. In addition, there are mirror

servers, and even if a server is completely broken,

data can be recovered to resume services so that re-

covery does not take days and weeks. See Figure 3

for their execution time.

4.3 Data Migration of Single Large

Files

Data migration is common amongst Clouds and is

also relevant to data intensive research. When there

are more organisations going for private cloud de-

ployment, data migration between Clouds is com-

mon and may influence the way service delivery

(Ambrust et al., 2009; Hey, 2009; Buyya et al; 2010

a; 2010 b). But there is no investigation the impact

of moving single large files between private clouds.

Hence, the objective is to identify the execution time

for moving single large files and each file is between

100 GB and 1 TB. Figure 4 shows the results.

Data migration (large files) between clouds

500

1000

1500

2000

2500

3000

12345678910

1 unit = a single file of 100 GB

Time taken (sec)

Execution

tim e

Figure 4: Data migration of large single files between

clouds.

4.4 The Percentage of Failure Rates

The percentage of failure rates in Cloud Storage op-

erations is important as each failure in service will

result in loss of time, profit and resources. This part of

experiment is to calculate the percentage of failures,

where services in Section 4.1 and 4.3 are running real-

time and record down the number of successful and

failed operations. There are hundreds of successful

operations versus and a number of failed operations.

4.4.1 Failure Rate in Backup Automation

Backup automation is relatively reliable and out of

hundreds and thousands of operations, the failure

rate is below 2%. The main reason is backup auto-

mation has been in the Storage for a significant

number of years and this area is more established.

4.4.2 Failure Rate in Data Recovery

Data recovery for large-scale data in Cloud is impor-

tant and the failure rate is shown in Figure 5 based

on the amount of successful and failed operations

since 2009. The interesting result is when there is a

low amount of data, the percentage of failure is low.

When the amount of recovered data increases, the

failure rate increases where the graph looks close to

an exponential curve. This may mean the more re-

covered data in the Cloud, even though the execu-

tion time is in proportion to quantity, the failure rate

increases in an exponential manner.

Failure rate for data recovery

12 345 678 910

1 unit = 1,000 files

Percentage of failure (%

)

Failure rate

for data

recovery

Figure 5: Failure rate of data recovery.

4.4.3 Failure Rate in Data Migration

Data migration of large files in Cloud is common

and important as Storage is designed for terabytes

and petabytes. The failure rate is shown in Figure 6

based on the amount of successful and failed opera-

tions since 2009. Similar to Figure 5, the curve is

close to an exponential one, which means when the

volume of the migrated file increases, the failure rate

increases.

Failure rate of data migration

12345678910

1 unit = a single file of 100 GB

Percentage of failure (%)

Failure rate of

data migration

Figure 6: Failure rate of data migration.

CLOSER2012-2ndInternationalConferenceonCloudComputingandServicesScience

4.5 Summary of All Experiments

This section is to discuss results in the previous

three experiments. Service and backup automation

for Cloud storage takes the least execution time and

there are several services to speed up the process of

automation. Execution time is between 8 and 46

seconds to automate backup 1,000 to 10,000 files.

The second experiment is data recovery, where data

archived as Virtual Machines or Virtual Storage in a

well-managed platform can speed up recovery proc-

ess. Data recovery takes between 135 seconds to

1,312 seconds to recover 1,000 to 10,000 of files.

The third experiment focuses on data migration of

large single files, which are important for data inten-

sive research. Data migration takes between 174

seconds to 2,686 seconds to move a single file of

100 GB to 1 TB. Although Figure 3 still shows a

linear graph, more execution time is required to

move a large single file and the percentage of un-

successful migration is likely to increase.

The results strongly support that it is quicker to

move more data with smaller size than to move less

data with larger file size in Clouds. Our results also

confirm that automation in Cloud is more estab-

lished than data recovery and data migration of sin-

gle large files, and these two are perhaps challenges

that data-intensive research need to overcome. Fail-

ure rate for these three major operations are demon-

strated. Backup automation is the most reliable and

stays below 2% all the times. Figure 4 shows the

failure rate of data recovery and when the amount of

recovered data increases, the failure rate increases

where the graph looks close to an exponential curve.

Figure 5 is similar to Figure 4 and shows that failure

rate of data migration, which means when the vol-

ume of the migrated file increases, the failure rate

increases.

5 DISCUSSIONS

There are several topics for discussions presented as

follows.

5.1 Challenges for Data Intensive

Research in Cloud

Cloud Storage can offer services up to petabytes of

storage. The interesting results in Section 4 confirm

that large-scaled data recovery and data migration in

Cloud needs to improve in its technical capabilities.

This is reflected in percentage of failure rate, where

failure rate increases like an exponential manner up

to 14.6% when data recovery volume increases up to

10,000 files. Similarly, an exponential increase is

experienced when data migration increases up to

20.4% when data migration disk increases up to 1

TB per file. Our results demonstrate data recovery

and data migration for thousands of files (that each

has up to 1TB) have to be resolved and improved

prior dealing with challenges in petabytes. Our ex-

periments and results are not only applicable locally

but also are applicable in other environments.

5.2 User Feedback on Cloud Storage

Currently Cloud Storage is only active service that

has been used daily, and has provided users the fol-

lowing benefits.

• Cost reduction: The service is automated and

saves costs in hiring additional staff and de-

ployment of a larger and more expensive pro-

ject that works the same. There is no need to

hire a team to look after maintenance and daily

services.

• Time-saving: Cloud Storage simplifies the

complex backup process and saves time in

performing backups. Users find that they need

not spend significant time for back up.

• User friendly: Cloud Storage offers easy to

use features and users without prior knowl-

edge can find it simple to use.

Healthcare community has a Data Protection Pol-

icy and not all types of services are able to release

data. Services that do not use patients’ data or confi-

dential information are likely to be presented.

5.3 Plug and Play Features in Cloud

Storage for Data Intensive

Research

There are papers explaining the importance and

relevance of data intensive research, and why it is

essential for Cloud development and services (Mor-

etti et al., 2008; Hey, 2009). This Cloud Storage

allows plugs and plays, which means adding addi-

tional hard disks to existing NAS, or new NAS, can

still provide services in place. This has been tested

in 2010 where disk volume of NAS 7 and 8 were

increased from 20 TB to 44 TB in services without

interruptions of services. This Cloud Storage was

also tested to store and protect data of up to 100 TB

in another occasion. This allows any addition of

CLOUDSTORAGEINAPRIVATECLOUDDEPLOYMENT-LessonsforDataIntensiveResearch

hard disks and applications within 100 TB limit to

provide user support and services.

5.4 Current Status for Cloud Storage

Cloud Storage has been in used daily by medical

researchers, and there are a few local administrators

supporting a minimum level of services. The focus

for this service is not longer in technical implemen-

tation but rather user satisfaction. This needs to

write to the NHS UK and get their support to follow

up this research.

5.5 Relative Performance

There are papers describing technical performance

in detail (Buyya et al 2009, 2010 a; 2010 b). Often

results are very technical and most of organisations

considering or implementing Clouds find those re-

sults difficult to follow (Chang 2010 d, 2011 a, 2011

c). Relative performance is an easier term to com-

pare performance with, and is defined as the im-

provement in performance between an old service

(before) and a new service (after). This is similar to

Organisational Sustainability Modelling (OSM)

where data is compared between ‘before’ and ‘after’

in the areas of technical, costs and user aspects.

Latch et al. (2006) also use relative performance to

present their Bayesian clustering software where the

key performance indicator is presented in terms of

percentages of improvement. Although Latch et al.

(2006) still use statistical approach where some data

have little impact or relevance to organisational

adoption, the benefit of using relative performance

approach is to bring down level of complexity and

allows stake holders to understand the percentage of

improvement.

A hybrid case study is relevant for organisational

Cloud adoption, since data needs to be checked prior

computational analysis and often this needs inter-

views or surveys to support. From interviewing

members of management, their views can be

summed up as follows:

• They support the use of relative performance,

as most of the executives are not from IT

backgrounds.

• The use of key performance indicator in rela-

tive performance makes it easy for the execu-

tives to understand and follow the extents of

improvement.

6 CONCLUSIONS AND FUTURE

WORK

This paper illustrates PaaS Portability in the form of

Cloud Storage, which is designed, deployed and

serviced to GSTT and KCL under the recommenda-

tion of CCBF to ensure good Cloud design, deploy-

ment and services. Cloud Storage is helpful to make

data service an easy-to-use, automated and collabo-

rative platform and some users use this service daily.

Design and deployment of Cloud Storage have been

described, and have followed user requirements and

executives’ feedback closely. User Groups are di-

vided into Bioinformatics Group, Databank and

Cancer Epidemiology Group, BCBG Group, Tissue

Bank and Senior Clinicians. The best approach is to

design and implement a Cloud-based Storage Area

Network (SAN), where experiments are used as the

main methods and execution time is used as the

benchmark. Three areas of experiments are per-

formed: automation, data recovery and data migra-

tion.

Cloud Storage Deployment Architecture is pre-

sented to demonstrate how this is designed and built

based on group and research requirements. Selection

of technology is explained and Cloud Storage is

private-cloud SAN architecture made up of different

NAS services. There are two premises: University of

London Data Centre and GSTT/KCL. The Deploy-

ment Architecture shows the connections between

different NAS services and how they are related.

These services include Bioinformatics (multiple

services), joint Epidemiology and BCBG service,

mirror services, two archiving services, digital can-

cer services and multiple backup services. There are

arrows showing how automated and secure backups

take place between Data Centre and GSTT/KCL.

Automation for Cloud storage has several ser-

vices to speed up the process of automation. Execu-

tion time is between 8 and 46 seconds to automate

backup 1,000 to 10,000 files. Data recovery in a

well-managed platform can speed up recovery proc-

ess and takes between 135 seconds to 1,312 seconds

to recover 1,000 to 10,000 of files. Data migration

of large single files is important for data intensive

research. Data migration takes between 174 seconds

to 2,686 seconds to move a single file of 100 GB to

1 TB. Our results also confirm that automation in

Cloud is more established than data recovery and

data migration of single large files, and these two

are perhaps challenges that data-intensive research

need to overcome. Relative performance is between

Cloud Storage and traditional storage have been

CLOSER2012-2ndInternationalConferenceonCloudComputingandServicesScience

presented and comparisons will be discussed.

Percentage of failure rate is calculated for backup

automation, data recovery and data migration where

backup automation stays below 2% of failure rate.

The failure rate increases like an exponential manner

up to 14.6% when data recovery volume increases up

to 10,000 files. Similarly, an exponential increase is

experienced when data migration increases up to

20.4% when data migration disk increases up to 1

TB per file. Our results demonstrate data recovery

and data migration for thousands of files (that each

has up to 1TB) have to be resolved and improved

prior dealing with challenges in petabytes of storage.

In summary, our main contributions include reduc-

tion of costs, time-saving to perform backups and

user friendly interfaces.

REFERENCES

Abu-Libdeh, H., Princehouse, L. and Weatherspoon, H.,

"RACS: A Case for Cloud Storage Diversity", SoCC

'10 Proceedings of the 1

ACM symposium on Cloud

computing, Indianapolis, Indiana, June 10-11, 2010.

Armbrust, M., Fox, A., Griffith, R., Jseph, A. D., Katz, R.

H., Kownwinski, A., Lee, G., Patterson, D., Rabkin,

A., Stoica, I., Zaharia, M., “Above the Clouds: A

Berkeley View of Cloud computing”. Technical Re-

port, No. UCB/EECS-2009-28, UC Berkeley, Febru-

ary 2009.

Beaty, K., Kochut, A. and Shaikh, H., “Desktop to Cloud

Transformation Planning”, “2009 IEEE International

Symposium on Parallel and Distributed Processing”,

May 23-May 29 2009, Rome, Italy.

Bryant, R. E., "Data-Intensive Supercomputing: The Case

for DISC", Technical paper, Carnegie Mellon Univer-

sity, October 2007.

Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J. and

Brandic, I., “Cloud computing and emerging IT plat-

forms: Vision, hype, and reality for delivering compu-

ting as the 5th utility”, Journal of Future Generation

Computer Systems, Volume 25, Issue 6, June 2009,

Pages 559-616.

Buyya, R., Ranjan, R. and Calheiros, R. N., "InterCloud:

Utility-Oriented Federation of Cloud Computing Envi-

ronments for Scaling of Application Services", Algo-

rithm and Architectures for Parallel Processing, Lec-

ture Notes in Computer Science, 2010, Volume

6081/2010, 13-31 (Buyya et al., 2010 a).

Buyya, R., Beloglazov1, A., and Abawajy, J., "Energy-

Efficient Management of Data Center Resources for

Cloud Computing: A Vision, Architectural Elements,

and Open Challenges", PDPTA'10 - The International

Conference on Parallel and Distributed Processing

Techniques and Applications, 12-15 July 2010, Las

Vegas, USA (Buyya et al., 2010 b).

Chang, V., “Cloud Storage Framework – An Integrated

Technical Approach and Prototype for Breast Cancer”,

Poster Paper and Technical Paper, UK All Hands

Meeting, December, 2009.

Chang, V., Li, C. S., De Roure, D., Wills, G., Walters, R.

and Chee, C. (2011) The Financial Clouds Review. In-

ternational Journal of Cloud Applications and Com-

puting, 1 (2). pp. 41-63. ISSN 2156-1834, eISSN

2156-1826 (Chang et al., 2011 a)

Chang, V., De Roure, D., Wills, G., Walters, R. and

Barry, T. (2011) Organisational Sustainability Model-

ling for Return on Investment: Case Studies presented

by a National Health Service (NHS) Trust UK. Jour-

nal of Computing and Information Technology, 19 (3).

ISSN Print ISSN 1330-1136 | Online ISSN 1846-3908

(In Press) (Chang et al., 2011 b)

Chang, V., De Roure, D., Wills, G. and Walters, R. (2011)

Case Studies and Organisational Sustainability Model-

ling presented by Cloud Computing Business Frame-

work, International Journal of Web Services Research.

ISSN 1545-7362 (In Press) (Chang et al., 2011 c)

Chang, V., Wills, G. and Walters, R. (2011) Towards

Business Integration as a Service 2.0 (BIaaS 2.0), In:

IEEE International Conference on e-Business Engi-

neering, The 3rd International Workshop on Cloud

Services - Platform Accelerating e-Business, 19-21

October, 2011, Beijing, China. (Chang et al., 2011 d).

Chang, V., Wills, G. and Walters, R. (2011) The positive

impacts offered by Healthcare Cloud and 3D Bioin-

formatics. In: 10th e-Science All Hands Meeting 2011,

26-29 September 2011, York (Chang et al., 2011 e).

Chang, V., Wills, G., Walters, R. and Currie, W. (2011)

Towards a structured Cloud ROI: The University of

Southampton cost-saving and user satisfaction case

studies. Sustainable Green Computing: Practices,

Methodologies and Technologies (Chang et al., 2011 f).

Hey, A. J. G., "The fourth paradigm: data-intensive scien-

tific discovery", Microsoft Publication, 2009, ISBN-

10: 0982544200.

Latch, E. K., Dharmarajan, G., Glaubitz, J. C. and Rhodes.

Jr., O. E., "Relative performance of Bayesian cluster-

ing software for inferring population substructure and

individual assignment at low levels of population dif-

ferentiation", Conservation Genetics, 7:295–302, DOI

10.1007/s10592-005-9098-1, Springer 2006.

Moore, R. W., Baru, C., Marciano, R., Rajasekar, A. and

Wan, M., "Data-Intensive Computing", Chapter 5,

Book chapter of “The Grid: Blueprint for a New Com-

puting Infrastructure”, ISBN 1558609334, 1999.

Moretti, C., Bulosan, J., Thain, D., and Flynn, P.J., "All-

Pairs: An Abstraction for Data-Intensive Cloud Com-

puting", IEEE International Symposium on Parallel

and Distributed Processing, 2008, IPDPS 2008, 14-18

April 2008, Miami, USA.

Vo, H. T., Chen, C. and Ooi, B. C., "Towards Elastic

Transactional Cloud Storage with Range Query Sup-

port", Proceedings of the VLDB Endowment, Volume

3 Issue 1-2, September 2010.

CLOUDSTORAGEINAPRIVATECLOUDDEPLOYMENT-LessonsforDataIntensiveResearch