FLOW-BASED PROGRAMMING AS A SOLUTION FOR CLOUD COMPUTING REQUIREMENTS

Marcel R. Barros, Charles C. Miers (1,2), Marcos Simplício, Tereza C. M. B. Carvalho, Jan-Erik Mångs, Bob Melander and Victor Souza

Laboratory of Computer Architecture and Networks, University of São Paulo (USP), São Paulo, Brazil

Santa Catarina State University (UDESC), Florianópolis, Santa Catarina, Brazil

Ericsson Research – Packet Technologies, Kista, Sweden

Keywords:

Cloud computing, Flow-based programming, Functional programming.

Abstract:

Cloud computing services provide a new way of deploying applications over the Internet, as well as a prominent approach for achieving enhanced scalability. Usually, the use of cloud computing resources relies on a regular programming paradigm (such as Object-Oriented Programming) and depends on adjustments to deal with details inherent to the cloud provider and with the scalability issues of that paradigm. This paper addresses how Flow-Based Programming (FBP), a software architecture model based on Functional Programming, can be used as a solution to the challenges involved in meeting the requirements of distributed systems. First, we present a review of the concepts of FBP. We then analyze Live Distributed Objects, Microsoft Orleans, and Yahoo! S4 from an FBP perspective, comparing these solutions according to FBP criteria. Finally, we analyze how FBP could give developers a better way to create scalable applications, such as those targeting cloud computing.

1 INTRODUCTION

In the last decades, Object-Oriented Programming (OOP) has been widely employed in the development of systems for diverse purposes. This predominance is probably justified by the fact that OOP solves (or at least gives a more elegant option for avoiding) numerous issues identified with procedural programming, such as poor maintainability when dealing with complex systems and code that is hard to reuse. Indeed, core concepts of OOP, such as inheritance and encapsulation, are well known for helping developers to build complex systems that are easier to maintain and extend (Morrison, 1994). Despite such advantages, many authors have identified inherent flaws in the OOP paradigm (Armstrong, 1997; Ostrowski et al., 2009a). The most notorious are related to parallel processing and distributed systems, as well as to situations in which a method call leads to side-effects, such as the modification of some objects’ internal states or other changes that are not visible in the method’s return value. Most of these issues arise because simple events that are natural to OOP, such as variables changing during runtime and objects triggering other objects’ methods, introduce considerable complexity when dealing with parallelism and distributed systems (Armstrong, 2007a). For these reasons, OOP may lose some of its appeal in highly distributed environments such as Cloud Computing (Armstrong, 2007b), in which applications can run across a large collection of physically separated computers, allowing a lower expenditure on hardware infrastructure while providing on-demand scalability and ensuring Quality of Service (QoS).

On closer analysis, any approach that aims to be efficient in distributed environments needs to take into account two aspects: state dependencies and synchrony. Method invocation must be designed so that distributed events triggered in different locations do not leave the system inconsistent, something difficult to achieve when there are numerous dependencies between objects. If the system remains synchronous, controlling such dependencies may be feasible but, as pointed out in (Morrison, 1994), unnecessary synchronism can be harmful to scalability. This issue is particularly serious when the time spent waiting for the response to a request is longer than the processing time itself: in such scenarios, synchrony clearly decreases the system’s performance and flexibility.

R. Barros M., C. Miers C., Simplício M., C. M. B. Carvalho T., Mångs J., Melander B. and Souza V..

FLOW-BASED PROGRAMMING AS A SOLUTION FOR CLOUD COMPUTING REQUIREMENTS.

DOI: 10.5220/0003759004320440

In Proceedings of the 1st International Conference on Operations Research and Enterprise Systems (ICORES-2012), pages 432-440

ISBN: 978-989-8425-97-3

© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

The aforementioned issues can be efficiently addressed by means of the Flow-Based Programming (FBP) (Morrison, 1994) software architecture model, a model for assembling black-box processes whose central goals are the use of asynchrony and the development of better ways to build scalable and maintainable systems. One key aspect of this model is that, by employing Functional Programming characteristics, it precludes the existence of states or data structures that are modifiable at runtime, so that every function’s output depends only on its input. Therefore, scalability is not limited by synchrony issues related to data state. These appealing features have led to the adoption of Functional Programming in systems demanding high levels of performance and parallelism, such as Amazon EC2 and Facebook.

This paper addresses how FBP can be used to overcome many of the most common challenges faced by distributed systems. More specifically, we show that FBP is a powerful tool for developing applications adapted to the context of Cloud Computing, since both have many goals in common. This study should allow developers to understand how the FBP model can help them take advantage of cloud characteristics such as high availability and scalability.

Section 2 discusses the FBP model and the motivations behind its design. These concepts provide the background on FBP needed to identify its relations with cloud computing. Section 3 describes three distributed system implementations that use FBP concepts, highlighting how the adoption of these concepts enabled the solutions those systems provide. Section 4 generalizes the application of FBP concepts to cloud computing environments. Finally, we present related work, final considerations and future work.

2 FLOW-BASED PROGRAMMING DEFINITION AND MOTIVATION

FBP is a software architecture model based on

black-box processes linked by dynamic connections.

Structured objects, called Information Packages or

IPs (Morrison, 1994), ﬂow between processes (mod-

ules), conﬁguring an asynchronous communication

black-boxes can be designed to accept different IPs.

In fact, even the structure of the received IP can be used as a criterion for performing some processing, resembling Erlang’s pattern matching (Armstrong, 2007b). Another key concept of FBP, also found in functional languages such as Erlang, is the concern with avoiding side-effects, a major issue in distributed systems (Morrison, 2005). This model shares some similarities with the works usually referred to as “dataflow architectures”, but some important differences exist, especially because the latter are mathematically oriented, dealing with numeric values traveling through a network, while FBP works with messages carrying structured objects (Morrison, 1994).
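As an illustration of using the structure of an IP as a processing criterion, the following minimal Python sketch shows a black-box component that selects its behavior from the type of the IP it receives. The names TextIP, NumberIP and process are hypothetical and chosen only for this example; they do not come from any FBP implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextIP:
    """Hypothetical IP carrying a line of text."""
    text: str


@dataclass(frozen=True)
class NumberIP:
    """Hypothetical IP carrying a numeric sample."""
    value: float


def process(ip):
    """Black-box component: picks its behavior from the IP's structure."""
    if isinstance(ip, TextIP):
        return TextIP(ip.text.upper())
    if isinstance(ip, NumberIP):
        return NumberIP(ip.value * 2)
    raise TypeError(f"unsupported IP: {ip!r}")
```

Because the IPs are immutable and the component returns a new IP instead of changing its input, the component has no side-effects on the data it receives.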

It is important to highlight that FBP is not a programming paradigm like OOP or Functional Programming. Rather, FBP is a collection of Functional Programming practices applied to generic software systems. A system organized according to FBP principles does not need to be developed in a programming language that supports FBP, i.e., the FBP model can encapsulate existing modules of a system. The FBP behavior is restricted to the external side of the encapsulation, so a system that uses FBP to encapsulate large modules will be further from ideal FBP behavior. Therefore, developing systems with a programming language that supports FBP provides the finest granularity, i.e., a full FBP scenario.

Figure 1 exempliﬁes the above concepts, show-

ing how structured objects containing the data to be

processed ﬂow between modules A, B and C. These

streams of data do not possess any information about

the internal functioning of the modules, differing

from method invocation in OOP where control pa-

rameters can be sent. This absence of control data

ﬂowing between modules endows FBP with the loose

coupling property, ensuring a reduced dependency be-

tween modules (Steinseifer, 2009).

Figure 1: General outline of FBP.
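The outline of modules A, B and C can be approximated with threads and queues. The sketch below is only an analogy written in Python under our own assumptions: the END marker, the three transformation functions and the use of queue.Queue as a connection are illustrative choices, not part of the FBP definition.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of FBP itself


def component(transform, inbox, outbox):
    """Run one black-box process: read IPs, transform them, forward results."""
    while True:
        ip = inbox.get()
        if ip is END:
            outbox.put(END)  # propagate shutdown downstream
            return
        outbox.put(transform(ip))


def strip_text(ip):   # module A
    return {**ip, "text": ip["text"].strip()}


def upper_text(ip):   # module B
    return {**ip, "text": ip["text"].upper()}


def add_length(ip):   # module C
    return {**ip, "length": len(ip["text"])}


def run_network(ips):
    """Wire A -> B -> C and push the given IPs through the network."""
    a_in, a_to_b, b_to_c = (queue.Queue(maxsize=8) for _ in range(3))
    c_out = queue.Queue()  # unbounded, so feeding a_in below cannot deadlock
    wiring = [(strip_text, a_in, a_to_b),
              (upper_text, a_to_b, b_to_c),
              (add_length, b_to_c, c_out)]
    threads = [threading.Thread(target=component, args=w) for w in wiring]
    for t in threads:
        t.start()
    for ip in ips:
        a_in.put(ip)
    a_in.put(END)
    results = []
    while (ip := c_out.get()) is not END:
        results.append(ip)
    for t in threads:
        t.join()
    return results
```

Only data travels over the connections; no module knows which module produced its input or which one will consume its output, so the order of dependent operations is fixed purely by the wiring.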

The FBP model originated from the perception that synchronous communication, used by most program designers as the only way to obtain predictable behavior from the code, usually imposes unnecessary constraints upon the application, thus compromising its performance (Morrison, 1994). In fact, the only aspect required for producing predictable code is that dependent operations must occur in the right order. When we build an application by connecting independent processes, we can ensure this condition simply by making the correct connections between the chosen modules.

In this environment free of side-effects and unnecessary synchronism, program designers can build their code by assembling reusable modules and avoid most sources of complexity found in existing techniques for building distributed applications. In other words, programmers do not have to insert complex code and synchronization to deal with recurrent unexpected behaviors, but can rather focus on what the current module should do. FBP can thus help programmers to build modules of different sizes, giving them freedom to choose the desired granularity of these modules.

3 RELATED IMPLEMENTATIONS

Even though FBP’s main asset is improved scalability and parallelism, its concepts can be used to develop a wide range of systems. In the following, we describe three systems focused on such requirements, namely Live Distributed Objects (LDO) (Ostrowski, 2008), Microsoft Orleans (Bykov et al., 2010), and Yahoo! S4 (Neumeyer et al., 2010), and then discuss how they use FBP principles in their conception. These solutions were chosen due to their visibility, but other systems based on FBP exist.

3.1 Live Distributed Objects (LDO)

Live Distributed Objects (LDO) is a platform that

aims to allow the development of distributed sys-

tems that can propagate decisions in a transparent

manner (Ostrowski, 2008; Ostrowski et al., 2007).

LDO relies on the concept of Live Objects, which

are object replicas distributed among network com-

ponents that may reside in different physical loca-

tions. These replicas coordinate themselves via mul-

ticast messages, thus maintaining a well deﬁned and

consistent state. The multicast layer that gives sup-

port to this communication is the Quicksilver Scal-

able Multicast (QSM) (Ostrowski et al., 2008), which

is said to be highly optimized and to deliver improved

scalability.

LDO implements a hierarchical structure of distributed protocols, using a set of mathematical operations to perform consistent transformations on data and to define atomic descriptions of protocols’ states. It also employs algorithms that are responsible for locking methods and for electing leader nodes in this hierarchy. The resulting architecture is based on two main elements: the Membership Service (MS), which is basically a local coordination service; and the Delegation Authority (DA), a local service that works together with the MS and is responsible for keeping the leaders’ information, thus defining a nested hierarchy.

The main motivation for using MS instead of centralized coordination is that the latter leads to a bottleneck, preventing the creation of scalable systems; meanwhile, MS can maintain local state information and repair the local data structures, informing local members and nearby structure leaders about its condition. Each Live Object can only communicate with other architecture levels through the leader’s information contained in its DA. The recursive architecture of LDO is exemplified in Figure 2, in which the arrows indicate communication between two connected structures (each level of the hierarchy has a DA/MS pair).

Figure 2: LDO hierarchical stack of structures. Adapted

from (Ostrowski et al., 2009a).

Finally, it is interesting to notice that LDO’s authors claim that the compiler should be the one responsible for dealing with physical aspects such as error handling, timeouts, network topology and construction of scalable structures (Ostrowski et al., 2009b). This concept highlights the notion that the development of network applications has many peculiarities, especially because, when dealing with distributed systems, program designers still have to care about network details that long ago became transparent in other types of implementations.

3.2 Microsoft Orleans

Microsoft Orleans is a software framework for building client and cloud applications (Bykov et al., 2010). It aims to simplify the programming of distributed applications by restricting concurrency.

Orleans is built around a unit of computation called a grain, defined as an encapsulated sequential process with a private state that communicates exclusively by sending messages. The parallelism of the system is achieved via the production of multiple replicas (activations) of a grain. In this manner, despite each activation being internally sequential, multiple messages addressed to a specific grain can be processed simultaneously by different activations. Activations can reside in different physical locations, enabling locality awareness in a manner that is completely transparent to the programmer.

ICORES 2012 - 1st International Conference on Operations Research and Enterprise Systems

Orleans message exchange is asynchronous and based on the concept of promises. Basically, when an activation receives a message, it returns a promise that it will fulfill that request and schedules the processing. When the processing is concluded, or a failure occurs, the promise is resolved to a state that is either “fulfilled” or “broken”. Based on how the promise was resolved, the requester can make decisions just like in traditional try and catch blocks.

This promise-based system achieves asynchronous communication without losing track of how the processing is advancing. It also allows information about failures to be gathered and makes it possible to handle exceptions in a highly distributed environment.
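The promise mechanism can be illustrated by analogy with the Future objects of Python's standard concurrent.futures module. This is not the Orleans API; handle_request and outcome are hypothetical names, and a local thread pool stands in for remote activations.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload):
    """Hypothetical activation logic: fails on negative input."""
    if payload < 0:
        raise ValueError("negative payload")
    return payload * 2


def outcome(promise):
    """Inspect a settled promise the way a requester would."""
    try:
        return ("fulfilled", promise.result(timeout=5))
    except ValueError as exc:
        return ("broken", str(exc))


with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(handle_request, 21)    # returns at once with a promise
    bad = pool.submit(handle_request, -1)   # this promise will be broken
    results = [outcome(ok), outcome(bad)]
```

The requester keeps working after each submit call and only later checks how each promise was resolved, which mirrors the try and catch style of decision described above.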

Another aspect that reflects Orleans’ concern with error handling is the adoption of a LightWeight Optimistic Distributed Transactions (LWODT) system. In LWODT, transactions are procedures that gather all activations involved in a computation and ensure that it occurs with non-strict ACID (Atomicity, Consistency, Isolation and Durability) properties (Bykov et al., 2010), i.e., instead of using complex heuristic procedures to guarantee consistent processing, Orleans detects when a computation becomes inconsistent and rolls it back. The resulting system is optimistic because it takes advantage of the fact that most processes complete successfully. Thus, since a strict heuristic system would be unnecessary and inefficient in this context, Orleans can explore trade-offs between consistency and scalability. Figure 3 shows an overview of Orleans’ message exchange and LWODT system.

Figure 3: Overview of Orleans’ message exchange (Bykov

et al., 2010).

Figure 3 exemplifies an external request that reaches activation a of grain 1, denoted as 1a (activations are represented by circles). This external request involves processing data in grains 1, 2 and 3. In order to perform this processing atomically, grain 1 performs two actions. First, it enlists itself in an empty transaction. Second, it sends an asynchronous message that contains a request for the activation of grains 2 and 3 and also the transaction reference. Using this reference, grains 2 and 3 enlist themselves in the transaction, which then becomes complete, containing all grains required to complete the processing. If the Orleans runtime detects no inconsistencies between the new states of the grains, it sends a pack of data to be persisted using an AzureSQL storage system.
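The enlist, validate and commit-or-roll-back cycle can be sketched with generic optimistic concurrency control. The code below is not the LWODT algorithm of Orleans; the Grain and Transaction classes and the version counter used to detect inconsistencies are assumptions made only for illustration.

```python
class Grain:
    """Hypothetical grain with a private, versioned state."""

    def __init__(self, name, value=0):
        self.name, self.value, self.version = name, value, 0


class Transaction:
    """Optimistic transaction: tentative writes, validated at commit time."""

    def __init__(self):
        self.enlisted = {}  # grain -> (version seen at enlistment, tentative value)

    def enlist(self, grain):
        self.enlisted.setdefault(grain, (grain.version, grain.value))

    def write(self, grain, value):
        self.enlist(grain)
        seen, _ = self.enlisted[grain]
        self.enlisted[grain] = (seen, value)

    def commit(self):
        # Roll back (discard tentative values) if any grain changed meanwhile.
        if any(g.version != seen for g, (seen, _) in self.enlisted.items()):
            self.enlisted.clear()
            return False
        for g, (_, value) in self.enlisted.items():
            g.value, g.version = value, g.version + 1
        return True
```

No locks are taken while the transaction runs; the cost of a conflict is paid only in the rare case in which validation fails, which is the optimistic trade-off described above.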

3.3 Yahoo! S4

The Simple Scalable Streaming System (S4) is a Distributed Stream Computing Platform developed by Yahoo!. It was conceived for solving problems related to search applications, such as Yahoo!, Bing and Google, that use data mining and machine learning algorithms. Additionally, in order to render the most relevant ads in search pages, S4 was designed to be capable of processing thousands of queries per second (Neumeyer et al., 2010).

The computation units in the S4 platform are called Processing Elements (PEs), which are basically black-box processes with strictly private states. A PE is defined by the types of events it consumes, by when these events should be consumed and by what kind of processing will be performed.
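A PE of this kind can be sketched in a few lines of Python. The sketch is not the S4 API: the WordCountPE class, the dictionary layout of the events and the rule of one PE instance per key are assumptions adopted here for illustration.

```python
from collections import defaultdict


class WordCountPE:
    """Hypothetical PE: consumes 'word' events and keeps a private counter."""

    event_type = "word"

    def __init__(self):
        self._count = 0  # private state, never shared with other PEs

    def on_event(self, event):
        self._count += 1
        return {"type": "count", "key": event["key"], "value": self._count}


def dispatch(events, pe_class):
    """Route each matching event to the PE instance that owns its key."""
    instances = defaultdict(pe_class)  # one PE instance per distinct key
    emitted = []
    for event in events:
        if event["type"] == pe_class.event_type:
            emitted.append(instances[event["key"]].on_event(event))
    return emitted
```

Events of other types are simply ignored by this PE, and each instance only ever sees the events for its own key, so instances can be placed on different nodes without sharing state.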

S4 considers that stream processing systems can

be simpliﬁed by the absence of central or specialized

nodes. For this reason, it adopts a symmetric ap-

proach, avoiding issues related to bottlenecks on cen-

tral decision units. This approach allows the creation

of large systems with high maintainability, which be-

have as pluggable and auto-conﬁgurable platforms.

Figure 4: Yahoo! S4 architecture view.

Figure 4 shows the S4 architecture. This architecture is basically composed of Processing Nodes over a Communication Layer, thus hiding communication details from developers. Processing nodes receive the uncontrolled data flow that passes through a chain of PEs. After that, one of the following must occur: either a message is sent requesting the activation of another processing node or results are published by the Emitter. Transport, failure treatment and load balancing issues are all managed by the lower layer, abstracting these details from the programming model. Hence, programmers basically need to deal only with the configuration of the PE chains’ behavior.

It is important to highlight that S4 was designed for the context of processing user queries. In this context, the processing system has no control over the flow of data, leading to an architecture that degrades gracefully when discarding excessive data is necessary. In addition, S4 works with a lossy failover, according to which failed nodes are moved to a standby server, losing their in-memory state during this handoff. After a failure, state can only be regenerated from the input stream, i.e., from data arriving at the input ports. This is not acceptable in some applications (Neumeyer et al., 2010), for which other mechanisms would be needed.

3.4 Comparisons and Comments

Despite their differences, the aforementioned solutions have many characteristics deeply related to FBP. An interesting issue that can be noticed almost immediately is that FBP treats the concept of state differently than Orleans and LDO do. Live Objects and grains have well-defined states, which are persisted after processing, while FBP modules work much more like functions, depending only on input data to define their behavior. This model endows FBP with the loose coupling property, which leads to a low dependency between linked modules. This different treatment, however, does not imply the absence of the loose coupling property in the aforementioned works, since this concern is present in the definition of isolated computation units. Indeed, these units have private states that can only be indirectly accessed via asynchronous messages, which works similarly to FBP’s black-box processes.

The definition of the computation units with private state should be the first step of every distributed system design. Knowing the behavior of units and their communication capabilities allows developers to decide what type of control will be needed to maintain desired characteristics such as error handling, scalability and replication.

Regarding the modeling of the control system, LDO and S4 advocate the adoption of fully distributed control, eliminating the need for specialized nodes. Orleans treats this issue in detail, noting that, despite being a highly distributed environment, it needs a central unit to keep information for its distributed transactions; this central unit uses distributed and shared hash tables, allowing grains’ activations to coordinate themselves. In fact, as pointed out in FBP, distribution is a natural consequence of eliminating unnecessary points of synchronism. To achieve a synchronous system, it is necessary to keep a lot of information about related computation units’ states, leading to a central control unit that possesses all this information. Thus, after designing isolated computation units, the goal must be the creation of a highly asynchronous system, which will eventually lead to distributed coordination.

The way that computation units are linked and organized is discussed in (Ostrowski et al., 2009b) and (Bykov et al., 2010). Both works adopt a hierarchical and nested architecture. Orleans’ concept of grain domains can even be used to address security issues, considering that it is possible to restrict message traffic based on relationships between grains. Grains are only capable of communicating with other grains that are in the same domain or with explicitly declared external grains. This grain reference concept is directly related to the port references adopted in FBP.

The three solutions presented, even if indirectly, aim to enable an easier and more intuitive experience when working with distributed systems. This goal is closely related to that of creating systems with high maintainability, as shown by the careful treatment given to error handling in these works. Such concerns are a natural consequence of the increasing need for highly available systems, capable of providing Internet services in the same way as electricity services.

As mentioned above, the use of FBP concepts helps to achieve not only better distribution but also replication, error handling, and scalability. These characteristics are basic requirements for cloud computing systems. As a system closely related to Microsoft Windows Azure (Li, 2009), Orleans already directly targets cloud computing environments (MSDN, 2011). S4 and LDO can also be related to cloud computing, since most of the issues addressed by their developers closely match important cloud computing requirements. Achieving a solid fault-tolerant system in a distributed environment is challenging due to the absence of a central control entity that can analyze a complete view of the system’s state and take the appropriate decision. Specifically, the Orleans transaction system offers a very interesting solution to this issue, which is detailed in (Bykov et al., 2010). Table 1 summarizes the relations between the discussed solutions.

Table 1: Characteristics of implementations using FBP.

Solution | Main goal | Programming model | Processing unit | Architecture organization | Communication model | Error handling
LDO (Ostrowski, 2008) | scalability | assembly of existing objects | Live Object | Hierarchical and nested DA/MS pairs | QSM | local reboot by MS
Microsoft Orleans (Bykov et al., 2010) | scalability & mobility | grains specification | grain activation | Hierarchical and nested grains domains | asynchronous messages | LWODT
Yahoo! S4 (Neumeyer et al., 2010) | scalability | PEs specification | Processing Elements | Processing nodes over communication layer | uncontrolled flow through PEs | lossy failover

4 THE ROLE OF FBP IN CLOUD COMPUTING

Cloud computing refers not only to the applications delivered as services over the Internet, but also to the platform/hardware and system software in the data centers that provide those services (Armbrust et al., 2009). The infrastructure that lies under this term brings advantages to end-users and developers, as well as to the cloud providers that maintain this infrastructure.

The main promise of Cloud Computing is to provide end-users with applications having unprecedented availability. As a result, access to Internet services for cloud users should achieve the status of an essential service, much like access to electricity and sanitation. Meanwhile, developers with innovative ideas would no longer need a large capital outlay for acquiring and operating the hardware required by their services.

Cloud providers could take advantage of the

growth in the use of web services for providing low-

cost infrastructure for developers, thus generating

larger proﬁts from their installations. However, many

challenges are linked to this structure, including how

applications must be developed and how the runtime

must work to obtain the expected result. Basically, the

main cloud requirements for deploying cloud applica-

tions are (Rimal et al., 2009; Gonzalez et al., 2011):

1. Availability: services should be available at all times;

2. Locality and replication: availability and performance should not depend on users’ locations;

3. Maintainability: developers must be able to perform major updates and fixes with low effort;

4. Scalability: Quality of Service (QoS) needs to be assured even under high demand;

5. Security: security-related issues, such as data protection against unauthorized access, need to be dealt with;

6. Data loss avoidance: data should be processed and persisted in a safe manner, preventing data losses.

It comes as no surprise that all these requirements have been considered in the works discussed in Section 3. What is notable, however, is that accomplishing these requirements can be seen as a consequence of using FBP concepts.

Isolation is the main factor responsible for providing availability in FBP. FBP modules can be easily rebooted in case of failure, since the programming model is designed to avoid side-effects between modules, as discussed in depth in (Armstrong, 2007b); in addition, these characteristics greatly simplify debugging, an important aspect in highly available systems. Indeed, the model adopted by Microsoft’s Orleans for consistency and error handling shows that the isolation of grain activations allowed the use of distributed transactions, a traditional concept that became difficult to implement using technologies such as OOP due to the complex interdependencies between objects.
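The ease of rebooting an isolated module can be illustrated with a small supervisor loop. This is a minimal sketch under our own assumptions: supervise, make_inverter and the policy of reporting the failing IP as lost are hypothetical and are not taken from any of the systems discussed.

```python
def supervise(make_module, ips, max_restarts=3):
    """Feed IPs to a module; on failure, replace it with a fresh instance."""
    module, restarts, results = make_module(), 0, []
    for ip in ips:
        try:
            results.append(module(ip))
        except Exception:
            if restarts == max_restarts:
                raise
            restarts += 1
            module = make_module()  # reboot: no shared state to repair
            results.append(None)    # the failing IP is reported as lost
    return results, restarts


def make_inverter():
    """Factory for a hypothetical stateless module."""
    return lambda ip: 1 / ip
```

Because the module holds no state shared with its neighbors, restarting it requires nothing more than creating a new instance and resuming the flow of IPs.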

FBP’s isolated modules also support the development of simple replication techniques, as the consistency between multiple instances of a module precludes the need for complex control mechanisms (Ostrowski et al., 2009b; Bykov et al., 2010). The existence of consistent distributed instances of modules, together with location-aware mechanisms for content replication, allows the provision of content with the same QoS level for millions of users spread across the globe. The management of multiple module instances in a highly distributed environment is achieved by means of well-defined ports, which not only allow the different instances to be referenced (Morrison, 1994), but can also be used for improving the system’s location awareness. For example, a system having ports with well-defined locations can keep information about the amount of data traveling between these points and, thus, it becomes natural to build algorithms that relocate or replicate highly demanded modules to improve performance and ensure QoS.

Regarding maintainability, the adoption of a distributed control model for the system can become a problem, since any modification to the system must be propagated through modules with no central communication point. On the other hand, the FBP view of black-box modules linked by dynamic connections simplifies the maintenance task. A new version of any module can be developed, and the only intervention in the system is updating the ports’ addresses so that they refer to the new version. For example, in Microsoft’s Orleans model, this would be equivalent to modifying an activation ID written on the distributed hash table.
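Updating a module by redirecting a port can be sketched as follows. The PortRegistry class, the port name "pricing.in" and the two pricing functions are hypothetical; the sketch only illustrates the idea that senders address ports, not module versions.

```python
class PortRegistry:
    """Hypothetical registry mapping port names to the module behind them."""

    def __init__(self):
        self._ports = {}

    def bind(self, port, module):
        self._ports[port] = module

    def send(self, port, ip):
        return self._ports[port](ip)  # the port is resolved on every send


def price_v1(ip):
    """Old module version: applies a 10% markup (amounts in cents)."""
    return ip["amount_cents"] * 110 // 100


def price_v2(ip):
    """New module version: same markup plus an optional shipping fee."""
    return ip["amount_cents"] * 110 // 100 + ip.get("shipping_cents", 0)


registry = PortRegistry()
order = {"amount_cents": 10000, "shipping_cents": 500}
registry.bind("pricing.in", price_v1)
before = registry.send("pricing.in", order)
registry.bind("pricing.in", price_v2)   # the only intervention in the system
after = registry.send("pricing.in", order)
```

The sender's code is identical before and after the update; only the binding of the port changes.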

Scalability is a major concern in the context of cloud computing, and receives special attention in all implementations discussed in Section 3. Here, we discuss this requirement under two aspects. The first refers to the avoidance of unnecessary synchronism, along with the adoption of distributed control. The second is related to the system’s granularity, which also affects the system’s performance.

Spreading an application across the globe using the cloud structure faces some essential challenges. Streams of data may take tens of milliseconds to be delivered, which is likely to have a significant impact on the performance of delay-sensitive applications (Ostrowski et al., 2007) and, similarly, to impair processes based on a series of synchronous requests and responses. In this scenario, asynchrony thus assumes a major role, attenuating these effects by maximizing the system’s throughput while minimizing the accumulation of transport delays. Coupled with asynchrony, the adoption of distributed control can reduce the traffic of control data, since structures are controlled locally (Morrison, 2003). At the same time, local coordination can simplify any need for synchronous control signals, avoiding the creation of new sources of delay. Therefore, the widely adopted “request & response” model for developing web applications, despite its simplicity, has inherent limitations.
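The accumulation of transport delays under a series of synchronous requests can be observed with a small experiment. The sketch below uses Python's asyncio with a simulated delay of 50 ms; remote_call and the chosen delay are assumptions for illustration, not measurements of any real system.

```python
import asyncio
import time


async def remote_call(i, delay=0.05):
    """Hypothetical remote request; the sleep stands in for network delay."""
    await asyncio.sleep(delay)
    return i * i


async def sequential(n):
    """Request-and-response style: each call waits for the previous one."""
    return [await remote_call(i) for i in range(n)]


async def concurrent(n):
    """Asynchronous style: all requests are in flight at the same time."""
    return await asyncio.gather(*(remote_call(i) for i in range(n)))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequential(5))
con_result, con_time = timed(concurrent(5))
```

Both variants return the same values, but the sequential one pays the delay once per request, while the concurrent one pays it roughly once in total.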

Granularity refers to the size of the system’s modules, i.e., the amount of data that defines a module’s state. For example, a system that processes large graphs, composed of thousands of nodes, can be designed with a variety of granularities: developers may adopt a fine granularity model, making each module store the state of one node, or a coarse granularity model in which hundreds of nodes are encapsulated in the same module (Bykov et al., 2010). However, the former approach could become a major concern when developing a distributed application. Still in the large graph example, if some process needs to access tens of the system’s nodes, the fine granularity approach would lead to high overhead due to the need to assemble data from several different modules; in comparison, a coarse-grained distribution with an adequate division among nodes would not face this problem. However, an application with another kind of data may lead to a different scenario. For example, we can imagine a social network in which a module is associated either with a single visitor profile or with a large group of them. In this case, the fine granularity approach is likely to fit better, since a coarse-grained distribution would result in internal searches when trying to locate the data of some specific profile, and would also lead to unnecessary replication of profiles if there is a high demand for a single user’s data.
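For the large graph example, the overhead of each granularity can be estimated by counting how many modules a query has to contact. The sketch below assumes, purely for illustration, that graph nodes have integer identifiers and are assigned to modules in contiguous ranges; modules_touched is a hypothetical helper.

```python
def modules_touched(node_ids, nodes_per_module):
    """Number of distinct modules a query must contact for the given nodes."""
    return len({node // nodes_per_module for node in node_ids})


query = range(40, 70)                  # a process needing 30 adjacent nodes
fine = modules_touched(query, 1)       # fine granularity: one node per module
coarse = modules_touched(query, 100)   # coarse granularity: 100 nodes per module
```

With one node per module the query contacts 30 modules, whereas with 100 nodes per module it contacts a single one; the result would differ for a workload in which each request concerns only one item, as in the social network example.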

Hence, we can analyze granularity from two dif-

ferent perspectives. First, granularity can be seen

as a trade-off between maintainability and perfor-

mance (Morrison, 1994). This vision is based on the

fact that, if a sequential process is divided, it will add

API calls for passing data through the slices, delay-

ing the whole process. On the other hand, smaller

modules increase maintainability, allowing a more

focused work on speciﬁc parts of the process, and

can improve modules’ reuse. Second, considering

a parallel processing context, granularity can be de-

scribed as a trade-off between the level of paral-

lelism and state locality needed for efﬁcient computa-

tions (Bykov et al., 2010). This statement can be bet-

ter understood considering that, when a ﬁne-grained

distribution is adopted, a higher level of parallelism is

achieved due to the possibility of using more special-

ized processes; in the case of Microsoft’s Orleans, for

example, this concept is related to the ability to create

new activations that supply the need for some speciﬁc

piece of information.
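The first perspective can be made concrete with a small sketch, given here only as an illustration: the same sequential process is run once as a single module and once divided into smaller, reusable stages, and the number of hand-offs between stages is counted. The helper `run_pipeline` and the example stages are assumptions introduced for this sketch.

```python
def run_pipeline(stages, item):
    """Pass item through stages in order; return (result, number of hand-offs)."""
    handoffs = 0
    value = item
    for position, stage in enumerate(stages):
        if position > 0:
            handoffs += 1  # data crosses a boundary between two modules
        value = stage(value)
    return value, handoffs


def whole_process(text):
    """Coarse-grained version: one module performs every step."""
    return sorted(set(text.lower().split()))


# Fine-grained version: four small, individually reusable modules.
split_process = [str.lower, str.split, set, sorted]

coarse_result, coarse_handoffs = run_pipeline([whole_process], "b a B")
fine_result, fine_handoffs = run_pipeline(split_process, "b a B")
print(coarse_handoffs, fine_handoffs)
```

Both versions compute the same result, but the divided version pays one hand-off per boundary; in a distributed deployment each hand-off would correspond to a call for passing data between slices.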

Unlike the other requirements, security is not con-

sidered in depth by the implementations discussed in

Section 3. This also applies to the security-related aspect

of data loss avoidance, although Microsoft’s Orleans

developers claim that, with the addition of a distributed

storage system, this solution is adequate for

highly secure and efficient persistence of data. Even

though the relationship between FBP concepts and security

in cloud computing remains a subject for further

study, concepts such as isolation and the capability

to create subsystems allow the establishment of

boundaries to ﬂows of data, thus facilitating the ap-

plication of security principles (Mather et al., 2009;

Kaufman, 2009; CSA, 2011).

4.1 FBP and MapReduce

MapReduce is a programming model, as well as an

associated implementation for processing and gener-

ating large data sets. It is considered the standard

ICORES 2012 - 1st International Conference on Operations Research and Enterprise Systems

solution for processing large amounts of data dis-

tributed across thousands of machines and can thus

be seen as a native cloud computing solution. Like FBP,

MapReduce was inspired by functional programming,

more specifically by the map and reduce functions that

are present in Lisp and many other functional languages

(Dean and Ghemawat, 2008).

As a consequence of its functional behaviour,

MapReduce achieves many of the characteristics

mentioned above for FBP, such as avoidance of side effects

and high scalability. Even though these two

models may look like competing alternatives for developers

designing a distributed system, we believe

that they can actually be used together as powerful

tools to achieve maintainable code. Basically,

MapReduce proposes the use of two independent sets

of workers. The first group consumes the data segments,

tagging and organizing them by similar terms

based on some comparison criterion; the second

set of workers then consumes those similar terms and produces

a unified result containing information about

that speciﬁc group (e.g., the number of terms in this

group). This apparently simple approach fulﬁlls some

cloud computing requirements such as fault tolerance

and scalability: in case of failure, a worker can be

reset and its task can be rescheduled and relocated

to other workers; furthermore, more workers can be

spawned rapidly to increase processing capabilities.
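The two sets of workers described above can be sketched in a single process as follows. This is a simplified illustration, not the implementation of Dean and Ghemawat: the function names are assumptions, the grouping step is performed in memory, and scheduling, distribution and failure recovery are left out.

```python
from collections import defaultdict


def map_worker(segment):
    """First set of workers: tag every term of a data segment with a count of 1."""
    return [(term, 1) for term in segment.split()]


def group_by_term(tagged_pairs):
    """Organize tagged values so that equal terms end up in the same group."""
    groups = defaultdict(list)
    for term, value in tagged_pairs:
        groups[term].append(value)
    return groups


def reduce_worker(term, values):
    """Second set of workers: produce one unified result per group of terms."""
    return term, sum(values)


def map_reduce(segments):
    """Run the map phase on every segment, group the output, then reduce it."""
    tagged = [pair for segment in segments for pair in map_worker(segment)]
    groups = group_by_term(tagged)
    return dict(reduce_worker(term, values) for term, values in groups.items())


counts = map_reduce(["flow based programming", "flow based", "flow"])
print(counts)
```

Because `map_worker` and `reduce_worker` depend only on their arguments, a failed call could simply be repeated on another worker, which is the property the fault-tolerance argument relies on.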

In essence, both FBP and MapReduce are similar

in terms of data treatment. Both models deﬁne private

states for improved data consistency, and consider

that asynchronous communication is a natural way to

achieve scalability. While the FBP model works on

modules’ isolation and on the need of asynchronous

communication, discussing in depth how this com-

munication can be organized, MapReduce proposes

a system for splitting, processing and reassembling

streams of data in a consistent and scalable manner,

using isolation and asynchronous communication as

simple tools. Thus, concepts adopted by the FBP

model – such as the deﬁnition of ports, asynchronous

data ﬂow between modules and a distributed control

unit – work very well with MapReduce’s main idea.

Indeed, both approaches describe scalable communication,

distributed processing, practices for consistency

of data, and locality awareness through the definition

of ports.

In conclusion, FBP can be seen as a collection

of programming practices necessary for enabling the

use of efﬁcient distributed processing models such as

MapReduce.
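As an illustration of how the two models might be combined, the sketch below arranges a map step and a reduce step as isolated components that communicate only through ports. It assumes that bounded `queue.Queue` objects stand in for FBP connections and that threads stand in for distributed workers; the component names and the `END` marker are hypothetical.

```python
import queue
import threading
from collections import Counter

END = object()  # hypothetical end-of-stream marker sent through a port


def mapper(in_port, out_port):
    """Component: read segments from the input port, emit one term per packet."""
    while (segment := in_port.get()) is not END:
        for term in segment.split():
            out_port.put(term)
    out_port.put(END)


def reducer(in_port, out_port):
    """Component: keep a private count per term, emit the totals at end of stream."""
    totals = Counter()
    while (term := in_port.get()) is not END:
        totals[term] += 1
    out_port.put(dict(totals))


def run_network(segments):
    """Wire the components through bounded connections and run them concurrently."""
    segments_port = queue.Queue(maxsize=4)
    terms_port = queue.Queue(maxsize=4)
    result_port = queue.Queue(maxsize=1)
    components = [
        threading.Thread(target=mapper, args=(segments_port, terms_port)),
        threading.Thread(target=reducer, args=(terms_port, result_port)),
    ]
    for component in components:
        component.start()
    for segment in segments:
        segments_port.put(segment)
    segments_port.put(END)
    result = result_port.get()
    for component in components:
        component.join()
    return result


print(run_network(["a b a", "b a"]))
```

Each component keeps its state private and blocks only on its own ports, so no central coordinator is required; the bounded queues provide simple back-pressure between the stages.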

5 RELATED WORK

As mentioned earlier, there are works closely related to

FBP written under the term ‘dataflow architectures’.

The difference between the two terms can be more

easily understood through the LDO example. LDO

is considered a dataﬂow architecture work, since its

main objective is to establish algorithms for data con-

sistency and atomicity; for this reason, the major part

of Ostrowski’s work is focused on the mathematical

treatment of data. In contrast, FBP focuses on setting a

generic programming model, which can take advan-

tage of several data consistency algorithms.

Steinseifer (2009) discusses FBP

in detail, clarifying the main concepts and providing

performance information about the JavaFBP implementation.

Moreover, Gupta and Larus (2010)

present some motivations for the adoption of Cloud

Computing, highlighting some of its main requirements

and discussing prevailing challenges like concurrency,

parallelism, application partitioning, high

availability and distribution. Nonetheless, to the best

of our knowledge, the advantages of adopting the FBP

model in the context of cloud computing for overcoming

inherent limitations of other programming paradigms

have not been previously explored in the literature.

6 CONSIDERATIONS AND FUTURE WORK

Although research on Cloud Computing is still in

its infancy, many crucial and challenging requirements

that must be met by such solutions

have already been deﬁned in the literature. In this

context, FBP presents itself as a powerful tool for

developing systems and services capable of meeting

these challenges, allowing simple solutions for

classic problems of distributed systems and overcoming

several limitations of more conventional programming

paradigms. This comes from the fact that FBP

relies on a set of concepts that aim at improved scal-

ability and avoidance of side-effects, which are es-

sential in highly distributed systems. This conclu-

sion is supported by the study of platforms espe-

cially focused on cloud applications, such as LDO,

Microsoft’s Orleans and Yahoo!’s S4 solutions.

It is notable that FBP concepts can be implemented

in a variety of ways, even being able to

encapsulate other programming paradigms and organize

heterogeneous scenarios into a solid model. We

argue for FBP as a solution to distributed systems

requirements, due to its generality and absence of inconsistencies

while dealing with parallel processing.

We believe that a deeper study is needed in order

to address FBP security concerns in a cloud comput-

ing environment and how FBP can assist in solving

problems related to distributed coordination and asynchrony.

Distributed environments with asynchronous

message passing present developers with a completely

different scenario: while some security vulnerabilities

will no longer apply, several others

will arise.

A deeper study of how granularity affects FBP

in a distributed scenario is needed in order to support

application design. As discussed in Section 4,

the choice of granularity deeply affects system performance.

It is necessary to clarify where and when a

certain granularity level should be adopted, enabling

more accurate design of distributed applications.

ACKNOWLEDGEMENTS

This work was supported by the Research and De-

velopment Centre, Ericsson Telecomunicações S.A.,

Brazil.

REFERENCES

Armbrust, M., Fox, A., Grifﬁth, R., Joseph, A. D., Katz,

R. H., Konwinski, A., Lee, G., Patterson, D. A.,

Rabkin, A., Stoica, I., and Zaharia, M. (2009). Above

the clouds: A Berkeley view of cloud computing.

Technical report, EECS Department, University of

California, Berkeley.

Armstrong, J. (1997). The development of Erlang. In Proc.

2nd ACM SIGPLAN ICFP, ICFP ’97, pages 196–203,

NY, USA. ACM.

Armstrong, J. (2007a). A history of Erlang. In Proc. 3rd

ACM SIGPLAN HOPL, HOPL III, pages 6–1–6–26,

New York, NY, USA. ACM.

Armstrong, J. (2007b). Programming Erlang: Software for

a Concurrent World. Pragmatic Bookshelf, 1 edition.

Bykov, S., Geller, A., Kliot, G., Larus, J. R., Pandya, R.,

and Thelin, J. (2010). Orleans: A framework for cloud

computing.

CSA (2011). Security guidance for critical areas of focus in

cloud computing.

Dean, J. and Ghemawat, S. (2008). MapReduce: simpliﬁed

data processing on large clusters. In OSDI ’04, pages

137–150.

Gonzalez, N., Miers, C., Redígolo, F., Simplício, M., Car-

valho, T., Naslund, M., and Pourzandi, M. (2011). A

taxonomy model for cloud computing services. In

Proceedings of CLOSER 2011, Noordwijkerhout, The

Netherlands. INSTICC.

Gupta, R. and Larus, J. (2010). Programming clouds.

In Compiler Construction, volume 6011 of Lecture

Notes in Computer Science, pages 1–9. Springer

Berlin / Heidelberg.

Kaufman, L. M. (2009). Data security in the world of cloud

computing. IEEE Security and Privacy, 7(4):61–64.

Li, H. (2009). Introducing Windows Azure. Apress, Berkeley,

CA, USA.

Mather, T., Kumaraswamy, S., and Latif, S. (2009). Cloud

Security and Privacy: An Enterprise Perspective on

Risks and Compliance. O’Reilly Media, 1 edition.

Morrison, J. P. (1994). Flow-Based Programming: A New

Approach to Application Development. Van Nostrand

Reinhold, 1st edition.

Morrison, J. P. (2003). Asynchronous component-based

programming.

Morrison, J. P. (2005). Patterns in ﬂow-based programming.

MSDN (2011). Orleans: A framework for scalable

Client+Cloud computing.

Neumeyer, L., Robbins, B., Nair, A., and Kesari, A.

(2010). S4: Distributed stream computing platform.

In Data Mining Workshops, International Conference

on, pages 170–177, CA, USA. IEEE Computer Soci-

ety.

Ostrowski, K., Birman, K., and Dolev, D. (2007). Live dis-

tributed objects: Enabling the active web. IEEE IC,

11:72–78.

Ostrowski, K., Birman, K., and Dolev, D. (2008). Quicksil-

ver scalable multicast (QSM). In Network Computing

and Applications, pages 9–18, CA, USA. IEEE CS.

Ostrowski, K., Birman, K., and Dolev, D. (2009a). Dis-

tributed data ﬂow language for multi-party protocols.

In Proc. 5th Workshop on Programming Languages

and Operating Systems, PLOS ’09, page 7:1–7:5, New

York, NY, USA. ACM.

Ostrowski, K., Birman, K., Dolev, D., and Sakoda, C.

(2009b). Implementing reliable event streams in large

systems via distributed data ﬂows and recursive dele-

gation. In Proc. 3rd ACM - DEBS, page 15:1–15:14,

New York, NY, USA. ACM.

Ostrowski, K. J. (2008). Live distributed objects. PhD the-

sis, Cornell University, Ithaca, NY, USA.

Rimal, B. P., Choi, E., and Lumb, I. (2009). A taxon-

omy and survey of cloud computing systems. In Net-

worked Computing and Advanced Information Man-

agement, International Conference on, pages 44–51,

Los Alamitos, CA, USA. IEEE Computer Society.

Steinseifer, S. (2009). Evaluation and Extension of an Implementation

of Flow-Based Programming. PhD thesis,

Fachhochschule Gießen-Friedberg.
