By leveraging the policies of proactive and reactive
methods, several architecture models have also been
proposed to provide fault tolerance (Cheraghlou et al.,
2016).
Different challenges in the context of Cloud com-
puting can be encountered both during and after the
deployment of applications. Specifically, manual de-
ployment of large-scale distributed systems is time-
consuming and error-prone (Hamilton, 2007) (Leite
et al., 2014). For this reason, the deployment
process must be automated and resilient. Regarding
automation, BPEL and BPMN are the most widely applied
standards for service composition and orchestration
(Vargas-Santiago et al., 2017). Three basic fault
handling concepts are provided by the BPEL engine:
compensation handlers, fault handlers, and event han-
dlers. Nevertheless, BPEL only manages predefined
faults specified by application designers. Similarly,
the BPMN engine provides error events, cancel events
and compensation events.
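
The interplay between fault handlers and compensation handlers can be illustrated with a minimal Python sketch. This is not BPEL or BPMN syntax and does not reproduce any engine's behaviour; the function name `run_with_compensation`, the step names, and the choice to undo completed steps in reverse order are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) -- a hypothetical step shape for this sketch
CompensableStep = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[CompensableStep]) -> List[str]:
    """Run steps in order; on a fault, undo completed steps in reverse order."""
    log: List[str] = []
    completed: List[CompensableStep] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception as exc:  # plays the role of a catch-all fault handler
            log.append(f"fault:{name}:{exc}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            break  # remaining steps are never started
    return log


def _reject_payment() -> None:
    raise RuntimeError("payment rejected")


demo_log = run_with_compensation([
    ("reserve", lambda: None, lambda: None),
    ("charge", _reject_payment, lambda: None),
    ("ship", lambda: None, lambda: None),
])
```

Only faults that the designer anticipated (here, any exception raised by an action) are handled, which mirrors the limitation noted above for predefined faults.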
The scientific community has also taken an in-
terest in research proposals for fault-tolerant frame-
works. In (Varela-Vaca et al., 2010), a framework
for developing business processes with fault tolerance
capabilities was provided. The framework offers several
fault tolerance mechanisms, covering both replication
solutions and software fault tolerance techniques. In
(Jayasinghe et al.,
2013), the authors presented a fault-tolerant runtime
(AESON) to recover applications from failures that
may occur during and after deployment.
Three types of failures are supported: node crashes,
node hangs and application component failures. Even
though deployment failures were addressed, this work
suffers from two major drawbacks: a) AESON was
designed as a P2P system, and b) application mod-
els are not TOSCA-compliant. In (Giannakopoulos
et al., 2017), a deployment methodology with error
recovery features was proposed. It bases its function-
ality on identifying the script dependencies and re-
executing the appropriate configuration scripts. Nev-
ertheless, this approach can only resolve transient fail-
ures occurring during the deployment phase.
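
The general idea of dependency-aware re-execution can be sketched as follows. This is a simplified illustration, not the implementation of (Giannakopoulos et al., 2017): the function `deploy`, the `max_attempts` bound, and the policy of retrying only the failed script in place are assumptions of this sketch. It relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def deploy(scripts: Dict[str, Callable[[], None]],
           depends_on: Dict[str, Set[str]],
           max_attempts: int = 3) -> List[str]:
    """Run scripts in dependency order, retrying a failed script in place."""
    # static_order() yields every script after the scripts it depends on.
    order = list(TopologicalSorter(depends_on).static_order())
    history: List[str] = []
    for name in order:
        for attempt in range(1, max_attempts + 1):
            try:
                scripts[name]()
                history.append(f"{name}:ok@{attempt}")
                break
            except Exception:
                history.append(f"{name}:failed@{attempt}")
        else:
            # All attempts failed: re-execution cannot fix a permanent fault.
            raise RuntimeError(f"{name} kept failing; not a transient error")
    return history


calls = {"app": 0}


def flaky_app() -> None:
    """Hypothetical configuration script that fails on its first run only."""
    calls["app"] += 1
    if calls["app"] == 1:
        raise ConnectionError("transient network error")


demo_history = deploy(
    {"db": lambda: None, "app": flaky_app, "lb": lambda: None},
    {"app": {"db"}, "lb": {"app"}},
)
```

The final `RuntimeError` branch makes the limitation explicit: a script that fails on every attempt is not recovered by re-execution alone.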
3 THE TOSCA SPECIFICATION
TOSCA is the acronym for Topology and Orchestra-
tion Specification for Cloud Applications. It is a stan-
dard designed by OASIS to enable the portability of
Cloud applications and the related IT services (OASIS,
2013). The specification allows the structure of a Cloud
application to be described as a service template, which
is composed of a topology template and the types needed
to build such a template. The topology template is a
typed directed graph, whose nodes
(called node templates) model the application com-
ponents, and edges (called relationship templates)
model the relations occurring among such compo-
nents. Each node of a topology can also be asso-
ciated with the corresponding component’s require-
ments, the operations to manage it, the capabilities it
features, and the policies applied to it.
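
The typed directed graph described above can be modelled in a few lines of Python. This is an illustrative data structure, not a TOSCA parser: the class and method names are hypothetical, the node names `vm` and `web_server` are invented, and only the type strings are taken from the normative types of the TOSCA Simple Profile.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeTemplate:
    """A typed node of the topology, i.e. one application component."""
    name: str
    type: str
    requirements: List[str] = field(default_factory=list)
    capabilities: List[str] = field(default_factory=list)


@dataclass
class RelationshipTemplate:
    """A typed, directed edge from a source node to a target node."""
    source: str
    target: str
    type: str


@dataclass
class TopologyTemplate:
    nodes: Dict[str, NodeTemplate] = field(default_factory=dict)
    relationships: List[RelationshipTemplate] = field(default_factory=list)

    def add_node(self, node: NodeTemplate) -> None:
        self.nodes[node.name] = node

    def relate(self, source: str, target: str, rel_type: str) -> None:
        # Edges may only connect node templates that are already declared.
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node template: {endpoint}")
        self.relationships.append(RelationshipTemplate(source, target, rel_type))

    def targets_of(self, name: str) -> List[str]:
        """Return the nodes that the given node points to."""
        return [r.target for r in self.relationships if r.source == name]


topology = TopologyTemplate()
topology.add_node(NodeTemplate("vm", "tosca.nodes.Compute", capabilities=["host"]))
topology.add_node(NodeTemplate("web_server", "tosca.nodes.WebServer", requirements=["host"]))
topology.relate("web_server", "vm", "tosca.relationships.HostedOn")
```

Operations and policies are omitted here for brevity; they would be further attributes of `NodeTemplate`.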
TOSCA supports the deployment and manage-
ment of applications in two different flavours: im-
perative processing and declarative processing. The
imperative processing requires that all needed man-
agement logic is contained in the Cloud Service
Archive (CSAR). Management plans are typically im-
plemented using workflow languages, such as BPMN
or BPEL. The declarative processing shifts manage-
ment logic from plans to runtime. TOSCA run-
time engines automatically infer the corresponding
logic by interpreting the application topology tem-
plate. The set of provided management functionali-
ties depends on the corresponding runtime and is not
standardized by the TOSCA specification.
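
As a rough illustration of declarative processing, the sketch below infers a deployment plan from the relationships of a topology alone. It is a deliberate simplification and does not reflect any particular TOSCA runtime: the function `infer_deploy_plan` is hypothetical, relationship-level operations are ignored, and each node is simply taken through the `create`, `configure` and `start` operations of the standard lifecycle once the nodes it depends on are ready. The standard-library `graphlib` module (Python 3.9 or later) is assumed.

```python
from graphlib import TopologicalSorter
from typing import Iterable, List, Tuple

# Operations of the standard node lifecycle used during deployment.
LIFECYCLE = ("create", "configure", "start")


def infer_deploy_plan(relationships: Iterable[Tuple[str, str]]) -> List[str]:
    """Derive an ordered list of operations from (source, target) edges."""
    sorter: TopologicalSorter = TopologicalSorter()
    for source, target in relationships:
        # The target (e.g. a host) must be ready before the source.
        sorter.add(source, target)
    return [f"{node}.{op}" for node in sorter.static_order() for op in LIFECYCLE]


demo_plan = infer_deploy_plan([
    ("web_app", "web_server"),   # hypothetical node names
    ("web_server", "vm"),
])
```

No plan is shipped with the application in this style: the ordering is recomputed from whatever topology is supplied, which is why the resulting behaviour depends on the runtime rather than on the specification.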
The TOSCA Simple Profile is an isomorphic ren-
dering of a subset of the TOSCA specification (OA-
SIS, 2013) in the YAML language (OASIS, 2017).
The TOSCA Simple Profile defines a few normative
workflows that are used to operate a topology, and
specifies how they are declaratively generated: de-
ploy, undeploy, scaling-workflows and auto-healing
workflows. Imperative workflows can still be used for
complex use-cases that cannot be solved in declara-
tive workflows. However, they provide less reusabil-
ity as they are defined for a specific topology rather
than being dynamically generated based on the topol-
ogy content. Moreover, by default, any activity failure
of the workflow will result in the failure of the whole
workflow. Although some constructs (e.g., on_failure)
allow rollback operations to be executed, neither policies
nor mechanisms are defined to automatically recover
from failures happening during the deployment of the
topology.
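
The failure semantics discussed above can be mimicked with a small workflow executor. This is a sketch under stated assumptions rather than a rendering of the normative behaviour: the names `WorkflowStep` and `run_workflow` are hypothetical, steps are assumed to form an acyclic chain, and the only fields modelled are an activity plus `on_success` and `on_failure` successor lists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class WorkflowStep:
    activity: Callable[[], None]
    on_success: List[str] = field(default_factory=list)
    on_failure: List[str] = field(default_factory=list)


def run_workflow(steps: Dict[str, WorkflowStep], start: str) -> Tuple[str, List[str]]:
    """Follow on_success links; a failing activity fails the whole workflow."""
    trace: List[str] = []
    status = "succeeded"
    pending = [start]
    while pending:
        name = pending.pop(0)
        step = steps[name]
        try:
            step.activity()
            trace.append(f"{name}:ok")
            pending.extend(step.on_success)
        except Exception:
            trace.append(f"{name}:failed")
            status = "failed"
            # Only the explicitly listed rollback steps run; nothing is retried.
            pending = list(step.on_failure)
    return status, trace


def _install_fails() -> None:
    raise RuntimeError("package repository unreachable")


demo_status, demo_trace = run_workflow(
    {
        "create_vm": WorkflowStep(lambda: None, on_success=["install_db"]),
        "install_db": WorkflowStep(_install_fails, on_failure=["delete_vm"]),
        "delete_vm": WorkflowStep(lambda: None),
    },
    start="create_vm",
)
```

In this run the rollback step is executed, yet the workflow still ends in the failed state: the topology is cleaned up but not deployed, which is the gap that automatic recovery would have to close.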
The work described in this paper is heavily grounded
in the TOSCA standard and, specifically, in the
TOSCA Simple Profile.
4 ANALYSIS OF FAULTS IN
SERVICE PROVISIONING
IaaS providers usually allow users to create and manage
Cloud resources using web-based dashboards, CLI
clients, REST APIs, and language-specific SDKs. Al-
though all the aforementioned approaches provide ac-
CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science