Anything to Topology - A Method and System Architecture to Topologize

Technology-speciﬁc Application Deployment Artifacts

Christian Endres, Uwe Breitenbücher, Frank Leymann and Johannes Wettinger

Institute of Architecture of Application Systems, University of Stuttgart, Stuttgart, Germany

Keywords:

Application Deployment, Topology Crawling, TOSCA, Conﬁguration Management, Chef.

Abstract:

In recent years, many application deployment technologies have emerged such as conﬁguration management

tools, e.g., Chef and Juju, infrastructure and platform technologies, e.g., Cloud Foundry and OpenStack, as

well as container-based approaches, e.g., Docker. As a result, many repositories exist which contain executable

and heavily used artifacts that can be used with these technologies, e.g., to deploy a WordPress application.

However, to automate the deployment of more complex applications, typically, several of these technologies

have to be used in combination. Thus, often, diverse artifacts stored in different repositories need to be

integrated. This requires expertise about each technology and leads to a manual, complex, and error-prone

integration step. In this paper, we tackle these issues: We present a method and system architecture that enables

crawling repositories in order to transform the contained artifacts into technology-agnostic topology models,

each describing the components that get installed as well as their dependencies. We show how these topologies

can be combined to model the deployment of complex applications and how the resulting topology can be

deployed automatically by one runtime. To prove the feasibility, we developed and evaluated a prototype

based on the TOSCA standard and conducted a case study for Chef artifacts.

1 INTRODUCTION

In recent years, Cloud Computing gained a lot of

attention as it helps to achieve ﬂexible IT opera-

tion (Leymann, 2009). To automate the deployment

of Cloud applications, besides the proprietary APIs

offered by providers, many additional technologies

have been developed that focus on different kinds of

functionality. Among these technologies are several configuration management tools, e.g.,

Ansible, Chef, Juju, and Puppet; infrastructure and

platform technologies, e.g., OpenStack and Cloud

Foundry; as well as container-based approaches, e.g.,

Docker. Due to the heavy usage of these technolo-

gies, many open-source repositories have emerged

that contain executable and heavily used artifacts that

can be used by these technologies to deploy the desired application. For example, the Chef Supermarket contains a plethora of cookbooks that can be used

by the Chef runtime chef-client (Taylor and Vargo,

2014) to automatically deploy a certain application.

Thus, installing, for instance, a WordPress application

can be automated efﬁciently by reusing the cookbook.

Chef Supermarket: https://supermarket.chef.io/

However, reusing such artifacts comes with two

challenges to be tackled: (i) Selecting appropriate ar-

tifacts often requires deep technology-speciﬁc exper-

tise to understand the effect of executing an artifact

and to conﬁgure the runtime correctly. For example,

if a Chef cookbook shall be used to deploy Word-

Press, the cookbook needs to be analyzed to ensure

that exactly the desired conﬁguration gets deployed.

In addition, the Chef runtime needs to be conﬁgured

to deploy the application to virtual machine(s). Un-

fortunately, efﬁciently getting a quick overview of the

components that get installed by an artifact and their

dependencies is often not possible without highly spe-

ciﬁc domain expertise – especially as intuitive graph-

ical tooling is missing in many technologies.

(ii) While understanding artifacts is a serious chal-

lenge, combining them to deploy non-trivial applica-

tions is another challenge that needs to be tackled in

real-world scenarios. For complex applications, typ-

ically, multiple management technologies have to be

integrated (Breitenbücher et al., 2013): the APIs of

Cloud providers must be invoked to deploy virtual

machines whereas conﬁguration management tech-

nologies, e.g., may be used to deploy the desired com-

ponents on the provisioned virtual machines.


Endres, C., Breitenbücher, U., Leymann, F. and Wettinger, J.

Anything to Topology - A Method and System Architecture to Topologize Technology-speciﬁc Application Deployment Artifacts.

DOI: 10.5220/0006305302080218

In Proceedings of the 7th International Conference on Cloud Computing and Services Science (CLOSER 2017), pages 180-190

ISBN: 978-989-758-243-1

However, such combinations often require enor-

mous expertise when multiple heterogeneous services

need to be orchestrated, low-level technologies have

to be wrapped, and diverse data formats must be inte-

grated – to name a few challenges (Eilam et al., 2011).

Thus, manually executing these steps is error-prone,

time-consuming, and, therefore, not efficient (Breitenbücher et al., 2014). In this paper, we tackle these

issues by introducing the Topologize method.

We present the Topologize Method and System Ar-

chitecture that enables automated crawling of differ-

ent kinds of repositories, e.g., Chef Supermarket, in

order to transform the contained technology-speciﬁc

artifacts into technology-agnostic topology models.

Each generated topology model is a directed, labeled

graph describing the components that get installed by

a certain artifact as well as the relations between the

components. Thus, the generated topology models

ease understanding the functionality of artifacts since

graphs can be interpreted without requiring any ex-

pertise about the employed technology, the artifact,

and its serialization format. Moreover, we show that

these generated topology models can be combined in

a technology-agnostic manner to model the deploy-

ment and provisioning of complex applications us-

ing a single runtime. Thus, no manual integration of

different technologies is required if diverse artifacts

need to be combined. To achieve this, we combine

our method and system architecture with the TOSCA

standard (OASIS, 2013b) that provides a sophisti-

cated means to integrate arbitrary kinds of manage-

ment technologies. To validate the practical feasibil-

ity of our approach, we developed a prototype that

is integrated with the OpenTOSCA Ecosystem (Binz

et al., 2013a), a standards-based implementation of

the TOSCA standard. Moreover, we conducted a case

study based on the conﬁguration management tech-

nology Chef to show how the presented architecture

and concepts can be applied to a technology.

The remainder of this paper is structured as follows. In Section 2, we motivate our work. In Section 3, we introduce our Topologize method, which enables crawling repositories and transforming the contained artifacts into technology-agnostic topology models. In

Section 4, we present the Topologize System Architec-

ture that describes a system capable of automatically

executing this method. Section 5 introduces TOSCA.

To validate the feasibility of our approach, in Section

6, we describe a prototypical implementation of this

system architecture. In Section 7, we describe a case

study in which we apply our method to the conﬁgu-

ration management technology Chef and evaluate the

prototype. Section 8 describes related work, and Section 9 concludes the paper and outlines future work.

2 MOTIVATION

Many deployment automation technologies, e.g., con-

ﬁguration management technologies such as Ansi-

ble (Mohaan and Raithatha, 2014), Chef (Taylor and

Vargo, 2014), or Puppet (Uphill, 2014) come with

huge open-source repositories that contain a plethora

of artifacts usable for deployment. Typically, these ar-

tifacts, e.g., scripts, have to be adapted, deployed, and

executed in correct order to install the desired applica-

tion. The Chef Supermarket is one example that pro-

vides cookbooks for installing different kinds of ap-

plications, e.g., middleware components or database

systems. Other examples are GitHub repositories

containing source code of applications and scripts for

building and deploying the application. Furthermore,

the documentation about artifacts and how to execute

them is typically available in natural language. How-

ever, to ensure achieving the desired deployment and

installation, artifacts and their implications must be

analyzed and understood in detail to avoid undesired

conﬁgurations or – in general – undesired results. Un-

fortunately, correctly interpreting all effects of exe-

cuting such artifacts typically requires deep techni-

cal expertise about the used technology because the

mentioned technologies employ different approaches,

meta models, and serialization formats.

Especially, the heterogeneity and diversity of de-

ployment automation technologies lead to serious in-

tegration challenges if multiple technologies have to

be combined (Breitenbücher et al., 2013). To achieve this, workflow technology is often used for orchestration purposes (Arshad et al., 2007; Bellavista et al., 2013; Breitenbücher et al., 2014; Keller and

Badonnel, 2004; Mietzner et al., 2009). However,

even if an orchestration approach is used for integrat-

ing different technologies, nevertheless, (i) the indi-

vidual artifacts and their effects must be understood

to achieve the desired goals, (ii) the orchestration ﬂow

must be speciﬁed, and (iii) wrappers need to be imple-

mented and conﬁgured. In addition, the used runtimes

must be installed, maintained, and updated, which

typically takes a serious amount of time (Brown and

Hellerstein, 2005). Thus, a normalized model is desir-

able that only describes the desired application and its

deployment without the technical details of the tech-

nologies used to deploy distinct parts of the model.

For many technologies, these artifacts are files that reference other files; thus, the files are linked. However, inspecting all the possible dependencies manually to determine the components that get installed is significantly more error-prone, knowledge-intensive, and time-consuming than taking a brief look at a structured graphical diagram such as a topology.


[Figure 4 diagram. Method steps shown: Specify Resources, Crawl Artifacts, Extract Components, Derive Topology Models, Make Topology Models Deployable. Other labels: META, Local Disk, OpenTOSCA Plugin, Chef Plug-in.]

Figure 4: The prototype we implemented to validate the Topologize method.

6 PROTOTYPE

In Section 4, we introduced an architecture for the

Topologize method, cf. Section 3. In this section,

we describe our prototypical implementation for the

Topologize method using TOSCA, cf. Section 5. The

prototype is implemented using Java 8. Details about

the Chef-speciﬁc implementation follow in Section 7.

For the step Specify Resources, the location def-

inition (1) comprises the artifact type, location, and

access protocol to a repository containing technology-

speciﬁc artifacts and information. This deﬁnition (2)

is provided to the Crawler, a Java program that ac-

cesses the repository using an appropriate plug-in.

The identified artifacts (3) are downloaded to disk, and some meta information, for example, name and version, is stored. This corresponds to the step Crawl Artifacts. We implemented a corresponding plug-in for Chef; details are provided in the next section.
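The interplay of location definition, Crawler, and plug-in described above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the prototype itself is implemented in Java. All names (LocationDefinition, ArtifactMeta, Crawler, fake_chef_plugin) are hypothetical and are not taken from the prototype; the stand-in plug-in returns fixed data instead of accessing a repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LocationDefinition:
    """Hypothetical location definition: artifact type, location, access protocol."""
    artifact_type: str  # e.g. "cookbook"
    location: str       # e.g. a repository URL
    protocol: str       # e.g. "http"


@dataclass(frozen=True)
class ArtifactMeta:
    """Meta information stored for each downloaded artifact."""
    name: str
    version: str


# A plug-in is modeled as a function from a location definition to artifact metadata.
CrawlerPlugin = Callable[[LocationDefinition], List[ArtifactMeta]]


class Crawler:
    """Dispatches a location definition to the plug-in registered for its artifact type."""

    def __init__(self) -> None:
        self._plugins: Dict[str, CrawlerPlugin] = {}

    def register(self, artifact_type: str, plugin: CrawlerPlugin) -> None:
        self._plugins[artifact_type] = plugin

    def crawl(self, definition: LocationDefinition) -> List[ArtifactMeta]:
        plugin = self._plugins.get(definition.artifact_type)
        if plugin is None:
            raise ValueError(f"no plug-in for artifact type {definition.artifact_type!r}")
        return plugin(definition)


def fake_chef_plugin(definition: LocationDefinition) -> List[ArtifactMeta]:
    # Stand-in for a real plug-in: returns fixed data, performs no network access.
    return [ArtifactMeta("java", "1.47.0"), ArtifactMeta("apt", "5.0.0")]
```

Keeping the Crawler unaware of repository details and selecting the plug-in by artifact type is one way to support further technologies without changing the Crawler itself.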

Within the Topologizer, the Component Recog-

nizer and Builder accesses the disk and processes

the technology-speciﬁc artifacts. Each artifact is

unpacked and its contents are analyzed, cf. Section 7. Then, (4) the artifact is transformed into a TOSCA Node Type using TOSCAfy. TOSCAfy is an open-source framework for analyzing artifacts and generating TOSCA Node Types. The resulting CSAR (5) contains the Node Type and is downloaded to disk. The artifact is stored as an Implementation Artifact at the Node Type. Furthermore, meta data resulting from the analysis (5) is stored in a MongoDB database. MongoDB was chosen because of its ability to handle documents that are not mapped to a relational schema, e.g., TOSCA XML and meta data JSON documents. This corresponds to the step Extract Components of the Topologize method.

Java: https://www.java.com/
TOSCAfy: https://github.com/toscafy/
MongoDB: https://www.mongodb.com/

Within the Topologizer, the structure information available in the found artifacts is processed by the Topology Recognizer and Builder. To derive topology models, it tries to satisfy the exposed requirements with the capabilities of components within Node Types that were found by the Component Recognizer and Builder and are located in the Components Repository. Through this exploration, topology models can be built. This is implemented with an adaptation of depth-first search (Tarjan, 1972).
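The exploration of requirements can be sketched as an iterative depth-first traversal. This is a simplified illustration in Python and not the prototype's adapted algorithm. The repository is assumed to be a plain mapping from a component name to the names it requires, and a requirement counts as satisfied if a component of that name exists; both are assumptions made for this sketch, as is the function name derive_topology.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical components repository: component name -> names it requires.
Repository = Dict[str, List[str]]
Edge = Tuple[str, str]


def derive_topology(root: str, repo: Repository) -> Tuple[List[str], List[Edge], List[Edge]]:
    """Return (nodes, dependsOn edges, unresolved requirements) reachable from root."""
    nodes: List[str] = []
    edges: List[Edge] = []
    unresolved: List[Edge] = []
    visited: Set[str] = set()
    stack = [root]
    while stack:
        component = stack.pop()
        if component in visited:
            continue  # a component required several times becomes one node only
        visited.add(component)
        nodes.append(component)
        for requirement in repo.get(component, []):
            if requirement not in repo:
                # No component offers a matching capability.
                unresolved.append((component, requirement))
                continue
            edges.append((component, requirement))
            if requirement not in visited:
                stack.append(requirement)
    return nodes, edges, unresolved
```

Because visited components are skipped, a component that is required by several others appears once in the node list while every dependsOn edge is kept.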

Duplicate requirements stated by different arti-

facts within the same topology model are eliminated

according to (Binz et al., 2013b). This may lead to a

topology model that does not reﬂect exactly the struc-

ture of the original artifact, but is necessary for not

representing duplicate requirements, e.g., Java, mul-

tiple times as a Node Template. Each derived topol-

ogy model is serialized using TOSCA, cf. Section 5,

for example, by using TOSCAfy to deﬁne and gen-

erate a TOSCA Deﬁnitions containing the topology

model, all components and relations, and the artifact

resources. Finally, the CSAR (6) is stored to the local

disk. This corresponds to the step Derive Topology

Models of the Topologize method.

Often, derived topology models are not provision-

able directly because of missing information. There-

fore, the CSARs are not only stored locally on disk,

but also (8) sent to the TOSCA modeling tool Winery, which enables graphical and user-friendly customization of the derived topology models. Thus,

a user is enabled to inject credentials and even cus-

tomize the whole TOSCA Topology Template. More-

over, TOSCA completion algorithms can automati-

cally complete the topology model if components are

missing (Hirmer et al., 2014), for example, to inject a

Cloud provider Node Template which is typically not

described in Chef cookbooks. This corresponds to the

step Make Topology Models Deployable.

Winery: https://projects.eclipse.org/projects/soa.winery


Based on the standardized TOSCA metamodel,

topology models generated out of technology-speciﬁc

artifacts can be combined easily to compose more

complex applications: the crawled topology mod-

els can be used as building blocks in a technology-

agnostic manner for the development of new appli-

cations on the TOSCA-layer. The graphical model-

ing tool Winery (Kopp et al., 2013) can be used, e.g.,

to merge such topology models. Thus, the resulting

and merged topology models may contain Implemen-

tation Artifacts of different technologies, which is in-

herently supported by the TOSCA standard. To de-

ploy such merged topology models that contain Node

Templates and Types having Implementation Artifacts implemented in different technologies, we developed a plan generator (Breitenbücher et al., 2014) within the OpenTOSCA Runtime Environment (Binz et al., 2013a) that is capable of executing different kinds of artifacts. The generator supports technologies such as Chef, Ansible, and Docker. Thus,

with our approach, artifacts of different technologies

can be combined in a technology-agnostic manner on

the TOSCA-layer, while the resulting topology model

can be deployed automatically by a single runtime –

in this prototype using the OpenTOSCA ecosystem.

7 CASE STUDY: CHEF

In Section 6, we introduced a prototype that generically implements the Topologize method. In this section,

we provide a case study showing how to apply and

implement the method for Chef cookbooks.

7.1 Specify the Chef Resources

In the ﬁrst step, we specify the artifact location and

type. Chef Software Inc. itself provides a publicly available repository for Chef cookbooks: the Chef Supermarket. Therefore, we specified the

HTTP API of the Chef Supermarket as location and

the artifact type cookbook. For crawling the Chef Su-

permarket, we implemented a respective plug-in that

processes the location speciﬁcation.

7.2 Crawl for Chef Cookbooks

In the second step, the Crawler searches for artifacts at

the specified location. The plug-in for the Chef Supermarket employs the JAX-RS Client API of Jersey to download the cookbooks and retrieve meta information. Most of the information and artifacts were accessible, whereas some cookbooks could not be processed because they could not be downloaded.

OpenTOSCA: http://www.opentosca.org
Chef Supermarket: https://supermarket.chef.io/
Jersey: https://jersey.java.net/
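How a crawler might page through the cookbook listing of the Chef Supermarket can be sketched as follows. The sketch is written in Python and is not the Jersey-based plug-in of the prototype. It assumes that the listing endpoint accepts start and items query parameters and answers with a JSON object containing start, total, and an items list whose entries carry a cookbook_name field; this response shape is an assumption of the sketch and should be checked against the Supermarket API documentation. The helper names are hypothetical, and no network access is performed.

```python
from typing import Any, Dict, List, Optional

# Assumed listing endpoint of the Chef Supermarket HTTP API.
API = "https://supermarket.chef.io/api/v1/cookbooks"


def page_url(start: int, items: int = 100) -> str:
    """Build the URL of one listing page (the parameter names are assumptions)."""
    return f"{API}?start={start}&items={items}"


def cookbook_names(page: Dict[str, Any]) -> List[str]:
    """Extract the cookbook names from one parsed listing page."""
    return [item["cookbook_name"] for item in page.get("items", [])]


def next_start(page: Dict[str, Any]) -> Optional[int]:
    """Offset of the next page, or None when the listing is exhausted."""
    following = page["start"] + len(page["items"])
    if not page["items"] or following >= page["total"]:
        return None
    return following
```

A crawl loop would fetch page_url(0), collect the names, and continue with next_start(page) until it returns None; cookbooks that fail to download would be skipped and recorded, in line with the observation above.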

7.3 Extract the Components

In the third step, the found cookbooks are analyzed to

identify components, extract characteristics, and

build TOSCA Node Types. A cookbook states its

attributes and templates, includes recipes and ﬁles,

and provides necessary extensions to the chef-client

for enabling it to instantiate the cookbook. There-

fore, such a cookbook correlates to a Node Type. The

detailed mapping of cookbooks to Node Types can

be found in (Wettinger et al., 2014b). For building

TOSCA Node Types, we used TOSCAfy.

Besides the transformation to a Node Type, the requirement information of the cookbook has to be extracted. The chef_version, ohai_version, and depends entries are distinct requirements, but the supported platforms have to be treated differently: a cookbook instance cannot be installed on, e.g., Windows and Linux at once; thus, these requirements have to be treated as mutually exclusive. Chef has no cookbooks for installing, e.g., Linux; thus, the platform requirement cannot be satisfied by other cookbooks. This information is needed later for deriving the topology models with regard to the capabilities and requirements of the cookbooks.
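The different treatment of dependencies and platforms can be illustrated with a short Python sketch. It assumes a parsed metadata.json document with a dependencies object, as in Figure 5, and a platforms object; the platforms key, the class CookbookRequirements, and both function names are assumptions of this sketch rather than parts of the prototype.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class CookbookRequirements:
    # Every entry must be satisfied by another component.
    depends: Dict[str, str] = field(default_factory=dict)
    # Alternatives: a cookbook instance runs on exactly one of these platforms.
    platforms: List[str] = field(default_factory=list)


def extract_requirements(metadata: Dict[str, Any]) -> CookbookRequirements:
    """Read requirement information from a parsed metadata.json document."""
    return CookbookRequirements(
        depends=dict(metadata.get("dependencies", {})),
        platforms=sorted(metadata.get("platforms", {})),
    )


def is_valid_platform_choice(reqs: CookbookRequirements, chosen: Set[str]) -> bool:
    """Platforms are mutually exclusive: exactly one supported platform may be chosen."""
    if len(chosen) != 1:
        return False
    # If the cookbook declares no platforms, any single platform is accepted.
    return not reqs.platforms or next(iter(chosen)) in reqs.platforms
```

Modeling platforms as a list of alternatives, instead of adding them to the conjunctive dependencies, keeps a cookbook that supports both Windows and Linux from demanding both at the same time.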

7.4 Derive Topology Models

In the third step, an extensive repository of TOSCA

Node Types is built up containing many cookbooks

with requirements. Thus, by traversing transitively all

requirements, cf. Section 6, topology models are con-

structed that represent the requirements graph of the

initial cookbook. Using TOSCAfy, the TOSCA Def-

initions containing the constructed topology model

and the cookbooks are packaged to a CSAR.

In Figure 5, a Chef metadata.json ﬁle is depicted

on the left side. This ﬁle deﬁnes the cookbook’s

name, version, some meta information, e.g., its de-

scription and license, and the cookbook’s dependen-

cies that refer to other cookbooks. In this example, the

java cookbook deﬁnes the cookbooks apt, homebrew,

and windows as requirements. Derived from this

information, the java component and its resolved re-

quirements are represented as Node Templates in a

Topology Template, of which a snippet is depicted on

the right. Further resolved requirements of cookbooks

are omitted for the sake of brevity.

CLOSER 2017 - 7th International Conference on Cloud Computing and Services Science


metadata.json:

    {
      "name": "java",
      "description": "Installs Java runtime.",
      "version": "1.47.0",
      "license": "Apache 2.0",
      "dependencies": {
        "apt": ">= 0.0.0",
        "homebrew": ">= 0.0.0",
        "windows": ">= 0.0.0"
      }
    }

TOSCA Topology Template snippet:

    java-1.47.0_NodeTemplate (java-1.47.0_NodeType)
      dependsOn -> windows-2.1.1_NodeTemplate (windows-2.1.1_NodeType)
      dependsOn -> apt-5.0.0_NodeTemplate (apt-5.0.0_NodeType)

Figure 5: A Chef metadata.json file mapped to a TOSCA Topology Template representation.
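The mapping shown in Figure 5 can be sketched in Python as follows. The identifier scheme (name-version_NodeTemplate, name-version_NodeType) and the dependsOn relation follow the figure. The function names and the resolved_versions input, which states the concrete cookbook version that satisfies each dependency, are hypothetical; skipping dependencies without a resolved version is an assumption of this sketch.

```python
from typing import Any, Dict, List, Tuple

Relation = Tuple[str, str, str]  # (source template, relation type, target template)


def node_template_id(name: str, version: str) -> str:
    return f"{name}-{version}_NodeTemplate"


def node_type_id(name: str, version: str) -> str:
    return f"{name}-{version}_NodeType"


def to_topology_snippet(
    metadata: Dict[str, Any], resolved_versions: Dict[str, str]
) -> Tuple[Dict[str, str], List[Relation]]:
    """Map a parsed metadata.json to Node Templates (id -> Node Type) and relations."""
    root = node_template_id(metadata["name"], metadata["version"])
    templates = {root: node_type_id(metadata["name"], metadata["version"])}
    relations: List[Relation] = []
    for dependency in metadata.get("dependencies", {}):
        version = resolved_versions.get(dependency)
        if version is None:
            continue  # requirement not resolved: no Node Template is created
        target = node_template_id(dependency, version)
        templates[target] = node_type_id(dependency, version)
        relations.append((root, "dependsOn", target))
    return templates, relations
```

Called with the metadata of the java cookbook and the resolved versions apt 5.0.0 and windows 2.1.1, the function yields the three Node Templates and two dependsOn relations depicted in the figure.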

7.5 Make the Topology Models Deployable

The constructed topology models are not provisionable because Chef presumes a bootstrapped environment that is not installed by Chef itself. Thus, detailed information about the infrastructure layer is missing. Within the OpenTOSCA ecosystem, the fifth step Make Topology Models Deployable can be applied by using Winery, cf. Section 6. Using the topology modeler of Winery, the generated topology model can be completed by adding customized infrastructure information. Thus, using the Topologize method, cookbooks can be used to provision applications to environments that are not bootstrapped. Also, topology models can be composed into arbitrarily complex topology models without requiring expertise about the composed cookbooks. Additionally, Winery serves as a TOSCA Repository for the provisioning engine OpenTOSCA, which enables automated provisioning.

7.6 Evaluation Results

In this section, we showed how the Topologize method can be applied to Chef. Using our prototype and the Topologize method, we crawled 3,191 Chef cookbook files from the Chef Supermarket on 17 February 2017 and derived 2,325 topology models from the cookbook artifacts. In Figure 6, the size of each topology model, i.e., the number of contained components, is related to the number of topology models derived by the prototype. By a wide margin, most of the derived topology models are singletons: their component either states no requirements, or the stated requirements could not be resolved because they refer to older versions that were not crawled or to cookbooks that could not be processed.

[Figure 6 plot. x-axis: Amount of Components Inside of Each Topology Model (1 to 13); y-axis: Sum of Topology Models. Value labels legible in the plot: 1413, 405, 246, 105.]

Figure 6: Relation of the amount of Node Templates contained in a topology model to the amount of topologies.

[Figure 7 plot. x-axis: Amount of Components Inside of Each Topology Model; y-axis: Duration in Seconds (median).]

    Components   Median duration (s)
    1            7.3
    2            9.5
    3            11.6
    4            14.8
    5            17.9
    6            20.0
    7            22.2
    8            31.3
    9            33.4
    10           27.5
    11           35.8
    12           47.4
    13           69.4

Figure 7: Relation of the amount of Node Templates contained in a topology model to the duration in seconds for analyzing the artifact and constructing the respective topology. The stated durations are the result of 10 measurements.

In Figure 7, we address the time that it takes to analyze a cookbook for which a topology model shall be constructed and, based on that knowledge, to derive and construct the topology model from the initial cookbook and all requirements. These values result from 10 measurements on Ubuntu 16.04 virtual machines with 8 cores at 3.0 GHz and 16 GB RAM.

At first glance, a duration of up to 70 seconds for constructing a topology model seems long. However, as our goal is an efficient approach for transforming artifacts into technology-agnostic topologies, the Topologize method is suitable because the time-consuming construction of a topology model has to be done only once.


8 RELATED WORK

Cloud Computing enables many beneﬁts, e.g., on-

demand provisioning and resource sharing (Leymann,

2009). Whilst some reduce Cloud Computing to

merely packaging functionality into a virtual machine

or container, in fact, the application usually forms a

complex structure that has to provide distinct properties (Leymann et al., 2016). A more detailed view of such architectures can be visualized as a

graph of components and their relations, detailed with

information about their functional capabilities and re-

quirements, policies of non-functional requirements,

interfaces, properties for customization, and deﬁni-

tions about included or referenced resources. Such

a graph is called a topology and is deﬁned in the

industry-driven standard Topology and Orchestration

Speciﬁcation for Cloud Applications (OASIS, 2013a;

OASIS, 2013b) and reﬁned in (OASIS, 2015).

TOSCA deﬁnes a topology model that describes

in detail the structure of an application. A topology

consists of components that are related to each other,

e.g., a web application is hosted on an application

server. The components, relations, and other elements

are typed to foster the reusability. Thus, TOSCA en-

ables manifold beneﬁts: the TOSCA Deﬁnitions may

contain a model of components and structures with

clear semantics that enables modeling complex applications in a visually encoded way (Breitenbücher et al., 2012) using specific tooling. With Winery, the whole complexity of TOSCA is accessible to human modelers in a visual and guided manner (Kopp et al., 2013). By matching unsatisfied requirements of components within the topology with capabilities of other components, topology models can be completed (Hirmer et al., 2014). Thus, TOSCA enables humans to understand and model such complex applications more easily. Additionally, TOSCA enables automating the management of arbitrarily complex applications. For application topologies consisting of components with well-defined provisioning and management functionality, the orchestrated provisioning logic can be generated (Breitenbücher et al., 2014; Eilam et al., 2011). For arbitrarily complex and customized applications, prepared workflows, for example, can be provided directly with the Service Template.

In contrast, Enterprise Topology Graphs (ETGs) define a formal model describing the structure of running enterprise IT to support tasks such as consolidation, migration, or outsourcing by enabling proven graph algorithms on the model (Binz et al., 2012a; Binz et al., 2012b). Such tasks are time-consuming and error-prone if the underlying ETG has to be created manually. Therefore, such ETGs can be generated by an automated discovery that crawls the running enterprise IT (Binz et al., 2013b). However, ETGs do not enable modeling and inferring automated provisioning and management of application instances.

The OpenTOSCA ecosystem enables interpreting TOSCA models and automated provisioning of the modeled applications (Binz et al., 2013a). Such provisioning can be divided into two opposing approaches: with imperative provisioning, all steps necessary for provisioning the application are modeled in full detail, e.g., by using workflow models (Breitenbücher et al., 2013; Keller and Badonnel, 2004; Mietzner et al., 2009). With declarative provisioning, only the application and its characteristics are modeled. Both approaches require a runtime that is able to interpret the models, infer management functionality, and execute it. There are various approaches for automating the provisioning of applications: Ansible, Chef, Planit, and Puppet, to name just a few (Arshad et al., 2007; Mohaan and Raithatha, 2014; Taylor and Vargo, 2014; Uphill, 2014). However, all these approaches are domain- and technology-specific; thus, the user needs specific expertise, and there is a lack of tooling that integrates the heterogeneity of all these technologies.

Although workflow-model-based orchestration is a working approach to integrate diverse provisioning technologies, the technologies have to be prepared for being orchestrated beforehand (Wettinger et al., 2014a; Wettinger et al., 2014b). All artifacts have to be provisionable and manageable and, thus, have to expose their interfaces. Despite the variety of interfaces, with the Any2API approach the functionality of artifacts can be wrapped with high-level APIs, e.g., RESTful web services (Wettinger et al., 2015). Thus, generating distinct models of all components makes it possible to populate topology models, as shown in this paper.

Subsequent to the Any2API approach, TOSCAfy is a publicly available, open-source

framework that provides two major capabilities: (i)

retrieving and analyzing existing technology-speciﬁc

artifacts, e.g., Chef cookbooks and Docker container

images to extract and normalize their metadata; (ii)

generating portable Cloud Service Archives (CSARs)

comprising the artifacts. By using TOSCAfy, CSARs

are no longer maintained manually as source arti-

facts, but they are generated in a repeatable manner.

TOSCAfy is implemented using JavaScript based on

Node.js. Moreover, it is integrated with Any2API.

Any2API: http://www.any2api.org
TOSCAfy: https://github.com/toscafy

However, before the aforementioned approaches can be applied, artifacts have to be obtained. The basic idea of conventional Web crawling follows a straightforward process: “(1) select a URL to crawl, (2) fetch and parse page, (3) save the important content, (4) extract URLs from page, (5) add URLs to

queue, and (6) repeat” (Matsudaira, 2014). Follow-

ing this, a broad variety of crawling approaches ex-

ist in research and industry. Consequently, imple-

menting a small-scale crawler, e.g., to fetch a dis-

tinct set of documents from the Web and store them

‘as is’ is a mere programming challenge. How-

ever, it is not trivial to implement large-scale crawlers

that repeatedly fetch large sets of documents to se-

mantically inspect and normalize their content, de-

tect updates, and classify them. Therefore, several

research efforts focus on the design of highly scal-

able and distributed crawlers to improve performance

in large-scale crawling scenarios (Boldi et al., 2004;

Da Silva et al., 1999; Shkapenyuk and Suel, 2002;

Heydon and Najork, 1999; Thelwall, 2001; Edwards

et al., 2001). Thus, there are two major categories

of crawlers: (i) general-purpose crawlers that fetch

and inspect any kind of document, e.g., to popu-

late a search engine or analyze data using mining

techniques (Matsudaira, 2014; Thelwall, 2001); (ii)

specialized and focused crawlers that only inspect

distinct documents (Chakrabarti et al., 1999). Focused crawling is utilized, e.g., to establish a domain-specific knowledge base, which is the purpose of this

paper. Therefore, a domain-speciﬁc and specialized

crawling framework is presented. However, up to

now, none of the existing works analyzes application

structure information inside of crawled artifacts of

conﬁguration management technologies. Therefore,

the proposed approach is a novel contribution.

9 CONCLUSION

We presented Topologize, which enables (i) crawling technology-specific artifacts, (ii) extracting and abstracting contained component information, and (iii) inferring technology-agnostic topology models that (iv) are provisionable in an automated manner. These models are serialized in TOSCA, which enables the modeling and provisioning of complex, technology-specific applications whilst keeping a modular, customizable, and technology-agnostic topology model. The topologies are generated by satisfying dependencies of contained components. Benefits of topologies are (a) a technology-agnostic representation of the technology-specific implications by showing a component’s transitive requirements, (b) automated maintenance of such topologies, (c) support for a user in understanding and selecting artifacts and topology models, and (d) customization and combination of topologies in a technology-agnostic manner.

We validated our Topologize method and architecture by implementing a prototype and applying Topologize to Chef. On 17 February 2017, we crawled 3,191 Chef cookbooks at the Chef Supermarket and transformed these technology-specific artifacts to technology-agnostic components serialized in TOSCA. We derived 2,325 topologies that comprise the transitive dependencies as far as they could be satisfied. The constructed topologies contain up to 13 components each. We evaluated our prototype regarding the time needed to analyze and construct topologies and to package them as CSARs, which took between 7.3 seconds and 69.4 seconds depending on the size of the topology.

In this paper, we focused our case study on Chef, which is a well-known configuration management system. In the future, we plan to conduct case studies with other technologies, e.g., Docker. Also, to speed up the analysis and the inference of topology models, we plan to improve the prototype, e.g., by parallelizing the execution of the depth-first search.

ACKNOWLEDGMENTS

This work was partially funded by the BMWi project SePiA.Pro (01MD16013F).

REFERENCES

Arshad, N., Heimbigner, D., and Wolf, A. L. (2007). Deployment and Dynamic Reconfiguration Planning For Distributed Software Systems. Software Quality Journal, 15(3).

Bellavista, P., Corradi, A., Foschini, L., and Pernafini, A. (2013). Towards an Automated BPEL-based SaaS Provisioning Support for OpenStack IaaS. Scalable Computing, 14(4).

Binz, T., Breitenbücher, U., Haupt, F., Kopp, O., Leymann, F., Nowak, A., and Wagner, S. (2013a). OpenTOSCA - A Runtime for TOSCA-based Cloud Applications. In Proceedings of the 11th International Conference on Service-Oriented Computing. Springer.

Binz, T., Breitenbücher, U., Kopp, O., and Leymann, F. (2013b). Automated Discovery and Maintenance of Enterprise Topology Graphs. In Proceedings of the 6th IEEE International Conference on Service Oriented Computing & Applications. IEEE.

Binz, T., Fehling, C., Leymann, F., Nowak, A., and Schumm, D. (2012a). Formalizing the Cloud through Enterprise Topology Graphs. In Proceedings of 2012 IEEE International Conference on Cloud Computing. IEEE.

Binz, T., Leymann, F., Nowak, A., and Schumm, D. (2012b). Improving the Manageability of Enterprise Topologies Through Segmentation, Graph Transformation, and Analysis Strategies. In Proceedings of 2012 Enterprise Distributed Object Computing Conference. IEEE.

Boldi, P., Codenotti, B., Santini, M., and Vigna, S. (2004). Ubicrawler: A Scalable Fully Distributed Web Crawler. Software: Practice and Experience, 34(8).

Breitenbücher, U., Binz, T., Képes, K., Kopp, O., Leymann, F., and Wettinger, J. (2014). Combining Declarative and Imperative Cloud Application Provisioning based on TOSCA. In International Conference on Cloud Engineering. IEEE.

Breitenbücher, U., Binz, T., Kopp, O., Leymann, F., and Schumm, D. (2012). Vino4TOSCA: A Visual Notation for Application Topologies based on TOSCA. In On the Move to Meaningful Internet Systems: OTM 2012. Springer.

Breitenbücher, U., Binz, T., Kopp, O., Leymann, F., and Wettinger, J. (2013). Integrated Cloud Application Provisioning: Interconnecting Service-Centric and Script-Centric Management Technologies. In On the Move to Meaningful Internet Systems: OTM 2013 Conferences. Springer.

Brown, A. B. and Hellerstein, J. L. (2005). Reducing the cost of IT operations: is automation always the answer? In Proceedings of the 10th Conference on Hot Topics in Operating Systems. USENIX.

Chakrabarti, S., Van den Berg, M., and Dom, B. (1999). Focused crawling: A new approach to topic-specific web resource discovery. Computer Networks, 31(11).

Da Silva, A. S., Veloso, E. A., Golgher, P. B., Ribeiro-Neto, B., Laender, A. H., and Ziviani, N. (1999). Cobweb – a crawler for the Brazilian web. In Proceedings of the String Processing and Information Retrieval Symposium and International Workshop on Groupware. IEEE.

Edwards, J., McCurley, K., and Tomlin, J. (2001). An adaptive model for optimizing performance of an incremental web crawler. In Proceedings of the 10th International Conference on World Wide Web. ACM.

Eilam, T., Elder, M., Konstantinou, A., and Snible, E. (2011). Pattern-based Composite Application Deployment. In Proceedings of the 12th IFIP/IEEE International Symposium on Integrated Network Management. IEEE.

Heydon, A. and Najork, M. (1999). Mercator: A Scalable, Extensible Web Crawler. World Wide Web, 2(4).

Hirmer, P., Breitenbücher, U., Binz, T., and Leymann, F. (2014). Automatic Topology Completion of TOSCA-based Cloud Applications. In Proceedings des CloudCycle14 Workshops auf der 44. Jahrestagung der Gesellschaft für Informatik e.V.

Keller, A. and Badonnel, R. (2004). Automating the Provisioning of Application Services with the BPEL4WS Workflow Language. In Proceedings of the 15th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management. Springer.

Kopp, O., Binz, T., Breitenbücher, U., and Leymann, F. (2013). Winery – A Modeling Tool for TOSCA-based Cloud Applications. In Proceedings of the 11th International Conference on Service-Oriented Computing. Springer.

Leymann, F. (2009). Cloud Computing: The Next Revolution in IT. In Proceedings of the 52nd Photogrammetric Week. Wichmann Verlag.

Leymann, F., Fehling, C., Wagner, S., and Wettinger, J. (2016). Native Cloud Applications: Why Virtual Machines, Images and Containers Miss the Point! In Proceedings of the 6th International Conference on Cloud Computing and Service Science. SciTePress.

Matsudaira, K. (2014). Capturing and structuring data mined from the web. Communications of the ACM, 57(3).

Mietzner, R., Unger, T., and Leymann, F. (2009). Cafe: A Generic Configurable Customizable Composite Cloud Application Framework. In On the Move to Meaningful Internet Systems: OTM 2009. Springer.

Mohaan, M. and Raithatha, R. (2014). Learning Ansible. Packt Publishing.

OASIS (2013a). Topology and Orchestration Specification for Cloud Applications (TOSCA) Primer Version 1.0. Organization for the Advancement of Structured Information Standards (OASIS).

OASIS (2013b). Topology and Orchestration Specification for Cloud Applications (TOSCA) Version 1.0. Organization for the Advancement of Structured Information Standards (OASIS).

OASIS (2015). TOSCA Simple Profile in YAML Version 1.0. Organization for the Advancement of Structured Information Standards (OASIS).

Shkapenyuk, V. and Suel, T. (2002). Design and Implementation of a High-Performance Distributed Web Crawler. In Proceedings of the 18th International Conference on Data Engineering. IEEE.

Tarjan, R. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2).

Taylor, M. and Vargo, S. (2014). Learning Chef: A Guide to Configuration Management and Automation. O'Reilly.

Thelwall, M. (2001). A Web Crawler Design for Data Mining. Journal of Information Science, 27(5).

Uphill, T. (2014). Mastering Puppet. Packt Publishing.

Wettinger, J., Binz, T., Breitenbücher, U., Kopp, O., Leymann, F., and Zimmermann, M. (2014a). Unified Invocation of Scripts and Services for Provisioning, Deployment, and Management of Cloud Applications Based on TOSCA. In Proceedings of the 4th International Conference on Cloud Computing and Services Science. SciTePress.

Wettinger, J., Breitenbücher, U., and Leymann, F. (2014b). Standards-based DevOps Automation and Integration Using TOSCA. In Proceedings of the 7th International Conference on Utility and Cloud Computing. IEEE.

Wettinger, J., Breitenbücher, U., and Leymann, F. (2015). Any2API - Automated APIfication. In Proceedings of the 5th International Conference on Cloud Computing and Service Science. SciTePress.

CLOSER 2017 - 7th International Conference on Cloud Computing and Services Science
