data flow WhePeT {
  source TwitterStream
  task TwitterListener, FilterGeoTagged, ExtractCoord, HeatMap
  flow TwitterStream => TwitterListener
  flow TwitterListener => FilterGeoTagged
  flow FilterGeoTagged => ExtractCoord
  flow ExtractCoord => HeatMap
  group Storm : (platform="Storm")(initsize="1")
    {TwitterListener FilterGeoTagged ExtractCoord}
}
Figure 3: Initial DAFLOW model of WHEPET.
Figure 3 is an excerpt of the DAFLOW model that captures the initial design of WHEPET (see Section 2). In this round, we play the role of data engineers and first define the sources and tasks for analysing Twitter data, as well as the flows between them, using the model elements marked with the corresponding keywords (source, task and flow). The textual model corresponds to an earlier version of the graphical data flow diagram shown in the top part of Figure 2. After defining the data flow, we record our early technical decisions as annotations; e.g., the group of three tasks that handle tweets (TwitterListener, FilterGeoTagged and ExtractCoord) will be hosted by the Storm platform.
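To make the structure of such a model concrete, the following Python sketch mirrors the DAFLOW elements of Figure 3 as plain data: sources, tasks, flows and annotated groups. The class `DataFlow` and its methods are hypothetical illustrations, not part of the DAFLOW tooling. They show two checks a data engineer might want on an early model: that every name used in a flow or group has actually been declared, and that the flows admit an execution order (i.e., form a DAG).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class DataFlow:
    """Hypothetical in-memory mirror of a DAFLOW model (not the DAFLOW tooling)."""
    name: str
    sources: set = field(default_factory=set)
    tasks: set = field(default_factory=set)
    flows: list = field(default_factory=list)    # (producer, consumer) pairs
    groups: dict = field(default_factory=dict)   # group -> (annotations, member tasks)

    def undeclared(self):
        """Names used in flows or groups but never declared as a source or task."""
        used = {name for edge in self.flows for name in edge}
        for _annotations, members in self.groups.values():
            used |= set(members)
        return used - (self.sources | self.tasks)

    def execution_order(self):
        """One topological order of the flow graph; raises graphlib.CycleError on cycles."""
        predecessors = {name: set() for name in self.sources | self.tasks}
        for producer, consumer in self.flows:
            predecessors.setdefault(consumer, set()).add(producer)
        return list(TopologicalSorter(predecessors).static_order())


whepet = DataFlow(
    name="WhePeT",
    sources={"TwitterStream"},
    tasks={"TwitterListener", "FilterGeoTagged", "ExtractCoord", "HeatMap"},
    flows=[
        ("TwitterStream", "TwitterListener"),
        ("TwitterListener", "FilterGeoTagged"),
        ("FilterGeoTagged", "ExtractCoord"),
        ("ExtractCoord", "HeatMap"),
    ],
    groups={"Storm": ({"platform": "Storm", "initsize": "1"},
                      {"TwitterListener", "FilterGeoTagged", "ExtractCoord"})},
)

print(whepet.undeclared())       # set(): every referenced name is declared
print(whepet.execution_order())  # the linear chain from TwitterStream to HeatMap
```

A flow that mentions a task never listed under `task` (for instance a misspelled task name) would show up in `undeclared()`, which is the kind of inconsistency that is easy to introduce when editing the textual model by hand.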
High-level refinements and evolutions at the data flow level are also performed on the DAFLOW model, such as replacing the mismatched flow from ExtractCoord to HeatMap with two tasks related to the message queue and the WebSocket wrapper, and adding new tasks that convert coordinates to countries and count the occurrences of each country. After these iterations, the data flow evolves into the final version illustrated at the top of Figure 2. We omit the concrete textual model.
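The first kind of refinement above, replacing one flow by a chain through new tasks, is a simple graph rewrite. The sketch below is a hypothetical helper, not a DAFLOW operation. It assumes, for illustration, that the mismatched flow is ExtractCoord => HeatMap, and the names `MessageQueue` and `WebSocketWrapper` are placeholders, since the concrete refined model is not given.

```python
def split_flow(flows, producer, consumer, intermediates):
    """Replace every `producer => consumer` flow with a chain through `intermediates`.

    Other flows keep their original order; raises ValueError if the flow is absent.
    """
    if (producer, consumer) not in flows:
        raise ValueError(f"no flow {producer} => {consumer}")
    chain = [producer, *intermediates, consumer]
    replacement = list(zip(chain, chain[1:]))   # consecutive pairs along the chain
    refined = []
    for edge in flows:
        refined.extend(replacement if edge == (producer, consumer) else [edge])
    return refined


flows = [
    ("TwitterStream", "TwitterListener"),
    ("TwitterListener", "FilterGeoTagged"),
    ("FilterGeoTagged", "ExtractCoord"),
    ("ExtractCoord", "HeatMap"),
]
# Hypothetical task names for the message queue and the WebSocket wrapper
refined = split_flow(flows, "ExtractCoord", "HeatMap", ["MessageQueue", "WebSocketWrapper"])
for producer, consumer in refined:
    print(f"flow {producer} => {consumer}")
```

Adding the country-related tasks would be a similar edit (new `task` entries plus new flows), but where they attach in the final model is not stated here, so the sketch stops at the split.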
3.3 Platform-specific Modelling
The data flow model is then transformed into a deployment model, shown in the middle part of Figure 2. Data engineers can tune the deployment model with respect to platform-specific parameters and configurations, as well as the infrastructure that hosts the platforms.
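As a rough illustration of this step, the sketch below turns the annotated groups of a data flow into composite deployment components. The mapping rule (one composite component per group, hosted by the platform named in its `platform` annotation, with the remaining annotations kept as tunable attributes; every ungrouped task becomes a standalone component with no host yet) is an assumption made for the example, not the transformation implemented by the authors. `DeployComponent` and `to_deployment` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeployComponent:
    """Hypothetical deployment-model node: plain component or composite (has children)."""
    name: str
    hosted_by: Optional[str] = None                 # hosting platform, if already decided
    children: list = field(default_factory=list)    # non-empty => composite component
    attributes: dict = field(default_factory=dict)  # platform-specific, tunable parameters


def to_deployment(tasks, groups):
    """Assumed mapping rule, for illustration only:

    * each annotated group -> one composite component hosted by its `platform`,
      with the remaining annotations kept as tunable attributes;
    * each task outside any group -> a standalone component with no host yet.
    """
    components = {}
    grouped = set()
    for group_name, (annotations, members) in groups.items():
        components[group_name] = DeployComponent(
            name=group_name,
            hosted_by=annotations.get("platform"),
            children=sorted(members),
            attributes={k: v for k, v in annotations.items() if k != "platform"},
        )
        grouped |= set(members)
    for task in sorted(set(tasks) - grouped):
        components[task] = DeployComponent(name=task)
    return components


deployment = to_deployment(
    tasks=["TwitterListener", "FilterGeoTagged", "ExtractCoord", "HeatMap"],
    groups={"Storm": ({"platform": "Storm", "initsize": "1"},
                      {"TwitterListener", "FilterGeoTagged", "ExtractCoord"})},
)
# A data engineer would now tune this model, e.g. by deciding where HeatMap runs
for component in deployment.values():
    print(component)
```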
Figure 2 illustrates the main concepts of the DADEPLOY model. The core concept is the component. A component can be a running service operated by a third party (such as the Twitter Streaming API or an AWS EC2 virtual machine), or a software artefact hosted by a service (such as a Kafka message broker). Such hosting relationships are represented by dashed arrows. A component may also expose provided or required ports. A pair of matching ports can be connected by a dependency relationship, meaning that the component with the required port “knows” how to access the component with the provided port, so that the former can invoke the latter to pull or push data.
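The component, hosting and port concepts can be sketched as plain Python data classes. All names here are hypothetical, and so is the matching rule: the text does not say when two ports match, so the sketch assumes that a required and a provided port match when they declare the same port type.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    """A DADEPLOY-style component, sketched with hypothetical field names."""
    name: str
    host: Optional["Component"] = None              # hosting relationship (dashed arrow)
    provided: dict = field(default_factory=dict)    # provided port name -> port type
    required: dict = field(default_factory=dict)    # required port name -> port type


@dataclass(frozen=True)
class Dependency:
    """The owner of the required port knows how to reach the owner of the provided port."""
    client: str
    required_port: str
    server: str
    provided_port: str


def connect(client, required_port, server, provided_port):
    """Connect a required port to a provided port if the pair matches.

    Assumption: two ports "match" when they declare the same port type.
    """
    wanted = client.required.get(required_port)
    offered = server.provided.get(provided_port)
    if wanted is None or offered is None:
        raise KeyError(f"unknown port {client.name}.{required_port} or {server.name}.{provided_port}")
    if wanted != offered:
        raise ValueError(f"port types differ: {wanted!r} vs {offered!r}")
    return Dependency(client.name, required_port, server.name, provided_port)


# Hypothetical components loosely following the Kafka / WebSocket part of the example
vm = Component("ec2-vm")
kafka = Component("kafka", host=vm, provided={"broker": "kafka"})
ws = Component("ws-wrapper", host=vm, required={"source": "kafka"},
               provided={"ws": "websocket"})

dep = connect(ws, "source", kafka, "broker")
print(dep)
```

Keeping the dependency as a separate value, rather than a direct object reference, mirrors the idea that the link is resolved later (e.g. at deployment time), once it is known where the server component actually runs.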
Finally, a composite component contains other components. The model depicted in Figure 2 includes one composite component representing a Storm topology, which consumes data from the Twitter Streaming API and is hosted by a Storm platform. The platform itself is in turn composed of four Storm nodes. Note that in this case the hosting relationship is between two composite components, which means that developers do not need to care about how the components within a Storm topology are distributed across the Storm nodes: this is handled automatically by the Storm platform. The last component inside the Storm topology publishes the extracted coordinates to Kafka. At the same time, the WebSocket wrapper subscribes to the same topic and sends the wrapped coordinates to the heatmap via WebSocket messages.
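The Kafka-based decoupling described above follows a publish/subscribe pattern: the Storm topology and the WebSocket wrapper agree only on a topic. The sketch below uses a tiny in-memory broker as a stand-in for Kafka. It is not the Kafka client API and, unlike Kafka, it delivers messages synchronously; the topic name, the JSON frame format and the sample coordinates are made up for illustration.

```python
import json
from collections import defaultdict


class InMemoryBroker:
    """Toy, synchronous stand-in for a Kafka-like broker (NOT the Kafka client API).

    Publishers and subscribers only share a topic name; neither references the other.
    """

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Kafka would persist the message and deliver it asynchronously;
        # here every subscriber is simply called in turn.
        for callback in self._subscribers[topic]:
            callback(message)


broker = InMemoryBroker()
frames_to_heatmap = []   # stands in for WebSocket messages sent to the browser


def websocket_wrapper(coordinate):
    """Hypothetical wrapping: one JSON text frame per (lat, lon) coordinate."""
    lat, lon = coordinate
    frames_to_heatmap.append(json.dumps({"lat": lat, "lon": lon}))


broker.subscribe("coordinates", websocket_wrapper)   # hypothetical topic name

# The last task of the Storm topology publishes each extracted coordinate
for coordinate in [(59.91, 10.75), (48.86, 2.35)]:
    broker.publish("coordinates", coordinate)

print(frames_to_heatmap)
```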
DADEPLOY provides a formal concrete syntax in a textual format. Figure 4 shows a sample model that defines two of the components depicted in Figure 2: the heatmap and the WebSocket wrapper. The example involves a key concept in DADEPLOY, the prototype, borrowed from the prototype-based object model of JavaScript, which also facilitates reusability (R3) and abstraction (R1). A component can be derived from another component, which serves as its prototype. The new component inherits all the features (i.e., attributes and ports) of its prototype, as well as the values already bound to these features. Inside the definition of the new component, we can add new features or override the values of features defined by the prototype. For example, in Line 1 of Figure 4, we first define a component implementing an HTTP server that hosts a single, simple HTML file. This component, one-page-httpd, inherits from dockercomp, a predefined component for any Docker image. Inside one-page-httpd, we set the actual image (the official Python image) and the command associated with this image, which downloads an HTML file and starts Python's built-in HTTP server to host it.
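Prototype-based lookup of this kind maps naturally onto Python's `collections.ChainMap`: local features shadow inherited ones, and unresolved lookups fall through to the prototype. The sketch below is an analogy, not DADEPLOY's implementation; the class `ProtoComponent`, the feature names (`image`, `command`, `port`, `page`) and the placeholder command are assumptions loosely based on the description of Figure 4.

```python
from collections import ChainMap


class ProtoComponent:
    """Sketch of JavaScript-style prototype inheritance for component definitions.

    Features that are not bound locally are looked up along the prototype chain;
    binding a feature locally adds it or overrides the inherited value.
    """

    def __init__(self, name, prototype=None, **features):
        self.name = name
        self.prototype = prototype
        local = dict(features)
        # new_child() puts the local features in front of the prototype's chain,
        # so writes land in `local` and never modify the prototype.
        self.features = (ChainMap(local) if prototype is None
                         else prototype.features.new_child(local))

    def __getitem__(self, feature):
        return self.features[feature]


# Hypothetical feature names and values, loosely mirroring the example in Figure 4
dockercomp = ProtoComponent("dockercomp", image=None, command=None)
one_page_httpd = ProtoComponent(
    "one-page-httpd", prototype=dockercomp,
    image="python",                                              # official Python image
    command="<download the page> && python -m http.server <port>",  # placeholder
    port=None, page=None,                                        # bound by derived components
)
heatmap = ProtoComponent("heatmap", prototype=one_page_httpd,
                         port=80, page="<URL of the heatmap page>")

print(heatmap["image"], heatmap["port"])   # python 80
```

Because `new_child()` shares the prototype's mappings rather than copying them, a later change to a prototype becomes visible in all derived components, as in JavaScript; the text does not say whether DADEPLOY behaves the same way.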
Finally, the configuration part defines the component assembly of the application. It contains a heatmap component derived from one-page-httpd, with port 80 and a concrete page, and another component for the WebSocket wrapper. The two components are connected by a link between the required port of the heatmap and the provided port of the WebSocket wrapper. The components are connected automatically during deployment: according to the link in Line 19, the tool checks where ws is deployed in order to set the address and port values inside the required port heatmap.wsport. These values are assigned to an environment variable ws inside the Docker container (Line 14), for the HTTP page to access the
MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development