MASTERING INTELLIGENT CLOUDS
Engineering Intelligent Data Processing Services in the Cloud
Sergiy Nikitin, Vagan Terziyan and Michal Nagy
Industrial Ontologies Group, University of Jyväskylä, Mattilanniemi 1, Jyväskylä, Finland
Keywords: Agent Technology, Cloud Computing, Semantic Web, Cloud Services, Ubiware.
Abstract: The current Cloud Computing stack mainly targets three architectural layers: Infrastructure, Platform and
Software. These can be considered as services for the respective layers above: the infrastructure layer is
provided as a service for the platform layer, and the platform layer is, in turn, a service for the Software
layer. Agent platforms fit the “Platform as a service” layer within this stack. At the same time, innovative
agent-oriented approaches to programming open new possibilities for software design in the cloud. We
introduce the main characteristics of our pilot agent platform, UBIWARE, and propose a flexible servicing
architecture within the cloud platform, where various components and systems can configure, run and reuse
intelligent cloud services to provide a higher degree of flexibility and interoperability for their applications.
1 INTRODUCTION
The rapid development of network technologies has
recently revived business models based on
the “thin client” architecture. Powerful data centers
connected to the internet via broadband networks
can reduce the IT infrastructure of any company to a
set of simple terminals with modest system
requirements. All the software and data can reside
on the data center side, making user access easy and
location-independent. Providers offer different
payment schemes, such as pay-per-use or subscription-based
plans, which can be advantageous compared with
standard IT-infrastructure expenses. The approach
has acquired a set of new features and a new brand
name: “Cloud Computing” (Hayes, 2008; Foster et al.,
2008).
Cloud management platforms provide APIs for
management, either through the command line or via
remote method calls. These APIs, however, are used
mainly by system administrators, who take care of the
proper functioning of services within the cloud.
Management of the cloud platform is regarded as
something that a system administrator should
arrange and do. Administrators usually use batch
files for managing routine tasks and resolving
exceptional situations.
At the same time, more and more software
architectural paradigms call for new approaches to
software design and development, in which software
components acquire a certain degree of self-awareness,
i.e. a component can sense its own state and act
on changes in that state. The vision of Autonomic
Computing (Kephart and Chess, 2003) proposes to handle the
complexity of information systems by introducing
self-manageable components able to “run
themselves.” The authors argue that self-aware
components would decrease the overall complexity
of large systems. The development of such systems may
become a “nightmare of ubiquitous computing” due
to the drastic growth of data volumes in information
systems as well as the heterogeneity of ubiquitous
components, standards, data formats, etc. The Cloud
Computing and Autonomic Computing paradigms
will become complementary parts of global-scale
information systems in the near future. Such a
fusion places the highest demands on software
architects, because cloud platforms will have to
provide self-management infrastructure for a variety
of complex systems residing in the same cloud,
separated virtually but running physically on the same
hardware. At the same time, the cloud platform itself
may possess the features of a self-aware complex system.
A variety of self-aware components of different
natures (e.g. end-user-oriented, infrastructure-oriented)
will need a common mechanism for
interoperability, since they may provide services
to each other.
Nikitin S., Terziyan V. and Nagy M. (2010).
MASTERING INTELLIGENT CLOUDS - Engineering Intelligent Data Processing Services in the Cloud.
In Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2010), pages 174-181
DOI: 10.5220/0002950401740181
Copyright © SciTePress
The vision of GUN – Global Understanding
Environment (Terziyan, 2003, 2005; Kaykova et al.,
2005) has introduced a concept of “Smart Resource”
and a notion of an environment where all resources
can communicate and interact regardless of their
nature. In GUN various resources can be linked to
the Semantic Web-based environment via adapters
(or interfaces), which include (if necessary) sensors
with digital output, data structuring (e.g. XML) and
semantic adapter components (e.g. XML to
Semantic Web). Software agents are assigned to
each resource and are assumed to be able to monitor
data coming from the adapter about the state of the
resource, make decisions on behalf of the resource,
and discover, request and utilize external help if
needed. Agent technologies within GUN allow
mobility of service components between various
platforms, decentralized service discovery,
utilization of FIPA communication protocols, and
multi-agent integration/composition of services.
Under the GUN vision, each traditional
system component becomes an agent-driven “smart
resource”, i.e. proactive and self-managing. This
principle can also be applied recursively. For example, an interface of a
system component can become a smart resource
itself, i.e. it can have its own responsible agent,
semantically adapted sensors and actuators, history,
commitments with other resources, and self-
monitoring, self-diagnostics and self-maintenance
activities.
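The recursive smart-resource idea can be illustrated with a minimal Python sketch. It assumes a single process and plain callables as adapters; the class `SmartResource` and its methods `sense` and `step` are hypothetical names, not part of GUN or UBIWARE. Each resource keeps a history of its states, runs one monitor-decide-act cycle per `step`, asks its helper resources for help when it detects a fault, and may have an interface that is itself a smart resource.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SmartResource:
    """Hypothetical GUN-style resource with its own responsible agent logic."""
    name: str
    adapter: Callable[[], dict]                  # returns the current (adapted) state
    interface: Optional["SmartResource"] = None  # the interface may itself be a smart resource
    history: List[dict] = field(default_factory=list)
    helpers: List["SmartResource"] = field(default_factory=list)

    def sense(self) -> dict:
        """Read the state through the adapter and keep it in the history."""
        state = self.adapter()
        self.history.append(state)
        return state

    def step(self) -> List[str]:
        """One monitor-decide-act cycle; returns a log of the actions taken."""
        actions: List[str] = []
        if self.interface is not None:           # recursion: the interface acts first
            actions.extend(self.interface.step())
        if self.sense().get("status") == "fault":
            # Decide on behalf of the resource: request external help.
            actions.extend(f"{self.name}: request help from {h.name}" for h in self.helpers)
        else:
            actions.append(f"{self.name}: ok")
        return actions


port = SmartResource("port", lambda: {"status": "ok"})
pump = SmartResource("pump", lambda: {"status": "fault"}, interface=port)
pump.helpers.append(SmartResource("maintenance", lambda: {"status": "ok"}))
pump_actions = pump.step()
print(pump_actions)  # ['port: ok', 'pump: request help from maintenance']
```

Here the pump's interface (the port) runs its own cycle before the pump does, which mirrors the recursive case described above.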
In this paper we introduce a flexible servicing
architecture within the cloud platform, where
various components and systems can configure, run
and reuse intelligent cloud services to provide a higher
degree of flexibility and interoperability for their
applications. We use UBIWARE, our pilot agent platform
developed in accordance with the GUN vision,
to show how cloud computing can
expand platform functionality and, at the same time,
how an agent platform can become a high-level
service provisioning instrument in the cloud.
The paper is organized as follows. In the next
section we discuss architectures of state-of-the-art
cloud computing platforms and explore the
possibilities for interoperability mechanisms.
Section 3 presents the architecture of the semantic
middleware agent platform and explores possible
options for connectivity with the cloud. Section 4
describes the scenarios and the architecture of the
agent-driven intelligent servicing platform for a
cloud. In Section 5 we discuss related work, and we
conclude in Section 6.
2 STATE OF THE ART IN CLOUD INTELLIGENCE PLATFORMS
The architecture of the current Cloud Computing stack
mainly targets three layers: Infrastructure, Platform
and Software. These layers can be considered as
services for the respective layers above.
Infrastructure as a Service (IaaS) is provided to the
platform layer, and the platform in turn becomes a service
(PaaS) for the software layer. Finally, the
Software as a Service (SaaS) layer brings the topmost
end-user web services to clients (see Figure 1).
Figure 1: Cloud computing stack.
The cloud provider market is already
competitive. Several big players are currently
active in it, e.g. SalesForce.com (SFDC),
NetSuite, Oracle, IBM, Microsoft, Amazon EC2 and
Google Apps. For a comprehensive survey of
cloud computing systems see (Rimal et al., 2009).
This work focuses on the services of the platform layer.
In the next section we present a
middleware platform and then introduce a new
servicing approach in the cloud stack.
3 UBIWARE PLATFORM
UBIWARE has two main elements: an agent engine,
and S-APL – a Semantic Agent Programming
Language (Katasonov and Terziyan, 2008) for
programming of software agents within the platform.
The architecture of a UBIWARE agent (Figure 2)
consists of a Live behavior engine implemented in
Java, a declarative middle layer, and a set of Java
components – Reusable Atomic Behaviors (RABs).
Figure 2: UBIWARE Agent.
RABs can be considered as sensors and
actuators, i.e. components sensing or affecting the
agent’s environment, but they are not restricted to these.
A RAB can also be a reasoner (data processor) if
some of the required logic is inefficient or impossible
to realize by S-APL means, or if one wants to
enable an agent to perform some other kind of reasoning
beyond the rule-based one. The UBIWARE agent
architecture implies that a particular UBIWARE-based
software application will consist of a set of S-APL
documents (data and behavior models) and a
set of specific atomic behaviors needed for this
particular application. Since reusability is an
important UBIWARE concern, it is reasonable for
the UBIWARE platform to provide some of these
behaviors ready-made.
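The split between declarative models and Java RABs can be sketched as follows. This is a hypothetical Python stand-in, not S-APL and not the UBIWARE engine: beliefs are plain triples, a rule pairs a triple pattern (with `?`-prefixed wildcards) with the name of a RAB, and RABs are ordinary callables registered by name, so the same rule set can be reused with different atomic behaviors.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
# A RAB here is any callable taking the agent and the matched fact.
Rab = Callable[["Agent", Triple], None]


class Agent:
    """Hypothetical stand-in for a UBIWARE agent: a tiny rule engine over a
    belief set of triples, plus a registry of atomic behaviors (RABs)."""

    def __init__(self, rabs: Dict[str, Rab]):
        self.beliefs: set = set()
        self.rules: List[Tuple[Triple, str]] = []  # (pattern, RAB name); "?x" matches anything
        self.rabs = rabs

    def load_model(self, facts: List[Triple], rules: List[Tuple[Triple, str]]) -> None:
        """Load a declarative model (data + behavior), the role S-APL documents play."""
        self.beliefs |= set(facts)
        self.rules.extend(rules)

    @staticmethod
    def matches(pattern: Triple, fact: Triple) -> bool:
        return all(p.startswith("?") or p == f for p, f in zip(pattern, fact))

    def cycle(self) -> None:
        """One engine cycle: call the RAB of every rule whose pattern matches a belief."""
        for pattern, rab_name in self.rules:
            for fact in sorted(self.beliefs):  # iterate over a copy; RABs may add beliefs
                if self.matches(pattern, fact):
                    self.rabs[rab_name](self, fact)


log: List[str] = []
agent = Agent({"notify": lambda ag, fact: log.append(f"alarm on {fact[0]}")})
agent.load_model(
    facts=[("pump1", "hasStatus", "fault"), ("pump2", "hasStatus", "ok")],
    rules=[(("?x", "hasStatus", "fault"), "notify")],
)
agent.cycle()
print(log)  # ['alarm on pump1']
```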
Therefore, the UBIWARE platform logically
consists of the following three elements:
- The Live behavior engine
- A set of “standard” S-APL models
- A set of “standard” RABs
Platform extensions are precisely sets of such
“standard” S-APL models and RABs that developers
can use to embed certain UBIWARE features into
their applications.
As Figure 2 shows, an S-APL agent can obtain
the needed data and rules not only from local or
network documents, but also by querying S-APL
repositories. Such a repository, for example,
can be maintained by an organization and include
prescriptions (lists of duties) corresponding to the
organizational roles that the agents are supposed to
play.
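A minimal sketch of how an agent might assemble its behavior from local or network documents and from an organizational repository of role prescriptions. The repository is modeled as a plain dictionary and rules as opaque strings; `load_behavior` and the role and duty names are hypothetical, and a real S-APL repository would be queried rather than indexed.

```python
from typing import Dict, List


def load_behavior(local_docs: List[List[str]],
                  repository: Dict[str, List[str]],
                  roles: List[str]) -> List[str]:
    """Collect an agent's rules from local/network documents and from the
    prescriptions of the roles it plays (hypothetical repository format)."""
    rules: List[str] = []
    for doc in local_docs:
        rules.extend(doc)
    for role in roles:
        rules.extend(repository.get(role, []))  # duties prescribed for this role
    return list(dict.fromkeys(rules))           # drop duplicates, keep first occurrence


repository = {"shift-supervisor": ["report-alarms-hourly", "approve-maintenance"]}
rules = load_behavior([["monitor-sensors"], ["report-alarms-hourly"]],
                      repository, roles=["shift-supervisor"])
print(rules)  # ['monitor-sensors', 'report-alarms-hourly', 'approve-maintenance']
```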
Technically, the implementation is built on top of
JADE, the Java Agent Development Framework
(Bellifemine et al., 2007), which is a Java
implementation of the IEEE FIPA specifications.
4 MASTERING INTELLIGENT CLOUD PLATFORM
Cloud computing providers offer various stack
configurations with different sets of software and
services inside. In principle, one can buy any
configuration from the cloud provider; however, this
configuration will have nothing to do with the
customer’s already running business logic. The
application scenarios a customer wants to run will
have to be adjusted. For example, consider a
customer who buys a virtual server with a
MySQL database installed and a Java solution stack
available. On top of this stack the customer runs a
workflow engine, e.g. a BPEL-based one. The user will
have to install the engine and then adjust the local data
storage settings (passwords, tables, queries). Then
the process descriptions (BPEL files) must be
adjusted to work with the local settings. In some cases
this process may be avoided if the cloud stack is
identical to the customer’s environment and if
all code was developed to be portable. But what if the
cloud stack differs slightly, yet the prices are very
attractive? Then customers may need to spend
resources on porting their solution code.
We propose an architecture for a generic stack
extension that allows users and platform providers
to:
- Smoothly integrate with the infrastructure
- Build stack-independent solutions
- Automate reconfiguration of the solutions
The architecture is based on the UBIWARE
platform architecture and extends cloud platform
services with standardized, configurable
intelligent models.
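The “stack-independent solution” and “automated reconfiguration” goals can be illustrated with a small binding step: the solution refers to its storage by a logical name, and a stack profile supplies the concrete connection details. `StackProfile`, `bind`, the provider names and the addresses are hypothetical; only the `jdbc:mysql://host:port/db` and `jdbc:postgresql://host:port/db` URL forms follow the drivers' usual conventions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StackProfile:
    """Hypothetical description of what a particular cloud stack provides."""
    name: str
    database: str   # database product offered by the stack
    host: str
    port: int


# Connection URL patterns per database product (basic JDBC URL forms).
URL_TEMPLATES: Dict[str, str] = {
    "mysql": "jdbc:mysql://{host}:{port}/{db}",
    "postgresql": "jdbc:postgresql://{host}:{port}/{db}",
}


def bind(solution: Dict[str, str], stack: StackProfile) -> Dict[str, str]:
    """Resolve the solution's logical settings against a concrete stack.
    Re-running bind for another profile is the automated reconfiguration."""
    bound = dict(solution)
    template = URL_TEMPLATES[stack.database]
    bound["storage_url"] = template.format(host=stack.host, port=stack.port,
                                           db=solution["storage"])
    return bound


solution = {"storage": "orders", "workflow": "order-process.bpel"}
a = bind(solution, StackProfile("provider-a", "mysql", "10.0.0.5", 3306))
b = bind(solution, StackProfile("provider-b", "postgresql", "db.internal", 5432))
print(a["storage_url"])  # jdbc:mysql://10.0.0.5:3306/orders
print(b["storage_url"])  # jdbc:postgresql://db.internal:5432/orders
```

The solution itself (here, the BPEL file name) is untouched by the move; only the binding changes.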
4.1 Agent-driven Servicing in the Cloud
Interoperability is recognized as one of the challenges of
the cloud computing paradigm. We believe that
adoption of the existing interoperability tools and
solutions will become one of the major cloud
computing research directions. A simplistic extension
of the platform API would merely shift the interoperability
problem from the cloud provider to the client side.
At the same time, the competitiveness of cloud
providers may depend on how simply they
integrate with client solutions and systems.
Therefore, we foresee that embedded services
offered by cloud providers should be flexible and
smart enough to handle client-specific model
adjustments and configurations. We extend the
notion of a platform service to that of a smart,
proactive, agent-driven service. Such a service should
not only be flexible and configurable in accordance
with the customers’ needs, but also be prepared
to resolve data- and API-level interoperability issues
while being integrated with the client software.
Figure 3 shows the placement of the agent-driven
extension in the cloud computing stack. From the
user’s perspective the extension is still a service API,
but it offers advanced functionality.
Figure 3: Placement of the agent-driven API.
The API shown in Figure 3 is a standalone
middleware platform running either as a cloud
facility or embedded into virtual machine
instances as a platform extension. The detailed API
content is shown in Figure 4.
Figure 4: Agent-driven flexible platform service extension API.
The agent-driven adapters are software entities
that facilitate data source management. Adapters
provide advanced data source connectivity functions
(e.g. simplified database connectivity, file format
parsing, sensor data acquisition, etc.). In addition,
adapters address the connectivity problem by providing
components for data transformation with
configurable mapping functionality. The adapter
becomes a proactive entity, i.e. it observes its state
and takes actions based on changes in its state and
environment. The actions of the adapter may range from
simple fault messaging to self-reconfiguration
when an exceptional or fault situation occurs.
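The adapter behavior described above (configurable mapping plus proactive fault handling) can be sketched in Python under simplifying assumptions: data sources are callables returning lists of records, a fault is any `OSError` raised by a source, and self-reconfiguration means switching to the next configured source. `ProactiveAdapter`, the source functions and the field names are hypothetical.

```python
from typing import Callable, Dict, List

Source = Callable[[], List[dict]]


class ProactiveAdapter:
    """Hypothetical agent-driven adapter: maps source fields to target names
    and reconfigures itself when the active data source fails."""

    def __init__(self, sources: List[Source], mapping: Dict[str, str]):
        self.sources = sources       # primary source first, fall-back sources after it
        self.active = 0              # index of the source currently in use
        self.mapping = mapping       # configurable mapping: source field -> target field
        self.events: List[str] = []  # fault messages produced by the adapter

    def transform(self, record: dict) -> Dict[str, object]:
        return {target: record[src] for src, target in self.mapping.items() if src in record}

    def fetch(self) -> List[Dict[str, object]]:
        while self.active < len(self.sources):
            try:
                return [self.transform(r) for r in self.sources[self.active]()]
            except OSError as exc:  # ConnectionError, TimeoutError, etc. are subclasses
                # Observe the fault, report it, then switch to the next source.
                self.events.append(f"fault in source {self.active}: {exc}")
                self.active += 1
        self.events.append("no data source available")
        return []


def broken_db() -> List[dict]:
    raise ConnectionError("db unreachable")


def replica_db() -> List[dict]:
    return [{"TAG_ID": "PM1-T7", "VAL": 81.5}]


adapter = ProactiveAdapter([broken_db, replica_db], {"TAG_ID": "sensor", "VAL": "temperature"})
records = adapter.fetch()
print(records)         # [{'sensor': 'PM1-T7', 'temperature': 81.5}]
print(adapter.events)  # ['fault in source 0: db unreachable']
```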
The services’ API allows the user to run
declarative models as services. The API provides a
“model player API” for a particular domain-specific
model definition language (an example of the API
and of the language is discussed in the next
section). The model of the service being played is, at
the same time, controlled by a dedicated agent that
ensures the proper functioning of the model (e.g. load
balancing and handling of operational failures). The service
agent may temporarily relocate the service’s
executable code to another virtual machine instance
to improve performance in critical cases; the
service then becomes remote for its own original virtual
machine instance. We also consider a service API that
has a local representative agent on each virtual
machine, while the service execution is handled by the
cloud provider (see Figure 5).
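A minimal sketch of the service agent's relocation decision, assuming that each virtual machine reports a single load figure and that a load above a fixed threshold counts as a critical case. `VM`, `ServiceAgent` and the 0.8 threshold are illustrative assumptions; a real agent would also move executable code and state, which this sketch does not model.

```python
from typing import List


class VM:
    """Minimal virtual machine record; load is a utilisation figure in [0, 1]."""

    def __init__(self, name: str, load: float):
        self.name = name
        self.load = load


class ServiceAgent:
    """Hypothetical agent supervising one played model (service)."""

    def __init__(self, home: VM, pool: List[VM], threshold: float = 0.8):
        self.home = home          # the VM the service originally belongs to
        self.host = home          # the VM currently executing the service
        self.pool = pool          # candidate VMs for temporary relocation
        self.threshold = threshold
        self.log: List[str] = []

    @property
    def remote(self) -> bool:
        """True while the service runs away from its own original VM."""
        return self.host is not self.home

    def supervise(self) -> None:
        if self.host.load > self.threshold:
            # Critical case: move to the least loaded VM if it is better.
            target = min(self.pool, key=lambda vm: vm.load)
            if target.load < self.host.load:
                self.log.append(f"relocate {self.host.name} -> {target.name}")
                self.host = target
        elif self.remote and self.home.load <= self.threshold:
            # The home VM has recovered: bring the service back.
            self.log.append(f"return {self.host.name} -> {self.home.name}")
            self.host = self.home


home, spare = VM("vm-a", 0.95), VM("vm-b", 0.30)
agent = ServiceAgent(home, [home, spare])
agent.supervise()
print(agent.remote)  # True: the service now runs on vm-b
home.load = 0.5
agent.supervise()
print(agent.log)     # ['relocate vm-a -> vm-b', 'return vm-b -> vm-a']
```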
Figure 5: Service execution in cloud infrastructure.
In Figure 5, PCA stands for
Personal Customer Agent and PMA for Platform
Management Agent. The PCA may request the PMA
to host a service execution (the time period is
subject to contracting details) on a separate virtual
machine, e.g. to obtain higher performance, or for
any other reason. At the same time, the local API
within the user’s virtual machine and/or platform
stays the same: the PCA wraps local API calls and
forwards them to the PMA. The proposed architecture
differs from standard remote method invocation in having a
control channel between agents, which keeps the service
management layer separate from service
consumption (service calls).
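The separation between the control channel and service calls can be sketched as two agents with two distinct entry points: `control` for management messages and `call` for service invocations. The classes, the message keys and the performative names (`request-hosting`, `cancel-hosting`) are hypothetical and are not FIPA ACL performatives; the point is only that the PCA keeps the same local `invoke` API whether a service runs locally or on the PMA's machine.

```python
from typing import Any, Callable, Dict, List, Set

Service = Callable[..., Any]


class PMA:
    """Hypothetical Platform Management Agent hosting services on its own VM."""

    def __init__(self) -> None:
        self.hosted: Dict[str, Service] = {}
        self.control_log: List[str] = []

    def control(self, message: Dict[str, Any]) -> bool:
        """Control channel: management messages only, never service calls."""
        act = message["performative"]
        self.control_log.append(act)
        if act == "request-hosting":
            self.hosted[message["service"]] = message["code"]
            return True
        if act == "cancel-hosting":
            return self.hosted.pop(message["service"], None) is not None
        return False

    def call(self, service: str, *args: Any) -> Any:
        """Service channel: plain invocations of hosted services."""
        return self.hosted[service](*args)


class PCA:
    """Hypothetical Personal Customer Agent keeping the local API unchanged."""

    def __init__(self, local: Dict[str, Service], pma: PMA) -> None:
        self.local = local
        self.pma = pma
        self.remote: Set[str] = set()

    def offload(self, service: str, period_hours: int) -> bool:
        # The period is only carried in the message, as a contract detail.
        ok = self.pma.control({"performative": "request-hosting", "service": service,
                               "code": self.local[service], "period": period_hours})
        if ok:
            self.remote.add(service)
        return ok

    def invoke(self, service: str, *args: Any) -> Any:
        """Same local API whether the service runs locally or on the PMA's VM."""
        if service in self.remote:
            return self.pma.call(service, *args)  # wrapped and forwarded
        return self.local[service](*args)


pma = PMA()
pca = PCA({"score": lambda x: x * 2}, pma)
before = pca.invoke("score", 21)
pca.offload("score", period_hours=24)
after = pca.invoke("score", 21)
print(before, after, pma.control_log)  # 42 42 ['request-hosting']
```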
In the next section we discuss how web
services from the data mining domain can be
integrated into the infrastructure described above.
Data mining services can be embedded as platform
services into the cloud stack for particular domain-specific
cloud configurations while
preserving configurability, mobility and
self-awareness.
4.2 Mastering Data Mining Services into the Cloud
To model the data mining services we have to define
a corresponding data mining domain ontology. The
ontology will cover data mining methods as well as
the requirements for method inputs and the respective
outputs. The inputs and outputs should, in turn, refer
to data types. The data mining domain cannot
include all possible applications of its methods;
therefore we should keep the conceptualization at an
appropriate granularity and distinguish data mining
models from their application scenarios. For the
purpose of this scenario we take two data mining
techniques: cluster analysis and the k-NN method
(classification).
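A fragment of such an ontology can be approximated in Python as method descriptions with typed inputs and outputs, which already allows simple applicability checks. The data type names and the choice of k-means as the cluster analysis method are assumptions made for illustration; a real Data Mining Ontology would be expressed in a Semantic Web language rather than in code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Method:
    """Hypothetical ontology entry: a data mining method with typed inputs/outputs."""
    name: str
    task: str                # the data mining task the method serves
    inputs: FrozenSet[str]   # required input data types
    outputs: FrozenSet[str]  # produced output data types


ONTOLOGY: List[Method] = [
    Method("k-means", "clustering",
           frozenset({"NumericVectorSet"}), frozenset({"ClusterAssignment"})),
    Method("k-NN", "classification",
           frozenset({"LabeledNumericVectorSet", "NumericVector"}), frozenset({"ClassLabel"})),
]


def applicable(available: Set[str], task: str) -> List[str]:
    """Names of methods for `task` whose required input types are all available."""
    return [m.name for m in ONTOLOGY if m.task == task and m.inputs <= available]


print(applicable({"NumericVectorSet"}, "clustering"))      # ['k-means']
print(applicable({"NumericVectorSet"}, "classification"))  # []
```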
Efforts towards the standardization of data
mining techniques, methods and formats have been a
matter of discussion for the last ten years. One of the
notable efforts is the PMML language (PMML, 2009;
Guazzelli et al., 2009). PMML is a standard
for XML documents that express instances of
analytical models. In our work we take PMML as a
reference model for the Data Mining Ontology and
enhance both the model and the data with the
semantic descriptions required to automate the
application of data mining methods to domain data. In
this work we do not consider the stages of
information collection, preparation, etc.; we assume
that the data are ready for the application of a data
mining algorithm. The PMML structure for model
definition is composed of a set of elements that
describe the input data, the model and the outputs (Figure 6).
Figure 6: PMML model structure.
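To illustrate the structure in Figure 6, the sketch below parses a small, hypothetical PMML 4.0 document with Python's standard `xml.etree.ElementTree` and extracts the model's input fields, target and output fields. The model itself (a linear regression named `wear-estimate`) is invented for the example; the element names (`DataDictionary`, `MiningSchema`, `Output`, `RegressionModel`) and the namespace follow the PMML 4.0 schema.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_0"}

# A small, hypothetical PMML 4.0 document: data dictionary, model, outputs.
DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <Header description="illustrative example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="pressure" optype="continuous" dataType="double"/>
    <DataField name="wear" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" modelName="wear-estimate">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="pressure"/>
      <MiningField name="wear" usageType="predicted"/>
    </MiningSchema>
    <Output>
      <OutputField name="predicted_wear" feature="predictedValue"/>
    </Output>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="temperature" coefficient="0.02"/>
      <NumericPredictor name="pressure" coefficient="0.1"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def describe(pmml_text: str) -> dict:
    """Summarize the inputs, target and outputs of the regression model."""
    root = ET.fromstring(pmml_text)
    model = root.find("p:RegressionModel", NS)
    fields = model.findall("p:MiningSchema/p:MiningField", NS)
    return {
        "model": model.get("modelName"),
        # usageType defaults to "active" (an input field) when it is absent.
        "inputs": [f.get("name") for f in fields if f.get("usageType", "active") == "active"],
        "target": [f.get("name") for f in fields if f.get("usageType") == "predicted"],
        "outputs": [o.get("name") for o in model.findall("p:Output/p:OutputField", NS)],
    }


print(describe(DOC))
```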
The PMML specification version 4.0 provides the
means for an exhaustive model description, so a
model can be fully exported or imported without
loss. Such model transportability opens great
opportunities for the service orientation of data
mining methods. We also expect PMML models to be
reused in the cloud computing domain in the near
future. A software-independent
descriptive data mining standard implies that the
Infrastructure and Platform layers of the Cloud
Computing stack are fully transparent to data
mining methods, i.e. the functional characteristics of
method-based services will be the same for any
stack configuration. The QoS, however, may vary
depending on the performance of the hardware and the
efficiency of the platform software; therefore, an
additional control channel over the service
configuration may be needed.
We have identified three main types of data
mining services, independent of their application
domain, and introduced a classification of them
(Figure 7).
Figure 7: Upper ontology of data mining services.
We consider two major categories of data mining
services:
- model construction services
- computational services
The model construction services produce a
model (a semantic description) from a set of
learning samples. In other words, the input of such a
service is learning data and conditions (for a
neural network, depending on its mode, this can be a set
of vectors plus, e.g., initial network parameters). The
output of the model construction service is a model
with assigned parameters (e.g. a neural network
model, see Figure 8).
Figure 8: Model construction service.
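A model construction service can be sketched as a function from learning data plus conditions to a model description with assigned parameters. The sketch uses k-means (one possible cluster analysis method) with the initial centroids as the “conditions”, analogous to initial network parameters for a neural network; the function name and the dictionary format of the returned model are hypothetical.

```python
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def construct_kmeans_model(samples: List[Sequence[float]],
                           initial_centroids: List[Sequence[float]],
                           iterations: int = 10) -> Dict[str, object]:
    """Hypothetical model construction service: learning samples plus conditions
    (initial centroids) in, model description with assigned parameters out."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters: List[List[Sequence[float]]] = [[] for _ in centroids]
        for s in samples:
            # Assign each sample to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(s, centroids[i]))
            clusters[nearest].append(s)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return {"type": "clustering", "k": len(centroids), "centroids": centroids}


model = construct_kmeans_model([[0, 0], [0, 1], [10, 10], [10, 11]],
                               initial_centroids=[[0, 0], [10, 10]])
print(model["centroids"])  # [[0.0, 0.5], [10.0, 10.5]]
```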
The group of computational services can, in turn, be
divided into two major groups:
- services with a fixed model
- model player services
Services with a fixed model define the format
of the input and output and provide a reference
model description and the parameters that determine
how the model is configured. For example, Figure 9
shows the definition of a neural network-based
alarm classifier service for a paper machine.
Figure 9: Neural network model in a classification service.
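A service with a fixed model can be sketched as a class whose input format, output classes and parameters are all part of its definition. The feature names, class labels and weights below are invented for illustration and do not reproduce the paper machine classifier in Figure 9; the network is a single softmax layer.

```python
import math
from typing import Dict


class FixedModelService:
    """Hypothetical service with a fixed model: input format, output classes and
    model parameters are part of the service definition."""
    INPUTS = ("vibration", "temperature")   # declared input format (normalized features)
    CLASSES = ("normal", "bearing_fault")   # declared output classes
    WEIGHTS = ((-1.0, -0.5), (1.0, 0.5))    # one weight row per class (pre-trained)
    BIAS = (1.0, -1.0)

    def classify(self, alarm: Dict[str, float]) -> Dict[str, object]:
        x = [alarm[name] for name in self.INPUTS]   # KeyError if the input format is violated
        scores = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.WEIGHTS, self.BIAS)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {"class": self.CLASSES[best], "confidence": round(probs[best], 3)}


result = FixedModelService().classify({"vibration": 2.0, "temperature": 1.0})
print(result)  # {'class': 'bearing_fault', 'confidence': 0.953}
```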
Using a model player service involves two stages:
in the first stage the service accepts the service
model as an input, and in the second stage it can
serve as a fixed-model service (see Figure 10).
Figure 10: Model player service.
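The two-stage usage of a model player service can be sketched as follows: `load` corresponds to the first stage (accepting a model description), and `classify` to the second (serving it like a fixed-model service). The sketch plays only k-NN model descriptions in a hypothetical dictionary format; a real player would accept a standard format such as PMML.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


class ModelPlayerService:
    """Hypothetical model player: stage 1 loads a model description,
    stage 2 serves it like a fixed-model service."""

    def __init__(self) -> None:
        self.model: Optional[Dict] = None

    def load(self, model: Dict) -> None:
        """Stage 1: accept the service model as input."""
        if model.get("type") != "kNN":
            raise ValueError("this sketch only plays k-NN model descriptions")
        self.model = model

    def classify(self, x: Sequence[float]) -> str:
        """Stage 2: majority vote among the k nearest stored samples."""
        if self.model is None:
            raise RuntimeError("no model loaded yet")
        samples: List[Sample] = self.model["samples"]
        nearest = sorted(samples, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        votes = Counter(label for _, label in nearest[: self.model["k"]])
        return votes.most_common(1)[0][0]


player = ModelPlayerService()
player.load({"type": "kNN", "k": 3, "samples": [
    ([0, 0], "normal"), ([0, 1], "normal"), ([1, 0], "normal"),
    ([9, 9], "alarm"), ([9, 8], "alarm")]})
print(player.classify([8, 8]))      # alarm
print(player.classify([0.5, 0.5]))  # normal
```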
The full power of data mining services can be
demonstrated in combination with distributed
querying (i.e. collecting learning or classification
data), data mining model construction and subsequent
classification. The generic use case of such a
combination is shown in Figure 11.
The automated data collection process (the first step
in the use case) uses the Ontonuts technology and
approach described in (Nikitin et al., 2009). The
approach allows dynamic distributed query planning
and execution, which we apply in this work to
collect the learning set data. The data sources may
range from databases to CSV files and may reside
physically on different platforms and sites. The data
collection, and hence the querying, is implemented as a
sequence of semantic data service calls orchestrated
by a workflow management agent. Service
orientation of data sources makes distributed
querying a homogeneous part of other service-based
workflow scenarios. The collected data (usually in
the form of a table of query results) may undergo
preparatory steps to become a learning dataset. In
this work we omit normalization and
other data transformations, although they will be
necessary in practice. We assume that all
operations on the data are also wrapped as
semantic services.
As soon as the learning set is ready, a suitable
model constructor should be chosen (step 2). The
model constructor may require specific data
preparation; therefore, it is advisable to combine the data
preparation step with the model constructor service.
Figure 11: A use case scenario.
The model constructor may also require
additional input parameters for model building.
These may be left at their defaults or, if other values
were prepared, supplied in the proper
form. When the model is ready, we feed it to the
model player service, which is a platform service in
cloud computing terms because it provides an
infrastructure and software platform for service
execution. As soon as the newly built model is
deployed as a service, we can start classifying
data vectors, e.g. alarms coming from the paper
machine.
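The three steps of the use case (data collection, model construction, deployment on a model player) can be chained in a compact, self-contained sketch. The two sources stand in for semantic data services over a database and a CSV file, the learning data are invented, data preparation is omitted as in the text, and k-NN is used as the model; all function names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[float], str]


# Step 1: data collection from hypothetical sources (stand-ins for semantic data services).
def db_source() -> List[Sample]:
    return [([0.1, 0.2], "normal"), ([0.9, 0.8], "alarm")]


def csv_source() -> List[Sample]:
    text = "x1,x2,label\n0.2,0.1,normal\n0.8,0.9,alarm\n"
    return [([float(row["x1"]), float(row["x2"])], row["label"])
            for row in csv.DictReader(io.StringIO(text))]


def collect(sources: List[Callable[[], List[Sample]]]) -> List[Sample]:
    """Call each source in turn, as a workflow agent would orchestrate them."""
    learning_set: List[Sample] = []
    for source in sources:
        learning_set.extend(source())
    return learning_set


# Step 2: model construction (for k-NN this just fixes k and stores the samples).
def construct_model(learning_set: List[Sample], k: int = 3) -> Dict:
    return {"type": "kNN", "k": k, "samples": learning_set}


# Step 3: deploy the model on a (minimal) model player and classify new vectors.
def deploy(model: Dict) -> Callable[[List[float]], str]:
    def classify(x: List[float]) -> str:
        def dist(s: Sample) -> float:
            return sum((a - b) ** 2 for a, b in zip(s[0], x))
        nearest = sorted(model["samples"], key=dist)[: model["k"]]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return classify


classifier = deploy(construct_model(collect([db_source, csv_source])))
print(classifier([0.85, 0.85]))  # alarm
```

In the architecture described above, each of these functions would be a separate service that agents can relocate or reconfigure independently.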
The scenario described above may be
dynamically reconfigured by the infrastructure
agents. Some steps of the scenario (e.g. learning) may be
temporarily moved to a separate execution
environment (a separate platform or virtual machine)
to perform computationally expensive tasks.
The overall infrastructure of services should be
highly proactive and responsive to customer
calls. Agents may monitor the execution and be
ready to reconfigure their services in accordance
with the current customer context.
5 RELATED WORK
Cloud platform solutions for business
intelligence are gaining popularity. For example,
SalesForce.com claims about 2 million customer success
stories. The platform provides a wide
range of products, from infrastructure as a service
to tailored web-based solutions for health care,
retail and sales, and business intelligence tools are
offered too. Nevertheless, users have to adjust or
prepare their software and data for the tools
provided by the cloud. The advantage of the
approach we propose is that it equips any cloud platform
with intelligent adaptation mechanisms that
allow seamless data connectivity and
integration. The architecture we propose is an extension
to the cloud platform, not the platform itself. Data
mining services with highly configurable
parameters, driven by intelligent agents, would
simplify business intelligence integration, thus
making the adoption of cloud architectures easier for
clients.
6 CONCLUSIONS
The research presented above describes a specific
application domain of intelligent services. We
foresee that model player services will be a
successful business case for the emerging paradigm
of cloud computing. Pay-per-use principles
combined with the high computational capacity of the
cloud and standardized data mining (DM) models will
definitely offer an alternative to expensive business
intelligence and statistics toolkits.
Another niche for data mining services in cloud
computing may be model construction services. Such
systems will drive innovation in data mining
methods as well as in applied data mining in certain
domains. Such services will compete by introducing
know-how and innovative tools and algorithms that
add value in, e.g., predictive diagnostics or
computational error estimation. This direction will
lead to so-called “web intelligence” (Cercone et al., 2002).
The role of the UBIWARE platform in cloud
computing is emerging as cross-cutting management
and configuration glue for the interoperability of future
intelligent cloud services and client applications.
The main burden on UBIWARE will be the
management of consistency across different domain
conceptualizations (ontologies) and cross-domain
middleware components. Fine-grained ontology
modeling is still a challenge for the research community,
and we predict that in the near future domain
modeling will be task-driven, i.e. domain model
engineers may incorporate some standardized and
accepted conceptualizations, whereas the ontology
for the whole solution will be tailor-made. Tailored
ontologies will require subsequent mapping
mechanisms and additional effort.
Advanced data integration mechanisms
embedded into the cloud platform as services are also
an interesting concept that may add value
for competing cloud platforms. The ease of
integration into the cloud infrastructure should not
be underestimated, especially by enterprise-sized
companies, where business processes and
component interdependencies have reached an
unprecedented level of complexity. We believe that
the autonomy and self-awareness of building blocks will
be a key to the future design of information systems
and cloud platforms.
ACKNOWLEDGEMENTS
This research has been supported by the UBIWARE
project, funded by TEKES, and the industrial
consortium of Metso Automation, Fingrid and Inno-
W. The preparation of this paper was partially
funded by the COMAS graduate school.
REFERENCES
Bellifemine, F. L., Caire, G., and Greenwood, D., 2007.
Developing Multi-Agent Systems with JADE. Wiley.
Cercone, N., Hou, L., Keselj, V., An, A.,
Naruedomkul, K., and Hu, X., 2002. From
computational intelligence to Web intelligence.
Computer, vol. 35, no. 11, pp. 72-76, Nov. 2002.
Foster, I., Zhao, Y., Raicu, I., and Lu, S., 2008. Cloud
Computing and Grid Computing 360-Degree
Compared. In: Grid Computing Environments Workshop
(GCE '08), 12-16 Nov. 2008, pp. 1-10.
Guazzelli, A., Zeller, M., Lin, W. and Williams, G., 2009.
PMML: An Open Standard for Sharing Models. The R
Journal, Volume 1/1, May 2009.
Hayes, B., 2008. Cloud computing. Commun. ACM, vol. 51,
no. 7 (Jul. 2008), pp. 9-11. DOI: http://doi.acm.org/10.1145/1364782.1364786
Kaykova, O., Khriyenko, O., Kovtun, D., Naumenko, A.,
Terziyan, V., and Zharko, A., 2005. General Adaption
Framework: Enabling Interoperability for Industrial
Web Resources, In: International Journal on Semantic
Web and Information Systems, Idea Group, Vol. 1,
No. 3, pp.31-63.
Kephart, J. O. and Chess, D. M., 2003. The vision of
autonomic computing. IEEE Computer, vol. 36, no. 1, pp. 41-50.
Nikitin, S., Katasonov, A., and Terziyan, V., 2009. Ontonuts:
Reusable Semantic Components for Multi-Agent
Systems. In: Proceedings of the Fifth International
Conference on Autonomic and Autonomous Systems
(ICAS 2009), April 21-25, 2009, Valencia, Spain,
IEEE CS Press, pp. 200-207.
PMML, 2009. Data Mining Group. PMML version 4.0.
URL: http://www.dmg.org/pmml-v4-0.html
Rimal, B. P., Choi, E., and Lumb, I., 2009. A Taxonomy and
Survey of Cloud Computing Systems. In: Fifth International Joint
Conference on INC, IMS and IDC (NCM '09),
25-27 Aug. 2009, pp. 44-51.
Terziyan, V., 2003. Semantic Web Services for Smart
Devices in a “Global Understanding Environment”, In:
R. Meersman and Z. Tari (eds.), On the Move to
Meaningful Internet Systems 2003: OTM 2003
Workshops, Lecture Notes in Computer Science, Vol.
2889, Springer-Verlag, pp.279-291.
Terziyan, V., 2005. Semantic Web Services for Smart
Devices Based on Mobile Agents, In: International
Journal of Intelligent Information Technologies, Vol.
1, No. 2, Idea Group, pp. 43-55.