2 IBM CMS CLOUD
Enterprise-class customers, such as banks, insurers,
or airlines, typically require IT management services
such as monitoring, patching, backup, change control,
high availability and disaster recovery to support
systems running complex applications with stringent
IT process control and quality-of-service
requirements. Such features are typically offered by
IT service providers in strategic outsourcing (SO)
engagements, a business model in which the provider
takes over several or all aspects of managing a
customer’s data center resources, software assets, and
processes. Servers with such support are characterized
as being managed.
This should be contrasted with unmanaged servers
provisioned using basic Amazon Web Services
(AWS) and IBM’s SoftLayer offerings, where the
cloud provider offers automated server provisioning.
To make a server managed, these cloud
providers have partnered with service providers that
customers can engage to fill the gaps up and
down the stack. This lets the user add services
to the provisioned server, but the cloud provider
assumes no responsibility for the upkeep of the server
or of the additional services. The burden therefore
falls on the customer to assemble a fully managed
solution for their enterprise workload, rather than on
the cloud service to provide one end to end.
IBM’s CMS is among a small set of industry
cloud offerings that support managed virtual and
physical servers. It is an enterprise cloud that
provides a large number of managed services
on par with those offered in high-end SO contracts.
Examples of such services are patching, monitoring,
asset management, change and configuration
management, quality assurance, compliance, health-
checking, anti-virus, load-balancing, security,
firewall, resiliency, disaster recovery, and backup.
The current product offers a set of managed services
preloaded on users’ servers in the cloud. The
installation, configuration, and run-time management
of these services are automated.
3 POSITION: MISSION CRITICAL WORKLOADS REQUIRE ENTERPRISE DATA CENTER RESILIENCE
The main attributes of cloud computing are scalable,
shared, on-demand computing resources delivered
over the network, and pay-per-use pricing. Typically,
one thinks of cloud as on-demand environments
which are created and destroyed as needed. This
offers flexibility in using as few or as many IT
resources as needed at any point in time. Users
therefore do not need to predict the resources they
might need in the future, which makes cloud
infrastructure attractive for businesses.
Cloud-native applications take advantage of the
cloud’s elasticity and are written to run on
multiple nodes. The nodes are stateless, so the
application tolerates the loss of any single node
without going down.
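The property described above can be illustrated with a minimal sketch. The class `StatelessPool`, its methods, and the node names are hypothetical and introduced only for illustration; they are not part of any cloud offering discussed here. Because no node holds application state, any surviving node can serve the next request.

```python
import itertools


class StatelessPool:
    """Round-robin dispatcher over interchangeable, stateless nodes (illustrative)."""

    def __init__(self, nodes):
        self.alive = set(nodes)
        # Fixed visiting order; failed nodes are skipped at dispatch time.
        self._ring = itertools.cycle(sorted(nodes))

    def fail(self, node):
        """Mark a node as lost. No state needs to be recovered from it."""
        self.alive.discard(node)

    def handle(self, request):
        """Serve the request on the next live node in the ring."""
        if not self.alive:
            raise RuntimeError("no live nodes left")
        for node in self._ring:
            if node in self.alive:
                return f"{node} handled {request}"
```

With three nodes, losing one only shrinks the set of candidates; requests keep being served by the remaining two. A database node, by contrast, cannot simply be skipped in this way, because the data it holds must survive the failure.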
In contrast, enterprise customers require
computing infrastructure which is set up infrequently,
but is available over a much longer time frame. For
example, a database is expected to run continuously,
and not to lose any data in the case of infrastructure
failure. A database that is unresponsive, even for a
short period of time, can result in large business
losses for an enterprise.
High availability is an important requirement for
running enterprise-level applications. Cloud computing
features such as standardized infrastructure,
virtualization, and modularity offer an
opportunity to provide highly resilient and highly
available systems. Resiliency techniques can be
deployed on a well-defined framework that provides
recovery measures, such as replicating unresponsive
services and recovering failed ones.
To achieve application resiliency, high
availability (HA) clusters are used. Implementing HA
clusters requires features such as anti-collocation of
VMs (locating VMs on different physical hosts), a
requirement which is difficult to guarantee in a cloud
environment. For example, VMs are created on
physical servers based on hypervisor utilization to
achieve a balanced and optimally utilized compute
environment. Additionally, VMs can migrate
between hypervisors for either load balancing or
maintenance.
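The conflict between utilization-based placement and anti-collocation can be sketched as follows. This is a minimal illustration under stated assumptions, not the placement logic of any actual cloud: the `Host` class, the `place_vm` function, the slot-based capacity model, and the `anti_collocation_group` parameter are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int  # number of VM slots on this hypervisor (assumed model)
    vms: list = field(default_factory=list)

    @property
    def utilization(self) -> float:
        return len(self.vms) / self.capacity


def place_vm(vm, hosts, anti_collocation_group=()):
    """Place vm on the least-utilized host that has free capacity.

    Hosts already running any VM named in anti_collocation_group are
    excluded, so members of the group end up on different hosts.
    """
    candidates = [
        h for h in hosts
        if len(h.vms) < h.capacity
        and not any(peer in h.vms for peer in anti_collocation_group)
    ]
    if not candidates:
        raise RuntimeError(f"no host satisfies the constraints for {vm}")
    target = min(candidates, key=lambda h: h.utilization)
    target.vms.append(vm)
    return target.name
```

Given a large, lightly loaded host and a small, busier one, placing two HA cluster nodes by utilization alone puts both on the large host, so a single host failure takes down the whole cluster. Passing the first node's name in `anti_collocation_group` when placing the second forces it onto the other host. The sketch covers initial placement only; a later migration for load balancing or maintenance would have to re-check the same constraint.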
The network latency between the nodes depends on
the location of the physical servers hosting the VMs
in a data center (for example, whether the nodes are
located in the same row) and on the current network
traffic in the data center. For example, ongoing data
backup traffic can increase network latency when
accessing a database (DB). Additionally, multiple VMs
might need to access the same DB data, which requires
shared storage, a feature that is not typically part
of a cloud offering. These cloud properties make
implementing resiliency features for enterprise
workloads more complicated.