rule-based classifiers, which rely on disjunctive
normal form (DNF) formulas to check for the
occurrence of certain terms within the text. In text
categorization, a term can be a simple word or a
more complex regular expression describing entire
sentences. The second approach to classifier
construction is machine learning, which consists of
computational methods that use past experience to
make accurate predictions (Mohri et al., 2012).
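To make the rule-based approach more concrete, the following minimal sketch evaluates a DNF rule over regular-expression terms. The category, the rule contents, the "!" negation prefix and the names CLOUD_RULE, term_occurs and matches_dnf are all assumptions made for illustration; they are not taken from any particular classifier.

```python
import re

# Hypothetical rule for a category "cloud computing": a disjunction (OR)
# of clauses, each clause being a conjunction (AND) of term tests.
# A term is a regular expression; a leading "!" negates it.
CLOUD_RULE = [
    [r"\bcloud\b", r"\b(saas|paas|iaas)\b"],
    [r"\bvirtual machines?\b", r"!\bweather\b"],
]


def term_occurs(term, text):
    """Return True if the (possibly negated) term holds for the text."""
    negated = term.startswith("!")
    pattern = term[1:] if negated else term
    found = re.search(pattern, text, flags=re.IGNORECASE) is not None
    return found != negated


def matches_dnf(rule, text):
    """A DNF rule fires if all terms of at least one clause hold."""
    return any(all(term_occurs(t, text) for t in clause) for clause in rule)
```

With this rule, a text such as "Our SaaS runs in the cloud" is assigned to the category through the first clause, whereas "The weather is cloudy" satisfies neither clause.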
Machine-learning-based classifiers have the
decisive advantage over rule-based classifiers in that
they can learn from examples. Constructing an
effective rule-based classifier requires both experts
in classifier construction and domain experts.
Machine-learning-based classifiers, on the other
hand, require a set of pre-classified documents as
examples from which to learn. These documents are
commonly referred to as the initial corpus Ω. This
initial corpus is usually divided into a training set Tr
and an evaluation set Es.
The former is used to train the
machine-learning-based classifier, while the latter is
used to examine its effectiveness. Whereas efficiency
describes how quickly and resource-efficiently a
machine learning algorithm categorizes documents,
effectiveness measures how closely
Φ’ and Φ coincide.
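The division of the initial corpus and the evaluation step can be sketched as follows. This is only an illustration: the 70/30 split ratio, the fixed random seed and the function names are assumptions, and the simple agreement rate shown here is just one possible effectiveness measure, not necessarily the one used in this paper.

```python
import random


def split_corpus(corpus, train_fraction=0.7, seed=0):
    """Shuffle the initial corpus and divide it into (Tr, Es)."""
    docs = list(corpus)
    random.Random(seed).shuffle(docs)
    cut = int(len(docs) * train_fraction)
    return docs[:cut], docs[cut:]


def agreement(classifier, evaluation_set):
    """Fraction of evaluation documents for which the classifier's
    decision coincides with the pre-assigned category."""
    if not evaluation_set:
        raise ValueError("evaluation set is empty")
    hits = sum(1 for text, label in evaluation_set
               if classifier(text) == label)
    return hits / len(evaluation_set)
```

Here each corpus entry is assumed to be a (text, category) pair, and classifier is any callable that maps a text to a category. Only Tr would be passed to the training procedure, so that Es remains unseen until evaluation.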
According to Mell and Grance (2011), cloud
computing is defined by a set of essential
characteristics: on-demand self-service, broad
network access, resource pooling, rapid elasticity,
and measured service.
Mell and Grance (2011) also represent cloud
computing as a layer model. The top layer, Software
as a Service (SaaS), is essentially software that can
be accessed by end users or other software services
through web portals or APIs.
The intermediate layer, Platform as a Service
(PaaS), is a system that provides software
developers with an environment to implement and
deploy their applications. It consists of a
programming-language-level environment with a set
of well-defined APIs.
The bottom layer, Infrastructure as a Service
(IaaS), provides computational resources in the form
of virtual machines or data storage space.
Weinhardt et al. (2009) compared cloud
computing with grid computing and stated that cloud
computing is systematically coupled with a business
model, thus making a pricing model a prerequisite
for a service to be considered as cloud computing.
Based on Weinhardt et al. (2009) and Creeger
(2009), the use of cloud computing at any layer has
the following advantages:
- Flexibility. Cloud computing services are readily
  available and can be used when needed. Less
  capacity planning is required.
- Shift from capital expenditure (CapEx) to
  operational expenditure (OpEx): services are paid
  for as they are used. There is no need for high
  up-front investments to buy and set up the
  necessary server environment. If a service is used
  only lightly, costs are saved.
- Easy integration of different services over
  internet APIs.
- Enforceable service level agreements (SLAs)
  with cloud providers reduce operational risks.
The following description of the state of the art in
cloud-based text categorization assesses three SaaS
offerings and one PaaS offering.
2.1 SaaS and PaaS Classifiers
Accessible at https://www.meaningcloud.com, the
meaningcloud classifier combines statistical
document classification with rule-based filtering.
Statistical analysis is used to define categories based
on example documents. One can also create manual
fixed rules for fine-tuning. As categories are defined
by providing example documents, this SaaS can be
regarded as a hybrid machine learning/rule-based
classifier. The utilized algorithms and system
architectures are black-boxed and, therefore,
unknown to the cloud user. The API is accessed by
HTTPS POST requests. A user can either upload the
text to be categorized in the body of the HTTPS
POST request (limited in that case to 8192
characters) or provide a URL from which
meaningcloud can load the text. As HTTPS is used,
this communication is SSL/TLS-protected. Texts are
never transmitted unencrypted and are not stored
within the meaningcloud service. Replies are either
XML- or JSON-encoded lists of categories for the
provided text. Meaningcloud contains a set of
pre-trained out-of-the-box classifiers for certain sets
of topics. One can also use one's own initial corpus Ω
to create a custom classifier with user-defined
categories. When enrolling in the meaningcloud
service, a user is provided with an API user key.
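The request format described in this section might be assembled as in the following sketch, which builds the HTTPS POST request without sending it. The endpoint URL and the form field names (key, model, of, txt, url) are hypothetical placeholders chosen for illustration; the actual values must be taken from the provider's documentation. Only the 8192-character limit, the choice between inline text and a URL, the API key and the XML or JSON reply formats come from the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request

MAX_TEXT_CHARS = 8192  # inline text limit stated in this section

# Hypothetical placeholder, NOT the provider's real endpoint.
API_ENDPOINT = "https://api.example.com/classify"


def build_request(api_key, model, text=None, url=None, output_format="json"):
    """Build (but do not send) an HTTPS POST request for one categorization.

    Field names are assumptions made for this sketch.
    """
    if (text is None) == (url is None):
        raise ValueError("provide exactly one of text or url")
    if text is not None and len(text) > MAX_TEXT_CHARS:
        raise ValueError("inline text exceeds 8192 characters; pass a URL")
    fields = {"key": api_key, "model": model, "of": output_format}
    if text is not None:
        fields["txt"] = text  # text travels in the POST body
    else:
        fields["url"] = url   # the service loads the text itself
    body = urlencode(fields).encode("utf-8")
    return Request(API_ENDPOINT, data=body, method="POST")
```

A request built this way could be sent with urllib.request.urlopen, after which the XML- or JSON-encoded list of categories would be parsed from the reply.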
This key must be included in every HTTPS POST
request (Meaningcloud documentation, 2015). A
user then takes out a subscription, which is available
in differently sized packages. Subscription packages
limit categorizations per month and categorizations
per second. Standard packages range from free with
40,000 categorizations per month and 2 per second
CLOSER 2016 - 6th International Conference on Cloud Computing and Services Science