of labelled patterns. Each training pattern x is an n-dimensional input vector of real values, and the label is an m-dimensional output vector y. The k-th element of y, corresponding to the class of pattern x, is set to 1, while the other (m − 1) elements of y are set to 0.
The aim is to train a FFNN, given the training set T ,
to be able to correctly classify (predict the label of)
a new unlabelled pattern. For that, each training pat-
tern x is, in turn, applied to the input layer of the net-
work, the signal is allowed to propagate through the
network, and the output of the network, denoted $y'$, is compared to the desired output $y$ to determine the error of the network for that pattern, denoted $E_x$. A
common error function is the simple Sum of Squared
Error (SSE) function, defined as:
\[
E_x = \frac{1}{2} \sum_{i=1}^{m} \left( y_i - y'_i \right)^2 , \qquad (3)
\]
where the total error is simply $E = \sum_x E_x$.
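As a concrete illustration (using hypothetical function names, not taken from the paper), the following Python sketch builds the one-hot target vector described above and evaluates the SSE of Equation (3) for a single pattern, assuming the network output $y'$ has already been produced by a forward pass:

import numpy as np

def one_hot(k, m):
    # m-dimensional target vector y with a 1 in position k, 0 elsewhere
    y = np.zeros(m)
    y[k] = 1.0
    return y

def sse(y, y_pred):
    # Sum of Squared Error of Equation (3) for a single pattern
    return 0.5 * np.sum((y - y_pred) ** 2)

# Example: a 3-class problem where the network outputs [0.2, 0.7, 0.1]
# for a pattern whose true class is 1 (so y = [0, 1, 0]).
y = one_hot(1, 3)
y_pred = np.array([0.2, 0.7, 0.1])
print(sse(y, y_pred))   # 0.5 * (0.04 + 0.09 + 0.01) = 0.07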
Perhaps the most popular neural network training algorithm is the gradient-descent-based Backward Error Propagation (BP) algorithm, which is based on repeatedly applying the training set to the network (each full pass through the training set is called an epoch), computing the error $E$, and then modifying each element of the weight vector according to $\Delta w_i = -\eta \, \frac{\partial E}{\partial w_i}$, where $\eta$ is the learning rate parameter. Commonly,
FFNN applications use a simple three-layer network
topology, with full connectivity between layers.
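To make the update rule concrete, the following Python sketch (a minimal illustration under our own assumptions, not the implementation used in this paper) performs one epoch of batch gradient descent on a single-layer linear network, for which the gradient of the SSE has a simple closed form; a full BP implementation would additionally propagate the error backwards through the hidden layers:

import numpy as np

def train_epoch(W, patterns, targets, eta=0.1):
    # One epoch of batch gradient descent for a single-layer linear
    # network y' = W @ x under the SSE error of Equation (3).
    # dE/dW accumulated over all patterns is the sum of (y' - y) outer x,
    # so each weight is updated by delta_w_i = -eta * dE/dw_i, as in the text.
    grad = np.zeros_like(W)
    for x, y in zip(patterns, targets):
        y_pred = W @ x                       # forward pass
        grad += np.outer(y_pred - y, x)      # accumulate dE/dW
    return W - eta * grad                    # gradient-descent step

# Tiny example: 2 inputs, 2 output classes, 3 training patterns.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))
patterns = [np.array([0.5, 1.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
targets  = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
for epoch in range(100):
    W = train_epoch(W, patterns, targets)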
3 ANT COLONY OPTIMIZATION
Swarm intelligence is a branch of soft computing in which algorithms inspired by a wide variety of biological collective behaviours are applied to solve optimization problems. Ant Colony Optimization (ACO) was defined as a meta-heuristic for combinatorial optimization problems (Dorigo and Stützle, 2004), inspired by the behaviour of natural ant colonies. The basic principle of ACO is that a population of artificial ants cooperates to find the best path in a graph, analogously to the way that natural ants cooperate to find the shortest path between two points, such as their nest and a food source.
In ACO, each artificial ant constructs a candidate
solution to the target problem, represented by a com-
bination of solution components in the search space.
Ants cooperate via indirect communication, by de-
positing pheromone on the selected solution com-
ponents for a candidate solution. The amount of
pheromone deposited is proportional to the quality of
that solution, which influences the probability with
which other ants will use that solution’s components
when constructing their solution. This contributes to
the global search aspect of ACO algorithms. The pop-
ulation of ants searches for the best solution in par-
allel, thus exploring possibly different regions of the
search space at each iteration of the algorithm. This
increases the chances of finding a near-optimal solu-
tion in the search space.
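The following Python sketch is a schematic rendering of this generic ACO loop (all names and parameter values are our own assumptions; no specific published ACO variant is implied): each ant builds a candidate solution by choosing components with probability proportional to their pheromone, pheromone is reinforced on the components of each constructed solution in proportion to its quality, and evaporation is applied at the end of every iteration.

import random

def aco(components, construct, quality,
        n_ants=10, n_iterations=50, evaporation=0.1):
    # Schematic ACO loop: construct(select) builds one candidate solution
    # by repeatedly calling select(candidates) to pick among feasible
    # components; quality(solution) returns a score to be maximized.
    pheromone = {c: 1.0 for c in components}
    best, best_quality = None, float('-inf')

    def select(candidates):
        # Probability of choosing a component is proportional to its pheromone.
        weights = [pheromone[c] for c in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]

    for _ in range(n_iterations):
        for _ in range(n_ants):
            solution = construct(select)        # one ant builds a solution
            q = quality(solution)
            if q > best_quality:
                best, best_quality = solution, q
            for c in solution:                  # reinforce used components,
                pheromone[c] += q               # proportionally to quality
        for c in pheromone:                     # evaporation at end of iteration
            pheromone[c] *= (1.0 - evaporation)
    return best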
ACO has been successful in tackling the clas-
sification problem of data mining. A number of
ACO-based algorithms have been introduced in the
literature with different classification learning ap-
proaches. Ant-Miner (Parpinelli et al., 2002) is the
first ant-based classification algorithm, which dis-
covers a list of classification rules in the form of
IF-Conditions-Then-Class.
The algorithm has
been followed by several extensions in (Parpinelli
et al., 2002; Salama et al., 2011; Salama et al., 2013;
Otero et al., 2009; Otero et al., 2013).
ACDT (Boryczka and Kozak, 2010; Boryczka and
Kozak, 2011) and Ant-Tree-Miner (Otero et al., 2012)
are two different ACO-based algorithms for inducing
decision trees for classification. Salama and Freitas
(2013a; 2013b) have recently employed ACO to learn
various types of Bayesian network classifiers.
As for learning neural networks, the ACO meta-heuristic has been utilized in two prior works. Liu et al. (2006) proposed ACO-PB, a hybrid of the ant colony and back-propagation algorithms for optimizing the network weights. It adopts ACO to search for the optimal combination of weights in the solution space, and then uses the BP algorithm to further fine-tune the ACO solution.
Blum and Socha applied ACO$_R$, an ant colony optimization algorithm for continuous optimization (Socha and Dorigo, 2008; Liao et al., 2014), to train feed-forward neural networks (Socha and Blum, 2007).
4 THE ANN-Miner ALGORITHM
As discussed previously, the fully-connected three-layer topology is the most commonly used FFNN topology. ANN-Miner (Salama and Abdelbar, 2014), a recently proposed ACO algorithm for learning FFNN topologies, allows connections to be generated between hidden neurons and other hidden neurons (under the restriction that the topology remains acyclic), as well as direct connections between input neurons and output neurons. This allows the algorithm to produce networks with a variable number of layers, as well as arbitrary connections that skip over layers.
As for the problem at hand, a candidate solution
is a network topology, and the solution components
are the possible connections (between input and hid-
den neurons, between hidden and output neurons, between pairs of hidden neurons, and between input and output neurons).
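As an illustration of this representation (an assumed encoding with hypothetical names, not necessarily the data structures used by ANN-Miner), the following Python sketch enumerates the allowed connection types and keeps a candidate topology acyclic while connections are added:

from itertools import product

def allowed_components(n_inputs, n_hidden, n_outputs):
    # Enumerate the possible connections (solution components):
    # input-to-hidden, hidden-to-hidden, hidden-to-output, input-to-output.
    inputs  = [f"i{k}" for k in range(n_inputs)]
    hidden  = [f"h{k}" for k in range(n_hidden)]
    outputs = [f"o{k}" for k in range(n_outputs)]
    return (list(product(inputs, hidden)) +
            [(a, b) for a, b in product(hidden, hidden) if a != b] +
            list(product(hidden, outputs)) +
            list(product(inputs, outputs)))

def creates_cycle(topology, connection):
    # True if adding the directed edge `connection` to the edge set
    # `topology` would create a cycle: follow edges forward from the
    # destination and check whether the source can be reached.
    src, dst = connection
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(d for s, d in topology if s == node)
    return False

# A candidate solution is a set of connections that remains acyclic.
topology = set()
for conn in allowed_components(n_inputs=2, n_hidden=2, n_outputs=1):
    if not creates_cycle(topology, conn):
        topology.add(conn)
print(sorted(topology))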