with $Q^{i}_{k}$, corresponding to the formula without it. If $Q^{i+1}_{k} > Q^{i}_{k}$, the term is accepted and added to the formula $F_k$. If not, the term is discarded, a new draw is performed, and a tentative counter is incremented. When a term is added to $F_k$, the tentative counter is reset and the process for choosing terms begins again. The parameter M controls how many consecutive times the algorithm insists on trying to improve the quality criterion by attaching terms. When the counter reaches this preset value, the algorithm records the formula $F_k$ obtained up to that moment and begins a new cycle with the next ant, k + 1. Another stopping criterion is the maximum number of terms L that the formula can have.
After ant k has completed its formula $F_k$, the pheromone amount $\tau_i$ for each position i of the formula $F_k$ is updated following expression (2). Then, when the next ant k + 1 starts to construct its own formula, it takes into account the pheromone deposited on the trails formed by the previous ants. This process is repeated until the maximum number of ants, specified by the parameter R, is reached.
A simple description of the algorithm is the following:
Set the initial trail matrix (i = 0)
Calculate $\eta^{i}_{0}$ for each term appearing in the trail matrix
Calculate $P^{i}_{0}$ for each term appearing in the trail matrix
For each ant k = 1 to R
    Do while the tentative counter < M and $F_k$ has fewer than L terms
        Choose a term at random for position i according to $P^{i}_{k}$
        Calculate $Q^{i+1}_{k}$ for $F_k$ with the randomly picked term attached
        If $Q^{i+1}_{k} > Q^{i}_{k}$
            Attach the term to $F_k$
            Increment i
            Reset the tentative counter
            Calculate $\eta^{i}_{k}$ for each term in the trail matrix
            Calculate $P^{i}_{k}$ for each term in the trail matrix
        Else
            Increment the tentative counter
    End Do
    Update the pheromone in each segment forming $F_k$
    Reset the tentative counter
    Reset the term position (i = 0)
    Reset $\eta^{i}_{k}$
    Calculate $P^{i}_{k}$ for each term in the trail matrix
End Loop
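As a rough illustration of the inner accept/reject loop for a single ant, the following Python sketch may help. It is not the authors' implementation: the names `build_formula`, `quality` and `weights` are hypothetical, the quality criterion Q is supplied by the caller, and the recalculation of η and P as well as the pheromone update of expression (2) are deliberately left out.

```python
import random


def build_formula(terms, weights, quality, M=5, L=10, rng=None):
    """Build one ant's formula by repeated random draws (illustrative only).

    terms   -- candidate terms
    weights -- selection weights standing in for the probabilities P (assumed fixed here)
    quality -- callable mapping a list of terms to a score Q (assumed, caller-supplied)
    M       -- number of consecutive rejected draws tolerated
    L       -- maximum number of terms in the formula
    """
    rng = rng or random.Random()
    formula = []
    best_q = quality(formula)
    failures = 0  # plays the role of the tentative counter
    while failures < M and len(formula) < L:
        term = rng.choices(terms, weights=weights, k=1)[0]
        q_new = quality(formula + [term])
        if q_new > best_q:
            # The draw improves the criterion: keep the term, reset the counter.
            formula.append(term)
            best_q = q_new
            failures = 0
        else:
            # The draw does not improve the criterion: discard it.
            failures += 1
    return formula, best_q


# Toy usage: "terms" are integers and the made-up quality rewards sums close to 7.
toy_quality = lambda f: -abs(sum(f) - 7)
formula, q = build_formula([1, 2, 3, 4], [1, 1, 1, 1], toy_quality,
                           M=5, L=10, rng=random.Random(0))
```

The loop stops either after M consecutive rejected draws or once the formula holds L terms, mirroring the two stopping criteria described in the text.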
4 NUMERICAL EXPERIMENT
Formula Miner was tested on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which contains 569 cases, 2 classes and 30 numerical attributes. According to Mangasarian et al. (1995), 3 of these 30 attributes are more relevant for diagnosis than the rest. We therefore selected these three to search for a mathematical relationship that separates malignant from benign breast cancer cases. The attributes are: mean texture (x), worst smoothness (y), and worst area (z) of the analyzed cells (Mangasarian et al., 1995). Because these attributes differ in scale, it was necessary to include scale factors: $x_1 \equiv x$, $x_2 \equiv 10y$, $x_3 \equiv z/10$.
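The scale factors can be written as a small helper. This is only a sketch: the function name `scale_attributes` is hypothetical and the input values below are made up for illustration, not taken from the dataset.

```python
def scale_attributes(x, y, z):
    """Return (x1, x2, x3) with x1 = x, x2 = 10*y and x3 = z/10."""
    return x, 10.0 * y, z / 10.0


# Made-up values for mean texture, worst smoothness and worst area.
x1, x2, x3 = scale_attributes(20.0, 0.125, 900.0)
print(x1, x2, x3)  # 20.0 1.25 90.0
```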
Formula Miner was then applied to search for a mathematical relationship among these three attributes that correctly classifies malignant and benign cases. The goal is to find a formula that expresses one attribute in terms of the other two. The attributes are related by the function $x_3 = F(x_1, x_2)$.
The accuracy of the algorithm was evaluated using ten-fold cross-validation (Stone, 1974). The database is divided into ten equal parts. One part is set aside and the algorithm is applied to the other nine parts. The resulting formulas are then tested on the part that was set aside. In this procedure, every case is used exactly once for testing and nine times for running Formula Miner.
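A ten-fold split of this kind can be sketched in Python as follows. The function name `ten_fold_indices`, the shuffling and the seed are assumptions made for illustration; the text does not say how the cases were assigned to the ten parts.

```python
import random


def ten_fold_indices(n_cases, n_folds=10, seed=0):
    """Yield (train, test) index lists, one pair per held-out fold."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # assumed: random assignment to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


n_cases = 512 + 57  # sizes of the training and test parts reported in the text
splits = list(ten_fold_indices(n_cases))
```

Because the number of cases is not a multiple of ten, the parts cannot be exactly equal; in this sketch nine parts hold 57 cases and one holds 56.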
The accuracy rate of each evaluation is defined as the ratio of the number of correctly classified cases to the total number of tested cases, using the heuristic function given by expression (3). The final accuracy rate is the arithmetic mean of the accuracy rates over the cross-validation rounds, reported with the corresponding standard deviation.
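The two quantities can be computed as in the sketch below. The function names are hypothetical, the class labels are made up, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def accuracy(predicted, actual):
    """Fraction of cases whose predicted class matches the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def summarize(rates):
    """Return (mean, sample standard deviation) of per-round accuracy rates."""
    return mean(rates), stdev(rates)


# Made-up labels: three of four cases are classified correctly.
rate = accuracy(["M", "B", "B", "M"], ["M", "B", "M", "M"])
print(rate)  # 0.75
```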
We chose the following parameter values: maximum number of ants R = 15; maximum value of the tentative counter M = 5; α = 4; β = 5; γ = 3. The number of records used in each training round was 512; the number of records used for validation was 57.
Our results were compared with those obtained by applying well-known algorithms to the same database (with the same 3 attributes), all using a ten-fold cross-validation procedure. The algorithms and their respective accuracies are: Ant Miner (Parpinelli et al., 2002): (96.0 ± 1.0)%; C4.5 (Quinlan, 1993): (95.0 ± 0.3)%; Formula Miner: (81.4 ± 7.6)%; MSM-T (Mangasarian et al., 1995): 97.5%.
To comment properly on these results, we must note how the approaches and purposes of these algorithms differ from those of Formula Miner. MSM-T (Multisurface Method Tree) looks for multiple planes (which cannot form a function), while Formula Miner looks for a single mathematical relationship that defines just one continuous surface to classify the diagnosis. In C4.5 and Ant Miner, the attributes were used as discrete ranges of values, and their results depend on the adopted ranges. Formula Miner deals with con-
ICAART 2009 - International Conference on Agents and Artificial Intelligence