classifiers. The literature offers a large number of these, and we cannot test every one of them, so we took as a starting point the paper entitled 'Top 10 algorithms in data mining' by J. Ross Quinlan (Quinlan et al., 2008), which presents the top ten classification algorithms.
Once we had decided which algorithms could be possible candidates, we applied selection criteria based on our problem; these criteria are listed below:
1. The implementation of a classifier should be simple.
2. The running time should be low.
3. Since the system is ensemble based, each individual classifier is not required to have a high accuracy; our objective is to combine various types of classifiers to improve the overall accuracy.
4. Finally, the selected classifiers must support large amounts of data.
After experimenting with classifiers from Quinlan's paper and some others, we found that several classifiers do not meet the criteria set out above; for instance, the Support Vector Machine meets criteria one and three, but not two and four, since it does not support large amounts of data and its running time is very high. We finally selected six classifiers that meet the criteria: five supervised learning algorithms (Naive Bayes, k-NN, Decision Tables, ADTree, C4.5) and one unsupervised learning algorithm (K-Means).
3.2 Structure of the Ensemble
Having these six classifiers, we must establish the structure of our ensemble-based algorithm. We chose a mixture of experts, stacking all the classifiers together. We selected Mixture of Experts because this type of ensemble-based system gives us the opportunity to use many different kinds of classifiers, which Bagging and Boosting do not. Combining this with a weighted voting approach is novel, and the results showed that it works well.
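To make the combination concrete, the following sketch (in Python) shows one way such a weighted voting could work; the classifier objects, their predict interface, and the weights are hypothetical placeholders rather than our exact implementation:

from collections import defaultdict

def weighted_vote(classifiers, weights, x):
    # Each trained expert votes for one class label; votes are
    # accumulated with the weight assigned to that classifier.
    scores = defaultdict(float)
    for model, w in zip(classifiers, weights):
        label = model.predict(x)
        scores[label] += w
    # The class with the largest accumulated weight wins.
    return max(scores, key=scores.get)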
3.3 Criterion for Combining the Individual Classifications
Each classifier considered has a different degree of accuracy, which is one of the characteristics of mixture of experts ensemble models; therefore, we must determine which criterion to use for combining the individual classifications when constructing the classifier $C_{T+1}$.
As seen in the previous section, it is common to use neural networks for the mixture of experts; however, neural networks have some issues, such as the generalisation problem, in which the network learns the training data correctly but is unable to deal with new data. Another problem arises when the gradient descent method is used to minimize the error, which runs the risk of becoming trapped in a local minimum and therefore not finding the best way to assign weights to the classifiers. To find the best way to weight each classifier while avoiding these problems, we assign the weights with a genetic algorithm instead. Genetic algorithms counter the problem of being trapped in local minima with the mutation operator, which reduces the probability that this occurs.
Since different weights give different accuracies, how can we know which configuration is best? Our answer was to apply a simple genetic algorithm in which each chromosome of the population represents the weights assigned to the classifiers. We chose six different classifiers, so the size of each chromosome in our genetic algorithm is six. Each gene of a chromosome encodes a weight taken from the set {0, 0.5, 1, 1.5, 2, ..., 4}, defined arbitrarily.
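The sketch below illustrates this encoding and the genetic operators in Python; the population size, mutation rate, number of generations, and the fitness function (which would measure ensemble accuracy for a given weight vector) are illustrative assumptions, not the exact parameters of our experiments:

import random

WEIGHT_VALUES = [0.5 * i for i in range(9)]   # the set {0, 0.5, 1, ..., 4}
N_CLASSIFIERS = 6                             # one gene (weight) per classifier

def random_chromosome():
    return [random.choice(WEIGHT_VALUES) for _ in range(N_CLASSIFIERS)]

def crossover(a, b):
    # Single-point crossover between two parent chromosomes.
    point = random.randint(1, N_CLASSIFIERS - 1)
    return a[:point] + b[point:]

def mutate(chromosome, rate=0.1):
    # Mutation re-draws a gene at random, which is the operator that
    # reduces the chance of getting trapped in a local optimum.
    return [random.choice(WEIGHT_VALUES) if random.random() < rate else g
            for g in chromosome]

def evolve(fitness, pop_size=20, generations=50):
    # fitness(chromosome) should return the ensemble accuracy achieved
    # with that weight vector; the fittest half survives each generation.
    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        offspring = [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return max(population, key=fitness)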
In order to find the best combination of weights assigned to the classifiers, we must set the sizes of the training and test sets used to obtain the individual accuracy of each classifier. Since we used a large DB with a total of 379,485 records, a 10% random sample of the DB was selected in order to avoid a long runtime. This was chosen as a good trade-off between accuracy and runtime, and it also falls in line with statistical sampling. For simple random sampling, as in our case, we have the following analysis to obtain the sample size. Considering a confidence level of 0.95, a maximum error of 0.1, and a pilot study that gives a variance of 154.5, the simple random sampling calculation yields:
$$ n' = \frac{z_{\alpha/2}^2 \cdot \sigma^2}{e^2} $$
where:
• $n'$ is the preliminary sample size,
• $z_{\alpha/2}^2$ is the squared critical value for the chosen confidence level,
• $\sigma^2$ is the population variance,
• $e$ is the maximum error.
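As a quick check, substituting the stated values (with the standard critical value $z_{\alpha/2} = 1.96$ for a 0.95 confidence level, $\sigma^2 = 154.5$, and $e = 0.1$) gives:

$$ n' = \frac{1.96^2 \cdot 154.5}{0.1^2} = \frac{593.53}{0.01} \approx 59{,}353 $$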
If it is true that $N > n'(n'-1)$, where $N$ is the total size of the data, then $n'$ is taken as the sample size; otherwise, a new sample size $n$ is calculated, as shown below: