5 CONCLUSIONS
In this paper we proposed PCEM, a novel Parallel Classification System based on an Ensemble of Mixture of Experts, built on a MIMD architecture, in which a set of classifiers is combined by a weighted voting criterion.
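To illustrate the combination rule, the following is a minimal Octave sketch of a weighted voting criterion. It is not the authors' implementation; the variable names (preds, w, num_classes) are illustrative only.

% Weighted voting sketch: preds is an N x M matrix of predicted class
% labels (N samples, M classifiers) and w is a 1 x M vector of weights.
function labels = weighted_vote (preds, w, num_classes)
  [n, m] = size (preds);
  labels = zeros (n, 1);
  for i = 1:n
    scores = zeros (1, num_classes);
    for j = 1:m
      c = preds(i, j);
      scores(c) += w(j);            % accumulate the weight behind each vote
    end
    [~, labels(i)] = max (scores);  % class receiving the largest weighted vote
  end
end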
We implemented PCEM using GNU Octave as a parallel computing tool, a novel use of this environment for applications requiring parallel computation. Other tools, such as Hadoop MapReduce, were not considered at this time; implementing PCEM on big data frameworks is left as future work.
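As a rough illustration of how independent ensemble members can be evaluated in parallel from Octave, the sketch below uses the Octave Forge parallel package. This is an assumed setup, not the authors' implementation, and the placeholder training function and dummy data partitions are hypothetical.

% Parallel evaluation sketch (assumes the Octave Forge "parallel" package
% is installed: pkg install -forge parallel).
pkg load parallel;
nproc = 2;                                  % number of worker processes
partitions = {rand(100, 5), rand(100, 5)};  % dummy data, one block per expert
train = @(X) mean (X);                      % placeholder for training an expert
models = parcellfun (nproc, train, partitions, "UniformOutput", false);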
In every test performed with PCEM we were able to handle large amounts of data (datasets with up to 5.8 million records) while obtaining high accuracy. Table 4 shows that in each test PCEM achieves better accuracy than a set of sequential and parallel classifiers. It is worth mentioning that in a previous paper (Moreno-Montiel and MacKinney-Romero, 2011) we developed an ensemble-based classifier, called HCGW, which also improved accuracy, but at a considerable time cost.
The accuracy gain of PCEM over HCGW exceeds 10%; on the KDD Cup 1999 data set, for example, we obtained an improvement of 13.22% with respect to HCGW. HCGW itself, in contrast, achieved improvements of no more than 5% over traditional classifiers.
The runtimes of PCEM, reported in Table 3, show that every parallel WeLe achieves a reduction with respect to all versions of the sequential WeLe. Compared with HCGW, the runtime of PCEM represents only 6% of the total time HCGW needed, which is a substantial improvement in execution time.
The main future work consists in migrating PCEM to a larger cluster in order to test it with other data sets and on other parallel architectures.
REFERENCES
Graf, H. P. et al. (2005). Parallel support vector machines: The cascade SVM. In Advances in Neural Information Processing Systems, pages 521–528.
Houser, D. and Xiao, E. (2011). Classification of natural
language messages using a coordination game. Ex-
perimental Economics, 14:1–14.
Levchenko, K. et al. (2011). Click trajectories: End-to-end analysis of the spam value chain. In Proceedings of the IEEE Symposium on Security and Privacy.
Menahem, E., Rokach, L., and Elovici, Y. (2009). Troika -
an improved stacking schema for classification tasks.
Inf. Sci., 179(24):4097–4122.
Miller, D. J. and Uyar, H. S. (1997). A mixture of experts
classifier with learning based on both labeled and un-
labeled data. Neural Information Processing Systems,
9:571–577.
Moreno-Montiel, B. and MacKinney-Romero, R. (2011). A hybrid classifier with genetic weighting. In Proceedings of the Sixth International Conference on Software and Data Technologies, 2:359–364.
Moreno-Montiel, B. and MacKinney-Romero, R. (2013).
Paraltabs: A parallel scheme of decision tables. Mex-
ican International Conference on Computer Science.
Moreno-Montiel, B. and Moreno-Montiel, C. H. (2013). Prediction system of larynx cancer. In Proceedings of the Fourth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2013), 2:23–30.
Peralta, R. et al. (2010). Increased expression of cellular
retinol-binding protein 1 in laryngeal squamous cell
carcinoma. Journal of Cancer Research and Clinical
Oncology, 136:931–938.
Polikar, R. (2006). Ensemble based systems in decision
making. IEEE Circuits and Systems Mag., 6:21–45.
Rauber, T. and Rünger, G. (2010). Parallel Programming: for Multicore and Cluster Systems. Springer, 1st edition.
Serhat, O. and Yilmaz, A. (2002). Classification and predic-
tion in a data mining application. Journal of Marmara
for Pure and Applied Sciences, 18:159–174.
Sun, S. (2010). Local within-class accuracies for weight-
ing individual outputs in multiple classifier systems.
Pattern Recognition Letters, 31(2):119 – 124.
Sun, S. and Zhang, C. (2007). The selective random subspace predictor for traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 8(2):367–373.
Sun, S., Zhang, C., and Lu, Y. (2008). The random electrode selection ensemble for EEG signal classification. Pattern Recognition, 41(5):1663–1675.
Wu, X. et al. (2009). Top 10 algorithms in data mining.
Knowledge and Information Systems, 14:1–37.
Zhang, Y. et al. (2006). The study of parallel k-means al-
gorithm. Proceedings of the 6th World Congress on
Intelligent Control and Automation, pages 241–259.