Typically, the method can be used to discover patterns in raw
business data and to support the explanation of results. In future
work, we will conduct user experiments to evaluate how useful our
method is for interpreting the insights found in data. In addition,
we will apply the method to large real-world environmental datasets.
To this end, we will extend our approach to handle large datasets by
using incremental techniques for clustering and decision tree induction.
Decision Trees and Data Preprocessing to Help Clustering Interpretation