Authors:
Elio Ventocilla and Maria Riveiro
Affiliation:
School of Informatics, University of Skövde, Skövde, Sweden
Keyword(s):
Growing Neural Gas, Dimensionality Reduction, Multidimensional Data, Visual Analytics, Exploratory Data Analysis.
Related Ontology Subjects/Areas/Topics:
Abstract Data Visualization; Computer Vision, Visualization and Computer Graphics; General Data Visualization; Large Data Visualization; Visual Data Analysis and Knowledge Discovery; Visual Representation and Interaction
Abstract:
This paper argues for the use of a topology learning algorithm, the Growing Neural Gas (GNG), to provide an overview of the structure of large, multidimensional datasets for exploratory data analysis. We introduce a generic, off-the-shelf library, Visual GNG, built on the Big Data framework Apache Spark. It provides an incremental visualization of the GNG training process and enables user-in-the-loop interaction: users can pause, resume, or steer the computation by changing optimization parameters. Nine case studies were conducted with domain experts from different areas, each working on unique real-world datasets. The results show that Visual GNG contributes to understanding the distribution of multidimensional data; identifying which features are relevant to that distribution; estimating the number of clusters, k, to be used in traditional clustering algorithms such as K-means; and finding outliers.
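To make the idea of topology learning concrete, the following is a minimal single-machine sketch of the commonly described GNG training loop, written in plain Python. It is not the Visual GNG library and does not use Apache Spark; the function names (train_gng, count_components), all parameter names and default values, and the synthetic two-blob dataset are assumptions chosen for illustration. Counting the connected components of the learned graph is shown as one possible way to obtain a rough estimate of k.

```python
import random

def train_gng(data, max_units=30, steps=4000, eps_b=0.05, eps_n=0.006,
              max_age=50, insert_every=100, alpha=0.5, decay=0.995, seed=0):
    """Illustrative GNG loop; every parameter name and default is an assumption."""
    rng = random.Random(seed)
    units = {0: list(rng.choice(data)), 1: list(rng.choice(data))}  # id -> position
    error = {0: 0.0, 1: 0.0}                                        # id -> accumulated error
    edges = {}                                                      # frozenset({a, b}) -> age
    next_id = 2

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    for step in range(1, steps + 1):
        x = rng.choice(data)
        # Find the nearest (s1) and second-nearest (s2) units to the sample.
        ranked = sorted(units, key=lambda u: dist2(units[u], x))
        s1, s2 = ranked[0], ranked[1]
        error[s1] += dist2(units[s1], x)
        # Move the winner toward the sample.
        units[s1] = [w + eps_b * (xi - w) for w, xi in zip(units[s1], x)]
        # Age the winner's edges and move its neighbours slightly toward the sample.
        for e in list(edges):
            if s1 in e:
                edges[e] += 1
                (n,) = e - {s1}
                units[n] = [w + eps_n * (xi - w) for w, xi in zip(units[n], x)]
        # Create or refresh the edge between the two nearest units.
        edges[frozenset((s1, s2))] = 0
        # Drop edges that are too old, then units left without any edge.
        for e in [e for e, age in edges.items() if age > max_age]:
            del edges[e]
        connected = set().union(*edges)
        for u in [u for u in units if u not in connected]:
            del units[u]
            del error[u]
        # Periodically insert a unit between the worst unit and its worst neighbour.
        if step % insert_every == 0 and len(units) < max_units:
            q = max(error, key=error.get)
            neighbours = [next(iter(e - {q})) for e in edges if q in e]
            f = max(neighbours, key=error.get)
            r, next_id = next_id, next_id + 1
            units[r] = [(a + b) / 2 for a, b in zip(units[q], units[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, r))] = 0
            edges[frozenset((r, f))] = 0
            error[q] *= alpha
            error[f] *= alpha
            error[r] = error[q]
        # Decay all accumulated errors.
        for u in error:
            error[u] *= decay
    return units, edges

def count_components(units, edges):
    """Count connected components of the learned graph (a rough hint for k)."""
    adjacency = {u: set() for u in units}
    for e in edges:
        a, b = tuple(e)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, count = set(), 0
    for u in units:
        if u in seen:
            continue
        count += 1
        stack = [u]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adjacency[v] - seen)
    return count

# Synthetic demo data (an assumption): two well-separated 2-D Gaussian blobs.
demo_rng = random.Random(1)
data = [(demo_rng.gauss(cx, 0.3), demo_rng.gauss(cy, 0.3))
        for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(200)]
units, edges = train_gng(data)
k_estimate = count_components(units, edges)
print("units:", len(units), "edges:", len(edges), "components:", k_estimate)
```

Because the loop processes one sample at a time, it could in principle be paused between iterations, and parameters such as max_age or eps_b could be changed before resuming. This is the kind of steering the abstract describes, although how Visual GNG implements it is not specified here.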