volume data such as a CT or MRI volume is a single-
valued function defined over a 3D domain. If we
extend the 3D domain to an N-dimensional feature
space, it defines an ML model where the function
value is the learning label, such as the classification
probability or the value of a predictive regression model.
The rendering of such a model is, however, more
challenging for several reasons. First, the concepts of
depth cues and visual perception do not exist in high-
dimensional space. Therefore, traditional rendering
operations such as blending and shading do not apply.
Second, for each pixel, sampling in the higher-
dimensional subspace orthogonal to the viewing
space does not have a simple order. Thus, cross-
sections and projections need to be carefully
redefined to generate meaningful visual
representations. Third, when the dimensionality of
the feature space is high, a 2D screen space is a very
narrow and limited viewing window. Thus, the
selection of and interaction with the viewing spaces
are important for the understanding and interpretation
of the model.
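The analogy between a volume and a model can be made concrete with a small sketch. The code below treats a model as a scalar function over an N-dimensional feature space and samples a 2D cross-section of it as a heatmap grid. It is an illustration under stated assumptions, not the rendering method of this paper: `model` is a hypothetical logistic scorer with made-up weights, and `cross_section`, `origin`, `u`, and `v` are names introduced here.

```python
import math

def model(x):
    """Hypothetical stand-in for a trained classifier over 4 features."""
    weights = [1.0, -2.0, 0.5, 0.0]  # made-up values, for illustration only
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))  # classification probability

def cross_section(f, origin, u, v, extent=1.0, steps=5):
    """Evaluate f on the plane origin + a*u + b*v with a, b in [-extent, extent]."""
    coords = [-extent + 2.0 * extent * i / (steps - 1) for i in range(steps)]
    image = []
    for b in coords:        # rows: offsets along the second viewing axis v
        row = []
        for a in coords:    # columns: offsets along the first viewing axis u
            point = [o + a * ui + b * vi for o, ui, vi in zip(origin, u, v)]
            row.append(f(point))
        image.append(row)
    return image

# View the model along features 0 and 1; features 2 and 3 stay fixed at the origin.
origin = [0.0, 0.0, 0.0, 0.0]
u = [1.0, 0.0, 0.0, 0.0]
v = [0.0, 1.0, 0.0, 0.0]
heatmap = cross_section(model, origin, u, v)
```

Each entry of `heatmap` is the model output at one pixel of the chosen viewing plane. Every feature outside the plane is held at a single value, which is why one such image is only a narrow window onto the model.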
In this paper, we propose a new visualization
technique to simulate a 3D volume rendering problem
for ML models. Our visualization technique uses an
interpolation-based subspace morphing algorithm
and a subspace sampling method to generate various
renderings through projections and cross-sections of
the model space as 3D surfaces or heatmap images.
We also apply our visualization technique to two
real-world datasets and applications: the diagnosis of
Alzheimer's Disease (AD) using a human brain
network dataset, and a real-world benchmark dataset
for predicting home credit default risks.
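To illustrate what a projection of the model space onto a 2D viewing space might involve, the sketch below aggregates model values over samples drawn from the dimensions orthogonal to the view. It is not the subspace sampling method proposed here, whose details are not given in this section. It assumes an axis-aligned viewing plane, uniform random sampling in [-1, 1], and a hypothetical `model`; `project` and `mean` are names introduced for this example.

```python
import random

def model(x):
    """Hypothetical stand-in for a trained regressor over 4 features."""
    return x[0] + x[1] * x[2] - 0.5 * x[3]

def mean(values):
    return sum(values) / len(values)

def project(f, n_dims, view_dims, steps=4, n_samples=200, aggregate=max, seed=0):
    """Aggregate f over sampled hidden dimensions for each pixel of the view."""
    rng = random.Random(seed)
    hidden = [d for d in range(n_dims) if d not in view_dims]
    # One shared set of samples in the hidden (orthogonal) dimensions,
    # so every pixel is aggregated over the same offsets.
    offsets = [[rng.uniform(-1.0, 1.0) for _ in hidden] for _ in range(n_samples)]
    coords = [-1.0 + 2.0 * i / (steps - 1) for i in range(steps)]
    image = []
    for b in coords:
        row = []
        for a in coords:
            values = []
            for off in offsets:
                x = [0.0] * n_dims
                x[view_dims[0]], x[view_dims[1]] = a, b
                for d, o in zip(hidden, off):
                    x[d] = o
                values.append(f(x))
            row.append(aggregate(values))
        image.append(row)
    return image

max_image = project(model, 4, (0, 1), aggregate=max)    # maximum projection
mean_image = project(model, 4, (0, 1), aggregate=mean)  # average projection
```

The choice of aggregate (maximum, mean, or another reduction) plays the role that compositing plays in 3D volume rendering. Because the hidden dimensions have no natural front-to-back order, this sketch uses order-independent reductions.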
2 RELATED WORK
Applying visualization and visual analytics principles
in interactive or human-in-the-loop ML has become
an active research area in recent years
(Chatzimparmpas, et al., 2020). Most of the existing
studies focus on using visualization for understanding
local decision-making processes of ML models
(Seifert, et al., 2017). There are also some recent
works on using visual analytics to improve the
performance of ML algorithms through better feature
selection or parameter setting (Endert, et al., 2017;
May, et al., 2011).
Previous work on using visualization to help
understand ML processes is usually designed for
specific types of algorithms, such as support vector
machines, neural networks, and deep neural
networks. Multi-dimensional visualization
techniques such as the scatterplot matrix have been used
to depict the relationships between different
components of neural networks (Zahavy, et al.,
2016; Rauber, et al., 2017). Typically, a learned
component is represented as a higher-dimensional
point. The 2D projections of these points in either
a principal component analysis (PCA) space or a
multi-dimensional scaling (MDS) space can better
reveal relationships among these components that are
not easily understood, such as clusters and outliers.
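A 2D PCA projection of high-dimensional points can be sketched in a few lines. The example below is a minimal pure-Python illustration, not the implementation used in the cited works: it finds the top two principal components by power iteration with deflation. The helpers `power_iteration` and `pca_2d` and the toy `points` are assumptions made for this example.

```python
import math

def power_iteration(matrix, iters=200):
    """Return (eigenvalue, unit eigenvector) of the dominant eigenpair."""
    dim = len(matrix)
    # Deterministic start vector; assumed not orthogonal to the dominant eigenvector.
    vec = [float(i + 1) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iters):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(dim) for j in range(dim))
    return value, vec

def pca_2d(points):
    """Project N-D points onto their top two principal components."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    components = []
    for _ in range(2):
        value, vec = power_iteration(cov)
        components.append(vec)
        # Deflate: subtract the found component so the next one dominates.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
    projected = [[sum(c[d] * comp[d] for d in range(dim)) for comp in components]
                 for c in centered]
    return projected, components

# Toy 3D points lying close to the line (t, 2t, 0).
points = [[-2.0, -4.0, 0.1], [-1.0, -2.0, -0.1], [0.0, 0.0, 0.1],
          [1.0, 2.0, -0.1], [2.0, 4.0, 0.1]]
projected, components = pca_2d(points)
```

In practice a library routine based on singular value decomposition would normally replace the hand-written eigen-solver. The pure-Python version is used here only to keep the example self-contained.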
Several methods apply graph visualization
techniques to visualize the topological structures of
the neural networks (Tzeng & Ma, 2005; Harley,
2015; Streeter, et al., 2001). Visual attributes of the
graph can be used to represent various properties of
the neural network models and processes.
Several recent studies addressed the challenges
of visualizing deep neural networks. In (Liu, et al.,
2017), a visualization system, CNNVis, was
developed to help ML experts understand deep
convolutional neural networks by clustering the
layers and neurons. Techniques have also been
developed to visualize the response of a deep neural
network to a specific input in a real-time dynamic
fashion (Yosinski, et al., 2015; Luisa, et al., 2017).
Observing the live activations that change in response
to user input helps build valuable intuitions about
how convnets work. Several studies
discuss the role of visualization in support vector
machines. In (Lim, 2014), visualization methods
were used to provide access to the distance measure
of each data point to the optimal hyperplane as well
as the distribution of distance values in the feature
space. In (Hamel, 2006), a multi-dimensional scaling
technique was used to project high-dimensional data
points and their clusters onto a two-dimensional map,
maintaining the topologies of the original clusters as
much as possible to preserve their support vector
models. In (Wang, et al., 2016), interactive volume
visualization was used to identify potential features
for classification of brain network data. Finally,
visualization has also been used to analyze the
performance of ML algorithms in different
applications (Alsallakh, et al., 2014; Ren, et al., 2017;
Chuang, et al., 2013).
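The distance measure to the separating hyperplane mentioned above for support vector machines can be illustrated with a short sketch. It assumes a linear SVM whose decision function is w·x + b; the values of `w` and `b`, the sample points, and the helper names `signed_distance` and `histogram` are made up for illustration and are not taken from (Lim, 2014). For kernel SVMs the distance is measured in the induced feature space, which this sketch does not cover.

```python
import math

def signed_distance(x, w, b):
    """Signed Euclidean distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def histogram(values, lo, hi, bins):
    """Count values in equal-width bins over [lo, hi]; out-of-range values are skipped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for val in values:
        if val < lo or val > hi:
            continue
        idx = min(int(math.floor((val - lo) / width)), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical linear SVM parameters and data points.
w = [3.0, 4.0]
b = -5.0
samples = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.5], [-1.0, -3.0]]

distances = [signed_distance(x, w, b) for x in samples]
distribution = histogram(distances, lo=-5.0, hi=5.0, bins=5)
```

The sign of each distance indicates the predicted side of the hyperplane and the magnitude indicates how far the point lies from the decision boundary. The histogram summarizes how these distances are distributed over the samples.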
Compared to the visualization of local ML
processes, relatively few techniques are known
for the global visualization of an ML model
as a whole. The Manifold system (Zhang, et al., 2019)
provides a generic framework that does not rely on or
access the internal logic of the model and solely
observes the input and output. It applies scatterplot
matrix visualization to input and output
samples to evaluate model performance and behavior.