4.4 Constructing NEViM
4.4.1 What Base Structure to Use for NEViM?
Since the aim of our model is to help a user decide
which data visualization to use, the obvious choice
seemed to be the structure of decision trees. A decision
tree has four main parts: a root node, internal
nodes, leaf nodes and branches. The biggest advantages
of decision trees are that they can help uncover
unknown alternative solutions to a problem and that
they are well suited for machine learning methods.
Once we determined that the decision tree was a
possible base structure, we needed to specify what
our root node, internal nodes, leaf nodes and branches
would be. It was clear that the leaf nodes would
be the different types of data visualizations since that
was the outcome that we wanted to achieve. The
root node, internal nodes and branches are inspired
by Akinator, the Web Genie. Akinator is a game that
attempts to determine which character the player is
thinking of by asking a series of questions. The structure
hidden under the user interface is a decision tree,
as in the case of NEViM.
Our model's root and internal nodes are questions
which possess the ability to clearly distinguish different
types of data visualizations. The branches are
'yes' or 'no' answers to those questions.
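The structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the NEViM implementation: the class names `Question` and `Leaf`, the function `recommend`, and the three example questions and chart types are hypothetical and were chosen purely to show how yes/no branches lead from a root question to a leaf.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """Leaf node: a recommended type of data visualization."""
    visualization: str


@dataclass
class Question:
    """Root or internal node: a yes/no question with one branch per answer."""
    text: str
    yes: "Node"
    no: "Node"


Node = Union[Question, Leaf]


def recommend(root: Node, answer: Callable[[str], bool]) -> str:
    """Follow 'yes'/'no' branches from the root until a leaf is reached."""
    node = root
    while isinstance(node, Question):
        node = node.yes if answer(node.text) else node.no
    return node.visualization


# Hypothetical example tree (questions and leaves are NOT taken from NEViM).
example_tree = Question(
    text="Do you want to show parts of a whole?",
    yes=Leaf("pie chart"),
    no=Question(
        text="Does your data change over time?",
        yes=Leaf("line chart"),
        no=Leaf("bar chart"),
    ),
)

# Simulated user answers, keyed by question text.
answers = {
    "Do you want to show parts of a whole?": False,
    "Does your data change over time?": True,
}
result = recommend(example_tree, lambda question: answers[question])
```

In an interactive tool, the `answer` callback would prompt the user instead of reading from a dictionary.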
4.4.2 What Questions to Ask? (Establishing the
Internal Nodes and Root Node)
The biggest challenge in constructing questions for
our model was that they must be understandable for
non-experts, yet every question should get the user
closer to a data visualization recommendation. This
means that the subjects of the questions must be features
that distinguish the different data visualizations
from each other. The key to solving this problem is to
base the questions on a clear classification hierarchy.
As far as we know, there is no single classification
hierarchy of data visualizations that is used globally.
We researched different methods of
classification and combined them to derive
a classification of our own. This was a very
time-consuming process. We went through a total of 19
books (O'Neil and Schutt, 2014; Kirk, 2016; Illinsky
and Steele, 2011; Munzner and Maguire, 2015;
Gnanamgari, 1981; Evergreen, 2016; Yau, 2011; Yau,
2013; Heer et al., 2010; Hardin et al., 2012; Yuk and
Diamond, 2014; Brath and Jonker, 2015; Börner and
Polley, 2014; Telea, 2007; Börner, 2015; Ware, 2010;
Ware, 2012; Stacey et al., 2015; Hinderman, 2015)
and for each one, we constructed a diagram showing
the classification that was described in the text.
We examined the classification hierarchies from
books together with hierarchies available from web
resources and existing tools. We also made note of
any advantages or disadvantages of a specific data
visualization, if they were listed. For example, in several
sources (O'Neil and Schutt, 2014; Kirk, 2016; Illinsky
and Steele, 2011) the authors stated that the pie
chart is not suitable when there are more than 7
parts. The advantages and disadvantages reflected
features of the data visualizations that could determine
whether they are candidates for recommendation or
not, so they are crucial for the final model.
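As a minimal sketch of how such a disadvantage could act as a filter on candidates, the pie chart guideline cited above can be written as a simple check. The text does not specify how NEViM encodes these rules, so the names `MAX_PIE_PARTS`, `pie_chart_is_candidate` and `filter_candidates`, as well as the example candidate list, are assumptions made for illustration.

```python
# Threshold taken from the guideline cited in the text:
# a pie chart is not suitable for more than 7 parts.
MAX_PIE_PARTS = 7


def pie_chart_is_candidate(number_of_parts: int) -> bool:
    """Return True if a pie chart remains a candidate for this many parts."""
    return number_of_parts <= MAX_PIE_PARTS


def filter_candidates(candidates: list[str], number_of_parts: int) -> list[str]:
    """Drop the pie chart from the candidates when the guideline rules it out.

    Hypothetical helper: other visualizations pass through unchanged.
    """
    return [
        name
        for name in candidates
        if name != "pie chart" or pie_chart_is_candidate(number_of_parts)
    ]


# Hypothetical candidate list, used only to exercise the filter.
few_parts = filter_candidates(["pie chart", "bar chart"], number_of_parts=5)
many_parts = filter_candidates(["pie chart", "bar chart"], number_of_parts=12)
```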
We identified that there are two basic views that
the classifications incorporate. The first one is a view
from the perspective of the task the user wants to perform.
The second is a view from the perspective of the
characteristics of the data the user has available. This
is in line with data characteristics and task-oriented
recommendation systems (Kaur and Owonibi, 2017).
We have identified a prominent issue in the classification
hierarchies: they mix different views into
one without making a clear distinction between them.
To avoid this issue, we have selected the root node of
our model to be a question which distinguishes
between the two views. The first view is from a
task-based perspective and it uses the representational goal
or user's intentions behind visualizing the data to
recommend a suitable visualization. The second view is
from a data-driven perspective, where a visualization
recommendation is made based on gathering information
about the user's data. The root node of NEViM
is a question asking "Do you know what your main
task is?" If the user answers "Yes", they are taken into
the task-based branch. If they answer "No", they are taken
straight into the data characteristics-based branch.
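The routing performed by the root node can be sketched as follows. The question text and the two branch names come from the description above; the function name `branch_from_root`, the constant names and the handling of unexpected input are assumptions made for illustration only.

```python
ROOT_QUESTION = "Do you know what your main task is?"

TASK_BASED = "task-based"
DATA_BASED = "data characteristics-based"


def branch_from_root(answer: str) -> str:
    """Map the user's answer to the root question onto one of the two branches."""
    normalized = answer.strip().lower()
    if normalized == "yes":
        return TASK_BASED
    if normalized == "no":
        return DATA_BASED
    # Assumption: any other input is rejected rather than guessed at.
    raise ValueError(f"Expected 'Yes' or 'No', got {answer!r}")


chosen_branch = branch_from_root("No")
```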
Once we established the root node, we had to
come up with internal nodes. The internal nodes are
questions which possess the ability to clearly distinguish
different types of data visualizations. The subject
of such a question must be something that we define
as a distinguishing feature. Based on the findings
described in the previous paragraphs, we have established
a list of distinguishing features and their hierarchy.
Based on the distinguishing features, we have
constructed questions that ask whether that feature is
present or not. You can see an example of such questions
in Figure 3.
4.4.3 What Data Visualizations to Include?
(Establishing the Leaf Nodes)
Once we had figured out our model's base structure,
distinguishing features and questions, the challenge
was deciding which data visualizations to include. We knew
that we would not be able to cover all the 300 types