Knowledge at First Glance: A Model for a Data Visualization Recommender System Suited for Non-expert Users

Petra Kubernátová, Magda Friedjungová and Max van Duijn

Leiden Institute of Advanced Computer Science, Leiden University, Netherlands

Faculty of Information Technology, Czech Technical University in Prague, Czech Republic

Keywords:

Data Visualization, Recommender System, Non-experts, Model.

Abstract:

In today's age, huge amounts of data are generated every second of every day. Through data visualization, humans can explore, analyse and present these data. Choosing a suitable visualization for data is a difficult task, especially for non-experts. Current data visualization recommender systems exist to aid in choosing a visualization, yet suffer from issues such as low accessibility and indecisiveness. The aim of this study is to create a model for a data visualization recommender system for non-experts that resolves these issues. Based on existing work and a survey among data scientists, requirements for a new model were identified and implemented. The result is a question-based model that uses a decision tree and a data visualization classification hierarchy in order to recommend a visualization. Furthermore, it incorporates both task-driven and data characteristics-driven perspectives, whereas existing solutions seem to either conflate these or focus on one of the two exclusively. Based on testing against existing solutions, it is shown that the new model reaches similar results while being simpler, clearer, more versatile, extendable and transparent. The presented model can be applied in the development of new data visualization software or as part of a learning tool.

1 INTRODUCTION

In today's age, huge amounts of data are generated every second of every day, and Big Data has been one of the hot topics of computer science in recent years. Being the curious species that we are, humans are looking for ways to get the most information out of the vast amount of data available at our fingertips. We are always looking for methods to help us explore, analyze and present it.

A crucial part of this process is data visualization. Data visualization is the representation of information in a visual form, such as a chart, diagram or picture. It finds its place in a variety of areas such as art, marketing, social relations and scientific research. There were over 300 visualization types available at the time of writing this paper (Bostock, 2017). But how do we choose the most suitable one? This is where data visualization recommender systems come in: these systems help with this difficult task, which becomes even more difficult when the user is a non-expert.

In this paper we define a 'non-expert user' as someone without professional or specialized knowledge of data visualization. We thus include both complete beginners and users who have general knowledge of data visualization types (e.g. bar charts, pie charts, scatter plots) but have no professional experience in the fields of data science and data communication.

In this study we focus on building a model for a data visualization recommender system aimed at non-expert users. We term our model NEViM: Non-Expert Visualization Model.

Section 2 of this paper places data visualization recommender systems for non-experts in the context of data science. We discuss different types of systems and comment on where the model we are building fits in. Section 3 introduces our research goal and the method we use to fulfill it. Section 4 discusses the results of the work done within our method: we present the results of our literature study, existing solutions analysis, survey, model requirements, model construction process and model testing process. We draw conclusions in Section 5 and set an agenda for future work in Section 6.


Kubernátová, P., Friedjungová, M. and Duijn, M.

Knowledge at First Glance: A Model for a Data Visualization Recommender System Suited for Non-expert Users.

DOI: 10.5220/0006851302080219

In Proceedings of the 7th International Conference on Data Science, Technology and Applications (DATA 2018), pages 208-219

ISBN: 978-989-758-318-6

2 CONTEXT

2.1 Data Science

Data science plays an important role in scientific research, as it aids us in collecting, organizing, and interpreting data, so that it can be transformed into valuable knowledge.

[Figure 1 diagram. Stages shown: Raw Data is Collected; Data is Processed; Clean Data; Exploratory Data Analysis; Machine Learning Algorithms, Statistical Models; Communicate Results.]

Figure 1: The data science process (O'Neil and Schutt, 2014).

Figure 1 shows a simplified diagram of the data science process. First, real world raw data is collected, processed and cleaned through a process called data munging. Then exploratory data analysis (EDA) follows, during which we might find that we need to collect more data or dedicate more time to cleaning and organizing the current dataset. When finished with EDA, we may use machine learning algorithms, statistical models and data visualization techniques, depending on the type of problem we are trying to solve. Finally, results can be communicated (O'Neil and Schutt, 2014).

Our focus here is on the part of the process concerning exploratory data analysis. EDA uses a variety of statistical techniques and principles of machine learning but also, crucially, the data visualization techniques we study in this paper. Please note that data visualization can also be a part of the Communicate Results stage of the data science process (see Figure 1). There is a thin line between data visualizations made for exploration and ones made for explanation, as most exploratory data visualizations also contain some level of explanation and vice versa.

2.2 Exploratory Data Analysis

Exploratory data analysis (EDA) is not only a critical part of the data science process, it is also a kind of philosophy. You are aiming to understand the data and its shape and connect your understanding of the process that collected the data with the data itself. EDA helps with suggesting hypotheses to test, evaluating the quality of the data, identifying potential need for further collection or cleaning, supporting the selection of appropriate models and techniques and, most importantly for the context of this study, finding interesting insights in your data (Tukey, 1970).

2.3 Data Visualization

There are many definitions of the term data visualization. The one used in this study is: data visualization is the representation and presentation of data to facilitate understanding. According to Kirk, our eye and mind are not equipped to easily translate the textual and numeric values of raw data into quantitative and qualitative meaning. "We can look at the data, but we cannot understand it. To truly understand the data, we need to see it in a different kind of form. A visual form." (Kirk, 2016)

Illinsky and Steele describe data visualization as a very powerful tool for identifying patterns, communicating relationships and meaning, inspiring new questions, identifying sub-problems, identifying trends and outliers, and discovering or searching for interesting or specific data points (Illinsky and Steele, 2011).

Tamara Munzner made a 3-step model for data visualization design. According to this model, we first need to decide what we want to show. Secondly, we need to motivate why we want to show it. Finally, we need to decide how we are going to show it (Munzner and Maguire, 2015). There are many different types of data visualizations to help us with the third step. However, the challenge remains in choosing the most suitable one. Data visualization recommender systems were made to help with this difficult task. We find that the WHAT and the WHY greatly influence the HOW, thus we aim to build a system that reflects all three aspects of the data visualization design process in some way.

2.4 Data Visualization Recommender

Systems

Within this study we define data visualization recommender systems as tools that seek to recommend visualizations which highlight features of interest in data. This definition is based on combining common aspects of definitions in existing work.

While the output of data visualization recommender systems is always a recommendation for data visualization types in some shape or form, the input can differ. It can be, for example, just the data itself, a specification of goals or a specification of aesthetic preferences. The type of input affects the type of recommendation strategy used and consequently the type of the recommender system.

Kaur and Owonibi distinguish 4 types of recommender systems (Kaur and Owonibi, 2017):

• Data Characteristics Oriented. These systems recommend visualizations based on data characteristics.


• Task Oriented. These systems recommend visualizations based on representational goals as well as data characteristics.

• Domain Knowledge Oriented. These systems improve the visualization recommendation process with domain knowledge.

• User Preferences Oriented. These systems gather information about the user's presentation goals and preferences through user interaction with the visualization system.

The line between different categories of recommendation systems is rather thin and some systems can have ambiguous classifications, as will be discussed below.

3 METHOD

Within this study our aim is to devise a new data visualization recommender system which is simple and easy to use for non-experts, but can nonetheless compete with existing, often more complex systems. Clearly, we want to avoid reinventing the wheel: the current solutions are already good, but we want to see if we can make adjustments that make a system more suitable for non-expert users while maintaining effectiveness (still clearly distinguishing the data visualizations from each other) and performance (recommending the most suitable visualization type).

We begin by conducting a literature study of previous work done in the field of data visualization recommender systems. We focus on data characteristics-oriented and task-oriented data visualization recommender systems, as this is where our model belongs. The study helps us identify aspects of current solutions which could be utilized in our model and determine which solutions are suitable for the testing of our model.

Next, we run a survey among different data science communities on Facebook and LinkedIn. This way, we reach 88 respondents who have some sort of familiarity with data science and its terminology. The main goals of the survey are to aid us in decisions about our model and, as our model is aimed at non-expert users, to aid us in specifying who exactly these users are.

The findings we make from the literature study, as well as the results of the survey, help us form requirements for our model.

Once we have the requirements, we commence constructing the model. First we choose a suitable base structure. Then we establish the different components of the structure and specify what they will be in our model. Finally, we combine it all together.

We perform two tests on the constructed model. The first test focuses on establishing whether the model is able to produce results similar or identical to existing solutions. The second focuses on testing the extendibility of the model by adding a new type of visualization.

4 RESULTS

4.1 Existing Solutions Study

4.1.1 Data Characteristics Oriented Systems

Systems based on data characteristics aim to improve the understanding of the data, of different relationships that exist within the data and of procedures to represent them. Some of the following tools and techniques are not recommendation systems per se, but they were a crucial part of the history of this field and foundations for the other recommender systems discussed here, thus we feel it is appropriate to list them as well.

BHARAT

BHARAT was the first system that proposed some rules for determining which type of visualization is appropriate for certain data attributes (Gnanamgari, 1981). As this work was written in 1981, the set of possible visualizations was not as varied as it is today. The system incorporated only line, pie and bar charts. If the function was continuous, a line chart was recommended. If the user indicated that the range sets could be summed up to a meaningful total, a pie chart was recommended. Bar charts were recommended in all the remaining cases. Even though this system would now be considered very basic, it served as the foundation for other systems that followed.
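
The three rules described above can be sketched in a few lines of Python. This is only an illustration of the rule set as summarised here, not the original BHARAT implementation: the function name, the two boolean parameters and the order in which the checks are applied are assumptions.

```python
def bharat_chart(is_continuous: bool, sums_to_meaningful_total: bool) -> str:
    """Pick a chart type following the three rules summarised in the text.

    Assumption: the continuity check takes precedence over the
    'meaningful total' check; the text does not state the precedence.
    """
    if is_continuous:
        return "line chart"
    if sums_to_meaningful_total:
        return "pie chart"
    # All remaining cases fall back to a bar chart.
    return "bar chart"
```

For example, a discrete function whose range values add up to a meaningful total would yield a pie chart under these assumptions.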

APT

In 1986, Mackinlay proposed to formalize and codify the graphical design specification to automate the graphics generation process (Mackinlay, 1986). His work is based on the work of Jacques Bertin, who, in 1983, came up with a semiology of graphics (Bertin, 1983), where he specified visual variables such as position, size, value, color, orientation etc. and classified them according to which features they communicate best. Mackinlay codified Bertin's semiology into algebraic operators that were used to search for effective presentations of information. He based his findings on the principles of expressiveness and effectiveness. Expressiveness is the idea that graphical presentations are actually sentences of graphical languages, and effectiveness refers to how accurately these presentations are perceived. He would take the encoding technique and formalize it with a primitive graphical language (which data visualizations can show this), then he would order these primitive graphical languages using the effectiveness principle.

VizQL (Visual Query Language)

In 2003, Hanrahan revised Mackinlay's specifications into a declarative visual language known as VizQL (Hanrahan, 2006). It is a formal language for describing tables, charts, graphs, maps and time series. The language is capable of translating actions into a database query and then expressing the response graphically.

Tableau and Its Show Me Feature

The introduction of Tableau was a real milestone in the world of data visualization tools. Due to the simple user interface, even inexperienced users could create data visualizations. It was created when Stolte, together with Hanrahan and Chabot, decided to commercialize a system called Polaris (Stolte et al., 2002) under the name Tableau Software. In 2007 Tableau introduced a feature called Show Me (Mackinlay et al., 2007). The Show Me functionality takes advantage of VizQL to automatically present data. At the heart of this feature is a data characteristics-oriented recommendation system. The user selects the data attributes of interest and Tableau recommends a suitable visualization. Tableau determines the proper visualization type to use by looking at the types of attributes in the data. Each visualization requires specific attribute types to be present before it can be recommended. Furthermore, it also ranks every visualization on familiarity and design best practices. Finally, it recommends the highest-ranked eligible visualization. Mackinlay and his team have also performed interesting user tests with the Show Me feature. They found that the Show Me feature is used only modestly by skilled users (i.e. in only 5.6% of cases).
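
The general pattern described here (filter the visualizations whose attribute requirements are met, then return the highest-ranked one) can be illustrated with a small Python sketch. The rule table, the attribute type names and the rank values below are hypothetical and chosen only for illustration; they are not Tableau's actual rules or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChartRule:
    name: str
    min_categorical: int   # categorical attributes required for eligibility
    min_quantitative: int  # quantitative attributes required for eligibility
    rank: int              # higher is preferred; stands in for familiarity / best practice


# Hypothetical rule table, NOT the rules used by Show Me.
RULES = [
    ChartRule("text table", min_categorical=1, min_quantitative=0, rank=1),
    ChartRule("bar chart", min_categorical=1, min_quantitative=1, rank=2),
    ChartRule("scatter plot", min_categorical=0, min_quantitative=2, rank=3),
]


def recommend_chart(attribute_types: List[str]) -> Optional[str]:
    """Return the highest-ranked chart whose requirements are satisfied."""
    n_cat = attribute_types.count("categorical")
    n_quant = attribute_types.count("quantitative")
    eligible = [
        rule for rule in RULES
        if n_cat >= rule.min_categorical and n_quant >= rule.min_quantitative
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda rule: rule.rank).name
```

Note that a single visualization is returned because the ranking breaks ties between eligible candidates; without the ranking step the result would be a set.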

ManyEyes

Viegas et al. created the first known public website where users could upload data and create interactive visualizations collaboratively: ManyEyes (Viegas et al., 2007). Design choices were made to reflect the effort to find a balance between powerful data-analysis capabilities and accessibility to the non-expert visualization user. The visualizations were created by matching a dataset with one of the 13 types of data visualizations implemented in the tool. They divided the data visualizations into groups by data schemas. A data schema could be, for example, single column textual data. Thus, a bar chart was described as single column textual data and more than one numerical value. The tool closed down in 2015.

Watson Analytics

Since 2014, IBM have been developing a tool called Watson Analytics (IBM, 2017). Watson Analytics uses principles of machine learning and natural language processing to recommend users either questions they can ask about their data, or a specific visualization. However, little is known about how the recommendation system works.

Microsoft Excel’s Recommended Charts Feature

In the 2013 release of Microsoft Excel, a new feature called Recommended Charts was introduced. The user can select the data they want to visualize and Excel recommends a suitable visualization (Microsoft, 2017). However, Microsoft does not share exactly how this process is carried out, making it less suitable as a source of inspiration.

SEEDB

In 2015 Vartak et al. proposed an engine called SEEDB (Vartak et al., 2015). They judge the interestingness of a visualization based on the following theory: a visualization is likely to be interesting if it displays large deviations from some reference (e.g. another dataset, historical data, or the rest of the data). This helps them identify the most interesting visualizations from a large set of potential visualizations. They identified that there are more aspects that determine the interestingness of a visualization, such as aesthetics, user preference, metadata and user tasks. A full-fledged visualization recommendation system should take into account a combination of these aspects. A major disadvantage of SEEDB is that it only uses variations of bar charts and line charts. As far as we know SEEDB was never deployed.
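
The deviation idea can be made concrete with a short Python sketch. It assumes that each candidate view is an aggregate per group (e.g. a bar chart), that the aggregates are non-negative, and that deviation is measured as the Euclidean distance between the normalised target and reference distributions. The choice of distance and all function names are assumptions made for illustration, not a description of the SEEDB implementation.

```python
import math
from typing import Dict, List, Tuple

Distribution = Dict[str, float]  # group label -> aggregate value


def normalise(values: Distribution) -> Distribution:
    """Scale aggregate values so that they sum to 1."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("aggregate values must have a positive sum")
    return {group: value / total for group, value in values.items()}


def deviation(target: Distribution, reference: Distribution) -> float:
    """Euclidean distance between the two normalised distributions."""
    p, q = normalise(target), normalise(reference)
    groups = set(p) | set(q)
    return math.sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in groups))


def rank_views(views: Dict[str, Tuple[Distribution, Distribution]]) -> List[str]:
    """Order candidate views from largest to smallest deviation."""
    return sorted(views, key=lambda name: deviation(*views[name]), reverse=True)
```

A view whose target data is distributed exactly like the reference gets a deviation of zero and ends up at the bottom of the ranking.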

Voyager

In 2016, Wongsuphasawat et al. developed a visualization recommendation web application called Voyager (Wongsuphasawat et al., 2016), based on the Compass recommendation engine (Wongsuphasawat, 2017) and a high-level specification language called Vega-lite (Satyanarayan et al., 2017). It couples browsing with visualization recommendation to support exploration of multivariate, tabular data.


Google Sheets and Its Explore Feature

Google Sheets (Google, 2017) is a tool which allows users to create, edit and share spreadsheets. It was introduced in 2007 and is very similar to Microsoft Excel. In June of 2017, the tool was extended with the Explore Feature, which helps with automatic chart building and data visualization. It uses elements of artificial intelligence and natural language processing to recommend users questions they might want to ask about their data, as well as recommending data visualizations that best suit their data. In the documentation for this feature, Google specifies each of the included data visualizations using functions and conditions that have to be fulfilled in order for that particular data visualization to be recommended. However, a couple of visualizations have the same conditions and it is not revealed how the most suitable data visualization is chosen.

4.1.2 Task Oriented Systems

Task-oriented systems aim to design different techniques to infer the representational goal or a user's intentions. In 1990 Roth and Mattis were the first to identify different domain-independent information seeking goals, such as comparison, distribution, correlation etc. (Roth and Mattis, 1990). Also in 1990, Wehrend and Lewis proposed a classification scheme based on sets of representational goals (Wehrend and Lewis, 1990). It was in the form of a 2D matrix where the columns were data attributes, the rows representational goals and the cells data visualizations. To find a visualization, the user had to divide the problem into subproblems, until for each subproblem it was possible to find an entry in the matrix. A representation for the original complex problem could then be found by combining the candidate representation methods for the subproblems. Unfortunately, the complete matrix was not published, so it is unknown which specific types of data visualizations were included.
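
The matrix lookup described above can be sketched as a dictionary keyed by (representational goal, data attribute). Since the complete matrix was not published, every cell below is a hypothetical placeholder, and the goal names, attribute names and function name are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Subproblem = Tuple[str, str]  # (representational goal, data attribute type)

# Hypothetical cell contents: the original matrix is not available.
MATRIX: Dict[Subproblem, List[str]] = {
    ("compare", "quantitative"): ["bar chart"],
    ("correlate", "quantitative"): ["scatter plot"],
    ("distribution", "quantitative"): ["histogram"],
}


def candidate_representations(
    subproblems: List[Subproblem],
) -> Dict[Subproblem, List[str]]:
    """Look up every subproblem; fail if one still has no matrix entry."""
    missing = [s for s in subproblems if s not in MATRIX]
    if missing:
        # Mirrors the instruction to keep dividing the problem until
        # each subproblem matches a cell of the matrix.
        raise LookupError(f"no matrix entry for {missing}; divide the problem further")
    return {s: MATRIX[s] for s in subproblems}
```

Combining the returned candidates into one representation for the original problem is left to the user, as in the scheme described in the text.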

IMPROVISE

In the previous studies, the user task list was manually created. However, in 1998, Zhou and Feiner introduced advanced linguistic techniques to automate the derivation of the user task from a natural language query (Zhou and Feiner, 1998). They introduced a visual task taxonomy to automate the process of gaining presentation intents from the text. For example, the visual task Focus implies that visual techniques such as Enlarge or Highlight could be used. This taxonomy is implemented in IMPROVISE. Zhou and Feiner show how IMPROVISE generates a visual narrative from speech to present an overview of a hospital patient's information to a nurse. To achieve this goal, it constructs a structure diagram that organizes various information (e.g. IV lines) around a core component (the patient's body). In a top-down design manner, IMPROVISE first creates an 'empty' structure diagram and then populates it with components by partitioning and encoding the patient information into different groups.

HARVEST

In 2009 Gotz and Wen introduced a novel behavior-driven approach (Gotz and Wen, 2009). Instead of needing explicit task descriptions, they use implicit task information obtained by monitoring users' behavior to make recommendation more effective. The Behavior-Driven Visualization Recommendation (BDVR) approach has two phases. In the first phase of BDVR, they detect four predefined patterns from user activity. In the second phase, they feed the detected patterns into a recommendation algorithm, which infers user intent in terms of common visual tasks (e.g. comparison) and suggests visualizations that better support the user's needs. The inferred visual task is used together with the properties of the data to retrieve a list of potentially useful visual metaphors from a visualization example corpus made by Zhou and Chen (Zhou et al., 2002). It contains over 300 examples from a wide variety of sources. Unfortunately, we were not able to access this corpus.

All in all, we identify some pitfalls of the existing systems: they are not accessible enough, too complicated, too formal and too secretive when it comes to their recommendation process. The biggest pitfall is that the result of their recommendation process is most commonly a set of data visualizations, which, in our opinion, leaves the users only slightly further than where they started, because they still have to choose the most suitable visualization. The possibilities have been narrowed, but a decision still must be made. We hope to avoid these pitfalls within our model. We establish that we are going to test our model against the solutions available to us: Tableau, Watson Analytics, Excel, Voyager and Google Sheets. Please note that we are going to compare against the recommendation system features of the tools, not the tools as a whole.

4.2 Exploratory Survey

We run a survey among different data science communities on Facebook and LinkedIn. This way, we get respondents who have some sort of familiarity with data science and its terminology. The main goals of the survey are to aid us in making decisions about our model and specifying the term non-expert user.

4.2.1 Participants

In total, we gathered 88 valid responses (n=88). Out of the 88 respondents, 78% (n=69) were male and 22% (n=19) female. The average age was 29.86 years.

We asked the respondents to indicate their knowledge level on a scale of 1 to 10, 1 being beginner and 10 being expert. The average knowledge level was 5.70. We opted to divide the scale into three ranges in the following way: 1-3 are beginners, 4-7 are non-experts and 8-10 are experts. According to our ranges we had 26% (n=23) beginner level, 44% (n=39) non-expert level and 30% (n=26) expert level respondents.
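
The grouping and the reported shares can be reproduced with a few lines of Python. The function names are illustrative; the thresholds and counts are the ones stated above, and rounding to whole percentages is an assumption about how the reported figures were obtained.

```python
def knowledge_group(level: int) -> str:
    """Map a self-reported knowledge level (1-10) to one of the three groups."""
    if not 1 <= level <= 10:
        raise ValueError("knowledge level must be between 1 and 10")
    if level <= 3:
        return "beginner"
    if level <= 7:
        return "non-expert"
    return "expert"


def share(count: int, total: int) -> int:
    """Percentage of respondents, rounded to a whole number."""
    return round(100 * count / total)


# Group sizes as reported in the text (they add up to n=88).
GROUP_COUNTS = {"beginner": 23, "non-expert": 39, "expert": 26}
TOTAL = sum(GROUP_COUNTS.values())
SHARES = {group: share(count, TOTAL) for group, count in GROUP_COUNTS.items()}
```

With these counts, `SHARES` matches the percentages given for the three groups.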

4.2.2 Results

We make the following findings from the results of our survey:

• For all groups, the main purpose of making data visualizations was analysis (65% of beginners, 64% of non-experts, 58% of experts).

• All types of users choose data visualizations mainly according to the characteristics of their data (57% of beginners, 62% of non-experts, 65% of experts) and the tasks that they want to perform (48% of beginners, 51% of non-experts, 62% of experts).

• For all groups, the two most used visualizations are bar charts (17% of beginners, 38% of non-experts, 35% of experts) and scatter plots (43% of beginners, 26% of non-experts, 31% of experts).

• All groups were mostly unable to name an existing data visualization recommendation system (0% able vs. 100% unable for beginners, 5% able vs. 95% unable for non-experts and 4% able vs. 96% unable for experts).

• All groups would be willing to use a data visualization recommendation system, although experts were less willing than beginners and non-experts (100% willing vs. 0% not willing for beginners, 87% willing vs. 13% not willing for non-experts and 77% willing vs. 23% not willing for experts).

To summarize, we have learned that non-experts make data visualizations mainly for the purpose of analysis. When they select a suitable data visualization type, they do so according to the characteristics of their data and the tasks they want to perform. Their most used visualization types are bar charts and scatter plots. They are not familiar with data visualization recommender systems but are mostly willing to use one. We also learned that there is not much difference between the approaches of beginners, non-experts and expert users, which was unexpected.

4.3 Model Requirements

Based on research of previous approaches to our problem and the results of our survey, we have identified the following requirements which NEViM should fulfill:

1. Simplicity - The model should be simple enough to be used by non-experts. It must have good flow and a very straightforward base structure.

2. Clarity - We aim for the result of our recommendation system to be one data visualization, not a set, as in some current tools. This means that the underlying classification hierarchy of data visualizations must be clear and unambiguous.

3. Versatility - We want our model to combine different kinds of recommendation systems. From our survey we learn that when users select a suitable data visualization type, they do so based on the characteristics of their data and the tasks they want to perform. Based on this we incorporate a data characteristics-oriented and a task-oriented approach. Furthermore, we want our model to be easily implemented in different programming languages and environments.

4. Extensibility - Our aim is for our model to be easily extendable. We want the process of adding visualizations into the model to be simple. We want it to be a useful skeleton which can be easily extended to include automatic visualizations etc.

5. Education - We want our model to function not only as a recommender system, but also as a learning tool.

6. Transparency - Once we recommend a visualization, we want the users to see why the particular visualization was recommended, meaning that the path to a visualization recommendation through our model has to be retraceable.

7. Self-learning - We want our model to be able to improve itself. This means, amongst other things, that it should be machine learning friendly.

8. Competitiveness - We want our model to still produce results which are comparable to results from other systems.


4.4 Constructing NEViM

4.4.1 What Base Structure to Use for NEViM?

Since the aim of our model is to help a user decide which data visualization to use, the obvious choice seemed to be the structure of decision trees. A decision tree has four main parts: a root node, internal nodes, leaf nodes and branches. The biggest advantages of decision trees are that they can help uncover unknown alternative solutions to a problem and that they are well suited for machine learning methods.

Once we determined that the decision tree was a possible base structure, we needed to specify what our root node, internal nodes, leaf nodes and branches would be. It was clear that the leaf nodes would be the different types of data visualizations since that was the outcome that we wanted to achieve. The root node, internal nodes and branches are inspired by Akinator, the Web Genie. Akinator is a game that attempts to determine which character the player is thinking of by asking a series of questions. The structure hidden under the user interface is a decision tree, as in the case of NEViM.

Our model's root and internal nodes are questions which possess the ability to clearly distinguish different types of data visualizations. The branches are 'yes' or 'no' answers to those questions.

4.4.2 What Questions to Ask? (Establishing the Internal Nodes and Root Node)

The biggest challenge in constructing questions for our model was that they must be understandable for non-experts, yet every question should get the user closer to a data visualization recommendation. This means that the subjects of the questions must be features that distinguish the different data visualizations from each other. The key to solving this problem is to base the questions on a clear classification hierarchy.

As far as we know, there is no one specific classification hierarchy of data visualizations which would be used globally. We researched different methods of classification and combined them together to derive a classification of our own. This was a very time-consuming process. We went through a total of 19 books (O'Neil and Schutt, 2014; Kirk, 2016; Illinsky and Steele, 2011; Munzner and Maguire, 2015; Gnanamgari, 1981; Evergreen, 2016; Yau, 2011; Yau, 2013; Heer et al., 2010; Hardin et al., 2012; Yuk and Diamond, 2014; Brath and Jonker, 2015; Börner and Polley, 2014; Telea, 2007; Börner, 2015; Ware, 2010; Ware, 2012; Stacey et al., 2015; Hinderman, 2015) and for each one, we constructed a diagram showing the classification that was described in the text.

We examined the classification hierarchies from books together with hierarchies available from web resources and existing tools. We also made note of any advantages or disadvantages of a specific data visualization, if they were listed. For example, in several sources (O'Neil and Schutt, 2014; Kirk, 2016; Illinsky and Steele, 2011) the authors stated that the pie chart is not suitable when you have more than 7 parts. The advantages and disadvantages reflected features of the data visualizations that could determine whether they are candidates for recommendation or not, so they are crucial for the final model.

We identified that there are two basic views that the classifications incorporate. The first one is a view from the perspective of the task the user wants to perform. The second is a view from the perspective of the characteristics of the data the user has available. This is in line with data characteristics and task oriented recommendation systems (Kaur and Owonibi, 2017).

We have identified a prominent issue in the classification hierarchies: they mix different views into one without making a clear distinction between them. To avoid this issue, we have selected the root node of our model to be a question which distinguishes between the two views. The first view is from a task-based perspective and it uses the representational goal or the user’s intentions behind visualizing the data to recommend a suitable visualization. The second view is from a data-driven perspective, where a visualization recommendation is made based on gathering information about the user’s data. The root node of NEViM is a question asking “Do you know what your main task is?” If the user answers “Yes”, they are taken into the task-based branch. If they answer “No”, they are taken straight into the data characteristics-based branch.

Once we established the root node, we had to come up with internal nodes. The internal nodes are questions which possess the ability to clearly distinguish different types of data visualizations. The subject of such a question must be something that we define as a distinguishing feature. Based on the findings described in the previous paragraphs, we have established a list of distinguishing features and their hierarchy. Based on the distinguishing features, we have constructed questions that ask whether that feature is present or not. An example of such questions can be seen in Figure 3.
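The structure described so far (a root question that separates the two views, internal yes/no questions, and visualizations as leaves) can be sketched in a few lines of Python. This is only a minimal illustration, not the NEViM implementation: the class names, the `recommend` function and the tiny three-question tree are assumptions made for the example, and the leaves chosen here are placeholders rather than the model's actual outcomes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Union


@dataclass
class Leaf:
    """A leaf node: the recommended data visualization."""
    visualization: str


@dataclass
class Question:
    """A root or internal node: a yes/no question with one branch per answer."""
    text: str
    yes: Union[Question, Leaf]
    no: Union[Question, Leaf]


def recommend(node: Union[Question, Leaf], answers: Iterable[bool]) -> str:
    """Follow the answers (True = yes, False = no) from the root to a leaf."""
    answers = iter(answers)
    while isinstance(node, Question):
        node = node.yes if next(answers) else node.no
    return node.visualization


# Illustrative toy tree; the leaves are placeholders, not NEViM's real paths.
task_branch = Question("Is your main task to compare?",
                       yes=Leaf("Bar Chart"), no=Leaf("Table"))
data_branch = Question("Is your data statistical?",
                       yes=Leaf("Histogram"), no=Leaf("Table"))
root = Question("Do you know what your main task is?",
                yes=task_branch, no=data_branch)

print(recommend(root, [True, True]))    # task-based branch
print(recommend(root, [False, True]))   # data characteristics-based branch
```

Because every node is a plain yes/no question, the sequence of answers that leads to a leaf doubles as an explanation of why that visualization was recommended.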

4.4.3 What Data Visualizations to Include? (Establishing the Leaf Nodes)

Once we had figured out our model’s base structure, distinguishing features and questions, the challenge was which data visualizations to include. We knew that we would not be able to cover all the 300 types of data visualizations available (Bostock, 2017) in the initial version of our model. We took a rather quantitative approach to the problem. We went through all the different classification hierarchies we constructed previously and extracted a list of the data visualizations that occur. We removed duplicates (different names for the same visualization, different layouts of the same visualization) and we counted how many times each data visualization occurred. The ones that occurred 5 times or more were included in our final model. The final list contains 29 data visualizations and is shown in Table 1. Since one of our requirements for the final model is easy extensibility, we feel that 29 data visualizations are appropriate for the initial model.

DATA 2018 - 7th International Conference on Data Science, Technology and Applications

Table 1: Data visualizations included in NEViM.

Bar Chart              Bubble Chart     Cartogram
Choropleth Map         Clustered Bar    Connected Dot
Connection Map         Density Plot     Dot Map
Flow Map               Heat Map         Histogram
Line Chart             Network          Pie Chart
Proportional Map       Radar Plot       Scatter Plot
SPLOM                  Slope Graph      Small Multiples
Stacked Area           Stacked Bar      Stacked Line
Table                  Timeline         Treemap
Parallel Coordinates
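The selection procedure (merge duplicate names, count occurrences across the collected hierarchies, keep everything that occurs 5 times or more) can be sketched as follows. The alias map, the function name and the sample input are hypothetical, and the sketch assumes that a visualization is counted at most once per hierarchy, which the text does not specify.

```python
from collections import Counter

# Hypothetical alias map: alternative names or layouts -> canonical name.
ALIASES = {"column chart": "bar chart", "treemap": "tree map"}


def select_visualizations(hierarchies, min_count=5):
    """Return the canonical names that occur in at least `min_count` hierarchies."""
    counts = Counter()
    for hierarchy in hierarchies:
        # A set ensures each visualization is counted once per hierarchy.
        canonical = {ALIASES.get(name.lower(), name.lower()) for name in hierarchy}
        counts.update(canonical)
    return sorted(name for name, n in counts.items() if n >= min_count)


# Made-up input: five hierarchies list bar and pie charts, a sixth one lists
# a column chart (an alias of the bar chart) and a Sankey diagram.
sample = [["Bar Chart", "Pie Chart"]] * 5 + [["Column Chart", "Sankey Diagram"]]
selected = select_visualizations(sample)
print(selected)
```

With this made-up input, the Sankey diagram occurs only once and is therefore left out, while the bar chart benefits from the merged alias.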

4.4.4 Putting It All Together

We classified each of our leaf nodes (data visualizations) using the distinguishing features we constructed previously. This revealed the path of internal nodes and branches that leads to a certain leaf node. In other words, it revealed which questions have to be answered and how in order to get to a certain data visualization.

We then combined all the classifications together to construct the final model. The model has 107 internal nodes and 105 leaf nodes. The model always results in a recommendation. If no other suitable visualization is found, we recommend using a table by default. Tableau does this as well.
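How classifying each leaf by its distinguishing features yields a path of questions, and how the default table recommendation fits in, can be illustrated with a simplified sketch. It assumes that every feature is asked in one fixed order, which is a simplification of the hierarchy used in NEViM; the feature names, the two classifications and the function names are illustrative assumptions.

```python
DEFAULT = "Table"  # fallback leaf when no other visualization matches


def build_tree(classifications, features):
    """Build nested dicts keyed by True/False, one level per feature question."""
    tree = {}
    for visualization, present in classifications.items():
        node = tree
        for feature in features[:-1]:
            node = node.setdefault(feature in present, {})
        # The last feature decides which leaf sits at the end of the path.
        node[features[-1] in present] = visualization
    return tree


def lookup(tree, features, answers):
    """Answer one yes/no question per feature; fall back to DEFAULT on a dead end."""
    node = tree
    for feature in features:
        node = node.get(answers[feature])
        if node is None:
            return DEFAULT
    return node


# Illustrative classification only: which features each visualization shows.
FEATURES = ["proportions", "hierarchy"]
CLASSIFICATIONS = {
    "Pie Chart": {"proportions"},
    "Tree Map": {"proportions", "hierarchy"},
}
TREE = build_tree(CLASSIFICATIONS, FEATURES)

print(lookup(TREE, FEATURES, {"proportions": True, "hierarchy": True}))
print(lookup(TREE, FEATURES, {"proportions": False, "hierarchy": False}))
```

Note that two visualizations with an identical classification would overwrite each other in this sketch; that is exactly the situation in which a further distinguishing feature is needed.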

4.5 Testing the Model

4.5.1 Can the Model Compete with Existing Solutions?

We carried out tests to determine whether our model was able to compete with existing systems in terms of similarity of solutions. We obtained 10 different test data sets with various features (see Table 2). The data sets were preprocessed to remove invalid entries and to ensure that all the attributes were of the correct data type.

The whole model as well as a prototype can be viewed at a website dedicated to this research project: http://www.datavisguide.com

For each data set, we formulated an example question that a potential user is aiming to answer. This was done in order to determine which attributes of the data would be used in the recommendation procedure. Most existing tools require the user to select the specific attributes that they want to use for their data visualization. By specifying these for each data set we attempt to mimic this behavior. Table 2 shows the data sets along with their descriptions.

We tested our model against existing solutions which are freely available: Tableau (10.1.1), Watson Analytics (version available in July 2017), Microsoft Excel (15.28 Mac), Voyager (2) and Google Sheets (version available in July 2017). For each system and every data set, we aimed to obtain a recommendation for a data visualization that would answer the question and incorporate all the specified attributes in one graph, as there is no way to answer the question without incorporating the specified attributes. Some systems solve more complex questions by creating a series of different data visualizations, with each visualization incorporating a different combination of attributes. We excluded such solutions from our test results because we consider them a workaround. For Microsoft Excel and Google Sheets, the recommendation process results in several recommendations and the systems do not rank them. For these cases we recorded all valid recommendations.

Results

For data set 1, all systems recommended a bar chart. Excel and Google Sheets also recommended a pie chart. The recommendations for data set 2 were either line charts or bar charts. The specified question could be answered by either of these. Watson Analytics was not able to give a recommendation because it could not recognize that the average price attribute was a number. We attempted to resolve this issue but were not able to. For data set 3, the majority recommendation was a clustered bar chart, in line with the recommendation made by NEViM. Data set 4 proved to be challenging for Voyager and Watson Analytics. Since the data was hierarchical and the question was asking to see parts-of-whole, a suitable solution would be a tree map. A pie chart shows parts-of-whole, but does not indicate hierarchy. The question asked for data set 5 could be answered using different types of data visualizations. Since it is asking to analyze the correlation between 2 variables, a scatter plot is a suitable solution. All systems recommended it. Data set 6 was an example of a social network, thus the most suitable visualization would be a network. However, the specified question could also be answered with a scatter plot, as suggested by Voyager. This is because networks can also be represented as adjacency matrices and the scatter plot generated by Voyager is essentially an adjacency matrix. Data set 7 and its question were aimed at visualizing distributions. Distributions can be visualized with, among others, histograms, scatter plots and line charts. Data set 8 was an example of spatial data. Spatial data is best visualized through maps. Tableau offers map visualizations but we suspect that it cannot plot on the map according to latitude and longitude coordinates. Watson Analytics and Google Sheets have the same issue. Microsoft Excel and Voyager do not support maps at all. In data set 9 the answer to the question was revealed through comparing 7 attributes. This meant that the visualization has to support 7 different variables. Both a stacked line chart and parallel coordinates are valid solutions. The final data set 10 was again spatial. This time it could be solved through plotting on a map but also by analyzing the distribution of the data set. Both a proportional symbol map and a connection map (as a flight implies a connection between two cities) are valid solutions.

Table 2: Results of the competitiveness test.

Data set 1: Favourite subjects within a class of students
    Question: What does the composition of the data look like?
    Used attributes: subject, no. of students
    Excel: bar chart, pie chart
    Google Sheets: bar chart, pie chart
    Tableau: bar chart
    Voyager: bar chart
    Watson Analytics: bar chart
    NEViM: bar chart

Data set 2: Average prices of cigarettes over several years
    Question: What was the development of the cigarette price over the years?
    Used attributes: year, average price
    Excel: line chart, bar chart
    Google Sheets: line chart
    Tableau: line chart
    Voyager: [unclear]
    Watson Analytics: none
    NEViM: line chart

Data set 3: Percentage of men and women in EU countries for 2016
    Question: Which 5 countries have the highest percentage of females?
    Used attributes: country, % of men, % of women
    Excel: clustered bar chart, scatter plot, stacked bar chart
    Google Sheets: clustered bar chart
    Tableau: proportional symbol map
    Voyager: scatter plot
    Watson Analytics: clustered bar chart
    NEViM: clustered bar chart

Data set 4: Causes of death in Kenya in 2012
    Question: How big of a part does each cause take?
    Used attributes: cause of death, no. of deaths, % of total
    Excel: none
    Google Sheets: pie chart
    Tableau: tree map
    Voyager: none
    Watson Analytics: none
    NEViM: tree map

Data set 5: Daily ice cream sales information with temperature
    Question: Are ice cream sales related to the weather?
    Used attributes: income, temperature
    Excel: scatter plot, clustered bar chart, line chart, stacked bar chart
    Google Sheets: line chart, scatter plot, clustered bar chart
    Tableau: scatter plot
    Voyager: scatter plot
    Watson Analytics: scatter plot
    NEViM: scatter plot

Data set 6: Email communication between researchers working together
    Records: 461
    Question: Which researcher is connected to most people?
    Used attributes: sender, receiver
    Excel: none
    Google Sheets: none
    Tableau: none
    Voyager: scatter plot
    Watson Analytics: [unclear]
    NEViM: network

Data set 7: Finishing times of runners in the 2014 Boston Marathon
    Records: 32K
    Question: Which finishing time interval was the most common?
    Used attributes: finishing time
    Excel: scatter plot, line chart
    Google Sheets: line chart, histogram
    Tableau: [unclear]
    Voyager: [unclear]
    Watson Analytics: [unclear]
    NEViM: histogram

Data set 8: Records of UFO sightings with detailed information
    Records: 80K
    Question: Are there any clusters of locations where UFOs have been seen more often?
    Used attributes: latitude, longitude
    Excel: none
    Google Sheets: none
    Tableau: none
    Voyager: [unclear]
    Watson Analytics: [unclear]
    NEViM: dot map

Data set 9: List of cars and their parameters
    Records: 393
    Question: Are there any relationships between the different parameters?
    Used attributes: miles per gallon, no. of cylinders, displacement, horsepower, weight, acceleration, year
    Excel: stacked line chart
    Google Sheets: none
    Tableau: none
    Voyager: [unclear]
    Watson Analytics: [unclear]
    NEViM: parallel coordinates

Data set 10: Origins and destinations of flights within the US
    Question: Which city has the most ingoing and outgoing flights?
    Used attributes: flight origin, flight destination
    Excel: none
    Google Sheets: none
    Tableau: proportional symbol map
    Voyager: [unclear]
    Watson Analytics: [unclear]
    NEViM: connection map

Cells marked [unclear] could not be assigned to a row; the remaining unassigned values for data sets 6 to 10 are, in source order: none, histogram, none.

Overall, we can observe that NEViM provided usable solutions in all cases. The users have several paths that they can take through NEViM to get to a recommendation, depending on what information they know about their data or their task. NEViM has the advantage that it is not limited by implementation. Since two of our data sets were aimed at spatial data visualization (8 and 10) and one at network data visualization (6), some systems were not able to make recommendations simply because they do not support such visualization types. Furthermore, NEViM includes more types of visualizations than any of the current systems, which results in recommendations for specialty visualizations that can be more suitable for a certain task. Another advantage is that it always results in only one recommendation, unlike Microsoft Excel or Google Sheets, where the user has to choose which one out of the set of recommendations to use. According to our survey, the most used visualization tool which incorporates a recommender system is Tableau (28% of non-expert respondents). From the result table, we can see that in 5 out of 7 valid cases, NEViM made the same recommendation as Tableau. Furthermore, in data set 3 Tableau also made a recommendation for a clustered bar chart, like NEViM did, but it was not the resulting recommendation. One of the attributes was the name of a country, so Tableau evaluated the data as spatial. We have noticed that whenever there is a geographical attribute, Tableau prefers to recommend maps, even though they might not be the most suitable solution.

4.5.2 Adding a New Data Visualization

We demonstrate that our model is easily extensible by showing the process of adding a new data visualization type: a Sankey diagram. Sankey diagrams are a specific type of flow diagram that displays quantities in proportion to one another. An example of a Sankey diagram can be seen in Figure 2.

Figure 2: Example of a Sankey diagram showing the distribution of energy in a filament lamp (BBC, 2016).

We look into the classifications that we already have and search for the most similar one. We find that the Tree Map has the same classification, so we need to find a distinguishing feature between a Tree Map and a Sankey diagram. That feature is that a Sankey diagram shows flow. We search through the model and find the occurrences of a Tree Map. We then add a question asking “Do you want to show flow?”. If the user answers “Yes”, they get a recommendation for a Sankey diagram. If they answer “No”, they get a Tree Map. Figure 3 shows the two paths that a user of NEViM can take to get to the Sankey diagram.
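The extension step can be expressed as a small tree transformation: every leaf that recommends the most similar existing visualization is replaced by a new question whose “Yes” branch leads to the new visualization. The sketch below is an illustration under assumptions; the class names, the `split_leaf` helper and the two-leaf example tree are hypothetical and do not reproduce the actual NEViM paths.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """A recommended data visualization."""
    visualization: str


@dataclass
class Question:
    """A yes/no question with one subtree per answer."""
    text: str
    yes: Union["Question", Leaf]
    no: Union["Question", Leaf]


def split_leaf(node, target, question, new_visualization):
    """Return a copy of the tree in which every leaf named `target` becomes a
    question: 'yes' leads to the new visualization, 'no' to the old one."""
    if isinstance(node, Leaf):
        if node.visualization == target:
            return Question(question, yes=Leaf(new_visualization), no=Leaf(target))
        return node
    return Question(
        node.text,
        yes=split_leaf(node.yes, target, question, new_visualization),
        no=split_leaf(node.no, target, question, new_visualization),
    )


# Hypothetical fragment of a model before the extension.
fragment = Question("Do you want to show hierarchy?",
                    yes=Leaf("Tree Map"), no=Leaf("Pie Chart"))
extended = split_leaf(fragment, "Tree Map",
                      "Do you want to show flow?", "Sankey Diagram")
print(extended.yes.text)
```

Returning a new tree instead of modifying the old one keeps the previous version of the model available for comparison; this is a design choice of the sketch, not something the paper prescribes.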

5 DISCUSSION & CONCLUSIONS

We managed to build a model for a data visualization recommender system suited to non-experts, called NEViM. Through testing, we have shown that the resulting recommendations are similar or identical to the ones generated by existing solutions. Based on a review of existing work and an exploratory survey among users, we have put together requirements. This is a short evaluation of how NEViM managed to fulfill these:

1. Simplicity - Thanks to its question-based structure, using the model is simple. The user only has to answer yes or no questions. The basic structure is very straightforward.

2. Clarity - The result of our recommendation system is a single data visualization, making it very clear. We believe that non-expert users need a clear answer to their visualization problem. If they are given a choice between two or more visualizations in the end, we believe that we have failed at the task of recommending them the most suitable one. We have narrowed their choices, but still have not provided a clear answer. However, this decision seems to be a controversial one, so it needs to be validated through a user study (see Section 6). In the case that none of the data visualizations within the model are determined as suitable, the model still makes a recommendation to visualize using a table.

3. Versatility - NEViM combines two different types of data visualization recommendation systems as defined in (Kaur and Owonibi, 2017): task-oriented and data characteristics-oriented. These two types are distinguished by two different starting points within our model. Thanks to its base structure, the model can be easily implemented in various programming languages and environments.

4. Extensibility - To illustrate the extensibility of the model, we have added the Sankey diagram visualization. This proved to be a doable task.

5. Education - This requirement has not been met yet. For suggestions on how we mean to fulfill it, see Section 6.

6. Transparency - The traversal through our model is logical enough that it is clear why a certain type of data visualization was recommended.

7. Self-learning - Our model is machine learning friendly and techniques can be applied for it to be able to self-learn. See Section 6.

8. Competitiveness - Through testing we have shown that our model produces recommendations similar or identical to existing solutions. It provided suitable solutions for all cases tested, unlike existing solutions.

A possible disadvantage of NEViM could be that the user has to either know what their main task is, or know what type of data they have. The question is whether non-expert users will be able to determine this. We believe that this could be addressed through user testing to validate the overall structure of the model as well as the quality of the questions. The questions could be checked by a linguistics expert to see whether the wording is suitable and does not lead to possible ambiguous interpretations.

[Figure 3 shows two flowcharts of yes/no questions, each ending in the leaf “Sankey Diagram”. Questions in the left flowchart: Do you know what your main task is? Is your main task to compare? Is your main task to analyse a specific data feature? Do you want to compare proportions? Do you want to compare proportions over time? Do you want to show hierarchy? Do you want to show flow? Questions in the right flowchart: Do you know what your main task is? Is your data statistical? Do you want to compare? Do you want to compare quantities? Do you want to compare proportions? Do you want to compare proportions over time? Do you want to show hierarchy? Do you want to show flow?]

Figure 3: Two possible paths to reach a Sankey diagram (left: task-based, right: data-based).

Another disadvantage might lie in the fact that, since we use data science terminology in our questions, non-experts might not be familiar with it and might not be able to answer the question. A solution could be to clarify the terms using a dictionary definition, which could pop up when the user hovers over the unfamiliar term. This solution belongs to the implementation phase rather than the theoretical phase which we discuss here.

A difficulty in the usability of our model might be that the traversal through it is quite lengthy. This is due to the chosen question-based approach. A potential fix for this could be to present some parts of the model in the form of a multiple choice question. This way, the user could see beforehand what other options are available and might find a more suitable task they want to perform. This is once again a problem that could be fixed more easily in the implementation phase.
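One way to read the multiple choice idea is to collapse a chain of questions that are linked by their “No” branches into a single list of options. The following sketch shows this under assumptions: the class names, the `as_multiple_choice` helper and the example chain (including its leaves) are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """A recommended data visualization."""
    visualization: str


@dataclass
class Question:
    """A yes/no question with one subtree per answer."""
    text: str
    yes: Union["Question", Leaf]
    no: Union["Question", Leaf]


def as_multiple_choice(node):
    """Walk along the 'no' branches and collect (question, yes-subtree) pairs.

    Returns the list of options and the node reached when every option is
    declined, which serves as the fallback."""
    options = []
    while isinstance(node, Question):
        options.append((node.text, node.yes))
        node = node.no
    return options, node


# Hypothetical chain of task questions with placeholder leaves.
chain = Question(
    "Is your main task to compare?",
    yes=Leaf("Bar Chart"),
    no=Question(
        "Is your main task to analyse a specific data feature?",
        yes=Leaf("Histogram"),
        no=Leaf("Table"),
    ),
)

options, fallback = as_multiple_choice(chain)
for number, (text, _) in enumerate(options, start=1):
    print(number, text)
```

Selecting option k then corresponds to answering “No” to the first k-1 questions and “Yes” to the k-th, so the underlying yes/no model stays unchanged.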

We have questioned whether the choice to recommend a table when no other suitable visualization is found is the correct one. There is an ongoing debate about when it is best not to visualize things, as discussed by Stephanie Evergreen (Evergreen, 2016). Within the implementation phase, data could be collected to find out in how many cases the Table option is reached, to identify whether it is necessary to further address this issue.

6 FUTURE WORK

We have shown that there is a place for our model in the data science world. The logical next step would be to perform more tests with more data sets and make improvements to the model. Then the model could be tested with non-expert users. Such a user study could evaluate the usability of the model as well as its contribution.

The model could be implemented as a web application and users could rate the resulting recommendations, suggest new paths through the model or request new visualization types to be included. This would also validate the question paths that we have designed. The final recommendation could be enhanced with useful information about the data visualization type, tips on how to construct it, which tools to use and examples of existing instances. This would transform the model into a useful educational tool and fulfill the Education requirement that we have set.

Another possible extension to the model could be to add another view which would incorporate information about the domain that the user’s data comes from. There are data visualizations that are more suited for a specific data domain than others. For example, the area of economics has special types of data visualizations that are more suited to exposing different economic indicators. This would make the model part of the domain knowledge-oriented category of data visualization recommender systems according to (Kaur and Owonibi, 2017).

Thanks to its structure, NEViM is machine learning friendly. For example, neural networks could be used to make the model self-learning and self-improving.

We could introduce different features that could influence the visualization ranking, e.g. perceptual qualities of different data visualization types. Now that we have established a working base, there are many possibilities for further development.

ACKNOWLEDGEMENTS

Research supported by SGS grant No. SGS17/210/OHK3/3T/18 and GACR grant No. GA18-18080S.

REFERENCES

BBC (2016). Heat transfer and efficiency.

Bertin, J. (1983). Semiology of graphics: diagrams, networks, maps.

Bostock, M. (2017). Data-driven documents.

Brath, R. and Jonker, D. (2015). Graph analysis and visualization: discovering business opportunity in linked data. John Wiley & Sons, Hoboken, NJ.

Börner, K. (2015). Atlas of knowledge: Anyone can map. MIT Press, Cambridge, MA.


Börner, K. and Polley, D. E. (2014). Visual insights: A practical guide to making sense of data. MIT Press, Cambridge, MA.

Evergreen, S. D. (2016). Effective data visualization: The right chart for your data. SAGE Publications, Thousand Oaks, CA.

Gnanamgari, S. (1981). Information presentation through default displays. PhD thesis, Univ. of Pennsylvania, Philadelphia, PA.

Google (2017). Chart and graph types.

Gotz, D. and Wen, Z. (2009). Behavior-driven visualization recommendation. In Proceedings of the 14th international conference on Intelligent user interfaces, New York, NY.

Hanrahan, P. (2006). Vizql: a language for query, analysis and visualization. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data, New York, NY.

Hardin, M. et al. (2012). Which chart or graph is right for you? Tell impactful stories with data. Tableau Software.

Heer, J. et al. (2010). A tour through the visualization zoo. Queue, 8.5.

Hinderman, B. (2015). Building responsive data visualization for the web. John Wiley & Sons, Hoboken, NJ.

IBM (2017). Smart data analysis and visualization.

Illinsky, N. and Steele, J. (2011). Designing data visualizations: representing informational relationships. O’Reilly Media, Sebastopol, CA.

Kaur, P. and Owonibi, M. (2017). A review on visualization recommendation strategies. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pages 266–273, Porto, Portugal.

Kirk, A. (2016). Data visualization: A handbook for data driven design. SAGE, London, UK.

Mackinglay, J. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5.2:110–141.

Mackinlay, J. et al. (2007). Show me: Automatic presentation for visual analysis. IEEE Transactions on Visualization and Computer Graphics, 13.6.

Microsoft (2017). Available chart types in office.

Munzner, T. and Maguire, E. (2015). Visualization analysis and design. CRC Press, Boca Raton, FL.

O’Neil, C. and Schutt, R. (2014). Doing Data Science: Straight Talk From The Frontline. O’Reilly Media, Sebastopol, CA.

Roth, S. F. and Mattis, J. (1990). Data characterization for intelligent graphics presentation. SIGCHI Conference on Human Factors in Computing Systems.

Satyanarayan, A. et al. (2017). Vega-lite: A grammar of interactive graphics. IEEE Transactions on Visualization and Computer Graphics, 23.1:341–350.

Stacey, M. et al. (2015). Visual intelligence: Microsoft tools and techniques for visualizing data. John Wiley & Sons, Hoboken, NJ.

Stolte, C. et al. (2002). Polaris: A system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics, 8.1:52–65.

Telea, A. C. (2007). Data visualization: principles and practice. CRC Press, Boca Raton, FL.

Tukey, J. W. (1970). Exploratory Data Analysis. Addison-Wesley, Reading, MA.

Vartak, M. et al. (2015). Seedb: supporting visual analytics with data-driven recommendations. VLDB.

Viegas, F. et al. (2007). Manyeyes: a site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics, 13.6.

Ware, C. (2010). Visual thinking: For design. Morgan Kaufmann, Burlington, MA.

Ware, C. (2012). Information visualization: perception for design. Elsevier, Amsterdam, NL.

Wehrend, S. and Lewis, C. (1990). A problem-oriented classification of visualization techniques. In Proceedings of the 1st Conference on Visualization’90, Los Alamitos, CA.

Wongsuphasawat, K. (2017). Vega compass.

Wongsuphasawat, K. et al. (2016). Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE Transactions on Visualization and Computer Graphics, 22.1:649–658.

Yau, N. (2011). Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. John Wiley & Sons, Hoboken, NJ.

Yau, N. (2013). Data points: Visualization that means something. John Wiley & Sons, Hoboken, NJ.

Yuk, M. and Diamond, S. (2014). Data visualization for dummies. John Wiley & Sons, Hoboken, NJ.

Zhou, M. X. et al. (2002). Building a visual database for example-based graphics generation. INFOVIS 2002 IEEE Symposium.

Zhou, M. X. and Feiner, S. K. (1998). Visual task characterization for automated visual discourse synthesis. In Proceedings of the SIGCHI conference on Human factors in computing systems, Boston, MA.
