of variables to be related. Additionally, selecting the
graphic types to be presented based on the structure
of the data may not be useful, given that the structure
of the data can easily be changed without losing
information.
With respect to the characteristics of the user, we
may consider: human perceptual capabilities, or what
Kamps (1999) calls “perceptual design”; the ques-
tions the user is seeking answers to, or what Casner
(1990) calls “task-based graphic design”; and finally
user preferences as deduced from a user's graphic
selection history, which is the basis of a “recommender
system”. Graphic automations that rely on any of
these user characteristics require that the gamut of
possible graphic representations be previously limited
based on the characteristics of the data; otherwise the
system might evaluate and suggest graphics that can-
not be constructed from the data.
Hardware limitations have more of an impact on
aesthetics and usability than on the selection of the
graphic types to be presented, especially considering
that the users of databases tend to work on desktop
computers with a broadband internet connection;
therefore, these limitations can be overlooked a priori
when classifying statistical graphics.
With respect to the characteristics of the
sought-after graphic, an automated system could require the
user to define the coordinate system, the visual
variables to be used, the more or less generic graphic type,
or its possible decomposition into multiple panels.
Defining any of these, however, requires a certain
degree of data visualization knowledge; therefore, this
strategy is not suited to users who are unfamiliar
with data visualization.
For an automated graphic selection system aimed at
users who are unfamiliar with data visualization,
the strategy is first to identify the graphics that are
feasible given the data, and then to restrict the
selection to those that serve a specific purpose most
effectively.
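As an illustration, this two-step strategy can be sketched in Python. The sketch is not an implementation of any existing system: the catalog, the variable kinds, the purposes and the effectiveness scores are all hypothetical and serve only to show the order of operations, namely filtering by the data first and ranking by purpose second.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class GraphicType:
    name: str
    # Hypothetical predicate: can the graphic be built from these variable kinds?
    is_constructible: Callable[[Tuple[str, ...]], bool]
    # Hypothetical effectiveness score (0 to 1) for each purpose it can serve.
    effectiveness: Dict[str, float]


def select_graphics(catalog: List[GraphicType],
                    variable_kinds: Tuple[str, ...],
                    purpose: str) -> List[str]:
    # Step 1: keep only the graphics that the data can support.
    feasible = [g for g in catalog if g.is_constructible(variable_kinds)]
    # Step 2: among those, keep the ones that serve the purpose, best first.
    suited = [g for g in feasible if purpose in g.effectiveness]
    suited.sort(key=lambda g: g.effectiveness[purpose], reverse=True)
    return [g.name for g in suited]


# Illustrative catalog; the scores are made up for the example.
CATALOG = [
    GraphicType("bar chart",
                lambda kinds: kinds == ("qualitative", "quantitative"),
                {"compare": 0.9}),
    GraphicType("pie chart",
                lambda kinds: kinds == ("qualitative", "quantitative"),
                {"compare": 0.4, "part-to-whole": 0.7}),
    GraphicType("scatter plot",
                lambda kinds: kinds == ("quantitative", "quantitative"),
                {"correlate": 0.9}),
]

print(select_graphics(CATALOG, ("qualitative", "quantitative"), "compare"))
```

With this made-up catalog, the call returns the bar chart ahead of the pie chart and never considers the scatter plot, because the scatter plot cannot be constructed from one qualitative and one quantitative variable.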
Of the various aspects of data characteristics, the
characterization of variables taken separately is a very
effective method because it allows us to break down
the problem of what types of graphics to present
into three parts: the number of variables to be
represented; the dimensions to be considered to
characterize those variables; and the mutually exclusive
levels to be considered for each dimension. However, the
strategies used thus far by automated graphic systems
have not tried to classify statistical graphics based on
a multidimensional characterization of the variables
taken separately.
The aim of this paper is to identify the dimensions
along which a variable, as represented in a data
column, can be characterized, and which can then be
used to limit the gamut of graphics to be evaluated
before any are presented to the user. Before doing so,
we review the state of the art in automated graphic
selection; we then present a list of dimensions
with mutually exclusive levels that make it possible to
limit the set of possible graphic types.
2 PREVIOUS WORKS
Bertin (1967, p.34) refers to the components of the
various variable measurement scales as levels of
organization, and he distinguishes three such levels:
qualitative for those concepts that can simply be
differentiated; ordered for those variables that have an
inherent sequence; and quantitative for those with a
quantifiable quality. Another characteristic Bertin uses to
limit the gamut of acceptable graphics is the length
of a variable, defined by Bertin (1967, p.33) as its
number of divisions. Variables are short if their length
is four or less, medium if it is between five and 15,
and long if it is greater than 15. This classification yields
two dimensions with three levels each. Ware (2004,
p.24) relates Bertin's measurement scales to those
of Stevens (1946) and distinguishes four levels of
variable attributes: nominal, ordinal, interval and ratio.
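Bertin's length thresholds can be written down as a short Python function. This is a minimal sketch under one assumption that is ours rather than Bertin's: the "number of divisions" is approximated by the number of distinct values observed in a data column. The function name is hypothetical.

```python
from typing import Hashable, Iterable


def bertin_length_class(values: Iterable[Hashable]) -> str:
    # Assumption: the length of a variable is taken to be the number of
    # distinct values in the column.
    length = len(set(values))
    if length <= 4:
        return "short"
    if length <= 15:
        return "medium"
    return "long"


print(bertin_length_class(["north", "south", "east", "west"]))
print(bertin_length_class(range(12)))
print(bertin_length_class(range(50)))
```

The three calls fall into the short, medium and long classes respectively.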
Another classification is proposed by Bachi (1968,
p.10), who characterizes variables according to
sequence type: linear, circular, geographical and
unordered qualitative sequences. Bachi further identifies
subcategories for linear sequences, differentiating
quantitative, temporal and qualitative linear
sequences, and for geographical sequences, distinguishing
between distribution and movement. This classification
yields one dimension with seven levels.
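The count of seven levels can be checked by enumerating them. The following Python enumeration is only a sketch of how such a single dimension with mutually exclusive levels might be encoded; the class and member names are our own and do not come from Bachi.

```python
from enum import Enum


class SequenceType(Enum):
    # Three linear subcategories.
    LINEAR_QUANTITATIVE = "linear, quantitative"
    LINEAR_TEMPORAL = "linear, temporal"
    LINEAR_QUALITATIVE = "linear, qualitative"
    # Circular sequences have no subcategory.
    CIRCULAR = "circular"
    # Two geographical subcategories.
    GEOGRAPHICAL_DISTRIBUTION = "geographical, distribution"
    GEOGRAPHICAL_MOVEMENT = "geographical, movement"
    # Unordered qualitative sequences have no subcategory.
    UNORDERED_QUALITATIVE = "unordered qualitative"

    @property
    def family(self) -> str:
        # The top-level sequence type is the text before the comma, if any.
        return self.value.split(",")[0]


print(len(SequenceType))
print(SequenceType.LINEAR_TEMPORAL.family)
```

Because a variable is assigned exactly one member, the levels are mutually exclusive by construction.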
The BHARAT system (Gnanamgari, 1981), a pioneer
in the automated presentation of graphics, uses
multiple dimensions to characterize variables:
continuity, totality, cardinality (defined as the
number of unique values for a variable), units and
range. Of these five dimensions, Gnanamgari
identifies levels for only the first two, which are
dichotomous; for the other three he establishes ad hoc
rules to evaluate the graphics to be presented to the
user.
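To make the five dimensions concrete, a column profile along these lines might look as follows in Python. This is a hypothetical sketch, not BHARAT's actual data structure: continuity and totality are treated as flags supplied by the caller, cardinality follows the definition given above, and range is assumed to mean the minimum and maximum observed values.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class ColumnProfile:
    continuous: bool                  # dichotomous, supplied by the caller
    total: bool                       # dichotomous, supplied by the caller
    cardinality: int                  # number of unique values
    units: str                        # free-text unit label
    value_range: Tuple[float, float]  # assumed to be (minimum, maximum)


def profile_column(values: Sequence[float], continuous: bool,
                   total: bool, units: str) -> ColumnProfile:
    # Only cardinality and range are derived from the data; the rest is
    # metadata that the data alone does not reveal.
    return ColumnProfile(
        continuous=continuous,
        total=total,
        cardinality=len(set(values)),
        units=units,
        value_range=(min(values), max(values)),
    )


profile = profile_column([3, 1, 3, 7], continuous=False, total=True,
                         units="units sold")
print(profile.cardinality, profile.value_range)
```

Only the two dichotomous fields have fixed levels here; the remaining three are open-ended values, which is consistent with their being handled by ad hoc rules rather than by a classification.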
Other systems, such as APT (Mackinlay, 1986),
BOZ (Casner, 1990) and Vista (Senay and Ignatius,
1994), also use Bertin's levels of organization, but not
his variable length classification. Thus, their charac-
A Strategy for Automating the Presentation of Statistical Graphics for Users without Data Visualization Expertise - A Position Paper