This paper is organized as follows. In Section 2, we discuss the traditional approach and the new top-down approach to analyzing survey data from the Structural Business Statistics (SBS). In Section 3, we apply treemaps to compare the current data with the data from the previous period; for this purpose, we use a diverging color scale to indicate increases and decreases. In Section 4, we apply treemaps to analyze the relationship between two variables; for this purpose, we use a sequential color scale to indicate densities. In Section 5, we propose a method to visualize confidence intervals in treemaps. Finally, in Section 6, we provide concluding remarks.
2 TOP-DOWN APPROACH
The Structural Business Statistics annually receives data from approximately 50,000 respondents. This survey collects a wide range of data on enterprises. Topics on the questionnaires include turnover, number of persons employed, total purchases, financial result, et cetera. The goal of the SBS is to produce accurate estimates of the economy of The Netherlands as a whole. Concretely, this means that estimates are made of the main variables at the national level.
Before estimates of the economy of The Netherlands can be made, the survey data has to be analyzed and edited. The raw data usually contains many errors and inconsistencies, for instance when the reported wages and salaries are not in line with the number of persons employed. Other frequent errors are the so-called thousand errors (respondents report the actual value instead of the requested value in thousands), classification errors, and inconsistencies with other sources. The last type of error usually amounts to a discrepancy between the turnover reported in the survey and the turnover recorded in the value added tax (VAT) register.
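To make the thousand error and the VAT comparison concrete, the following is a minimal sketch of two record-level checks. It assumes a hypothetical record layout in which all amounts are expressed in thousands, and the tolerance values (a band of 300 to 3000 around the factor 1000, a 25% VAT deviation) are illustrative assumptions, not thresholds used at Statistics Netherlands.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical record layout; all amounts are assumed to be in thousands.
    enterprise_id: str
    turnover: float                 # turnover reported in the current survey
    turnover_prev: Optional[float]  # turnover reported in the previous year, if known
    turnover_vat: Optional[float]   # turnover from the VAT register, if known


def is_thousand_error(rec: Record, low: float = 300.0, high: float = 3000.0) -> bool:
    """Flag a likely thousand error: the reported value is roughly 1000x a reference.

    The reference is the previous-year value if available, else the VAT turnover.
    The band [low, high] around 1000 is an assumed tolerance.
    """
    ref = rec.turnover_prev if rec.turnover_prev else rec.turnover_vat
    if not ref or ref <= 0:
        return False  # no usable reference value
    return low <= rec.turnover / ref <= high


def vat_discrepancy(rec: Record, rel_tol: float = 0.25) -> bool:
    """Flag an inconsistency between survey and VAT turnover larger than rel_tol."""
    if not rec.turnover_vat:
        return False  # no VAT data to compare with
    return abs(rec.turnover - rec.turnover_vat) / rec.turnover_vat > rel_tol
```

For example, a record with a reported turnover of 5,200,000 and a previous-year turnover of 5,000 has a ratio of 1040 and is flagged by `is_thousand_error`. In the traditional approach such checks would be reviewed record by record.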
Traditionally, data analysts correct the data of the enterprises one by one using tables and spreadsheets. For this purpose, they use the available data from the previous year and data from monthly or quarterly statistics. Although this method of data editing and analysis probably results in good-quality data, it is not very efficient. This is mainly due to the time that data analysts spend correcting errors that do not affect the outcomes (i.e., the estimates for the overall Dutch economy). For instance, small errors in the data of small enterprises will not noticeably affect the outcomes.
A more efficient alternative is the top-down approach (Aelen and Smit, 2009). Data analysts who use this approach start with the analysis of aggregated data. If an influential aggregation group has a suspicious value, data analysts can zoom in on this group to detect and correct the errors in the underlying data that caused the suspicious outcome. In this way, only the most influential errors are corrected. Errors that do not noticeably affect the outcomes need not be corrected.
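The top-down procedure can be sketched as a two-step loop: aggregate and flag, then drill down. The sketch below is an illustration under stated assumptions, not the implementation of Aelen and Smit (2009): records are hypothetical dictionaries with `id`, a grouping key, `value`, and `value_prev`; a group counts as influential if its share of the total is at least `min_share`, and as suspicious if its relative change exceeds `max_change`. Both thresholds are assumed values.

```python
from collections import defaultdict


def aggregate(records, key):
    """Sum current and previous-period values per aggregation group."""
    cur, prev = defaultdict(float), defaultdict(float)
    for r in records:
        cur[r[key]] += r["value"]
        prev[r[key]] += r["value_prev"]
    return cur, prev


def suspicious_groups(records, key, min_share=0.05, max_change=0.10):
    """Return groups that are influential and show a suspicious change.

    min_share and max_change are illustrative thresholds, not official values.
    """
    cur, prev = aggregate(records, key)
    total = sum(cur.values())
    flagged = []
    for g in cur:
        share = cur[g] / total if total else 0.0
        # A group that was empty in the previous period counts as an infinite change.
        change = (cur[g] - prev[g]) / prev[g] if prev[g] else float("inf")
        if share >= min_share and abs(change) > max_change:
            flagged.append(g)
    return sorted(flagged)


def drill_down(records, key, group, top=5):
    """Zoom in on one group: enterprises ranked by their contribution to the change."""
    members = [r for r in records if r[key] == group]
    members.sort(key=lambda r: abs(r["value"] - r["value_prev"]), reverse=True)
    return [r["id"] for r in members[:top]]
```

A small group with a large relative change is not flagged if its share of the total is below `min_share`, which mirrors the idea that non-influential errors need not be corrected.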
This top-down approach is currently being implemented at Statistics Netherlands in several statistical production processes. For this purpose, a software tool has been developed by Hacking (2009). Standard visualizations such as spreadsheets, scatter plots, and bar charts have been implemented in this tool, and other visualization methods can be added as well.
3 COMPARISON TREEMAPS
A treemap is a two-dimensional visualization of hierarchical data. A two-dimensional object that represents the root of the hierarchy is subdivided into smaller objects that represent its children, which in turn can be subdivided into objects that represent the grandchildren, et cetera. The objects are usually rectangles, but they can have other shapes as well (see Vliegen et al., 2006; Balzer and Deussen, 2005). Treemaps were developed in the early 1990s to visualize disk space usage on hard disks. For an introduction and historical overview, we refer to Shneiderman (1992).
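As an illustration of the recursive subdivision, the following sketch implements the classic slice-and-dice layout, which alternates between vertical and horizontal splits at successive levels and gives each child an area proportional to its size. This is only one possible layout algorithm; the `Node` class and the `slice_and_dice` function are hypothetical names, and the sketch does not claim to reproduce the layout used for the figures in this paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Node:
    name: str
    value: float = 0.0  # used only for leaves
    children: List["Node"] = field(default_factory=list)

    def size(self) -> float:
        # The size of an inner node is the sum of the sizes of its children.
        if not self.children:
            return self.value
        return sum(c.size() for c in self.children)


def slice_and_dice(node: Node, rect: Rect, depth: int = 0, out=None):
    """Return (name, rect) pairs for every node in the hierarchy.

    Even depths split the parent rectangle along the x-axis (vertical slices),
    odd depths along the y-axis (horizontal slices).
    """
    if out is None:
        out = []
    out.append((node.name, rect))
    x, y, w, h = rect
    total = node.size()
    offset = 0.0
    for child in node.children:
        frac = child.size() / total if total else 0.0
        if depth % 2 == 0:
            child_rect = (x + offset * w, y, frac * w, h)
        else:
            child_rect = (x, y + offset * h, w, frac * h)
        offset += frac
        slice_and_dice(child, child_rect, depth + 1, out)
    return out
```

Slice-and-dice is simple but can produce long, thin rectangles; later variants such as squarified treemaps trade ordering stability for more square-shaped rectangles.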
The rectangles in a treemap are characterized by two aesthetics: size and color. The size of each rectangle is proportional to its value of the main variable. The colors can be used in several ways. In this section, we use the colors to show the difference between the recent data and the data of the previous period. We refer to treemaps with this color usage as comparison treemaps.
The main purpose of comparison treemaps is to detect disruptive or unexpected changes over time. These changes can be real events, but they often indicate data errors. Both cases are of interest, and comparison treemaps make it possible to quickly answer questions such as: Are changes confined to a single industry? Is the effect large or small compared to other industries?
Figure 1 shows the estimated value added (at factor cost) of all active enterprises in The Netherlands. The sizes of the rectangles correspond to the total value added of the different sectors. We use a diverging color scale to indicate the growth (or shrinkage) with respect to the previous year: white is used for values that did not change, blue for increasing values, and red for decreasing values.
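A diverging color scale of this kind can be sketched as a function from relative change to color. The sketch below assumes that the change is expressed as a fraction (0.05 for +5%), that colors saturate at an assumed cap of plus or minus 20%, and that interpolation is linear in RGB; the function name and the cap are illustrative, not taken from the paper.

```python
def diverging_color(change: float, cap: float = 0.2) -> str:
    """Map a relative change to a hex color on a white-centered diverging scale.

    0 maps to white, positive changes to blue, negative changes to red.
    Changes with |change| >= cap get the fully saturated end color.
    cap is an assumed parameter; interpolation is linear in RGB for simplicity.
    """
    t = max(-1.0, min(1.0, change / cap))  # clamp to [-1, 1]
    white = (255, 255, 255)
    end = (0, 0, 255) if t > 0 else (255, 0, 0)
    rgb = tuple(round(w + abs(t) * (e - w)) for w, e in zip(white, end))
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

For instance, a sector whose value added grew from 100 to 110 has a relative change of 0.10 and is colored `diverging_color(0.10)`, a light blue halfway between white and full blue. Interpolating in a perceptually uniform color space (e.g. CIELAB) would give more even perceived steps than RGB, at the cost of a color conversion.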
Note that the data visualized in Figure 1 is hierarchically structured. More specifically, the data is aggregated by the two highest hierarchical levels of
TOP-DOWN DATA ANALYSIS WITH TREEMAPS
237