5 FUTURE WORK
After mapping out the design space of multidimensional
comparative data analytics, we identified a potential
research gap and developed an interactive visualization
prototype, the DataShiftExplorer¹.
However, our work has limitations. The main one is that
we have not yet conducted user studies; we plan to validate
our prototype in controlled environments as future work.
Regarding the prototype itself, one limitation is that the
user cannot change the number of data bins interactively.
Binning is a necessary preprocessing step in our pipeline,
but the number of bins may significantly affect the visual
outcome. A necessary extension is therefore to let the user
update this number interactively.
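The sensitivity to the number of bins can be illustrated with a minimal sketch. It is not the binning code of the prototype: the helper bin_counts, the equal-width binning scheme, and the bimodal sample are assumptions chosen only for illustration.

```python
import random


def bin_counts(values, n_bins, lo, hi):
    """Count values in n_bins equal-width bins over [lo, hi] (hypothetical helper)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


rng = random.Random(0)
# Assumed example feature: two Gaussian clusters (bimodal distribution).
values = [rng.gauss(-2.0, 0.5) for _ in range(500)]
values += [rng.gauss(2.0, 0.5) for _ in range(500)]
lo, hi = min(values), max(values)

# The same data binned at three resolutions; coarse binning can hide
# structure (such as the gap between the clusters) that finer binning shows.
for n_bins in (2, 5, 20):
    print(n_bins, bin_counts(values, n_bins, lo, hi))
```

Comparing the printed counts across the three settings shows how the apparent shape of one feature depends on the bin count, which is why an interactive control for this parameter would be useful.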
Lastly, in the user studies it will also be important to
explore real datasets with our tool. So far, the examples
in this paper use data generated with a synthetic
classification data generator², which gives us complete
control over the generation process.
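As a rough sketch of such controlled generation, the snippet below calls scikit-learn's make_classification twice with the same seed and changes only the shift parameter. The parameter values, the helper make_dataset, and the choice to vary shift are assumptions for illustration; they are not the settings used for the examples in this paper.

```python
from sklearn.datasets import make_classification


def make_dataset(shift, seed=0):
    """Generate a small two-class dataset (hypothetical settings)."""
    return make_classification(
        n_samples=500,
        n_features=4,
        n_informative=2,
        n_redundant=0,
        n_classes=2,
        shift=shift,        # constant offset added to every feature
        random_state=seed,  # fixed seed keeps the generation reproducible
    )


# Two datasets whose generation settings differ only in the feature offset.
X_a, y_a = make_dataset(shift=0.0)
X_b, y_b = make_dataset(shift=1.0)

print(X_a.shape, X_b.shape)
print(X_a.mean(axis=0))
print(X_b.mean(axis=0))
```

Because every generation parameter is set explicitly, any difference between the two datasets can be traced back to a known change, which is harder to guarantee with real data.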
¹ http://datashiftexplorer.dbvis.de
² http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
IVAPP 2020 - 11th International Conference on Information Visualization Theory and Applications