uct prices) we obtain many different values. In con-
trast, merely 6% of the items in our product catalog
contain a unique value for price, and this is certainly
not atypical for e-commerce. Likewise, preference
queries in practice usually include pairs of indepen-
dent, correlated and anti-correlated attributes at the
same time, yet almost all experiments in the litera-
ture investigate only pure settings. We could also ob-
serve a strong influence of outliers in the data set on
the performance of skyline algorithms. Again, in con-
trast to synthetic data, commercial catalogs will al-
most always contain strong outliers. Interestingly, the
Scalagon algorithm includes a heuristic for detecting
outliers in the pre-filter phase (Endres et al., 2015),
but to the best of our knowledge there are no detailed studies
on outliers in skyline computation. An important ad-
vantage of synthetic data is that it avoids bias (Balke
et al., 2007). Our experiments were based on a single
yet typical e-commerce product catalog, so they
clearly do not allow for a universally valid interpretation.
Still, when preference queries are to be
computed in concrete commercial applications and on
data sets whose statistical properties have been analyzed,
the rich skyline literature with all its investigations
on synthetic data does not provide helpful
indications on which skyline algorithm to apply.
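The statistical properties referred to above (share of unique attribute values, correlation between attribute pairs) can be profiled with a few lines of code before choosing an algorithm. The following is a minimal sketch only: the toy catalog, the function names and the 0.5 threshold are illustrative assumptions and are not taken from our experiments, and a low Pearson coefficient is used merely as a rough proxy for independence.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def unique_value_share(values):
    """Fraction of items whose value occurs exactly once in the column."""
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1) / len(values)


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def classify_pairs(columns, threshold=0.5):
    """Label each attribute pair; the threshold is an illustrative assumption."""
    labels = {}
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if r >= threshold:
            labels[(a, b)] = "correlated"
        elif r <= -threshold:
            labels[(a, b)] = "anti-correlated"
        else:
            # Low linear correlation, used here as a proxy for independence.
            labels[(a, b)] = "independent"
    return labels


# Hypothetical toy catalog, not data from the experiments.
catalog = {
    "price": [10, 10, 20, 20, 30, 40],
    "weight": [1, 1, 2, 2, 3, 4],
    "rating": [5, 5, 4, 4, 3, 2],
}

if __name__ == "__main__":
    print(unique_value_share(catalog["price"]))
    print(classify_pairs(catalog))
```

A profile of this kind shows whether a query mixes correlated and anti-correlated attribute pairs and how many duplicate values an attribute carries, which are the properties discussed above as differing between commercial catalogs and synthetic data.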
ACKNOWLEDGEMENTS
We gratefully acknowledge the close and inspiring
collaboration with our industry partner Arcmedia AG
(www.arcmedia.ch) as well as Roland Christen and
Daniel Pfäffli for integration and testing of skyline algorithms
in a professional e-commerce environment
and the valuable feedback they provided.
REFERENCES
Aldrich, S. E. (2011). Recommender systems in commer-
cial use. AI Magazine, 32(3):28–34.
Balke, W.-T., Güntzer, U., and Siberski, W. (2007). Restricting
skyline sizes using weak Pareto dominance.
Inform., Forsch. Entwickl., 21(3-4):165–178.
Balke, W.-T., Zheng, J. X., and Güntzer, U. (2005). Approaching
the efficient frontier: cooperative database
retrieval using high-dimensional skylines. In Zhou, L.,
Ooi, B. C., and Meng, X., editors, Database Systems
for Advanced Applications, volume 3453 of Lecture
Notes in Computer Science, pages 410–421. Springer
Berlin Heidelberg, Berlin, Heidelberg.
Börzsönyi, S., Kossmann, D., and Stocker, K. (2001). The
skyline operator. In Proceedings of the 17th Inter-
national Conference on Data Engineering, April 2-6,
2001, Heidelberg, Germany, pages 421–430.
Chaudhuri, S., Dalvi, N., and Kaushik, R. (2006). Robust
cardinality and cost estimation for skyline operator.
In ICDE. Institute of Electrical and Electronics En-
gineers, Inc.
Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. (2003).
Skyline with presorting. In Dayal, U., Ramamritham,
K., and Vijayaraman, T., editors, ICDE, pages 717–
719. IEEE Computer Society.
Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. (2005).
Skyline with presorting: theory and optimizations. In
Klopotek, M. A., Wierzchon, S. T., and Trojanowski,
K., editors, Intelligent Information Systems, Advances
in Soft Computing, pages 595–604. Springer.
Cole, P. (2013). Amazon.com catalog blows past 200m
items. https://sellerengine.com/amazon-com-catalog-blows-past-200m-items,
last visited: 2015-10-10.
Endres, M., Roocks, P., and Kissling, W. (2015). Scalagon:
An efficient skyline algorithm for all seasons. In DAS-
FAA: 20th Int. Conference of Database Systems for
Advanced Applications, pages 292–308.
Galli, M., Schnürle, S., Arnold, R., and Pouly, M.
(2015). prefsql code repository and experimental set-
ting. https://github.com/migaman/prefSQL, last vis-
ited: 2015-10-10.
Godfrey, P., Shipley, R., and Gryz, J. (2005). Maximal
vector computation in large data sets. In Böhm, K.,
Jensen, C. S., Haas, L. M., Kersten, M. L., Larson, P.,
and Ooi, B. C., editors, VLDB, pages 229–240. ACM.
Han, X., Li, J., Yang, D., and Wang, J. (2013). Efficient sky-
line computation on big data. IEEE Trans. on Knowl.
and Data Eng., 25(11):2521–2535.
Lofi, C. and Balke, W.-T. (2013). On skyline queries and
how to choose from Pareto sets. In Catania, B. and
Jain, L. C., editors, Advanced Query Processing (1),
volume 36 of Intelligent Systems Reference Library,
pages 15–36. Springer.
Martin, F. J., Donaldson, J., Ashenfelter, A., Torrens, M.,
and Hangartner, R. (2011). The big promise of rec-
ommender systems. AI Magazine, 32(3):19–27.
Papadias, D., Tao, Y., Fu, G., and Seeger, B. (2003). An optimal
and progressive algorithm for skyline queries. In
Proc. of the 2003 ACM SIGMOD International Con-
ference on Management of Data, SIGMOD ’03, pages
467–478, New York, NY, USA. ACM.
Preisinger, T. and Kissling, W. (2007). The hexagon algorithm
for Pareto preference queries. In Proc. of the
3rd Multidisciplinary Workshop on Advances in Pref-
erence Handling.
Preisinger, T., Kissling, W., and Endres, M. (2006).
The BNL++ algorithm for evaluating Pareto preference
queries. In Proc. of the Multidisciplinary Workshop
on Advances in Preference Handling.
Roocks, P. (2014). The rPref package: database preferences
and skyline computation in R.
http://www.p-roocks.de/rpref/, last visited: 2015-10-21.
ICAART 2016 - 8th International Conference on Agents and Artificial Intelligence