(0.0726) and ABCD (0.0726) than for our prototype
EGR (0.1017).
In addition, we applied the EGR algorithm to the
Household Electric Power Consumption dataset in-
troduced by Hebrail and Berard (2012) comprising
2,075,259 data records. While the EGR algorithm
was able to retrieve a large-scale GPM in less than
1.5 hours, we interrupted the computations of both
competing algorithms, CKS and ABCD, after 14 days,
as neither was able to complete the GPM computation.
We thus conclude that the proposed large-scale
GPM structure enables the development of efficient
retrieval algorithms that scale to millions of data
records.
6 CONCLUSION
In this paper, we introduce a new structure for Gaus-
sian Process Models (GPMs) enabling the analysis
of large-scale datasets. This new structure utilizes
a concatenation of locally specialized models to reduce
both kernel search complexity and the computational
effort required for model evaluation.
Furthermore, we incorporate the targeted candidate
format (i.e., sum-of-products form) directly into the
candidate generation mechanism. This reduces the
number of candidates to be evaluated and is therefore
expected to benefit future GPM retrieval algorithms as well.
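Generating candidates directly in sum-of-products form can be sketched as follows. This is a hedged illustration of the idea, not the paper's implementation: the base kernel set and the enumeration bounds are assumptions. Using sorted multisets for products and distinct combinations for sums exploits the commutativity of + and *, so each structurally equivalent expression is generated only once.

```python
from itertools import combinations, combinations_with_replacement

# Assumed base kernel symbols (the actual set used by the retrieval
# algorithm may differ).
BASE = ("SE", "PER", "LIN", "RQ")

def products(max_factors):
    """All products of base kernels up to a given number of factors,
    as sorted-multiset tuples (so SE*PER and PER*SE coincide)."""
    for n in range(1, max_factors + 1):
        yield from combinations_with_replacement(BASE, n)

def candidates(max_terms, max_factors):
    """All sum-of-products candidates: sums of up to max_terms
    pairwise-distinct product terms."""
    prods = list(products(max_factors))
    for n in range(1, max_terms + 1):
        yield from combinations(prods, n)

def render(expr):
    """Human-readable form of one candidate expression."""
    return " + ".join("*".join(term) for term in expr)

cands = list(candidates(max_terms=2, max_factors=2))
# With 4 base kernels: 14 products, hence 14 + C(14, 2) = 105 candidates.
print(len(cands), render(cands[0]))
```

Because expressions are born in normal form, no post-hoc simplification step is needed and equivalent expression trees are never evaluated twice, which is the source of the reduction in the candidate count.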
Although we have made a first step towards large-scale
GPM retrieval, several challenges in this field
remain open. We outlined those challenges in
detail and backed our claims regarding the performance
implications of our new model. To this end, we
implemented a first prototype for large-scale
GPM retrieval and evaluated its performance
against the state of the art.
Apart from further developing this initial proto-
type, we plan to address the challenges mentioned in
this paper in our future work.
REFERENCES
Beecks, C., Schmidt, K. W., Berns, F., and Graß, A.
(2019). Gaussian processes for anomaly description
in production environments. In Proceedings of the
Workshops of the EDBT/ICDT 2019 Joint Conference,
EDBT/ICDT 2019, Lisbon, Portugal, March 26, 2019.
Berns, F., Schmidt, K., Grass, A., and Beecks, C. (2019). A
new approach for efficient structure discovery in IoT.
In 2019 IEEE International Conference on Big Data
(Big Data), pages 4152–4156. IEEE.
Bradshaw, S., Brazil, E., and Chodorow, K. (2020). Mon-
goDB: The definitive guide: Powerful and scalable
data storage. O'Reilly Media Inc., 3rd revised edi-
tion.
Chee, C.-H., Jaafar, J., Aziz, I. A., Hasan, M. H., and
Yeoh, W. (2019). Algorithms for frequent itemset
mining: a literature review. Artificial Intelligence Re-
view, 52(4):2603–2621.
Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E.
(2016). Hierarchical nearest-neighbor gaussian pro-
cess models for large geostatistical datasets. Journal
of the American Statistical Association, 111(514).
Duvenaud, D., Lloyd, J. R., Grosse, R., Tenenbaum, J. B.,
and Ghahramani, Z. (2013). Structure discovery in
nonparametric regression through compositional ker-
nel search. In Proceedings of the 30th International
Conference on International Conference on Machine
Learning, ICML’13, pages III–1166–III–1174.
Ghosal, A., Nandy, A., Das, A. K., Goswami, S., and Pan-
day, M. (2020). A short review on different clustering
techniques and their applications. In Mandal, J. K.
and Bhattacharya, D., editors, Emerging Technology
in Modelling and Graphics, volume 937 of Advances
in Intelligent Systems and Computing, pages 69–83.
Springer Singapore, Singapore.
Gittens, A. and Mahoney, M. W. (2016). Revisiting the
Nyström method for improved large-scale machine
learning. J. Mach. Learn. Res., 17(1):3977–4041.
Hayashi, K., Imaizumi, M., and Yoshida, Y. (2019). On
random subsampling of gaussian process regression:
A graphon-based analysis.
Hebrail, G. and Berard, A. (2012). Individual household
electric power consumption data set.
Hensman, J., Fusi, N., and Lawrence, N. D. (2013). Gaus-
sian processes for big data. In Proceedings of the
Twenty-Ninth Conference on Uncertainty in Artificial
Intelligence, UAI’13, pages 282–290, Arlington, Vir-
ginia, United States. AUAI Press.
Hinton, G. E. (2002). Training products of experts by
minimizing contrastive divergence. Neural Comput.,
14(8):1771–1800.
Kim, H. and Teh, Y. W. (2018). Scaling up the Automatic
Statistician: Scalable structure discovery using Gaus-
sian processes. In Proceedings of the 21st Interna-
tional Conference on Artificial Intelligence and Statis-
tics, volume 84 of Proceedings of Machine Learning
Research, pages 575–584. PMLR.
Kim, H.-M., Mallick, B. K., and Holmes, C. C. (2005).
Analyzing nonstationary spatial data using piecewise
gaussian processes. Journal of the American Statisti-
cal Association, 100(470):653–668.
Le Noac’h, P., Costan, A., and Bouge, L. (2017). A per-
formance evaluation of apache kafka in support of
big data streaming applications. In 2017 IEEE Inter-
national Conference on Big Data (Big Data), pages
4803–4806. IEEE.
Lee, J., Bahri, Y., Novak, R., Schoenholz, S. S., Penning-
ton, J., and Sohl-Dickstein, J. (2018). Deep neural
networks as gaussian processes. In 6th International
Conference on Learning Representations, ICLR 2018,
Vancouver, BC, Canada, April 30 - May 3, 2018, Con-
ference Track Proceedings. OpenReview.net.
Towards Large-scale Gaussian Process Models for Efficient Bayesian Machine Learning