Authors:
Leandro Ordonez-Ante; Gregory Van Seghbroeck; Tim Wauters; Bruno Volckaert and Filip De Turck
Affiliation:
Ghent University - imec, IDLab, Department of Information Technology, Technologiepark-Zwijnaarde 15, Gent, Belgium
Keyword(s):
Interactive Querying, View Selection, Clustering, Distributed Data, Dimensional Data, Data Warehouse.
Abstract:
Small-to-medium businesses are increasingly relying on big data platforms to run their analytical workloads cost-effectively, instead of using conventional and costly data warehouse systems. However, the distributed nature of big data technologies makes typical analytical queries time-consuming to process, especially those involving aggregate and join operations, which prevents business users from exploring data efficiently. To address this, a workload-driven approach for automatic view selection was devised, aimed at speeding up analytical queries issued against distributed dimensional data. This paper presents a detailed description of the proposed approach, along with an extensive evaluation to test its feasibility. Experimental results show that the conceived mechanism can automatically derive a limited but comprehensive set of views that reduce query processing time by up to 89%–98%.