Authors:
Michael Behringer; Dennis Treder-Tschechlov; Julius Voggesberger; Pascal Hirmer and Bernhard Mitschang
Affiliation:
Institute of Parallel and Distributed Systems, University of Stuttgart, Universitätsstr. 38, D-70569 Stuttgart, Germany
Keyword(s):
Data Mashup, Human-In-The-Loop, Interactive Data Analysis.
Abstract:
Today, data analytics is widely used across many domains to identify new trends, opportunities, or risks and to improve decision-making. To this end, various heterogeneous data sources must be selected to form the foundation for knowledge discovery driven by data analytics. However, discovering and selecting suitable and valuable data sources that improve the analytics results is a great challenge. Domain experts can easily become overwhelmed in the data selection process due to the large number of available data sources that might contain similar kinds of information. Supporting domain experts in discovering and selecting the most suitable data sources can save time and costs and significantly increase the quality of the analytics results. In this paper, we introduce a novel approach, SDRank, which uses Deep Learning to rank data sources based on their similarity to already selected data sources. We implemented SDRank, trained various models on 4 860 datasets, and measured the achieved precision for evaluation purposes. In doing so, we showed that SDRank is able to substantially improve the workflow of domain experts when selecting beneficial data sources.