Discussion and Future Work. In the process described above, user intervention can be error-prone; hence, some improvements may be envisaged. For instance, we could integrate some of the algorithms proposed by (Romero and Abelló, 2007) in order to semi-automate the identification of dimensions and facts. Moreover, even though we have resolved the problem of complex hierarchies in the first two steps of our ETL process by generating strict hierarchies, we acknowledge that the summarizability problem (Mazón et al., 2010) is not fully resolved, especially with regard to the completeness and disjointness integrity constraints (Lenz and Shoshani, 1997). Both (Romero and Abelló, 2007) and (Prat et al., 2012) propose solutions applied to ontologies. Compared to (Romero and Abelló, 2007), the authors of (Prat et al., 2012) propose more explicit rules, which seem straightforward to add to our process. Since the outputs of our process are constructed in parallel, verifying the summarizability constraints on the ontology will allow us to check the same constraints on the multidimensional model.
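The following Python sketch makes the two constraints concrete by checking them on a single roll-up between two hierarchy levels. The roll-up is represented as a set of child members and a list of (child, parent) links. The sketch assumes a simplified reading of the constraints:

- disjointness means that no child member rolls up to more than one parent member;
- completeness means that every child member rolls up to at least one parent member.

The function name check_rollup and the toy city and region members are hypothetical. This is neither the verification procedure of the ETL chain described above nor the OWL-DL rules of (Prat et al., 2012).

```python
from collections import defaultdict


def check_rollup(children, links):
    """Report child members that violate disjointness or completeness.

    Hypothetical helper for illustration (not part of the described ETL chain).
    children: set of member ids at the child level (e.g. cities).
    links: iterable of (child, parent) pairs (e.g. (city, region)).
    """
    parents = defaultdict(set)
    for child, parent in links:
        parents[child].add(parent)

    # Disjointness: a child with several parents would be counted
    # once per parent when measures are rolled up.
    non_disjoint = sorted(c for c, ps in parents.items() if len(ps) > 1)

    # Completeness: a child with no parent would be dropped
    # when measures are rolled up.
    incomplete = sorted(c for c in children if not parents.get(c))

    return {"non_disjoint": non_disjoint, "incomplete": incomplete}


if __name__ == "__main__":
    # Toy members, invented for illustration only.
    cities = {"Toulouse", "Albi", "Paris", "Lyon"}
    city_to_region = [
        ("Toulouse", "Occitanie"),
        ("Albi", "Occitanie"),
        ("Paris", "Ile-de-France"),
        ("Paris", "Occitanie"),  # deliberate error: two parents
    ]  # "Lyon" deliberately has no parent
    print(check_rollup(cities, city_to_region))
    # → {'non_disjoint': ['Paris'], 'incomplete': ['Lyon']}
```

If both lists are empty, the pair of levels behaves as a strict and complete hierarchy. Aggregated measures then neither double-count nor lose members along that roll-up.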
5 CONCLUSIONS
In this paper, we have presented a full RDF graph-based ETL chain for warehousing SOD. This ETL chain takes as input a set of flat SOD containing statistical tables and transforms them into a multidimensional schema through three steps:

- The first step automatically extracts, annotates and transforms input tables into RDF instance-schema graphs.
- The second step automatically performs a holistic graph integration through an integer linear program.
- The third step incrementally defines the multidimensional schema through an interactive process between users and the system.

The main contributions of our approach are:

- (i) the unified representation of tables, which facilitates their schema discovery;
- (ii) the extension of the maximum weighted graph matching problem with structural constraints in order to solve the holistic open data integration problem.

In future work, we aim to conduct a user study of our approach. It will measure the efficiency of automatic detection and integration, as well as the difficulties that users may encounter when incrementally defining the multidimensional schema from visual graphs.
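As a reference point for contribution (ii), the classical maximum weighted graph matching problem can be written as the integer linear program below. The symbols are introduced here for illustration only:

- G = (V, E) is a graph of candidate correspondences;
- w_e is the similarity weight of edge e;
- x_e indicates whether edge e is selected;
- δ(v) is the set of edges incident to vertex v.

The structural constraints of our formulation are not reproduced. The third constraint line is only a generic example of how a logical rule such as "selecting e requires selecting f" can be linearized. Techniques for formulating such implications are discussed in (Plastria, 2002).

```latex
\begin{align*}
\max \quad & \sum_{e \in E} w_e \, x_e \\
\text{s.t.} \quad & \sum_{e \in \delta(v)} x_e \le 1 && \forall v \in V \\
& x_e \le x_f && \text{for a pair } (e, f) \text{ where selecting } e \text{ requires } f \\
& x_e \in \{0, 1\} && \forall e \in E
\end{align*}
```

Without the implication line, this is the standard maximum weighted matching program. Each added linear constraint of this kind restricts which combinations of correspondences the solver may select.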
REFERENCES
Bergamaschi, S., Guerra, F., Orsini, M., Sartori, C., and Vincini, M. (2011). A semantic approach to ETL technologies. Data and Knowledge Engineering, 70(8):717–731.
Berro, A., Megdiche, I., and Teste, O. (2014). A content-driven ETL processes for open data. In New Trends in Database and Information Systems II - Selected papers of the 18th East European Conference on Advances in Databases and Information Systems and Associated Satellite Events, ADBIS 2014 Ohrid, Macedonia, pages 29–40.
Birkhoff, G. (1967). Lattice Theory. American Mathematical Society, 3rd edition.
Etcheverry, L., Vaisman, A., and Zimányi, E. (2014). Modeling and querying data warehouses on the semantic web using QB4OLAP. In Proceedings of the 16th International Conference on Data Warehousing and Knowledge Discovery, DaWaK'14, Lecture Notes in Computer Science. Springer-Verlag.
Jaccard, P. (1912). The distribution of the flora in the alpine
zone. New Phytologist, 11(2):37–50.
Lenz, H.-J. and Shoshani, A. (1997). Summarizability in OLAP and statistical data bases. In Scientific and Statistical Database Management, 1997. Proceedings, pages 132–143.
Malinowski, E. and Zimányi, E. (2006). Hierarchies in a multidimensional model: From conceptual modeling to logical representation. Data Knowl. Eng., 59(2):348–377.
Mansmann, S. and Scholl, M. H. (2007). Empowering the OLAP technology to support complex dimension hierarchies. IJDWM, 3(4):31–50.
Mazón, J.-N., Lechtenbörger, J., and Trujillo, J. (2010). A survey on summarizability issues in multidimensional modeling. In JISBD, pages 327–327. IBERGARCETA Pub. S.L.
Plastria, F. (2002). Formulating logical implications in combinatorial optimisation. European Journal of Operational Research, 140(2):338–353.
Prat, N., Megdiche, I., and Akoka, J. (2012). Multidimensional models meet the semantic web: defining and reasoning on OWL-DL ontologies for OLAP. In DOLAP 2012, ACM 15th International Workshop on Data Warehousing and OLAP, pages 17–24.
Rahm, E. and Bernstein, P. A. (2001). A survey of approaches to automatic schema matching. The VLDB Journal, 10(4):334–350.
Ravat, F., Teste, O., Tournier, R., and Zurfluh, G. (2008). Algebraic and graphic languages for OLAP manipulations. International Journal of Data Warehousing and Mining, 4(1):17–46.
Romero, O. and Abelló, A. (2007). Automating multidimensional design from ontologies. In Proceedings of the ACM Tenth International Workshop on Data Warehousing and OLAP, DOLAP '07, pages 1–8. ACM.
Wang, X. (1996). Tabular abstraction, editing, and formatting. Technical report, University of Waterloo, Waterloo, Ontario, Canada.
Wu, Z. and Palmer, M. (1994). Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics, New Mexico State University, Las Cruces, New Mexico, pages 133–138.
ICEIS 2015 - 17th International Conference on Enterprise Information Systems