Authors:
R. Raminhos (1) and J. Moura-Pires (2)
Affiliations:
(1) UNINOVA – Desenvolvimento de Novas Tecnologias, Portugal; (2) CENTRIA/FCT, Portugal
Keyword(s):
ETD, ETL, IL, Declarative Language, Semi-Structured Text Files.
Related Ontology Subjects/Areas/Topics:
Coupling and Integrating Heterogeneous Data Sources; Databases and Information Systems Integration; Enterprise Information Systems
Abstract:
The World Wide Web is a major source of textual information in a human-readable, semi-structured format, covering multiple domains, some of them highly complex. Traditional ETL approaches, which rely on developing specific source code for each data source and on repeated interactions between domain experts and computer-science experts, are inadequate: they are time-consuming and error-prone. This paper presents a novel approach to ETL, based on its decomposition into two phases: ETD (Extraction, Transformation and Data Delivery) and IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert. Applying ETD requires mainly domain expertise, while computer-science expertise is concentrated in the IL phase, which links the processed data to target system models; this enables a clearer separation of concerns. This paper presents how ETD has been integrated, tested and validated in a space domain project, currently operational at the European Space Agency for the Galileo Mission.