EXTRACTION AND TRANSFORMATION OF DATA FROM SEMI-STRUCTURED TEXT FILES USING A DECLARATIVE APPROACH

R. Raminhos, J. Moura-Pires

2007

Abstract

The World Wide Web is a major source of textual information in human-readable, semi-structured formats, covering multiple domains, some of them highly complex. Traditional ETL approaches, which rely on developing specific source code for each data source and on repeated interactions between domain experts and computer-science experts, are inadequate: they are time-consuming and error-prone. This paper presents a novel approach to ETL, based on its decomposition into two phases: ETD (Extraction, Transformation and Data Delivery) and IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert. Applying ETD requires mainly domain expertise, while computer-science expertise is concentrated in the IL phase, which links the processed data to target system models; this enables a clearer separation of concerns. The paper also describes how ETD has been integrated, tested and validated in a space domain project, currently operational at the European Space Agency for the Galileo Mission.
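
The split between a declarative ETD phase and a separate IL phase can be illustrated with a minimal Python sketch. This is not the paper's ETD language or graphical tool: the rule format in `SPEC`, the sample text, the transformation names, and the functions `run_etd` and `load` are all hypothetical, chosen only to show extraction rules expressed as data rather than as source code written per data source.

```python
import re
from datetime import datetime

# Hypothetical sample of a human-readable, semi-structured text file.
SAMPLE = """\
# Daily report (illustrative data only)
Issued: 2007-01-15
Station: ALPHA
Date        Index
2007-01-14  12
2007-01-15  7
"""

# Hypothetical declarative specification: data, not code. A domain expert
# would edit patterns and transformation names, not parsing logic.
SPEC = {
    "fields": {
        "issued": {"pattern": r"^Issued:\s*(\S+)", "transform": "date"},
        "station": {"pattern": r"^Station:\s*(\S+)", "transform": "lower"},
    },
    "table": {
        "row_pattern": r"^(\d{4}-\d{2}-\d{2})\s+(\d+)\s*$",
        "columns": [("date", "date"), ("index", "int")],
    },
}

# Named transformations that the specification may refer to.
TRANSFORMS = {
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
    "int": int,
    "lower": str.lower,
}


def run_etd(text, spec):
    """ETD stand-in: extract, transform, and deliver data as a plain dict."""
    delivered = {"rows": []}
    for name, rule in spec["fields"].items():
        match = re.search(rule["pattern"], text, flags=re.MULTILINE)
        if match:
            delivered[name] = TRANSFORMS[rule["transform"]](match.group(1))
    table = spec["table"]
    for match in re.finditer(table["row_pattern"], text, flags=re.MULTILINE):
        row = {}
        for (column, kind), raw in zip(table["columns"], match.groups()):
            row[column] = TRANSFORMS[kind](raw)
        delivered["rows"].append(row)
    return delivered


def load(delivered, target):
    """IL stand-in: map the delivered data onto a target structure."""
    for row in delivered["rows"]:
        target.append(
            (delivered["station"], row["date"].isoformat(), row["index"])
        )
    return target


if __name__ == "__main__":
    print(load(run_etd(SAMPLE, SPEC), []))
```

In this sketch, supporting a new text format means writing a new `SPEC` while `run_etd` stays unchanged, and only `load` needs knowledge of the target system, which mirrors the separation of concerns the abstract describes.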


Paper Citation


in Harvard Style

Raminhos R. and Moura-Pires J. (2007). EXTRACTION AND TRANSFORMATION OF DATA FROM SEMI-STRUCTURED TEXT FILES USING A DECLARATIVE APPROACH. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-88-7, pages 199-205. DOI: 10.5220/0002364201990205


in Bibtex Style

@conference{iceis07,
author={R. Raminhos and J. Moura-Pires},
title={EXTRACTION AND TRANSFORMATION OF DATA FROM SEMI-STRUCTURED TEXT FILES USING A DECLARATIVE APPROACH},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2007},
pages={199-205},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002364201990205},
isbn={978-972-8865-88-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - EXTRACTION AND TRANSFORMATION OF DATA FROM SEMI-STRUCTURED TEXT FILES USING A DECLARATIVE APPROACH
SN - 978-972-8865-88-7
AU - Raminhos R.
AU - Moura-Pires J.
PY - 2007
SP - 199
EP - 205
DO - 10.5220/0002364201990205