Extraction and Multidimensional Analysis of Data from Unstructured Data Sources: A Case Study

Rui Lima, Estrela Cruz

2019

Abstract

This paper proposes an approach for detecting and extracting data from unstructured data sources on a given subject of study that are available online and spread across several Web pages, and for aggregating and storing that data in a Data Warehouse designed for the purpose. The Data Warehouse serves as the basis for Business Intelligence and Data Mining analysis. The extracted data may be complemented with information from other sources to enrich the analysis and support new and more interesting conclusions. The proposed process is then applied to a case study comprising the results of athletics events held in Portugal over the last 12 years. The competition result files are available online, spread across the websites of the several athletics associations. Almost all files are published in Portable Document Format (PDF), and each association uses its own internal format. The case study also proposes a mechanism for integrating the results of athletics events with the geographic location and atmospheric conditions of each event, making it possible to assess and analyze how atmospheric and geographical conditions affect the results achieved by the athletes.
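As a rough illustration of the extract-then-store flow described in the abstract, the following minimal Python sketch parses result lines and loads them into a small star schema. It is not the authors' implementation. It assumes the text has already been pulled out of a PDF, and the line layout, the table and column names, and the sample athletes are all hypothetical.

```python
import re
import sqlite3

# Assumed (hypothetical) line layout: "<place> <athlete name> <CLUB> <mark in seconds>"
LINE_RE = re.compile(r"^(\d+)\s+(.+?)\s+([A-Z]{2,5})\s+(\d+\.\d+)$")

# Made-up sample standing in for text already extracted from one results PDF.
SAMPLE = """Pos Name Club Mark
1 Ana Silva SCP 12.34
2 Rui Costa SLB 12.50
"""

def parse_results(text):
    """Return (place, athlete, club, mark) tuples for lines matching the assumed layout."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # headers and other non-matching lines are skipped
            rows.append((int(m.group(1)), m.group(2), m.group(3), float(m.group(4))))
    return rows

def load(conn, event, rows):
    """Store parsed rows in a tiny star schema: two dimension tables and one fact table."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS dim_athlete (id INTEGER PRIMARY KEY, name TEXT UNIQUE, club TEXT);
        CREATE TABLE IF NOT EXISTS dim_event (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS fact_result (athlete_id INTEGER, event_id INTEGER, place INTEGER, mark REAL);
    """)
    conn.execute("INSERT OR IGNORE INTO dim_event (name) VALUES (?)", (event,))
    event_id = conn.execute("SELECT id FROM dim_event WHERE name = ?", (event,)).fetchone()[0]
    for place, athlete, club, mark in rows:
        conn.execute("INSERT OR IGNORE INTO dim_athlete (name, club) VALUES (?, ?)", (athlete, club))
        athlete_id = conn.execute("SELECT id FROM dim_athlete WHERE name = ?", (athlete,)).fetchone()[0]
        conn.execute("INSERT INTO fact_result VALUES (?, ?, ?, ?)", (athlete_id, event_id, place, mark))
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, "100 m", parse_results(SAMPLE))
    # Example multidimensional query: best mark per club for each event.
    query = """
        SELECT e.name, a.club, MIN(f.mark)
        FROM fact_result f
        JOIN dim_athlete a ON a.id = f.athlete_id
        JOIN dim_event e ON e.id = f.event_id
        GROUP BY e.name, a.club
    """
    for row in conn.execute(query):
        print(row)
```

In a fuller pipeline, each association's file format would presumably need its own parsing rule, and further dimensions (for example location and weather) could be joined to the fact table in the same way.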


Paper Citation


in Harvard Style

Lima R. and Cruz E. (2019). Extraction and Multidimensional Analysis of Data from Unstructured Data Sources: A Case Study. In Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-372-8, pages 190-199. DOI: 10.5220/0007720301900199


in Bibtex Style

@conference{iceis19,
author={Rui Lima and Estrela Cruz},
title={Extraction and Multidimensional Analysis of Data from Unstructured Data Sources: A Case Study},
booktitle={Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2019},
pages={190-199},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007720301900199},
isbn={978-989-758-372-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Extraction and Multidimensional Analysis of Data from Unstructured Data Sources: A Case Study
SN - 978-989-758-372-8
AU - Lima R.
AU - Cruz E.
PY - 2019
SP - 190
EP - 199
DO - 10.5220/0007720301900199