Authors:
Mustapha Bouakkaz¹, Sabine Loudcher² and Youcef Ouinten¹
Affiliations:
¹ University of Laghouat, Algeria; ² University of Lyon 2, France
Keyword(s):
OLAP, Textual Data, Aggregation Function, Google Similarity.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Data Mining; Data Warehouses and OLAP; Databases and Information Systems Integration; Enterprise Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
With the tremendous growth of unstructured data in Business Intelligence, there is a need to incorporate textual data into data warehouses, to provide appropriate multidimensional analysis (OLAP), and to develop new approaches that take the textual content of data into account. This will provide textual measures to users who wish to analyse documents online. In this paper, we propose a new aggregation function for textual data in an OLAP context. To aggregate keywords, our contribution is to use a data mining technique, k-means, but with a distance based on the Google similarity distance. Our approach thus takes the semantic similarity of keywords into account when aggregating them. The performance of our approach is analysed and compared with that of another method, which uses the k-bisecting clustering algorithm and is based on the Jensen-Shannon divergence between probability distributions. The experimental study shows that our approach achieves better performance in terms of recall, precision, F-measure, complexity and runtime.
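To make the idea of clustering keywords under a Google-based distance concrete, here is a minimal Python sketch. It assumes the standard Normalized Google Distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(.) are page-hit counts and N is the index size. Because NGD only yields pairwise distances (there is no vector space in which to average keywords), the sketch swaps in a k-medoids-style update, where each cluster's representative is a keyword that minimises total in-cluster distance. This is not necessarily the authors' exact procedure. All names (`ngd`, `distance_matrix`, `cluster_keywords`) and all hit counts are hypothetical, invented for illustration rather than taken from a search engine or from the paper.

```python
import math
import random
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts (fx, fy: pages with x / y;
    fxy: pages with both; n: assumed total number of indexed pages)."""
    if fxy == 0:
        return math.inf  # keywords never co-occur: treat as maximally distant
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def distance_matrix(keywords, hits, co_hits, n):
    """Symmetric NGD matrix as a dict of dicts; co_hits is keyed by frozenset pairs."""
    dist = {w: {w: 0.0} for w in keywords}
    for a, b in combinations(keywords, 2):
        d = ngd(hits[a], hits[b], co_hits[frozenset((a, b))], n)
        dist[a][b] = dist[b][a] = d
    return dist


def cluster_keywords(keywords, dist, k, max_iter=100, seed=0):
    """k-means-style loop with medoids (actual keywords) instead of mean centroids."""
    rng = random.Random(seed)
    medoids = rng.sample(keywords, k)
    for _ in range(max_iter):
        # Assignment step: each keyword joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for w in keywords:
            nearest = min(medoids, key=lambda m: dist[w][m])
            clusters[nearest].append(w)
        # Update step: the new medoid minimises the summed distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][o] for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break  # converged: representatives no longer change
        medoids = new_medoids
    return clusters


# Hypothetical toy counts: pairs within a topic co-occur often, across topics rarely.
tech = ["olap", "cube", "warehouse"]
bio = ["protein", "gene", "enzyme"]
keywords = tech + bio
hits = {w: 1_000_000 for w in keywords}
co_hits = {
    frozenset(p): 500_000 if (p[0] in tech) == (p[1] in tech) else 1_000
    for p in combinations(keywords, 2)
}
clusters = cluster_keywords(keywords, distance_matrix(keywords, hits, co_hits, 10**10), k=2)
for medoid, members in clusters.items():
    print(medoid, sorted(members))
```

With these invented counts, keywords from the same topic end up about 0.075 apart and keywords from different topics about 0.75 apart, so the loop separates the two topics. One could, for instance, use each cluster's medoid as the aggregated keyword, but that choice is an assumption of this sketch.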