Authors: Emmanuel Bresso (1); Sidahmed Benabderrahmane (2); Malika Smail-Tabbone (2); Gino Marchetti (2); Arnaud Sinan Karaboga (2); Michel Souchet (3); Amedeo Napoli (2) and Marie-Dominique Devignes (2)
Affiliations: (1) LORIA UMR 7503, CNRS, Nancy Université, INRIA NGE and Harmonic Pharma, France; (2) LORIA UMR 7503, CNRS, Nancy Université and INRIA NGE, France; (3) Harmonic Pharma, France
Keyword(s): Dimension reduction, Clustering, Semantic similarity, Drug side effects.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; BioInformatics & Pattern Discovery; Data Reduction and Quality Assessment; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining High-Dimensional Data; Symbolic Systems
Abstract:
High dimensionality of datasets can impair the execution of most data mining programs and lead to numerous, complex patterns that are unsuitable for interpretation by experts. Dimension reduction of datasets is therefore an important research direction, in which the role of domain knowledge is essential. We present a new approach for reducing the dimensions of a dataset by exploiting semantic relationships between terms of an ontology structured as a rooted directed acyclic graph. Term clustering is performed using the recently described IntelliGO similarity measure, and the resulting term clusters are then used as descriptors for data representation. The strategy is applied to a set of drugs associated with their side effects, collected from the SIDER database. The terms describing side effects belong to the MedDRA terminology. Hierarchical clustering of about 1,200 MedDRA terms into an optimal collection of 112 term clusters leads to a reduced data representation. Two data mining experiments are then conducted to illustrate the advantage of using this reduced representation.
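The general idea of the abstract (cluster ontology terms by semantic similarity, then describe each object by term clusters instead of individual terms) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology, the drug names, the threshold, and in particular the similarity function, which is a simple Jaccard index over ancestor sets standing in for the IntelliGO measure rather than reproducing it.

```python
from itertools import combinations

# Hypothetical toy ontology (term -> parents) forming a rooted DAG.
# These are illustrative labels, not actual MedDRA content.
PARENTS = {
    "root": [],
    "cardiac": ["root"],
    "gastro": ["root"],
    "tachycardia": ["cardiac"],
    "bradycardia": ["cardiac"],
    "nausea": ["gastro"],
    "vomiting": ["gastro"],
}

def ancestors(term):
    """Return the term together with all of its ancestors up to the root."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def similarity(a, b):
    """Stand-in similarity: Jaccard index of ancestor sets (NOT IntelliGO)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

def linkage(c1, c2):
    """Average pairwise similarity between two clusters."""
    total = sum(similarity(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def cluster_terms(terms, threshold):
    """Agglomerative clustering; stop when the best merge falls below threshold."""
    clusters = [frozenset([t]) for t in terms]
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(c1, c2) < threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

def reduce_representation(drug_terms, clusters):
    """Describe each drug by the indices of the clusters its terms belong to."""
    index = {t: i for i, c in enumerate(clusters) for t in c}
    return {d: sorted({index[t] for t in ts}) for d, ts in drug_terms.items()}

# Hypothetical drugs and side-effect annotations.
drug_terms = {
    "drugA": ["tachycardia", "bradycardia"],
    "drugB": ["nausea", "tachycardia"],
}
terms = ["tachycardia", "bradycardia", "nausea", "vomiting"]
clusters = cluster_terms(terms, threshold=0.4)  # threshold chosen arbitrarily
reduced = reduce_representation(drug_terms, clusters)
```

In this sketch the four leaf terms collapse into two descriptors, so each drug is represented over two dimensions rather than four. How the optimal number of clusters is selected in the paper is not covered here; the fixed threshold is only a placeholder for that step.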