A Semi-supervised Learning Framework to Cluster Mixed Data Types

Artur Abdullin, Olfa Nasraoui


We propose a semi-supervised framework for clustering data that come in diverse formats or have mixed-type attributes. Preliminary results on data with mixed numerical and categorical attributes show that the proposed framework improves clustering results in the categorical domain, which suggests that the seeds obtained from clustering the numerical domain provide additional knowledge to the categorical clustering algorithm. Additional results show that our approach has the potential to outperform both clustering either domain on its own and clustering both domains after converting them to a single target domain.
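To make the idea of passing seeds from the numerical domain to the categorical domain concrete, the following is a minimal sketch and not the authors' implementation. It assumes k-means for the numerical attributes, a k-modes style procedure with Hamming distance for the categorical attributes, and seeds chosen as the records closest to each numerical centroid. All of these choices, the function names (`kmeans`, `pick_seeds`, `seeded_kmodes`), the `n_seeds` parameter, and the toy data are hypothetical and introduced only for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


def pick_seeds(points, labels, centroids, n_seeds):
    """Assumed seed rule: the n_seeds records nearest each centroid."""
    seeds = {}
    for j, c in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == j]
        members.sort(key=lambda i: sq_dist(points[i], c))
        seeds[j] = members[:n_seeds]
    return seeds


def mode(records):
    """Most frequent value of each categorical attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def seeded_kmodes(records, seeds, iters=20):
    """k-modes whose initial modes come from the seed records.

    Seeds are used only for initialization here; they are free to
    change cluster in later iterations (an assumption of this sketch).
    """
    k = len(seeds)
    modes = [mode([records[i] for i in seeds[j]]) for j in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: hamming(r, modes[j]))
                  for r in records]
        for j in range(k):
            members = [r for r, l in zip(records, labels) if l == j]
            if members:
                modes[j] = mode(members)
    return labels, modes


# Toy mixed-type data: row i of each list describes the same object.
numeric = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
           (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
categorical = [("red", "small", "round"), ("red", "small", "round"),
               ("blue", "small", "round"), ("blue", "large", "square"),
               ("blue", "large", "square"), ("red", "large", "square")]

num_labels, centroids = kmeans(numeric, k=2)
seeds = pick_seeds(numeric, num_labels, centroids, n_seeds=2)
cat_labels, modes = seeded_kmodes(categorical, seeds)
print("numerical labels:  ", num_labels)
print("seed indices:      ", seeds)
print("categorical labels:", cat_labels)
```

In this sketch the knowledge flows in one direction only, from the numerical domain to the categorical one, and the seeds merely initialize the categorical modes. Stricter variants that keep seed assignments fixed would be an equally plausible reading of "semi-supervised" in this setting.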


