INFORMED k-MEANS: A CLUSTERING PROCESS BIASED BY PRIOR KNOWLEDGE - A case study in the dactyloscopic domain

Wagner Francisco Castilho, Hércules Antônio do Prado, Marcelo Ladeira

2004

Abstract

Knowledge Discovery in Databases (KDD) is the process by which previously unknown and useful knowledge is extracted, by automatic or semi-automatic methods, from large amounts of data. With the evolution of Information Technology and the rapid growth in the number and size of databases, the development of methodologies, techniques, and tools for data mining has become a major concern for researchers and has led, in turn, to applications in a variety of areas of human activity. Around 1997, the KDD community began to research the processes and techniques associated with cluster analysis with increasing intensity. Within a model intended to support decisions based on cluster analysis, prior knowledge about the data structure and the application domain can serve as an important constraint that leads to better cluster configurations. This paper presents an application of cluster analysis in the area of public safety, using a schema that takes into account prior knowledge acquired from statistical analysis of the data. This information was used as a bias for the k-means algorithm, which was applied to identify the dactyloscopic (fingerprint) profile of criminals in the Brazilian capital, also known as the Federal District. The results were then compared with those of a similar analysis that disregarded the prior knowledge. The analysis using prior knowledge generated clusters that are more coherent with expert knowledge.
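
The abstract does not say how the prior knowledge enters the k-means procedure. One common way to bias k-means is to start it from centroids suggested by earlier analysis instead of from random points. The sketch below illustrates only that idea, under that assumption; it is not the authors' method, and the function names, the toy two-dimensional data, and the `prior_centroids` values are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Plain Lloyd-style k-means.

    If initial_centroids is given, the run starts from those centroids
    (the "informed" variant assumed here); otherwise it starts from k
    points sampled at random from the data.
    """
    if initial_centroids is not None:
        if len(initial_centroids) != k:
            raise ValueError("need exactly k initial centroids")
        centroids = [tuple(c) for c in initial_centroids]
    else:
        rng = random.Random(seed)
        centroids = [tuple(p) for p in rng.sample(points, k)]

    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels


# Hypothetical toy data: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

# Hypothetical prior knowledge: rough group centers from earlier analysis.
prior_centroids = [(1.0, 1.0), (5.0, 5.0)]

centroids, labels = kmeans(points, 2, initial_centroids=prior_centroids)
```

Calling `kmeans(points, 2)` without `initial_centroids` gives the uninformed baseline, so the two runs can be compared on the same data, in the spirit of the comparison the abstract describes.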



Paper Citation


in Harvard Style

Francisco Castilho W., Antônio do Prado H. and Ladeira M. (2004). INFORMED k-MEANS: A CLUSTERING PROCESS BIASED BY PRIOR KNOWLEDGE - A case study in the dactyloscopic domain. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-00-7, pages 469-475. DOI: 10.5220/0002646704690475


in Bibtex Style

@conference{iceis04,
author={Wagner Francisco Castilho and Hércules Antônio do Prado and Marcelo Ladeira},
title={INFORMED k-MEANS: A CLUSTERING PROCESS BIASED BY PRIOR KNOWLEDGE - A case study in the dactyloscopic domain},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2004},
pages={469-475},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002646704690475},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - INFORMED k-MEANS: A CLUSTERING PROCESS BIASED BY PRIOR KNOWLEDGE - A case study in the dactyloscopic domain
SN - 972-8865-00-7
AU - Francisco Castilho W.
AU - Antônio do Prado H.
AU - Ladeira M.
PY - 2004
SP - 469
EP - 475
DO - 10.5220/0002646704690475