DISTRIBUTED LEARNING ALGORITHM BASED ON DATA REDUCTION

Ireneusz Czarnowski, Piotr Jędrzejowicz

Abstract

The paper presents an approach to learning classifiers from distributed data, based on data reduction at the local level. In this setting, the aim of data reduction is to obtain a compact representation of the distributed data repositories that contains non-redundant information in the form of so-called prototypes. Data reduction is carried out by simultaneously selecting instances and features, producing prototypes that need not be homogeneous and can include different sets of features. From these prototypes, a global classifier based on feature voting is constructed. To evaluate and compare the proposed approach, a computational experiment was carried out. The experimental results indicate that data reduction at the local level, followed by merging the prototypes into a global classifier, can produce very good classification results.
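
The pipeline outlined in the abstract (local reduction to prototypes, then a voting-based global classifier) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how instances and features are selected or how feature voting works. The sketch assumes that each site is handed a fixed feature subset, keeps the one instance per class closest to the class mean, and casts a single nearest-prototype vote at classification time. The names `reduce_site`, `classify`, `sites`, and `prototype_sets` are hypothetical.

```python
from collections import Counter


def reduce_site(rows, labels, features):
    """Local data reduction (assumed scheme): for each class, keep the
    instance closest to the class mean, restricted to `features`."""
    prototypes = []
    for label in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == label]
        mean = [sum(r[f] for r in members) / len(members) for f in features]
        best = min(
            members,
            key=lambda r: sum((r[f] - m) ** 2 for f, m in zip(features, mean)),
        )
        # A prototype stores its own feature indices, so prototypes from
        # different sites may describe different sets of features.
        prototypes.append((features, tuple(best[f] for f in features), label))
    return prototypes


def classify(prototype_sets, x):
    """Global classifier (assumed voting rule): every site's prototype set
    votes for the label of its nearest prototype; the majority label wins."""
    votes = Counter()
    for prototypes in prototype_sets:
        _, _, label = min(
            prototypes,
            key=lambda p: sum((x[f] - v) ** 2 for f, v in zip(p[0], p[1])),
        )
        votes[label] += 1
    return votes.most_common(1)[0][0]


# Three hypothetical sites: (rows, labels, feature subset chosen at that site).
sites = [
    ([(1.0, 1.0, 5.0), (1.2, 0.8, 5.1), (4.0, 4.0, 5.0), (4.2, 3.8, 4.9)],
     ["a", "a", "b", "b"], (0, 1)),
    ([(0.9, 9.0, 1.0), (1.1, 9.0, 1.1), (4.1, 9.0, 4.0), (3.9, 9.0, 4.2)],
     ["a", "a", "b", "b"], (0, 2)),
    ([(0.0, 1.0, 0.0), (0.0, 1.1, 0.0), (0.0, 4.0, 0.0), (0.0, 3.9, 0.0)],
     ["a", "a", "b", "b"], (1,)),
]

# Only the prototypes leave each site; the raw rows stay local.
prototype_sets = [reduce_site(rows, labels, feats) for rows, labels, feats in sites]
print(classify(prototype_sets, (1.0, 1.0, 1.0)))
print(classify(prototype_sets, (4.0, 4.0, 4.0)))
```

In this sketch only two prototypes per site reach the global level, which mirrors the idea of merging compact local representations instead of moving the full repositories.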


Paper Citation


in Harvard Style

Czarnowski I. and Jędrzejowicz P. (2009). DISTRIBUTED LEARNING ALGORITHM BASED ON DATA REDUCTION. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 198-203. DOI: 10.5220/0001655401980203


in Bibtex Style

@conference{icaart09,
author={Ireneusz Czarnowski and Piotr Jędrzejowicz},
title={DISTRIBUTED LEARNING ALGORITHM BASED ON DATA REDUCTION},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={198-203},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001655401980203},
isbn={978-989-8111-66-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - DISTRIBUTED LEARNING ALGORITHM BASED ON DATA REDUCTION
SN - 978-989-8111-66-1
AU - Czarnowski I.
AU - Jędrzejowicz P.
PY - 2009
SP - 198
EP - 203
DO - 10.5220/0001655401980203