BUILDING SCALABLE DATA MINING GRID APPLICATIONS - An Application Description Schema and Associated Grid Services

Vlado Stankovski, Dennis Wegener

Abstract

Grid-enabling existing stand-alone data mining programs, data, and other resources such as computational servers is motivated by the possibility of sharing them over local and wide area networks. The expected benefits are improved effectiveness and efficiency, wider access, and better use of existing resources. This paper investigates how to grid-enable a variety of existing data mining programs. The presented solution is a simple procedure developed in the DataMiningGrid project. The data mining program, a batch-style executable, is uploaded to a grid server, and an XML document describing the program is prepared and registered with the underlying grid information services. The XML document conforms to an Application Description Schema and is used to facilitate discovery and execution of the program in the grid environment. Over 20 stand-alone data mining programs have already been grid-enabled with the DataMiningGrid system. With Triana, a workflow editor and manager that serves as the end-user interface to the grid infrastructure, grid-enabled data mining programs and data can be combined into complex data mining applications. Grid-enabled resource sharing may facilitate novel, scalable, distributed data mining applications that were not possible before.
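The description step in the procedure above can be illustrated with a minimal sketch. It assumes a batch-style program with a few command-line options and builds an XML document for it using Python's standard library. The element and attribute names (applicationDescription, name, executable, options, option, flag, type) and the program name are hypothetical placeholders chosen for illustration; they do not reproduce the actual Application Description Schema, and registration with grid information services is not shown.

```python
import xml.etree.ElementTree as ET


def build_description(name, executable, options):
    """Build an illustrative XML description of a batch-style program.

    All element and attribute names are hypothetical; they are NOT taken
    from the real Application Description Schema.
    """
    root = ET.Element("applicationDescription")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "executable").text = executable
    opts = ET.SubElement(root, "options")
    for flag, data_type in options:
        # One element per command-line option the executable accepts.
        ET.SubElement(opts, "option", {"flag": flag, "type": data_type})
    return root


# Hypothetical program and options, used only to exercise the sketch.
description = build_description(
    "example-classifier",
    "classifier.jar",
    [("-input", "file"), ("-k", "int")],
)
xml_text = ET.tostring(description, encoding="unicode")
print(xml_text)
```

A document of this kind would be what a discovery service searches over, which is why the options are recorded as structured elements rather than as a free-text command line.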


Paper Citation


in Harvard Style

Stankovski V. and Wegener D. (2008). BUILDING SCALABLE DATA MINING GRID APPLICATIONS - An Application Description Schema and Associated Grid Services. In Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8111-51-7, pages 221-228. DOI: 10.5220/0001891302210228


in Bibtex Style

@conference{icsoft08,
author={Vlado Stankovski and Dennis Wegener},
title={BUILDING SCALABLE DATA MINING GRID APPLICATIONS - An Application Description Schema and Associated Grid Services},
booktitle={Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2008},
pages={221-228},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001891302210228},
isbn={978-989-8111-51-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - BUILDING SCALABLE DATA MINING GRID APPLICATIONS - An Application Description Schema and Associated Grid Services
SN - 978-989-8111-51-7
AU - Stankovski V.
AU - Wegener D.
PY - 2008
SP - 221
EP - 228
DO - 10.5220/0001891302210228