EVALUATION OF TEXT CLASSIFICATION ALGORITHMS - for a Web-based Market Data Warehouse

Carsten Felden, Peter Chamoni

2005

Abstract

Decision makers in enterprises struggle to cope with information overload. A market data information system (MAIS), which forms the foundation of a decision support system for German energy trading, uses search and filter components to supply enterprises with decision-relevant information from Web documents. The filter component, already implemented as a multilayer perceptron, has to be benchmarked against other existing algorithms in order to improve the classification of text documents. An evaluation environment with appropriate algorithms is developed for this purpose. In addition, a set of test data is provided, together with a selection of tools that implement different text mining algorithms for classification. The paper presents the benchmark results.



Paper Citation


in Harvard Style

Felden C. and Chamoni P. (2005). EVALUATION OF TEXT CLASSIFICATION ALGORITHMS - for a Web-based Market Data Warehouse. In Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 972-8865-20-1, pages 359-362. DOI: 10.5220/0001233903590362


in Bibtex Style

@conference{webist05,
author={Carsten Felden and Peter Chamoni},
title={EVALUATION OF TEXT CLASSIFICATION ALGORITHMS - for a Web-based Market Data Warehouse},
booktitle={Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2005},
pages={359-362},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001233903590362},
isbn={972-8865-20-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - EVALUATION OF TEXT CLASSIFICATION ALGORITHMS - for a Web-based Market Data Warehouse
SN - 972-8865-20-1
AU - Felden C.
AU - Chamoni P.
PY - 2005
SP - 359
EP - 362
DO - 10.5220/0001233903590362