EVALUATION OF TEXT CLASSIFICATION ALGORITHMS - for a Web-based Market Data Warehouse
Carsten Felden, Peter Chamoni
2005
Abstract
Decision makers in enterprises struggle to cope with information overload. A market data information system (MAIS), which forms the foundation of a decision support system for German energy trading, uses search and filter components to extract decision-relevant information from Web documents for enterprises. The filter component, already implemented as a Multilayer Perceptron, must be benchmarked against other existing algorithms in order to improve the classification of text documents. An evaluation environment with appropriate algorithms is developed for this purpose. In addition, a set of test data is provided, and a selection of tools that implement different text mining algorithms for classification is presented. The paper reports the benchmark results.
Paper Citation
in Harvard Style
Felden C. and Chamoni P. (2005). EVALUATION OF TEXT CLASSIFICATION ALGORITHMS - for a Web-based Market Data Warehouse. In Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 972-8865-20-1, pages 359-362. DOI: 10.5220/0001233903590362
in Bibtex Style
@conference{webist05,
author={Carsten Felden and Peter Chamoni},
title={EVALUATION OF TEXT CLASSIFICATION ALGORITHMS - for a Web-based Market Data Warehouse},
booktitle={Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2005},
pages={359-362},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001233903590362},
isbn={972-8865-20-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - EVALUATION OF TEXT CLASSIFICATION ALGORITHMS - for a Web-based Market Data Warehouse
SN - 972-8865-20-1
AU - Felden C.
AU - Chamoni P.
PY - 2005
SP - 359
EP - 362
DO - 10.5220/0001233903590362