Obviously, the Support Vector Machine of test No. 6 has the highest Fβ-value. Most of the algorithms reached their best result in this test (e.g. Voted Perceptron and Simple Logistic with a value of 0.7615, Rocchio with 0.7261, or k-NN with 0.6395). HyperPipes (0.6865) and AdaBoost.M1 (0.7360) reached their best results in test No. 3. The MLP (0.5000) and J48 (0.7135) achieved their best results in test No. 1.
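The Fβ-value used for these comparisons is the weighted harmonic mean of precision and recall. A minimal sketch of the computation (the function name and the default β are illustrative, not taken from the paper):

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall higher, beta < 1 weights precision higher;
    beta = 1 gives the familiar F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, a classifier with precision 0.8 and recall 0.7 scores roughly 0.747 at β = 1, which is in the range of the values reported above.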
Looking at the classification including the pre-processing, the perspective has to change. Test No. 1 gives good results, but requires enormous additional pre-processing effort. Test No. 9 yields an Fβ-value of 0.7865, which means that comparable results can be obtained for the same algorithm with less pre-processing. These results are in line with other studies, where the SVM is often the best classifier. As the MLP cannot reach the former value of 0.818 from the first implementation of MAIS, it is replaced by the Naïve Bayes algorithm, which now builds the filter component with less pre-processing effort.
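The Naïve Bayes filter component mentioned above can be sketched as a multinomial Naïve Bayes classifier over document tokens. This is an illustrative reconstruction under common assumptions (binary relevant/irrelevant labels, whitespace-style tokenization, Laplace smoothing), not the actual MAIS implementation:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesFilter:
    """Minimal multinomial Naive Bayes for a relevant/irrelevant document filter."""

    def __init__(self):
        self.class_counts = Counter()              # documents per class
        self.word_counts = defaultdict(Counter)    # token frequencies per class
        self.vocab = set()

    def train(self, tokens, label):
        """Add one tokenized training document with its class label."""
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def classify(self, tokens):
        """Return the class with the highest log-posterior for the tokens."""
        total_docs = sum(self.class_counts.values())
        best_label, best_lp = None, float("-inf")
        for label in self.class_counts:
            lp = math.log(self.class_counts[label] / total_docs)  # log prior
            n = sum(self.word_counts[label].values())
            v = len(self.vocab)
            for t in tokens:
                # Laplace (add-one) smoothing avoids zero probabilities
                lp += math.log((self.word_counts[label][t] + 1) / (n + v))
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label
```

Such a filter needs little more pre-processing than tokenization, which illustrates why a Naïve Bayes component is attractive when heavy pre-processing is to be avoided.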
4 CONCLUSION
The recent development of analytical information systems shows that the necessary integration of structured and unstructured data sources in data warehousing is possible; the implementation of MAIS has proved this. Only documents of decision relevance should be delivered to the management. The ROI of data warehouse projects can be increased if event-based and accepted information improves the decision quality significantly. The information flow alignment in MAIS is equivalent to a classification problem. The quality of the classification algorithms must be examined at regular intervals to guarantee the best results. Therefore it is necessary to optimize the structure of the test environment, which has to support an intersubjective and intertemporal comparability of the test results. Classification evaluations are carried out frequently; however, their results are only meaningful in the context of the selected data set and evaluation environment. In order to obtain concrete statements for MAIS, such an evaluation environment and its results are described in this paper. Since the most relevant documents have to be found, not only the classification itself but also the Internet retrieval has to be optimized in order to identify suitable search terms.
WEBIST 2005 - WEB INTERFACES AND APPLICATIONS