Authors: Mirela Teixeira Cazzolato ; Marcela Xavier Ribeiro ; Cristiane Yaguinuma and Marilde Terezinha Prado Santos

Affiliation: Federal University of São Carlos, Brazil

ISBN: 978-989-8565-59-4

ISSN: 2184-4992

Keyword(s): Data Stream Mining, Classification, Decision Tree, VFDT, StARMiner Tree, Anytime Algorithm.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Data Engineering ; Data Mining ; Databases and Data Security ; Databases and Information Systems Integration ; Enterprise Information Systems ; Large Scale Databases ; Sensor Networks ; Signal Processing ; Soft Computing

Abstract: A large amount of data is generated daily. Credit card transactions, monitoring networks, sensors and telecommunications are some examples among many applications that generate large volumes of data in an automated way. Data stream storage and knowledge extraction techniques differ from those used on traditional data. In the context of data stream classification, many incremental techniques have been proposed. In this paper we present an incremental decision tree algorithm called StARMiner Tree (ST), which is based on the Very Fast Decision Tree (VFDT) system. ST deals with numerical data and uses a statistics-based method as a heuristic to decide when to split a node and to choose the best attribute for the test at that node. We applied ST to four datasets, two synthetic and two real-world, comparing its performance to that of VFDT. In all experiments ST achieved better accuracy, dealing well with noisy data and describing the data well from the earliest examples. However, in three of the four experiments ST created a larger tree. The results indicate that ST is a good classifier on both large and smaller datasets, maintaining good accuracy and execution time.

