materials. Data mining therefore has
deep roots in fields such as artificial
intelligence, machine learning, statistics and
databases. Data mining is the process of
applying these methods to data with a view to
uncovering hidden patterns. In other
words, data mining is the process of
extracting patterns from data. Data mining
is becoming an increasingly important tool for
converting data into information. It is often used
in various profiling practices, such as marketing,
surveillance, fraud detection and scientific
discovery. It has been used for years by
businesses, scientists and governments to sift
through volumes of data such as airline passenger
travel records, census data and supermarket scanner
data in order to generate market research reports.
The main reason for using data mining is to assist
in the analysis of collections of behavioural
observations. Such data are susceptible to
collinearity because of known associations.
2.2 Non-linear Classification and Regression
These two methodologies consist of a set
of techniques for prediction that fit linear and
non-linear combinations of basis functions (sigmoids,
splines, polynomials) to combinations of the input
variables. Examples include feed-forward neural
networks, adaptive spline methods, and projection
pursuit regression. Neural networks, for example, can
generate non-linear decision boundaries.
The non-linear regression
methodology, although powerful in its
representation, may be difficult to interpret. If
the model space is widened to allow more
general expressions (e.g. multivariate
hyperplanes at various angles), then the model
is more powerful for prediction, but
it may be more difficult for the user to
interpret.
Figure 1. Process Stages of KDD (knowledge discovery in databases)
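The idea of fitting a combination of basis functions, as described in Section 2.2, can be illustrated with a minimal sketch. It assumes a fixed polynomial basis (1, x, x^2) and ordinary least squares solved through the normal equations; the names BASIS, solve_linear_system, fit and predict are hypothetical and are not taken from the original work. Sigmoid or spline terms could be appended to BASIS in the same way. Methods such as neural networks also adapt the basis functions themselves, which this sketch does not attempt.

```python
# Illustrative sketch only: least-squares fit of a linear combination of
# fixed basis functions. All names here are assumptions for illustration.

# Basis functions of a single input x: constant, linear and quadratic terms.
BASIS = [
    lambda x: 1.0,
    lambda x: x,
    lambda x: x * x,
]


def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - known) / m[i][i]
    return w


def fit(xs, ys):
    """Return the weights that minimise the squared error over the basis."""
    rows = [[phi(x) for phi in BASIS] for x in xs]
    k = len(BASIS)
    # Normal equations: (A^T A) w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(ata, aty)


def predict(weights, x):
    """Evaluate the fitted combination of basis functions at x."""
    return sum(w * phi(x) for w, phi in zip(weights, BASIS))
```

The fitted weights are easy to read for a small polynomial basis. As more flexible basis functions are added, the model can represent more shapes, but the individual weights become harder to interpret, which is the trade-off noted above.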
2.3 Decision Tree
The concept of the decision tree is one of the
earliest techniques of decision analysis. The trie was
first introduced in the 1960s by Fredkin. The name
trie, or digital tree, is derived from the word
retrieval, in accordance with its function. Following
this etymology, the word was originally pronounced
'tree', although it is often pronounced 'try' in order
to distinguish it from the general tree. In computer
science, the trie, or prefix tree, is an ordered tree
data structure used to store
an associative array whose keys are strings. Unlike in
a binary search tree (BST), no node in
the tree stores the key associated with
that node; instead, the position of each node
in the tree determines its key. All descendants of a
node share the string associated with that node as a
common prefix, and the root is associated with the
empty string.
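The trie structure described here can be sketched as follows. This is a minimal illustration and not code from the original work; the names TrieNode, Trie, insert, get and keys_with_prefix are hypothetical. Each node holds only its children, keyed by one character, so a key is defined by the path from the root rather than stored in a node.

```python
# Illustrative sketch only: a minimal trie (prefix tree) over strings.
# Class and method names are assumptions made for this example.

class TrieNode:
    def __init__(self):
        self.children = {}   # maps a single character to a child node
        self.is_key = False  # True if the path to this node is a stored key
        self.value = None    # value attached to that key, if any


class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root corresponds to the empty string

    def insert(self, key, value):
        """Store value under key, creating nodes along the path as needed."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True
        node.value = value

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key, default=None):
        """Return the value stored under key, or default if key is absent."""
        node = self._find(key)
        if node is not None and node.is_key:
            return node.value
        return default

    def keys_with_prefix(self, prefix):
        """Return all stored keys that start with prefix, sorted."""
        start = self._find(prefix)
        if start is None:
            return []
        found = []
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_key:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

For example, after inserting "tea", "ten" and "to", the keys "tea" and "ten" share the nodes for "t" and "te", and a lookup of "te" finds a node but no stored value.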
Values are usually not stored in every node,
but only in leaves and in some inner nodes that
correspond to particular keys. The decision tree uses
the ID3 or C4.5 algorithm, both of which were
introduced and developed by Quinlan. ID3
stands for Iterative Dichotomiser 3, or Induction
of Decision "3" (read: Tree). The ID3 algorithm
builds a decision tree recursively from top to
bottom using a divide-and-conquer strategy. The
strategy for building a decision tree with the ID3
algorithm is as follows. The tree starts as a single
node (root) that represents all data. After the root
node is formed, the information gain of each
attribute is measured on the data at the root node to
select which attribute will be the splitting
attribute. A branch is formed for each value of the
selected splitting attribute, and the data are
distributed among the branches. (Jianwei Han,
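The ID3 steps described in Section 2.3 (start from a root holding all data, measure information gain, choose a splitting attribute, branch, and recurse) can be sketched as below. This is a minimal illustration under stated assumptions: records are dictionaries of categorical attributes, the function names and the toy weather data are hypothetical, and refinements found in C4.5 such as gain ratio, continuous attributes and pruning are not covered.

```python
# Illustrative sketch only: information gain and a recursive ID3-style
# tree builder for categorical data. Names and data are assumptions.
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


def id3(rows, attributes, target):
    """Build a tree as nested dicts; leaves are class labels."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:       # all records agree: make a leaf
        return labels[0]
    if not attributes:              # nothing left to split on: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({r[best] for r in rows}):  # one branch per value
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


# Hypothetical toy data, used only to exercise the functions above.
toy_rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
toy_tree = id3(toy_rows, ["outlook", "windy"], "play")
```

On this toy data, "outlook" has a higher information gain than "windy", so it becomes the splitting attribute at the root. Only the "rainy" branch still contains mixed classes and is split again on "windy".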
2.4 Decision Tree Model
The decision tree is one of the most common data
mining models. A decision tree is a flowchart-like
tree structure, where
each internal node denotes a test on an attribute,
each branch represents an outcome of the test, and
each leaf node represents a class or class
distribution. A path in the decision tree is
traced from the root node to the leaf node that