2. Data integration. Data from different databases is
combined into a single new database.
3. Data selection. Not all of the data in the database
is used, so only the data relevant to the analysis is
retrieved from the database.
4. Data transformation. Data is converted or merged
into a format suitable for processing in data mining.
5. Data mining. This is the main process, in which a
method is applied to find valuable and hidden
knowledge in the data.
6. Pattern evaluation. The interesting patterns found
are identified and turned into knowledge.
7. Knowledge presentation. The knowledge acquired is
visualized and presented to the user, together with
the methods used to obtain it.
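
As an illustration of steps 2 to 4, the minimal sketch below combines records from two sources, selects the fields relevant to the analysis, and transforms a numeric field into a categorical one. The record layout, the field names, and the income threshold are hypothetical and serve only to make the steps concrete.

```python
# Hypothetical records from two separate sources, keyed by customer id.
customers = [
    {"id": 1, "name": "Ani", "age": 34},
    {"id": 2, "name": "Budi", "age": 51},
]
purchases = [
    {"id": 1, "income": 4200, "bought": "yes"},
    {"id": 2, "income": 1800, "bought": "no"},
]


def integrate(left, right, key):
    """Step 2: combine records from two sources that share a key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def select(rows, fields):
    """Step 3: keep only the fields relevant to the analysis."""
    return [{field: row[field] for field in fields} for row in rows]


def transform(rows, threshold=3000):
    """Step 4: turn numeric income into a category (threshold is assumed)."""
    return [
        {**row, "income": "high" if row["income"] >= threshold else "low"}
        for row in rows
    ]


merged = integrate(customers, purchases, "id")
prepared = transform(select(merged, ["age", "income", "bought"]))
```

After these three steps, `prepared` holds only categorical or analysis-relevant fields and can be passed to the mining step.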
2.1.1 Classification
Classification is a data mining technique that maps
data into predefined groups or classes. It is a
supervised learning method, which requires labeled
training data to generate rules for classifying test
data into the predetermined groups or classes
(Dunham, 2003). The classification method forms
groups of data by applying known algorithms to the
data warehouse under examination. It is useful for
business processes that require categorical
information, such as marketing or sales. It can use
various algorithms, such as nearest neighbor and
decision tree learning, among others. Decision trees
are also used to explore data and to find hidden
relationships between a number of candidate input
variables and a target variable. Because the decision
tree combines data exploration and modeling, it works
well as a first step in the modeling process, even
when the final model is built with another technique.
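
To make the idea of supervised classification concrete, the sketch below uses the nearest neighbor approach mentioned above: a new sample receives the class label of the closest labeled training record. The training records, the two numeric features, and the class names are hypothetical.

```python
import math

# Hypothetical labeled training data: (feature vector, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]


def classify(sample, labeled_data):
    """Return the label of the training record closest to the sample."""
    nearest = min(
        labeled_data,
        key=lambda record: math.dist(sample, record[0]),  # Euclidean distance
    )
    return nearest[1]
```

Calling `classify((1.2, 1.4), training)` assigns the sample to one of the predefined classes based only on the labeled examples, which is what distinguishes classification from unsupervised grouping.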
2.2 Decision Tree
The decision tree is a predictive modeling technique
that can be used for classification and prediction
tasks. It uses a "divide and conquer" technique to
divide the problem search space into a set of
subproblems (Dunham, 2003). The decision tree process
converts data in tabular form into a tree model. The
tree model then generates rules, which are simplified
(Basuki & Syarif, 2003).
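
The relation between a tree model and its rules can be sketched as follows. The tree is written as a nested dictionary, and each path from the root to a leaf becomes one IF-THEN rule. The attributes, values, and class labels are hypothetical, and the representation is one possible choice rather than the one used by the cited authors.

```python
# Hypothetical tree: an internal node tests one attribute and has one
# branch per attribute value; a leaf is a class label string.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "cloudy": "yes",
        "rainy": "no",
    },
}


def extract_rules(node, conditions=()):
    """Walk the tree and return one IF-THEN rule per leaf."""
    if isinstance(node, str):  # leaf: emit the rule for the path so far
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class = {node}"]
    rules = []
    for value, subtree in node["branches"].items():
        path = conditions + ((node["attribute"], value),)
        rules.extend(extract_rules(subtree, path))
    return rules


def predict(node, record):
    """Classify a record by following the branch for each tested attribute."""
    while not isinstance(node, str):
        node = node["branches"][record[node["attribute"]]]
    return node
```

Each test splits the remaining cases into smaller subproblems, which is the "divide and conquer" behavior described above.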
The advantages of the decision tree method are:
1. Decision areas that were previously complex and
very global can be made simpler and more specific.
2. Unnecessary calculations are eliminated, because
with the decision tree method a sample is tested
only against particular criteria or classes.
3. Features can be chosen flexibly at different
internal nodes; the feature selected distinguishes
one criterion from the other criteria in the same
node. This flexibility improves the quality of the
resulting decisions compared with a more
conventional single-stage calculation method.
Figure 1. Decision Tree Concept
2.3 C4.5 Algorithm
Following Larose, building a decision tree with the
C4.5 algorithm involves several steps, namely:
1. Prepare the training data. The training data is
typically taken from historical data, that is, data
on cases that occurred in the past and are already
classified into particular classes.
2. Determine the root of the tree. The root is taken
from the attribute selected by calculating the gain
of each attribute; the attribute with the highest
gain becomes the root. Before calculating the gain
of an attribute, first calculate the entropy. The
entropy is calculated with the formula:
$$\mathrm{Entropy}(S) = \sum_{i=1}^{n} -p_i \log_2 p_i \qquad (1)$$
where:
S: the set of cases
A: the feature (attribute)
n: the number of partitions of S
p_i: the proportion of S_i to S
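
The entropy calculation can be sketched in a few lines. The function below assumes that the partitions of S are the class labels of the cases, so that each p_i is the share of cases in one class; the function name and the sample labels are hypothetical. For example, a set of 14 cases with 9 in one class and 5 in the other has an entropy of about 0.940, while a set split evenly between two classes has an entropy of 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    counts = Counter(labels)  # number of cases in each class
    return sum(
        -(count / total) * math.log2(count / total)
        for count in counts.values()
    )


# Hypothetical set of 14 cases: 9 labeled "yes" and 5 labeled "no".
cases = ["yes"] * 9 + ["no"] * 5
set_entropy = entropy(cases)
```

A set whose cases all belong to one class has entropy 0, which is why attributes that produce purer partitions are preferred when choosing the root.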
4. Calculate the value of Gain using the
equation.