3.4 Interpreting
We propose the use of several classification
techniques that produce easily interpretable models:
Linear Regression, Multilayer Perceptron, Random
Forest, IBK and K-star. WEKA, an open-source
machine learning workbench, provides implementations
of these models together with feature selection
facilities. These algorithms are then executed,
validated, evaluated and compared in order to
determine which one gives the best result with the
highest accuracy.
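As a minimal sketch of this execute-validate-compare workflow, the following Java fragment cross-validates each candidate model with WEKA's API. The dataset file name forum_activity.arff and the position of the class attribute are assumptions for illustration, not details taken from the study.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.lazy.IBk;
import weka.classifiers.lazy.KStar;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareModels {
    public static void main(String[] args) throws Exception {
        // Hypothetical dataset; the last attribute is assumed to be the grade.
        Instances data = DataSource.read("forum_activity.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] models = {
            new LinearRegression(), new MultilayerPerceptron(),
            new RandomForest(), new IBk(), new KStar()
        };
        for (Classifier model : models) {
            // 10-fold cross-validation of each candidate model.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));
            System.out.println(model.getClass().getSimpleName()
                    + "  RMSE = " + eval.rootMeanSquaredError());
        }
    }
}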
3.4.1 Linear Regression
Linear regression is a simple prediction model for
testing the effect of one or more independent
variables (the features of the online discussion
forum) on a dependent variable (the final grade). The
initial judgement of a possible relationship between
two continuous variables should always be made on the
basis of a scatter plot (Schneider et al., 2010).
Moreover, the linear regression approach is easy to
apply and fast to process for large datasets; the
model here was built in 0.05 seconds. The result
below shows the fitted linear regression formula:
Grade = 0.4779 * attendance + 0.4614 *
posts + 26.2719
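Reading the coefficients directly, a hypothetical student with attendance 80 and 20 posts would be predicted Grade = 0.4779 × 80 + 0.4614 × 20 + 26.2719 ≈ 73.7. The sketch below shows how such a formula can be fitted with WEKA's Java API, again assuming the hypothetical forum_activity.arff file with the grade as last attribute:

import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GradeRegression {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("forum_activity.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);            // grade as class

        LinearRegression lr = new LinearRegression();
        lr.buildClassifier(data);
        System.out.println(lr); // prints the fitted formula, as reported above
    }
}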
3.4.2 Multilayer Perceptron
Multilayer perceptron is a supervised learning
algorithm based on a neural network whose nodes
interact through weighted connections. Each input to
a node is multiplied by the corresponding connection
weight, and the weighted sum is passed through an
activation function to generate the output prediction.
The weights measure the degree of correlation between
the activity levels of the neurons they connect (Pal
& Mitra, 1992). Generally, the results of a multilayer
perceptron are more accurate than those of linear
regression, but it requires a longer processing time
for large datasets because the algorithm updates the
weights for every instance of the data. A further
disadvantage of the multilayer perceptron is that it
is sensitive to feature scaling (Pedregosa et al., 2011).
There are three hidden nodes, labelled sigmoid
nodes 1, 2 and 3. The attributes Posting, ATT and FOD
appear to have nearly the same weight and sign in all
the neurons. The resulting multilayer perceptron,
built in 0.11 seconds, is shown below:
Sigmoid Node 1
Inputs Weights
Threshold -0.3389339469178622
Attrib Posting 0.6356339310638692
Attrib Login -1.971194964716918
Attrib Forum 0.1528793652016145
Attrib ATT -2.9824012894200167
Attrib FOD -1.2565096616525258
Sigmoid Node 2
Inputs Weights
Threshold -0.3319752049637097
Attrib Posting 0.8489632795859472
Attrib Login 0.8981808286647163
Attrib Forum 1.1775792813836161
Attrib ATT -0.2727426863562934
Attrib FOD -1.4842188659857705
Sigmoid Node 3
Inputs Weights
Threshold -1.4238193757464874
Attrib Posting 2.516298013366708
Attrib Login 0.7532046884360826
Attrib Forum -0.15476793041226244
Attrib ATT -0.010654173314826458
Attrib FOD 2.257937779725289
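To make the listing concrete, the following sketch shows how Sigmoid Node 1 combines its inputs: the weighted sum of the attribute values plus the threshold is passed through the logistic function. The weights are truncated from the listing above, and the input values are hypothetical; WEKA's MultilayerPerceptron normalizes attributes internally, so the inputs are assumed to be already normalized.

public class SigmoidNode1 {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Activation of Sigmoid Node 1, using the weights listed above (truncated).
    static double node1(double posting, double login, double forum,
                        double att, double fod) {
        double sum = -0.3389          // threshold
                   + 0.6356 * posting
                   - 1.9712 * login
                   + 0.1529 * forum
                   - 2.9824 * att
                   - 1.2565 * fod;
        return sigmoid(sum);
    }

    public static void main(String[] args) {
        // Hypothetical normalized attribute values.
        System.out.println(node1(0.5, 0.2, 0.1, 0.8, 0.3));
    }
}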
3.4.3 Random Forest
Random forest was introduced by Breiman (2001) and,
as implemented in WEKA, is an ensemble of unpruned
classification trees that uses majority voting to
perform prediction. The random forest combines the
predictions of classification trees built with an
algorithm similar to C4.5 (J48 in WEKA)
(Khoshgoftaar, 2007).
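The majority-voting step described above can be illustrated with a minimal sketch: each unpruned tree in the ensemble casts one vote, and the most frequent class label wins. The tree predictions here are hypothetical.

import java.util.HashMap;
import java.util.Map;

public class MajorityVote {
    // Return the class label predicted by the most trees.
    static String vote(String[] treePredictions) {
        Map<String, Integer> counts = new HashMap<>();
        for (String p : treePredictions) {
            counts.merge(p, 1, Integer::sum);
        }
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }

    public static void main(String[] args) {
        // Hypothetical votes from five trees in the ensemble.
        System.out.println(vote(new String[] {"Pass", "Pass", "Fail", "Pass", "Fail"}));
        // prints: Pass
    }
}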
3.4.4 IBK (K-Nearest Neighbour)
IBK is a k-nearest-neighbour classifier. It is also
known as a 'lazy learning' technique, because the
classifier construction process needs only little
effort and most of the work is performed during
classification (Khoshgoftaar, 2007). Various search
algorithms can be utilized to ease the task of
finding the nearest neighbours. Linear search is the
most commonly used, but other options are also
possible, including KD-trees, ball trees and cover
trees (Vijayarani & Muthulakshmi, 2013).
Predictions made by considering more than one
neighbour can be weighted based on the distance from
the test instance; the distance is then converted
into a weight by one of two different formulas
(Vijayarani & Muthulakshmi, 2013).
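In WEKA's IBk these two weighting formulas are inverse distance (1/d, option -I) and similarity (1 − d, option -F). The sketch below configures a distance-weighted three-nearest-neighbour classifier; the dataset file is again the hypothetical forum_activity.arff.

import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class NearestNeighbourExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("forum_activity.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        IBk ibk = new IBk();
        // -K 3: use the three nearest neighbours;
        // -I: weight each neighbour's vote by 1/distance.
        ibk.setOptions(Utils.splitOptions("-K 3 -I"));
        ibk.buildClassifier(data); // "lazy": essentially just stores the instances

        System.out.println(ibk);
    }
}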
3.4.5 K-Star
The K* algorithm is an instance-based learner that
uses entropy to quantify distance. It is considerably