that for n instances the depth of the tree is on the order of
log n, which means the tree has not degenerated into a
few long branches.
The information gain measure is used to select
the test attribute at each node in the tree. We refer to
such a measure as an attribute selection measure or a
measure of goodness of split. The algorithm
computes the information gain of each attribute, and the
attribute with the highest information gain is chosen
as the test attribute for the given set (Jiawei Han et
al., 2001).
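As an illustration of the selection step described above, the following minimal sketch computes the information gain of each attribute as the entropy of the class minus the weighted entropy after the split. The function names, the attribute names (tests_taken, messages) and the four learner records are hypothetical and serve only as an example; they are not taken from the platform data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(part) / len(rows) * entropy(part)
                    for part in partitions.values())
    return base - remainder

def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain (the test attribute for this set)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical learner records, for illustration only.
rows = [
    {"tests_taken": "many", "messages": "few", "grade": "high"},
    {"tests_taken": "many", "messages": "many", "grade": "high"},
    {"tests_taken": "few", "messages": "few", "grade": "low"},
    {"tests_taken": "few", "messages": "many", "grade": "low"},
]
print(best_attribute(rows, ["tests_taken", "messages"], "grade"))
```

In this toy set, splitting on tests_taken separates the two grades completely, while splitting on messages leaves each partition evenly mixed, so the former is selected.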
Finally, the cross-validation evaluation technique
measures the number of correctly and incorrectly classified
instances. We consider that if more than
80% of the instances are correctly classified, then we have
sufficiently good data. The obtained model is further
used for analyzing the learner’s goals and obtaining
recommendations. The aim of the QM is to “guide”
the learner along the correct path in the decision tree
so that he reaches the desired class.
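The evaluation criterion above can be sketched as follows. This is only an assumed reading of the procedure: the fold assignment (round-robin by index), the number of folds, and the stand-in nearest-neighbour classifier are all hypothetical, and the decision tree learner itself is not reproduced here. Only the 80% threshold comes from the text.

```python
def cross_validated_accuracy(instances, labels, train, predict, folds=10):
    """Fraction of instances classified correctly when each fold is held out once."""
    n = len(instances)
    correct = 0
    for fold in range(folds):
        test_idx = [i for i in range(n) if i % folds == fold]
        train_idx = [i for i in range(n) if i % folds != fold]
        model = train([instances[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, instances[i]) == labels[i] for i in test_idx)
    return correct / n

def has_enough_good_data(accuracy, threshold=0.80):
    """Criterion from the text: more than 80% of instances correctly classified."""
    return accuracy > threshold

# Hypothetical stand-in classifier: 1-nearest neighbour on one numeric feature.
def train_nearest(instances, labels):
    return list(zip(instances, labels))

def predict_nearest(model, x):
    _, nearest_label = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest_label

# Hypothetical activity counts and grade classes, for illustration only.
activity = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
grades = ["low"] * 5 + ["high"] * 5
accuracy = cross_validated_accuracy(activity, grades,
                                    train_nearest, predict_nearest, folds=5)
print(accuracy, has_enough_good_data(accuracy))
```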
To fulfil the course manager’s goals, we
use a method for classifying learners. For this,
we employed a clustering method, which is the
process of grouping a set of physical or abstract
objects into classes of similar objects (Jiawei Han et
al., 2001). For our platform, we create clusters of
users based on their activity and data transfer.
As a product of the clustering process, associations
between different actions on the platform can easily
be inferred from the logged data. In general, the
activities that are present in the same profile tend to
be found together in the same session. The actions
making up a profile tend to co-occur and form a large
item set (R. Agrawal et al., 1994).
There are many clustering methods in the
literature: partitioning methods, hierarchical
methods, density-based methods such as (Ester M. et
al., 1996), grid-based methods or model-based
methods. Hierarchical clustering algorithms like the
Single-Link method (Sibson, R., 1973) or OPTICS
(Ankerst, M. et al., 1999) compute a representation
of the possible hierarchical clustering structure of
the database in the form of a dendrogram or a
reachability plot from which clusters at various
resolutions can be extracted.
Because we are dealing with numeric attributes,
we consider iterative distance-based clustering
from among the partitioning methods. The classic k-means
algorithm is a very simple method of creating
clusters. First, the number of clusters
being sought is specified: this is the parameter k. Then k points
are chosen at random as cluster centers. Instances
are assigned to their closest cluster center according
to the ordinary Euclidean distance function. Next the
centroid, or the mean, of all instances in each cluster
is calculated; this is the “means” part. These
centroids are taken to be the new center values for
their respective clusters. Finally, the whole process
is repeated with the new cluster centers. Iteration
continues until the same points are assigned to each
cluster in consecutive rounds, at which point the
cluster centers have stabilized and will remain the
same thereafter (I. H. Witten et al., 2000).
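The steps of k-means described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used on the platform; the function name, the seed and max_iter parameters, and the sample points are assumptions introduced for the example.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on tuples of numbers; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # k points chosen at random as centers
    assignment = None
    for _ in range(max_iter):
        # Assign every instance to its closest center (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Stop when the same points are assigned to each cluster as in the last round.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each center as the mean (centroid) of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                      # keep the old center if a cluster is empty
                centers[c] = tuple(sum(coord) / len(members)
                                   for coord in zip(*members))
    return centers, assignment

# Hypothetical (activity, data transfer) pairs, for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, assignment = kmeans(points, k=2)
print(centers, assignment)
```

Because the initial centers are random, the result can in general depend on the seed; on well-separated data such as this toy set the two groups are recovered regardless.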
From a different perspective, the following parameters
may be computed for each cluster: mean,
standard deviation and probability (μ, σ and p). The
EM algorithm that is employed follows the same iterative
scheme as k-means clustering. It takes into consideration
that we know none of these parameters. It starts with an initial
guess for the parameters, uses them to calculate the
cluster probabilities for each instance, uses these
probabilities to re-estimate the parameters, and repeats.
This is called the EM algorithm, for
“expectation-maximization”. The first step, the calculation of
cluster probabilities (which are the “expected” class
values), is “expectation”; the second, calculation of
the distribution parameters, is “maximization” of the
likelihood of the distributions given the data (I. H.
Witten et al., 2000).
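The two alternating steps can be illustrated for a one-dimensional mixture of Gaussians. The sketch below is an assumed, simplified version: the function names, the fixed number of iterations, the min_sigma floor that keeps the standard deviation away from zero, and the sample data are all hypothetical and not part of the original study.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

def em_gaussian_mixture(data, mus, sigmas, priors, iterations=50, min_sigma=1e-3):
    """EM for a 1-D Gaussian mixture, starting from initial guesses (mu, sigma, p)."""
    k = len(mus)
    mus, sigmas, priors = list(mus), list(sigmas), list(priors)
    for _ in range(iterations):
        # Expectation: cluster probabilities for each instance.
        resp = []
        for x in data:
            weighted = [priors[j] * gaussian_pdf(x, mus[j], sigmas[j])
                        for j in range(k)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
        # Maximization: re-estimate mu, sigma and p from those probabilities.
        for j in range(k):
            weight = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / weight
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / weight
            sigmas[j] = max(math.sqrt(var), min_sigma)
            priors[j] = weight / len(data)
    return mus, sigmas, priors

# Hypothetical one-dimensional measurements, for illustration only.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
mus, sigmas, priors = em_gaussian_mixture(
    data, mus=[0.0, 6.0], sigmas=[1.0, 1.0], priors=[0.5, 0.5])
print(mus, sigmas, priors)
```

Unlike k-means, each instance here belongs to every cluster with some probability, which is why the parameter updates are weighted sums rather than plain means.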
The quality of the clustering process is assessed by
computing the likelihood of a set of test data given
the obtained model. The goodness-of-fit is measured
by computing the logarithm of the likelihood, or
log-likelihood: the larger this quantity, the better the
model fits the data. Instead of using a single test set,
it is also possible to compute a cross-validation
estimate of the log-likelihood.
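As a small illustration of this measure, the log-likelihood of test data under a one-dimensional Gaussian mixture is the sum, over test instances, of the logarithm of the mixture density. The sketch below assumes such a mixture; the function names, the two candidate models and the test values are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

def mixture_log_likelihood(data, mus, sigmas, priors):
    """Sum over instances of log( sum_j p_j * N(x | mu_j, sigma_j) )."""
    total = 0.0
    for x in data:
        density = sum(p * gaussian_pdf(x, m, s)
                      for m, s, p in zip(mus, sigmas, priors))
        total += math.log(density)
    return total

# Hypothetical held-out data and two candidate models, for illustration only.
test_data = [1.0, 1.1, 4.9, 5.0]
good_fit = mixture_log_likelihood(test_data, [1.0, 5.0], [0.2, 0.2], [0.5, 0.5])
poor_fit = mixture_log_likelihood(test_data, [3.0, 3.5], [0.2, 0.2], [0.5, 0.5])
print(good_fit, poor_fit)
```

The model whose clusters sit on the test data yields the larger log-likelihood. A cross-validation estimate would repeat the same computation with each fold held out in turn and combine the per-fold values.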
3 EXPERIMENTAL RESULTS
The study starts by setting up the e-Learning
platform. This means that all the learner and course
manager accounts have been created and the
evaluation environment has been set up.
At this time the QM is also set up by specifying
the set of goals for learners and course managers.
Learners may choose from the following set of
goals:
- Minimization of the time in which a certain
level of knowledge is reached. This is accomplished
by specifying a desired grade.
- Obtaining a certain grade with certainty. The learner
has to specify the grade he aims for.
Course managers may choose from two goals:
- Having a normal distribution of grades at
chapter level.
- Having a testing environment that minimizes the
time in which a learner reaches the knowledge
level needed for passing the exam.
Two sets of recommendations were created
for these goals. Learners may obtain one of the
following recommendations:
SIGMAP 2007 - International Conference on Signal Processing and Multimedia Applications