TOWARDS BUILDING FAIR AND ACCURATE EVALUATION
ENVIRONMENTS
Dumitru Dan Burdescu and Marian Cristian Mihăescu
Software Engineering Department, University of Craiova, Bvd. Decebal Nr. 107, Craiova, Romania
Abstract: Each e-Learning platform has implemented means of evaluating learner’s knowledge by a specific grading
methodology. This paper proposes a methodology for obtaining knowledge about the testing environment.
The obtained knowledge is further used in order to make the testing system more accurate and fair.
Integration of knowledge management into an e-Learning system is accomplished through a dedicated
software module that analyzes learner’s performed activities, creates a learner’s model and provides a set of
recommendations for course managers and learners in order to achieve prior set goals.
1 INTRODUCTION
Every e-Learning platform implements a mechanism for assessing the amount of knowledge accumulated for a certain discipline. A problem that frequently arises is that the system in place may not be fair regarding the ordering of learners according to their accumulated knowledge. There are often situations when the distribution of grades is not normal, so that many learners are clustered together although there are differences in their accumulated knowledge.
In order to estimate the way a platform evaluates learners, we have developed a separate software module that takes as input the actions executed by learners and a set of goals, and produces as output conclusions and a set of recommendations. This module is called the Quality Module (QM) and is presented in Figure 1.
Figure 1: Functionality of Quality Module.
As presented in Figure 1, the input is represented by the learners' activities and by goals. The learners' activities represent the data used for creating the learner's model. The goals represent the criteria that need to be optimized in order to obtain a better evaluation environment.
The evaluation environment is represented by the setup put in place within an e-Learning platform for the assessment of learners. The setup consists of course materials and test quizzes that are created by course managers. The analysis and results produced by the QM can be obtained only once this setup has been completed.
Learners' activities are obtained through specific methods embedded in our e-Learning platform, called Tesys (D. D. Burdescu, C. M. Mihăescu, 2006). The activities are logged in files and in a database table and are processed off-line by the QM. Goals concern course managers and learners and are finally translated into parameters for the QM.
The conclusions obtained by the QM concern the degree to which the proposed goals are fulfilled. This is an objective measure of the quality of the evaluation environment. On the other hand, the recommendations represent advice for course managers and learners. The aim of the recommendations is to increase the quality of the evaluation environment. The procedure consists of several steps. Firstly, the platform has to produce enough data regarding the learners' performed activities so that a learner's model of good quality is obtained. Goals are also defined at this step: course managers set goals regarding their courses and learners set up their own goals. This step is called SETUP and is considered the most important one, since the next steps rely heavily on it.
After the model has been obtained, the next step is to obtain recommendations. The recommendations
are supposed to be strictly followed by course managers. The period in which course managers carry out the recommendations is called EEI (Evaluation Environment Improvement). The activities performed by learners in this period are not taken into consideration by the QM when building the learner's model or producing recommendations. After the EEI period ends, a new dataset of learner-performed actions is recorded. This dataset is used for rebuilding the learner's model and for reevaluating the initially set goals. This step is called EER (Evaluation Environment Reevaluation).
Regarding the learners' recommendations, the e-Learning platform implements means of keeping track of the recommendations made to learners and of the way they were followed. This is also accomplished in the EER step. At this step the QM provides conclusions regarding the quality of the recommendations by evaluating whether or not the learners were helped to reach their goals.
Figure 2: Logic of Quality Module.
This three-step process may have as many iterations as needed. Each reevaluation step compares a challenger learner's model with the initial model in terms of classification accuracy. The model with the best accuracy is then used for making recommendations to learners. The challenger model is also based on data newly recorded since the old model was obtained. It is a primary concern to continuously improve the learner's model in terms of classification accuracy, as this is the basis for obtaining valuable recommendations for learners and course managers.
For course managers, the reevaluation step checks whether the recommendations addressed to them helped in reaching their goals. Besides measuring the progress made towards these goals, a new set of recommendations is obtained for the new state of the evaluation environment.
As presented, the primary tasks of the QM are obtaining a learner's model and estimating the distribution of learners when classifying them according to accumulated knowledge. This represents the Knowledge Management (KM) part of the QM. In this way we present a way in which learning can profit from available KM concepts and technologies (E. Ras et al., 2005).
Knowledge is considered to be "the information needed to make business decisions" (P. Manchester, 1999), and so knowledge management is the "essential ingredient of success" for 95 per cent of CEOs (P. Manchester, 1999).
Figure 3 presents the relation between the e-Learning platform and the QM.
Figure 3: Relation between the Quality Module and the e-Learning platform.
An important aspect regarding the QM is the structure of its input data set and how the goals are specified. Within the e-Learning platform, specific mechanisms were implemented for logging and recording performed activities in a structured format. This is accomplished in a database table whose structure is presented in Table 1.
Table 1: Structure of the activity table.

Field     Description
id        primary key
userid    identifies the user who performed the action
date      stores the date when the action was performed
action    stores a tag that identifies the action
details   stores details about the performed action
level     specifies the importance of the action
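To make the logging mechanism concrete, the following sketch shows how one such record might be inserted from Java over JDBC. The ActivityLogger class, the column handling and the example values are illustrative assumptions derived from Table 1, not the actual Tesys code.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

// Hypothetical helper that records one learner action into the activity
// table described in Table 1 (id, userid, date, action, details, level).
public class ActivityLogger {

    private final Connection connection;

    public ActivityLogger(Connection connection) {
        this.connection = connection;
    }

    public void log(int userId, String action, String details, int level)
            throws SQLException {
        // The id column is assumed to be generated automatically by the database.
        String sql = "INSERT INTO activity (userid, date, action, details, level) "
                   + "VALUES (?, ?, ?, ?, ?)";
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
            stmt.setInt(1, userId);
            stmt.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
            stmt.setString(3, action);   // tag identifying the action, e.g. "test"
            stmt.setString(4, details);  // free-form details about the action
            stmt.setInt(5, level);       // importance of the action
            stmt.executeUpdate();
        }
    }
}

A call such as logger.log(42, "test", "chapter=3 grade=8", 1) would then add one row to the activity table; the QM later reads such rows off-line.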
Regarding the goals, when the QM is set up two sets are created: one with goals for learners and one with goals for course managers. Each learner or course manager may set up his own goals in the SETUP step by choosing one goal from the corresponding set. This step ends when enough activity has been registered for an accurate learner's model to be created.
The QM uses machine learning and modelling techniques as its business logic. The KM techniques that we use are decision trees and clustering methods. In short, decision trees are used for verifying the "goodness" of the data and obtaining the learner's model, while clustering is used for
obtaining conclusions and recommendations. The whole process follows the standard modelling steps: defining the objective, preparing the sources of web data, selecting the methodology, processing and evaluating the model, validating the model, and implementing and maintaining the model (Olivia Parr Rud, 2001).
2 EMPLOYED KNOWLEDGE
MANAGEMENT CONCEPTS
AND TECHNOLOGIES
As presented in the introduction, the QM produces recommendations for learners and for course managers. The recommendations are obtained by analyzing a learner's model that is created from the performed actions.
Within the Tesys e-Learning platform the actions are represented by all performed activities that are logged, as well as by other information that may be derived (e.g. the average grade of tests, the number of tests). Among the logged activities that are part of the model's parameters are: logging into the Tesys platform, taking a test, sending a message to a course manager, and downloading course materials.
Besides activity data, the Tesys platform implements a transfer function that associates the amount of transferred data with the corresponding action that triggered the transfer. The data traffic generated by learners therefore represents another feature of the learner model that is created.
The whole process is conducted following the steps of target modelling (see Figure 4) (Olivia Parr Rud, 2001).
Defining the goal represents the first step. Our goal is to create a model of analysis for the Tesys e-Learning platform that is to be used for optimizing the criteria specified by learner and course manager goals. Setting up the goals is accomplished by formally defining the criteria that are to be evaluated and optimized. Selection and preparation of data are the next steps. Here, we have to determine the data that will enter the modelling process. The preparation takes that data and puts it into a form ready for model processing. Since the processing is done using machine-learning algorithms implemented in the Weka workbench (Ian H. Witten et al., 2000), the output of the preparation step is an arff file. Under these circumstances, we have developed an off-line Java application that queries the platform's database and creates the input data file, called activity.arff. This process is automated and is driven by a property file that specifies which data will be placed in the activity.arff file.
Figure 4: Steps for target modeling.
For a learner in our platform we may have a very large number of attributes. Still, in our procedure we used only four: the number of logins, the number of taken tests, the number of sent messages and the amount of data traffic. Here is what the arff file looks like:
@relation activity
@attribute noOfLogins {<10,<50,<70,<100,>100}
@attribute noOfTests {<10,<20,<30,<50,>50}
@attribute noOfSentMessages {<10,<20,<30,<50,>50}
@attribute dataTraffic {<10,<20,>20}
@data
<50,<10,<10,<10
>100,<20,<20,<20
As can be seen from the definition of the attributes, each of them has a set of nominal values from which only one may be assigned. The values of the attributes are computed for each of the 650 learners and are placed in the @data section of the file. For example, the first line says that the learner logged in less than fifty times, took less than ten tests, sent less than ten messages to professors and generated less than 10 MB of data traffic.
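As a rough illustration of the preparation step, the sketch below aggregates the logged activity per learner, maps the raw counts onto the nominal bins defined in the header above and writes the activity.arff file. The SQL query, the bin boundaries and the ArffExporter class are assumptions made for the example; they are not the actual Tesys export code.

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative off-line export: counts a few activity types per learner and
// writes them as nominal attributes into activity.arff.
public class ArffExporter {

    // Maps a raw count onto one of the nominal bins used in the arff header.
    static String bin(int count, int[] limits) {
        for (int limit : limits) {
            if (count < limit) return "<" + limit;
        }
        return ">" + limits[limits.length - 1];
    }

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(args[0]);
             PrintWriter out = new PrintWriter("activity.arff")) {

            out.println("@relation activity");
            out.println("@attribute noOfLogins {<10,<50,<70,<100,>100}");
            out.println("@attribute noOfTests {<10,<20,<30,<50,>50}");
            out.println("@attribute noOfSentMessages {<10,<20,<30,<50,>50}");
            out.println("@attribute dataTraffic {<10,<20,>20}");
            out.println("@data");

            // Hypothetical per-learner aggregation over the activity table;
            // the traffic column stands in for the transfer function output.
            String sql = "SELECT userid, "
                + "SUM(CASE WHEN action='login' THEN 1 ELSE 0 END) AS logins, "
                + "SUM(CASE WHEN action='test' THEN 1 ELSE 0 END) AS tests, "
                + "SUM(CASE WHEN action='message' THEN 1 ELSE 0 END) AS messages, "
                + "SUM(traffic) AS traffic FROM activity GROUP BY userid";
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    out.println(bin(rs.getInt("logins"), new int[]{10, 50, 70, 100}) + ","
                        + bin(rs.getInt("tests"), new int[]{10, 20, 30, 50}) + ","
                        + bin(rs.getInt("messages"), new int[]{10, 20, 30, 50}) + ","
                        + bin(rs.getInt("traffic"), new int[]{10, 20}));
                }
            }
        }
    }
}

In a setup like this, the property file mentioned earlier would simply select which of these attributes and bins are written out.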
Now that we have prepared the data, we can start analyzing it. Choosing between two learning algorithms given a single dataset is not a trivial task (S. Salzberg, 1997). Firstly, we make sure the data is relevant. We test the "goodness" of the data by trying to build a decision tree, such as C4.5 (R. Quinlan, 1993), from it. A decision tree is a flowchart-like tree structure where each internal node denotes a test on an attribute, each branch represents an outcome of the test and leaf nodes represent classes (Jiawei Han et al., 2001).
The basic algorithm for decision tree induction is a greedy algorithm that constructs the decision tree in a top-down recursive divide-and-conquer manner (Jiawei Han et al., 2001).
The computational cost of building the tree is O(mn log n) (I. H. Witten et al., 2000). It is assumed
that for n instances the depth of the tree is on the order of log n, which means the tree does not degenerate into a few long branches.
The information gain measure is used to select the test attribute at each node in the tree. We refer to such a measure as an attribute selection measure, or a measure of the goodness of split. The algorithm computes the information gain of each attribute, and the attribute with the highest information gain is chosen as the test attribute for the given set (Jiawei Han et al., 2001).
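For reference, the quantities involved can be written as follows (the standard formulation of information gain for a training set D split by an attribute A with v values; the notation is ours, not reproduced from the cited sources):

\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i, \qquad
\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j), \qquad
\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)

where p_i is the proportion of instances in D that belong to class i and D_1, ..., D_v are the subsets of D induced by the values of A. The attribute with the largest Gain(A) becomes the test attribute of the current node.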
Finally, the cross-validation evaluation technique measures the correctly and incorrectly classified instances. We consider that if more than 80% of the instances are correctly classified, then the data is good enough. The obtained model is further used for analyzing learners' goals and obtaining recommendations. The aim of the QM is to "guide" the learner along the correct path in the decision tree so that he reaches the desired class.
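A minimal sketch of this "goodness" check using the Weka API is given below. It assumes a standard Weka 3 distribution; the file name and the choice of the last attribute as the class (for example a discretized final grade, which is not shown in the arff excerpt above) are placeholders rather than the exact setup used in our experiments.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Builds a C4.5-style decision tree (J48) and estimates its accuracy with
// 10-fold cross-validation, applying the 80% acceptance threshold.
public class DataGoodnessCheck {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("activity.arff").getDataSet();
        // Assumption: the last attribute plays the role of the class.
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        System.out.println("Correctly classified: " + eval.pctCorrect() + " %");
        System.out.println("Data considered good: " + (eval.pctCorrect() > 80.0));
    }
}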
Regarding the fulfilment of course managers' goals, we use a method for the classification of learners. For this, we employed clustering, which is the process of grouping a set of physical or abstract objects into classes of similar objects (Jiawei Han et al., 2001). For our platform, we create clusters of users based on their activity and data transfer.
As a product of the clustering process, associations between different actions on the platform can easily be inferred from the logged data. In general, the activities that are present in the same profile tend to be found together in the same session. The actions making up a profile tend to co-occur to form a large item set (R. Agrawal et al., 1994).
There are many clustering methods in the literature: partitioning methods, hierarchical methods, density-based methods such as (Ester M. et al., 1996), grid-based methods and model-based methods. Hierarchical clustering algorithms like the Single-Link method (Sibson, R., 1973) or OPTICS (Ankerst, M. et al., 1999) compute a representation of the possible hierarchical clustering structure of the database in the form of a dendrogram or a reachability plot, from which clusters at various resolutions can be extracted.
Because we are dealing with numeric attributes, iterative clustering from the family of partitioning methods is taken into consideration. The classic k-means algorithm is a very simple method of creating clusters. Firstly, it is specified how many clusters are being sought: this is the parameter k. Then k points are chosen at random as cluster centers. Instances are assigned to their closest cluster center according to the ordinary Euclidean distance. Next, the centroid, or mean, of all instances in each cluster is calculated; this is the "means" part. These centroids are taken to be the new center values for their respective clusters. Finally, the whole process is repeated with the new cluster centers. Iteration continues until the same points are assigned to each cluster in consecutive rounds, at which point the cluster centers have stabilized and will remain the same thereafter (I. H. Witten et al., 2000).
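To make these steps explicit, a compact k-means loop over numeric instances could look like the sketch below; this is a plain illustration of the procedure just described, not the Weka implementation, and the toy data in main is invented.

import java.util.Arrays;
import java.util.Random;

// Plain sketch of the classic k-means loop: pick k random centers, assign each
// instance to its closest center, recompute the centroids, repeat until stable.
public class KMeansSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);                      // ordinary Euclidean distance
    }

    static int[] cluster(double[][] data, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++)                 // k points chosen at random
            centers[c] = data[rnd.nextInt(data.length)].clone();

        int[] assignment = new int[data.length];
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < data.length; i++) { // assign to the closest center
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (distance(data[i], centers[c]) < distance(data[i], centers[best]))
                        best = c;
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            for (int c = 0; c < k; c++) {           // recompute each centroid (mean)
                double[] mean = new double[data[0].length];
                int count = 0;
                for (int i = 0; i < data.length; i++)
                    if (assignment[i] == c) {
                        count++;
                        for (int d = 0; d < mean.length; d++) mean[d] += data[i][d];
                    }
                if (count > 0)
                    for (int d = 0; d < mean.length; d++) centers[c][d] = mean[d] / count;
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] toy = {{1, 2}, {1, 3}, {8, 8}, {9, 7}};  // invented numeric data
        System.out.println(Arrays.toString(cluster(toy, 2, 42)));
    }
}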
From a different perspective, for each cluster the following parameters may be computed: the mean, the standard deviation and the probability (μ, σ and p). The EM algorithm that is employed is a k-means-type clustering algorithm which takes into consideration that none of these parameters is known. It starts with an initial guess for the parameters, uses them to calculate the cluster probabilities for each instance, uses these probabilities to re-estimate the parameters, and repeats. This is called the EM algorithm, for "expectation-maximization": the first step, the calculation of the cluster probabilities (which are the "expected" class values), is the "expectation"; the second, the calculation of the distribution parameters, is the "maximization" of the likelihood of the distributions given the data (I. H. Witten et al., 2000).
The quality of the clustering process is assessed by computing the likelihood of a set of test data given the obtained model. The goodness-of-fit is measured by the logarithm of the likelihood, or log-likelihood; the larger this quantity, the better the model fits the data. Instead of using a single test set, it is also possible to compute a cross-validation estimate of the log-likelihood.
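A sketch of how this clustering and its evaluation could be performed with Weka's EM implementation follows; the file name, the fixed number of clusters and the use of the cross-validated log-likelihood helper are assumptions for illustration, not the exact experimental setup.

import java.util.Random;
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.EM;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Clusters the learners with EM and reports the log-likelihood of the model,
// both on the data itself and as a 10-fold cross-validation estimate.
public class LearnerClustering {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("activity.arff").getDataSet();

        EM em = new EM();
        em.setNumClusters(4);   // assumption: four clusters, as in Section 3
        em.buildClusterer(data);

        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(em);
        eval.evaluateClusterer(data);
        System.out.println(eval.clusterResultsToString());

        double cvLogLikelihood =
            ClusterEvaluation.crossValidateModel(new EM(), data, 10, new Random(1));
        System.out.println("Cross-validated log-likelihood: " + cvLogLikelihood);
    }
}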
3 EXPERIMENTAL RESULTS
The study starts by setting up the e-Learning platform. This means that all the learner and course manager accounts have been created and the evaluation environment has been set up. At this time the QM is also set up, by specifying the set of goals for learners and course managers. The set of goals from which learners may choose is:
- Minimization of the time in which a certain level of knowledge is reached. This is accomplished by specifying a desired grade.
- Reliably obtaining a certain grade. The learner has to specify the grade he aims for.
Course managers may choose from two goals:
- Having a normal distribution of grades at chapter level.
- Having a testing environment that ensures a minimum time in which a learner reaches the knowledge level needed to pass the exam.
For these goals, two sets of recommendations were created. Learners may obtain one of the following recommendations:
- More study is necessary for chapter X.
- You may go to the next chapter.
- You need to take more tests for chapter X.
For course managers the set of recommendations is:
- At chapter X harder/easier questions are needed.
- At chapter X there are too few/many questions.
The platform is currently in use and has three sections, with four disciplines in each section. Twelve professors are defined and more than 650 learners are enrolled. Across all disciplines, almost 2500 questions have been edited. In the first month of usage, almost 500 tests were taken. In the near future, the expected number of learners may be close to 1000. Recording learners' activity under these circumstances provides valuable information regarding user traffic. After six months of usage, there are more than 40,000 recorded actions.
Using the data from the database (especially from the activity table), we follow the presented methodology for analyzing the platform. We look at three different ways in which the input can be massaged to make it more amenable to learning schemes: attribute selection, attribute discretization and data cleansing (I. H. Witten et al., 2000). In many practical situations there are far too many attributes for learning schemes to handle, and some of them – perhaps the overwhelming majority – are clearly irrelevant or redundant. Consequently, the data must be preprocessed to select a subset of attributes to use in learning. Of course, learning schemes themselves try to select attributes appropriately and ignore irrelevant and redundant ones, but in practice their performance can frequently be improved by preselection.
Therefore, we define the set of attributes that are used in our process. Choosing the attributes is highly dependent on the available data, domain knowledge and experience. For our classification we choose five attributes: nLogings – the number of logins, nTests – the number of taken tests, avgTests – the average grade of taken tests, nSentMessages – the number of sent messages, and dataTraffic – the quantity of data traffic transferred by the learner. For each registered learner the values of these attributes are determined based on the data from the presented relations. Each learner is referred to as an instance within the process.
The values of the attributes are computed for each instance by a custom-developed off-line Java application. The outcome of running the application is a file called activity.arff, which is later used as the data source file.
Now we are ready to start processing the model. The first step estimates the "goodness" of the data. After running the algorithm, the obtained decision tree had 17 leaves (which in fact represent classes) and 25 nodes. The time to build the model was 0.13 seconds. The stratified cross-validation evaluation technique revealed that 575 instances (88.6%) were correctly classified and 75 (11.4%) were incorrectly classified. The confusion matrix showed exactly the distribution of incorrectly classified instances among classes. The results prove that the obtained model is accurate enough for creating recommendations based on it.
For obtaining recommendations for course managers we have used the EM algorithm. Running the EM algorithm created four clusters. The procedure clustered 130 instances (20%) in cluster 0, 156 instances (24%) in cluster 1, 169 instances (26%) in cluster 2 and 195 instances (30%) in cluster 3. For these clusters, we compute the likelihood of a set of test data given the model. Weka measures goodness-of-fit by the logarithm of the likelihood, or log-likelihood; the larger this quantity, the better the model fits the data. Instead of using a single test set, it is also possible to compute a cross-validation estimate of the log-likelihood. For our instances, the value of the log-likelihood is -2.61092, which is a promising result in the sense that the instances (in our case, learners) may be classified into four disjoint clusters based on their activity.
After the model had been created, the recommendations for course managers were made and the evaluation environment was altered accordingly. After this EEI step (see Figure 2), the QM started offering recommendations to learners. The recommendations and the behavior of learners (whether or not they followed the recommendations) were logged for further analysis.
The behavior of learners plays a very important role in obtaining challenger learner's models that at some point may replace the current one. On the other hand, checking whether or not the learners followed the recommendations may lead to conclusions regarding the quality of the recommendations and of the currently employed learner's model.
4 CONCLUSIONS
This paper presents a module that runs alongside an e-Learning platform and makes it a better evaluation environment.
The platform has the built-in capability of monitoring and recording learners' activity. The stored activity and data traffic represent the data that we analyze in order to improve the quality of the evaluation environment.
Our QM produces recommendations for learners and course managers using different machine learning techniques on the activity data obtained from the platform. We use the Weka workbench (I. H. Witten et al., 2000) as an environment for running state-of-the-art machine learning algorithms and data preprocessing tools. We have developed a custom application that gets the activity data from the platform and transforms it into the specific file format used by Weka, called arff.
A decision tree learner is used for estimating whether or not the data may be used to obtain significant results. The outcome of the decision tree validation is the percentage of correctly classified instances. We consider that a value of over 80% correctly classified instances is a promising sign that we may finally obtain useful knowledge.
Clustering is used for estimating the classification capability of the evaluation environment. This is mainly performed to obtain recommendations for course managers.
We have tested this procedure on data obtained from the e-Learning platform, on which 650 learners were enrolled and were active for six months. The results are satisfactory and prove that the evaluation environment can be successfully used in an e-Learning process.
We plan to use the QM on the same evaluation environment (same disciplines and same test and exam questions) but with a different set of learners. This may lead to further and continuous improvement of the evaluation environment.
The QM may also run alongside other evaluation environments in order to analyze goals and produce recommendations. This would add important domain knowledge and may significantly improve the feature selection process and the business logic of the QM.
REFERENCES
D. D. Burdescu, C. M. Mihăescu (2006). Tesys: e-
Learning Application Built on a Web Platform.
Proceedings of International Joint Conference on e-
Business and Telecommunications, Setubal, Portugal,
pp. 315-318.
Ras Eric, Memmel Martin, Weibelzahl Stephan (2005).
Integration of E-learning and knowledge management
- barriers, solutions and future issues. Biennial
Conference on Professional Knowledge Management -
WM 2005, Kaiserslautern, Germany, pp. 155-164.
Olivia Parr Rud (2001). Data Mining Cookbook –
Modeling Data for Marketing, Risk, and Customer
Relationship Management. Wiley Computer
Publishing.
Jiawei Han, Micheline Kamber (2001). Data Mining –
Concepts and Techniques. Morgan Kaufmann
Publishers.
I. H. Witten, E. Frank (2000). Data Mining – Practical
Machine Learning Tools and Techniques with Java
Implementations. Morgan Kaufmann Publishers.
R. Agrawal and R. Srikant (1994). Fast algorithms for
mining association rules. Proc. of the 20th VLDB
Conference, Santiago, Chile, pp. 487-499.
R. Quinlan (1993). C4.5: Programs for Machine Learning.
Morgan Kaufmann Publishers, San Mateo, CA.
S. Salzberg (1997). On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Mining and Knowledge Discovery 1:3, pp. 317-327.
Nasraoui O., Joshi A., and Krishnapuram R. (1999).
Relational Clustering Based on a New Robust
Estimator with Application to Web Mining. Proc. Intl.
Conf. North American Fuzzy Info. Proc. Society
(NAFIPS 99), New York.
B. Mobasher, N. Jain, E-H. Han, and J. Srivastava (1996).
Web mining: Pattern discovery from World Wide Web
transactions. Technical Report 96-050, University of
Minnesota.
Ester M., Kriegel H.-P., Sander J., Xu X. (1996). A
Density-Based Algorithm for Discovering Clusters in
Large Spatial Databases with Noise. Proc. KDD’96,
Portland, OR, pp.226-231.
Sibson, R. (1973). SLINK: An Optimally Efficient
Algorithm for the Single-link Cluster Method. The
Computer Journal, 16(1): 30-34.
Ankerst, M., Breunig, M., Kriegel, H.-P., Sander, J. (1999). OPTICS: Ordering Points to Identify the Clustering Structure. Proc. SIGMOD'99, pp. 49-60.
P. Manchester (1999). Survey – Knowledge Management. Financial Times.