2.3 PART
PART, whose name derives from the partial decision trees on which it is built, was proposed by Frank and Witten (1998). The algorithm combines the decision-tree construction of the C4.5 algorithm with the separate-and-conquer rule-learning strategy of RIPPER.
According to Frank and Witten (1998), the algorithm iterates over three main steps (sketched in schematic R code below):
1. induce a rule from a partial tree;
2. remove all instances that are covered by the rule;
3. induce new rules from the remaining instances.
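To make the iteration concrete, the loop can be written as follows; build_partial_tree, best_leaf_to_rule, and covered are hypothetical helper names introduced only for illustration, not functions of any existing package:

  # Schematic sketch of PART's separate-and-conquer loop;
  # the three helper functions are hypothetical placeholders.
  induce_rules <- function(data) {
    rules <- list()
    while (nrow(data) > 0) {
      tree  <- build_partial_tree(data)      # step 1: grow a partial C4.5-style tree
      rule  <- best_leaf_to_rule(tree)       #         and turn its best leaf into a rule
      rules <- c(rules, list(rule))
      data  <- data[!covered(rule, data), ]  # step 2: drop instances covered by the rule
    }                                        # step 3: repeat on the remaining instances
    rules
  }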
Frank and Witten (1998) emphasize that PART, because it combines the two paradigms of C4.5 and RIPPER, produces good rule sets without global optimization.
3 METHODS
The dataset used in this article was extracted from the Learning Management System (LMS) Moodle (Modular Object-Oriented Dynamic Learning Environment). The LMS recorded the actions of 229 students enrolled in an online Algorithms course during the second semester of 2016. Initially the dataset had 75,948 instances and 42 features, each feature representing a possible action performed by one of the 229 students and each instance representing an action taken by one student, i.e., an average of 331.65 (75,948/229) actions per student.
3.1 Pre-processing
After being extracted from the LMS, the data were pre-processed. During this stage the data were transformed so that each instance represents a student and each attribute a type of action performed by the students. After the transformation, the dataset had 229 instances and 42 attributes.
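A minimal sketch of this transformation, assuming the exported log is a CSV file with hypothetical columns student and action (one row per recorded action):

  log <- read.csv("moodle_log.csv")  # hypothetical file name; 75,948 rows
  # cross-tabulate students by actions, yielding one row per student
  students <- as.data.frame.matrix(table(log$student, log$action))
  dim(students)                      # expected: 229 x 42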
Because not all students perform all actions, there were many missing values in the dataset. For example, only 8 students performed the "chat talk" action, only 1 performed "course report log", and so on. Therefore, 23 features were discarded during the pre-processing stage due to missing data.
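One way to implement this filter, assuming a minimum-support cutoff (the paper reports only that 23 of the 42 features were removed, not the exact criterion):

  k <- 10                            # assumed cutoff, not stated in the paper
  n_users <- colSums(students > 0)   # students performing each action at least once
  students <- students[, n_users >= k]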
A new dichotomous feature, Status, was also added. It is a variable with domain {1, 0}: 1 for approved students and 0 for failed students. Status was computed from each student's final grade: students with a final grade greater than or equal to 70 were assigned Status = 1 (approved); otherwise, Status = 0.
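A sketch of this derivation, assuming the final grades were merged into the data frame under the hypothetical column name final_grade:

  # 70 is the approval cutoff given in the text; the column name is assumed
  students$Status <- factor(ifelse(students$final_grade >= 70, 1, 0),
                            levels = c(0, 1))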
The final dataset comprises 20 features: one target, Status, and 19 predictive variables. These explanatory variables describe actions performed by students, such as viewing tasks, submitting assignments, and participating in chats, among other actions that are part of the students' routine. Table 1 describes each variable of the dataset, along with the mean and standard deviation of accesses for each action performed by the 229 students in Moodle during the course.
The analyzed dataset thus contains 229 instances representing the students, of whom 135 were approved and 94 failed, i.e., a failure rate of 41% in the course.
3.2 Rule Induction
After pre-processing, three algorithms were applied to the dataset: OneR, RIPPER, and PART. The experiments used the R language and the RStudio IDE, version 1.0.136. OneR was run with the OneR package, while RIPPER and PART were run with the RWeka package (Hornik et al., 2008).
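A minimal sketch of fitting the three learners, assuming the training set defined later in this section is named train (both packages export an OneR function, so the OneR:: prefix avoids masking by RWeka):

  library(OneR)    # provides OneR() and optbin()
  library(RWeka)   # provides JRip() (RIPPER) and PART()

  m_oner   <- OneR::OneR(Status ~ ., data = train)  # one-attribute rules
  m_ripper <- JRip(Status ~ ., data = train)        # RIPPER
  m_part   <- PART(Status ~ ., data = train)        # PART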
All 20 attributes were discretized with the optbin function of the OneR package. This function discretizes numeric data taking the target variable into account; in addition, it uses logistic regression to define the cut points, and hence the number of factor levels, of the discretization.
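A sketch of the call, assuming the pre-processed data frame is named students:

  # method = "logreg" selects cut points via logistic regression,
  # always with respect to the target on the left-hand side of the formula
  data_disc <- OneR::optbin(Status ~ ., data = students, method = "logreg")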
All algorithms were trained with the simplest validation method, hold-out (Kuncheva, 2014). The dataset was divided into a training set with 80% of the data (183 instances) and a test set with the remaining 20% (46 instances).
4 EXPERIMENTS AND RESULTS
The first algorithm applied to the educational dataset was OneR. Table 2 presents the five attributes with the lowest error rates.
Most of the features presented in Table 2 are related to an assignment, as described in Table 1; only the attribute course.view is not directly linked to an evaluation activity. Considering the feature assign.submit, the attribute with the lowest error rate, OneR identified two rules:
1. if assign.submit = (−0.014, 5] then Status = 0
2. if assign.submit = (5, 14] then Status = 1
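These rules can be reproduced by fitting OneR on that single attribute; summary() prints the induced rules together with the contingency table (train and test as defined in Section 3.2):

  best <- OneR::OneR(Status ~ assign.submit, data = train)
  summary(best)                 # induced rules plus contingency table
  pred <- predict(best, test)   # predictions on the hold-out set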
According to the contingency table (Table 3), 68 instances are covered by rule number one and 115 by rule number two, implying coverages of 37.2% (68/183) and 62.8% (115/183). Rule number one has an accuracy of 88.2% (60/68) and rule number two of 85.2% (98/115). Considering both rules, the overall accuracy is 86.34% (158/183), as shown in Table 2.
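These figures can be checked with simple arithmetic (0.882 × 68 ≈ 60 and 0.852 × 115 ≈ 98 correct classifications, consistent with 158 correct overall):

  68 / 183          # 0.372  -> coverage of rule 1
  115 / 183         # 0.628  -> coverage of rule 2
  (60 + 98) / 183   # 0.8634 -> overall accuracy (158/183)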