
 
Because of the amount of data the system generates
daily, teachers need specific tools to help them
manage courses. Moodle provides a report tool that
shows all user activities. Such data can be used
to understand user interaction, but because there is
so much of it, especially given the number of
students, analyzing it manually is impractical.
Data Mining (DM) and Recommender System (RS)
techniques aid the teacher in this process. DM is a
pattern identification process that can be applied to
large datasets (Maimon & Rokach, 2005), and
instructors can use it to understand students'
patterns and to evaluate their activities. An RS
generates suggestions to learners about the activities,
resources, paths or other users that may be relevant
to them.
The proposed system was developed as an
extension to Moodle. It is composed of Moodle
itself, a database and Apache Mahout (Figure 1).
The database, as already stated, is used by Moodle to
store information about the users. Apache Mahout is
a framework for mining techniques
such as clustering and classification, and is used by
the recommender module as explained in Section
3.2.
The processing steps of our approach are the
following:
1. Data cleansing: information from the database
is transformed into a format suitable for
processing by Mahout, using an automated task that is
triggered whenever an activity is added
to the log table of Moodle;
2. Mining: once the data is transformed, the
recommendation module activates Mahout, which
analyses the data and generates recommendations
that are stored back in the database;
3. Recommendation: according to the action that
the user is performing on Moodle, the
recommendation module uses the data generated by
Mahout to show recommendations to the user.
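As a minimal sketch of the data cleansing step, the
following Python fragment turns cleansed access
records into comma-separated lines of the form
userID,itemID[,preference[,timestamp]], which is the
layout read by Mahout's file-based data model. The
system described here keeps this data in database
tables, so the file layout is used only for
illustration, and the names AccessRecord and
to_mahout_lines are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class AccessRecord(NamedTuple):
    """One cleansed log entry (hypothetical field names)."""
    user_id: int
    item_id: int
    preference: Optional[float]  # None means "accessed", with no score
    timestamp: int               # seconds since the epoch


def to_mahout_lines(records: Iterable[AccessRecord]) -> List[str]:
    """Format records as userID,itemID[,preference,timestamp] lines."""
    lines = []
    for r in records:
        if r.preference is None:
            # Access without a score: only the user/item pair is written.
            lines.append(f"{r.user_id},{r.item_id}")
        else:
            lines.append(
                f"{r.user_id},{r.item_id},{r.preference:g},{r.timestamp}"
            )
    return lines


# Illustrative values only.
records = [
    AccessRecord(7, 101, None, 1300000000),
    AccessRecord(7, 102, 4.0, 1300000060),
]
print("\n".join(to_mahout_lines(records)))
```

The first record carries no score and the second
carries one, which mirrors the two kinds of
preference data that Mahout can work with.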
3.1  Preprocessing 
Moodle organizes courses by topic.
When organizing a course, one is able to
add personalized modules, static material (e.g., web
pages, hyperlinks), interactive material (e.g., tasks,
blog, questionnaires) and cooperative activities (e.g.,
chat, forum, glossary, wiki). Each user access to
these activities is logged in the database, which in
our case is MySQL.
RecMoodle does not use the Moodle database
directly. Instead, it uses specific tables that are
prepared in a format that Mahout understands. Thus,
a preprocessing step must be performed. This is
done by triggers that we wrote to
collect the data and store it in a table named Tab_rec
(Figure 2).
In this table, course_id identifies the course
accessed by the user, user_id is the user
identification, and item_id represents the element
accessed by the user inside Moodle. The preference
field represents the user's preference for this item,
and timestamp corresponds to the date and time at
which the user accessed this item.
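The trigger-based preprocessing can be sketched as
follows. This is only an illustration under stated
assumptions: SQLite (through Python's sqlite3 module)
stands in for MySQL, the log table and its columns are
a simplified, hypothetical version of the Moodle log,
and the trigger name log_to_tab_rec is invented. Only
the Tab_rec column names come from the description
above. How the preference value is computed is not
shown here, so the sketch leaves it empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log (             -- hypothetical, simplified activity log
    id      INTEGER PRIMARY KEY,
    course  INTEGER NOT NULL,
    userid  INTEGER NOT NULL,
    itemid  INTEGER NOT NULL,
    time    INTEGER NOT NULL
);
CREATE TABLE Tab_rec (
    course_id  INTEGER NOT NULL,
    user_id    INTEGER NOT NULL,
    item_id    INTEGER NOT NULL,
    preference REAL,            -- left NULL: scoring is not shown here
    timestamp  INTEGER NOT NULL
);
-- Copy every new log entry into Tab_rec as soon as it is inserted.
CREATE TRIGGER log_to_tab_rec AFTER INSERT ON log
BEGIN
    INSERT INTO Tab_rec (course_id, user_id, item_id, preference, timestamp)
    VALUES (NEW.course, NEW.userid, NEW.itemid, NULL, NEW.time);
END;
""")

# Simulate one user access being logged (illustrative values only).
conn.execute(
    "INSERT INTO log (course, userid, itemid, time) VALUES (?, ?, ?, ?)",
    (3, 7, 101, 1300000000),
)
rows = conn.execute(
    "SELECT course_id, user_id, item_id, preference, timestamp FROM Tab_rec"
).fetchall()
print(rows)
```

A trigger keeps Tab_rec in step with the log without
any change to the code that writes the log, which is
consistent with the system being an extension rather
than a modification of Moodle.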
From this table we can determine the order of
access, that is, the sequence of items the user has
followed in a given course. We can also determine the
most accessed items and the items the user has not
accessed.
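The three queries mentioned above can be sketched in
Python over rows shaped like the table (course_id,
user_id, item_id, timestamp). The function names, the
sample rows and the course_items set are assumptions
made for illustration. In particular, finding the
items a user has not accessed requires a list of all
items in the course, which is assumed here to be
available from elsewhere. The sequence keeps only the
first access to each item, which is one possible
reading of "order of access".

```python
from collections import Counter

# (course_id, user_id, item_id, timestamp) rows; illustrative values only
rows = [
    (3, 7, 101, 10),
    (3, 7, 102, 20),
    (3, 7, 101, 30),
    (3, 8, 102, 15),
    (3, 8, 102, 25),
]
course_items = {101, 102, 103}  # assumed list of all items in course 3


def access_sequence(rows, course_id, user_id):
    """Items in order of first access by the user, without repeats."""
    own = [r for r in rows if r[0] == course_id and r[1] == user_id]
    seen, sequence = set(), []
    for _, _, item, _ in sorted(own, key=lambda r: r[3]):
        if item not in seen:
            seen.add(item)
            sequence.append(item)
    return sequence


def most_accessed(rows, course_id):
    """(item_id, access count) pairs, most accessed first."""
    counts = Counter(r[2] for r in rows if r[0] == course_id)
    return counts.most_common()


def not_accessed(rows, course_id, user_id, course_items):
    """Items of the course that the user has never accessed."""
    return course_items - set(access_sequence(rows, course_id, user_id))


print(access_sequence(rows, 3, 7))
print(most_accessed(rows, 3))
print(not_accessed(rows, 3, 7, course_items))
```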
Preference expresses how much the user liked
a specific item or element in the course. It is based
on a five-point Likert scale, in which 1 is the
lowest value and 5 the highest.
In Mahout, preference values can be
omitted (e.g., like or dislike, accept or not accept,
access or not access, with no score at all), or the
preference can be an explicit value
(when the user states how much he or she likes
something by giving a score).

Figure 2: Pre-processing table.
On the left side of Figure 3 we show a situation in
which the user has made a single visit to an item. In
this case, the preference may be a Boolean value. On
the right of the figure, the user has scored his or
her preference for the item clicked.
In the case of RecMoodle, four types of
preference representation were taken into
consideration. None of them is a score stated
directly by the user. The first
one is based on the order of item access followed by
each student and the resulting performance of this
student on the course. Table 1 gives an example of
this kind of representation.
Each time a user accesses an item, the item is added
to this table in order of access; repeated accesses
are not counted again. Based on this table, it
is possible to infer the similarity among users. For
instance, in Table 1, users A1 and A3 have
RECMOODLE - AN EDUCATIONAL RECOMMENDER SYSTEM