Because of the amount of data the system generates
daily, teachers need specific tools to help them
manage courses. Moodle provides a report tool that
shows all user activities. Such data can be used
to understand user interaction, but the volume of
information, especially given the number of
students, makes it impractical to analyze
manually.
Data Mining (DM) and Recommender System (RS)
techniques aid the teacher in this process. DM is a
pattern identification process that can be applied to
large datasets (Maimon & Rokach, 2005), and can
be used by instructors to understand students'
patterns and to evaluate their activities. An RS
generates suggestions to learners about the activities,
resources, paths or other users that may be relevant
to them.
The proposed system was developed as an
extension to Moodle. It is composed of Moodle
itself, a database and Apache Mahout (Figure 1).
The database, as already stated, is used by Moodle to
store information about the users. Apache Mahout is
a framework for mining techniques such as
clustering and classification, and is used by
the recommender module as explained in Section
3.2.
The processing steps of our approach are the
following:
1. Data cleansing: information from the database
is transformed into a format suitable for
processing by Mahout, using an automated task that is
triggered whenever an activity is added
to the log table of Moodle;
2. Mining: once the data is transformed, the
recommendation module activates Mahout, which
analyses the data and generates recommendations
that are stored back in the database;
3. Recommendation: according to the action that
the user is performing on Moodle, the
recommendation module uses the data generated by
Mahout to show recommendations to the user.
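
The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names (cleanse, mine, recommend) and the sample log rows are hypothetical, and the mining step uses a simple item co-occurrence count as a stand-in for the Mahout run, not the algorithm used by RecMoodle.

```python
from collections import defaultdict
from itertools import combinations


def cleanse(log_rows):
    """Step 1: keep only (user, item) pairs and drop incomplete rows."""
    return [(r["user"], r["item"]) for r in log_rows
            if r.get("user") and r.get("item")]


def mine(pairs):
    """Step 2: stand-in for Mahout. For each item, rank the other items
    by how many users accessed both (ties broken alphabetically)."""
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)
    co_counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {item: sorted(others, key=lambda o: (-others[o], o))
            for item, others in co_counts.items()}


def recommend(stored, current_item, seen, limit=3):
    """Step 3: look up the stored result for the item being viewed."""
    return [i for i in stored.get(current_item, []) if i not in seen][:limit]


# Hypothetical log rows; the last one is incomplete and is dropped in step 1.
log_rows = [
    {"user": "A1", "item": "forum"},
    {"user": "A1", "item": "quiz"},
    {"user": "A2", "item": "forum"},
    {"user": "A2", "item": "wiki"},
    {"user": "A3", "item": "forum"},
    {"user": "A3", "item": "quiz"},
    {"user": None, "item": "chat"},
]
stored = mine(cleanse(log_rows))
print(recommend(stored, "forum", seen={"forum"}))  # ['quiz', 'wiki']
```

Separating the mining result (stored) from the lookup mirrors the description above, in which recommendations are computed once and then read back when the user acts.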
3.1 Preprocessing
Moodle follows a structure based on topics for the
courses. When organizing a course, one is able to
add personalized modules, static material (e.g., web
pages, hyperlinks), interactive material (e.g., tasks,
blog, questionnaires) and cooperative activities (e.g.,
chat, forum, glossary, wiki). Every user access to
these activities is logged in the database, which in
our case is MySQL.
RecMoodle does not use the Moodle database
directly. Instead, it uses specific tables that are
prepared in a format that Mahout understands. Thus,
a preprocessing step must be performed. It is
carried out by triggers that we programmed to
collect the data and store it in a table named Tab_rec
(Figure 2).
In this table, course_id identifies the course
accessed by the user, user_id identifies the
user, item_id represents the element
accessed by the user inside Moodle, preference
represents the user's preference for this item, and
timestamp corresponds to the date and time at which
the user accessed the item.
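
As a minimal sketch of this preprocessing, the following uses an in-memory SQLite database in place of MySQL. The Tab_rec columns follow the description above, but the column types, the simplified log table, the trigger name fill_tab_rec and the choice to leave preference empty at insertion time are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-in for the Moodle log table.
CREATE TABLE log (
    user_id   INTEGER,
    course_id INTEGER,
    item_id   INTEGER,
    time      INTEGER
);

-- Columns as described in the text; the types are assumed.
CREATE TABLE Tab_rec (
    course_id  INTEGER,
    user_id    INTEGER,
    item_id    INTEGER,
    preference REAL,
    timestamp  INTEGER
);

-- Copy every new log entry into Tab_rec; preference is left empty here.
CREATE TRIGGER fill_tab_rec AFTER INSERT ON log
BEGIN
    INSERT INTO Tab_rec (course_id, user_id, item_id, preference, timestamp)
    VALUES (NEW.course_id, NEW.user_id, NEW.item_id, NULL, NEW.time);
END;
""")

# User 7 accesses item 42 of course 1.
conn.execute("INSERT INTO log VALUES (?, ?, ?, ?)", (7, 1, 42, 1300000000))
rows = conn.execute(
    "SELECT course_id, user_id, item_id, preference, timestamp FROM Tab_rec"
).fetchall()
print(rows)  # [(1, 7, 42, None, 1300000000)]
```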
From this table we can determine the order of
access, that is, the sequence of items the user has
followed in a given course. We can also determine the
most accessed items and the items the user has not
accessed.
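
A minimal sketch of these three lookups follows, using illustrative in-memory records rather than the real table. The helper names and sample data are hypothetical. Two assumptions are made: a repeated visit keeps only its first position in the sequence, and "not accessed" is computed against the items that at least one user accessed in the course, since the table itself does not list the full course contents.

```python
from collections import Counter

# Illustrative records: (course_id, user_id, item_id, timestamp).
records = [
    (1, "A1", "page", 10),
    (1, "A1", "forum", 20),
    (1, "A1", "page", 30),
    (1, "A2", "forum", 15),
    (1, "A2", "quiz", 25),
]


def access_sequence(records, course_id, user_id):
    """Items in order of first access by one user in one course."""
    rows = sorted(
        (r for r in records if r[0] == course_id and r[1] == user_id),
        key=lambda r: r[3],
    )
    seen, sequence = set(), []
    for _, _, item, _ in rows:
        if item not in seen:
            seen.add(item)
            sequence.append(item)
    return sequence


def most_accessed(records, course_id):
    """(item, access count) pairs, most accessed first."""
    return Counter(r[2] for r in records if r[0] == course_id).most_common()


def not_accessed(records, course_id, user_id):
    """Items seen by someone in the course but never by this user."""
    course_items = {r[2] for r in records if r[0] == course_id}
    return course_items - set(access_sequence(records, course_id, user_id))


print(access_sequence(records, 1, "A1"))
print(most_accessed(records, 1))
print(not_accessed(records, 1, "A1"))
```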
Preference indicates how much the user liked or
disliked a specific item or element in the course. It
is based on a five-point Likert scale, in which 1 is
the lowest value and 5 is the highest.
In Mahout, the preference values can be
omitted (e.g., like or dislike, accept or not accept,
access or not access, or even no score at all), or the
preference value can be an implicit preference
(when the user simply states that he likes something
by giving it a score).
Figure 2: Pre-processing table.
On the left side of Figure 3 we show a situation in
which the user has made a single visit to an item. In
this case, the preference may be a Boolean value. On
the right of the figure, the user has scored his
preference for the item clicked.
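
The two cases can be sketched as the lines that would be handed to Mahout, assuming the comma-separated userID,itemID[,preference] layout read by Mahout's file-based data model. The helper name to_preference_line and the range check against the 1 to 5 scale are assumptions for illustration, not part of RecMoodle.

```python
def to_preference_line(user_id, item_id, preference=None):
    """Format one record as a comma-separated preference line.

    With no preference, the line only states that the user accessed the
    item (the Boolean case); otherwise the score is appended.
    """
    if preference is None:
        return f"{user_id},{item_id}"
    if not 1 <= preference <= 5:
        raise ValueError("preference must be on the 1 to 5 scale")
    return f"{user_id},{item_id},{preference}"


print(to_preference_line(1, 42))     # 1,42
print(to_preference_line(1, 42, 4))  # 1,42,4
```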
In the case of RecMoodle, four types of
preference representation were taken into
consideration. None of them is implicit. The first
one is based on the order in which each student
accesses the items and on the resulting performance
of this student in the course. Table 1 gives an
example of this kind of representation.
Each time a user accesses an item, the item is added
to this table in order of access; repeated accesses
to the same item are not counted again. Based on this
table, it is possible to infer the similarity among
users. For
instance, in Table 1, users A1 and A3 have