How are they Watching Me
Learning from Student Interactions with Multimedia Objects Captured from Classroom Presentations

Caio César Viel¹, Erick Lazaro Melo¹, Maria da Graça C. Pimentel² and Cesar A. C. Teixeira¹
¹DC, Universidade Federal de São Carlos, São Carlos, SP, Brazil
²ICMC, Universidade de São Paulo, São Carlos, SP, Brazil
Keywords:
Interactive Multimedia, E-learning, Ubiquitous Capture, Capture and Access, NCL, Interactions.
Abstract:
The performance of a teacher in the exposition of a subject is a rich experience that can be captured and
transformed into a corresponding multimedia learning object, given the multimodal and multi-device nature
of the presentation. Using as a starting point an interactive multimedia object that is an electronic version of a problem-solving lecture recorded by the teacher, in this paper we report how a group of students interacted with this multimedia learning object, composed of synchronized videos, audio, images and context information. The
qualitative analysis of the data allows the teacher to infer useful information not only for refining the lecture
content but also for improving its presentation. The case study presented illustrates how a similar analysis
can be performed by other instructors with respect to their own lectures, and demonstrates both the power of
capturing the multimodal and multi-device nature of the original presentation, and the utility of logging the
student-multimedia learning object interaction.
1 INTRODUCTION
When an instructor lectures to her students, her performance in the classroom can be considered a mul-
timodal and multi-device live presentation that can be
captured and transformed into a corresponding multi-
media learning object.
The classroom activity is the primary learning
context in many courses (Abowd et al., 1999), so cap-
turing such activities, lectures in particular, may be in-
teresting for several reasons. From the attendee’s per-
spective, a student may use the recordings when solv-
ing assignments or to study for an exam, or a student
who misses a class may still have access to what was
presented by watching the recordings. From the in-
structor’s perspective, a professor who will be absent
from the campus may prepare a recorded lecture to
deliver to the students. Moreover, a previously cap-
tured lecture may be improved and reused, or a por-
tion of a captured lecture may be used as a comple-
mentary learning object in different educational ap-
proaches. Last but not least, captured lectures can be
a valuable resource for e-learning and distance educa-
tion courses (Liu and Kender, 2004).
We are aware that there are strong divergences
among educators as to the efficiency of the lecture
format as a method of instruction in middle school
and higher education. Ross, for example, states that
“when I was younger, I used to say that it took 40
years for any significant change in higher education to take effect, because that was the time by when all the existing teachers would have retired. I now realize that I was not a cynic, but an optimist, since lectures are just as prevalent as they ever were" (Ross,
2011). However, as Ross himself acknowledges, lec-
tures are still widely used in all levels of education.
Moreover, Schwerdt and Wuppermann observe that
“contrary to contemporary pedagogical thinking, we
find students score higher on standardized tests in the
subject in which their teachers spent more time on
lecture-style presentations than in the subject in which
the teacher devoted more time to problem-solving ac-
tivities” (Schwerdt and Wuppermann, 2011).
Although recording lectures is common practice
in several universities, producing quality video lec-
tures demands a high operational cost. To reduce such
costs, many tools for the (semi) automatic capture of
lectures were developed in the past (Brotherton and
Abowd, 2004), (Chou et al., 2010), (Dickson et al.,
2010), (Halawa et al., 2011), (Nagai, 2009). How-
ever, such tools usually record only video streams and
generate, as a result, a single video stream (e.g. a
podcast). In several scenarios, this may not always be
enough to reproduce the classroom experience.
The classroom itself can be viewed as a rich mul-
timedia environment where audiovisual information
is combined with annotating activities (Abowd et al.,
1999). Furthermore, the context of the class (e.g.
the slide being presented, what the lecturer says and
her body language) and how the different audiovisual
contents relate to each other are also important. For
instance, sometimes it is necessary to relate the slide
presentation with the whiteboard for the comprehen-
sion of an exercise or lesson (Dickson et al., 2012). In
addition, the interaction between the lecturer and the
students is also a valuable part of the learning process.
In this work, capturing a presentation means
recording the audio and one or more video streams
of the speaker, the images presented on the screen or
projector, the writings and drawings made on white-
boards, and capturing relevant contextual information. The aim is to use the captured information to automatically generate an interactive multimedia object, as proposed by the Linking by Interacting paradigm (Pimentel et al., 2000). We refer to the composition of several videos, audio and some static media, properly synchronized and with facilities for flexible interaction and browsing, as an "interactive multi-video object".
From the multi-video object, the lecture may be
reconstituted and explored in dimensions not achiev-
able in the classroom. The student may be able, for
example, to obtain multiple synchronized audiovisual
content that includes the slide presentation, the white-
board content, video streams with focus on the lec-
turer’s face or the lecturer’s full body, or the lec-
turer's web browsing, among others. The student may choose at any time which content is most appropriate to be exhibited in full screen. The student may also be able to perform semantic browsing using points of interest such as slide transitions and the position of the lecturer in the classroom. Moreover, facilities can be
provided for users to annotate the captured lecture
while watching it, as suggested by the Watch-and-
Comment paradigm (Cattelan et al., 2008).
In this paper we report how a group of students
interacts with a multimedia learning object composed
of synchronized videos, audio, images and context in-
formation, and discuss how the analysis of the inter-
action data allows the instructor to infer useful infor-
mation for improving the lecture. The case study il-
lustrates how a similar analysis can be performed by
other instructors with respect to their own presenta-
tions, and demonstrates both the power of capturing
the multimodal and multi-device nature of the original
presentations, and the utility of logging the student-
multimedia learning object interaction.
This paper is organized as follows: in Section 2
we discuss related work; in Section 3 we describe
our proposed model to capture live lectures; in Sec-
tion 4 we present our current prototype implementa-
tion; in Section 5 we present one case study in which
one instructor used the prototype to capture one prob-
lem solving session and generate an associated multi-
media learning object; in Section 6 we detail lessons
learned from the instructor after a qualitative analy-
sis of the interaction a group of students had with the
learning object; and in Section 7 we present our final
remarks.
2 RELATED WORK
Several authors report results from building systems
designed to capture lectures. The AutoAuditorium
records classroom activities using a spotting and a
tracking camera controlled by computers. The camera
orchestration is carried out in real-time using some
heuristics based on audiovisual production. The main
idea is to create a “TV-like” production without the
usual cameraman, video director, audio engineer and
other professionals (Bianchi, 2004).
Lampi et al. consider the use of multiple cameras
to record lectures. The authors use sensors and com-
putational vision techniques to do the cameraman’s
job. They also use a finite state machine to define,
at each moment, which camera stream should be in-
cluded in the final stream (Lampi et al., 2008).
Nagai uses an environment with a high definition
camera (Advanced Video Coding High Definition -
AVCHD) placed at the back of the classroom. The
camera can record the whole lecture scene (lecturer,
whiteboard, slide presentation, students, etc.). By us-
ing tracking techniques, the camera performs digital
zoom to what is considered the focus of attention at
different moments (Nagai, 2009).
Chou et al. use tracking techniques to detect the
lecturer's movements and changes on the screens (whiteboard, slide presentation). A camera action table is then queried to determine what must be done (zoom in, zoom out, pan, etc.) in order to highlight the image that must
be the focus of attention (Chou et al., 2010).
All the aforementioned works differ from the
work reported in this paper in that the resulting prod-
uct of the lecture capturing process is a single video
stream instead of a multi-video object.
In the work of Liu et al., lectures are captured in a
similar process to the ones mentioned before, resulting in a single video stream. The difference is that the
set of slides used in the presentation is added to the
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
6
video stream. However, the slides are not synchro-
nized with the video (Liu and Kender, 2004). Given
that the result is a single video stream, students do not have the autonomy to choose the camera that gives them
the best view of the lecture for each situation, or to
focus their point of interest, as allowed in our multi-
video object.
ClassX is a tool designed for online lecture deliv-
ery (Halawa et al., 2011) (Pang et al., 2011). A live
lecture is captured by means of an AVCHD stream
split in several virtual standard resolution cameras.
By using tracking techniques, the most appropriate
virtual camera for a given moment is chosen and
streamed to the remote students. The students have
the opportunity to choose a different stream from
another virtual camera or even watch the original
AVCHD stream, and a synchronized slide presenta-
tion is offered — but no other navigation facilities are
available to the students.
REPLAY is a system for producing, manipulating
and sharing lecture videos (Schulte et al., 2008). Be-
sides offering similar features to the aforementioned
systems, REPLAY uses computer vision to recog-
nize written words, and deploys MPEG-7 to index the
videos. Although REPLAY allows more navigation
alternatives than the previous systems, it does not pro-
duce an independent multi-video object.
Other authors report the use of other features such
as image processing and audio transcription (Dickson
et al., 2012), (Dickson et al., 2010), (Brotherton and
Abowd, 2004), (Cattelan et al., 2003), the result being
hypermedia documents that offer interfaces providing
different ways of indexing the recorded information.
The model for capturing and recovering lectures pre-
sented in this paper allows more flexibility. This flex-
ibility results from the ability to specify the context
information that must be captured, and to specify how
this context information should be combined to gen-
erate a multi-video object, or to promote live inter-
ventions in the classroom during the capture process, for example when the illumination of the room changes because a light was turned off.
3 UBIQUITOUS CAPTURE AND
AUTHORING
In order to produce quality lecture videos, the con-
ventional lecture recording process usually requires
the presence of audiovisual professionals. Our infras-
tructure offers a self-service approach, allowing the
instructor to record a lecture herself. Some solutions
usually rely on computational vision, tracking tech-
niques and sensors to perform camera orchestration in an attempt to produce a single video or audio stream as output.
As detailed elsewhere, the model we have pro-
posed goes a step further (Viel et al., 2013). As de-
picted in Figure 1, the model aims at capturing all
the content presented in the classroom. The capture
process is pervasive, does not rely on human media-
tion and automatically generates an interactive multi-video object that preserves as much of the lecture content and context as possible.
An environment, usually a classroom, is instru-
mented with physical devices (Figure 1(1)), such as
video cameras, microphones, whiteboards, interactive
whiteboards and slide projectors. The instrumented
classroom may also contain sensors, such as temper-
ature sensors and luminosity sensors, and secondary
screens, such as notebooks, TVs, tablets, etc. The
video cameras should be placed at points from which they can frame important areas of the classroom (instructors, students, whiteboard, slide presentation, etc.).
Computer devices capture all the content pro-
duced by the physical devices used in the class-
room (e.g. whiteboards and slides) and represent
it as video, audio and data streams (Figure 1(2)).
Cameras produce video and audio streams, micro-
phones produce audio streams and sensors produce
data streams. By capturing the screen output from
the secondary screens or by intercepting the signal
sent to the slide projector, we can also produce video
streams. The electronic whiteboard can produce both
data and video streams. By capturing its strokes we
can generate a data stream; intercepting the signal
sent to its projector, we can generate a video stream.
All such streams are stored (Figure 1(3)) for fur-
ther use in the multi-video object generation. The
streams are also sent to the capture controller (Fig-
ure 1(4)), a component responsible for managing the
capture process. The capture controller uses signal
analysis to analyse the captured streams and to send
commands (Figure 1(5)) back to the physical devices
and actuators (Figure 1(6)) present in the classroom.
The instructions in the capture controller are de-
fined in a customizable action table. The action ta-
ble can be used to define actions for certain events
which may occur during the capture process. For in-
stance, zooming into the image of a specific camera
when the lecturer starts talking, or activating an ac-
tuator in order to reduce the light intensity when the
lecturer starts a slide presentation.
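A minimal sketch of what such an action table could look like is shown below, in Python; the event names, device identifiers and commands are illustrative assumptions, not the actual vocabulary of the capture controller.

```python
# Sketch of a customizable action table: hypothetical event names mapped to
# commands that the capture controller would send back to devices/actuators.
ACTION_TABLE = {
    "lecturer_started_talking":   {"device": "camera_lecturer",  "command": "zoom_in"},
    "slide_presentation_started": {"device": "light_actuator",   "command": "dim", "level": 0.4},
    "whiteboard_interaction":     {"device": "camera_whiteboard", "command": "focus"},
}

def handle_event(event_name, send_command):
    """Look up the event in the action table and dispatch the associated command."""
    action = ACTION_TABLE.get(event_name)
    if action is not None:
        send_command(action)

# Example: after signal analysis detects a slide presentation starting,
# the controller would call handle_event("slide_presentation_started", send_command).
```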
Our model allows the instructor to split her presentation into different modules, an approach usually adopted in e-learning platforms (examples include http://www.coursera.org and http://www.edx.org).

Figure 1: Capture Workflow.

A multi-video presentation can be composed of one or more modules.
This is useful to better organize the content of a lec-
ture. The lecturer may, for instance, prepare a problem-solving presentation with one exercise per module. It also allows the lecturer to take breaks during the recording process and the students to navigate among the modules of the multi-video presentation. Splitting the presentation into modules can also minimize the time needed to repeat the recording in case of errors. For instance, if in one module the
lecturer starts stuttering or becoming confused and
wishes to make a retake, she only needs to record
that module again. Reusing the modules to compose a
new presentation is another advantage of splitting the
recording process into modules — reuse is in fact one
of the main ideas underlying learning objects.
Given that the processes of analysing and convert-
ing the captured streams can demand much compu-
tational power and time, once the capture process is
finished the data is transferred to a server for further
processing.
Considering points of interest as moments in the
lecture which may have particular importance for stu-
dents, we designed recognizer components that use
one or more captured streams to automatically detect
potential points of interest. The points of interest can
be used to provide a more semantic navigation over
the multi-video object, allowing the students to seek
for the next slide transition, for instance.
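As an illustration of how detected points of interest can support this kind of semantic navigation, the sketch below stores them as (time, label) pairs and finds the next one after the current playback position; the data layout is an assumption made for this example, and the sample times are only illustrative.

```python
# Sketch: jump to the next point of interest after the current playback time.
from bisect import bisect_right

points_of_interest = [
    (0.0, "module start"),
    (213.0, "slide transition"),
    (515.0, "slide transition"),
]

def next_point_of_interest(current_time, points):
    """Return the first point of interest strictly after the current time, or None."""
    times = [t for t, _ in points]
    i = bisect_right(times, current_time)
    return points[i] if i < len(points) else None

print(next_point_of_interest(250.0, points_of_interest))  # -> (515.0, 'slide transition')
```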
Some points of interest have been suggested in
the literature ((Dickson et al., 2012), (Cattelan et al.,
2003) and (Brotherton and Abowd, 2004)), while oth-
ers were inspired by our own observation of real lectures. Examples of points of interest are slide transitions, whiteboard interactions and changes in the instructor's eye-gaze.
The resulting multi-video learning object is com-
posed of videos and other captured media. Although
the multi-video object cannot reproduce several as-
pects of the live lecture experience (live interactions,
odors, temperature, etc.), it offers other facilities to
the students when they are interacting with the object.
4 PROTOTYPE
As a proof-of-concept of the model, we developed a
prototype tool for capturing lectures and generating
multi-video objects. This prototype was mainly de-
veloped in Python. Figure 2 depicts an overview of
the prototype.
The prototype is composed of three main parts:
the Capturing tool used to capture streams; the Pro-
cessing tool in charge of stream analysis and the gen-
eration of the multi-video object; and the Presenta-
tion tool, which allows the user to play back the multi-
video object.
Capturing Tool
The Capturing tool, named Classrec (Figure 2(A)),
performs the lecture capturing process. Each com-
puter used in the capturing process runs an instance
of Classrec, and one of these instances is selected to
be the session manager (Figure 2(B)). It corresponds
to the Capture Controller of the workflow (Figure 1).
The session manager is responsible for handling the
lecturer’s stimulus and for controlling the other Class-
rec instances, keeping them synchronized.
The capturing process is based on video streams.
Classrec captures content (video and audio streams)
produced by AVCHD cameras and outputs produced by com-
puters (such as computer screens, slide presentations,
etc.). It also records metadata about the lecture, such as module structure, available streams and authoring information into an XML file.

Figure 2: Prototype Overview.
We opted to capture the electronic whiteboard out-
put as a video stream instead of its strokes. This
was done because a video stream is more portable
than strokes and, given modern video encodings such as H.264 Advanced Video Coding and the static na-
ture of whiteboard outputs, the bit rate of the video
stream is low. We could record a stroke stream, but
it would require a specialized media player to play it
back (as is the case with other systems, e.g. (Müller and Ottmann, 2000)).
Some streams, such as slides, whiteboards and
computer screens may contain segments with a lot
of static content, but they are still captured as video
streams. A possible improvement would be to re-
place the video with a combination of non-static con-
tent videos and a single image to represent a static
segment (video with no changes during a period of
time).
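As a rough sketch of this possible improvement, the fragment below uses OpenCV frame differencing to locate static segments in a captured stream; the thresholds and the segment criterion are arbitrary choices made for illustration, not part of the prototype.

```python
# Sketch: find intervals in which consecutive frames barely change
# (candidates for replacement by a single still image).
import cv2

def static_segments(video_path, diff_threshold=2.0, min_length_s=5.0):
    """Yield (start_s, end_s) intervals where consecutive frames barely differ."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    ok, prev = cap.read()
    if not ok:
        return
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    start, frame_idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_idx += 1
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        mean_diff = cv2.absdiff(gray, prev).mean()
        prev = gray
        if mean_diff < diff_threshold:
            if start is None:
                start = frame_idx / fps
        else:
            if start is not None and frame_idx / fps - start >= min_length_s:
                yield (start, frame_idx / fps)
            start = None
    if start is not None:
        yield (start, frame_idx / fps)
    cap.release()
```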
The communication among the different applica-
tions is carried out using the Apache ActiveMQ mes-
sage broker (Figure 2(C)).
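ActiveMQ can be reached from Python through its STOMP interface; the sketch below, using the stomp.py client, illustrates how Classrec instances could exchange control messages. The topic name and message payload are assumptions for illustration, not the prototype's actual protocol.

```python
# Sketch: Classrec instances exchanging capture commands via ActiveMQ (STOMP).
import json
import stomp

class CaptureListener(stomp.ConnectionListener):
    def on_message(self, frame):  # stomp.py >= 5; older versions pass headers and body
        command = json.loads(frame.body)
        print("received capture command:", command)

conn = stomp.Connection([("localhost", 61613)])  # default ActiveMQ STOMP port
conn.set_listener("", CaptureListener())
conn.connect(wait=True)
conn.subscribe(destination="/topic/capture-session", id=1, ack="auto")

# The session manager instance would broadcast commands such as:
conn.send(destination="/topic/capture-session",
          body=json.dumps({"action": "start_module", "module": 3}))
```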
Processing Tool
The Processing tool, named Classgen (Figure 2(E)),
performs the multi-video generation process. This
tool uses as input the video streams and metadata
recorded by the Capturing tool. It also supports an XML configuration description language, which allows the specification of which recognizers (and their inputs)
should be used, and the codecs that should be used
to encode audio and video.
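The schema of this configuration language is not reproduced in the paper; the sketch below merely illustrates, with invented element and attribute names, how such a description could be read in Python.

```python
# Sketch: reading a hypothetical Classgen configuration with ElementTree.
# The element and attribute names are invented and do not reproduce the
# prototype's actual schema.
import xml.etree.ElementTree as ET

config_xml = """
<classgen>
  <recognizer name="slide-transition" input="slide_capture"/>
  <recognizer name="face" input="camera_lecturer"/>
  <encoding video="h264" audio="aac"/>
</classgen>
"""

root = ET.fromstring(config_xml)
recognizers = [(r.get("name"), r.get("input")) for r in root.findall("recognizer")]
codecs = root.find("encoding").attrib
print(recognizers, codecs)
```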
We have implemented recognizers capable of de-
tecting (i) the presence of a lecturer in a video stream;
(ii) if the lecturer is facing a camera; (iii) slide tran-
sitions; (iv) interactions with whiteboard or PC; and
(v) a list of spoken keywords.
It is also possible to specify an orchestration of
video streams in order to produce a new video stream.
This is useful in environments with multiple cameras
recording different angles of the lecturer. Through the
XML configuration description language, it is possi-
ble to select which stream will be used in the orches-
tration and how to orchestrate them. For instance, it is
possible to specify that when a recognizer detects the
lecturer’s face in video segments, the camera orches-
tration stream should include that segment.
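A simplified sketch of that orchestration rule is shown below: segments flagged by a (hypothetical) face recognizer keep their original stream, while the remaining segments fall back to a wide-shot stream. The data structures and stream names are assumptions for illustration.

```python
# Sketch: build an orchestration plan from recognizer-annotated segments.
def orchestrate(segments, fallback="wide_shot"):
    """segments: dicts like
       {"start": 0.0, "end": 12.5, "stream": "camera_lecturer", "face_detected": True}
       Returns a list of (start, end, chosen_stream) tuples."""
    plan = []
    for seg in segments:
        chosen = seg["stream"] if seg.get("face_detected") else fallback
        plan.append((seg["start"], seg["end"], chosen))
    return plan

print(orchestrate([
    {"start": 0.0,  "end": 12.5, "stream": "camera_lecturer", "face_detected": True},
    {"start": 12.5, "end": 40.0, "stream": "camera_lecturer", "face_detected": False},
]))
```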
Classgen uses the OpenCV library (Bradski,
2000) to perform pattern recognition in order to
identify points of interest for composing the context
stream. The media manipulation during the orches-
tration process and the audio/video conversion is han-
dled by the libav library.
Once the several processes associated with recog-
nition of points of interest, orchestration and video
conversion are concluded, the information they gen-
erate (the specification of the points of interest, the
orchestration stream, and the converted streams) is stored in the lecture XML file. The XML is then passed
to a component of the Processing tool responsible for
generating the final multi-video object (Figure 2(5)).
Our prototype generates NCL (Nested Context Language, http://ncl.org.br/en) (ABNT, 2007) docu-
ments, but the Classgen can be extended to generate
other types of multi-video objects, such as HTML5
pages or stand-alone desktop, tablet or smartphone
applications.
The XML configuration description language can
also describe the video streams (including the orches-
tration, if any) and the points of interest that will be used in
the final multi-video object. It is also possible to
generate different multi-video objects using the same
recorded lecture (for instance, by using the orchestra-
tion stream or not).
Presenting Tool
It is desirable to offer students a platform-independent
way to access the captured lectures. We would like to
avoid students having to install specific software to
play back the lectures. To fulfill this requirement we chose a web-based implementation.
The multi-video object generated from the cap-
ture posed some challenges. When we considered generating the object directly in HTML5 + JavaScript, we estimated that a large development effort would be needed to implement the synchronization capabilities. We also noticed that most obstacles identified
in the HTML5-based implementation would be eas-
ily overcome with the use of a declarative language
specialized in media synchronization. However, there
were no solutions to support it that did not demand
external plug-ins.
As a result of these needs, we were motivated
to propose and develop a multimedia presentation
engine based on standard Web technologies. We
conducted an implementation based on HTML5 +
JavaScript that enables the presentation of multi-
video NCL documents, named WebNCL (Melo et al., 2012); WebNCL is open-source software, available at http://webncl.org. Thanks to WebNCL, any device which
has an HTML5-compatible browser (PC, Smart TV,
Tablet, Smart Phone, etc.) can present NCL docu-
ments natively.
We chose to implement support for the NCL language because it is a powerful language for media synchronization, is under active development, and has been adopted as an iDTV (ABNT, 2007) and IPTV (H.761, 2009) standard. A good side effect of this choice is the possibility of reusing the generated content on different platforms.
Figure 3 shows running NCL learning objects gen-
erated by the prototype. The NCL document offers
some facilities for students. One of these facilities
is the synchronization of the captured audio/video.
The multi-video object synchronizes the multiple au-
dio/video streams, so students can see what was writ-
ten in the whiteboard when the lecturer points to the
slide presentation. This synchronization is essential
to recover the whole audiovisual context of the cap-
tured lecture at a given moment. It is also possible to
insert non-synchronized complementary media into the multi-video object, such as an image from a textbook.
The multi-video object offers a more semantic and easier way to navigate the captured lecture than timeline navigation, common in video players (however, timeline
navigation is still present).

Figure 3: Multi-video learning objects. (a) Timeline; (b) Multiple Videos; (c) Full-screen.

For instance, the student can move forward to the next slide transition or back-
wards to the previous one. When the lecturer begins
to write something on the whiteboard, the student can skip the whole writing process and see the final result. In a future implementation, students will also be able to search for a keyword and move forward in the multi-video object to the point where the lecturer said it (e.g. "for instance").
Similar to an in-classroom lecture, wherein the stu-
dent can pay attention to different spots (the lecturer,
whiteboard, slide presentation, the textbook, or an-
other screen), the multi-video object, which contains
several navigation controls besides the timeline (Fig-
ure 3(a)), allows the student to choose whether he
wants to see more than one video at the same time
(Figure 3(b)), or which video stream he wishes to see
in full screen (Figure 3(c)).
Finally, the student has the facility to make an-
notations in the multimedia object by means of the
watch-and-comment paradigm. For instance, he can
mark some part of the lecture as important or irrele-
vant, or he can delimit a snippet of the lecture which
he did not understand for further research or to ask the
professor or tutor. He can also make comments on the lecture via audio or text, similar to what in-classroom students do with paper and pencil.
Instrumented Classroom
The capture-tool prototype was deployed in a multi-
purpose room (Figure 4). At the front of the room
(Figure 4(a)) there is a conventional whiteboard, an
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
10
electronic whiteboard and a notebook in which the
presenter can browse the Web or use other software.
The interactive whiteboard can be used to present
slides (there is a Bluetooth presenter to control the
presentation) and it allows drawing and writing over
the screen. At the back of the room (Figure 4(b)) we
placed two AVCHD cameras, one focused on the interactive whiteboard and the other on the conventional whiteboard. We placed a webcam as a wide-shot camera, framing the whole front of the room. The cameras are locked in cabinets when not in use.
Figure 4: Instrumented Classroom. (a) Front Side; (b) Back Side.
We invited six instructors to use the prototype
and record presentations. Four instructors recorded
a lecture simulation (without students), one professor
recorded a conventional lecture (with students), and
one instructor recorded a problem solving class. We
also used the prototype to record the presentation of
a term paper.
In the next sections, we report on results from
analysing the interactions students had with the mul-
timedia learning object resulting from the capture of
the problem solving class.
5 CASE STUDY: CAPTURE
LECTURE
Using the capture-tool prototype, one instructor cap-
tured one lecture: the capture was made in several
modules, without students in the classroom. The stu-
dents had access to the multimedia learning object to
prepare for their final exam.
The lecture captured was a problem solving ses-
sion for a Computer Organization course in which an
instructor solved a total of 15 exercises. These ex-
ercises were related to each other and usually a sub-
sequent exercise used some results from the previous
one. The exercises also became more difficult as the
presentation progressed.
The presentation was organized into 12 modules,
comprising a total of 1 hour and 18 minutes of content. The first 3 exercises were grouped in module 1, module 5 contained 2 exercises, and all the other
modules presented one exercise each.
Figure 5 depicts the multimedia object generated
from the presentation. There are four streams: (1) the
capture of the projected slide, which contained the de-
scription of the exercise; (2) the camera focused on
the conventional whiteboard; (3) the camera focused
on the slide; and (4) the wide-shot camera. Although
the generation process has a feature that allows the
automatic orchestration of the cameras (e.g., the auto-
matic selection of which video stream would be pre-
sented in the main (bigger) window), in this case study we did not use it. The aim was to exploit the students' interaction by forcing them to choose, for a better learning experience, which video would be presented in the main window at each instant.
Figure 5: Problem Solving Presentation.
The multimedia object was made available to the students on the Web and, using the WebNCL log API, we logged all the interactions carried out by the students, such as when and where the users clicked and to which point they sought in the presentation timeline. The logged data were stored in a NoSQL database. We developed Python scripts to extract information about how the students interacted with the multimedia object.
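The scripts themselves are not reproduced in the paper; the sketch below shows the kind of post-processing involved in building Figure 6, assuming the log is exported as JSON lines with hypothetical student, event and timestamp fields (the WebNCL log API and the actual NoSQL schema may differ).

```python
# Sketch: aggregate per-student watch time and interaction counts from a JSON-lines log.
import json
from collections import defaultdict

def summarize(log_path):
    interactions = defaultdict(int)    # number of logged events per student
    watch_time = defaultdict(float)    # seconds of playback per student
    last_play = {}
    with open(log_path) as f:
        for line in f:
            ev = json.loads(line)
            student, event, t = ev["student"], ev["event"], ev["timestamp"]
            interactions[student] += 1
            if event == "play":
                last_play[student] = t
            elif event == "pause" and student in last_play:
                watch_time[student] += t - last_play.pop(student)
    return watch_time, interactions
```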
Figure 6 presents information about the time spent
by the students, as well as the number of interactions
they performed with the multimedia object. Each
point on the horizontal axis represents a student (iden-
tified in the chart as letters from A to R). The blue bars
show the amount of time each student spent watch-
ing the multimedia object (left vertical axis) and red
bars show the number of interactions each student
performed (right vertical axis).
The total duration of the 12 modules was 1 hour
and 18 minutes. Eighteen students watched the pre-
sentation for at least 4 minutes. The average playback
time of these 18 students is 3542.67 seconds (about
59 minutes) with a standard deviation of 2382.23 sec-
onds (about 39 minutes). The average number of in-
teractions of the students is 118.55 with a standard
HowaretheyWatchingMe-LearningfromStudentInteractionswithMultimediaObjectsCapturedfromClassroom
Presentations
11
Figure 6: Students Interactions.
deviation of 99.58.
Figure 7 summarizes the number of interactions in each category performed by the students. The inter-
actions were organized in the following categories:
Main Video Selections: interactions carried out
by the students in order to change the main video
stream;
Play/Pause: interactions that pause or resume the playback;
Timeline navigation: interactions that cause a
move forward or backward through the timeline;
Module Navigation: interactions that change the module currently being watched;
Points of Interest: interactions resulting from nav-
igation by points of interest (e.g. slide transitions).
Figure 7: Interactions per Category.
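An illustrative way to derive the counts in Figure 7 from raw log events is sketched below; the event names are assumptions, since the actual WebNCL log vocabulary is not listed in this paper.

```python
# Sketch: map raw logged event names to the five interaction categories above.
from collections import Counter

CATEGORY_OF_EVENT = {
    "select_main_video":      "Main Video Selections",
    "play":                   "Play/Pause",
    "pause":                  "Play/Pause",
    "seek_timeline":          "Timeline Navigation",
    "change_module":          "Module Navigation",
    "seek_point_of_interest": "Points of Interest",
}

def interactions_per_category(events):
    """events: iterable of raw event names extracted from the log."""
    return Counter(CATEGORY_OF_EVENT.get(e, "Other") for e in events)

print(interactions_per_category(["play", "seek_timeline", "select_main_video", "pause"]))
```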
Figure 8 summarizes how much time each module
was watched. In order to get a better visualization, the
values on the left vertical axis were normalized by the module length. The blue bars represent the time
in which the presentation was running (not paused)
and the red bars are the time in which the presenta-
tion was paused. The green line represents the num-
ber of students that watched each module for at least
10% of its length. The figure suggests that the
modules in which the students spent more time were modules 2 and 4.

Figure 8: Presentation Modules Statistics.

It also suggests that the
number of different students that watched the mod-
ules decreases as the presentation progresses.
Figure 9 summarizes the watching attendance of
some modules. The horizontal axis is the number of
seconds of each module (Presentation Space). The
blue line represents the number of times each instant was watched by students, and the red line the number
of different students that watched each instant.
As the modules always start from second 0, it is
natural that the attendance in the first seconds is higher. The points where the blue line is above the red
line mean that the moment was watched more than
once by the same students. This graphic can be useful
for lecturers to find out which parts of a lecture are
more useful or important for the students, or even to
identify points that students have difficulty understanding. For instance, after second 800 in module 1 (Figure 9(a)), the blue line deviates from the red line, suggesting that that segment of module 1 was watched multiple times by the students.
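The attendance curves can be derived from the logged playback intervals along the lines of the sketch below; the interval representation is an assumption made for this example.

```python
# Sketch: per-second play counts (blue line) and distinct-viewer counts (red line).
from collections import Counter

def attendance(intervals, module_length_s):
    """intervals: list of (student, start_s, end_s) playback segments."""
    plays = Counter()
    viewers = [set() for _ in range(module_length_s)]
    for student, start, end in intervals:
        for second in range(int(start), min(int(end), module_length_s)):
            plays[second] += 1
            viewers[second].add(student)
    distinct = [len(v) for v in viewers]
    return plays, distinct

plays, distinct = attendance([("P", 0, 30), ("P", 10, 30), ("Q", 5, 20)], 30)
print(plays[12], distinct[12])  # second 12 was played 3 times, by 2 distinct students
```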
Given that the multimedia object has more than
one video stream and that the students can choose
which stream they wish to see as the main stream, the
information about which stream is most often selected as the main stream at each moment can be useful.
Figure 10(a) and Figure 10(b) summarize which
streams were most selected as the main stream at each moment of, respectively, module 1 and module 4. Each line represents how many times a stream was watched at a specific moment. The blue line
refers to the slide projection capture (Figure 5(1)); the
red refers to the camera focused on the conventional
whiteboard (Figure 5(2)); the green to the camera focused on the slide presentation (Figure 5(3)); and the purple to the wide-shot camera (Figure 5(4)).
According to Figure 10(a), the most watched
streams were the slide presentation and the white-
board camera. We can also note that the slide presen-
tation is more watched near the moments when there
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
12
(a) Module 1
(b) Module 2
(c) Module 4
Figure 9: Modules Attendance.
are slide transitions in the module 1 (seconds 213 and
515). Figure 10(b) suggests that after the second 100
the predominant stream was the whiteboard camera
stream.

Figure 10: Streams View. (a) Module 1; (b) Module 4.

Figure 11 illustrates the behavior of two students when interacting with the presentation for modules 1, 2 and 4. Student P (blue line) and student Q (red line) are the same students as in Figure 6. The horizontal axis is the playback timeline and the ver-
tical axis is the presentation timeline (presentation
space). Vertical straight lines represent a navigation that the student performed during playback and horizontal straight lines represent moments in which the student paused the presentation. These graphics allow one to visualize in detail how a student interacted with the presentation. For instance, we can observe in Figure 11(a) that student P started watching from second 180 and performed some backward moves, mainly near the end of the presentation. Student Q watched almost linearly until second 650 and then returned to the beginning of the presentation and watched it again until the end, pausing a few times.
6 LESSONS LEARNED
The graphics were presented to the instructor. He analysed them taking into account the content of his presentation, how it was presented, and which students interacted with it and how.
HowaretheyWatchingMe-LearningfromStudentInteractionswithMultimediaObjectsCapturedfromClassroom
Presentations
13
(a) Module 1
(b) Module 2
(c) Module 4
Figure 11: Students Navigation.
His first observation: “the graphics are very ab-
stract for a teacher to analyse them by himself". As a consequence, the remaining analysis was then carried out with the help of one of the authors. From now
on, what is reported in this section is a combination
of what the teacher observed and some conclusions
of the authors.
Not all students interacted with the multimedia
learning object, even though they knew that it could contain tips for the exam. Reasons for this may have been the
commitments of students with other exams, the late
release of the learning object (two days before the
exam), and also its long duration, about one hour and
twenty minutes (4800 s). As shown in Figure 8, sev-
eral of the students only watched the first modules.
Besides the reasons already mentioned, some of them
may have found the presentation boring. A question-
naire with explicit questions could help understand
this attitude.
Students were able to view the slides presented in
two ways, watching the video of the instructor pre-
senting (and maybe interacting with) the slide on the
interactive whiteboard, or watching the slide captured
directly from the output of the projector (best qual-
ity). The preference was for the latter, as shown in
Figure 10(a) with the blue and green lines. It is likely
that the type of presentation, without many interactions with the interactive whiteboard, does not justify viewing the slide in lower quality.
The resulting multimedia learning object may take context information into account. It can then ensure that the focus of the presentation is, at every moment, automatically brought to the main display window. So, when the teacher uses the whiteboard, his or her video could be automatically selected for the main window.
The same applies to the videos associated with the
interactive whiteboard, the application captures, etc.
However, we chose to force the student himself to
perform all the video switching. Some students ex-
pressed frustration with this duty. The goal was to
keep them alert to the presentation in order to make it
less monotonous. The strategy worked. As shown in
Figure 7, 60% of the interactions (975) were used for
selecting the video shown in the main window. The effec-
tiveness of the strategy in terms of learning, however,
needs to be evaluated.
Figure 7 also shows the limited use of the navi-
gation using Points of Interest. Students preferred to
use the Timeline (14% of interactions) to control the
presentation. Two reasons may be related to this: stu-
dents are used to the paradigm of watching video on the Web; and the lesson did not encourage or justify the need for this type of navigation. However, the
navigation through the modules happened with a fre-
quency (7%) corresponding to the one expected (and
planned) by the teacher.
Figure 9(a) shows that an almost constant audience watched module 1 (in terms of number of students). However, the blue line shows some peaks in visits to some parts of the presentation, in terms of the number of times a segment was played. The moments around second 900 are the most evident. The analysis of the video at those moments, carried out by the teacher, indicates
that the subject could be presented more clearly – that
is, there is room for improvement in the way the pre-
sentation was made.
Modules 2 and 4 were the most popular, not module 1, as would be expected since it comes first. The visiting time was
normalized by the duration of the module in Figure 8.
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
14
As the first module was the one with the longest dura-
tion, it may indicate that large modules are more ver-
bose (which was confirmed by the teacher for mod-
ule 1) and therefore tend to be somewhat repetitive.
Moreover, this feature can be further studied since the
content of module 1 was less complex than the others.
The navigation patterns, illustrated in Figure 6,
show different behaviors by the students. There are
students who simply “watch” the presentation and do
not perform any interaction at all, even to change the
video in the main window, as was the case of the stu-
dents G and O. These, probably thinking they would be evaluated on their viewing of the presentation, let the presentation run, perhaps without paying attention to it. Others, such as students L and P, watched the whole presentation and performed many interactions. There are also students, such as Q and R, who, besides interacting a lot, also watched parts of the presentation repeatedly, nearly doubling the original time of the presentation. The figures show that the number of interactions was proportional to the time during which the presentation was watched, which indicates a similar degree of interactivity among the students in the class. Another interesting observa-
tion about the behavior of students was made by the
teacher: “one of the students who watched and inter-
acted the most with the multimedia learning object,
the student N, usually shows a very apathetic behavior
in the classroom”. This may indicate that interactive
multimedia learning objects, generated by capturing
multimodal and multi-device presentations, may be a
good option for students who like to be in control of
what they pay attention to.
7 FINAL REMARKS
Extra-class material may be offered to students in the
form of multimedia objects that integrate synchro-
nized text, image, audio and video explanations on
the studied subject. A learning object like this can
be produced in studios, with support of various pro-
fessionals. Alternatively, as is the case presented in
this paper, the multimedia object can be automatically
generated from the ubiquitous capture of a traditional
lecture in the classroom. The lecture can be delivered
to a group of students, or be delivered to an empty
classroom just for capture purposes. Context infor-
mation informing moments of interest such as slide
transitions can be included in the multimedia object
to provide students with semantic navigation.
The multimedia object should be instrumented to
log the navigation performed by students so that, be-
sides acting as extra-class material, it can also be an effective tool that provides feedback contributing to the improvement of its own content. In the situation presented
in this paper, it is the instructor who receives the feed-
back, which she can analyse to identify improvements
not only in terms of the content itself but also in terms
of how the exposition was made at the time of capture.
The case study presented suggests how similar
analyses can be performed on other presentations,
even though only a portion of the logged information
was used. As a result, the analysis is useful both as
a reference for the preparation of presentations used
in research involving interactive multimedia objects,
and for research in Education.
Regarding future work, we plan to investigate al-
ternatives for: (a) the enrichment of the graphic inter-
face of the multimedia object so as to improve inter-
activity; (b) the capture of more contextual informa-
tion during the presentation toward providing novel
navigation facilities; (c) the development of visual-
ization tools for the instructor to analyse the informa-
tion captured while the students interacted with the
multimedia object. The aim is to build a general infrastructure that helps build similar capture-based
applications (Pimentel et al., 2007).
We also plan to conduct interdisciplinary research toward a better understanding of the impact, on education,
of the use of multimedia learning objects built from
the capture of multimodal and multi-device presenta-
tions.
The teacher also noted a relationship between stu-
dents' performance on the assessment of the subject of
the presentation and the time each one spent with the
multimedia learning object. Most who watched and
interacted with all modules of the presentation per-
formed well. The individual analysis of each student
can be performed using graphs similar to those shown
in Figure 11, for instance.
REFERENCES
ABNT (2007). Associação Brasileira de Normas Técnicas. Digital Terrestrial Television Standard 06: Data Codification and Transmission Specifications for Digital Broadcasting, Part 2 – GINGA-NCL: XML Application Language for Application Coding. Technical report, São Paulo, SP, Brazil.
Abowd, G., Pimentel, M. d. G. C., Kerimbaev, B., Ishiguro,
Y., and Guzdial, M. (1999). Anchoring discussions
in lecture: an approach to collaboratively extending
classroom digital media. In Proc. Conference on Com-
puter support for Collaborative Learning, CSCL ’99.
International Society of the Learning Sciences.
Bianchi, M. (2004). Automatic video production of lectures
using an intelligent and aware environment. In Proc.
HowaretheyWatchingMe-LearningfromStudentInteractionswithMultimediaObjectsCapturedfromClassroom
Presentations
15
International Conference on Mobile and Ubiquitous
Multimedia, MUM ’04, pages 117–123. ACM.
Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Jour-
nal of Software Tools.
Brotherton, J. A. and Abowd, G. D. (2004). Lessons learned
from eClass: Assessing automated capture and access
in the classroom. ACM Trans. Comput.-Hum. Inter-
act., 11(2):121–155.
Cattelan, R. G., Baldochi, L. A., and Pimentel, M. D. G.
(2003). Experiences on building capture and access
applications. In Proc. Brazilian Symposium on Multi-
media and Hypermedia Systems, pages 112–127.
Cattelan, R. G., Teixeira, C., Goularte, R., and Pimentel,
M. D. G. C. (2008). Watch-and-comment as a
paradigm toward ubiquitous interactive video editing.
ACM Trans. Multimedia Comput. Commun. Appl.,
4(4):28:1–28:24.
Chou, H.-P., Wang, J.-M., Fuh, C.-S., Lin, S.-C., and Chen,
S.-W. (2010). Automated lecture recording system.
In Proc. International Conference on System Science
and Engineering (ICSSE), pages 167 –172.
Dickson, P. E., Arbour, D. T., Adrion, W. R., and Gentzel,
A. (2010). Evaluation of automatic classroom capture
for computer science education. In Proc. Annual Con-
ference on Innovation and Technology in Computer
Science Education, ITiCSE ’10, pages 88–92. ACM.
Dickson, P. E., Warshow, D. I., Goebel, A. C., Roache,
C. C., and Adrion, W. R. (2012). Student reactions
to classroom lecture capture. In Proc. ACM Annual
Conference on Innovation and Technology in Com-
puter Science Education, ITiCSE ’12, pages 144–149.
ACM.
H.761, R. I.-T. (2009). Nested context language (NCL) and
ginga-NCL for IPTV services. Technical report.
Halawa, S., Pang, D., Cheung, N.-M., and Girod, B. (2011).
ClassX: an open source interactive lecture streaming
system. In Proc. ACM International Conference on
Multimedia, MM ’11, pages 719–722. ACM.
Lampi, F., Kopf, S., and Effelsberg, W. (2008). Automatic
lecture recording. In Proc. ACM International Con-
ference on Multimedia, MM ’08, pages 1103–1104.
ACM.
Liu, T. and Kender, J. (2004). Lecture videos for e-learning:
current research and challenges. In Proc. Interna-
tional Symposium on Multimedia Software Engineer-
ing, pages 574 – 578.
Melo, E. L., Viel, C. C., Teixeira, C. A. C., Rondon, A. C.,
Silva, D. d. P., Rodrigues, D. G., and Silva, E. C.
(2012). WebNCL: a web-based presentation machine
for multimedia documents. In Proc. Brazilian sym-
posium on Multimedia and the web, WebMedia ’12,
pages 403–410. ACM.
Müller, R. and Ottmann, T. (2000). The "authoring on
the fly” system for automated recording and replay
of (tele)presentations. Multimedia Systems, 8(3):158–
176.
Nagai, T. (2009). Automated lecture recording system
with avchd camcorder and microserver. In Proc. An-
nual ACM SIGUCCS Fall Conference, SIGUCCS ’09,
pages 47–54. ACM.
Pang, D., Halawa, S., Cheung, N.-M., and Girod, B. (2011).
ClassX mobile: region-of-interest video streaming to
mobile devices with multi-touch interaction. In Proc.
ACM International Conference on Multimedia, MM
’11, pages 787–788. ACM.
Pimentel, M., Abowd, G. D., and Ishiguro, Y. (2000). Link-
ing by interacting: a paradigm for authoring hyper-
text. In Proc. ACM on Hypertext and Hypermedia,
HYPERTEXT ’00, pages 39–48. ACM.
Pimentel, M., Baldochi Jr., L. A., and Cattelan, R. G.
(2007). Prototyping applications to document human
experiences. IEEE Pervasive Computing, 6(2):93–
100.
Ross, G. M. (2011). What’s the use of lectures? - forty
years on. Discourse, 10(3):23–41.
Schulte, O. A., Wunden, T., and Brunner, A. (2008). RE-
PLAY: an integrated and open solution to produce,
handle, and distribute audio-visual (lecture) record-
ings. In Proc. Annual ACM SIGUCCS Fall Confer-
ence: moving mountains, blazing trails, SIGUCCS
’08, pages 195–198. ACM.
Schwerdt, G. and Wuppermann, A. C. (2011). Sage on the
stage: Is lecturing really all that bad? Education Next,
11(3):62–67.
Viel, C. C., Melo, E. L., Pimentel, M. d. G., and Teix-
eira, C. A. C. (2013). Presentations preserved as interactive multi-video objects. In Proc. Workshop on Analytics
on Video-Based Learning.
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
16