A WORKFLOW MODEL FOR COLLABORATIVE

VIDEO ANNOTATION

Supporting the Workflow of Collaborative Video Annotation and Analysis

Performed in Educational Settings

Cristian Hofmann

, Nina Hollender

and Dieter W. Fellner

Interactive Graphics Systems Group, Technische Universität Darmstadt, Fraunhoferstr. 5, 64283 Darmstadt, Germany

Center for Development and Research in Higher Education, Technische Universität Darmstadt

Hochschulstr. 1, 64289 Darmstadt, Germany

Keywords: Video Annotation, Video Analysis, Computer-supported Collaborative Learning.

Abstract: There is a growing number of application scenarios for computer-supported video annotation and analysis in

educational settings. In related research work, a large number of different research fields and approaches

have been involved. Nevertheless, the support of the annotation workflow has been little taken into account.

As a first step towards developing a framework that assist users during the annotation process, the single

work steps, tasks and sequences of the workflow had to be identified. In this paper, a model of the

underlying annotation workflow is illustrated considering its single phases, tasks, and iterative loops that

can be especially associated with the collaborative processes taking place.

1 INTRODUCTION

Scenario: A group of students use a web-based tool

to analyse video sequences taken from TV panel

discussions with regard to the use of a range of

specific argumentation tactics. Their task is to mark

objects and sequences within the video, to annotate

these selections with text, and to compare and

discuss their results with peers or explore existing

video analysis databases.

Research activities in the area of computer-

supported video annotation have increased during

the last years. Corresponding solutions have been

implemented in various application areas, e.g.

interactive audiovisual presentations in e-Commerce

and edutainment or technical documentations

(Richter et al., 2007). In our research work, we focus

on the support of collaborative video analysis in

learning settings performed by applying video

annotation software. An important characteristic of

video is its ability to transfer and reflect the reality

in a direct manner (Ratcliff, 2003). Annotation

techniques provide referable multimedia documents

that serve as means of description, documentation,

and evidence of analytic results (Hagedorn et al.,

2008; Mikova and Janik, 2006). A growing number

of application scenarios for (collaborative) video

analysis in education can be identified. Pea and

colleagues (2006) report on a university course of a

film science department, in which two different

movie versions of the play „Henry V“ are analysed

by a group of students with respect to the text

transposition by different actors and directors. Other

examples for the application of video analysis in

education are motion analyses in sports and physical

education, or the acquisition of soft skills such as

presentation or argumentation techniques (Hollender

et al., 2008; Pea et al., 2006).

In previous research, a large number of different

research fields and approaches have been involved

in Video Annotation and Analysis Research.

Nevertheless, one relevant discipline has been little

taken into account: The support of the analysis

workflow, which comprises the management of

annotation data with related tasks and system

services. This is especially the case for collaborative

settings. Thus, a majority of today’s applications do

not consider the needs of the users regarding a

complete workflow in video annotation (Hagedorn

et al., 2008). We believe that an appropriate support

of that workflow will also improve learning

processes. By workflow-support, we mean the

199

Hofmann C., Hollender N. and W. Fellner D. (2009).

A WORKFLOW MODEL FOR COLLABORATIVE VIDEO ANNOTATION - Supporting the Workﬂow of Collaborative Video Annotation and Analysis

Performed in Educational Settings.

In Proceedings of the First International Conference on Computer Supported Education, pages 199-204

DOI: 10.5220/0001849101990204

 SciTePress

facilitation of loops and transitions between the

single workflow steps and tasks on the one hand. On

the other hand, appropriate tools and information

can be provided at the proper time, depending on the

current state of the work. By doing so, learners can

obtain information about 1.) which tasks they have

already accomplished, 2.) what is their current state,

and 3.) what are the next steps to do at any time of

the annotation process. Consequently, we expect a

discharge of learners and/or tutors with regard to the

use of such applications and hence an enhancement

of efficiency.

The contribution of this paper is the presentation

of a workflow model for computer-supported

collaborative video annotation. Our investigations

addressed the specific needs of users who work in

teams with a special focus on the application in

educational settings. The results base on interviews

and discussions conducted with experts and users

regarding the sequence of tasks and work steps

within the annotation process, as well as on a

summary and reflection of the existing literature. In

addition, we performed an analysis of the

functionalities, the user interface, and interaction

design of fifteen present applications.

2 RELATED WORK

Bertino, Trombetta, and Montesi (2002) presented a

framework and a modular architecture for interactive

video consisting of various information systems. The

coordination of these components is realized by

identifying inter-task dependencies with interactive

rules, based on a workflow management system. The

Digital Video Album (DVA) system is an

integration of different subsystems that refer to

specific aspects of video processing and retrieval. In

particular, the workflow for semiautomatic indexing

and annotation is focused (Zhang et al., 2003). Pea

and Hoffert (2007) illustrate a basic idea of the video

research workflow in the learning sciences (Pea and

Hoffert, 2007). In contrast to our research work, the

projects mentioned above do not or only to some

degree consider the process for collaborative use

cases. The reconsideration of such communicative

and collaborative aspects requires modifications and

enhancements of the existing approaches and

concepts.

3 A WORKFLOW MODEL FOR

COLLABORATIVE VIDEO

ANNOTATION

Figure 1: The collaborative video annotation workflow

model.

In order to design a collaborative video annotation

application that supports transitions between work

steps and loops within the collaborative process, we

first of all developed a model for the annotation

workflow. For this purpose, we conducted

interviews with experienced video analysts at the

Leibnitz Institute for Science Education in Kiel,

Germany, and at the Institute for Sports Science at

the Technische Universität Darmstadt, Germany.

Furthermore, we held discussions with researchers at

the Knowledge Media Research Center in Tübingen,

Germany, who had studied the collaborative design

of video-based hypermedia in university courses

(Stahl et al., 2006). Our model is also based on

existing literature related to video analysis

workflows (Baecker et al., 2007; Brugmann et al.,

2004; Hagedorn et al., 2008; Harrison and Baecker,

1992; Mikova and Janik, 2006; Pea and Hoffert,

2007; Ratcliff, 2003; Seidel et al., 2005). The

identified publications appeared to focus each on a

different essential part of the whole annotation

process. However, the publications do not fully

cover the aspects of team work. We therefore pooled

the single results into one common model, and in a

second step (and as our main contribution), the

model was extended by the description of

collaborative activities performed by a shared group

of video annotators. On that score, we integrated

conclusions from the collaborative hypermedia-

design courses into the models and concepts

referring to the annotation workflow. Additionally,

we performed an analysis of the functionalities, the

CSEDU 2009 - International Conference on Computer Supported Education

200

user interface, and the interaction design of fifteen

existing applications. By doing so, we were able to

tag the specific services provided by these tools, as

well as the tasks and activities that can be conducted

by users.

As demonstrated in Figure 1, the annotation

process can be divided into three main phases:

Preliminary, Working with the Video Annotation

Software, and Externalisation. In the following, the

particular items of these steps are going to be

pictured.

3.1 Preliminary

The preliminary phase includes any task and work

step that has to be accomplished before conducting

the video analysis using annotation software.

3.1.1 Planning

At the beginning of a project, several decisions have

to be made concerning the video capturing

procedure (if video recording is required), e.g., how

many cameras should be involved, if lighting is

required, or whether a storyboard needs to be created

(Pea and Hoffert, 2007; Ratcliff, 2003).

Furthermore, one must consider if there is any

additional data (like sensor or eye-tracking data) that

has to be captured in parallel with the video

recordings (Brugmann and Russel, 2004; Hagedorn

et al., 2008). From a methodical view, a theoretical

framework might need to be built up. The

framework implies the identification of research

areas, research questions and hypotheses (Seidel et

al., 2005). In the case of video analysis, the

definition of category systems is often required.

They may be either developed deductively based on

a theory, or inductively based on the video material

(Bortz and Döring, 2006; Link, 2006; Mikova and

Janik, 2006; Seidel et al., 2005).

3.1.2 Capturing/Gathering

In a next step, the video and additional data needs to

be recorded. If files already exist, the data can be

gathered from specific databases or storage media

(Hagedorn et al., 2008; Mikova and Janik, 2006; Pea

and Hoffert, 2007).

3.1.3 Storage

Depending on the format of the collected data, the

videos need to be encoded or digitalized to suitable

files (Mikova and Janik, 2006; Pea and Hoffert,

2007). Then, the data is organized and stored (Pea

and Hoffert, 2007). That can lead to a re-editing of

the video files to more granulated units (Hagedorn et

al., 2008; Stahl et al., 2006).

3.2 Working with the Video

Annotation Software

Generally, working with the video software

comprises activities from system configuration to

video segmentation and annotation procedures and

information sharing.

3.2.1 Configuration

Before starting the collaborative work, the

participants are tracked and assigned to accounts and

user groups. The allocation to groups results from

the specific tasks or bounded video parts that have to

be revised. Users are associated with specific roles

that particularly include access rights and

restrictions. Before or during the annotation process,

a group administrator is able to distribute the

annotation tasks among the individual users (Lin et

al., 2003; Volkmer et al., 2005). Furthermore,

specific project preferences can be adjusted and the

graphical user interface may be customized

(Brugmann and Russel, 2004). In video analysis

projects, if an already existing category system is

applied, or if a category system has been developed

deductively, the category system needs to be fed into

the system. The possibility to change and elaborate

category systems needs to be given, especially if an

inductive or consensual approach for the

development of a category system has been chosen

(Bortz and Döring, 2006).

The following three work steps – segmentation,

annotation and exploration – are regarded as one

(collaborative) unit. Pea and Hoffert (2007) describe

the video analysis process as an assembly of de-

composition activities (segmentation, coding,

categorization, and transcription) and acts of the re-

composition of video data (rating, interpretation,

reflection, comparison, and collocation). Activities

of de- and recomoposition are closely interrelated

(Bortz and Döring, 2004; Pea and Hoffert, 2007). It

is a complex process that contains circular and

recursive loops, in which the video analyst

alternately marks, transcribes and categorizes,

analyzes and reflects, and needs to conduct searches.

This process is accompanyied by data reviews,

comparisons, and consequently modifications (Pea

and Hoffert, 2007; Ratcliff, 2003; Stahl et al., 2006).

Hence, segmentation, annotation, and exploration, as

higher-level categories, also have to be considered

A WORKFLOW MODEL FOR COLLABORATIVE VIDEO ANNOTATION - Supporting the Workflow of Collaborative

Video Annotation and Analysis Performed in Educational Settings

201

related to each other. In collaborative use cases, any

data arising from segmentation, annotation can be a

collaborative contribution, and exploration also can

include collaborative activities (cp. Brugmann et al.,

2004; Brugmann and Russel, 2004; NRC, 1993).

Following, these three workflow items and their

collaborative aspects are going to be described.

3.2.2 Segmentation

Annotators start marking segments of interest and

chunking the video into subsets they want to refer to.

For this purpose, they may use manual,

semiautomatic, or automatic techniques (Finke,

2005; Kipp, 2008; Pea and Hoffert, 2007). Examples

are the semiautomatic keyframe method which is

accompanied by linear interpolation or automatic

approaches like object or scene detection, scene-

based event logging, or object of focus detection.

(Banerjee et al., 2004; Bertini et al., 2004; Finke,

2005; Hofmann and Hollender, 2007; Snoek and

Worring, 2005). Corresponding to the time code in

which an event takes place, users can define a point

in time (single video frame) or a time interval

(multiple following frames). Furthermore, an

enrichment of this temporal information with spatial

information is essential for almost every case of

video “pointing” (Finke, 2005; Hofmann and

Hollender, 2007; Kipp, 2008).

Video segments can be defined either by a single

person or by an assigned group in a collaborative

manner. Stahl et al. (2006) report on university

courses, in which students first discussed the video

segments they wanted to define, before actually

entering that information into the used application.

Thus, (textual) annotations that serve as

communication contributions are resources for the

coordination of collaborative segmentation

activities. Furthermore, in some of the identified use

cases, the segmentation task is partitioned and

assigned to individual users or groups. For example,

group A chunks the video according to a certain

characteristic 1, group B seeks for characteristic 2,

and so on. Furthermore, users or groups may also

work with different categorization systems.

3.2.3 Annotation

After segmenting the video, users continue with the

annotation of these subsets and pulling annotations

into order, usually within a layer-based timeline

(Link, 2006; Kipp, 2008). Different forms of

annotations have been identified. In general, they

can be summarized as any kind of additional

information (Finke, 2005; Stahl et al., 2006). One

type of annotation is the linking of metadata or

descriptive data. Apart from enriching video

segments with information like user data or

processing data, descriptive data also can be the

categorization based on a certain category system.

E.g., tagging is a useful method for collaboratively

organizing the amount of video-based information

(Baecker et al., 2007).

Users may describe observed behaviors, events,

or objects within the video. In most cases, they are

allowed to enter free textual annotations. Indeed,

also other types of media formats are possible

(Finke, 2005) During the annotation phase, a further

task can be the transcription of verbal and non-

verbal communication, which is often used in the

context of communication or interaction analyses

(Mikova and Janik, 2006). In video analysis, the

annotation phase also includes interpretation, rating,

and reflecting. These activities can be performed

either qualitatively, e.g. in discussions, or

quantitatively, by means of statistic methods

provided by specialized software (Hagedorn et al.,

2008; Pea and Hoffert, 2007).

Like the segmentation task, annotation may be

divided and distributed to different groups. In that

case, any user has got access to his or her group’s

annotations and is allowed to modify them. Thus,

annotated information becomes a shared

contribution (Finke, 2005; Hofmann and Hollender,

2007).

Communicational contributions constitute an

essential kind of annotation with respect to

collaboration. They enable the communication

between co-annotators as well as the organization of

common tasks. Most of the current applications

realize group communication by providing textual

comments similar to web forums. When users work

separatedly, they need to discuss their annotations,

conclusions, and the analysis process with other

participants (synchronous and/or asynchronous)

(Brugmann et al., 2004; Brugmann and Russel,

2004; NRC, 1993). Thus, discussion is a central

element within the collaborative annotation process.

Particularly in the context of consensual approaches,

discussion is a means of agreement and consistency

of different annotators’ results. Discussion often

leads to a return to previous steps of the workflow.

In the end, the final results of the annotation project

arise from iterative loops through the annotation

workflow, in which the data is modified and

adjusted. E.g., the interviewed experts at the IPN

report on a training phase that is conducted before

the actual video analysis on new video material. The

main goals are to develop basic analytic skills (Stahl

CSEDU 2009 - International Conference on Computer Supported Education

202

et al., 2006), perform checks for objectivity and

reliability, applying different annotator agreement

measures (Hagedorn et al., 2008; Link, 2006;

Mikova and Janik 2006; Seidel et al., 2005), and to

validate the deployed category system (Seidel et al.,

2005). As a consequence, these checks lead to a

return to the planning and configuration phases

(Mikova and Janik, 2006; Seidel et al., 2005).

3.2.4 Exploration

Searching and browsing always go along with

segmentation and annotation activities. Pea and

Hoffert (2007) assume that surveying one’s own

data is required to properly conduct an analysis.

Especially in collaborative annotation, users also

need to search for results of co-annotators, experts,

or other sources (Hollender et al., 2008). The

exploration of external information applies to

learning scenarios: The interview at the IPN

revealed that novice annotators use already analyzed

videos as training material and compare their own

results with the results of their expert colleagues

(Hollender et al., 2008).

Exploration of co-annotator’s data also can be an

issue in asynchronous collaborative projects which

proceed over a long timeframe. After being absent,

users may need to track the changes performed by

other annotators involved in the project. In this

context, they also need to browse chat or

commentary histories (Baecker et al., 2007).

Exploration also includes restructuring of the

data representation. With regard to this, annotators

are allowed to contrast relevant data with each other,

or to hide less important information. Pooling

commonly categorized information and making

statistical comparisons are part of video re-

composition (Pea and Hoffert 2007). According to

this, exploration also supports reflection. Thus, it

facilitates the consideration of multiple views of the

video. We take this as an important aspect with

regard to learning settings, since users are allowed to

obtain a view on the video’s contents beyond their

subjective point of view (Stahl et al., 2006; Volkmer

et al., 2005).

3.3 Externalisation

The externalisation phase includes activities without

the use of the annotation application, consisting of

any kind of publishing operations. It begins with

editing and converting the data into several formats,

and moves on to presenting this information in

corresponding media (Pea and Hoffert, 2007). E.g.,

it can be used for demonstration purposes (Mikova

and Janik, 2006). As mentioned above, databases of

analyzed video material can serve as digital resource

for information retrieval in following analysis

sessions. Furthermore, it is often necessary to export

data for further analytic inspection with specific

applications. Users create surveys, assemblies of

similarly categorized video subsets, and perform

comparisons of annotated data (Hagedorn et al.,

2008; Pea and Hoffert, 2007). The interviewed

experts report on exporting data to various formats,

such as tab-delimited text or transcription files.

Thus, further analytic activities can be executed with

tools and services that are not provided by the video

annotation application.

4 CONCLUSIONS AND FUTURE

WORK

In this paper, we illustrate a workflow model for

video annotation performed in teams. We

particularly point out the iterative loops that are

passed through, especially in work steps performed

in a collaborative manner.

Based on the developed model, we are currently

working on the architecture design and

implementation of a web-based collaborative video

annotation application that supports transitions

between phases and sub tasks of the workflow, and

loops within the annotation process. In summary, we

expect to achieve elementary improvement of

interaction with the software.

REFERENCES

Abowd, G.D., Gauger, M., Lachenmann, A., 2003. The

Family Video Archive: an annotation and browsing

environment for home movies. In MIR '03, 5

ACM

SIGMM international Workshop on Multimedia

Information Retrieval. ACM Press, pp. 1-8.

Baecker, R. M., Fono, D., Wolf, P., 2007. Toward a Video

Collaboratory Video research in the learning sciences.

In Goldman, R., Pea, R., Barron, B., and Derry, S.J.

(ed.) Video Research in the Learning Sciences.

Lawrence Erlbaum Associates, pp. 461-478.

Banerjee, S., Cohen, J., Quisel, T., Chan, A., Patodia, Y.,

Al-Bawab, Z., Zhang, R., Black, A., Stern, R.,

Rosenfeld, R., Rudnicky, A., Rybski, P. E. Veloso,

M., 2004. Creating multi-modal, user-centric records

of meetings with the carnegie mellon meeting recorder

architecture. In ICASSP 2004 Meeting Recognition

Workshop.

A WORKFLOW MODEL FOR COLLABORATIVE VIDEO ANNOTATION - Supporting the Workflow of Collaborative

Video Annotation and Analysis Performed in Educational Settings

203

Bertini, M., Del Bimbo, A., Cucchiara, R., Prati, A., 2004.

Applications ii: Semantic video adaptation based on

automatic annotation of sport videos. In MIR ´04, 6th

ACM SIGMM International Workshop on Multimedia

Information Retrieval. ACM Press, pp. 291-298.

Bertino, E., Trombetta, A., Montesi, D., 2002. Workflow

Architecture for Interactive Video Management

Systems. In Distributed and Parallel Databases,

Springer Netherlands, pp. 33-51.

Bortz, J., Döring, N., 2006. Forschungsmethoden und

Evaluation für Human- und Sozialwissenschaftler.

Springer. Berlin, 4

edition.

Brugman, H., Crasborn, O. A. & Russel, A., 2004.

Collaborative annotation of sign language data with

peer-to-peer technology. In LREC 2004, The fourth

international conference on Language Resources and

Evaluation. European Language Resources

Association, pp. 213-216.

Brugman, H., & Russel, A., 2004. Annotating multi-media

/ multi-modal resources with ELAN. In LREC 2004,

The fourth international conference on Language

Resources and Evaluation. European Language

Resources Association, pp. 2065-2068.

Finke, M., 2005. Unterstützung des kooperativen

Wissenserwerbs durch Hypervideo-Inhalte.

Dissertation, Technische Universität Darmstadt,

Germany.

Hagedorn, J., Hailpern, J. Karahalios, K.G., 2008. VCode

and VData: illustrating a new framework for

supporting the video annotation workflow. In AVI '08,

Working Conference on Advanced Visual Interfaces.

ACM Press, pp. 317-321.

Harrison, B.L. Baecker, R.M., 1992. Designing video

annotation and analysis systems. In CGI ’92,

Conference on Graphics interface. Morgan Kaufmann

Publishers, pp. 157-166.

Hofmann, C., Hollender, N., 2007. Kooperativer

Informationserwerb und -Austausch durch

Hypervideo. In Mensch & Computer 2007: Konferenz

für interaktive und kooperative Medien. Oldenbourg

Verlag, pp. 269-272.

Hollender, N., Hofmann, C., Deneke, M., 2008. Principles

to reduce extraneous load in web-based generative

learning settings. In Workshop on Cognition and the

Web 2008, pp. 7-14.

Lancy, D.F., 1993. Qualitative research in education,

Longman. New York.

Link, D., 2006. Computervermittelte Kommunikation im

Spitzensport, Sportverlag Strauß. Köln.

Lin, C.Y., Tseng, B.L. Smith, J.R., 2003. Video

Collaborative Annotation Forum: Establishing

Ground-Truth Labels on Large Multimedia Datasets.

In TRECVID 2003 Workshop.

Mikova, M., Janik, T., 2006. Analyse von

gesundheitsfördernden Situationen im Sportunterricht:

Methodologisches Vorgehen einer Videostudie. In

Mužík, V., Janík, T., Wagner, R. (ed.) Neue

Herausforderungen im Gesundheitsbereich an der

Schule. Was kann der Sportunterricht dazu beitragen?

MU. Brno, pp. 248-260.

National Research Council Comittee on a National

Collaboratory. National Collaboratories, 1993.

Applying information technology for scientific

research, Nation Academy Press. Washington, DC.

Kipp, M., 2008. Spatiotemporal Coding in ANVIL. In

LREC 2008, The sixth international conference on

Language Resources and Evaluation. European

Language Resources Association.

Pea, R., Hoffert, E., 2007. Video workflow in the learning

sciences: Prospects of emerging technologies for

augmenting work practices. In Goldman, R., Pea, R.,

Barron, B., Derry, S.J. (ed.) Video Research in the

Learning Sciences. Lawrence Erlbaum Associates.

London, pp. 427-460.

Pea, R., Lindgren, R., Rosen, J. , 2006. Computer-

supported collaborative video analysis. In 7th

International Conference on Learning Sciences.

International Society of the Learning Sciences, pp.

516-521.

Ratcliff, D., 2003. Video Methods in Qualitative Research.

In Camic P.M., Rhodes, J.E., Yardley, L. (ed.)

Handbook of Qualitative Research in Psychology:

Expanding Perspectives in Methodology and Design,

American Psychological Association, pp. 113-130.

Richter, K., Finke, M., Hofmann, C., Balfanz, D., 2007.

Hypervideo. In Pagani, M. (ed.) Encyclopedia of

Multimedia Technology and Networking, Idea Group

Pub. 2

edition.

Seidel, T., Dalehefte, I. M., Meyer, L. (ed.), 2005. How to

run a video study. In Technical report of the IPN

Video Study, Waxmann, Münster.

Snoek, C. G. M., Worring, M., 2005. Multimodal video

indexing: A review of the state-of-the-art. In

Multimodal Tools and Applications. Springer, pp. 5-

35.

Stahl, E., Finke, M., Zahn, C., 2006. Knowledge

Acquisition by Hypervideo Design: An Instructional

Program for Univiversity Courses, Journal of

Educational Multimedia and Hypermedia, 15, 3.

Academic Research Library, p. 285.

Volkmer, T., Smith, J. R., Natsev, A., 2005. A web-based

system for collaborative annotation of large image and

video collections: an evaluation and user study. In

ACM MULTIMEDIA '05, 13th Annual ACM

international Conference on Multimedia. ACM Press,

pp. 892-901.

Zhang, Q.Y., Kankanhalli M.S., Mulhem, P., 2003.

Semantic video annotation and vague query, In MMM

‘03, 9th International Conference on Multimedia

Modeling, pp. 190-208.

CSEDU 2009 - International Conference on Computer Supported Education

204