Program Analysis and Evaluation using Quimera
Daniela Fonte¹, Ismael Vilas Boas¹, Daniela da Cruz¹, Alda Lopes Gancarski² and Pedro Rangel Henriques¹
¹Department of Informatics, University of Minho, Braga, Portugal
²Institut Telecom, Telecom SudParis, CNRS UMR Samovar, 9 Rue Charles Fourrier, 91011 Evry, France
Keywords: Automatic Grading System, Contest Management, Program Evaluation, Automatic Judgement.
Abstract: In recent years, a new challenge has arisen within the programming communities: programming contests. Programming contests can vary slightly in their rules, but all of them are intended to assess the competitors' ability to solve problems using a computer. These contests raise three kinds of challenges: to create a nice problem statement (for the members of the scientific committee); to solve the problem in a good way (for the programmers); and to find a fair way to assess the results (for the judges). This paper presents a web-based application, QUIMERA, intended to be a full programming-contest management system as well as an automatic judge. Besides the traditional dynamic approach to program evaluation, QUIMERA also provides static analysis of the program for a finer assessment of solutions. Static analysis takes advantage of the technology developed for compilers and language-based tools and is supported by source code analysis and software metrics.
1 INTRODUCTION
An important aspect when learning programming languages is the ability to solve practical problems. This ability can be easily stimulated through competition, like the one present in a programming contest. According to experts in the field, the challenge associated with competitive environments provides a meaningful way to learn and to acquire practical skills easily.
In a typical programming contest, competitors
participate in teams to solve a set of problems. For
each problem, the team submits the source code of
the program developed to solve it. Many well-known programming contests, like the ACM-ICPC (International Collegiate Programming Contest, http://cm.baylor.edu/welcome.icpc) or the IOI (International Olympiad in Informatics, http://www.ioinformatics.org/index.shtml), are based on the automatic grading of the proposed solutions. This means that the submitted code is immediately evaluated by an automatic grading system. The evaluation normally involves tasks like running the program over a set of predefined tests (actually a set of input data vectors) and comparing each result (the actual output produced by the submitted code) against the expected output value. Time and memory consumption are usually measured during execution and are taken
into account for this dynamic grading approach. This
process is typically complemented by the action of a
human judge, who takes the final grade decision ac-
cording to the specific rules for each contest (Leal,
2003).
The research on programs capable of automatically grading source code is not a recent topic. In 1965, Forsythe and Wirth (Forsythe and Wirth, 1965) introduced a system that follows the fundamental principle of modern grading systems: validating the submitted solutions with a set of tests (pairs of input-output vectors). With the evolution of computers, grading systems increased in complexity, diversifying the tests applied to the subject programs and introducing tools for monitoring the grading process (Leal and Moreira, 1999).
Nowadays, grading systems can be distinguished according to the type of source code verification they perform. This verification can be done employing two different techniques: static and dynamic. The second one focuses on the execution of the program against a set of predefined tests (the traditional one, already described above). The first one, more recent, takes advantage of the technology developed for compilers and language-based tools and is supported by source code analysis and software metrics. Our proposal is
to combine both approaches and provide a system in
which the final grade is a combination of the informa-
tion obtained from both analyzers.
So, in this paper we present a web-based appli-
cation, named QUIMERA, that extends the traditional
dynamic grading systems with static analysis aimed
at the automatic evaluation and ranking of program-
ming exercises in competitive learning or program-
ming contest environments. QUIMERA provides a full
contest management system, as well as an automatic
judgement procedure.
The paper is organized in 6 sections. Besides
the Introduction and Conclusion, Section 2 introduces
the basic concepts and similar tools; Section 3 gives
an overview of QUIMERA, presenting its architecture
and main features; Section 4 provides more details
about each feature, illustrating QUIMERA functionali-
ties; and Section 5 presents the technical details un-
derlying the system implementation.
2 RELATED WORK
As previously mentioned, the approaches to develop automatic grading systems can be divided into static and dynamic (Danić et al., 2011).
Static approaches include systems that check the submitted program against a provided scheme, to find the degree of similarity relative to a set of characteristics. In this category we have, for example, the Web-based Automatic Grader (Zamin et al., 2006), which evaluates programming exercises written in Visual Basic, C or Java. One of its disadvantages is that it cannot be used for testing the correctness of programs that contain input and output operations (Danić et al., 2011). Another important disadvantage is that a better classification is assigned to solutions that are more similar to the provided scheme, penalizing different programming styles (Rahman et al., 2008) that can be just as good or even better.
Dynamic approaches include systems that eval-
uate the submitted program by running it through
a set of predefined tests. One example is Online Judge (http://uva.onlinejudge.org), implemented to help in the preparation of the ACM-ICPC (Cheang et al., 2003). Another example is Mooshak (http://mooshak.dcc.fc.up.pt/), a system originally developed for managing programming contests and a reference tool for competitive learning (Leal, 2003; Leal and Silva, 2008). Generally, these approaches use a simple string comparison between the expected output and the output actually produced to determine whether both values are equal (the program submitted is considered correct only if this condition is true); of course, this strict comparison can be a limitation, and it is the main drawback of these systems.
There are also several other tools, used by instructors in some modern educational institutions, that facilitate the automatic grading of programming assignments (Patil, 2010). Some of them are developed as web applications where users can exercise their programming skills by coding a solution for a given problem, like Practice-It (http://webster.cs.washington.edu:8080/practiceit/) or Better Programming (http://www.betterprogrammer.com). One of the advantages of these systems is that users get instantaneous feedback about their answers, which helps them attempt the problem until they reach the correct solution. Another example is Web-CAT (http://web-cat.cs.vt.edu/), a tool focused on test-driven development which supports the grading of assignments where students are asked to submit their own test cases.
One major disadvantage of these traditional approaches is their inability to analyze the way the source code is written. This is especially relevant in educational environments, where the instructor wants to teach programming styles or good practices for a specific paradigm/language. This feature would be crucial to detect situations where the submitted solution does not comply with the exercise rules. As an example, consider a typical C programming exercise that asks the student to implement a graph using adjacency lists and to print the shortest path between two given nodes. The referred grading systems will consider a solution implemented with an adjacency matrix completely correct if the final output is equal to the expected one; however, that solution is not acceptable because it does not satisfy all the assignment requirements. Even more dramatically, if the user computes the shortest path by hand and the submitted program only prints it, the solution will again be accepted, because the evaluation system cannot detect such an erroneous situation.
This means that traditional dynamic grading systems leave aside one important aspect when assessing programming skills: source code quality. However, other tools, like static code analyzers (see for example Frama-C, http://frama-c.com/what_is.html, or Sonar, http://www.sonarsource.org/features/), are able to identify the structure and extract complementary information from the source code in order to understand the way the program is written and discuss its quality in terms of software metrics that can be easily computed. They can be invoked after compilation and do not need any execution to produce grading data about the program under evaluation. Thus, if combined with traditional grading systems, static analyzers can provide an accurate notion of the source code quality.
Figure 1: QUIMERA architecture modules view.
This assumption led to the construction of systems like CourseMaker (http://www.coursemaker.co.uk/whatis.html) or Boss (http://www.dcs.warwick.ac.uk/boss/about.php), which improve the dynamic testing mechanism by calculating metrics and performing style or quality analysis. Providing this immediate feedback to users (students/competitors or instructors/judges) is obviously a relevant added value. A very recent system, AutoLEP (Tiantian et al., 2009; Wang et al., 2011), improves traditional grading mechanisms by combining source code static analysis, similarity degree and dynamic testing. Summing up, it evaluates the program construction and how close the source code is to the correct solution.
As a consequence of the research done, QUIMERA aims at automatically evaluating and ranking programming exercises in competitive learning or programming contest environments, by combining a very complete static source code analysis with dynamic analysis. Thus, QUIMERA is a system capable of grading the submitted solution based not only on its capability to produce the expected output, but also on the source code quality and accuracy. Our system guarantees that different programming styles will not be penalized if they produce the correct output and satisfy all the requirements; this can be an advantage in learning environments, because the student is stimulated to find different correct solutions. Moreover, this does not restrict the contest to problems that have only one correct solution, which can represent a significant reduction in time and effort for larger problems.
Besides encouraging competitive learning by providing immediate feedback to the user, our system complements its static analysis with a plagiarism detection tool, in order to prevent fraud among submitted solutions (a common issue in learning environments). To the best of our knowledge, this feature is not provided by the cited tools.
QUIMERA also combines a simple and intuitive user interface with several charts exhibiting various statistics concerning the contest flow. These statistical charts are useful for the competitors, as they illustrate their personal evaluation and performance, and also for the judges, as they provide overviews of each competitor's performance, of the current contest, and of all the problems/assignments proposed.
3 QUIMERA OVERVIEW
QUIMERA is a web-based application which provides a full management system for programming contests, as well as a semi-automatic feature for assessing programs. It allows users to create and manage contests or programming exercises. For a new contest, QUIMERA permits the addition of problems (statements/requirements + input-output vectors for testing), the registration of competitors and their association to groups. Moreover, QUIMERA offers various facilities to follow up contests, allowing the monitoring of the overall process and the different activities involved.
Figure 2: QUIMERA Main Page.
To support all the announced functionalities, the application is necessarily complex; its success was a consequence of the architecture drawn and the use of recent and powerful technologies. This section is devoted to the QUIMERA architecture and a general description of the application. Implementation details are postponed until Section 5.
3.1 Architecture
Basically, the QUIMERA architecture is organized in three levels, as can be seen in Figure 1. This layered structure follows the Model-View-Controller (MVC) pattern (http://st-www.cs.illinois.edu/users/smarch/st-docs/mvc.html), in order to create a clear separation among the different aspects of the application (business logic, user interface, and orchestration) while providing a loose coupling between these elements.
The Model layer represents the core of the QUIMERA system, that is, it models all data and basic operations realizing the underlying business logic. As depicted in Figure 1, this layer has two main components, one for Data Storage and another for System Management. In our case we split the Data Storage into a traditional relational Database, to store general data concerned with contests, users, history, etc., and a Submission Repository, to archive submissions (problem statements, competitor solutions and the testing set). A Data Access Interface (an objectification component, implemented with the Doctrine2 tool, as will be seen later) maps entities in the databases to objects in the program, allowing a smooth interface between the Data Storage and System Management components. The System Management sub-layer concentrates all the tasks concerned with user and contest creation, follow-up and monitoring. It is composed of three modules:
Contest Manager, responsible for the basic contest management tasks (like creating or editing a contest);
Problem Manager, responsible for the problem/assignment creation (submission of a new statement or of a set of input-output vectors);
Solution Manager, responsible for the submission of the answers (or solutions) developed by the competitors.
The View layer, which renders the data model into a web page suitable for interaction with the user, is based on a set of Twig templates that generate the different user profiles. This layer also provides a command line interface which allows the essential system administration tasks to be performed.
Figure 3: QUIMERA Competitor Interface — Contest Main Page.
Finally, the Controller layer reacts to user actions and, as appropriate, changes the view or communicates with the Model layer to produce the required answers. It can be compared to the maestro who conducts the orchestra.
Figure 1 also shows that, in our architecture, the Grader was placed in an independent component. This decision aims at system flexibility, allowing for easy extensions to support other source languages, or even for the incorporation of different static or dynamic analyzers. This module is in charge of all the assessing and grading tasks (a minimal sketch of this decoupling is given after the list below). Besides the Compiler, it is composed of three modules:
Source Code Analyzer, responsible for assessing
the source code through the analysis of its static
properties and the evaluation of a set of metrics;
Dynamic Analyzer, responsible for the execution
of the compiled code and the verification of the
output produced for each input data vector;
Grader, responsible for grading the submitted program (the competitor's answer) according to the information delivered by both Analyzers; this module also computes and delivers various statistics about the answer and the competitor's behavior.
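To make this decoupling concrete, the sketch below shows one possible shape for the boundary between the Grader and the two analyzers. It is only an illustration under our own naming assumptions (the interface and class names are hypothetical); it does not reproduce QUIMERA's actual classes.

```php
<?php
// Hypothetical PHP interfaces illustrating the independent Grader component
// described above; names and signatures are assumptions, not QUIMERA's actual code.

interface DynamicAnalyzerInterface
{
    /** Runs the compiled submission against the test set; returns the fraction of tests passed (0.0-1.0). */
    public function run($binaryPath, array $tests);
}

interface SourceCodeAnalyzerInterface
{
    /** Computes the static metrics (size, consistence, legibility, complexity, originality) of a source file. */
    public function analyze($sourcePath);
}

class Grader
{
    private $dynamicAnalyzer;
    private $sourceAnalyzer;

    public function __construct(DynamicAnalyzerInterface $dynamicAnalyzer,
                                SourceCodeAnalyzerInterface $sourceAnalyzer)
    {
        $this->dynamicAnalyzer = $dynamicAnalyzer;
        $this->sourceAnalyzer  = $sourceAnalyzer;
    }

    /** Gathers both assessments; the weighted combination is detailed in Section 4.5. */
    public function grade($sourcePath, $binaryPath, array $tests)
    {
        return array(
            'tests_passed' => $this->dynamicAnalyzer->run($binaryPath, $tests),
            'metrics'      => $this->sourceAnalyzer->analyze($sourcePath),
        );
    }
}
```

Keeping the analyzers behind such interfaces is what would allow new source languages or alternative analyzers to be plugged in without touching the rest of the system.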
The design approach described above has proven to be very effective in terms of maintainability effort and application flexibility.
3.2 Actors and Roles
QUIMERA provides different user interfaces targeted at the profiles that reflect the roles of the different contest participants, as described below.
Administrator – this view has two modes: Administration and General. The first one is used to access the back-end area to set up the system data and make it operational. The second one allows the user to participate in a contest as a general user.
Teacher – this view allows a user to publish problems in a contest and to add sets of tests.
Competitor – used by a player to associate himself with a Contest and a Team and to compete.
Judge – used to act as a decision maker, giving feedback about the team submissions.
Guest – this mode allows any user (not involved in any of the previous roles) to follow the progress of a contest anonymously (without being registered).
For security purposes, and to ensure that only authorized users access the system, all the user profiles are accessible only after authentication (except for the Guest mode). For this, it is mandatory that users register themselves in the system. The Administrator may later define new permissions for Teachers and Judges in each contest.
4 USING QUIMERA
The QUIMERA user interface offers a simple and intuitive way to set up and manage contests, complemented with an extended feature for assessing and grading programming exercises. Figure 2 shows the main page of the QUIMERA user interface, which follows the depicted structure: it is composed of a Main Menu, where the user can find the principal navigation options; a Secondary Menu, which lists the options related to the current page content; the Sitemap, which shows the user's current location in the application and can also be used to navigate through the website levels; and the Content area, where the information is displayed.
QUIMERA offers full support for evaluating source code written in the C language, since it is widely used both in academic and industrial environments. In what follows, we briefly present how to use QUIMERA and its major features.
Figure 4: QUIMERA Teacher Interface — List of submitted solutions.
4.1 Contest and Problem Management
The setting-up process for a new contest is performed by the system Administrator who, besides defining the contest name and duration, associates with each contest a group of Teachers and a group of Judges.
After a contest is created, the associated Teachers can use the QUIMERA front-end to publish new problems, distributed over phases. Along with each problem description, it is required to define the maximum execution time allowed and to provide a set of input values and the respective expected outputs for testing the solutions submitted by each competitor team. When submitting a new test, Teachers can define whether it will be publicly available as an example for competitors, or whether it will be hidden and used only for grading purposes.
In turn, the associated Judges can consult the current submissions and the evaluation statistics. They can also follow the contest flow by accessing the contest statistics, and they will be asked to assign a final grade to each submission.
To participate in a contest, Competitors need to register in the system and to be associated with the intended contest. This association can be done directly by the Administrator, by the Teachers or by the Competitors themselves. For this purpose, and after choosing the desired contest in the Active Contest list (which can be accessed through the Secondary Menu of the QUIMERA Main Page depicted in Figure 2), the user should select the option "Join Contest" on the Secondary Menu of the Contest Preview page (as depicted in Figure 3).
Moreover, the user must associate himself with a Team, as contests are organized in Teams and not around individual competitors. The same Team is allowed (and encouraged) to compete in several contests; however, within the same competition, a user can only be assigned to one Team.
4.2 Answer Submission
To submit a solution (in this context, the words solution, answer and submission are alternative names for the program developed by the Competitor to solve the problem, i.e., the programming challenge stated by the Teacher), a Competitor starts by selecting the respective contest page. The problem statement is then shown, together with some examples of the desired input and the corresponding expected output. After selecting the option for submitting an answer, the user uploads the respective file. If the upload process is unsuccessful, the system asks the user to try again. If successful, QUIMERA compiles the submitted (uploaded) program and notifies the user about this task. In case of compilation errors, QUIMERA reports the errors found and provides some tips aimed at guiding the Competitor to correct them and resubmit the answer. If no compilation errors are detected, the system moves to the next step, the solution assessment process, as described in Sections 4.3 and 4.4.
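As an illustration of this compile-and-report step, the following sketch shows one way such a check could be performed with GCC from PHP, the language of QUIMERA's back-end (Section 5). The function name, flags and return shape are our own assumptions, not QUIMERA's actual code.

```php
<?php
// Illustrative sketch only: compiling an uploaded C submission with GCC and
// capturing the diagnostics to report back to the Competitor.
function compileSubmission($sourcePath, $binaryPath)
{
    $cmd = sprintf(
        'gcc -Wall -O2 -o %s %s 2>&1',
        escapeshellarg($binaryPath),
        escapeshellarg($sourcePath)
    );
    exec($cmd, $diagnostics, $exitCode);

    return array(
        'success'     => ($exitCode === 0),
        'diagnostics' => implode("\n", $diagnostics), // shown to the user when compilation fails
    );
}
```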
4.3 Dynamic Analysis
QUIMERA's dynamic analysis follows the traditional approach, executing the compiled submission against a set of predefined tests. As mentioned, each test is composed of an input data vector and the corresponding expected output. For each test, the compiled answer is invoked (loaded and executed), receiving the data vector as an argument; the output produced as a result of this run is compared with the expected output. If they are strictly the same, the test passes. The submission under assessment is accepted only if all the tests pass.
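A minimal sketch of this run-and-compare step is shown below in PHP; the helper name, the use of the GNU coreutils timeout command and the file layout are our own assumptions, not QUIMERA's actual implementation.

```php
<?php
// Minimal sketch of the dynamic check described above (assumptions, not QUIMERA code):
// run the compiled submission on one test's input and compare the produced output,
// via strict string comparison, with the expected output.
function runTest($binaryPath, $inputFile, $expectedFile, $timeoutSeconds = 3)
{
    // GNU coreutils `timeout` enforces the per-problem execution time limit.
    $cmd = sprintf(
        'timeout %d %s < %s',
        $timeoutSeconds,
        escapeshellarg($binaryPath),
        escapeshellarg($inputFile)
    );
    exec($cmd, $outputLines, $exitCode);
    if ($exitCode !== 0) {
        return false; // runtime error, or exit code 124 when the time limit is exceeded
    }

    $produced = trim(implode("\n", $outputLines));
    $expected = trim(file_get_contents($expectedFile));
    return $produced === $expected; // the test passes only on an exact match
}
```

A submission would then be accepted only when this check succeeds for every test in the set, exactly as described above.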
Figure 5: QUIMERA Teacher Interface — Assessment data for a submitted solution.
After the execution, the user can consult the percentage of tests successfully passed, as shown in Figure 6. The user only has access to the public tests, never knowing the other tests in the set, nor how many there are. This is useful to avoid trial-and-error attempts at guessing the test set.
As the execution of an external program is a critical task for any computing environment (malicious code can damage the system), QUIMERA only compiles and executes the solution if it complies with strong constraints (consensual safety limits are defined). With the help of the static analysis, system calls and suspicious instructions are blocked. By default, the execution timeout is 3 seconds, but this value can be personalized when the problem is created.
In a typical competitive environment, a Team can submit several solutions before one is accepted. Each submission is ranked on the following levels: Didn't Compile; Compile (when it fails all tests); Timeout; Passed some tests; Solved, as depicted in Figure 4. For final grading purposes, QUIMERA considers the best ranked answer. This allows the submission of several answers, developed by different members of one Team, in order to get a better score. We believe this unconventional approach stimulates competition and favors competitive learning, while also allowing the code quality to be improved.
4.4 Static Analysis
QUIMERA produces a complete report about the quality of a submitted solution (as depicted in Figure 5), through direct source code analysis based on a set of predefined metrics. Each metric represents a measurement that contributes to the quality estimation, working as an indicator for Teachers and Judges.
Currently, QUIMERA measures 56 different metrics, grouped into five classes: Size, Consistence, Legibility, Complexity and Originality.
Size metrics are based on the number of lines of code and are used to compare the size of a submitted solution against the average size of all the submissions for the problem. They represent a good indicator of solutions that are unusually small or too big compared to the other solutions.
Consistence metrics work as an indicator of the probability of execution errors occurring due to the use of dynamic structures and memory management. In a learning environment, a student can easily make mistakes when implementing these structures. We assume that an overuse of these structures will increase the probability of execution exceptions and may affect the answer's final score. For this we measure markers like the percentage of pointer use, the variance between the allocated and the released memory, or the number of returns in a function.
Legibility metrics are associated with the ease of source code comprehension. Markers like comment density, average lines per comment, and the percentage of goto, break and continue usage can influence source code legibility and reuse. Therefore, they can be used as indicators of a competitor's good or bad practices and influence the final grading. Another important marker associated with reuse and legibility is the density of duplicated source code found. If a competitor repeats the same piece of source code several times, instead of encapsulating it in a function and calling it, both reuse and legibility decrease.
Figure 6: QUIMERA Common Interface — Final assessment comparison.
Complexity calculation is done through a simple approach to McCabe's Cyclomatic Complexity (for more details, please consult www.literateprogramming.com/mccabe.pdf), generally used to estimate a program's complexity based on the number of linearly independent paths through the source code and computed from its control flow graph. It is useful to estimate the complexity of each submitted solution: less complex solutions obtain better scores.
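As a rough illustration of this measure (the real computation is performed by the Frama-C plugin over the AST and control flow graph, as described in Section 5), a crude textual approximation could simply count decision points; the helper below is a hypothetical sketch, not QUIMERA's implementation.

```php
<?php
// Hypothetical sketch: approximate McCabe complexity of a C source file as
// 1 + the number of decision points found by pattern matching. It ignores
// comments and strings, so it is only a coarse indicator.
function approximateCyclomaticComplexity($cSource)
{
    $decisionPoints = preg_match_all(
        '/\b(if|for|while|case)\b|&&|\|\|/',
        $cSource,
        $matches
    );
    return 1 + (int) $decisionPoints;
}
```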
Originality is captured by an index which works as an indicator of plagiarism situations, calculated from the similarities between the assessed solution and the other submitted solutions.
After the computation, the metrics and other static assessment data are available through the QUIMERA interfaces. A full XML report is also generated. The details behind the implementation of this analysis are introduced in Section 5.
4.5 Automatic Grading
After this assessment process, the Grader combines the results of the source code and dynamic analyses according to a grading formula. This grading formula is composed of seven differently weighted assessment components, grouped into the following categories (a sketch of the weighted combination is given right after the list):
Execution – 75% of the final grade is related to the number of tests passed by the solution (in the dynamic analysis) and 5% comes from the Execution Time;
Size – related to the size metrics, with a weight of 2% in the final grade;
Consistence – includes measurements related to the probability of execution errors and represents 5% of the final grade;
Legibility – associated with the Legibility metrics, with a total weight of 2%;
Complexity – total impact of 6% on the final grade;
Cloning – represents the total percentage of duplicated code in the solution, with a weight of 5%.
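The sketch below spells out this weighted combination; the function and field names are our own assumptions, and each per-category score is assumed to be already normalized to the 0-1 range.

```php
<?php
// Sketch of the weighted grading formula described above (weights sum to 100%).
// Field names are assumptions; every category score is expected in the range 0.0-1.0.
function finalGrade(array $scores)
{
    $weights = array(
        'tests_passed'   => 0.75, // fraction of tests passed in the dynamic analysis
        'execution_time' => 0.05,
        'size'           => 0.02,
        'consistence'    => 0.05,
        'legibility'     => 0.02,
        'complexity'     => 0.06,
        'cloning'        => 0.05, // assumed here to score the share of non-duplicated code
    );

    $grade = 0.0;
    foreach ($weights as $category => $weight) {
        $grade += $weight * $scores[$category];
    }
    return 100.0 * $grade; // final grade as a percentage
}
```

With these weights, a solution that passes every dynamic test already secures 75 of the 100 points, which reflects the emphasis on execution discussed below.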
Although these weights naturally emphasize the execution, they also take into account the impact of the source code quality, which we believe encourages Competitors to find improved solutions in order to obtain better final grades.
After this grading process, it is possible to access the final report obtained from the solution assessment, which includes an overview of the applied metrics and their values, the number of tests passed and the final grading summary (depicted on both sides of Figure 6). For a better comprehension of this summary and of the grading process, let us now introduce an example of a contest where Competitors are invited to solve several mathematical problems. The challenge is to present a solution that calculates the n-th Fibonacci number, where n is a number asked from the user. In Figure 6, we compare two different solutions proposed by two Teams: the Rationals on the left-hand side, and the Smarties on the right-hand side.
The first solution, assessed by the system as "Solved Some Tests" (as we can see in Figure 4), only calculates correctly the first and second numbers of the sequence (0 and 1) and, consequently, only passes 22% of the tests (2 out of 9). This leads to a final score in the Execution field of only 22.2%. On closer inspection, we can conclude that this answer is well documented (it has 15 commented lines in 50 lines of code) and its Legibility has a final score of 93%. Its Complexity score of 73% is owed to its number of used variables and data structures. Its 50 lines of code exceed the average size of the contest submissions for this problem, which penalizes its final grade in the Dimension category (only 45.2%). In the Consistency category, its 21.6% is owed to the use of several returns throughout the code (assuming a maximum of two returns per function as a reasonable limit for source code consistency) and also to the use of pointers.
The second solution, assessed by the system as "Solved" (as depicted in Figure 4), follows an iterative strategy to implement the Fibonacci recurrence. This answer is better documented (11 commented lines in 36 lines of code), but its Legibility has a lower final score (88%) since it did not use defines, unlike the other solution. Its Complexity score of 58% is owed to the implemented loop. Its 36 lines of code improve its final grade in the Dimension category to 88.6%. In the Consistency category, its weak 22% is also owed to the use of several returns throughout the code.
Finally, we can conclude that neither solution was plagiarized (100% original) and that both Competitors follow the good practice of not repeating their own code (0% duplicated code). These assessments led to a final grade of 29% in the first case and a significant 89% in the second case. QUIMERA also complements this evaluation with radial charts for a quicker and easier comparison of the solutions' performance in the different grading categories.
4.6 Statistics
QUIMERA computes several statistics to enable the monitoring of contests, which depend on the active interface and context. Statistics can address different perspectives, such as: all contests or a particular contest; all problems or a particular problem; all Teams or a particular Team or Competitor (team member). Statistics are presented in the form of different types of charts: pie charts (see Figures 2 and 3), column charts (see Figure 7), tables (see Figure 4), line charts, radial charts (see Figure 6) and scatter charts.
In more detail, our system offers numerical data and listings concerning: Competitor and Team rankings; comparative charts between problem submissions; the full list of submissions for each Competitor or Team; comparisons between Competitors inside a Team; comparisons between the Teams' performance in a contest; and overviews of each contest, its phases and summary information about the current state of the problems and submissions, among many others.
This amount of statistical information makes QUIMERA a powerful tool for supporting the assessment and grading of programming exercise solutions, helping all kinds of users: Administrators, Teachers, Judges and Competitors.
5 QUIMERA IMPLEMENTATION
The QUIMERA system follows a typical web-application schema: a client-server framework connects users to a server, where submissions and system data are recorded. The QUIMERA user interface is rendered in HTML5, CSS3 and JavaScript, optimized for the latest generation of browsers but also compatible with older browsers without losing any features.
The system runs on any PHP 5.3-enabled platform over the HTTP and HTTPS protocols, and it is based on the Symfony2 framework (http://symfony.com). The essential application data is stored in a MySQL database, although the data layer of Symfony2 (managed by the Doctrine2 technology) allows any other database engine to be used. We also use the Twig template manager, embedded in this framework, to render the HTML5 templates of the view layer.
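To give an idea of how the Data Access Interface mentioned in Section 3.1 maps database entities to objects, the fragment below shows a hypothetical Doctrine2-annotated entity; the entity and its fields are illustrative assumptions, not QUIMERA's actual schema.

```php
<?php
// Hypothetical Doctrine2 entity illustrating the object-relational mapping used by
// the data layer; the entity name and fields are assumptions, not QUIMERA's schema.
use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity
 * @ORM\Table(name="contest")
 */
class Contest
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue
     */
    private $id;

    /** @ORM\Column(type="string", length=255) */
    private $name;

    /** @ORM\Column(type="datetime") */
    private $startsAt;

    public function getId()   { return $this->id; }
    public function getName() { return $this->name; }
}
```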
To compile the submitted solutions, the system uses the GCC compiler. For the static analysis, we developed a plugin for Frama-C (http://frama-c.com), a framework dedicated to the static analysis of source code written in C. This powerful platform works as a front-end for our plugin, since its kernel provides common functionalities, libraries and collaborative data structures, which simplifies the development process and improves the final results. The CIL (C Intermediate Language, http://cil.sourceforge.net/) library provides methods to generate the Abstract Syntax Tree (AST) of the source code and to traverse it. Our plugin analyzes the AST of each submitted source file and evaluates a set of metrics (described in Subsection 4.4) in a single traversal, to obtain quantitative information about the quality of the submitted source code.
To detect cloning, we use regular expressions, string comparisons and sorting algorithms available in PHP to find duplicated lines of code in the submission, as sketched below.
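The sketch below illustrates the flavor of such a check; it is a simplified assumption of how a duplicated-line density could be computed with plain PHP string functions, not QUIMERA's actual algorithm.

```php
<?php
// Illustrative sketch only: estimate the density of duplicated lines in a
// submission using plain PHP string functions, in the spirit described above.
function duplicatedLineDensity($source)
{
    $lines = array();
    foreach (explode("\n", $source) as $line) {
        // Normalize whitespace and ignore empty lines and lone braces.
        $line = preg_replace('/\s+/', ' ', trim($line));
        if ($line !== '' && $line !== '{' && $line !== '}') {
            $lines[] = $line;
        }
    }
    if (count($lines) === 0) {
        return 0.0;
    }

    $counts = array_count_values($lines); // occurrences of each distinct line
    $duplicates = 0;
    foreach ($counts as $occurrences) {
        if ($occurrences > 1) {
            $duplicates += $occurrences - 1; // every repetition beyond the first
        }
    }
    return $duplicates / count($lines); // fraction of lines that are repeats
}
```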
In order to detect plagiarism, we adopted the sherlock tool (http://sydney.edu.au/engineering/it/scilect/sherlock/), which makes comparisons based on source code signatures and compares them with those of the related submitted code. This tool bases its comparison algorithm on an Originality Index, which computes the ratio between the similarities and differences detected in each pair of files. The plugin produces a final result that contemplates all these evaluation parameters, as well as the final grade assessed by the system, as described in Subsection 4.5.
Figure 7: QUIMERA Teacher Interface — Team members grading.
QUIMERA provides several statistical charts related to the contest flow (as described in Subsection 4.6), rendered with Google Chart Tools (http://code.google.com/intl/en-US/apis/chart/).
6 CONCLUSIONS AND FUTURE WORK
Throughout this paper we have introduced QUIMERA, a web-based system that offers a simple and efficient way to create and manage online programming contests. QUIMERA's underlying philosophy, its features and its implementation were discussed.
After characterizing the area of programming challenges and their automatic assessment and grading, we reviewed the state of the art, looking for common approaches, applications and similar systems. The research done so far, summarized in the paper, led to the identification of a drawback in the available systems; it inspired the proposal of an improved system and guided the design of its architecture.
In this direction, QUIMERA offers automatic assessment and grading of programming exercises based on the information produced by source code analysis, combined with the traditional information obtained from analyzing the output values generated for predefined input data.
As the system is completely implemented for the C programming language, we plan to make it available online soon for free use, with a complete user manual and usage examples.
After this first development stage, we intend to test the tool with real users in different scenarios, to study its usability and effectiveness. We will prepare properly designed experiments to assess these two indicators.
As future work, we intend to increase QUIMERA's flexibility by enabling the development of new front-ends to support other programming languages. To allow easy support for new languages, we plan to modify the tool to accept language plugins.
Another relevant improvement of the dynamic evaluation process, which will extend QUIMERA's usability, is the implementation of a semantic evaluation of the output. This idea, a real challenge requiring much more research work, requires the definition of a metalanguage to describe the expected output for a given input data vector. Then, instead of performing a strict comparison between the produced and the expected outputs, the evaluator would check the validity of the output content against that formal description.
REFERENCES
Cheang, B., Kurnia, A., Lim, A., and Oon, W.-C. (2003).
On automated grading of programming assignments
in an academic institution. Comput. Educ., 41:121–
131.
Danić, M., Radošević, D., and Orehovački, T. (2011). Evaluation of student programming assignments in online environments. CECiiS: Central European Conference on Information and Intelligent Systems.
Forsythe, G. E. and Wirth, N. (1965). Automatic grading
programs. Technical report, Stanford University.
ICEIS2012-14thInternationalConferenceonEnterpriseInformationSystems
218
Leal, J. P. (2003). Managing programming contests with
Mooshak. Software—Practice & Experience.
Leal, J. P. and Moreira, N. (1999). Automatic Grading of
Programming Exercises. page 383.
Leal, J. P. and Silva, F. (2008). Using Mooshak as a Com-
petitive Learning Tool. The 2008 Competitive Learn-
ing Symposium.
Patil, A. (2010). Automatic grading of programming assignments. Master's projects, Department of Computer Science, San José State University.
Rahman, K., Nordin, M., and Che, W. (2008). Automated
programming assessment using pseudocode compari-
son technique: Does it really work?
Tiantian, W., Xiaohong, S., Peijun, M., Yuying, W., and
Kuanquan, W. (2009). Autolep: An automated learn-
ing and examination system for programming and its
application in programming course. In First Interna-
tional Workshop on Education Technology and Com-
puter Science, USA.
Wang, T., Su, X., Ma, P., Wang, Y., and Wang, K. (2011).
Ability-training-oriented automated assessment in in-
troductory programming course. Comput. Educ.,
56:220–226.
Zamin, N., Mustapha, E. E., Sugathan, S. K., Mehat, M.,
and Anuar, E. (2006). Development of a web-based
automated grading system for programming assign-
ments using static analysis approach.
ProgramAnalysisandEvaluationusingQuimera
219