Teaching Assistants as Assessors: An Experience Based Narrative
Faizan Ahmed, Nacir Bouali, and Marcus Gerhold
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede, The Netherlands
Keywords:
Teaching Assistants, Grading Consistency, Grading Variation, Reliability.
Abstract:
This study explores the role of teaching assistants (TAs) as assessors in a university’s computer science pro-
gram. It examines the challenges and implications of TAs in grading, with a focus on their expertise and
grading consistency. The paper analyzes grading experiences in various exam settings and investigates the
impact on assessment quality. We adopt an empirical methodology and answer the research question by analyzing data from two exams. The chosen exams have similar learning objectives but differ in how the TAs graded them, providing an opportunity to reflect on different grading styles. The paper concludes with recommendations for enhancing TA grading effectiveness, emphasizing the need for detailed rubrics, training, and monitoring to ensure fair and reliable assessment in higher education.
1 INTRODUCTION
The computer science program at our university has
seen massive growth in recent years. The change
posed a logistical challenge for the examiners and
called for hiring more lecturers. To provide educa-
tion at scale, support from teaching assistants (TAs)
became essential. Teaching assistants are hired for all quarters,¹ but a majority of them are hired to help with the first-year courses, since those courses have a higher number of students and are not very advanced.
The teaching assistant population is diverse in education level. Teaching assistants fall into three categories: a) Undergraduate students (UTA): usually second- or third-year students who, in most cases, have followed the same courses they are assisting in. b) Graduate students (GTA): master's degree students. c) Ph.D. candidates (Ph.D.): pursuing their Ph.D. and possibly without having followed the courses they are assisting in. In most universities, Ph.D. candidates are also considered graduate teaching assistants (Wald and Harland, 2020). However, their job contracts at our university categorize them as employees; therefore, we have grouped them separately.
¹ At our university, the academic year is organized into a quarter system comprising four quarters, each lasting 10-11 weeks. The complete curriculum is detailed on our website: https://www.utwente.nl/en/tcs/education-programme/tcscurriculum/
Teaching assistants have various roles depending
on the context in which they are hired. Some of
their roles include tutoring, assisting the teacher in the
classroom, leading small student groups, preparing
assignments, auditing assignment descriptions, grad-
ing assignments, exams and projects, and providing
general administrative support. Kerry et al. have pro-
vided a typology of various roles for teaching assis-
tants (Kerry, 2005).
TAs are helpful in scaling up the program. To maintain the quality of education, their various roles must be carefully evaluated and, where necessary, supported through training and monitoring (Wald and Harland, 2020). This is especially true when TAs are used to help with assessment. The use of teaching assistants as
assessors in higher education can potentially impact
the quality of assessment in a number of ways. On
the plus side, teaching assistants might add new ideas
to the assessment process and may have a fresher
perspective on students’ needs and aptitudes. With
smaller class sizes and more time to spend on grading
and evaluation, they might also be able to provide stu-
dents feedback that is more personalized. However, employing teaching assistants as assessors also has potential disadvantages. Teaching assistants might not have the same amount of training or experience as full-time faculty members, and they may lack the education or support needed to accurately assess students' work and offer helpful feedback. As a result, the validity and impartiality of their evaluations may be affected.
It is of paramount importance to discuss and in-
vestigate the impact of using teaching assistants for
assessment. Therefore in this paper we share our
experience with respect to the question “How does
the deployment of teaching assistants as assessors im-
pact the assessment quality in higher education?”. We
also elaborate on the questions a) Which factors, such
as transparency, reliability, and validity, are most af-
fected by the deployment of teaching assistants as
assessors? b) What are good practices in deploying
teaching assistants for assessments?
In this paper, we focus on detailing experiences
and analysis of example cases to emphasize the need
for a rigorous scientific investigation of the problem.
Our focus is on two factors mainly: expertise and
consistency. The methodology includes the analy-
sis of grading data from exams, employing statistical
tests such as two-sample t-tests to assess grading con-
sistency between TAs, and visualising them via box
plots. The methodology also encompasses a review
of grading settings, including digital exam environ-
ments and grading parties, to understand how differ-
ent setups affect grading outcomes.
Paper Organization. In Section 2, we look at the
related work on TAs’ deployment as graders. In Sec-
tion 3 we provide the details of the exam chosen and
substantiate our choices. In Section 4 we briefly dis-
cuss the digital exam environment we used and the
corresponding grading setup, while Section 5 provides an analysis to evaluate the consistency of the grading and the impact of a TA's subject expertise on the grades. Finally, we elaborate on our experience and give some recommendations on improving the quality and consistency of TA grading in Section 6. Concluding remarks are given in Section 7.
2 BACKGROUND
Although teaching assistants have been used for a
very long time as graders in higher education (see
e.g. Svinicki (1989)), their role is not very well dis-
cussed in the literature. Rather, more emphasis is
given to the categorization of their various roles, eval-
uating their suitability for a TAship or to the TA hiring
practices and design of TA training (Liggett, 1986).
Another aspect that received interest is the effective-
ness of a TA as a teacher, a demonstrator, and a tutor,
as well as the use of the TAs to prepare and audit edu-
cational material (Minnes et al., 2018). In most cases,
various experiences or research works can be found
for the use of the TAs to evaluate laboratory exercises
or assignments (see, e.g., Alvarado et al. (2017); Pick-
ering and Kolks (1976)).
A systematic literature review on using teaching
assistants in computer science was provided by Mirza
et al. (2019). They have covered the literature on the
different roles of teaching assistants and other related
topics. It is clear from their paper that only a few re-
searchers have reported on the use of TAs as graders.
Although there is a shortage of literature on this
topic from computer science education, in other edu-
cation programs, researchers have shared their experi-
ences. For example, Liggett (1986) reported on evaluating the reliability of grading by teaching assistants in mechanical engineering. They computed the reliability under various grading settings; their primary purpose was to measure the effect of TA training on grading reliability. Similarly, Marshman et al. (2018) reported on using TAs to grade introductory physics courses. They highlighted the importance of rubrics for assessing students' work and also emphasized the need for training of teaching assistants.
Suitability of TAs as graders is critically discussed
by Hogan and Norcross (2012) (see also Wald and
Harland (2020)) from a domain independent perspec-
tive. They have categorized assessments into two
types when it comes to deploying teaching assistants
as graders: 1. assessing factual information that re-
quires recall, and 2. assessing items that require in-
terpretative judgments. Instead of providing a clear
guideline they explain the advantages of using gradu-
ate teaching assistants for assessing the latter.
It is a common practice to use undergraduate TAs
to grade various assessments. They are primarily used
for assessing assignments. Dickson (2011) questions
the ethics of using UTAs for grading. They proceed
to describe their own experience with using UTAs in
grading qualitative assignments. Dickson argues that
since the undergraduate students have gone through
the exercises more recently than the experienced pro-
fessor, they can provide much more helpful feed-
back (Dickson, 2011).
Alvarado et al. (2017) also reported on the use
of TAs for grading the assignments in the context
of micro-classes. In a related study the authors re-
ported on the usage of undergraduate TAs for student-
facing activities while graduate TAs helped with grad-
ing (Minnes et al., 2018). Also, van Dam (2018) re-
flected on using undergraduate TAs for grading an in-
troductory computer science course. The TAs were
also used to provide feedback on design choices us-
ing a detailed grading rubric.
Maintaining consistency in large courses where
multiple graders are involved is a challenging task.
The differences can be attributed to various causes; Kates et al. (2022) attribute the inconsistency to a lack of interpersonal comparability between graders.
In the context of TA training and mentor-
ing, Lanziner et al. (2017) have reported on the ex-
perience of TAs assuming a different role. Accord-
ing to the study, TAs find their roles as graders more
challenging and appreciate the training and guidance
to perform their job as assessors. Similarly, Riese and Kann (2020) presented TAs' experiences in their different roles in computer science education. They also emphasize the need for clear and concise grading criteria to help TAs grade students' work. Doe et al.
(2013) argue that providing a rubric is insufficient for
consistent and effective grading. They also question
the accuracy, consistency, and effectiveness of grad-
ing from faculty members in the Psychology disci-
pline.
3 THE EXAM GRADING
EXPERIENCE
We have selected two exams that are part of the first-year courses. The selected course is rather subjective and consists mostly of open questions; most questions required some level of interpretive judgment. We selected this course because grading exams with objective answers is considered comparatively easy (Wald and Harland, 2020). The course was designed for first-year Computer Science (BCS) and Business Information Technology (BIT) students. The course was split in 2020 for administrative reasons, and the learning objectives were adapted to better cater to the needs of the respective study programs. While the BCS program (see Subsection 3.1) created a new examination format, the BIT program kept the same exam structure (see Subsection 3.2).
We hire a number of TAs to help us grade assignments and the written exams for these courses.
At our institute, Ph.D. candidates are generally con-
sidered tutors/lecturers. Predominantly, our Graduate
Teaching Assistants (GTAs) in computer science are
alumni, mainly due to limited interest from other stu-
dents who are hesitant to undertake TA roles without
prior experience with specific assignments. Conse-
quently, our TA pool is largely composed of Under-
graduate Teaching Assistants (UTAs), who form the
majority of our grading TAs.
3.1 Exam A
The exam was written in the context of a software sys-
tem design course for students following a bachelor’s
degree in Computer Science (BCS). The learning ob-
jectives of the course are two-fold: 1. Students are
asked to specify existing small software systems in
the Unified Modelling Language (UML) before they
2. define a new software system in UML. Addition-
ally, 3. students are asked to identify and explain com-
mon phases in software engineering, and 4. evaluate
the code base of software systems by means of soft-
ware metrics and software smells.
In the exam students are provided small UML
models that describe a certain context, e.g., patients
booking an appointment with hospital doctors by us-
ing an IT-system, or lecture room allocation in uni-
versities via a scheduling system. These diagrams
contain both semantic and syntactic mistakes, and
students are asked to point them out and correct
them. Their proficiency in the phases of software en-
gineering and software metrics is assessed by open
questions, e.g., a question could be “What are the
consequences of missing the requirements elicitation
phase?”. Generally, it is challenging to give an all-
encompassing answer-key that covers all possible an-
swers. A challenge of equal proportion is to commu-
nicate the wide range of possible answers to the ca.
20 grading TAs. Below we summarize the exams of
the last three years to better illustrate our experiences
in grading with TAs since the redesign of the course
in 2020:
2020-2021. The exam had a total of 13 questions comprising 100 points. Two questions asked students to spot syntactical and semantical issues in provided UML diagrams. Additionally, one question presented them with a scenario accompanied by three UML diagrams; students were asked to check the consistency between each component and with the scenario overall. The remainder were open questions.
2021-2022. The exam had a total of 12 questions comprising 60 points. The reason for the large decrease in the number of points was twofold: 1. to give students a better indication of how verbose their answers to open questions should be, and 2. to give TAs a stricter indication of when and when not to give points. This was a conscious choice because most TAs found some merit in the answers to open questions; decreasing the total points was intended to reduce that effect. In addition, we provided stronger numerical guidelines in the tasks, e.g., “Name at least three advantages [...]”, or “[...] briefly describe in two to three sentences.” Three questions provided UML models in which students had to spot syntax and semantics mistakes. The remainder were open questions.
2022-2023. The exam had a total of 11 questions comprising 60 points. Three questions asked for UML syntax mistakes to be corrected, while the remaining eight were open questions. For the first time, we tested the students' proficiency in drawing UML syntax by using a drawing tool provided by our university's e-assessment platform. The tool was a generic drawing tool, not specific to UML syntax, and students had the chance to test it in an ungraded mock exam. Hence, while providing a means to assess UML proficiency, this question introduced a new level of subjectivity, e.g., when comparing hand-drawn diamonds for aggregation/association relations in class diagrams to hand-drawn arrow-heads for generalisation relations.
In this paper, we have only included the statistical
analysis of the most recent exam.
3.2 Exam B
The exams were given in the context of a software
design course for students pursuing a Business and
Information Technology (BIT) bachelor degree. The
course covers two main axes, one on low-level design
using Unified Modelling Language (UML) as a nota-
tion, and another on software maintenance and met-
rics.
2020-2021. The students were given a case study and were asked to provide the activity diagrams, the use case model, the class diagram, and the state machine, as well as to answer a question on software complexity. The exam was then graded by 9 teaching assistants: pairs of TAs were each responsible for grading one diagram, and one TA was assigned the complexity part.
2021-2022. The number of diagrams was reduced from four to three, to allow the students more time to draw properly on a computer. The exam was then graded by 4 TAs, each handling one diagram; this decision was made to keep the grading consistent within each question.
2022-2023. The exam saw a shift in its structure, as we decided to reduce the number of diagrams the students draw to one, the class diagram. The remaining questions shifted to providing students with faulty designs and asking them to fix them. Every question was then assigned to one TA for grading.
A side effect for the grading is that the answers to these questions are now fixed, so the teaching assistants have a better grading key against which to compare the answers. This change was clearly felt during the review session: most regrading requests concerned the class diagram question, in which the students had to design a class diagram from scratch, and very few students requested a regrading of the other questions.
4 EXAM TOOL AND GRADING
SETTINGS
The students took each exam using a digital envi-
ronment. The digital environment provides function-
alities to ask multiple-choice and open-ended ques-
tions. It also provides functionalities to ask short
questions (like fill in the blanks) that can be automat-
ically graded. For both programs the questions of the
exams are open-ended. In addition to its own (non-
specific) drawing tool, the digital environment also
allows for external drawing tools which, more or less,
enforce the UML meta-model, and allow the students
to draft their UML diagrams with ease.
It also allows multiple graders to grade students' work simultaneously. Course teachers can embed grading rubrics within the questions, and students' names are anonymized to reduce bias. The digital environment also provides statistics related to the graded exams, for example, the pass rate, the average grade per question, a per-question exam analysis, and more. A log makes grade changes, and who made them, traceable. However, this information is neither downloadable nor used to compute statistics; we have manually copied it for the analysis in the next section.
The number of students in the BCS and BIT programs differs significantly. Therefore, more TAs are hired to grade the BCS exam than the BIT exam. These TAs also help during the lab sessions and have direct contact with students.
4.1 Grading Party
The BCS exam used a grading party: grading TAs and teachers gather in a room. The session starts with an explanation of the exam and the grading rubric, and the teacher also explains the possible variations in the answers. The grading work is then divided among the TAs based on their preferences. Due to the large number of students, at least two TAs are assigned to grade each question; for questions requiring more time to grade, more TAs are assigned. The following process was adopted for the grading party:
1. TAs attend a grading session with lecturing staff
present. They are presented the answer key and
can ask questions.
2. After the grading session, a faculty staff member closely inspects a random sample of student exams. Since the TA grading is done horizontally (i.e., per question), sampling student exams vertically (i.e., per student) provides a wide-ranging overview, and
3. An exam review is arranged in which students can
flag the potential misgrading that happened during
the grading session.
4.2 Grading Individually
The BIT exam uses a different approach. The teacher embedded the rubric in the digital environment and provided an extended explanation separately. The TAs choose the part they would like to grade, and the grading work is then assigned. In this case, the TAs work individually, and the teacher monitors the grading progress remotely. Since the number of students in this course is small, typically only one teaching assistant grades each question. However, for certain questions, multiple TAs are involved in the grading.
5 AN ANALYSIS OF EXAM
GRADES
Our analysis of the exam grades focuses on capturing variation in the grades given by a single TA and inconsistencies in the grades assigned by different TAs. We also investigate differences in grading patterns between UTAs and GTAs. Some TAs have been working for the program for a longer time; they were UTAs and are now GTAs. We reflect on their learning curve for grading exams in the discussion section. Due to the similarity of the courses, some TAs worked for both programs (BCS and BIT).
5.1 Exam A
The BCS exam was graded by 22 TAs and three teachers. Out of the 22 TAs, 15 were UTAs and 7 were GTAs. The questions were divided among the TAs based on their preferences, and at least two TAs graded each question (to emphasize, the set of students each TA graded was different). Some questions had more tasks and required more verbose answers than others; consequently, more TAs were needed to grade them so that the grading party could be finished in a feasible time. As soon as a TA was done
grading a question and was still willing to do more grading, a new question was assigned to them.

Figure 1: Plot of the score obtained by students for a single question (blue box plot) versus their total score obtained in the exam (orange box plot). Both scores are scaled between 0 and 1.
Since we wanted to capture variation in grading, we adopted box plots, which are handy for visualizing variation. We created one box plot per TA and combined the box plots for the same question in one figure (cf. Figure 1). TA names are removed to maintain privacy. For questions graded by two TAs, we also applied a two-sample t-test to find out whether the difference in mean grades is statistically significant. Note that the test is only suitable for pairwise comparison; therefore, we applied it only to questions graded by at most two teaching assistants.
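To make the check concrete, the following is a minimal sketch of the pairwise comparison, not the script we used; the scores and TA labels are fabricated for illustration.

```python
# Minimal sketch: for a question graded by exactly two TAs, compare the mean
# scores they assigned with a two-sample (Welch's) t-test. Data are fabricated.
import numpy as np
from scipy import stats

# Hypothetical scores each TA assigned to their (disjoint) set of students.
scores_ta1 = np.array([4.0, 5.5, 3.0, 6.0, 4.5, 5.0, 2.5, 6.0])
scores_ta2 = np.array([3.5, 4.0, 5.0, 4.5, 3.0, 5.5, 4.0, 4.5])

t_stat, p_value = stats.ttest_ind(scores_ta1, scores_ta2, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Mean grades differ significantly; sample these scripts for review.")
else:
    print("No significant difference between the two TAs' mean grades.")
```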
The results of the tests show that the differences in mean grades are not statistically significant, and we conclude that these TAs graded their questions consistently. However, when a question is graded by more than two TAs, the variation is higher. Since we cannot judge significance using a t-test in that case, we opted to compare the question score with the total grade. The implicit assumption is that if, on average, a TA gives a lower (or higher) grade while the total grade shows the opposite trend, then the TA might be grading harshly (or generously, respectively). We emphasize that before drawing any conclusions, a sample must be checked by the teachers to verify the claim.
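The comparison underlying Figure 1 can be sketched as follows. The data frame columns, the 6-point question maximum, and the 10-point exam scale are assumptions made for this illustration, and the data are randomly generated.

```python
# Sketch of the per-TA comparison of a scaled question score against the
# scaled total exam score, as visualized in Figure 1. Data are fabricated.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ta": rng.choice(["UTA1", "UTA2", "GTA1"], size=90),
    "question_score": rng.uniform(0, 6, size=90),   # question worth 6 points
    "total_score": rng.uniform(3, 10, size=90),     # exam total on a 10 scale
})
# Scale both scores to [0, 1] so they are comparable.
df["q_scaled"] = df["question_score"] / 6
df["t_scaled"] = df["total_score"] / 10

fig, ax = plt.subplots()
tas = sorted(df["ta"].unique())
positions = np.arange(len(tas), dtype=float)
# Question score (left box) next to total score (right box) for each TA.
ax.boxplot([df.loc[df["ta"] == ta, "q_scaled"] for ta in tas],
           positions=positions - 0.15, widths=0.25)
ax.boxplot([df.loc[df["ta"] == ta, "t_scaled"] for ta in tas],
           positions=positions + 0.15, widths=0.25)
ax.set_xticks(positions)
ax.set_xticklabels(tas)
ax.set_ylabel("Scaled score (0-1)")
ax.set_title("Question score vs. total exam score per TA")
plt.show()
```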
To illustrate, we provide one example. Figure 1 shows the box plots for one question. Since the question carries only 6 points while the total score is 10, we scaled both between 0 and 1 to make the graphs comparable. As can be seen from Figure 1, UTA4 assigned lower grades for this question, while for the same student population, the overall scores are much higher. Thus, this set of grades requires further investigation by either a teacher or another TA. Another interesting observation is for T3, which represents a teacher with only one data point. It might be the case that this question was a borderline case (i.e., the difference between a student passing or failing the exam), and the teacher decided to review and regrade.

Figure 2: The plot shows grades obtained by students for question 7 (blue box plot) and the total score obtained by a student (orange box plot). Both scores are scaled between 0 and 1.
We would also like to investigate the claim that GTAs are better at grading exams since they have higher qualifications. We were not able to identify any significant difference for the question shown in Figure 1. Figure 2 shows the grading for another question, again an open-ended subjective question. Here, too, no significant difference is observed between GTAs and UTAs.
5.2 Exam B
The teacher of this exam has a more personal approach to hiring TAs and relies on a small team of experienced and proven teaching assistants. Since the number of students for this course lies between 55 and 120, the grading work is significantly smaller than for the BCS course. We analyze the exams of three years to capture the grading practices of the TAs.
For most of these exams, there is no significant difference in grading between different TAs. Figure 4 shows a box plot for a question that appeared in the 2021 exam. The question is related to the activity diagram and is accompanied by a detailed rubric that converts an open-ended grading task into a binary grading decision (see Figure 3). Despite the clear rubric, there is a difference in grading between TAs: TA18 gives lower grades, while TA7 shows higher variation in their grading.
Throughout our exams, we have tried to involve the same teaching assistants in grading over the years. Through this practice, we have noticed that a senior TA (one who has graded the same type of exam more than once) is more likely to be able to handle answers that are not covered by the provided rubric. Senior TAs are, however, more likely to make mistakes while grading open questions, as their judgement more often than not falls short of capturing the myriad of forms that mistakes can take. A different problem is created by junior TAs, who usually stick to the provided rubric: they feel more constrained to follow the grading key to the letter and subsequently fail to capture partially correct answers.
6 DISCUSSION &
RECOMMENDATIONS
To guarantee fair yet feasible assessment, TAs have become indispensable. Hence, consistency of grading within the TA pool is of crucial importance.
We propose that checking TA grading consistency by means of statistical analysis becomes a default step in this multi-step process. Item analysis of exam questions is common practice, both for easily-assessed multiple-choice questions and for open questions. Such analysis involves critical characteristics such as Cronbach's alpha when multiple assessors grade one item, the R_ir value relating individual questions to the overall exam, or the average score that participants obtained per question, commonly indicated by p.
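As an illustration, the following minimal sketch computes Cronbach's alpha, the item-rest correlation R_ir, and the average proportional score per question on a fabricated student-by-question score matrix; it is not tied to any particular e-assessment export format.

```python
# Sketch of standard item statistics on a fabricated score matrix:
# rows = students, columns = questions, entries = points scored.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.integers(0, 7, size=(120, 11)).astype(float)
max_points = np.full(11, 6.0)  # assumed maximum points per question

def cronbach_alpha(x):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def item_rest_correlation(x, i):
    # Correlation of question i with the sum of all remaining questions.
    rest = np.delete(x, i, axis=1).sum(axis=1)
    return np.corrcoef(x[:, i], rest)[0, 1]

print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))
for i in range(scores.shape[1]):
    r_ir = item_rest_correlation(scores, i)
    p_avg = scores[:, i].mean() / max_points[i]  # average proportional score
    print(f"Q{i + 1}: R_ir = {r_ir:.2f}, p = {p_avg:.2f}")
```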
Our university is utilizing a digital examination
system for most of its exams. This system collects
a vast amount of data, offering insights into grades
and the grading process. Although it provides various
exam-related statistics, it currently lacks the capabil-
ity to generate statistics about the graders. A recent
update allows examiners to review the grading history
for specific exam items, including who graded them
and any changes made, but this review is limited to an
individual question and student basis. For a comprehensive analysis, such as calculating inter- and intra-rater variation in grading, exporting this log would be beneficial. To enhance grading consistency, we suggest several additional features for the digital examination system. These include:
Tracking the percentage of exams graded by each
teacher to understand individual grading loads.
Counting the number of personnel involved in
grading each question to ensure adequate cover-
age and diversity.
Figure 3: Grading rubric for grading the activity diagram.

Figure 4: The plot shows grades obtained by students for the activity diagram question (blue box plot) and the total score obtained by a student (orange box plot). Both scores are scaled between 0 and 1.

Implementing visualization tools, such as bar graphs, for analyzing grading variation and identifying outliers.
Calculating statistical significance parameters to evaluate variation in grading across different graders and identify patterns of inconsistency (a minimal sketch of such an analysis follows after this list).
Monitoring the time spent grading each question
to pinpoint potentially problematic questions that
require an inordinate amount of grading time.
Analyzing trends in grading over time to detect
deviations or inconsistencies in grading patterns.
Establishing a feedback mechanism for teaching
assistants to report uncertainties or ambiguities
encountered during grading, facilitating clarifica-
tion and consistency.
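The grader-level analysis referred to in the list above could, for example, look like the following hypothetical sketch, which flags graders whose mean score for a question deviates markedly from the question mean. The column names and the log export itself are assumptions; as noted earlier, the current system does not yet offer such an export.

```python
# Hypothetical sketch of grader-level statistics from an exported grading log.
# Columns ("grader", "question", "score") are assumed, not an actual export.
import pandas as pd

log = pd.DataFrame({
    "grader":   ["UTA1", "UTA1", "UTA2", "UTA2", "GTA1", "GTA1"],
    "question": ["Q7",   "Q7",   "Q7",   "Q7",   "Q7",   "Q7"],
    "score":    [3.0,     4.5,    5.5,    6.0,    4.0,    5.0],
})

# Per question: compare each grader's mean score with the question mean and
# flag graders who deviate by more than one standard deviation.
for question, grp in log.groupby("question"):
    per_grader = grp.groupby("grader")["score"].mean()
    deviation = (per_grader - grp["score"].mean()) / grp["score"].std(ddof=1)
    flagged = deviation[deviation.abs() > 1.0]
    print(question, "grader means:\n", per_grader.round(2))
    if not flagged.empty:
        print("Flag for manual sampling:", list(flagged.index))
```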
TA evaluation should be standardized with similar scrutiny to prevent outliers. This step makes it easier to check commonly recurring themes: 1. Do UTAs grade too strictly, whether due to their own insecurities or the perception that the questions were graded equally harshly when they took the course? 2. Do GTAs grade too leniently because they find some merit in each answer? The analysis shown in our work should become a mainstay in e-assessment tools. It can support lecturing staff in several ways. First, live monitoring of these statistics lets them address issues on-the-fly. Secondly, an a posteriori analysis aids in the sampling step for manual inspection. Lastly, it helps in coaching TAs and fostering their assessment proficiency, since it enables lecturing staff to address harshness/leniency. This expedites the growth of a network of TAs proficient in fair and fast grading, since they work in pairs and can learn from each other. However, we do emphasize that this should not be the only safeguard for quality, but rather a supplement in a multi-faceted workflow.
We believe that using TAs in grading can only be reliable if students are required to participate in the review. This allows misgraded students to flag their cases and allows the teachers to regrade the exams concerned, ensuring a fair examination for everyone.
Based on our experience we recommend the fol-
lowing practices for deploying teaching assistants to
grade exams:
If teaching assistants are used as assessors, it is important that they are clear about what is expected of them and that the assessment process is transparent to students. This can be achieved by providing clear grading rubrics, giving feedback on assignments, and being available to answer questions about the assessment process.
For grading exams with many students, it is advis-
able to arrange a grading party where active dis-
cussion among TAs is supported and appreciated.
It is essential that the teaching staff also actively
participates and grades a portion of the exams. It
ensures a higher grading accuracy, provides a bet-
ter foundation for guidance to TAs and actively
promotes a deeper understanding of assessment
practices.
To increase consistency in grading, assign one question per TA. The downside could be that the TA grades either too generously or too harshly, but at the very least they do so consistently. This risk can be mitigated by the teaching staff by looking at the average grade per question; extreme cases are easily identifiable.
The need for close monitoring cannot be emphasized enough. The teaching staff must look for clues to intervene and adjust the grading practice, and they must grade the borderline cases themselves. Furthermore, they should be creative in designing visualizations to capture variations in the grading and identify anomalies.
Work towards developing a team of TAs that help
with grading. The team must consist of experi-
enced TAs (who have also helped grade the same
course in previous years) and junior TAs. We em-
phasize that including junior TAs is essential for
continuity. Pay extra attention to junior TAs and
be vigilant about senior TAs. A well-trained UTA
can grade more consistently and reliably than an
untrained GTA. Therefore, groom TAs so that
they can perform better.
Motivate students to participate in the review and explain the grading process to them for increased transparency. We recommend that the teaching staff conduct the review and that the review be as accessible as possible for students. If mistakes are spotted during the review, audit all exams graded by the TA for whom a mistake was found.
7 CONCLUSIONS
In the introduction, we posed the question, “How
does the deployment of teaching assistants as asses-
sors impact the assessment quality in higher educa-
tion?”. The paper was not meant to answer the ques-
tion but rather provide an experience-based narrative.
Our experience indicates that using TAs for grading is a delicate process that can lead to low grading quality, but that this can be significantly improved by close monitoring and intervention from the teaching staff. The experiences and analysis described in this paper do not provide a conclusive answer; instead, they emphasize the need for a more carefully designed scientific study to identify the impacting factors and corresponding mitigation strategies.
REFERENCES
Alvarado, C., Minnes, M., and Porter, L. (2017). Micro-
classes: A structure for improving student experience
in large classes. In Proceedings of the 2017 ACM
SIGCSE Technical Symposium on Computer Science
Education, pages 21–26.
Dickson, P. E. (2011). Using undergraduate teaching assis-
tants in a small college environment. In Proceedings
of the 42nd ACM technical symposium on Computer
science education, pages 75–80.
Doe, S. R., Gingerich, K. J., and Richards, T. L. (2013).
An evaluation of grading and instructional feedback
skills of graduate teaching assistants in introductory
psychology. Teaching of Psychology, 40(4):274–280.
Hogan, T. P. and Norcross, J. C. (2012). Undergraduates
as teaching assistants. Effective college and university
teaching: Strategies and tactics for the new professo-
riate, page 197.
Kates, S., Paulsen, T., Yntiso, S., and Tucker, J. A. (2022).
Bridging the grade gap: Reducing assessment bias in
a multi-grader class. Political Analysis, page 1–9.
Kerry, T. (2005). Towards a typology for conceptualizing
the roles of teaching assistants. Educational Review,
57(3):373–384.
Lanziner, N., Smith, H., and Waller, D. (2017). Reflections
from teaching assistants in combined learning assis-
tant and course grader roles. Proceedings of the Cana-
dian Engineering Education Association (CEEA).
Liggett, S. L. (1986). Learning to grade papers.
Marshman, E., Sayer, R., Henderson, C., Yerushalmi, E.,
and Singh, C. (2018). The challenges of changing
teaching assistants’ grading practices: Requiring stu-
dents to show evidence of understanding. Canadian
Journal of Physics, 96(4):420–437.
Minnes, M., Alvarado, C., and Porter, L. (2018).
Lightweight techniques to support students in large
classes. In Proceedings of the 49th ACM Technical
Symposium on Computer Science Education, pages
122–127.
Mirza, D., Conrad, P. T., Lloyd, C., Matni, Z., and Gatin,
A. (2019). Undergraduate teaching assistants in com-
puter science: a systematic literature review. In Pro-
ceedings of the 2019 ACM Conference on Interna-
tional Computing Education Research, pages 31–40.
Pickering, M. and Kolks, G. (1976). Can teaching assistants
grade lab technique? Journal of Chemical Education,
53(5):313.
Riese, E. and Kann, V. (2020). Teaching assistants’ expe-
riences of tutoring and assessing in computer science
education. In 2020 IEEE Frontiers in Education Con-
ference (FIE), pages 1–9.
Svinicki, M. D. (1989). The development of TAs: Prepar-
ing for the future while enhancing the present. New
directions for teaching and learning.
van Dam, A. (2018). Reflections on an introductory CS course, CS15, at Brown University. ACM Inroads, 9(4):58–62.
Wald, N. and Harland, T. (2020). Rethinking the teaching
roles and assessment responsibilities of student teach-
ing assistants. Journal of Further and Higher Educa-
tion, 44(1):43–53.