The Multipurpose Autonomous Agent Project: Experiential Learning for
Engineering Assistive Artificial Intelligence
Chad Mello, James Maher and Troy Weingart
Department of Computer & Cyber Sciences, United States Air Force Academy, Colorado Springs, CO 80840, U.S.A.
Keywords:
Experiential Learning, Assistive Technologies, Artificial Intelligence, Machine Learning, STEM Education,
Engineering, Teamwork.
Abstract:
Assistive Technologies (AT) and Artificial Intelligence (AI) that support humans in decision making and in
difficult or dangerous tasks are in high demand. We created a two-semester capstone project, for undergrad-
uate seniors, providing the opportunity to build an assistive AI algorithm implemented on a skid-steer rover
platform. By the end of the program, students created a system with the potential for assisting humans in
dangerous indoor situations such as gas leaks, bomb threats, fires, and active shooters. Our unique approach
allowed the skid-steer rovers to autonomously navigate indoor areas never before encountered or previously
mapped. Students used deep behavioral cloning techniques coupled with deep reinforcement learning to train
the rovers for speed, steering control, and cornering. Outfitted with nothing more than a depth-sensing optical
camera, an inexpensive autopilot, and an onboard, assistive NVIDIA Jetson Xavier NX computer, the rover
quickly scanned and oriented to a new environment and then located objects of interest. The students’ final
product demonstrated impressive abilities and skills demanded by industry in developing AT and AI platforms
for mission-critical applications. Herein we share our approach, technology stack, experiences, and artifacts
produced by our students at the end of the project.
1 INTRODUCTION
Post-secondary Computer Science (CS) departments
are tasked with educating and preparing graduates to
fill modern, evolving CS related jobs. Many CS grad-
uates are highly knowledgeable; however, (McGu-
nagle and Zizka, 2020) found that employers desire
problem-solving graduates who are able to work har-
moniously as part of a team. CS undergraduates rely
on their degrees as a foundation for beginning a career
in industry; however, even the best students experi-
ence a skills-gap when they work their first job. CS
faculty members identified a desire to provide more
projects working real-world, industry problems to ad-
dress this skill-gap (Valstar et al., 2020). While there
is little difference in academic performance between
students who intern and those who do not, graduates
without industry experience often find themselves less
employable because they lack practical experience,
good technical and interpersonal skills, and the abil-
ity to work effectively in teams (Kapoor and Gardner-
McCune, 2020). Recent studies, involving 536 multi-
institutional CS students, found that only 57.5% of
undergraduate CS students completed an internship
prior to graduating (Smith and Green, 2021; Kapoor
and Gardner-McCune, 2020).
Machine Learning (ML) shows potential for solv-
ing problems in many areas of science, medicine,
and engineering (Farjo and Sengupta, 2021; Lürig et al., 2021; Azari et al., 2020; Rutherford, 2020;
von Lilienfeld and Burke, 2020; Akbilgic and Davis,
2019; Toole et al., 2019; Fraley and Cannady, 2017;
Trister et al., 2017); therefore, it is important to have a
portion of CS curriculum in higher education devoted
to ML. Yet, according to (Shapiro et al., 2018), ed-
ucational offerings in CS departments do not reflect
this reality. This paper presents a practical project for
developing teamwork and problem-solving skills that
are in high demand from industry (McGunagle and
Zizka, 2020).
Recently, we designed an applied, Deep Learning
(DL) capstone project to improve students’ knowl-
edge working with Imitation Learning (IL) and Rein-
forcement Learning (RL) in an open-world environ-
ment. The Multipurpose Autonomous Agent Project
(MAAP) is a year-long undergraduate program of-
fered to senior CS majors; it targets skills and insights
that are valuable to industry as well as the United
States Air Force Academy (USAFA) at the time of this writing: Assistive Technologies (AT), Artificial Intelligence (AI), DL, IL, RL, and Computer Vision (CV).
Figure 1: The MAAP rover (center) in action. The left and right images are a snapshot of simultaneous streams of data that the rover collects through its front-mounted camera and processes in real time. The left image represents an RGB stream while the right represents rasterized 3D distance information of the same scene. This data is pre-processed, stacked, and then fed to the rover’s onboard neural networks. The rover then bases its actions on the policy network’s throttle and steering mixture outputs as well as its object recognition outputs.
MAAP offers students direct experience by task-
ing them with designing, programming, training, test-
ing, and deploying AT under the supervision of an ex-
perienced mentor. The project holds inherent chal-
lenges that a small team of students must identify and
overcome to be successful. The final product com-
bines hardware, software, and advanced algorithms
into a cohesive, ready-to-use package. The result is an
autonomous, skid-steer rover built using Modifiable
Commercial off the Shelf (MOTS) parts and open-
source Application Programming Interfaces (APIs).
The MAAP rover may be deployed to an indoor setting that has not been mapped or otherwise previously encountered by the device. The rover is equipped with a front-facing camera and a distance-sensing device, giving it the ability to traverse and explore an environment by navigating hallways and rooms while simultaneously scanning for an objective. The objective can be detected through infrared, gas, or video sensors installed on the rover platform. Video and data may be streamed to a console app running on a tablet or laptop for real-time feedback. The final project deliverable is a generic package ready for further use in other object detection scenarios in unmapped environments.
This paper makes three contributions: (1) we
present a Project Based Learning (PBL) framework
for use in a team-based undergraduate capstone, (2)
we identify the resources needed to replicate and/or
improve MAAP, and (3) we share our assessment of
the MAAP project’s effectiveness. The remainder of
this paper is structured as follows: Section 2 summarizes work related to this project, Section 3 provides a description of the hardware and software plat-
form, Section 4 describes the course format, Section 5
describes student progression through the year-long
course, and Section 6 describes our conclusions and
proposed future work.
2 RELATED WORKS
MAAP provides CS majors with experience in ap-
plied ML, complex problem solving, and harmonious
teamwork. MAAP engages students with an interest-
ing set of problems that cannot be solved only with
the knowledge they possess at the beginning of the
program. Teaching students to make design decisions
for ML implementation is more difficult than teach-
ing core ML concepts (Sulmont et al., 2019), and
MAAP focuses on experiential learning beyond core
ML concepts. Fittingly, just as students learn complex skills by observing expert demonstration, MAAP applies a similar concept to teach machines complex tasks and functions through Imitation Learning (IL). This section provides a back-
ground on IL and current teaching methods.
2.1 Imitation Learning
In environments where complex autonomous func-
tions are required, it is easier to teach desired behav-
ior through demonstration rather than attempting to
engineer it (Osa et al., 2018). IL reduces the problem
of teaching a task down to a human providing task
demonstrations and then recording how the human
performed this task. Equipping an agent with mod-
ern sensors gives it the ability to collect large amounts
of data to rapidly process, learn and create maps that
transform data into actions (Hussein et al., 2017a). IL
encompasses a group of algorithms designed to learn
from and mimic the behavior of humans or animals
exhibited under specific circumstances; these algo-
rithms discover a mapping between observations and
actions via a learning process (Hussein et al., 2017b).
Behavioral Cloning (BC) is a subset of IL popular-
ized by two papers published by NVIDIA that utilize
a Convolutional Neural Network (CNN) to go beyond
basic pattern recognition and learn the entire process-
ing pipeline needed to perform an action. NVIDIA
researchers published an initial paper (Bojarski et al.,
2016) and a follow-up paper (Bojarski et al., 2017) that inspired the creation of an online Udacity project centered around training a car to drive itself in a simulator
(Dominique Luna, 2021). Figure 2 shows the trained
model driving the car in the Udacity simulator.
2.2 A Simple Introduction via Udacity
The Udacity demonstration simulates a car with rack
& pinion steering traversing a consistent and clearly
marked race track. The simulator collects and pro-
cesses visual data related to steering input, provided
by a human driver, as the car is driven around the
track. A CNN is built to accept individual, pre-
processed video frames, generated by the simulator’s
scenery, as input and then predicts the steering angle
as output. When training the CNN, the steering angle is provided as the labeled output and the sensor-collected imagery data provides the inputs.
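For concreteness, the following is a minimal Keras sketch, in the spirit of the NVIDIA architecture the Udacity exercise builds on, of such a steering network; the input resolution and layer sizes are assumptions, not the exercise's exact values.

```python
# A minimal sketch (assumed input size and layer widths) of an NVIDIA-style
# steering network: a pre-processed frame in, one steering-angle value out.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_steering_cnn(input_shape=(66, 200, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Lambda(lambda x: x / 127.5 - 1.0),           # scale pixels to [-1, 1]
        layers.Conv2D(24, 5, strides=2, activation="elu"),
        layers.Conv2D(36, 5, strides=2, activation="elu"),
        layers.Conv2D(48, 5, strides=2, activation="elu"),
        layers.Conv2D(64, 3, activation="elu"),
        layers.Conv2D(64, 3, activation="elu"),
        layers.Flatten(),
        layers.Dense(100, activation="elu"),
        layers.Dense(50, activation="elu"),
        layers.Dense(10, activation="elu"),
        layers.Dense(1),                                     # predicted steering angle
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```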
The Udacity simulator introduces students to the
concept of an autonomous pipeline, i.e., sensor in-
puts are utilized to train a model and generate a cyber-
physical output; however, the Udacity simulator can-
not perform outside of a very specific environment.
The simulator provides students an understanding of
the problem and an elementary approach to solve it.
The autonomous steering demonstration in a simula-
tor cannot address non-determinism in the physical
world. In addition, the CNN trained in the Udacity simulator lacks the ability to predict speed control and object avoidance, i.e., the learned policy will not translate from the simulator to the real world.
Figure 2: Car driven by trained CNN through Track 1 in the
Udacity simulator.
3 MAAP OVERVIEW
The MAAP capstone is a 6-credit hour program. The
program centers on integrating ML with autonomous
rovers to provide applied solutions to real-world, open
set problems.
3.1 Product Requirements
The MAAP product students are required to deliver at
the end of the program is a physical skid-steer rover
capable of traversing indoor environments with au-
tonomous steering and throttle control. The rover will
find its way out of dead ends, smoothly turn corners,
minimize contact with walls and stationary objects,
tread carefully around people, accelerate smoothly when there are no objects in its line
of sight (conversely slow its speed according to its
proximity to objects in front of it), avoid moving ob-
jects, turn away from or even reverse itself to avoid
oncoming objects, and perform scanning maneuvers
(360 degree, in-place turns to scan for objectives).
In addition, the platform utilizes computer vision and
object recognition using mounted sensors to execute
searches for specific objectives defined in the course.
The rover performs these actions in the real-world,
avoiding the limitations that result from determinis-
tic learning environments. Ideally the final product
would utilize some form of spatial mapping, e.g., Visual Simultaneous Localization and Mapping (VSLAM),
to avoid exploring areas previously searched; how-
ever, due to the limited student time, VSLAM tech-
niques were not covered or required.
3.2 Hardware Choices
The students were not involved in hardware decisions;
hardware was supplied at the beginning of the cap-
stone project. We took into account budget, safety
concerns, and feasibility when considering the hard-
ware choices for this program. Smaller 1/10 scale
RC cars are typically used for similar projects and
are overall less expensive. We desired the maneu-
verability of larger four-wheel drive skid-steer rovers
as well as their simpler and more rugged build. We
purchased all-metal, fully-assembled, mid-sized four-
wheel skid-steer platforms equipped with Pixhawk
Cube Orange (or Black) autopilots (Ardupilot, 2022).
A number of reasons went into our decision to purchase these machines:
• We wanted to afford students the opportunity to train and run ML and AI models on the same hardware that rides on the rover, without transferring trained models from a more powerful workstation. The autopilot, NVIDIA GPUs, 1 TB of hard-drive data storage, cameras, distance finders, etc. are all onboard the rover. Conducting all ML and AI model work on the same hardware provided a more efficient work environment as students improved their models.
• We wanted all AI and ML algorithms to train and predict on the rover, without the need for a remote computer connection. This significantly reduces latency in data streaming, image processing, and command execution. RC transceivers were on hand at all times to override rover functions in cases where code might create an undesirable situation.
• We wanted to lessen situations where rovers could become stuck when presented with rough, bumpy, or cluttered terrain. We chose large tires and independent electric motors with adequate torque so that rovers would be capable of rolling over rocks, boards, nails, and other debris that could be present.
• Our rover units are capable of carrying larger sensors, computers, and onboard batteries. In future renditions of MAAP, we plan to add more sensors and other equipment to these devices. These platforms are more accommodating via their larger footprint and payload capacity.
We purchased a long-range PowerBox Systems
Radio Core RC System (PowerBox Systems, 2022)
for controlling the rovers during data collection. In
addition, each rover is equipped with an Intel® RealSense™ D455 depth-sensing camera (Intel, 2021) and an NVIDIA® Jetson Xavier™ NX for accelerated AI execution (NVIDIA, 2022). The NVIDIA® Jetson
Xavier™ NX contains a 6-core ARM CPU, 384 GPU
cores, 8 GB RAM, and a 1 TB SSD. See Figures 3
and 4 for profile and internal details.
Figure 3: Three rovers used in the MAAP capstone. In ad-
dition to the standard RC radio, autopilot, and GPS systems,
we attached an Intel® RealSense™ D455 depth-sensing
camera.
Figure 4: The MAAP rover internals include an indepen-
dent assistive computer located under the autopilot cube.
3.3 Software and Tools
We relied heavily on cross-platform software for
MAAP. This was done out of necessity. All of the
students at our institution are issued Windows laptops
during their Freshman year. Consequently, we wanted
to ensure that the software solutions would run on
their systems as well as with our hardware, much of which utilizes Linux-based operating systems.
Students are required to use several pieces of free,
cross-platform software while developing code for
the rovers. Python, i.e., the Python 3.8 interpreter
(Sanner et al., 1999), was chosen as the programming language. PyCharm Community Edition (JetBrains, 2022) was chosen as the Python development environment. PyCharm Community Edition is a
free, robust development environment that offers easy
virtual environment creation, debugging and syntax
checking. To communicate with drone hardware via
the MavLink protocol, we utilize two drone-related
Python libraries: (1) Pymavlink and (2) DroneKit.
Machine learning and computer vision are facilitated
by TensorFlow, Keras, and Open-CV. Finally, we in-
clude the PyRealSense Python library for interfac-
ing with the Intel® RealSense™ D455 depth-sensing
camera. All of the Python packages listed in this
paragraph can be downloaded from the PyPi pack-
age repository (Python Software Foundation, 2022)
and installed in the students’ Python environment us-
ing the pip command. Note that there was no need
or requirement for simulator software or Software in
the Loop (SITL). All work can be performed on the
actual hardware in the physical environment.
4 PROGRAM PROGRESSION
The MAAP capstone course was organized into four-
week sprints. Class time for capstone work averages
5 hours per week with an instructor present and an ad-
ditional 8 hours of non-instructor work time per week.
At the start of each sprint, the capstone team and in-
structor plan the sprint by considering what should be
accomplished in the four-week period. Scrum meetings occur at the beginning of each week and last 30-45 minutes. The work environment is designed to encourage open ideas and rapid prototyping.
Periodic gate checks are conducted by the instruc-
tor to ensure work is progressing in a timely fash-
ion. The instructor may conduct hands-on work-
shops to help students apply theoretical knowledge to
the physical hardware environment. To form a com-
plete picture of an end-to-end solution, each major
sprint concludes with a demonstration where students
must present a working prototype. Twice during the
semester the students must present their work to se-
lected faculty members outside of the project and the
team is graded by an uninvolved reviewer.
MAAP combines several complex topics, and the
instructor provides the essentials necessary to com-
plete sprint tasks. It is important to consider the stu-
dents’ ability to learn and implement new material,
without overwhelming the students in the four-week
sprint period. For example, if it becomes obvious that
the team is struggling with identifying, evaluating,
and narrowing potential CNN models for use in policy
learning, the instructor may guide the team towards
the models that are likely to bring the biggest benefit
to the project. An important outcome for this course
is to produce graduates who have both knowledge and
implementation skills, yet the instructor must main-
tain cognizance of the project’s direction and guide
the team away from ideas that could derail it.
4.1 Provided References
We chose to provide trade books and current papers as
a reference to the students. The advantages to using
trade books over traditional textbooks are: (1) quality
publications provide readers with robust theory while
filtering all but the essentials of how to implement
the theory, (2) trade books are often considered more
readable, making examples and instruction more ac-
cessible to undergraduate students, and (3) students
who go on to become professionals are likely to turn
to trade books when working in industry (Schultz,
2014), (Smolkin et al., 2013). We chose several texts
for the course:
• Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition (Géron, 2019); A. Géron - 2019; ISBN-13: 978-1492032649
• Deep Reinforcement Learning Hands-On: ... (Lapan, 2020); M. Lapan - 2020; ISBN-13: 978-1838826994
• Reinforcement Learning: Industrial Applications of Intelligent Agents (Winder, 2020); P. Winder - 2020; ISBN-13: 978-1098114831
• Hands-On Computer Vision with TensorFlow 2: Leverage deep learning to create powerful image processing apps with TensorFlow 2.0 and Keras (Planch, 2019); B. Planch - 2019; ISBN-13: 978-1788830645
The papers we chose for the students to review were:
• End to end learning for self-driving cars (Bojarski et al., 2016)
• Explaining how a deep neural network trained with end-to-end learning steers a car (Bojarski et al., 2017)
• Imitation Learning: A Survey of Learning Methods (Hussein et al., 2017b)
• An Algorithmic Perspective on Imitation Learning (Osa et al., 2018)
• Evaluation of pre-training methods for deep reinforcement learning (Larsson, 2018)
• Learning to drive by imitation: An overview of deep behavior cloning methods (Ly and Akhloufi, 2020)
• Deep reinforcement learning for autonomous driving: A survey (Kiran et al., 2021)
• Mobile robots exploration through CNN-based reinforcement learning (Tai and Liu, 2016)
• Recent advances in imitation learning from observation (Torabi et al., 2019)
4.2 Bootstrapping: Code and Libraries
MAAP requires students to write a significant amount
of Python code. We supply starter code as well as Python virtual environment setup instructions as a starting point. The following items are provided to the students:
• Custom Python libraries containing common functions for communicating with our rovers, along with commonly-used commands and processes,
• skeleton code that provides a working Main function and stubs that students are required to complete,
• coding standards and naming conventions, and
• an existing GitHub repository where all code and documentation are stored.
These items provide structure, allowing students
to quickly focus on the major tasks at hand. Students
have an opportunity to explore the initial project code
as well as some functional examples that they may
modify to fit their needs.
4.3 Mini Lectures & Labs
The instructor provides theory and industry experi-
ence for the team through additional labs. During
the program, we provided several lessons designed to closely mimic a given sprint’s upcoming tasks (see Section 5 for a breakdown of the labs we provided to the students). We cover topics such as
DroneKit APIs, Open-CV, image processing, Tensor-
Flow, Keras, data collection, data preparation, deep
CNN models, deep RL, model performance analyt-
ics, and model fine-tuning techniques. Most lectures
include some form of sample code in Jupyter Note-
books (Kluyver et al., 2016) that students may use as
starter code for their tasks.
4.4 Team Assessment
At the end of each sprint, students are graded as a sin-
gle unit. The instructor reviews and assesses sprint
artifacts, demonstrations, and assessments from out-
side faculty reviewers. The artifacts, demos, and out-
side assessments are weighted and combined to yield
a final grade. The team is encouraged to be bold and
imaginative; however, members are also responsible
for recognizing when they are at an impasse and when
they should consult their instructor. The instructor al-
lows the students to wrestle with team conflicts, bad
ideas and some failures, but keeps a close eye out for
when those things begin to compromise the overall
success of the project. Typically, by the end of the first
two sprints, the team will have learned each member’s
strengths and weaknesses and will begin to demon-
strate improved efficiency as a result.
5 SPRINT BREAKDOWN
The entire project lasts for eight four-week sprints.
It is critical that the team make progress during every
sprint in order to meet the requirements by the end
of the program. In the following sections we outline
the major events, instructor interventions, and results
from the sprints at our institution.
5.1 Sprint One: Orientation
The team was introduced to the project, its require-
ments, high level concepts, source repository, devel-
opment tools, and reading material (see section 4.1).
The main objective is for each student to learn and
understand IL theory. Students already had exposure
to AI and ML as well as RL, and this prior knowl-
edge served as a base upon which to build their understanding of IL and behavioral cloning. Artifacts from this sprint consisted of a report summarizing the team’s understanding of the project requirements and, from each team member, toy code similar to the Udacity demonstration (discussed in Section 2.2) showing an understanding of a simplified example of behavioral cloning.
5.2 Sprint Two: Data Collection Utility
and CNN Cloning Design
The team learned the essentials of data collection
from the previous sprint. For sprint two, they cre-
ated a program that collects data from the rover hard-
ware stack. This includes writing code that interfaces
with the rover’s autopilot and the RealSense camera
streams. Steering and throttle data from the rover were captured and matched with imagery and depth data from the Intel RealSense camera. This data was stored
and then later used to train a CNN model. The team
was free to use the starter code we provided to interface
with the camera sensors and the rover’s autopilot (see
section 4.2).
The data collection program (i.e., the recorder)
works by waiting until the rover is armed via the re-
mote RC handset. Once armed, the program begins
data collection. The human operator may collect as
little or as much data as desired. Human operated
driving sessions lasted anywhere from 1 to 15 min-
utes. When the rover was disarmed via the RC hand-
set, the dataset was closed. A dataset consists of a
video file that contained both RGB and distance mea-
surement streams, a data file, and a log file. The
data file temporally matched steering, throttle, ground
speed, and heading information with each frame in
the video file. Note that only steering and throttle are
currently used in MAAP, leaving the other variables
available for future use with more complex models.
The log file recorded autopilot status, recording time,
sensor status, and any exceptions that might occur
during the session.
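A minimal sketch of such a recorder loop is shown below. The serial port, RC channel mapping (steering on channel 1, throttle on channel 3), and output file layout are assumptions; only the arm/record/disarm pattern reflects the description above.

```python
# A minimal sketch (assumed port, channel mapping, and file layout) of the
# recorder: wait for the autopilot to report "armed", then log time-aligned
# RealSense frames and RC steering/throttle until the rover is disarmed.
import csv, time
import numpy as np
import pyrealsense2 as rs
from pymavlink import mavutil

autopilot = mavutil.mavlink_connection("/dev/ttyACM0", baud=115200)  # assumed port
autopilot.wait_heartbeat()

pipe, cfg = rs.pipeline(), rs.config()
cfg.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 15)
cfg.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 15)
pipe.start(cfg)

def rc_inputs():
    """Return the latest raw steering/throttle PWM values (ch1/ch3 assumed)."""
    msg = autopilot.recv_match(type="RC_CHANNELS", blocking=True, timeout=1)
    return (msg.chan1_raw, msg.chan3_raw) if msg else (1500, 1500)

# Block until the operator arms the rover from the RC handset.
while not autopilot.motors_armed():
    autopilot.recv_match(type="HEARTBEAT", blocking=True)

with open("session.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t", "steering_pwm", "throttle_pwm"])
    while autopilot.motors_armed():                  # disarming closes the dataset
        frames = pipe.wait_for_frames()
        color = np.asanyarray(frames.get_color_frame().get_data())
        depth = np.asanyarray(frames.get_depth_frame().get_data())
        steer, throttle = rc_inputs()
        writer.writerow([time.time(), steer, throttle])
        # color/depth frames would be appended to the session's video file here
        autopilot.recv_match(type="HEARTBEAT", blocking=False)  # refresh armed state
pipe.stop()
```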
Simultaneously, the team began training the initial
CNN model for behavioral cloning. The instructor di-
rected the team to begin with a modified version of the
CNN model introduced in (Bojarski et al., 2017). The
team built this model using Keras with TensorFlow as
the back end. In this project, the team created a more
complex model than what is presented in the Bojarski
paper. The key differences are as follows:
• Input: the original model accepts three frames from center, right, and left cameras at the same point in time, with steering angle as a label during training. Our model accepts a single RGB frame from a centered camera stacked with pixel-aligned gray-scale depth sensor data, using linear throttle and steering numbers as labels. The resultant CNN is a multi-output model that takes throttle into account as well as steering.
• Output: the original model renders a single value representing steering angle, while our modified regression model provides predictions for both steering and throttle control. An illustrative sketch of this modified architecture follows.
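The sketch below is illustrative only; the exact layer sizes and input resolution are assumptions, and only the two-headed regression structure reflects the modification described above.

```python
# An illustrative sketch (assumed layer sizes and 120x160 input) of the modified
# Bojarski-style cloning network: one RGB frame stacked with pixel-aligned
# depth (4 channels) in, steering and throttle regression values out.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cloning_model(input_shape=(120, 160, 4)):
    frames = layers.Input(shape=input_shape, name="rgb_plus_depth")
    x = layers.Conv2D(24, 5, strides=2, activation="elu")(frames)
    x = layers.Conv2D(36, 5, strides=2, activation="elu")(x)
    x = layers.Conv2D(48, 5, strides=2, activation="elu")(x)
    x = layers.Conv2D(64, 3, activation="elu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(100, activation="elu")(x)
    x = layers.Dense(50, activation="elu")(x)
    # Two regression heads instead of the original single steering output.
    steering = layers.Dense(1, name="steering")(x)
    throttle = layers.Dense(1, name="throttle")(x)
    model = Model(frames, [steering, throttle])
    model.compile(optimizer="adam", loss={"steering": "mse", "throttle": "mse"})
    return model
```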
Artifacts from this sprint included (1) a utility pro-
gram to collect training data, from the rover, for use
in training the CNN cloning model, and (2) an initial
CNN ready to train with data. The data collection and
model training started in the following sprint.
5.3 Sprint Three: Training Data and
CNN Training and Experimentation
The team began data collection, utilizing the utility program developed in Sprint Two, and started training their initial CNN model. Data collection was time-consuming. We chose to collect driving data from two
of six floors in a large building, reserving the remain-
ing floors for performance evaluation. To introduce
diversity into the training data, each student on the
team was tasked with collecting a total of two hours
of driving data during this sprint. Students used the utility program to record driving maneuvers as they manually guided the rover around the building.
5.3.1 Driving Sessions
Several types of driving sessions were collected at various times of the day to account for natural light, artificial and overhead lighting, light glare, and low-light conditions. We identified four
categories of driving sessions for which we collected
data:
1. Smooth forward driving in empty, clear hallways
around the building in clockwise and counterclockwise fashion. The rover was kept more or less to the right side as it traversed the hallways and around corners. Some sessions had the throttle locked between a low and high setting. Throttle and steering were varied, but driving was performed with relatively smooth steering and throttle input, i.e., no erratic maneuvers. Throttle was
generally increased when driving in a straight line
with no objects in sight; throttle was lowered
when coming close to a wall or corner at the end
of a hallway.
2. Object avoidance sessions that involved driving
around various inanimate/static objects and ob-
stacles that are set in random areas throughout
the hallways. Obstacles might be tables, chairs,
boxes, trashcans, maintenance equipment, doors
that open out into the hallways, litter, and station-
ary people who are either standing or sitting at
different angles. The path of least resistance was taken, but in the case where an object might be in the center of the hallway, the drivers were asked to vary left and right turns when avoiding those objects so as to avoid introducing a right-hand or left-hand bias. The drivers were also asked to vary when they began maneuvering to avoid an object. In some instances a driver began maneuvering well ahead of time, while in other instances the driver maneuvered at the last moment to avoid an object. Throttle varied, but it always decreased when approaching an object.
3. Object avoidance sessions involving moving ob-
jects such as people, carts, trash cans, and rolling
chairs. Throttle and steering were utilized to avoid
colliding with moving objects. In the case where
a person or object is moving towards the rover, the
driver would decrease throttle and change course.
If the object also corrects and continues to ap-
proach the rover, the driver would come to a stop
or even reverse the throttle to begin a backwards
movement. Since there were no rear sensors,
backup throttle was only used for a short duration while the driver attempted to take the first forward path that avoided the object altogether.
4. Entering adjacent rooms where entrance is fea-
sible. Drivers turned into the closest doorway
where the door was open and the path was clear.
Throttle was reduced considerably as the rover
was driven past the threshold and further into the
room. Gradually the throttle increased according
to the condition of the room (i.e. how much clutter
is present). Rovers were driven around and under
desks and chairs and around any other obstacles
present in the room. Once the room was traversed,
the driver maneuvered the rover back into the hall-
way.
All sessions were stored and labeled according to
the driving session types mentioned above. All orig-
inal log and label files were retained, along with the full-resolution sensor and video streams, in rosbag format (a Robot Operating System (ROS) file format
for storing ROS message data).
5.3.2 CNN Training
Training began before the entire data collection pro-
cess was complete. Small subsets of driving session type 1 data (see Section 5.3.1) were used to begin initial CNN training near the beginning of Sprint Three.
The training data was further prepared by creating and running a special utility program that rips individual frames from the video streams, resamples and reprocesses the images, and labels the frames with steering and throttle numbers from the matching data file. In addition, an inline data generator was cre-
ated to wrap the raw data. The data generator was
used to randomize and split data into training, valida-
tion, and test splits as well as to feed mini batches to
the model upon request as it trains.
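A hedged sketch of such a generator, assuming the ripped frames are stored as NumPy arrays on disk and the labels are ordered to match, might look like the following.

```python
# A minimal sketch (assumed .npy frame files) of the inline data generator:
# wraps the ripped frames and their steering/throttle labels, shuffles each
# epoch, and serves mini-batches to model.fit() on request.
import numpy as np
import tensorflow as tf

class DrivingSequence(tf.keras.utils.Sequence):
    def __init__(self, frame_paths, labels, batch_size=32, shuffle=True):
        self.frame_paths = np.asarray(frame_paths)           # paths to pre-processed frames
        self.labels = np.asarray(labels, dtype=np.float32)   # [steering, throttle] per frame
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.indices = np.arange(len(self.frame_paths))
        self.on_epoch_end()

    def __len__(self):
        return len(self.frame_paths) // self.batch_size

    def __getitem__(self, idx):
        batch = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        x = np.stack([np.load(p) for p in self.frame_paths[batch]])  # stacked RGB+depth arrays
        y = self.labels[batch]
        return x, {"steering": y[:, 0], "throttle": y[:, 1]}

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indices)   # re-randomize order between epochs
```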
The training scripts saved the model only if the
validation loss improved over the previous epoch. Ini-
tial training began over a maximum of 50 epochs,
with performance flattening between 17 and 20
epochs. The training script was designed to resume training as new data arrived by de-serializing the previously trained best model.
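This save-only-on-improvement and resume behavior maps naturally onto a Keras ModelCheckpoint callback; a short sketch under an assumed file name follows.

```python
# A hedged sketch of the checkpoint-and-resume behavior: keep the weights only
# when validation loss improves, and reload the best model when new data arrives.
import os
import tensorflow as tf

BEST_MODEL = "best_cloning_model.h5"   # assumed file name

def train(train_seq, val_seq, build_fn, epochs=50):
    model = (tf.keras.models.load_model(BEST_MODEL)
             if os.path.exists(BEST_MODEL) else build_fn())
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        BEST_MODEL, monitor="val_loss", save_best_only=True, verbose=1)
    model.fit(train_seq, validation_data=val_seq,
              epochs=epochs, callbacks=[checkpoint])
    return model
```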
Finally, a control program was constructed to read
camera feeds in real time while feeding them into
the newly trained CNN. The raw CNN output was
de-normalized and passed directly into the autopilot
channel feeds for throttle and steering, overriding nor-
mal function. This program ran in a continuous loop
at 15 frames per second, constantly plugging in the
cloning model’s actions to the rover’s autopilot.
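A sketch of that loop appears below; the de-normalization range, the RC channel assignments, and the frame-grabbing helper are assumptions or hypothetical stand-ins, not the students' exact code.

```python
# A minimal sketch (channel mapping is an assumption) of the control loop: grab
# a pre-processed RGB+depth stack, run the cloning model, de-normalize its
# outputs back to PWM, and push them to the autopilot as RC channel overrides.
# The loop runs as fast as the 15 fps camera stream allows.
import numpy as np

def denormalize(value, low=1000, high=2000):
    """Map a model output in [0, 1] back to a PWM command."""
    return int(low + float(value) * (high - low))

def control_loop(model, get_stacked_frame, autopilot):
    while True:
        frame = get_stacked_frame()                  # hypothetical camera pre-processing helper
        steering, throttle = model.predict(frame[np.newaxis], verbose=0)
        autopilot.mav.rc_channels_override_send(
            autopilot.target_system, autopilot.target_component,
            denormalize(steering[0, 0]),             # ch1: steering (assumed)
            0,                                       # 0 releases a channel back to the RC radio
            denormalize(throttle[0, 0]),             # ch3: throttle (assumed)
            0, 0, 0, 0, 0)
```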
Artifacts included a demonstration of a rover that could navigate a single hallway, though not very well. Even though the rover could navigate simple hallways and corners only 30%-50% of the time, it was an exciting moment as the team observed the rover “magically” roaming the hallways and largely avoiding walls and objects.
5.4 Sprint Four: Model
Experimentation and Shortcomings
The team was guided towards the discovery of several
major improvements to the training process. First,
the numerical range for both steering and throttle is
between 1000 and 2000 with 1500 representing the
“center” or neutral. For steering input, anything less than 1500 represents a value towards a left-hand turn and anything over 1500 represents a value towards a right-hand turn. Similarly, for throttle inputs,
anything under 1500 represents backwards movement
and anything over 1500 represents forward motion.
Normalizing these variables using min-max normalization maps the range to [0, 1]. This
helped with decreasing the time it took for the CNN
to converge during training. Furthermore, the learn-
ing curves were smoother and showed less “bounc-
ing” than before applying normalization. In addition,
the team discovered that training using small, ran-
domized frame sequences of 13 frames per sequence
markedly improved overall performance; the rover’s
steering and throttle stability improved along with its
ability to smoothly round corners. Another discovery of note: experimentation provided evidence that randomized sequences fed to a CNN may yield better performance than more complex Long Short-Term Memory (LSTM) models.
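The two data-side changes can be summarized in a few lines; the sequence-sampling parameters other than the 13-frame length are assumptions.

```python
# A small sketch of the data-side changes: min-max scaling of the 1000-2000 PWM
# range to [0, 1], and sampling short randomized sub-sequences (13 frames here)
# from each driving session for training.
import numpy as np

def minmax_pwm(pwm, low=1000.0, high=2000.0):
    """Scale raw PWM values into [0, 1] for smoother, faster convergence."""
    return (np.asarray(pwm, dtype=np.float32) - low) / (high - low)

def sample_sequences(frames, labels, seq_len=13, n_samples=256, rng=None):
    """Draw randomized fixed-length sub-sequences from one driving session."""
    rng = rng or np.random.default_rng()
    starts = rng.integers(0, len(frames) - seq_len, size=n_samples)
    return ([frames[s:s + seq_len] for s in starts],
            [labels[s:s + seq_len] for s in starts])
```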
The team experimented with various deep models
until they discovered models that demonstrated poten-
tial for good performance. The artifacts for this sprint
were: (1) a discussion of model performance and (2)
demonstrated improvement using a behavioral clone
model. The rover was better able to negotiate cor-
ners, showing marked all-around improvements. In fact, the rover was able to get out of most dead-end situations, navigate around most objects, and even slow down or stop near people, mimicking extra caution around pedestrians. The performance was
impressive as well as inspiring to the students. How-
ever, the team soon discovered that a blind cloning
effort can only take performance so far before weak-
ness begins to show.
5.5 Sprint Five: Improved Performance
and Need for Reinforcement
Learning
Despite the impressive performance gains in the pre-
vious sprint, the team began to understand that a sin-
gle CNN cloning model simply could not provide the
functionality MAAP requires. This is due to the model’s overly simple action (i.e., output) policy. The model
was only capable of learning a distribution of actions
over a finite training set. The model had no way
(except through inherent bias in the training data) of learning when it was better to turn left versus turning right, or when it was more beneficial to back up versus performing an in-place 360 (remember, this is a skid-steer rover).
This is where behavioral cloning alone begins to show
its limitations.
The instructor discussed concepts related to trans-
fer learning (domain adaptation) coupled with re-
inforcement learning models best suited for learn-
ing policies over a continuous action space. We
pointed the team towards Actor-Critic models, specif-
ically Advantage Actor-Critic (A2C), coupling it with
the now much-improved, pre-trained Bojarski CNN
model the students had been developing up to this
point. This opened the door to a host of potential per-
formance gains. The idea came from several sources
we provided as reference material for this project:
Deep Reinforcement Learning Hands-On; chapters 12
and 17 (Lapan, 2020), Reinforcement Learning: In-
dustrial Applications of Intelligent Agents; chapters
7 and 8 (Winder, 2020), and the paper Evaluation of
pre-training methods for deep reinforcement learning
(Larsson, 2018).
This sprint marked the most critical point in the
project, because it required a deeper understanding of
RL and IL as well as advanced skills with TensorFlow.
The instructor closely monitored and advised the team
through the initial approach and programming efforts.
The resulting model was an A2C architecture where
the actor and critic shared the same modified Bojarski
CNN. The discounted rewards were based on a sim-
ple calculation involving the distance of objects in the
field of view. A 2D Gaussian matrix was applied to
the depth sensor pixel values from the camera, creat-
ing a gradient scoring system where pixels of objects in the periphery were given a better score than objects in the forward center of view (objects in the center of view were penalized more). This scoring sys-
tem was designed to teach the model to always look
for the path of least resistance (i.e. facing where fewer
objects are in the way). This created a policy whereby
the model favors actions that result in the clearest path
ahead (see Figure 5). The maximum movement was restricted so that the rover would not turn completely around, abandoning a valid path ahead of it in favor of a better path behind it (i.e., a path it had most likely already traversed). The artifacts were the re-
lated Python code that contained the new A2C model
and the code required to train, test, and analyze the
new model.
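A hedged sketch of a reward of this shape is given below; the Gaussian width and maximum sensing range are assumptions, but the structure mirrors the description and Figure 5: closeness weighted by a centered 2D Gaussian is subtracted from the maximum attainable score.

```python
# A hedged sketch (assumed sigma and range) of the depth-based reward: a 2D
# Gaussian centered on the image weights each pixel, so near objects in the
# forward center of view cost the most and near objects in the periphery cost
# little.
import numpy as np

def gaussian_weight(height, width, sigma_frac=0.25):
    """2D Gaussian centered on the image; peak weight at the forward center of view."""
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = height * sigma_frac, width * sigma_frac
    return np.exp(-(((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2) / 2.0)

def clear_path_reward(depth_m, max_range_m=6.0):
    """Higher reward when the center of the field of view is free of close objects."""
    weights = gaussian_weight(*depth_m.shape)
    closeness = np.clip(1.0 - depth_m / max_range_m, 0.0, 1.0)   # 1 = touching, 0 = far
    penalty = np.sum(weights * closeness)                        # close + central costs most
    return float(np.sum(weights) - penalty)                      # subtract from the maximum
```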
5.6 Sprint Six: Continued RL Work
Eventually a model emerged that would normally re-
quire thousands of training sessions using a virtual
environment before it would be able to do anything
useful. However, the team came up with a way to
pre-train the new A2C model using training data that
was collected during Sprint Three. The training was
not optimal because the model was not allowed to
explore during these training sessions; however, the
training was adequate for a bootstrap where trans-
fer learning techniques could be used to optimize the
model over additional incremental training sessions.
An additional Python script was created to facilitate
the rapid incremental training of the model. The pro-
gram waited until the rover was armed before starting
a new training session, and training required a human
user to operate the rover. Once the training session be-
gan, the rover would begin traversing the interior from the location where it was armed, using its new A2C model. Anytime the rover made contact with a wall or object, the user simply flipped the RC hand control switch to disarm the rover, thus ending the session. The human user could then direct the program
to immediately start training a new session by flipping
a switch on the RC controller. This vastly improved
the speed of data collection.
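A minimal sketch of that harness follows; `agent` and `get_observation` are hypothetical stand-ins for the team's A2C wrapper and camera pre-processing, and the reward reuses the clear-path function sketched earlier.

```python
# A minimal sketch of the arm/disarm training harness: arming the rover starts
# an episode, and disarming (e.g., after contact with a wall) ends it.
# `agent` and `get_observation` are hypothetical stand-ins.
def incremental_training(autopilot, agent, get_observation):
    while True:
        autopilot.motors_armed_wait()                # wait for the operator to arm
        trajectory = []
        while autopilot.motors_armed():
            obs = get_observation()                  # stacked RGB+depth frame
            action = agent.act(obs)                  # steering/throttle from the A2C policy
            reward = clear_path_reward(obs[..., 3])  # depth channel assumed last
            trajectory.append((obs, action, reward))
            autopilot.recv_match(type="HEARTBEAT", blocking=False)  # refresh armed state
        agent.update(trajectory)                     # one A2C update per session
```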
The results were encouraging. The rover was
able to traverse busy hallways and rooms right out
of the box in areas the model had never visited or
trained over beforehand. Incremental training was time-consuming, but improvements became noticeable after relatively little additional training. We did not have a vir-
tual environment to train the model over thousands
or even millions of sessions. This remains an open
agenda item for future iterations of MAAP.
5.7 Sprints Seven & Eight: Mission
Deployment
The A2C model could use more tweaking and train-
ing; however, there was no more time available to
continue that effort. By this time, we were addressing the project’s object recognition objectives by implementing bolt-on object recognition models. The team
settled on a pre-trained object recognition model
that they incrementally trained to recognize hazard signs (demonstrating the ability to locate mock hazards). The team also needed to figure out how the cloning model and the object recognition model could share the same GPU, and how the rover would autonomously scan for these hazards when the A2C model was deployed. Eventually, YOLO3 was chosen as a simple bolt-on to our MAAP pipeline. We used transfer learning on a pre-trained YOLO3 model that was trained with the help of ImageNet and a collection of labeled hazard signs.
Figure 5: The reward system for our A2C model consists of penalizing the closest objects as they approach the forward center of view, subtracting those points from the maximum points in the image. This is accomplished by applying an inverse gradient scale using a multivariate (2D) normal distribution against the distance value for each pixel as measured by the distance sensor in our camera. (a) Original image; (b) heat map of original image; (c) 2D Gaussian matrix; (d) result of inverse scaling (b) with (c).
Figure 6: The resultant A2C model incorporating the pre-trained behavioral clone model.
With the deadline for project deliverables looming, the team was unable to produce a solution that would allow the A2C model and the YOLO3 model to work on the same GPU, so YOLO was relegated to the CPU while the A2C model was kept on the GPU. Overall, the combined pipeline was able to function at a rate between 9 and 13 frames per second, a reasonable rate considering that no effort was put towards optimizing hardware performance.
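The device split can be expressed with TensorFlow device scopes; the model file names below are assumptions, and only the placement pattern is the point.

```python
# A hedged sketch of the device split: pin the object detector to the CPU and
# leave the A2C policy on the GPU. File names are assumptions.
import tensorflow as tf

with tf.device("/GPU:0"):
    policy_model = tf.keras.models.load_model("a2c_policy.h5")      # assumed file
with tf.device("/CPU:0"):
    detector_model = tf.keras.models.load_model("yolo3_hazard.h5")  # assumed file

def step(stacked_frame, rgb_frame):
    with tf.device("/GPU:0"):
        action = policy_model(stacked_frame[None, ...], training=False)
    with tf.device("/CPU:0"):
        detections = detector_model(rgb_frame[None, ...], training=False)
    return action, detections
```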
The artifacts for the final two sprints were a fi-
nal presentation to faculty members who were not in-
volved with the project, a presentation at a local AI
event, and a final demonstration of the hazard recog-
nition mission to the project mentor. The mission was
successful in that rover was able to locate 7 out of 9
easy-to-find hazard signs. The search routine was not
optimal in that the program simply issued a command
to the rover to perform a slow five-second in-place
360 turn to scan for hazard signs. This action was per-
formed at one-minute intervals. Overall, the bolt-on
object recognition portion of the project was rushed,
but provided evidence that the students’ approach to
MAAP was plausible and had potential to scale for
many real-world uses.
6 CONCLUSIONS AND FUTURE
WORK
Through our MAAP capstone program, students were
able to participate in developing an end-to-end so-
lution involving advanced applied ML techniques at
USAFA. Each student gained invaluable insight as
well as improved ML skills, technical skills, and team-building skills that employers would consider above average for newly minted undergraduate Computer Science majors. MAAP is a highly engaging program
that allows students to experience how ML is de-
signed, tested, improved, and applied in a real-world
scenario outside of a simulated environment.
We feel that this program can be adapted to
other robot modalities such as prosthetic limbs, aerial
drones, and legged pack drones. Future plans in-
clude incorporating additional sensors and providing
more interesting bolt-on missions and objectives for
MAAP. We also plan to publish our data collections for use in related research, as well as to provide a publication covering the teaching materials for MAAP for use by other higher education institutions.
REFERENCES
Akbilgic, O. and Davis, R. L. (2019). The promise of ma-
chine learning: When will it be delivered?
Ardupilot (2022). The cube overview.
https://ardupilot.org/copter/docs/common-thecube-
overview.html.
Azari, A. R., Lockhart, J. W., Liemohn, M. W., and Jia, X.
(2020). Incorporating physical knowledge into ma-
chine learning for planetary space physics. Frontiers
in Astronomy and Space Sciences, 7:36.
Bojarski, M., Del Testa, D., Dworakowski, D., Firner,
B., Flepp, B., Goyal, P., Jackel, L. D., Monfort,
M., Muller, U., Zhang, J., et al. (2016). End to
end learning for self-driving cars. arXiv preprint
arXiv:1604.07316.
Bojarski, M., Yeres, P., Choromanska, A., Choromanski,
K., Firner, B., Jackel, L., and Muller, U. (2017).
Explaining how a deep neural network trained with
end-to-end learning steers a car. arXiv preprint
arXiv:1704.07911.
Dominique Luna, Andrew Bauman, et al. (2021). Udacity Behavioral Cloning Project.
Farjo, P. D. and Sengupta, P. P. (2021). Ecg for screen-
ing cardiac abnormalities: The premise and promise
of machine learning.
Fraley, J. B. and Cannady, J. (2017). The promise of
machine learning in cybersecurity. In SoutheastCon
2017, pages 1–6. IEEE.
Géron, A. (2019). Hands-on machine learning with Scikit-
Learn, Keras, and TensorFlow: Concepts, tools, and
techniques to build intelligent systems. O’Reilly Me-
dia.
Hussein, A., Gaber, M. M., Elyan, E., and Jayne, C.
(2017a). Imitation learning: A survey of learning
methods. ACM Comput. Surv., 50(2).
Hussein, A., Gaber, M. M., Elyan, E., and Jayne, C.
(2017b). Imitation learning: A survey of learning
methods. ACM Comput. Surv., 50(2).
Intel (2021). Introducing the intel® realsense™ depth
camera d455. https://www.intelrealsense.com/depth-
camera-d455/.
JetBrains (2022). The python ide for professional develop-
ers. https://www.jetbrains.com/pycharm/.
Kapoor, A. and Gardner-McCune, C. (2020). Exploring the
participation of cs undergraduate students in industry
internships. In Proceedings of the 51st ACM Technical
Symposium on Computer Science Education, pages
1103–1109.
Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab,
A. A., Yogamani, S., and Pérez, P. (2021). Deep rein-
forcement learning for autonomous driving: A survey.
IEEE Transactions on Intelligent Transportation Sys-
tems.
Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bus-
sonnier, M., Frederic, J., Kelley, K., Hamrick, J.,
Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S.,
and Willing, C. (2016). Jupyter notebooks a publish-
ing format for reproducible computational workflows.
In Loizides, F. and Schmidt, B., editors, Positioning
and Power in Academic Publishing: Players, Agents
and Agendas, pages 87 – 90. IOS Press.
Lapan, M. (2020). Deep Reinforcement Learning Hands-
On: Apply modern RL methods to practical problems
of chatbots, robotics, discrete optimization, web au-
tomation, and more, 2nd Edition. Packt Publishing.
Larsson, E. (2018). Evaluation of pretraining methods for
deep reinforcement learning.
Lürig, M. D., Donoughe, S., Svensson, E. I., Porto, A., and
Tsuboi, M. (2021). Computer vision, machine learn-
ing, and the promise of phenomics in ecology and evo-
lutionary biology. Frontiers in Ecology and Evolution,
9:148.
Ly, A. O. and Akhloufi, M. (2020). Learning to drive
by imitation: An overview of deep behavior cloning
methods. IEEE Transactions on Intelligent Vehicles,
6(2):195–209.
McGunagle, D. and Zizka, L. (2020). Employability skills
for 21st-century stem students: the employers’ per-
spective. Higher Education, Skills and Work-Based
Learning.
NVIDIA (2022). Nvidia jetson xavier nx for embedded
& edge systems. https://www.nvidia.com/en-
us/autonomous-machines/embedded-systems/jetson-
xavier-nx/.
Osa, T., Pajarinen, J., Neumann, G., Bagnell, J. A., Abbeel,
P., and Peters, J. (2018). An algorithmic perspective
on imitation learning. Foundations and Trends® in
Robotics, 7(1-2):1–179.
Planch, B. (2019). Hands-On Computer Vision with Ten-
sorFlow 2: Leverage deep learning to create power-
ful image processing apps with TensorFlow 2.0 and
Keras. Packt Publishing.
PowerBox Systems (2022). Radio sys-
tem core. https://www.powerbox-
systems.com/en/products/radio-
system/fernsteuersystem/radio-system-core.html.
Python Software Foundation (2022). Python package index.
https://pypi.org/.
Rutherford, S. (2020). The promise of machine learning for
psychiatry. Biological Psychiatry, 88(11):e53–e55.
Sanner, M. F. et al. (1999). Python: a programming lan-
guage for software integration and development. J
Mol Graph Model, 17(1):57–61.
Schultz, L. (2014). Readability analysis of programming
textbooks: Traditional textbook or trade book? In
Proceedings of the Information Systems Educators
Conference ISSN, volume 2167, page 1435.
Shapiro, R. B., Fiebrink, R., and Norvig, P. (2018). How
machine learning impacts the undergraduate com-
puting curriculum. Communications of the ACM,
61(11):27–29.
Smith, K. N. and Green, D. K. (2021). Employer internship
recruiting on college campuses: ‘the right pipeline for
our funnel’. Journal of Education and Work, 0(0):1–
18.
Smolkin, L. B., McTigue, E. M., and Yeh, Y.-f. Y. (2013).
Searching for explanations in science trade books:
What can we learn from coh-metrix? International
Journal of Science Education, 35(8):1367–1384.
Sulmont, E., Patitsas, E., and Cooperstock, J. R. (2019).
What is hard about teaching machine learning to non-
majors? insights from classifying instructors’ learning
goals. ACM Transactions on Computing Education
(TOCE), 19(4):1–16.
Tai, L. and Liu, M. (2016). Mobile robots exploration
through cnn-based reinforcement learning. Robotics
and biomimetics, 3(1):1–8.
Toole, A. A., Pairolero, N. A., Forman, J. Q., and Giczy,
A. V. (2019). The promise of machine learning
for patent landscaping. Santa Clara High Tech. LJ,
36:433.
Torabi, F., Warnell, G., and Stone, P. (2019). Recent ad-
vances in imitation learning from observation. arXiv
preprint arXiv:1905.13566.
Trister, A. D., Buist, D. S., and Lee, C. I. (2017). Will ma-
chine learning tip the balance in breast cancer screen-
ing? JAMA oncology, 3(11):1463–1464.
Valstar, S., Krause-Levy, S., Macedo, A., Griswold, W. G.,
and Porter, L. (2020). Faculty views on the goals of an
undergraduate cs education and the academia-industry
gap. In Proceedings of the 51st ACM Technical Sym-
posium on Computer Science Education, pages 577–
583.
von Lilienfeld, O. A. and Burke, K. (2020). Retrospective
on a decade of machine learning for chemical discov-
ery. Nature communications, 11(1):1–4.
Winder, P. (2020). Reinforcement Learning: Industrial Ap-
plications of Intelligent Agents. O’Reilly Media.