Optical Character Recognition Based-On System for Automated

Software Testing

D. Abbas and J. I. Olszewska

School of Computing and Engineering, University of the West of Scotland, U.K.

Keywords:

Intelligent Systems, Autonomous Systems, Trustworthy Artiﬁcial Intelligence, Expert Systems, Software

Robots, Automated Software Testing, Machine Learning, Optical Character Recognition, Computer Vision.

Abstract:

The paper presents the development and deployment of an artiﬁcial intelligence (AI) test automation frame-

work that allows testers to more ﬂuidly develop scripts and carry out their day-to-day tasks. In particular, the

framework aims to speed up the test automation process by enabling its users to locate elements on a webpage

through the use of template-matching-based image recognition as well as optical character recognition (OCR).

Indeed, test automation specialists spend much of their time creating page-object models (POMs), where they

capture elements on the screen via complex locators such as cascading style sheet (CSS) or XPath. How-

ever, when webpages are updated or elements are moved around, locators become void, eventually pointing

to nothing unless written in such a dynamic way as to prevent this. This heavily relies on developers provid-

ing meaningful tags to elements that they can then be located by, whereas with the introduction of an image

recognition engine in our AI framework, this tedious and long-winded approach has been shortened.

1 INTRODUCTION

As advancements are made in technology, the ap-

proaches and methodologies to accurately test such

technologies must also evolve (Black et al., 2022).

Indeed, reliable software testing is required to allow

trustworthy autonomous systems, multi-agent sys-

tems, and/or robotic systems to evolve close by and/or

interact with humans such as companion robots in

assistive-living environments; autonomous vehicles

in smart cities; or cloud robotics systems in smart

manufacturing (Olszewska, 2020).

While the goal of testing is mainly to verify the

quality, performance, or reliability of whatever is

tested, testing in the modern age is a complex and

nuanced field that comprises over a dozen

different types of testing (IEEE, 2021). These

can be broadly split into functional testing, such as

regression testing (Long, 1993) to catch a large class of

bugs quickly and efﬁciently, and non-functional test-

ing such as usability testing, and especially pattern-

based usability testing (Dias and Paiva, 2017) to test

usability guidelines (or best practices) by deﬁning

generic test strategies (i.e. test patterns) in order to

allow testing usability aspects on web applications.

Even though manual testing is an integral part of

the testing process and includes the development of

a test strategy, test plan, test cases and test scripts

(Alferidah and Ahmed, 2020), automated testing is

required to cope with the pace of the software’s

continuous integration/continuous delivery (CI/CD)

pipeline that streamlines the development and test-

ing process within a software development life-cycle

(SDLC) and aims to automatically ﬁre off automated

regression tests upon deployment (Chowdhury, A. R.,

2023).

Whilst the idea of test automation is straightfor-

ward, the implementation and integration of auto-

mated software testing into an SDLC can be an in-

tricate process.

So to overcome this aspect, a test automation

framework provides rules, guidelines, and tools that

the user can utilise to write test scripts (Celik et al.,

2017). It can also be seen as a structure that pro-

vides an environment where automated test scripts

can be executed (Chowdhury, A. R., 2023). Some

of the major components of test automation frame-

works usually consist of the test data management and

testing libraries, including unit testing, integration

testing and behaviour-driven development (Chowd-

hury, A. R., 2023). It is worth noting that there are

many types of test automation frameworks (Chowd-

hury, A. R., 2023), e.g. modular testing-, data-driven

testing-, keywords-driven testing-, hybrid testing-,

Abbas, D. and Olszewska, J.

Optical Character Recognition Based-On System for Automated Software Testing.

DOI: 10.5220/0012740000003690

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 26th International Conference on Enterprise Information Systems (ICEIS 2024) - Volume 1, pages 894-906

ISBN: 978-989-758-692-7; ISSN: 2184-4992

or behaviour-driven development (BDD) framework.

Hence, test automation can ultimately be imple-

mented in a variety of ways via different techniques,

ranging from simple capture and replay to

more sophisticated ones such as keyword-

and process-driven approaches (Gafurov et al., 2018).

In particular, the behaviour-driven development

(BDD) framework (Knight, A., 2017), which extends

the test-driven development (TDD) approach (She-

shasaayee and Banumathi, 2018) by focusing on user

requirements and expectations as well as enabling

collaboration and automation, can be integrated into

DevOps (Gohil et al., 2011) and further software qual-

ity veriﬁcation (Cavalcante and Sales, 2018). For

this purpose, BDD consists in writing requirements

in a structured and testable format (i.e. feature ﬁles

which are used to describe test scenarios in a structured

natural language (Yang et al., 2019) and step

deﬁnitions which are abstractions representing the el-

ements in a scenario such as contexts, events, and ac-

tions (Solis and Wang, 2011), with each step in the

scenario associated with a corresponding step imple-

mentation function in the underlying programming

language (Storer and Bob, 2019)) that can be evalu-

ated to ensure compliance with the expected behavior

(Farooq et al., 2023).

On the other hand, several tools have been devel-

oped for the test automation with old ones such as

QuickTest Professional (QTP) (Wang and He, 2014)

with keyword-driven methodology, which is a script-

ing technique that uses data ﬁles to contain the key-

word related to the system under test (SUT) (Hamil-

ton, T., 2023), used for the functional testing and re-

gression testing (Lenka et al., 2018). Newer tools

include the development of solutions like Selenium,

which is an open-source, highly customisable, cross-browser,

web-testing automation framework (Ramya

et al., 2017). The most recent tools are AI-assisted

and AI-driven tools like Applitools (Calantonio, J.,

2023) and may involve visual graphical user inter-

face (GUI) testing capabilities (Alferidah and Ahmed,

2020).

It is worth noting that the decision to automate

is the ﬁrst step in the automation testing life cycle

methodology (ATLM), and not all projects will re-

quire automation nor will meet the correct criteria

for automation (Borjesson, 2012). Actually, test au-

tomation is the automation of tests that have already

been run and veriﬁed to be working correctly. It is

a step out of the typical software testing life cycle

(STLC) (Hourani et al., 2019). So, test automation

is carried out after the STLC has completed, and can

only be considered if pre-requisites such as product

owner (PO) interest and return on investment (ROI)

on test automation have been achieved, because test

automation requires a substantial initial investment

and the beneﬁts of the seeds of automation only be-

gin to sprout after several weeks and months.

Therefore, test automation can go from no au-

tomation (i.e. manual testing), to automated test-

ing and beyond, i.e. to self-managed, self-optimized

testing, or even to autonomous testing such as self-

testing, self-healing or self-repair (Eldh, 2020). For

these latter stages of automation, using AI for soft-

ware testing has a lot of potential and may improve

quality assurance (Hourani et al., 2019).

In particular, AI-assisted test automation, which

is also referred to as AI-driven testing, is concerned

with the use of AI/ML technologies in the perfor-

mance of automated testing activities (King et al.,

2019), and implementing AI/ML can help among oth-

ers to streamline the test automation process and offer

functionalities such as test case creation models, mak-

ing the software testing even more efﬁcient and bugs

easier to catch (Drugeot, C., 2020). Furthermore,

machine learning (ML) classifiers can help to predict

defective software modules, most notably in safety-

critical systems (Moreira Nascimento et al., 2019).

Moreover, the use of computer-vision-based tech-

niques can lead to robotic process automation (RPA)

(Yatskiv et al., 2020) or software robots for test au-

tomation (Gao et al., 2019).

Indeed, since automated software testing needs

data (Zhu, 2018), and considering that ‘the source

code is data, and the screens, websites, databases, in-

put and output are just data’ (Hourani et al., 2019),

machine learning and computer vision techniques

such as template matching and optical character

recognition (OCR) (Olszewska, 2015) can help col-

lect data in the form of images for application screens

(Yu et al., 2019) and widgets (Qian et al., 2023) and

manage such data (Amershi et al., 2019) in the context

of test automation (King et al., 2019). In particu-

lar, template matching is a technique in digital im-

age processing for ﬁnding small parts of an image

which match a template image (Kalina and Golo-

vanov, 2019), while OCR is a study of digital im-

age processing for extracting alphanumeric data from

images through pre-processing, segmentation, feature

extraction, and recognition (Hananto et al., 2023).

Thence, visual GUI testing (Borjesson,

2012), which is also known as visual validation

testing (Borjesson and Feldt, 2012), is conducted with

tool support that uses such image recognition algo-

rithms and automated scripts to perform tests through

GUI interaction. GUI interaction works on the highest

level of system abstraction and allows the technique to

emulate end-user behaviour to automate complex user

scenarios (Wheeler and Olszewska, 2022). User sce-

narios can therefore, with this technique, perceivably

be executed faster, with higher frequency, at lower

cost, with gained quality, etc. (Leotta et al., 2013).

So in this study, we utilise BDD techniques to or-

ganise test suites, develop readable test scenarios, and

conﬁgure and run test runs. This is designed to work

in tandem with the AI/ML aspect of the framework

enhanced by computer-vision techniques for the OCR

and template matching algorithms, in order to provide

an automation engineer with a complete experience.

Hence, this work aims to deliver an AI-assisted

test automation framework that leverages such

technologies to enhance the test automation experience,

with the goal of minimising the effort

of building a test automation framework and

collecting data.

On the other hand, robust test execution is another

goal the developed framework in this work has sought

to achieve; the framework provides this level of test

robustness using a two-fold approach. This mixes tra-

ditional as well as visual GUI testing elements to pro-

vide a more full-bodied experience capable of discov-

ering bugs at the GUI level as well as at the document-

object-model (DOM) level.

Besides, the AI-based automated testing frame-

work has been developed following the D7-R4 ap-

proach (Olszewska, 2019).

The paper is structured as follows. In Section 2,

we present the developed AI-based automated test-

ing framework, while Section 3 compares productiv-

ity across various test automation scenarios most ex-

perienced by software testers. Conclusions are drawn

up in Section 4.

2 PROPOSED APPROACH

This section covers the various technologies and soft-

ware (Section 2.1) along with the testing methods and

features (Sections 2.2-2.3) at play within the devel-

oped framework (see Fig. 1) as well as the setting up

of the visual validation algorithms and testing (Sec-

tions 2.4-2.5).

2.1 Test Environment

In order to deliver the proposed framework (see

Fig. 1) in a working capacity, our framework has

been coded in Python (TechVidVan, 2023), using the

PyCharm integrated development environment (IDE),

along with Python’s built-in libraries and the NumPy

library, which supports large, multi-dimensional arrays and

matrices and provides a large collection of high-level

mathematical functions to operate on these arrays. The

development of our open-source system also involved

the use of test suite core libraries such as PyTest,

PyTest-BDD, and Selenium as well as a series of

computer-vision libraries such as PyTesseract and

OpenCV, as explained further in this section.

Figure 1: Developed framework architecture.

2.1.1 Selenium

Selenium is an open-source test automation library

that utilises a WebDriver to interact with browsers

in the context of web-based application testing and that

has seen widespread popularity within the industry in

recent years (Tanaka et al., 2020). The library pro-

vides basic functions such as clicking on an element

or selecting by index from a drop-down list among

others, allowing testers to simulate common activities

performed by end-users and to build regression packs

(Selenium, 2022).

In this work, it has been specifically chosen for

this reason, and because it allows us to utilize the Selenium

WebDriver, which, as the name suggests, is what

drives the tests. Indeed, the WebDriver communicates

directly with a browser and uses its native support

for automating the execution process of the test cases

(Ramya et al., 2017). Much like Selenium itself, Se-

lenium WebDriver can work on any browser given

that the respective driver has been developed, with all

major browsers being supported. It is worth adding

that for this project, the main web browser that all the

carried-out tests take place on is Google Chrome due

to its popularity and widespread use.

2.1.2 PyTest

PyTest is a Python testing framework that can be used

for various levels of testing, including unit tests, in-

tegration tests, end-to-end tests, and functional tests.

Its features include parametrized testing, ﬁxtures, and

assert re-writing (Hunt, 2023).

In the case of our system, PyTest is primarily used

for its ﬁxtures which are used to instantiate and yield

the various test managers that are used throughout the

framework. The reasoning for this is so that the user

does not ever need to instantiate the managers themselves

and that all of them will work alongside the

logging module to deliver seamless test logging and

aid reporting.

PyTest is also used to run and organise the tests,

which is done by starting the function’s name with

‘test’. This later integrates with PyTest-BDD to help

us locate our feature ﬁle, containing our scenarios

(tests); pull the steps from the scenario; locate the as-

sociated code; and run the code.

2.1.3 PyTest-BDD

PyTest-BDD is a PyTest add-on that implements

a subset of the Gherkin language for automating

project requirements testing and that enables

behaviour-driven development (BDD) within the

framework (Santos et al., 2022). Hence, it allows an

engineer to take a behaviour-centric approach to the

development of their scripts.

This approach has been taken primarily for two

reasons, namely, complexity abstraction (i.e. keep-

ing the complex code that carries out the tasks sepa-

rate from the steps of the test, enhancing readability

and understanding of the system as a whole) and ad-

vanced structuring (i.e. structuring the test suite in

such a way that complexity and code are modular and

relevant, which aids compartmentalisation and under-

standing of the framework).

The BDD element of the framework enables the

user to organise their tests via feature ﬁles, step def-

initions, and page-object models (POM) (i.e. helper

ﬁles that are a way of implementing and abstracting

the code that will run within the step deﬁnitions, by

modeling the web pages involved in the test process

as ‘objects’) (Leotta et al., 2013).
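
The page-object idea described above can be illustrated with a minimal sketch. It is not taken from the framework itself: the page, its XPath locators, and the `enter_data`/`click_element` method names are hypothetical, and a recording stand-in replaces the input manager so that the example runs without a browser.

```python
class SearchPage:
    """Page object for a hypothetical search page."""

    # Locators live in one place, so a page change means one edit here
    # rather than one edit per test step. Both XPaths are assumptions.
    SEARCH_BOX = "//input[@name='q']"
    SEARCH_BUTTON = "//button[@type='submit']"

    def __init__(self, input_manager):
        self.input = input_manager

    def search_for(self, term):
        """Type a search term and submit it."""
        self.input.enter_data(self.SEARCH_BOX, term)
        self.input.click_element(self.SEARCH_BUTTON)


class RecordingInputManager:
    """Stand-in for an input manager: records calls instead of driving a browser."""

    def __init__(self):
        self.actions = []

    def enter_data(self, xpath, text):
        self.actions.append(("enter", xpath, text))

    def click_element(self, xpath):
        self.actions.append(("click", xpath))


# A step definition can then stay a one-liner that reads like the scenario step.
manager = RecordingInputManager()
SearchPage(manager).search_for("optical character recognition")
```

With this split, a step definition only names the behaviour, while the page object owns the locators and the manager owns the browser interaction.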

2.1.4 PyTesseract

PyTesseract is the Python library derived from Tesser-

act (Zelic, F. and Sable, A., 2023) which is a very

well-known optical character recognition (OCR) tool

(Bugayong et al., 2022).

Indeed, Tesseract is an open-source project that

provides an OCR engine capable of advanced image

recognition in a variety of formats (Smith, 2007). The

steps that Tesseract takes to optically recognize

characters are, namely, word ﬁnding, line ﬁnding, and

character classiﬁcation. Word and line ﬁnding at-

tempts to locate the rough areas of text, which are then

organized into blobs (Zelic, F. and Sable, A., 2023).

These blobs are then broken down into words and

characters which the engine attempts to sequentially

recognize. Each successfully recognized word or

set of words is then used to further train the model (Zelic,

F. and Sable, A., 2023). Tesseract itself is capable

of recognizing more than 100 languages and can be

trained to recognize and interpret many more (Google

Open Source, 2021).

Tesseract was chosen for our work because

it is one of the most robust and popular OCR libraries

available. Moreover, it is open-source, meaning its

use and extension come at no charge.

In the case of our framework, the text is read from

the input image which is the screenshot of web ele-

ments that are provided by the Selenium WebDriver.

In practice, the screenshot is first processed, then fed

to the OCR engine, processed once more, and ﬁnally

text is then output. It is precisely this text that is com-

pared to the ‘text’ or ‘value’ attribute of that element.
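
The comparison step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the framework's own code: the function names and the normalisation rule are hypothetical, and the OCR engine is injected as a callable so that the example runs without Tesseract installed. Image pre-processing is omitted.

```python
import re


def normalise(text):
    """Collapse whitespace and ignore case so layout noise does not fail the check."""
    return re.sub(r"\s+", " ", text).strip().lower()


def validate_element_text(screenshot_bytes, dom_text, ocr):
    """Compare the text recognised in an element screenshot with its DOM text.

    `ocr` is any callable mapping image bytes to a string. In practice it could
    wrap pytesseract.image_to_string after decoding the bytes into an image,
    with the bytes coming from Selenium's WebElement.screenshot_as_png
    (both are assumptions about how the pieces would be wired together).
    """
    recognised = ocr(screenshot_bytes)
    # Post-process both sides identically before comparing them.
    return normalise(recognised) == normalise(dom_text)


def fake_ocr(image_bytes):
    """Hypothetical OCR stand-in returning noisy text, as an engine might."""
    return "  Sign   In\n"


result = validate_element_text(b"<png bytes>", "Sign in", fake_ocr)
```

Normalising both strings is a design choice: OCR output often carries trailing newlines or doubled spaces that are irrelevant to whether the rendered text matches the attribute.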

2.1.5 OpenCV

OpenCV is a comprehensive and open-source com-

puter vision and machine learning software library. It

contains more than 2,500 optimized algorithms and is

one of the most robust and popular options available

on the market (Zelic, F. and Sable, A., 2023).

The library provides several methods and func-

tionalities surrounding computer vision, which is the

basis of our visual validation testing and facilitates a

lot of the image handling, pre-processing, template

matching, and post-processing required for visual val-

idation testing.

In particular, OpenCV is used in the following

ways within our framework to deliver visual valida-

tion testing:

• Decoding images - this is necessary as part of

the pre-processing of images needed for template

matching;

• Colour conversion - this is another step in pre-

processing images for template matching;

• Reading ﬁles - OpenCV allows us to read images

and store them as variables and objects;

• Template matching - this allows us to pinpoint the

location of our template within a source image us-

ing various matching algorithms.

Hence, the use of OpenCV in our framework

contributes to providing added security and robustness to

the automated testing delivered by our system.

2.2 Test Managers

The framework essentially provides an automation

engineer with the necessary methods and tools from

which they can choose to carry out their

test automation efforts. However, to properly abstract

complexity within the framework and ease the process

of utilising it, it is necessary to develop the test

managers.

Figure 2: Test manager class diagram.

The test managers work in harmony to provide the

testers with what they might need; this is done by

splitting various functional areas into the managers

themselves. The test managers each deal with an inte-

gral area of test automation and were developed in the

following order: (1) manager base, (2) logging man-

ager, (3) input manager, (4) validation manager, and

(5) visual manager, as detailed in the remaining part

of this section.

The class diagram in Fig. 2 showcases the rela-

tionships between the test managers which are essen-

tially classes revolving around a certain layer of test

automation. The visual, input, and validation man-

agers inherit from the manager base which gives them

access to not only the parent methods but the driver

and logger that are used throughout the testing pro-

cess.

2.2.1 Manager Base

Manager Base acts as the base class for the test man-

agers. This allows them to all have access to the same

instance of the driver and logger that are both inte-

gral to the test automation. The Selenium WebDriver

is what is used to create the browser instance upon

which all the respective code is run, and the logger

simply gives access to the active logger object to en-

sure the output remains consistent.

As well as providing access to the driver and log-

ger objects, it provides the user with two basic meth-

ods, namely, getElement, which attempts to ﬁnd and

return the element that is passed in by using an XPath

locator, and check frame, which checks the current

frame and whether or not we are in the correct frame

before switching back to default. This is important for

when we are working with iframes which are essen-

tially windows inside windows that must be navigated

to before work can be done on the elements inside.
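
The inheritance relationship described above can be sketched as follows. This is a minimal sketch rather than the framework's implementation: the snake_case method names, the logger name, and the two example subclasses are assumptions, and the driver is expected to behave like a Selenium WebDriver, i.e. to expose `find_element(by, value)` and a `title` property.

```python
import logging

LOGGER_NAME = "framework"  # hypothetical logger name


class ManagerBase:
    """Base class giving every manager the same driver and logger."""

    def __init__(self, driver):
        self.driver = driver
        # logging.getLogger returns the same object for the same name,
        # so all managers share one logger.
        self.logger = logging.getLogger(LOGGER_NAME)

    def get_element(self, xpath):
        """Find and return an element via an XPath locator."""
        self.logger.info("Locating element: %s", xpath)
        # "xpath" is the string value of Selenium's By.XPATH constant.
        return self.driver.find_element("xpath", xpath)


class InputManager(ManagerBase):
    """Input commands; logging comes for free from the base class."""

    def click_element(self, xpath):
        self.get_element(xpath).click()
        self.logger.info("Clicked element: %s", xpath)


class ValidationManager(ManagerBase):
    """Validation helpers built on the shared driver."""

    def contains_title(self, expected):
        return expected in self.driver.title
```

Because each subclass receives the driver in its constructor and looks the logger up by name, a test never has to pass the logger around explicitly.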

2.2.2 Logging Manager

Logging Manager has no association whatsoever to

any of the other test manager classes, as can be seen

in the class diagram displayed in Fig. 2. This is purely

because the purpose of the logging manager class is

to initialise the logger object and apply the necessary

settings for the console output.

To properly output to the console on top of the

logger’s baseline capabilities, we need to set the

formatting and logging levels. Thence, the ini-

tialiseLogger() method ﬁrst creates the new logger

object, followed by setting the level, the name and

path of the eventual .log file as well as the

formatting of the output. Once the logger has been

initialised, we can reference it by utilising

logging.getLogger(NameOfLogger). This references

Python’s internal logging library, which allows us to

access the same instance of the logger throughout

the project and various classes and ﬁles. The above

method is in the manager base, which as aforemen-

tioned is used as the base class for all the test man-

agers - excluding the logging manager. This is pre-

cisely how each of the managers can utilize the same

logger object for logging and reporting.
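
The initialisation sequence described above can be sketched with Python's standard logging module. This is a minimal sketch: the logger name, the format string, and the function name are assumptions rather than the framework's actual values.

```python
import logging
import os
import tempfile

LOGGER_NAME = "framework"  # hypothetical logger name


def initialise_logger(log_path, level=logging.INFO):
    """Create the shared logger: level, .log file, console output, formatting."""
    logger = logging.getLogger(LOGGER_NAME)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching duplicate handlers on re-initialisation
        formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
        file_handler = logging.FileHandler(log_path)
        console_handler = logging.StreamHandler()
        for handler in (file_handler, console_handler):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


log_file = os.path.join(tempfile.mkdtemp(), "test_run.log")
logger = initialise_logger(log_file)
logger.info("Logger initialised")

# Any other module can now retrieve the very same object by name.
same_logger = logging.getLogger(LOGGER_NAME)
```

The final lookup is what lets the manager base obtain the logger without holding a reference to the logging manager.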

2.2.3 Input Manager

Input Manager facilitates all the input commands that

are used to interact with the web browser. These

methods enable the user to interact with a webpage

using the most common and popular types of interac-

tion such as clickViaWebElement, selectText, select-

Value, submitElement, clickElement, checkboxClick,

enterData, and clearData. Each of these methods

provided by the input manager enables the automation

engineer to interact with the webpage. These functions

are abstracted through a test manager

primarily for two reasons: to access the same

logging output and the same driver. This essentially allows

the engineer to write code without having to worry

about reporting and logging, which are automatically

taken care of as long as they use the provided func-

tionality from the test managers.

So, of the three core test managers that will be

used by the automation engineer (i.e. input, valida-

tion, visual), the input manager will undoubtedly see

the most use.

2.2.4 Validation Manager

Validation Manager is undeniably the most important

to the testing process, whilst not used as frequently as

the input manager. The validation

manager essentially allows the engineer to validate

various kinds of information and data, via assertions and

other means, and is the primary way for the engineer

to ensure robustness in functionality.

The validation manager enables the user to use

the following methods: containsText (which checks

that the text within an element contains what

we want it to, i.e. a word from a sentence or para-

graph); validateText (which validates that the text

within the element is a 1:1 match for what we are

looking for); containsTitle (which checks that the ti-

tle of the webpage - shown in the tab near the top - is

as we expect); assertIsTrue (which asserts that some-

thing equals true); and assertIsFalse (which asserts

that something equals false). These five methods

are what the validation manager has to offer an engi-

neer that is using the framework. While it does pro-

vide the majority of the validation methods that may

be used in web testing, it also lends itself to extension

for further development and expansion, as do all the

managers and the framework as a whole.
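
The difference between the containment check and the exact-match check can be made concrete with a short sketch. The functions below are simplified stand-ins operating on plain strings; in the framework the text would be read from a web element, and the names here are assumptions.

```python
def contains_text(element_text, expected):
    """Pass if the expected fragment appears anywhere in the element's text."""
    return expected in element_text


def validate_text(element_text, expected):
    """Pass only if the element's text is a 1:1 match for the expected text."""
    return element_text == expected


paragraph = "Optical character recognition extracts text from images."

fragment_found = contains_text(paragraph, "extracts text")   # substring check
exact_match = validate_text(paragraph, "extracts text")      # whole-string check
```

The containment check suits long paragraphs where only a key phrase matters, whereas the exact match suits short labels and button captions.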

2.2.5 Visual Manager

Visual Manager, as the name suggests, handles

the visual validation testing and provides the

engineer with operations such as optical character recognition

(OCR) and template matching for image recognition-

based testing. The purpose for this manager on top

of the existing managers is to provide another layer

of testing that can further conﬁrm the robustness and

integrity of the system under test (SUT). By doing

this, we are better achieving the goal of automa-

tion which is to (ideally) be run upon new deploy-

ments to gain an understanding of the overall sys-

tem status. The methods provided by this manager

are as follows: enterDataViaTemplate (which uses a

template image parameter to locate the area on the

screen that we want to interact with, then clicks on

this area if found and enters in the parameterized

data); clickViaTemplate (which clicks at the x/y

coordinates of our located template); getElementCoordinates

(which returns a NumPy array with the x

and y coordinates of the located template); valida-

teElementText (which uses the PyTesseract library to

implement OCR, takes a screenshot of the element

and simultaneously grabs the text attribute of said

element and compares them); saveScreenshotToFile

(which saves a screenshot to file and will most typically

be used to build a web element screenshot library); and

getElementScreenshotAsBytes (which returns both the

screenshot of a web element in byte form and the

text of the web element and is used by other methods

within the class).

The visual manager is the most complex of the

managers and provides a breadth of visual validation

testing in tandem with the other managers to aid in

providing a well-rounded and robust testing experi-

ence.

2.3 Test Logging and Reporting

Test reporting and logging were implemented after

the core test suite functionality was in place. This

was primarily done to deliver a more coherent envi-

ronment for an engineer to develop their tests in, as

described below.

2.3.1 Test Logging

The logging of tests and the events that take place dur-

ing a test was one of the ﬁrst developments as part of

the framework. Being able to keep track of what is

happening during the test and afterwards is crucial,

not only from a testing point of view but also from a

debugging angle.

At every stage of development for a test script, it

is re-run to ensure what has been developed is work-

ing, and to better facilitate this commonality among

automation engineers, test logging was deemed an in-

tegral part of the framework. This allows for monitor-

ing of the testing and debugging, but also for easily

indicating whether a test has passed or failed and ex-

actly at what step of the test.

The PyTest-BDD library allows for the conﬁgura-

tion of the output using Gherkin syntax. This allows

us to output formatted code, showcasing the scenario

and steps that were executed, alongside whether the

test as a whole passed or failed. The end product of

the logging is the .log ﬁle, which includes all the logs

that ﬁred during the test run. Unlike reports which

need to be generated manually, the .log ﬁles are au-

tomatically generated and easy to handle due to their

generally small size.

2.3.2 Test Reporting

Reporting must be run manually via the command

line. At ﬁrst, a JavaScript Object Notation (JSON)

object must be generated from the test run, and then

the report itself can be generated from this. The

decision to make this process manual is based on the fact

that, of a hundred test runs, perhaps only one report

will be sent to the stakeholders; a report is not something

that needs to be constantly generated, as reports can quickly

begin to take up a lot of disk space.

For this reason and to reduce post-processing

times, the reporting was made to be a manual process

via the command-line interface (CLI). The reporting

itself is powered using PyTest-HTML (a plug-in

for PyTest) that allows us to

provide basic reporting that gives an overview of what

has passed and failed in a Hyper Text Mark-up Lan-

guage (.html) format.

2.4 Template Matching

Template Matching is one of the two ways that vi-

sual validation testing is provided by the framework

and is one of the latter areas that were developed once

the core and aforementioned managers were imple-

mented. The framework provides this functionality to

the automation engineer by utilising several libraries

in tandem.

Template matching essentially consists in ﬁnding

an image within an image. Traditionally, template

matching has been used in image recognition applica-

tions. Our framework utilises template matching by

detecting a template image within a source image. In

the case of this work, the source image is a screenshot

of the webpage that is taken using the WebDriver; this

source image being the basis of the template matching

algorithm. Next, a template - or image we

want to ﬁnd within the source image - must be de-

ﬁned. Once these two key components have been de-

ﬁned and pre-processed, we can execute the template

matching.

Figure 3: Matched template example.

An example of template matching in action can be

seen in Fig. 3, which showcases the Google home-

page search bar being found by the template match-

ing algorithm. For demonstration, the detected area

is surrounded by a yellow rectangle when utilising

OpenCV. Indeed, this entire window is generated us-

ing OpenCV, which opens a prompt window showcas-

ing the matched region. In this scenario, the template

is the search bar element screenshot, whilst the source

image is the entirety of the Google homepage.

So the process ﬂow for the getElementCoordi-

nates method within the visual manager class can be

described as follows. A screenshot of the webpage is

taken using the Selenium WebDriver - this acts as the

source image. Then, the source image is decoded and

converted to grayscale (as part of the pre-processing).

Next, the template is read from the ﬁle path pro-

vided by the engineer and parameterized as part of the

getElementCoordinates method. Then, the width and

height of the template are stored as variables. Next,

using OpenCV, we run the template matching algorithm

and capture the result, which is then converted

into a one-dimensional array using NumPy. Next,

the one-dimensional array is unravelled and converted

into a multi-dimensional NumPy array, which holds the x and y

coordinates of the matched region. This array is what

is returned from the getElementCoordinates method, and

the coordinates are later passed to the Selenium ActionChains

library to offset the cursor to the correct x and

y position on the screen to interact with the webpage.

Finally, once the x and y coordinates are returned from

the getElementCoordinates method, we click on the

centre of the matched region by offsetting them by half the width

and height of the template (i.e. template width

× 0.5 and template height × 0.5). Now that we have interacted with the element

and brought it to focus, we can continue with what-

ever operations the automation engineer would like to

do such as entering data.
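The flow described above can be sketched in Python as follows. This is a minimal sketch rather than the framework's actual code: the function names locate_template, match_centre and click_at_template, the choice of cv2.TM_CCOEFF_NORMED as the matching method, and the use of move_by_offset are assumptions made for illustration; only the overall sequence of steps is taken from the text. OpenCV, NumPy and Selenium are imported inside the functions that need them so that the coordinate helper can be used on its own.

```python
def locate_template(screenshot_png: bytes, template_path: str):
    """Return (x, y, width, height) of the best template match in the screenshot."""
    import cv2
    import numpy as np

    # Decode the PNG bytes (e.g. from driver.get_screenshot_as_png()) and pre-process.
    source = cv2.imdecode(np.frombuffer(screenshot_png, dtype=np.uint8), cv2.IMREAD_COLOR)
    source_gray = cv2.cvtColor(source, cv2.COLOR_BGR2GRAY)

    # Read the template supplied by the engineer and store its dimensions.
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    if template is None:
        raise FileNotFoundError(template_path)
    height, width = template.shape[:2]

    # Run template matching, flatten the score map, then unravel the best index.
    result = cv2.matchTemplate(source_gray, template, cv2.TM_CCOEFF_NORMED)
    best_flat_index = int(np.argmax(result.ravel()))
    y, x = np.unravel_index(best_flat_index, result.shape)
    return int(x), int(y), int(width), int(height)


def match_centre(x: int, y: int, width: int, height: int):
    """Offset the top-left corner of the match by half the template size."""
    return int(x + width * 0.5), int(y + height * 0.5)


def click_at_template(driver, template_path: str) -> None:
    """Click the centre of the match (assumes the pointer starts at the viewport origin)."""
    from selenium.webdriver.common.action_chains import ActionChains

    x, y, width, height = locate_template(driver.get_screenshot_as_png(), template_path)
    centre_x, centre_y = match_centre(x, y, width, height)
    ActionChains(driver).move_by_offset(centre_x, centre_y).click().perform()
```

For a match whose top-left corner is at (100, 40) and a 200 × 50 template, match_centre gives (200, 65). On displays where screenshot pixels and browser coordinates differ in scale, the offsets may need to be scaled accordingly.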

2.5 Optical Character Recognition

Optical Character Recognition (OCR) is the second way in which visual validation testing is provided by the framework. This is done by utilising PyTesseract, which allows the use of the Tesseract.exe image recognition engine. The path to this .exe differs from machine to machine and must be updated in the OCR method called validateElementText.

OCR sees widespread use in a variety of applications today, such as automatic number plate recognition, QR code scanning, and language translation (Kalina and Golovanov, 2019), to name a few. It is an integral aspect of many computer-vision-related applications and software. In its purest form, OCR is a subset of pattern recognition problems and builds on several processes including, but not limited to, input data pre-processing, segmentation, and feature extraction (Kalina and Golovanov, 2019).

Furthermore, the use of OCR within the test automation framework is a crucial step in implementing visual validation testing on top of more traditional methods, where elements are accessed at the document object model (DOM) level. This approach, when coupled with traditional test automation methods, provides a two-fold layer of robustness, where we verify the contents of the element not only at the DOM level but also from a visual perspective.

The main benefit of this is that it is closer to how a human would interact with the application, and is therefore more realistic. When testing manually, testers do not inspect elements to ensure that the values, tags, and attributes are correct; manual testing relies heavily on the tester's vision. Thus, by utilising both methods of test automation, the test automation framework provides greater levels of confidence to an engineer, as well as test reliability and test integrity. This in turn gives the business confidence in what they have developed and, in certain cases, can even boost morale and productivity within the team.

ICEIS 2024 - 26th International Conference on Enterprise Information Systems

Figure 4: validateElementText method utilising OCR engine.

Figure 4 is an extract from the code behind the framework that uses OCR to validate the text of an element. First, it uses the Selenium WebDriver to capture two variables, namely, screenshot bytes (a screenshot of the web element taken with the WebDriver, which is later fed to the OCR engine) and element text (the string from the text attribute of the element that is passed in). Next, it uses the OpenCV library to decode the image so that the OCR engine can be run on it to capture the text. Once the text has been captured, the string is manipulated to keep only the text we are interested in; this works around an issue with PyTesseract, which occasionally adds trailing spaces or other special characters. Finally, it asserts that the text held in the element's text attribute is equal to the string captured from the screenshot of the web element. This is how OCR has been implemented within the framework, with its main aim being to provide a heightened level of robustness to the testing process.
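The steps above can be sketched as follows. This is an illustrative sketch and not the code shown in Fig. 4: the names normalise_ocr_text and validate_element_text, the whitespace-collapsing clean-up rule, and the optional tesseract_cmd parameter are assumptions. The calls to OpenCV, NumPy and PyTesseract are imported inside the function so that the clean-up helper can be exercised without those libraries installed.

```python
from typing import Optional


def normalise_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace (spaces, newlines, form feeds) and trim both ends."""
    return " ".join(raw.split())


def validate_element_text(element, tesseract_cmd: Optional[str] = None) -> None:
    """Assert that the text read by OCR matches the element's DOM text."""
    import cv2
    import numpy as np
    import pytesseract

    if tesseract_cmd:
        # Machine-specific path to the Tesseract executable.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    screenshot_bytes = element.screenshot_as_png  # picture of the web element
    element_text = element.text                   # text exposed through the DOM

    # Decode the PNG bytes into a grayscale image and run the OCR engine on it.
    image = cv2.imdecode(np.frombuffer(screenshot_bytes, dtype=np.uint8), cv2.IMREAD_GRAYSCALE)
    ocr_text = normalise_ocr_text(pytesseract.image_to_string(image))

    assert ocr_text == normalise_ocr_text(element_text), (
        f"OCR read {ocr_text!r} but the DOM holds {element_text!r}"
    )
```

Normalising both strings in the same way keeps the comparison symmetric, so a stray trailing newline on either side does not fail the check.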

3 APPLICATION AND

DISCUSSION

In this section, we describe the application of our AI-assisted automated testing framework in a real-world context (Section 3.1) and its evaluation (Section 3.2) in terms of the processing time and success rate of the computer vision algorithms embedded in our framework.

3.1 Application

Our developed framework for automated testing can

aid the test building process in three domains, as fol-

lows: (i) writing the feature ﬁle (test scenario); (ii)

coding the page-object model (POM), and (iii) attach-

ing the code via the step deﬁnition, as explained in the

remaining part of this section.

3.1.1 Writing the Feature File

The feature file holds one or more test scenarios, which are our tests. A feature file is written in Gherkin syntax, which reads much like plain English.

As a running example, shown in Fig. 5, we create a .feature file with a descriptive name under the 'features' module. Once the .feature file is created, we have to name the feature. The name of the feature should be identical to the name of the file for ease of understanding, as highlighted in red in Fig. 5.

Then, we should give the feature a description. This can be in plain English and lets any user who reads the feature file understand what kind of tests it contains.

Once the name of the feature and a description are established, we can begin to write a test scenario, starting by giving the scenario a descriptive name that makes sense to a human tester.

Next, when writing a test scenario in Gherkin syntax, we must prefix each step with one of the keywords 'Given', 'When' or 'Then'. These serve no real purpose besides enhanced readability, but a common rule is to use 'Given' steps as the setup for the rest of the test; 'When' steps as actions (e.g., 'When I search for Amazon'); and 'Then' steps as verifications or assertions.
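To make the role of the three keywords concrete, the sketch below holds a small feature file as a Python string and labels each step according to the common rule described above. The feature text is a hypothetical example written for illustration (it is not the content of Fig. 5), and classify_steps is an illustrative helper, not part of the framework or of any Gherkin tooling.

```python
# Hypothetical feature file, stored as a string so the example is self-contained.
FEATURE_TEXT = """\
Feature: Google Search
    Searching from the Google homepage.

    Scenario: Search for Amazon
        Given I am on the Google homepage
        When I search for Amazon
        Then the results page mentions Amazon
"""

# Common convention for what each keyword is used for.
STEP_ROLES = {"Given": "setup", "When": "action", "Then": "verification"}


def classify_steps(feature_text):
    """Return (role, step text) pairs for every Given/When/Then line."""
    steps = []
    for line in feature_text.splitlines():
        words = line.strip().split(maxsplit=1)
        if len(words) == 2 and words[0] in STEP_ROLES:
            steps.append((STEP_ROLES[words[0]], words[1]))
    return steps


for role, text in classify_steps(FEATURE_TEXT):
    print(f"{role:>12}: {text}")
```

The feature name, description and scenario name are skipped by the helper because they do not start with one of the three step keywords.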

3.1.2 Coding the Page-Object Model

The Page-Object Model (POM) is where most of the coding within the framework is done. Each POM refers to a webpage (e.g. the Google homepage), and on each webpage there are various actions one can perform, such as search, select, enter, etc.; these will be the methods.

To continue with the running example, a .py file should be created under the page object models module in the project and given a name relevant to the intended test. It is worth noting that if the test interacts with multiple websites/webpages, several POMs are needed. Then, within the POM, we should start by adding the imports that give access to the methods made available by the test managers.


Figure 5: Example of ‘Writing the Feature File’ operation.

Once we have the POM class, it is best practice to

capture locators and store them as private variables at

the top of the class, as illustrated in Fig. 6.

Figure 6: Example of locators.

It is worth noting that any managers that one

wishes to use also need to be referenced, ideally as

protected attributes.

Next, methods (i.e. operations) for the respective

webpage/website can be written, as shown in Fig. 7.

Figure 7: Example of methods.

We can create as many or as few methods as we like, as long as they cover all the functionality required by the test scenario that we wrote earlier.
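The structure of a POM described in this section (private locators at the top of the class, managers held as protected attributes, and one method per operation) can be sketched as follows. All names here are hypothetical: RecordingManager is a stand-in that records calls instead of driving a browser, and the manager methods click_at_template and enter_text, the locator values, and the template path are assumptions rather than the framework's actual API.

```python
class RecordingManager:
    """Stand-in for a test manager: records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def click_at_template(self, template_path):
        self.calls.append(("click_at_template", template_path))

    def enter_text(self, locator, text):
        self.calls.append(("enter_text", locator, text))


class GoogleHomePage:
    # Locators stored as private (name-mangled) attributes at the top of the class.
    __SEARCH_BAR_TEMPLATE = "templates/google_search_bar.png"
    __SEARCH_BAR = ("name", "q")

    def __init__(self, visual_manager, web_manager):
        # Managers referenced as protected attributes.
        self._visual = visual_manager
        self._web = web_manager

    def search_for(self, term):
        """One operation of the page: focus the search bar visually, then type."""
        self._visual.click_at_template(self.__SEARCH_BAR_TEMPLATE)
        self._web.enter_text(self.__SEARCH_BAR, term)


manager = RecordingManager()
GoogleHomePage(manager, manager).search_for("Amazon")
print(manager.calls)
```

Passing the managers into the constructor keeps the page object free of browser set-up code, which is one reasonable way to let a fixture create and dispose of the underlying driver.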

3.1.3 Attaching the Code via the Step Deﬁnition

Now that we have written the feature ﬁle as well as

the POM, we are ready to ‘glue the code’ together to

run our test.

For that purpose, under the step defs module, we create a .py file with a name that is almost identical to the name of our scenario; this allows us to keep track of what we have worked on. After creating this .py file, the imports held in the 'Imports' text under the same step defs module are copied and pasted into the new step definitions file.

Now that we have the step definition file set up, we can generate the step definition code via a command in the terminal, as shown in Fig. 8.

Figure 8: Example of step definition template code generation.

Next, at the top of the step definition file, above all the generated methods, we can write a method using the @scenario decorator, with the path to our feature file and the name of our scenario, as displayed in Fig. 9. Prefixing the method name with 'test' allows PyTest to find it.

Figure 9: Example of method using the @scenario decorator.

Then, we add in all the respective method calls for our operations and, in the conftest.py file, we add a method that will yield the class object so that one can use the respective methods, as illustrated in Figs 10 and 11, respectively.

Finally, we can right-click on the step definition file and run the test by clicking 'Run PyTest'.
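The 'method that will yield the class object' mentioned above follows a yield-based setup and teardown pattern. The sketch below shows that pattern with a plain Python generator so that it runs without a test runner; in a conftest.py file the same function would typically carry PyTest's @pytest.fixture decorator, which is an assumption here rather than something stated in the text. The names FakePage and google_home_page are hypothetical.

```python
events = []


class FakePage:
    """Hypothetical page object used only to show the call order."""

    def search_for(self, term):
        events.append(f"search:{term}")


def google_home_page():
    """Create the page object, hand it to the test, then clean up afterwards."""
    events.append("setup")      # e.g. start the browser and build the page object
    page = FakePage()
    yield page                  # the step definitions use the object here
    events.append("teardown")   # e.g. close the browser


# Drive the generator by hand to mimic what a test runner would do.
fixture = google_home_page()
page = next(fixture)            # runs the code up to the yield
page.search_for("Amazon")
next(fixture, None)             # resumes after the yield and finishes
print(events)
```

Everything before the yield runs before the step definitions, and everything after it runs once they have finished, which is why a single function can handle both setup and clean-up.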


Figure 10: Example of operation calls.

Figure 11: Example of method that yields the class object.

3.2 Discussion

This section looks at quantifying and evaluating the performance of the computer vision algorithms embedded in our developed framework. The computational speed of these algorithms is quantified using the logging from the system output that is generated through the test managers.
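One simple way to obtain such timings is to wrap each operation in a helper that logs its duration. The sketch below is an assumption about how this could be done and not the framework's own logging code; the helper name timed and the logger name are hypothetical.

```python
import logging
import time

logger = logging.getLogger("test_managers")  # hypothetical logger name


def timed(label, operation, *args, **kwargs):
    """Run operation(*args, **kwargs), log how long it took, and return both."""
    start = time.perf_counter()
    result = operation(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    logger.info("%s took %.0f ms", label, elapsed_ms)
    return result, elapsed_ms


# Usage with a stand-in operation; a real call would wrap a manager method.
result, elapsed_ms = timed("sum of 0..999", sum, range(1000))
```

time.perf_counter is used because it is a monotonic, high-resolution clock intended for measuring short durations.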

3.2.1 Template Matching Evaluation

As mentioned in Section 2.4, template matching is one of two ways that visual validation testing aided by AI/ML has been implemented into the framework.

Traditional methods, although more time-consuming to set up, are typically faster as they access the DOM layer of the webpage. Visual validation methods such as OCR and template matching, in contrast, take more time as the algorithm needs to work to match the region; in the case of template matching, it does this pixel by pixel until a satisfactory match is found.

Because the algorithm starts at x(0) and y(0), the time it takes depends on the position of the element, and so the computational time needed to match the element can vary heavily. These variations in the processing time can be seen in Fig. 12 for the running example.

In the first instance where the clickAtCoordinates method (which utilises template matching) is called, we can see in Fig. 12 that it takes approximately 7 seconds, whilst the second instance takes less than 1 second. Both instances start from the x and y coordinates 0; however, the processing time varies depending on how far the element is from the starting point.

Figure 12: Sample of template matching processing time measures.
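The scan described above, which starts at the origin and stops at the first satisfactory match, can be illustrated with a small pure-Python sketch on toy grids of pixel values. It illustrates the description given in this section and is not a model of OpenCV's internals; the function names, the summed-absolute-difference score, and the max_diff threshold are assumptions.

```python
def first_match(source, template, max_diff=0):
    """Scan row by row from (0, 0) and stop at the first satisfactory window.

    Returns (x, y, positions_checked), or None if no window is close enough.
    """
    t_h, t_w = len(template), len(template[0])
    s_h, s_w = len(source), len(source[0])
    checked = 0
    for y in range(s_h - t_h + 1):
        for x in range(s_w - t_w + 1):
            checked += 1
            # Summed absolute pixel difference between the window and the template.
            diff = sum(
                abs(source[y + dy][x + dx] - template[dy][dx])
                for dy in range(t_h)
                for dx in range(t_w)
            )
            if diff <= max_diff:
                return x, y, checked
    return None


def place(template, x, y, width=6, height=5):
    """Build an all-zero source grid with the template pasted at (x, y)."""
    grid = [[0] * width for _ in range(height)]
    for dy, row in enumerate(template):
        for dx, value in enumerate(row):
            grid[y + dy][x + dx] = value
    return grid


TEMPLATE = [[9, 8], [7, 6]]
near = first_match(place(TEMPLATE, 0, 0), TEMPLATE)  # element at the origin
far = first_match(place(TEMPLATE, 4, 3), TEMPLATE)   # element in the far corner
print(near, far)
```

With the element at the origin a single window is examined, whereas with the element in the far corner of this 6 × 5 grid all 20 candidate windows are examined, which mirrors the position-dependent processing time discussed above.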

3.2.2 Optical Character Recognition Evaluation

As outlined in Section 2.5, OCR within our framework provides a two-fold approach to error detection in the form of a dual check. This functionality is aimed at ensuring that tests are reliable and that, if a step passes, there are no unforeseen issues.

Traditionally, one interacts with the webpage via the DOM layer; in contrast, normal users and manual testers do not exercise the system under test (SUT) in this way. This means that there needs to be a way for the visual element of a webpage to be verified alongside the DOM layer, and that is exactly what OCR within our framework provides. This not only improves error detection but also results in finding errors sooner rather than later, all while using an approach closer to that of a normal user.

Because the characters we are working with are clearly typed words and sentences, the OCR engine works very quickly, taking only a couple of hundred milliseconds to complete. Again, using the logging output from tests that have run, we can observe in Fig. 13 that the OCR takes 310 ms for the running example.

Figure 13: Sample of optical character recognition (OCR)

processing time measures.

Overall, we can see from this that the computational speed of the OCR is reasonably fast. OCR is also reliable: in 100% of the instances where it was used, it successfully recognised the characters from the screenshot of the web element. The OCR algorithm within the framework is this reliable because the characters are easy for the engine to recognise. Unlike text from a newspaper or handwriting, on-screen text is a favourable input, and the optical character recognition in our AI-assisted automated testing framework recognised it correctly in every instance tested.

4 CONCLUSIONS

This work has developed an AI-assisted test automation framework that sheds light on the potential of artificial intelligence (AI), machine learning (ML), and computer vision (CV) within the software testing industry, specifically in test automation.

Indeed, our AI-assisted test automation framework, which leverages visual and traditional testing methods, minimises the effort of automating tests and collecting data. Moreover, the use of computer vision techniques provides a two-fold layer that adds security and robustness to the automated testing delivered by our framework, with the contents of the element being verified not only at the DOM level but also from a visual perspective. Therefore, this AI-assisted test automation framework, with embedded computer-vision capabilities and in tandem with BDD, offers a complete automated software testing solution, usable for reliable testing in mission-critical applications.

REFERENCES

Alferidah, S. K. and Ahmed, S. (2020). Automated soft-

ware testing tools. In Proceedings of the IEEE Inter-

national Conference on Computing and Information

Technology, pages 1–4.

Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Ka-

mar, E., Nagappan, N., Nushi, B., and Zimmermann,

T. (2019). Software engineering for machine learning:

A case study. In Proceedings of the IEEE/ACM Inter-

national Conference on Software Engineering (ICSE),

pages 291–300.

Black, R., Davenport, J. H., Olszewska, J. I., Roessler, J.,

Smith, A. L., and Wright, J. (2022). Artiﬁcial Intelli-

gence and Software Testing: Building systems you can

trust. BCS Press.

Borjesson, E. (2012). Industrial applicability of visual GUI

testing for system and acceptance test automation. In

Proceedings of the IEEE International Conference on

Software Testing, Veriﬁcation and Validation, pages

475–478.

Borjesson, E. and Feldt, R. (2012). Automated system test-

ing using visual GUI testing tools: A comparative

study in industry. In Proceedings of the IEEE Inter-

national Conference on Software Testing, Veriﬁcation

and Validation, pages 350–359.

Bugayong, V. E., Flores Villaverde, J., and Linsangan, N. B.

(2022). Google tesseract: Optical character recogni-

tion (ocr) on hdd / ssd labels using machine vision. In

Proceedings of the IEEE International Conference on

Computer and Automation Engineering, pages 56–60.

Calantonio, J. (2023). 7 Innovative AI Test Automation Tools. Available at: https://testguild.com/7-innovative-ai-test-automation-tools-future-third-wave/.

Cavalcante, M. G. and Sales, J. I. (2018). The behavior

driven development applied to the software quality

test. In Proceedings of the IEEE Iberian Conference

on Information Systems and Technologies, pages 1–4.

Celik, E., Eren, S., Cini, E., and Keles, O. (2017). Software

test automation and a sample practice for an enterprise

business software. In Proceedings of the IEEE Inter-

national Conference on Computer Science and Engi-

neering, pages 141–144.

Chowdhury, A. R. (2023). Testim. Your Complete Guide to Test Automation Frameworks. Available at: https://www.testim.io/blog/test-automation-frameworks/.

Dias, F. and Paiva, A. C. R. (2017). Pattern-based usabil-

ity testing. In Proceedings of the IEEE International

Conference on Software Testing, Veriﬁcation and Val-

idation Workshops, pages 366–371.

Drugeot, C. (2020). Software Testing News. How is AI

Transforming Software Testing?

Eldh, S. (2020). Test automation improvement model -

TAIM 2.0. In Proceedings of the IEEE International

Conference on Software Testing, Veriﬁcation and Val-

idation Workshops, pages 334–337.

Farooq, M. S., Omer, U., Ramzan, A., Rasheed, M. A.,

and Atal, Z. (2023). Behavior driven development: A

systematic literature review. IEEE Access, 11:88008–

88024.

Gafurov, D., Hurum, A. E., and Markman, M. (2018).

Achieving test automation with testers without cod-

ing skills: An industrial report. In Proceedings of the

IEEE/ACM International Conference on Automated

Software Engineering, pages 749–756.

Gao, J., Tao, C., Jie, D., and Lu, S. (2019). What is ai soft-

ware testing? and why. In Proceedings of the IEEE

International Conference on Service-Oriented System

Engineering, pages 1–9.

Gohil, K., Alapati, N., and Joglekar, S. (2011). Towards

behavior driven operations (bdops). In Proceedings

of the IEEE International Conference on Advances in

Recent Technologies in Communication and Comput-

ing, pages 262–264.

Google Open Source (2021). Tesseract OCR. Available at:

https://github.com/tesseract-ocr/tesseract.

Hamilton, T. (2023). Keyword Driven Testing Framework with Example. Available at: https://www.guru99.com/keyword-driven-testing.html.

Hananto, A., Abdul Rahman, T. K., Brotosaputro, G., Fauzi,

A., Hananto, A. L., and Priyatna, B. (2023). Param-

eters monitoring automation kiln manufacture based

optical character recognition (OCR) with the template

matching method. International Journal of Intelligent


Systems and Applications in Engineering, 11(6):621–

635.

Hourani, H., Hammad, A., and Laﬁ, M. (2019). The im-

pact of artiﬁcial intelligence on software testing. In

Proceedings of the IEEE Jordan International Joint

Conference on Electrical Engineering and Informa-

tion Technology, pages 565–570.

Hunt, J. (2023). PyTest Testing Framework. In Advanced

Guide to Python 3 Programming. Springer.

IEEE (2021). IEEE/ISO/IEC 29119-4-2021 - Interna-

tional Standard - Software and systems engineering–

Software testing–Part 4: Test techniques.

Kalina, D. and Golovanov, R. (2019). Application of tem-

plate matching for optical character recognition. In

Proceedings of the IEEE Conference of Russian Young

Researchers in Electrical and Electronic Engineering,

pages 2213–2217.

King, T. M., Arbon, J., Santiago, D., Adamo, D., Chin, W.,

and Shanmugam, R. (2019). Ai for testing today and

tomorrow: Industry perspectives. In Proceedings of

the IEEE International Conference on Artiﬁcial Intel-

ligence Testing, pages 81–88.

Knight, A. (2017). 12 Awesome Benefits of BDD. Available at: https://automationpanda.com/2017/02/13/12-awesome-benefits-of-bdd/.

Lenka, R. K., Nayak, K. M., and Padhi, S. (2018). Auto-

mated testing tool: QTP. In Proceedings of the IEEE

International Conference on Advances in Computing,

Communication Control and Networking, pages 526–

532.

Leotta, M., Clerissi, D., Ricca, F., and Spadaro, C. (2013).

Repairing Selenium test cases: An industrial case

study about web page element localization. In Pro-

ceedings of the IEEE International Conference on

Software Testing, Veriﬁcation and Validation, pages

487–488.

Long, M. A. (1993). Software regression testing success

story. In Proceedings of the IEEE International Test

Conference, pages 271–272.

Moreira Nascimento, A., Vismari, L. F., Cugnasca, P. S.,

Camargo Jr, J. B., and Rady de Almeira Jr, J. (2019).

A cost-sensitive approach to enhance the use of ML

classiﬁers in software testing efforts. In Proceed-

ings of the IEEE International Conference on Ma-

chine Learning and Applications, pages 1806–1813.

Olszewska, J. I. (2015). Active contour based optical char-

acter recognition for automated scene understanding.

Neurocomputing, 161(C):65–71.

Olszewska, J. I. (2019). D7-R4: Software development life-

cycle for intelligent vision systems. In Proceedings of

the International Joint Conference on Knowledge Dis-

covery, Knowledge Engineering and Knowledge Man-

agement (KEOD), pages 435–441.

Olszewska, J. I. (2020). AI-T: Software testing ontology

for AI-based systems. In Proceedings of the Inter-

national Joint Conference on Knowledge Discovery,

Knowledge Engineering and Knowledge Management

(KEOD), pages 291–298.

Qian, J., Ma, Y., Lin, C., and Chen, L. (2023). Accelerating

OCR-Based widget localization for test automation of

GUI applications. In Proceedings of the IEEE/ACM

International Conference on Automated Software En-

gineering, pages 1–13.

Ramya, P., Sindhura, V., and Sagar, P. V. (2017). Testing

using Selenium web driver. In Proceedings of the

IEEE International Conference on Electrical, Com-

puter and Communication Technologies, pages 1–7.

Santos, M. G. D., Petrillo, F., Halle, S., and Gueheneuc,

Y.-G. (2022). An approach to apply automated ac-

ceptance testing for industrial robotic systems. In

Proceedings of the IEEE International Conference on

Robotic Computing, pages 336–337.

Selenium (2022). The Selenium project and tools. Available at: https://www.selenium.dev/documentation/en/introduction/the_selenium_project_and_tools/.

Sheshasaayee, A. and Banumathi, P. (2018). Impacts of

behavioral driven development in the improvement of

quality software deliverables. In Proceedings of the

IEEE International Conference on Inventive Compu-

tation Technologies, pages 228–230.

Smith, R. (2007). An overview of the Tesseract OCR en-

gine. In Proceedings of the IEEE International Con-

ference on Document Analysis and Recognition (IC-

DAR), pages 629–633.

Solis, C. and Wang, X. (2011). A study of the characteristics

of behaviour driven development. In Proceedings of

the EUROMICRO Conference on Software Engineer-

ing and Advanced Applications, pages 383–387.

Storer, T. and Bob, R. (2019). Behave nicely! automatic

generation of code for behaviour driven development

test suites. In Proceedings of the IEEE International

Working Conference on Source Code Analysis and

Manipulation, pages 228–237.

Tanaka, T., Niibori, H., Shiyingxue, L., Nomura, S., Nakao,

T., and Tsuda, K. (2020). Selenium based testing sys-

tems for analytical data generation of website user be-

havior. In Proceedings of the IEEE International Con-

ference on Software Testing, Veriﬁcation and Valida-

tion Workshops, pages 216–221.

TechVidVan (2023). Python Advantages and Disadvantages - Step in the right direction. Available at: https://techvidvan.com/tutorials/python-advantages-and-disadvantages/.

Wang, X. and He, G. (2014). The research of data-driven

testing based on QTP. In Proceedings of the IEEE

Iberian Conference on Computer Science and Educa-

tion, pages 1063–1066.

Wheeler, D. and Olszewska, J. I. (2022). Cross-platform

mobile application development for smart services.

In Proceedings of the IEEE Joint 22nd International

Symposium on Computational Intelligence and Infor-

matics and 8th International Conference on Recent

Achievements in Mechatronics, Automation, Com-

puter Science and Robotics, pages 203–208.

Yang, A. Z. H., Alencar da Costa, D., and Zou, Y. (2019).

Predicting co-changes between functionality speciﬁ-

cations and source code in behavior driven develop-

ment. In Proceedings of the IEEE/ACM International

Conference on Mining Software Repositories, pages

534–544.


Yatskiv, N., Yatskiv, S., and Vasylyk, A. (2020). Method

of robotic process automation in software testing us-

ing artiﬁcial intelligence. In Proceedings of the IEEE

International Conference on Advanced Computer In-

formation Technologies, pages 501–504.

Yu, S., Fang, C., Feng, Y., Zhao, W., and Chen, Z. (2019).

Lirat: Layout and image recognition driving auto-

mated mobile testing of cross-platform. In Proceed-

ings of the IEEE/ACM International Conference on

Automated Software Engineering, pages 1066–1069.

Zelic, F. and Sable, A. (2023). OCR Unlocked: A Guide to Tesseract in Python with Pytesseract and OpenCV. Available at: https://nanonets.com/blog/ocr-with-tesseract/#technologyhowitworks#.

Zhu, H. (2018). Software testing as a problem of machine

learning: Towards a foundation on computational

learning theory. In Proceedings of the IEEE/ACM In-

ternational Workshop on Automation of Software Test,

pages 1–1.
