The Effect of Font Variation in the Accuracy of Image to Text Conversion

Vera Firmansyah and Amalia Rakhmawati

Academy of Metrology and Instrumentation, Ministry of Trade, Bandung, Indonesia

Keywords: Font, OCR, OpenCV, Accuracy.

Abstract: Image processing uses both hardware and software as tools to analyze an image and as an interface to process it. These tools can improve the welfare and quality of life of people with visual impairments by helping them to read articles. The level of impaired vision varies from person to person. Thus, this research develops an initial step in image to text conversion with font variation. Image to text conversion is done by extracting text from images of the article obtained through a camera. Previous research used a microprocessor equipped with a camera module and the Tesseract OCR program in Python OpenCV. The Tesseract OCR program used with OpenCV is an open source program that extracts text from images and saves it as text. This research uses five randomly chosen fonts: Times New Roman, Arial, Calibri, Comic Sans and Courier New. The image processing uses Dilation, Crop, Canny and Median Blur. The results show that Comic Sans has the highest accuracy and Times New Roman has the lowest. Comic Sans has the highest accuracy because the font overall has fewer curves than the others, while Times New Roman has the lowest accuracy because it has more curve characteristics.

1 INTRODUCTION

Image processing uses both hardware and software as tools to analyze an image and as an interface to process it. In 2015 it was estimated that, of the world population of 7.33 billion, 253 million people (3.38%) suffered from visual disturbances: 36 million were blind and 217 million had moderate to severe visual impairment. In addition, 188 million people had mild visual disturbances (M. Patil and R. Kagalkar, 2014).

The classification of visual

impairments used is in accordance with the WHO

classification, which is based on visual acuity

(Ministry of Health of the Republic of Indonesia,

2018).

Image processing tools can therefore improve the welfare and quality of life of people with visual impairments by helping them to read articles. The level of impaired vision varies from person to person, so this research develops an initial step in image to text conversion with font variation. Image to text conversion is done by extracting text from images of the article obtained through a camera.

Previous research used a microprocessor equipped with a camera module and the Tesseract OCR (Optical Character Recognition) program in Python with OpenCV (Open Source Computer Vision) (Rithika, H., B. N. Santhoshi, 2016). OpenCV is an API (Application Programming Interface) library chosen because it is well suited to computer vision and image processing. Computer vision is a branch of the image processing field that allows computers to see like humans. With computer vision, a computer can make decisions, take action, and recognize objects. Implementations of computer vision include face recognition, face detection, face/object tracking, road tracking, etc. (Widja. I. B. P., 2017). The Tesseract OCR program used with OpenCV is an open source program that extracts text from images and saves it as text. Fig. 1 shows the Tesseract OCR program used with OpenCV to convert an image to text.

This research aims to determine the effect of font variation on the accuracy of image to text conversion. Five fonts were chosen randomly: Times New Roman, Arial, Calibri, Comic Sans and Courier New.

Firmansyah, V. and Rakhmawati, A.

The Effect of Font Variation in the Accuracy of Image to Text Conversion.

DOI: 10.5220/0010966800003260

In Proceedings of the 4th International Conference on Applied Science and Technology on Engineering Science (iCAST-ES 2021), pages 1431-1434

ISBN: 978-989-758-615-6; ISSN: 2975-8246

© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)


Figure 1: Tesseract OCR Example.

2 DIGITAL IMAGE

When light from a source hits an object, the object reflects some of the light back. The reflected light is captured by an optical sensing device such as a digital camera; the sensor receives the image of the object according to the intensity of the reflected light and converts it into a digital image. This study uses an active complementary metal-oxide-semiconductor (CMOS) sensor. CMOS sensors are expected to offer lower power consumption, high integration capability and a lower price.

A digital image is a discrete data set in the form of a two-dimensional matrix whose entries indicate the brightness level of the points. A pixel is the smallest point in the digital image, and the value at coordinates (x, y), as shown in Fig. 2, is its intensity. For digital images in red, green, blue (RGB) color coordinates, each pixel has an intensity value for each of the R, G and B components.
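The matrix view of a digital image can be illustrated with a minimal Python sketch. The pixel values and the helper name `pixel` are made up for illustration and are not taken from the study; the sketch assumes the common row-major layout, where the row index is y and the column index is x.

```python
# A tiny 3x4 grayscale image: each number is the brightness of one pixel
# (0 = black, 255 = white). The values are invented for illustration.
gray = [
    [0, 50, 100, 150],
    [10, 60, 110, 160],
    [20, 70, 120, 170],
]


def pixel(image, x, y):
    """Return the value stored at column x, row y of a row-major image."""
    return image[y][x]


height = len(gray)
width = len(gray[0])

# The same idea for an RGB image: each pixel holds three intensities.
# The colour mapping here is arbitrary and only shows the data layout.
rgb = [[(v, 255 - v, 0) for v in row] for row in gray]

print(width, height)      # image size in pixels
print(pixel(gray, 2, 1))  # intensity at x=2, y=1
print(pixel(rgb, 2, 1))   # (R, G, B) triple at the same coordinate
```

Note that the coordinate (x, y) maps to `image[y][x]`, which is a frequent source of off-by-axis mistakes when cropping.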

Figure 2: Pixel Coordinate (Rithika, H., B. N. Santhoshi,

2016).

3 IMAGE PROCESSING

This research uses the image processing steps shown in Fig. 3.

Figure 3: Image Processing.

Samples are taken under constant lighting of 5 lumen with a font size of 12, and the fonts are Times New Roman, Arial, Calibri, Comic Sans and Courier New. Each sample is taken 5 times to assess repeatability. The image taken by the camera is shown in Fig. 4.

Figure 4: Digital Sample.

The sample uses the words and sentences shown in Fig. 5.

Figure 5: Words and Sentences in Sample.

iCAST-ES 2021 - International Conference on Applied Science and Technology on Engineering Science


The image from the last step of image processing in this research is the binary image shown in Fig. 6.

Figure 6: Binary Image of Sample.
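How preprocessing of this kind turns a camera image into a binary image can be sketched in plain Python. This is only an illustration under stated assumptions: the order of the steps (crop, median blur, binarization, dilation) is assumed rather than taken from Fig. 3, a fixed threshold stands in for the Canny step, and the function names and the synthetic input are hypothetical. In practice the corresponding OpenCV routines would be used instead.

```python
def crop(img, top, left, height, width):
    """Keep only the rectangle that contains the text."""
    return [row[left:left + width] for row in img[top:top + height]]


def median_blur3(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(
                img[yy][xx]
                for yy in (y - 1, y, y + 1)
                for xx in (x - 1, x, x + 1)
            )
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


def binarize(img, threshold=128):
    """Dark ink becomes 1 (foreground), light paper becomes 0."""
    return [[1 if v < threshold else 0 for v in row] for row in img]


def dilate3(binary):
    """3x3 dilation: a pixel is foreground if it or any neighbour is."""
    h, w = len(binary), len(binary[0])
    return [
        [
            int(any(
                binary[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


# Synthetic 8x10 "photo": white paper, a 3x3 ink block, one noise pixel.
raw = [[255] * 10 for _ in range(8)]
for y in range(3, 6):
    for x in range(3, 6):
        raw[y][x] = 0
raw[2][8] = 0

page = crop(raw, top=1, left=1, height=7, width=9)
blurred = median_blur3(page)
binary = binarize(blurred)
dilated = dilate3(binary)

for row in dilated:
    print("".join("#" if v else "." for v in row))
```

On this toy input the median filter removes the isolated noise pixel, and dilation then thickens the remaining ink region. Both effects change the shapes that the OCR stage receives, which is one reason preprocessing can influence accuracy.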

4 TESSERACT OCR

The binary image is the input to Tesseract OCR. Page layout analysis begins with connected component analysis, in which the outlines of the components are stored. The outlines are gathered together to form blobs. A blob is an area of the image in which the outlines overlap one another. The blobs are then organized into text lines, and the lines and regions are analyzed to find fixed pitch and proportional text (Smith. R., 2007). Fixed pitch text is broken down into character cells. Proportional text is divided into words using definite spaces and fuzzy spaces. Word recognition is then carried out as a two-pass process (Smith. R., 2007).
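The idea of connected component analysis can be sketched with a simple flood fill over a binary image. This is not Tesseract's implementation, which works on component outlines; it is a minimal illustration, assuming 8-connectivity, of how foreground pixels can be grouped into blobs and reduced to bounding boxes. The function name `find_blobs` and the sample grid are hypothetical.

```python
from collections import deque


def find_blobs(binary):
    """Return bounding boxes (top, left, bottom, right) of the
    8-connected foreground regions of a binary image."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            top = bottom = y
            left = right = x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


sample = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
]

blobs = find_blobs(sample)
# Sorting by the left edge gives a crude left-to-right reading order.
ordered = sorted(blobs, key=lambda box: box[1])
print(ordered)
```

Sorting the boxes by their left edge is only a stand-in for text line finding, which in a real engine also has to handle skew, multiple lines and varying character spacing.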

The first pass attempts to recognize each word in turn. Words that are recognized satisfactorily on the first pass, that is, words that match the dictionary, are passed to the adaptive classifier as training data. Once it has sufficient samples, the adaptive classifier can also provide classification results during the first pass. Words that were not recognized, or were recognized poorly, on the first pass are processed again in the second pass. By then, the adaptive classifier has gained more information from the first pass and is better able to recognize words that were previously missed or poorly recognized (Smith. R., 2007).

According to Smith (2007), the steps taken by Tesseract for character recognition are as follows:

1. Line and Word Search

2. Character and Word Recognition

5 RESULT AND ANALYSIS

The result of image to text conversion with font variation is shown in Table 1. The articles consist of 4 paragraphs of Latin text with bold, italic and underlined text combined.

The accuracy of image to text conversion depends on how well Tesseract OCR recognizes the font with bold, italic and underlined text combined. It also depends on the image processing that produces the binary image used as input to Tesseract OCR. The results show that Comic Sans has the highest accuracy and Times New Roman has the lowest. Comic Sans has the highest accuracy because the font overall has fewer curves and details than the others, while Times New Roman has the lowest accuracy because it has more curve characteristics. The differences in shape between Comic Sans and Times New Roman are shown in Table 2.
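The accuracy metric itself is not defined in the text. One common choice, shown here purely as an assumption, is character-level accuracy derived from the edit (Levenshtein) distance between the reference text and the OCR output, averaged over the repeated captures. The function names and the sample strings below are hypothetical and are not the data behind Table 1.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character of a
                curr[j - 1] + 1,     # insert a character of b
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference, recognized):
    """Character accuracy in percent, clipped at 0 (assumed definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference)) * 100.0


def mean(values):
    return sum(values) / len(values)


# Invented OCR outputs for three repeated captures of the same text.
reference = "image to text"
captures = ["image to text", "imaqe to text", "image to tex"]

scores = [char_accuracy(reference, text) for text in captures]
print([round(s, 1) for s in scores])
print(round(mean(scores), 1))
```

A per-font figure would then be the mean over that font's captures, and an overall figure the mean over all fonts. Word-level accuracy is an equally plausible definition and would generally give lower numbers for the same output.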

Table 1: Image to Text Conversion Result.

Table 2: Font Shape.

From Table 2, the Times New Roman font has more detailed shapes in each letter than Comic Sans. The details include curves in the letters, especially the letter 'g', which Tesseract OCR did not recognize well. Even though the performance of the system, represented by the total mean, is 98.9 %, the system needs to be improved by applying image processing techniques specialized in extracting curves (Z. Martinez) from letters for each font style.

REFERENCES

M. Patil and R. Kagalkar. (2014). A Review on Conversion of Image to Text As Well As Speech Using Edge Detection and Image Segmentation. International Journal of Science and Research (IJSR).

I. Isewon, J. Oyelade, O. Oladipupo. (2014). Design and Implementation of Text to Speech Conversion for Visually Impaired People. International Journal of Applied Information Systems (IJAIS) – ISSN: 2249-0868, Foundation of Computer Science FCS, New York, USA.

S. Edward, A. Jothimani, V. Jayaprakash, J.B. Xavier. (2018). Text-To-Speech Device for Visually Impaired People. International Journal of Pure and Applied Mathematics, Volume 119, No. 15, 1061-1067.

Ministry of Health of the Republic of Indonesia. (2018). Situation of Vision Disorders. Infodatin of the Ministry of Health of the Republic of Indonesia.

Rithika, H., B. N. Santhoshi. (2016). Image Text to Speech Conversion In The Desired Language By Translating With Raspberry Pi. IEEE International Conference on Computational Intelligence and Computing Research (ICCIC).

Widja. I. B. P. (2017). Image Binarization Design and Text Character Recognition with Raspberry Pi. National Conference on Systems & Information Technology.

Rakhmawati, A., Juliastuti. E., Umma. A. K., Sari. B. P. (2019). Prototype of Orifice Plate Diameter Measurement Based on OpenCV Image Analysis. Instrumentation and Control Seminar.

Smith. R. (2007). An Overview of the Tesseract OCR Engine. Ninth International Conference on Document Analysis and Recognition.

Z. Martinez. (2015). Curves Extraction in Images. Revista Colombiana de Estadística, Volume 38, Issue 1, January 2015, pp. 295-320.
