Creating a Math Expression Filter to Extract Concise Math Expression Images

Kuniko Yamada and Harumi Murakami

Graduate School for Creative Cities, Osaka City University, Osaka, Japan

Graduate School of Informatics, Osaka Metropolitan University, Osaka, Japan

Keywords: Math Expression Image, Concise Math Expression, Math Expression Filter, SVM, CNN.

Abstract: Even though the web is an effective resource for searching for math expressions, finding appropriate ones among the obtained documents is time-consuming. Therefore, we propose a math expression filter that presents appropriate images for such searches. We call an appropriate math expression a concise math expression: one such as $\ldots(x) = \sum_{i=1} \ldots (x - x \ldots)$, written in a compact form whose content can be interpreted from the math expression itself. We determined the conditions satisfied by a concise math expression and, based on these conditions, developed classifiers that use supervised machine learning to discriminate images of concise math expressions from other web images. We performed two experiments: Experiment 1 used methods other than deep learning, and Experiment 2 used deep learning. A convolutional neural network (CNN) with transfer learning and fine tuning based on VGG16 performed best, with an F-measure of 0.819. We applied this filter to a task that presents math expression images in response to mathematical terms entered into a web search engine as queries. The filter outperformed the previous study on all evaluation metrics: F-measure, MAP, and MRR.

1 INTRODUCTION

In recent years, the importance of mathematical information retrieval (MIR) has increased (Zanibbi and Blostein, 2012). Since mathematics is a tool for expressing various concepts in science fields, understanding math expressions is imperative in such fields. When someone wants to clearly understand an expression, the web is a natural resource to search. However, web search for math expressions has its problems. After web documents that contain math expressions are obtained, finding the appropriate math expressions in them is time-consuming. Such documents contain many types of math expression images. For example, variables such as “x” have no intrinsic meaning, fragments such as “log” and “≤” are not math expressions, and some long expressions simply show the process of deriving a formula. Among these expressions and fragments, we extracted math expressions, such as $\ldots(x) = \sum_{i=1} \ldots (x - x \ldots)$, that are long enough for human interpretation without supplementary explanation but shorter than the expressions that derive a formula. We call these concise math expressions. First, we collected candidate expressions, analyzed their features, and determined the conditions of a concise math expression. We then produced a math expression filter by developing classifiers that perform binary classification and compared their performance. Finally, we applied our filter to a task that presents math expression images for web searches and discussed the results.

2 RELATED WORK

Various types of research have been pursued in MIR. The first is math formula similarity search, such as adopting three different similarity measures (Ohashi et al., 2016), using two different tree structures (Davila and Zanibbi, 2017), and focusing on the similarity of substructures (Zhong et al., 2020). Other types of MIR include a math document classification method based on text combined with the structures of math expressions (Suzuki and Fujii, 2017), extracting identifier-definiens pairs to improve performance in MIR tasks (Schubotz et al., 2017), and focusing on partial equations within equations to analyze the frequency distributions of math expressions in large scientific datasets (Greiner-Petter et al., 2020). For math image recognition, there are conversions such as image to LaTeX (Peng et al., 2021; Wang and Liu, 2016) and image to markup (Deng et al., 2017). Other work summarizes content by generating headlines with math equations (Yuan et al., 2020). However, none of these target the web, and each relies on a specific dataset. For the web, a formula search engine available from a browser is “Approach Zero” (Zhong, 2022), which allows users to search for formulae in specific databases. For PDFs, there is research on analyzing PDFs using OCR software and presenting math expression images in response to a query (Yamada and Murakami, 2020) and research on detecting math formula regions as bounding boxes around formulae in PDFs using a CNN (Dey and Zanibbi, 2021). Our research aims to extract concise math expression images from images in HTML documents on the web by binary classification, without directly analyzing the contents of the images. To our knowledge, no similar study exists.

Yamada, K. and Murakami, H.
Creating a Math Expression Filter to Extract Concise Math Expression Images.
DOI: 10.5220/0011799700003393
In Proceedings of the 15th International Conference on Agents and Artificial Intelligence (ICAART 2023) - Volume 3, pages 909-915
ISBN: 978-989-758-623-1; ISSN: 2184-433X
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

3 EXPERIMENTS

After setting up a dataset, we applied preprocessing, including the elimination of duplicate images, and determined the conditions of a concise math expression. We then checked the correct images of the dataset against these conditions and conducted two experiments to evaluate the performance of the created classifiers. Experiment 1 used machine learning methods other than deep learning. Experiment 2 used CNNs. Finally, we compared all of the classifiers and selected the best one.

3.1 Dataset

Table 1: Dataset. These raw data include duplicate images and errors in preprocessing. “Other than html” includes PDFs, slides, Google Books, and so on.

           Image                Acquired webpage breakdown
Dataset    Total     Correct    Other than html   Error   Html
D_0_trn    19,470    442        1,091             80      1,829
D_0_val    15,427    351        1,269             60      1,671
D_0_tst    23,988    929        1,662             72      2,266
Total      58,885    1,722      4,022             212     5,766

We use the same dataset as a prior work (Yamada et al., 2018). We randomly selected 100 keywords from the index of Bishop’s “Pattern Recognition and Machine Learning” (Bishop, 2006) and performed a web search using each keyword as a query to obtain the top 100 web pages. We created a dataset by extracting all the images from those pages. In Table 1, D_0_trn is the training dataset built from keywords 31 to 60, D_0_val is the validation dataset built from keywords 1 to 30, and D_0_tst is the testing dataset built from keywords 61 to 100. The first author manually judged whether each image was related to its keyword. Unclear cases were judged in consultation with another person (the same person throughout all judgments). Keyword examples are softmax function, SVM, kernel density estimation method, Heaviside step function, Gaussian kernel, convex function, Probit function, Boltzmann distribution, functional derivative, and least-mean-squares algorithm.
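The keyword-based split described above can be sketched as follows. This is only an illustration: the function name `split_for_keyword` and the 1-based keyword numbering are assumptions, not part of the original study. Splitting by keyword rather than by image keeps all images retrieved for one keyword in the same subset.

```python
def split_for_keyword(index: int) -> str:
    """Return the subset name for a 1-based keyword index (1..100)."""
    if not 1 <= index <= 100:
        raise ValueError("keyword index must be between 1 and 100")
    if index <= 30:
        return "val"  # D_0_val: keywords 1 to 30
    if index <= 60:
        return "trn"  # D_0_trn: keywords 31 to 60
    return "tst"      # D_0_tst: keywords 61 to 100


# Count how many keywords fall into each subset.
counts = {}
for k in range(1, 101):
    name = split_for_keyword(k)
    counts[name] = counts.get(name, 0) + 1
```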

3.2 Preprocessing

Because of the way D_0_trn and D_0_val were created, they included identical images registered under different IDs. Therefore, we deleted the images with overlapping features. In Experiment 1, the basic features (file size, width, and height) were used, so images whose values for these features overlapped were deleted. In Experiment 2, the images were used directly, so images with the same features and the same appearance were deleted. In addition, unnecessary icons such as buttons and logos were removed from the dataset for Experiment 1. We extracted the common strings from the image names of the unnecessary icons in D_0_trn, and images with these strings in their names were deleted in advance.
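A minimal sketch of the Experiment 1 preprocessing follows. The record type `ImageRecord`, the helper names, the sample file names, and the substrings `"logo"` and `"button"` are all hypothetical. The sketch also assumes that the first image of each duplicate group is kept, which the text does not state explicitly.

```python
from typing import NamedTuple


class ImageRecord(NamedTuple):
    image_id: str
    name: str
    file_size: int  # bytes
    width: int      # pixels
    height: int     # pixels


def drop_icons(records, icon_substrings):
    """Remove images whose file name contains a known icon substring."""
    return [r for r in records
            if not any(s in r.name.lower() for s in icon_substrings)]


def drop_duplicates(records):
    """Keep the first image for each (file size, width, height) triple."""
    seen = set()
    kept = []
    for r in records:
        key = (r.file_size, r.width, r.height)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


records = [
    ImageRecord("img1", "eq_gauss.png", 2048, 320, 60),
    ImageRecord("img2", "eq_gauss_copy.png", 2048, 320, 60),  # same basic features
    ImageRecord("img3", "site_logo.png", 900, 64, 64),        # icon by name
    ImageRecord("img4", "softmax.gif", 1500, 200, 40),
]
cleaned = drop_duplicates(drop_icons(records, ("logo", "button")))
cleaned_ids = [r.image_id for r in cleaned]
```

Matching on the basic features alone can also merge distinct images that happen to share size and dimensions, which is why Experiment 2 additionally compares appearance.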

3.3 Determining Concise Math Expression Conditions

After preprocessing, we obtained 314 of the original 442 keyword-related correct images in D_0_trn (Table 1) and analyzed them to identify the conditions of a concise math expression. Because the web searches used the above keywords, many of the correct math expressions have proper names, such as the Gaussian kernel. They are therefore written in an organized form and are interpretable from the expressions themselves, which makes these images suitable candidates for concise math expression images. We examined the following by directly viewing the images: “number of horizontal characters (including symbols),” “number of vertical characters (including symbols),” “number of lines,” “number of expressions,” and “number of concatenations” (=, <, and so on). Because the fonts and layouts used in web math expression images differ considerably, the number of horizontal characters is counted regardless of font size. For the number of vertical characters, an exponent, a fraction line, and similar elements are each counted as 0.5. For example, if an image is x−1 + x−1, the number of horizontal characters is 6 and the number of vertical characters is 3.

Note on the number of expressions: The number of expressions in () is 1. Nested expressions in an expression are not counted.

Figure 1: Threshold determination. The third row of each figure represents the frequency. We determined the thresholds in order from the easiest one (from top to bottom in the table). After determining a threshold, we searched for the next one in the images that were already within the threshold.

The thresholds were determined in order from the easiest one in Fig. 1. (1) From A, a relational expression is adopted first. A relational expression is one in which the left and right sides are connected by “=,” “<,” and so on. (2) From B, i = 1 or 2, where i is the number of expressions. (3) From C, j = 1 or 2, where j is the number of concatenations. (4) From D, k = 1 or 2, where k is the number of lines. Then, (5) from Fig. 2, the frequency of counts between 4 and 35 is 2 or more, so 4 ≤ l ≤ 35, where l is the number of horizontal characters. (6) From Fig. 3, the frequency of counts of 6 or less is 2 or more, so 1 ≤ m ≤ 6, where m is the number of vertical characters.

Therefore, a concise math expression satisfies the following conditions: (1) it is a relational expression, (2) i = 1 or 2, where i is the number of expressions, (3) j = 1 or 2, where j is the number of concatenations, (4) k = 1 or 2, where k is the number of lines, (5) 4 ≤ l ≤ 35, where l is the number of horizontal characters, and (6) 1 ≤ m ≤ 6, where m is the number of vertical characters.
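The six conditions can be written as a single predicate. This is a sketch for checking annotations, not the authors' tooling: in the study the counts were obtained by viewing the images, so the function `is_concise` and its parameter names are assumptions that take those counts as given.

```python
def is_concise(is_relational: bool,
               n_expressions: int,
               n_concatenations: int,
               n_lines: int,
               n_horizontal: float,
               n_vertical: float) -> bool:
    """Check conditions (1) to (6) for a manually annotated image."""
    return (is_relational                      # (1) relational expression
            and n_expressions in (1, 2)        # (2) i = 1 or 2
            and n_concatenations in (1, 2)     # (3) j = 1 or 2
            and n_lines in (1, 2)              # (4) k = 1 or 2
            and 4 <= n_horizontal <= 35        # (5) 4 <= l <= 35
            and 1 <= n_vertical <= 6)          # (6) 1 <= m <= 6


# An image with one expression, one "=", one line, 6 horizontal
# and 3 vertical characters satisfies all six conditions.
short_formula = is_concise(True, 1, 1, 1, 6, 3)
# A 40-character derivation violates condition (5).
long_derivation = is_concise(True, 1, 1, 1, 40, 3)
# A bare fragment such as "log" is not a relational expression.
fragment = is_concise(False, 1, 0, 1, 3, 1)
```

The vertical count is typed as a float because exponents and fraction lines contribute 0.5 each.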

Note on the concatenation symbols: “=” is 299, “inequality sign” is 14, “≈” is 4, “'” is 2, “≡” is 2, “:=” is 2, “∼” is 2, and “” is 1 (including multiple concatenations in a single image).

Figure 2: Number of horizontal characters. This figure shows the number of characters from 1 to 40 (total frequency 257). The remaining 41 to 91 are omitted because they are sparsely scattered (total frequency 18). Axes: number of characters and frequency.

Figure 3: Number of vertical characters. The frequency of 4 to 35 characters in Fig. 2 is 251, and this figure shows the results using those images. Axes: number of characters and frequency.

We manually checked the correct images of the whole dataset against the above conditions. Since our dataset is very imbalanced, we randomly undersampled the data marked with “*” in Table 2. In Experiment 2, we adjusted the sizes of the datasets (other than the test data D_2_tst) for batch processing. Figure 4 shows examples of correct and incorrect images. Correct image (a) is an expression for the complementary error function, and correct image (b) is an expression for the t-distribution. Incorrect image (c) is the coordinates of a point, and (d) is the output of math software for a cell decomposition of an annulus.
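Random undersampling of the majority (incorrect) class can be sketched as below. The function `undersample`, the seed, and the placeholder image IDs are assumptions for illustration; only the target size of 2,400 is taken from the D_2_trn column of Table 2.

```python
import random


def undersample(majority, target_size, seed=0):
    """Randomly keep `target_size` items of the majority class, without replacement."""
    if target_size > len(majority):
        raise ValueError("target_size exceeds the number of available items")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(majority, target_size)


# Placeholder IDs standing in for the incorrect images of a training set.
incorrect_ids = list(range(10_000))
sampled = undersample(incorrect_ids, 2_400)
```

Only the training and balanced validation sets are reduced this way. The test sets keep their original imbalance, so the reported F-measures reflect the distribution found on the web.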

3.4 Evaluation Metric for Classifiers

In Experiments 1 and 2, our dataset is very imbalanced, so we use the F-measure (Eq. (1)) to compare a large number of classifiers:

$$\text{F-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \tag{1}$$

where

$$\text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$

and

$$\text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}.$$
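Eq. (1) translates directly into code. The sketch below uses made-up counts purely for illustration, and returning 0.0 when a denominator is zero is an assumption, since the text does not discuss that case.

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Return (precision, recall, F-measure) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts only (not taken from the experiments).
p, r, f = precision_recall_f(tp=80, fp=20, fn=20)
```

True negatives do not appear in any of the three quantities, which is why the F-measure is less distorted than accuracy when incorrect images greatly outnumber correct ones.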


Table 2: Data for experiments. After preprocessing the dataset in Table 1, we randomly undersampled the data marked with “*” (the criteria for “Correct” in Table 1 and in this table are different).

             Experiment 1                     Experiment 2
Dataset      D_1_trn   D_1_val   D_1_tst    D_2_trn   D_2_val_bal   D_2_val_imbal   D_2_tst
Correct      2,410     1,868     4,336      2,400     1,900         1,900           4,462
Incorrect    2,562*    6,076     12,689     2,400*    1,900*        8,180           18,678
Total        4,972     7,944     17,025     4,800     3,800         10,080          23,140

Figure 4: Examples of correct and incorrect images. Images labeled (a) and (b) are correct, while (c) and (d) are incorrect.

3.5 Experiment 1: Machine Learning Methods Other Than Deep Learning

To create classifiers, we used Weka (Frank et al., 2022), whose easy-to-understand interface allows us to combine many classifiers and easily change hyperparameters. It also outputs various evaluation results, such as confusion matrices, making it easy to compare classifiers. We created classifiers to discriminate concise math expressions using image features. The basic features were “file size in bytes,” “image width in pixels,” and “image height in pixels.” Furthermore, “density” (Eq. (2)), “aspect ratio” (Eq. (3)), and “gray value” (Eq. (4)) were used:

$$\text{density} = \frac{\text{file size}}{\text{width} \cdot \text{height}}, \tag{2}$$

$$\text{aspect ratio} = \frac{\text{width}}{\text{height}}, \tag{3}$$

where width is the number of width pixels and height is the number of height pixels, and

$$\text{gray} = |R - G| + |G - B| + |B - R|, \tag{4}$$

where R is the red component of the image, G is green, and B is blue. Eq. (4) is used because some math expression images may appear to be gray but actually have a color component. If the gray value is 0, then the image is gray. For the “+gray value” entries in Table 3, an image is counted as correct only when the classifier built with features other than the gray value predicts it as correct and its gray value is 0. We performed 10-fold cross-validation on every classifier and searched for hyperparameters. For a multilayer perceptron, we varied the number of hidden layers, and for a random subspace (Ho, 1998), which is an ensemble learning method, we tried several base classifiers. For an SVM (Chang and Lin, 2011), we used a nonlinear SVM with an RBF kernel and searched for the values of c and γ.
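The derived features of Eqs. (2) to (4) and the “+gray value” rule can be sketched as follows. The function names are hypothetical, and the sketch assumes that R, G, and B are already available as one number per channel for the whole image. How the channels are aggregated is not specified in the text.

```python
def density(file_size: int, width: int, height: int) -> float:
    """Eq. (2): bytes per pixel."""
    return file_size / (width * height)


def aspect_ratio(width: int, height: int) -> float:
    """Eq. (3): width divided by height."""
    return width / height


def gray_value(r: float, g: float, b: float) -> float:
    """Eq. (4): zero exactly when the three components are equal."""
    return abs(r - g) + abs(g - b) + abs(b - r)


def apply_gray_rule(predicted_correct: bool, gray: float) -> bool:
    """'+gray value': keep a positive prediction only for gray images."""
    return predicted_correct and gray == 0


features = {
    "density": density(2000, 100, 20),
    "aspect_ratio": aspect_ratio(100, 20),
    "gray": gray_value(10, 12, 15),
}
```

Under this reading, the “+gray value” rule acts as a post-filter on the classifier output rather than as an additional input feature, which is how it differs from the plain “gray value” rows in Table 3.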

In Table 3, methods B and J were the best, so we evaluated both using test data D_1_tst in Table 2. As a result, both achieved an F-measure of 0.628 using an SVM with c = 7000 and γ = 6. Comparing the two, the numbers of true positives and true negatives were almost the same, but B had fewer false positives and more false negatives, and J had exactly the opposite.

Table 3: Creating classifiers using machine learning methods other than deep learning. Top results for each feature set.

Method   Feature                                               Classifier             F-measure   Rank
A        basic features                                        SVM                    0.654       6
B        basic features, density                               SVM                    0.677       1
C        basic features, aspect ratio                          SVM                    0.661       4
D        basic features, density, aspect ratio                 SVM                    0.669       3
E        basic features, gray value                            MultilayerPerceptron   0.638       9
F        basic features, density, gray value                   RandomSubSpace         0.649       8
G        basic features, aspect ratio, gray value              MultilayerPerceptron   0.635       10
H        basic features, density, aspect ratio, gray value     RandomSubSpace         0.653       7
I        basic features, +gray value                           SVM                    0.658       5
J        basic features, density, +gray value                  SVM                    0.677       1
K        basic features, aspect ratio, +gray value             SVM                    0.623       12
L        basic features, density, aspect ratio, +gray value    SVM                    0.626       11

3.6 Experiment 2: CNN Methods

We created CNNs to perform binary classification that discriminates between concise math expression images and other images, using Keras (Chollet et al., 2022) with a TensorFlow backend. Previously, a large amount of image data was required for deep learning, but recently, trained models for transfer learning have been widely released and are generally available. We used “data augmentation,” “transfer learning,” and “fine tuning” and examined the models provided by Keras. Data augmentation is a common operation that increases the amount of data to avoid the overfitting caused by having too little data. Random rotation, translation, equal-volume deformation, zooming, and horizontal inversion of half of the images were performed. The transfer learning models were pre-trained on ImageNet. The number of epochs for fine tuning was set to 100, but only NASNetMobile was stopped early to avoid overfitting. Table 4 shows the results on the final test data, using the data in Table 2. First, as a baseline, we constructed method M with four Conv2D layers and four MaxPooling2D layers. The activation function is the sigmoid function, and the loss function is binary cross entropy. We examined eight different patterns for each model, formed by three choices: whether to use D_2_val_bal (balanced data) or D_2_val_imbal (imbalanced data) as validation data, whether to use data augmentation, and whether to use fine tuning. As a result, method N was found to be the best.
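The eight patterns per model are the Cartesian product of three binary choices. The sketch below only enumerates the configurations. The dictionary keys are hypothetical names, and no training code is implied.

```python
from itertools import product

validation_sets = ("D_2_val_bal", "D_2_val_imbal")
augmentation_options = (False, True)
fine_tuning_options = (False, True)

# One dictionary per pattern: 2 x 2 x 2 = 8 combinations.
patterns = [
    {"validation": v, "augmentation": a, "fine_tuning": f}
    for v, a, f in product(validation_sets, augmentation_options, fine_tuning_options)
]

# The configuration reported for method N (VGG16) in Table 4.
method_n = {"validation": "D_2_val_bal", "augmentation": False, "fine_tuning": True}
```

Table 4 lists only the best of the eight patterns for each model, so a full run would evaluate every entry of `patterns` per model before selecting one.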

3.7 Experimental Results

Although the classifiers in Experiment 1 showed varied performance, method N (Table 4) was the best across Experiments 1 and 2. Its conditions were (1) use VGG16 (Simonyan and Zisserman, 2014) as the transfer learning model, (2) use balanced data as validation data, (3) do not use data augmentation, and (4) use fine tuning. Thus, the design of our math expression filter was completed.

4 APPLICATION

We applied the experimental results to the task of the previous study (Yamada et al., 2018). The task performs web searches using a math term as a query and presents the top ten math expressions related to the term with their surrounding information. A score is given to each expression i by Eq. (5):

$$\text{score}(i) = x_{\text{line}} + x_{\text{key}} + x_{\text{svm}} + x_{\text{bonus}}, \tag{5}$$

where $x_{\text{line}} = 1$ if i is on a separate line, 0 otherwise; $x_{\text{key}} = 1$ if i has a keyword within a window of −200 to +200 characters, 0 otherwise; $x_{\text{svm}} = 1$ if i is discriminated as positive by the SVM, 0 otherwise; and $x_{\text{bonus}} = 1$ if i has a perfect score so far and appears first in the same web document, 0 otherwise. The classifier is a nonlinear SVM with an RBF kernel. In the previous study, the correct image is a math expression image related to the keyword, which differs from this study. The evaluation metrics are the F-measure (Eq. (1)), where precision = r/n and recall = r/c, Mean Reciprocal Rank (MRR) (Eq. (6)), and Mean Average Precision (MAP). MAP is the macro mean of the Average Precision (AP) (Eq. (7)):

$$\text{MRR} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{r_k}, \tag{6}$$

$$\text{AP} = \frac{1}{\min(n, c)} \sum_{s=1}^{n} I(s)\,\text{Prec}(s), \tag{7}$$

where r is the number of correct images in the top n, c is the total number of correct images, K is the number of keywords, $r_k$ is the rank of the highest-ranked correct image for the k-th keyword, I(s) is a flag indicating whether the image at rank s is correct, and Prec(s) is the precision at rank s.
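The two ranking metrics can be sketched as below, with each ranked list given as a sequence of correctness flags. The function names and the sample lists are illustrative assumptions. Treating a keyword with no correct image as contributing a reciprocal rank of 0 is also an assumption, since the text does not cover that case.

```python
def reciprocal_rank(flags) -> float:
    """1 / rank of the first correct image, or 0.0 if there is none."""
    for rank, is_correct in enumerate(flags, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(flags_per_keyword) -> float:
    """Mean of the reciprocal ranks over all keywords."""
    return sum(reciprocal_rank(f) for f in flags_per_keyword) / len(flags_per_keyword)


def average_precision(flags, total_correct: int, n: int = 10) -> float:
    """Sum of I(s) * Prec(s) over the top n, divided by min(n, c)."""
    denominator = min(n, total_correct)
    if denominator == 0:
        return 0.0
    hits = 0
    accumulated = 0.0
    for s, is_correct in enumerate(flags[:n], start=1):
        if is_correct:
            hits += 1
            accumulated += hits / s  # Prec(s) at a correct position
    return accumulated / denominator


# Illustrative rankings for two keywords (True = correct image).
rankings = [[True, False, True, False], [False, True]]
mrr = mean_reciprocal_rank(rankings)
ap = average_precision(rankings[0], total_correct=2, n=4)
```

MAP is then the plain mean of `average_precision` over all keywords.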

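We reranked the expressions by replacing the $x_{\text{svm}}$ values in Eq. (5) with the outputs of our filter. Table 5 shows that our filter improved all three metrics at the top ten: the F-measure rose from 0.40 to 0.55, MRR from 0.84 to 0.86, and MAP from 0.42 to 0.45.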
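The scoring of Eq. (5) and the reranking step can be sketched as follows. The `Candidate` type, its field names, and the sample data are hypothetical. Reading “a perfect score so far” as all three preceding terms being 1 is an interpretation rather than something the text states.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    image_id: str
    on_separate_line: bool     # x_line
    keyword_in_window: bool    # x_key (within -200 to +200 characters)
    classifier_positive: bool  # x_svm, later replaced by the filter output
    first_in_document: bool    # used only for x_bonus


def score(c: Candidate) -> int:
    """Eq. (5): sum of the three binary terms plus the bonus term."""
    base = int(c.on_separate_line) + int(c.keyword_in_window) + int(c.classifier_positive)
    bonus = 1 if base == 3 and c.first_in_document else 0
    return base + bonus


def top_k(candidates, k: int = 10):
    """Rank by score, highest first; ties keep their input order."""
    return sorted(candidates, key=score, reverse=True)[:k]


candidates = [
    Candidate("a", True, True, False, True),
    Candidate("b", True, True, True, True),
    Candidate("c", False, True, True, False),
]
original_order = [c.image_id for c in top_k(candidates)]

# Reranking: swap the SVM decision for the filter's decision (made-up values).
filter_output = {"a": True, "b": False, "c": True}
reranked = top_k([c._replace(classifier_positive=filter_output[c.image_id])
                  for c in candidates])
reranked_order = [c.image_id for c in reranked]
```

Because only one term of the score changes, the other signals (line position and keyword proximity) continue to contribute exactly as in the previous study.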

5 DISCUSSION

For our data, data augmentation should generally be avoided because none of the math images on the web are tilted or upside down. However, since the sizes of the image margins vary and some of the images are unclear after preprocessing (α channel removal and so on), data augmentation might help in these cases. We plan to study data augmentation for such cases in future work. As for the validation data when dealing with imbalanced data, Table 4 does not show which choice was better, balanced or imbalanced data. For the application, since the original classifier discriminated whether an image was related to a given keyword, the same image could be correct or incorrect depending on its context. The ability of our filter to identify correct images regardless of context led to the improvement.


Table 4: Creating classifiers using CNNs. Top results for each model, where “+” means used and “−” means not used.

Method   Transfer learning model   Validation data   Data augmentation   Fine tuning   F-measure   Rank
M        baseline                  D_2_val_imbal     −                   −             0.758       10
N        VGG16                     D_2_val_bal       −                   +             0.819       1
O        DenseNet169               D_2_val_imbal     −                   +             0.807       2
P        VGG19                     D_2_val_imbal     −                   +             0.804       3
Q        DenseNet201               D_2_val_bal       −                   +             0.803       4
R        DenseNet121               D_2_val_bal       −                   +             0.801       5
S        ResNet50                  D_2_val_imbal     +                   +             0.801       5
T        InceptionResNetV2         D_2_val_bal       +                   +             0.795       7
U        MobileNet                 D_2_val_imbal     +                   +             0.794       8
V        MobileNetV2               D_2_val_bal       +                   +             0.782       9
W        Xception                  D_2_val_imbal     +                   +             0.753       11
X        InceptionV3               D_2_val_imbal     −                   +             0.747       12
Y        NASNetMobile              D_2_val_imbal     −                   +             0.741       13

Table 5: Comparison of evaluation metrics@10.

                 F-measure   MRR    MAP
Previous study   0.40        0.84   0.42
Using filter     0.55        0.86   0.45

6 CONCLUSION

We developed a math expression filter to extract concise math expression images. We determined the conditions satisfied by a concise math expression and, based on these conditions, developed classifiers that use supervised machine learning to discriminate concise math expression images from other web images. To investigate our filter’s performance, we applied it to a task that presents math expression images and obtained better results than the previous study.

REFERENCES

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York, NY.

Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27.

Chollet, F. et al. (2022). Keras. https://github.com/fchollet/keras.

Davila, K. and Zanibbi, R. (2017). Layout and semantics: Combining representations for mathematical formula search. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1165–1168.

Deng, Y., Kanervisto, A., Ling, J., and Rush, A. M. (2017). Image-to-markup generation with coarse-to-fine attention. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), pages 980–989.

Dey, A. and Zanibbi, R. (2021). ScanSSD-XYc: Faster detection for math formulas. In Document Analysis and Recognition (ICDAR 2021), volume 12916 of LNCS, pages 91–96. Springer.

Frank, E., Hall, M. A., and Witten, I. H. (2022). The WEKA workbench. Online appendix for “Data Mining: Practical Machine Learning Tools and Techniques”. https://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016_appendix.pdf.

Greiner-Petter, A., Schubotz, M., Müller, F., Breitinger, C., Cohl, H. S., Aizawa, A., and Gipp, B. (2020). Discovering mathematical objects of interest - a study of mathematical notations. In Proceedings of the Web Conference 2020 (WWW ’20), pages 1445–1456.

Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844.

Ohashi, S., Kristianto, G. Y., Topić, G., and Aizawa, A. (2016). Efficient algorithm for math formula semantic search. IEICE Transactions on Information and Systems, 99(4):979–988.

Peng, S., Gao, L., Yuan, K., and Tang, Z. (2021). Image to LaTeX with graph neural network for mathematical formula recognition. In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021), volume 12822 of Lecture Notes in Computer Science, pages 648–663. Springer.

Schubotz, M., Krämer, L., Meuschke, N., Hamborg, F., and Gipp, B. (2017). Evaluating and improving the extraction of mathematical identifier definitions. In Proceedings of Experimental IR Meets Multilinguality, Multimodality, and Interaction - 8th International Conference of the CLEF Association (CLEF 2017), volume 10456 of Lecture Notes in Computer Science, pages 82–94. Springer, Cham.

Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015). https://arxiv.org/abs/1409.1556.

Suzuki, T. and Fujii, A. (2017). Mathematical document categorization with structure of mathematical expressions. In Proceedings of ACM/IEEE Joint Conference on Digital Libraries (JCDL 2017), pages 119–128.

Wang, Z. and Liu, J.-C. (2016). Translating math formula images to LaTeX sequences using deep neural networks with sequence-level training. International Journal on Document Analysis and Recognition, 24(1):63–75.

Yamada, K. and Murakami, H. (2020). Mathematical expression retrieval in PDFs from the web using mathematical term queries. In Proceedings of Trends in Artificial Intelligence Theory and Applications. Artificial Intelligence Practices - 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE 2020), volume 12144 of Lecture Notes in Computer Science, pages 155–161. Springer, Cham.

Yamada, K., Ueda, H., Murakami, H., and Oka, I. (2018). Mathematical expression image retrieval on web using mathematical terms as queries. Transactions of the Japanese Society for Artificial Intelligence, 33(4):A-H91 1–13.

Yuan, K., He, D., Jiang, Z., Gao, L., Tang, Z., and Giles, C. L. (2020). Automatic generation of headlines for online math questions. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), pages 9490–9497.

Zanibbi, R. and Blostein, D. (2012). Recognition and retrieval of mathematical expressions. International Journal on Document Analysis and Recognition, 15(4):331–357.

Zhong, W. (2022). Approach Zero. https://approach0.xyz/search/.

Zhong, W., Rohatgi, S., Wu, J., Giles, C. L., and Zanibbi, R. (2020). Accelerating substructure similarity search for formula retrieval. In Proceedings of Advances in Information Retrieval - 42nd European Conference on IR Research (ECIR 2020), volume 12035 of Lecture Notes in Computer Science, pages 714–727. Springer.