Towards a Novel Edge AI System for Particle Size Detection in Mineral Processing Plants

Flávio Wellb Cardoso (1,2), Mateus Coelho Silva, Natália F. de C. Meira, Ricardo Augusto Rabelo Oliveira (1,3) and Andrea G. Campos Bianchi (1,3)

(1) Graduate Program in Instrumentation, Control and Automation of Mining Processes, Instituto Tecnológico Vale, Federal University of Ouro Preto, Ouro Preto, Brazil

(2) VALE S.A., Nova Lima, Brazil

(3) Department of Computer Science, Federal University of Ouro Preto, Ouro Preto, Brazil

Keywords: Edge AI, Particle Size Detection, Cloudlets, Mask R-CNN.

Abstract: Monitoring and controlling particle size is essential to reducing variability and optimizing energy efficiency in mineral processing plants. The industry standard relies on laboratory processes for particle size characterization; the problems that arise are obtaining a representative sample from the bulk and finding a rapid method of particle size assessment. We propose a machine vision concept based on an Edge AI architecture and deep convolutional neural algorithms to enable real-time analysis of particle size, as an alternative to offline laboratory processes. The present paper is part of this proposed concept and aims exclusively to validate a deep convolutional neural network algorithm trained on synthetic datasets. The proposed model reached a mean Average Precision (mAP) of 0.96 and processing times of less than 1 s. The results demonstrate the feasibility of deep convolutional neural networks for real-time particle size segmentation and establish the first step towards a novel Edge AI system for particle size measurement in mineral processing plants.

1 INTRODUCTION

In mineral processing, analyzing the particle size

of products from operating units is fundamental for

determining the quality and optimizing energy efﬁ-

ciency. In comminution

operations, for example, the

main objective is to release the valuable minerals from

their waste gangue

in the largest possible size to in-

crease energy efﬁciency and reduce ﬁnes (Wills and

Finch, 2015). Necessary plant operation and control

changes are made based on the particle size analy-

sis results, either by an operator or an automatic con-

troller (e.g., (Coetzee et al., 2009)).

There are several instruments and methods for lab-

oratory particle size analysis. Some examples of these

methods are sieving, laser diffraction, microscopy,

gravitational sedimentation, and centrifugal sedimentation (Allen, 2013). However, laboratory characterization of particle sizes is time-consuming, requires expensive bulk sampling equipment, and has long response times, which limits particle size analysis at all critical points, especially in existing plants.

Comminution: the process of reducing the size of ore particles by crushing or grinding.

Gangue: mineral without economic value.

New methods, systems, and equipment have been

developed as an alternative to laboratory analysis for

particle size characterization. These applications aim to allow real-time, non-intrusive, and low-cost analysis (Okada et al., 2020; Al-Thyabat et al., 2007; Thurley and Ng, 2008).

Figure 1: Conceptual Architecture.

Cardoso, F., Silva, M., Meira, N., Oliveira, R. and Bianchi, A.
Towards a Novel Edge AI System for Particle Size Detection in Mineral Processing Plants.
DOI: 10.5220/0011748000003467
In Proceedings of the 25th International Conference on Enterprise Information Systems (ICEIS 2023) - Volume 1, pages 312-323
ISBN: 978-989-758-648-4; ISSN: 2184-4992
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

Although signiﬁcant advances have been made

in online particle size measurement by computer vi-

sion techniques, challenges to improve accuracy and

reduce prediction errors remain in the mineral pro-

cess due to irregular, noisy, and fuzzy particle ore

images caused by adverse environmental conditions,

e.g., dust and uneven lighting. Moreover, current

market solutions are costly and depend on specific embedded hardware and closed-source software (e.g., Guyot et al., 2004).

As an alternative to overcome these previous re-

lated challenges, we propose a conceptual architec-

ture design to enable centralized processing of arti-

ﬁcial intelligence algorithms, displayed in Figure 1.

This concept aims to enable online particle size analy-

sis, by computer vision and deep convolutional neural

networks, at all critical points of mineral processing

plants. This approach tends to reduce on-site hard-

ware, easily integrate with process control and qual-

ity management systems, reduce maintenance costs,

increase processing capacity, reduce vulnerability to

obsolete hardware, and simplify technology upgrades.

Toward the proposed concept, the present paper

aims to validate the deep convolutional neural algo-

rithm as the ﬁrst step for online particle size analysis

based on Edge AI. The advances of this research will

include identifying hardware and software require-

ments to process real-time video images in a central-

ized cloudlet unit by these artiﬁcial intelligence algo-

rithms.

To investigate the feasibility of using a convolu-

tional neural algorithm to segment iron ore particles in

complex images, we deployed the Mask R-CNN (He et al., 2017) algorithm, considered a state-of-the-art

algorithm for object detection and instance segmenta-

tion. The objective is to evaluate the model’s accuracy

to segment iron ore particles from primary crusher

system images. In addition, we propose a synthetic dataset for training and model validation as an alternative to manual data labeling, an exhausting and time-consuming operation given the large volume of data required for training.

This paper is organized as follows: Section 2

presents the theoretical references used to complete

this work. In Section 3, we overview the most rele-

vant related works recently found in the bibliography.

Section 4 presents the experimental methodology ap-

plied to validate the convolutional neural network al-

gorithm. In Section 5, we show the results obtained

by our approach. Finally, in Section 6 we discuss the

results achieved and future improvements.

2 THEORETICAL BACKGROUND

The fundamental concept for developing an applica-

tion with the desired features is Edge AI. This con-

cept relates to several authors’ practices in uniting

edge computing with artiﬁcial intelligence (AI) algo-

rithms to develop solutions for real-world problems

(Wang et al., 2020b). Thus, the theoretical references

initially cover understanding AI and edge computing

and how they can work together toward Edge AI ap-

plications.

2.1 Edge Computing

In this ﬁrst subsection of the theoretical references,

we provide a deﬁnition and evaluation of the concepts

within edge computing. We start from the perspective

displayed by Khan et al. (Khan et al., 2019). Accord-

ing to these authors, edge computing is a computing

landscape concept that brings cloud utilities closer to

the edge. Both Khan et al. and Wang et al. (Wang

et al., 2020b) recognize three main edge computing

perspectives:

• Cloudlets: Within this perspective, infrastructures

with large computing power provide virtualization

of cloud-like services;

• Fog computing: This perspective describes the us-

age of distributed computing to virtualize cloud

services;

• Mobile Edge Computing: In this case, the services

run within isolated edge computing networks or

environments.

Although these concepts describe a set of het-

erogeneous applications, they provide the central

premise of edge computing: reducing the latency of

a provided service and bringing the processing power

closer to the edge. Shi and Dustdar (Shi and Dustdar,

2016) emphasize that novel interconnected technologies

and perspectives such as the Internet of Things (IoT)

have impacted the need for novel edge-based solu-

tions.

2.2 Artiﬁcial Intelligence (AI)

While edge computing is a relatively new concept,

artiﬁcial intelligence (AI) has been discussed for

decades. McCarthy (McCarthy, 2007) describes AI as

the science and technology to create novel intelligent

machines. In his concept, intelligence is the ability

to learn how to solve a real-world problem through

algorithms.

Shinde and Shah (Shinde and Shah, 2018) conceptualize AI as the capability of creating what they describe as intelligent agents. These agents are capable

of maximizing their success opportunities by analyz-

ing environmental variables.

The set of algorithms described by these rules

is named machine learning (El Naqa and Murphy,

2015), while the latest set of algorithms within this

perspective is named deep learning (LeCun et al.,

2015).

2.3 Edge AI

There is an increasing interest in uniting the latest

deep learning applications within the context of edge

computing (Deng et al., 2020). Although AI often

uses all possible resources from the device, there is an

interest in bringing these services closer to the edge,

where computing resources can be limited.

This edge computing and machine learning con-

vergence is named Edge AI (Li et al., 2019). It is also

referred to in the literature as edge intelligence (Deng

et al., 2020), In-edge AI (Wang et al., 2019), or treated

without speciﬁc names in other works. This comput-

ing perspective requires aspects from edge comput-

ing, distributed computing, and AI. Some important

constraints in this case are hardware acceleration for

AI (Li et al., 2019) and communication (Shi and Dust-

dar, 2016).

2.4 Mask R-CNN

The Mask Regional Convolutional Neural Network (Mask R-CNN) is a meta-algorithm concept proposed by He et al. (He et al., 2017), a member of the family of region-based methods for object detection. In

general terms, Mask R-CNN uses the framework of

Faster R-CNN (Ren et al., 2015), adding a branch

for object mask prediction, being executed in paral-

lel with the function of classiﬁcation and delimitation

of this object by the bounding box. Figure 2 shows

the basic framework of Mask R-CNN.

The ﬁrst stage of Mask R-CNN is structured with

convolution layers, using a backbone composed of a

ResNet101 (He et al., 2015) and a Feature Pyramid

Network (FPN) (Lin et al., 2016) that will output a

high-resolution feature map. Then, a Region Proposal Network (RPN) proposes Regions of Interest (RoIs) from the feature map, ranking their relevance according to the backbone results.

The result of the RPN is a variable number of

RoIs, which in turn will have different dimensions

between them, according to the quantity and size of

the detected objects. This way, Mask R-CNN resizes the RoIs for the second stage, adjusting them to the dimensions established by the following convolution layers. At this point, the concept proposed by Mask R-CNN brings an essential improvement over the solution usually applied in Faster R-CNN, using the Region of Interest Align (RoIAlign) (He et al., 2017) method instead of Region of Interest Pooling (RoIPool) (Girshick, 2015) to resize the RoIs. The fundamental difference between the two methods is that RoIAlign maintains the spatial alignment between the pixels of the input and the resized output. This preserves important spatial information and, according to He et al., improves mask accuracy by a relative 10% to 50%.

Figure 2: Mask R-CNN Framework (He et al., 2017).
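The difference between the two resizing strategies can be illustrated on a single sampling point. The following is a minimal sketch, not the Mask R-CNN implementation: the function names are hypothetical, the feature map is a plain list of rows, and a real RoIAlign layer samples several points per output bin and pools them.

```python
import math

def bilinear_sample(feature, y, x):
    """RoIAlign-style lookup: interpolate a 2D feature map at a real-valued (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the coordinate to the valid range of the feature map.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def quantized_sample(feature, y, x):
    """RoIPool-style lookup: the coordinate is rounded down to a cell index."""
    return feature[int(math.floor(y))][int(math.floor(x))]

feature_map = [[0.0, 1.0],
               [2.0, 3.0]]
aligned = bilinear_sample(feature_map, 0.5, 0.5)     # blends all four neighbours
quantized = quantized_sample(feature_map, 0.5, 0.5)  # snaps to the top-left cell
```

At the point (0.5, 0.5) the interpolated value reflects all four neighbouring cells, while the quantized lookup discards the fractional offset; that discarded offset is the misalignment RoIAlign is designed to avoid.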

The second stage then starts after the RoIAlign

layer, having two distinct branches—the ﬁrst consists

of two Fully Convolutional Network (FCN) (Long

et al., 2014) layers being applied to each RoI. As the

output of this branch, we will have a binary mask

segmenting the object of interest at the pixel level.

The second branch performs the object classiﬁcation

and generates the bounding boxes, using a regression

method, exactly as performed in Faster R-CNN. The

concept of separate branches for classiﬁcation and

segmentation is somewhat of a paradigm break where

usually an object is ﬁrst segmented to be classiﬁed.

However, He et al. consider this a critical point be-

cause, in the loss functions deﬁned, the segmentation

process does not compete with the classiﬁcation.

3 RELATED WORKS

In this section, we display some works related to the

solution proposed in this paper. Mainly, we study

techniques that employ novel computing technologies

to perform the same task. We concentrated this sec-

tion on understanding how other authors use the Mask

R-CNN in materials science and mining. Also, we

overviewed other deep learning techniques employed

in the same areas.


3.1 Mask R-CNN Applied to Materials Science and Mining

Huang et al. (Huang et al., 2020) implemented the

Mask R-CNN for stockpile aggregate segmentation.

The authors manually labeled 164 images (11,795 ag-

gregates) for training and 20 images for validation.

The dataset organized by these authors contained ag-

gregates of various geological origins and different

sizes, colors, textures, and viewing angles. Detection

precision was higher than 87% for all test instances.

Maxwell et al. (Maxwell et al., 2020) combined

LiDAR to obtain digital terrain data and the Mask

R-CNN algorithm to extract valley ﬁll faces (VFFs),

which are a product of mountaintop removal (MTR)

coal mining. The precision, recall, and F1-score met-

rics were above 0.85.

Iyas et al. (Iyas et al., 2020) implemented the

Mask R-CNN algorithm to identify rock-forming

minerals in images. The authors tested four models,

modifying the backbone architecture (ResNet-50 and

ResNet-101) and the inﬂuence of the lighting on a po-

larizing microscope. The AP metric was higher for

model B (58.0%), which used the lighting on a polar-

izing microscope and ResNet-101 backbone.

Dong et al. (Dong et al., 2021) proposed a new al-

gorithm for segmenting mineral images of deep sea

nodules based on Mask R-CNN. The authors com-

pared it to other deep learning models such as U-Net

and Generative Adversarial Network. The experimen-

tal results showed that the method based on Mask R-

CNN obtained the best results for accuracy, recall,

and IoU. U-Net obtained the best result for the pre-

cision metric.

Yang et al. (Yang et al., 2021) developed a sys-

tematic tool to identify and evaluate images of parti-

cles and granular materials that the authors called im-

ages of cobble and ballast. First, they used the Mask

R-CNN to implement instance segmentation and ex-

tract the particle contour. Next, the authors used com-

putational techniques and statistical analysis to evalu-

ate these materials’ elongation, angularity, and rough-

ness.

Zhang et al. (Zhang et al., 2019) implemented the Mask R-CNN to segment nanoparticles in images obtained with a Transmission Electron Microscope. The authors then fitted the contour of each nanoparticle with a circle and measured its size based on that fit.

3.2 Deep Learning Applied to Object Segmentation in Materials Science and Mining

Detecting and segmenting particles and agglomerates

is a challenge for several areas, especially in materials

science, mining and metallurgy. In addition to particle

detection, performing the granulometric distribution

of these materials is often necessary.

The work by Bukharev et al. (Bukharev et al., 2018) sought to develop a method for segmenting mineral grains in thin-section images of sandstone. The images were obtained using an optical microscope, and the authors trained the model on a sample of 9,000 instances. The task consisted of segmenting quasi-convex objects without occlusions. The authors implemented an algorithm based on a cascade of two fully convolutional neural networks (FCNN) and used 10-fold validation to test the model's quality. The AP metric was above 76%.

Bamford et al. (Bamford et al., 2021) evaluated the size of rock fragments from blast operations in mines using deep learning strategies. The dataset consisted of labeled images of blasted rock fragments, 61,853 for training and 1,263 for validation. The authors implemented a base architecture of ResNet50. When evaluated on the test set, the percent error for coarse feature size prediction was within ±25%.

Wang et al. (Wang et al., 2020a) used transfer

learning and implemented the Mask R-CNN algo-

rithm to monitor open pit mines. The images came

from satellite shots. The authors used the ResNet101

network as a backbone.

Frei and Kruis (Frei and Kruis, 2021) introduced the new FibeR-CNN architecture, which combines two established R-CNN architectures (Mask and Keypoint R-CNN) to automate the task of analyzing fiber-shaped materials. FibeR-CNN surpassed Mask R-CNN's AP metric by 11 percentage points.

3.3 Section Remarks

These works describe applications similar to the pro-

posal of this article, emphasizing that implementa-

tions based on deep learning are intensely used as

an alternative to conventional methods of computer

vision in materials science, mining, and metallurgy.

However, they differ in application focus or technique

compared to our work. In addition, the industrial en-

vironment consists of a complex scenario, and the im-

ages present variability, such as occlusion, complex

background, rotation, lighting changes, image resolu-

tion, and noise (de C Meira et al., 2022).


Validating models and applications as proof of

concept in these case studies is still challenging. Our

work, therefore, contributes to the advancement of ap-

plications in this area.

4 EXPERIMENTAL METHODOLOGY

This section presents the experimental methodology

suggested for validating the convolutional neural al-

gorithm. For this, we present the dataset develop-

ment method, the model training strategy, and the ap-

plied evaluation metrics. A schematic diagram of the

methodology is presented in Figure 3.

Figure 3: Experimental Methodology Flowchart.

Figure 4: Primary Crusher System.

The study was developed from digital images obtained from a Primary Crushing System, shown in Figure 4. The images originate from the primary jaw crusher feed screen, whose ore fragments are in the granulometric range between 200 mm and 1000 mm. Figure 5 displays a flowchart that represents this process.

Figure 5: Primary Crusher System.


4.1 Dataset

Particular attention was paid to dataset development. As mentioned before, training a neural network requires a large volume of data, and the effort of manually labeling that data has usually made it difficult to obtain models with high accuracy.

The synthetic dataset for training the network was developed from real images of the application, obtained from videos recorded at the place of interest, and an algorithm. We propose this approach as an alternative to manually labeling data. The algorithm randomly selected objects from a defined set of foregrounds (iron ore particles) and overlaid them at random positions on the background. During the overlay, rotation, scale, and brightness transformations were applied to the objects.
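The compositing idea can be sketched in a few lines. This is an illustrative simplification and not the adapted cocosynth code: images are grayscale lists of rows, a fragment pixel of 0 is treated as transparent, rotation is restricted to multiples of 90 degrees, scaling is omitted, and all function names are hypothetical.

```python
import random

def rotate90(fragment, turns):
    """Rotate a fragment clockwise by multiples of 90 degrees."""
    for _ in range(turns % 4):
        fragment = [list(row) for row in zip(*fragment[::-1])]
    return fragment

def paste_fragment(image, instance_mask, fragment, top, left, instance_id, gain):
    """Overlay the non-transparent pixels of a fragment and record them in the mask."""
    for dy, row in enumerate(fragment):
        for dx, value in enumerate(row):
            if value:  # 0 marks a transparent pixel
                image[top + dy][left + dx] = min(255, int(value * gain))  # brightness change
                instance_mask[top + dy][left + dx] = instance_id          # later pastes occlude earlier ones

def compose(background, fragments, n_objects, rng):
    """Return a synthetic image and its instance mask (0 = background)."""
    image = [row[:] for row in background]
    mask = [[0] * len(row) for row in background]
    for instance_id in range(1, n_objects + 1):
        frag = rotate90(rng.choice(fragments), rng.randrange(4))
        # Random position chosen so that the fragment lies fully inside the image.
        top = rng.randrange(len(image) - len(frag) + 1)
        left = rng.randrange(len(image[0]) - len(frag[0]) + 1)
        paste_fragment(image, mask, frag, top, left, instance_id, rng.uniform(0.8, 1.2))
    return image, mask

rng = random.Random(0)
background = [[10] * 8 for _ in range(8)]
fragments = [[[200, 200], [200, 0]], [[150, 150, 150]]]
image, mask = compose(background, fragments, n_objects=3, rng=rng)
```

Because the mask is written at the same time as the pixels, every generated image comes with pixel-accurate labels at no extra annotation cost, which is the property the synthetic dataset relies on.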

Using this approach, from a set of 100 objects

(fragments) and two backgrounds, the algorithm gen-

erated 7,000 images for training and 3,000 for valida-

tion (Table 1). Figure 6 presents an example of the

synthetic image and its respective mask generated by

the algorithm.

Table 1: Dataset composition.

             Number of images   Annotated regions
Training     7,000              31,406
Validation   3,000              13,379

Figure 6: Example of synthetic image and mask generated.

After composing the image, the algorithm generates the mask of the instantiated objects of each image and the labels in MS COCO dataset format. This technique makes it possible to generate a considerable number of synthetic images for the network training and validation stages.

The algorithm mentioned is an adaptation of open-source code available at https://github.com/akTwelve/cocosynth

Microsoft Common Objects in Context (MS COCO) is a large-scale image dataset.
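As an illustration of what a label in MS COCO format contains, the sketch below derives one annotation record from a binary instance mask. It is an assumption-laden example rather than the generator used in this work: the function name is hypothetical, and the segmentation is stored as uncompressed run-length encoding only because it is easy to compute without extra libraries, whereas COCO datasets commonly store polygons for individual instances.

```python
def mask_to_coco_annotation(mask, annotation_id, image_id, category_id):
    """Build one COCO-style annotation (bbox, area, uncompressed RLE) from a binary mask."""
    height, width = len(mask), len(mask[0])
    ys = [y for y in range(height) for x in range(width) if mask[y][x]]
    xs = [x for y in range(height) for x in range(width) if mask[y][x]]
    x_min, y_min = min(xs), min(ys)
    # COCO bounding boxes are [x, y, width, height].
    bbox = [x_min, y_min, max(xs) - x_min + 1, max(ys) - y_min + 1]

    # Uncompressed RLE: run lengths over the mask in column-major order,
    # starting with the number of leading zeros.
    counts, previous, run = [], 0, 0
    for x in range(width):
        for y in range(height):
            pixel = 1 if mask[y][x] else 0
            if pixel != previous:
                counts.append(run)
                previous, run = pixel, 0
            run += 1
    counts.append(run)

    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": bbox,
        "area": len(xs),
        "iscrowd": 0,
        "segmentation": {"size": [height, width], "counts": counts},
    }

mask = [[0, 1, 1],
        [0, 1, 0]]
annotation = mask_to_coco_annotation(mask, annotation_id=1, image_id=1, category_id=1)
```

The annotated-region counts in Table 1 correspond to the number of such records in the training and validation annotation files.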

4.2 Hyperparameters

The selection of hyperparameters is crucial to the efficacy of the model. Optimizing these values requires a comprehensive grasp of both the input data and the underlying model architecture, as well as a clear understanding of the desired outcome. Mask R-CNN was implemented from the Matterport repository available on GitHub, which allows the customization of 48 hyperparameters. Table 2 summarizes the values applied.

Table 2: Hyperparameter values adjusted in Mask R-CNN.

Variable name in Matterport implementation   Train 1 and Train 2   Train 3 and Train 4
LEARNING_RATE                                0.001                 0.001
BATCH_SIZE                                   1                     1
STEPS_PER_EPOCH                              100                   100
VALIDATION_STEPS                             10                    10
NUM_CLASSES                                  2                     2
IMAGE_MIN_DIM                                320                   320
IMAGE_MAX_DIM                                320                   320
BACKBONE                                     resnet50              resnet101
TRAIN_ROIS_PER_IMAGE                         16                    16
MAX_GT_INSTANCES                             10                    10

The main adjustments were:

• LEARNING_RATE: The learning rate was kept at its default value of 0.001. This value was deemed suitable as it did not result in instability during training, as demonstrated by the loss curves.

• BATCH_SIZE: refers to the number of samples used in a single training step. It was set to 1 to accommodate the GPU's memory constraints. A small batch size typically requires fewer computational resources to complete an epoch but a higher number of epochs to converge.

• STEPS_PER_EPOCH: The number of steps per epoch is typically determined by the size of the dataset in relation to the batch size. In this case, the dataset comprises 10,000 images and the batch size is set to 1, resulting in a maximum of 10,000 steps per epoch. However, this is not a practical option, as the loss value is only measured at the end of each epoch, which makes it difficult to track the progress of the model's training. To monitor the loss effectively, the number of steps per epoch was reduced to 100. This change required an increase in the number of epochs to ensure that the model had sufficient training time to converge and reach stability. With this adjustment, the model's training progress could be monitored more effectively, enabling fine-tuning for optimal performance.

https://github.com/matterport/Mask_RCNN

• IMAGE_MIN_DIM and IMAGE_MAX_DIM: responsible for controlling the resolution of the input image. The default value is 1024x1024. It was adjusted to 320x320 according to the dataset created in MS COCO format.

• BACKBONE: By default, Mask R-CNN uses ResNet101 as its backbone. For comparison purposes, training was done using ResNet101 and ResNet50. Since ResNet101 is deeper, it has higher memory consumption, longer processing time, and longer training time; however, it reaches higher accuracy. On the other hand, ResNet50 has lower memory consumption, shorter processing time, and shorter training time, but its accuracy is not as high as that of ResNet101.

• TRAIN_ROIS_PER_IMAGE: corresponds to the number of regions of interest generated per image. The default value is 32. It was reduced to 16 as the dataset used in this study only contains a maximum of 16 fragments per image. This adjustment was made with the aim of decreasing training time.

• MAX_GT_INSTANCES: corresponds to the maximum number of instances that can be detected in one image and was set to 10. If the number of instances in the images is limited, this can be set to the maximum number of instances that can occur in an image to reduce false positives and training time.
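The values in Table 2 can be collected in a single configuration object. The class below is a plain stand-in written for illustration; it does not import the Matterport code, and the experiment name is hypothetical. In the Matterport implementation the batch size is, to our understanding, derived from the number of GPUs and the images per GPU rather than set directly, which the sketch mirrors.

```python
class ParticleConfig:
    """Plain stand-in for a Matterport-style configuration (values from Table 2)."""
    NAME = "particles"          # hypothetical experiment name
    LEARNING_RATE = 0.001
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1          # batch size = GPU_COUNT * IMAGES_PER_GPU
    STEPS_PER_EPOCH = 100
    VALIDATION_STEPS = 10
    NUM_CLASSES = 2             # presumably background plus the particle class
    IMAGE_MIN_DIM = 320
    IMAGE_MAX_DIM = 320
    BACKBONE = "resnet50"       # "resnet101" for Train 3 and Train 4
    TRAIN_ROIS_PER_IMAGE = 16
    MAX_GT_INSTANCES = 10

    @property
    def batch_size(self):
        return self.GPU_COUNT * self.IMAGES_PER_GPU

config = ParticleConfig()
pixels_per_image = config.IMAGE_MIN_DIM * config.IMAGE_MAX_DIM
```

Keeping the two backbone variants as the only difference between configurations makes the comparison between Train 1 and 2 and Train 3 and 4 easier to reproduce.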

4.3 Training the Deep Learning Model

Two training sessions were performed for each backbone configuration (ResNet101 and ResNet50). The initial training ran for 100 epochs (with 100 steps per epoch), starting from weights pre-trained on MS COCO. In this step, only the head layers were trained as a transfer learning strategy, keeping the weights of the convolutional layers frozen.

After the initial training, a reﬁnement was per-

formed, training the model for another 100 epochs.

The ﬁne-tuning step performed the training of the en-

tire network, initializing with the weights obtained

from the initial training. Thus, the training of each

model was done in two steps, totaling 200 epochs,

with 100 steps per epoch. The hardware used was

an Intel Core i7-11800H@2.30GHz computer with

64GB RAM.
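A short calculation shows how much data the schedule covers. It assumes, for illustration only, that every training step consumes one batch of distinct images drawn from the 7,000-image training split in Table 1; the helper name is hypothetical.

```python
def samples_seen(epochs, steps_per_epoch, batch_size):
    """Number of training samples processed by a schedule."""
    return epochs * steps_per_epoch * batch_size

TRAIN_IMAGES = 7_000                      # training split size from Table 1

heads_only = samples_seen(epochs=100, steps_per_epoch=100, batch_size=1)
fine_tuning = samples_seen(epochs=100, steps_per_epoch=100, batch_size=1)
total = heads_only + fine_tuning
passes_over_training_set = total / TRAIN_IMAGES
```

Under these assumptions the two stages together process 20,000 samples, a little under three passes over the training images, so an "epoch" in this setup is a monitoring interval rather than a full pass over the data.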

4.4 Evaluation Metrics

In order to generate a graphical representation of the loss values, we used TensorBoard, a TensorFlow toolkit that allows monitoring and visualization of metrics such as loss and accuracy. Mask R-CNN uses three different types of losses, as defined in (1). The total loss ($L$) is calculated as the sum of the classification loss ($L_{cls}$), bounding box loss ($L_{box}$), and mask loss ($L_{mask}$) for each object instance in the input image.

$$L = L_{cls} + L_{box} + L_{mask} \tag{1}$$

Where:

$$L_{cls} = L_{cls(RPN)} + L_{cls(mask)}$$

$$L_{box} = L_{box(RPN)} + L_{box(mask)}$$

$$L_{mask} = L_{mask}$$
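The decomposition in (1) can be checked numerically against the component losses reported in Table 3. The sketch below uses the training values of the ResNet50 model at the 200th epoch; the function name is hypothetical and the dictionary keys follow the variable names listed in that table.

```python
def total_loss(components):
    """Sum the five logged component losses according to (1)."""
    l_cls = components["rpn_class_loss"] + components["mrcnn_class_loss"]
    l_box = components["rpn_bbox_loss"] + components["mrcnn_bbox_loss"]
    l_mask = components["mrcnn_mask_loss"]
    return l_cls + l_box + l_mask

# Training losses of the ResNet50 model at the 200th epoch (Table 3).
resnet50_epoch200 = {
    "rpn_class_loss": 0.010,
    "mrcnn_class_loss": 0.063,
    "rpn_bbox_loss": 0.212,
    "mrcnn_bbox_loss": 0.106,
    "mrcnn_mask_loss": 0.160,
}
loss = total_loss(resnet50_epoch200)
```

The sum reproduces the total training loss of 0.551 reported for that model, which confirms that the logged total is the plain, unweighted sum of the five components.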

The mean Average Precision (mAP) metric was used to calculate the model's accuracy according to (2), where $N$ represents the number of classes. The Average Precision (AP) corresponds to the area under the Precision vs. Recall curve and can be defined as in (3), where $k$ indexes the $n$ points sampled on that curve. In turn, the Precision $p(k)$ and Recall $r(k)$ values are calculated according to (4) and (5), respectively. Precision measures the accuracy of predictions, while Recall measures the ability of a model to find all objects that should have been detected.

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{2}$$

$$AP = \sum_{k=1}^{n} p(k)\,[r(k) - r(k+1)] \tag{3}$$

$$p(k) = \frac{TP}{TP + FP} \tag{4}$$

$$r(k) = \frac{TP}{TP + FN} \tag{5}$$
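Equations (3) to (5) translate directly into code. The sketch below is illustrative: the function names are hypothetical, the precision-recall points are assumed to be ordered by decreasing recall, and the recall following the last point is taken to be zero, which is an assumption not stated in the text. The counts in the usage example are those of the ResNet50 model at the 200th epoch in Table 4.

```python
def precision(tp, fp):
    """Equation (4): fraction of detections that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Equation (5): fraction of labeled objects that were detected."""
    return tp / (tp + fn)

def average_precision(precisions, recalls):
    """Equation (3), with points ordered by decreasing recall.

    The recall following the last point is assumed to be 0.
    """
    ap = 0.0
    for k in range(len(precisions)):
        next_recall = recalls[k + 1] if k + 1 < len(recalls) else 0.0
        ap += precisions[k] * (recalls[k] - next_recall)
    return ap

# Counts reported for ResNet50 at the 200th epoch (Table 4).
p = precision(tp=438, fp=38)
r = recall(tp=438, fn=20)

# Toy two-point curve to exercise the summation.
toy_ap = average_precision(precisions=[0.5, 1.0], recalls=[1.0, 0.5])
```

Rounded to two decimals, the computed precision and recall match the 0.92 and 0.96 listed in Table 4 for that epoch.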

The classification of detections is defined according to the Intersection over Union (IoU) metric, presented in (6). If the IoU between the algorithm-generated bounding box and the labeled bounding box is greater than 0.5, the detection is classified as True Positive (TP). If it is less than 0.5, the detection is classified as False Positive (FP), and if the algorithm produces no detection for a labeled bounding box, that box is counted as a False Negative (FN).

$$IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}} \tag{6}$$
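The IoU rule can be sketched as follows. The box format (x1, y1, x2, y2), the function names, and the greedy one-to-one matching between predictions and labels are assumptions made for this example; the text only specifies the 0.5 threshold.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return overlap / (area_a + area_b - overlap)

def count_detections(predicted, labeled, threshold=0.5):
    """Return (TP, FP, FN) using greedy matching of predictions to labeled boxes."""
    unmatched = list(labeled)
    tp = fp = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) > threshold:
            tp += 1
            unmatched.remove(best)   # each labeled box is matched at most once
        else:
            fp += 1
    return tp, fp, len(unmatched)    # labeled boxes left unmatched are false negatives

labeled = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(0, 0, 10, 8), (50, 50, 60, 60)]
tp, fp, fn = count_detections(predicted, labeled)
```

In this toy case the first prediction overlaps its label with an IoU of 0.8 and counts as a true positive, the second prediction matches nothing and is a false positive, and the second labeled box is missed and becomes a false negative.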


5 RESULTS

The results of the training sessions were categorized

as shown in Table 3, where the highlighted values

indicate the lowest value for the selected loss. The

overall loss of the model using ResNet50 as its back-

bone is depicted in Figures 7 and 8 for the training

and validation phases, respectively. The model using

ResNet101 as its backbone is presented in Figures 9

and 10.

Table 3: Loss values obtained in the training sessions performed.

                                        ResNet50                    ResNet101
Acronym       Variable name             100th epoch   200th epoch   100th epoch   200th epoch
Training
L             loss                      0.710         0.551         0.814         0.541
L_box(mask)   mrcnn_bbox_loss           0.144         0.106         0.148         0.107
L_cls(mask)   mrcnn_class_loss          0.086         0.063         0.085         0.056
L_mask        mrcnn_mask_loss           0.200         0.160         0.178         0.173
L_box(RPN)    rpn_bbox_loss             0.269         0.212         0.384         0.197
L_cls(RPN)    rpn_class_loss            0.012         0.010         0.019         0.056
Validation
L             val_loss                  0.845         0.521         0.806         0.682
L_box(mask)   val_mrcnn_bbox_loss       0.119         0.087         0.154         0.100
L_cls(mask)   val_mrcnn_class_loss      0.120         0.081         0.085         0.115
L_mask        val_mrcnn_mask_loss       0.246         0.118         0.216         0.198
L_box(RPN)    val_rpn_bbox_loss         0.354         0.227         0.335         0.262
L_cls(RPN)    val_rpn_class_loss        0.007         0.008         0.016         0.008

The effectiveness of the model can be evaluated

by analyzing the various losses calculated during the

training and validation phases, as described below:

• L_cls(RPN) (rpn_class_loss): refers to the loss incurred by the Region Proposal Network's incorrect labeling of anchor boxes as having or not having an object. A low value indicates a high level of accuracy in the model's object detection.

• L_cls(mask) (mrcnn_class_loss): refers to the loss incurred due to incorrect labeling of objects within the proposed region. A low value indicates that the detected objects are being accurately classified.

• L_box(RPN) (rpn_bbox_loss): refers to the precision of the RPN in placing the bounding box. A low value indicates that the bounding box is accurately positioned on the detected object.

• L_box(mask) (mrcnn_bbox_loss): refers to the loss assigned to the placement of the bounding box for a class, as determined by the mask branch. A low value implies a high degree of accuracy in the bounding box's positioning.

• L_mask (mrcnn_mask_loss): refers to how precisely the masks outline the objects at the pixel level. A low value indicates that the mask accurately fits around the object's boundaries.

Figure 7: ResNet50 Backbone - overall training loss.

Figure 8: ResNet50 Backbone - overall validation loss.

Figure 9: ResNet101 Backbone - overall training loss.

Figure 10: ResNet101 Backbone - overall validation loss.


Figure 11: Segmentation performed by Mask R-CNN with ResNet50 and ResNet101.

The loss results presented in Table 3 show that both models are nearly equivalent. Figures 7 to 10 indicate that the total losses in the training and validation phases converged without any signs of overfitting or underfitting. Additionally, a reduction of approximately 20% in total losses was observed after the 100th epoch, when all layers of the model began training.

The model’s performance was evaluated using the

mAP metric with an IoU of 0.5, computed from a

dataset of 100 images. The results are presented in

Table 4 for the model using ResNet50 as a backbone,

and Table 5 for the model using ResNet101 as a back-

bone.

The mAP scores achieved by both models were very close, with the best result achieved by the model using ResNet50 as its backbone at the 200th epoch.

An example of model prediction tests, conducted

at different training times and using actual images,

is presented in Figure 11. Despite the losses of

both models being very similar, the tests on actual

images revealed that the model utilizing ResNet101

as a backbone exhibited superior mask adjustment,

fewer false positives, and the ability to detect more

fragments. However, as shown in Figure 12, the ResNet101 model has an average processing time of around 350 ms per image, whereas the ResNet50 model takes less than 300 ms on average, resulting in detection times that are 20% faster.

Souza (Souza, 2020) used several deep learning methods, including SSD, Faster R-CNN, YOLOv3, and U-Net, to detect and segment iron ore fragments. Among these, U-Net achieved the best results, with a total loss of less than 0.5 and an mAP metric close to 0.92. However, our work demonstrated that the Mask R-CNN algorithm outperformed the others on the same dataset, achieving an mAP of 0.96. Figures 13a and 13b show the prediction results of U-Net and our model, respectively. The superiority of Mask R-CNN in bounding box accuracy and the segmentation of iron ore particles is apparent.

Table 4: mAP metric obtained for model using ResNet50.

Epoch   mAP     Precision p(k)   Recall r(k)   TP    FP   FN
-       0.951   0.82             0.95          437   95   23
100     0.947   0.90             0.93          428   45   30
150     0.960   0.91             0.95          438   43   21
200     0.962   0.92             0.96          438   38   20

Table 5: mAP metric obtained for model using ResNet101.

Epoch   mAP     Precision p(k)   Recall r(k)   TP    FP   FN
-       0.846   0.85             0.84          383   68   75
100     0.917   0.91             0.91          415   39   43
150     0.960   0.90             0.95          437   48   21
200     0.961   0.93             0.95          436   33   22
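The precision and recall columns of Tables 4 and 5 follow directly from the TP, FP and FN counts. As a quick check, the sketch below recomputes them for the 200-epoch rows using the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN); the helper name `precision_recall` is illustrative.

```python
def precision_recall(tp, fp, fn):
    """Standard detection precision and recall from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# (TP, FP, FN) copied from the 200-epoch rows of Tables 4 and 5.
rows = {
    "ResNet50": (438, 38, 20),
    "ResNet101": (436, 33, 22),
}

results = {name: precision_recall(*counts) for name, counts in rows.items()}
for name, (p, r) in results.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Rounded to two decimals, the values agree with the tabulated ones (0.92 and 0.96 for ResNet50, 0.93 and 0.95 for ResNet101).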

Figure 12: Mean processing time per image.

ICEIS 2023 - 25th International Conference on Enterprise Information Systems


(a) Detection performed by Souza (Souza, 2020) using U-Net (b) Detection performed by Mask R-CNN with ResNet101

Figure 13: Comparative evaluation of results in real images.

6 CONCLUSIONS AND FUTURE WORK

This study is a starting point towards a novel Edge

AI architecture based on cloudlets to perform particle

size detection in mineral processing plants. We pro-

posed this architecture based on the common comput-

ing infrastructure present in these plants and validated

the algorithm that performed this task.

Edge AI comes from the convergence of edge

computing methods and AI algorithms. Edge computing is usually more resource-constrained than the cloud, while the latest AI algorithms are usually “resource-hungry”. Nonetheless, there is an increasing interest

in developing solutions combining these concepts to

solve real-world problems.

We proposed using a Mask R-CNN algorithm to

detect and segment iron ore particles from Primary

Crusher System images. The method and results revealed promising aspects for advancing practical applications of online particle size estimation in mineral processing plants. In a first analysis, we can highlight two relevant points: the synthetic dataset and Mask R-CNN’s accuracy.

The development of the proposed dataset proved feasible in two respects: agility of development and the capacity to augment the dataset for training. The main gain in agility is relative to traditional image labelling processes, which are too laborious given the large amount of data needed to train convolutional networks.

The use of synthetic datasets in neural network

training has several advantages:

• Data Availability: synthetic datasets can be gener-

ated easily and in large quantities, even for tasks

where collecting real-world data can be challeng-

ing or expensive.

• Data Diversity: synthetic datasets can be gener-

ated to cover a wide range of scenarios, allowing

the model to generalize better to unseen data.

• Data Annotation: synthetic datasets come with

ground truth annotations, which makes it easier

to train the model and evaluate its performance.

• Data Consistency: the annotations in synthetic

datasets are consistent and free of human error,

making it easier to evaluate the model’s perfor-

mance objectively.

• Data Augmentation: synthetic datasets can be eas-

ily augmented with different variations, such as

lighting conditions, background noise, and object

deformations, allowing the model to learn from a

wider range of scenarios.

• Data Privacy: the use of synthetic datasets can

protect sensitive information, such as personal

data, that may be present in real-world datasets.

This strategy allowed us to generate a dataset of 10,000 synthetic images from 102 authentic images in less than 6 hours. The results measured by the mAP metric, as well as the tests on real images, indicate the feasibility of training the model on a synthetic dataset.
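The annotation and augmentation advantages listed above can be illustrated with a small sketch. It is not the generation procedure used in this work: it only shows, under simplifying assumptions, that when an image and its mask are transformed together the ground truth annotation comes at no extra labelling cost. Images are plain lists of grey levels, and the function name `augment` and the brightness range are hypothetical.

```python
import random


def augment(image, mask, rng):
    """Return a randomly flipped and brightness-shifted copy of an image/mask pair.

    image: list of rows of grey levels in [0, 255]; mask: same shape, 0/1 labels.
    """
    flip = rng.random() < 0.5
    gain = rng.uniform(0.7, 1.3)  # hypothetical lighting variation range
    new_image, new_mask = [], []
    for img_row, mask_row in zip(image, mask):
        if flip:  # the mask must be flipped together with the image
            img_row, mask_row = img_row[::-1], mask_row[::-1]
        new_image.append([min(255, max(0, round(p * gain))) for p in img_row])
        new_mask.append(list(mask_row))
    return new_image, new_mask


rng = random.Random(0)  # fixed seed for reproducibility
image = [[10, 200, 30], [40, 50, 250]]
mask = [[0, 1, 0], [0, 0, 1]]
samples = [augment(image, mask, rng) for _ in range(5)]
```

Each generated sample keeps the same number of labelled pixels as the original mask, so no manual relabelling is needed after the transformation.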

Measured by the mAP metric, the Mask R-CNN reached 0.96. We consider this a satisfactory result, given that the network was trained on a 100% synthetic dataset. The difference between the results of the evaluated backbones (ResNet50 and ResNet101) was not significant in this study.

To further our research, we plan to develop a new

synthetic dataset for training, considering the classi-

ﬁcation of particles in different granulometric ranges.

The objective of this new dataset is to enable the Mask

R-CNN to perform granulometric classiﬁcation with-

out additional processing. We hope that further tests

will conﬁrm our ﬁndings.


Future work will evaluate software performance to establish the processing and storage hardware requirements for the edge computing architecture. Further aspects of this approach, such as integration with the process control system, depend on the progress of the future steps mentioned above.

ACKNOWLEDGMENTS

The authors would like to thank CAPES, Fapemig,

CNPq, and the Federal University of Ouro Preto for

supporting this work. Also, the authors would like to

thank Vale S/A for enabling the creation of a dataset

with real images.

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), the Instituto Tecnológico Vale (ITV) and the Universidade Federal de Ouro Preto (UFOP).

REFERENCES

Al-Thyabat, S., Miles, N., and Koh, T. (2007). Estimation

of the size distribution of particles moving on a con-

veyor belt. Minerals Engineering, 20(1):72–83.

Allen, T. (2013). Particle size measurement. Springer.

Bamford, T., Esmaeili, K., and Schoellig, A. P. (2021). A

deep learning approach for rock fragmentation anal-

ysis. International Journal of Rock Mechanics and

Mining Sciences, 145:104839.

Bukharev, A., Budennyy, S., Lokhanova, O., Belozerov, B.,

and Zhukovskaya, E. (2018). The task of instance seg-

mentation of mineral grains in digital images of rock

samples (thin sections). In 2018 International Confer-

ence on Artiﬁcial Intelligence Applications and Inno-

vations (IC-AIAI), pages 18–23. IEEE.

Coetzee, L. C., Craig, I. K., and Kerrigan, E. C. (2009).

Robust nonlinear model predictive control of a run-of-

mine ore milling circuit. IEEE Transactions on con-

trol systems technology, 18(1):222–229.

de C Meira, N. F., Silva, M. C., Vieira, C. B., Souza, A.,

and Oliveira, R. A. (2022). Edge deep learning to-

wards the metallurgical industry: Improving the hy-

brid pelletized sinter (hps) process. In International

Conference on Enterprise Information Systems, pages

149–167. Springer.

Deng, S., Zhao, H., Fang, W., Yin, J., Dustdar, S., and

Zomaya, A. Y. (2020). Edge intelligence: The con-

ﬂuence of edge computing and artiﬁcial intelligence.

IEEE Internet of Things Journal, 7(8):7457–7469.

Dong, L., Wang, H., Song, W., Xia, J., and Liu, T. (2021).

Deep sea nodule mineral image segmentation algo-

rithm based on mask r-cnn. In ACM Turing Award

Celebration Conference-China (ACM TURC 2021),

pages 278–284.

El Naqa, I. and Murphy, M. J. (2015). What is machine

learning? In machine learning in radiation oncology,

pages 3–11. Springer.

Frei, M. and Kruis, F. E. (2021). Fiber-cnn: Expand-

ing mask r-cnn to improve image-based ﬁber analysis.

Powder Technology, 377:974–991.

Girshick, R. (2015). Fast r-cnn. arXiv:1504.08083.

Guyot, O., Monredon, T., LaRosa, D., and Broussaud, A.

(2004). Visiorock, an integrated vision technology for

advanced control of comminution circuits. Minerals

Engineering, 17(11):1227–1235. Comminution ’04.

He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017).
Mask r-cnn. arXiv:1703.06870.

He, K., Zhang, X., Ren, S., and Sun, J. (2015).
Deep residual learning for image recognition. arXiv:1512.03385.

Huang, H., Luo, J., Tutumluer, E., Hart, J. M., and Stolba,

A. J. (2020). Automated segmentation and morpho-

logical analyses of stockpile aggregate images using

deep convolutional neural networks. Transportation

Research Record, 2674(10):285–298.

Iyas, M. R., Setiawan, N. I., and Warmada, I. W. (2020).

Mask r-cnn for rock-forming minerals identiﬁcation

on petrography, case study at monterado, west kali-

mantan. In E3S Web of Conferences, volume 200,

page 06007. EDP Sciences.

Khan, W. Z., Ahmed, E., Hakak, S., Yaqoob, I., and Ahmed,

A. (2019). Edge computing: A survey. Future Gener-

ation Computer Systems, 97:219–235.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learn-

ing. nature, 521(7553):436–444.

Li, E., Zeng, L., Zhou, Z., and Chen, X. (2019). Edge

ai: On-demand accelerating deep neural network in-

ference via edge computing. IEEE Transactions on

Wireless Communications, 19(1):447–457.

Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2016). Feature pyramid networks
for object detection. arXiv:1612.03144.

Long, J., Shelhamer, E., and Darrell, T. (2014). Fully
convolutional networks for semantic segmentation. arXiv:1411.4038.

Maxwell, A. E., Pourmohammadi, P., and Poyner, J. D.

(2020). Mapping the topographic features of mining-

related valley ﬁlls using mask r-cnn deep learning and

digital elevation data. Remote Sensing, 12(3):547.

McCarthy, J. (2007). What is artiﬁcial intelligence.

Okada, N., Maekawa, Y., Owada, N., Haga, K., Shibayama,

A., and Kawamura, Y. (2020). Automated identiﬁca-

tion of mineral types and grain size using hyperspec-

tral imaging and deep learning for mineral processing.

Minerals, 10(9).

Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster
r-cnn: Towards real-time object detection with region
proposal networks. arXiv:1506.01497.

Shi, W. and Dustdar, S. (2016). The promise of edge com-

puting. Computer, 49(5):78–81.

Shinde, P. P. and Shah, S. (2018). A review of ma-

chine learning and deep learning applications. In

2018 Fourth international conference on computing

communication control and automation (ICCUBEA),

pages 1–6. IEEE.

Souza, L. E. R. (2020). Medição de granulometria de
minério de ferro através de imagens em circuito de
britagem primária. Master’s thesis, Escola de
Minas, Universidade Federal de Ouro Preto, Ouro
Preto.

Thurley, M. J. and Ng, K. C. (2008). Identiﬁcation and

sizing of the entirely visible rocks from a 3d surface

data segmentation of laboratory rock piles. Computer

Vision and Image Understanding, 111(2):170–178.

Wang, C., Chang, L., Zhao, L., and Niu, R. (2020a). Auto-

matic identiﬁcation and dynamic monitoring of open-

pit mines based on improved mask r-cnn and transfer

learning. Remote Sensing, 12(21):3474.

Wang, S., Tuor, T., Salonidis, T., Leung, K. K., Makaya, C.,

He, T., and Chan, K. (2019). Adaptive federated learn-

ing in resource constrained edge computing systems.

IEEE Journal on Selected Areas in Communications,

37(6):1205–1221.

Wang, X., Han, Y., Leung, V. C., Niyato, D., Yan, X., and

Chen, X. (2020b). Edge AI: Convergence of edge com-

puting and artiﬁcial intelligence. Springer.

Wills, B. A. and Finch, J. (2015). Wills’ mineral process-

ing technology: an introduction to the practical as-

pects of ore treatment and mineral recovery, volume 8.

Butterworth-Heinemann.

Yang, D., Wang, X., Zhang, H., Yin, Z.-y., Su, D., and Xu,

J. (2021). A mask r-cnn based particle identiﬁcation

for quantitative shape evaluation of granular materials.

Powder Technology, 392:296–305.

Zhang, F., Zhang, Q., Xiao, Z., Wu, J., and Liu, Y.

(2019). Spherical nanoparticle parameter measure-

ment method based on mask r-cnn segmentation and

edge ﬁtting. In Proceedings of the 2019 8th interna-

tional conference on computing and pattern recogni-

tion, pages 205–212.
