Application of Artificial Intelligence in Microwave Radiometry (MWR)

Christoforos Galazis¹, Sergey Vesnin³ and Igor Goryanin¹,²

¹University of Edinburgh, Edinburgh, U.K.
²Okinawa Institute of Science and Technology, Okinawa, Japan
³Medical Microwave Radiometry Ltd., U.K.
Keywords: Microwave Radiometry, Breast Cancer, Diagnostic System, Artificial Intelligence, Machine Learning,
Neural Network, Cascade Correlation Neural Network, Convolutional Neural Network, Random Forest,
Support Vector Machine.
Abstract: Microwave radiometry has been developed more actively in recent years for medical applications. One such
application is the diagnosis or monitoring of cancer. Medical radiometry presents a strong alternative to
other methods of diagnosis, especially with recent gains in its accuracy. In addition, it is safe to use, non-
invasive and has a relatively low cost of use. Temperature readings were taken from the mammary glands for
the purpose of detecting cancer and evaluating the effectiveness of radiometry. Building a diagnostic system
to automate classification of new samples requires an adequate machine learning model. The models explored
were random forest, XGBoost, k-nearest neighbors, support vector machines, variants of the cascade
correlation neural network, a deep neural network and a convolutional neural network. Of all the models
evaluated, the best performing on the test set was the deep neural network, by a significant margin.
1 INTRODUCTION
Microwave radiometry has seen increased usage in recent years, along with interest in further development and research within medical applications (Vesnin et al., 2017). This has resulted in a significant improvement in the system's accuracy when taking internal temperature measurements. One of its main applications is cancer detection and monitoring, such as for breast cancer, which is the focus of this paper.
However, while gaining momentum in its utilization, it is still not widely adopted. The main reason is that it has only recently been introduced for medical use, so medical and clinical professionals have not yet received adequate training to interpret its readings. This creates a circular problem: if professionals are unable to use the system, they will be more hesitant to acquire it.
The deadlock can be resolved with the introduction of an automated diagnostic system that extracts useful information from the readings and offers a diagnostic prediction. For this paper, the focus is on evaluating how effectively such data alone can be used for the diagnosis of cancer, using data collected from the mammary glands. A further scope is to contribute to determining an ideal machine learning algorithm for such a task.
The paper starts off with a brief history of the development of radiometry within the medical field, what it captures and why it is an attractive system for cancer detection and monitoring. Following this, the data set is described, which includes radiometric readings of the mammary glands for cancer detection, along with any pre-processing that was applied to the data. After this setup information, the descriptions and results of various non-neural network and neural network models for classifying low or high risk of breast cancer are presented. Finally, the paper finishes with the conclusions and some possible future work.
2 MICROWAVE RADIOMETRY
Microwave radiometry originated in the theoretical research of James Clerk Maxwell, which was experimentally verified by Heinrich Hertz, and in the development of the first radars in the 1930s (Skolnik, 2018). Later, during WWII, Robert Dicke invented a kind of radio receiver known as the "Dicke Radiometric Receiver", or simply the "Dicke Radiometer". His radiometer used a switchable resistor, the "Dicke Resistor", as a technique to allow for noise temperature calibration (Dicke, 1982).
However, the technology was first applied to medical and clinical use only later, from the 1970s (Myers et al., 1979; Bolomey et al., 1982; Peronnet et al., 1983; Pichot et al., 1985), and there was no significant investment until the late 1990s, when it gained much more interest from the medical scientific community (Conceicao et al., 2016). Since then it has found applications in detecting or monitoring breast cancer (Vesnin et al., 2017), thermal denaturation of albumin (Ivanov et al., 2018), carotid artery disease (Drakopoulou et al., 2018), brown adipose tissue activity (Crandall et al., 2018), rheumatoid arthritis (Pentazos et al., 2018), inflammation levels in joints (Laskari et al., 2018), brain temperature (Rodrigues et al., 2018) and transcapillary water exchange in the lungs (Bondar et al., 2017).
It is able to capture the temperature at the skin or at a depth below the surface, which is particularly useful for diagnosing cancerous tumors and monitoring treatment progress (Vesnin et al., 2017). It achieves this by measuring the electromagnetic radiation emitted by the tissues in the microwave range (Vesnin et al., 2017). The measurement obtained depends on the properties of the various biological tissues (Semenov, 2009). In turn, these properties are affected by the level of water in the tissue, with a significant difference between muscle, which has high water content, and fat and bone, which have low water content (Gabriel et al., 1996a; Gabriel et al., 1996b). In addition, both physiological and pathological conditions can alter the dielectric properties of the tissues (Semenov, 2009).
Specifically for cancer, it has been found that tumors emit heat in a way that is connected to their growth rate (Gautherie, 1980). As tumorous cells grow, they replicate at a much higher rate, leading to the release of greater amounts of energy compared to neighboring healthy cells. A tumor's ability to create new vasculature determines its maximum volume (Schneider and Miller, 2005). At such a stage, the cell growth and cell death rates reach an equilibrium. However, when the growth slows down, this results in near-normal temperature readings, making cancer detection more difficult in such cases (Vesnin et al., 2017).
It is an attractive complementary technique to other methods of cancer diagnosis, such as mammography or biopsy (Vesnin et al., 2017). The main reason stated by Vesnin et al. is that advances in the system have allowed it to achieve high sensitivity and specificity in cancer detection. It is also a safe, non-ionizing, noninvasive method whose results are obtained quickly and at low cost (Vesnin et al., 2017; Semenov, 2009). Hence, it can be used as frequently as needed, for any age group, and during pregnancy or lactation. In addition, it can occupy a supportive role in decision making for professionals, as it adds information not obtainable from other methods, such as the thermal activity of the tissue, the rate at which cancerous cells multiply and the level of risk of mutagenesis (Vesnin et al., 2017).
3 DATA SET
3.1 Description
To evaluate the effectiveness of microwave radiometry in detecting cancer, a data set comprising temperature values from the mammary glands was used. The values were recorded using the RTM-01-RES (www.mmwr.co.uk) device at various medical centers (Zenovich et al., 2016). The device captures temperature readings at nine different locations on each gland, one at the nipple (defined as point 0) and the rest equidistant around the nipple (points 1 to 8), plus at the axillary region (point 9). In addition, two more locations were captured at the lower chest (defined as points T1 and T2), as references used to normalize ambient temperature variations. For each of these points, the temperature was measured at the skin and at a depth of 5 cm below the skin. A graphical representation of the capture points can be seen in Figure 1.
In total, there are 363 pairs of mammary glands, of which 77 are classified as healthy or low risk (labeled as class 0) and 286 as potentially cancerous or high risk (labeled as class 1). For a sample to be classified as low risk, both glands must be considered healthy; for it to be considered high risk, at least one gland must be of high risk. The individual glands comprise 319 low-risk and 407 high-risk cases, the latter consisting of 13 with diffuse cancer, 185 with nodal cancer, 119 with diffuse changes but no presence of cancer and 90 with nodal changes but no presence of cancer. All the following experiments used
a class-balanced split of the data into three sets, training, validation and test, with allocated percentages of 60% (low risk: 46, high risk: 171), 20% (low risk: 15, high risk: 57) and 20% (low risk: 16, high risk: 58) respectively.
Figure 1: Sampling points on each mammary gland (0-8) including the axillary point (9). Points T1 and T2 are used as reference values when normalizing the values against ambient temperature (Zenovich et al., 2016).
3.2 Ambient Temperature
Normalization
Because the data were collected at multiple locations and at different times, the ambient temperature varies between samples, so the values are not directly comparable. Typically, the measurements were taken at temperatures ranging from 20 to 27 degrees Celsius. When analyzing temperature values for prediction systems, breast size, age and external conditions that can impact the results must be taken into account (Anisimova, 2013; Kobrinskiy, 2008). Hence, previous research (Losev and Lvshinskiy, 2015) that used the same data set proposed and evaluated a normalization algorithm to overcome this issue. The algorithm is defined as follows (Losev and Lvshinskiy, 2015):
For every captured point t_{d,i,j}, plot its values against one of the control temperature points T_{c,d,j}, where i = 0...9, c ∈ {1, 2}, j ∈ {skin, depth} and d = 0...n−1, with n the total number of samples:

1. On the plot of the temperature points against one of the reference values, use linear regression (a least-squares fit) to find a and b such that the error of the following function is minimized:

   t_{d,i,j} = a T_{c,d,j} + b    (1)

2. Calculate the average value of the reference temperature point:

   Tavg_{c,j} = (1/n) Σ_{d=0}^{n−1} T_{c,d,j}    (2)

3. Update the temperature points:

   t_{d,i,j} = t_{d,i,j} + a (Tavg_{c,j} − T_{c,d,j})    (3)

4. Replace the control points with the average value found:

   T_{c,d,j} = Tavg_{c,j},  for d = 0...n−1    (4)
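To make the procedure concrete, the following is a minimal sketch of the normalization in Python, assuming the temperatures for one depth (skin or 5 cm) are stored in a NumPy array with one column per point; the function name and array layout are illustrative, not taken from the original implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def normalize_against_reference(t, T_ref):
    """Ambient-temperature normalization sketch (after Losev and Lvshinskiy, 2015).

    t     : array of shape (n, 10), temperatures at points 0-9 for one depth (skin or 5 cm)
    T_ref : array of shape (n,), reference temperature (e.g. point T2) for each sample
    Returns the shifted temperatures and the constant that replaces the reference column.
    """
    t = t.copy()
    T_avg = T_ref.mean()                             # step 2: average reference temperature
    X = T_ref.reshape(-1, 1)
    for i in range(t.shape[1]):
        reg = LinearRegression().fit(X, t[:, i])     # step 1: least-squares fit t = a*T_ref + b
        a = reg.coef_[0]
        t[:, i] = t[:, i] + a * (T_avg - T_ref)      # step 3: shift each point toward the average
    return t, T_avg                                  # step 4: the reference column becomes T_avg
```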
Losev and Lvshinskiy showed that applying their proposed algorithm improved the specificity and sensitivity of a regression prediction model. There was a strong linear correlation between all points 0-9 and either of the two reference points as the temperature increased. While they showed that comparing against either reference point improved predictions, using reference point T2 gave a slightly better performance, of about 4% overall. Therefore, for the following experiments evaluating various prediction models, this normalization algorithm was applied against reference point T2. In turn, this allowed the removal of the two control points, as all samples would then have the same value for them.
3.3 Oversampling
As described in section 3.1, the data set is heavily imbalanced towards the high-risk class, with 77 low-risk against 286 high-risk samples. Consequently, this introduces a bias towards the majority class, which most machine learning algorithms will favor when classifying (Krawczyk, 2016). Some algorithms can handle this imbalance through sample weights that give more importance to the least represented class, or through an appropriate metric (He and Ma, 2013). While both of these techniques were used in the experiments where applicable, applying oversampling (He and Ma, 2013) guarantees consistency between the various algorithms.
The techniques explored were random resampling, the Synthetic Minority Over-sampling Technique (SMOTE) with its regular, borderline 1, borderline 2 and Support Vector Machine (SVM) variations (Chawla et al., 2002; Han et al., 2005) and Adaptive Synthetic sampling (ADASYN) (He et al., 2008). The oversampling techniques were compared using a random forest (Breiman, 2001) from the scikit-learn library (Pedregosa et al., 2011), with sample weights set
to handle the imbalance. Additionally, for each case the model was optimized using the hyperopt library (Bergstra et al., 2015) with the Tree-structured Parzen Estimator (TPE) (Bergstra et al., 2011) optimizer and the weighted Geometric Mean (G-mean) loss (Kubat and Matwin, 1997; Barandela et al., 2003) as the loss function to minimize. Using both weight balancing and the G-mean loss in principle eliminates the need for oversampling. However, we want to evaluate whether oversampling is equivalent to and interchangeable with these techniques and does not negatively impact the results.
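As an illustration of this comparison, the following is a hedged sketch in Python using the imbalanced-learn and scikit-learn libraries; the variables X_train, y_train, X_test and y_test are assumed to hold the normalized features and labels, and the G-mean loss shown is the plain (non-weighted) form, 1 − sqrt(sensitivity × specificity), which is one plausible reading of the loss used here.

```python
import numpy as np
from imblearn.over_sampling import BorderlineSMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

def g_mean_loss(y_true, y_pred):
    # Plain G-mean loss: 1 - sqrt(sensitivity * specificity); the weighted form
    # used in the paper is not reproduced here.
    sens = recall_score(y_true, y_pred, pos_label=1)   # high risk
    spec = recall_score(y_true, y_pred, pos_label=0)   # low risk
    return 1.0 - np.sqrt(sens * spec)

# Oversample the minority (low-risk) class with the borderline-1 SMOTE variant
# until both classes have the same number of training samples.
sampler = BorderlineSMOTE(kind='borderline-1', random_state=0)
X_res, y_res = sampler.fit_resample(X_train, y_train)

# Class-weighted random forest, as in the comparison described above.
clf = RandomForestClassifier(class_weight='balanced', random_state=0)
clf.fit(X_res, y_res)
print('test G-mean loss:', g_mean_loss(y_test, clf.predict(X_test)))
```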
The results of the various oversampling techniques
are summarized in Table 1. The main metric used for
comparison is G-mean loss then sensitivity and
specificity and lastly accuracy. Improvements against
no oversampling based on the loss function is
observed for SMOTE with all variations but SVM.
However, with borderline 1 variation one can observe
a significant improvement especially with specificity
without sacrificing significantly the model’s sensitiv-
ity. It obtained a G-mean loss value of 0.3268, sen-
sitivity of 0.8621, specificity of 0.5 and accuracy of
0.7838. Hence, for the model evaluations the low risk
class of the training set was oversampled using
SMOTE borderline 1. Oversampling was used until
the low risk class had the same number of samples
of that of the high risk, that is 286.
4 MODEL EVALUATIONS
4.1 Non-neural Network Models
Non-neural network models are still a vital alternative to neural network ones and can set a good baseline for future models (Wilkins et al., 1996; Lim et al., 2000). Non-neural network models can usually train their weights in much less time than their neural counterparts. This also means they require fewer computational resources, allowing them to be trained on personal machines. Lastly, they require less hyperparameter tuning and setup time and do not require an architecture designed specifically for the problem, making them production-ready sooner. While having all these benefits, their results can also be on par with those obtained from neural networks, depending on the complexity of the problem at hand. Additionally, the best non-neural network model can act as a baseline for comparing the various network architectures.
The models evaluated were Random Forest (RF), XGBoost (Chen and Guestrin, 2016), K-Nearest Neighbors (K-NN) (Cover and Hart, 2006) and Support Vector Machines (SVM) with a linear kernel and with a radial basis function (RBF) kernel (Burges, 1998; Cortes and Vapnik, 1995). The algorithms were obtained from the scikit-learn library, apart from XGBoost, which was obtained from its own library (Chen and Guestrin, 2016). For each of these models, the optimal hyperparameters were determined using the hyperopt library with the TPE optimizer. The loss function minimized was again the weighted G-mean loss.
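The tuning loop can be sketched as follows for one of the models (XGBoost), reusing the g_mean_loss helper from the previous sketch; the search space and evaluation budget are illustrative assumptions, not the values used by the authors, and X_train, y_train, X_val and y_val are again assumed to be available.

```python
from hyperopt import fmin, tpe, hp
from xgboost import XGBClassifier

def objective(params):
    # Train on the (oversampled) training set and score the G-mean loss on the validation set.
    model = XGBClassifier(
        n_estimators=int(params['n_estimators']),
        max_depth=int(params['max_depth']),
        learning_rate=params['learning_rate'],
    )
    model.fit(X_train, y_train)
    return g_mean_loss(y_val, model.predict(X_val))

space = {
    'n_estimators': hp.quniform('n_estimators', 50, 500, 10),
    'max_depth': hp.quniform('max_depth', 2, 10, 1),
    'learning_rate': hp.loguniform('learning_rate', -5, 0),
}
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
print('best hyperparameters:', best)
```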
Of the models evaluated, the top performer based on the lowest weighted G-mean loss was XGBoost, with a value of 0.3994. It also obtained a decent sensitivity of 0.7069, but only 0.5 specificity and a biased accuracy of 0.6622. Next came the SVM with linear kernel, with a weighted G-mean loss of 0.4241, followed closely in third place by RF with 0.4281. K-NN and the SVM with RBF kernel obtained significantly worse results, with weighted G-mean losses of 0.4829 and 0.5687 respectively. The full summary of the results on the test set is shown in Table 2.
4.2 Neural Network Models
Here the neural networks no longer directly optimize their parameters on the weighted G-mean loss function, but instead use a categorical cross-entropy function (de Boer et al., 2005) to measure the error of the network. The weighted G-mean loss was not used because it is not possible to obtain a differentiable global G-mean loss over batch operations. However, to be able to compare the results with those of the non-neural networks, a non-weighted batch-wise G-mean loss function was applied as a metric. Also, to obtain a corresponding weighted loss value from the batch-wise loss function, class weight balancing was preferred over oversampling. Additionally, the batch-wise G-mean loss was used as an early stopping criterion on the validation set. Its implementation is the same as that of a normal G-mean loss function; it was computed at the end of each batch, which was set to a size of 50 samples, during training and averaged at the end of each epoch.
To use categorical cross-entropy, the class labels were transformed to binary values by applying one-hot encoding. Hence, the classes were represented as vectors, with the low-risk class (0) as (1, 0) and the high-risk class (1) as (0, 1). Additionally, the loss function assumes that the passed input represents the probability of each encoding being true; that is, it expects a vector which sums to 1 and whose individual values lie within [0, 1]. For the network to comply with this constraint, the output layer's activation function was a softmax function (Bishop, 2006), which maps a vector of real numbers to a probability distribution.
Table 1: Summary of the results on the test set of a random forest classifier when using oversampling on the least
represented class (low risk) in the data set so it becomes balanced.
Oversampling G-Mean Loss Accuracy Sensitivity Specificity
No oversampling 0.3894 0.7702 0.8793 0.375
Random 0.3994 0.6622 0.7069 0.5
SMOTE regular 0.3749 0.6622 0.6897 0.5625
SMOTE borderline1 0.3268 0.7838 0.8621 0.5
SMOTE borderline2 0.3693 0.7568 0.8448 0.4375
SMOTE SVM 0.4126 0.7297 0.8276 0.375
ADASYN 0.401 0.7027 0.7759 0.4375
Table 2: Summary of the results on the test set for the non-neural network models.
Model G-Mean Loss Accuracy Sensitivity Specificity
RF 0.4281 0.7027 0.7931 0.375
XGBoost 0.3994 0.6622 0.7069 0.5
K-NN 0.4829 0.527 0.5345 0.5
SVM Linear Kernel 0.4241 0.6216 0.6551 0.5
SVM RBF Kernel 0.5687 0.7432 0.7826 0.0625
The optimal hyperparameters were found through grid search based on the validation G-mean loss. Additionally, the architecture, which includes the number of layers and neurons, the activation functions, the optimizers and the regularization methods, was determined through experimentation with a variety of combinations. All the networks were implemented using Keras (Chollet et al., 2015) with a TensorFlow (Abadi et al., 2016) backend.
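As a concrete illustration of this setup, the following is a hedged Keras sketch of the one-hot targets, the softmax output and a batch-wise G-mean metric used for early stopping; the small placeholder network, the class weights and the callback settings are illustrative assumptions rather than the authors' exact configuration, and the data variables (X_train, y_train, X_val, y_val) are assumed to exist.

```python
from tensorflow import keras
from tensorflow.keras import backend as K

def batch_g_mean_loss(y_true, y_pred):
    # Batch-wise (non-weighted) G-mean loss: 1 - sqrt(sensitivity * specificity).
    # Columns follow the one-hot encoding: index 0 = low risk, index 1 = high risk.
    y_pred_lbl = K.round(y_pred[:, 1])
    y_true_lbl = y_true[:, 1]
    sens = K.sum(y_true_lbl * y_pred_lbl) / (K.sum(y_true_lbl) + K.epsilon())
    spec = K.sum((1 - y_true_lbl) * (1 - y_pred_lbl)) / (K.sum(1 - y_true_lbl) + K.epsilon())
    return 1.0 - K.sqrt(sens * spec)

# One-hot targets: (1, 0) = low risk, (0, 1) = high risk.
y_train_oh = keras.utils.to_categorical(y_train, num_classes=2)
y_val_oh = keras.utils.to_categorical(y_val, num_classes=2)

# A deliberately small placeholder network; the architectures used are described below.
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(40,)),
    keras.layers.Dense(2, activation='softmax'),   # softmax keeps the output a probability vector
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[batch_g_mean_loss])

early_stop = keras.callbacks.EarlyStopping(monitor='val_batch_g_mean_loss',
                                           mode='min', patience=20,
                                           restore_best_weights=True)
model.fit(X_train, y_train_oh, validation_data=(X_val, y_val_oh),
          batch_size=50, epochs=500,
          class_weight={0: 171 / 46, 1: 1.0},  # up-weight the low-risk class (training counts, section 3.1)
          callbacks=[early_stop])
```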
4.2.1 Cascade Correlation Neural Network
On this specific data set, the best performing diagnostic model found in a previous study (Zenovich et al., 2016), out of a variety of models, was a Cascade Correlation Neural Network (CCNN) (Fahlman and Lebiere, 1990). A subsequent goal of this paper is to further explore and improve the CCNN. For the evaluation, the previous network is re-implemented so that results are comparable; this model is referred to as the base CCNN model. Further, taking advantage of the previously positive results, another two variations are proposed in this paper, defined as the improved and extended CCNN models, in an attempt to further refine the results.
The CCNN was proposed by Fahlman and Lebiere (1990) as an approach that is not limited to tuning the parameters of the network but also dynamically determines the optimal architecture, in terms of the number of hidden layers. The network initially consists of fully connected input and output layers, whose sizes are defined by the problem. The algorithm then executes the following steps until convergence (a simplified sketch in code is given after the list):
- All weights of units connected to the output layer are trained until the minimum error is reached.
- A pool of candidate units is generated, each taking as input the outputs of all previously added layers, excluding the output layer.
- These candidate units are trained so as to maximize the correlation between their output and the residual error of the network.
- The candidate with the maximum correlation is selected and added to the network. Its input weights are frozen and its output is connected to the output layer.
The network continues this iterative process until the
addition of a unit does not lead to a smaller error than
the previous execution.
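The following is a simplified, self-contained sketch of this loop in Python, intended only to make the structure concrete: the output layer is a logistic regression, and the gradient-based candidate training of Fahlman and Lebiere (1990) is replaced by selecting, from a randomly initialized pool, the candidate whose output correlates most with the residual error. It is not the implementation evaluated in this paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cascade_correlation(X, y, X_val, y_val, pool_size=16, max_units=50, seed=0):
    """Schematic cascade-correlation loop (candidate training omitted for brevity)."""
    rng = np.random.default_rng(seed)
    F, F_val = X.copy(), X_val.copy()      # growing feature matrices: inputs + hidden-unit outputs
    best_val_err, best_model = np.inf, None
    for _ in range(max_units):
        model = LogisticRegression(max_iter=1000).fit(F, y)   # (re)train the output weights
        val_err = 1.0 - model.score(F_val, y_val)
        if val_err >= best_val_err:        # stop when a new unit no longer reduces the error
            break
        best_val_err, best_model = val_err, model
        resid = y - model.predict_proba(F)[:, 1]              # residual error of the network
        # Candidate pool: pick the unit whose output correlates most with the residual error.
        best_corr, best_w = -1.0, None
        for _ in range(pool_size):
            w = rng.normal(0.0, 0.5, size=F.shape[1])
            corr = abs(np.corrcoef(sigmoid(F @ w), resid)[0, 1])
            if corr > best_corr:
                best_corr, best_w = corr, w
        # Freeze the winning candidate's input weights; its output becomes a new feature
        # visible to the output layer and to all later candidates.
        F = np.column_stack([F, sigmoid(F @ best_w)])
        F_val = np.column_stack([F_val, sigmoid(F_val @ best_w)])
    return best_model
```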
The base CCNN model closely reflects the initially proposed algorithm (Fahlman and Lebiere, 1990), with some minor changes over and above those mentioned in section 4.2. The hidden and candidate units used the sigmoid function as their activation function. Additionally, the weights were initialized randomly from a normal distribution with a mean of 0 and a standard deviation of 0.5, and the biases were set to 0.
After every loop, the weights of the output layer were reinitialized to avoid getting stuck in bad local minima. Furthermore, the optimization function used was Stochastic Gradient Descent (SGD) (Bottou, 2010), with the learning rate set to 0.00001 for the output layer and 0.000005 for the hidden candidate layers. Note that in the previous research (Zenovich et al., 2016) the authors used simulated annealing, but SGD was preferred here in the hope of better generalization. Finally, the candidate pool size was set to 16 and each candidate layer had two units, the same as the output layer.
For the proposed improved model, only the differences from the base one are noted. The focus of this CCNN model is to utilize more recent techniques to improve performance. Firstly, the weight initialization scheme was changed from a plain random distribution to Xavier initialization (Glorot and Bengio, 2010), sampling from a normal distribution. Also, the optimizer was changed to Adam (Kingma and Ba, 2014) as a further improvement over SGD. Its learning rate was set to 0.00001 and 0.000005 for the output and candidate layers respectively; in both cases the decay of the first-order moment was set to 0.9, the decay of the second-order moment to 0.99 and epsilon to 1e-08. Additionally, the activation functions of the hidden layers were changed to Rectified Linear Units (ReLUs) (Nair and Hinton, 2010). Lastly, warm-start weight initialization was added for the output layer to carry over the weights that contributed the most to lowering the loss value.
The extended model, building on the improved model, focused on further expanding the capability of dynamically constructing the architecture by also introducing regularization layers to the candidate pool, in an attempt to improve generalization. The hidden candidate layer was changed to have the following format and strict order:

- Gaussian noise layer with a mean of 0 and a standard deviation of 0.5
- Dense layer (the original unit)
- Batch normalization layer (Ioffe and Szegedy, 2015) with momentum of 0.99, epsilon of 0.00001 and a trainable beta value
- Dropout layer (Srivastava et al., 2014), which randomly drops one of the two units

The candidate pool consisted of two copies of all possible combinations of the three optional regularization layers (2^3 = 8 combinations), while strictly maintaining the order presented. Thus, the total pool size remained 16, but only two candidate layers in the pool were identical, compared to all 16 in the two previous models.
Based on the G-mean loss value on the test set, the best performer among the CCNNs was the improved variation, with a value of 0.5417, accuracy of 0.5541, sensitivity of 0.6207 and specificity of 0.3125. The extended model followed with a marginal difference, with a G-mean loss of 0.5495, and lastly, with significantly worse results, came the base model with a value of 0.5889. The full summary of the results on the test set can be found in Table 3.
On the validation set there is a significant point to note. The G-mean loss values obtained were 0.3512, 0.2677 and 0.1578 for the base, improved and extended models respectively. The extended model was able to extract more information from the training set to improve its validation score. On the test set, however, it scored nearly the same as the top performer, so while no information was lost compared to the improved model, the patterns that were useful on the validation set were not so for the test set. This is possibly because the data set contains a considerable number of outliers, as far as the network is concerned, which prevents splitting the data in such a way that each set samples the true distribution of the problem, hindering generalization.
4.2.2 Deep Neural Network

A Deep Neural Network (DNN) was also constructed to compare against the performance of the cascade networks. Specifically, it was used to compare the results, training speed and memory usage between the models. The design of the DNN was based on the results obtained previously in section 4.2.1, so this network also focuses on using various generalization methods.
The DNN's hidden layers used ReLU as their activation function and their weights were initialized using Xavier's method. The optimizer used was Adam with a learning rate of 0.00005, the decay of the first-order moment at 0.9, the decay of the second-order moment at 0.999 and an epsilon value of 1e-8. For regularization, Gaussian noise layers, with a standard deviation of 0.2 and a mean of 0, and dropout layers, with a 20% dropout rate, were included in the model. Additionally, batch normalization layers were added with momentum set to 0.99, epsilon to 0.00001 and a trainable beta value. Lastly, the details described in section 4.2 still apply here. The final layout of the network consists of five hidden layers, excluding the input and output layers. The architecture was formed as follows, with an illustrative Keras sketch of this layout given further below:
- Input layer with 40 units
- Batch normalization, Gaussian noise and dense layer with 1000 units
- Batch normalization, dropout, Gaussian noise and dense layer with 200 units
- Batch normalization, dropout, Gaussian noise and dense layer with 200 units
- Batch normalization, dropout, Gaussian noise and dense layer with 200 units
- Batch normalization, dropout and dense layer with 200 units
- Dense output layer with 2 units

Table 3: Summary of the results on the test set for the neural network models.
Model G-Mean Loss Accuracy Sensitivity Specificity
Base CCNN 0.5889 0.4324 0.4483 0.375
Improved CCNN 0.5417 0.5541 0.6207 0.3125
Extended CCNN 0.5495 0.5405 0.6034 0.3125
Deep Neural Network 0.2843 0.7703 0.8103 0.625
Convolutional Neural Network 0.3637 0.6081 0.5862 0.6875
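Read literally, the layer list above corresponds to something like the following Keras sketch; it is a plausible reconstruction under stated assumptions (Xavier initialization from a normal distribution, i.e. glorot_normal, and a softmax output layer as in section 4.2), not the authors' exact code.

```python
from tensorflow import keras
from tensorflow.keras import layers

def block(x, units, use_dropout=True):
    # Batch normalization -> (dropout) -> Gaussian noise -> dense, as listed above.
    x = layers.BatchNormalization(momentum=0.99, epsilon=1e-5)(x)
    if use_dropout:
        x = layers.Dropout(0.2)(x)
    x = layers.GaussianNoise(0.2)(x)
    return layers.Dense(units, activation='relu', kernel_initializer='glorot_normal')(x)

inputs = keras.Input(shape=(40,))            # 40 features (2 glands x 10 points x 2 depths, after removing T1/T2)
x = block(inputs, 1000, use_dropout=False)   # first hidden block has no dropout
for _ in range(3):
    x = block(x, 200)
x = layers.BatchNormalization(momentum=0.99, epsilon=1e-5)(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(200, activation='relu', kernel_initializer='glorot_normal')(x)
outputs = layers.Dense(2, activation='softmax')(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=5e-5, beta_1=0.9,
                                              beta_2=0.999, epsilon=1e-8),
              loss='categorical_crossentropy')
```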
The DNN was able to obtain a G-mean loss of 0.2843, accuracy of 0.7703, sensitivity of 0.8103 and specificity of 0.625 on the test set, as shown in Table 3. These results are significantly better than those obtained from the improved CCNN, which had a G-mean loss of 0.5417, and were achieved with a noticeably faster training time. However, the CCNN obtained its results requiring less memory, since its improved variant constructed a network with a total of 47 hidden layers of 2 units each. With today's available hardware, though, the memory usage of the DNN is not a concern.
Figure 2: The average batch-wise G-mean loss and accuracy of the deep neural network as it is trained.
As seen in Figure 2, the regularization techniques prevented overfitting of the training data relative to the validation data. While the network has extracted from the training set all the information needed to classify those samples, this does not cover all the cases in the validation set. The limitation of the model again derives from the limited data available for expressing an accurate distribution of the problem across the three sets.
4.2.3 Convolutional Neural Network

Convolutional Neural Networks (CNNs) have shown great results for detecting breast cancer from various imaging data (Cireşan et al., 2013; Spanhol et al., 2016; Araújo et al., 2017). In an attempt to further improve on the results of the DNN, a CNN model was also explored. Its design was based on the previously obtained results, with a greater focus on its ability to generalize.
Building a CNN implies that the input will be a 2D image with one or more channels (usually color channels). Thus, the input vector was transformed into a 2D image with two channels, used to represent the measured data at the skin and at depth. The image itself is of size 13x6 and contains the normalized measurements from both glands and the axillary points. The positioning on the image closely resembles that of Figure 1, with the average of neighboring positions taken to better represent their overlap, as depicted in Figure 3. Before being formed into an image representation, the values were centered using a robust scaler based on the interquartile range, which preserves outliers.
Figure 3: Methodology for transforming a vector of temperature measurements from the mammary glands, for both the skin and the depth readings, into a 2D array. L represents the left gland and R the right gland. Any cells left blank have a value of 0.
The training set used for the CNN was oversampled, as described in section 3.3, and image augmentations were then applied. The intention of this was to
obtain a network that is more rotation invariant when detecting features on the glands, which in turn should further improve generalization. The augmentations applied were image flipping about the vertical axis and rotations of the outer points of the mammary glands. The result was a total of 5472 training samples, split equally between low and high risk.
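One plausible reading of these augmentations is sketched below: the outer points 1-8 of each gland are cyclically shifted to imitate a rotation, and the vertical flip is a column mirror of the 13x6 image; the flat layout assumed is the same as in the previous sketch and is not taken from the original implementation.

```python
import numpy as np

def rotate_outer_points(sample, shift=1):
    """Cyclically shift the eight outer points (1-8) of each gland by `shift` positions,
    leaving the nipple (0) and axillary (9) readings in place."""
    out = sample.copy()
    for block in range(4):                    # left skin, right skin, left depth, right depth
        base = block * 10
        out[base + 1: base + 9] = np.roll(out[base + 1: base + 9], shift)
    return out

def flip_image(image):
    """Flip a 13x6x2 image about the vertical axis, mirroring the left and right glands."""
    return np.flip(image, axis=1)
```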
The hidden layers of the network used ReLU activation functions and the Adam optimizer with a learning rate of 0.0000005, decay of the first-order moment at 0.9, decay of the second-order moment at 0.999 and epsilon at 1e-8. Additionally, the weights of all layers were initialized using the Xavier method from a uniform distribution. The types of layers used were dense, convolutional (Lecun et al., 2015), separable convolution (Chollet, 2016), max pooling (Lecun et al., 2015), global average pooling (Lin et al., 2013), dropout, spatial dropout (Tompson et al., 2014), batch normalization and Gaussian noise. The convolution and pooling layers used a kernel of size 3x3 and a stride of 1, except for the pooling layers, which used a stride of 2; padding was set to same and no bias value was used. Lastly, all dropout layers had a dropout percentage of 20%. The full network architecture was defined as follows:
- Input layer of size 13x6x2
- Convolutional layer with 64 units
- Batch normalization, ReLU activation and convolutional layer with 64 units
- Batch normalization, ReLU activation, Gaussian noise with a standard deviation of 0.01, max pooling and convolutional layer with 128 units
- Batch normalization, ReLU activation and convolutional layer with 128 units
- Batch normalization, ReLU activation, Gaussian noise with a deviation of 0.001, max pooling and convolutional layer with 256 units
- Batch normalization, ReLU activation and convolutional layer with 256 units
- Batch normalization, ReLU activation, Gaussian noise with a deviation of 0.001, spatial dropout, max pooling and separable convolution with 512 units
- Gaussian noise with a deviation of 0.0001, batch normalization, ReLU activation and separable convolution with 512 units; repeated four times
- Gaussian noise with a deviation of 0.0001, batch normalization, ReLU activation, global average pooling and dense layer with 512 units
- Batch normalization, ReLU activation, dropout, Gaussian noise with a deviation of 0.1 and dense layer with 2 units
Figure 4: The average batch-wise G-mean loss and accuracy of the convolutional neural network as it is trained.
The CNN obtained a G-mean loss value of 0.3637, accuracy of 0.6081, sensitivity of 0.5862 and specificity of 0.6875, as included in Table 3. The network was not able to outperform the DNN based on the G-mean loss, but it did obtain the highest specificity of all the models. Additionally, Figure 4 shows the training and validation G-mean loss and accuracy as the training of the network progresses. The added augmentation should have helped with generalization, but there was still some slight overfitting of the training set relative to the validation set. Finally, there is a similar pattern as before, where only limited information from the training set generalizes to the validation set.
5 CONCLUSIONS
The best performing model introduced in this paper is the DNN, described in section 4.2.2, with a weighted G-mean loss of 0.2843 on the test set. It was followed by the CNN with a loss value of 0.3637 and, in third place, XGBoost with a value of 0.3994. The three CCNN variants were not able to outperform the non-neural network models, with the exception of the SVM with RBF kernel.
The results of the DNN indicate the potential of automating readings from radiometry for the purposes of diagnosing or monitoring cancer patients, and not only for breast cancer. The models here used only temperature readings, so that the effectiveness of microwave radiometry in medical applications could be evaluated in isolation. Taking that into account, including additional information from other systems and what is recorded by a clinical professional about the physiological condition of each person (Zenovich et al., 2016) is expected to further
improve the capabilities of such a system. In addition, a more exhaustive search over deep neural networks should be conducted, as they show promising results and can potentially bring further improvements.
As a potential diagnostic system to aid clinical professionals in making decisions, it currently returns limited information: a low or high risk of cancer with a prediction confidence. The problem is oversimplified due to the limited amount of data. As more descriptive and broader data become available, the task can be expanded from binary to multi-class classification. Some possible labels are benign and malignant tumors, noncancerous tumors, inflammation, infection and healthy patients. Additionally, as future work, the networks can be reevaluated and altered as needed when readings are obtained for cancerous tumors at various body locations.
The proposed CCNN variants can be further improved in the dynamic construction of their architecture. An additional suggestion is to allow the network to try various activation and optimization functions and to conduct hyperparameter optimization through an online grid search. However, this drastically increases the number of possible combinations for the network to explore, and hence the training time, for a possibly small improvement. Moreover, there is a risk in the preferential selection of units against the validation set: with such a fine level of selection and the added high amount of noise, it is possible to overfit the validation set, shifting the problem from generalizing from the training set to generalizing from the validation set.
REFERENCES
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A.,
Dean,
J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur,
M., Levenberg, J., Monga, R., Moore, S., Murray, D.
G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P.,
Wicke, M., Yu, Y., and Zheng, X. (2016). Tensorflow:
A system for large-scale machine learning. In
Proceedings of the 12th USENIX Conference on
Operating Systems Design and Implementation,
OSDI’16, pages 265–283, Berkeley, CA, USA.
USENIX Association.
Anisimova, E. V. (2013). Intllktualnyy analiz dannykh i
algoritmy klassifikatsii v diagnostik vnoznykh
zabolvaniy po dannym kombinirovannoy trmomtrii:
avtorf. dis. kand. tkhn. nauk [data mining and
classification algorithms in the diagnosis of venous
diseases according to the combination of thermometry.
abstract of diss. and. of technical sciences]. Vol-
gograd, page 16.
Araújo, T., Aresta, G., Castro, E., Rouco, J., Aguiar, P., Eloy,
C., Polnia, A., and Campilho, A. (2017). Classifica-
tion of breast cancer histology images using convolu-
tional neural networks. PLOS ONE, 12(6):1–14.
Barandela, R., Sanchez, J., Garca, V., and Rangel, E.
(2003). Strategies for learning in class imbalance
problems. 36:849–851.
Bergstra, J., Bardenet, R., Bengio, Y., and Ke´gl, B.
(2011). Algorithms for hyper-parameter optimization.
In Proceedings of the 24th International Conference on
Neural Information Processing Systems, NIPS’11,
pages 2546–2554, USA. Curran Associates Inc.
Bergstra, J., Yamins, D., and Cox, D. D. (2015).
Hyperopt: A python library for optimizing the
hyperparameters of machine learning algorithms.
Bishop, C. M. (2006). Pattern Recognition and Machine
Learning (Information Science and Statistics).
Springer-Verlag, Berlin, Heidelberg.
Bolomey, J. C., Izadnegahdar, A., Jofre, L., Pichot, C.,
Peronnet, G., and Solaimani, M. (1982). Microwave
diffraction tomography for biomedical applications.
IEEE Transactions on Microwave Theory and Tech-
niques, 30(11):1998–2000.
Bondar, S. S., Terekhov, I. V., Voevodin, A. A., Leonov,
B. I., and Khadartsev, A. A. (2017). Assessment of
transcapillary water exchange in the lungs by active
radiometry. Biomedical Engineering, 51(3):211–214.
Bottou, L. (2010). Large-scale machine learning with
stochastic gradient descent. In Lechevallier, Y. and
Saporta, G., editors, Proceedings of COMP-
STAT’2010, pages 177–186, Heidelberg. Physica-
Verlag HD.
Breiman, L. (2001). Random forests. Machine Learning,
45(1):5–32.
Burges, C. J. C. (1998). A tutorial on support vector
machines for pattern recognition. Data Mining and
Knowledge Discovery, 2:121–167.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,
W. P. (2002). Smote: Synthetic minority over- sampling
technique. J. Artif. Int. Res., 16(1):321–357.
Chen, T. and Guestrin, C. (2016). Xgboost: A scalable tree
boosting system. In Proceedings of the 22Nd ACM
SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD ’16, pages 785–
794, New York, NY, USA. ACM.
Chollet, F. (2016). Xception: Deep learning with depthwise
separable convolutions. CoRR, abs/1610.02357.
Chollet, F. et al. (2015). Keras. https://keras.io.
Cireşan, D. C., Giusti, A., Gambardella, L. M., and
Schmidhuber, J. (2013). Mitosis detection in breast
cancer histology images with deep neural networks. In
Mori, K., Sakuma, I., Sato, Y., Barillot, C., and Navab,
N., editors, Medical Image Computing and Computer-
Assisted Intervention – MICCAI 2013, pages 411– 418,
Berlin, Heidelberg. Springer Berlin Heidelberg.
Conceicao, R., O’Halloran, M., and Mohr, J. (2016). An
Introduction to Microwave Imaging for Breast Cancer
Detection.
Cortes, C. and Vapnik, V. (1995). Support-vector
networks.
Mach. Learn., 20(3):273–297.
Cover, T. and Hart, P. (2006). Nearest neighbor pattern clas-
sification. IEEE Trans. Inf. Theor., 13(1):21–27.
Crandall, J. P., O, J. H., Gajwani, P., Leal, J. P.,
Mawhinney, D. D., Sterzer, F., and Wahl, R. L. (2018).
Measurement of brown adipose tissue activity using
microwave radiometry and 18f-fdg pet/ct. Journal of
nuclear medicine: official publication, Society of Nu-
clear Medicine, 59(8):12431248.
de Boer, P.-T., Kroese, D. P., Mannor, S., and Rubinstein,
R. Y. (2005). A tutorial on the cross-entropy method.
Annals of Operations Research, 134(1):19–67.
Dicke, R. H. (1982). The Measurement of Thermal Ra-
diation at Microwave Frequencies, pages 106–113.
Springer Netherlands, Dordrecht.
Drakopoulou, M., Moldovan, C., Toutouzas, K., and Tou-
soulis, D. (2018). The role of microwave radiometry in
carotid artery disease diagnostic and clinical
prospective. Current Opinion in Pharmacology, 39:99
104. Cardiovascular and renal.
Fahlman, S. E. and Lebiere, C. (1990). Advances in neural
information processing systems 2. chapter The
Cascade-correlation Learning Architecture, pages 524–
532. Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA.
Gabriel, S., Lau, R. W., and Gabriel, C. (1996a). The
dielectric properties of biological tissues: Ii.
measurements in the frequency range 10 hz to 20 ghz.
Physics in Medicine and Biology, 41(11):2251.
Gabriel, S., Lau, R. W., and Gabriel, C. (1996b). The di-
electric properties of biological tissues: III. parametric
models for the dielectric spectrum of tissues. Physics
in Medicine and Biology, 41(11):2271.
Gautherie, M. (1980). Thermopathology of breast cancer:
Measurement and analysis of in vivo temperature and
blood flow. Annals of the New York Academy of Sci-
ences, 335(1):383–415.
Glorot, X. and Bengio, Y. (2010). Understanding the
difficulty of training deep feedforward neural networks.
In Teh, Y. W. and Titterington, M., editors, Proceedings
of the Thirteenth International Conference on Artifi-
cial Intelligence and Statistics, volume 9 of Proceed-
ings of Machine Learning Research, pages 249–256,
Chia Laguna Resort, Sardinia, Italy. PMLR.
Han, H., Wang, W.-Y., and Mao, B.-H. (2005). Borderline-
smote: A new over-sampling method in imbalanced
data sets learning. In Proceedings of the 2005 Interna-
tional Conference on Advances in Intelligent Comput-
ing - Volume Part I, ICIC’05, pages 878–887, Berlin,
Heidelberg. Springer-Verlag.
He, H., Bai, Y., Garcia, E. A., and Li, S. (2008). Adasyn:
Adaptive synthetic sampling approach for imbalanced
learning. In 2008 IEEE International Joint Confer-
ence on Neural Networks (IEEE World Congress on
Computational Intelligence), pages 1322–1328.
He, H. and Ma, Y. (2013). Imbalanced Learning: Foun-
dations, Algorithms, and Applications. Wiley-IEEE
Press, 1st edition.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Ac-
celerating deep network training by reducing internal
covariate shift. CoRR, abs/1502.03167.
Ivanov, Y., Kozlov, A. F., Galiullin, R. A., Tatur, V. Y., Zi-
borov, V. S., Ivanova, N. D., Pleshakova, T. O., Ves-
nin, S. G., and Goryanin, I. (2018). Use of microwave
radiometry to monitor thermal denaturation of albu-
min. Frontiers in Physiology, 9:956.
Kingma, D. P. and Ba, J. (2014). Adam: A method for
stochastic optimization. CoRR, abs/1412.6980.
Kobrinskiy, B. A. (2008). Konsultativny intllktualny
mditsinski sistmy: klassifikatsiya, printsipy postron-
iya, effktivnost [consulting intelligent medical sys-
tems: Classification, principles of construction, effi-
ciency]. Volgograd, (2):38–47.
Krawczyk, B. (2016). Learning from imbalanced data: open
challenges and future directions. Progress in Artificial
Intelligence, 5(4):221–232.
Kubat, M. and Matwin, S. (1997). Addressing the curse of
imbalanced training sets: One-sided selection. In In
Proceedings of the Fourteenth International Confer-
ence on Machine Learning, pages 179–186. Morgan
Kaufmann.
Laskari, K., Pitsilka, D., Pentazos, G., Siores, E., Tek-
tonidou, M., and Sfikakis, P. (2018). Sat0657
microwave radiometry-derived thermal changes of
sacroiliac joints as a biomarker of sacroiliitis in
patients with spondyloarthropathy. Annals of the
Rheumatic Diseases, 77(Suppl 2):1178–1178.
Lecun, Y., Bengio, Y., and Hinton, G. (2015). Deep learn-
ing. Nature, 521(7553):436–444.
Lim, T.-S., Loh, W.-Y., and Shih, Y.-S. (2000). A
compar- ison of prediction accuracy, complexity, and
training time of thirty-three old and new classification
algo- rithms. Machine Learning, 40(3):203–228.
Lin, M., Chen, Q., and Yan, S. (2013). Network in
network. CoRR, abs/1312.4400.
Losev, A. G. and Lvshinskiy, V. V. (2015). Regressionnaya
model diagnostiki patologiy molochnykh zhelez po
dannym mikrovolnovoy radiotermometrii [regression
model for diagnosis of breast pathology according to
microwaves radiometry data]. Vestnik Volgogradskogo
gosudarstvennogo universiteta. Seriya 1. Mathemat-
ica. Physica [Science Journal of Volgograd State Uni-
versity. Mathematics. Physics], 6(31):72–82.
Myers, P. C., Sadowsky, N. L., and Barrett, A. H. (1979).
Microwave thermography: Principles, methods and
clinical applications. Journal of Microwave Power,
14(2):105–115.
Nair, V. and Hinton, G. E. (2010). Rectified linear units im-
prove restricted boltzmann machines. In Proceedings
of the 27th International Conference on International
Conference on Machine Learning, ICML’10, pages
807–814, USA. Omnipress.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P.,
Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., and Duch-
esnay, E. (2011). Scikit-learn: Machine learning in
python. J. Mach. Learn. Res., 12:2825–2830.
Pentazos, G., Laskari, K., Prekas, K., Raftakis, J., P.
Sfikakis, P., and Siores, E. (2018). Microwave
radiometry-derived thermal changes of small joints as
additional potential biomarker in rheumatoid arthritis:
A prospective pilot study. 24:1.
Peronnet, G., Pichot, C., Bolomey, J. C., Jofre, L., Izad-
negahdar, A., Szeles, C., Michel, Y., Guerquin-Kern,
J. L., and Gautherie, M. (1983). A microwave diffrac-
tion tomography system for biomedical applications.
In 1983 13th European Microwave Conference, pages
529–533.
Pichot, C., Jofre, L., Peronnet, G., and Bolomey, J. (1985).
Active microwave imaging of inhomogeneous bod-
ies. IEEE Transactions on Antennas and Propagation,
33(4):416–425.
Rodrigues, D. B., Stauffer, P. R., Pereira, P. J. S., and Mac-
carini, P. F. (2018). Microwave Radiometry for Non-
invasive Monitoring of Brain Temperature, pages 87–
127. Springer International Publishing, Cham.
Schneider, B. P. and Miller, K. D. (2005).
Angiogenesis of breast cancer. Journal of Clinical
Oncology, 23(8):1782–1790. PMID: 15755986.
Semenov, S. (2009). Microwave tomography: Review of
the progress towards clinical applications. Philosoph-
ical Transactions: Mathematical, Physical and Engi-
neering Sciences, 367(1900):3021–3042.
Skolnik, M. I. (2018). Radar. Encyclopædia Britannica. Available at https://www.britannica.com/technology/radar/History-of-radar, access date: 27/07/2018.
Spanhol, F. A., Oliveira, L. S., Petitjean, C., and Heutte, L.
(2016). Breast cancer histopathological image
classification using convolutional neural networks. In
2016 International Joint Conference on Neural
Networks (IJCNN), pages 2560–2567.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I.,
and Salakhutdinov, R. (2014). Dropout: A simple way
to prevent neural networks from overfitting. Journal of
Machine Learning Research, 15:1929–1958.
Tompson, J., Goroshin, R., Jain, A., LeCun, Y., and
Bregler, C. (2014). Efficient object localization using
convolu- tional networks. CoRR, abs/1411.4280.
Vesnin, S., Turnbull, A., Michael Dixon, J., and Goryanin,
I. (2017). Modern microwave thermometry for breast
cancer. 7.
Wilkins, M. F., Boddy, L., Morris, C. W., and Jonker, R.
(1996). A comparison of some neural and non-neural
methods for identification of phytoplankton from flow
cytomery data. Bioinformatics, 12(1):9–18.
Zenovich, A. V., Glazunov, V. A., Oparin, A. S., and Pri-
machenko, F. G. (2016). Algoritmy prinyatiya resh-
eniy v konsultativnoy intellektualnoy sisteme diag-
nostiki molochnykh zhelez [algorithms of decision-
making in the advisory intellectual system of diagnos-
tics of mammary glands]. Mathematical physics and
computer modeling, 6:129–142.