learning due to bandwidth limitations. If the model
stored at the central server is completely replaced by
a new model, a reliable link is required and the num-
ber of bits uploaded increases. A structured update
scheme, on the other hand, may cause the model to
deviate in accuracy. (Kim et al., 2019) showed that
assigning equal weight to a model update computed
from few data samples and to one computed from
many data samples may lead to misleading results.
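The sample-count weighting argued for above can be sketched as a weighted federated average. This is a minimal illustration, not the cited authors' implementation; the function name and parameter layout are assumptions.

```python
import numpy as np

def weighted_average(updates, n_samples):
    """Aggregate client model updates, weighting each by its sample count.

    `updates` is a list of parameter arrays (one per client) and
    `n_samples` the number of local training samples behind each update.
    """
    total = float(sum(n_samples))
    return sum((n / total) * u for n, u in zip(n_samples, updates))

# A client trained on 3x more data pulls the average 3x harder:
clients = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
avg = weighted_average(clients, n_samples=[1, 3])  # -> [0.75, 0.75]
```

With equal sample counts this reduces to a plain mean, which is exactly the equal-weighting scheme the cited work warns against when counts are unequal.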
Many methods have been proposed in the literature
for the detection of lung nodules. One such method,
proposed by (Murphy et al., 2009), clusters closely
occurring volumes into one large volume and applies
KNN to reduce false cases. The inputs to this model
were the shape index, the maximum and minimum
dimensions of structures to be considered as nodules,
and the number of voxels in the cluster to be classi-
fied as a nodular or non-nodular region. The draw-
back of this supervised classification using KNN/SVM
is an increase in false positives: two or more small
non-nodular regions in a cluster may be portrayed as
one larger volume, giving the false impression of a
nodule.
The intensity-based genetic detection method of
(Dehmeshki et al., 2007) exploits the fact that the in-
tensity inside lung nodules is higher than that of the
surrounding volume. Its shape-based detection as-
signs a shape index of 1 to a sphere and 0.75 to blood
vessels, with the nodule threshold set at 0.95. How-
ever, this method has difficulty identifying nodules
with irregular shapes and density patterns. Partly
solid and non-solid nodules are not detected, as they
fail to cross the shape-index threshold. Nodules close
to the pleural surface and those attached to blood ves-
sels are also missed, leading to false cases.
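The quoted values (sphere 1, vessel 0.75) match a shape index rescaled to [0, 1] from the two principal surface curvatures; the sketch below assumes that scaled variant, which is one common convention rather than necessarily the exact formula used in the cited work.

```python
import math

def shape_index(k1, k2):
    """Scaled shape index in [0, 1] from principal curvatures k1 >= k2.

    With this scaling a perfect sphere scores 1.0 and an ideal
    cylinder (blood vessel) scores 0.75, matching the thresholds
    quoted in the text.
    """
    # atan2 handles the k1 == k2 (sphere) case, where the ratio diverges
    return 0.5 + (1.0 / math.pi) * math.atan2(k1 + k2, k1 - k2)

print(shape_index(1.0, 1.0))  # sphere:   1.0
print(shape_index(1.0, 0.0))  # cylinder: 0.75
```

A threshold of 0.95 then keeps only nearly spherical structures, which is precisely why irregular, partly solid, and vessel-attached nodules fall below it.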
2 PROCEDURE OF WORK
This work of decentralizing an ML model over dis-
tributed databases for detecting lung nodules and pre-
dicting their severity followed a series of steps: 1. de-
signing the initial ML model; 2. distributing the model
to all nodes while ensuring security; 3. updating the
model.
2.1 Initial Model
The basic model deployed in this work is an integra-
tion of two sequential models. The first model detects
the occurrence of nodules, while the second confirms
their presence. The different stages of the proposed
model are discussed below:
2.1.1 Dataset Acquisition
The LIDC dataset used for the detection of pulmonary
nodules contains 1010 CT scans from 1010 different
patients (Armato III et al., 2011). Seven cases with
incomplete scans are excluded. Each scan was
checked and annotated manually by four radiologists,
and a total of 2632 nodules were found in the dataset.
The dataset includes a CSV file, 'annotations', whose
entries serve as the standard reference for nodule de-
tection. Another CSV file, 'candidates', used for the
LUNA16 workshop, contains a set of candidate loca-
tions for checking the correctness and completeness
of the nodule locations, thereby reducing false cases.
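The annotations file can be read with the standard library. The column layout below (seriesuid, world coordinates in mm, diameter in mm) follows the LUNA16 release of the file; the seriesuid values in this sample are made up for illustration.

```python
import csv
import io

# Two illustrative rows in the LUNA16 'annotations' column layout;
# the seriesuid values here are fabricated placeholders.
sample = (
    "seriesuid,coordX,coordY,coordZ,diameter_mm\n"
    "1.3.6.1.4.1.example.1,-128.70,-175.32,-298.39,5.65\n"
    "1.3.6.1.4.1.example.2,103.78,-211.93,-227.12,4.57\n"
)

nodules = []
for row in csv.DictReader(io.StringIO(sample)):
    center = tuple(float(row[k]) for k in ("coordX", "coordY", "coordZ"))
    nodules.append((row["seriesuid"], center, float(row["diameter_mm"])))

print(len(nodules))   # 2
print(nodules[0][2])  # 5.65
```

The 'candidates' file has the same coordinate columns plus a 0/1 class column, so the same loop applies with `diameter_mm` swapped for the label.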
2.1.2 Preprocessing
Segmentation of the lungs from the surrounding re-
gion was the first step: the lungs are segmented from
the CT scan images using predefined edge-detection
techniques so that the focus remains within the pul-
monary region.
The next step in pre-processing is masking the seg-
mented lung images to highlight the region of inter-
est, based on the coordinate values and nodule radii
specified in the annotations file of the LIDC dataset.
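The annotation-driven masking step can be sketched as building a binary spherical mask per nodule. This is a simplified stand-in: it assumes the annotated center and radius have already been converted from world (mm) to voxel coordinates, and the function name is hypothetical.

```python
import numpy as np

def sphere_mask(shape, center, radius):
    """Boolean volume that is True inside a sphere (voxel coordinates).

    `center` is (z, y, x); in the pipeline described above it would
    come from a row of the 'annotations' file after mm-to-voxel
    conversion.
    """
    zz, yy, xx = np.ogrid[: shape[0], : shape[1], : shape[2]]
    dist2 = (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    return dist2 <= radius ** 2

mask = sphere_mask((9, 9, 9), center=(4, 4, 4), radius=2)
```

Applying `mask` to the segmented volume (e.g. as the training label) highlights exactly the annotated nodular region.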
Both the segmented and masked images are sliced
layer by layer. The segmented images serve as input
images and the masked images as the corresponding
labels. The images fed to the model are cropped to
the region of interest where the nodules are present.
Traversal through all 2632 nodules in the LIDC
dataset shows that the largest nodule is no more than
64 × 64 pixels; setting the ROI to this size therefore
sufficiently encloses every nodule in the dataset and
removes any chance of missing one. For this, the
segmented and masked images are converted to
64 × 64 pixel images and stacked into 16 layers.
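The 16-layer, 64 × 64 ROI extraction can be sketched as a centred crop of the volume. The helper below is hypothetical and assumes the centre lies far enough from the border that no padding is needed; a real implementation would clamp or pad at the edges.

```python
import numpy as np

def crop_roi(volume, center, size=(16, 64, 64)):
    """Crop a (depth, height, width) block centred on a voxel coordinate.

    Assumes `center` is at least half a block away from every border,
    so the returned array always has the requested `size`.
    """
    starts = [c - s // 2 for c, s in zip(center, size)]
    return volume[tuple(slice(st, st + s) for st, s in zip(starts, size))]

volume = np.zeros((32, 128, 128), dtype=np.float32)  # one segmented scan
roi = crop_roi(volume, center=(16, 64, 64))
print(roi.shape)  # (16, 64, 64)
```

The same crop is applied to the segmented volume (input) and the masked volume (label) so the two stay voxel-aligned.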
For the second model, cubes matching the size of the
nodules are generated based on the candidates file,
which contains the locations as coordinates and radii
along with a label of 1/0 denoting a nodular or non-
nodular region, respectively. The number of dataset
examples depicting nodular regions (labeled 1) was
much smaller than that of non-nodular regions, so the
dataset must be balanced. Hence, augmentation is
used, a technique that can be used to artificially
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications
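One label-preserving way to oversample the scarce nodular class is random flips and axial rotations of the candidate cubes. This is a sketch of one common augmentation scheme, not necessarily the exact transforms used in this work.

```python
import numpy as np

def augment_cube(cube, rng):
    """Label-preserving augmentation of a 3-D candidate cube.

    Applies random flips along each axis plus a random 90-degree
    rotation in the axial plane; every output contains exactly the
    same voxels as the input, only reoriented.
    """
    for axis in range(3):
        if rng.random() < 0.5:
            cube = np.flip(cube, axis=axis)
    return np.rot90(cube, k=int(rng.integers(4)), axes=(1, 2))

rng = np.random.default_rng(seed=0)
cube = np.arange(27, dtype=np.float32).reshape(3, 3, 3)
aug = augment_cube(cube, rng)  # same voxels, new orientation
```

Repeatedly augmenting each label-1 cube yields extra positive examples until the two classes are roughly balanced.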