learning due to bandwidth limitations. If the model
stored at the central server is completely replaced by
a new model, a reliable link is required and the num-
ber of bits uploaded increases. A structured update
scheme, on the other hand, may cause the model to
deviate in accuracy. (Kim et al., 2019) showed that
assigning equal weight to a model update computed
from few data samples and to one computed from
many data samples may lead to misleading results.
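The sample-count weighting argued for above can be sketched as a weighted federated average. This is a minimal illustration, not the cited authors' implementation; the function name and parameter layout are assumptions.

```python
import numpy as np

def weighted_average(updates, n_samples):
    """Aggregate client model updates, weighting each by its sample count.

    `updates` is a list of parameter arrays (one per client) and
    `n_samples` the number of local training samples behind each update.
    """
    total = float(sum(n_samples))
    return sum((n / total) * u for n, u in zip(n_samples, updates))

# A client trained on 3x more data pulls the average 3x harder:
clients = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
avg = weighted_average(clients, n_samples=[1, 3])  # -> [0.75, 0.75]
```

With equal sample counts this reduces to a plain mean, which is exactly the equal-weighting scheme the cited work warns against when counts are unequal.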
Many methods have been proposed in the literature
for the detection of lung nodules. One such method,
proposed by (Murphy et al., 2009), clusters closely
occurring volumes into one large volume and applies
KNN to reduce false cases. The inputs to this model
were the shape index, the maximum and minimum
dimensions of structures to be considered as nodules,
and the number of voxels in the cluster to be classi-
fied as a nodular or non-nodular region. The draw-
back of this supervised classification using KNN/SVM
is an increase in false positives: two or more small
non-nodular regions in a cluster may be portrayed as
one larger volume, giving the false impression of a
nodule.
The intensity-based genetic detection method of
(Dehmeshki et al., 2007) exploits the fact that the in-
tensity inside lung nodules is higher than that of the
surrounding volume. Its shape-based detection as-
signs a shape index of 1 to a sphere and 0.75 to blood
vessels, with the nodule threshold set at 0.95. How-
ever, this method has difficulty identifying nodules
with irregular shapes and density patterns. Partly
solid and non-solid nodules are not detected, as they
fail to cross the shape-index threshold. Nodules close
to the pleural surface and those attached to blood ves-
sels are also missed, leading to false cases.
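The quoted values (sphere 1, vessel 0.75) match a shape index rescaled to [0, 1] from the two principal surface curvatures; the sketch below assumes that scaled variant, which is one common convention rather than necessarily the exact formula used in the cited work.

```python
import math

def shape_index(k1, k2):
    """Scaled shape index in [0, 1] from principal curvatures k1 >= k2.

    With this scaling a perfect sphere scores 1.0 and an ideal
    cylinder (blood vessel) scores 0.75, matching the thresholds
    quoted in the text.
    """
    # atan2 handles the k1 == k2 (sphere) case, where the ratio diverges
    return 0.5 + (1.0 / math.pi) * math.atan2(k1 + k2, k1 - k2)

print(shape_index(1.0, 1.0))  # sphere:   1.0
print(shape_index(1.0, 0.0))  # cylinder: 0.75
```

A threshold of 0.95 then keeps only nearly spherical structures, which is precisely why irregular, partly solid, and vessel-attached nodules fall below it.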
2 PROCEDURE OF WORK
This work of decentralizing an ML model over dis-
tributed databases for detecting lung nodules and pre-
dicting their severity followed a series of steps: 1. de-
signing the initial ML model; 2. distributing the model
to all nodes while ensuring security; 3. updating the
model.
2.1 Initial Model
The basic model deployed in this work is an integra-
tion of two sequential models. The first model detects
the occurrence of nodules, while the second confirms
their presence. The different stages of the proposed
model are discussed below:
2.1.1 Dataset Acquisition
The LIDC dataset used for the detection of pulmonary
nodules contains 1010 CT scans from 1010 different
patients (Armato III et al., 2011). Seven cases with
incomplete scans are excluded. Each scan was
checked and annotated manually by four radiologists,
and a total of 2632 nodules were found in the dataset.
The dataset includes a CSV file, 'annotations', whose
entries serve as the standard reference for nodule de-
tection. Another CSV file, 'candidates', used for the
LUNA16 workshop, contains a set of candidate loca-
tions for checking the correctness and completeness
of the nodule locations, thereby reducing false cases.
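The annotations file can be read with the standard library. The column layout below (seriesuid, world coordinates in mm, diameter in mm) follows the LUNA16 release of the file; the seriesuid values in this sample are made up for illustration.

```python
import csv
import io

# Two illustrative rows in the LUNA16 'annotations' column layout;
# the seriesuid values here are fabricated placeholders.
sample = (
    "seriesuid,coordX,coordY,coordZ,diameter_mm\n"
    "1.3.6.1.4.1.example.1,-128.70,-175.32,-298.39,5.65\n"
    "1.3.6.1.4.1.example.2,103.78,-211.93,-227.12,4.57\n"
)

nodules = []
for row in csv.DictReader(io.StringIO(sample)):
    center = tuple(float(row[k]) for k in ("coordX", "coordY", "coordZ"))
    nodules.append((row["seriesuid"], center, float(row["diameter_mm"])))

print(len(nodules))   # 2
print(nodules[0][2])  # 5.65
```

The 'candidates' file has the same coordinate columns plus a 0/1 class column, so the same loop applies with `diameter_mm` swapped for the label.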
2.1.2 Preprocessing
Segmentation of the lungs from the surrounding re-
gion was the first step: the lungs are segmented from
the CT scan images using predefined edge-detection
techniques so that the focus remains within the pul-
monary region.
The next step in pre-processing is masking the seg-
mented lung images to highlight the region of inter-
est, based on the coordinate values and nodule radii
specified in the annotations file of the LIDC dataset.
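The annotation-driven masking step can be sketched as building a binary spherical mask per nodule. This is a simplified stand-in: it assumes the annotated center and radius have already been converted from world (mm) to voxel coordinates, and the function name is hypothetical.

```python
import numpy as np

def sphere_mask(shape, center, radius):
    """Boolean volume that is True inside a sphere (voxel coordinates).

    `center` is (z, y, x); in the pipeline described above it would
    come from a row of the 'annotations' file after mm-to-voxel
    conversion.
    """
    zz, yy, xx = np.ogrid[: shape[0], : shape[1], : shape[2]]
    dist2 = (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    return dist2 <= radius ** 2

mask = sphere_mask((9, 9, 9), center=(4, 4, 4), radius=2)
```

Applying `mask` to the segmented volume (e.g. as the training label) highlights exactly the annotated nodular region.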
Both the segmented and masked images are sliced
layer by layer. The segmented images serve as input
images and the masked images as the corresponding
labels. The images fed to the model are cropped to
the region of interest where the nodules are present.
Traversal through all 2632 nodules in the LIDC
dataset shows that the largest nodule is no more than
64 × 64 pixels; setting the ROI to this size therefore
sufficiently encloses every nodule in the dataset and
removes any chance of missing one. For this, the
segmented and masked images are converted to
64 × 64 pixel images and stacked into 16 layers.
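The 16-layer, 64 × 64 ROI extraction can be sketched as a centred crop of the volume. The helper below is hypothetical and assumes the centre lies far enough from the border that no padding is needed; a real implementation would clamp or pad at the edges.

```python
import numpy as np

def crop_roi(volume, center, size=(16, 64, 64)):
    """Crop a (depth, height, width) block centred on a voxel coordinate.

    Assumes `center` is at least half a block away from every border,
    so the returned array always has the requested `size`.
    """
    starts = [c - s // 2 for c, s in zip(center, size)]
    return volume[tuple(slice(st, st + s) for st, s in zip(starts, size))]

volume = np.zeros((32, 128, 128), dtype=np.float32)  # one segmented scan
roi = crop_roi(volume, center=(16, 64, 64))
print(roi.shape)  # (16, 64, 64)
```

The same crop is applied to the segmented volume (input) and the masked volume (label) so the two stay voxel-aligned.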
For the second model, cubes matching the size of the
nodules are generated based on the candidates file,
which contains the locations as coordinates and radii
along with a label of 1/0 denoting a nodular or non-
nodular region, respectively. The number of dataset
examples depicting nodular regions (labeled 1) was
much smaller than that of non-nodular regions, so the
dataset must be balanced. Hence, augmentation is
used, a technique that can be used to artificially
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications
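One label-preserving way to oversample the scarce nodular class is random flips and axial rotations of the candidate cubes. This is a sketch of one common augmentation scheme, not necessarily the exact transforms used in this work.

```python
import numpy as np

def augment_cube(cube, rng):
    """Label-preserving augmentation of a 3-D candidate cube.

    Applies random flips along each axis plus a random 90-degree
    rotation in the axial plane; every output contains exactly the
    same voxels as the input, only reoriented.
    """
    for axis in range(3):
        if rng.random() < 0.5:
            cube = np.flip(cube, axis=axis)
    return np.rot90(cube, k=int(rng.integers(4)), axes=(1, 2))

rng = np.random.default_rng(seed=0)
cube = np.arange(27, dtype=np.float32).reshape(3, 3, 3)
aug = augment_cube(cube, rng)  # same voxels, new orientation
```

Repeatedly augmenting each label-1 cube yields extra positive examples until the two classes are roughly balanced.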