Best Fit Missing Value Imputation (BFMVI) Algorithm for Incomplete Data in the Internet of Things

Benjamin Agbo, Yongrui Qin and Richard Hill

School of Computing and Engineering, University of Huddersfield, U.K.

Keywords:

Missing Values, Imputation, Internet of Things (IoT), Best Fit Missing Value Imputation (BFMVI).

Abstract:

The noticeable growth in the adoption of Internet of Things (IoT) technologies has led to the generation of large amounts of data, usually from sensor devices. When dealing with massive amounts of data, it is very common to observe databases with large amounts of missing values. This is a challenge for data miners because various methods for data analysis only work well on complete databases. A popular way to deal with this challenge is to fill in (impute) missing values using adequate estimation techniques. Unfortunately, a good number of existing methods rely on all the observed values in the entire dataset to estimate missing values, which significantly degrades the imputed results (low accuracy and high complexity). In this paper, we propose a novel imputation technique based on data clustering and a robust selection of adequate imputation equations for each missing datapoint. We evaluate our proposed method on six University of California Irvine (UCI) datasets and compare it with five recently proposed imputation methods. The results show that the performance of the proposed imputation method is comparable with the Local Similarity Imputation (LSI) technique in terms of imputation accuracy, but is significantly less complex than all the existing methods identified.

1 INTRODUCTION

The Internet of Things (IoT) can be described as a network of multiple devices that can sense, process and share data generated from their surroundings (Singh et al., 2018). The adoption of IoT in various platforms has enabled easy communication and access to a wide range of devices such as sensors, actuators, home appliances, surveillance cameras, vehicles, etc. Therefore, there is a need to deploy more applications that will adapt to the potentially increasing amount and variety of data that will be generated by IoT devices (Agbo et al., 2019).

In order to ensure the usefulness of data generated by IoT devices in various data mining tasks, researchers have attempted to address the common problem of incompleteness in sensor-generated data. According to Lata and Chakraverty (2014), data is often incomplete due to factors such as human errors, erroneous measurements, communication malfunctions or faulty equipment (e.g. sensors). Failure to account for missing data can significantly compromise the validity of findings from a dataset. In general, it can undermine the conclusions of a study by reducing the sample size and introducing bias (Read, 2015). Popular methods used in research to handle missing data include list-wise deletion, pair-wise deletion, hot decking, mean imputation and regression imputation. Although these methods are straightforward to implement, they may lead to loss of information or introduce bias in the results obtained (Inman et al., 2015). In addition, most imputation methods consider the values of an entire dataset before estimating missing values, which can have unfavourable effects on the imputation process (e.g. high complexity or low accuracy).

One of the leading reasons for handling missing values is to improve the accuracy of clustering and classification tasks (Silva-Ramírez et al., 2015). However, most imputation methods are computationally intensive and therefore take time to estimate and impute missing values. This may be inconsequential for training processes, but it is not practical to spend much time estimating values for incomplete instances during clustering or classification tasks. This is especially true for complex imputation techniques such as Multiple Imputation by Chained Equations (MICE), which rebuilds an imputation structure from every training instance and new instance (Tran et al., 2018). Although recent literature has shown a significant increase in the accuracy of advanced imputation methods, the high costs associated with these methods in various tasks have often raised concerns. Therefore, it has become paramount to address the question of how the computation time of new methods can be reduced without sacrificing their accuracy (Tran et al., 2018).

Agbo, B., Qin, Y. and Hill, R.
Best Fit Missing Value Imputation (BFMVI) Algorithm for Incomplete Data in the Internet of Things.
DOI: 10.5220/0009578201300137
In Proceedings of the 5th International Conference on Internet of Things, Big Data and Security (IoTBDS 2020), pages 130-137
ISBN: 978-989-758-426-8

In recent years, various machine learning (ML) algorithms have been introduced to handle data incompleteness, which often occurs as a result of missing values (Angelov, 2017). These algorithms handle this issue by imputing the most plausible values in instances with missing values. In contrast to popular statistical methods for filling in missing values, machine learning algorithms use the existing data in a dataset to train a model that is then used to impute missing values. Various ML algorithms for imputing missing values have been identified in the literature, such as probabilistic methods, decision trees and rule-based methods (Farhangfar et al., 2008).

In this paper, we propose a novel imputation technique which uses the similarity between observed values to perform imputation. This is achieved by first partitioning an incomplete dataset; similar records within each cluster are then used to estimate the missing values. However, the proposed method raises some challenging issues, including how to cluster an incomplete dataset before imputation. To solve this problem, we initially replace all the missing values with distinctive values. This reduces the effect of missing values in the datasets and enables clustering of the incomplete datasets.

We evaluate the performance of our proposed BFMVI technique against five existing techniques, namely LSI, FIMUS, FCM, DMI and EMI, on six datasets obtained from the University of California Irvine (UCI) machine learning repository.

2 RELATED WORKS

Many research efforts have been channelled towards addressing the issue of data incompleteness by attempting to develop more accurate and reliable imputation techniques. In this section, we review related research and recent efforts aimed at addressing this problem.

A framework for the imputation of missing values using co-appearance, correlation and similarity analysis (FIMUS) was proposed by Rahman and Islam (2014). The overall idea behind this method is to make educated guesses based on the correlation between attributes, the co-appearance of values and the similarity between values that belong to an attribute. Unlike various existing techniques, FIMUS can also be used to impute missing categorical variables. To compute co-appearances between values that belong to different attributes, FIMUS first summarizes the values of numerical attributes into categories. For instance, the algorithm groups the values of an attribute $A_j$ into $\sqrt{|A_j|}$ categories, where $|A_j|$ is the domain size of $A_j$. This grouping strategy is advantageous due to its simplicity. However, it may not always detect natural groups, because it artificially makes the range of values of each category equal.
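As an illustration of this drawback, the following Python sketch groups a numerical attribute into equal-width categories. It is a sketch under stated assumptions, not the FIMUS authors' implementation: the hypothetical function `equal_width_categories` takes the domain size $|A_j|$ to be the number of distinct values and rounds $\sqrt{|A_j|}$ to the nearest integer.

```python
import math


def equal_width_categories(values):
    """Assign each numerical value to one of round(sqrt(|A|)) equal-width categories.

    Assumption: |A| (the domain size) is taken to be the number of distinct values.
    """
    n_cat = max(1, round(math.sqrt(len(set(values)))))
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_cat
    categories = []
    for v in values:
        if width == 0:
            categories.append(0)  # constant attribute: a single category
        else:
            # min() keeps the maximum value inside the last category
            categories.append(min(int((v - lo) / width), n_cat - 1))
    return categories


# Two natural groups (1-4 and 10-13) plus one outlier
print(equal_width_categories([1, 2, 3, 4, 10, 11, 12, 13, 50]))
```

With nine distinct values the attribute gets three categories of width 49/3. Both natural groups (1 to 4 and 10 to 13) land in category 0, the middle category stays empty, and the outlier 50 alone forms category 2, which is the kind of artificial grouping described above.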

Various missing value imputation techniques have approached imputation using clustering schemes such as k-means and fuzzy c-means (FCM). The FCM-based technique proposed by Zhang et al. (2018) first partitions a dataset into k clusters, which yields membership values of the items with respect to each cluster centroid. Then, all the missing values are estimated using the membership degrees of the objects that fall within the same cluster. The simplicity of this method is a major advantage. However, the accuracy of FCM imputation can be significantly affected by the clustering results, particularly because selecting a suitable number of clusters k is often challenging for data miners.
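The membership-based estimation step can be sketched in Python for a single record, assuming the cluster centroids have already been found by fuzzy c-means. This is the generic FCM imputation idea with a fuzzifier m = 2 (an assumption), not the interval FCM with neural networks of Zhang et al. (2018); the function name `fcm_impute` is hypothetical.

```python
def fcm_impute(record, centroids, m=2.0):
    """Fill the None entries of `record` from its fuzzy memberships to `centroids`.

    Memberships use the standard fuzzy c-means formula on the observed attributes:
    u_c = 1 / sum_j (dist_c / dist_j) ** (2 / (m - 1)).
    Each missing attribute becomes the membership-weighted average of the centroids.
    """
    observed = [a for a, v in enumerate(record) if v is not None]
    dists = [
        sum((record[a] - c[a]) ** 2 for a in observed) ** 0.5 for c in centroids
    ]
    if any(d == 0 for d in dists):
        # The record coincides with one or more centroids: share membership among them
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        u = [h / sum(hits) for h in hits]
    else:
        power = 2.0 / (m - 1.0)
        u = [1.0 / sum((dc / dj) ** power for dj in dists) for dc in dists]
    filled = list(record)
    for a, v in enumerate(record):
        if v is None:
            filled[a] = sum(uc * c[a] for uc, c in zip(u, centroids))
    return filled


centroids = [[0.0, 0.0], [10.0, 10.0]]
# Equidistant from both centroids, so both memberships are 0.5: prints [5.0, 5.0]
print(fcm_impute([5.0, None], centroids))
# Closer to the first centroid, so the estimate is pulled towards 0
print(fcm_impute([1.0, None], centroids))
```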

The Expectation Maximization Imputation (EMI) technique (Dempster et al., 1977; Schneider, 2001) is one of the most popular missing value imputation techniques in the literature. To impute missing numerical values, this technique estimates the mean and covariance matrix from the observed values in a dataset and iterates until there is no considerable change in the imputed values, mean and covariance matrix from one iteration to the next. EMI is generally reported to work best on datasets whose values are missing at random. Its main disadvantage, however, is that it relies on information from the other values in the dataset; therefore, it is only suitable for datasets with high correlation among attributes (Deb and Liew, 2016).
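A minimal Python sketch of the EM idea for the simplest case: two attributes assumed to be bivariate normal, with x fully observed and some values of y missing. It alternates an E-step (conditional mean and variance of each missing y given x) and an M-step (mean and covariance from the expected sufficient statistics) until the estimates stop changing. The function `em_impute_bivariate` and its tolerance are hypothetical, and this is not the regularized EM algorithm of Schneider (2001).

```python
def em_impute_bivariate(x, y, tol=1e-12, max_iter=1000):
    """EM imputation of missing y values (None) given a fully observed x.

    Assumes (x, y) is bivariate normal. Returns the completed y list and the
    final estimates (mu_x, mu_y, s_xx, s_xy, s_yy).
    """
    n = len(x)
    missing = [i for i, v in enumerate(y) if v is None]
    observed_y = [v for v in y if v is not None]
    start = sum(observed_y) / len(observed_y)
    y_hat = [start if v is None else v for v in y]  # initial mean imputation
    cond_var = 0.0  # Var(y | x), added once per missing y in the M-step
    previous = None
    for _ in range(max_iter):
        # M-step: mean and covariance from the expected sufficient statistics
        mu_x = sum(x) / n
        mu_y = sum(y_hat) / n
        s_xx = sum(a * a for a in x) / n - mu_x ** 2
        s_xy = sum(a * b for a, b in zip(x, y_hat)) / n - mu_x * mu_y
        s_yy = (sum(b * b for b in y_hat) + cond_var * len(missing)) / n - mu_y ** 2
        current = (mu_x, mu_y, s_xx, s_xy, s_yy)
        # E-step: conditional mean and variance of each missing y given its x
        beta = s_xy / s_xx
        cond_var = max(0.0, s_yy - beta * s_xy)
        for i in missing:
            y_hat[i] = mu_y + beta * (x[i] - mu_x)
        # Stop when no parameter changes by more than tol between iterations
        if previous is not None and max(abs(a - b) for a, b in zip(current, previous)) < tol:
            break
        previous = current
    return y_hat, current


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, None, 9.0, None, 13.0]  # observed points lie on y = 2x + 1
completed, params = em_impute_bivariate(x, y)
print(completed)
```

Because the observed points lie exactly on y = 2x + 1, the completed values at x = 3 and x = 5 should converge to about 7 and 11.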

Another technique used to handle missing data is the Decision tree based Missing value Imputation (DMI) algorithm proposed by Rahman and Islam (2013). This technique combines decision trees and the EMI algorithm to impute missing values. The authors argue that attributes within a horizontal partition of a dataset can have higher correlation than attributes over the entire dataset. The processes involved in DMI are as follows. First, it divides the full dataset ($D_{full}$) into two smaller datasets: one containing the records with missing values ($D_{miss}$) and the other ($D_{complete}$) containing the complete records. Next, it builds decision trees on $D_{complete}$, taking each attribute that has missing values in $D_{miss}$ as the class attribute. It then assigns every record in $D_{miss}$ to the leaf it falls into in the tree whose class attribute is the attribute having the missing value. Finally, within each leaf, DMI employs the EMI algorithm to fill in missing numerical values and uses the majority class value to impute missing categorical values.
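The data split and the categorical step of DMI can be sketched in Python as follows. The decision tree itself is not built here: the hypothetical `leaf_of` argument stands in for the tree grown on $D_{complete}$ and simply maps a record to a leaf identifier, and the EMI step for numerical attributes is omitted. Function names are illustrative only.

```python
from collections import Counter, defaultdict


def split_full(d_full):
    """Split D_full into D_miss (records with any None) and D_complete (no None)."""
    d_miss = [r for r in d_full if any(v is None for v in r)]
    d_complete = [r for r in d_full if all(v is not None for v in r)]
    return d_miss, d_complete


def impute_categorical_by_leaf(d_full, target, leaf_of):
    """Fill missing values of the categorical attribute `target` with the majority
    value among the D_complete records that reach the same leaf.

    `leaf_of` is a hypothetical stand-in for the decision tree built on D_complete
    with `target` as the class attribute: it maps a record to a leaf identifier.
    """
    _, d_complete = split_full(d_full)
    votes = defaultdict(Counter)  # leaf id -> counts of class values
    for r in d_complete:
        votes[leaf_of(r)][r[target]] += 1
    result = []
    for r in d_full:
        r = list(r)
        leaf = leaf_of(r)
        if r[target] is None and votes[leaf]:
            r[target] = votes[leaf].most_common(1)[0][0]
        result.append(r)
    return result


records = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "no"),
    ("rain", "mild", "yes"),
    ("rain", "cool", "yes"),
    ("sunny", "cool", None),
]
# A one-split "tree" on attribute 0 serves as the leaf assignment here
print(impute_categorical_by_leaf(records, target=2, leaf_of=lambda r: r[0]))
```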

Another method used to handle missing data is the Local Similarity Imputation (LSI) technique proposed by Zhao et al. (2018). Here, missing values are estimated using the top k-nearest neighbours and fast clustering. First, a dataset with missing values is partitioned into clusters; then the most similar records in each cluster are used to estimate the missing values. To enhance the accuracy of clustering, this technique uses a two-layer deep learning algorithm to detect important features within a cluster. This enables the fast clustering algorithm to read the important records of a dataset in a single pass. Lastly, the top k-nearest neighbour algorithm is used to estimate and impute missing values in individual clusters.
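The final within-cluster step can be illustrated with a plain k-nearest-neighbour sketch in Python. It assumes the cluster is already given and uses Euclidean distance on the attributes the incomplete record actually has; the two-layer deep learning and fast clustering parts of LSI are not reproduced, and `knn_impute_in_cluster` is a hypothetical name.

```python
def knn_impute_in_cluster(cluster, k=2):
    """Fill the None entries of each record with the mean of its k nearest
    complete neighbours in the same cluster (squared Euclidean distance on the
    attributes the record actually has)."""
    complete = [r for r in cluster if None not in r]
    result = []
    for r in cluster:
        filled = list(r)
        if None in r and complete:
            observed = [a for a, v in enumerate(r) if v is not None]
            neighbours = sorted(
                complete, key=lambda c: sum((r[a] - c[a]) ** 2 for a in observed)
            )[:k]
            for a, v in enumerate(r):
                if v is None:
                    filled[a] = sum(c[a] for c in neighbours) / len(neighbours)
        result.append(filled)
    return result


cluster = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [1.05, None]]
print(knn_impute_in_cluster(cluster, k=2))
```

The record [1.05, None] has [1.0, 2.0] and [1.1, 2.2] as its two nearest neighbours, so its missing value is estimated as their mean, 2.1.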

Although these methods show good imputation accuracy, their high computation time reduces their efficiency when dealing with increasing volumes of data.

3 ROBUST BFMVI FOR INCOMPLETE DATA

Our method consists of two stages: first, the incomplete dataset is partitioned into different groups; second, the missing values within each partition are imputed using the BFMVI algorithm.

3.1 Arbitrary Clustering

To partition our datasets, we first fill in all missing values with distinctive values. To enable fast execution of our algorithm, we stored the pre-imputed records of our dataset in an array. An arbitrary number (γ) of items were taken from the dataset to form different groups containing similar records. According to Zhang et al. (2015), better imputation results can be achieved when similar samples are used to estimate missing values. However, Zhao et al. (2018) argued that existing clustering algorithms perform poorly on incomplete datasets, because missing values introduce serious uncertainties that affect the accuracy and usability of existing clustering algorithms. Although our clustering approach leaves room for improvement, the strength of our contribution lies in our imputation method.

Algorithm 1: Clustering Algorithm.

Input: Dataset with missing values, $D \in X^{n \times m}$. Parameters γ, β.
Output: Dataset clusters and their number k
1: D ← PreImp(dv, D); // initially fill missing values with a distinctive value
2: Arr ← GetValuesIn(D); // get the pre-imputed values of D and store them in an array
3: for i = 1 to l do
4:   [Cluster, γ] ← Partition(Arr[γ], Clusters.β); // partition arbitrary values of Arr[γ] into β groups
5: end for
6: Return Clusters and their number k
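Because Algorithm 1 leaves the Partition step and the distinctive value dv unspecified, the following Python sketch is only one hypothetical reading: it uses dv = -1.0, picks β arbitrary seed records with a seeded random generator, and assigns every pre-imputed record to its nearest seed. The γ parameter and the loop over l are not modelled, and the function names are illustrative.

```python
import random


def pre_impute(data, dv):
    """Line 1 of Algorithm 1: replace every missing entry (None) with the distinctive value dv."""
    return [[dv if v is None else v for v in row] for row in data]


def arbitrary_partition(data, beta, dv=-1.0, seed=0):
    """Hypothetical reading of Algorithm 1.

    After pre-imputation, beta arbitrary (random) records are chosen as seeds and
    every record is assigned to its nearest seed by squared Euclidean distance.
    Returns the clusters (lists of record indices) and their number k.
    """
    arr = pre_impute(data, dv)
    rng = random.Random(seed)
    seeds = rng.sample(range(len(arr)), beta)
    members = {s: [] for s in seeds}
    for idx, row in enumerate(arr):
        nearest = min(
            seeds, key=lambda s: sum((a - b) ** 2 for a, b in zip(row, arr[s]))
        )
        members[nearest].append(idx)
    clusters = [m for m in members.values() if m]
    return clusters, len(clusters)


data = [[1.0, 2.0], [1.1, None], [9.0, 9.5], [None, 9.0]]
clusters, k = arbitrary_partition(data, beta=2)
print(k, clusters)
```

Since every missing entry is replaced by the same distinctive value before distances are computed, records missing the same attribute tend to be pulled towards each other; choosing dv far outside the data range makes that effect stronger.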

3.2 BFMVI based on Arbitrary Clustering

As stated earlier, the first phase of our technique partitions our datasets into groups of items with similar records; the missing values are then estimated using the observed values of the records in each cluster. The strength of our contribution lies in the ability of our model to choose the most suitable imputation method for each missing datapoint. Let $[k_1, k_2, \ldots, k_k]$ be the k clusters generated from the pre-imputed dataset D, and let $[x_1, x_2, \ldots, x_n]$ be a non-nominal distribution with missing values. In the imputation process, the algorithm produces six imputation results, given by Equations (1)-(6), and selects a suitable imputation equation for each missing datapoint based on a defined criterion.

Imputation 1: The average value of the observed records in each cluster is used to fill in each missing datapoint. Our parameters γ and λ are set to 3 and 0.4 respectively.

$p = \frac{1}{n}\sum_{i=1}^{n} x_i$ (1)

Imputation 2: For each partition with missing values $x_i$, missing values are imputed as follows:

$\sum_{i=1} \cdots$ (2)

where $r_i$ is the corresponding mode value of $x_i$, and $n$ represents the distribution size.

Figure 1: Framework for Imputation Model.

Imputation 3: The log of d is computed and multiplied by the parameter γ, which is set to 3.

$R = \gamma \log d$ (3)

Imputation 4: For each missing value within clusters, imputed values are also evaluated by:

$I = \gamma \log p$ (4)

Imputation 5: The sum of I and R is computed, and the resulting average is used to fill in the missing values within each group.

$N = \frac{R + I}{2} = \frac{\gamma \log d + \gamma \log p}{2}$ (5)

Imputation 6: Finally, our parameter λ is added to the resulting value of N, and the missing values within each group are imputed using

$O = \left(\frac{\gamma \log d + \gamma \log p}{2}\right) + \lambda$ (6)
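The six candidates of Equations (1)-(6) can be computed for one cluster with the following Python sketch. It uses γ = 3 and λ = 0.4 as stated above, takes d to be the mode of the observed values divided by their count (an assumption based on line 7 of Algorithm 2), and returns None for a log-based candidate whose argument is not positive. The function name `candidate_imputations` is hypothetical.

```python
import math
from statistics import mean, mode

GAMMA = 3.0   # γ, set to 3 in the paper
LAMBDA = 0.4  # λ, set to 0.4 in the paper


def candidate_imputations(observed):
    """Return the six candidate values of Equations (1)-(6) for one cluster.

    Assumption: d is the mode of the observed values divided by their count,
    following line 7 of Algorithm 2. A log-based candidate is None when its
    argument is not positive.
    """
    p = mean(observed)                               # Eq. (1): cluster mean
    d = mode(observed) / len(observed)               # Eq. (2), assumed form
    r = GAMMA * math.log(d) if d > 0 else None       # Eq. (3): R = γ log d
    i = GAMMA * math.log(p) if p > 0 else None       # Eq. (4): I = γ log p
    n_avg = (r + i) / 2 if r is not None and i is not None else None  # Eq. (5)
    o = n_avg + LAMBDA if n_avg is not None else None                  # Eq. (6)
    return [p, d, r, i, n_avg, o]


print(candidate_imputations([2.0, 2.0, 4.0, 8.0]))
```

For [2.0, 2.0, 4.0, 8.0], p = 4.0 and d = 2.0 / 4 = 0.5, so R = 3 log 0.5, I = 3 log 4 and Equation (5) gives 1.5 log 2 (about 1.04).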

After computing all the values for the missing instances using Equations (1)-(6), the error between each previous imputation ($r_{pre}$) and the six imputations ($\alpha_{curr}$) is estimated using the following equation:

$err = r_{pre,i} - \alpha_{curr,i}$ (7)

For each missing data point, the value of $r_{pre,i}$ is compared with all the values estimated from Equations (1)-(6). The difference between each $\alpha_{curr,i}$ and the previous imputation $r_{pre,i}$ is computed, and the value with the lowest error, which shows the highest similarity with $r_{pre,i}$, is used to impute that missing data point within a cluster.
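The selection rule can be sketched in Python as follows. It treats the error of Equation (7) as an absolute difference (an assumption, so that the lowest error means the closest candidate), takes $r_{pre,i}$ as a given input, and skips candidates that could not be computed. The function `best_fit` is a hypothetical name.

```python
def best_fit(r_pre, candidates):
    """Return (index, value) of the candidate closest to the previous imputation.

    Assumption: the error of Eq. (7) is used as an absolute difference, so the
    'lowest error' is the smallest |r_pre - candidate|. Entries that are None
    (candidates that could not be computed) are skipped.
    """
    scored = [
        (abs(r_pre - c), idx, c) for idx, c in enumerate(candidates) if c is not None
    ]
    _, idx, value = min(scored)
    return idx, value


# Six hypothetical candidate values for one missing data point
print(best_fit(1.5, [4.0, 0.5, None, 4.16, 1.04, 1.44]))  # (5, 1.44)
```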

Considering further improvements and applications in dynamic environments, our proposed method has the potential to handle dynamic changes in a dataset, as it selects the most appropriate value for each missing data point.

Algorithm 2: Best Fit Missing Value Imputation.

Input: Dataset with missing values, $D \in X^{n \times m}$. Parameters α.
Output: Dataset with imputed values P.
1: while 1 do
2:   [Clusters, k] ← ClusteringAlg(D); // partition the incomplete dataset using Algorithm 1
3:   for i = 1 to k do
4:     [InData, p] ← GetFromData(Clusters.k); // get subsets with incomplete records p
5:     for j = 1 to p do
6:       r1 = mean(InData[j], Clusters.k)
7:       r2 = (mode/length)(InData[j],
8:       r3 = (log(r2) * 3)(InData[j], Clusters.k)
9:       r4 = (log(r1) * 3)(InData[j], Clusters.k)
10:      r5 = (r3 + r4) / 2
11:      r6 = r5 + 0.4
12:    end for
13:    Get set of imputation results r_curr of Clusters.k
14:    α_curr ← GetSet(r_curr); // get current set of imputation results
15:  end for
16:  Calculate err between previous and current imputations via (7)
17:  Let α = err(α_curr)
18:  for each α_curr do
19:    if r = min α then
20:      P ← OutputDataset(D, r_curr); // r_curr with lowest error is used for imputation
21:      Stop
22:    end if
23:  end for
24: end while
25: Return complete dataset P;

4 EXPERIMENTS AND ANALYSIS

4.1 Experimental Design

To assess the plausibility of our technique against five existing techniques, namely LSI, FIMUS, FCM, DMI and EMI, we used six UCI machine learning datasets with no missing values as ground truth. Missing values were then artificially imposed on the datasets in order to test the accuracy of the six imputation techniques. Since the original values of the datasets are known, we can easily evaluate the accuracy of the imputation techniques by observing how close the imputed values are to the original values (Zhao et al., 2018). Each UCI dataset was then regenerated into five datasets with different percentages of missing values: 3%, 6%, 9%, 12% and 15%.

Table 1: Description of Six UCI Datasets.

Dataset   Records  Attributes  Classes
Iris      150      4           3
Pima      768      8           2
Wine      178      13          3
Yeast     1484     9           10
Housing   506      14          Null
Adult     48842    14          Null

Table 2: $d_2$ and average execution time t (sec) of the six imputation techniques on the six UCI datasets (at 3, 6, 9, 12 and 15% missing data). Each cell gives $d_2$ with t in parentheses.

Imputation Method  Iris           Pima             Wine            Yeast            Housing         Adult
BFMVI              0.977 (0.031)  0.907 (0.145)    0.959 (0.043)   0.946 (0.258)    0.967 (0.105)   0.9657 (6.89)
LSI                0.983 (0.358)  0.914 (2.439)    0.952 (0.331)   0.948 (15.557)   0.983 (1.654)   0.971 (35.65)
FIMUS              0.966 (1.154)  0.90 (313.248)   0.938 (1.393)   0.854 (1412.75)  0.940 (7.257)   0.954 (1923.35)
FCM                0.964 (0.256)  0.882 (0.874)    0.788 (0.242)   0.929 (13.974)   0.916 (0.301)   0.751 (23.75)
DMI                0.954 (2.683)  0.860 (412.386)  0.864 (12.363)  0.936 (73.146)   0.912 (84.552)  0.881 (103.04)
EMI                0.957 (0.173)  0.848 (1.674)    0.868 (0.549)   0.911 (5.417)    0.905 (2.785)   0.713 (19.78)
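The generation of incomplete datasets from a complete ground truth can be sketched in Python. The text does not state the missingness mechanism, so this sketch assumes values are missing completely at random and removes round(ratio times the number of cells) entries per dataset; `impose_missing` is a hypothetical name. The masked positions are returned so that imputed values can be compared against the original ones.

```python
import random


def impose_missing(data, ratio, seed=0):
    """Return a copy of `data` with round(ratio * number_of_cells) entries set to None.

    Cells are chosen uniformly at random (missing completely at random); the
    chosen (row, column) positions are returned so imputed values can later be
    compared against the ground truth.
    """
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    rng = random.Random(seed)
    masked = rng.sample(cells, round(ratio * len(cells)))
    incomplete = [list(row) for row in data]
    for i, j in masked:
        incomplete[i][j] = None
    return incomplete, masked


ground_truth = [[float(4 * i + j) for j in range(4)] for i in range(25)]  # 100 cells
for ratio in (0.03, 0.06, 0.09, 0.12, 0.15):
    incomplete, masked = impose_missing(ground_truth, ratio)
    print(ratio, len(masked))
```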

The six imputation methods are then used to fill in the different percentages of missing values in each dataset. For the purpose of our simulation, we used Principal Component Analysis (PCA), a dimensionality reduction technique, to reduce the number of interrelated variables while retaining as much as possible of the variation present in each dataset. This generated new sets of uncorrelated variables, called principal components, which were used to simulate the different percentages of missing data. The criteria used to quantify the performance of the imputation methods are the root mean squared error (RMSE) and the index of agreement $d_2$. We further measured the execution time of each technique to evaluate its performance in resource-constrained scenarios. In Equations (8) and (9), N represents the number of missing values, $P_i$ and $O_i$ are respectively the imputed and actual values of the $i$th missing value, and $\bar{O}$ represents the average of the actual values. The RMSE value can range from 0 to ∞, with a lower value indicating better imputation performance. The value of $d_2$ can range from 0 to 1, with a higher value indicating better resemblance (Zhao et al., 2018).

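$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (P_i - O_i)^2}$ (8)

$d_2 = 1 - \left[\frac{\sum_{i=1}^{N} (P_i - O_i)^2}{\sum_{i=1}^{N} \left(|P_i - \bar{O}| + |O_i - \bar{O}|\right)^2}\right]$ (9)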
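Equations (8) and (9) translate directly into Python. The sketch below is illustrative; the function names are hypothetical, and the zero-denominator guard (which only occurs when every imputed and actual value equals $\bar{O}$) is an addition not discussed in the text.

```python
import math


def rmse(pred, actual):
    """Eq. (8): root mean squared error over the N imputed values."""
    n = len(actual)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, actual)) / n)


def index_of_agreement(pred, actual):
    """Eq. (9): d2 = 1 - sum (P - O)^2 / sum (|P - Obar| + |O - Obar|)^2."""
    o_bar = sum(actual) / len(actual)
    num = sum((p - o) ** 2 for p, o in zip(pred, actual))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for p, o in zip(pred, actual))
    if den == 0:
        return 1.0  # every P and O equals the mean, so agreement is perfect
    return 1.0 - num / den


actual = [1.0, 2.0, 3.0]
imputed = [1.0, 2.0, 4.0]
print(rmse(imputed, actual), index_of_agreement(imputed, actual))
```

For actual values [1, 2, 3] and imputed values [1, 2, 4], the RMSE is $\sqrt{1/3} \approx 0.577$ and $d_2 = 1 - 1/13 \approx 0.923$.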

4.2 Results and Analysis

Figures 2 to 7 present the accuracy of our BFMVI technique against the LSI, FIMUS, FCM, DMI and EMI techniques on the Iris, Wine, Boston Housing, Yeast, Pima and Adult datasets, in terms of RMSE for five missing data ratios.

Table 2 further shows the index of agreement ($d_2$) and execution time (in seconds) of the six imputation techniques on the six UCI datasets.

From the results, it can be observed that the proposed method shows a low error rate and good imputation accuracy, but does not completely outperform the LSI technique. However, it has the shortest execution time of all six methods on every dataset in Table 2. Although the popular EMI technique considers all the instances in a dataset before performing imputation, it has the lowest $d_2$ of the six methods on the Pima, Housing and Adult datasets.


Figure 2: RMSE of imputation methods on Housing Data. (Plot of RMSE against missing data ratio (%) on the Boston Housing dataset for BFMVI, LSI, FIMUS, FCM, DMI and EMI.)

Figure 3: RMSE of imputation methods on Iris Data. (Plot of RMSE against missing data ratio (%) on the Iris dataset for BFMVI, LSI, FIMUS, FCM, DMI and EMI.)

In contrast, the LSI technique shows the best overall imputation accuracy, but it is slower than FCM and our method on all six datasets, and slower than EMI on four of them.

FIMUS is another hybrid method that considers every record in a dataset before imputation. According to Table 2, its accuracy is better than that of FCM, DMI and EMI on five of the six datasets (all except Yeast), and it sometimes outperforms our method when a higher percentage of missingness is observed in a dataset. However, its execution time is poor compared to LSI, FCM, EMI and our method. In our observation, its execution time increased markedly when more records were present (e.g. in the Pima and Yeast datasets). The DMI and FCM techniques partition the datasets into small groups of similar records, which can have a positive effect on imputation when closely related records are used to estimate missing values. However, DMI and FCM rely entirely on the accuracy of classification or clustering, and therefore perform poorly when the classification or clustering is inaccurate.

Figure 4: RMSE of imputation methods on Wine Data. (Plot of RMSE against missing data ratio (%) on the Wine dataset for BFMVI, LSI, FIMUS, FCM, DMI and EMI.)

Figure 5: RMSE of imputation methods on Yeast Data. (Plot of RMSE against missing data ratio (%) on the Yeast dataset for BFMVI, LSI, FIMUS, FCM, DMI and EMI.)

Overall, the accuracy of our proposed method was close to that of the LSI method on five of the six datasets, but showed a clear gap on the Boston Housing dataset. This was largely due to the accuracy of the fast clustering algorithm in the LSI method, which uses a two-layer deep learning algorithm. We will attempt to address this limitation by improving the similarity between the records used to estimate missing values in our proposed method.


Figure 6: RMSE of imputation methods on Pima Data. (Plot of RMSE against missing data ratio (%) on the Pima dataset for BFMVI, LSI, FIMUS, FCM, DMI and EMI.)

Figure 7: RMSE of imputation methods on Adult Data. (Plot of RMSE against missing data ratio (%) on the Adult dataset for BFMVI, LSI, FIMUS, FCM, DMI and EMI.)

5 CONCLUSION AND FUTURE WORKS

Inferences drawn from various data mining tasks (such as prediction, clustering and classification) can be significantly affected by the presence of missing data. Therefore, to ensure the validity of information drawn from these tasks, imputing missing values using adequate techniques is paramount. In this paper, we presented a BFMVI technique for handling incomplete static databases. The proposed method first fills in missing data points with distinctive values and partitions the pre-imputed dataset using arbitrary values. Then, based on the similarity between values in each cluster, missing values are estimated using the BFMVI algorithm. The experiments show that our proposed method is less complex than the other methods identified and achieves competitive imputation accuracy, which makes it a good fit for resource-constrained scenarios.

Considering the characteristics of IoT data and its contribution to the big data era, more work still needs to be done with regard to developing robust and less complex algorithms for handling missing values in streams of continuously generated data. Our future research will focus on improving the proposed imputation method and adopting it in more dynamic scenarios.

REFERENCES

Agbo, B., Qin, Y., and Hill, R. (2019). Research directions on big IoT data processing using distributed ledger technology: A position paper. In IoTBDS.

Angelov, B. (2017). Working with missing data in machine learning.

Deb, R. and Liew, A. W.-C. (2016). Missing value imputation for the analysis of incomplete traffic accident data. Information Sciences, 339:274–289.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22.

Farhangfar, A., Kurgan, L., and Dy, J. (2008). Impact of imputation of missing values on classification error for discrete data. Pattern Recognition, 41(12):3692–3705.

Inman, D., Elmore, R., and Bush, B. (2015). A case study to examine the imputation of missing data to improve clustering analysis of building electrical demand. Building Services Engineering Research and Technology, 36(5):628–637.

Lata, K. and Chakraverty, S. (2014). Handling data incompleteness using rough sets on multiple decision systems. In 2014 International Conference on Data Mining and Intelligent Computing (ICDMIC), pages 1–6. IEEE.

Rahman, M. G. and Islam, M. Z. (2013). Missing value imputation using decision trees and decision forests by splitting and merging records: Two novel techniques. Knowledge-Based Systems, 53:51–65.

Rahman, M. G. and Islam, M. Z. (2014). FIMUS: A framework for imputing missing values using co-appearance, correlation and similarity analysis. Knowledge-Based Systems, 56:311–327.

Read, S. H. (2015). Applying missing data methods to routine data using the example of a population-based register of patients with diabetes.

Schneider, T. (2001). Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. Journal of Climate, 14(5):853–871.


Silva-Ramírez, E.-L., Pino-Mejías, R., and López-Coello, M. (2015). Single imputation with multilayer perceptron and multiple imputation combining multilayer perceptron and k-nearest neighbours for monotone patterns. Applied Soft Computing, 29:65–74.

Singh, M., Singh, A., and Kim, S. (2018). Blockchain: A game changer for securing IoT data. In Internet of Things (WF-IoT), 2018 IEEE 4th World Forum on, pages 51–55. IEEE.

Tran, C. T., Zhang, M., Andreae, P., Xue, B., and Bui, L. T. (2018). Improving performance of classification on incomplete data using feature selection and clustering. Knowledge-Based Systems, 154:1–16.

Zhang, L., Pan, H., Wang, B., Zhang, L., and Fu, Z. (2018). Interval fuzzy c-means approach for incomplete data clustering based on neural networks. Journal of Internet Technology, 19(4):1089–1098.

Zhang, Q., Yang, L. T., Chen, Z., and Xia, F. (2015). A high-order possibilistic c-means algorithm for clustering incomplete multimedia data. IEEE Systems Journal, 11(4):2160–2169.

Zhao, L., Chen, Z., Yang, Z., Hu, Y., and Obaidat, M. S. (2018). Local similarity imputation based on fast clustering for incomplete data in cyber-physical systems. IEEE Systems Journal, 12(2):1610–1620.
