Missing Data Imputation in Daily Wearable Data for Improved

Classiﬁcation Performance

Mikel Catalina

1 a

, Ander Cejudo

1,2 b

and Cristina Martín

1,2 c

Fundación Vicomtech, Basque Research and Technology Alliance (BRTA),

Mikeletegi 57, 20009 Donostia, San Sebastián, Spain

Faculty of Engineering, University of Deusto, Avda. Universidades, 24, Bilbao 48007, Spain

Keywords:

Wearables, Artiﬁcial Intelligence, Data Imputation and Classiﬁcation.

Abstract:

In the realm of wearable technology, the continuous monitoring of health parameters through smartwatches

provides a wealth of daily data for research and analysis. However, this data often encounters missing values,

presenting a challenge for interpretation and utilization. Remarkably, there exists a notable gap in the literature

concerning the imputation of missing daily data from smartwatches. To address this gap, our study systemati-

cally explores a diverse set of imputation methods with Fitbit wearable data, encompassing various scenarios

and missing rates. Our primary objectives are: (i) measure the inﬂuence of missing values rate and distribution

on the proposed imputation methods; (ii) assess the role of data imputation in enhancing the performance of

machine learning algorithms. Our results underscore the pivotal role of missing data patterns in imputation

method selection. Furthermore, we demonstrate that more advanced data imputation approaches positively

contributes to the efﬁcacy of classiﬁcation algorithms, improving 4,4% and 0,4% in terms of F-measure for

the proposed classiﬁcation tasks. This study not only addresses the challenges associated with missing data

in wearable daily monitoring but it also provides practical insights for the optimization of machine learning

applications in health monitoring.

1 INTRODUCTION

The wearable market including wrist wearable de-

vices, has being growing in the last decade reaching

an industry size of 137.89 billion USD in 2022 and

it is expected to continue increasing, reaching 1.300

billion USD by 2035 (Nester, 2023). This technology

allows the continuous and remote monitoring of the

users ’s health parameters.

The utility of wearable devices have turn them into

a suitable tool for research in the healthcare domain,

enabling the development of data analytics, data vi-

sualization and artiﬁcial intelligence techniques for

the prevention and analysis of upcoming health events

(Iqbal et al., 2021; Lu et al., 2016). Data gathered

by wearable devices is categorized into two primary

classes in this work: sparse health parameters, which

encompass raw time series collected from wearable

data with frequencies lower than one day, and daily

health statistics, which comprise daily summaries of

https://orcid.org/0009-0000-2076-8015

https://orcid.org/0000-0001-7944-2706

https://orcid.org/0000-0002-3919-2738

sparse health parameters.

Wearable devices include some limitations such

as a ﬁnite battery duration, error in the readings of

several parameters for not having the wearable well

tighten, connectivity problems, deterioration of the

hardware components or even the user can forget to

wear it (Baek and Shin, 2017). As a result, the data

gathered from these devices can present signiﬁcant

time spans without readings, even though removing

the registers would be the easiest procedure, this may

lead to unfavorable outcomes: less data to analyze,

inconsistencies and depending the class of missing

values (more precisely: missing not at random) re-

moving them will cause a biased result (Weber et al.,

2017).

Consequently, several works have attempted to de-

velop and evaluate automatic tools for missing data

imputation, learning the behavioral patterns of the dif-

ferent variables in the data and predicting the missing

values (Buczak et al., 2023). However, there are few

works that have evaluated the effectiveness of differ-

ent data imputation techniques on smart watch data,

and most of them evaluating large time series such as

Catalina, M., Cejudo, A. and Martín, C.

Missing Data Imputation in Daily Wearable Data for Improved Classiﬁcation Performance.

DOI: 10.5220/0012625500003699

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 10th International Conference on Information and Communication Technologies for Ageing Well and e-Health (ICT4AWE 2024), pages 59-72

ISBN: 978-989-758-700-9; ISSN: 2184-4984

heart rate or breathing, leaving out a wide range of

daily health statistics.

To address this issue, the main objective of this pa-

per is to evaluate the effectiveness of a wide range of

imputation methods on wearable data in order to im-

prove the classiﬁcation performance of the proposed

models. In our work we address the following re-

search questions:

• RQ1. Do existing imputation algorithms obtain

better results compared to mean and median meth-

ods (baselines) in wearable data?

• RQ2. Does considering data from previous days

improve the quality of data imputation?

• RQ3. Is the performance of the proposed data im-

putation algorithm better than baselines when the

missing value rate increases?

• RQ4. Does including imputed data in the training

set help to improve the performance of classiﬁca-

tion models?

The structure of the paper is organized as follows.

In section 2, we delve into the previous research con-

ducted by other authors on data imputation techniques

for data obtained from wearables. Section 3 provides

an explanation of the dataset utilized and the prepro-

cessing steps taken. Section 4 explains the techniques

used for data imputation and the different scenarios

where the proposed data imputation algorithms will

be tested. Section 5 shows the results obtained with

the proposed methodology, including a discussion,

whereas section 6 concludes the study by summariz-

ing the key ﬁndings.

2 PREVIOUS WORK

There are several works that have addressed the im-

putation of missing values in wearable data. In Lin

et al. (2020), the authors present a deep learning ap-

proach utilizing LSTM layers (Yu et al., 2019) to im-

pute missing heart rate values in time series data ac-

quired from Fitbit and Garmin wearables. Besides,

personal data for each subject is considered separately

for imputation. The authors feed the model with a

set of time series and make use of the adversarial

training (Zhao et al., 2022) to obtain results that im-

prove the performance of baselines, such as linear in-

terpolation (Noor et al., 2015) and moving average.

The proposed method is then tested in two smart-

watch datasets: in the case of the Garmin dataset

(Mattingly et al., 2019) increasing the RMSE (Will-

mott and Matsuura, 2005) score from 4,1% to 58,5%,

whereas in the Fitbit dataset (Faust et al., 2017) the

improvement went from 6,9% to 54,3% both of them

over the baselines and weighing speciﬁc periods of

time. In (Feng and Narayanan, 2019) the OMsig-

nal, a wearable that attaches to a shirt, is employed

to obtain various physiological metrics. The authors

decide to impute the missing values in the heart rate

time series, breath and steps time series using a re-

current neural network (Medsker and Jain, 2001) that

considers time dependency of the input to ﬁll the

gaps. The authors develop a model that enhances

the attained scores in comparison to mean imputa-

tion and KNN imputations—chosen as baseline meth-

ods—particularly as missing rates escalate. The au-

thors of Wu et al. (2020), propose a convolutional au-

toencoder (Masci et al., 2011) that has the ability to

evaluate adjacent values to ﬁll in the missing values,

moreover the authors make use of transfer learning

(Bozinovski, 2020) to incorporate the knowledge of a

model trained on wearable data with different users

of Garmin (Mattingly et al., 2019) or Fitbit (Faust

et al., 2017) devices to address the lack of data in

some subjects. The authors conclude that their model

is able to effectively impute data on heart rate time se-

ries. The performance of the model substantially im-

proves the results over the baseline methods obtaining

a 4,67% reduction of MAPE (De Myttenaere et al.,

2016) throughout two different datasets and distinct

test scenarios. Other studies as in (Huo et al., 2022),

consider accelerometer, gyroscope and magnetome-

ter data collected from smartphones with a frequency

of 20 seconds. Following the example of Wu et al.

(2020), the authors opted for the implementation of

an autoencoder (Bank et al., 2023), which comprises

LSTM layers to capture and retain information from

previous inputs. This design allows the model to con-

sider the temporal dependency of the input data. The

achieved results surpassed those of baseline methods,

including mean imputation, KNN, and random for-

est models. Notably, there was a substantial mean in-

crease in accuracy of 6,25% when the missing rate

values exceeded 10%. Additionally, increasing in the

rate of missing values does not have a signiﬁcant im-

pact in the performance of the model as in this work

for certain methods.

There are numerous works that have studied the

utility of daily health statistics reported by wearable

devices. In Sathyanarayana et al. (2016), the Acti-

Graph GT3X+ (Aadland and Ylvisåker, 2015) wear-

able is used to monitor the sleep of different sub-

jects. The objective of this study is to predict the

sleep efﬁciency of the user, differentiating between

good (SE > 85%) and bad sleep (SE < 85%). For

that, recorded physical activity of the same day ob-

tained from accelerometers is considered. They pro-

ICT4AWE 2024 - 10th International Conference on Information and Communication Technologies for Ageing Well and e-Health

posed various deep learning methods reaching a 89%

accuracy for their best one, a LSTM based neural net-

work. The research conducted in Conroy et al. (2022)

with Garmin smartwatches, Oura rings and Empat-

ica E4 wristbands aimed to detect Covid-19 with the

recorded physiological metrics. First of all, they stan-

dardize the metrics from different wearables, getting

a time series with a sampling frequency of 10 min-

utes (this allows to merge breath and heart param-

eter from different devices). Once the data is stan-

dardized, cleaning is done according to the adequate-

ness of sleep data. Using machine learning models an

AUC of 0,82 and an F-score of 0,44 is achieved for

the prediction of Covid-19. In the work of Kanokoda

et al. (2019), a glove is made in order to collect data

from strain sensors located in three ﬁngers to predict

hand gestures via a TDNN (Längkvist et al., 2014)

deep learning model. Attaining a model capable of

real-time result prediction, achieving a mean accuracy

of 84,6% when forecasting the next 10 steps, decreas-

ing to 61.1% when considering 30 steps ahead. In

the study conducted in Zhu et al. (2020), daily data

from various smartwatches, including Fitbit and Ap-

ple Watch, was collected from a substantial cohort of

30.529 participants over a two-month period. The

primary goal of this study was to monitor the out-

break of Covid-19 and predict potential infections.

Lastly, both Ghandeharioun et al. (2017) aim to pre-

dict depression based on wearable data. According

to Ghandeharioun et al. (2017), the authors employ

E4 wristbands and the smartphone usage data that

then is aggregated in both, intervals of 6 hours and

days. They introduce the data on an ensemble ma-

chine learning method to predict the Hamilton De-

pression Rating Scale (Williams, 2001). Achieving

an RMSE of 4,5 on the test exercise. The results sug-

gest that the information provided by health metrics

gathered from wearable devices can monitorize wear-

ers with the recorded data.

This study aims at assessing the impact of a di-

verse range of imputation methods on daily health

statistics derived from wearable devices. Notably, ex-

isting research predominantly focuses on the inﬂu-

ence of various data imputation algorithms on time

series data obtained from wearables. However, there

is a signiﬁcant gap in understanding how these algo-

rithms perform when applied to daily health statis-

tics obtained from wearable devices. Moreover, while

daily health statistics from wearables have been uti-

lized in various artiﬁcial intelligence tasks, the inves-

tigation into the effects of imputation algorithms on

commonplace tasks, such as classiﬁcation, remains

largely unexplored.

3 MATERIALS

The popularity of wearable devices has increased

to such an extent that using their data is becoming

more frequent (Lu et al., 2016). This has led to

several datasets employed by previous works: Faust

et al. (2017), Tesserae project (Mattingly et al., 2019),

WISDM (Weiss, 2019), Bent et al. (2021), harAGE

presented in (Mallol-Ragolta et al., 2021), Vaizman

et al. (2017) and PMData (Thambawita et al., 2020).

As for private datasets, in Faust et al. (2017), a

smartphone and Fitbit dataset is introduced, where the

Fitbit smartwatch is given to approximately 700 stu-

dents for two different periods of time. The Tesserae

project (Mattingly et al., 2019) collects data from 757

workers over a year using Garmin smartwatches col-

lecting capturing heart rate, sleep and calories data.

As for public datasets, in Weiss (2019), the WISDM

is introduced, consisting of accelerometer and gyro-

scope data obtained from the LG G Watch and smart-

phones of 51 people while performing certain ac-

tions. The dataset discussed in Bent et al. (2021), fo-

cuses on data collected using the Empatica 4 wrist-

band, which gathers information such as heart rate

(HR), blood volume pulse (BVP), and interbeat in-

terval (IBI). Additionally, data from glucose sensors

is included, involving a total of 16 subjects observed

over a period ranging from eight to ten days. The

harAGE dataset (Mallol-Ragolta et al., 2021) records

data of Garmin smartwatches of 30 people perform-

ing various physical activities. Lastly, a public dataset

is introduced in Vaizman et al. (2017) where the au-

thors collect data from both, pebble smartwatches and

smartphones from a group of 60 people reaching a to-

tal of 300k minutes of data focusing on accelerometer

and gyroscope sensors, although location and audio

data is collected as well.

Our work is focused on the PMData (Thambawita

et al., 2020) database. The dataset consists of data

collected from 16 individuals throughout a period of

ﬁve months (November 2019 to end of march 2020)

and using the Fitbit Versa 2 smartwatch. This smart-

watch is capable of detecting different states of physi-

cal activity such as time exercising or being sedentary

as well as heart rate, kcal burnt, steps and sleep data,

each being collected in different time frequencies. For

instance, a heart rate value is stored each ﬁve seconds

while the time stayed active is accumulated for each

day. The availability of the data given by the original

authors can be seen in table 1. Aside from the smart-

watch data, additional information from each subject

is gathered as well: age, weight, coronary prone be-

haviour (Johnston, 1993), sex etc. However, due to

the scope of this work subject data will be only con-

Missing Data Imputation in Daily Wearable Data for Improved Classiﬁcation Performance

Table 1: Categorization of the variables recorded and computed from the Fitbit smartwatch into two separate groups: daily

health statistics and sparse health parameters. For each variable the sample frequency of the wearable device is provided.

Group Category Data Frequency

Sleep data

Waso sleep, sleep latency, rem latency,

total sleep time, time rem, time light,

time deep, time awake and SRI

Daily

Daily health statistics Activity Kcal and steps Daily

Heart rate

Maximum, minimum, standard deviation

and mean values

Daily

Heart rate Beats per minute Per 5 seconds

Sparse health parameter Exercise/activity Distance, steps and kcal Per minute

Sleep data Sleep phases and sleep score When happens

sidered as label in the classiﬁcation problem in order

to characterize users given solely the data collected

from smartwatch.

Given the circumstances described previously in

section 2, the PMData dataset is the only one, to the

best of our knowledge, that collects a wide range of

physiological parameters: heart, sleep data and sports

data among others, reaching a total of 16 daily health

statistics, and does not focus on HAR (Human Activ-

ity Recognition). This number of variables is promi-

nent for this study and the volume of data is public

and large enough to test a wide range of imputation

methods.

As stated, values collected from wearables are

stored as a time series and converted to tabular data,

where each row represents the temporal component

and each column the different daily health statistics.

Then, the data is aggregated by day, reducing signif-

icantly the number of rows. In addition, the data ob-

tained from the wearable lets us compute other vari-

ables such as SRI (i.e. Sleep Regularity Index, aver-

aged over 7 days) (see eq. (1)) where N is the num-

ber of days and M is the number of epochs per day.

The function δ(s

i, j

, s

i+1, j

) is equal to one, when the

sleep–wake state is the same 24 hours apart. Other

example is WASO (Wakefulness After Sleep Onset)

sleep (see eq. (2)), which is computed taking into ac-

count TST (Total Sleep Time) and the sleep period.

Note that this variables are related to sleep but sleep

efﬁciency depicts a more complete assessment of the

sleep quality.

SRI = 100 −

200

M(N − 1))

∑

j=1

∑

i=1

δ(s

i, j

, s

i+1, j

) (1)

WASO = Sleep period − T ST (2)

Once the data is sorted and aggregated by day,

missing values are found in the recorded data. In ta-

ble 3, a small sample of the dataset is shown for a

given user. Every row of the table corresponds to a

day of a speciﬁc user, whereas each of the columns

corresponds to each of the daily health statistics that

have been either collected or calculated from the Fit-

bit smartwatch.

After the preprocessing, the data contains 2.397

days from which 603 have at least one missing value.

However, the subject referred as "p12" by Tham-

bawita et al. (2020) has only sleep registers for 3

days from the 152 days that lasted the study. For this

reason, the data of the subject is not representative

enough and the subject is removed from the study. A

brief analysis of the data is shown in table 2 for the

train and test subsets in both classiﬁcation tasks: sleep

efﬁciency and personality. The difference between

these two tasks is how the data has been divided. For

sleep efﬁciency, data has been randomly selected for

training, whereas the remaining data without miss-

ing values has been used for testing the imputation

methods. Leaving test data without missing values let

us evaluate classiﬁcation algorithms with and without

imputation methods in the training subset, under the

same conditions. The same happens with personality,

but this label does not vary per user, thus, data has

been split by user, leaving 10 users for training and

the rest for testing.

4 METHODOLOGY

This section presents the methodological approach

employed on the study. In subsection 4.1 the algo-

rithms employed for data imputation are presented,

whereas 4.2 presents the two evaluation approaches.

Finally, subsection 4.3 explains the classiﬁcation

tasks after data have been imputed.

4.1 Data Imputation Methods

This section presents the selected imputation meth-

ods, grouping them in baseline, column-based and

ICT4AWE 2024 - 10th International Conference on Information and Communication Technologies for Ageing Well and e-Health

Table 2: Description of the dataset gathered from Fitbit wearable devices and employed in this study. The dataset has been

split into train and test subsets in order to evaluate different classiﬁcation algorithms in each of the proposed classiﬁcation

tasks: personality and sleep efﬁciency.

Personality Sleep efﬁciency

Train Test Train Test

Total % Total % Total % Total %

Users 10 66% 5 34% 15 100% 15 100%

Instances 1.600 71,2% 645 28.7% 1.648 73,4% 597 26,5%

Missing values at random 1.575 8,7% 0 0% 1.600 8% 0 0%

Missing values at random per user 157,5 9,4% 0 0% 106,6 7,7% 0 0%

Missing values at lines 159 10,6% 0 0% 191 11,5% 0 0%

Missing values at lines per user 15,9 10,5% 0 0% 12,7 10,6% 0 0%

Consecutive missing values 2.878 72,6% 0 0% 2.941 74,7% 0 0%

Consecutive missing values per user 14,4 8,4% 0 0% 9,33 16,6% 0 0%

Personality label

A 989 44% 361 16,1% – – – –

B 611 27,2% 284 12,6% – – – –

Sleep efﬁciency label

< 85 (Normal / Bad) – – – – 171 7,6% 65 2,8%

≤ 90 (good) – – – – 825 36,7% 286 12,7%

≤ 95 (Very good) – – – – 602 26,8% 246 10,9%

Table 3: Sample of the dataset for a given user and for different daily health statistics, including TST (i.e. Total Sleep Time),

mean_hr (i.e. mean heart rate) and std_hr (i.e. standard deviation of heart rate).

user_id waso_sleep sleep_efﬁciency TST kcal steps ... mean_hr std_hr

P01 20 94,88 391 3.912,61 16.450 ... 64,78 14,57

P01 10 97,63 422,5 4.014,13 17.843 ... 64,82 17,05

P01 6 98,31 356 3.614,06 12.519 ... 68,15 22,08

P01 16 96,21 423 3.386,19 10.392 ... 63,91 13,30

P01 24 93,35 361 3.312,92 11.185 ... 62,41 12,58

row-based algorithms. The proposed approaches have

been widely used by previous works, excluding those

methods that require large amounts of data like re-

current neural networks (Medsker and Jain, 2001),

LSTM based networks (Yu et al., 2019) and trans-

formers (Vaswani et al., 2017), which are not feasible

as daily health statistics are recorded on a daily basis.

The imputation methods used are the following:

• Baselines. Baseline methods are considered the

easiest approach for data imputation (Engels and

Diehr, 2003). More complex approaches are ex-

pected to signiﬁcantly increase the performance

compared to these methods. In this way, other ap-

proaches can be compared and considered sufﬁ-

cient in order to be applied as input for a classiﬁ-

cation task.

– Mean. Sets the mean for each feature respec-

tively on all the missing values of the same fea-

ture.

– Median. In this case, the median of each pa-

rameter is set respectively in the missing val-

ues.

• Column-Based. Includes methods that use as in-

put the historical data recorded for each health pa-

rameter. For that, a model is trained for each daily

health statistics and the missing values are pre-

dicted using as input of the algorithm the previous

w values of previous days. Thus, w is a key hyper-

parameter that is tuned in order to maximize the

performance of the column-based methods. The

optimal w value will depend on the rationale be-

hind each of the proposed methods.

– Moving Average. Each missing value found

on the recorded daily health statistics will be

replaced by the mean of the last known values

of each parameter (Hyndman, 2011).

– LOCF (Last Observation Carried Forward).

This method takes the last observation before a

missing value and drags it to the missing value.

Note that if there are several missing values to-

gether, the same value will be imputed for all of

them (Twumasi-Ankrah et al., 2019).

Missing Data Imputation in Daily Wearable Data for Improved Classiﬁcation Performance

– Linear Interpolation. The linear interpolation

ﬁts a line between the last known value and the

next known value. With this, missing values

distributed together are imputed depending on

the amount of missing values. If only one is

missing the midpoint is used. In the case of

3 missing values the quartiles are used (Noor

et al., 2015).

– SVM. A support vector machine regressor al-

gorithm that predicts the missing value depend-

ing on the previous values of the same feature

(Boser et al., 1992).

– RF. A random forest regressor algorithm that

imputes the missing value taking into account

the last values of the same feature (Breiman,

2001).

– ARIMA. Auto Regressive (AR) Integrated (I)

Moving Average (MA) models, are statistic

models used for time series were given a set

of values the next one can be predicted. This

model consists of 3 parameters: p, d and q each

one refers to each of the acronyms respectively

(Box, 2013).

– KNN. This is a variation of KNN, where the

values taken to estimate the neighbours are the

previous values of the feature. Besides, this

method evaluates the amount of missing val-

ues placed together and if there are more than

one, the ﬁrst missing value will be imputed and

then, the last one evaluating the next values to

the missing value. Lastly, a linear gradient is

applied to the values this way the method has

the ability to weight the feature values with the

time.

• Row-Based. Methods that use as input other daily

health statistics recorded in the same day. Thus, in

this case, data imputation consists in a regression

task and every time a value is imputed, a model is

trained considering the daily health statistics with

known value in the day corresponding to the miss-

ing value. For example, if for a given day the total

sleep time must be imputed and the other daily

health statistics with known value are burnt calo-

ries and steps, a model is trained considering just

those two features to predict total sleep time, in-

cluding the data available of all the users.

– RF. A random forest algorithm that imputes the

missing values depending on the values of other

features gathered the same day, as for KNN, if

all the day is missing the mean of each daily

health statistics is imputed (Breiman, 2001).

– KNN. K-nearest neighbour takes the values

of other known variables on the same day

and computes the distance (uniform weights so

all points in each neighborhood are weighted

equally) between other days. Once the dis-

tances are computed, the K-nearest neighbours

are considered to ﬁll the value. This method

is obtained using the KNN imputer from (Pe-

dregosa et al., 2011).

Some of the selected methods are also considered

as baselines for many studies that tackle wearable

time series imputation. For example, mean imputa-

tion is used in Feng and Narayanan (2019); Huo et al.

(2022), other studies as Lin et al. (2020) consider Lin-

ear interpolation, moving average and LOCF. KNN is

widely used as baseline method as can be seen in Feng

and Narayanan (2019); Huo et al. (2022); Lin et al.

(2020). Lastly, Huo et al. (2022) considers random

forest approaches as well.

Before data imputation is carried out with the

aforementioned algorithms, a "MinMaxscaler" (Pe-

dregosa et al., 2011) is applied to standardize data

as the daily health statistics are in different magni-

tudes. This method turns the maximum value of each

feature to one and the minimum value to zero, being

the rest of the values converted proportionally. This

is done in order to compare the selected imputation

methods, making all metrics weigh the daily health

statistics equally.

4.2 Missing Data Scenarios

The distribution of the missing values may impact the

performance of the proposed approaches for data im-

putation. For this reason, the proposed methodology

aims at evaluating the methods for data imputation in

different scenarios and with an increasing number of

missing values in order to address the ﬁrst and sec-

ond research questions. In this work, two scenarios

have been considered: missing values at random and

missing values at lines. These scenarios are depicted

in Fig. 1, where ﬁrst, days without missing values

are selected and second, missing values are added for

evaluation of the proposed methods.

Evaluation for the imputation methods was car-

ried out creating missing values. For that, clean data

is obtained removing days with at least one missing

value and thus, only days with complete readings for

all the daily health statistics are considered. Then,

missing values are generated by removing known val-

ues randomly for each of the missing data scenarios.

Finally, each of the proposed methods is applied to

estimate the missing values and compare it with the

actual value, where the method that assigns the clos-

est value to the actual is considered to have the best

performance. In the case of missing values at lines,

ICT4AWE 2024 - 10th International Conference on Information and Communication Technologies for Ageing Well and e-Health

Figure 1: The proposed two missing value scenarios from clean data to evaluate the selected data imputation methods: missing

values at random and missing values at lines.

the range of applicable methods is reduced, as there

are no other daily health statistics in the same day to

be used for these methods. Thus, only baseline and

column-based methods are used in this scenario.

The rationale behind the proposed methods may

lead to signiﬁcantly different results, depending on

the missing values scenario and rate. For that reason,

both data imputation scenarios have been simulated

increasing the rate of missing values at random in

the ﬁrst scenario and the number of days without any

recording in the second scenario. Note that missing

values at lines (days) comes closer to the reality of the

wearables as explained in Chakrabarti et al. (2023).

Each scenario is executed ten times to enhance re-

liability. This repetition is essential because, in ev-

ery iteration, the missing values are placed in differ-

ent segments of the dataset. Running the various al-

gorithms only once may lead to biased or unrealistic

results, as they could adapt too easily to those speciﬁc

missing values in a single run.

4.3 Classiﬁcation

In order to answer the fourth research question pro-

posed in section 1, several machine learning algo-

rithms are proposed to be trained and then, to assess

the effect of the imputed data in different classiﬁca-

tion problems. In this case, all the data is considered

including days with missing values for both the train

subset.

Classiﬁcation models have been tested splitting

the data into train and test subsets and assuring that

there is not imputed data on the test subset. Thus,

days without any missing values have been consid-

ered for test and the rest for training the classiﬁca-

tion algorithms. Then, the missing values in the train

set are imputed with the best algorithm for the spe-

ciﬁc scenario and rate of missing values. Finally, sev-

eral algorithms are trained with the imputed train set

and evaluated in the test set, using the F-score and

the accuracy metrics. Beside this, various feature se-

lection methods are used to improve the representa-

tion of the input and increase overall performance of

the classiﬁcation algorithms: lasso penalty (Kim and

Kim, 2004), ﬁsher test (Gu et al., 2012) and decision

tree feature importance (Grabczewski and Jankowski,

2005). The overall data distribution for both the train

and the test subsets for the classiﬁcation exercise can

be seen in table 2.

The proposed classiﬁcation models are the fol-

lowing: SVM (Support Vector Machine)(Boser et al.,

1992), KNN (K-Nearest Neighbour)(Peterson, 2009),

Linear regression (Montgomery et al., 2021), decision

tree (Quinlan, 1986), Random forest (Breiman, 2001),

feed forward neural network (Sazli, 2006) and a gra-

dient boosting trees method (Friedman et al., 2000),

in each of them various models are created changing

different hyper parameters.

5 RESULTS

In this section, the results obtained after applying the

proposed algorithms in section 4 are discussed. In

subsection 5.1 the evaluation metrics for both the data

imputation and classiﬁcation tasks are explained. In

subsection 5.2 the results for the missing values at

random scenario are shown whereas in subsection 5.3,

the results of the missing values at lines scenario are

discussed. In subsection 5.4, classiﬁcation models are

trained to asses the effect of the imputed data and

lastly, in subsection 5.5 the results obtained are fur-

ther discussed.

5.1 Evaluation

Evaluation of the methods has been carried out com-

puting various metrics. The metrics considered for

Missing Data Imputation in Daily Wearable Data for Improved Classiﬁcation Performance

data imputation are the Mean Absolute Error (see eq.

3) and Mean Square Error (see eq. 4) which mea-

sure the difference between the imputed and the ac-

tual value.

MAE =

∑

i=1

− ˆy

(3)

MSE =

∑

i=1

− ˆy

)

(4)

In both eq. (3) and (4) y

stands for the actual value

and ˆy

stands for the predicted value by the data impu-

tation algorithm. A MAE or MSE close to 0 indicates

a good performance with a prediction close to the ac-

tual value.

Finally, the evaluation of the classiﬁcation task is

made by means of F-score (see eq. 5) and the accu-

racy (see eq. 6)(Vujovi

c et al., 2021). Both metrics

give a value between 0 and 1, being 1 a perfect clas-

siﬁcation score.

F1 = 2 ·

Precision · Recall

Precision + Recall

(5)

accuracy =

T P + T N

T P + T N + FP + FN

(6)

5.2 Missing Values at Random

The objective of these experiments, is to evaluate the

performance of different data imputation algorithms

when the distribution of missing values is random and

with an increased rate of missing values. For that, as

explained in section 4, three types of data imputation

algorithms are employed: baselines, row-based and

column-based methods. The results of these methods

can be seen in Fig. 2a with a missing values rate rang-

ing from 5% to 70%.

Preliminary experiments were carried out for each

of the proposed algorithms, testing the identical

model with various parameters and assessing which

conﬁguration yielded the best performance. The de-

tailed results can be found in table 4.

Comparing all the method types, when the ratio of

missing values changes from 5 to 70%, the mean rela-

tive performance in terms of MAE curiously increases

1% for baselines and decreases 5,65% for column-

based methods and 42,3% for row-based methods.

These results suggest that row-based methods have

a signiﬁcant decrement when the number of missing

values increases.

In addition, the results indicate that row-based

methods obtain the best performance when missing

value rates are lower than 35%. When missing rates

are lower than 30%, RF achieved better results than

KNN scoring an average MAE enhancement of 0,011

while KNN outperforms RF in higher missing (40 to

70%) rates averaging a 0,004 improvement. However,

both methods showed a relative improvement respect

to the best column-based method, the KNN, of 25%

and 11% (averaged for 5% to 30%), respectively.

When the number of missing values rate reaches

50%, column-based KNN algorithm achieves a rela-

tive improvement of nearly 7,5% compared to row-

based RF method in terms of MAE. In addition,

the mean performance of column-based methods im-

proves the mean performance of row-based methods

in a 13%, indicating that it is more convenient a

column-based approach over the 50% of missing val-

ues.

Comparing baseline methods with the best row-

based method, the RF, and the best column-based

method, the KNN, the results show a improvement of

23% and 18% respectively. In this scenario, baseline

methods do not improve the results achieved by other

methods.

All the proposed methods signiﬁcantly improve

the baselines except for LOCF, which achieves the

worst performance in all the missing rates. Although

the LOCF method performs reasonably well in time

series data due to short intervals between samples, it

is less effective in this context, where each parameter

is registered on a daily basis.

Among the column-based methods, the KNN and

the RF scored very similar results, but the KNN

achieved an average of 6% MAE improvement com-

pared to the RF.

As a conclusion, row-based methods achieve the

best data imputation performance with a low missing

values ratio. Having a ratio of missing values higher

than the 35% the column-based methods would be the

best suited for imputation. Although the difference

in performance between some of the column-based

methods is small, KNN method averaged the best re-

sults. Lastly, the baselines showed robustness to the

missing rate and only scored consistently better than

LOCF which, as stated, is not suitable for this task.

5.3 Missing Values at Lines

Similarly, in this set of experiments the proposed

algorithms are evaluated with an increasing rate of

missing values. In this case, missing values are

present in the whole day, consequently, row-based

methods cannot be used. The objective of these exper-

iments is to evaluate the proposed methods in a miss-

ing values distribution closer to the reality of wearable

devices, including the limitation of not being able to

use other daily health statistics from the same day for

data imputation.

The same missing rates as in the ﬁrst scenarios

ICT4AWE 2024 - 10th International Conference on Information and Communication Technologies for Ageing Well and e-Health

(a) Missing values at random. (b) Missing values at lines.

Figure 2: Normalized MAE (y axis) for various imputation methods with an increasing rate of artiﬁcially generated missing

values (x axis). Shaded areas show both the maximum and minimum MAE for each method across multiple runs for each

percentage of missing values. Scale over y axis is different in the second image to stand out the trend of column-based

methods.

have been tested for this scenario, the performances

of the methods is shown in Fig. 2b. Following

the same proceeding, the hyperparameters have been

tuned for this experiment as well. The detailed results

are shown in table 5.

In the column-based approaches, the MAE scores

are similar in both the ﬁrst and last rate of missing

values. Methods showed a 6% decrease in perfor-

mance when these points are compared. Once more,

the baseline methods demonstrate notable resilience

to missing data by scoring 4% better at 70% of miss-

ing rate, surpassing LOCF in performance.The ratio-

nale of the poor performance has is the same as men-

tioned in section 5.2.

Once more, the KNN proved to be the the top-

performing method, achieving overall a 18% im-

provement over the baseline methods and 1% over the

second best method the RF, making it the best option

for data imputation using information from previous

days.

In both scenarios, the SVM was the column-based

method that most suffered from the missing rate in-

crease, with an 8% worsening over the missing rate

making it unreliable when the missing values rate is

high.

In conclusion, both baseline methods and column-

based methods exhibit no signiﬁcant variation based

on the distribution of the missing values, averag-

ing similar scores in both scenarios. Regardless of

the missing value pattern, KNN consistently demon-

strated superior performance as the top-performing

column-based method.

Based on the results obtained and acknowledging

the linear pattern of missing values in the dataset, the

KNN is employed as the optimal imputation method

for conducting the classiﬁcation experiment. Results

of applying the model to the dataset can be seen in

Fig. 3. In this instance, the variables TST (Total

Sleep Time) and steps for the user ’p15’ are displayed,

where the column-based KNN effectively replicates

the variations in the original data. However, there are

cases where imputation may deviate from reality, as

observed in the steps imputation towards the conclud-

ing dates, where the KNN placed a noticeable peak.

5.4 Classiﬁcation

In this set of experiments, two classiﬁcations prob-

lems have been proposed in order to asses the ef-

fectiveness of data imputation when training machine

learning models. The goal of data imputation in this

work is to extend the data that can be used to train

the classiﬁcation models, if imputation is carried out

correctly and information is restored from the miss-

ing values, we hypothesize that overall performance

of the predictive models should be increased.

More speciﬁcally, the ﬁrst classiﬁcation task con-

sists in the prediction of the sleep efﬁciency of each

day considering the rest of the daily health statistics

(Johnston, 1993). In this case, the data is split ran-

Missing Data Imputation in Daily Wearable Data for Improved Classiﬁcation Performance

Table 4: Mean MAE and MSE for 5 to 70% of artiﬁcially generated missing values at random scenario.

5% 20% 50% 70%

Method MAE MSE MAE MSE MAE MSE MAE MSE

Baselines

Mean 0,1218 0,0241 0,1203 0,0236 0,1203 0,0237 0,1205 0,0237

Median 0,1205 0,0248 0,1189 0,0242 0,1190 0,0243 0,1192 0,0243

Column-based

Moving average 0,1008 0,0186 0,1004 0,0184 0,1028 0,0189 0,1046 0,0198

KNN 0,0972 0,0172 0,0970 0,0169 0,1004 0,0180 0,1044 0,0194

LOCF 0,1226 0,0285 0,1228 0,0285 0,1246 0,0286 0,1266 0,0300

Linear interpolation 0,1064 0,0212 0,1070 0,0214 0,1086 0,0219 0,1117 0,0231

ARIMA 0,1032 0,0194 0,1040 0,0198 0,1060 0,0204 0,1093 0,0202

SVM 0,0988 0,0174 0,0984 0,0173 0,1018 0,0183 0,1065 0,0196

RF 0,0977 0,0171 0,0973 0,0169 0,1008 0,0181 0,1062 0,0199

Row-based

KNN 0,0783 0,0117 0,0844 0,0133 0,1087 0,0203 0,1159 0,0255

RF 0,0592 0,0085 0,0748 0,0118 0,1111 0,0216 0,1222 0,0253

Table 5: Mean MAE and MSE for 5 to 70% of artiﬁcially generated missing values at lines scenario.

5% 20% 50% 70%

Method MAE MSE MAE MSE MAE MSE MAE MSE

Baselines

Mean 0,1213 0,0240 0,1203 0,0236 0,1205 0,0238 0,1206 0,0237

Median 0,1196 0,0246 0,1190 0,0243 0,1193 0,0245 0,1192 0,0243

Column-based

Moving Average 0,1010 0,0188 0,1015 0,0190 0,1022 0,0191 0,1048 0,0195

KNN 0,0976 0,0171 0,0973 0,0172 0,1004 0,0181 0,1046 0,0194

LOCF 0,1214 0,0281 0,1240 0,0293 0,1236 0,0289 0,1265 0,0301

Linear interpolation 0,1060 0,0210 0,1073 0,0216 0,1092 0,0221 0,1119 0,0234

ARIMA 0,1041 0,0200 0,1055 0,0205 0,1065 0,0206 0,1095 0,0215

SVM 0,0984 0,0174 0,0993 0,0178 0,1023 0,0185 0,1065 0,0196

RF 0,0972 0,0171 0,0983 0,0174 0,1013 0,0184 0,1060 0,0198

domly regardless the user. The second classiﬁcation

task is to predict the behaviour of the user based on

the daily health statistics. In this case, a user has the

same behaviour regardless the day and the data is split

into train and test subsets by user. That is, the same

user does not appear in both the train and test subset

as only one subset is considered for a user. A more de-

tailed description of the data and labels used for each

classiﬁcation tasks can be found in table 2. In table 6

the results for the classiﬁcation tasks are shown with

and without data imputation for the best method in

each case, for simplicity.

For personality classiﬁcation, the model that

reached the best results was the logistic regression

with Lasso’s penalty (Ranstam and Cook, 2018) re-

gardless of the imputation that is applied. For the pre-

diction of the sleep efﬁciency, the best model was the

neural network when missing values are ﬁlled. With-

out imputation, however, the support vector machine

classiﬁer achieved the best performance.

In the case of personality the F-score improved by

4,4% while the accuracy scored 2,9% higher. In the

case of sleep efﬁciency, although the enhancements

were not as pronounced as those seen in personality,

both the F-score and the accuracy went up by 0,3%.

Thus, it is made clear that both classiﬁcation task are

beneﬁted when data is imputed and added to the train-

ing set.

As previously noted, there is considerable varia-

tion in the improvement of the models between the

classiﬁcation tasks. This variability could be at-

tributed to the fact that in sleep efﬁciency, the value

itself is imputed, underscoring the critical importance

of ensuring accurate imputation for the feature.

5.5 Discussion

The results of the experiments showed that imputation

of data is beneﬁcial for the proposed classiﬁcation

methods. Addressing the research questions, the im-

ICT4AWE 2024 - 10th International Conference on Information and Communication Technologies for Ageing Well and e-Health

Table 6: Results of the best classiﬁcation algorithm in each of the classiﬁcation tasks when using only data without missing

values (i.e. Original) and when adding imputed values to Original data (i.e. Imputed).

Personality Sleep efﬁciency

Data F-score accuracy F-score accuracy

Original 67,3% 53,0% 80,2% 79,5%

Imputed 71,7% 55,9% 80,5% 79.8%

(a) Steps.

(b) Total sleep time (TST).

Figure 3: Example of data imputation in two different daily

health statistics.

putation employing various algorithms obtained sig-

niﬁcantly better results in some cases than using the

baseline methods. As seen in both scenarios, base-

line methods (i.e. mean and median imputation) are

outperformed by all methods except for LOCF and

these methods should only be considered when com-

putational time has to be reduced as much as possible

or the number of samples is very limited. Notably,

the results showed that using features of the same day

(i.e., row-based methods) is the best option whenever

feasible and if the missing values rate does not reach

35-40% for a dataset of a similar size and variables,

being RF the best algorithm. Nevertheless, when the

missing rates are higher or it is not possible to im-

pute missing values using other daily health statistics,

the column-based methods should be used rather than

mean or median imputation.

The experiments comparing data imputation be-

tween two different scenarios revealed that there is

not much difference in the effectiveness of algorithms

when applied to both cases, although the results for

random missing values scenario are slightly better.

However, understanding the distribution of missing

values is crucial for determining the most suitable al-

gorithm for the dataset. Among the proposed meth-

ods, we note that LOCF, which was the worst pre-

forming method, is suitable for situations where the

time elapsed between samples or records is sufﬁ-

ciently low (Twumasi-Ankrah et al., 2019). If this

does not fulﬁl, results may be poor as in this case, as

lineal interpolation is relevant when the feature being

imputed is dependent strictly on the last known value

and the next known value. However, in a more gen-

eral scenario where data doesn’t strictly depend on the

surrounding known value, and the time between sam-

ples is moderate, other column-based methods out-

performs these approaches. Among them, is worth

remarking KNN and RF algorithms, which obtained

the best results.

The classiﬁcation results show that data imputa-

tion has enhanced the performance of the models on

unseen data. However, the increase reported in the

performance may be dependent on the classiﬁcation

task to be carried out and each use case requires ex-

perimentation to test if data imputation is still beneﬁ-

cial.

It is necessary to take into consideration some of

the limitations of this work. First of all is the lack

of data, as more data would lead to an increase in

the performance of the algorithms and more accu-

rate imputations. However, as the main focus of this

work are daily health statistics, it would be required

to have a longitudinal study combined with data aug-

mentation techniques to acquire enough data for deep

learning models. In addition, the results obtained are

speciﬁc for the dataset employed and may vary de-

pending the subjects of the study and the daily health

Missing Data Imputation in Daily Wearable Data for Improved Classiﬁcation Performance

statistics considered. For that reason, this study is

also understood as a evaluation framework of data im-

putation methods for improved classiﬁcation perfor-

mance, where a methodological approach is proposed

to evaluate different data imputation algorithms con-

sidering the nature of wearable devices and popular

smartwatches such as Fitbit.

6 CONCLUSIONS

In this study, we have designed an evaluation frame-

work of different data imputation algorithms under

two missing values scenarios: missing values at ran-

dom and at lines (days). The best algorithm for miss-

ing values at lines scenario is then used due to re-

semblance of the data to this scenario for two speciﬁc

classiﬁcation tasks. Being able to consider more data

and improve the performance of the classiﬁcation al-

gorithms.

More speciﬁcally, in the ﬁrst two experiments, for

the evaluation of methods imputing missing values,

gaps were intentionally introduced in the dataset. This

is done with two different patterns to test the methods

in diverse scenarios. One scenario focused on miss-

ing values occurring randomly, whereas in the other,

missing values were situated in lines, resulting in the

deletion of an entire day of data. Additionally, each

scenario was executed 10 times to ensure that the re-

sults are not dependent on speciﬁc missing values.

In the third experiment, machine learning and

deep learning methods were developed to evaluate the

effectiveness of adding or not imputed data to clean

data. Two distinct classiﬁcation problems were un-

dertaken: in the ﬁrst, personality prediction was per-

formed, with the dataset divided by subjects as classes

repeated every day for each subject; in the second ex-

ercise, sleep efﬁciency was stratiﬁed into three levels

and the dataset was randomly divided as sleep efﬁ-

ciency changes every day.

In our study, we reached the conclusion that im-

puting missing values proves beneﬁcial for classiﬁca-

tion, enabling us to sidestep the challenges associated

with working with smaller datasets that can not be

representative enough of the data distributions behind.

Nevertheless, it is crucial to evaluate the proposed

imputation methods for each dataset to ensure that

the imputed values signiﬁcantly improve the baselines

and to select the best algorithm depending the miss-

ing values distribution in the dataset. This precaution

helps prevent biased outcomes stemming from inap-

propriate imputation methods or data deletion.

ACKNOWLEDGEMENTS

We acknowledge the trust of TECNUN University

of Navarra for the collaboration with VICOMTECH

through the master and degree students. We also

thank the funding received by Diputación Foral de

Gipuzkoa for ”OHARTU: Herramienta de detección

de anomalías de comportamiento para la prevención

del deterioro cognitivo” project under the program

Proyectos Gipuzkoa Next.

REFERENCES

Aadland, E. and Ylvisåker, E. (2015). Reliability of the acti-

graph gt3x+ accelerometer in adults under free-living

conditions. PloS one, 10(8):e0134606.

Baek, H. J. and Shin, J. (2017). Effect of missing inter-beat

interval data on heart rate variability analysis using

wrist-worn wearables. Journal of Medical Systems,

41:1–9.

Bank, D., Koenigstein, N., and Giryes, R. (2023). Autoen-

coders. Machine Learning for Data Science Hand-

book: Data Mining and Knowledge Discovery Hand-

book, pages 353–374.

Bent, B., Cho, P. J., Henriquez, M., Wittmann, A., Thacker,

C., Feinglos, M., Crowley, M. J., and Dunn, J. P.

(2021). Engineering digital biomarkers of interstitial

glucose from noninvasive smartwatches. NPJ Digital

Medicine, 4(1):89.

Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992). A

training algorithm for optimal margin classiﬁers. In

Proceedings of the ﬁfth annual workshop on Compu-

tational learning theory, pages 144–152.

Box, G. (2013). Box and jenkins: time series analysis,

forecasting and control. In A Very British Affair: Six

Britons and the Development of Time Series Analysis

During the 20th Century, pages 161–215. Springer.

Bozinovski, S. (2020). Reminder of the ﬁrst paper on trans-

fer learning in neural networks, 1976. Informatica,

44(3).

Breiman, L. (2001). Random forests. Machine learning,

45:5–32.

Buczak, P., Chen, J.-J., and Pauly, M. (2023). Analyzing

the effect of imputation on classiﬁcation performance

under mcar and mar missing mechanisms. Entropy,

25(3):521.

Chakrabarti, S., Biswas, N., Karnani, K., Padul, V., Jones,

L. D., Kesari, S., and Ashili, S. (2023). Binned data

provide better imputation of missing time series data

from wearables. Sensors, 23(3):1454.

Conroy, B., Silva, I., Mehraei, G., Damiano, R., Gross, B.,

Salvati, E., Feng, T., Schneider, J., Olson, N., Rizzo,

A. G., et al. (2022). Real-time infection prediction

with wearable physiological monitoring and ai to aid

military workforce readiness during covid-19. Scien-

tiﬁc reports, 12(1):3797.

ICT4AWE 2024 - 10th International Conference on Information and Communication Technologies for Ageing Well and e-Health

De Myttenaere, A., Golden, B., Le Grand, B., and Rossi, F.

(2016). Mean absolute percentage error for regression

models. Neurocomputing, 192:38–48.

Engels, J. M. and Diehr, P. (2003). Imputation of missing

longitudinal data: a comparison of methods. Journal

of clinical epidemiology, 56(10):968–976.

Faust, L., Purta, R., Hachen, D., Striegel, A., Poellabauer,

C., Lizardo, O., and Chawla, N. V. (2017). Explor-

ing compliance: Observations from a large scale ﬁtbit

study. In Proceedings of the 2nd International Work-

shop on Social Sensing, pages 55–60.

Feng, T. and Narayanan, S. (2019). Imputing missing

data in large-scale multivariate biomedical wearable

recordings using bidirectional recurrent neural net-

works with temporal activation regularization. In 2019

41st Annual International Conference of the IEEE En-

gineering in Medicine and Biology Society (EMBC),

pages 2529–2534. IEEE.

Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive

logistic regression: a statistical view of boosting (with

discussion and a rejoinder by the authors). The annals

of statistics, 28(2):337–407.

Ghandeharioun, A., Fedor, S., Sangermano, L., Ionescu, D.,

Alpert, J., Dale, C., Sontag, D., and Picard, R. (2017).

Objective assessment of depressive symptoms with

machine learning and wearable sensors data. In 2017

seventh international conference on affective comput-

ing and intelligent interaction (ACII), pages 325–332.

IEEE.

Grabczewski, K. and Jankowski, N. (2005). Feature selec-

tion with decision tree criterion. In Fifth International

Conference on Hybrid Intelligent Systems (HIS’05),

pages 6–pp. IEEE.

Gu, Q., Li, Z., and Han, J. (2012). Generalized ﬁsher score

for feature selection. arXiv preprint arXiv:1202.3725.

Huo, Z., Ji, T., Liang, Y., Huang, S., Wang, Z., Qian, X.,

and Mortazavi, B. (2022). Dynimp: Dynamic impu-

tation for wearable sensing data through sensory and

temporal relatedness. In ICASSP 2022-2022 IEEE In-

ternational Conference on Acoustics, Speech and Sig-

nal Processing (ICASSP), pages 3988–3992. IEEE.

Hyndman, R. J. (2011). Moving averages.

Iqbal, S. M., Mahgoub, I., Du, E., Leavitt, M. A., and As-

ghar, W. (2021). Advances in healthcare wearable de-

vices. NPJ Flexible Electronics, 5(1):9.

Johnston, D. W. (1993). The current status of the coronary

prone behaviour pattern. Journal of the Royal Society

of Medicine, 86(7):406–409.

Kanokoda, T., Kushitani, Y., Shimada, M., and Shirakashi,

J.-i. (2019). Gesture prediction using wearable sens-

ing systems with neural networks for temporal data

analysis. Sensors, 19(3):710.

Kim, Y. and Kim, J. (2004). Gradient lasso for feature selec-

tion. In Proceedings of the twenty-ﬁrst international

conference on Machine learning, page 60.

Längkvist, M., Karlsson, L., and Loutﬁ, A. (2014). A re-

view of unsupervised feature learning and deep learn-

ing for time-series modeling. Pattern recognition let-

ters, 42:11–24.

Lin, S., Wu, X., Martinez, G., and Chawla, N. V. (2020).

Filling missing values on wearable-sensory time se-

ries data. In Proceedings of the 2020 SIAM Inter-

national Conference on Data Mining, pages 46–54.

SIAM.

Lu, T.-C., Fu, C.-M., Ma, M. H.-M., Fang, C.-C.,

and Turner, A. M. (2016). Healthcare applica-

tions of smart watches. Applied clinical informatics,

7(03):850–869.

Mallol-Ragolta, A., Semertzidou, A., Pateraki, M., and

Schuller, B. (2021). harage: a novel multimodal

smartwatch-based dataset for human activity recogni-

tion. In 2021 16th IEEE International Conference on

Automatic Face and Gesture Recognition (FG 2021),

pages 01–07. IEEE.

Masci, J., Meier, U., Cire¸san, D., and Schmidhuber, J.

(2011). Stacked convolutional auto-encoders for hi-

erarchical feature extraction. In Artiﬁcial Neural Net-

works and Machine Learning–ICANN 2011: 21st In-

ternational Conference on Artiﬁcial Neural Networks,

Espoo, Finland, June 14-17, 2011, Proceedings, Part

I 21, pages 52–59. Springer.

Mattingly, S. M., Gregg, J. M., Audia, P., Bayraktaroglu,

A. E., Campbell, A. T., Chawla, N. V., Das Swain, V.,

De Choudhury, M., D’Mello, S. K., Dey, A. K., et al.

(2019). The tesserae project: Large-scale, longitudi-

nal, in situ, multimodal sensing of information work-

ers. In Extended Abstracts of the 2019 CHI Confer-

ence on Human Factors in Computing Systems, pages

1–8.

Medsker, L. R. and Jain, L. (2001). Recurrent neural net-

works. Design and Applications, 5(64-67):2.

Montgomery, D. C., Peck, E. A., and Vining, G. G. (2021).

Introduction to linear regression analysis. John Wiley

& Sons.

Nester, R. (2023). Wearable technology market size worth

usd 1.3 trillion by 2035, says research nester.

Noor, N. M., Al Bakri Abdullah, M. M., Yahaya, A. S., and

Ramli, N. A. (2015). Comparison of linear interpola-

tion method and mean method to replace the missing

values in environmental data set. In Materials Science

Forum, volume 803, pages 278–281. Trans Tech Publ.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,

Thirion, B., Grisel, O., Blondel, M., Prettenhofer,

P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,

A., Cournapeau, D., Brucher, M., Perrot, M., and

Duchesnay, E. (2011). Scikit-learn: Machine learning

in Python. Journal of Machine Learning Research,

12:2825–2830.

Peterson, L. E. (2009). K-nearest neighbor. Scholarpedia,

4(2):1883.

Quinlan, J. R. (1986). Induction of decision trees. Machine

learning, 1:81–106.

Ranstam, J. and Cook, J. (2018). Lasso regression. Journal

of British Surgery, 105(10):1348–1348.

Sathyanarayana, A., Joty, S., Fernandez-Luque, L., Oﬂi, F.,

Srivastava, J., Elmagarmid, A., Arora, T., Taheri, S.,

et al. (2016). Sleep quality prediction from wearable

data using deep learning. JMIR mHealth and uHealth,

4(4):e6562.

Missing Data Imputation in Daily Wearable Data for Improved Classiﬁcation Performance

Sazli, M. H. (2006). A brief review of feed-forward neural

networks. Communications Faculty of Sciences Uni-

versity of Ankara Series A2-A3 Physical Sciences and

Engineering, 50(01).

Thambawita, V., Hicks, S. A., Borgli, H., Stensland, H. K.,

Jha, D., Svensen, M. K., Pettersen, S.-A., Johansen,

D., Johansen, H. D., Pettersen, S. D., et al. (2020).

Pmdata: a sports logging dataset. In Proceedings of

the 11th ACM Multimedia Systems Conference, pages

231–236.

Twumasi-Ankrah, S., Odoi, B., Adoma Pels, W., and

Gyamﬁ, E. H. (2019). Efﬁciency of imputation tech-

niques in univariate time series.

Vaizman, Y., Ellis, K., and Lanckriet, G. (2017). Recog-

nizing detailed human context in the wild from smart-

phones and smartwatches. IEEE pervasive computing,

16(4):62–74.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,

L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.

(2017). Attention is all you need. Advances in neural

information processing systems, 30.

Vujovi

c, Ž. et al. (2021). Classiﬁcation model evaluation

metrics. International Journal of Advanced Computer

Science and Applications, 12(6):599–606.

Weber, N., Härmä, A., and Heskes, E. P. D. T. (2017). Un-

supervised learning in human activity recognition: A

ﬁrst foray into clustering data gathered from wearable

sensors. PhD thesis, Radboud University Nijmegen,

The Netherlands.

Weiss, G. M. (2019). Wisdm smartphone and smartwatch

activity and biometrics dataset. UCI Machine Learn-

ing Repository: WISDM Smartphone and Smartwatch

Activity and Biometrics Dataset Data Set, 7:133190–

133202.

Williams, J. B. (2001). Standardizing the hamilton depres-

sion rating scale: past, present, and future. Euro-

pean archives of psychiatry and clinical neuroscience,

251:6–12.

Willmott, C. J. and Matsuura, K. (2005). Advantages of the

mean absolute error (mae) over the root mean square

error (rmse) in assessing average model performance.

Climate research, 30(1):79–82.

Wu, X., Mattingly, S., Mirjafari, S., Huang, C., and Chawla,

N. V. (2020). Personalized imputation on wearable-

sensory time series via knowledge transfer. In Pro-

ceedings of the 29th ACM International Conference

on Information & Knowledge Management, pages

1625–1634.

Yu, Y., Si, X., Hu, C., and Zhang, J. (2019). A review of

recurrent neural networks: Lstm cells and network ar-

chitectures. Neural computation, 31(7):1235–1270.

Zhao, W., Alwidian, S., and Mahmoud, Q. H. (2022). Ad-

versarial training methods for deep learning: A sys-

tematic review. Algorithms, 15(8):283.

Zhu, T., Watkinson, P., and Clifton, D. A. (2020). Smart-

watch data help detect covid-19. Nature biomedical

engineering, 4(12):1125–1127.

ICT4AWE 2024 - 10th International Conference on Information and Communication Technologies for Ageing Well and e-Health