Digital Touchpoints: Generating Synthetic Data for Elderly Smartphone Interactions

Bilal Maqbool and Sebastian Herold

Department of Mathematics and Computer Science, Faculty of Health, Science and Technology, Karlstad University, Karlstad, Sweden

{bilal.maqbool, sebastian.herold}@kau.se

Keywords: Usability Evaluation (UE), Accessibility, Elderly, Older Adult (OA), Synthetic Data Generation (SDG), Machine Learning.

Abstract:

Context: Ensuring that smartphone interfaces are usable and accessible is essential for elderly users, particularly those with motor impairments, who face challenges with touchscreen interactions. Problem: Hand tremors and limited motor control can reduce touchscreen accuracy and efficiency. Meanwhile, recruiting elderly participants for usability studies is difficult and often results in limited interaction data. Objectives: This study aimed to investigate elderly users' smartphone interaction patterns, identify key challenges, and generate synthetic data to address data scarcity in usability research. Method: A custom-designed mobile app collected interaction data from 51 elderly participants performing tapping, dragging, and tracing tasks. Hand steadiness was assessed using accelerometer data. Gaussian Process Regression (GPR) and Long Short-Term Memory (LSTM) models were used to generate synthetic datasets replicating user interaction patterns. Results: Users with shaky hands struggled with precision tasks, especially those involving smaller GUI elements, while larger elements improved performance. Continuous control was also challenging in tracing tasks. The synthetic datasets successfully replicated the spatial, temporal, and distributional metrics of the original data, demonstrating potential utility for future usability evaluation research. Conclusions: Inclusive GUI designs and adaptive features can improve accessibility for elderly users with limited motor control. Synthetic data offer a potential means of supporting further usability evaluation research and building AI-driven design evaluation tools, reducing reliance on resource-intensive participant recruitment when evaluating early prototypes. Future work should examine diverse tasks and scenarios and involve people with severe motor impairments.

1 INTRODUCTION

Ensuring usability and accessibility in digital systems

is essential to effective technology design. Although

closely related, these concepts focus on distinct yet

complementary aspects (Wegge and Zimmermann,

2007). Usability emphasizes efficiency, effectiveness, and user satisfaction, often with baseline physical and cognitive abilities in mind. Accessibility expands this perspective by designing systems to be inclusive, accommodating diverse needs equitably, including those of users with varying disabilities. Integrating accessibility into the design process can ensure that most user groups benefit without requiring significant adaptations or retrofits.

Usability and accessibility are crucial aspects of digital healthcare (DH), directly influencing user engagement and the success of digital health interventions (Shamsujjoha et al., 2021). Poor usability in electronic health records (EHRs) has been linked to serious errors, such as inappropriate drug administration, underscoring the risks of complex interface design (Pew Trusts, 2019). A study of 9,000 safety reports related to DH technology found that usability issues contributed to nearly one-third of reported errors, pointing to a pressing need for improved system designs (Ratwani et al., 2018). Furthermore, research suggests that businesses, including those in healthcare, achieve better outcomes by prioritizing usability and design (Sheppard et al., 2018).

https://orcid.org/0000-0002-1309-2413

https://orcid.org/0000-0002-3180-9182

Smartphone usage is common in Europe, with 65% and 68% of individuals over 65 in the UK and Germany, respectively, owning a smartphone (O'Dea, 2021; Davies, 2024). The widespread availability of health-related mobile applications, exceeding 100,000 as of 2022, highlights the growing role of technology in health management (Butcher and Hussain, 2022).

Maqbool, B. and Herold, S.

Digital Touchpoints: Generating Synthetic Data for Elderly Smartphone Interactions.

DOI: 10.5220/0013439200003938

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 11th International Conference on Information and Communication Technologies for Ageing Well and e-Health (ICT4AWE 2025), pages 126-140

ISBN: 978-989-758-743-6; ISSN: 2184-4984

However, DH applications often fail

to address key human-centric factors such as usabil-

ity and accessibility, resulting in ineffective solu-

tions that exclude critical user groups, such as older

adults and individuals with disabilities (Shamsujjoha

et al., 2021). Older adults, in particular, face dif-

ﬁculties with touchscreen technologies due to chal-

lenges such as small interface elements and tasks re-

quiring precision or speed, leading to frustration and

unintended inputs (Maqbool and Herold, 2024; Joshi,

2018; Elboim-Gabyzon et al., 2021). These issues,

often associated with age-related physical limitations like reduced motor control, emphasize the importance of tailored designs that accommodate their specific needs (Maqbool and Herold, 2024; Joshi, 2018).

Despite the importance of usability evaluations with older adults, recruiting elderly participants remains challenging, particularly those with motor or cognitive impairments. This often leads to small, non-representative samples that can limit the generalizability of findings (Maqbool and Herold, 2024; Sinabell and Ammenwerth, 2024). Studies have also reported difficulties in retaining participants due to accessibility barriers, health constraints, and logistical issues, further complicating data collection efforts (Maqbool and Herold, 2024).

User interface (UI) interaction data from usability

and accessibility studies can offer valuable insights

into how users, particularly those with physical dis-

abilities, interact with technology. Despite its poten-

tial, this data is rarely reused, resulting in repeated

collection efforts and inefﬁcient resource use (Jiang

et al., 2024). The use of interaction data for user im-

itation modeling, which simulates scenarios such as

individuals with shaky hands interacting with touch-

screens, can help researchers evaluate design alterna-

tives and accessibility features (Maqbool et al., 2024).

This approach can optimize the use of existing inter-

action datasets and minimize the need for resource-

intensive recruitment efforts.

A key challenge, however,

lies in the limited size of datasets typically produced

by usability and accessibility studies, which con-

strains the training of machine learning models re-

quired for robust simulation-based user models. Syn-

thetic data generation has emerged as a promising so-

lution to data scarcity, allowing researchers to repli-

cate the properties of limited datasets and increase

their size (Maqbool et al., 2024). Furthermore, it

can facilitate the training of accessibility-focused ma-

chine/imitation learning models.

Generating high-quality synthetic data demands

careful modeling of the unique interaction behaviors

exhibited by elderly users, particularly those with

motor impairments, to ensure ﬁdelity and usability.

Therefore, in this paper, our goal is to collect smart-

phone UI interaction data from elderly users using

a custom-designed mobile application, focused on

touchscreen tasks such as tapping, dragging, and trac-

ing to analyze user interaction patterns. The col-

lected data will support the generation of synthetic

datasets using machine learning techniques to miti-

gate the scarcity of user interaction data. Further-

more, the ﬁdelity of the synthetic data will be evalu-

ated for reliability and applicability in developing AI-

driven design evaluation tools. To guide this work, we

formulated the following research questions:

• RQ.1: What interaction patterns are exhibited by

elderly users during smartphone interaction tasks?

• RQ.2: How effectively can the synthetic data

replicate the observed interaction patterns of el-

derly users?

The structure of the paper is as follows: Section 2

reviews the existing literature; Section 3 details the re-

search methodology; Section 4 presents the ﬁndings;

Section 5 discusses the results, their implications, and

potential threats to validity; and Section 6 concludes

the paper, highlighting future research directions.

2 LITERATURE REVIEW

This literature review explores the challenges faced

by elderly users in interacting with touchscreen in-

terfaces and the role of synthetic data generation in

addressing data scarcity.

2.1 Motor Skill Limitations and Accessibility Challenges

The increasing reliance on smartphones in daily life has emphasized the need to address accessibility challenges for elderly users. Age-related declines in motor skills, such as dexterity, can significantly affect their ability to use touchscreen technology effectively, and such technology is often not designed with their needs in mind. This misalignment causes frustration, higher error rates, and slower responses, and often leads to disengagement from technology (Nicolau et al., 2014; Joshi, 2018; Nurgalieva et al., 2019).

Tapping, a fundamental smartphone interaction,

can be challenging for elderly users, especially those

with shaky hands. Hwangbo et al. found that smaller targets and closely spaced icons increase error rates and slow interaction for elderly users, recommending larger touch targets and adequate spacing (Hwangbo et al., 2013). Additionally, Shao et al. observed that

right-handed elderly users often deviate to the right

when tapping, a tendency intensiﬁed by hand tremors,

and proposed offset models for automatic correction

to improve accuracy and reduce input errors (Shao

et al., 2023).

Dragging gestures, requiring precision and ﬁne

motor control, are particularly challenging for el-

derly users, especially those with hand tremors.

Salman et al. highlighted the difficulty elderly users face

with drag-and-drop interactions, recommending task

simpliﬁcation or alternative methods (Salman et al.,

2019). Shao et al. noted that elderly users often

adopt a two-phase strategy: an initial movement to-

ward the target, followed by a calibration phase for

re-positioning (Shao et al., 2023). While effective,

this approach increases interaction time and cognitive

load, emphasizing the need for interfaces that mini-

mize precision demands.

Gestures requiring fine motor skills, such as pinch-to-zoom, pose significant challenges for elderly users. Brunzini et al. found that while tapping had the

highest success rates, complex gestures like drag-and-

drop and pinch-to-zoom were notably harder for indi-

viduals with systemic sclerosis (SSc) (Brunzini et al.,

2022). The study stressed the importance of adap-

tive designs tailored to speciﬁc motor impairments

and the role of prior technology familiarity in user

performance. Salman et al. further emphasized the importance of reducing gesture complexity to accommodate motor impairments, suggesting that simplified

layouts and alternative interaction methods could sig-

niﬁcantly improve accessibility (Salman et al., 2019).

Nicolau et al. also identified tapping as the most effective method for motor-impaired users, although taps near button edges and corners remained difficult, and recommended larger target sizes (Nicolau et al., 2014).

Similarly, Kobayashi et al. found that while prac-

tice improved user performance in tasks like tapping,

dragging, and pinching, persistent challenges such as

small target sizes and unclear instructions highlighted

the need for interfaces with larger, well-deﬁned tar-

gets and simpliﬁed navigation structures (Kobayashi

et al., 2011).

A design framework for smartphone user inter-

faces tailored to elderly users emphasizes the need

for simpliﬁed layouts, larger icons, and customizable

settings to accommodate individual preferences and

abilities (Salman et al., 2023). Moreover, studies also

point to difﬁculties faced by the elderly with moving

targets, text entry on virtual keyboards, and dynamic

elements like scrolling text, underscoring the need for

intuitive and accessible input methods (Elguera Paez and Zapata Del Río, 2019).

2.2 Synthetic Data Generation (SDG)

Generating synthetic time series data is challenging, yet it has become increasingly crucial across diverse fields, from healthcare (Jamshidi et al., 2024) to

ﬁnance (Ranja et al., 2023). The growing use of data-

driven methods, privacy concerns about real-world

data, and the high costs and complexity of data acqui-

sition are some factors driving this demand. Gener-

ating synthetic sensor data is commonly achieved us-

ing Generative Adversarial Networks (GANs). These

networks include a generator, which generates syn-

thetic data based on real datasets, and a discrimina-

tor, which evaluates the data to identify whether it is

real or generated (Islam et al., 2022). TimeGAN is a specialized form of GAN designed to capture the

temporal dependencies in time series data, which tra-

ditional GANs often fail to address adequately. This

is achieved by incorporating a seq2seq style adversar-

ial autoencoder that ensures the temporal distribution

of synthetic samples does not collapse (Yoon et al.,

2019; Beck and Chakraborty, 2024).

The DoppelGANger (DGAN) model, another specialized GAN, is designed to handle the unique challenges of complex time series data, such as long-term temporal correlations (Lin et al., 2020). The model leverages GANs to generate data so that the synthetic data closely resemble the training data in terms of both temporal and feature characteristics. Lin et al. demonstrated the efficacy of DGAN in generating synthetic network traffic data, capturing structural properties and achieving up to 43% better fidelity than baseline methods (Lin et al., 2020). Dannels utilized DGAN to generate synthetic time series with associated recession indicators (Dannels, 2023).

The study showed that training forecasting models on

synthetic data improved short-range forecasting per-

formance for Treasury yields and enhanced the mod-

els’ ability to predict future recessions.

Gaussian Process Regression (GPR) is an-

other prominent method for generating synthetic

data (Schulz et al., 2018). GPR is a non-parametric

method, offering a means to quantify uncertainty in

predictions, which is critical for noisy or incomplete

real-world datasets. GPR deﬁnes a distribution over

functions, allowing synthetic data generation by sampling from this distribution. Susiluoto et al. developed the satGP software, using GPR to generate synthetic datasets from satellite observations by modeling the spatial and temporal dependencies in environmental data for testing and validating predictive models (Susiluoto et al., 2020). In machine learning, GPR has been used to generate synthetic datasets for evaluating algorithm performance under controlled conditions (Stephenson et al., 2022). By simulating data

with known properties, researchers can assess model

robustness and accuracy, supporting the development

of reliable and generalizable outcomes.
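The idea of sampling from a distribution over functions can be made concrete with a few lines of NumPy. The sketch below is a generic illustration, not code from any of the cited studies: it draws random functions from a zero-mean GP prior with an RBF kernel via a Cholesky factorization. The helper names `rbf_kernel` and `sample_gp_prior`, the length scale, and the jitter value are our own assumptions.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two 1-D input arrays."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def sample_gp_prior(x, n_draws=3, length_scale=1.0, variance=1.0, jitter=1e-6, seed=0):
    """Draw n_draws random functions f ~ GP(0, k) evaluated at the points x."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x, x, length_scale, variance)
    # A small diagonal jitter keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    z = rng.standard_normal((len(x), n_draws))
    # If z ~ N(0, I), then L @ z ~ N(0, L L^T) = N(0, K): each column is one sampled function.
    return L @ z


x = np.linspace(0.0, 5.0, 50)
draws = sample_gp_prior(x, n_draws=3, length_scale=0.8)
print(draws.shape)  # (50, 3)
```

Fitting the GP to observed data and sampling from the resulting posterior follows the same pattern, with the posterior mean and covariance taking the place of zero and K.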

Long Short-Term Memory (LSTM) networks,

a type of recurrent neural network (RNN), are

also instrumental in generating synthetic time series

data (Hochreiter, 1997). LSTMs excel at capturing

long-term dependencies within sequential data, mak-

ing them suitable for tasks that require an under-

standing of temporal dynamics. Schwarz’s study uses

LSTM to generate a synthetic ﬁnancial time series

that closely represents the real market data’s probabil-

ity distributions (Schwarz, 2024). Notably, the model

outperforms traditional methods in non-linear scenar-

ios, offering robust applications in risk management,

scenario analysis, and trading strategy development.

Despite the growing importance of synthetic data

generation to address data scarcity, its application to

smartphone interaction data for elderly users remains

under-explored. This gap is particularly critical given

the need for inclusive technology design and the prob-

lems highlighted in Section 1. Building on our previ-

ous work (Maqbool et al., 2024), which focused on

generating synthetic drag-and-drop interaction data,

this study expands its scope to include elderly users

with shaky hands. It also includes additional interac-

tion tasks, such as tapping and tracing, to better sim-

ulate the range of motor control challenges faced by

this population.

3 METHODOLOGY

3.1 Target Population and Recruitment Strategy

The study involved elderly participants aged 65 and

above, recruited through opportunistic sampling. Re-

cruitment relied on private and professional networks

to ensure access to this demographic, which is often

challenging to reach in research.

3.2 Questionnaire and Observation

At the start of the study, participants completed a structured questionnaire designed to collect demographic information and details about smartphone

usage habits. Participants’ smartphone interactions

were observed while performing tasks, focusing on

how they held and used the device. This observational

data complemented the questionnaire and offered in-

sights into interaction patterns.

3.3 Task 0: Hand Steadiness Assessment

Participants completed a hand steadiness calibration

task to measure hand stability before starting the

smartphone interaction tasks. They sat on a chair,

placed the smartphone ﬂat on their palm with the

screen facing up, and were instructed to lift their

arm to chest level, holding the phone steady for 10

seconds with each hand. Hand movement/shakiness was recorded using the smartphone’s built-in accelerometer sensor, providing an objective measurement (Polvorinos-Fernández et al., 2024). The accelerometer sensor recorded three-dimensional (x, y,

z) acceleration data during this task.

Participants were categorized into two groups

based on the steadiness of their dominant hand during

calibration: those with minimal shakiness and those

with noticeable hand shakiness. The terms “shaky”

and “non-shaky” in this study do not indicate med-

ically diagnosed tremor-related conditions or severe

motor impairments, but instead reﬂect a comparative

difference in relative hand stability among partici-

pants.

Accelerometer data was preprocessed by trans-

forming three-dimensional acceleration into a scalar

magnitude using the Euclidean norm, preserving key

movement characteristics while simplifying the data.

Preprocessing also included outlier and noise re-

moval, data normalization, aligning timestamps to a

uniform interval, and resampling data points to ad-

dress sampling inconsistencies. A Butterworth Band-

Pass Filter (0.5–10 Hz) was applied to isolate hand

movement and shakiness frequencies while reducing

any potential sensor drift and data noise (Polvorinos-Fernández et al., 2024). Finally, Gaussian smoothing

was used to reﬁne the signal for analysis.
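The preprocessing chain described above can be sketched with NumPy and SciPy as follows. This is a minimal illustration under stated assumptions: the 100 Hz resampling rate, the fourth-order zero-phase Butterworth filter, and the Gaussian kernel width of two samples are our choices, and the function `preprocess_accelerometer` is hypothetical, not the authors' code. Outlier removal and normalization are omitted for brevity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import butter, sosfiltfilt


def preprocess_accelerometer(t_s, xyz, fs=100.0, band=(0.5, 10.0), order=4, sigma=2.0):
    """Magnitude -> uniform resampling -> band-pass -> Gaussian smoothing.

    t_s: increasing timestamps in seconds (possibly irregular); xyz: array of shape (n, 3).
    fs, order and sigma are illustrative assumptions, not values reported in the paper.
    """
    # 1. Collapse the x, y, z axes into a single scalar magnitude (Euclidean norm).
    mag = np.linalg.norm(xyz, axis=1)
    # 2. Resample onto a uniform time grid to remove sampling-interval jitter.
    t_uniform = np.arange(t_s[0], t_s[-1], 1.0 / fs)
    mag = np.interp(t_uniform, t_s, mag)
    # 3. Zero-phase Butterworth band-pass (0.5-10 Hz): removes the gravity offset, drift and fast noise.
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, mag)
    # 4. Light Gaussian smoothing of the filtered signal (sigma is in samples).
    return t_uniform, gaussian_filter1d(filtered, sigma=sigma)


# Toy 10 s recording with irregular sampling: gravity plus a small 5 Hz oscillation.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 1000))
xyz = np.column_stack([
    0.05 * np.sin(2 * np.pi * 5 * t),
    0.02 * rng.standard_normal(t.size),
    9.81 + 0.05 * np.cos(2 * np.pi * 5 * t),
])
t_u, signal = preprocess_accelerometer(t, xyz)
```

Because gravity appears as a near-constant offset in the magnitude, the 0.5 Hz lower cut-off removes it, leaving a signal centred on zero.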

Power Spectral Density (PSD) was used to ex-

tract frequency-domain features, focusing on the

3.5–7.5 Hz band, which corresponds to hand shaki-

ness frequencies during postural activities (Hess and

Pullman, 2012; Heida et al., 2013). Statistical fea-

tures within this frequency band and acceleration

summed magnitudes were derived from the ﬁltered

data. Participants were clustered using a threshold-

based method, where higher summed magnitudes and

frequencies within the specified band helped to identify “shaky” participants. K-means clustering on the extracted features further validated the “shaky” and “non-shaky” clusters, supporting the robustness of the method.
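One possible way to compute the two features and cross-check the grouping with k-means is sketched below. Welch's method as the PSD estimator, the summation of the absolute filtered signal, standardizing the features before clustering, and the helper `tremor_features` are assumptions for illustration. The recordings are synthetic toy signals, not participant data.

```python
import numpy as np
from scipy.signal import welch
from sklearn.cluster import KMeans


def tremor_features(signal, fs=100.0, band=(3.5, 7.5)):
    """Return (% of spectral power inside `band`, summed absolute magnitude) for one recording."""
    freqs, psd = welch(signal, fs=fs, nperseg=min(256, len(signal)))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Frequencies are evenly spaced, so a ratio of sums equals a ratio of integrals.
    band_pct = 100.0 * psd[in_band].sum() / psd.sum()
    summed_mag = np.abs(signal).sum()
    return band_pct, summed_mag


# Toy recordings (not study data): five with a 5.5 Hz tremor, five with slow 1.5 Hz sway.
fs = 100.0
t = np.arange(0.0, 10.0, 1.0 / fs)
rng = np.random.default_rng(1)
signals = [0.4 * np.sin(2 * np.pi * 5.5 * t) + 0.05 * rng.standard_normal(t.size) for _ in range(5)]
signals += [0.1 * np.sin(2 * np.pi * 1.5 * t) + 0.05 * rng.standard_normal(t.size) for _ in range(5)]
features = np.array([tremor_features(s, fs) for s in signals])

# Standardize both features so neither dominates, then split into two clusters.
z = (features - features.mean(axis=0)) / features.std(axis=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z)
print(labels)
```

Cluster indices returned by k-means are arbitrary, so only the grouping itself (which recordings fall together) is meaningful.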


3.4 Graphical User Interface (GUI) Interaction Tasks

Participants engaged in a series of UI interaction

tasks using a custom-designed smartphone applica-

tion. These tasks were designed to assess common

touchscreen actions and various aspects of user inter-

action, including speed (time taken for each action),

accuracy (touch-points and tapping precision), preci-

sion (drag/tracing movements and control in position-

ing), and the number of attempts required to complete

each task.

1. Task 1: Participants tapped on a square button of three sizes (48, 56, and 64 dp) displayed at random screen locations. Each size appeared 11 times, for a total of 33 taps.

2. Task 2: Participants dragged a square button (56 dp) from a starting position to a target box (84 dp) that appeared at a random screen location; this was repeated 11 times.

3. Task 3: Participants traced lines along the edges

of a square box, either once or several times.

3.5 Synthetic Data Generation

Building on the data collected and analyzed, this

phase focused on generating synthetic user interaction data. We explored GAN-based models such as TimeGAN and ACTGAN to assess their potential for generating synthetic data. In our previous work, we used DoppelGANger (DGAN) to generate synthetic drag-and-drop interaction data (Maqbool et al., 2024). While DGAN produced good results, its computational demands limit practical scalability, prompting this research to explore GPR and LSTM models, given their comparative resource efficiency and the model complexity required for generating synthetic data.

To generate synthetic data for Task 1, we employed GPR to model user taps on the three square button sizes. The GPR was trained on the continuous displacement from the target center (Dx, Dy), timestamp data, and encoded categorical features such as target size and button location on the grid. The GPR kernel was carefully constructed and fine-tuned to model various aspects of the data, combining four components. A Constant Kernel (C), initialized at 2.0 with bounds $[10^{-6}, 10]$, controlled the overall scale. An RBF Kernel captured smooth relationships, and a Matérn Kernel accounted for less smooth variations; both had initial length scales of 1.0, with bounds $[10^{-4}, 10]$. A White Kernel was initialized with a noise level of $10^{-5}$ and bounds $[10^{-9}, 10]$, allowing adaptation to varying noise levels. The kernel hyperparameters were optimized iteratively for the best fit, and a regularization term α = 0.1 was added to prevent overfitting and ensure robust predictions.

Feature preparation involved one-hot encoding

for categorical variables and scaling for continuous

variables to ensure standardization. The input ma-

trix combined these processed features into a uniﬁed

dataset for training. The GPR model optimization was

performed with the fmin_l_bfgs_b optimizer, using

10 restarts to avoid local minima and allowing up to

1,000 iterations for effective convergence. Synthetic

taps were generated by sampling from the trained

GPR models, capturing realistic spatial and temporal

tap patterns and variability observed in the original

data.

To generate synthetic data for Task 2 and Task 3,

we ﬁrst preprocessed the dataset by removing outliers.

For Task 2, time series exceeding the fourth quartile

(Q4) in length were excluded to avoid skewing the

training process. For both tasks, the time series were

interpolated and resampled to ensure consistent se-

quence lengths across users, simplifying model train-

ing and ensuring uniform input data. To standardize

drag directions, Task 2 paths were preprocessed to

start from the top-right grid position relative to the tar-

get center, while Task 3 paths were aligned clockwise,

starting from the top-right corner. These adjustments

facilitated easier learning of patterns and ensured that

the trained model generalizes better, considering lim-

ited training data.
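The interpolation and direction-alignment steps can be sketched as follows. Linear interpolation over normalized time, the use of mirroring (rather than rotation) to bring each path to the top-right, screen coordinates with y growing downward, and the helpers `resample_path` and `align_to_top_right` are all assumptions; the outlier filter is omitted.

```python
import numpy as np


def resample_path(t, xy, n_points=100):
    """Linearly interpolate an (x, y) path onto n_points equally spaced in normalized time."""
    t = np.asarray(t, dtype=float)
    xy = np.asarray(xy, dtype=float)
    u = (t - t[0]) / (t[-1] - t[0])          # normalized time in [0, 1]
    u_new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(u_new, u, xy[:, 0]), np.interp(u_new, u, xy[:, 1])])


def align_to_top_right(xy, target):
    """Express a path relative to the target centre and mirror it so that it starts top-right.

    Screen coordinates are assumed (y grows downward), so "top" means a negative y offset.
    """
    rel = np.asarray(xy, dtype=float) - np.asarray(target, dtype=float)
    sx = 1.0 if rel[0, 0] >= 0 else -1.0   # mirror horizontally if the drag started left of the target
    sy = 1.0 if rel[0, 1] <= 0 else -1.0   # mirror vertically if the drag started below the target
    return rel * np.array([sx, sy])


# A hypothetical drag recorded with irregular timestamps (ms), ending at the target centre.
t = [0, 120, 260, 400, 650]
xy = [[100, 900], [180, 820], [300, 700], [390, 610], [400, 600]]
path = resample_path(t, xy, n_points=50)
aligned = align_to_top_right(path, target=[400, 600])
print(path.shape)  # (50, 2)
```

Resampling every recording to the same number of points yields fixed-length sequences that can be stacked into one training tensor.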

A bidirectional LSTM model was trained to pre-

dict drag and tracing paths based on timestamps,

capturing temporal correlations. Input data passed

through a bidirectional LSTM layer with dropout

and regularization to mitigate overﬁtting, followed

by a Dense output layer to predict numeric coordi-

nates while preserving temporal correlations. The

model was compiled with the Adam optimizer (learning rate: 0.0005) and trained for 30 and 40 epochs with batch sizes of 8 and 20 for Task 2 and Task 3, respectively. Early stopping with restoration of the best-performing weights was used to avoid overtraining, and a custom callback monitored the Mean Squared Error (MSE) loss and the Mean Absolute Error (MAE) metric to track model convergence.
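A Keras sketch of this architecture is given below. The layer width, dropout rate, L2 strength, early-stopping patience and monitored quantity, sequence length, toy data, and the `LossTracker` callback are illustrative assumptions. The sketch trains for only 3 epochs so that it runs quickly, whereas the paper reports 30 and 40 epochs. How synthetic paths were drawn from the trained model is not stated here, so the final lines use Monte Carlo dropout purely as one hypothetical option.

```python
import numpy as np
from tensorflow import keras

SEQ_LEN = 50  # resampled path length (an assumption)


def build_bilstm(seq_len=SEQ_LEN, units=64, dropout=0.2, l2=1e-4):
    """Timestamps of shape (seq_len, 1) -> (x, y) coordinates of shape (seq_len, 2)."""
    inputs = keras.Input(shape=(seq_len, 1))
    h = keras.layers.Bidirectional(
        keras.layers.LSTM(units, return_sequences=True, dropout=dropout,
                          kernel_regularizer=keras.regularizers.l2(l2)))(inputs)
    outputs = keras.layers.Dense(2)(h)  # Dense acts on the last axis, i.e. per time step
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0005), loss="mse", metrics=["mae"])
    return model


class LossTracker(keras.callbacks.Callback):
    """Minimal custom callback that records the MSE loss and MAE metric after each epoch."""

    def on_train_begin(self, logs=None):
        self.records = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.records.append((logs.get("loss"), logs.get("mae")))


# Toy training data (not study data): noisy straight drags ending at the origin.
rng = np.random.default_rng(0)
n = 16
t = np.tile(np.linspace(0.0, 1.0, SEQ_LEN), (n, 1))[..., None].astype("float32")
xy = np.concatenate([1.0 - t, -(1.0 - t)], axis=-1) + 0.02 * rng.standard_normal((n, SEQ_LEN, 2))
xy = xy.astype("float32")

model = build_bilstm()
tracker = LossTracker()
early = keras.callbacks.EarlyStopping(monitor="loss", patience=5, restore_best_weights=True)
model.fit(t, xy, epochs=3, batch_size=8, callbacks=[early, tracker], verbose=0)

# Assumption: dropout kept active at inference (Monte Carlo dropout) to obtain varied samples.
samples = np.stack([np.asarray(model(t[:4], training=True)) for _ in range(3)])
```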

Post-processing involved applying Exponential

Moving Average (EMA) smoothing to reduce noise

while preserving overall trends. Generated time

series were evaluated by comparing their distribu-

tions with the original data, using statistical mea-

sures (mean, std, etc.), Wasserstein distances (WD),

Jensen–Shannon distances (JSD), and qualitative assessments to analyze the generated synthetic data and

model ﬁdelity (Stenger et al., 2024).
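EMA smoothing is a one-line recurrence; the sketch below implements it in NumPy. The smoothing factor α = 0.3 and the helper `ema_smooth` are assumptions, since the value used is not reported here.

```python
import numpy as np


def ema_smooth(series, alpha=0.3):
    """Exponential moving average along the time axis (axis 0).

    s[0] = x[0];  s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
    Works for 1-D series and for (T, 2) coordinate paths.
    """
    x = np.asarray(series, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1.0 - alpha) * out[t - 1]
    return out


smoothed = ema_smooth([0.0, 10.0, 0.0, 10.0])  # approx. [0.0, 3.0, 2.1, 4.47]
```

This recurrence gives the same result as pandas' `Series.ewm(alpha=0.3, adjust=False).mean()`. A smaller α smooths more strongly but lags further behind sharp turns in a drag or trace path.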

For the Wasserstein distance (WD), we normalized the time series to the range [-1, 1] (MinMaxScaler(-1, 1)) and compared each synthetic time series with all original time series. WD was calculated using optimal transport theory to determine the minimum “cost” of transforming the synthetic distribution into the original one, with probability weights derived from each value’s magnitude to determine its relative importance within the drag/trace path. Each synthetic series was matched to the closest original series by finding the smallest WD.
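The matching procedure can be sketched with `scipy.stats.wasserstein_distance`, which accepts per-value weights. We assume the weights are the absolute normalized values, plus a tiny offset so that an all-zero series remains valid. The helpers `minmax`, `weighted_wd` and `closest_original` and the toy series are hypothetical.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def minmax(x, lo=-1.0, hi=1.0):
    """Same mapping as MinMaxScaler(feature_range=(-1, 1)) applied to a single 1-D series."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return lo + (x - x.min()) * (hi - lo) / span


def weighted_wd(a, b):
    """WD between two normalized series, weighting each value by its magnitude (assumed |value|)."""
    a, b = minmax(a), minmax(b)
    wa, wb = np.abs(a) + 1e-12, np.abs(b) + 1e-12  # tiny offset avoids all-zero weights
    return wasserstein_distance(a, b, u_weights=wa, v_weights=wb)


def closest_original(synthetic, originals):
    """Index and WD of the original series closest to `synthetic`."""
    dists = [weighted_wd(synthetic, o) for o in originals]
    best = int(np.argmin(dists))
    return best, dists[best]


x = np.linspace(0.0, 1.0, 200)
originals = [x, np.sin(2 * np.pi * x), np.sign(np.sin(2 * np.pi * x))]
rng = np.random.default_rng(0)
synthetic = x + 0.005 * rng.standard_normal(x.size)
best, dist = closest_original(synthetic, originals)
```

Because WD here compares distributions of values, it is insensitive to the order of points in time; the temporal structure has to be checked by other means.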

For the Jensen–Shannon (JS) distance, synthetic and original time series values were binned into 50 equal-width bins, and the JS distance was computed as the square root of the JS divergence, that is, the average of the Kullback–Leibler (KL) divergences of each distribution from their mixture (average) distribution. This symmetric, bounded measure (0 to 1 when base-2 logarithms are used, with 0 indicating identical distributions) identified the closest original series for each synthetic series by minimizing the JS distance.
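The JS-distance step maps directly onto `scipy.spatial.distance.jensenshannon`, which returns the square root of the JS divergence; with `base=2`, the result lies in [0, 1]. Using a shared binning range for each pair of series and the helper names `js_distance` and `closest_by_js` are assumptions.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon


def js_distance(a, b, bins=50):
    """JS distance between the value histograms of two series (shared, equal-width bins)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    # jensenshannon normalizes p and q itself; base=2 bounds the result to [0, 1].
    return jensenshannon(p, q, base=2)


def closest_by_js(synthetic, originals, bins=50):
    """Index and JS distance of the original series closest to `synthetic`."""
    d = [js_distance(synthetic, o, bins) for o in originals]
    best = int(np.argmin(d))
    return best, d[best]


x = np.linspace(0.0, 1.0, 200)
originals = [x, np.sin(2 * np.pi * x), np.sign(np.sin(2 * np.pi * x))]
rng = np.random.default_rng(0)
synthetic = x + 0.005 * rng.standard_normal(x.size)
best_js, d_js = closest_by_js(synthetic, originals)
```

Identical histograms give a distance of 0, and histograms with no overlapping bins give the maximum of 1.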

3.6 Ethical Compliance

We applied to the Etikprövningsmyndigheten (EPM, the Swedish Ethical Review Authority; Dnr 2024-03934-01) to ensure ethical compliance.

The EPM has determined that our project is not sub-

ject to ethics review as it does not involve any inter-

ventions on research subjects or processing of per-

sonal data as deﬁned under Sections 3-4 of the Ethical

Review Act. Additionally, the Ethical Review Au-

thority has provided an advisory opinion stating that

there are no ethical objections to our research project.

Furthermore, participation in the study was volun-

tary and reporting was anonymous. Participants could

proceed upon reading the study information, getting

informed about their rights, and giving their consent.

4 RESULTS

4.1 Participants’ Information

A total of 51 elderly individuals from Sweden (25),

Pakistan (19), Italy (5), and Germany (2) participated

in the study. Based on the hand steadiness assessment

(described in Sec. 3.3), 21 participants were identi-

ﬁed as having higher levels of hand shakiness. Par-

ticipants were clustered into two groups using cut-

off thresholds determined by the midpoint between

k-means centroids from PSD analysis and summed

magnitude of acceleration. Participants were labeled

as “shaky” if the percentage of spectral power in the 3.5–7.5 Hz frequency band exceeded 22% and the summed magnitude of acceleration (Euclidean norm) exceeded 150.
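The cut-off derivation and the resulting labeling rule can be sketched as follows. The feature values are invented toy numbers, chosen so that the centroid midpoints come out at the reported 22% and 150; `midpoint_cutoffs` and `is_shaky` are hypothetical helpers.

```python
import numpy as np
from sklearn.cluster import KMeans


def midpoint_cutoffs(features, random_state=0):
    """Fit two-cluster k-means; return the per-feature midpoint between the two centroids."""
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(features)
    return km.cluster_centers_.mean(axis=0)


def is_shaky(band_pct, summed_mag, pct_cut=22.0, mag_cut=150.0):
    """Labeling rule from the text: both features must exceed their cut-offs."""
    return band_pct > pct_cut and summed_mag > mag_cut


# Toy features [band power %, summed magnitude] for six hypothetical participants.
features = np.array([[28, 190], [32, 210], [30, 200], [12, 95], [16, 105], [14, 100]], dtype=float)
cutoffs = midpoint_cutoffs(features)
print(cutoffs)  # midpoints of the centroids [30, 200] and [14, 100]
```

Requiring both conditions makes the rule conservative: a participant with a high tremor-band share but little overall movement is not labeled “shaky”.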

Of these 21 participants, the largest group (9) was aged 65–69, comprising 7 males (33%) and 2 females (10%). The 70–74 age group included 4 males (19%) and 4 females (19%), while the 80–84 group included 2 males (10%) and 2 females (10%).

Among the 21 participants with shaky hands, 9 used smartphones multiple times a day (8 males, 38%; 1 female, 4.8%). Another 10 used smartphones a few times a day (5 males and 5 females, each 24%), while 2 females (10%) reported using smartphones a few times a week. These results indicate that most participants (19 of 21) use smartphones daily, with usage varying between frequent and moderate levels.

During Task 1, 16 of these participants held the smartphone in their left hand and interacted using the fingers of their right hand, 4 did the reverse, and 1 held the phone with both hands and interacted with both thumbs. This interaction style remained consistent across Tasks 2 and 3.

4.2 Graphical User Interface (GUI) Interaction

4.2.1 Task 1

In general, for Task 1, the results showed that larger

button sizes (i.e., 56 dp and 64 dp) were associated

with slightly lower average tap durations compared

to 48 dp, and with fewer repeated attempts, particularly among shaky participants.

Tap Duration: Shaky and non-shaky participants had similar tap durations across all button sizes, with a mean of 1356 ms for shaky participants (approx. 7.6% longer) and 1260 ms for non-shaky participants. This suggests that while shaky participants took only slightly longer on average to complete taps, they might exhibit other differences in tap behavior (e.g., more misses or corrections).

Participants with shaky hands were further ana-

lyzed based on changes in velocity between the ﬁrst

and second halves of the tapping task. The analy-

sis revealed that 19 participants had an increase in

average velocity in the second half (average 32%,

ranging from 10% to 116%), suggesting familiarity/adaptation to the task over time or improved motor

control. One participant maintained similar velocities

across both halves, reﬂecting consistent performance

throughout the task. In contrast, only one participant

had a decrease in velocity (8%) in the second half,

possibly due to fatigue, loss of focus, or reduced mo-

tor control as the session progressed.
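The comparison between the first and second halves of the task reduces to a percentage change in mean velocity. In the sketch below, the per-tap velocity values are taken as given, because how velocity was computed (for example, finger travel distance over time) is not specified here. With an odd number of taps, such as 33, the middle tap is assigned to the second half. Both choices, and the helper `velocity_change_pct`, are assumptions.

```python
import numpy as np


def velocity_change_pct(velocities):
    """Percent change in mean velocity from the first to the second half of a task.

    `velocities` holds one value per tap in trial order; for an odd count the middle
    tap falls into the second half (an arbitrary choice).
    """
    v = np.asarray(velocities, dtype=float)
    half = len(v) // 2
    first, second = v[:half].mean(), v[half:].mean()
    return 100.0 * (second - first) / first


print(velocity_change_pct([1.0, 1.0, 1.3, 1.3]))  # approx. 30, i.e. a 30% increase
```

Positive values correspond to the speed-up reported for most participants, and negative values to a slowdown such as the 8% decrease above.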

Figure 1: Tap heatmaps for shaky vs. non-shaky participants.

Number of Attempts: The average number of attempts highlights the relationship between speed and accuracy. Shaky participants took more attempts, especially on smaller button sizes. At 48 dp, shaky

participants averaged 2.2 attempts, 83% higher than

1.2 among non-shaky participants. As button size in-

creased to 56 dp and 64 dp, the shaky group’s mean

attempts dropped to 1.2 and 1.1, respectively. Overall,

shaky participants required more attempts (1.6) across

all button sizes than non-shaky participants (1.1), sug-

gesting higher difﬁculty in achieving a successful tap

on the ﬁrst try, particularly for smaller targets.

Tap Accuracy: For square buttons with widths of 48, 56, and 64 dp, we assessed how often participants’ taps scattered away from the target center (Fig. 1) and landed outside the target area. For 48 dp, shaky participants had roughly

40% of taps outside, compared to about 13% for

non-shaky. This gap became smaller as the button

size increased: for 56 dp and 64 dp, shaky partici-

pants’ outside-tap proportion dropped to about 16%

and 10% as compared to 10% and 6% for non-shaky

participants, respectively. This possibly reﬂects that

bigger target areas can reduce the impact of hand

shakiness. Overall, across all sizes, shaky participants

still registered a higher mean proportion of taps out-

side (25%) than non-shaky (10%).
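Whether a tap lands outside a square target is a simple bounds check on its offset from the target center. The minimal NumPy sketch below assumes that tap and target coordinates share the same pixel space. Treating taps exactly on the edge as inside is our choice, and the helper `outside_rate` is hypothetical.

```python
import numpy as np


def outside_rate(taps, centers, widths):
    """Fraction of taps landing outside their square target.

    taps, centers: arrays of shape (n, 2) in px; widths: target side lengths in px, shape (n,).
    A tap counts as inside when both |dx| and |dy| are at most half the side length.
    """
    offsets = np.abs(np.asarray(taps, dtype=float) - np.asarray(centers, dtype=float))
    half = np.asarray(widths, dtype=float)[:, None] / 2.0
    inside = np.all(offsets <= half, axis=1)
    return 1.0 - inside.mean()


# Four hypothetical taps on 72 px targets centred at the origin: two inside, two outside.
taps = [[0, 0], [30, 0], [40, 40], [10, -50]]
rate = outside_rate(taps, centers=np.zeros((4, 2)), widths=np.full(4, 72.0))
print(rate)  # 0.5
```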

Participants with shaky hands had higher average tap deviations (distance) from the target button center than non-shaky participants: 57 px (79%) vs. 43 px (58%) for 48 dp, 51 px (61%) vs. 46 px (55%) for 56 dp, and 50 px (52%) vs. 45 px (49%) for 64 dp, with higher percentages indicating poorer precision. While the proportion of taps inside the button improved for shaky participants with larger button sizes, their tap distance decreased only slightly (from 57 px to 50 px), and the tap distance of non-shaky participants even increased (from 43 px to 45 px), possibly due to inconsistent control over larger tap ranges.
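The inside/outside check and the tap deviation are two different measurements of the same tap, and a short sketch helps show why they can move in different directions. The Python below assumes taps and button centers are logged in physical pixels and converts dp to px with Android's rule px = dp × dpi / 160. The 240 dpi density in the example is hypothetical, and so is `relative_deviation` (distance divided by button width in px): the normalization behind the percentages reported above is not specified here.

```python
import math


def dp_to_px(dp, dpi):
    """Convert Android density-independent pixels to physical pixels."""
    return dp * dpi / 160.0


def classify_tap(tap, center, width_dp, dpi):
    """Return (inside, distance_px, relative_deviation) for a tap on a square button.

    `tap` and `center` are (x, y) in physical pixels. relative_deviation is the
    distance divided by the button width in px, a hypothetical normalization.
    """
    width_px = dp_to_px(width_dp, dpi)
    dx, dy = tap[0] - center[0], tap[1] - center[1]
    half = width_px / 2.0
    inside = abs(dx) <= half and abs(dy) <= half
    distance = math.hypot(dx, dy)
    return inside, distance, distance / width_px


def outside_rate(taps, center, width_dp, dpi):
    """Proportion of taps that land outside the square target."""
    outside = [not classify_tap(t, center, width_dp, dpi)[0] for t in taps]
    return sum(outside) / len(outside)


print(classify_tap((524, 532), (500, 500), 48, 240)[:2])  # (True, 40.0)
```

In the example, the tap counts as inside a 48 dp (72 px) button even though it lies 40 px from the center, because it falls toward a corner where the half-width of 36 px is exceeded only diagonally.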

For the accessibility analysis of shaky participants, the on-screen buttons were categorized by their position within a 3x2 UI grid layout consisting of three rows (top, middle, bottom) and two columns (left, right). This division provided a structured framework for analyzing button appearances and interactions across distinct screen regions. For small buttons, the Top-Right cell was the most challenging, with 37% of taps outside the button boundaries. Medium-sized buttons improved accuracy at most locations, with the Bottom-Left and Top-Right cells showing 19% and 16% of taps outside boundaries, respectively. Large buttons achieved over 90% accuracy (taps inside) in most positions, demonstrating ease of use. Notably, Top-Right reached 100% accuracy, while Bottom-Left reached 92%.
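The grid categorization can be sketched in a few lines of Python. This version assumes equal-sized cells, pixel coordinates with the y axis pointing down (as on Android), and records of the form (button_x, button_y, inside_flag). The function names and the example screen size of 1080x2340 px are hypothetical.

```python
from collections import defaultdict


def grid_cell(x, y, screen_w, screen_h):
    """Map a button center (px) to a cell of a 3x2 grid, e.g. 'Top-Right'."""
    if not (0 <= x < screen_w and 0 <= y < screen_h):
        raise ValueError("point lies outside the screen")
    row = ("Top", "Middle", "Bottom")[int(3 * y / screen_h)]
    col = ("Left", "Right")[int(2 * x / screen_w)]
    return f"{row}-{col}"


def inside_rate_by_cell(records, screen_w, screen_h):
    """Share of taps inside the button, per grid cell.

    `records` is an iterable of (button_x, button_y, inside_flag).
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [inside_count, total]
    for x, y, inside in records:
        cell = grid_cell(x, y, screen_w, screen_h)
        totals[cell][0] += int(inside)
        totals[cell][1] += 1
    return {cell: hits / n for cell, (hits, n) in totals.items()}


print(grid_cell(900, 100, 1080, 2340))  # Top-Right
```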

In summary, participants with shaky hands tended to tap with durations similar to those of non-shaky participants but were less accurate (a higher outside-tap rate for smaller buttons). As button size increased from 48 dp to 64 dp, both groups showed improved accuracy (fewer outside taps) and required fewer attempts overall, indicating that larger targets help accommodate user variability, particularly for those with hand shakiness.

4.2.2 Task 2

Analysis of the dragging task revealed underlying differences in performance between participants with shaky and non-shaky hands when dragging a 56 dp button and dropping it into an 84 dp target. The mean drag duration was similar for both groups: approximately 1435 ms for shaky participants compared to 1450 ms for non-shaky participants. However, the standard deviation was 866 ms for shaky participants and 749 ms for non-shaky participants. This higher standard deviation suggests that the drag durations of shaky participants were more varied and less consistent than those of non-shaky participants.
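Because the two mean durations are nearly equal, comparing the standard deviations directly is reasonable. Dividing each by its mean (the coefficient of variation) makes the comparison explicit and unitless. This small Python check uses only the summary values reported above; applying `coefficient_of_variation` to raw per-trial durations is shown for illustration and is not the study's stated analysis.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(samples) / mean(samples)


# Using the summary statistics reported above (std / mean):
cv_shaky = 866 / 1435      # about 0.60
cv_non_shaky = 749 / 1450  # about 0.52
print(f"{cv_shaky:.2f} vs. {cv_non_shaky:.2f}")  # 0.60 vs. 0.52
```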

Overall, both groups required around 1.12 at-

tempts per trial, indicating comparable efﬁciency at

the task level despite the motor challenges faced by

shaky users. Success rates followed a similar pattern, with shaky participants achieving a success rate of about 91% and non-shaky participants around 92%.

Additional insights come from the velocity and acceleration metrics. Shaky users showed slightly higher mean velocities (877 px/s vs. 840 px/s) and higher variability (standard deviation of 467 px/s vs. 436 px/s), indicating faster yet more inconsistent movements than non-shaky users. The distribution of mean velocity in Fig. 2 showed notable differences in variability: non-shaky participants had a narrower distribution, indicating more consistent performance, whereas shaky participants had a broader spread, reflecting higher variability in their mean velocity. A more pronounced difference was observed in acceleration, with the mean acceleration difference of shaky participants 4,642 px/s² (31%) higher than that of non-shaky participants. This likely reflects abrupt or jerky changes in speed as participants corrected their drag path due to hand shakiness.

ICT4AWE 2025 - 11th International Conference on Information and Communication Technologies for Ageing Well and e-Health

Figure 2: Distribution of mean velocity for shaky vs. non-shaky participants.

Figure 3: Drag accuracy heatmaps (target box-view) for shaky vs. non-shaky participants.
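Velocity and acceleration metrics like these are typically derived from raw touch events by finite differences. The Python sketch below assumes each drag is logged as (timestamp in ms, x, y) in pixels; the event format, the first-order differencing, and the use of mean absolute acceleration as a summary are assumptions, not the study's documented pipeline.

```python
import math


def kinematics(samples):
    """Speeds (px/s) and accelerations (px/s^2) along a drag path.

    `samples` is a list of (t_ms, x_px, y_px) touch events ordered by time.
    Speeds use first-order differences between consecutive events; each speed
    is assigned to the midpoint time of its step, and accelerations are
    differences of consecutive speeds. Zero-length time steps are skipped.
    """
    speeds, times = [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = (t1 - t0) / 1000.0
        if dt <= 0:
            continue
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
        times.append((t0 + t1) / 2000.0)  # midpoint time in seconds
    accels = [
        (v1 - v0) / (s1 - s0)
        for v0, v1, s0, s1 in zip(speeds, speeds[1:], times, times[1:])
        if s1 > s0
    ]
    return speeds, accels


def mean_abs(values):
    """Mean absolute value, e.g. as one possible per-drag acceleration summary."""
    return sum(abs(v) for v in values) / len(values) if values else 0.0


speeds, accels = kinematics([(0, 0, 0), (100, 30, 40), (200, 90, 120)])
```

Here the two steps cover 50 px and 100 px in 100 ms each, so the speed rises from about 500 px/s to about 1000 px/s, giving an acceleration of roughly 5000 px/s².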

Despite the comparable success rates and numbers of attempts, participants with shaky hands dropped the button farther from the target center than non-shaky participants. Offset data (Fig. 3) show a mean distance of 181 px (+24%) for shaky participants, compared to 146 px for non-shaky participants. The higher offset indicates a tendency to drop the button near the edges of the target box, suggesting that participants still relied on the target's tolerance to complete the task. The variability in accuracy was also notably higher for shaky participants (std: 242 px) than for non-shaky participants (std: 150 px). Participants with shaky hands thus not only tended to overshoot the target but also interacted less consistently, underlining the critical impact of hand stability on precision.

4.2.3 Task 3

The analysis of the tracing task provides insights into the interaction differences between participants with comparatively shaky and non-shaky hand movements. The results show notable differences between the two groups in attempts, tracing durations, and deviations. Shaky participants required slightly more attempts per trial, averaging 1.4 (std: 0.8), compared to 1.3 attempts (std: 0.6) for non-shaky participants. Similarly, shaky participants had a longer mean tracing duration of 6129 ms (std: 3780 ms), whereas non-shaky participants completed the task faster, with an average duration of 4503 ms (std: 2751 ms). The total deviation from the expected path was also higher for shaky participants, averaging 45,183 px (std: 45,548 px), compared to 26,207 px (std: 27,603 px) for non-shaky participants. Fig. 4 shows how elderly users with shaky hands traced a square box, highlighting varied interaction patterns, including differences in path smoothness, deviations, movement dynamics, and completion times. While shaky participants needed only slightly more attempts than non-shaky participants, their longer tracing durations and higher deviations suggest increased difficulty in maintaining precise control during the task.

Figure 4: Elderly Users with Shaky Hands Tracing Patterns.
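One way to compute a total deviation like the one above is to sum, over every traced point, the shortest distance to the expected path. The Python sketch assumes the expected path is the outline of an axis-aligned square given by its top-left corner and side length in pixels; both this geometry and the summation are assumptions about how such totals could be obtained, not the study's confirmed formula.

```python
import math


def dist_to_square_outline(px, py, left, top, size):
    """Shortest distance (px) from a point to the outline of an axis-aligned square."""
    right, bottom = left + size, top + size
    dx = max(left - px, 0.0, px - right)
    dy = max(top - py, 0.0, py - bottom)
    if dx == 0.0 and dy == 0.0:
        # Inside (or on) the square: distance to the nearest edge.
        return min(px - left, right - px, py - top, bottom - py)
    # Outside: Euclidean distance to the nearest point of the square.
    return math.hypot(dx, dy)


def total_deviation(trace, left, top, size):
    """Sum of per-point distances between a traced path and the expected outline."""
    return sum(dist_to_square_outline(x, y, left, top, size) for x, y in trace)


print(total_deviation([(110, 300), (90, 300), (100, 250)], 100, 100, 400))  # 20.0
```

Because this total grows with the number of sampled points, longer or slower traces accumulate larger totals, which is worth keeping in mind when comparing groups with different tracing durations.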

The analysis of starting positions revealed that most tracing attempts began at the Top-Left corner (39 instances), followed by the Top-Right (16), with fewer starting at the Bottom-Left (7) or Bottom-Right (2). The tracing direction was predominantly clockwise (54 instances), with fewer counter-clockwise traces (10). These trends suggest a preference for specific starting points and movement patterns, offering insights for designing tasks that better align with user behaviors.
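Starting corner and tracing direction can both be inferred from the raw trace. The Python sketch below assumes the expected square is known (top-left corner and side length in px) and that the screen's y axis points down. Under that convention, a positive shoelace sum corresponds to a visually clockwise path. The helper names are hypothetical.

```python
def start_corner(first_point, left, top, size):
    """Name of the square corner closest to the first touch point."""
    corners = {
        "Top-Left": (left, top),
        "Top-Right": (left + size, top),
        "Bottom-Left": (left, top + size),
        "Bottom-Right": (left + size, top + size),
    }
    x, y = first_point
    return min(corners, key=lambda k: (corners[k][0] - x) ** 2 + (corners[k][1] - y) ** 2)


def trace_direction(trace):
    """Direction of a closed trace on a screen whose y axis points downward.

    Uses the shoelace formula: with y pointing down, a positive signed
    area corresponds to a visually clockwise path.
    """
    pts = list(trace)
    area2 = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))
    if area2 > 0:
        return "clockwise"
    if area2 < 0:
        return "counter-clockwise"
    return "undetermined"


square = [(100, 100), (500, 100), (500, 500), (100, 500)]
print(start_corner((105, 98), 100, 100, 400), trace_direction(square))  # Top-Left clockwise
```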

4.2.4 Observations

In addition to the data-driven analysis, we also ob-

served several important behaviors during the inter-

action tasks:

• Quick Taps: Overall, participants seemed to enjoy the tapping tasks, almost like a simple game. Many participants responded quickly when tapping on different screen locations.

• Adjusting Smartphone Position: Many participants were observed repositioning the smartphone with the hand holding it to compensate for the limited range of the hand interacting with the screen. This behavior allowed them to better align their dominant hand with the on-screen targets, potentially improving their interaction accuracy.

• Long Press: Despite the presence of vibration

feedback to conﬁrm successful taps, some partic-

ipants were observed holding their taps for ex-

tended durations. This behavior may reﬂect un-

certainty about whether the input was registered,

or an effort to stabilize their ﬁnger on the target.

• Preference for Anchoring: Participants frequently stabilized their elbows on surfaces such as their lap. During Tasks 1, 2, and 3, participants usually rested their elbows on their lap (20, 23, and 23, respectively) or kept them tucked close to their body (9, 11, and 9, respectively). This stabilization appeared to mitigate the effects of hand shakiness and provided better control during interaction tasks requiring finer precision.

4.3 Synthetic Data Generation for Shaky Hand Participants

4.3.1 Task 1

The training dataset consists of 828 tap events across different button sizes and grid locations. The mean Dx was 15 px (std: 91) and Dy was 23 px (std: 139), indicating that, on average, taps landed slightly upward and to the right of the button center. The mean tap time was 1,356 ms (std: 1,021), with a minimum of 148 ms and a maximum of 12,261 ms.

The GPR model configuration effectively captured both the structured patterns and the variability in the dataset. We generated a synthetic dataset 15 times the size of the training dataset (n = 12,420). The GPR-synthetic dataset closely mirrors the properties of the original dataset while remaining consistent across grid locations and button sizes. The synthetic data showed mean displacements (Dx: 16 px, Dy: 24 px) and a mean tap duration (1,231 ms) closely matching the original dataset. Variability, indicated by standard deviations, was slightly lower in the synthetic data (Dx: 86 px, Dy: 130 px) than in the original (Dx: 91 px, Dy: 138 px), but overall the synthetic data maintained the diversity of user tapping behaviors. Additionally, the synthetic data preserved the tapping difficulties that participants encountered by reproducing error rates across locations and button sizes (original: n = 135, 16.3%; synthetic: n = 1,977, 15.9%). The synthetic data closely replicated the original distributions for Dx, Dy, and time, as also seen in the plots in Fig. 5, Fig. 6, and Fig. 7, with aligned central peaks and preserved variability, including extreme ranges.

Figure 5: Task 1 - Tap Dx Density.

Figure 6: Task 1 - Tap Dy Density.

Figure 7: Task 1 - Tap Time Density.
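The GPR configuration is not detailed in this section, so the following NumPy sketch shows only the general technique of generating synthetic targets from a Gaussian process posterior. Several parts are assumptions: the squared-exponential (RBF) kernel with additive white noise, the fixed hyperparameters, numeric input features such as button size and grid row/column, and the choice to draw each synthetic point independently from the marginal predictive distribution at repeated training inputs (15 copies, matching the 15x scale above). One model per target (Dx, Dy, or tap time) would be fitted. scikit-learn's `GaussianProcessRegressor` with an `RBF() + WhiteKernel()` kernel would instead fit the hyperparameters by maximizing the log marginal likelihood.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential (RBF) kernel between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gpr_predict(x_train, y_train, x_new, length_scale=1.0, variance=1.0, noise=1.0):
    """Posterior predictive mean and variance of a zero-mean GP on centered targets."""
    y_offset = y_train.mean()
    k_train = rbf_kernel(x_train, x_train, length_scale, variance)
    chol = np.linalg.cholesky(k_train + noise * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_offset))
    k_cross = rbf_kernel(x_new, x_train, length_scale, variance)
    mean = y_offset + k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    # The RBF kernel's diagonal equals `variance`; adding `noise` gives the
    # predictive variance of a new observation, keeping realistic scatter.
    var = np.maximum(variance - np.sum(v**2, axis=0) + noise, 0.0)
    return mean, var


def gpr_synthesize(x_train, y_train, copies=15, seed=0, **kernel_params):
    """Draw `copies` synthetic targets per training input from the predictive distribution."""
    x_new = np.repeat(x_train, copies, axis=0)
    mean, var = gpr_predict(x_train, y_train, x_new, **kernel_params)
    rng = np.random.default_rng(seed)
    return x_new, mean + np.sqrt(var) * rng.standard_normal(len(mean))
```

Sampling the marginals independently avoids building a covariance matrix over all 12,420 synthetic points, at the cost of ignoring correlations between synthetic samples.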

4.3.2 Task 2

For Task 2, 225 time series were used to train the model and generate synthetic data. A Bidirectional LSTM layer with 512 units was configured to process combined numerical features (x and y drag paths) and categorical features (time stamps/intervals). The model improved rapidly during the initial epochs, with substantial reductions in MSE and MAE by Epoch 6, followed by gradual convergence after Epoch 12. Minor loss oscillations after Epoch 16 likely reflected the model's fine-tuning of predictions, driven by the interplay between numerical and categorical features.

Figure 8: Task 2 - Drag x-axis Distribution.

Figure 9: Task 2 - Drag y-axis Distribution.
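A sequence model of this kind needs its inputs scaled and cut into training windows. The Python sketch below shows one common preparation under stated assumptions: per-feature min-max scaling to [-1, 1] (the range mentioned later for the WD computation) and fixed-length windows that predict the next step. The window size, function names, and next-step setup are hypothetical and not confirmed as the study's procedure. The Bidirectional LSTM itself (for example, a Keras `Bidirectional(LSTM(512))` layer) is not reproduced here.

```python
def minmax_scale(values, lo=-1.0, hi=1.0):
    """Scale one feature linearly into [lo, hi]; also return the (min, max) used."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # guard against constant features
    scaled = [lo + (v - vmin) * (hi - lo) / span for v in values]
    return scaled, (vmin, vmax)


def inverse_minmax(scaled, vmin, vmax, lo=-1.0, hi=1.0):
    """Map scaled values (e.g., generated ones) back to the original units."""
    span = (vmax - vmin) or 1.0
    return [vmin + (s - lo) * span / (hi - lo) for s in scaled]


def sliding_windows(series, window):
    """Split a series into (input window, next step) training pairs."""
    return [(series[i:i + window], series[i + window]) for i in range(len(series) - window)]


print(minmax_scale([0, 5, 10]))  # ([-1.0, 0.0, 1.0], (0, 10))
```

Keeping the (min, max) pair per feature matters because generated sequences must be mapped back to pixels and milliseconds before they can be compared with the original data.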

The synthetic data closely replicated the original dataset's distribution for both the X and Y axes, with slightly higher means (X: 125 px vs. 116 px, Y: 299 px vs. 289 px), standard deviations (X: 183 px vs. 178 px, Y: 373 px vs. 367 px), and medians (X: 46 px vs. 39 px, Y: 146 px vs. 135 px), while the ranges remained consistent. The synthetic data slightly overestimates the average starting drag positions (points), with the x-axis averaging 366 px compared to 339 px (ranges: 20–878 px vs. 3–850 px) and the y-axis averaging 821 px compared to 793 px (ranges: 52–1,838 px vs. 30–1,794 px).

Overall, these results indicate that the synthetic data effectively captures the distributional characteristics of the original dataset for both axes, with only minor deviations. The histograms in Fig. 8 and Fig. 9 also show an overlap between the original and synthetic data distributions. Both distributions peak near zero, reflecting users' approach toward the target location.

For timestamp intervals, the synthetic data closely matched the original in mean (146 ms), median (144 ms), and standard deviation (84 ms), replicating the overall distribution effectively. However, it introduced negative values (min: -9 ms) that were not present in the original dataset and overestimated the maximum time interval (306 ms vs. 290 ms), extending the range slightly.

Figure 10: Task 2 - Synthetic time series Nearest Neighbors.

Figure 11: Task 2 - PCA and T-SNE Analysis.

Nearest neighbor (NN) analysis was conducted to compare each synthetic time series with its three closest neighbors in the original dataset, based on similarity in time series patterns. Both synthetic and original time series demonstrated consistent monotonic decreases in relative distance over time, indicating that the synthetic data effectively captures the temporal and structural patterns of the original dataset (see Fig. 10).
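A nearest-neighbor lookup of this kind can be sketched in plain Python. The similarity measure is not specified in this section, so the sketch assumes one coordinate at a time (for example, the x path), linear resampling of every series to a common length, and Euclidean distance. The resampling length of 50 and the function names are hypothetical.

```python
import math


def resample(series, n):
    """Linearly resample a 1-D series to n points (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if len(series) == 1:
        return [float(series[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(series) - 1) / (n - 1)
        j = min(int(pos), len(series) - 2)
        frac = pos - j
        out.append(series[j] * (1 - frac) + series[j + 1] * frac)
    return out


def nearest_neighbors(synthetic, originals, k=3, n=50):
    """(index, distance) pairs for the k original series closest to `synthetic`."""
    s = resample(synthetic, n)
    dists = sorted(
        (math.dist(s, resample(o, n)), idx) for idx, o in enumerate(originals)
    )
    return [(idx, d) for d, idx in dists[:k]]
```

Resampling makes series of different durations comparable point by point, but it also removes differences in speed, so timing differences would need to be assessed separately (for example, through the timestamp intervals).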

The PCA plot also showed an overlap between synthetic and original data (Fig. 11), indicating that the synthetic data captures the global variance and diversity of the original. The t-SNE plot further demonstrates that the synthetic data maintains local structures and clustering, replicating intricate patterns of the original dataset.
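A PCA overlap check can be sketched with NumPy's SVD. The section does not state whether PCA was fitted on the original data alone or on pooled data; this sketch assumes fitting on the original data only, so that overlap is judged in the real data's coordinate frame, and that each row is a fixed-length (e.g., resampled and flattened) time series. For the t-SNE view, scikit-learn provides `sklearn.manifold.TSNE`, which is not shown here.

```python
import numpy as np


def pca_project(original, synthetic, n_components=2):
    """Project both datasets onto principal axes fitted to the original data.

    Rows are samples (e.g., fixed-length, flattened time series); columns are features.
    """
    center = original.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(original - center, full_matrices=False)
    axes = vt[:n_components].T
    return (original - center) @ axes, (synthetic - center) @ axes
```

Plotting the two projected arrays on the same axes then shows whether the synthetic samples occupy the same region as the original ones.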

Furthermore, we calculated the Wasserstein Distance (WD) for each synthetic time series by comparing it with every time series in the original dataset. The results demonstrated strong alignment, with a mean WD of 0.00099, a median of 0.00063, and a standard deviation of 0.00166 across 225 samples, indicating consistent similarity and minimal distance from the closest original time series. The maximum WD was 0.02223, which remains acceptable given that the data were normalized to the small range (-1, 1). These findings confirm high fidelity in the synthetic data generation process, with only minor variations in a few cases.
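For reference, the one-dimensional Wasserstein-1 distance between two empirical distributions is the area between their cumulative distribution functions. The plain-Python sketch below assumes that each (normalized) series is treated as a distribution of values, ignoring time order, and that the per-series score is the distance to the closest original series; both are assumptions about the reported procedure. SciPy's `scipy.stats.wasserstein_distance` computes the same one-dimensional quantity.

```python
def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Integrates |F_u(x) - F_v(x)| over x, where F_u and F_v are the
    empirical cumulative distribution functions.
    """
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    iu = iv = 0
    for a, b in zip(points, points[1:]):
        # Count values <= a in each sample: the CDFs are constant on [a, b).
        while iu < len(u) and u[iu] <= a:
            iu += 1
        while iv < len(v) and v[iv] <= a:
            iv += 1
        total += abs(iu / len(u) - iv / len(v)) * (b - a)
    return total


def closest_original_wd(synthetic, originals):
    """Smallest WD between one synthetic series and any original series."""
    return min(wasserstein_1d(synthetic, o) for o in originals)


print(wasserstein_1d([0, 1, 3], [5, 6, 8]))  # ~5.0 (a pure shift of 5)
```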

Since the JS distance is bounded between 0 and 1, the results indicate a small distance between the synthetic and original time series distributions. Across 225 samples, the mean JS distance was 0.15088, with a standard deviation of 0.04881, suggesting consistent yet slightly varied similarity levels. The minimum JS distance was 0.05950 and the maximum was 0.35904, highlighting a few cases with comparatively higher distance. Overall, the small WD values reflect good spatial similarity, whereas the JS values highlight minor discrepancies in the relative probability distributions of the synthetic and original data.
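The JS distance is the square root of the Jensen–Shannon divergence, and with base-2 logarithms it lies in [0, 1]. The plain-Python sketch below assumes both series are first binned into histograms over shared bin edges; the binning scheme and function names are hypothetical. SciPy's `scipy.spatial.distance.jensenshannon(p, q, base=2)` computes the same distance from two probability vectors.

```python
import bisect
import math


def histogram(values, edges):
    """Normalized histogram over ascending bin edges (last bin includes its right edge)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:
            counts[-1] += 1
        elif edges[0] <= v < edges[-1]:
            counts[bisect.bisect_right(edges, v) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance between two probability vectors, base-2 logs."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0 since y = mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


print(js_distance([1, 0], [0, 1]))  # 1.0 (no overlap at all)
```

Because JS compares binned probabilities rather than distances along the value axis, it can flag shape differences that the WD values, which stay small when mass is shifted only slightly, do not.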

4.3.3 Task 3

We used 38 time series for training and generating synthetic data for Task 3. A single Bidirectional LSTM layer with 512 units was configured to process the combined numerical features (x and y trace paths) and categorical features (time stamps/intervals). The synthetic data replicated the statistical properties of the original dataset well. For trace points, the means were similar (X: 371 px vs. 375 px, Y: 379 px vs. 384 px), with slight reductions in variability (X std: 316 px vs. 321 px, Y std: 321 px vs. 325 px). The ranges were slightly narrower in the synthetic data, X: [-120, 868] vs. [-127, 888] and Y: [-77, 860] vs. [-81, 870], reflecting a minor smoothing effect. Temporal features were also consistent, with time increment means of 162 (synthetic) vs. 163 (original) and nearly identical time interval means (1.53 vs. 1.54). These results highlight the synthetic dataset's ability to preserve spatial and temporal patterns while still covering most of the range of extreme values. The histograms in Fig. 12 and Fig. 13 also show an overlap between the original and synthetic data distributions. Fig. 14 plots a generated time series sample against its closest sample in the original data.

Nearest neighbor (NN) analysis showed that the synthetic time series effectively replicate the peak structure, timing, and variability of their closest neighbors in the original dataset (see Fig. 15). The PCA and t-SNE visualizations (Fig. 16) compare the diversity and distribution of synthetic and original data for the box-tracing task, showing that the synthetic data closely aligns with the original data in capturing the geometric structure (PCA) and clustering patterns (t-SNE) of the task.

Figure 12: Task 3 - Trace x-axis Distribution.

Figure 13: Task 3 - Trace y-axis Distribution.

Figure 14: Task 3 - Sample 1 Trace Plot.

Figure 15: Task 3 - Synthetic time series Nearest Neighbors.


Figure 16: Task 3 - PCA and T-SNE Analysis.

We calculated the WD for each generated time series. The results show that the generated time series required little work (distance) to be moved onto their closest time series in the original data, with a mean WD of 0.00158 and a median of 0.000657. The low standard deviation (0.00217) reflects consistent fidelity: the synthetic data closely replicates the original time series patterns with minimal variation. The JS distance results also demonstrate high fidelity in the synthetic data generation process, with a low mean distance (0.0796), median (0.0556), and standard deviation (0.0575).

5 DISCUSSION

5.1 RQ.1: GUI Interaction Patterns Among Elderly Users

Our findings align with previous studies highlighting the challenges of tapping for elderly users, particularly those with shaky hands. For instance, one study reported that smaller target sizes raise error rates (Hwangbo et al., 2013), which our results confirm: participants with shaky hands showed a 40% error rate for 48 dp buttons compared to 13% for non-shaky participants. Although Google's accessibility guidelines for mobile user interfaces suggest a minimum touch target size of 48 dp (Google, 2023), our study found that this size may still be insufficient for elderly users with shaky hands. Therefore, increasing button sizes beyond the standard recommendations could further improve accessibility for this demographic. Similarly, several studies have advocated larger targets to improve accessibility (Hwangbo et al., 2013; Nicolau et al., 2014; Kobayashi et al., 2011; Salman et al., 2023), a recommendation supported by our finding that error rates for shaky participants decreased markedly with larger button sizes (16% for 56 dp and 10% for 64 dp).

Our study also provided insights into the speed-

accuracy trade-offs in tapping tasks. We observed that

shaky participants managed to tap with a similar du-

ration to non-shaky participants, although with higher

error rates. This ﬁnding suggests a behavioral adapta-

tion or a compensatory mechanism where users prior-

itize speed over precision. Notably, participants gen-

erally approached tapping tasks as a game, respond-

ing quickly, which indicates positive engagement,

aligning with ﬁndings that elderly users often enjoy

simpliﬁed, gamiﬁed interactions (An et al., 2024).

The observed diversity in interaction strategies em-

phasizes the need for customizable interfaces (Brun-

zini et al., 2022). While some users prefer quick

taps, others may prefer slower, more deliberate inter-

actions, indicating that a one-size-ﬁts-all approach is

insufﬁcient to accommodate diverse user preferences.

Previous research has identiﬁed dragging as par-

ticularly challenging for elderly users (Salman et al.,

2019; Brunzini et al., 2022; Shao et al., 2023). Our

ﬁndings revealed minimal differences between shaky

and non-shaky participants in duration and attempts

during drag-and-drop tasks. However, shaky par-

ticipants showed higher variability in velocities and

higher accelerations, along with a 24% higher offset

distance from the target center, indicating a tendency

to overshoot targets more frequently.

While (Shao et al., 2023) described a two-phase

dragging approach, initial movement followed by pre-

cision ﬁne-tuning, our ﬁndings did not identify a dis-

tinct calibration phase. Instead, the higher veloc-

ity and acceleration metrics (indicating abrupt, jerky

movements) among shaky participants suggest re-

liance on corrective actions during drags or traces

rather than a deliberate two-step strategy.

5.2 RQ.2: Synthetic Data Generation

A key challenge is generating synthetic data that ac-

curately reﬂects the complexities of real user behav-

ior. Synthetic data should preserve the underlying

patterns and behaviors of the original dataset. Find-

ings presented and discussed in Sec. 4.3 show that our

generative models, GPR for tapping and Bidirectional

LSTM for dragging and tracing, effectively captured

older adults’ UI interaction patterns with high ﬁdelity

in spatial, temporal, and distributional patterns. Beyond visual or descriptive comparisons, quantitative assessments such as Nearest Neighbor analyses, Wasserstein Distances (WD), and Jensen–Shannon (JS) Distances also showed minimal divergence between real and synthetic data. Overall, GPR and LSTM models are capable of capturing the distinctive gestures of the elderly, particularly when the training data includes variations in button sizes, grid locations, and drag/trace paths.


The ﬁndings also highlight the potential of syn-

thetic data to replicate human interaction patterns,

aligning with earlier research (Breuer et al., 2024)

on its effectiveness in data-scarce scenarios. Simi-

larly, (Brandt and Dasgupta, 2023) highlighted syn-

thetic data utility in modeling complex behaviors, re-

inforcing its value as a complement to real user inter-

actions in usability and accessibility evaluations. The

synthetic datasets’ ability to generate user “error” be-

haviors (e.g., off-target taps) is critical for modeling

real-world usability: systems designed for the elderly

must account for occasional mis-taps or mis-drags

due to reduced dexterity or low precision. By reproducing these errors, synthetic datasets can help evaluators anticipate where and how elderly users with shaky hands might struggle.

Beyond the purely technical metrics, an important

consequence of these high-ﬁdelity results is the op-

portunity to scale up usability and accessibility evalu-

ations of early-stage design/prototypes. Since recruit-

ing the elderly can be challenging, large and diverse

synthetic datasets can be generated to test multiple

interface layouts or interaction elements. Thus, syn-

thetic data can further support the development of AI-

driven usability evaluation tools, discussed in our pre-

vious study (Maqbool et al., 2024), promoting inno-

vative and cost-effective design evaluation processes.

5.3 Implications

The study’s ﬁndings present the following key re-

search and practical implications:

• Existing accessibility guidelines may be insufficient for users with motor impairments. The study showed that the minimum recommended UI element size still produced markedly higher error rates for participants with shaky hands, suggesting that accessibility standards and guidelines require further empirical validation for such user groups.

• Increasing button sizes reduced errors and im-

proved accuracy for elderly users, particularly

those with shaky hands. UX designers, in general,

should prioritize larger, well-spaced UI elements

to improve accessibility.

• The study revealed variability in elderly users’ in-

teraction styles—some preferred quick taps de-

spite lower accuracy, while others took a more de-

liberate approach. A one-size-ﬁts-all approach is

inadequate; applications should provide adaptable

GUIs or user-adjustable settings (e.g., touch sen-

sitivity, input delay buffers) to accommodate dif-

ferent motor abilities. Furthermore, future work

can explore how built-in device sensors (e.g.,

accelerometer) could automatically detect hand

shakiness and trigger adaptive GUIs.

• Some elderly users held their taps longer than nec-

essary, possibly due to uncertainty about whether

the input was registered. More explicit haptic, vi-

sual, or audio feedback mechanisms should be in-

tegrated to conﬁrm user actions and reduce uncer-

tainty.

• The study demonstrated that synthetic data can

closely replicate real user interactions, including

errors like mis-taps and inaccurate drag comple-

tions. Future research should explore synthetic

data’s applicability to additional interaction types,

such as scrolling through lists and on-screen key-

board usage.

• The high ﬁdelity of generated synthetic data sug-

gests that early-stage usability and accessibility

evaluations can be conducted using ML-generated

GUI interaction datasets before involving actual

users. This may reduce the time and costs of itera-

tive UI prototyping, particularly for accessibility-

focused design. Additionally, AI-generated syn-

thetic users could complement real-world user

testing, enabling hybrid human-AI usability eval-

uation methods.

5.4 Threats to Validity

External Validity: Our study included elderly par-

ticipants from different countries, like Sweden, Pak-

istan, Italy, and Germany. However, we acknowl-

edge that global representation and generalizability

require even wider demographic diversity. Nonethe-

less, securing 51 participants and conducting a com-

prehensive analysis was a signiﬁcant achievement, es-

pecially given the ethical, privacy, and resource con-

straints inherent in such research. Conducting tasks

in natural environments also helped ensure realistic

assessments of smartphone interactions.

Construct Validity: A threat was that our

threshold-based clustering could lead to under- or

over-estimation of “shaky” or “non-shaky” partici-

pants. To mitigate this, we used cross-validations with

PSD analysis, band-pass ﬁltering, and clustering al-

gorithms like K-means to reﬁne and validate the clus-

tering process. The inclusion of speciﬁc frequency

bands also ensured alignment with the literature on

motor control and shakiness.

Internal Validity: For RQ.1, the study refrains from making causal claims and instead focuses on presenting data and argumentation to explore elderly interaction patterns and associated challenges. However, uncontrollable factors, such as environmental variations, may have influenced the results.

6 CONCLUSIONS AND FUTURE WORK

This study explored the smartphone interaction patterns of elderly users, focusing on those with shaky hands, through purpose-designed touchscreen tasks. Participants with shaky hands encountered distinct difficulties in touchscreen interactions, especially with smaller buttons, abrupt changes in velocity during dragging tasks, and path deviations during tracing tasks, where precision and stability were critical. In contrast, larger GUI elements were more effective in accommodating their variability in motor control.

GPR and LSTM models successfully generated

synthetic datasets, replicating interaction patterns

with high spatial, temporal, and distributional ﬁdelity,

demonstrating their utility for future AI-driven usabil-

ity and accessibility evaluation research.

Future studies could explore complex interactions

like scrolling, multi-gesture, and text input, include

participants with motor impairments (e.g., Parkin-

son’s), and investigate adaptive UI designs that adjust

to motor limitations. Synthetic datasets can also be

used to develop predictive tools for accessibility and

usability evaluations.

ACKNOWLEDGEMENTS

This work was partly funded by Region Värmland through the DHINO project (ref: RUN/220266) and DHINO 2 project (ref: 2023/828).

REFERENCES

An, S., Cheung, C. F., and Willoughby, K. W. (2024).

A gamiﬁcation approach for enhancing older adults’

technology adoption and knowledge transfer: A case

study in mobile payments technology. Technological

Forecasting and Social Change, 205:123456.

Beck, J. and Chakraborty, S. (2024). Fully embedded time

series generative adversarial networks. Neural Com-

puting and Applications, pages 1–10.

Brandt, B. and Dasgupta, P. (2023). Synthetically gener-

ating human-like data for sequential decision-making

tasks via reward-shaped imitation learning. In Syn-

thetic Data for Artiﬁcial Intelligence and Machine

Learning: Tools, Techniques, and Applications, vol-

ume 12529, pages 151–163. SPIE.

Breuer, T., Fuhr, N., and Schaer, P. (2024). Validating syn-

thetic usage data in living lab environments. ACM

Journal of Data and Information Quality, 16(1):1–33.

Brunzini, A., Papetti, A., Grassetti, F., Moroncini, G., and

Germani, M. (2022). The effect of systemic sclero-

sis on use of mobile touchscreen interfaces: Design

guidelines and physio-rehabilitation. International

Journal of Industrial Ergonomics, 87:103256.

Butcher, C. J. and Hussain, W. (2022). Digital healthcare:

the future. Future healthcare journal, 9(2):113–117.

Dannels, S. (2023). Creating disasters: Recession fore-

casting with gan-generated synthetic time series data.

arXiv preprint arXiv:2302.10490.

Davies, K. (2024). Share of smartphone users in germany 2021, by age group. https://www.statista.com/statistics/469969/share-of-smartphone-users-in-germany-by-age-group/.

Elboim-Gabyzon, M., Weiss, P. L., and Danial-Saad, A.

(2021). Effect of age on the touchscreen manipula-

tion ability of community-dwelling adults. Interna-

tional Journal of Environmental Research and Public

Health, 18(4):2094.

Elguera Paez, L. and Zapata Del Río, C. (2019). Elderly

users and their main challenges usability with mobile

applications: a systematic review. In Design, User Ex-

perience, and Usability. Design Philosophy and The-

ory: 8th International Conference, DUXU 2019, Held

as Part of the 21st HCI International Conference,

HCII 2019, Orlando, FL, USA, July 26–31, 2019, Pro-

ceedings, Part I 21, pages 423–438. Springer.

Google (2023). Material design guidelines - touch target size. https://support.google.com/accessibility/android/answer/7101858?hl=en. Accessed: 2024-03-01.

Heida, T., Wentink, E. C., and Marani, E. (2013). Power

spectral density analysis of physiological, rest and ac-

tion tremor in parkinson’s disease patients treated with

deep brain stimulation. Journal of neuroengineering

and rehabilitation, 10:1–11.

Hess, C. W. and Pullman, S. L. (2012). Tremor: clinical

phenomenology and assessment techniques. Tremor

and other hyperkinetic movements, 2.

Hochreiter, S. (1997). Long short-term memory. Neural

Computation MIT-Press.

Hwangbo, H., Yoon, S. H., Jin, B. S., Han, Y. S., and Ji,

Y. G. (2013). A study of pointing performance of el-

derly users on smartphones. International Journal of

Human-Computer Interaction, 29(9):604–618.

Islam, M. M., Nooruddin, S., Karray, F., and Muhammad,

G. (2022). Human activity recognition using tools of

convolutional neural networks: A state of the art re-

view, data sets, challenges, and future prospects. Com-

puters in biology and medicine, 149:106060.

Jamshidi, A., Arif, M., Kalhoro, S. A., and Gelbukh, A.

(2024). Synthetic time series data generation for

healthcare applications: A pcg case study. arXiv

preprint arXiv:2412.16207.

Jiang, T., Li, W., and Liu, J. (2024). The landscape of data

reuse in interactive information retrieval: Motivations,

sources, and evaluation of reusability. arXiv preprint

arXiv:2411.15430.


Joshi, S. G. (2018). Confronting common assumptions

about the psychomotor abilities of older adults in-

teracting with touchscreens. In Human Aspects of

IT for the Aged Population. Acceptance, Communica-

tion and Participation: 4th International Conference,

ITAP 2018, Held as Part of HCI International 2018,

Las Vegas, NV, USA, July 15–20, 2018, Proceedings,

Part I 4, pages 261–278. Springer.

Kobayashi, M., Hiyama, A., Miura, T., Asakawa, C., Hirose, M., and Ifukube, T. (2011). Elderly user evaluation of mobile touchscreen interactions. In Human-Computer Interaction–INTERACT 2011: 13th IFIP TC 13 International Conference, Lisbon, Portugal, September 5–9, 2011, Proceedings, Part I 13, pages 83–99. Springer.

Lin, Z., Jain, A., Wang, C., Fanti, G., and Sekar, V. (2020). Using GANs for sharing networked time series data: Challenges, initial promise, and open questions. In Proceedings of the ACM Internet Measurement Conference, pages 464–483.

Maqbool, B. and Herold, S. (2024). Potential effectiveness and efficiency issues in usability evaluation within digital health: A systematic literature review. Journal of Systems and Software, 208:111881.

Maqbool, B., Jalal, L., and Herold, S. (2024). Towards using synthetic user interaction data in digital healthcare usability evaluation. In BIOSTEC (2), pages 595–603.

Nicolau, H., Guerreiro, T., Jorge, J., and Gonçalves, D. (2014). Mobile touchscreen user interfaces: bridging the gap between motor-impaired and able-bodied users. Universal Access in the Information Society, 13:303–313.

Nurgalieva, L., Laconich, J. J. J., Baez, M., Casati, F., and Marchese, M. (2019). A systematic literature review of research-derived touchscreen design guidelines for older adults. IEEE Access, 7:22035–22058.

O’Dea, S. (2021). UK: smartphone ownership by age from 2012–2021. Online. https://www.statista.com/statistics/271851/smartphone-owners-in-the-united-kingdom-uk-by-age.

Pew Trusts (2019). Poor usability of electronic health records can lead to drug errors, jeopardizing pediatric patients. Accessed: 2024-12-25.

Polvorinos-Fernández, C., Sigcha, L., de Pablo, L. P., Borzì, L., Cardoso, P., Costa, N., Costa, S., López, J. M., de Arcas, G., and Pavón, I. (2024). Evaluation of the performance of wearables’ inertial sensors for the diagnosis of resting tremor in Parkinson’s disease. In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2024), volume 2, pages 820–827. SCITEPRESS.

Ranja, F., Nababan, E. B., and Candra, A. (2023). Synthetic data generation using time-generative adversarial network (Time-GAN) to predict cash ATM. In 2023 International Conference on Computer, Control, Informatics and its Applications (IC3INA), pages 418–423. IEEE.

Ratwani, R. M., Savage, E., Will, A., Fong, A., Karavite, D., Muthu, N., Rivera, A. J., Gibson, C., Asmonga, D., Moscovitch, B., et al. (2018). Identifying electronic health record usability and safety challenges in pediatric settings. Health Affairs, 37(11):1752–1759.

Salman, H. M., Wan Ahmad, W. F., and Sulaiman, S. (2019). Usability evaluation of smartphone gestures in supporting elderly users. In Advances in Visual Informatics: 6th International Visual Informatics Conference, IVIC 2019, Bangi, Malaysia, November 19–21, 2019, Proceedings 6, pages 672–683. Springer.

Salman, H. M., Wan Ahmad, W. F., and Sulaiman, S. (2023). A design framework of a smartphone user interface for elderly users. Universal Access in the Information Society, 22(2):489–509.

Schulz, E., Speekenbrink, M., and Krause, A. (2018). A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. Journal of Mathematical Psychology, 85:1–16.

Schwarz, C. (2024). Interpretable GenAI: Synthetic financial time series generation with probabilistic LSTM. Available at SSRN 4877007.

Shamsujjoha, M., Grundy, J., Li, L., Khalajzadeh, H., and Lu, Q. (2021). Human-centric issues in eHealth app development and usage: A preliminary assessment. In 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 506–510. IEEE.

Shao, Y., Zhou, J., and Wang, W. (2023). Smartphone touch gesture for right-handed older adults: touch performance and offset models. Journal of Ambient Intelligence and Humanized Computing, 14(3):2549–2566.

Sheppard, B., Kouyoumjian, G., Sarrazin, H., and Dore, F. (2018). The business value of design. McKinsey & Company.

Sinabell, I. and Ammenwerth, E. (2024). Challenges and recommendations for eHealth usability evaluation with elderly users: systematic review and case study. Universal Access in the Information Society, 23(1):455–474.

Stenger, M., Leppich, R., Foster, I., Kounev, S., and Bauer, A. (2024). Evaluation is key: a survey on evaluation measures for synthetic time series. Journal of Big Data, 11(1):66.

Stephenson, A., Allison, R., and Pyzer-Knapp, E. (2022). Provably reliable large-scale sampling from Gaussian processes. arXiv preprint arXiv:2211.08036.

Susiluoto, J., Spantini, A., Haario, H., Härkönen, T., and Marzouk, Y. (2020). Efficient multi-scale Gaussian process regression for massive remote sensing data with satGP v0.1.2. Geoscientific Model Development, 13(7):3439–3463.

Wegge, K. P. and Zimmermann, D. (2007). Accessibility, usability, safety, ergonomics: concepts, models, and differences. In Universal Access in Human Computer Interaction. Coping with Diversity: 4th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2007, Held as Part of HCI International 2007, Beijing, China, July 22–27, 2007, Proceedings, Part I 4, pages 294–301. Springer.

Yoon, J., Jarrett, D., and Van der Schaar, M. (2019). Time-series generative adversarial networks. Advances in Neural Information Processing Systems, 32.

ICT4AWE 2025 - 11th International Conference on Information and Communication Technologies for Ageing Well and e-Health
