Context Data Compact Prediction Tree (CD-CPT): Transforming User Experience Through Predictive Analysis

Pooja Goyal, Md Khorrom Khan, Natnael Teshome, Brendan Geary and Renee Bryce

Computer Science & Engineering, University of North Texas, Denton, Texas, U.S.A.

Computer Science, Florida Polytechnic University, Florida, U.S.A.


Keywords:

Context-Aware Applications, Sequence Prediction, Sequential Rule Mining, Compact Prediction Tree, Transition Directed Acyclic Graph, Prediction by Partial Matching, All-k Order Markov, Dependency Graph, Android Testing, Context Aware Environments, Mobile Application Testing.

Abstract:

The use of IoT (Internet of Things) devices has increased significantly over the last decade, and smartphones in particular, more so than desktops and laptops, have become an integral part of our everyday lives. Smartphone applications operate in dynamic environments and generate vast numbers of context events, such as changes in screen orientation, location, battery life, and network connectivity, throughout the day. Such context events may affect how users use the smartphone and its applications, as well as the behaviour of those applications. The sparsity and complexity of these events make it difficult to identify patterns and trends in the data using traditional data mining techniques. Hence, predictive analysis of these events and the discovery of patterns in context event data can have a substantial impact on application usage and enhance user experience. Prediction trees can be used to predict future events based on the context of past events. This work proposes a modified version of the Compact Prediction Tree (CPT), called Context Data Compact Prediction Tree (CD-CPT), to predict real-world context data for multiple users. The experiments use the Transition Directed Acyclic Graph (TDAG) and All-k Order Markov (AKOM) algorithms to generate short-term predictions based on current context events and compare them with baseline models: Prediction by Partial Matching (PPM), Dependency Graph (DG), CPT, and CPT+. The experimental results indicate that AKOM and TDAG outperform the other algorithms, achieving a 50.4% weighted F-1 score for the highest supported context event. CD-CPT, without referencing the test file, still achieves a 14.27% weighted F-1 score for the same event, showing potential for improved accuracy in predicting context data compared to other algorithms.

1 INTRODUCTION

As smartphone usage continues to rise worldwide, the growing availability of these devices necessitates the optimization of applications for improved efficiency and security (Data.ai, 2022). Numerous devices employ context-aware apps that adapt to changes in their surroundings, as Android applications are evolving to be more complex and sophisticated (Android Developers, 2022). Smartphones, equipped with complex hardware and software, generate an extensive range of data about context events, e.g., connecting to WiFi, connecting/disconnecting a headset, changing screen orientation, Bluetooth changes, location changes, etc. (Rahmati and Zhong, 2012). By interpreting this data, we can better utilize time and resources, while also providing developers with valuable information for testing applications (Goyal et al., 2023) with respect to important sequences of context events.

This study primarily aims to determine whether algorithms from SPMF (Fournier Viger, 2016), including a modified version of the Compact Prediction Tree (CPT) (Gueniche et al., 2013) called Context Data Compact Prediction Tree (CD-CPT), Transition Directed Acyclic Graph (TDAG) (Laird and Saul, 1994), and All-k Order Markov (AKOM) (Pitkow and Pirolli, 1999), can identify patterns in real-world context data sequences from users. We compare AKOM and TDAG to a majority baseline of algorithms, including CPT, Compact Prediction Tree+ (CPT+) (Gueniche et al., 2015), Dependency Graph (DG) (Padmanabhan and Mogul, 1996), and Prediction by Partial Matching (PPM) (Cleary and Witten, 1984). We utilize a dataset of context data from 58 real-world Android users on different devices, collected over 30-day periods using the ContextMon application (Piparia et al., 2021; Goyal et al., 2023). We modified the Java version of SPMF (Fournier-Viger et al., 2014) to enable the CPT algorithm to generate multiple context data sequences representing different use cases.

Goyal, P., Khan, M., Teshome, N., Geary, B. and Bryce, R.
Context Data Compact Prediction Tree (CD-CPT): Transforming User Experience Through Predictive Analysis.
DOI: 10.5220/0012615800003705
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 9th International Conference on Internet of Things, Big Data and Security (IoTBDS 2024), pages 166-173
ISBN: 978-989-758-699-6; ISSN: 2184-4976

This work uses SPMF (Fournier Viger, 2016), a Java-based library that offers over 250 algorithms, and modifies SPMF's CPT model to make relevant predictions; we refer to this modified version as CD-CPT. To train the CD-CPT model, we utilized real-world context data sequences from 15 Android devices, because the CPT portion of CD-CPT had limitations that prevented us from using all 47 sequences for training. Subsequently, we employed the trained model to predict context event sequences for 11 Android devices and compared these predictions with the context data sequences obtained from the remaining 11 real-world Android devices volunteered by users. For the AKOM, TDAG, PPM, CPT+, CPT, and DG models, we utilized all 47 sequences as the training file, as these models did not take much time to train. For predicting the next context event, we utilized a window of two for PPM and DG, a window of five for CPT and CPT+, and a window of four for AKOM and TDAG. These window sizes gave the best-performing configuration for each algorithm.

The remainder of this paper is arranged as follows. Section 2 describes background information on context events and related concepts, and Section 2.1 explores the CD-CPT, AKOM, TDAG, PPM, CPT, CPT+, and DG models. Section 3 covers the research methodology, including data extraction, algorithm model implementation, and the modification of the Java version of SPMF's CPT model to develop CD-CPT. Section 4 outlines the evaluation metrics and experimental setup. Section 5 assesses the prediction success of CD-CPT, TDAG, and AKOM, and discusses the results derived from the study. Finally, Section 6 provides the conclusion and outlines potential avenues for future exploration.

2 BACKGROUND AND RELATED WORK

In this section, we review background work in context-aware computing, context event prediction, and the optimization of smartphone applications, and highlight how our research differs from previous work in these areas.

Context events in smartphone applications are 2-tuples (x, y), where x denotes the context category and y represents the context action. These events may influence the way a smartphone application reacts (Dey, 2001). By incorporating context-awareness into the algorithms for predicting context events, our paper aims to improve the efficiency and personalization of smartphone applications. Examples of context events include changes to network connections, volume adjustments, battery level changes, and more. Some applications may adjust their behavior in response to a context event. For instance, an app may choose to respond differently when the battery is low. Research on context events in smartphones over the past few years (Rahmati and Zhong, 2012) has highlighted the potential applications and limitations of this knowledge. By understanding context events, developers can optimize applications for improved user experience and energy efficiency. However, challenges arise due to the complexity of context events and concerns about user privacy. Within the field of smartphone testing for context events, a variety of approaches such as sensor-based testing, user-based testing, and hybrid testing are typically used to gain thorough insight into context usage. Despite this, these techniques struggle with achieving accuracy, managing time complexity, and representing a wide range of users in the simulation of real-world events (Bosmans et al., 2019).

Sequence prediction strives to forecast the next event or symbol in a sequence based on historical data. Several sequence prediction algorithms exist, addressing different aspects of sequence prediction and offering diverse levels of performance, complexity, and applicability. Prediction by Partial Matching (PPM) (Cleary and Witten, 1984) is a fast and simple sequence prediction model that remains popular today, despite being less accurate than some newer models. PPM has been used in various applications, such as identifying manufacturing patterns.
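
The 2-tuple (x, y) view of a context event introduced earlier in this section can be made concrete with a short sketch. The code below is illustrative only: the function names and the first-seen ID assignment are assumptions, not the encoding used in the study, and the output assumes the standard SPMF sequence database text format, in which every itemset ends with -1 and every sequence ends with -2.

```python
from typing import Dict, List, Tuple

# A context event is a (category, action) pair, as defined in the text.
ContextEvent = Tuple[str, str]


def build_vocabulary(sequences: List[List[ContextEvent]]) -> Dict[ContextEvent, int]:
    """Give each distinct event a positive integer ID, in first-seen order.

    The ID scheme is hypothetical; the paper does not describe its own.
    """
    vocab: Dict[ContextEvent, int] = {}
    for sequence in sequences:
        for event in sequence:
            if event not in vocab:
                vocab[event] = len(vocab) + 1
    return vocab


def to_spmf_line(sequence: List[ContextEvent], vocab: Dict[ContextEvent, int]) -> str:
    """Encode one user's sequence as a line of an SPMF sequence database:
    each single-item itemset ends with -1 and the sequence ends with -2."""
    return " ".join(f"{vocab[event]} -1" for event in sequence) + " -2"


user_sequence = [
    ("data connection", "wifi connected"),
    ("audio", "audio effects opened"),
    ("audio", "audio effects closed"),
    ("data connection", "wifi connected"),
]
vocab = build_vocabulary([user_sequence])
print(to_spmf_line(user_sequence, vocab))  # prints "1 -1 2 -1 3 -1 1 -1 -2"
```

Mapping each tuple to an integer keeps the training file compact and lets every algorithm discussed below operate on plain integer sequences.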

Another study, AppsPred (Sarker and Salah, 2019), presents a data-driven model that utilizes real-world data collected from university students to predict smartphone app usage based on daily life activities. The model's performance is attributed to its optimal use of decision trees within a forest, with which it outperforms other machine learning techniques. However, the study's dataset was limited in size and focused only on single-user predictions. In contrast, our CD-CPT model predicts event sequences by analyzing data from multiple users. In addition, the "BehavDT" model (Sarker et al., 2020) addressed the problem of building behavioral activities using a context-aware predictive model by considering individual user preference levels. CD-CPT and BehavDT differ in their primary focus: our model predicts event sequences, whereas BehavDT is designed to build behavioral activities using a context-aware predictive model.

The Dependency Graph (DG) algorithm (Padmanabhan and Mogul, 1996) is a sequence prediction model that utilizes directed graphs to describe dependencies between entities in a system. DG has demonstrated good performance and memory efficiency in different applications. The Compact Prediction Tree (CPT) algorithm (Gueniche et al., 2013) is a sequence prediction model that losslessly compresses training data with low time complexity. The model has been applied to robotic systems for predicting event sequences and enabling quick learning. Building on the CPT model, Gueniche et al. (2015) proposed CPT+, an improved version designed to reduce time and space complexity. CPT+ has shown significant improvements in performance and efficiency compared to the original CPT.

The Compact Prediction Tree (CPT) (Gueniche et al., 2013) is the sequence prediction algorithm chosen for this study due to its ability to losslessly compress training data, ensuring that all relevant information is retained for subsequent predictions (Mani and Suneetha, 2020). However, the original CPT model outputs only one predicted context event rather than an entire sequence of predictions for a new user's context data. To address this limitation, we implemented our own modifications to CPT, creating the Context Data Compact Prediction Tree (CD-CPT). This modified algorithm is capable of exploring and outputting predictions based on the inputted sequence length for each new user's context events, generating multiple predictions for several new users in a single run, and exporting the results into a CSV file for further analysis. These enhancements make CD-CPT more suitable for predicting context data in real-world scenarios.
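
The step from a single-event predictor to whole-sequence output for several users can be sketched as a feedback loop: each predicted event is appended to the history and used for the next prediction. The code below is a minimal illustration under stated assumptions. The stand-in predictor (most frequent successor of the last event) is not CPT, and the function names, user labels, and CSV columns are hypothetical rather than taken from the authors' modified SPMF code.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

Predictor = Callable[[Sequence[int]], int]


def train_successor_model(train: List[List[int]]) -> Predictor:
    """Stand-in single-event predictor (NOT CPT): returns the most frequent
    successor of the last event, or the most frequent event overall."""
    successors: Dict[int, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for sequence in train:
        overall.update(sequence)
        for previous, following in zip(sequence, sequence[1:]):
            successors[previous][following] += 1
    fallback = overall.most_common(1)[0][0]

    def predict_next(history: Sequence[int]) -> int:
        if history and successors.get(history[-1]):
            return successors[history[-1]].most_common(1)[0][0]
        return fallback

    return predict_next


def generate_sequences(predict_next: Predictor, lengths: Dict[str, int]) -> Dict[str, List[int]]:
    """Produce one predicted sequence per user, of the requested length,
    by feeding each prediction back in as history."""
    result: Dict[str, List[int]] = {}
    for user, length in lengths.items():
        sequence: List[int] = []
        while len(sequence) < length:
            sequence.append(predict_next(sequence))
        result[user] = sequence
    return result


def to_csv(predictions: Dict[str, List[int]]) -> str:
    """Serialise the predictions as CSV text (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["user", "position", "event_id"])
    for user, sequence in predictions.items():
        for position, event_id in enumerate(sequence):
            writer.writerow([user, position, event_id])
    return buffer.getvalue()


predictor = train_successor_model([[1, 2, 3, 1, 2, 3], [1, 2, 3, 1]])
predicted = generate_sequences(predictor, {"user_a": 4, "user_b": 2})
print(to_csv(predicted))
```

A purely greedy loop such as this one quickly settles into a cycle, which is one plausible motivation for adjusting prediction scores after each predicted event so that the model explores other events.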

In this study, we leverage these existing algo-

rithms, including TDAG (Transition Directed Acyclic

Graph) and AKOM (All-k Order Markov) to gener-

ate short-term predictions based on current context

events. We propose a modiﬁed method of CPT, called

Context Data Compact Prediction Tree (CD-CPT), to

predict real-world context data for multiple users. By

comparing AKOM, TDAG, and other algorithms such

as PPM, DG, CPT, and CPT+, we establish a majority

baseline for context data prediction.

2.1 Sequence Pattern Mining Techniques

SPMF. To facilitate the implementation and comparison of CPT, CD-CPT, and other related algorithms, we employed the SPMF open-source data mining library. We opted for the Java version, as it provides access to additional algorithms and allows for customization of the code to obtain the desired output.

AKOM and TDAG. All-k Order Markov and Transition Directed Acyclic Graph are sequence prediction models that combine Markovian models of orders 1 to K. These techniques find applications in diverse domains, such as natural language processing and image captioning. The K value is a user-adjustable parameter that influences the look-up window size and prediction accuracy. However, larger K values may consume more memory, making these models less optimal than algorithms with lower memory usage. In a separate study (Tax, 2018), AKOM was used alongside the Long Short-Term Memory (LSTM) model (Hochreiter and Schmidhuber, 1997), Dependency Graph (DG), and Prediction by Partial Matching (PPM) to forecast the upcoming three activities, and was found to be among the highest-performing models.
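
The idea of combining Markov models of orders 1 to K can be sketched in a few lines. The class below is a simplified illustration, not SPMF's implementation: it counts which event follows every context of length 1 to K and, at prediction time, backs off from the longest matching context to shorter ones. The class and method names are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


class AllKOrderMarkov:
    """Simplified all-k-order Markov predictor with longest-context back-off."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        # Maps a context (tuple of the preceding events) to follower counts.
        self.counts: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)

    def fit(self, sequences: List[List[int]]) -> "AllKOrderMarkov":
        for sequence in sequences:
            for i in range(1, len(sequence)):
                # Record the event at position i under every order 1..k
                # for which enough history exists.
                for order in range(1, min(self.k, i) + 1):
                    context = tuple(sequence[i - order:i])
                    self.counts[context][sequence[i]] += 1
        return self

    def predict(self, history: Sequence[int]) -> Optional[int]:
        # Try the highest order first, then fall back to lower orders.
        for order in range(min(self.k, len(history)), 0, -1):
            followers = self.counts.get(tuple(history[-order:]))
            if followers:
                return followers.most_common(1)[0][0]
        return None  # no context of any order was seen in training


model = AllKOrderMarkov(k=2).fit([[1, 2, 3], [4, 2, 5], [4, 2, 5]])
print(model.predict([1, 2]))  # order-2 context (1, 2) was followed by 3
print(model.predict([9, 2]))  # backs off to order 1: 2 was mostly followed by 5
```

The dictionary grows with every distinct context of every order, which illustrates why larger K values consume more memory.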

Figure 1: A Prediction Tree (PT), Inverted Index (II) and a Lookup Table (LT) [2].

CPT. The Compact Prediction Tree (CPT) algorithm emphasizes lossless compression and low time complexity. Our modified version, CD-CPT, expands its capabilities to predict entire sequences of context data instead of single predictions. Figure 1 illustrates the CPT algorithm's structure, highlighting the use of a Prediction Tree (PT), an Inverted Index (II), and a Lookup Table (LT) (Gueniche et al., 2013). These components work together to enable efficient and accurate sequence predictions. In a different study, the CPT model was successfully applied to predict event sequences and enable quick learning in a robotic system, demonstrating the algorithm's adaptability and potential for various applications (Persia et al., 2020).
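
How the three CPT structures fit together can be shown with a reduced sketch. The code below builds a prediction tree (a trie of the training sequences), an inverted index (item to sequence IDs), and a lookup table (sequence ID to the last tree node), and then predicts with a deliberately simplified scoring rule in which every item that follows the matched part of a similar sequence receives one vote. The published algorithm uses a more refined scoring and noise-tolerance scheme; the class name `SimpleCPT` and its methods are assumptions made for this illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set


class Node:
    """One node of the prediction tree; the root carries item None."""

    def __init__(self, item: Optional[int], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.children: Dict[int, "Node"] = {}


class SimpleCPT:
    def __init__(self) -> None:
        self.root = Node(None, None)                   # Prediction Tree (PT)
        self.inverted_index: Dict[int, Set[int]] = {}  # II: item -> sequence IDs
        self.lookup_table: Dict[int, Node] = {}        # LT: sequence ID -> last node

    def fit(self, sequences: List[List[int]]) -> "SimpleCPT":
        for seq_id, sequence in enumerate(sequences):
            node = self.root
            for item in sequence:
                # Shared prefixes reuse existing nodes, which compresses the data.
                if item not in node.children:
                    node.children[item] = Node(item, node)
                node = node.children[item]
                self.inverted_index.setdefault(item, set()).add(seq_id)
            self.lookup_table[seq_id] = node
        return self

    def restore(self, seq_id: int) -> List[int]:
        """Rebuild a training sequence by walking from its last node to the root."""
        items: List[int] = []
        node = self.lookup_table[seq_id]
        while node.parent is not None:
            items.append(node.item)
            node = node.parent
        return items[::-1]

    def predict(self, suffix: Sequence[int]) -> Optional[int]:
        wanted = set(suffix)
        if not wanted:
            return None
        # Similar sequences contain every item of the suffix.
        similar = set.intersection(
            *(self.inverted_index.get(item, set()) for item in wanted)
        )
        scores: Counter = Counter()
        for seq_id in sorted(similar):
            sequence = self.restore(seq_id)
            last_common = max(i for i, item in enumerate(sequence) if item in wanted)
            for item in sequence[last_common + 1:]:
                scores[item] += 1  # simplified: one vote per occurrence
        return scores.most_common(1)[0][0] if scores else None


cpt = SimpleCPT().fit([[1, 2, 3, 4], [1, 2, 3], [6, 2, 5]])
print(cpt.restore(0))       # the tree is lossless: prints [1, 2, 3, 4]
print(cpt.predict([1, 2]))  # item 3 follows (1, 2) in two sequences: prints 3
```

Because `restore` recovers every training sequence exactly, the sketch reflects the lossless property that motivated choosing CPT as the basis for CD-CPT.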

CPT+. The Enhanced Compact Prediction Tree (CPT+) algorithm is an upgraded version of the original CPT model that addresses limitations such as time and space complexity. It stands out from the other examined techniques due to its efficiency: it is more compact and faster at predicting sequences.

CD-CPT. CD-CPT builds upon the original CPT model, improving its time and space complexity, which enables it to handle larger datasets and deliver more accurate predictions. This improvement distinguishes CD-CPT from the other algorithms examined in this research, making it a powerful tool for predicting context data sequences across various scenarios.

DG. Dependency Graph (DG) is a sequence prediction model used in this study due to its memory efficiency and ability to predict future symbols or events based on training sequences. It is compatible with the Java version of the SPMF library, allowing seamless integration with the other SPMF algorithms used in this research. Originally designed for reducing user-perceived latency by predicting and prefetching files, DG offers good performance in analyzing context events.
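
A dependency graph of the kind described above can be sketched as follows. Each event is a node, an arc from A to B counts how often B occurs within a small look-ahead window after A, and the arc weight is that count divided by the number of occurrences of A. This is a minimal illustration of the general idea; the class name, the `lookahead` parameter name, and the tie-breaking behaviour are assumptions and do not reproduce SPMF's DG implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class DependencyGraph:
    def __init__(self, lookahead: int = 2) -> None:
        self.lookahead = lookahead
        self.node_count: Counter = Counter()  # occurrences of each event
        # arc_count[a][b]: number of times b appeared within the window after a
        self.arc_count: Dict[int, Counter] = defaultdict(Counter)

    def fit(self, sequences: List[List[int]]) -> "DependencyGraph":
        for sequence in sequences:
            for i, current in enumerate(sequence):
                self.node_count[current] += 1
                window = sequence[i + 1:i + 1 + self.lookahead]
                for follower in set(window):  # count each follower once per window
                    self.arc_count[current][follower] += 1
        return self

    def confidence(self, a: int, b: int) -> float:
        """Estimated probability that b follows a within the window."""
        seen = self.node_count[a]
        if seen == 0:
            return 0.0
        return self.arc_count.get(a, Counter())[b] / seen

    def predict(self, last_event: int) -> Optional[int]:
        followers = self.arc_count.get(last_event)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


dg = DependencyGraph(lookahead=2).fit([[1, 2, 3], [1, 2, 4], [1, 3, 2]])
print(dg.confidence(1, 2))  # 2 follows 1 within two steps in all three sequences
print(dg.predict(1))        # the strongest arc out of 1 leads to 2
```

Only one counter per observed pair is stored, which is consistent with the memory efficiency attributed to DG in the text.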

Figure 2: Research Methodology for AKOM and TDAG.

Figure 3: Research Methodology for PPM.

Figure 4: Research Methodology for CD-CPT.

PPM. The Prediction by Partial Matching (PPM) model, used in this study, is a fast and simple sequence prediction method applicable to various fields. Although newer models like CPT+ may offer increased accuracy, PPM's versatility and adaptability make it a relevant choice for predicting context data. Despite the differences between context data and natural language, PPM delivered a fair score when processing and predicting the dataset in this research (Gellert et al., 2021).

Remote MySQL Database. A MySQL database combines context events from devices into the UNT context events database used in this study. Context events are retrieved from a local SQLite database on each user's smartphone at 15-minute intervals, transmitted to a remote server via HTTP, and pushed to the MySQL database.

SQLite. SQLite is a built-in, serverless SQL database engine for the Android operating system. This software library allows the smartphone application to store context events locally on the device.
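
The local-store-then-upload pattern described above can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the real collection app runs on Android, and the table name, column names, and helper functions below are hypothetical. The HTTP transmission step is left out; the sketch only shows how unsent events could be batched and then marked as uploaded.

```python
import sqlite3
from typing import List, Tuple

UPLOAD_INTERVAL_SECONDS = 15 * 60  # the 15-minute interval mentioned in the text

Row = Tuple[int, int, str, str]  # (id, timestamp, category, action)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local store and create the (hypothetical) events table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS context_events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts INTEGER NOT NULL,
               category TEXT NOT NULL,
               action TEXT NOT NULL,
               uploaded INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def record_event(conn: sqlite3.Connection, ts: int, category: str, action: str) -> None:
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO context_events (ts, category, action) VALUES (?, ?, ?)",
            (ts, category, action),
        )


def take_batch(conn: sqlite3.Connection) -> List[Row]:
    """Return every event that has not been uploaded yet, oldest first."""
    cursor = conn.execute(
        "SELECT id, ts, category, action FROM context_events "
        "WHERE uploaded = 0 ORDER BY id"
    )
    return cursor.fetchall()


def mark_uploaded(conn: sqlite3.Connection, batch: List[Row]) -> None:
    with conn:
        conn.executemany(
            "UPDATE context_events SET uploaded = 1 WHERE id = ?",
            [(row[0],) for row in batch],
        )


store = open_store()
record_event(store, 1000, "data connection", "wifi connected")
record_event(store, 1060, "audio", "audio effects opened")
batch = take_batch(store)  # would be sent to the remote server here
mark_uploaded(store, batch)
print(len(batch), len(take_batch(store)))  # prints "2 0"
```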

3 RESEARCH METHODOLOGY

Figures 2, 3, and 4 represent the research methodology for the AKOM, TDAG, PPM, and CD-CPT algorithms.

Figure 2 illustrates the research methodology for AKOM and TDAG, sequence prediction models known as context trees. These models combine Markovian models of different orders and adjust the input window size, represented by parameter K, to balance accuracy and memory consumption.

Figure 3 illustrates the research methodology for PPM, which uses a sequence database for predictions. We experimented with parameters such as the look-up window, sequence window, and training file length to optimize performance. The look-up window, which determines the number of previous events considered, is crucial in PPM.

Table 1: PPM compared to related algorithms' precision, recall, and F-1 score.

Algorithm                         Support  Precision (%)  Recall (%)  F-1 (%)
PPM (order of 2)                  13939    38.6           42.5        39.0
CPT (Original model) (window 5)   5575     8.6            19.7        10.6
CPT+ (window 5)                   5575     6.8            16.6        8.6
AKOM (order of 4)                 6969     55.4           55.2        54.4
DG (order of 2)                   13939    29.4           34.2        31.2
TDAG (order of 4)                 6969     55.4           55.2        54.4

Scores produced by comparing the predictions of AKOM, TDAG, and the related algorithms against the test files.

Table 2: Precision, Recall, and F-1 score from CD-CPT Model.

Metric              Value
Precision (%)       12.46
Recall (%)          11.67
F-1 (%)             11.36
Total support size  27880

Figure 4 depicts data extraction and preprocess-

ing for the CD-CPT algorithm, which predicts context

data sequences instead of single events. By analyzing

key patterns and trends, we improved CD-CPT’s abil-

ity to make accurate predictions.

4 EMPIRICAL STUDY

4.1 Evaluation Metrics

To evaluate the effectiveness of our approach, we de-

veloped two scoring systems for calculating preci-

sion, recall, and F-1 scores. These systems were cre-

ated using Python with Google Colab for the CD-CPT

algorithm and Java for the rest of the non-modiﬁed

SPMF algorithms. Both methods provide accurate

metrics based on our prediction models and test ﬁles.
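
Support-weighted precision, recall, and F-1 can be computed as in the sketch below. It is a minimal pure-Python illustration of the standard weighted average, in which each event's score is weighted by the number of times the event occurs in the test data. It is not the scoring code used in the study, and the function name and return format are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def weighted_scores(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> Dict[str, float]:
    """Support-weighted precision, recall, and F-1 over aligned event lists."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")

    support = Counter(actual)            # occurrences of each event in the test data
    predicted_count = Counter(predicted)
    true_positive = Counter(a for a, p in zip(actual, predicted) if a == p)

    total = len(actual)
    precision = recall = f1 = 0.0
    for event, count in support.items():
        p = true_positive[event] / predicted_count[event] if predicted_count[event] else 0.0
        r = true_positive[event] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / total           # weight each event by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"precision": precision, "recall": recall, "f1": f1, "support": float(total)}


scores = weighted_scores(actual=[1, 1, 1, 2], predicted=[1, 1, 2, 2])
print({name: round(value, 3) for name, value in scores.items()})
```

In this small example the weighted precision is 0.875, the weighted recall is 0.75, and the weighted F-1 is about 0.767. The weighted F-1 is the average of the per-event F-1 values, not the harmonic mean of the weighted precision and recall, so it can fall below both of them, as it does for AKOM and TDAG in Table 1.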

4.2 Experimental Setup

The experiments compare the performance of CD-CPT, TDAG/AKOM, PPM, CPT+, CPT, and DG for the prediction of real-world context events. The experiments in this study address the following research questions:

RQ1. What is the effectiveness of AKOM and TDAG compared to CD-CPT in predicting sequences of real-world context events, as measured by precision, recall, and F-1 score?

RQ2. What is the comparative effectiveness of

AKOM and TDAG against other algorithms like

CPT+, CPT, DG, and PPM in predicting real-world

context events, as measured by recall, precision, and

F-1 score?

RQ3. How does the accuracy of CD-CPT compare

to that of AKOM and TDAG in analyzing the most

frequently occurring context events in the test ﬁle?

Table 2 shows the performance of CD-CPT. The model yields higher precision than recall, with scores weighted by each context event's support size, leading to more accurate results. We further analyzed the most frequently occurring events in the 11 real-world context data sequences from users and CD-CPT's performance in predicting them, as shown in Table 3. Conversely, we examined the least frequently occurring context events in the 11 user sequences. Since these events did not appear in the test sequences, CD-CPT scores were set to 0. CD-CPT operates by predicting context events for a given number of users and their sequence lengths. After the predictions are generated, they are compared to the test file content to assess accuracy through precision, recall, and F-1 score.

The performance of AKOM and TDAG, as shown in Table 4, demonstrates their ability to make good predictions compared to the majority baseline. For most of the frequent events in Table 4, both models exhibit stronger recall than precision. We further analyze the most and least frequently occurring events in the 11 real-world context data sequences from users and the performance of AKOM and TDAG in predicting them. AKOM and TDAG significantly outperform PPM, CPT, CPT+, and DG, as shown in Table 1. Support size differences between the algorithm models are due to their specific look-up window sizes. PPM and DG have a small look-up window of 2, AKOM and TDAG have a look-up window of 4, and CPT and CPT+ have the largest look-up window of 5. AKOM and TDAG use the first four context events from the test file to predict subsequent events and generate the support size.
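
The dependence of support on the look-up window can be illustrated with a small sketch. It assumes, purely for illustration, that the test events are consumed as one stream in consecutive, non-overlapping windows of k events, each used to predict the event that immediately follows it. The paper does not state how its scorer steps through the test file, so `windowed_cases` is a hypothetical helper rather than a description of the actual evaluation code.

```python
from typing import List, Sequence, Tuple


def windowed_cases(events: Sequence[int], window: int) -> List[Tuple[List[int], int]]:
    """Split a stream into (context, target) pairs using consecutive,
    non-overlapping contexts of `window` events (an assumed stepping rule)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        (list(events[start:start + window]), events[start + window])
        for start in range(0, len(events) - window, window)
    ]


print(windowed_cases(list(range(10)), 4))
# prints [([0, 1, 2, 3], 4), ([4, 5, 6, 7], 8)]

stream = list(range(27880))  # same length as the total support in Table 2
for k in (2, 4, 5):
    print(k, len(windowed_cases(stream, k)))
```

Under this assumption a stream of 27,880 events yields 13,939, 6,969, and 5,575 prediction cases for windows of 2, 4, and 5, which coincides with the supports listed in Table 1. The match suggests, but does not confirm, that support shrinks roughly in proportion to the window size.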


Table 3: Most frequent Context Events for CD-CPT.

Event                                     Support  Precision (%)  Recall (%)  F-1 (%)
4 ('data connection', 'lte connected')    3232     12.77          16.18       14.27
22 ('audio', 'audio effects opened')      2880     28.97          24.72       26.68
21 ('audio', 'audio effects closed')      2738     32.79          26.66       29.41
6 ('data connection', 'wifi connected')   2557     10.25          14.12       11.88
2 ('configuration', 'changed')            2243     8.07           12.80       9.90

Scores produced by comparing the CD-CPT prediction results CSV file to the test CSV file.

Table 4: Most frequent Context Events for AKOM and TDAG using an order of 4.

Event                                     Support  Precision (%)  Recall (%)  F-1 (%)
21 ('audio', 'audio effects closed')      760      82.4           87.9        85.0
4 ('data connection', 'lte connected')    819      46.5           55.1        50.4
6 ('data connection', 'wifi connected')   653      51.4           64.3        57.1
22 ('audio', 'audio effects opened')      633      87.1           89.4        88.2
2 ('configuration', 'changed')            586      48.1           46.6        47.4

Scores produced by comparing the AKOM and TDAG predictions against the test file.

5 RESULTS AND DISCUSSION

RQ1. We compared AKOM and TDAG to CD-CPT using weighted averages for precision, recall, and F-1 score from 11 real-world context data sequences. CD-CPT, as shown in Table 2, used 15 training sequences and achieved an F-1 score of 11.36% with a support of 27,880. AKOM and TDAG, as shown in Table 1, used 47 training sequences and achieved a weighted F-1 score of 54.4% with a support of 6,969. The accuracy difference is significant: AKOM and TDAG outperformed CD-CPT but required reference to the test file. CD-CPT had lower scores but predicted entire sequences without referencing real-world context data. In summary, AKOM and TDAG excel in small event windows, while CD-CPT is better suited to predicting entire sequences without reference.

RQ2. Table 1 shows that AKOM and TDAG have similar performance and do better than the majority baseline when comparing their precision, recall, and F-1 scores. Their precision and recall stand at 55.4% and 55.2%, respectively, while the next best from the baseline (PPM) has 38.6% precision and 42.5% recall. The F-1 score for AKOM and TDAG is 54.4%, compared with 39.0% for PPM. Different window sizes affect support: AKOM and TDAG use a window of 4 (support of 6,969), PPM and DG use a window of 2 (support of 13,939), and CPT and CPT+ use a window of 5 (support of 5,575). Despite support differences, the scores remained consistent when adjusting for normal count. In conclusion, AKOM and TDAG significantly outperform the majority baseline.

RQ3. To evaluate RQ3, we compared the highest supported context events for CD-CPT and AKOM/TDAG. CD-CPT's highest supported context event, ('data connection', 'lte connected'), had a support of 3,232 and a weighted F-1 score of 14.27%. In contrast, AKOM and TDAG had the same context event with a support of 819 and a weighted F-1 score of 50.4%. CD-CPT uses 15 sequences of user context events without referencing the test file, while AKOM and TDAG have constant access to the test file. The latter models have significantly lower support due to their window size of 4, which leads to discarded context events from the test file. This factor likely contributes to their higher F-1 scores. In conclusion, AKOM and TDAG achieve more accurate predictions with access to the test file, while CD-CPT generates decent predictions without referencing the test file.

In summary, RQ1, RQ2, and RQ3 offer guidance for researchers when considering the results of AKOM, TDAG, and CD-CPT. RQ1 highlights the suitability of each algorithm model depending on project requirements. RQ2 demonstrates that AKOM and TDAG outperform similar models, suggesting they are superior for short context event predictions. Lastly, RQ3 provides insight into the accuracy of AKOM, TDAG, and CD-CPT for specific context event predictions, which could be valuable for applications testing the likelihood of particular events occurring.


5.1 Threats to Validity

The users, user behaviors, and devices/apps that they used may not represent all users. We tried to minimize this threat by collecting data from 58 test subjects over a 30-day period. CD-CPT was tested on a limited set of 11 sequences of real-world context events from users and was optimized specifically for this dataset to improve its performance. These optimizations included adjusting the CPT prediction score of each context event after it was predicted, encouraging the model to explore. Other optimizations involved adjusting the CPT prediction scores of other context events based on their probability of occurrence after specific context events. To mitigate these threats to validity, the data was cleaned for redundancy and transformed to ensure compatibility with the algorithm's input format. Additionally, future research may test the model on larger and more diverse datasets to better assess the generalization of this research.

6 CONCLUSIONS AND FUTURE WORK

In this paper, we investigate various sequence prediction algorithms, such as AKOM, TDAG, PPM, DG, CPT, and CPT+, to predict real-world context data for smartphones, and propose a new method called CD-CPT (Context Data Compact Prediction Tree) for improved performance. The results show that AKOM and TDAG had the highest F-1 score of 54.4% with a look-up window of four, while PPM had an F-1 score of 39.0% and performed best with a look-up window of two. CPT+ had a lower F-1 score, 8.6%, than the other algorithms. CD-CPT, our proposed method, was able to predict sequences of real-world context data from users with an F-1 score of 11.36% using only the training model. Overall, the findings suggest that AKOM and TDAG are more accurate for single-event predictions, while CD-CPT is better at predicting full sequences of context data. Future work may use AKOM or TDAG during software testing to monitor different patterns of context events. Researchers may further investigate the use of CD-CPT for prediction and compare full sequences of context data from users. The study highlights the importance of choosing appropriate algorithms for predicting context data on smartphones, as this can significantly impact the performance and user experience of various applications.

Future work may examine fault ﬁnding and ef-

fectiveness of integrating context event sequences

into automated testing processes. Future work may

also explore CD-CPT applied to domains such as

smart watches, healthcare devices, various Internet of

Things (IoT) devices and autonomous vehicles.

ACKNOWLEDGEMENTS

This work was supported by NSF grant #2149969.

REFERENCES

Android Developers (2022). Android Releases.

Bosmans, S., Mercelis, S., Denil, J., and Hellinckx, P.

(2019). Testing iot systems using a hybrid simulation

based testing approach. Computing, 101:857–872.

Cleary, J. and Witten, I. (1984). Data compression using

adaptive coding and partial string matching. IEEE

transactions on Communications, 32(4):396–402.

Data.ai (12 Jan 2022). Number of Mobile App Downloads Worldwide from 2016 to 2021 (in Billions). Statista.

Dey, A. K. (2001). Understanding and using context. Per-

sonal and ubiquitous computing, 5:4–7.

Fournier-Viger, P., Gomariz, A., Gueniche, T., Soltani, A.,

Wu, C.-W., Tseng, V. S., et al. (2014). Spmf: a java

open-source pattern mining library. J. Mach. Learn.

Res., 15(1):3389–3393.

Fournier Viger, P. et al. (2016). The SPMF Open-Source Data Mining Library Version 2. In Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, pages 36–40.

Gellert, A., Precup, S.-A., Pirvu, B.-C., Fiore, U., Zam-

ﬁrescu, C.-B., and Palmieri, F. (2021). An empirical

evaluation of prediction by partial matching in assem-

bly assistance systems. Applied Sciences, 11(7):3278.

Goyal, P., Khan, M. K., Steil, C., Martel, S. M., and Bryce,

R. (2023). Smartphone context event sequence pre-

diction with poermh and tke-rules algorithms. In

2023 IEEE 13th Annual Computing and Communica-

tion Workshop and Conference (CCWC), pages 0827–

0834. IEEE.

Gueniche, T., Fournier-Viger, P., Raman, R., and Tseng,

V. S. (2015). Cpt+: Decreasing the time/space com-

plexity of the compact prediction tree. In Advances in

Knowledge Discovery and Data Mining: 19th Paciﬁc-

Asia Conference, PAKDD 2015, Ho Chi Minh City,

Vietnam, May 19-22, 2015, Proceedings, Part II 19,

pages 625–636. Springer.

Gueniche, T., Fournier-Viger, P., and Tseng, V. S. (2013).

Compact prediction tree: A lossless model for ac-

curate sequence prediction. In Advanced Data Min-

ing and Applications: 9th International Confer-

ence, ADMA 2013, Hangzhou, China, December 14-

16, 2013, Proceedings, Part II 9, pages 177–188.

Springer.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term

memory. Neural computation, 9(8):1735–1780.


Laird, P. and Saul, R. (1994). Discrete sequence prediction

and its applications. Machine learning, 15:43–68.

Mani, K. and Suneetha, K. (2020). Performance evalu-

ation of compact prediction tree algorithm for web

page prediction. In 2020 International Conference on

Emerging Trends in Information Technology and En-

gineering (ic-ETITE), pages 1–7. IEEE.

Padmanabhan, V. N. and Mogul, J. C. (1996). Using predic-

tive prefetching to improve world wide web latency.

ACM SIGCOMM Computer Communication Review,

26(3):22–36.

Persia, F., D’Auria, D., and Pilato, G. (2020). Fast learn-

ing and prediction of event sequences in a robotic sys-

tem. In 2020 Fourth IEEE International Conference

on Robotic Computing (IRC), pages 447–452. IEEE.

Piparia, S., Khan, M. K., and Bryce, R. (2021). Discov-

ery of real world context event patterns for smart-

phone devices using conditional random ﬁelds. In

ITNG 2021 18th International Conference on Infor-

mation Technology-New Generations, pages 221–227.

Springer.

Pitkow, J. and Pirolli, P. (1999). Mining longest repeating subsequences to predict world wide web surfing. In Proc. USENIX Symp. on Internet Technologies and Systems, volume 1.

Rahmati, A. and Zhong, L. (2012). Studying smartphone

usage: Lessons from a four-month ﬁeld study. IEEE

Transactions on Mobile Computing, 12(7):1417–

1427.

Sarker, I. H., Colman, A., Han, J., Khan, A. I., Abushark,

Y. B., and Salah, K. (2020). Behavdt: a behavioral de-

cision tree learning to build user-centric context-aware

predictive model. Mobile Networks and Applications,

25(3):1151–1161.

Sarker, I. H. and Salah, K. (2019). Appspred: predicting

context-aware smartphone apps using random forest

learning. Internet of Things, 8:100106.

Tax, N. (2018). Human activity prediction in smart home

environments with lstm neural networks. In 2018 14th

International Conference on Intelligent Environments

(IE), pages 40–47. IEEE.
