A Smart Hybrid Enhanced Recommendation and Personalization Algorithm Using Machine Learning

Aswin Kumar Nalluri and Yan Zhang

School of Computer Science and Engineering, California State University San Bernardino, 5500 University Parkway, San Bernardino, CA, 92407, U.S.A.

Keywords: Personalized Movie Recommendation, Hybrid Filtering, Content-Based Filtering, Term Frequency-Inverse Document Frequency, Collaborative Filtering, Alternating Least Squares.

Abstract:

In today’s era of streaming services, the effectiveness and precision of recommendation systems are pivotal

in enhancing user satisfaction. Traditional recommendation systems often grapple with challenges such as

data sparsity in user-item interactions, the need for parallel processing, and increased computational demands

due to matrix densification, all of which hinder the overall efficiency and scalability of recommendation systems. To address these issues, we propose the Smart Hybrid Enhanced Recommendation and Personalization

Algorithm (SHERPA), a cutting-edge machine learning approach designed to revolutionize movie recom-

mendations. SHERPA combines Term Frequency-Inverse Document Frequency (TF-IDF) for content-based

ﬁltering and Alternating Least Squares (ALS) with weighted regularization for collaborative ﬁltering, offering

a sophisticated method for delivering personalized suggestions. We evaluated the proposed SHERPA algo-

rithm using a dataset of over 50 million ratings from 480,000 Netﬂix users, covering 17,000 movie titles.

The performance of SHERPA was compared to that of traditional hybrid models, demonstrating a roughly 70% reduction in prediction error, measured by Root Mean Square Error (RMSE), during the training,

testing, and validation phases. These ﬁndings underscore SHERPA’s ability to discern and cater to users’ nu-

anced preferences, marking a signiﬁcant advancement in personalized recommendation systems.

1 INTRODUCTION

In recent years, personalized recommendation sys-

tems have gained signiﬁcant popularity due to the

growing prevalence of online shopping platforms, social networks, and streaming services. Consider the last time you tried to choose a movie on a streaming site: it wasn't easy, was it? The challenge lies in the limitations of the engines behind those "Recommended for You" lists. These systems often rely on what you and users like you have already watched (collaborative filtering) (Ni et al., 2021) or suggest content based on genres you seem to prefer (content-based filtering) (Permana and Wibowo, 2023)(Philip et al., 2014). However, they frequently end up showing you more of the

same, making it difﬁcult to discover something new

and exciting. This highlights the need for a smarter

approach which truly understands your current mood

by blending various advanced techniques from the

world of machine learning, introducing you to con-

tent you’ll genuinely enjoy.

https://orcid.org/0000-0002-5474-4019

In the competitive landscape of streaming plat-

forms, the key to success hinges on engaging and de-

lighting audiences. A crucial element in achieving

this is providing movie recommendations that capti-

vate viewers, almost like a touch of magic. Getting

these recommendations right can increase user reten-

tion and encourage word-of-mouth promotion, which

is vital in the ongoing streaming wars. It’s not just

about suggesting what an algorithm thinks you should

watch; it’s about understanding what viewers really

want to see next, turning casual viewers into devoted

fans eager to discover their next favorite movies.

This project introduces the Smart Hybrid Enhanced Recommendation and Personalization Algorithm (SHERPA) with the goal of revolutionizing

movie recommendation processes. SHERPA com-

bines collaborative ﬁltering, content-based ﬁltering,

and advanced machine learning techniques to deliver

tailored, accurate, and personalized content recom-

mendations. Our goal is to simplify the movie discov-

ery process by aligning recommendations with your

preferences, not just based on what you’ve already

Nalluri, A. and Zhang, Y.

A Smart Hybrid Enhanced Recommendation and Personalization Algorithm Using Machine Learning.

DOI: 10.5220/0013064100003838

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2024) - Volume 1: KDIR, pages 465-472

ISBN: 978-989-758-716-0; ISSN: 2184-3228

seen. The focus is on creating a journey of content ex-

ploration that resonates with you, because, ultimately,

every movie night should be about discovering some-

thing that truly hits the spot. SHERPA aims to elimi-

nate the need for endless scrolling, ensuring that ﬁnd-

ing your next favorite movie is just a click away.

2 RELATED WORK

Traditional machine learning approaches in recom-

mendation systems primarily focus on collaborative

ﬁltering and content-based ﬁltering strategies (Ni

et al., 2021). Collaborative ﬁltering predicts user

preferences by analyzing interactions and drawing

insights from user behavior (Son and Kim, 2017).

While this technique is widely used for its simplicity

and effectiveness, it often faces challenges, particu-

larly with new users (the cold start problem) and spar-

sity in user-item interactions (Wu et al., 2018)(Rahul

et al., 2021).

Content-based ﬁltering, on the other hand, sug-

gests items based on their features and user prefer-

ences, emphasizing item metadata (Permana and Wi-

bowo, 2023). However, this method may lead to a

lack of diversity in recommendations, as it tends to

suggest items similar to those the user has already in-

teracted with (Philip et al., 2014).

Recent advancements in recommendation systems

have made signiﬁcant progress in overcoming these

limitations. Techniques such as Singular Value De-

composition (SVD) have been employed to analyze

user-item interactions and predict ratings by uncov-

ering latent factors (Rahul et al., 2021). Addition-

ally, new algorithms like Alternating Least Squares

(ALS) with Weighted Regularization have enhanced

collaborative ﬁltering by prioritizing known interac-

tions and incorporating regularization to prevent over-

ﬁtting (SurvyanaWahyudi et al., 2017).

By combining these approaches, hybrid models

that integrate elements of both content-based and

collaborative ﬁltering have been developed (Burke,

2002). These hybrid systems provide more compre-

hensive recommendations by considering both user

behavior and content characteristics (Parthasarathy

and Sathiya Devi, 2023). Zhou et al. proposed a collaborative filtering algorithm, Alternating-Least-Squares with Weighted-λ-Regularization (ALS-WR), which is implemented on a parallel Matlab platform.

They claimed that the performance of ALS-WR (in

terms of root mean squared error (RMSE)) mono-

tonically improves with both the number of features

and the number of ALS iterations (Zhou et al., 2008).

Chiny et al. implemented a recommendation system based on TF-IDF and cosine similarity (Chiny et al., 2022). Hybrid systems not only improve the precision of recommendations but also offer a deeper understanding of user preferences and content relevance,

paving the way for a new era in recommendation sys-

tems (Parthasarathy and Sathiya Devi, 2023).

3 DATASET AND

PREPROCESSING

3.1 Dataset

The project involves two main datasets: the Movie Ti-

tles dataset and the Movie Ratings dataset, which are

included in the Netﬂix Prize dataset posted on Kag-

gle (Netﬂix, 2006).

The Movie Titles dataset contains the information

of 17,770 movies, with each movie represented as a

tuple in the form: <Movie ID, Release Year, Movie

Title, Director, Cast, Genre, Overview>. The original Movie Titles dataset contains only the Movie ID, Release Year, and Movie Title of each movie. We obtained the additional fields (Director, Cast, Genre, and Overview) from IMDb, an online database of information related to films, television series, and similar media.

This dataset provides a comprehensive overview

of movies released from 1890 to 2005, with titles in

English. The following is an example of a movie entry:

• Example: <1, 2003, Dinosaur Planet, Christian

Slater, Scott Sampson, Animation, A four-episode

animated series charting the adventures of four di-

nosaurs each on a different continent in the pre-

historic world.>. This tuple shows that the movie

ID is 1, the release year of this movie is 2003,

the movie title is Dinosaur Planet, the director is

Christian Slater, the cast is Scott Sampson, and

the genre is Animation.

The Movie Ratings dataset comprises over 50 million ratings from 480,189 Netflix users, covering 17,770 movie titles, collected between October 1998 and December 2005. Each rating entry or instance contains User ID, Movie ID, Date of Rating, and Rating. Movie IDs are sequentially numbered from 1 to 17770. User IDs range from 1 to 2,649,429, with some numbers missing, representing a total of 480,189 users. Rating dates are consistently formatted as YYYY-MM-DD across all files. Ratings are on a five-star scale, ranging from 1 to 5, where 5 represents the highest rating.

To ensure customer privacy, unique customer IDs

KDIR 2024 - 16th International Conference on Knowledge Discovery and Information Retrieval

have been anonymized. The 50 million movie ratings dataset is split into three subsets. The training dataset contains a total of 35,721,947 ratings, the test dataset contains a total of 7,654,704 ratings, and the validation dataset contains a total of 7,654,704 ratings. The following is an example of a movie rating instance:

• Example: 1, 401047, 4, 2005-06-03

This example shows that the user with ID 401047

rated the movie with ID 1 as 4 stars on June 3,

2005.

3.2 Data Preprocessing

During the data preprocessing stage, we structured

unprocessed data to align with the machine learning

model’s format for effective learning. This involved

parsing data from a ﬁle, extracting movie IDs, cus-

tomer IDs, and ratings, and structuring them into a

list. We converted this list into a pandas DataFrame

for easier manipulation and handled format issues by

skipping lines that didn’t match the expected format.

Additionally, we cleaned the data by replacing any

NaN values with empty strings, preparing it for fur-

ther analysis.
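The parsing step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes each line has the comma-separated layout of the example in Section 3.1 (movie ID, user ID, rating, date), and the function name `parse_ratings` is hypothetical.

```python
from datetime import datetime


def parse_ratings(lines):
    """Parse 'movie_id, user_id, rating, YYYY-MM-DD' lines into tuples.

    Lines that do not match the expected format are skipped, mirroring the
    "skip malformed lines" step described in the text.
    """
    rows = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            continue  # wrong number of fields
        try:
            movie_id, user_id, rating = (int(p) for p in parts[:3])
            datetime.strptime(parts[3], "%Y-%m-%d")  # validate the date format
        except ValueError:
            continue  # non-numeric ID/rating or badly formatted date
        if not 1 <= rating <= 5:
            continue  # ratings are on a 1-5 scale
        rows.append((movie_id, user_id, rating, parts[3]))
    return rows


# Hypothetical input: one valid line and three lines that should be skipped.
sample = ["1, 401047, 4, 2005-06-03", "not a rating line", "1, 12, 9, 2005-06-03", ""]
rows = parse_ratings(sample)
```

The resulting list can be passed to `pandas.DataFrame(rows, columns=[...])`, and `DataFrame.fillna("")` is one way to replace NaN values with empty strings in the movie metadata.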

4 METHODOLOGIES

Recommendation systems use ﬁltering algorithms

to provide recommendations to users. These algorithms fall into three main categories: collaborative filtering, content-based filtering, and hybrid algorithms. The proposed Smart Hybrid Enhanced Recommendation and Personalization Algorithm (SHERPA) integrates Term Frequency-

Inverse Document Frequency (TF-IDF) for content-

based ﬁltering and Alternating Least Squares (ALS)

with weighted regularization for collaborative ﬁlter-

ing, offering a sophisticated method for delivering

personalized suggestions.

4.1 Term Frequency-Inverse Document

Frequency (TF-IDF)

Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical measure used to evaluate how important a word is to a document within a collection of texts known as a corpus (Rajaraman and Ullman, 2011). It is often used in text mining and information retrieval to weight words according to their importance to a document relative to the collection. Words that are frequent in one document but less common across others receive a high TF-IDF value, suggesting they could be crucial for comprehending the content of that document (Chiny et al., 2022).

Term Frequency (TF) is the number of times a

term appears in a document relative to the total word

count of that document. TF is calculated using Equa-

tion 1 as follows (Rajaraman and Ullman, 2011):

$$ tf(t, d) = \frac{N_{t,d}}{N_d}, \tag{1} $$

where $N_{t,d}$ represents the number of times that term t occurs in document d, and $N_d$ represents the total number of terms in the document d.

Inverse Document Frequency (IDF) measures the

rarity of a term across all documents. IDF is calcu-

lated using Equation 2 as follows (Rajaraman and Ull-

man, 2011):

$$ idf(t, D) = \log \frac{N}{\left| \{ d \in D : t \in d \} \right|}, \tag{2} $$

where N is the total number of documents in the corpus, N = |D|, and $|\{ d \in D : t \in d \}|$ is the number of documents in which the term t appears.

By combining Equation 1 and Equation 2, the TF-IDF score for term t in document d is calculated as follows:

$$ tfidf(t, d, D) = tf(t, d) \times idf(t, D). \tag{3} $$

Words with high TF-IDF scores in a document are

used more in that document and less in others, making

them key indicators of what the document is about.
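Equations 1 to 3 translate almost directly into code. The sketch below is illustrative only: the three-document corpus is made up, the natural logarithm is an assumption (the text does not state the base), and returning 0 for a term that appears in no document is a convenience choice.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Equation 1: occurrences of the term divided by the document length."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Equation 2: log(N / number of documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: unseen terms carry no weight
    return math.log(len(corpus) / containing)


def tfidf(term, doc_tokens, corpus):
    """Equation 3: product of term frequency and inverse document frequency."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical corpus of three tokenized overviews.
corpus = [
    "four dinosaurs in the prehistoric world".split(),
    "a ballet company in the city".split(),
    "the detective in the city".split(),
]
common = tfidf("the", corpus[0], corpus)          # appears in every document
distinct = tfidf("dinosaurs", corpus[0], corpus)  # appears in one document only
```

Here "the" occurs in all three documents, so its IDF is log(3/3) = 0 and its TF-IDF score vanishes, while "dinosaurs" occurs in a single document and receives a positive score.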

4.2 Singular Value Decomposition

(SVD)

Singular Value Decomposition (SVD) is a matrix decomposition method that expresses, or in truncated form approximates, a matrix as a product of three matrices (Kadhim et al.,

2017). This process allows us to uncover connections

in the data. For example, when we have information

about how users rated items such as movies, but not

every user rates every item, SVD comes in to com-

plete the missing information (Widiyaningtyas et al.,

2022). The SVD of an m × n complex matrix M is a

factorization of the form

$$ M = U \Sigma V^{T}, \tag{4} $$

where M is the original user-item rating matrix, U is the matrix where each row represents a user in terms of latent factors, Σ is a diagonal matrix with singular values that indicate the importance of each latent factor, and $V^{T}$ is the transpose of the matrix V, so that each column of $V^{T}$ represents an item in terms of latent factors.
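To make the link between Equation 4 and rating prediction explicit, the following sketch shows the usual rank-k truncation. The rank k and the hat notation are assumptions introduced here for illustration; they do not appear in the original text.

```latex
% Keep only the k largest singular values \sigma_1 \ge \dots \ge \sigma_k
% (k is an illustrative choice, much smaller than the number of users or items).
M \approx M_k = U_k \Sigma_k V_k^{T}

% The predicted rating of item j by user i is the (i, j) entry of M_k:
\hat{r}_{ij} = \sum_{f=1}^{k} U_{if} \, \sigma_f \, V_{jf}
```

Each of the k terms corresponds to one latent factor: user i's affinity for the factor, the factor's importance, and item j's loading on it.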

4.3 Alternating Least Squares (ALS)

Alternating Least Squares (ALS) is a technique that handles sparse data by breaking the matrix factorization problem into two smaller, more manageable subproblems: the item factors are held fixed while the user factors are solved for, and then the roles are swapped (Takács and Tikk, 2012). Unlike Singular Value Decomposition (SVD), which considers all entries in the user-item interaction matrix (including unknown or missing values), ALS focuses only on the known ratings. It also scales well to large datasets and integrates regularization directly to prevent overfitting, making it well suited to collaborative filtering (Pilászy et al., 2010).

ALS with Weighted-λ-Regularization is an enhancement to the standard ALS approach. It introduces a regularization term to the optimization process, which helps to avoid overfitting, a common problem where a model performs well on the training data but poorly on unseen data. The goal of ALS with

Weighted-λ-Regularization is to ﬁnd user and item

feature matrices that predict how users would rate

items, even new or previously unrated ones (Zhou

et al., 2008).

The effectiveness of this method is measured by

a loss function that captures two things (Zhou et al.,

2008):

• How well the model predicts the known ratings.

• How complex the model is (the size of the user

and item feature matrices).

The loss function is represented mathematically as:

$$ f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{T} m_j \right)^2 + \lambda \left( \sum_{i} n_{u_i} \lVert u_i \rVert^2 + \sum_{j} n_{m_j} \lVert m_j \rVert^2 \right), \tag{5} $$

where $r_{ij}$ is the actual rating of item j by user i, $u_i$ is the feature vector representing user i, $m_j$ is the feature vector representing item j, I is the set of all (user, item) pairs for which the rating is known, λ is the regularization weight that controls the trade-off between fitting the training data well and keeping the model simple to avoid overfitting, $n_{u_i}$ is the number of items rated by user i, which weighs the user's feature vector, and $n_{m_j}$ is the number of users who have rated item j, which weighs the item's feature vector.

The loss function with weighted regularization controls the complexity of the model and prevents overfitting by penalizing large values of the user

and item feature vectors.

ALS with Weighted-λ-Regularization is highly

suitable for large-scale datasets because of its abil-

ity to efﬁciently handle sparse user-item matrices by

focusing on observed interactions, reducing memory

requirements, and allowing for parallel computation.
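The alternating procedure and the loss in Equation 5 can be sketched in plain Python as follows. This is a small illustration under stated assumptions, not the implementation evaluated in this paper: the toy ratings, the random initialization, and the values of `k`, `lam`, and `iters` are all hypothetical, and a real system would use a distributed linear-algebra library rather than the hand-written solver.

```python
import random
from collections import Counter, defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination."""
    n = len(b)
    aug = [A[r][:] + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def update_row(fixed, observed, k, lam):
    """Minimize Equation 5 over one feature vector with the other side fixed.

    observed holds (index into fixed, rating) pairs; the normal equations are
    (sum f f^T + lam * n * I) x = sum r f, with n = len(observed).
    """
    n = len(observed)
    A = [[lam * n if a == b else 0.0 for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for idx, r in observed:
        f = fixed[idx]
        for a in range(k):
            rhs[a] += r * f[a]
            for b in range(k):
                A[a][b] += f[a] * f[b]
    return solve(A, rhs)


def loss(R, U, M, lam):
    """Equation 5 evaluated on the known ratings R = {(user, item): rating}."""
    n_u = Counter(i for i, _ in R)
    n_m = Counter(j for _, j in R)
    err = sum((r - dot(U[i], M[j])) ** 2 for (i, j), r in R.items())
    reg = sum(n * dot(U[i], U[i]) for i, n in n_u.items())
    reg += sum(n * dot(M[j], M[j]) for j, n in n_m.items())
    return err + lam * reg


def als_wr(R, n_users, n_items, k=2, lam=0.1, iters=10, seed=0):
    # Assumption: random initialization; k, lam and iters are illustrative.
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    M = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user, by_item = defaultdict(list), defaultdict(list)
    for (i, j), r in R.items():
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    history = [loss(R, U, M, lam)]
    for _ in range(iters):
        for i, obs in by_user.items():  # user step: item factors held fixed
            U[i] = update_row(M, obs, k, lam)
        for j, obs in by_item.items():  # item step: user factors held fixed
            M[j] = update_row(U, obs, k, lam)
        history.append(loss(R, U, M, lam))
    return U, M, history


# Toy data: {(user index, item index): rating}; unknown pairs are simply absent.
R = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 5}
U, M, history = als_wr(R, n_users=3, n_items=3)
```

Because every row update solves its least-squares subproblem exactly, the loss recorded in `history` does not increase from one sweep to the next. The per-user and per-item updates within a sweep are independent of one another, which is what makes the parallel computation mentioned above possible.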

4.4 Content-Based Filtering

Content-Based Filtering is a method used by recom-

mendation systems to suggest items to users based on

the characteristics of the items themselves rather than

on the user’s interaction with other users (Van Me-

teren and Van Someren, 2000). This method uses

item features (like overview, genre, director, cast in

movies) to recommend items similar to what the user

has liked and positively rated in the past (Philip et al.,

2014).

Several algorithms are commonly used in content-

based recommendation systems. TF-IDF is cho-

sen over traditional techniques because it provides a

more sophisticated way to evaluate the importance

of words (or terms) in the content (Van Meteren

and Van Someren, 2000). Unlike simple frequency

counts, TF-IDF accounts for the rarity of terms across

all documents, thus giving higher weight to terms that

are unique to a particular item (Permana and Wibowo,

2023). This is crucial in differentiating items with

similar but not identical content, as common terms do

not overly inﬂuence the similarity score.
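A content-based recommender built on TF-IDF might look like the sketch below. It assumes cosine similarity as the comparison measure, as in the system of Chiny et al. (2022); the text does not state which measure SHERPA uses. The movie titles and overviews are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf-idf weight} dict."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return shared / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical overviews; real input would be the Overview field per movie.
overviews = {
    "Movie A": "young dancers train at a ballet academy",
    "Movie B": "a ballet company prepares a new season",
    "Movie C": "dinosaurs roam the prehistoric world",
}
titles = list(overviews)
vectors = tfidf_vectors([overviews[t].split() for t in titles])


def most_similar(title):
    """Return the other title whose overview is closest to the given one."""
    i = titles.index(title)
    scored = [(cosine(vectors[i], vectors[j]), titles[j]) for j in range(len(titles)) if j != i]
    return max(scored)[1]
```

In this toy corpus "Movie A" and "Movie B" share the term "ballet", so `most_similar("Movie A")` selects "Movie B", while "Movie C" has no terms in common with "Movie A" and scores zero.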

4.5 Collaborative Based Filtering

Collaborative filtering is a recommendation system algorithm that forecasts a user's preferences by considering the preferences of other users (Hameed

et al., 2012). It operates on the premise that if users A

and B share viewpoints on an item, it is probable that

A will align with B’s perspective on another item that

A has not yet encountered (Wu et al., 2018) (Konstan

and Riedl, 2012). By analyzing user-item interactions

like ratings or viewing history, the algorithm detects

patterns and resemblances among users or items (Ni

et al., 2021) (Goyani and Chaurasiya, 2020). This ap-

proach enables tailored recommendations by tapping

into the preferences of the user community, making it

widely adopted in suggesting movies, music, and var-

ious products. Figure 1 illustrates the mechanisms of

collaborative and content-based ﬁltering techniques.

Collaborative ﬁltering recommends items by identify-

ing patterns among similar users, while content-based

ﬁltering suggests items based on their similarity to

content previously liked by the user.
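The premise about users A and B can be illustrated with a small neighborhood-based sketch. This is not the ALS method used by SHERPA: the ratings are made up, and the similarity function (the inverse of one plus the mean rating gap) is a deliberately simple, hypothetical choice.

```python
# Hypothetical ratings: A agrees with B and disagrees with C on m1 and m2.
ratings = {
    "A": {"m1": 5, "m2": 4},
    "B": {"m1": 5, "m2": 4, "m3": 5},
    "C": {"m1": 1, "m2": 2, "m3": 1},
}


def similarity(u, v):
    """Return 1 / (1 + mean absolute rating gap) over co-rated movies."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    mean_gap = sum(abs(ratings[u][m] - ratings[v][m]) for m in common) / len(common)
    return 1.0 / (1.0 + mean_gap)


def predict(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    pairs = [
        (similarity(user, other), theirs[movie])
        for other, theirs in ratings.items()
        if other != user and movie in theirs
    ]
    total = sum(s for s, _ in pairs)
    if total == 0:
        return None  # nobody comparable has rated this movie
    return sum(s * r for s, r in pairs) / total


predicted = predict("A", "m3")
```

User B's ratings match A's exactly (similarity 1.0), whereas C's differ by 3 stars on average (similarity 0.25), so the prediction for A on "m3" is pulled toward B's 5 stars: (1.0 × 5 + 0.25 × 1) / 1.25 = 4.2.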

Figure 1: Comparison of collaborative and content-based filtering.

4.6 Hybrid Filtering

A hybrid filtering algorithm enhances recommendation systems by merging collaborative and content-based filtering strategies, leveraging the strengths of each to compensate for the other's shortcomings (Goyani and Chaurasiya, 2020)(Sharma et al., 2022). This strategy integrates the Singular Value Decomposition (SVD) technique, which forecasts user preferences based on patterns in user-item interactions, with TF-IDF, which examines item content to gauge its significance (Burke, 2002)(Thorat et al., 2015). By merging the personalized forecasts of SVD and the content specificity of TF-IDF, the hybrid model provides varied and thorough recommendations, effectively tackling issues such as the cold start problem and enhancing recommendation accuracy (Parthasarathy and Sathiya Devi, 2023).

4.7 SHERPA

The proposed Smart Hybrid Enhanced Recommen-

dation and Personalization Algorithm (SHERPA) is

a recommendation system that intelligently combines

the strengths of two methods: Alternating Least

Squares (ALS) with Weighted Regularization for col-

laborative ﬁltering, and Term Frequency-Inverse Doc-

ument Frequency (TF-IDF) for content-based ﬁlter-

ing, as shown in Figure 2.

By utilizing ALS with Weighted-λ-Regularization, SHERPA works on user-item interaction data (the known ratings) and handles its sparsity by optimizing the matrix factorization with a regularized loss function that avoids overfitting. The user and item matrices are computed independently, which allows the work to be distributed across multiple processors or nodes in a cluster. At the same time, the incorporation of TF-IDF allows SHERPA to work on item content data by assigning weights ('Overview' - 45%, 'Genre' - 25%, 'Director' - 15%, 'Cast' - 15%) to movie attributes based on their importance in a document. This weighting scheme helps identify the most distinctive and relevant terms for each document and transforms text-based movie attributes into numerical vectors. This vectorization allows the system to quantify and compare movie characteristics mathematically.

Figure 2: SHERPA Recommendation System Architecture.

This dual strategy, working on both rating data and content data, enables SHERPA to handle large datasets effectively and supports scalability and parallelization. It addresses the limitations of traditional methods to deliver more relevant recommendations and enhance user satisfaction.
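One plausible way to apply the attribute weights is a weighted sum of per-attribute similarity scores, sketched below. The text gives the weights but not the exact combination rule, so the weighted sum, the function name `combined_score`, and the candidate scores are all assumptions made for illustration.

```python
# Attribute weights as stated in the text (Overview 45%, Genre 25%, ...).
FIELD_WEIGHTS = {"overview": 0.45, "genre": 0.25, "director": 0.15, "cast": 0.15}


def combined_score(field_scores):
    """Weighted sum of per-attribute similarity scores, each in [0, 1].

    Assumption: a missing attribute contributes a score of 0.
    """
    return sum(w * field_scores.get(f, 0.0) for f, w in FIELD_WEIGHTS.items())


# Hypothetical per-attribute similarities of two candidates to a query movie.
candidates = {
    "Movie X": {"overview": 0.8, "genre": 1.0, "director": 0.0, "cast": 0.2},
    "Movie Y": {"overview": 0.3, "genre": 1.0, "director": 1.0, "cast": 0.5},
}
ranked = sorted(candidates, key=lambda t: combined_score(candidates[t]), reverse=True)
```

With these numbers "Movie X" scores 0.64 and "Movie Y" scores 0.61: the heavier weight on the overview outweighs Y's exact director match.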

5 EXPERIMENT AND

EVALUATION

To demonstrate the capabilities of the proposed SHERPA algorithm, we conducted a series of experiments. The experimental setup requires a dual-core processor and at least 2 GB of RAM for general system operation. For the computationally intensive tasks of training and testing, a GPU with a minimum of 2 GB of VRAM is necessary. Examples of suitable GPUs include the NVIDIA GTX 1050 or higher-end models.

5.1 Evaluation Metric

Root Mean Square Error (RMSE) is a standard way

to measure the error of a model in predicting quan-

titative data (Hyndman and Koehler, 2006). It’s par-

ticularly useful in recommender systems to evaluate

the difference between predicted and actual ratings.

RMSE provides a way to quantify the magnitude of

prediction errors, taking the square root of the aver-

age squared differences between the prediction and

the actual observation. The formula of RMSE is:

$$ RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( p_i - a_i \right)^2 }, \tag{6} $$

where $p_i$ represents the predicted value for the ith instance, $a_i$ is the actual value for the ith instance, and N is the total number of instances.

A lower RMSE value indicates a better ﬁt of

the model to the data. It’s especially effective in

highlighting the impact of large errors, given that it

squares the differences before averaging. However, it

should be noted that RMSE can be sensitive to outliers

and might not be well-suited when the errors are heavy-tailed or otherwise unevenly distributed.

In the context of our paper, RMSE will serve as

a key indicator of the accuracy of our recommenda-

tion system’s predictions, allowing us to ﬁne-tune the

algorithm for optimal performance.
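Equation 6 is short enough to state directly in code. The sketch below uses made-up predictions and ratings to show both the calculation and the sensitivity to a single large error; the function name `rmse` is illustrative.

```python
import math


def rmse(predicted, actual):
    """Equation 6: square root of the mean squared prediction error."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Two small errors of one star each.
small_errors = rmse([3, 5], [4, 4])
# Three perfect predictions and one that is off by four stars.
one_outlier = rmse([1, 1, 1, 1], [1, 1, 1, 5])
```

In the second case the mean absolute error would be 1 star, but the RMSE is 2 because the single four-star miss is squared before averaging.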

5.2 Evaluation Scenarios

We have designed two distinct scenarios to evaluate

the performance of the SHERPA algorithm. One is

designed for the existing users and the other is for new

users. These scenarios are constructed to evaluate the

system’s responsiveness to each user’s unique needs

whether they’re browsing casually or conducting spe-

ciﬁc searches based on their past interactions.

5.2.1 For Existing Users

For existing users, we designed two different scenar-

ios to evaluate the proposed algorithm. One is to rec-

ommend movies to existing users who log in but do

not conduct any search; the other is to recommend

movies to existing users who log in and search for a keyword.

Existing User Log in and Without Search.

When an existing user logs in without conducting a search, the system uses their past interactions to recommend movies. Since the user is simply browsing, collaborative filtering is used. This involves the algorithm analyzing the activities of users with similar interests and suggesting movies that those users have enjoyed.

The following are the recommendation results

from Hybrid and SHERPA approaches, the top 10

movies for existing user id = 401047 and without

search keyword:

HYBRID Results:

1. Unknown Pleasures

2. The Swindle

3. Saint Sinner

4. Lone Wolf and Cub: Baby Cart in Peril

5. Die Hard 2: Die Harder

6. Seems Like Old Times

7. Kati Patang

8. Korn: Deuce

9. Hocus Pocus

10. The Usual Suspects

SHERPA Results:

1. Mel Gibson’s Passion of the Christ

2. The Best of Friends: Vol. 4

3. Stargate SG 1: Season 7

4. The Winds of War

5. Stargate SG 1: Season 8

6. Friends: Season 6

7. Alias: Season 3

8. 24: Season 1

9. CSI: Season 3

10. Shania Twain: Up Close and Personal

Existing User Log in and Search with Keyword.

When an existing user logs in and searches for a term like "The Company", the system transitions to the hybrid recommendation method. It combines the user's data with the search query to suggest options that cater not only to popular choices or similar users but also to results directly related to the search term.

The following are the recommendation results from the Hybrid and SHERPA approaches: the top 10 movies for the existing user with ID 401047 and the search keyword "The Company":

HYBRID Results:

1. Center Stage

2. Ballet Favorites

3. Expo: Magic of the White City

4. A Raisin in the Sun

5. Robin and the 7 Hoods

6. Unknown Pleasures

7. Out of Sync

8. Orchestra Rehearsal

9. Category 6: Day of Destruction

10. The Usual Suspects

SHERPA Results:

1. Center Stage

2. Ballet Favorites

3. Expo: Magic of the White City

4. A Raisin in the Sun

5. Robin and the 7 Hoods

6. Swan Lake: Tchaikovsky (Matthew Bourne)

7. Out of Sync

8. Orchestra Rehearsal

9. Category 6: Day of Destruction

10. What Have I Done to Deserve This?

5.2.2 For New Users

New User Log in and Search with Keyword: When a new user looks up a term like "The Company" without any viewing history, the algorithm uses content-based filtering. This approach analyzes factors such as the genre, storyline, and actors of a movie to suggest movies with content similar to "The Company". The aim is to provide tailored recommendations based solely on the search query.

The following are the recommendation results from the Hybrid and SHERPA approaches: the top 10 movies for a new user with the search keyword "The Company":

HYBRID Results:

1. Center Stage

2. Ballet Favorites

3. Expo: Magic of the White City

4. A Raisin in the Sun

5. Robin and the 7 Hoods

6. Unknown Pleasures

7. Out of Sync

8. Orchestra Rehearsal

9. Category 6: Day of Destruction

10. The Usual Suspects

SHERPA Results:

1. Center Stage

2. Ballet Favorites

3. Expo: Magic of the White City

4. A Raisin in the Sun

5. Robin and the 7 Hoods

6. Swan Lake: Tchaikovsky (Matthew Bourne)

7. Out of Sync

8. Orchestra Rehearsal

9. Category 6: Day of Destruction

10. What Have I Done to Deserve This?

5.3 Results

In this section, we compare the SHERPA algorithm's performance against traditional hybrid systems using the Root Mean Square Error (RMSE) metric across the training, test, and validation datasets, as detailed below:

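Table 1: RMSE comparison of the Hybrid and SHERPA algorithms.

| Models      | Training RMSE | Test RMSE | Validation RMSE |
|-------------|---------------|-----------|-----------------|
| Hybrid      | 2.8289        | 2.9487    | 2.9492          |
| SHERPA      | 0.8606        | 0.9039    | 0.9041          |
| Improvement | 69.6%         | 69.4%     | 69.3%           |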

The comparison of the Hybrid and SHERPA al-

gorithms across training, test, and validation datasets

reveals signiﬁcant differences in their performance.

In the training dataset, the Hybrid model shows an

RMSE of 2.8289, indicating some challenges in un-

derstanding user preferences, while SHERPA im-

pressively reduces this to 0.8606, marking a sub-

stantial 69.6% improvement. Moving to the test

dataset, Hybrid exhibits an RMSE of 2.9487, sug-

gesting occasional inaccuracies, whereas SHERPA

achieves a more reliable RMSE of 0.9039, a 69.4%

enhancement. In the validation dataset, Hybrid scores

2.9492 in RMSE, highlighting room for improve-

ment, whereas SHERPA excels with an RMSE of

0.9041, showcasing consistent and reliable perfor-

mance.
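The improvement percentages are consistent with a relative reduction in RMSE. The formula below is our reading of how they were derived, since the text does not state it explicitly; the worked instance uses the training figures reported above.

```latex
\text{Improvement} =
  \frac{RMSE_{\text{Hybrid}} - RMSE_{\text{SHERPA}}}{RMSE_{\text{Hybrid}}} \times 100\%

% Training dataset:
\frac{2.8289 - 0.8606}{2.8289} \times 100\%
  = \frac{1.9683}{2.8289} \times 100\% \approx 69.6\%
```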

SHERPA’s success is attributed to its ad-

vanced matrix factorization technique, weighted-λ-

regularization, parallelization for scalability, com-

putational efﬁciency, hybrid ﬁltering approach, and

continuous learning, which collectively result in a

70% improvement over traditional Hybrid algorithms.

SHERPA’s balanced approach ensures both technical

superiority and a more personalized recommendation

experience for users.

6 CONCLUSION

This paper introduced Smart Hybrid Enhanced

Recommendation and Personalization Algorithm

(SHERPA), an advanced machine learning algo-

rithm created to enhance and personalize the movie

recommendation process. By combining content-

based ﬁltering using TF-IDF and collaborative ﬁl-

tering through ALS with Weighted Regularization,

SHERPA has shown an improvement in recommen-

dation accuracy and user satisfaction.

Through analysis using the RMSE metric, SHERPA's performance was compared to that of traditional hybrid models. Notably, SHERPA achieved a decrease in prediction error of around 70% across the training, test, and validation datasets. This emphasizes the algorithm's improved capability to comprehend and forecast user preferences and to provide relevant content suggestions. Moreover, SHERPA's methodology tackles issues seen in existing recommendation systems, such as overfitting and the cold start problem. This ensures a scalable solution that caters to user interactions. Its proficiency in managing large datasets and customizing content based on user behaviors as well as item traits sets a new standard in recommendation system technology.

In summary, the SHERPA algorithm signifies a progression in recommendation systems. SHERPA enhances the user's content discovery experience and also paves the way for advancements in machine learning and artificial intelligence research and development. In a changing world, personalized recommendation systems like SHERPA play a crucial role in driving future innovations.

ACKNOWLEDGMENTS

This project is based upon work supported by the

U.S. National Science Foundation under Grant No.

2142503.

REFERENCES

Burke, R. (2002). Hybrid recommender systems: Survey

and experiments. User Modeling and User-adapted

Interaction, 12:331–370.

Chiny, M., Chihab, M., Bencharef, O., and Chihab, Y.

(2022). Netﬂix recommendation system based on tf-

idf and cosine similarity algorithms. In Proceedings of

the 2nd International Conference on Big Data, Mod-

elling and Machine Learning, pages 15–20.

Goyani, M. and Chaurasiya, N. (2020). A review of

movie recommendation system: Limitations, survey

and challenges. Electronic Letters on Computer Vi-

sion and Image Analysis, 19(3):0018–37.

Hameed, M. A., Al Jadaan, O., and Ramachandram, S.

(2012). Collaborative ﬁltering based recommendation

system: A survey. International Journal on Computer

Science and Engineering, 4(5):859.

Hyndman, R. J. and Koehler, A. B. (2006). Another look at

measures of forecast accuracy. International journal

of forecasting, 22(4):679–688.

Kadhim, A. I., Cheah, Y.-N., Hieder, I. A., and Ali, R. A.

(2017). Improving tf-idf with singular value decom-

position (svd) for feature extraction on twitter. In

3rd international engineering conference on develop-

ments in civil and computer engineering applications.

Konstan, J. A. and Riedl, J. (2012). Recommender systems:

from algorithms to user experience. User Modeling

and User-adapted Interaction, 22:101–123.

Netﬂix (2006). Netﬂix prize data on kaggle.com. Accessed:

2024-09-06.

Ni, J., Cai, Y., Tang, G., and Xie, Y. (2021). Col-

laborative ﬁltering recommendation algorithm based

on tf-idf and user characteristics. Applied Sciences,

11(20):9554.

Parthasarathy, G. and Sathiya Devi, S. (2023). Hybrid

recommendation system based on collaborative and

content-based ﬁltering. Cybernetics and Systems,

54(4):432–453.

Permana, A. H. J. P. J. and Wibowo, A. T. (2023).

Movie recommendation system based on synopsis us-

ing content-based ﬁltering with tf-idf and cosine simi-

larity. International Journal on Information and Com-

munication Technology, 9(2):1–14.

Philip, S., Shola, P., and Ovye, A. (2014). Application of

content-based approach in research paper recommen-

dation system for a digital library. International Jour-

nal of Advanced Computer Science and Applications,

5(10).

Pilászy, I., Zibriczky, D., and Tikk, D. (2010). Fast als-

based matrix factorization for explicit and implicit

feedback datasets. In Proceedings of the 4th ACM

conference on Recommender systems, pages 71–78.

Rahul, M., Kumar, V., and Yadav, V. (2021). Movie recom-

mender system using single value decomposition and

k-means clustering. In IOP Conference Series Ma-

terials Science and Engineering, volume 1022. IOP

Publishing.

Rajaraman, A. and Ullman, J. D. (2011). Mining of massive

datasets. Autoedicion.

Sharma, S., Rana, V., and Malhotra, M. (2022). Automatic

recommendation system based on hybrid ﬁltering al-

gorithm. Education and Information Technologies,

27(2):1523–1538.

Son, J. and Kim, S. B. (2017). Content-based ﬁltering

for recommendation systems using multiattribute net-

works. Expert Systems with Applications, 89:404–

412.

SurvyanaWahyudi, I., Affandi, A., and Hariadi, M. (2017).

Recommender engine using cosine similarity based on

alternating least square-weight regularization. In In-

ternational Conference on Quality in Research (QiR):

International Symposium on Electrical and Computer

Engineering, pages 256–261. IEEE.

Takács, G. and Tikk, D. (2012). Alternating least squares

for personalized ranking. In Proceedings of the sixth

ACM conference on Recommender systems, pages 83–

90.

Thorat, P. B., Goudar, R. M., and Barve, S. (2015). Survey

on collaborative ﬁltering, content-based ﬁltering and

hybrid recommendation system. International Jour-

nal of Computer Applications, 110(4):31–36.

Van Meteren, R. and Van Someren, M. (2000). Using

content-based ﬁltering for recommendation. In Pro-

ceedings of the machine learning in the new informa-

tion age: MLnet/ECML2000 workshop, volume 30,

pages 47–56. Barcelona.

Widiyaningtyas, T., Ardiansyah, M. I., and Adji, T. B.

(2022). Recommendation algorithm using svd and

weight point rank (svd-wpr). Big Data and Cognitive

Computing, 6(4):121.

Wu, C. S. M., Garg, D., and Bhandary, U. (2018). Movie

recommendation system using collaborative ﬁltering.

In International Conference on Software Engineering

and Service Science (ICSESS), pages 11–15. IEEE.

Zhou, Y. H., Wilkinson, D., Schreiber, R., and Pan, R.

(2008). Large-scale parallel collaborative ﬁltering for

the netﬂix prize. In The 4th International Conference

on Algorithmic Aspects in Information and Manage-

ment, pages 337–348. Springer.
