Figure 1: UCB Algorithm results.
Figure 2 displays the cumulative regrets and
cumulative rewards obtained with four exploration
coefficients β in the LinUCB algorithm on the given
dataset. Performance is poorest when β = 2, while
β = 1 and β = 4 perform comparatively better. Over
the given number of steps, an average reward above
3.5 can be achieved.
Figure 2: LinUCB Algorithm results.
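To make the role of β concrete, the listing below gives a minimal sketch of a disjoint LinUCB arm in Python; the class and variable names are illustrative assumptions rather than the paper's implementation, and β is the exploration coefficient varied in Figure 2.

import numpy as np

class LinUCBArm:
    # One arm of a disjoint LinUCB model; beta is the exploration coefficient.
    def __init__(self, dim, beta):
        self.beta = beta
        self.A = np.eye(dim)      # ridge-regression Gram matrix
        self.b = np.zeros(dim)    # reward-weighted sum of observed contexts

    def ucb_score(self, x):
        # estimated reward plus a confidence width scaled by beta
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + self.beta * np.sqrt(x @ A_inv @ x))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

At each step the arm with the largest ucb_score is recommended; a larger β widens the confidence term and forces more exploration, which is the trade-off examined in Figure 2.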
Figure 3 illustrates the cumulative regrets and
cumulative rewards of the mixed algorithms described
earlier, together with the results of the UCB algorithm
to aid the comparison of algorithm performance. The
Weighted Averaging approach performs significantly
worse than the other algorithms, indicating that the
allocation of exponential weights has a significant
impact on the results. The Dynamic Adjustment
approach is the best-performing algorithm in this
environment: by adopting the better-performing
algorithm every 1000 steps, it achieves the highest
average reward, approximately 4 per round.
Figure 3: Multi-approach comparison.
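As an illustration of the Dynamic Adjustment approach, the sketch below shows one way such a block-wise switch could be organized, assuming hypothetical agent objects exposing select/update and an environment exposing observe/pull; it is not the paper's exact implementation, only a schematic of adopting the better-performing algorithm every 1000 steps.

import numpy as np

def dynamic_adjustment(agents, env, horizon, window=1000):
    # Play one agent at a time; every `window` steps, switch to whichever
    # agent has the higher average reward over the steps it has played.
    sums = np.zeros(len(agents))
    counts = np.zeros(len(agents))
    rewards = []
    active = 0
    for t in range(horizon):
        x = env.observe()                  # current user/item context
        arm = agents[active].select(x)     # e.g. agents = [ucb, linucb]
        r = env.pull(arm)
        agents[active].update(x, arm, r)
        sums[active] += r
        counts[active] += 1
        rewards.append(r)
        if (t + 1) % window == 0:
            if np.any(counts == 0):        # give every agent at least one block
                active = int(np.argmin(counts))
            else:                          # then keep the better performer
                active = int(np.argmax(sums / counts))
    return np.array(rewards)

Cumulative reward and regret curves such as those in Figure 3 can then be computed directly from the returned reward sequence.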
4 CONCLUSIONS
In summary, this paper proposes a novel
recommendation system based on the Multi-Armed
Bandit model, integrating the UCB and LinUCB
algorithms. The system dynamically balances
exploration and exploitation to effectively optimize
user satisfaction and system performance, thus
addressing the challenges of personalized
recommendation.
Empirical evaluations demonstrate the effectiveness
and feasibility of the hybrid algorithm in various
scenarios. However, limitations remain; for example,
the context-aware hybrid algorithm cannot effectively
demonstrate its advantages in environments where
the data is complete. Future research could explore
more advanced techniques, such as integrating
additional contextual information or enhancing the
system's adaptive learning capabilities. In addition,
real-world deployments and user studies would
provide valuable insights into the practical utility
and user acceptance of the recommendation system.