3.4 Comparison of the Two Algorithms
After applying each algorithm individually, I compare the performance of the EUCBV and UCB algorithms in order to draw a conclusion. The experiment compared the cumulative regret of the UCB, EUCBV, and Thompson Sampling algorithms, with the following result:
Figure 8: Cumulative regret comparison.
As the figure shows, the algorithms produce different regret on the same dataset. From Figure 8, the EUCBV algorithm has the lowest regret. According to the experimental findings, the EUCBV algorithm outperforms the UCB algorithm for movie recommendation. In particular, EUCBV converges to a neighbourhood of the optimal solution faster over long time scales and earns a higher cumulative reward over the same horizon. The primary reason is that EUCBV takes the variance of the rewards into account, which lets it assess the uncertainty of each action more precisely and avoid the over-exploration that the UCB algorithm is prone to.
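To make the contrast concrete, the sketch below compares the classic UCB1 index with a variance-aware index in the spirit of UCB-V/EUCBV. The full EUCBV rule also involves arm elimination and tuned constants, so this is a simplified illustration, and all the numeric inputs are hypothetical.

```python
import math

def ucb1_index(mean, n, t):
    # Classic UCB1: the exploration bonus depends only on the pull
    # count n and the round t, never on how noisy the arm actually is.
    return mean + math.sqrt(2 * math.log(t) / n)

def ucbv_index(mean, var, n, t):
    # Variance-aware bonus in the spirit of UCB-V / EUCBV (sketch):
    # an arm with low empirical variance gets a smaller bonus,
    # which curbs the over-exploration of low-noise arms.
    return mean + math.sqrt(2 * var * math.log(t) / n) + 3 * math.log(t) / n
```

For a well-sampled, low-variance arm (e.g. empirical mean 0.5, variance 0.05, n = 100 pulls at round t = 1000), the variance-aware bonus is smaller than the UCB1 bonus, which is exactly the mechanism described above.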
The final result shows that both the UCB and EUCBV algorithms can contribute to a movie recommendation system. Although there is a gap in overall performance between the two algorithms, both effectively balance exploration and exploitation and let the final rating regret settle to a stable value. Practitioners can consider using them in a movie recommendation system, helping them decide which kinds of movies to recommend to the audience in order to obtain higher ratings and income.
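As a sketch of how such a system could treat genres as arms, the following minimal simulation runs UCB1 on Bernoulli rewards and accumulates the pseudo-regret against the best genre. The per-genre probabilities of a highly rated recommendation are hypothetical, not taken from the paper's dataset.

```python
import math
import random

def run_ucb1(probs, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms (here: movie genres, with a
    hypothetical probability that each genre's recommendation is rated
    highly) and return the cumulative pseudo-regret vs. the best arm."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    best = max(probs)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: play each arm once
        else:
            arm = max(
                range(k),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        regret += best - probs[arm]  # expected loss of this pull
    return regret

# Hypothetical probabilities that a user rates each genre's movie highly.
genre_probs = [0.65, 0.50, 0.45, 0.30]
```

Because UCB1 concentrates its pulls on the best genre, the cumulative regret grows far more slowly than the worst-case linear rate; an EUCBV variant would use the same loop with a variance-aware index.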
4 CONCLUSION
In summary, in this study I introduce and apply two algorithms, UCB and EUCBV, in the context of Multi-Armed Bandit (MAB) problems. Through theoretical analysis and application to real-world scenarios, I have obtained the following key findings:
First, with regard to dataset size: on large-scale problems, the performance of EUCBV is much better than that of the traditional UCB algorithm. In addition, the EUCBV algorithm has better flexibility and adaptability, which makes up for the drawbacks of the traditional UCB algorithm.
Second, by applying the algorithms to the movie rating dataset, I found that the UCB and EUCBV algorithms have the potential to improve the movie recommendation system. Applying them to the recommendation system improves the effectiveness of deciding which genres of movie to recommend to the audience. Not only can this improve the average rating of the movies, but it also saves the time the system needs to learn which genres the audience prefers most.
Despite the improved performance of EUCBV compared to UCB, further research and development are still needed. Future work should explore and tune the parameters of the EUCBV algorithm and apply it to different cases to test its effectiveness. Furthermore, future study can focus on exploring more efficient algorithms to enhance the efficacy and efficiency of addressing the Multi-Armed Bandit problem.