respect to the likes and dislikes. In particular, the
paper intends to show that by applying sensitivity
analysis to the dataset, it is possible to identify the
factors that play important roles in the popularity of
the video (Aprem & Anup, 2017). The paper is
organized as follows. The second section reviews the
literature on
Sensitivity Analysis of YouTube videos. The third
section discusses the Methodology used to find the
results. The fourth section introduces the dataset and
its details. The fifth section gives detailed information
on the results and the analysis of the algorithms used.
The conclusion follows.
2 LITERATURE REVIEW
Data mining has been extensively examined in
YouTube, which is one of the most popular places for
user-generated content. Sensitivity and Sentiment
Analysis are two of the most popular peer-study
topics on YouTube. Despite its importance, trending
video analysis on YouTube has yet to be properly
examined. Many people have looked at the YouTube
recommendation system, but trending video analysis
still has a lot of room for improvement. Studies on the
popularity of videos have mainly focused on the
viewcount as a single metric (Zeni, Miorandi, & De
Pellegrini, 2016). But recently, other metrics have
risen to significant importance:
(a) According to studies, the YouTube
recommender uses the watchtime as a metric
for understanding how popular a video is
(Zeni, Miorandi, & De Pellegrini, 2016).
(b) The various meta-level attributes can be
used to build a conversion funnel to
characterize the impact of advertisement
campaigns (Zeni, Miorandi, & De
Pellegrini, 2016; Abdulhadi Shoufan, 2019).
YouTube has become the most popular place to
watch videos online. Given the diversity of viewers
and content providers, it is difficult to determine the
popularity of the videos on the basis of the meta-level
attributes. Viral videos play an important role in
business marketing to reach a target audience in a short
time-span (Gohar Feroz Khan & Sokha Vong, 2014).
Content creators can monetize their successful videos
through YouTube’s Partner program and enhance
their video popularity with the most sensitive meta-
level attributes like title, tag, thumbnail, etc. YouTube
uses a combination of measures to analyze and
provide a framework of understanding at different
levels.
Viewcount is an important popularity metric in
YouTube (Niyati Aggrawal, Anuja Arora, & Adarsh
Anand, 2018; Jussara M. Almeida, Flavio Figueiredo,
& Fabrício Benevenuto, 2011). Studies have
established that in the social dynamics setting, there
exists a causal relationship between the views and the
number of subscribers (William Hoiles, Anup Aprem,
& Vikram Krishnamurthy, 2017; Yan Duan &
Vikram Krishnamurthy, 2017). Two algorithms have
been used: Ordinary Least Squares regression and
Stochastic Gradient Descent. Ordinary Least Squares
assesses the relationship between one or more
independent variables and a dependent variable by
fitting a straight line that minimizes the sum of
squared differences between the observed and
predicted values of the dependent variable.
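To make the least-squares criterion concrete, the following is a minimal, dependency-free sketch rather than the implementation used in the cited studies. It chooses an intercept and one coefficient per predictor so that the sum of squared differences between observed and predicted values is as small as possible, by solving the normal equations. The helper names fit_ols and solve_linear_system and the sample numbers are hypothetical and are not taken from any dataset discussed here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] on copies, so the inputs stay untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Return [intercept, b_1, ..., b_k] minimizing the sum of squared residuals."""
    # Prepend a constant 1 to each row so the first coefficient is the intercept.
    design = [[1.0] + list(map(float, row)) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, illustrative numbers only: views follow an exact linear rule.
likes = [10, 20, 30, 40, 50]
dislikes = [1, 3, 2, 5, 4]
views = [100 + 8 * lk + 15 * dl for lk, dl in zip(likes, dislikes)]

beta = fit_ols(list(zip(likes, dislikes)), views)
print([round(b, 6) for b in beta])  # intercept, likes coefficient, dislikes coefficient
```

Because the synthetic views are an exact linear function of likes and dislikes, the fitted coefficients should recover the generating rule up to floating-point error. On real data the residuals would not vanish, and the coefficients would only describe the best straight-line approximation.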
3 METHODOLOGY
The main goal of this study is to determine how the
independent attributes affect the dependent attribute.
Likes and dislikes are the independent attributes that
have an impact on the dependent variable – view
count. To achieve this, two regression algorithms,
Ordinary Least Squares and Stochastic Gradient
Descent, were used to build a prediction model
(Quyu Kong, Marian-Andrei Rizoiu,
Siqi Wu, & Lexing Xie, 2018). The algorithms
determined the sensitivity of the view count against
the likes and dislikes and were then compared for
accuracy (Lau Tian Rui, Zehan Afizah Afif, & R. D.
Rohmat Saed, 2019).
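As an illustration of the second algorithm, the sketch below fits a linear model of view count on likes and dislikes by stochastic gradient descent. It is a hedged example under stated assumptions, not the study's implementation: the data are synthetic, the functions standardize, fit_sgd and r_squared are hypothetical helpers, the learning rate and epoch count are arbitrary, and the coefficient of determination is assumed here as the accuracy measure. The variables are standardized first, an added design choice, because raw counts on very different scales tend to make the gradient steps unstable; the fitted weights can then be read as a rough sensitivity of the view count to each attribute.

```python
import random
import statistics


def standardize(values):
    """Rescale a sequence to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def fit_sgd(rows, targets, learning_rate=0.01, epochs=50, seed=0):
    """Fit a linear model by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order
        for i in order:
            prediction = bias + sum(w * x for w, x in zip(weights, rows[i]))
            error = prediction - targets[i]
            # One-sample gradient step for the loss 0.5 * error**2.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, rows[i])]
            bias -= learning_rate * error
    return weights, bias


def r_squared(rows, targets, weights, bias):
    """Coefficient of determination of the fitted model on the given data."""
    mean_target = statistics.mean(targets)
    ss_res = sum((t - bias - sum(w * x for w, x in zip(weights, row))) ** 2
                 for row, t in zip(rows, targets))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


# Synthetic data; the coefficients 12 and 30 are invented for illustration.
rng = random.Random(42)
likes = [rng.uniform(0, 1000) for _ in range(200)]
dislikes = [rng.uniform(0, 100) for _ in range(200)]
views = [500 + 12 * lk + 30 * dl + rng.gauss(0, 200)
         for lk, dl in zip(likes, dislikes)]

rows = list(zip(standardize(likes), standardize(dislikes)))
targets = standardize(views)
weights, bias = fit_sgd(rows, targets)
score = r_squared(rows, targets, weights, bias)
print("standardized weights (likes, dislikes):", weights)
print("R^2 on the training data:", score)
```

In this synthetic setup the weight on likes comes out larger than the weight on dislikes, which only reflects how the data were generated. A comparison of the two algorithms would apply the same accuracy measure to a least-squares fit of the same data.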
Figure 1: Representation of Methodology.
The raw data was pre-processed and transformed into
an efficient and usable form. The first
step of pre-processing was data cleaning. In this step,
all the inconsistent and incomplete data was removed.
The dataset, which consisted of various categories,
was cleaned of every category except for Sports and
Media. In the second step, feature selection was
performed. For this research, only four attributes were
required: Category id, views, likes and dislikes. In a
predictive model, feature selection is the process of