processed data from the brief records of more than
300,000 patients over a four-year period with a CNN
model to analyze the diseases that might subsequently
occur, demonstrating the great potential of machine
learning in the medical field (Cheng, et al. 2016).
Similarly, Choi et al., in the same year, used another
deep learning approach, an RNN, to predict the symptoms
of heart failure in patients (Choi, et al. 2017). This
again shows that machine learning has many application
scenarios in the medical field. Beyond the medical
field, machine learning can also be applied to the
prediction of energy utilization. Jun Wang et al.
accurately predicted the output power of a photovoltaic
power generation system using a combination of
algorithms and neural networks, which was of great
help to researchers (Yu & Xu 2014). In
addition, Azad, Md Shawmoon et al. combined the Theory
of Planned Behavior (TPB) with machine learning methods
to construct a new model that predicts consumer
purchasing behavior on social platforms, and used it
to summarize the main factors that consumers care
about (Shawmoon, et al. 2023). These principles and
methods, combined with the modeling approach, play a
valuable auxiliary role in the management and sale of
products.
The main objective of this paper is to determine the
weights of the factors affecting positive reviews of
games by processing a multi-factor dataset from the
Steam platform, and to find a better model for
analyzing the data, ultimately giving game companies
and others a way to understand players' intentions
and make better decisions. In this paper, the
authors apply a variety of machine learning
algorithms, including logistic regression, SVM,
decision trees, random forests, XGBoost, and LightGBM,
to the sample dataset in turn, and compare the
performance of the seven models using accuracy and
area under the curve (AUC) in order to determine the
most effective model. The experiments show that
LightGBM is the most effective of these models, with
an accuracy of 85.62% and an AUC of 76.95%,
outperforming the other models on both evaluation
criteria.
The model is intended to serve as a strategic tool for
game developers and companies, helping them make
quick and informed decisions and ultimately reducing
investment risk. In addition, it provides valuable
insights that help community administrators manage
the player community more effectively, thus
contributing to the overall maintenance and
development of the gaming community.
2 METHODOLOGY
2.1 Dataset Description and Preprocessing
This study uses a dataset called "Steam Store Games"
available on the Kaggle platform (Kaggle 2019).
Collected from the Steam Store and SteamSpy APIs, the
dataset covers nearly 27,000 games and provides
information on various aspects of each game in the
Steam store, such as the type of game, number of
owners, developer and publisher information, game
tags, price, and other characteristics. The authors
then fix several problems with the dataset by removing
rows with missing ratings, correcting column types,
removing duplicates, and so on. Some games, for
instance, need their missing user rating data handled
before they can be analyzed.
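The cleaning steps described above (dropping rows with
missing ratings, correcting column types, and removing
duplicates) might be sketched as follows. This is only
an illustration under stated assumptions, not the
authors' actual code: the column names (appid, name,
positive_ratings, negative_ratings, price) and the
sample rows are hypothetical, and only the Python
standard library is used.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumptions made for
# illustration and are not confirmed by the paper.
RAW = """appid,name,positive_ratings,negative_ratings,price
10,Game A,950,50,7.19
10,Game A,950,50,7.19
20,Game B,,12,3.99
30,Game C,400,600,0.0
40,Game D,15,,9.99
"""


def clean_rows(text):
    """Drop rows with missing ratings, cast column types, remove duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove rows whose rating counts are missing.
        if row["positive_ratings"] == "" or row["negative_ratings"] == "":
            continue
        # Cast the text fields to numeric types.
        record = {
            "appid": int(row["appid"]),
            "name": row["name"],
            "positive_ratings": int(row["positive_ratings"]),
            "negative_ratings": int(row["negative_ratings"]),
            "price": float(row["price"]),
        }
        # Keep only the first occurrence of each appid.
        if record["appid"] in seen:
            continue
        seen.add(record["appid"])
        cleaned.append(record)
    return cleaned


games = clean_rows(RAW)
print([g["appid"] for g in games])
```

In this sketch duplicates are identified by appid alone;
a stricter rule that compares whole rows would be an
equally plausible reading of the text.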
2.2 Proposed Approach
This study uses a systematic comparative approach to
determine the most effective model for predicting
favorable reviews of game genres. A key aspect of the
paper's data processing approach is the introduction
of a classification rule: games receiving a positive
rating above 90% are designated as "positive games."
This categorization streamlines the dataset for a more
focused analysis that targets games with favorable
user reception. Firstly, the dataset is preprocessed
and given a preliminary analysis, which is vital for
uncovering significant correlations in the data. This
step determines which key data elements are needed for
further in-depth analysis. Next, the authors apply
seven different machine learning predictive models.
Each model is evaluated using two performance metrics:
accuracy and AUC. To facilitate a thorough
understanding of the effectiveness of each model, the
authors visualize the predictive results. Finally, the
research compares these models. The aim is to pinpoint
the model that most accurately predicts positive
reviews within the specific context of game genres.
This comparative approach ensures a data-driven and
systematic selection of the optimal predictive model,
in line with the stated research objectives. The
process is shown in Figure 1.
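The labeling rule and the two evaluation metrics can be
illustrated with a minimal sketch. Several points here
are assumptions rather than details from the paper: the
"positive rating" is taken to be the share of positive
ratings among all ratings, the scores of model_a and
model_b are made-up stand-ins for the predicted
probabilities of fitted models, and the rule for
picking the best model (AUC first, accuracy as a
tie-breaker) is only one possible choice.

```python
def label_positive(positive, negative, threshold=0.90):
    """Return 1 if the share of positive ratings exceeds the threshold."""
    total = positive + negative
    if total == 0:
        return 0  # assumption: games with no ratings are not labeled positive
    return 1 if positive / total > threshold else 0


def accuracy(y_true, scores, cutoff=0.5):
    """Fraction of games whose thresholded score matches the label."""
    hits = sum((s >= cutoff) == bool(y) for y, s in zip(y_true, scores))
    return hits / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a positive game outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one game of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (positive, negative) rating counts for six games.
ratings = [(950, 50), (400, 600), (99, 1), (80, 20), (910, 90), (10, 90)]
y_true = [label_positive(p, n) for p, n in ratings]

# Hypothetical predicted probabilities from two fitted models.
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.6, 0.7, 0.1],
    "model_b": [0.6, 0.7, 0.4, 0.8, 0.3, 0.2],
}

results = {
    name: {"accuracy": accuracy(y_true, s), "auc": auc(y_true, s)}
    for name, s in model_scores.items()
}
# Assumed selection rule: highest AUC, with accuracy as the tie-breaker.
best = max(results, key=lambda name: (results[name]["auc"], results[name]["accuracy"]))
print(y_true, results, best)
```

The pairwise definition of AUC used here is quadratic in
the number of games, which is fine for a small example;
on a dataset of roughly 27,000 games a library routine
based on sorting would be the more practical choice.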