perspectives on feature engineering and model
interpretability, adding nuanced layers to the
evolving landscape of stock prediction research
(Zhang and Chen 2013; Wu and Li 2017).
This study builds on these foundational perspectives in forecasting Microsoft stock trends, employing the features discussed above as key variables in both the training and testing phases of our predictive models. Our goal is to contribute substantively to the ongoing discourse in financial forecasting: by drawing on the methods and findings of the referenced works, we aim to enrich our investigation and to reflect the multifaceted nature of stock price prediction. The literature reviewed, comprising more than eight references, provides a broad foundation for this research, spanning a range of methodologies and perspectives within the evolving field of stock market prediction.
2 METHODS
2.1 Data Source
This section details the methodology for forecasting Microsoft stock trends. The dataset spans 1986 to 2023 and is sourced from Kaggle (Smith et al. 2020), providing a comprehensive repository of historical stock data. The goal is to apply machine learning techniques, namely a proximity model (k-NN), Random Forest, and Support Vector Regression (SVR), to predict stock prices.
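As a concrete illustration of this data-preparation step, the sketch below loads the data with pandas and performs a chronological train/test split. The file name MSFT.csv, the column names, and the 80/20 split ratio are assumptions for illustration; typical Kaggle stock exports use this layout, but the exact file is not specified here.

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the historical Microsoft stock data (file name assumed; Kaggle
# stock exports typically include Date, Open, High, Low, Close, Volume).
df = pd.read_csv("MSFT.csv", parse_dates=["Date"]).sort_values("Date")

# Features and target: predict the closing price from the day's
# opening price, high, low, and trading volume.
X = df[["Open", "High", "Low", "Volume"]]
y = df["Close"]

# Chronological split (no shuffling) so the models are trained on
# earlier data and tested on later data, avoiding look-ahead leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)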
2.2 Method Introduction
2.2.1 k-NN
K-Nearest Neighbors (k-NN) is a proximity-based machine learning algorithm that can be applied to regression tasks, including stock price prediction. Its core principle is to predict the value of a data point from the average or weighted average of its k nearest neighbors. In the context of stock prediction, this means finding historical data points that closely resemble the current one in features such as opening price, high, low, and trading volume.
In practice, the algorithm identifies the k training points closest to the current data point and predicts the target value, here the stock price, as the average or weighted average of those neighbors' target values. The choice of k is a critical parameter that controls the model's sensitivity to outliers and requires careful tuning. k-NN is also sensitive to the scale of features, so the data are typically normalized for optimal performance.
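A minimal sketch of this procedure with scikit-learn follows; the value k = 5 and the use of distance weighting are illustrative assumptions, and the scaling step implements the normalization noted above. X_train and y_train are the splits prepared in Section 2.1.

from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# k-NN is distance-based, so features must be on a comparable scale;
# MinMaxScaler normalizes each feature to [0, 1] before fitting.
knn = make_pipeline(
    MinMaxScaler(),
    KNeighborsRegressor(n_neighbors=5, weights="distance"),
)

# The prediction for each test point is the distance-weighted average
# of the target values of its 5 nearest training neighbors.
knn.fit(X_train, y_train)
knn_pred = knn.predict(X_test)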
2.2.2 Random Forest
Random Forest is a prominent ensemble learning method for regression, offering strong predictive accuracy and robustness against overfitting. It constructs multiple decision trees, each trained on a different subset of the data and features, and averages their individual predictions to produce a final forecast. In stock prediction, Random Forest leverages the collective output of these diverse trees to yield a more reliable and stable prediction model.
Because each tree sees a different sample of the data and a different subset of features, the diversity of the ensemble contributes to its resilience against overfitting. Random Forests also provide estimates of feature importance, which aid in interpreting the underlying patterns that influence stock prices.
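The sketch below illustrates this with scikit-learn; the ensemble size of 100 trees and the fixed random seed are illustrative assumptions. Printing the importances shows which input features most influence the predicted price.

from sklearn.ensemble import RandomForestRegressor

# 100 trees, each grown on a bootstrap sample of the training data,
# with a random subset of features considered at every split.
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)

# Feature importances estimate how much each input contributes to
# reducing prediction error across the ensemble.
for name, importance in zip(X_train.columns, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")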
2.2.3 SVR
Support Vector Regression (SVR) extends the principles of support vector machines to the prediction of continuous outcomes. Because it can capture complex, nonlinear relationships in data, SVR is well suited to the intricate task of stock price prediction. SVR maps the input data into a high-dimensional space via a kernel function and then fits a hyperplane to the transformed data, seeking a function from which most training points deviate by no more than a margin epsilon while the function itself remains as flat as possible. This formulation enables SVR to follow intricate patterns in stock data without overfitting them.
Key considerations in implementing SVR include the choice of kernel, where common options are linear, polynomial, and radial basis function (RBF) kernels. The regularization parameter C also plays a pivotal role, balancing fidelity to the training data against the flatness of the regression function.
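A minimal SVR sketch follows; the choice of the RBF kernel and the hyperparameter values C = 100 and epsilon = 0.1 are illustrative assumptions that would be tuned in practice, and standardizing the features is the usual companion step for kernel methods.

from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# C trades off fidelity to the training data against flatness of the
# regression function; epsilon sets the width of the insensitive tube
# within which deviations from the hyperplane incur no penalty.
svr = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=100.0, epsilon=0.1),
)
svr.fit(X_train, y_train)
svr_pred = svr.predict(X_test)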