In contrast, linear regression stands out as a strong
model, achieving accuracies above 90% across the
data sets. Annual spend is heavily
influenced by variables like uptime, and users are
more likely to devote their maximum time and
resources to mobile applications than to websites. In
summary, this study aims to contribute to the field of
forecasting analytics for e-commerce through a
comprehensive analysis of annual spending
forecasts. By providing strategic guidance and
actionable insights, it aims to empower e-commerce
businesses to succeed in an increasingly competitive
world.
2 LITERATURE REVIEW
Research in expenditure forecasting for e-commerce
emphasizes the critical role of predictive analytics
methodologies. This research focuses on various
regression analyses to identify the models that best
capture expenditure patterns. Study [1] examines how
the demographics of age, gender and marital status
affect consumer spending, using data collected from
234 participants in the state of Jammu and Kashmir,
India. It shows that young, single men are more likely
to shop online, which helps e-commerce websites
target advertisements accordingly. To help owners
understand how revenue is distributed among
different categories, study [2] builds a Directed
Acyclic Graph Neural Network (DAGNN) for
e-commerce sales forecasting. A DAGNN is a deep
learning architecture whose layers are arranged as a
directed acyclic graph, so a layer can take inputs from
multiple layers and send its output to multiple layers.
This is useful for long-term forecasts of product-wise
daily sales revenue. The resulting forecasts help
the owner to accurately predict the sales of the
product category for up to three months ahead. E-
commerce has helped both retailers and customers in
terms of cost, as demonstrated in study [3], which
examines how online shopping affects retailers'
selling prices and consumers' purchasing costs. The
study compared an online store with an offline store
and found that online shopping resulted in lower
costs for both retailers and consumers. This shows
that both retailers and customers have benefited from
the impact of e-commerce. Study [4] forecasts
Walmart sales using various machine learning
models. Its goal was to apply machine learning
algorithms to the sales data of Walmart stores across
the United States of America. The algorithms used
were Gradient Boosting, Random Forest and
Extremely Randomized Trees (Extra Trees), which
were compared using MAE and R^2 score. The study
shows that Random Forest performs best, with the
highest R^2 score (0.94) and the lowest MAE
(1979.4). Research [5] discussed various machine
learning algorithms which are commonly used in
sales forecasting, aiming to find the best machine
learning model with a better business understanding.
The algorithms studied were Random Forest, Support
Vector Machine, Decision Trees, Naïve Bayes and
Neural Networks, compared on the basis of their
accuracy. The study shows that Random Forest has
the highest accuracy score, 85%, making it the most suitable
for sales prediction. Study [6] predicts product
sales from factors such as sales history, seasonal
trends, location and festivities, using machine
learning algorithms. The researchers selected five
algorithms: KNN, NV (Naïve Bayes), SVR, RF
(Random Forest) and MLR (Multiple Linear
Regression), and compared them by Root Mean
Square Error (RMSE). MLR gave the most accurate
results with the lowest RMSE, 1.32, followed by SVR,
RF and KNN with RMSE values of 2.35, 2.51 and 2.58
respectively. NV performed worst, with an RMSE of
7.02.
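The studies reviewed above rank models by MAE, RMSE and R^2. As a reminder of what these scores measure, the following minimal Python sketch computes them from scratch. The function names and the four sample values are illustrative assumptions and are not taken from any of the cited studies.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: like MAE, but large errors weigh more."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: share of variance explained (1.0 is perfect)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only (not data from the cited studies).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print(f"MAE  = {mae(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"R^2  = {r2_score(actual, predicted):.3f}")
```

Lower MAE and RMSE indicate a better fit, whereas a higher R^2 does. MAE and RMSE are expressed in the units of the target variable, so values reported on different datasets, such as 1979.4 in [4] and 1.32 in [6], are not directly comparable.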
3 METHODOLOGY
This paper presents a predictive analysis of
e-commerce data using different machine learning
models to identify the best model for regression
analysis.
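The kind of comparison described here can be sketched as follows. This is a dependency-free illustration, not the pipeline used in this paper: the data are synthetic, and the feature names, coefficients and noise level are assumptions chosen only to resemble the features considered in this study. The sketch fits a multiple linear regression by ordinary least squares and compares it with a mean-only baseline on held-out data using R^2.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]  # [intercept, b1, ..., bk]

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic data: every number below is a hypothetical choice.
random.seed(0)
X, y = [], []
for _ in range(200):
    session_length = random.gauss(33, 1)
    time_on_app = random.gauss(12, 1)
    time_on_website = random.gauss(37, 1)
    membership_years = random.gauss(3.5, 1)
    X.append([session_length, time_on_app, time_on_website, membership_years])
    y.append(-1000 + 25 * session_length + 38 * time_on_app
             + 0.5 * time_on_website + 61 * membership_years
             + random.gauss(0, 10))

# Hold out the last 20% of the rows for evaluation.
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

coef = fit_ols(X_train, y_train)
r2_linear = r2_score(y_test, predict(coef, X_test))

# Baseline: always predict the mean annual spend seen in training.
train_mean = sum(y_train) / len(y_train)
r2_baseline = r2_score(y_test, [train_mean] * len(y_test))

print(f"linear regression R^2: {r2_linear:.3f}")
print(f"mean baseline R^2:     {r2_baseline:.3f}")
```

In practice a library such as scikit-learn would replace the hand-written solver, and further candidate models would be scored on the same held-out split so that their R^2 values are directly comparable.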
3.1 Flowchart of Model
This study analyzes two datasets to gain insight
into e-commerce customers' spending habits. It
investigates features such as session
length, app/website usage, membership term, and
annual spending to find underlying patterns.
As shown in Figure 1, the model examines the
e-commerce dataset in a systematic manner. Initially, relevant
statistics are acquired and checked to ensure their