Forecasting Annual Expenditure in E-Commerce
Abhijeet Yadav, Vinayak Shukla, Dilip Woad, Romil Raina and Himani Deshpande
Thadomal Shahani Engineering College, Bandra (West) Mumbai 400050, India
Keywords: E-Commerce, Expenditure, Support Vector Regression, Adaboost, XGBoost.
Abstract: With the rapid penetration of E-commerce in modern society, there is a need to have a holistic analysis of
annual expenditure forecasting in e-commerce, underscoring its importance to modern consumer behaviour
and economic growth. To this end, this study explores seven machine learning-based methodologies, namely linear regression, random forest, decision trees, K-Nearest Neighbors (KNN), AdaBoost, Support Vector Regression (SVR) and XGBoost. Through an extensive examination of the effectiveness of each
model, this research aims to provide useful insight into how well these techniques predict yearly expenditure in a dynamic environment like e-commerce. These findings matter to stakeholders seeking to improve their management strategies in the e-commerce sector, where understanding consumers is necessary for sustainable development. Two datasets, Open Mart and E-Mart, are used; each provides expenditure data for companies in different regions operating on e-commerce platforms. Among the methodologies considered, Linear Regression proves the most effective on both datasets, with prediction accuracies of 97% on the Open Mart dataset and 89% on the E-Mart dataset, whereas Support Vector Regression (SVR) performs the worst on both. In-depth analysis of the datasets reveals a strong relationship between increasing app usage, membership orders and overall consumer expenditure. This study therefore suggests that e-commerce companies may increase revenue and consumer engagement by optimizing app usage and promoting membership programs.
1 INTRODUCTION
Accurate annual expenditure forecasting is essential
for businesses trying to navigate competitive markets
and maximize resource allocation in the ever-
changing world of e-commerce. This study explores
the field of e-commerce spending forecasting using a
multivariate regression analysis methodology that is
deemed reliable. This paper's primary goal is to
provide insights for businesses using in-depth
modelling and predictive analysis of e-commerce
projects. By understanding trends and drivers,
companies can make better decisions about budget
allocation, inventory, pricing strategy and marketing
campaigns.
In e-commerce, machine learning is especially beneficial for dynamic pricing and can raise key performance indicators (KPIs), because ML algorithms can identify new patterns in data. These algorithms continuously acquire new knowledge and identify new trends and needs. This is why online retailers in the e-commerce sector use machine learning models for dynamic pricing rather than straightforward price markdowns. Predictive algorithms help online retailers find the best price for a given product, which gives them a competitive advantage: the best price can be chosen while also considering warehouse conditions, current offers and real-time discounts displayed. To maximize sales and optimize inventory, predictive modelling of consumer spending requires a thorough investigation of the delicate relationships between numerous factors and how those relationships affect purchasing behaviour. This study identifies significant variables influencing yearly spending, providing essential information for companies trying to maximize their marketing budgets.
Notably, Support Vector Regression (SVR) performs considerably worse than the other models, displaying a negative R2 score on one of the datasets.
In contrast, linear regression stands out as a strong model, displaying remarkable accuracies above 90% across the datasets. Annual spend is heavily influenced by variables such as app usage time, and users are more likely to devote their time and resources to mobile applications than to websites. In summary, this study aims to contribute to the field of forecasting analytics for e-commerce through a comprehensive analysis of annual spending forecasts. By providing strategic guidance and actionable insights, it aims to empower e-commerce businesses to succeed in an increasingly competitive world.
2 LITERATURE REVIEW
Research in expenditure forecasting for e-commerce
emphasizes the critical role of predictive analytics
methodologies. This research emphasizes various regression analyses to uncover expenditure patterns. A study [1], which shows how demographics such as age, gender and marital status affect consumer spending, collected its data from the state of Jammu and Kashmir, India, with a total of 234 participants. The study finds that young, single males have a higher propensity for e-shopping, which helps e-commerce websites distribute advertisements accordingly. To help owners
understand how revenue is distributed among different product categories, study [2] builds a Directed Acyclic Graph Neural Network (DAGNN) for e-commerce sales forecasting. A DAGNN is a deep learning architecture in which the layers are arranged as a directed acyclic graph, so a layer can take inputs from multiple layers and pass outputs to multiple layers. This is useful for long-term forecasts of product-wise daily sales revenue, and the resulting forecasts help the owner accurately predict the sales of a product category up to three months ahead. E-commerce has helped both retailers and customers in terms of cost, as demonstrated in study [3], which
examines how online shopping affects retailers'
selling prices and consumers' purchasing costs. The
study compared an online store with an offline store
and found that online shopping resulted in lower
costs for both retailers and consumers. This shows
that both retailers and customers have benefited from
the impact of e-commerce. In this study [4], research
was conducted for forecasting Walmart sales using
various machine learning models. The goal of this
research was to implement various machine learning
classification algorithms on the sales data of Walmart
stores present across the United States of America.
The algorithms used are Gradient Boosting, Random Forest and Extremely Randomized Trees (Extra Trees), and they were compared using MAE and R2 score. The study shows that Random Forest performs best compared to the other algorithms, with the highest R2 score of 0.94 and the minimum MAE of 1979.4. Research [5] discussed various machine
learning algorithms which are commonly used in
sales forecasting, aiming to find the best machine
learning model with a better business understanding.
Algorithms on which the research was conducted are
Random Forest, Support vector machine, Decision
trees, Naïve Bayes and neural networks. The selected algorithms are compared based on their accuracies. The study shows that Random Forest has the highest accuracy score of 85%, making it the most suitable
for sales prediction. In this study [6], research was
conducted to predict the sales of products based on
different factors like past history, seasonal trends,
location and festivities, with the help of machine
learning algorithms. The researchers selected five algorithms: KNN, NB (Naïve Bayes), SVR, RF (Random Forest) and MLR (Multiple Linear Regression). The selected algorithms are compared based on their Root Mean Square Error (RMSE) values. After evaluating the different algorithms, the researchers found that MLR gives the most accurate results with an RMSE of 1.32, the lowest among the compared algorithms, followed by SVR, RF and KNN with RMSE values of 2.35, 2.51 and 2.58, respectively. The worst-performing algorithm was NB, with an RMSE of 7.02.
3 METHODOLOGY
This paper presents a predictive analysis of the e-commerce datasets using different machine learning models, in order to identify the best model for regression analysis.
3.1 Flowchart of Model
This article analyzes two datasets to get insight
into E-commerce clients' spending habits. This study
investigates numerous features such as session
length, app/website usage, membership term, and
annual spending to find hidden underlying patterns.
In Figure 1, the model examines the e-commerce dataset in a systematic manner. First, relevant data are acquired and checked to ensure their validity and applicability to real-world settings. Second, the data are cleaned and organized to meet the analytical requirements. Third, several machine learning algorithms are applied to extract useful insights from the preprocessed data, and their performance is rigorously assessed using multiple evaluation metrics. The benefits and drawbacks of each algorithm are then compared and analyzed. Finally, the research presents the findings of this systematic procedure, demonstrating the effectiveness of the various algorithms in identifying patterns and practices in e-commerce. This sequential procedure directs the inquiry and facilitates the systematic analysis of e-commerce trends and patterns.
Fig. 1. Flow of the machine learning model.
The datasets used are readily available on platforms
like Kaggle and Github under the name ‘Ecommerce
Customers’, with no missing values for any features.
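As an illustration, the following minimal Python sketch loads one of the datasets and verifies the points above; it assumes pandas is installed, and the CSV file name is illustrative since the source does not specify it.

```python
# Minimal sketch of loading and inspecting one of the datasets with pandas.
# The file name "Ecommerce Customers.csv" is an assumption based on the
# Kaggle/GitHub dataset described above.
import pandas as pd

df = pd.read_csv("Ecommerce Customers.csv")

print(df.shape)            # expected: (500, 8) for the Open Mart dataset
print(df.dtypes)           # Email/Address/Avatar as object, the rest as float
print(df.isnull().sum())   # verify that no feature has missing values
print(df.describe())       # summary statistics of the numeric features
```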
3.2 Exploratory Dataset Analysis
This section gives an insight into two publicly available datasets concerning buyers' expenditure patterns. The Open Mart dataset has eight attributes, of which three are of object data type and the rest are of float data type. Email, Address and Avatar are object types, while 'Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership' and 'Yearly Amount Spent' are float types and serve as the important numeric features for further analysis. The E-Mart dataset is entirely numeric, with five features: Session Length, App Screentime, Website Screentime, Number of Purchases and Yearly Amount Spent. The total number of datapoints in Open Mart and E-Mart is 500 and 1000, respectively.
The E-Mart dataset has an additional feature named 'Number of Purchases' which is not available in the Open Mart dataset. Fig. 2 and Fig. 3 present the features of both datasets as histograms, where the x-axis shows time (in minutes) and the y-axis shows the number of customers engaged in sessions, the app and the website while purchasing. For the attribute 'Length of Membership' the x-axis shows time (in months), and for 'Number of Purchases' it is the total number of items purchased by a specific user.
Fig. 2. Histograms of different features in the Open Mart dataset: (a) Avg. Session Length, (b) Length of Membership, (c) Time on Website, (d) Time on App.
Fig. 2(a) shows that session lengths range from about 20 to 40 minutes. Fig. 2(b) depicts that only a few people stay members for longer than five months, which indicates that Open Mart has trouble keeping clients for a long time. Figs. 2(c) and 2(d) show that consumers spend more time on the website than on the app. This paper also analyses whether people actually spend more money on websites or on apps with respect to the Yearly Amount Spent.
Fig. 3(b) shows that session lengths range from about 20 to 50 minutes. Fig. 3(a) shows that E-Mart customers are buying more items, which indicates higher shopping activity. Figs. 3(c) and 3(d) show that consumers are spending even more time on the app and website.
Fig. 3. Histograms of different features in the E-Mart dataset: (a) Number of Purchases, (b) Session Length, (c) App Screentime, (d) Website Screentime.
3.3 Identifying Correlation among
features
The Pearson correlation coefficient (r) is the method
to calculate the strength and direction of the linear
relationship between two variables in a correlation
analysis. The correlation coefficient can be
represented as in eq. (1):
$r = \dfrac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}$   (1)
To identify the best-correlating attributes, correlation is performed among the numeric features and the coefficient of correlation (r) is calculated. The correlation coefficient is categorized for this analysis as follows:
0 < r ≤ 1 (positively correlated)
r = 0 (no correlation)
−1 ≤ r < 0 (negatively correlated)
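As an illustration of how these r values (such as those reported in Tables 1 and 2 below) can be obtained, the following minimal sketch assumes pandas and the Open Mart column names described in Section 3.2.

```python
# Minimal sketch of computing Pearson correlation coefficients (eq. (1))
# between each numeric feature and "Yearly Amount Spent", as in Table 1.
# File and column names follow the Open Mart dataset described above.
import pandas as pd

df = pd.read_csv("Ecommerce Customers.csv")

numeric_df = df.select_dtypes(include="number")   # drop Email/Address/Avatar
corr = numeric_df.corr(method="pearson")          # full correlation matrix

# Correlation of every feature with the target, sorted strongest to weakest.
target_corr = corr["Yearly Amount Spent"].drop("Yearly Amount Spent")
print(target_corr.sort_values(ascending=False))
```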
Table 1. Correlation of features with 'Yearly Amount Spent' in the Open Mart dataset.
Features                  r value
Length of Membership 0.8090
Time on App 0.4993
Avg. Session Length 0.3550
Time on Website -0.0026
Table 1 shows a strong positive correlation of 'Yearly Amount Spent' with 'Length of Membership' and 'Time on App', which suggests that Open Mart emphasizes app optimization, while 'Avg. Session Length' shows a moderate positive correlation (0.3550). The least correlated feature is 'Time on Website' with r = -0.0026, which demonstrates that the website has virtually no impact on annual spend.
Table 2. Correlation of features with 'Yearly Amount Spent' in the E-Mart dataset.
Features                  r value
Number of Purchases 0.7791
App Screentime 0.6253
Session Length 0.6040
Website Screentime -0.3291
Table 2 shows a strong positive correlation of 'Yearly Amount Spent' with 'Number of Purchases' and 'App Screentime', which suggests that E-Mart emphasizes product optimization, while 'Session Length' also shows a good positive correlation (0.6040). The least correlated feature is 'Website Screentime' with r = -0.3291, indicating that website time has a weak negative relationship with annual spend.
Fig. 4 below presents a correlation heatmap depicting the associations between features in the Open Mart dataset. Heatmaps are used here to visualize the correlation matrix: the colour intensity of each cell represents the strength and direction of the association between a pair of features, which aids analytic understanding.
Fig. 4. Heatmap of different features in the Open Mart dataset.
In the Open Mart dataset, Length of Membership exhibits a strong positive correlation with the dependent variable (r ≈ 0.81, Table 1), indicating that annual spending rises as membership duration increases. Similarly, Time on App shows a moderate positive correlation, suggesting its influence on the dependent variable, whereas Time on Website demonstrates a negligible correlation with the dependent variable.
Fig. 5. Heatmap of different features in the E-Mart dataset.
Fig. 5 represents the E-Mart dataset, in which Number of Purchases reveals a significant positive correlation with the dependent variable. Moreover, both App Screentime and Session Length demonstrate moderate positive correlations, indicating their influence on the dependent variable. On the other hand, Website Screentime exhibits a negative correlation with the dependent variable, suggesting an inverse relationship.
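Heatmaps such as those in Figs. 4 and 5 can be produced directly from the correlation matrix; the following is a minimal sketch assuming seaborn and matplotlib are available, with an illustrative file name.

```python
# Minimal sketch of drawing a correlation heatmap like Figs. 4 and 5.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("Ecommerce Customers.csv")
corr = df.select_dtypes(include="number").corr()

plt.figure(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap of numeric features")
plt.tight_layout()
plt.show()
```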
Fig. 6 illustrates the relationship between the independent attributes and the dependent feature (Yearly Amount Spent) in the Open Mart dataset. It shows strong positive correlations for Length of Membership and Time on App, while Time on Website exhibits the most negative correlation. The features 'Time on App' and 'Length of Membership' have a positive correlation with 'Yearly Amount Spent', as the fitted line slopes upward from left to right; this implies that annual expenditure tends to increase with membership duration and app usage. On the other hand, a horizontal line suggests no correlation between the variables, as in Fig. 6(b). Also, data points clustering closely around the line indicate a strong correlation, as in Fig. 6(a). Fig. 7 gives the corresponding pictorial representation of the relationship between the different features and the dependent feature (Yearly Amount Spent) for the E-Mart dataset.
Fig. 7 illustrates strong positive correlations for App Screentime and Number of Purchases, while Website Screentime exhibits the most negative correlation. The features 'App Screentime' and 'Number of Purchases' have a positive correlation with 'Yearly Amount Spent', as the fitted line slopes upward from left to right; this implies that annual expenditure tends to increase with the number of purchases and app usage. Again, data points clustering closely around the line indicate a strong correlation, as in Fig. 7(b) between 'Number of Purchases' and 'Yearly Amount Spent'.
Fig. 6. Scatterplots of (a) Length of Membership, (b) Time on Website, (c) Time on App and (d) Avg. Session Length against Yearly Amount Spent in the Open Mart dataset.
Fig. 7. Scatterplots of (a) App Screentime, (b) Number of Purchases, (c) Session Length and (d) Website Screentime against Yearly Amount Spent in the E-Mart dataset.
Fig. 8. Violin and box plots of both datasets: (a) violin plot of the Open Mart dataset, (b) box plot of the Open Mart dataset, (c) violin plot of the E-Mart dataset, (d) box plot of the E-Mart dataset.
Fig. 8 shows the box and violin plots of the respective datasets, depicting important parameters such as the mean, the IQR (inter-quartile range) and the minimum and maximum values. Figs. 8(a) and 8(b) show that customers spend 33 minutes per session on average, with a median session length of 33.08 minutes; according to the distribution, 25% of sessions last less than 32.34 minutes and 75% last less than 33.71 minutes. Customers use the platform for an average of 12 minutes on the mobile app and 37 minutes on the website, with respective median times of roughly 12 and 37 minutes. With 25% of customers having a membership term of less than 2.93 years and 25% having a term of more than 4.13 years, the length of membership has a median duration of almost 3.53 years. The study of the E-Mart dataset, shown in Figs. 8(c) and 8(d), demonstrates that customers spend about thirty minutes per session on average, with an average of 12.21 minutes spent on the app and 10.01 minutes on the website. Furthermore, consumers spend $1778.73 annually on purchases, with about 47.56 transactions per year on average. The plots also show that for features like 'Avg. Session Length', 'Time on App' and 'App Screentime' the data cluster closely around their means with narrow IQRs, indicating relatively consistent behaviour, whereas 'Time on Website' displays a wider spread of values, suggesting more variability. 'Length of Membership' shows a moderate spread, with a longer right tail indicating some customers with extended memberships. 'Yearly Amount Spent' exhibits a right-skewed distribution, with most customers spending lower amounts annually but with notable outliers spending significantly more.
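The summary statistics behind these box and violin plots (median, quartiles, IQR) can be reproduced directly from the data; a minimal sketch, assuming pandas and the Open Mart column names, follows.

```python
# Minimal sketch of the summary statistics behind the box/violin plots of Fig. 8.
import pandas as pd

df = pd.read_csv("Ecommerce Customers.csv")

stats = df["Avg. Session Length"].describe()   # count, mean, std, min, 25%, 50%, 75%, max
q1, median, q3 = stats["25%"], stats["50%"], stats["75%"]
iqr = q3 - q1
print(f"median={median:.2f} min, Q1={q1:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
```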
3.4 Regression Methods
We used seven important machine learning models to perform regression on the datasets, namely Linear Regression, Random Forest, AdaBoost, Decision Tree, KNN, SVR and XGBoost.
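A minimal sketch of how such a comparison might be set up is shown below; it assumes scikit-learn and xgboost are installed and uses illustrative file, feature and column names, so the actual experimental setup of this study may differ.

```python
# Minimal sketch of fitting and comparing the seven regressors on one dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.metrics import r2_score
from xgboost import XGBRegressor

df = pd.read_csv("Ecommerce Customers.csv")
features = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]
X, y = df[features], df["Yearly Amount Spent"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "AdaBoost": AdaBoostRegressor(random_state=42),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "SVR": SVR(),
    "XGBoost": XGBRegressor(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: R2 = {r2_score(y_test, model.predict(X_test)):.3f}")
```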
a) Linear Regression:
Linear Regression is a supervised ML model that learns from labelled input and output data and is used to analyse the relationship between a dependent variable and one or more independent variables, as expressed in equation (2). It fits the best-fit line to the data points, with the objective of minimizing the difference between the observed and predicted values.

$y = a + bx$   (2)
b) Random Forest Regression:
Random Forest is an ensemble learning approach that constructs many decision trees during training and returns the mode of the classes for classification or the average prediction of the individual trees for regression. Combining the predictions of the individual trees improves model accuracy while reducing overfitting.
c) AdaBoost
AdaBoost is an adaptive boosting algorithm, an ensemble learning technique that combines multiple weak learners (ML models) sequentially to create a strong model for prediction or classification. Although it is often associated with classification tasks, it can also be used for regression, as in this study.
d) Decision Tree
A Decision Tree is a non-parametric supervised learning method, meaning it makes no prior assumptions about the sample. It is used for both classification and regression and can handle numerical and categorical data. It has a tree structure consisting of a root node, internal nodes and leaf nodes. Although decision trees are often used for classification, they are applied here for regression.
e) K-Nearest Neighbor (KNN)
KNN (K-Nearest Neighbors) is a non-parametric technique that performs classification and regression tasks without making any prior assumptions about the sample. It predicts the class or value of a new data point based on the majority class or average value of its k closest neighbors in the feature space. To determine the nearest neighbors it can employ several distance functions, such as the Euclidean distance (equation (3)), the Manhattan distance (equation (4)) and the Minkowski distance (equation (5)), where q denotes the order of the norm.
$E.D. = \sqrt{\sum_{i}(x_i - y_i)^2}$   (3)

$M.D. = \sum_{i}|x_i - y_i|$   (4)

$Mink.D. = \left(\sum_{i}|x_i - y_i|^{q}\right)^{1/q}$   (5)

where $x_i$ and $y_i$ represent the coordinates of the data points x and y in the $i$-th dimension and q is the order of the distance.
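A minimal NumPy sketch of the three distance functions in equations (3)-(5), with two arbitrary sample points, follows.

```python
# Minimal sketch of the distance functions in eqs. (3)-(5) using NumPy;
# the two sample points are arbitrary illustrations.
import numpy as np

x = np.array([33.0, 12.0, 37.0, 3.5])
y = np.array([31.0, 11.0, 39.0, 4.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))             # eq. (3)
manhattan = np.sum(np.abs(x - y))                     # eq. (4)
q = 3
minkowski = np.sum(np.abs(x - y) ** q) ** (1.0 / q)   # eq. (5), order q
print(euclidean, manhattan, minkowski)
```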
f) Support Vector Regression
SVR (Support Vector Regression) is a supervised machine learning model used for regression. It builds on the support vector machine idea of an optimal hyperplane, but instead of separating data points into two classes, SVR seeks a function (hyperplane) that fits the data within a maximum margin of tolerance around the given data points. SVR can be expressed as in equation (6).

$SVR = \min_{w} \sum_{i}\left(y_i - w_i x_i\right)^2$   (6)
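A minimal sketch of an SVR model, assuming scikit-learn, is given below; because SVR is sensitive to feature scales, a standard scaler is included, and the training split is assumed to come from the earlier comparison sketch.

```python
# Minimal sketch of an SVR pipeline with feature scaling (scikit-learn assumed).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
svr.fit(X_train, y_train)   # X_train, y_train as defined in the earlier sketch
print("SVR R2:", svr.score(X_test, y_test))
```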
g) XG Boost
The XGBoost technique optimizes a differentiable loss function at each iteration, such as the mean squared error (MSE) for regression. It constructs several decision trees one after another, with each tree correcting the errors of the previous ones, and it adds a regularization term to the objective function to penalize complexity and prevent overfitting.
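A minimal sketch of an XGBoost regressor with an explicit regularization term follows; it assumes the xgboost package and the training split from the earlier comparison sketch, and the hyperparameter values are illustrative only.

```python
# Minimal sketch of an XGBoost regressor with regularization (xgboost assumed).
from xgboost import XGBRegressor

xgb = XGBRegressor(
    n_estimators=200,              # number of sequentially built trees
    learning_rate=0.1,             # shrinkage applied to each tree's contribution
    max_depth=4,                   # depth of each tree
    reg_lambda=1.0,                # L2 regularization on leaf weights (penalizes overfitting)
    objective="reg:squarederror",  # differentiable loss optimized at each iteration
)
xgb.fit(X_train, y_train)          # X_train, y_train as in the earlier comparison sketch
print("XGBoost R2:", xgb.score(X_test, y_test))
```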
3.5 Results
This section presents the experimental results obtained with the selected regression methods. Results are evaluated on four performance metrics, namely Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and R2 (R-squared). MSE is defined as the average of the squared differences between the actual and predicted values and is represented as in equation (7). MAE is defined as the average of the absolute differences between the actual and predicted outputs, represented as in equation (8). RMSE is defined as the square root of the average of the squared differences between the actual and predicted values, represented as in equation (9). The R2 score represents the proportion of the variance in the dependent variable that is explained by the independent variables in the model, represented as in equation (10).
$MSE = \frac{1}{N}\sum_{i}(y_i - \hat{y}_i)^2$   (7)

$MAE = \frac{1}{N}\sum_{i}|y_i - \hat{y}_i|$   (8)

$RMSE = \sqrt{\frac{1}{N}\sum_{i}(y_i - \hat{y}_i)^2}$   (9)

$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}$   (10)
where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the actual values and N the number of observations.
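These four metrics can be computed with scikit-learn; a minimal sketch, reusing the predictions from the earlier comparison sketch, follows.

```python
# Minimal sketch of computing the evaluation metrics of eqs. (7)-(10);
# y_test and the fitted models from the earlier sketch are assumed available.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_pred = models["Linear Regression"].predict(X_test)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.2f}  RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```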
Table 3. Experimental results on the Open Mart dataset.

Algorithm           MSE      RMSE    MAE     R2
Linear Regression   109.86   10.48   8.55    0.97
Random Forest       337      18.36   14.04   0.93
AdaBoost            579.09   24.06   19.70   0.88
Decision Tree       802      28.32   22.21   0.83
KNN                 465.62   21.57   16.63   0.90
SVR                 5024.67  70.88   54.33   -0.014
XGBoost             262.18   16.19   12.36   0.94
As shown in Table 3, Linear Regression is the most accurate model on the Open Mart dataset, with the lowest Mean Squared Error (MSE) of 109.86, Root Mean Squared Error (RMSE) of 10.48 and Mean Absolute Error (MAE) of 8.55, as well as the highest R-squared value (97% accuracy), demonstrating its superior predictive capability. XGBoost comes in a close second, with an accuracy of almost 94%, an MSE of 262.18, an RMSE of 16.19 and an MAE of 12.36. Support Vector Regression (SVR), on the other hand, performs the worst, with the highest error metrics (MSE of 5024.67, RMSE of 70.88, MAE of 54.33) and a negative R-squared value, which clearly demonstrates its inability to make accurate predictions on this dataset.
Table 4. Experimental results on the E-Mart dataset.

Algorithm           MSE      RMSE    MAE     R2
Linear Regression   2693.27  51.89   41.42   0.89
Random Forest       3981.42  63.09   51.09   0.84
AdaBoost            4720.30  68.70   54.34   0.81
Decision Tree       7749.01  88.02   69.65   0.69
KNN                 4452.17  66.72   53.37   0.82
SVR                 17593    132.64  102.32  0.30
XGBoost             4082.99  63.89   51.13   0.83
The experimental results on the E-Mart dataset suggest that Linear Regression is again the best-performing model among the evaluated machine learning algorithms. It has the lowest Mean Squared Error (MSE) of 2693.27, the lowest Root Mean Squared Error (RMSE) of 51.89, the lowest Mean Absolute Error (MAE) of 41.42 and the highest R-squared value (89% accuracy), showing better predictive accuracy and precision than the other models. As shown in Table 4, XGBoost comes second, with an MSE of 4082.99, an RMSE of 63.89, an MAE of 51.13 and an accuracy of almost 83%. Support Vector Regression performs the worst of all the models, with the highest error metrics (MSE of 17593, RMSE of 132.64, MAE of 102.32) and a relatively low R-squared value of only 30%, which shows that it does not generalize well to the patterns in the dataset.
4 CONCLUSION
Linear Regression is the best model, with an accuracy greater than 90% on both datasets. Furthermore, XGBoost performs second best, with accuracy and precision closely behind those of Linear Regression. Support Vector Regression (SVR), on the other hand, performs the worst of all the models. Its inability to forecast outcomes and capture dataset patterns is highlighted by its high error metrics and low R-squared values: 30% on the E-Mart dataset and a negative value on the Open Mart dataset. This is likely due mainly to potential nonlinear relationships between the set of input features and the output variable. These findings demonstrate the relative effectiveness of the models for e-commerce prediction tasks.
After analysing the nuances of several attributes and how they relate to each other in the datasets, membership duration and app usage time emerge as the most significant factors affecting annual spending. Forecasting customers' annual expenditure shows that they prefer the app to the website for shopping, so to cater to a larger audience companies should focus on improving their apps and also offer discounts and incentives to loyal customers. E-commerce companies should also improve the functioning of their websites, since website time is inversely correlated with annual expenditure; improving the user interface and other functionalities would make websites more user friendly. Number of purchases is highly positively correlated with annual expenditure, which reflects good product quality and the trust companies have built with customers in terms of quality assurance.
REFERENCES
Bhat, S.A., Islam, S.B. and Sheikh, A.H., 2021. Evaluating the influence of consumer demographics on online purchase intention: An E-Tail perspective. Paradigm, 25(2), pp.141-160.
Petroșanu, D.M., Pîrjan, A., Căruţaşu, G., Tăbușcă, A., Zirra, D.L. and Perju-Mitran, A., 2022. E-Commerce sales revenues forecasting by means of dynamically designing, developing and validating a Directed Acyclic Graph (DAG) network for deep learning. Electronics, 11(18), p.2940.
Miyatake, K., Nemoto, T., Nakaharai, S. and Hayashi, K., 2016. Reduction in consumers' purchasing cost by online shopping. Transportation Research Procedia, 12, pp.656-666.
Elias, N.S. and Singh, S., 2018. Forecasting of Walmart sales using machine learning algorithms.
Singh, K., Booma, P.M. and Eaganathan, U., 2020. E-commerce system for sale prediction using machine learning technique. In Journal of Physics: Conference Series (Vol. 1712, No. 1, p. 012042). IOP Publishing.
"E-Commerce Sales Prediction", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN 2349-5162, Vol. 7, Issue 4, pp.986-992, April 2020.
Ji, S., Wang, X., Zhao, W. and Guo, D., 2018. An application of a three-stage XGBoost-based model to sales forecasting of a cross-border e-commerce enterprise. Math. Probl. Eng., 8503252.
Liu, J., Liu, C., Zhang, L. and Xu, Y., 2020. Research on sales information prediction system of e-commerce enterprises based on time series model. Inf. Syst. e-Bus. Manag., 18, pp.823-836.
Li, J., Cui, T., Yang, K., Yuan, R., He, L. and Li, M., 2021. Demand forecasting of e-commerce enterprises based on horizontal federated learning from the perspective of sustainable development. Sustainability, 13, 13050.
Zhang, X., 2020. Prediction of purchase volume of cross-border e-commerce platform based on BP neural network. Comput. Intell. Neurosci., 3821642.
Turóczy, Z. and Marian, L., 2019. Multiple regression analysis of performance indicators in the ceramic industry.
Micu, A., Geru, M., Capatina, A., Avram, C., Rusu, R. and Panait, A.A., 2019. Leveraging e-Commerce performance through machine learning algorithms. Ann. Dunarea Jos Univ. Galati, 2, pp.162-171.
Sharma, M., Sharma, V. and Kapoor, R., 2022. Study of e-commerce and impact of machine learning in e-commerce. In Empirical Research for Futuristic E-Commerce Systems: Foundations and Applications (pp. 1-22). IGI Global.
Pavel, J., Lenka, V., Lucie, S. and Zdenek, S., 2019. Predictive performance of customer lifetime value models in e-commerce and the use of non-financial data.
Zhang, X., Guo, F., Chen, T., Pan, L., Beliakov, G. and Wu, J., 2023. A brief survey of machine learning and deep learning techniques for e-commerce research. Journal of Theoretical and Applied Electronic Commerce Research, 18(4), pp.2188-2216.