The objective of this study is to introduce LSTM
to build a stock price prediction model and to analyze
the number of LSTM hidden units in order to determine
the optimal configuration for stock price forecasting.
To be more specific, this study explores how different
numbers of LSTM hidden units affect the accuracy and
performance of stock market predictions. By systematically
adjusting the number of hidden units in the applied LSTM
model, the author can observe the variations in
prediction results and identify the configuration that
yields the best outcomes. The practical significance of
this research lies in utilizing deep learning models,
particularly LSTM models, to analyze and predict
time series data such as stock prices, weather data, etc.
This provides valuable insights and predictions for
decision-making in fields such as finance and
meteorology. The unique aspect of this paper lies in
its focus on the influence of the number of hidden
units in LSTM models on the accuracy of stock price
predictions. This is a crucial aspect, as the number of
hidden units is a key hyperparameter that determines
the complexity and capacity of the model. Given the
intricate factors influencing stock prices, a more
complex model might be necessary to encapsulate
these variables. However, an overly complex model
might overfit the training data, leading to subpar
performance on unseen data. Conversely, a model
with an insufficient number of hidden units could
underfit the data, unable to capture the necessary
patterns for accurate prediction.
In this context, the author's exploration of the
selection of an appropriate number of hidden units and
the impact of varying numbers of hidden units on
prediction accuracy could provide valuable insights.
This differentiates the paper from other research
papers that might not specifically investigate the
influence of the number of hidden units on prediction
accuracy or might not utilize LSTM models for stock
price prediction.
2 METHODOLOGY
2.1 Dataset Description and Preprocessing
In this study, the author employs the AAPL dataset
collected from Yahoo Finance (Dataset 2023). The
AAPL dataset is a compilation of historical market
data for Apple Inc.'s shares. Spanning 2016 to 2023,
it offers a range of financial indicators, including the
stock's trading volume, lowest price, highest price,
closing price, and opening price. The following
parameters are included in the AAPL dataset. Date:
the date of the trading session. Open: the opening
price, which is the price of the day's first trade. High:
the highest price reached during the trading session.
Low: the lowest price reached during the trading
session. Close: the closing price, which is the price of
the day's final trade. Adj Close: the adjusted closing
price, which accounts for stock splits and dividends.
Volume: the total number of shares traded during the
trading session.
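As a concrete illustration, these data can be retrieved programmatically. The sketch below assumes the yfinance package and an explicit 2016-2023 date range; the study itself only states that the data were collected from Yahoo Finance.

```python
# Minimal sketch: retrieving the AAPL data described above.
# yfinance and the exact date range are assumptions, not stated by the author.
import yfinance as yf

# Download daily AAPL data for the period covered by the study;
# auto_adjust=False keeps the unadjusted Close alongside Adj Close.
aapl = yf.download("AAPL", start="2016-01-01", end="2023-01-01",
                   auto_adjust=False)

# The resulting DataFrame is indexed by Date and contains the columns
# Open, High, Low, Close, Adj Close, and Volume described in the text.
print(aapl.columns.tolist())
print(aapl.head())
```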
2.2 Proposed Approach
The major goal of this study is to analyze and
investigate the data within the AAPL dataset. By
utilizing LSTM models, the study aims to predict stock
market trends to a certain degree of accuracy.
Furthermore, through comparative analysis, it seeks
to enhance the LSTM model's performance by
adjusting crucial hyperparameters such as the number
of hidden units. Specifically,
first, the ‘Sequential’ model is used to create an
instance of a sequential model so that the author can
add layers sequentially to build a complete neural
network model. Second, by adding an LSTM layer
with 125 units to the sequential model, the author
enables the model to recognize temporal
dependencies and patterns in the input data. LSTM
layers are particularly effective in capturing long-term
dependencies and are commonly used in time series
analysis and sequence prediction tasks. Third, the
author applies additional non-linear transformations
to the input by introducing a fully connected layer to
the sequential model, allowing the model to learn more
complicated representations and patterns. Fully
connected layers are commonly used in neural
networks to perform tasks such as classification or
regression. Then, by compiling the model with the
Adaptive Moment Estimation (Adam) optimizer and
the Mean Squared Error (MSE) loss function, the author
defines how the model will be trained and how its
performance will be evaluated. The optimizer
determines how the model's weights will be updated,
and the loss function quantifies the error between the
true values and the predicted values, which the
optimizer minimizes during training. Finally, the author
prints a summary of the LSTM model, including its
layers, output shapes, and the number of parameters
in each layer.
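The construction steps described in this subsection can be summarized in a short Keras sketch. The input window of 60 time steps and the single input feature (the closing price) are illustrative assumptions, as the text does not specify the input shape.

```python
# Minimal Keras sketch consistent with the steps described above.
# The window length (60 days) and single feature are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# First: create an empty sequential model to stack layers on.
model = Sequential()

# Second: an LSTM layer with 125 units to capture temporal dependencies.
model.add(LSTM(125, input_shape=(60, 1)))

# Third: a fully connected layer producing the predicted price.
model.add(Dense(1))

# Then: compile with the Adam optimizer and the MSE loss function.
model.compile(optimizer="adam", loss="mean_squared_error")

# Finally: print layers, output shapes, and parameter counts.
model.summary()
```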
2.2.1 LSTM
LSTM is an advanced type of Recurrent Neural
Network (RNN) that has garnered significant
attention in various fields, including the realm of