The Investigation of the Application of Apache Spark in Stock Analysis

Yuxi Liu

2024

Abstract

As the complexity and scale of data grow rapidly, stock market analysis places increasing demands on the efficiency and accuracy of data processing, and approaches based on Apache Spark have become widely accepted by financial companies and organizations. This paper summarizes two methods for enhancing the accuracy and reliability of stock market forecasts. The first, nowcasting financial time series with streaming data analytics under Apache Spark, integrates Apache Spark with various real-time data sources; by monitoring these data, the system can recognise trends and feed the results into model training to obtain a more rigorous model. The second, sentiment analysis with machine learning models, combines the Resilient Distributed Dataset (RDD) and the Hadoop Distributed File System (HDFS) to achieve efficient data preprocessing, feature extraction and model training. The RDD, as the core of Apache Spark, provides memory management and fault tolerance, while HDFS offers a dependable method for distributed storage of large-scale textual data. The integration of RDDs and HDFS significantly enhances the accuracy of analytical outcomes. In conclusion, this paper demonstrates how nowcasting financial time series using streaming data analytics within Apache Spark, alongside sentiment analysis and machine learning models, enhances the reliability and precision of stock market analyses, making their results more trustworthy for users.


Paper Citation


in Harvard Style

Liu Y. (2024). The Investigation of the Application of Apache Spark in Stock Analysis. In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI; ISBN 978-989-758-713-9, SciTePress, pages 501-505. DOI: 10.5220/0012957600004508


in Bibtex Style

@conference{emiti24,
author={Yuxi Liu},
title={The Investigation of the Application of Apache Spark in Stock Analysis},
booktitle={Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI},
year={2024},
pages={501-505},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012957600004508},
isbn={978-989-758-713-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI
TI - The Investigation of the Application of Apache Spark in Stock Analysis
SN - 978-989-758-713-9
AU - Liu Y.
PY - 2024
SP - 501
EP - 505
DO - 10.5220/0012957600004508
PB - SciTePress