Secondly, because the original file contains only a
small amount of data, it was divided manually into
several small files before the program was run. In
practical applications, the MapReduce model usually
deals with large-scale data sets, for which manual
partitioning is too time-consuming to be feasible.
Moreover, because the program always waits for every
thread to finish before moving on to the next step, a
scheduler could be added at the start to check the
sizes of the files to be counted before the mapper
function is triggered, and then assign each thread a
set of files of similar total size. The execution time
of each thread would then be roughly the same, which
reduces the waiting time.
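The size-aware scheduling idea described above can be sketched as follows. This is a minimal illustration rather than part of the experiment's program: it assumes the file sizes are already known (for example from `os.path.getsize` or from object-storage metadata), and the function name `assign_files` and the sample file names are hypothetical. It uses a common greedy heuristic, which assigns the largest remaining file to the currently least-loaded worker.

```python
import heapq
from typing import Dict, List


def assign_files(file_sizes: Dict[str, int], num_workers: int) -> List[List[str]]:
    """Greedy balancing: largest file first, always to the least-loaded worker.

    `file_sizes` maps a file name to its size in bytes (assumed known up front).
    Returns one list of file names per worker.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Min-heap of (bytes assigned so far, worker index).
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(num_workers)]

    # Visit files from largest to smallest.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, worker = heapq.heappop(heap)
        buckets[worker].append(name)
        heapq.heappush(heap, (load + size, worker))
    return buckets


# Hypothetical input: five split files with uneven sizes, two mapper threads.
sizes = {"a.txt": 90, "b.txt": 60, "c.txt": 50, "d.txt": 40, "e.txt": 10}
buckets = assign_files(sizes, 2)
loads = [sum(sizes[name] for name in bucket) for bucket in buckets]
print(buckets, loads)
```

With these sample sizes the two workers receive 130 and 120 units of data, so neither waits long for the other. The heuristic does not guarantee an optimal split, and file size is only a proxy for mapper run time.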
In addition, the program framework of this experiment
can be used to study other problems, such as image
recognition, by replacing the original data file and
modifying the internal module for word frequency
statistics as needed. This reflects the high
scalability of the serverless framework and the
MapReduce model.
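One way to picture this reuse is a driver that takes the map and reduce steps as parameters, so that only those two functions change between tasks. The sketch below is an assumption-laden illustration, not the experiment's code: it runs sequentially in one process as a stand-in for the serverless invocations, and all names (`run_mapreduce`, `word_count_mapper`, `word_length_mapper`, `sum_reducer`) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Mapper = Callable[[str], Iterable[Tuple[Hashable, int]]]
Reducer = Callable[[Hashable, List[int]], int]


def run_mapreduce(inputs: Iterable[str], mapper: Mapper, reducer: Reducer) -> Dict[Hashable, int]:
    """Apply `mapper` to every input, group values by key, then reduce each group."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(text: str) -> Iterable[Tuple[str, int]]:
    """Word frequency task: emit (word, 1) for every word."""
    for word in text.lower().split():
        yield word, 1


def word_length_mapper(text: str) -> Iterable[Tuple[int, int]]:
    """A different task on the same framework: emit (word length, 1)."""
    for word in text.split():
        yield len(word), 1


def sum_reducer(key: Hashable, values: List[int]) -> int:
    """Add up the counts collected for one key."""
    return sum(values)


word_freq = run_mapreduce(["a b a", "b c"], word_count_mapper, sum_reducer)
length_hist = run_mapreduce(["to be or", "not"], word_length_mapper, sum_reducer)
print(word_freq, length_hist)
```

Switching from word frequency to the word-length histogram changes only the mapper argument. A task such as image recognition would likewise need its own mapper and reducer plus a suitable input format, which this sketch does not attempt.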
6 CONCLUSION
This article studies the application of the MapReduce
model on a serverless computing platform to the task
of word frequency statistics. Specifically, the basic
principles and advantages of the MapReduce model and
serverless computing technology are described,
followed by the development, current status and
practical applications of these two technologies. A
program framework is then designed based on the
research content, and a MapReduce model for text word
frequency statistics is successfully built on the
serverless platform using this framework. The total
running time of the program and the communication
time and computation time of the Map and Reduce parts
are measured, and the proportion of each in the
running time of its part is calculated. The main
factors affecting the running time of each part and
the total running time of the program are explored.
The experimental results show the effectiveness of
combining MapReduce with serverless computing. To sum
up, this article provides a useful reference for
applying the MapReduce model on a serverless computing
platform to the task of word frequency statistics, and
also offers new ideas and methods for research in
related fields.