to perform an analysis of the GPU cache hierarchy.
REFERENCES
Abdelkhalik, H., Arafa, Y., Santhi, N., and Badawy, A.-H. A. (2022). Demystifying the NVIDIA Ampere architecture through microbenchmarking and instruction-level analysis. In IEEE High Performance Extreme Computing Conference (HPEC).
Alappat, C. L. et al. (2020). Understanding HPC benchmark performance on Intel Broadwell and Cascade Lake processors. In ISC High Performance Computing, pages 412–433. Springer International Publishing.
Alavani, G., Desai, J., and Sarkar, S. (2021). GPPT: A power prediction tool for CUDA applications. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), pages 247–250. IEEE.
Alavani, G. and Sarkar, S. (2022). Performance modeling of graphics processing unit application using static and dynamic analysis. Concurrency and Computation: Practice and Experience, 34(3):e6602.
Ali, M. (2020). PyCaret: An open source, low-code machine learning library in Python. PyCaret version 1.0.
Andersch, M., Lucas, J., Álvarez-Mesa, M. A., and Juurlink, B. (2015). On latency in GPU throughput microarchitectures. In International Symposium on Performance Analysis of Systems and Software (ISPASS).
Arafa, Y., Badawy, A., Chennupati, G., Santhi, N., and Eidenbenz, S. (2019). PPT-GPU: Scalable GPU performance modeling. IEEE Computer Architecture Letters, 18(1):55–58.
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J. W., Lee, S.-H., and Skadron, K. (2009). Rodinia: A benchmark suite for heterogeneous computing. In IEEE International Symposium on Workload Characterization (IISWC).
Cornelis, J. G. and Lemeire, J. (2019). The pipeline performance model: A generic executable performance model for GPUs. In International Conference on Parallel, Distributed and Network-Based Processing (PDP).
Ding, N. and Williams, S. (2019). An instruction roofline model for GPUs. In IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).
Hong, S. and Kim, H. (2009). An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In International Symposium on Computer Architecture (ISCA).
Hristea, C.-A.-M. et al. (1997). Micro benchmarks for multiprocessor memory hierarchy performance. PhD thesis, Massachusetts Institute of Technology.
Jia, Z., Maggioni, M., Staiger, B., and Scarpazza, D. P. (2018). Dissecting the NVIDIA Volta GPU architecture via microbenchmarking. arXiv preprint arXiv:1804.06826.
Kandiah, V., Peverelle, S., Khairy, M., Pan, J., Manjunath, A., Rogers, T. G., Aamodt, T. M., and Hardavellas, N. (2021). AccelWattch: A power modeling framework for modern GPUs. In International Symposium on Microarchitecture (MICRO).
Konstantinidis, E. and Cotronis, Y. (2017). A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling. Journal of Parallel and Distributed Computing, 107:37–56.
Kothapalli, K., Mukherjee, R., Rehman, M. S., Patidar, S., Narayanan, P., and Srinathan, K. (2009). A performance prediction model for the CUDA GPGPU platform. In International Conference on High Performance Computing (HiPC).
Lemeire, J., Cornelis, J. G., and Segers, L. (2016). Microbenchmarks for GPU characteristics: The occupancy roofline and the pipeline model. In Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).
Lucas, J. and Juurlink, B. (2019). MemPower: Data-aware GPU memory power model. In Schoeberl, M., Hochberger, C., Uhrig, S., Brehm, J., and Pionteck, T., editors, Architecture of Computing Systems, pages 195–207. Springer.
Markidis, S., Chien, S., Laure, E., Peng, I., and Vetter, J. S. (2018). NVIDIA Tensor Core programmability, performance & precision. In International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
Mei, X. and Chu, X. (2017). Dissecting GPU memory hierarchy through microbenchmarking. IEEE Transactions on Parallel and Distributed Systems, 28(1):72–86.
Papadopoulou, M.-M., Sadooghi-Alvandi, M., and Wong, H. (2009). Micro-benchmarking the GT200 GPU. Technical report, Computer Group, ECE, University of Toronto.
Meltzer, R., Zeng, C., and Cecka, C. (2013). Micro-benchmarking the C2070. Poster presented at the GPU Technology Conference.
Resios, A. (2011). GPU Performance Prediction using Parameterized Models. Master's thesis, Utrecht University.
Saavedra-Barrera, R. H. (1992). CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Benchmarking. PhD thesis, EECS Department, University of California, Berkeley.
Volkov, V. (2016). Understanding Latency Hiding on GPUs. PhD thesis, EECS Department, University of California, Berkeley.
Volkov, V. (2018). A microbenchmark to study GPU performance models. In 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 421–422.
Wong, H., Papadopoulou, M.-M., Sadooghi-Alvandi, M., and Moshovos, A. (2010). Demystifying GPU microarchitecture through microbenchmarking. In IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
Yan, D., Wang, W., and Chu, X. (2020). Demystifying Tensor Cores to optimize half-precision matrix multiply. In IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 634–643.