tor matrix element, and rank of the input factor ma-
trices), and (3) Parameters of the memory controller
(i.e., DMA buffer sizes, number of cache lines, as-
sociativity of cache, and number of factor matrices
shared by a cache).
A module-by-module (e.g., Cache Engine and
DMA Engine) exhaustive parameter search can be
used to identify the optimal parameters for the
memory controller.
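As an illustration only, a module-by-module exhaustive search could be sketched as below. The module names follow the text, but the candidate values and the `toy_cost` function are hypothetical placeholders, not measurements or models from this work; a real cost function would estimate memory access time for the target tensor and FPGA. The sketch also assumes the modules can be tuned independently.

```python
from itertools import product

# Hypothetical candidate values per module (illustrative, not from the paper).
SEARCH_SPACE = {
    "cache_engine": {
        "num_cache_lines": [256, 512, 1024],
        "associativity": [1, 2, 4],
    },
    "dma_engine": {
        "dma_buffer_bytes": [1024, 4096, 16384],
    },
}


def toy_cost(module, params):
    """Placeholder cost (lower is better); stands in for an access-time estimate."""
    if module == "cache_engine":
        # Toy assumption: a larger, more associative cache misses less,
        # while higher associativity adds a small lookup overhead.
        capacity_term = 1000.0 / (params["num_cache_lines"] * params["associativity"])
        return capacity_term + 0.1 * params["associativity"]
    if module == "dma_engine":
        # Toy assumption: small buffers pay per-transfer overhead,
        # large buffers pay an on-chip memory penalty.
        size = params["dma_buffer_bytes"]
        return 4096.0 / size + size / 8192.0
    raise ValueError(f"unknown module: {module}")


def search_module(module, space, cost_fn):
    """Exhaustively evaluate every parameter combination of one module."""
    names = sorted(space)
    best_params, best_cost = None, float("inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        cost = cost_fn(module, params)
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost


def search_all(search_space, cost_fn):
    """Tune each module on its own, keeping the best configuration per module."""
    return {
        module: search_module(module, space, cost_fn)
        for module, space in search_space.items()
    }


if __name__ == "__main__":
    for module, (params, cost) in search_all(SEARCH_SPACE, toy_cost).items():
        print(module, params, round(cost, 4))
```

Searching each module separately evaluates 9 + 3 = 12 configurations here instead of the 27 in the full cross product; the saving holds only under the stated assumption that modules do not interact.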
6 DISCUSSION
In this paper, we investigated the characteristics of a
custom memory controller that can reduce the total
memory access time of sparse MTTKRP on FPGAs.
Sparse MTTKRP is a memory-bound operation. It
has two types of memory access patterns that can be
optimized to reduce the total memory access time. A
memory controller design that can be configured at
compile/synthesis time, depending on the application
and targeted hardware, is required.
We are developing a configurable memory
controller and a memory layout for sparse tensors to
reduce the total memory access time of the sparse
MTTKRP operation.
Because synthesizing an FPGA design can take a long
time, optimizing the memory controller parameters for
a given application can be a time-consuming process.
Hence, we are developing Performance Model
Simulator (PMS) software to identify the optimal
parameters for a given application on a selected FPGA.
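To make the idea of software-side parameter evaluation concrete, the following is a minimal sketch of one estimate such a simulator might compute: the hit rate of a set-associative cache over a trace of factor-matrix row accesses. It is not the PMS itself. The function name `simulate_cache`, the LRU replacement policy, the modulo set mapping, and the one-row-per-cache-line assumption are all illustrative choices.

```python
from collections import OrderedDict


def simulate_cache(row_trace, num_lines, associativity):
    """Return the hit rate of a set-associative LRU cache over row indices.

    Assumes (hypothetically) that each cache line holds exactly one
    factor-matrix row and that a row maps to set `row % num_sets`.
    """
    if num_lines % associativity != 0:
        raise ValueError("num_lines must be a multiple of associativity")
    num_sets = num_lines // associativity
    # One ordered dict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for row in row_trace:
        cache_set = sets[row % num_sets]
        if row in cache_set:
            hits += 1
            cache_set.move_to_end(row)  # mark as most recently used
        else:
            if len(cache_set) >= associativity:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[row] = True
    return hits / len(row_trace) if row_trace else 0.0


if __name__ == "__main__":
    # Rows 0 and 4 conflict in a direct-mapped cache with 4 lines,
    # but coexist once the same 4 lines are organized as 2-way sets.
    trace = [0, 4, 0, 4]
    print(simulate_cache(trace, num_lines=4, associativity=1))
    print(simulate_cache(trace, num_lines=4, associativity=2))
```

Evaluating candidate configurations this way takes milliseconds per trace, whereas each hardware evaluation requires a full synthesis run.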
ACKNOWLEDGEMENTS
This work was supported by the U.S. National
Science Foundation (NSF) under grants NSF SaTC
#2104264 and PPoSS-2119816.
REFERENCES
CCIX (2021). Cache Coherent Interconnect for Accelera-
tors (CCIX). https://www.ccixconsortium.com/.
Cheng, Z., Li, B., Fan, Y., and Bao, Y. (2020). A novel
rank selection scheme in tensor ring decomposition
based on reinforcement learning for deep neural net-
works. In ICASSP 2020-2020 IEEE International
Conference on Acoustics, Speech and Signal Process-
ing (ICASSP), pages 3292–3296. IEEE.
CXL (2021). Compute Express Link (CXL).
https://www.computeexpresslink.org/.
Helal, A. E., Laukemann, J., Checconi, F., Tithi, J. J.,
Ranadive, T., Petrini, F., and Choi, J. (2021). ALTO:
Adaptive linearized storage of sparse tensors. In
Proceedings of the ACM International Conference on
Supercomputing, ICS ’21, pages 404–416, New York,
NY, USA. Association for Computing Machinery.
Kolda, T. G. and Bader, B. W. (2009). Tensor
decompositions and applications. SIAM Review, 51(3):455–500.
Kuppannagari, S. R., Rajat, R., Kannan, R., Dasu, A., and
Prasanna, V. (2019). IP cores for graph kernels on
FPGAs. In 2019 IEEE High Performance Extreme
Computing Conference (HPEC), pages 1–7.
Li, J., Sun, J., and Vuduc, R. (2018). HiCOO: Hierarchical
storage of sparse tensors. In Proceedings of the
International Conference for High Performance
Computing, Networking, Storage, and Analysis, SC ’18. IEEE
Press.
Mondelli, M. and Montanari, A. (2019). On the connec-
tion between learning two-layer neural networks and
tensor decomposition. In The 22nd International Con-
ference on Artificial Intelligence and Statistics, pages
1051–1060. PMLR.
Nisa, I., Li, J., Sukumaran-Rajam, A., Vuduc, R., and
Sadayappan, P. (2019). Load-balanced sparse MTTKRP on
GPUs. In 2019 IEEE International Parallel and
Distributed Processing Symposium (IPDPS), pages
123–133.
Smith, S., Choi, J. W., Li, J., Vuduc, R., Park, J., Liu, X.,
and Karypis, G. (2017). FROSTT: The formidable
repository of open sparse tensors and tools.
Srivastava, N., Rong, H., Barua, P., Feng, G., Cao,
H., Zhang, Z., Albonesi, D., Sarkar, V., Chen,
W., Petersen, P., Lowney, G., Herr, A., Hughes,
C., Mattson, T., and Dubey, P. (2019). T2S-Tensor:
Productively generating high-performance
spatial hardware for dense tensor computations. In
2019 IEEE 27th Annual International Symposium on
Field-Programmable Custom Computing Machines
(FCCM), pages 181–189.
Wen, F., So, H. C., and Wymeersch, H. (2020). Tensor
decomposition-based beamspace ESPRIT algorithm
for multidimensional harmonic retrieval. In ICASSP
2020-2020 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pages
4572–4576. IEEE.
Xilinx (2019). Alveo U250 data center accelerator card.
https://www.xilinx.com/products/boards-and-kits/alveo/u250.html.
Towards Programmable Memory Controller for Tensor Decomposition