and further validate whether other overparameterized models also accelerate the optimization process. This is one of the directions we intend to pursue. Another line of future work is to overparameterize other supervised learning-based recommendation models, such as factorization machines, NeuralCF, and Wide & Deep, and to investigate, both theoretically and empirically, how overparameterization affects their optimization speed.
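As an illustration of the kind of modification we have in mind, the sketch below shows one way a NeuralCF-style model could be overparameterized: the final linear prediction layer is replaced by a product of two linear maps with no nonlinearity in between, so the model's expressiveness is unchanged while its optimization dynamics may differ. This is only a minimal sketch under our own assumptions; the class name OverparamNCF, the PyTorch framework, and all dimensions are illustrative and are not the implementation evaluated in this paper.

```python
# Minimal sketch (illustrative, not this paper's implementation): a NeuralCF-style
# model whose scalar output layer is overparameterized into a product of two
# linear layers. Names and dimensions are assumptions for demonstration only.
import torch
import torch.nn as nn

class OverparamNCF(nn.Module):
    def __init__(self, n_users, n_items, dim=32, hidden=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU())
        # Overparameterization: the single hidden->1 map becomes
        # hidden->hidden->1 with no nonlinearity in between, so the function
        # class is unchanged but the gradient dynamics are not.
        self.out = nn.Sequential(nn.Linear(hidden, hidden, bias=False),
                                 nn.Linear(hidden, 1))

    def forward(self, users, items):
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        return self.out(self.mlp(x)).squeeze(-1)

# Usage: scores for a small batch of (user, item) index pairs.
model = OverparamNCF(n_users=1000, n_items=1700)
scores = model(torch.tensor([0, 5]), torch.tensor([10, 42]))
```

Whether this kind of reparameterization actually speeds up training for NeuralCF, factorization machines, or Wide & Deep is precisely the empirical and theoretical question we leave for future work.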
ACKNOWLEDGEMENTS
We acknowledge partial support from the Ministry of Science and Technology, Taiwan, under Grant No. MOST 107-2221-E-008-077-MY3.