Paper

Author: Arunselvan Ramaswamy

Affiliation: Dept. of Mathematics and Computer Science, Karlstad University, 651 88 Karlstad, Sweden

Keyword(s): Deep Learning, Adaptive Gradient Clipping, Dynamical Systems Perspective, Learning Theory, Supervised Learning.

Abstract: Neural networks are ubiquitous components of Machine Learning (ML) algorithms. However, training them is challenging due to problems associated with exploding and vanishing loss-gradients. Gradient clipping is shown to effectively combat both the vanishing gradients and the exploding gradients problems. As the name suggests, gradients are clipped in order to prevent large updates. At the same time, very small neural network weights are updated using larger step-sizes. Although widely used in practice, there is very little theory surrounding clipping. In this paper, we analyze two popular gradient clipping techniques: the classic norm-based gradient clipping method and the adaptive gradient clipping technique. We prove that gradient clipping ensures numerical stability with very high probability. Further, clipping-based stochastic gradient descent converges to a set of neural network weights that minimizes the average scaled training loss in a local sense. The averaging is with respect to the distribution that generated the training data. The scaling is a consequence of gradient clipping. We use tools from the theory of dynamical systems for the presented analysis.
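
The two clipping rules named in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation, which is not reproduced on this page. It assumes the commonly used definitions: norm-based clipping rescales a gradient whose L2 norm exceeds a threshold, and adaptive clipping bounds the gradient norm by a multiple of the weight norm. The function names, the `eps` floor, and the choice to treat the whole weight vector as one unit (rather than clipping per unit or per layer) are illustrative assumptions.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_by_norm(grad, threshold):
    """Norm-based clipping (assumed standard form):
    rescale grad so that its L2 norm is at most `threshold`."""
    norm = l2_norm(grad)
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [scale * g for g in grad]


def adaptive_clip(grad, weights, clip_ratio, eps=1e-3):
    """Adaptive clipping (assumed form): bound the gradient norm by
    clip_ratio * max(||weights||, eps). `eps` is a hypothetical floor
    that keeps near-zero weights from freezing."""
    max_norm = clip_ratio * max(l2_norm(weights), eps)
    g_norm = l2_norm(grad)
    if g_norm <= max_norm:
        return list(grad)
    scale = max_norm / g_norm
    return [scale * g for g in grad]


def sgd_step(weights, grad, lr):
    """One plain SGD update using an (already clipped) gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    w = [0.0, 2.0]
    g = [3.0, 4.0]  # L2 norm is 5
    print(clip_by_norm(g, threshold=1.0))
    print(adaptive_clip(g, w, clip_ratio=0.5))
    print(sgd_step(w, clip_by_norm(g, threshold=1.0), lr=0.1))
```

In both functions the direction of the gradient is preserved and only its length changes, which is why clipped stochastic gradient descent can be read as ordinary descent on a rescaled loss.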

CC BY-NC-ND 4.0

Paper citation in several formats:
Ramaswamy, A. (2023). Gradient Clipping in Deep Learning: A Dynamical Systems Perspective. In Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-626-2; ISSN 2184-4313, SciTePress, pages 107-114. DOI: 10.5220/0011678000003411

@conference{icpram23,
author={Arunselvan Ramaswamy},
title={Gradient Clipping in Deep Learning: A Dynamical Systems Perspective},
booktitle={Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2023},
pages={107-114},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011678000003411},
isbn={978-989-758-626-2},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Gradient Clipping in Deep Learning: A Dynamical Systems Perspective
SN - 978-989-758-626-2
IS - 2184-4313
AU - Ramaswamy, A.
PY - 2023
SP - 107
EP - 114
DO - 10.5220/0011678000003411
PB - SciTePress