Gradient Clipping in Deep Learning: A Dynamical Systems Perspective

Arunselvan Ramaswamy

Topics: Advanced Learning Methods; Deep Learning and Neural Networks; Machine Learning Methods

In Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods (ICPRAM), Volume 1, pages 107-114, 2023, Lisbon, Portugal

Author: Arunselvan Ramaswamy

Affiliation: Dept. of Mathematics and Computer Science, Karlstad University, 651 88 Karlstad, Sweden

Keyword(s): Deep Learning, Adaptive Gradient Clipping, Dynamical Systems Perspective, Learning Theory, Supervised Learning.

Abstract: Neural networks are ubiquitous components of Machine Learning (ML) algorithms. However, training them is challenging due to problems associated with exploding and vanishing loss-gradients. Gradient clipping has been shown to effectively combat both the vanishing gradients and the exploding gradients problems. As the name suggests, gradients are clipped in order to prevent large updates. At the same time, very small neural network weights are updated using larger step-sizes. Although widely used in practice, there is very little theory surrounding clipping. In this paper, we analyze two popular gradient clipping techniques – the classic norm-based gradient clipping method and the adaptive gradient clipping technique. We prove that gradient clipping ensures numerical stability with very high probability. Further, clipping-based stochastic gradient descent converges to a set of neural network weights that minimizes the average scaled training loss in a local sense. The averaging is with respect to the distribution that generated the training data. The scaling is a consequence of gradient clipping. We use tools from the theory of dynamical systems for the presented analysis.
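
To make the norm-based clipping idea from the abstract concrete, here is a minimal sketch of one clipped gradient descent step. It is an illustration only, not the paper's exact algorithm: the function names `clip_by_norm` and `clipped_sgd_step`, the `threshold` and `step_size` parameters, the use of the Euclidean norm, and the plain-list representation of weights are all assumptions made for this example. The adaptive clipping variant mentioned in the abstract is not shown.

```python
import math


def clip_by_norm(grad, threshold):
    """Rescale grad so that its Euclidean norm is at most threshold.

    Hypothetical helper: gradients whose norm is already within the
    threshold are returned unchanged; larger ones keep their direction
    but are shrunk to norm == threshold.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def clipped_sgd_step(weights, grad, step_size, threshold):
    """One gradient descent update using the clipped gradient."""
    clipped = clip_by_norm(grad, threshold)
    return [w - step_size * g for w, g in zip(weights, clipped)]


# Usage: a gradient of norm 5 is scaled down to norm 1 before the update.
weights = [1.0, 1.0]
grad = [3.0, 4.0]
new_weights = clipped_sgd_step(weights, grad, step_size=0.5, threshold=1.0)
```

Because the clipped gradient never has norm above `threshold`, a single update moves the weights by at most `step_size * threshold`, which is the sense in which clipping prevents large updates.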

CC BY-NC-ND 4.0

Paper citation in several formats:

Ramaswamy, A. (2023). Gradient Clipping in Deep Learning: A Dynamical Systems Perspective. In Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-626-2; ISSN 2184-4313, SciTePress, pages 107-114. DOI: 10.5220/0011678000003411

@conference{icpram23,
author={Arunselvan Ramaswamy},
title={Gradient Clipping in Deep Learning: A Dynamical Systems Perspective},
booktitle={Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2023},
pages={107-114},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011678000003411},
isbn={978-989-758-626-2},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Gradient Clipping in Deep Learning: A Dynamical Systems Perspective
SN - 978-989-758-626-2
IS - 2184-4313
AU - Ramaswamy, A.
PY - 2023
SP - 107
EP - 114
DO - 10.5220/0011678000003411
PB - SciTePress