Authors:
Muhammad Ali, Omar Alsuwaidi and Salman Khan
Affiliation:
Department of Computer Vision, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, U.A.E.
Keyword(s):
Generalization, Optimization Method, Deep Neural Network, Bouncing Gradient Descent, Heuristic Algorithm, Large Step Sizes, Local Minima, Basin Flatness, Sharpness.
Abstract:
In the digital age of ever-increasing data sources, accessibility, and collection, the demand for generalizable machine learning models that make effective use of limited training data is unprecedented, given how labor-intensive and expensive data collection is. A deployed model must efficiently exploit patterns and regularities in the data to achieve strong predictive performance on new, unseen datasets. Because data is gathered from many sources across domains such as Machine Learning, Natural Language Processing, and Computer Vision, selection bias inevitably creeps into the collected data, resulting in distribution (domain) shifts. In practice, deep neural networks trained by simply solving empirical risk minimization (ERM) on highly complex, non-convex loss functions often converge to sharp local minima and thus yield sub-optimal generalization performance. Hence, this paper aims to tackle the generalization error by first introducing the notion of a local minimum's sharpness, an attribute that induces a model's non-generalizability and can serve as a simple guiding heuristic for theoretically distinguishing satisfactory (flat) local minima from poor (sharp) ones. Secondly, motivated by the introduced variance-stability ∼ exploration-exploitation tradeoff, we propose a novel gradient-based adaptive optimization algorithm, a variant of SGD named Bouncing Gradient Descent (BGD). BGD's primary goal is to remedy SGD's tendency to get trapped in suboptimal minima by using relatively large step sizes and "unorthodox" weight updates, steering the model toward flatter local minima and thereby better generalization. We empirically validate the proposed approach on several benchmark classification datasets, showing that it contributes significant and consistent improvements in model generalization performance and produces state-of-the-art results when compared to the baseline approaches.
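As a rough illustration of the flat-vs-sharp distinction the abstract draws (not the paper's own definition of sharpness, which is not given here), a common proxy measures how much the loss rises under small random perturbations of the parameters around a minimum: a flat basin tolerates perturbations with little loss increase, while a sharp one does not. The loss functions and radius below are hypothetical toy choices.

```python
import numpy as np

def sharpness(loss, w_star, radius=0.05, n_samples=200, seed=0):
    """Estimate sharpness at a minimum w_star as the worst-case loss
    increase over random perturbations of norm `radius` (a common proxy;
    the paper may define sharpness differently)."""
    rng = np.random.default_rng(seed)
    base = loss(w_star)
    worst = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=w_star.shape)
        d *= radius / np.linalg.norm(d)   # project onto the radius sphere
        worst = max(worst, loss(w_star + d) - base)
    return worst

# Two toy quadratic basins with the same minimum but different curvature.
flat_basin  = lambda w: 0.5 * np.sum(w ** 2)
sharp_basin = lambda w: 50.0 * np.sum(w ** 2)
w0 = np.zeros(10)
print(sharpness(flat_basin, w0) < sharpness(sharp_basin, w0))  # True
```

Under this proxy, an optimizer that prefers regions where the estimate stays small is biased toward the flat minima the abstract associates with better generalization.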