
Efficient Neural Network Training via Subset Pretraining

Authors: Jan Spörer 1 ; Bernhard Bermeitinger 2 ; Tomas Hrycej 1 ; Niklas Limacher 1 and Siegfried Handschuh 1

Affiliations: 1 Institute of Computer Science, University of St.Gallen (HSG), St.Gallen, Switzerland ; 2 Institute of Computer Science in Vorarlberg, University of St.Gallen (HSG), Dornbirn, Austria

Keyword(s): Deep Neural Network, Convolutional Network, Computer Vision, Efficient Training, Resource Optimization, Training Strategies, Overdetermination Ratio, Stochastic Approximation Theory.

Abstract: In training neural networks, it is common practice to use partial gradients computed over batches, mostly very small subsets of the training set. This approach is motivated by the argument that such a partial gradient is close to the true one, with precision growing only with the square root of the batch size. A theoretical justification invokes stochastic approximation theory. However, the conditions for the validity of this theory are not satisfied by the usual learning-rate schedules. Batch processing is also difficult to combine with efficient second-order optimization methods. This proposal is based on another hypothesis: the loss minimum of the training set can be expected to be well approximated by the minima of its subsets. Such subset minima can be computed in a fraction of the time necessary for optimizing over the whole training set. This hypothesis has been tested with the help of the MNIST, CIFAR-10, and CIFAR-100 image classification benchmarks, optionally extended by training data augmentation. The experiments have confirmed that results equivalent to conventional training can be reached. In summary, even small subsets are representative if the overdetermination ratio for the given model parameter set sufficiently exceeds unity. The computing expense can be reduced to a tenth or less.
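The overdetermination ratio mentioned in the abstract can be illustrated with a short sketch. The definition used below (training constraints, i.e., samples times output components, divided by free model parameters) is an assumption inferred from the abstract's wording; the paper's exact formulation may differ, and the MNIST-style figures are purely illustrative.

```python
import math

def overdetermination_ratio(num_samples: int, output_dim: int, num_params: int) -> float:
    """Ratio of training constraints (samples x output components)
    to free model parameters. Values well above 1 suggest the
    training set overdetermines the parameters, so even a subset
    may pin down a similar loss minimum. (Assumed definition.)"""
    return (num_samples * output_dim) / num_params

def min_subset_size(output_dim: int, num_params: int, target_ratio: float = 1.0) -> int:
    """Smallest subset size keeping the overdetermination ratio at or
    above target_ratio (illustrative helper, not from the paper)."""
    return math.ceil(target_ratio * num_params / output_dim)

# Hypothetical MNIST-like setting: 60,000 samples, 10 class outputs,
# a model with 100,000 trainable parameters.
q = overdetermination_ratio(60_000, 10, 100_000)   # 6.0 -> overdetermined
n = min_subset_size(10, 100_000)                    # 10,000 samples suffice
```

Under these assumed numbers, a subset of roughly one sixth of the training set would still keep the ratio at unity, consistent with the abstract's claim that small, representative subsets can stand in for the full set during pretraining.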

CC BY-NC-ND 4.0


Paper citation in several formats:
Spörer, J., Bermeitinger, B., Hrycej, T., Limacher, N. and Handschuh, S. (2024). Efficient Neural Network Training via Subset Pretraining. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR; ISBN 978-989-758-716-0; ISSN 2184-3228, SciTePress, pages 242-249. DOI: 10.5220/0012893600003838

@conference{kdir24,
author={Jan Spörer and Bernhard Bermeitinger and Tomas Hrycej and Niklas Limacher and Siegfried Handschuh},
title={Efficient Neural Network Training via Subset Pretraining},
booktitle={Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR},
year={2024},
pages={242-249},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012893600003838},
isbn={978-989-758-716-0},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR
TI - Efficient Neural Network Training via Subset Pretraining
SN - 978-989-758-716-0
IS - 2184-3228
AU - Spörer, J.
AU - Bermeitinger, B.
AU - Hrycej, T.
AU - Limacher, N.
AU - Handschuh, S.
PY - 2024
SP - 242
EP - 249
DO - 10.5220/0012893600003838
PB - SciTePress
ER -