When Are 1.58 Bits Enough? A Bottom-up Exploration of Quantization-Aware Training with Ternary Weights

Jacob Nielsen, Lukas Galke, Peter Schneider-Kamp

2025

Abstract

Contemporary machine learning models, such as language models, are powerful but come with immense resource requirements at both training and inference time. Quantization-aware pre-training with ternary weights (1.58 bits per weight) has shown promising results in decoder-only language models and facilitates memory-efficient inference. However, little is known about how quantization-aware training influences training dynamics beyond such Transformer-based decoder-only language models. Here, we engage in a bottom-up exploration of quantization-aware training, starting with multi-layer perceptrons and graph neural networks. We then explore 1.58-bit training in other Transformer-based language models: encoder-only and encoder-decoder models. Our results show that in all of these settings, 1.58-bit training is on par with standard 32/16-bit models, yet we also identify challenges specific to 1.58-bit encoder-decoder models. Our results on decoder-only language models hint at a possible regularization effect introduced by quantization-aware training.
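Note: the "1.58 bits" refers to log2(3) ≈ 1.58, the information content of a ternary weight drawn from {-1, 0, +1}. As a rough illustration of the general idea only (not necessarily the exact formulation used in the paper), a BitNet-style quantization-aware training step keeps full-precision latent weights and quantizes them on the fly using an absmean scale and a straight-through estimator. The PyTorch sketch below uses hypothetical function names.

import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Scale by the mean absolute value, then round each weight to {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=eps)
    return (w / scale).round().clamp(-1, 1) * scale

def qat_weight(w: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator: the forward pass sees ternary weights,
    # while gradients flow to the full-precision latent weights.
    return w + (ternary_quantize(w) - w).detach()

In such a setup, qat_weight(w) would replace w inside each quantized layer's forward pass, while the optimizer continues to update the latent full-precision tensor w.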

Paper Citation


in Harvard Style

Nielsen J., Galke L. and Schneider-Kamp P. (2025). When Are 1.58 Bits Enough? A Bottom-up Exploration of Quantization-Aware Training with Ternary Weights. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 1440-1449. DOI: 10.5220/0013382400003890


in BibTeX Style

@conference{icaart25,
author={Jacob Nielsen and Lukas Galke and Peter Schneider-Kamp},
title={When Are 1.58 Bits Enough? A Bottom-up Exploration of Quantization-Aware Training with Ternary Weights},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={1440-1449},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013382400003890},
isbn={978-989-758-737-5},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - When Are 1.58 Bits Enough? A Bottom-up Exploration of Quantization-Aware Training with Ternary Weights
SN - 978-989-758-737-5
AU - Nielsen J.
AU - Galke L.
AU - Schneider-Kamp P.
PY - 2025
SP - 1440
EP - 1449
DO - 10.5220/0013382400003890
PB - SciTePress