
guage model fine-tuning. Our findings not only have implications for natural language processing but also hold relevance for broader applications in data analysis and machine learning.