Authors: Motaz Al-Hami 1; Marcin Pietron 2; Raul Casas 3; Samer Hijazi 3 and Piyush Kaul 3
Affiliations: 1 Hashemite University and Cadence Design Systems, Jordan; 2 AGH University and Cadence Design Systems, Poland; 3 Cadence Design Systems, United States
Keyword(s):
Deep Learning, Quantization, Convolutional Neural Networks.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Biomedical Engineering; Biomedical Signal Processing; Computational Intelligence; Data Manipulation; Evolutionary Computing; Health Engineering and Technology Applications; Human-Computer Interaction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Methodologies and Methods; Neural Networks; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Symbolic Systems; Theory and Methods; Vision and Perception
Abstract:
Nowadays, convolutional neural networks (CNNs) play a major role in embedded computing environments, and the ability to improve CNN implementation and performance on embedded devices is in urgent demand. Compressing the parameters and outputs of the network layers into suitable precision formats reduces the storage and computation cycles required on embedded devices. Such an enhancement can drastically reduce power consumption and resource requirements, and ultimately cost. In this article, we propose several quantization techniques for quantizing several CNN networks. With only minor degradation relative to the floating-point performance, the presented quantization methods produce fixed-point networks with stable performance. Precise fixed-point calculations for coefficients, input/output signals and accumulators are considered in the quantization process.
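
The kind of compression the abstract describes can be illustrated with a minimal sketch of uniform fixed-point quantization: a floating-point tensor is rounded onto a signed fixed-point grid defined by a total bit width and a number of fractional bits, with saturation at the representable range. The function name and the 8-bit/6-fractional-bit split below are illustrative assumptions, not the authors' specific scheme.

```python
import numpy as np

def quantize_fixed_point(x, total_bits=8, frac_bits=6):
    """Round x onto a signed fixed-point grid with `frac_bits`
    fractional bits, saturating to the `total_bits`-bit range.
    Returns the dequantized values (floats lying on that grid)."""
    scale = 2 ** frac_bits
    qmin = -(2 ** (total_bits - 1))      # most negative integer code
    qmax = 2 ** (total_bits - 1) - 1     # most positive integer code
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale

# Example: quantize a few weights to Q2.6 fixed point.
w = np.array([0.70710678, -0.5, 1.9])
print(quantize_fixed_point(w))  # [0.703125, -0.5, 1.90625]
```

The same idea extends to input/output signals and accumulators, where the integer and fractional bit allocation is chosen per tensor to cover its dynamic range.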