depends heavily on manual effort, relying on hand-picked pruning parameters, and the cost of model compression grows rapidly as models become larger.
Recent studies therefore focus on automatic pruning methods and on mitigating the impact of unstructured pruning on interpretability, for example by using sparse regularization algorithms to identify pruning parameters automatically and reduce the dependence on manual parameter selection (Mitsuno et al., 2020).
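To make the idea concrete, the following minimal PyTorch sketch adds an L1 penalty on batch-normalization scale factors so that unimportant channels shrink towards zero during training and can be identified without hand-picked per-layer ratios; the penalty coefficient lam and the pruning threshold are illustrative assumptions, not values from the cited work.

import torch
import torch.nn as nn

# Sketch of sparsity regularization for automatic channel identification,
# in the spirit of network-slimming-style methods. `lam` and `threshold`
# are illustrative values, not settings from the cited work.
def sparsity_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    penalty = torch.zeros(())
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # L1 on the BN scale factors pushes unimportant channels to zero.
            penalty = penalty + m.weight.abs().sum()
    return lam * penalty

def prunable_channels(model: nn.Module, threshold: float = 1e-2) -> dict:
    # Channels whose learned scale collapsed below the threshold become
    # pruning candidates, with no manual per-layer selection.
    return {
        name: (m.weight.abs() < threshold).nonzero().flatten().tolist()
        for name, m in model.named_modules()
        if isinstance(m, nn.BatchNorm2d)
    }

During training, the penalty is simply added to the task loss, e.g. loss = criterion(output, target) + sparsity_penalty(model).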
Automated pruning still holds untapped potential. Most existing pruning methods are designed for specific models and tasks and require strong domain knowledge, so AI developers must spend considerable effort, at high cost, to adapt these methods to their own scenarios.
An ideal structural pruning algorithm should satisfy the following conditions: the model can be trained from scratch automatically, no further fine-tuning is required after pruning, and the pruned model is both lightweight and high-performing. However, the complexity of neural networks makes automated pruning extremely challenging. To reach this goal, three core questions must be addressed systematically: which parts of the network should be pruned, how to prune them without damaging the network, and how to automate these two processes. Successfully answering these questions would be a significant stepping stone towards automated pruning.
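The sketch below shows one way these three questions could map onto code, using PyTorch's built-in pruning utilities: a global L1-magnitude criterion answers what to prune, small pruning steps with brief recovery training answer how to prune without damaging the network, and an accuracy budget automates the stopping decision. Here evaluate and finetune are hypothetical stand-ins for a user's own pipeline, and the step size and budget are illustrative.

import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_prune(model, evaluate, finetune, step=0.1, budget=0.95,
                    max_rounds=20):
    # What to prune: weights of conv and linear layers, ranked globally.
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    baseline = evaluate(model)
    for _ in range(max_rounds):
        # How to prune: remove the globally smallest-magnitude weights
        # in small steps, then briefly retrain to recover.
        prune.global_unstructured(params,
                                  pruning_method=prune.L1Unstructured,
                                  amount=step)
        finetune(model)
        # How to automate: stop once the accuracy budget is exhausted
        # (a real system would also revert the final step).
        if evaluate(model) < budget * baseline:
            break
    return model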
4 CONCLUSION
This review comprehensively explores neural network pruning techniques used to compress large Deep Neural Network (DNN) models in areas such as image classification and medical applications. It discusses the two main types of pruning: unstructured and structured. Unstructured pruning, known for its flexibility, can compress models substantially but may reduce performance and interpretability. In contrast, structured pruning preserves performance and stability, making it well suited to embedded devices and edge equipment. Each type's characteristics make it suitable for different situations, and all of these factors should be weighed when choosing a pruning strategy.
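The practical difference between the two families can be illustrated with PyTorch's built-in pruning utilities; the layer and pruning amounts below are arbitrary examples rather than settings from the surveyed work.

import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Conv2d(16, 32, kernel_size=3)

# Unstructured: zero out individual weights by L1 magnitude. Flexible,
# but the resulting sparsity pattern is irregular and needs sparse
# kernels to yield real speedups.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Structured: remove whole output channels (dim=0) ranked by L2 norm.
# The surviving tensor stays dense, which embedded and edge hardware
# can exploit directly.
prune.ln_structured(layer, name="weight", amount=0.5, n=2, dim=0)

prune.remove(layer, "weight")  # make the combined mask permanent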
Research on pruning benefits the DNN field by reducing resource consumption and storage costs, broadening the range of deployable models, speeding up inference, and enabling real-time services. However, the application of pruning still faces challenges, including decreased interpretability and domain-difference problems: because the distribution and characteristics of data vary across fields, pruning methods derived from model weights trained in one domain transfer poorly to others. Future research could focus on enhancing interpretability, generality, and automation of the pruning process. Combining pruning with other model compression and optimization techniques, such as knowledge distillation and quantization, is expected to further improve model efficiency and performance.
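As a minimal sketch of such a combination, the snippet below stacks magnitude pruning with dynamic int8 quantization (knowledge distillation is omitted, since it needs a teacher model and a training loop); the toy model and the 50% sparsity level are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model; any network with Linear layers would do.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Step 1: magnitude pruning, then bake the zeros into the weight tensors.
for m in model.modules():
    if isinstance(m, nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.5)
        prune.remove(m, "weight")

# Step 2: dynamic quantization converts the remaining weights to int8,
# compounding the storage and inference savings from pruning.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)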