ordinates for the same model size. When adding the
additional feature F, the accuracies of all 13 classes
are treated as a vector ClassAcc_F
with dimensions
1×13. The cosine similarity S is calculated by finding
the average cosine similarity between the ClassAcc_F
vectors obtained for each feature addition. This cosine similarity serves as a measure of the impact of each feature on the class accuracies: smaller values indicate that a feature affects the class accuracies differently from the other features.
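A minimal sketch of this computation, assuming each feature addition's 13 per-class accuracies are stored as one row of an array; the function name and row layout are illustrative, not from the paper:

```python
import numpy as np

def per_feature_similarity(class_acc: np.ndarray) -> np.ndarray:
    """For each added feature F, average the cosine similarity between
    its 1x13 per-class accuracy vector ClassAcc_F and the corresponding
    vectors of the other feature additions (rows of `class_acc`)."""
    v = np.asarray(class_acc, dtype=float)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)  # unit-length rows
    sims = unit @ unit.T            # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)     # exclude each vector's self-similarity
    return sims.sum(axis=1) / (v.shape[0] - 1)
```

A low value of S for a feature then flags it as changing the per-class accuracy profile in a way the other features do not.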
Feature computation cost is evaluated on a relative scale based on the time required to compute each feature. Because the computation time depends strongly on CPU performance, the scale is defined by setting the cost of computing the Euclidean distance to 1 and the cost of computing the non-needed intensity to 0.
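The paper does not spell out the mapping between measured times and this scale; the sketch below assumes a linear rescaling between the two anchor costs, with hypothetical dictionary keys:

```python
def relative_cost(times_s: dict[str, float],
                  one_key: str = "euclidean_distance",
                  zero_key: str = "intensity") -> dict[str, float]:
    """Map measured per-feature computation times (in seconds) onto the
    relative scale: Euclidean distance -> 1, non-needed intensity -> 0.
    Assumes the scale is a simple linear rescaling between the anchors."""
    t0, t1 = times_s[zero_key], times_s[one_key]
    return {name: (t - t0) / (t1 - t0) for name, t in times_s.items()}
```

Anchoring both ends of the scale makes costs comparable across machines with different CPU performance.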
The similarity changes depending on the size of the model. Furthermore, as model size increases, GPU memory usage and the trainable parameter count grow; however, the additional increase caused by adding features is marginal. While adding all features has a positive impact on accuracy, the cost of computing them becomes impractical. On the other hand, while feature addition does not significantly affect GPU memory usage during inference, it does incur feature computation cost. Selecting appropriate features based on the available CPU and GPU resources is therefore crucial when embedding the model in edge computing devices. Consequently, adding explicitly computed descriptive features is advantageous for smaller models, while for larger models it is more beneficial to leave the feature description to the model itself and minimize feature computation costs.
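The paper does not prescribe a selection procedure; purely to illustrate the trade-off just described, one simple greedy heuristic under a relative CPU-cost budget might look as follows. All names and the budget parameter are hypothetical:

```python
def select_features(cost: dict[str, float],
                    similarity: dict[str, float],
                    cpu_budget: float) -> list[str]:
    """Greedily pick features whose total relative cost fits the CPU
    budget, preferring features with low similarity S (i.e., those whose
    effect on class accuracies differs most from the other features)."""
    chosen, spent = [], 0.0
    for name in sorted(similarity, key=similarity.get):  # lowest S first
        if spent + cost[name] <= cpu_budget:
            chosen.append(name)
            spent += cost[name]
    return chosen
```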
6 CONCLUSIONS
In this paper, we proposed a method to incorporate additional features into the input of a semantic classification model. By evaluating the effectiveness of these additional features, we aim to achieve both model lightweighting and accuracy preservation.
Moving forward, we will focus on observing changes
in learning efficiency through feature integration, and
strive to adapt the model to be more compact while
maintaining high accuracy. Additionally, we will ex-
plore comprehensive evaluation methods that encom-
pass various performance metrics.