REFERENCES
Azuma, R. (1993). Tracking requirements for augmented
reality. Communications of the ACM, 36(7):50–51.
Boiński, T., Zawora, K., and Szymański, J. (2022). How
to Sort Them? A Network for LEGO Bricks Classi-
fication. International Conference on Computational
Science, 22:627–640.
Caudell, T. P. and Mizell, D. W. (1992). Augmented Real-
ity: An Application of Heads-Up Display Technology
to Manual Manufacturing Processes. Proceedings of
the Hawaii International Conference on System Sci-
ences, volume 2, pages 659–669.
Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le,
Q. V. (2019). AutoAugment: Learning Augmentation
Strategies from Data. IEEE Conference on Computer
Vision and Pattern Recognition, 32:113–123.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei,
L. (2009). ImageNet: A Large-Scale Hierarchical Im-
age Database. IEEE Conference on Computer Vision
and Pattern Recognition, 22:248–255.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer,
M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby,
N. (2021). An Image is Worth 16×16 Words: Trans-
formers for Image Recognition at Scale. International
Conference on Learning Representations, 9:1–21.
Evans, G., Miller, J., Pena, M. I., MacAllister, A., and
Winer, E. (2017). Evaluating the Microsoft HoloLens
through an augmented reality assembly application.
SPIE Defense + Security, 10197(101970V):282–297.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep
Learning. MIT Press.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Ele-
ments of Statistical Learning: Data Mining, Inference,
and Prediction, volume 2. Springer.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep
Residual Learning for Image Recognition. IEEE Con-
ference on Computer Vision and Pattern Recognition,
29:770–778.
Kress, B. C. and Cummings, W. J. (2017). Towards the Ul-
timate Mixed Reality Experience: HoloLens Display
Architecture Choices. SID Symposium Digest of Tech-
nical Papers, 48(1):127–131.
Krizhevsky, A. (2009). Learning Multiple Layers of Fea-
tures from Tiny Images. Technical Report, University
of Toronto.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
ageNet Classification with Deep Convolutional Neural
Networks. Communications of the ACM, 60:84–90.
Loch, F., Quint, F., and Brishtel, I. (2016). Comparing
Video and Augmented Reality Assistance in Manual
Assembly. International Conference on Intelligent
Environments, 12:147–150.
Loshchilov, I. and Hutter, F. (2019). Decoupled Weight
Decay Regularization. International Conference on
Learning Representations, 7:1–8.
Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz,
N., and Terzopoulos, D. (2021). Image Segmenta-
tion Using Deep Learning: A Survey. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
44(7):3523–3542.
Niu, S., Liu, Y., Wang, J., and Song, H. (2020). A Decade
Survey of Transfer Learning (2010–2020). IEEE
Transactions on Artificial Intelligence, 1(2):151–166.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-
stein, M., et al. (2015). ImageNet Large Scale Vi-
sual Recognition Challenge. International Journal of
Computer Vision, 115:211–252.
Schinko, C., Ullrich, T., and Fellner, D. W. (2011). Simple
and efficient normal encoding with error bounds. The-
ory and Practice of Computer Graphics Conference,
29:63–65.
Schoosleitner, M. and Ullrich, T. (2021). Scene Under-
standing and 3D Imagination: A Comparison between
Machine Learning and Human Cognition. Interna-
tional Conference on Computer Graphics Theory and
Applications, 16:231–238.
Srinivas, A., Lin, T.-Y., Parmar, N., Shlens, J., Abbeel, P.,
and Vaswani, A. (2021). Bottleneck Transformers for
Visual Recognition. IEEE Conference on Computer
Vision and Pattern Recognition, 34:16519–16529.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2015). Going Deeper with Convolutions.
IEEE Conference on Computer Vision and Pattern
Recognition, 28:1–9.
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking Model
Scaling for Convolutional Neural Networks. Inter-
national Conference on Machine Learning, 36:6105–
6114.
Tan, M. and Le, Q. (2021). EfficientNetV2: Smaller Mod-
els and Faster Training. International Conference on
Machine Learning, 38:10096–10106.
Tang, A., Owen, C., Biocca, F., and Mou, W. (2003). Com-
parative Effectiveness of Augmented Reality in Object
Assembly. Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems, 5:73–80.
Vidal, J., Vallicrosa, G., Marti, R., and Barnada, M. (2023).
Brickognize: Applying Photo-Realistic Image Synthe-
sis for LEGO Bricks Recognition with Limited Data.
Sensors, 23(4):1898.
Wiedenmaier, S., Oehme, O., Schmidt, L., and Luczak,
H. (2003). Augmented Reality (AR) for Assembly
Processes Design and Experimental Evaluation. In-
ternational Journal of Human-Computer Interaction,
16(3):497–514.
Deep Learning-Powered Assembly Step Classification for Intricate Machines