Person Detection from UAV Based on a Dual Transformer Approach
Andrei-Stelian Stan, Dan Popescu, Loretta Ichim
2025
Abstract
This study introduces a novel object detection system that combines two advanced deep learning models, the Detection Transformer (DETR) and the Vision Transformer (ViT), to improve detection accuracy and robustness in unmanned aerial vehicle (UAV) applications. Both models were fine-tuned independently on the VisDrone dataset and then deployed in parallel, each processing the same input so that their complementary strengths can be combined. DETR provides precise localization and is particularly effective in crowded urban settings, while ViT excels at identifying objects at various scales and under partial occlusion, which is crucial for detecting distant objects. Their outputs are merged by a dynamic fusion algorithm that adjusts confidence scores based on contextual analysis and the characteristics of the detected objects, yielding a combined detection system that outperforms either model alone. The fused model reached an overall accuracy of up to 90%, a mean Average Precision at an IoU threshold of 0.5 (mAP50) of 85%, and a recall of 80%. These results underline the potential of integrating multiple transformer-based models to handle the complexities of UAV-based detection, offering a robust solution that adapts to diverse operational scenarios and environmental conditions.
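The abstract does not spell out the fusion rule, so the following Python sketch is only one plausible reading of "deploy two detectors in parallel and fuse their outputs by adjusting confidence scores". Everything in it is an assumption made for illustration, not the authors' method: the names `Detection`, `iou`, and `fuse`, the IoU matching threshold, the noisy-OR confidence boost for boxes both models agree on, and the `solo_penalty` applied to boxes found by only one model.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Detection:
    box: Box
    score: float  # confidence in [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(dets_a: List[Detection], dets_b: List[Detection],
         iou_thr: float = 0.5, solo_penalty: float = 0.8) -> List[Detection]:
    """Hypothetical fusion rule (assumed, not taken from the paper).

    Boxes from the two models that overlap by at least `iou_thr` are merged
    and their confidence is raised; boxes seen by one model only are kept
    with a reduced confidence.
    """
    fused: List[Detection] = []
    unmatched_b = list(dets_b)
    for da in dets_a:
        # Greedy match: the remaining box from model B that overlaps `da` most.
        best = max(unmatched_b, key=lambda db: iou(da.box, db.box), default=None)
        if best is not None and iou(da.box, best.box) >= iou_thr:
            unmatched_b.remove(best)
            total = da.score + best.score
            if total > 0:
                # Score-weighted average of the two boxes.
                box = tuple((da.score * p + best.score * q) / total
                            for p, q in zip(da.box, best.box))
            else:
                box = da.box
            # Noisy-OR: agreement between the models raises the confidence.
            score = 1.0 - (1.0 - da.score) * (1.0 - best.score)
            fused.append(Detection(box, score))
        else:
            # Only model A saw this object: keep it, but trust it less.
            fused.append(Detection(da.box, da.score * solo_penalty))
    # Boxes that only model B produced get the same reduced trust.
    fused.extend(Detection(db.box, db.score * solo_penalty) for db in unmatched_b)
    return sorted(fused, key=lambda d: d.score, reverse=True)
```

In this sketch, `dets_a` and `dets_b` stand for the per-image outputs of the two fine-tuned models after each has been thresholded on its own. A context-aware variant, closer to what the abstract describes, could make `solo_penalty` depend on box size or scene density rather than keeping it constant.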
Paper Citation
in Harvard Style
Stan A., Popescu D. and Ichim L. (2025). Person Detection from UAV Based on a Dual Transformer Approach. In Proceedings of the 11th International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM; ISBN 978-989-758-741-2, SciTePress, pages 95-102. DOI: 10.5220/0013467900003935
in Bibtex Style
@conference{gistam25,
author={Andrei-Stelian Stan and Dan Popescu and Loretta Ichim},
title={Person Detection from UAV Based on a Dual Transformer Approach},
booktitle={Proceedings of the 11th International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM},
year={2025},
pages={95-102},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013467900003935},
isbn={978-989-758-741-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM
TI - Person Detection from UAV Based on a Dual Transformer Approach
SN - 978-989-758-741-2
AU - Stan A.
AU - Popescu D.
AU - Ichim L.
PY - 2025
SP - 95
EP - 102
DO - 10.5220/0013467900003935
PB - SciTePress