A Refined Multilingual Scene Text Detector Based on YOLOv7

Houssem Turki; Mohamed Elleuch; Monji Kherallah

Topics: Deep Learning; Hybrid Intelligent Systems; Industrial Applications of AI; Neural Networks; Vision and Perception

In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, 512-519, 2025, Porto, Portugal

Authors: Houssem Turki ¹,²; Mohamed Elleuch ²,³ and Monji Kherallah ²

Affiliations: ¹ National Engineering School of Sfax (ENIS), University of Sfax, Tunisia; ² Advanced Technologies for Environment and Smart Cities (ATES Unit), University of Sfax, Tunisia; ³ National School of Computer Science (ENSI), University of Manouba, Tunisia

Keyword(s): Multilingual Scene Text Detection, YOLOv7, Specific Data Augmentation, Deep Learning.

Abstract: In recent years, deep learning has driven significant advances in the recognition of text in natural scene images. Despite this progress, the detection of multilingual text in natural scene images is often limited by the lack of comprehensive datasets that cover a variety of scripts. A further obstacle is the absence of a robust detection system that can overcome most of the challenges found in natural scenes while also accounting for the characteristics of each language's script. YOLO (You Only Look Once) is a widely used deep learning neural network that has become popular for its adaptability to various machine learning tasks. YOLOv7 is an enhanced iteration of the YOLO series. It has proven effective on complex image-related problems thanks to the evolution of its 'Backbone', the block responsible for capturing image features, which helps it cope with the challenges of natural environments and leads us to adapt it to text detection. Our first contribution is to overcome environmental variations through specific data augmentation, based on improved basic techniques and a mixed transformation method, applied to the “RRC-MLT” and “SYPHAX” multilingual datasets, both of which contain Arabic scripts. The second contribution is the refinement of the 'Backbone' block of the YOLOv7 architecture to better extract the small details of text, which stand out particularly in the punctuation marks of Arabic scripts. The article also highlights future research directions aimed at developing a generic and efficient multilingual text detection system in the wild that also handles Arabic scripts, a new challenge in this context that justifies the choice of the two datasets.
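The abstract does not say which transformations make up the "improved basic techniques" or the "mixed transformation method". As an illustration only, the sketch below assumes one plausible combination: a random lighting change (contrast and brightness) followed by a mixup-style blend of two scenes that keeps the text boxes of both. The function names, parameter ranges, and the pure-Python grayscale image representation are all hypothetical and are not taken from the paper.

```python
import random

# Assumed toy representation: a grayscale image as rows of pixel values in [0, 255].
Image = list[list[float]]
# Axis-aligned text box as (x_min, y_min, x_max, y_max).
Box = tuple[float, float, float, float]


def clamp(value: float) -> float:
    """Keep a pixel value inside the valid [0, 255] range."""
    return max(0.0, min(255.0, value))


def adjust_lighting(image: Image, contrast: float, brightness: float) -> Image:
    """Photometric change only: pixel positions, and so text boxes, are untouched."""
    return [[clamp(contrast * p + brightness) for p in row] for row in image]


def blend(image_a: Image, image_b: Image, alpha: float) -> Image:
    """Mixup-style blend of two same-sized images, with weight alpha on image_a."""
    return [
        [clamp(alpha * pa + (1.0 - alpha) * pb) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


def mixed_augment(sample_a, sample_b, rng: random.Random):
    """Apply a random lighting change to each sample, then blend the two.

    Each sample is (image, boxes). The result keeps the boxes of both inputs,
    because text from both scenes remains visible in the blended image.
    The ranges below are illustrative assumptions, not values from the paper.
    """
    (img_a, boxes_a), (img_b, boxes_b) = sample_a, sample_b
    img_a = adjust_lighting(img_a, rng.uniform(0.7, 1.3), rng.uniform(-30, 30))
    img_b = adjust_lighting(img_b, rng.uniform(0.7, 1.3), rng.uniform(-30, 30))
    alpha = rng.uniform(0.4, 0.6)
    return blend(img_a, img_b, alpha), list(boxes_a) + list(boxes_b)


# Usage: two tiny 4x4 "scenes", each with one text box.
rng = random.Random(0)
scene_a = ([[100.0] * 4 for _ in range(4)], [(0, 0, 2, 2)])
scene_b = ([[200.0] * 4 for _ in range(4)], [(1, 1, 3, 3)])
image, boxes = mixed_augment(scene_a, scene_b, rng)
```

Only photometric and blending operations are used here, so box coordinates never need to be recomputed. Horizontal flips are deliberately left out because mirroring would change how the text reads. A real pipeline would more likely operate on NumPy arrays or use an augmentation library.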

CC BY-NC-ND 4.0

Paper citation in several formats:

Turki, H., Elleuch, M. and Kherallah, M. (2025). A Refined Multilingual Scene Text Detector Based on YOLOv7. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5; ISSN 2184-433X, SciTePress, pages 512-519. DOI: 10.5220/0013157100003890

@conference{icaart25,
author={Houssem Turki and Mohamed Elleuch and Monji Kherallah},
title={A Refined Multilingual Scene Text Detector Based on YOLOv7},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={512-519},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013157100003890},
isbn={978-989-758-737-5},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - A Refined Multilingual Scene Text Detector Based on YOLOv7
SN - 978-989-758-737-5
IS - 2184-433X
AU - Turki, H.
AU - Elleuch, M.
AU - Kherallah, M.
PY - 2025
SP - 512
EP - 519
DO - 10.5220/0013157100003890
PB - SciTePress