A Pattern-Based Approach to Name and Address Parsing with Active Learning

Onais Khan Mohammed, Khizer Syed, John Talburt, Adeeba Tarannum, Abdul Kareem Khan Kashif, Salman Khan, Najmudin Syed, Syed Yaser Mehdi

2025

Abstract

Processing population data often requires parsing demographic items into a standard set of fields to achieve metadata alignment. This paper describes a novel approach based on token pattern mappings augmented by active learning. Input strings are tokenized, and a token mask is created by replacing each token with a single-character code indicating the token’s potential function in the input string. A user-created mapping then directs each token represented in the mask to its correct functional category. Testing has shown the system to be as accurate as, and in some cases more accurate than, comparable parsing systems. The primary advantage of this approach over other systems is that, when an input does not conform to any previously encoded mapping, a user can simply add a new mapping instead of reprogramming the system’s parsing rules or retraining a supervised machine learning parser.
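
The tokenize, mask, and map steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single-character codes (`N`, `D`, `T`, `W`), the lookup sets, the field names, and the function names are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical lookup tables (assumption: the paper's real tables are not given here).
DIRECTIONS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}
STREET_TYPES = {"ST", "AVE", "RD", "DR", "BLVD", "LN"}


def tokenize(text):
    """Split an input string into upper-cased alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.upper())


def token_code(token):
    """Return a single-character code for the token's potential function."""
    if token.isdigit():
        return "N"  # number
    if token in DIRECTIONS:
        return "D"  # directional
    if token in STREET_TYPES:
        return "T"  # street type
    return "W"      # any other word


def make_mask(tokens):
    """Build the token mask: one code character per token."""
    return "".join(token_code(t) for t in tokens)


# User-created mappings: mask -> field name for each token position (hypothetical names).
mappings = {
    "NWT": ["house_number", "street_name", "street_type"],
}


def parse(text, mappings):
    """Return (mask, fields); fields is None when no mapping exists for the mask."""
    tokens = tokenize(text)
    mask = make_mask(tokens)
    fields = mappings.get(mask)
    if fields is None:
        return mask, None  # unseen pattern: a user would supply a new mapping
    return mask, list(zip(fields, tokens))
```

In this sketch, an input whose mask is not yet known comes back with `None` in place of parsed fields. The user-in-the-loop step then amounts to adding one dictionary entry for that mask, for example `mappings["NDWT"] = ["house_number", "pre_direction", "street_name", "street_type"]`, after which the same input parses without any change to the code or any model retraining.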

Paper Citation


in Harvard Style

Mohammed O., Syed K., Talburt J., Tarannum A., Kashif A., Khan S., Syed N. and Mehdi S. (2025). A Pattern-Based Approach to Name and Address Parsing with Active Learning. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 70-77. DOI: 10.5220/0013077500003890


in Bibtex Style

@conference{icaart25,
author={Onais Mohammed and Khizer Syed and John Talburt and Adeeba Tarannum and Abdul Kashif and Salman Khan and Najmudin Syed and Syed Mehdi},
title={A Pattern-Based Approach to Name and Address Parsing with Active Learning},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={70-77},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013077500003890},
isbn={978-989-758-737-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - A Pattern-Based Approach to Name and Address Parsing with Active Learning
SN - 978-989-758-737-5
AU - Mohammed O.
AU - Syed K.
AU - Talburt J.
AU - Tarannum A.
AU - Kashif A.
AU - Khan S.
AU - Syed N.
AU - Mehdi S.
PY - 2025
SP - 70
EP - 77
DO - 10.5220/0013077500003890
PB - SciTePress