A Pattern-Based Approach to Name and Address Parsing with Active Learning

Onais Khan Mohammed, Khizer Syed, John Talburt, Adeeba Tarannum, Abdul Kareem Khan Kashif, Salman Khan, Najmudin Syed, Syed Yaser Mehdi



Processing population data often requires parsing demographic items into a standard set of fields to achieve metadata alignment. This paper describes a novel approach based on token pattern mappings augmented by active learning. Input strings are tokenized, and a token mask is created by replacing each token with a single-character code indicating the token’s potential function in the input string. A user-created mapping then directs each token represented in the mask to its correct functional category. Testing has shown the system to be as accurate as, and in some cases more accurate than, comparable parsing systems. The primary advantage of this approach over other systems is that a user can easily add a new mapping when an input does not conform to any previously encoded mapping, instead of having to reprogram the system's parsing rules or retrain a supervised machine learning parsing model.
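
The tokenize, mask, and map steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the single-character codes (N, D, T, W), the small lookup sets, the field names, and the functions `tokenize`, `make_mask`, and `parse` are all hypothetical choices made for illustration, since the abstract does not specify the actual code table or mapping format.

```python
import re

# Hypothetical lookup sets; the paper's real clue tables are not given in the abstract.
STREET_TYPES = {"ST", "STREET", "AVE", "AVENUE", "RD", "ROAD", "DR", "DRIVE"}
DIRECTIONS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}


def tokenize(text):
    """Split the input into upper-cased alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.upper())


def token_code(token):
    """Return an assumed one-character code for the token's potential function."""
    if token.isdigit():
        return "N"  # number
    if token in DIRECTIONS:
        return "D"  # directional
    if token in STREET_TYPES:
        return "T"  # street type
    return "W"      # any other word


def make_mask(tokens):
    """Build the token mask: one code character per token."""
    return "".join(token_code(t) for t in tokens)


def parse(text, mappings):
    """Return (mask, fields); fields is None when no mapping exists for the mask."""
    tokens = tokenize(text)
    mask = make_mask(tokens)
    field_names = mappings.get(mask)
    if field_names is None:
        return mask, None  # unseen pattern: a user would add a mapping for it
    fields = {}
    for name, token in zip(field_names, tokens):
        fields.setdefault(name, []).append(token)
    return mask, {name: " ".join(parts) for name, parts in fields.items()}


# User-created mappings: each mask names the field for every token position.
mappings = {
    "NWT": ["house_number", "street_name", "street_type"],
    "NDWT": ["house_number", "pre_direction", "street_name", "street_type"],
}

mask, fields = parse("123 Main St", mappings)

# An input with an unseen mask is not parsed until a mapping is added for it.
new_mask, unparsed = parse("45 N Oak Ave Apt 7", mappings)
mappings[new_mask] = ["house_number", "pre_direction", "street_name",
                      "street_type", "unit_type", "unit_number"]
_, reparsed = parse("45 N Oak Ave Apt 7", mappings)
```

In this sketch, handling a new input pattern amounts to adding one dictionary entry keyed by the unseen mask, which mirrors the advantage claimed in the abstract: no parsing rules are rewritten and no model is retrained.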


Paper Citation (Harvard Style)

Mohammed O., Syed K., Talburt J., Tarannum A., Kashif A., Khan S., Syed N. and Mehdi S. (2025). A Pattern-Based Approach to Name and Address Parsing with Active Learning. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 70-77. DOI: 10.5220/0013077500003890

