Deep Learning, Feature Selection and Model Bias with Home Mortgage Loan Classification

Hope Hodges, Carolyn Garrity, James Pope

2024

Abstract

Analysis of home mortgage applications is critical to financial decision-making in commercial and government lending organisations. The Home Mortgage Disclosure Act (HMDA) requires financial organisations to provide data on loan applications, and the Consumer Financial Protection Bureau (CFPB) accordingly publishes loan application data by year. This data can be used to design regression and classification models. However, the volume of data is too large to train on with modest computational resources. To address this, we used reservoir sampling to draw suitable subsets for processing. A second issue is that the feature set is limited to the original 78 features in the HMDA records, although many other data sources and associated features might improve model accuracy. We augmented the HMDA data with ten economic indicator features from an external data source and found that these additional features do not improve model accuracy. We designed and compared several classical and recent classification approaches to predict the loan approval decision. We show that the Decision Tree, XGBoost, Random Forest, and Support Vector Machine classifiers achieve between 82% and 85% accuracy, while Naive Bayes has the lowest accuracy at 79%. A Deep Neural Network classifier had the best classification performance, with an F1 score of almost 89% on the HMDA data. We performed feature selection to determine which features are most important for loan classification. As expected, the more obvious features, loan amount and applicant income, were important. However, when we left race and gender in the feature set, they were also selected as important features by the machine learning methods. This highlights the need for diligence in financial systems to ensure that the model is not biased.
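The abstract states that reservoir sampling was used to draw manageable subsets, but does not say which variant. As an illustration only, the following is a minimal sketch of the classic single-pass version (often called Algorithm R), which keeps a uniform random sample of k records from a stream of unknown length. The function name `reservoir_sample` and its parameters are assumptions made for this example and are not taken from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k records chosen uniformly at random in a single pass.

    Hypothetical helper for illustration; the paper's own implementation
    may differ.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Pick a slot in [0, i]; the new record is kept only if the
            # slot falls inside the reservoir, i.e. with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # Stand-in for a stream of loan application rows too large to hold in memory.
    subset = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(len(subset))
```

Because each record is examined once and only k records are held at a time, memory use is independent of the size of the input, which is what makes the approach attractive for a yearly HMDA file.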


Paper Citation


in Harvard Style

Hodges H., Garrity C. and Pope J. (2024). Deep Learning, Feature Selection and Model Bias with Home Mortgage Loan Classification. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-684-2, SciTePress, pages 248-255. DOI: 10.5220/0012326800003654


in Bibtex Style

@conference{icpram24,
author={Hope Hodges and Carolyn Garrity and James Pope},
title={Deep Learning, Feature Selection and Model Bias with Home Mortgage Loan Classification},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2024},
pages={248-255},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012326800003654},
isbn={978-989-758-684-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Deep Learning, Feature Selection and Model Bias with Home Mortgage Loan Classification
SN - 978-989-758-684-2
AU - Hodges H.
AU - Garrity C.
AU - Pope J.
PY - 2024
SP - 248
EP - 255
DO - 10.5220/0012326800003654
PB - SciTePress