Table 1: Parameters that generated the best F1-Score for each evaluated classifier.

                        kNN proposed   kNN traditional   kNN dist   XGBoost
Vector Dimensionality   700            800               1200       800
K                       37             2                 4          -
Imbalance Factor        0.29           -                 -          -
Table 2: Results obtained for each of the techniques evaluated.

            kNN proposed   kNN traditional   kNN dist   XGBoost
F1-Score    0.55           0.44              0.44       0.44
Accuracy    0.93           0.95              0.93       0.93
Precision   0.57           0.75              0.66       0.75
Recall      0.52           0.31              0.33       0.31
In general, however, the performance on the minority class is the most important to consider, especially for the application scenario of this work.
The approach proposed in this work considered two important aspects when predicting the approval of LPs: (i) the imbalance factor of the database and (ii) the time at which proposals were submitted. Both aspects were addressed by modifying the traditional kNN algorithm: we increased the distance to disapproved documents and to documents submitted further away in time. Experimental results showed that these two modifications to the kNN algorithm improved the classifier's performance. The proposed technique achieved the highest recall among the evaluated techniques, showing its potential for predicting propositions with a high chance of approval in legislative houses and providing a valuable tool for the legislative process.