Finally, we will extend our investigation to other systems
within the Apache Software Foundation, chosen for their
strict inclusion criteria, maturity, and comprehensive feature
and software defect information (Lenarduzzi et al., 2019).
REFERENCES
Begoli, E., Camacho-Rodríguez, J., Hyde, J., et al.
(2018). Apache Calcite: A Foundational Framework
for Optimized Query Processing Over Heterogeneous
Data Sources. In SIGMOD ’18, pages 221–230. ACM.
Briciu, A., Czibula, G., and Lupea, M. (2023). A study
on the relevance of semantic features extracted us-
ing BERT-based language models for enhancing the
performance of software defect classifiers. Procedia
Computer Science, 225:1601–1610.
Chelaru, I.-G. (2024). PreSTyDe FigShare dataset.
https://doi.org/10.6084/m9.figshare.25237600.
Chen, L., Fang, B., and Shang, Z. (2016). Software fault
prediction based on one-class SVM. In ICMLC 2016,
volume 2, pages 1003–1008.
Ciubotariu, G., Czibula, G., Czibula, I. G., and Chelaru, I.-
G. (2023). Uncovering behavioural patterns of one-
and binary-class SVM-based software defect predic-
tors. In ICSOFT 2023, pages 249–257. SciTePress.
Czibula, G., Chelaru, I.-G., Czibula, I. G., and Molnar, A.-
J. (2023). An UL-based methodology for uncovering
behavioural patterns for specific types of software de-
fects. Procedia Computer Science, 225:2644–2653.
Czibula, G. and Czibula, I. G. (2012). Unsupervised restruc-
turing of OO software systems using self-organizing
feature maps. IJICIC journal, 8(3(A)):1689–1704.
Dam, H. K., Pham, T., Ng, S. W., Tran, T., Grundy, J.,
Ghose, A., Kim, T., and Kim, C.-J. (2019). Lessons
Learned from Using a Deep Tree-Based Model for
SDP in Practice. In MSR 2019, pages 46–57.
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas,
G. W., and Harshman, R. A. (1990). Indexing by latent
semantic analysis. JASIST journal, 41:391–407.
Fawcett, T. (2006). An introduction to ROC analysis. Pat-
tern Recognition Letters, 27(8):861–874.
Herbold, S., Trautsch, A., Trautsch, F., and Ledel, B.
(2022). Problems with SZZ and features: An empirical
study of the state of practice of defect prediction data
collection. Empirical Software Engineering, 27(2).
Kamei, Y. and Shihab, E. (2016). Defect Prediction: Ac-
complishments and Future Challenges. In SANER
2016, volume 5, pages 33–45.
Le, Q. V. and Mikolov, T. (2014). Distributed represen-
tations of sentences and documents. Computing Re-
search Repository (CoRR), abs/1405.4:1–9.
Lenarduzzi, V., Saarimäki, N., and Taibi, D. (2019).
The Technical Debt Dataset. In Proceedings of
PROMISE’19, pages 2–11. ACM.
Li, J., He, P., Zhu, J., and Lyu, M. R. (2017). Software
Defect Prediction via Convolutional Neural Network.
In QRS 2017, pages 318–328.
Lötsch, J. and Ultsch, A. (2014). Exploiting the structures
of the U-matrix. In Advances in Self-Organizing Maps
and Learning Vector Quantization, pages 249–257.
Springer Publishing.
Menzies, T., Krishna, R., and Pryor, D. (2017). The
SEACRAFT Repository of Empirical Software Engi-
neering Data.
Miholca, D.-L., Tomescu, V.-I., and Czibula, G. (2022). An
in-depth analysis of the software features’ impact on
the performance of deep learning-based software de-
fect predictors. IEEE Access, 10:64801–64818.
Molnar, A.-J. and Motogna, S. (2020). Long-Term Evalu-
ation of Technical Debt in Open-Source Software. In
ESEM 2020, New York, NY, USA. ACM.
Moussa, R., Azar, D., and Sarro, F. (2022). Investigating the
use of one-class support vector machine for software
defect prediction. CoRR, abs/2202.12074.
Pachouly, J., Ahirrao, S., Kotecha, K., Selvachandran, G.,
and Abraham, A. (2022). A systematic literature review
on SDP using AI: Datasets, Data Validation, Approaches,
and Tools. Engineering Applications of Artificial
Intelligence, 111:104773.
Řehůřek, R. and Sojka, P. (2010). Software framework for
topic modelling with large corpora. In LREC 2010,
pages 45–50. ELRA.
Runeson, P. and Höst, M. (2009). Guidelines for conducting
and reporting case study research in software engineering.
Empirical Softw. Engg., 14(2):131–164.
Sahoo, S. K., Criswell, J., and Adve, V. (2010). An Em-
pirical Study of Reported Bugs in Server Software
with Implications for Automated Bug Diagnosis. In
ICSE’10, pages 485–494, USA. ACM.
SARD (2023). Software Assurance Reference Dataset.
https://samate.nist.gov/SARD/.
Shepperd, M., Qinbao, S., Zhongbin, S., and Mair, C.
(2018). NASA MDP Software Defects Data Sets.
Shi, C., Wei, B., Wei, S., Wang, W., Liu, H., and Liu, J.
(2021). A quantitative discriminant method of elbow
point for the optimal number of clusters in clustering
algorithm. J. Wirel. Commun. Netw., 2021(1):31.
Wagner, S. (2008). Defect Classification and Defect Types
Revisited. In Proc. of the 2008 Workshop on Defects
in Large Software Systems, pages 39–40, New York,
NY, USA. Association for Computing Machinery.
Wang, S., Liu, T., and Tan, L. (2016). Automatically learn-
ing semantic features for defect prediction. In Pro-
ceedings of the 38th ICSE, pages 297–308. ACM.
Xu, J., Ai, J., and Shi, T. (2021). Software Defect Prediction
for Specific Defect Types based on Augmented Code
Graph Representation. In DSA 2021, pages 669–678.
Zhang, S., Jiang, S., and Yan, Y. (2022). A Software Defect
Prediction Approach Based on BiGAN Anomaly De-
tection. Scientific Programming, 2022:ID 5024399.
Zhou, C., He, P., Zeng, C., and Ma, J. (2022). Software
defect prediction with semantic and structural infor-
mation of codes based on graph neural networks. In-
formation and Software Technology, 152:107057.