Like all studies, our research has its limitations. Machine learning performance depends on the accuracy and completeness of the input data; despite our best efforts at careful feature selection and data preprocessing, biases present in the original datasets may nevertheless affect the outcomes. In addition, the choice of hyperparameters and model configurations influences each algorithm's performance; although we sought to optimize these settings, they may not be optimal for every scenario.
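The sensitivity to hyperparameter choices mentioned above is commonly addressed with cross-validated search over candidate settings. The following minimal sketch illustrates the idea with scikit-learn; the synthetic data and parameter ranges are placeholders for illustration only, not the settings used in our study.

```python
# Hedged sketch: picking hyperparameters by cross-validated grid search.
# The dataset and parameter grid are illustrative stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a gene-expression matrix (samples x features).
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=10, random_state=0)

param_grid = {
    "n_estimators": [50, 100],   # number of trees in the forest
    "max_depth": [3, None],      # depth limit per tree
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)       # settings selected by cross-validation
print(round(search.best_score_, 3))
```

Even such a search only explores the candidate values supplied, which is why settings tuned on one dataset may still not transfer to all scenarios.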
Building on the insights from our current analysis, future research could examine more closely why different models select distinct sets of genes. To further improve the precision and applicability of disease classification, it may be worthwhile to integrate more complex or specialized algorithms, or ensemble approaches that combine the strengths of several algorithms and incorporate feedback loops, allowing continuous learning from fresh data to refine the significance of disease groupings. These directions should be considered as GediNET develops.
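One simple form such an ensemble could take is soft voting over several base classifiers (for example a support vector machine, a random forest, and gradient boosting). The sketch below is illustrative only; the data, the chosen estimators, and their settings are assumptions, not GediNET's implementation.

```python
# Hedged sketch: combining several classifiers by soft voting,
# i.e. averaging their predicted class probabilities.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a gene-expression dataset.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=10, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # average predicted probabilities across models
)
scores = cross_val_score(ensemble, X, y, cv=3)
print(round(scores.mean(), 3))
```

A feedback loop in this setting would amount to periodically refitting the ensemble as fresh labeled data arrives, so the combined model's gene rankings can be revised over time.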
In conclusion, our efforts to enhance GediNET have opened new horizons for understanding disease groupings while advancing exploration and refinement in this domain. The combination of biology and machine learning may ultimately lead to more accurate, tailored, and effective understanding and treatment of disease.
ACKNOWLEDGEMENTS
The work of M.Y. has been supported by the Zefat
Academic College.