Authors:
Emma Qumsiyeh (1); Burcu Bakir-Gungo (2) and Malik Yousef (2)
Affiliations:
(1) Faculty of Engineering and Information Technology, Palestine Ahliya University, Bethlehem, Palestine
(2) Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
Keyword(s):
Grouping-Scoring-Modeling (G-S-M) Approach, Machine Learning, Biological Integrative Approach, Feature Selection, Pathway-Disease Associations, Comparative Toxicogenomics Database (CTD), Biomarkers.
Abstract:
Machine learning and feature selection techniques have recently become popular for studying the relationships between genes, molecular pathways, and diseases. Integrating existing domain knowledge into biological data analysis has shown considerable potential for finding new biomarkers with translational uses. This paper presents PathDisGene, a machine-learning tool that integrates existing domain knowledge through a Grouping-Scoring-Modeling (G-S-M) approach to discover gene-pathway-disease associations. The first step in PathDisGene is the grouping component, which groups genes according to their biological associations with diseases and pathways, using the Comparative Toxicogenomics Database (CTD). The scoring component then scores each group, and the highest-ranked groups are used to train the classifier. We test PathDisGene on ten GEO datasets and report high accuracy, sensitivity, specificity, and AUC values on most of them, across various diseases. The tool's capacity to recognize new pathway-disease associations, and to uncover connections between pathways and diseases along with their associated genes, underscores its potential as a significant asset in promoting precision medicine and systems biology.
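The three G-S-M steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the expression values, the gene names G1 to G4, the two pathway-disease groups (stand-ins for CTD-derived gene sets), and the use of leave-one-out accuracy of a nearest-centroid classifier as the group score. It is not PathDisGene's actual scoring function or classifier.

```python
from statistics import mean

# Hypothetical toy data: each sample maps gene -> expression; labels are 1 (case) / 0 (control).
SAMPLES = [
    {"G1": 5.0, "G2": 4.8, "G3": 1.0, "G4": 2.1},
    {"G1": 5.2, "G2": 5.1, "G3": 1.2, "G4": 1.9},
    {"G1": 4.9, "G2": 5.3, "G3": 0.9, "G4": 2.0},
    {"G1": 1.0, "G2": 1.2, "G3": 1.1, "G4": 2.0},
    {"G1": 0.8, "G2": 0.9, "G3": 1.0, "G4": 2.2},
    {"G1": 1.1, "G2": 1.0, "G3": 0.8, "G4": 1.8},
]
LABELS = [1, 1, 1, 0, 0, 0]

# Grouping step (assumed): gene sets keyed by (pathway, disease), standing in for CTD.
GROUPS = {
    ("pathway_A", "disease_X"): ["G1", "G2"],
    ("pathway_B", "disease_Y"): ["G3", "G4"],
}


def predict(train_x, train_y, sample, genes):
    """Nearest-centroid prediction restricted to the given genes."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(samples, labels, genes):
    """Scoring step (assumed): leave-one-out accuracy using only this group's genes."""
    hits = 0
    for i, (sample, label) in enumerate(zip(samples, labels)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(train_x, train_y, sample, genes) == label
    return hits / len(samples)


def gsm(samples, labels, groups, top_k=1):
    """Score every group, rank them, and collect the genes of the top_k groups."""
    scored = [(loo_accuracy(samples, labels, genes), name) for name, genes in groups.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_names = [name for _, name in scored[:top_k]]
    selected = sorted({g for name in top_names for g in groups[name]})
    return scored, selected


scored, selected_genes = gsm(SAMPLES, LABELS, GROUPS)

# Modeling step: classify a new (hypothetical) sample using only the selected genes.
new_sample = {"G1": 5.1, "G2": 5.0, "G3": 1.0, "G4": 2.0}
prediction = predict(SAMPLES, LABELS, new_sample, selected_genes)
```

In this toy setting the group whose genes separate the two classes ranks first, and only its genes reach the final model. A real pipeline would replace the toy groups with CTD associations and the toy matrix with a GEO expression dataset.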