Authors:
Maysson Al-Haj Ibrahim; Sabah Jassim; Michael A. Cawthorne and Kenneth Langlands
Affiliation:
Buckingham University, United Kingdom
Keyword(s):
Disease classification, Biomarker discovery, Pathway enrichment, Gene network analysis, Microarray data analysis.
Related Ontology Subjects/Areas/Topics:
Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Pattern Recognition, Clustering and Classification
Abstract:
At present, a range of clinical indicators is used to predict the course that a newly presenting patient’s disease may take, and so to inform treatment regimes. However, such indicators are not absolutely predictive, and patients with apparently low-risk disease may follow a more aggressive course. Advances in molecular medicine offer the hope of improved disease stratification and personalised treatment. For example, the identification of “genetic signatures” characteristic of disease subtypes is facilitated by high-throughput transcriptional profiling techniques (microarrays), in which expression levels for thousands of genes are measured across a range of biopsy samples. However, selecting a compact gene set that conveys the most clinically relevant information from complex, high-dimensional microarray datasets is a challenging task. We reduced this complexity using a Pathway Enrichment and Gene Network Analysis (PEGNA) method, which integrates gene expression data with prior biological knowledge to select a group of strongly correlated genes that accurately discriminates complex disease subtypes. First, pathway enrichment analysis was applied to a microarray dataset to identify the most impacted biological processes. Second, gene network analysis was used to find a group of strongly correlated genes, from which subsets were selected for disease classification with a support vector machine classifier. In this way, we classified disease states more accurately, using fewer genes, than other methods across a range of biological datasets.
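The abstract does not say which enrichment statistic PEGNA uses. As a rough illustration of the pathway enrichment step only, the sketch below applies a hypergeometric over-representation test, one common way to score pathways against a list of differentially expressed genes. The function names, gene identifiers, and pathway names are hypothetical and are not taken from the paper.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: genes measured on the array
    n_pathway:  measured genes annotated to the pathway
    n_selected: genes flagged as differentially expressed
    n_overlap:  flagged genes that belong to the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def rank_pathways(universe, selected, pathways):
    """Return (pathway, overlap, p_value) tuples, most enriched first."""
    universe = set(universe)
    selected = set(selected) & universe
    results = []
    for name, genes in pathways.items():
        members = set(genes) & universe  # ignore genes not on the array
        overlap = len(members & selected)
        p = enrichment_p_value(
            len(universe), len(members), len(selected), overlap
        )
        results.append((name, overlap, p))
    return sorted(results, key=lambda item: item[2])


# Toy data (hypothetical): 20 measured genes, 5 flagged as differential.
universe = [f"g{i}" for i in range(20)]
selected = ["g0", "g1", "g2", "g3", "g4"]
pathways = {
    "pathway_a": ["g0", "g1", "g2", "g3", "g10"],
    "pathway_b": ["g4", "g11", "g12", "g13", "g14"],
}

ranking = rank_pathways(universe, selected, pathways)
for name, overlap, p in ranking:
    print(f"{name}: overlap={overlap}, p={p:.4g}")
```

In a pipeline of the kind the abstract describes, genes from the top-ranked pathways would then feed the correlation-based network analysis and the support vector machine classifier; those later steps are not shown here.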