A Framework for High-throughput Gene Signatures with Microarray-based Brain Cancer Gene Expression Profiling Data

Hung-Ming Lai, Andreas Albrecht, Kathleen Steinhöfel

Abstract

Cancer classification based on high-throughput gene expression profiles is widely used in biomedical research. We recently presented a multivariate, information-theoretic method for large-scale gene selection with feature interdependence as its central issue, and we validated its effectiveness on a colon cancer benchmark. The present paper develops that work further. First, we refine the method and propose a complete framework for selecting a gene signature to predict a given disease phenotype from high-throughput data. We then apply the framework to a brain cancer gene expression profile derived from the Affymetrix Human Genome U95Av2 Array, in which the number of interrogated genes is six times larger than in the previously studied colon cancer data set. Three information-theory-based filters serve as comparison methods. Our experimental results show that the framework outperforms them in classification performance on three performance measures. In addition, to demonstrate how effectively the framework handles feature interdependence, we performed two sets of enrichment analyses; the results show that more statistically significant gene sets and regulatory interactions could be found in our gene signature. The framework could therefore be promising for high-throughput gene selection that accounts for gene synergy.
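To make the notion of feature interdependence concrete, the following is a minimal sketch and not the framework described in the paper. It assumes two hypothetical binarised genes (`gene_a`, `gene_b`) and a made-up class label equal to their XOR, and it uses plain empirical mutual information. The helper `mutual_information` and the `synergy` quantity are illustrative names chosen here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Toy, made-up data (an assumption for illustration only):
# the label is the XOR of the two binarised gene values.
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
gene_b = [0, 1, 0, 1, 0, 1, 0, 1]
label = [a ^ b for a, b in zip(gene_a, gene_b)]

mi_a = mutual_information(gene_a, label)   # each gene alone tells nothing
mi_b = mutual_information(gene_b, label)
mi_joint = mutual_information(list(zip(gene_a, gene_b)), label)

# Information available only when both genes are considered together.
synergy = mi_joint - mi_a - mi_b

print(f"I(A;Y)={mi_a:.3f}  I(B;Y)={mi_b:.3f}  I(A,B;Y)={mi_joint:.3f}  synergy={synergy:.3f}")
```

In this toy case a univariate filter that ranks genes one at a time would score both genes at zero, although together they determine the label. That is the kind of situation a selection method built around feature interdependence is meant to capture.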

References

  1. Albrecht, A., Vinterbo, S. A. & Ohno-Machado, L. 2003. An Epicurean learning approach to gene-expression data classification. Artificial Intelligence in Medicine, 28, 75-87.
  2. Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D. & Levine, A. J. 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96, 6745-6750.
  3. Bell, D. A. & Wang, H. 2000. A formalism for relevance and its application in feature subset selection. Machine Learning, 41, 175-195.
  4. Brown, G., Pocock, A., Zhao, M.-J. & Luján, M. 2012. Conditional likelihood maximisation: A unifying framework for information theoretic feature selection. The Journal of Machine Learning Research, 13, 27-66.
  5. Coussens, L. M., Zitvogel, L. & Palucka, A. K. 2013. Neutralizing tumor-promoting chronic inflammation: a magic bullet? Science, 339, 286-291.
  6. Cover, T. M. & Thomas, J. A. 2012. Elements of Information Theory, John Wiley & Sons.
  7. Davies, S. & Russell, S. 1994. NP-completeness of searches for smallest possible feature sets. Proceedings of the 1994 AAAI Fall Symposium on Relevance, 37-39.
  8. Ding, C. & Peng, H. 2005. Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology, 3, 185-205.
  9. Ein-Dor, L., Zuk, O. & Domany, E. 2006. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proceedings of the National Academy of Sciences, 103, 5923-5928.
  10. Fleuret, F. 2004. Fast binary feature selection with conditional mutual information. The Journal of Machine Learning Research, 5, 1531-1555.
  11. Gheyas, I. A. & Smith, L. S. 2010. Feature subset selection in large dimensionality domains. Pattern Recognition, 43, 5-13.
  12. Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. 2002. Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389-422.
  13. Kim, S.-Y. 2009. Effects of sample size on robustness and prediction accuracy of a prognostic gene signature. BMC Bioinformatics, 10, 147.
  14. Kohavi, R. & John, G. H. 1997. Wrappers for feature subset selection. Artificial Intelligence, 97, 273-324.
  15. Lai, H.-M., Albrecht, A. & Steinhöfel, K. 2013. Gene selection guided by feature interdependence. World Academy of Science, Engineering and Technology (WASET), 1432-1438.
  16. Lazar, C., Taminau, J., Meganck, S., Steenhoff, D., Coletta, A., Molter, C., De Schaetzen, V., Duque, R., Bersini, H. & Nowé, A. 2012. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 9, 1106-1119.
  17. Mundra, P. A. & Rajapakse, J. C. 2010. SVM-RFE with MRMR filter for gene selection. IEEE Transactions on NanoBioscience, 9, 31-37.
  18. Nevins, J. R. & Potti, A. 2007. Mining gene expression profiles: expression signatures as cancer phenotypes. Nature Reviews Genetics, 8, 601-609.
  19. Nutt, C. L., Mani, D., Betensky, R. A., Tamayo, P., Cairncross, J. G., Ladd, C., Pohl, U., Hartmann, C., McLaughlin, M. E. & Batchelor, T. T. 2003. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research, 63, 1602-1607.
  20. Saeys, Y., Inza, I. & Larrañaga, P. 2007. A review of feature selection techniques in bioinformatics. Bioinformatics, 23, 2507-2517.
  21. Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R. & Lander, E. S. 2005. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102, 15545-15550.
  22. Wang, J., Duncan, D., Shi, Z. & Zhang, B. 2013. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Research, 41, W77-W83.
  23. Yu, L. & Liu, H. 2004. Efficient feature selection via analysis of relevance and redundancy. The Journal of Machine Learning Research, 5, 1205-1224.
  24. Zhou, X. & Tuck, D. P. 2007. MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Bioinformatics, 23, 1106-1114.


Paper Citation


in Harvard Style

Lai H., Albrecht A. and Steinhöfel K. (2014). A Framework for High-throughput Gene Signatures with Microarray-based Brain Cancer Gene Expression Profiling Data. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 211-220. DOI: 10.5220/0004926002110220


in BibTeX Style

@conference{icaart14,
author={Hung-Ming Lai and Andreas Albrecht and Kathleen Steinhöfel},
title={A Framework for High-throughput Gene Signatures with Microarray-based Brain Cancer Gene Expression Profiling Data},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={211-220},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004926002110220},
isbn={978-989-758-015-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - A Framework for High-throughput Gene Signatures with Microarray-based Brain Cancer Gene Expression Profiling Data
SN - 978-989-758-015-4
AU - Lai H.
AU - Albrecht A.
AU - Steinhöfel K.
PY - 2014
SP - 211
EP - 220
DO - 10.5220/0004926002110220