Using a Fuzzy Decision Tree Ensemble for Tumor Classification from Gene Expression Data
José M. Cadenas, M. Carmen Garrido, Raquel Martínez, David A. Pelta, Piero P. Bonissone
2013
Abstract
Machine learning techniques are useful tools for extracting knowledge from gene expression data in biological systems. In this paper, two machine learning techniques are applied to tumor datasets based on gene expression data. Both techniques are based on a fuzzy decision tree ensemble and are used to carry out classification and feature selection on these datasets. Classification accuracy is high both when all genes are used and when only the selected genes are used. In the latter case, however, the smaller gene set also makes the solution more interpretable. Additionally, the feature selection technique provides a ranking of the genes by importance and a partitioning of the domains of the genes.
Paper Citation
in Harvard Style
Cadenas J., Garrido M., Martínez R., Pelta D. A. and Bonissone P. P. (2013). Using a Fuzzy Decision Tree Ensemble for Tumor Classification from Gene Expression Data. In Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: SCA, (IJCCI 2013) ISBN 978-989-8565-77-8, pages 320-331. DOI: 10.5220/0004658203200331
in Bibtex Style
@conference{sca13,
author={José M. Cadenas and M. Carmen Garrido and Raquel Martínez and David A. Pelta and Piero P. Bonissone},
title={Using a Fuzzy Decision Tree Ensemble for Tumor Classification from Gene Expression Data},
booktitle={Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: SCA, (IJCCI 2013)},
year={2013},
pages={320-331},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004658203200331},
isbn={978-989-8565-77-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: SCA, (IJCCI 2013)
TI - Using a Fuzzy Decision Tree Ensemble for Tumor Classification from Gene Expression Data
SN - 978-989-8565-77-8
AU - Cadenas J.
AU - Garrido M.
AU - Martínez R.
AU - Pelta D. A.
AU - Bonissone P. P.
PY - 2013
SP - 320
EP - 331
DO - 10.5220/0004658203200331