Feature Selection for MicroRNA Target Prediction - Comparison of One-Class Feature Selection Methodologies
Malik Yousef, Jens Allmer, Waleed Khalifa
2016
Abstract
Traditionally, machine learning algorithms build classification models from positive and negative examples. Recently, one-class classification (OCC) has received increasing attention in machine learning for problems where the negative class cannot be defined unambiguously. This is particularly problematic in bioinformatics, since for some important biological problems the target class (positive class) is easy to obtain while the negative class cannot be measured. Artificially generated negative class data can rest on unreliable assumptions. Several studies have applied two-class machine learning to predict microRNAs (miRNAs) and their targets. Different approaches for generating an artificial negative class have been applied, but they may lead to biased performance estimates. Feature selection has been well studied for the two-class classification problem, while fewer methods are available for feature selection in OCC. In this study, we present a feature selection approach for applying one-class classification to the prediction of miRNA targets. A comparison between one-class and two-class approaches shows that their performance is similar, while one-class classification does not rely on questionable artificial data for training and performance evaluation. We further show that the feature selection method we applied is only partially effective and needs further improvement; combining it with other approaches may be one way forward.
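The abstract does not spell out the selection procedure, so the following is only an illustrative sketch of the general idea of one-class feature selection plus one-class classification, not the authors' method. It assumes a generic unsupervised filter (drop constant features and any feature highly correlated with one already kept, a redundancy criterion that needs no negative examples) followed by a simple centroid-distance data description trained on positive examples only. The names `select_features`, `CentroidDescription`, `max_corr` and `reject` are hypothetical.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)


def select_features(X, max_corr=0.9):
    """Hypothetical positive-only filter: keep a feature unless it is constant
    or |correlation| >= max_corr with a feature that was already kept."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    kept = []
    for j, col in enumerate(cols):
        if statistics.pstdev(col) == 0:
            continue  # a constant feature cannot separate anything
        if any(abs(pearson(col, cols[k])) >= max_corr for k in kept):
            continue  # redundant with an already selected feature
        kept.append(j)
    return kept


class CentroidDescription:
    """Hypothetical one-class model: accept points whose standardized distance
    to the centroid of the positive training data is within a threshold."""

    def __init__(self, features, reject=0.1):
        self.features = features  # indices returned by select_features
        self.reject = reject      # fraction of training positives allowed outside

    def _distance(self, row):
        z = [(row[j] - m) / s for j, m, s in zip(self.features, self.means, self.stds)]
        return math.hypot(*z)

    def fit(self, X):
        cols = [[row[j] for row in X] for j in self.features]
        self.means = [statistics.fmean(c) for c in cols]
        self.stds = [statistics.pstdev(c) or 1.0 for c in cols]
        dists = sorted(self._distance(row) for row in X)
        # Threshold at the (1 - reject) quantile of the training distances.
        k = math.ceil((1 - self.reject) * len(dists)) - 1
        self.threshold = dists[min(len(dists) - 1, max(0, k))]
        return self

    def accepts(self, row):
        return self._distance(row) <= self.threshold


if __name__ == "__main__":
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    # Toy positives: feature 2 duplicates feature 0, feature 3 is constant.
    X = [[x, y, 2 * x, 5.0] for x in grid for y in grid]
    feats = select_features(X)
    model = CentroidDescription(feats, reject=0.1).fit(X)
    print(feats, model.accepts([0.0, 0.0, 0.0, 5.0]), model.accepts([10.0, 10.0, 20.0, 5.0]))
```

Only positive examples are used both to choose features and to set the decision threshold, which is the property that makes OCC attractive when no reliable negative class exists; the `reject` parameter plays the role of the expected fraction of outliers among the positives.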
Paper Citation
in Harvard Style
Yousef M., Allmer J. and Khalifa W. (2016). Feature Selection for MicroRNA Target Prediction - Comparison of One-Class Feature Selection Methodologies. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016) ISBN 978-989-758-170-0, pages 216-225. DOI: 10.5220/0005701602160225
in Bibtex Style
@conference{bioinformatics16,
author={Malik Yousef and Jens Allmer and Waleed Khalifa},
title={Feature Selection for MicroRNA Target Prediction - Comparison of One-Class Feature Selection Methodologies},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)},
year={2016},
pages={216-225},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005701602160225},
isbn={978-989-758-170-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)
TI - Feature Selection for MicroRNA Target Prediction - Comparison of One-Class Feature Selection Methodologies
SN - 978-989-758-170-0
AU - Yousef M.
AU - Allmer J.
AU - Khalifa W.
PY - 2016
SP - 216
EP - 225
DO - 10.5220/0005701602160225
ER -