Sequence-based MicroRNA Clustering

Kübra Narcı, Hasan Oğul, Mahinur Akkaya


MicroRNAs (miRNAs) play important roles in post-transcriptional gene regulation. Understanding their integrative and cooperative activities in gene regulation depends on identifying miRNA families. In current applications, such groups of miRNAs are identified only from their expression patterns and, through those, their functional relations. Because miRNA regulation is mediated through the mature sequence, which recognizes target mRNA sequences in the RISC (RNA-induced silencing complex) binding regions, we argue that relevant miRNA groups can be obtained by de novo clustering based solely on sequence information. In this way, a new study can be guided by a set of previously annotated miRNA groups without any preliminary experimentation or literature evidence. In this report, we present the results of a computational study that considers only mature miRNA sequences to obtain relevant miRNA clusters, using various machine learning methods combined with different sequence representation schemes. Both statistical and biological evaluations encourage the use of this approach for in silico assessment of functional miRNA groups.
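The abstract does not say which sequence representations or clustering algorithms were used, so the following is only a minimal sketch of the general idea, assuming a dinucleotide (2-mer) composition vector as the representation and plain k-means as the clustering method. The function names (`kmer_vector`, `kmeans`) and the four toy sequences are hypothetical and chosen for illustration; they are not taken from the paper or from any database.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"  # RNA alphabet of mature miRNA sequences


def kmer_vector(seq, k=2):
    """Represent a sequence as normalized k-mer frequencies (assumed scheme)."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:  # skip k-mers containing non-ACGU symbols
            counts[sub] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; farthest-first seeding keeps the sketch deterministic."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(euclidean(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: euclidean(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy sequences (invented for illustration, not real miRNAs).
sequences = [
    ("toy-A1", "UGGUUGAGGUUGGAUGUUGGUA"),
    ("toy-A2", "UGGUUGAGGUUGGAUGUAGGUA"),
    ("toy-B1", "CACCACACCCAACACCACAACC"),
    ("toy-B2", "CACCACACCCAACACCACCACC"),
]

vectors = [kmer_vector(seq) for _, seq in sequences]
labels = kmeans(vectors, k=2)
for (name, _), label in zip(sequences, labels):
    print(name, "-> cluster", label)
```

In this sketch the two "A" sequences and the two "B" sequences share no dinucleotides across groups, so they separate cleanly; real mature miRNA sequences would be far less clear-cut, which is why the choice of representation and clustering method matters.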



Paper Citation

in Harvard Style

Narcı K., Oğul H. and Akkaya M. (2016). Sequence-based MicroRNA Clustering. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016) ISBN 978-989-758-170-0, pages 107-116. DOI: 10.5220/0005552901070116

in Bibtex Style

author={Kübra Narcı and Hasan Oğul and Mahinur Akkaya},
title={Sequence-based MicroRNA Clustering},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)},

in EndNote Style

JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)
TI - Sequence-based MicroRNA Clustering
SN - 978-989-758-170-0
AU - Narcı K.
AU - Oğul H.
AU - Akkaya M.
PY - 2016
SP - 107
EP - 116
DO - 10.5220/0005552901070116