On the Impact of Granularity in Extracting Knowledge from Bioinformatics Data

Sean West, Hesham Ali

2016

Abstract

As the volume and variety of biological data available to researchers grow rapidly, the focus of the biomedical research community has been shifting from pure data generation towards the development of new methodologies for data analytics. Although many researchers continue to focus on approaches developed for analyzing a single type of biological data, recent efforts have sought to exploit heterogeneous data sets that contain several data types and to establish tools for data integration and analysis in many bioinformatics applications. Such efforts are expected to increase significantly in the coming decade. While this can be viewed as a positive step towards advancing big data analytics in bioinformatics, it is critical that these integration methodologies are meticulously studied to ensure the high quality of the knowledge extracted from the integrated data. In this work, we employ data integration methods to analyze protein interaction networks and gene expression data. We conduct a study showing that problems can arise from integrating or fusing data obtained at different granularity levels, and we highlight the importance of developing advanced data fusion techniques to integrate various types of biological data for analytical purposes. Further, we explore the impact of granularity using a more formalized approach and show that granularity levels significantly impact the quality of knowledge extracted from the integrated data.

Paper Citation


in Harvard Style

West S. and Ali H. (2016). On the Impact of Granularity in Extracting Knowledge from Bioinformatics Data. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016), ISBN 978-989-758-170-0, pages 92-103. DOI: 10.5220/0005778700920103


in Bibtex Style

@conference{bioinformatics16,
author={Sean West and Hesham Ali},
title={On the Impact of Granularity in Extracting Knowledge from Bioinformatics Data},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)},
year={2016},
pages={92-103},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005778700920103},
isbn={978-989-758-170-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)
TI - On the Impact of Granularity in Extracting Knowledge from Bioinformatics Data
SN - 978-989-758-170-0
AU - West S.
AU - Ali H.
PY - 2016
SP - 92
EP - 103
DO - 10.5220/0005778700920103