Authors:
Sean West and Hesham Ali
Affiliation:
University of Nebraska at Omaha, United States
Keyword(s):
Data Integration, Knowledge Extraction, Gene Expression Data, Protein-protein Interaction, Co-regulation, Correlation Networks, and Clusters.
Related Ontology Subjects/Areas/Topics:
Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Databases and Data Management; Genomics and Proteomics; Systems Biology
Abstract:
With the rapidly increasing amount and variety of biological data available to researchers, the focus of the biomedical research community has been shifting from pure data generation towards the development of new methodologies for data analytics. Although many researchers continue to focus on approaches developed for analyzing a single type of biological data, recent attempts have been made to exploit heterogeneous data sets that contain various types of data and to establish tools for data integration and analysis in many bioinformatics applications. Such attempts are expected to increase significantly in the coming decade. While this can be viewed as a positive step towards advancing big data analytics in bioinformatics, it is critical that these integration methodologies are meticulously studied to ensure the high quality of the knowledge extracted from the integrated data. In this work, we employ data integration methods to analyze biological data obtained from protein interaction networks and gene expression data. We conduct a study to show that potential problems can arise from integrating or fusing data obtained at different granularity levels, and we highlight the importance of developing advanced data fusion techniques to integrate various types of biological data for analytical purposes. Further, we explore the impact of granularity using a more formalized approach and show that granularity levels significantly impact the quality of knowledge extracted from the integrated data.