BIODBLINK: MULTI-LEVEL DATA MATCHING FOR AUTOMATIC GENERATION OF CROSS LINKS AMONG BIOCHEMICAL PATHWAY DATABASES

Jyh-Jong Tsay, Bo-Liang Wu, Hou-Ji Dai

Abstract

Most biological databases provide cross links that point to data records describing the same object in other databases. However, as more and more databases become available, manually creating and maintaining cross links becomes very time-consuming, if not impossible. Existing databases provide only a small portion of all possible links. In this paper, we present BioDBLink, a database cross link server that automatically collects and generates cross links among biological databases. The core of BioDBLink is a data matching technique that identifies and matches data records or elements describing the same object among pathway databases. Experiments on a data set collected from several pathway, enzyme and compound databases show that our approach is able to identify most of the cross links provided by current databases, discover a large number of missing links, and detect inconsistency and duplicate errors.
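
To make the idea of record matching concrete, the following is a minimal sketch of one plausible matching step: proposing candidate cross links between two databases when records share a normalized name or synonym. It is not the authors' multi-level method, which the abstract does not specify. The function names (`normalize`, `candidate_links`), the record layout, and the toy identifiers are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumeric characters (assumed normalization rule)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def candidate_links(db_a, db_b):
    """Return sorted (id_a, id_b) pairs whose normalized names or synonyms coincide.

    Each database is a dict mapping a record id to a list of names/synonyms.
    """
    # Index every normalized name in db_b by the record ids that carry it.
    index = defaultdict(set)
    for rec_id, names in db_b.items():
        for name in names:
            index[normalize(name)].add(rec_id)

    # Look up each normalized name from db_a in that index.
    links = set()
    for rec_id, names in db_a.items():
        for name in names:
            for other_id in index.get(normalize(name), ()):
                links.add((rec_id, other_id))
    return sorted(links)


# Hypothetical toy records, not taken from any real database.
db_a = {"A1": ["D-Glucose", "Dextrose"], "A2": ["Pyruvate"]}
db_b = {"B7": ["d glucose"], "B9": ["Pyruvic acid", "pyruvate"], "B3": ["ATP"]}

print(candidate_links(db_a, db_b))  # [('A1', 'B7'), ('A2', 'B9')]
```

A real system would likely combine several kinds of evidence (for example identifiers, names, and structural relations) before accepting a link; exact name matching alone is only a first filter and can both miss links and propose wrong ones.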



Paper Citation


in Harvard Style

Tsay J., Wu B. and Dai H. (2011). BIODBLINK: MULTI-LEVEL DATA MATCHING FOR AUTOMATIC GENERATION OF CROSS LINKS AMONG BIOCHEMICAL PATHWAY DATABASES. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011) ISBN 978-989-8425-36-2, pages 125-130. DOI: 10.5220/0003136601250130


in Bibtex Style

@conference{bioinformatics11,
author={Jyh-Jong Tsay and Bo-Liang Wu and Hou-Ji Dai},
title={BIODBLINK: MULTI-LEVEL DATA MATCHING FOR AUTOMATIC GENERATION OF CROSS LINKS AMONG BIOCHEMICAL PATHWAY DATABASES},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)},
year={2011},
pages={125-130},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003136601250130},
isbn={978-989-8425-36-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)
TI - BIODBLINK: MULTI-LEVEL DATA MATCHING FOR AUTOMATIC GENERATION OF CROSS LINKS AMONG BIOCHEMICAL PATHWAY DATABASES
SN - 978-989-8425-36-2
AU - Tsay J.
AU - Wu B.
AU - Dai H.
PY - 2011
SP - 125
EP - 130
DO - 10.5220/0003136601250130