Table 5: Enzyme matching result.
From DB 1 to DB 2    Number of        Number of        Recall w.r.t.
                     predicted links  extracted links  extracted links
MetaCyc to KEGG      5855             7597             0.74
KEGG to ExplorEnz    4254             4257             0.999
KEGG to IUBMB        4253             4257             0.999
KEGG to ExPASy       4204             4257             0.99
KEGG to UM-BBD       318              289              0.98
KEGG to BRENDA       4155             4257             0.97
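The recall column can be read as the fraction of extracted links that are also found among the predicted links. It is not the ratio of the two count columns: for KEGG to UM-BBD, the predicted links (318) outnumber the extracted links (289). The following is a minimal sketch under that assumed definition, which the column header suggests but the table does not spell out. The function name and the toy link pairs are hypothetical and are not taken from any of the databases listed.

```python
def recall_wrt_extracted(predicted, extracted):
    """Fraction of extracted links that also appear among the predicted links."""
    predicted = set(predicted)
    extracted = set(extracted)
    if not extracted:
        raise ValueError("no extracted links to compare against")
    # Links present in both sets count as recovered.
    return len(predicted & extracted) / len(extracted)


# Hypothetical toy links as (source ID, target ID) pairs; illustrative only.
predicted_links = {("E1", "X1"), ("E2", "X2"), ("E3", "X3"), ("E4", "X9")}
extracted_links = {("E1", "X1"), ("E2", "X2"), ("E3", "X3"), ("E5", "X5")}

recall = recall_wrt_extracted(predicted_links, extracted_links)
print(f"recall = {recall:.2f}")  # 3 of the 4 extracted links are predicted
```

Representing links as sets of ID pairs keeps the comparison independent of ordering and of duplicate entries in either list.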
5 CONCLUSIONS
In this paper, we present BioDBLink, a pathway database
link server that automatically collects and generates
cross-links among biological databases. The
core of BioDBLink is a multi-level data matching
technique that identifies and matches data records or
elements describing the same object. Matching
results can also be used to induce more accurate and
complete object descriptions, remove data
redundancy, and check data consistency. Experiments
on a set of pathway, compound, and enzyme
databases show that our approach is feasible,
identifies a large number of matches, and detects
database inconsistencies and duplication errors. In the
future, we will extend the server to incorporate
more databases available on the Internet and
develop data matching techniques for other
types of biological entities. Our goal is to
provide a database link server that covers more
biological databases.
REFERENCES
Birkland A., Yona G., 2006. BIOZON: a system for
unification, management and analysis of
heterogeneous biological data. In BMC Bioinformatics,
7:70. doi:10.1186/1471-2105-7-70.
Garcia C. A., Chen Y. P., Ragan M. A., 2005. Information
integration in molecular bioscience. In Applied
Bioinformatics, 4(3), 157-173.
Macauley J., Wang H., Goodman N., 1998. A Model
System for Studying the Integration of Molecular
Biology Databases. Bioinformatics, 14(7), 575-582.
Krishnamurthy L., Nadeau J., Ozsoyoglu G., Ozsoyoglu
M., Schaeffer G., Tasan M. and Xu W., 2003.
Pathways database system: an integrated system for
biological pathways. Bioinformatics, 19(8), 930-937.
Chen Y. P., Chen Q., 2006. Analyzing Inconsistency
Toward Enhancing Integration of Biological
Molecular Databases. In APBC, the Fourth Asia-
Pacific Bioinformatics Conference, 197-206.
Rajasimha K. H., 2004. PathMeld: A Methodology for the
Unification of Metabolic Pathway Databases.
Computer Science and Application, 2004.
Tsay J.-J., Wu B.-L., Chen C.-W., 2009. Data
Matching for Physical Integration of Biochemical
Pathway Databases. In IEEE International Conference
on Bioinformatics and Bioengineering.
Lim E., Chiang R. H., 2000. The integration of
relationship instances from heterogeneous databases.
Decision Support Systems, 29, 153-167.
Sujansky W., 2001. Heterogeneous Database Integration in
Biomedicine. Journal of Biomedical Informatics, 34,
285-298.
Li W. and Clifton C., 2000. SEMINT: A tool for
identifying attribute correspondences in heterogeneous
databases using neural networks. Data and Knowledge
Engineering, 33, 49-84.
KEGG, Available at http://www.genome.jp/kegg/
Karp P. D., Riley M., Saier M., Paulsen I., Paley S., and
Pellegrini-Toole A., 2000. The EcoCyc and MetaCyc
databases. Nucleic Acids Research, 28, 56-59.
METACYC, Available at http://metacyc.org/
Green M. L. and Karp P. D., 2005. Genome annotation
errors in pathway databases due to semantic ambiguity
in partial EC numbers. Nucleic Acids Research,
33(13), 4035-4039.
PubChem, Available at http://pubchem.ncbi.nlm.nih.gov/
ChEBI, Available at http://www.ebi.ac.uk/
KNApSAcK, Available at http://kanaya.aist-nara.ac.jp/
LIPIDMAPS, Available at http://www.lipidmaps.org/
LipidBank, Available at http://lipidbank.jp/
PDB-CCD, Available at http://remediation.wwpdb.org/
3DMET, Available at http://www.3dmet.dna.affrc.go.jp/
Nikkaji, Available at http://nikkajiweb.jst.go.jp/
NCI, Available at http://cactus.nci.nih.gov/
UM-BBD, Available at http://umbbd.msi.umn.edu/
ExplorEnz, Available at http://www.enzyme-database.org/
IUBMB, Available at http://www.chem.qmul.ac.uk/iubmb/enzyme/
ExPASy, Available at http://www.expasy.org/
BRENDA, Available at http://www.brenda-enzymes.org/
BIOINFORMATICS 2011 - International Conference on Bioinformatics Models, Methods and Algorithms