Authors: Eric Paquet 1 and Herna Lydia Viktor 2
Affiliations: 1 National Research Council of Canada, Canada; 2 University of Ottawa, Canada
Keyword(s): Indexing, Proteins, Representation, Searching, 3D.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; BioInformatics & Pattern Discovery; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Multimedia Data; Symbolic Systems
Abstract:
Research suggests that the complex geometric shapes of amino-acid sequence folds often determine their functions. To help domain experts classify new protein structures and identify the functions of such new discoveries, accurate shape-related algorithms for locating similar protein structures are needed. To this end, we present our Content-based Analysis of Protein Structure for Retrieval and Indexing system, which locates protein families, and identifies similarities between families, based on the 2D and 3D signatures of protein structures. Our approach is novel in that we utilize five different representations within a query-by-prototype approach. These diverse representations allow us to view a particular protein structure, and the family it belongs to, in terms of (1) the C-α chain, (2) the atomic positions, (3) the secondary structure, (4) the residue type, or (5) the residue name. Our experimental results indicate that our method accurately locates protein families when evaluated against the 53,000 entries in the Protein Data Bank, performing an exhaustive search in a fraction of a second.
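The query-by-prototype idea described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: each structure is taken to be summarized by a fixed-length numeric signature, and similarity is taken to be Euclidean distance between signatures. The function name rank_by_prototype, the identifiers protA to protD, and the three-component signatures are all hypothetical; the system's actual signatures and distance measure may differ.

```python
import math


def rank_by_prototype(index, prototype_id, top_k=3):
    """Exhaustively rank every indexed structure by distance to the prototype.

    `index` maps a structure identifier to its signature (a list of floats).
    Euclidean distance is an assumption made for this sketch.
    """
    query = index[prototype_id]
    scored = [
        (math.dist(query, signature), structure_id)
        for structure_id, signature in index.items()
        if structure_id != prototype_id  # do not return the prototype itself
    ]
    scored.sort()  # smallest distance first
    return [structure_id for _, structure_id in scored[:top_k]]


# Hypothetical toy index; real signatures would be computed from the structures.
toy_index = {
    "protA": [0.10, 0.80, 0.30],
    "protB": [0.12, 0.79, 0.31],
    "protC": [0.90, 0.10, 0.50],
    "protD": [0.11, 0.75, 0.35],
}

results = rank_by_prototype(toy_index, "protA", top_k=2)
print(results)
```

Because every signature is compared against the prototype, the search is exhaustive, and its cost grows linearly with the number of indexed entries. Under the same assumptions, one such index could be kept per representation, so that the same prototype can be queried from each of the five viewpoints.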