FINDING PROTEIN FAMILY SIMILARITIES IN REAL TIME THROUGH MULTIPLE 3D AND 2D REPRESENTATIONS, INDEXING AND EXHAUSTIVE SEARCHING

Eric Paquet, Herna Lydia Viktor

Abstract

Research suggests that the complex geometric shapes of amino-acid sequence folds often determine their functions. To help domain experts classify new protein structures and identify the functions of such discoveries, accurate shape-based algorithms for locating similar protein structures are needed. To this end, we present our Content-based Analysis of Protein Structure for Retrieval and Indexing (CAPRI) system, which locates protein families and identifies similarities between families based on the 2D and 3D signatures of protein structures. Our approach is novel in that we utilize five different representations with a query-by-prototype approach. These diverse representations let us view a particular protein structure, and the family it belongs to, with a focus on (1) the C-α chain, (2) the atomic positions, (3) the secondary structure, (4) the residue type, or (5) the residue name. Our experimental results indicate that our method accurately locates protein families when evaluated against the 53,000 entries in the Protein Data Bank, performing an exhaustive search in a fraction of a second.
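The query-by-prototype idea with an exhaustive search can be illustrated with a minimal sketch. It is not the system's implementation: the abstract does not specify how signatures are computed or compared, so the sketch assumes each structure has already been reduced to a fixed-length numeric signature and that similarity is measured by Euclidean distance. The names `Signature`, `euclidean`, `query_by_prototype`, and the toy identifiers are all hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: a signature is a fixed-length vector of floats.
Signature = Sequence[float]


def euclidean(a: Signature, b: Signature) -> float:
    """Euclidean distance between two signatures of equal length."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_prototype(
    index: Dict[str, Signature], prototype_id: str, k: int = 5
) -> List[Tuple[str, float]]:
    """Compare the prototype with every other entry and return the k closest."""
    query = index[prototype_id]
    scored = (
        (euclidean(query, sig), entry_id)
        for entry_id, sig in index.items()
        if entry_id != prototype_id
    )
    return [(entry_id, dist) for dist, entry_id in heapq.nsmallest(k, scored)]


# Toy index: identifiers and values are made up for illustration only.
toy_index = {
    "protA": [0.10, 0.80, 0.30],
    "protB": [0.12, 0.79, 0.31],
    "protC": [0.90, 0.10, 0.50],
    "protD": [0.11, 0.82, 0.28],
}
results = query_by_prototype(toy_index, "protA", k=2)
for entry_id, dist in results:
    print(f"{entry_id}\t{dist:.4f}")
```

In this sketch the user picks a known structure as the prototype and every other entry is scored against it, so no candidate is skipped. One such index could be kept per representation, with the same query run against each.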



Paper Citation


in Harvard Style

Paquet E. and Viktor H. L. (2009). FINDING PROTEIN FAMILY SIMILARITIES IN REAL TIME THROUGH MULTIPLE 3D AND 2D REPRESENTATIONS, INDEXING AND EXHAUSTIVE SEARCHING. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009) ISBN 978-989-674-011-5, pages 127-133. DOI: 10.5220/0002286801270133


in Bibtex Style

@conference{kdir09,
author={Eric Paquet and Herna Lydia Viktor},
title={FINDING PROTEIN FAMILY SIMILARITIES IN REAL TIME THROUGH MULTIPLE 3D AND 2D REPRESENTATIONS, INDEXING AND EXHAUSTIVE SEARCHING},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={127-133},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002286801270133},
isbn={978-989-674-011-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI - FINDING PROTEIN FAMILY SIMILARITIES IN REAL TIME THROUGH MULTIPLE 3D AND 2D REPRESENTATIONS, INDEXING AND EXHAUSTIVE SEARCHING
SN - 978-989-674-011-5
AU - Paquet E.
AU - Viktor H. L.
PY - 2009
SP - 127
EP - 133
DO - 10.5220/0002286801270133