Community Detection within Clusters Helps Large Scale Protein Annotation - Preliminary Results of Modularity Maximization for the BAR+ Database

Giuseppe Profiti, Damiano Piovesan, Pier Luigi Martelli, Piero Fariselli, Rita Casadio

Abstract

Given the exponentially increasing amount of available data, electronic annotation procedures for protein sequences are a core topic in bioinformatics. In this paper we present a refinement of an already published procedure that allows a fine-grained level of detail in the annotation results. The enhancement is based on a graph representation of the similarity relationships between sequences within a cluster, followed by the application of community detection algorithms, which identify groups of highly connected nodes inside a larger graph. The core idea is that sequences belonging to the same community share more features with one another than with the other sequences in the same graph.
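The abstract does not state which community detection algorithm is applied, although the title points to modularity maximization. As an illustration only, the Python sketch below builds a weighted graph from pairwise similarity scores within one cluster. It then greedily maximizes Newman's weighted modularity using the local-moving step of the Louvain method (Blondel et al., 2008) and omits Louvain's aggregation phase. The input format, the similarity threshold, the toy scores and all function names (`build_graph`, `modularity`, `local_moving`, `groups`) are hypothetical. This is not the BAR+ implementation.

```python
from collections import defaultdict


def build_graph(similarities, threshold):
    """Build an undirected weighted graph as a dict of dicts.

    `similarities` is a hypothetical list of (seq_a, seq_b, score) triples,
    e.g. pairwise similarity scores between the sequences of one cluster.
    Pairs scoring below `threshold` (an assumed cut-off) are dropped.
    """
    adj = defaultdict(dict)
    for a, b, score in similarities:
        if a != b and score >= threshold:
            adj[a][b] = score
            adj[b][a] = score
    return dict(adj)


def modularity(adj, community):
    """Weighted modularity Q = (1/2m) * sum_ij [A_ij - k_i*k_j/2m] * delta(c_i, c_j)."""
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    two_m = sum(degree.values())
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                q += adj[i].get(j, 0.0) - degree[i] * degree[j] / two_m
    return q / two_m


def local_moving(adj):
    """Greedy modularity maximization: the local-moving phase of Louvain only.

    Every node starts in its own community; each node is then moved to the
    neighbouring community with the largest modularity gain, repeating until
    no move improves Q. Louvain's aggregation phase is omitted.
    """
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    two_m = sum(degree.values())
    community = {n: n for n in adj}
    tot = dict(degree)  # total degree of the nodes in each community
    improved = True
    while improved:
        improved = False
        for node in sorted(adj):
            old, k = community[node], degree[node]
            # Total edge weight from `node` to each neighbouring community.
            links = defaultdict(float)
            for nbr, w in adj[node].items():
                links[community[nbr]] += w
            tot[old] -= k  # take the node out of its current community
            # Gain (up to the constant factor 1/m) of placing the node in community c.
            best, best_gain = old, links.get(old, 0.0) - tot[old] * k / two_m
            for c, w in links.items():
                gain = w - tot[c] * k / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k
            community[node] = best
            improved = improved or best != old
    return community


def groups(community):
    """Turn a node -> label map into a sorted list of sorted member lists."""
    members = defaultdict(list)
    for node, label in community.items():
        members[label].append(node)
    return sorted(sorted(m) for m in members.values())


if __name__ == "__main__":
    # Invented toy data: two tight triads joined by one weak similarity.
    pairs = [
        ("P1", "P2", 0.9), ("P1", "P3", 0.8), ("P2", "P3", 0.85),
        ("P4", "P5", 0.9), ("P4", "P6", 0.95), ("P5", "P6", 0.8),
        ("P3", "P4", 0.2),
    ]
    graph = build_graph(pairs, threshold=0.1)
    partition = local_moving(graph)
    print(groups(partition))                       # [['P1', 'P2', 'P3'], ['P4', 'P5', 'P6']]
    print(round(modularity(graph, partition), 3))  # 0.463
```

On the toy data the weak P3-P4 link is not enough to merge the two triads, so each becomes its own community. This mirrors the idea that members of a community share more features with one another than with the rest of the cluster.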



Paper Citation


in Harvard Style

Profiti G., Piovesan D., Martelli P. L., Fariselli P. and Casadio R. (2013). Community Detection within Clusters Helps Large Scale Protein Annotation - Preliminary Results of Modularity Maximization for the BAR+ Database. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013) ISBN 978-989-8565-35-8, pages 328-332. DOI: 10.5220/0004328703280332


in Bibtex Style

@conference{bioinformatics13,
author={Giuseppe Profiti and Damiano Piovesan and Pier Luigi Martelli and Piero Fariselli and Rita Casadio},
title={Community Detection within Clusters Helps Large Scale Protein Annotation - Preliminary Results of Modularity Maximization for the BAR+ Database},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={328-332},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004328703280332},
isbn={978-989-8565-35-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - Community Detection within Clusters Helps Large Scale Protein Annotation - Preliminary Results of Modularity Maximization for the BAR+ Database
SN - 978-989-8565-35-8
AU - Profiti G.
AU - Piovesan D.
AU - Martelli P. L.
AU - Fariselli P.
AU - Casadio R.
PY - 2013
SP - 328
EP - 332
DO - 10.5220/0004328703280332