Community Detection within Clusters Helps Large Scale Protein Annotation - Preliminary Results of Modularity Maximization for the BAR+ Database
Giuseppe Profiti, Damiano Piovesan, Pier Luigi Martelli, Piero Fariselli, Rita Casadio
2013
Abstract
Given the exponentially increasing amount of available data, electronic annotation procedures for protein sequences are a core topic in bioinformatics. In this paper we present a refinement of a previously published procedure that gives a finer-grained level of detail in the annotation results. The enhancement is based on a graph representation of the similarity relationships between sequences within a cluster, followed by the application of community detection algorithms, which identify groups of highly connected nodes inside a larger graph. The core idea is that sequences belonging to the same community share more features with one another than with the other sequences in the same graph.
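The abstract does not spell out the algorithm, but the title refers to modularity maximization. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes hypothetical pairwise similarity scores between the sequences of one cluster (for example, derived from alignments) and uses them as edge weights. It then scores a partition with the standard weighted modularity Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j), where A_ij is the edge weight, k_i the weighted degree and m the total edge weight. Finally, it runs only the node-moving phase of a Louvain-style heuristic. The function names (build_graph, modularity, local_moving, communities) and the toy identifiers are illustrative.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected weighted adjacency map: node -> {neighbour: weight}.

    Assumption: each edge is (seq_a, seq_b, similarity_score); self-hits are
    dropped and, if a pair is reported twice, the higher score is kept.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        if u == v:
            continue
        adj[u][v] = max(w, adj[u].get(v, 0.0))
        adj[v][u] = adj[u][v]
    return dict(adj)


def modularity(adj, community):
    """Weighted modularity, computed per community as in/2m - (tot/2m)^2."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    internal = defaultdict(float)  # sum of A_ij over ordered pairs inside c
    total = defaultdict(float)     # sum of weighted degrees of nodes in c
    for u, nbrs in adj.items():
        c = community[u]
        total[c] += sum(nbrs.values())
        for v, w in nbrs.items():
            if community[v] == c:
                internal[c] += w
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj, eps=1e-12):
    """Phase 1 of a Louvain-style heuristic: move single nodes to the
    neighbouring community with the largest modularity gain until no move
    improves modularity. The aggregation phase is omitted in this sketch."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    community = {u: u for u in adj}  # start with every node on its own
    if two_m == 0:
        return community
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    tot = dict(degree)               # total weighted degree per community
    improved = True
    while improved:
        improved = False
        for u in sorted(adj):        # fixed order for reproducible output
            old = community[u]
            tot[old] -= degree[u]    # take u out of its current community
            links = defaultdict(float)  # weight from u into each community
            for v, w in adj[u].items():
                links[community[v]] += w

            def gain(c):
                # Modularity gain of inserting u into c, up to the factor 1/m.
                return links.get(c, 0.0) - tot.get(c, 0.0) * degree[u] / two_m

            best, best_gain = old, gain(old)
            for c in links:
                g = gain(c)
                if g > best_gain + eps:
                    best, best_gain = c, g
            community[u] = best
            tot[best] = tot.get(best, 0.0) + degree[u]
            if best != old:
                improved = True
    return community


def communities(assignment):
    """Group nodes by community label and return sorted member lists."""
    groups = defaultdict(list)
    for node, c in assignment.items():
        groups[c].append(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy cluster: two tightly similar triples joined by one weak similarity.
    edges = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0),
             ("D", "E", 1.0), ("E", "F", 1.0), ("D", "F", 1.0),
             ("C", "D", 0.1)]
    adj = build_graph(edges)
    assignment = local_moving(adj)
    print(communities(assignment))                # [['A', 'B', 'C'], ['D', 'E', 'F']]
    print(round(modularity(adj, assignment), 3))  # 0.484
```

Placing the whole cluster in a single community gives Q = 0. Splitting it along the weak link gives a positive Q, which is the kind of finer-grained grouping inside a cluster that the abstract describes. A full Louvain run would also collapse each community into a super-node and repeat the moving phase. Like any modularity-based method, it can miss communities that are small relative to the whole graph.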
Paper Citation
in Harvard Style
Profiti G., Piovesan D., Martelli P. L., Fariselli P. and Casadio R. (2013). Community Detection within Clusters Helps Large Scale Protein Annotation - Preliminary Results of Modularity Maximization for the BAR+ Database. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013) ISBN 978-989-8565-35-8, pages 328-332. DOI: 10.5220/0004328703280332
in Bibtex Style
@conference{bioinformatics13,
author={Giuseppe Profiti and Damiano Piovesan and Pier Luigi Martelli and Piero Fariselli and Rita Casadio},
title={Community Detection within Clusters Helps Large Scale Protein Annotation - Preliminary Results of Modularity Maximization for the BAR+ Database},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={328-332},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004328703280332},
isbn={978-989-8565-35-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - Community Detection within Clusters Helps Large Scale Protein Annotation - Preliminary Results of Modularity Maximization for the BAR+ Database
SN - 978-989-8565-35-8
AU - Profiti G.
AU - Piovesan D.
AU - Martelli P. L.
AU - Fariselli P.
AU - Casadio R.
PY - 2013
SP - 328
EP - 332
DO - 10.5220/0004328703280332