PARALLEL CALCULATION OF SUBGRAPH CENSUS IN BIOLOGICAL NETWORKS

Pedro Ribeiro, Fernando Silva, Luís Lopes

2010

Abstract

Mining meaningful data from complex biological networks is a critical task in many areas of research. One important example is calculating the frequency of all subgraphs of a certain size, known as the subgraph census problem. A census provides a comprehensive structural characterization of a network and is also an intermediate step in computing network motifs, basic building blocks of networks that help bridge the gap between structure and function. The subgraph census problem is computationally hard, and here we present several parallel strategies to solve it. We refined our initial strategies into an efficient and scalable adaptive parallel algorithm. This algorithm achieves almost linear speedups up to 128 cores when applied to a representative set of biological networks from different domains, and it makes census calculation feasible for larger subgraph sizes.
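To make the census problem concrete, the following is a minimal sequential sketch in Python. It is not the parallel algorithm described in the paper. It enumerates every set of k vertices by brute force, keeps the sets that induce a connected subgraph, and counts them per isomorphism class. The function names (`subgraph_census`, `canonical_form`, `is_connected`) are hypothetical, and the restriction to undirected graphs and connected induced subgraphs is an assumption made for illustration. The approach is only practical for very small k and small networks.

```python
from collections import Counter
from itertools import combinations, permutations


def is_connected(nodes, adj):
    """Return True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def canonical_form(nodes, adj):
    """Brute-force canonical label: the largest upper-triangle adjacency
    code over all orderings of `nodes`. Isomorphic subgraphs share it."""
    k = len(nodes)
    best = None
    for perm in permutations(nodes):
        code = tuple(
            1 if perm[j] in adj[perm[i]] else 0
            for i in range(k)
            for j in range(i + 1, k)
        )
        if best is None or code > best:
            best = code
    return best


def subgraph_census(edges, k):
    """Count connected induced k-vertex subgraphs of an undirected graph,
    grouped by isomorphism class (assumed setting, for illustration)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    census = Counter()
    for nodes in combinations(sorted(adj), k):
        if is_connected(nodes, adj):
            census[canonical_form(nodes, adj)] += 1
    return census


if __name__ == "__main__":
    # Toy network: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
    toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    print(dict(subgraph_census(toy_edges, 3)))
    # prints {(1, 1, 1): 1, (1, 1, 0): 2}
```

In the toy network the census for k = 3 contains one triangle, labelled (1, 1, 1), and two three-vertex paths, labelled (1, 1, 0). The number of vertex sets grows combinatorially with k and the network size, which illustrates why the exhaustive computation is expensive and why distributing it across cores is attractive.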

Paper Citation


in Harvard Style

Ribeiro P., Silva F. and Lopes L. (2010). PARALLEL CALCULATION OF SUBGRAPH CENSUS IN BIOLOGICAL NETWORKS. In Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010) ISBN 978-989-674-019-1, pages 56-65. DOI: 10.5220/0002749600560065


in Bibtex Style

@conference{bioinformatics10,
author={Pedro Ribeiro and Fernando Silva and Luís Lopes},
title={PARALLEL CALCULATION OF SUBGRAPH CENSUS IN BIOLOGICAL NETWORKS},
booktitle={Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)},
year={2010},
pages={56-65},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002749600560065},
isbn={978-989-674-019-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)
TI - PARALLEL CALCULATION OF SUBGRAPH CENSUS IN BIOLOGICAL NETWORKS
SN - 978-989-674-019-1
AU - Ribeiro P.
AU - Silva F.
AU - Lopes L.
PY - 2010
SP - 56
EP - 65
DO - 10.5220/0002749600560065