INFRASTRUCTURE FOR METAGENOME DATA MANAGEMENT AND ANALYSIS

Tatiana Tatusova

2011

Abstract

Metagenome sequencing projects are generating unprecedented amounts of data. Public sequence archive databases face large-scale data management challenges, including data storage and the quick search and retrieval of sequence data for further analysis. The sequence data is linked to a rich set of metadata attributes, such as geochemical and ecological parameters for environmental projects and clinical patient information for human microbiome studies. This complex collection of heterogeneous information has to be integrated, organized and presented to users in a meaningful and useful way. For the last 20 years, the National Center for Biotechnology Information (NCBI) has been developing infrastructure that allows easy storage and distribution of various types of biomolecular data, as well as data integration and easy navigation in a complex information space. Here we describe the NCBI resources that are used for metagenomics data management.



Paper Citation


in Harvard Style

Tatusova T. (2011). INFRASTRUCTURE FOR METAGENOME DATA MANAGEMENT AND ANALYSIS. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: Meta, (BIOSTEC 2011) ISBN 978-989-8425-36-2, pages 357-362. DOI: 10.5220/0003333803570362


in Bibtex Style

@conference{meta11,
author={Tatiana Tatusova},
title={INFRASTRUCTURE FOR METAGENOME DATA MANAGEMENT AND ANALYSIS},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: Meta, (BIOSTEC 2011)},
year={2011},
pages={357-362},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003333803570362},
isbn={978-989-8425-36-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: Meta, (BIOSTEC 2011)
TI - INFRASTRUCTURE FOR METAGENOME DATA MANAGEMENT AND ANALYSIS
SN - 978-989-8425-36-2
AU - Tatusova T.
PY - 2011
SP - 357
EP - 362
DO - 10.5220/0003333803570362