A Dynamic Whole-genome Database for Comparative Analyses, Molecular Epidemiology and Phenotypic Summary of Bacterial Pathogens

Chad R. Laing, Eduardo Taboada, Peter Kruczkiewicz, James E. Thomas, Victor P. J. Gannon

2013

Abstract

Background. Recent outbreaks caused by bacterial contamination of food, including sprouts contaminated with E. coli O104:H4 in Germany and processed meats contaminated with Listeria in Canada, highlight the need for rapid and accurate characterization of bacterial pathogens. Current sequencing platforms have revolutionized the amount and quality of data available to epidemiologists, public health officials and microbiologists, who now require powerful yet intuitive tools to make sense of the underlying biology in these large datasets. In this study, we developed bioinformatics tools to automate whole-genome analyses, make the data broadly accessible via novel reporting functions, and provide a dynamic computational platform for genomic analyses online at http://76.70.11.198/bacpath.

Methods. A PHP-based web front end and a PostgreSQL database display the pre-computed data. Genomic comparisons are performed using an updated version of our previously created pan-genomic software suite, Panseq (http://lfz.corefacility.ca/panseq/). New genomic sequences are analyzed and added to the database without recomputing previous analyses. Phylogenetic trees are created with MrBayes. Statistical calculations are performed using R.

Results. A pathogen-specific genomic database encompassing all publicly available E. coli strains was created as a proof of concept. Pre-computed comparisons for the hundreds of bacterial genomes, including phylogeny, presence/absence of virulence markers, group-specific biomarkers and geospatial information, were generated. Data reporting tools were created to summarize the complexity of the data and to provide biologically pertinent results, including genotype, phenotype (e.g., antimicrobial resistance) and geospatial information.

Discussion. The database provides rapid and accurate identification and characterization of E. coli. Output is formatted specifically for end users and describes virulence, phylogeny and group-specific markers. Uptake of a global surveillance system with near-real-time analysis will provide an effective early warning system and allow for a faster response to pathogen-related outbreaks.
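The incremental design described in the Methods (new genomes are added without recomputing earlier analyses) and the group-specific biomarkers mentioned in the Results can be illustrated with a minimal Python sketch. This is not the authors' implementation, which uses PHP, PostgreSQL and Panseq. The in-memory table, the function names add_genome and group_specific_markers, and the strain labels are all hypothetical, and the toy data are illustrative only, not results from the paper.

```python
from typing import Dict, Set

# Hypothetical in-memory stand-in for a pre-computed database table:
# genome name -> set of marker loci found present in that genome.
PresenceTable = Dict[str, Set[str]]


def add_genome(table: PresenceTable, name: str, present_markers: Set[str]) -> None:
    """Add one genome's marker profile; existing rows are left untouched."""
    if name in table:
        raise ValueError(f"genome {name!r} is already in the table")
    table[name] = set(present_markers)


def group_specific_markers(table: PresenceTable, group: Set[str]) -> Set[str]:
    """Return markers present in every genome of `group` and absent elsewhere."""
    missing = group - set(table)
    if missing:
        raise KeyError(f"unknown genomes: {sorted(missing)}")
    if not group:
        return set()
    # Markers shared by all members of the group.
    in_all = set.intersection(*(table[g] for g in group))
    # Markers seen in any genome outside the group.
    in_others = set().union(*(table[g] for g in set(table) - group))
    return in_all - in_others


if __name__ == "__main__":
    # Toy data only: strain labels are made up for illustration.
    table: PresenceTable = {
        "strain_A": {"stx2", "eae", "core1"},
        "strain_B": {"stx2", "eae", "core1"},
        "strain_C": {"core1"},
    }
    # A newly sequenced genome is appended; earlier rows are not recomputed.
    add_genome(table, "strain_D", {"stx2", "aggR", "core1"})
    print(sorted(group_specific_markers(table, {"strain_A", "strain_B"})))
```

In this sketch, adding a genome is a single insert, while group-specific markers are derived on demand from the stored presence/absence profiles. A production system would presumably store the profiles in database tables rather than in memory.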

References

  1. Gilmour, M., Graham, M., Van Domselaar, G., Tyler, S., Kent, H., Trout-Yakel, K., Larios, O., Allen, V., Lee, B., and Nadon, C. (2010). High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak. BMC Genomics, 11(1):120.
  2. Junier, T. and Zdobnov, E. M. (2010). The Newick utilities: high-throughput phylogenetic tree processing in the UNIX shell. Bioinformatics (Oxford, England), 26(13):1669-1670. PMID: 20472542.
  3. Kupferschmidt, K. (2011). Outbreak detectives embrace the genome era. Science, 333(6051):1818-1819.
  4. Laing, C., Buchanan, C., Taboada, E. N., Zhang, Y., Kropinski, A., Villegas, A., Thomas, J. E., and Gannon, V. P. J. (2010). Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions. BMC Bioinformatics, 11:461. PMID: 20843356.
  5. Laing, C., Pegg, C., Yawney, D., Ziebell, K., Steele, M., Johnson, R., Thomas, J. E., Taboada, E. N., Zhang, Y., and Gannon, V. P. J. (2008). Rapid determination of Escherichia coli O157:H7 lineage types and molecular subtypes by using comparative genomic fingerprinting. Applied and Environmental Microbiology, 74(21):6606-6615. PMID: 18791027.
  6. Laing, C., Villegas, A., Taboada, E. N., Kropinski, A., Thomas, J. E., and Gannon, V. P. J. (2011). Identification of Salmonella enterica species- and subgroup-specific genomic regions using Panseq 2.0. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases. PMID: 22001825.
  7. Ronquist, F. and Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19(12):1572-1574.
  8. Taboada, E. N. et al. (2012). Development and validation of a comparative genomic fingerprinting method for high-resolution genotyping of Campylobacter jejuni. Journal of Clinical Microbiology, 50(3):788-797. PMID: 22170908.


Paper Citation


in Harvard Style

Laing C. R., Taboada E., Kruczkiewicz P., Thomas J. E. and Gannon V. P. J. (2013). A Dynamic Whole-genome Database for Comparative Analyses, Molecular Epidemiology and Phenotypic Summary of Bacterial Pathogens. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013) ISBN 978-989-8565-35-8, pages 304-307. DOI: 10.5220/0004239803040307


in Bibtex Style

@conference{bioinformatics13,
author={Chad R. Laing and Eduardo Taboada and Peter Kruczkiewicz and James E. Thomas and Victor P. J. Gannon},
title={A Dynamic Whole-genome Database for Comparative Analyses, Molecular Epidemiology and Phenotypic Summary of Bacterial Pathogens},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={304-307},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004239803040307},
isbn={978-989-8565-35-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - A Dynamic Whole-genome Database for Comparative Analyses, Molecular Epidemiology and Phenotypic Summary of Bacterial Pathogens
SN - 978-989-8565-35-8
AU - Laing, C. R.
AU - Taboada, E.
AU - Kruczkiewicz, P.
AU - Thomas, J. E.
AU - Gannon, V. P. J.
PY - 2013
SP - 304
EP - 307
DO - 10.5220/0004239803040307