Authors:
Chad R. Laing (1); Eduardo Taboada (2); Peter Kruczkiewicz (1); James E. Thomas (3) and Victor P. J. Gannon (2)
Affiliations:
(1) Public Health Agency of Canada and University of Lethbridge, Canada; (2) Public Health Agency of Canada, Canada; (3) University of Lethbridge, Canada
Keyword(s):
Genomics, Database, Molecular Epidemiology, Phenotype, Comparative Analyses.
Related Ontology Subjects/Areas/Topics:
Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Databases and Data Management; Genomics and Proteomics; Next Generation Sequencing; Sequence Analysis; Web Services in Bioinformatics
Abstract:
Background. Recent outbreaks caused by bacterial contamination of food, including sprouts contaminated with E. coli O104:H4 in Germany and processed meats contaminated with Listeria in Canada, highlight the need for rapid and accurate characterization of bacterial pathogens. Current sequencing platforms have greatly increased the amount and improved the quality of data available to epidemiologists, public health officials and microbiologists, who now require powerful yet intuitive tools to make sense of the underlying biology in these large datasets. In this study, we developed bioinformatics tools to automate whole-genome analyses, make the data broadly accessible via novel reporting functions, and provide a dynamic computational platform for genomic analyses, available online at http://76.70.11.198/bacpath.
Methods. A PHP-based web front end and a PostgreSQL database display the pre-computed data. Genomic comparisons are performed using an updated version of our previously developed pan-genomic software suite, Panseq (http://lfz.corefacility.ca/panseq/). New genomic sequences are analyzed and added to the database without recomputing previous analyses. Phylogenetic trees are created with MrBayes. Statistical calculations are performed using R.
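The incremental update described in the Methods can be illustrated with a minimal sketch. It is not the Panseq implementation or the database schema, which the abstract does not describe. The class `PresenceAbsenceTable`, its method `add_genome`, and the strain names are hypothetical, and the sketch assumes that any locus first seen in a new genome is absent from all previously stored genomes (as it would be if novel loci are defined relative to the existing pan-genome).

```python
from typing import Dict, Iterable, List


class PresenceAbsenceTable:
    """Toy pan-genome table: one row per genome, one column per locus."""

    def __init__(self) -> None:
        self.loci: List[str] = []             # column order
        self.rows: Dict[str, List[int]] = {}  # genome name -> 0/1 per locus

    def add_genome(self, name: str, present: Iterable[str]) -> None:
        """Add one genome without recomputing values stored for earlier genomes."""
        if name in self.rows:
            raise ValueError(f"genome {name!r} is already stored")
        present_set = set(present)
        known = set(self.loci)
        # Assumption: loci not seen before are novel, so every earlier
        # genome is simply padded with 0 (absent) for the new columns.
        novel = sorted(present_set - known)
        self.loci.extend(novel)
        for row in self.rows.values():
            row.extend([0] * len(novel))
        # Only the new genome's row is computed.
        self.rows[name] = [1 if locus in present_set else 0 for locus in self.loci]


table = PresenceAbsenceTable()
table.add_genome("strain_A", ["stx1", "eae"])
table.add_genome("strain_B", ["eae", "stx2"])
```

Adding `strain_B` only appends a column for the newly seen locus and one new row; the values already stored for `strain_A` are kept as they were.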
Results. A pathogen-specific genomic database encompassing all publicly available E. coli strains was created as a proof of concept. Pre-computed comparisons for hundreds of bacterial genomes, including phylogeny, presence/absence of virulence markers, group-specific biomarkers and geospatial information, were generated. Data reporting tools were created to summarize the complexity of the data and to provide biologically pertinent results, including genotype, phenotype (e.g. antimicrobial resistance), and geospatial information.
Discussion. The database provides rapid and accurate identification and characterization of E. coli. Output is formatted specifically for end users and describes virulence, phylogeny and group-specific markers. Uptake of a global surveillance system with near real-time analysis will provide an effective early warning system and allow for a faster response to pathogen-related outbreaks.