Furthermore, mitochondrial genotypes of 19,337
datasets from the SRA have been analyzed in this
study. Many substitutions and indels were identified
within these samples, some of which have not been
reported in the Mitomap project and may represent
rare variants. The methods used here are very similar
to the pipeline used to analyze exome sequencing
data of the 1000 Genomes Project (Diroma et al.
2014, Picardi and Pesole 2012). While they used a
sophisticated approach to recover putative
mitochondrial reads from hits to known NumT
sequences, this was not done here in order to
analyze a diverse set of samples more effectively. Hence,
sequences similar to NumT sequences in the human
genome may be underrepresented in this analysis.
When searching for a mutation in patients with
suspected hereditary diseases, it is often crucial to
discern pathogenic variants from non-pathogenic
variants. Allele frequencies in a healthy population
often give a clue, since a pathogenic variant is
unlikely to be seen in a large proportion of
"random" genomes. Here, we found 4,653
substitutions, 883 of which have not yet been
reported in the largest database of mitochondrial
variation (Lott et al. 2013). Also, 2,008 distinct
indels have been detected. Some of these may be
artifacts due to non-standard library preparation
types or inaccurate sequencing technologies. Using
the large database of sequencing experiments
provided by SRA might provide more clues on rare
allele frequencies on the mitochondrial genome than
have been available to date. However, since a
dedicated description of a sample's phenotype is
most often missing, this method may detect rare
variants but is probably of limited use in
discerning pathogenic from non-pathogenic
variants on the mitochondrial genome.
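The frequency-based reasoning above can be illustrated with a minimal Python sketch that counts in how many samples each variant occurs and marks variants absent from a reference catalogue. The function name `summarize_variants`, the representation of a variant as a (position, reference allele, alternative allele) tuple, and the toy data are assumptions made for this example only; this is not the pipeline used in this study.

```python
from collections import Counter


def summarize_variants(sample_variants, known_variants):
    """Compute per-variant sample frequency and flag unreported variants.

    sample_variants: dict mapping a sample ID to a set of
        (position, ref, alt) tuples (hypothetical representation).
    known_variants: set of (position, ref, alt) tuples already catalogued.
    """
    n_samples = len(sample_variants)
    counts = Counter()
    for variants in sample_variants.values():
        # Count each variant at most once per sample.
        counts.update(set(variants))

    summary = {}
    for variant, count in counts.items():
        summary[variant] = {
            "frequency": count / n_samples,
            "novel": variant not in known_variants,
        }
    return summary


# Toy data, invented purely for illustration.
samples = {
    "run_A": {(100, "A", "G"), (200, "A", "G")},
    "run_B": {(100, "A", "G")},
    "run_C": {(100, "A", "G"), (300, "T", "C")},
    "run_D": {(200, "A", "G")},
}
known = {(100, "A", "G"), (200, "A", "G")}

summary = summarize_variants(samples, known)
for variant, info in sorted(summary.items()):
    print(variant, info)
```

A variant that is both frequent and already catalogued is an unlikely candidate for pathogenicity, whereas a rare, uncatalogued variant would still need phenotype information before any conclusion could be drawn.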
In addition, several difficulties may arise with
the automated approach of assigning mitochondrial
genotypes based on off-target sequencing data from
various experiments. Some library preparation
techniques alter the nucleotide sequence itself, and
while the most common one (bisulfite sequencing)
has been excluded from this analysis, other less
common preparation techniques might introduce
their own patterns of sequence alteration. Also,
sequencing technology may introduce non-random
biases into the sequencing data if the error profile of
a technology is different depending on the DNA
context (e.g. Illumina's GGC-error (Nakamura et al.
2011)).
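One simple way to screen for such context-dependent artifacts is to check whether variant calls cluster directly downstream of a suspicious motif. The sketch below is an illustration under stated assumptions, not part of the analysis reported here: the helper names `preceded_by_motif` and `flag_context_calls` are hypothetical, the choice to inspect the three reference bases immediately upstream of a call is a simplifying assumption, and the circularity of the mitochondrial genome is ignored.

```python
def preceded_by_motif(reference, position, motif="GGC"):
    """Return True if the bases directly upstream of a 1-based position equal motif."""
    start = position - 1 - len(motif)  # 0-based start of the upstream window
    if start < 0:
        # Not enough upstream sequence; wrap-around of the circular
        # genome is deliberately ignored in this sketch.
        return False
    return reference[start:position - 1].upper() == motif


def flag_context_calls(reference, positions, motif="GGC"):
    """Return the variant positions that lie directly downstream of motif."""
    return [p for p in positions if preceded_by_motif(reference, p, motif)]


# Toy reference and variant positions, invented for illustration.
reference = "ATGGCTAACGGCAT"
flagged = flag_context_calls(reference, [6, 9, 13, 2])
print(flagged)
```

If a clearly larger share of calls is flagged than the motif's frequency in the reference would suggest, those calls may deserve closer inspection before being treated as genuine variants.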
Furthermore, heteroplasmy may represent a
challenge in correctly identifying mitochondrial
genotypes. Here, identification of heteroplasmy was
not attempted, since heteroplasmy is hard to
distinguish from sequencing artifacts and NumTs.
Finally, it should be noted that this method only
works well for experiments in which sequencing reads
from the mitochondrial genome occur as a by-product
of non-specific binding or ineffective removal of
untargeted DNA. Hence, to further enable researchers
to test new ideas on existing data or to reinterpret
primary data sources, we think that public
deposition of any kind of data should become
standard not only in biomedical research but in
research in general.
REFERENCES
Sequence Read Archive (2015) Available from:
http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi. [19
July 2015].
Mamanova L, Coffey AJ, Scott CE et al. (2010): "Target-
enrichment strategies for next-generation sequencing".
In: Nature Methods. 7 (2), pp. 111-118, DOI:
10.1038/nmeth.1419.
Diroma MA, Calabrese C, Simone D et al. (2014):
"Extraction and annotation of human mitochondrial
genomes from 1000 Genomes Whole Exome
Sequencing data". In: BMC Genomics. 15 (Suppl 3),
p. S2, DOI: 10.1186/1471-2164-15-s3-s2.
Smith DR. (2013): "RNA-Seq data: a goldmine for
organelle research". In: Briefings in Functional
Genomics. 12 (5), pp. 454-456, DOI:
10.1093/bfgp/els066.
Taylor RW, Turnbull DM. (2005): "Mitochondrial DNA
mutations in human disease". In: Nature Reviews
Genetics. 6 (5), pp. 389-402, DOI: 10.1038/nrg1606.
Cann RL, Stoneking M, Wilson AC. (1987):
"Mitochondrial DNA and human evolution". In:
Nature. 325 (6099), pp. 31-36, DOI:
10.1038/325031a0.
Herrnstadt C, Preston G, Andrews R et al. (2002): "A high
frequency of mtDNA polymorphisms in HeLa cell
sublines". In: Mutation Research/Fundamental and
Molecular Mechanisms of Mutagenesis. 501 (1-2), pp.
19-28, DOI: 10.1016/s0027-5107(01)00304-9.
Lott M, Leipzig JN, Derbeneva O et al. (2013): "mtDNA
Variation and Analysis Using MITOMAP and
MITOMASTER". In: Current Protocols in
Bioinformatics. 44, pp. 1.23.1-1.23.26, DOI:
10.1002/0471250953.bi0123s44.
Picardi E, Pesole G (2012): "Mitochondrial genomes
gleaned from human whole-exome sequencing". In:
Nature Methods. 9 (6), pp. 523-524, DOI:
10.1038/nmeth.2029.
Nakamura K, Oshima T, Morimoto T et al. (2011):
"Sequence-specific error profile of Illumina
sequencers". In: Nucleic Acids Research. 39 (13), p.
e90, DOI: 10.1093/nar/gkr344.