Author:
K. Hamacher
Affiliation:
Technische Universität Darmstadt, Germany
Keyword(s):
Information theory, Jensen-Shannon divergence, Evolutionary dynamics, Phylogenetic trees, SUPFAM.
Related Ontology Subjects/Areas/Topics:
Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Biostatistics and Stochastic Models; Data Mining and Machine Learning; Sequence Analysis
Abstract:
The ever-increasing wealth of whole-genome information calls for phylogenies based on entire genomes. Finding a good distance measure, however, is a major challenge, for example because of large-scale evolutionary events such as genomic rearrangements or inversions. We introduce an information-theoretic measure for the protein domain composition encoded in genomes, since protein domains are key evolutionary entities. The new method thus focuses on selectively advantageous events. Because evolving a different protein domain composition is more complex than accumulating single point mutations, the method makes longer evolutionary times accessible. To illustrate the methodology, we extract several phylogenetic trees for some 700 genomes, e.g. the separation of the three kingdoms of life, trees for mammals and bacillales, and a speculative result for plants (monocotyledons and dicotyledons). The method is shown to be robust against incomplete genome sampling. It has a consistent interpretation both at the sequence/information level and at the level of stochastic evolutionary dynamics. In contrast to established protocols, it becomes more accurate as more organisms are taken into account. Finally, we show its equivalence to a (simplified) model of the evolutionary dynamics of proteomes.
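As an illustration only: the keywords name the Jensen-Shannon divergence, but the abstract does not spell out how it is applied. The minimal sketch below assumes each genome is summarised by the relative frequencies of its protein domains and that two genomes are compared by the Jensen-Shannon divergence of those frequencies (base-2 logarithm). The function names and the domain labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def domain_distribution(domains, vocabulary):
    """Relative frequency of each domain in `vocabulary` for one genome.

    `domains` is a list of domain labels (hypothetical input format).
    """
    counts = Counter(domains)
    total = sum(counts[d] for d in vocabulary)
    if total == 0:
        raise ValueError("genome contains no domain from the vocabulary")
    return [counts[d] / total for d in vocabulary]


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2, in bits."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# Toy genomes with made-up domain labels (assumption, for illustration only).
genome_a = ["kinase", "kinase", "SH3", "zinc_finger"]
genome_b = ["kinase", "zinc_finger", "zinc_finger", "PDZ"]
vocabulary = sorted(set(genome_a) | set(genome_b))

p = domain_distribution(genome_a, vocabulary)
q = domain_distribution(genome_b, vocabulary)
distance = jensen_shannon_divergence(p, q)
```

With a base-2 logarithm the divergence is symmetric and lies between 0 (identical compositions) and 1 (no shared domains). A matrix of such pairwise values could then be passed to a standard tree-building procedure; whether the paper proceeds this way is not stated in the abstract.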