Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations
Morihiro Hayashida, Hitoshi Koyano, Jose C. Nacher
2020
Abstract
Revealing the evolution of organisms is an important topic in biological research and is useful for understanding the origin of organisms. To this end, genomic sequences have been compared and aligned to find conserved and functional regions. A protein can contain several domains, which are known as structural and functional units. In previous work, a proteome, the complete set of proteins in an organism, was regarded as a set of sequences of protein domains, and a grammar-based compression algorithm was developed for proteomes, in which the production rules of the grammar represent the evolutionary processes of mutation and duplication. In this paper, we propose a similarity measure based on this grammar-based compression and apply it to the hierarchical clustering of seven organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, and Escherichia coli. The results suggest that our similarity measure classifies the organisms very well.
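To make the idea of a compression-based similarity concrete, the following is a minimal sketch and not the algorithm or measure proposed in the paper. It assumes a generic Re-Pair-style compressor (repeatedly replacing the most frequent adjacent pair of domains with a new nonterminal) and a distance in the style of the normalized compression distance. The function names `grammar_size` and `compression_distance`, the size accounting, and the toy domain labels are all hypothetical choices made for illustration.

```python
from collections import Counter


def grammar_size(proteome):
    """Size of a Re-Pair-style grammar for a list of domain sequences.

    Assumption: size = symbols left in the sequences + 2 per production rule.
    This is a generic stand-in, not the compressor described in the paper.
    """
    seqs = [list(s) for s in proteome]
    rules = 0
    while True:
        # Count adjacent domain pairs within each protein (never across proteins).
        counts = Counter()
        for s in seqs:
            counts.update(zip(s, s[1:]))
        if not counts:
            break
        # Pick the most frequent pair; ties are broken deterministically.
        pair, freq = max(counts.items(), key=lambda kv: (kv[1], repr(kv[0])))
        if freq < 2:
            break
        new_symbol = ("R", rules)  # fresh nonterminal standing for the pair
        rules += 1
        for i, s in enumerate(seqs):
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and (s[j], s[j + 1]) == pair:
                    out.append(new_symbol)
                    j += 2
                else:
                    out.append(s[j])
                    j += 1
            seqs[i] = out
    return sum(len(s) for s in seqs) + 2 * rules


def compression_distance(p, q):
    """NCD-style distance: smaller when p and q share domain combinations."""
    cp, cq = grammar_size(p), grammar_size(q)
    if max(cp, cq) == 0:
        return 0.0
    cpq = grammar_size(list(p) + list(q))  # compress both proteomes together
    return (cpq - min(cp, cq)) / max(cp, cq)


if __name__ == "__main__":
    # Toy proteomes; each protein is a tuple of made-up domain labels.
    a = [("A", "B", "C"), ("A", "B", "C", "D"), ("A", "B")]
    b = [("A", "B", "C"), ("A", "B", "D"), ("A", "B", "C")]
    c = [("X", "Y"), ("Y", "Z", "X"), ("Z", "Z")]
    print("d(a, b) =", compression_distance(a, b))
    print("d(a, c) =", compression_distance(a, c))
```

Under these assumptions, proteomes that reuse the same domain combinations compress better together and therefore end up closer. A matrix of such pairwise distances is the kind of input a standard hierarchical clustering routine would take.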
Paper Citation
in Harvard Style
Hayashida M., Koyano H. and Nacher J. (2020). Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations. In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-398-8, SciTePress, pages 117-122. DOI: 10.5220/0008913101170122
in Bibtex Style
@conference{bioinformatics20,
author={Morihiro Hayashida and Hitoshi Koyano and Jose C. Nacher},
title={Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations},
booktitle={Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 3: BIOINFORMATICS},
year={2020},
pages={117-122},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008913101170122},
isbn={978-989-758-398-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 3: BIOINFORMATICS
TI - Measuring the Similarity of Proteomes using Grammar-based Compression via Domain Combinations
SN - 978-989-758-398-8
AU - Hayashida M.
AU - Koyano H.
AU - Nacher J.
PY - 2020
SP - 117
EP - 122
DO - 10.5220/0008913101170122
PB - SciTePress