A Deep Learning Method to Impute Missing Values and Compress Genome-wide Polymorphism Data in Rice

Tanzila Islam, Chyon Hae Kim, Hiroyoshi Iwata, Hiroyuki Shimono, Akio Kimura, Hein Zaw, Chitra Raghavan, Hei Leung, Rakesh Kumar Singh

2021

Abstract

Imputing missing values and compressing genome-wide DNA polymorphism data are challenging tasks in genomic data analysis. Missing data are gaps in a dataset, and they directly degrade the performance of downstream analysis. The aim of this work is to develop a deep learning model, Autoencoder Genome Imputation and Compression (AGIC), that imputes missing values and compresses genome-wide polymorphism data using a separated neural network model to reduce computation time. This research takes on the challenge of constructing an autoencoder-based model for genomic analysis; in other words, it is a fusion of agricultural and information sciences. Moreover, no prior work is known on missing value imputation and genome-wide polymorphism data compression using a separated stacking autoencoder model. The main contributions are: (1) missing value imputation for genome-wide polymorphism data, and (2) compression of genome-wide polymorphism data from rice DNA. To demonstrate the use of the AGIC model, real genome-wide polymorphism data from a rice MAGIC population are used.
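The general idea of using one autoencoder for both imputation and compression can be illustrated with a minimal sketch. This is not the AGIC architecture, which the abstract does not specify; it is a single-hidden-layer autoencoder written in NumPy under several assumptions: genotypes are coded as -1/+1, missing entries are fed in as 0, and the reconstruction loss is computed on observed entries only. The function names `train_masked_autoencoder` and `impute`, the toy data, and all hyperparameters are hypothetical.

```python
import numpy as np

def train_masked_autoencoder(X, observed, code_dim=2, lr=0.01, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder; the loss ignores missing entries."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, code_dim))
    W_dec = rng.normal(0.0, 0.1, (code_dim, d))
    X_in = np.where(observed, X, 0.0)      # assumption: missing genotypes enter as 0
    for _ in range(epochs):
        code = np.tanh(X_in @ W_enc)       # compressed representation
        recon = code @ W_dec               # reconstructed genotypes
        err = (recon - X_in) * observed    # zero error where data are missing
        grad_dec = code.T @ err / n
        grad_code = (err @ W_dec.T) * (1.0 - code ** 2)  # backprop through tanh
        grad_enc = X_in.T @ grad_code / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

def impute(X, observed, W_enc, W_dec):
    """Return (matrix with missing entries filled by the decoder, compressed codes)."""
    X_in = np.where(observed, X, 0.0)
    code = np.tanh(X_in @ W_enc)
    recon = code @ W_dec
    return np.where(observed, X, recon), code

# Toy data (hypothetical): 30 lines, 20 markers, each line copies one of 2 founders.
rng = np.random.default_rng(1)
founders = rng.choice([-1.0, 1.0], size=(2, 20))
X_true = founders[rng.integers(0, 2, size=30)]
observed = rng.random(X_true.shape) > 0.2            # roughly 20% missing
X_missing = np.where(observed, X_true, np.nan)

W_enc, W_dec = train_masked_autoencoder(X_missing, observed)
filled, code = impute(X_missing, observed, W_enc, W_dec)
```

Here `filled` keeps every observed genotype unchanged and takes decoder outputs only at the missing positions, while `code` is the lower-dimensional representation (2 values per line instead of 20 markers) that plays the role of the compressed data.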


Paper Citation


in Harvard Style

Islam T., Kim C., Iwata H., Shimono H., Kimura A., Zaw H., Raghavan C., Leung H. and Singh R. (2021). A Deep Learning Method to Impute Missing Values and Compress Genome-wide Polymorphism Data in Rice. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-490-9, SciTePress, pages 101-109. DOI: 10.5220/0010233900002865


in Bibtex Style

@conference{bioinformatics21,
author={Tanzila Islam and Chyon Hae Kim and Hiroyoshi Iwata and Hiroyuki Shimono and Akio Kimura and Hein Zaw and Chitra Raghavan and Hei Leung and Rakesh Kumar Singh},
title={A Deep Learning Method to Impute Missing Values and Compress Genome-wide Polymorphism Data in Rice},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS},
year={2021},
pages={101-109},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010233900002865},
isbn={978-989-758-490-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS
TI - A Deep Learning Method to Impute Missing Values and Compress Genome-wide Polymorphism Data in Rice
SN - 978-989-758-490-9
AU - Islam T.
AU - Kim C.
AU - Iwata H.
AU - Shimono H.
AU - Kimura A.
AU - Zaw H.
AU - Raghavan C.
AU - Leung H.
AU - Singh R.
PY - 2021
SP - 101
EP - 109
DO - 10.5220/0010233900002865
PB - SciTePress