5 CONCLUSIONS
In summary, this paper introduced AGIC, a novel deep learning
model, as a new paradigm for imputing missing values and
compressing genome expressions. The results showed that
the AGIC model can achieve up to 96% accuracy in
imputing missing values.
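As a point of reference for how an imputation accuracy figure of this kind is typically measured, the following is a minimal sketch under stated assumptions: known genotype calls are hidden at random, an imputer fills them in, and accuracy is the fraction of hidden calls recovered exactly. The `MISSING` marker, the function names, and the column-mode imputer are all hypothetical stand-ins chosen for illustration; in particular, `mode_impute` is not the AGIC model.

```python
import random

MISSING = -1  # hypothetical marker for a missing genotype call


def mask_entries(genotypes, rate, rng):
    """Return a copy with roughly `rate` of the entries hidden, plus the hidden positions."""
    masked = [row[:] for row in genotypes]
    hidden = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = MISSING
                hidden.append((i, j))
    return masked, hidden


def imputation_accuracy(truth, imputed, hidden):
    """Fraction of hidden positions where the imputed value equals the true value."""
    if not hidden:
        raise ValueError("no hidden positions to score")
    correct = sum(1 for i, j in hidden if imputed[i][j] == truth[i][j])
    return correct / len(hidden)


def mode_impute(masked):
    """Stand-in imputer (NOT AGIC): fill each column with its most common observed value."""
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] != MISSING]
        fill = max(set(observed), key=observed.count) if observed else 0
        for row in filled:
            if row[j] == MISSING:
                row[j] = fill
    return filled


# Usage on synthetic genotypes coded 0/1/2 (made-up data, for illustration only).
rng = random.Random(0)
truth = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(50)]
masked, hidden = mask_entries(truth, 0.1, rng)
accuracy = imputation_accuracy(truth, mode_impute(masked), hidden)
```

Scoring only the hidden positions matters: including the observed entries would inflate the figure, since those are copied through unchanged.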
Moreover, this learning method scales to data with a
large number of genome-wide polymorphisms. A separate
stacking model was implemented to minimize the
calculation cost of the network. The calculation cost of
the network in the AGIC method increases linearly with
the number of genome-wide polymorphisms, whereas the
calculation costs of other popular methods increase
rapidly as that number grows. The AGIC model provides a
strong alternative to traditional methods for imputing
missing values and compressing genome expressions
at the same time.
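The linear-scaling claim can be illustrated with a rough parameter count. The sketch below assumes, purely for illustration, that a separately stacked design splits the polymorphisms into fixed-size blocks and trains an independent small autoencoder on each block. It also uses the number of trainable parameters as a proxy for calculation cost. The block size, the compression ratio, and all function names are hypothetical and are not taken from the AGIC implementation.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def single_autoencoder_params(n_snps, ratio):
    """One autoencoder over all SNPs; the code layer is n_snps // ratio wide."""
    code = max(1, n_snps // ratio)
    return dense_params(n_snps, code) + dense_params(code, n_snps)


def blockwise_autoencoder_params(n_snps, block_size, ratio):
    """Independent autoencoders over fixed-size blocks of SNPs (assumed scheme)."""
    full_blocks, remainder = divmod(n_snps, block_size)
    total = full_blocks * single_autoencoder_params(block_size, ratio)
    if remainder:
        total += single_autoencoder_params(remainder, ratio)
    return total


# Compare how each design grows when the number of SNPs doubles.
for n_snps in (1000, 2000):
    print(
        n_snps,
        blockwise_autoencoder_params(n_snps, block_size=100, ratio=10),
        single_autoencoder_params(n_snps, ratio=10),
    )
```

Under these assumptions, doubling the number of polymorphisms from 1000 to 2000 doubles the blockwise count (21100 to 42200) but roughly quadruples the count of the single network (201100 to 802200), because its code layer widens together with its input.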
ACKNOWLEDGEMENTS
This research has been partially supported by JSPS
KAKENHI (Grants-in-Aid for Scientific Research),
grant number JP19H00938.