5 CONCLUSIONS
In summary, we introduced AGIC, a novel deep learning model, as a new approach to imputing missing values and compressing genome expressions. The results showed that AGIC can achieve up to 96% accuracy in imputing missing values.
Moreover, this learning method scales to data containing a large number of genome-wide polymorphisms. A separate stacking model was implemented to minimize the computational cost of the network. The computational cost of AGIC grows linearly with the number of genome-wide polymorphisms, whereas the costs of other popular methods increase rapidly as that number grows. AGIC therefore provides a strong alternative to traditional methods, imputing missing values and compressing genome expressions at the same time.
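Why a partitioned design can keep cost linear is easiest to see with a back-of-the-envelope weight count. The sketch below assumes, purely for illustration, that the polymorphisms are split into fixed-size windows, each encoded by its own small dense layer, and compares this with a single dense layer spanning every SNP. This is not AGIC's published architecture; the function names `dense_params` and `windowed_params`, the window size and the compression ratio are all hypothetical.

```python
"""Back-of-the-envelope weight counts (hypothetical illustration, not AGIC's code).

Compares one dense encoder layer spanning every SNP with a windowed scheme in
which each fixed-size window of SNPs gets its own small encoder. Bias terms
are ignored to keep the arithmetic simple.
"""


def dense_params(n_snps: int, ratio: int) -> int:
    # A single dense layer maps all n_snps inputs to n_snps // ratio units,
    # so its weight count grows quadratically with n_snps.
    return n_snps * (n_snps // ratio)


def windowed_params(n_snps: int, window: int, ratio: int) -> int:
    # Hypothetical partitioning: ceil(n_snps / window) independent encoders,
    # each mapping `window` inputs to window // ratio units. With the window
    # size fixed, the total grows linearly with n_snps.
    n_windows = -(-n_snps // window)  # ceiling division
    return n_windows * window * (window // ratio)


if __name__ == "__main__":
    for n in (10_000, 20_000, 40_000):
        print(f"{n:>6} SNPs  dense: {dense_params(n, 4):>13,}  "
              f"windowed: {windowed_params(n, 1_000, 4):>11,}")
```

Doubling the number of SNPs quadruples the dense weight count but only doubles the windowed one. The window size of 1,000 and the compression ratio of 4 are arbitrary values chosen for the illustration.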
ACKNOWLEDGEMENTS 
This research was partially supported by JSPS KAKENHI (Grants-in-Aid for Scientific Research) JP19H00938.
REFERENCES 
Abdella, M., & Marwala, T. (2005). The use of genetic algorithms and neural networks to approximate missing data in database. IEEE 3rd International Conference on Computational Cybernetics, 2005. ICCC 2005. (pp. 207–212). IEEE.
Absardi, Z. N., & Javidan, R. (2019). A Fast Reference-Free Genome Compression Using Deep Neural Networks. 2019 Big Data, Knowledge and Control Systems Engineering (BdKCSE) (pp. 1–7). IEEE.
Beaulieu-Jones, B. K., & Moore, J. H. (2017). Missing data imputation in the electronic health record using deeply learned autoencoders. Pacific Symposium on Biocomputing, 22, 207–218.
Browning, B. L., & Browning, S. R. (2009). A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. American Journal of Human Genetics, 84(2), 210–223.
Browning, B. L., Zhou, Y., & Browning, S. R. (2018). A One-Penny Imputed Genome from Next-Generation Reference Panels. American Journal of Human Genetics, 103(3), 338–348.
Chen, J., & Shi, X. (2019). Sparse convolutional denoising autoencoders for genotype imputation. Genes, 10(9).
Duan, Y., Lv, Y., Liu, Y.-L., & Wang, F.-Y. (2016). An efficient realization of deep learning for traffic data imputation. Transportation Research Part C: Emerging Technologies, 72, 168–181.
Gad, I., Hosahalli, D., Manjunatha, B. R., & Ghoneim, O. A. (2020). A robust deep learning model for missing value imputation in big NCDC dataset. Iran Journal of Computer Science.
Grumbach, S., & Tahi, F. (1994). A new challenge for compression algorithms: Genetic sequences. Information Processing & Management, 30(6), 875–886.
Gulli, A., & Pal, S. (2017). Deep Learning with Keras (p. 318). Packt Publishing Ltd.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey. Computers and Electronics in Agriculture, 147, 70–90.
Li, Y., Huang, C., Ding, L., Li, Z., Pan, Y., & Gao, X. (2019). Deep learning in bioinformatics: Introduction, application, and perspective in the big data era. Methods, 166, 4–21.
Marchini, J., Howie, B., Myers, S., McVean, G., & Donnelly, P. (2007). A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics, 39(7), 906–913.
Qiu, Y. L., Zheng, H., & Gevaert, O. (2018). A deep learning framework for imputing missing values in genomic data. bioRxiv.
Qiu, Y. L., Zheng, H., & Gevaert, O. (2020). Genomic data imputation with variational auto-encoders. GigaScience, 9(8).
Rana, S., John, A. H., & Midi, H. (2012). Robust regression imputation for analyzing missing data. 2012 International Conference on Statistics in Science, Business and Engineering (ICSSBE) (pp. 1–4). IEEE.
Scheet, P., & Stephens, M. (2006). A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. American Journal of Human Genetics, 78(4), 629–644.
Sento, A. (2016). Image compression with auto-encoder algorithm using deep neural network (DNN). 2016 Management and Innovation Technology International Conference (MITicon) (pp. MIT-99–MIT-103). IEEE.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., et al. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520–525.