5 CONCLUSIONS
We show that DNA methylation is a prime resource
for unsupervised learning with variational autoencoders.
Generative models such as these learn an underlying
distribution of the data, providing promising new
avenues to generate artificial data to enhance
training. The volume of publicly available DNAm
data is growing, and as precision medical research
continues to progress, scientists should take
advantage of such opportunities.
ACKNOWLEDGEMENTS
Research reported in this publication was supported
by the Office of the U.S. Director of the National In-
stitutes of Health under award number T32LM012204
to AJT, grants R01DE022772 and R01CA216265 to
BCC, and by a Burroughs Wellcome Fund fellowship
to CAB under award number 1014106.