Figure 6: Distributions of (A) beta diversity, based on the
Bray-Curtis dissimilarity between the training set and itself,
the validation set, the generated (CGAN) set, and the random
set, and (B) Shannon alpha diversity of training, validation,
generated, and random samples for IBD (left) and healthy
(right) samples.
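The two diversity measures named in the caption can be illustrated with a minimal Python sketch. The function names `shannon_diversity` and `bray_curtis` are hypothetical, the natural logarithm is an assumption (the base is not stated here), and this is not the code used to produce the figure.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero abundance.

    Assumption: natural logarithm; abundances may be counts or proportions.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no nonzero abundance")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("samples must cover the same taxa")
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / denom
```

A Bray-Curtis value of 0 means two samples have identical composition, and a value of 1 means they share no taxa. A beta-diversity distribution like the one in panel (A) is the collection of these pairwise values between two sets of samples.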
4 CONCLUSIONS
In this study, we have developed a novel approach for
generating synthetic microbiome samples using
a CGAN architecture in order to augment ML
analyses. Using two different cohorts of subjects with
IBD, we have demonstrated that the synthetic
samples generated from the CGAN are similar to the
original data in both alpha and beta diversity metrics.
In addition, we have shown that augmenting the
training set by using a large number of synthetic
samples can improve the performance of logistic
regression and MLPNN in predicting host phenotype.
A current limitation of this approach is
selecting the best CGAN model. Although visual
inspection has been a common approach, it is
subjective and may miss the optimal model. We plan
to extend this study by investigating stopping criteria
based on alpha and beta diversity metrics in order to
facilitate CGAN model selection. In addition, we plan
to evaluate other forms of side information, such as
time in longitudinal datasets.
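One way a diversity-based selection criterion could look is sketched below in Python. This is purely illustrative and is not a method from this study: the score `diversity_gap`, the helper `select_checkpoint`, and the equal weighting of the alpha and beta terms are all assumptions, as is the use of the natural logarithm in the Shannon index.

```python
import math


def shannon(sample):
    """Shannon index with natural log (assumed base)."""
    total = sum(sample)
    return -sum((a / total) * math.log(a / total) for a in sample if a > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    denom = sum(u) + sum(v)
    return sum(abs(a - b) for a, b in zip(u, v)) / denom if denom else 0.0


def mean(values):
    return sum(values) / len(values)


def diversity_gap(real, generated):
    """Hypothetical score: how far generated samples sit from real ones.

    Sums (1) the gap in mean Shannon diversity and (2) the gap between
    mean real-vs-generated and mean real-vs-real Bray-Curtis dissimilarity.
    """
    if len(real) < 2 or not generated:
        raise ValueError("need at least two real and one generated sample")
    alpha_gap = abs(mean([shannon(s) for s in real])
                    - mean([shannon(s) for s in generated]))
    within = mean([bray_curtis(a, b)
                   for i, a in enumerate(real) for b in real[i + 1:]])
    between = mean([bray_curtis(a, b) for a in real for b in generated])
    return alpha_gap + abs(between - within)


def select_checkpoint(real, checkpoints):
    """Return the key of the checkpoint with the smallest diversity gap.

    `checkpoints` maps a label (e.g. an epoch number) to generated samples.
    """
    return min(checkpoints, key=lambda k: diversity_gap(real, checkpoints[k]))
```

Under these assumptions, a checkpoint whose samples match the training data in both richness and composition scores near zero, which would give an objective alternative to visual inspection. The appropriate weighting of the two terms would need to be determined empirically.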
REFERENCES
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean,
J., ... Isard, M. (2016). TensorFlow: A system for large-
scale machine learning. Paper presented at the 12th
USENIX Symposium on Operating Systems Design
and Implementation (OSDI 16).
Barlow, G. M., Yu, A., & Mathur, R. (2015). Role of the
gut microbiome in obesity and diabetes mellitus.
Nutrition in clinical practice, 30(6), 787-797.
Bowles, C., Chen, L., Guerrero, R., Bentley, P., Gunn, R.,
Hammers, A., ... Rueckert, D. (2018). GAN
Augmentation: Augmenting Training Data using
Generative Adversarial Networks.
Bray, J. R., & Curtis, J. T. (1957). An Ordination of the
Upland Forest Communities of Southern Wisconsin.
Ecological Monographs, 27(4), 326-349.
doi:10.2307/1942268
Carabotti, M., Scirocco, A., Maselli, M. A., & Severi, C.
(2015). The gut-brain axis: interactions between enteric
microbiota, central and enteric nervous systems. Annals
of gastroenterology, 28(2), 203-209.
Che, Z., Cheng, Y., Zhai, S., Sun, Z., & Liu, Y. (2017, 18-
21 Nov. 2017). Boosting Deep Learning Risk
Prediction with Generative Adversarial Networks for
Electronic Health Records. Paper presented at the 2017
IEEE International Conference on Data Mining
(ICDM).
Franzosa, E. A., Sirota-Madi, A., Avila-Pacheco, J.,
Fornelos, N., Haiser, H. J., Reinker, S., ... Xavier, R. J.
(2019). Gut microbiome structure and metabolic
activity in inflammatory bowel disease. Nature
microbiology, 4(2), 293-305.
doi:10.1038/s41564-018-0306-4
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M.,
Goldberger, J., & Greenspan, H. (2018). GAN-based
synthetic medical image augmentation for increased
CNN performance in liver lesion classification.
Neurocomputing, 321, 321-331.
Fung, T. C., Olson, C. A., & Hsiao, E. Y. (2017).
Interactions between the microbiota, immune and
nervous systems in health and disease. Nature
neuroscience, 20(2), 145.
Ghahramani, A., Watt, F. M., & Luscombe, N. M. (2018).
Generative adversarial networks simulate gene
expression and predict perturbations in single cells.
BioRxiv, 262501.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., ... Bengio, Y. (2014).
Generative adversarial nets. Paper presented at the
Advances in neural information processing systems.
Gopalakrishnan, V., Helmink, B. A., Spencer, C. N.,
Reuben, A., & Wargo, J. A. (2018). The influence of
the gut microbiome on cancer, immunity, and cancer
immunotherapy. Cancer cell, 33(4), 570-580.
Kingma, D., & Ba, J. (2014). Adam: A Method for
Stochastic Optimization. International Conference on
Learning Representations.
Knights, D., Parfrey, L. W., Zaneveld, J., Lozupone, C., &
Knight, R. (2011). Human-associated microbial