alongside regression for examination of the possible
effects of environmental exposures on health outcome
distributions, which in the absence of knowledge of
the intrinsic rates of these effects allows for the
quantification of a stronger association and
identification of possible harm to human health.
Implications for Future Population-level
Outcome-Exposure Analysis: The success of CCA in
capturing the relationships between exposure and
outcome covariations suggests further applications to
problems where a link between environmental factors
and possible health effects is only hypothesized. CCA
may also be useful in investigations that seek to
determine the relative contributions of environmental
or other proposed risk factors to shifts in the
distributions of multi-class outcomes, for example
variations in the rates of different kinds of cancer
relative to complex background exposure levels.
An advantage of this approach is that it does not
require a priori knowledge of outcome distributions or
background risk levels. As evidenced by the strong and
significant correlation in the first canonical
dimension of our analysis, CCA quantifies a link
between covariations in the two data sets, and this
link can be interpreted through the relative weights
assigned to each variable in the canonical projections
(Hotelling, 1936; Gonzalez et al., 2008). Given that
environmental exposures rarely occur in isolation and
may affect multiple organ systems, CCA is well suited
to applications that explore whether covariation
relationships are present between multi-dimensional
environmental factors and interrelated population
health outcomes.
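To make the role of the first canonical correlation and the canonical weights concrete, the following is a minimal sketch of linear CCA on synthetic data. It is not the analysis pipeline used in this work: the data are randomly generated, the exposure and outcome columns are hypothetical stand-ins, and the QR-plus-SVD solver is one common way to compute linear CCA that assumes both data matrices have full column rank.

```python
import numpy as np


def linear_cca(X, Y):
    """Linear CCA via QR decomposition and SVD.

    Returns the canonical correlations (descending) and the weight
    matrices A (for X) and B (for Y). Assumes full column rank.
    """
    Xc = X - X.mean(axis=0)          # center each exposure variable
    Yc = Y - Y.mean(axis=0)          # center each outcome variable
    Qx, Rx = np.linalg.qr(Xc)        # orthonormal basis for the X columns
    Qy, Ry = np.linalg.qr(Yc)        # orthonormal basis for the Y columns
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    k = min(X.shape[1], Y.shape[1])  # number of canonical dimensions
    A = np.linalg.solve(Rx, U[:, :k])      # weights on the X variables
    B = np.linalg.solve(Ry, Vt.T[:, :k])   # weights on the Y variables
    return np.clip(s[:k], 0.0, 1.0), A, B


# Synthetic illustration only: a shared latent factor z drives two of
# three hypothetical exposure columns and one of two outcome columns.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.5 * rng.normal(size=n),        # hypothetical exposure 1
    0.8 * z + 0.5 * rng.normal(size=n),  # hypothetical exposure 2
    rng.normal(size=n),                  # unrelated exposure
])
Y = np.column_stack([
    z + 0.5 * rng.normal(size=n),        # hypothetical outcome 1
    rng.normal(size=n),                  # unrelated outcome
])

corrs, A, B = linear_cca(X, Y)

# Canonical variates for the first dimension.
u = (X - X.mean(axis=0)) @ A[:, 0]
v = (Y - Y.mean(axis=0)) @ B[:, 0]

print("canonical correlations:", corrs)
print("first-dimension exposure weights:", A[:, 0])
print("first-dimension outcome weights:", B[:, 0])
```

In this sketch, the relative magnitudes of the entries in A[:, 0] and B[:, 0] indicate which variables contribute most to the first canonical dimension. Because raw weights depend on the scale of each variable, standardizing the columns first is usually advisable before comparing them.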
Future work advancing CCA applications in
environmental epidemiology may consider not only
maximally correlated projections beyond those produced
by linear CCA, such as kernel and deep CCA
formulations, but also the preservation of
interpretable latent weightings. Interpretability would
permit the assessment and characterization of latent
factor relationships in these formulations, as well as
the identification of locations that map to similar
positions within the latent projections as regions of
interest for further study.
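One simple way to flag locations that map to similar positions in the latent projections is a nearest-neighbor lookup on the canonical scores. The sketch below assumes each location already has a vector of canonical scores; the region labels and score values are invented for illustration and do not come from our data.

```python
import numpy as np


def nearest_in_latent_space(scores, labels, query, k=2):
    """Return the k locations closest to `query` in canonical-score space.

    scores: (n_locations, n_dims) array of canonical scores
    labels: list of location names, in the same order as the rows
    """
    idx = labels.index(query)
    dists = np.linalg.norm(scores - scores[idx], axis=1)  # Euclidean distance
    order = [i for i in np.argsort(dists) if i != idx][:k]
    return [(labels[i], float(dists[i])) for i in order]


# Hypothetical canonical scores for five hypothetical regions.
labels = ["region_A", "region_B", "region_C", "region_D", "region_E"]
scores = np.array([
    [0.10, 0.20],
    [0.12, 0.18],
    [1.50, -0.70],
    [0.05, 0.30],
    [1.40, -0.60],
])

neighbors_of_c = nearest_in_latent_space(scores, labels, "region_C", k=1)
neighbors_of_a = nearest_in_latent_space(scores, labels, "region_A", k=2)
print(neighbors_of_c)
print(neighbors_of_a)
```

Euclidean distance is only one possible choice here; weighting each dimension by its canonical correlation would be a reasonable alternative when later dimensions carry little shared signal.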
6 CONCLUSIONS
In this work, we explore the potential of CCA for
population-level environmental epidemiology by applying
it to the relationship between air pollution and
mortality. Our analysis demonstrates that CCA
complements traditional multiple linear regression
approaches and shows promise for investigating other
hypothesized exposure-outcome relationships.
ACKNOWLEDGEMENTS
We thank the DSRG and Data Science Community
at WPI for their support and feedback.
REFERENCES
Vineis, P., Kriebel, D., 2006. Causal models in
epidemiology: past inheritance and genetic future.
Environmental Health: A Global Access Science
Source, 5:21.
Cromar, K.R., Gladson, L.A., Ewart, G., 2019. Trends in
Excess Morbidity and Mortality Associated with Air
Pollution above American Thoracic
Society-Recommended Standards, 2008-2017. Annals ATS,
16(7): 836-845.
United States Clean Air Act: 42 United States Code §7401
et seq. (1970).
Di, Q., Wang, Y., Zanobetti, A., et al., 2017. Air pollution
and mortality in the Medicare population. NEJM,
376(26): 2513-2522.
Shah, A., Lee, K., McAllister, D., et al., 2015. Short term
exposure to air pollution and stroke: systematic review
and meta-analysis. BMJ, 24: 350.
Han, C., Lim, Y.H., Yorifuji, T., Hong, Y.C., 2018. Air
quality management policy and reduced mortality rates
in Seoul Metropolitan Area: A quasi-experimental
study. Environ Int. 121(Pt 1): 600-609.
Peng, L., Xiao, S., Gao, W., Zhou, Y., Zhou, J., Yang, D.,
Ye, X., 2019. Short-term associations between
size-fractionated particulate air pollution and COPD
mortality in Shanghai, China. Environ Pollut. Epub.
India State-Level Disease Burden Initiative Air Pollution
Collaborators, 2019. The impact of air pollution on
deaths, disease burden, and life expectancy across the
states of India: the global burden of disease study 2017.
Lancet Planet Health 3(1): e26-e39.
Wang, T., Zhao, B., Liou, K.N., Gu, Y., Jiang, Z., Song, K.,
Su, H., Jerrett, M., Zhu, Y., 2019. Mortality burdens in
California due to air pollution attributable to local and
nonlocal emissions. Environ Int, 133(Pt B): 105232.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2014. An
Introduction to Statistical Learning with Applications
in R. Springer. ISBN: 1461471370, 9781461471370.
Hotelling, H., 1936. Relations between two sets of variates.
Biometrika, 28 (3-4):321-377.