
(2019). Invariant risk minimization. arXiv preprint
arXiv:1907.02893.
Bose, R. and Roy, A. M. (2024). Invariance embedded
physics-infused deep neural network-based sub-grid
scale models for turbulent flows. Engineering
Applications of Artificial Intelligence, 128:107483.
Chickering, D. M. (2002). Optimal structure identification
with greedy search. Journal of Machine Learning
Research, 3(Nov):507–554.
Chowdhury, J., Fricke, C., Bamidele, O., Bello, M., Yang,
W., Heyden, A., and Terejanu, G. (2024). Invariant
molecular representations for heterogeneous
catalysis. Journal of Chemical Information and Modeling,
64(2):327–339.
Chowdhury, J., Rashid, R., and Terejanu, G. (2023).
Evaluation of induced expert knowledge in causal
structure learning by NOTEARS. In Proceedings of the 12th
International Conference on Pattern Recognition
Applications and Methods - ICPRAM, pages 136–146.
INSTICC, SciTePress.
Chowdhury, J. and Terejanu, G. (2023). CD-NOTEARS:
Concept driven causal structure learning using NOTEARS. In
2023 International Conference on Machine Learning
and Applications (ICMLA), pages 808–813. IEEE.
Colombo, D., Maathuis, M. H., Kalisch, M., and
Richardson, T. S. (2012). Learning high-dimensional directed
acyclic graphs with latent and selection variables. The
Annals of Statistics, pages 294–321.
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., and Reis, J.
(2009). Wine Quality. UCI Machine Learning
Repository. DOI: https://doi.org/10.24432/C56S3T.
Ge, Y., Arik, S. Ö., Yoon, J., Xu, A., Itti, L., and Pfister,
T. (2022). Invariant structure learning for better
generalization and causal explainability. arXiv preprint
arXiv:2206.06469.
Gencoglu, O. and Gruber, M. (2020). Causal modeling
of Twitter activity during COVID-19. Computation,
8(4):85.
Gerritsma, J., Onnink, R., and Versluis, A. (2013). Yacht
Hydrodynamics. UCI Machine Learning Repository.
DOI: https://doi.org/10.24432/C5XG7R.
Harrison, D. and Rubinfeld, D. L. (1978). Hedonic prices
and the demand for clean air. J. Environ. Econ. and
Management. UCI Machine Learning Repository,
http://lib.stat.cmu.edu/datasets/boston.
He, Y., Shen, Z., and Cui, P. (2021). Towards non-iid
image classification: A dataset and baselines. Pattern
Recognition, 110:107383.
Huang, B., Zhang, K., Lin, Y., Schölkopf, B., and Glymour,
C. (2018). Generalized score functions for causal
discovery. In Proceedings of the 24th ACM SIGKDD
international conference on knowledge discovery &
data mining, pages 1551–1560.
Jiang, G.-q., Liu, J., Ding, Z., Guo, L., and Lin, W. (2023).
Accelerating large batch training via gradient signal to
noise ratio (GSNR). arXiv preprint arXiv:2309.13681.
Kaiser, M. and Sipos, M. (2022). Unsuitability of NOTEARS
for causal graph discovery when dealing with
dimensional quantities. Neural Processing Letters,
54(3):1587–1595.
Koyama, M. and Yamaguchi, S. (2020). Out-of-distribution
generalization with maximal invariant predictor.
Lachapelle, S., Brouillard, P., Deleu, T., and Lacoste-Julien,
S. (2019). Gradient-based neural DAG learning. arXiv
preprint arXiv:1906.02226.
LeCun, Y., Jackel, L. D., Bottou, L., Cortes, C., Denker,
J. S., Drucker, H., Guyon, I., Muller, U. A., Sackinger,
E., Simard, P., et al. (1995). Learning algorithms
for classification: A comparison on handwritten digit
recognition. Neural networks: the statistical
mechanics perspective, 261(276):2.
Lin, Y., Dong, H., Wang, H., and Zhang, T. (2022).
Bayesian invariant risk minimization. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 16021–16030.
Liu, J., Chen, Y., and Zhao, J. (2021). Knowledge enhanced
event causality identification with mention masking
generalizations. In Proceedings of the twenty-ninth
international conference on international joint
conferences on artificial intelligence, pages 3608–3614.
Liu, J., Jiang, G., Bai, Y., Chen, T., and Wang, H.
(2020). Understanding why neural networks
generalize well through GSNR of parameters. arXiv preprint
arXiv:2001.07384.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE
Transactions on Information Theory, 28(2):129–137.
Ming, Y., Yin, H., and Li, Y. (2022). On the impact of
spurious correlation for out-of-distribution detection.
In Proceedings of the AAAI conference on artificial
intelligence, volume 36, pages 10051–10059.
Montavon, G., Hansen, K., Fazli, S., Rupp, M., Biegler,
F., Ziehe, A., Tkatchenko, A., Lilienfeld, A., and
Müller, K.-R. (2012). Learning invariant
representations of molecules for atomization energy prediction.
Advances in neural information processing systems,
25.
Ng, I., Ghassami, A., and Zhang, K. (2020). On the role of
sparsity and DAG constraints for learning linear DAGs.
Advances in Neural Information Processing Systems,
33:17943–17954.
Ng, I., Zhu, S., Fang, Z., Li, H., Chen, Z., and Wang, J.
(2022). Masked gradient-based causal structure
learning. In Proceedings of the 2022 SIAM International
Conference on Data Mining (SDM), pages 424–432.
SIAM.
Ormaniec, W., Sussex, S., Lorch, L., Schölkopf, B., and
Krause, A. (2024). Standardizing structural causal
models. arXiv preprint arXiv:2406.11601.
O’Donnell, R. T., Nicholson, A. E., Han, B., Korb, K. B.,
Alam, M. J., and Hope, L. R. (2006). Causal
discovery with prior information. In AI 2006: Advances in
Artificial Intelligence: 19th Australian Joint
Conference on Artificial Intelligence, Hobart, Australia,
December 4-8, 2006. Proceedings 19, pages 1162–1167.
Springer.
Parascandolo, G., Neitz, A., Orvieto, A., Gresele, L., and
Schölkopf, B. (2020). Learning explanations that are
hard to vary. arXiv preprint arXiv:2009.00329.
Pearl, J. (2009). Causality. Cambridge University Press.