
3.2 Comparison with POT
We compare the performance of our method with that of POT, the existing Python library mentioned in Section 1. Here, the initial distribution is a two-dimensional uncorrelated normal distribution with mean 0 and variance 1,
and the target distribution is a two-dimensional uncorrelated normal distribution with mean 5 and variance 1. The Gaussian kernel $K(x, y) = e^{-|x-y|^2}$, cost function $c(x, y) = |x - y|^2$, and the Adam optimizer
with a learning rate of 0.0001 are used. The function $T(x)$ is represented by a multi-layer perceptron with two hidden layers, and the penalty parameter $\lambda$ is defined by $1/\lambda = 0.000001$. In this experiment, we compare the number of points that can be processed, the number of epochs that can be run, and the accuracy achieved on an AMD EPYC 9654 with 768 GB of memory. The results are shown in Table 1, where SD stands for standard deviation. We set the batch size to 10000, so each epoch consists of 60 iterations. Table 1 shows that the proposed method computes accurately even for a data size of 600000.

Table 1: The proposed method with CPU.
Data size   Epochs   Expectation   SD
600000      3        5.03          0.99
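The ingredients listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume that the objective adds the mean quadratic transport cost to a penalty on the squared RKHS distance between the kernel mean embeddings of $T_\#\mu$ and $\nu$ (estimated here by a biased MMD² statistic), and that this penalty is weighted by $\lambda = 10^6$, the reciprocal of the stated $1/\lambda$. The paper's exact placement of $\lambda$ may differ. The helper names `gaussian_kernel`, `mmd2`, `penalized_objective`, and `minibatches` are hypothetical. The multi-layer perceptron and the Adam update are omitted, so $T(x)$ enters only through its precomputed outputs `tx`.

```python
import numpy as np


def gaussian_kernel(x, y):
    """K(x, y) = exp(-|x - y|^2) for every pair of rows of x (n, d) and y (m, d)."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0))


def mmd2(x, y):
    """Biased estimate of ||m_P - m_Q||^2 in the RKHS of K, where m_P and m_Q
    are the kernel mean embeddings of the empirical measures of x and y."""
    return (
        gaussian_kernel(x, x).mean()
        + gaussian_kernel(y, y).mean()
        - 2.0 * gaussian_kernel(x, y).mean()
    )


def penalized_objective(x, tx, y, lam):
    """Hypothetical loss: mean cost c(x, T(x)) = |x - T(x)|^2 over the batch
    plus lam times the embedding discrepancy between T(x) and the target y."""
    transport_cost = np.mean(np.sum((x - tx) ** 2, axis=1))
    return transport_cost + lam * mmd2(tx, y)


def minibatches(n, batch_size, rng):
    """Yield shuffled index arrays that cover n samples exactly once (one epoch)."""
    perm = rng.permutation(n)
    for start in range(0, n, batch_size):
        yield perm[start:start + batch_size]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((500, 2))        # source samples, N(0, I)
    y = rng.standard_normal((500, 2)) + 5.0  # target samples, mean 5 per coordinate
    lam = 1.0 / 0.000001                     # assumed: lambda = 1e6
    # The translation x + 5 should score far better than the identity map.
    print("translation:", penalized_objective(x, x + 5.0, y, lam))
    print("identity:   ", penalized_objective(x, x, y, lam))
```

With 600000 points and a batch size of 10000, `minibatches` yields 60 index batches per epoch, which matches the iteration count stated above. Each call to `mmd2` builds dense batch-by-batch kernel matrices, so a batch of 10000 produces matrices with $10^8$ entries each.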
Next, we perform a similar experiment on an NVIDIA H100 GPU. The results are shown in Table 2, where SD again stands for standard deviation. Here, we again set the batch size to 10000, so each epoch consists of 60 iterations. Table 2 shows that, as in the CPU case, our method computes accurately even for a data size of 600000.

Table 2: The proposed method with GPU.
Data size   Epochs   Expectation   SD
600000      10       4.97          1.01
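The reference values for Tables 1 and 2 can be derived in closed form. Assuming that "mean 5" denotes the mean vector $m = (5, 5)^\top$ and that the tables report per-coordinate statistics of $T(x)$, Brenier's theorem (Brenier, 1991) states that for the quadratic cost the optimal map is the gradient of a convex function. The translation below is such a map and pushes the initial distribution onto the target one:

```latex
\begin{aligned}
&\mu = \mathcal{N}(0, I_2), \qquad \nu = \mathcal{N}(m, I_2), \qquad m = (5, 5)^{\top}, \\
&T^{*}(x) = x + m = \nabla\varphi(x), \qquad \varphi(x) = \tfrac{1}{2}|x|^{2} + m^{\top}x \quad (\text{convex}), \\
&X \sim \mu \;\Longrightarrow\; T^{*}(X) \sim \mathcal{N}(m, I_2), \qquad
  \mathbb{E}[T^{*}(X)_i] = 5, \quad \operatorname{SD}[T^{*}(X)_i] = 1 \quad (i = 1, 2), \\
&\int |x - T^{*}(x)|^{2}\, d\mu(x) = |m|^{2} = 50.
\end{aligned}
```

Under these assumptions, the reported pairs 5.03, 0.99 (CPU) and 4.97, 1.01 (GPU) are close to the exact values 5 and 1.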
We then use the solver ot.sinkhorn() in POT to compare its performance with that of our algorithm on a Core(TM) i7-13700H with 32 GB of memory. The computational cost of each iteration of this solver is known to be $O(n^2)$, where $n$ is the input data size. Table 3 shows that a stable result is obtained when the data size is 1000. When the data size exceeds 2000, however, the solver becomes unstable owing to the computational limitations mentioned above.

Table 3: Python Optimal Transport.
Data size   Expectation   SD
1000        5.00          0.97
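To make the $O(n^2)$ cost concrete, the following NumPy sketch implements the Sinkhorn matrix-scaling iteration on which entropic solvers such as ot.sinkhorn() are based. It is our own illustration, not POT's code. With POT itself, one would typically build the cost matrix with `ot.dist(xs, xt)` and call `ot.sinkhorn(a, b, M, reg)`. The regularization `reg = 1.0`, the sample size, and the barycentric projection used to turn the plan into a map (and hence into an expectation and SD) are assumptions, since the parameters behind Table 3 are not stated. Each iteration multiplies the dense $n \times n$ matrix `K` by vectors. For $n = 600000$, storing that matrix in float64 alone would take about 2.9 TB.

```python
import numpy as np


def sinkhorn_plan(a, b, M, reg, n_iter=1000, tol=1e-9):
    """Entropic OT plan between weights a (n,) and b (m,) for cost matrix M (n, m).

    Illustrative Sinkhorn iteration: every step costs O(n * m) time, and the
    dense Gibbs kernel K needs O(n * m) memory.
    """
    # Gibbs kernel; with a very small reg, exp(-M / reg) can underflow to zero,
    # which is why log-domain (stabilized) variants exist.
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)      # rescale rows toward marginal a
        v = b / (K.T @ u)    # rescale columns; column marginals are exact after this
        row_err = np.abs(u * (K @ v) - a).sum()
        if row_err < tol:
            break
    return u[:, None] * K * v[None, :]


def barycentric_map(P, a, y):
    """Map each source point x_i to sum_j P_ij y_j / a_i."""
    return (P @ y) / a[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    xs = rng.standard_normal((n, 2))        # source: N(0, I)
    xt = rng.standard_normal((n, 2)) + 5.0  # target: mean 5 per coordinate
    a = np.full(n, 1.0 / n)
    b = np.full(n, 1.0 / n)
    # Squared Euclidean cost matrix, c(x, y) = |x - y|^2.
    M = np.sum((xs[:, None, :] - xt[None, :, :]) ** 2, axis=-1)
    P = sinkhorn_plan(a, b, M, reg=1.0)
    T = barycentric_map(P, a, xt)
    print("expectation:", T.mean(axis=0), "SD:", T.std(axis=0))
```

Because the entropic plan spreads mass over many target points, the barycentric projection is a smoothed map, and its spread generally depends on `reg`.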
Numerical experiments in this subsection show
that our proposed method is a promising option for
solving large-scale Monge problems.
4 CONCLUSION
In this paper, we derived an approximate solution to the Monge problem by using Hilbert space embeddings of probability measures and a deep learning algorithm. Through several numerical experiments, we confirmed that our method produces accurate approximate solutions and can be computed efficiently on a GPU. In future work, we aim to extend our research to larger datasets, explore the Monge problem with alternative cost functions, and investigate numerical solutions for multi-marginal transport problems.
REFERENCES
Bauschke, H. H. and Combettes, P. L. (2011). Convex analysis and monotone operator theory in Hilbert spaces. Springer-Verlag.
Billingsley, P. (2013). Convergence of probability measures. John Wiley & Sons.
Brenier, Y. (1987). Décomposition polaire et réarrangement monotone des champs de vecteurs. C. R. Acad. Sci. Paris Série I, 305(19):805–808.
Brenier, Y. (1991). Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math., 44(4):375–417.
Carmona, R. and Delarue, F. (2018). Probabilistic theory of
mean field games with applications I. Springer.
Charlier, B., Feydy, J., Glaunès, J. A., Collin, F. D., and Durif, G. (2021). Kernel operations on the GPU, with autodiff, without memory overflows. Journal of Machine Learning Research, 22(74):1–6.
Cuturi, M., Meng-Papaxanthos, L., Tian, Y., Bunne, C., Davis, G., and Teboul, O. (2022). Optimal transport tools (OTT): A JAX toolbox for all things Wasserstein. arXiv:2201.12324.
Dantzig, G. B. (1949). Programming of interdependent activities: II mathematical model. Econometrica, 17(3/4):200–211.
Dantzig, G. B. (1951). Application of the simplex method to a transportation problem. Activity Analysis of Production and Allocation, 13:359–373.
Feydy, J., Séjourné, T., Vialard, F. X., Amari, S., Trouvé, A., and Peyré, G. (2019). Interpolating between Optimal Transport and MMD using Sinkhorn Divergences. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2681–2690.
Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., Gautheron, L., Gayraud, N.
Solving Monge Problem by Hilbert Space Embeddings of Probability Measures
299