scieee Science in your language
[en] (orig)
Citation: Ba, F.A.; Quellmalz, M.
Accelerating the Sinkhorn Algorithm
for Sparse Multi-Marginal Optimal
Transport via Fast Fourier
Transforms. Algorithms 2022,15, 311.
https://doi.org/10.3390/a15090311
Academic Editor: Francisco Saldanha
da Gama
Received: 5 August 2022
Accepted: 26 August 2022
Published: 30 August 2022
Publishers Note: MDPI stays neutral
with regard to jurisdictional claims in
published maps and institutional affil-
iations.
Copyright: © 2022 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
creativecommons.org/licenses/by/
4.0/).
algorithms
Article
Accelerating the Sinkhorn Algorithm for Sparse
Multi-Marginal Optimal Transport via Fast Fourier Transforms
Fatima Antarou Ba * , Michael Quellmalz
Institute of Mathematics, Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany
*Correspondence: [email protected]
Abstract:
We consider the numerical solution of the discrete multi-marginal optimal transport (MOT)
by means of the Sinkhorn algorithm. In general, the Sinkhorn algorithm suffers from the curse of
dimensionality with respect to the number of marginals. If the MOT cost function decouples according
to a tree or circle, its complexity is linear in the number of marginal measures. In this case, we speed
up the convolution with the radial kernel required in the Sinkhorn algorithm via non-uniform fast
Fourier methods. Each step of the proposed accelerated Sinkhorn algorithm with a tree-structured
cost function has a complexity of
O(KN)
instead of the classical
O(KN2)
for straightforward matrix–
vector operations, where
K
is the number of marginals and each marginal measure is supported on,
at most,
N
points. In the case of a circle-structured cost function, the complexity improves from
O(KN3)to O(KN2). This is confirmed through numerical experiments.
Keywords:
multi-marginal optimal transport; Sinkhorn algorithm; fast Fourier transform; image
processing; optimal transport; Wasserstein barycenter
MSC: 49Q22; 65T50; 65D18; 49M20
1. Introduction
The optimal transport (OT) problem is an optimization problem that deals with the
search for an optimal map (plan) that moves masses between two or more measures at
low cost [
1
,
2
]. OT appears in a wide range of applications such as image and signal
processing
[37]
, economics [
8
,
9
], finance [
10
,
11
], and physics [
12
,
13
]. The OT problem
was first introduced in 1781 by Monge. His objective was to find a map between two
probability measures
µ1
,
µ2
on
Rd
that transports
µ1
to
µ2
with minimal cost, where the
cost function describes the cost of transporting a mass between two points in
Rd
. However,
such maps do not always exist, so that Kantorovich [
14
] relaxed the problem in 1942 by
looking for a transport plan with two prescribed marginals
µ1
and
µ2
that minimizes a
certain cost functional.
Several authors have generalized the formulation to multi-marginal optimal transport
(MOT) [
15
17
], where more than two marginal measures are given. For the given probability
measures
µk
on
kRd
,
k=
1,
. . .
,
K
, an optimal transport plan
π
is defined as a solution
of the MOT problem
min
πΠ(µ1,...,µk)Z1×···×Kc(x1, . . . , xK)dπ(x1, . . . , xK)
where
Π(µ1
,
· · ·
,
µK)
is the convex set of all joint probability measures
π
whose marginals
are µk, and c:1× · · · × KRis the cost function.
Since the numerical computation of a transport plan is difficult in general, a reg-
ularization term such as the entropy [
13
,
18
], Kullback–Leibler divergence [
19
], general
f
-divergence [
20
], or
L2
-regularization [
21
,
22
] can be added to make the problem strictly
convex. Different approaches such as the Sinkhorn algorithm [
1
,
13
], stochastic gradient
Algorithms 2022,15, 311. https://doi.org/10.3390/a15090311 https://www.mdpi.com/journal/algorithms
Algorithms 2022,15, 311 2 of 19
descent [
23
], the Gauss–Seidel method [
22
], or proximal splitting [
24
] have been used to
iteratively determine a minimizing sequence of the MOT problem.
However, the problem suffers from the curse of dimensionality as the complexity
grows exponentially with the number
K
of marginal measures. One way to circumvent
this lies in incorporating additional sparsity assumptions into the model. Polynomial-time
algorithms to solve certain sparse MOT problems have been studied in [
13
,
25
,
26
]. We will
assume that the cost function decouples according to a graph, where the nodes correspond
to the marginals and the cost function is the sum of functions that depend only on two
variables which correspond to two marginals connected by an edge of the graph. For
example, the circle-structured Euclidean cost function reads as
c(x1, . . . , xK) =
x1x2
2
2+· · · +
xK1xK
2
2+
xKx1
2
2,x1, . . . , xKRd.
MOT problems with graph-structured cost functions with a constant tree-width can
be solved with the Sinkhorn algorithm in polynomial time [
27
]. In [
25
], polynomial-time
algorithms were presented for the MOT problem and its regularized counterpart for the
cases of a graph structure and a set-optimization structure, as well as a low-rank and
sparsely structured cost function. Another sparsity assumption lies on the transport plan
to be thinly spread, which is, e.g., the case for the L2-regularized problem [21,22].
1.1. Our Contributions
In the present paper, we study the discrete, entropy-regularized
MOT
problem with
a tree-structured [
3
,
13
] or a circle-structured cost function, where all measures are sup-
ported on a finite number of points (atoms) in
Rd
. Then, the computational time of the
Sinkhorn algorithm [
28
30
], which iteratively determines a sequence converging to the
solution, depends linearly on the number
K
of input measures. If the numbers of atoms is
large, however, the Sinkhorn algorithm still requires considerable computational time and
memory, which mainly comes from computing a discrete convolution, i.e., a matrix–vector
product with a kernel matrix.
This is significantly improved by Fourier-based fast summation methods [
31
,
32
]. The
key idea is the approximation of the kernel function using a Fourier series, which enables
the application of the non-uniform fast Fourier transform (NFFT). Although such fast
summation methods are frequently used in different applications such as electrostatic
particle interaction [
33
], tomography [
34
], image segmentation [
35
], and, very recently, also
OT with two marginals [
36
], they have not been utilized for MOT so far. Furthermore, a
method for accelerating the Sinkhorn algorithm for Wasserstein barycenters via computing
the convolution with a different kernel, namely the heat kernel, was discussed in [37].
Our main contribution is the combination of the fast summation method with the spar-
sity of the tree- or circle-structured cost function in the MOT problem for accelerating the
Sinkhorn algorithm. Each iteration step has a complexity of
O(KN)
for a tree- and
O(KN2)
for a circle-structured cost function, compared to
O(KN2)
and
O(KN3)
, respectively, with
the straightforward matrix–vector operations, where
N
is an upper bound of the number
of atoms for each of the
K
marginal measures. Our numerical tests with both tree- and
circle-structured cost functions confirm a considerable acceleration, while the accuracy
stays almost the same. A different acceleration of the Sinkhorn algorithm via low-rank
approximation for tree-structured cost yields the same asymptotic complexity [
38
]. We note
that MOT problems with tree-structured cost functions are used for the computation of
Wasserstein barycenters [
4
], and with circle-structured cost for computing Euler flows [
39
].
1.2. Outline of the Paper
Section 2introduces the notation. In Section 3, we focus on the discrete MOT problem
with squared Euclidean norm cost functions and the numerical solution of the correspond-
ing entropy-regularized problem, using the Sinkhorn algorithm. We investigate sparsely
structured cost functions that decouple according to a tree or circle in Section 4. Then,
Algorithms 2022,15, 311 3 of 19
the complexity of the Sinkhorn algorithm depends only linearly on
K
. Section 5describes
a fast summation method for further accelerating the Sinkhorn algorithm. Finally, in
Section 6
, we verify the practical performance by applying it to generalized Euler flows
and for finding generalized Wasserstein barycenters. We compare the computational time
of the proposed Sinkhorn algorithm based on the NFFT, with the algorithm based on direct
matrix multiplication.
2. Notation
Let
KN
and
n= (n1
,
. . .
,
nk)NK
. We set
[K]:={1, · · · ,K}
and consider
K
finite sets
k=nxk
ik:ik[nk]oRd,k[K],
consisting of points
xk
ikRd
, which are called atoms. We set
:=1× · · · × K
. Addi-
tionally, we define the index set
I:=i= (i1, . . . , iK):ik[nk],k[K]
and the set of K-dimensional matrices (tensors)
Ri:=Ri1× · · · × RiK,iI.
Let P(k)denote the set of probability measures on k. In this paper, we consider K
discrete probability measures, also called marginal measures,µkP(k), given by
µk=
nk
ik=1
µk
ikδxk
ik
,k[K],
where the probabilities satisfy
µk
ik0,
nk
ik=1
µk
ik=1 for all ik[nk],k[K],
and, for all Ak, the Dirac measure is given by
δxk
ik
(A):=
1 if xk
ikA,
0 otherwise.
For G,HRn, we denote their component-wise (Hadamard) product by
GH:=(GiHi)iIRn,
and similarly, their component-wise division by
, as well as the Frobenius inner product
hG,HiF:=
iI
GiHiR.
The tensor product (Kronecker product) of
u
,
vRm
is denoted by
uvRm×m
.
Analogues can be defined for tensors of different size.
3. Multi-Marginal Optimal Transport
In the following, we consider the discrete multi-marginal optimal transport (MOT)
between
K
marginal measures
µkP(k)
,
k[K]
. We define the set of admissible transport
plans by
Π(µ1, . . . , µk):=nΠRn
0:Pk(Π) = µkfor all k[K]o, (1)
Advertisement
Algorithms 2022,15, 311 4 of 19
where the k-th marginal projection is defined as
Pk(Π):=
`[K]\{k}
i`[n`]
Πi1,··· ,iKRnk. (2)
For a cost function
c:R0
and the samples
xi=x1
i1, . . . , xK
iK
,
iI
, we define
the respective cost matrix
C:= [Ci]iI= [c(xi)]iI= [c(x1
i1,· · · ,xK
iK)](i1,··· ,iK)IRn
0. (3)
The discrete MOT problem reads
min
ΠΠ(µ1,...,µk)hΠ,CiF, (4)
whose solution ΠRn
0is called optimal plan.
Entropy Regularization
As the MOT problem
(4)
is numerically unfeasible, we consider for
η>
0 the entropy-
regularized multi-marginal optimal transport (MOTη) problem
min
ΠΠ(µ1,...,µK)hΠ,CiF+ηhΠ, log Π1niF=min
ΠΠ(µ1,...,µk)
iI
ΠiCi+η
iI
Πilog Πi1, (5)
which is a convex optimization problem. It is possible to numerically deduce the optimal
transport plan
ˆ
Π
of
(5)
from the solution of the corresponding Lagrangian dual problem.
The following theorem is a special case of [3] for a constant entropy function.
Theorem 1. The Lagrangian dual formulation of the discrete MOTηproblem (5)states
sup
φkRnk
0,k[K]
Sφ1,· · · ,φK:=sup
φkRnk
0,k[K]
η
k[K]
j[nk]
µk
jlog φk
jη
iI
KiΦi, (6)
where the kernel matrix KRnis defined by
Ki:=exp Ci
η!,iI,
and the dual tensor Φ=NK
k=1φkby
Φi:=
K
k=1
φk
ik,iI.
The functional S:Rn
0Ris called the Sinkhorn function.
The solutions of the dual and the primal problem are generally not equal. The solution
of
(6)
is generally a lower bound to the solution of the primal problem
(5)
. Equality holds if
the cost function
c
is lower semi-continuous; i.e.,
lim inf c(x)c(x0)
as
xx0
for every
x0Rn
, or Borel measurable and bounded [
40
]. This is obviously the case for the squared
Euclidean norm cost function
c:Rn[0, ),c(x) =
k1,k2[K]
k16=k2
xk1
ik1
xk2
ik2
2
2
that we study here.
Algorithms 2022,15, 311 5 of 19
Proposition 1 ([41]).An optimal plan of the MOTηproblem (5)is given by
ˆ
Π=Kˆ
Φ,
where ˆ
Φ=Nk[K]ˆ
φkand ˆ
φk,k[K]are the optimal solutions of dual problem (6).
A sequence converging to the optimal dual vectors
ˆ
φk
,
k[K]
in
(6)
can be iteratively
determined by the Sinkhorn algorithm [
13
,
42
] presented in Algorithm 1, where we note
that line 4 is obtained by deriving the Sinkhorn function
S
with respect to
φk
,
k[K]
.
The complexity of the algorithm mainly comes from the computation of the marginal
Pk(KΦ)
, where the projection
Pk
is defined in
(2)
. In general, the number of operations
depends exponentially on K.
Algorithm 1 Sinkhorn iterations for the MOTηproblem
1: Input:
Initial values
(φk)(0)Rnk
,
k[K]
, regularization parameter
η>
0, threshold
δ>0
2: Set r0
3: do
4: for k=1, . . . , Kdo
5: Compute ˜
Φ(r+1)
k:=N`[k1]φ`(r+1)N`[K]\[k1]φ`(r)
6: Compute dual vectors
φk(r+1):= µkφk(r)!Pk K˜
Φ(r+1)
k!
7: Increment rr+1
8: end for
9: while
S φ1(r), . . . , φK(r)! S φ1(r1), . . . , φK(r1)!
δ
10: return optimal plan ˆ
Π=Kˆ
Φand ˆ
Φ=Nk[K]φk(r+1)
4. Sparse Cost Functions
In this section, we take a look at sparsely structured cost functions, for which the
Sinkhorn algorithm becomes much faster and we overcome the curse of dimensionality.
Let
G=(V,E)
be an undirected graph with vertices
V
and edges
E
. We say that the
MOTη
problem has the graph
G
structure if
V= [K]
and the cost matrix
(3)
decouples according to
Ci=
{k1,k2}E
xk1
ik1
xk2
ik2
2
2,i= (i1, . . . , iK)I.
This implies that the kernel KRn
0satisfies
Ki=exp Ci
η!=
{k1,k2}E
K(k1,k2)
ik1,ik2
,iI, (7)
where the kernel matrix K(k1,k2)Rnk1×nk2
0for {k1,k2} E is given by
K(k1,k2)
ik1,ik2
:=exp 1
η
xk1
ik1
xk2
ik2
2
2!. (8)
We use the indices
k V = [K]
to identify the marginal measures
µk
in the rest of
the paper.
Advertisement
Loading more pages...