Accelerating the Sinkhorn Algorithm for Sparse Multi-Marginal Optimal Transport via Fast Fourier Transforms [original]

Citation: Ba, F.A.; Quellmalz, M.

Accelerating the Sinkhorn Algorithm

for Sparse Multi-Marginal Optimal

Transport via Fast Fourier

Transforms. Algorithms 2022,15, 311.

https://doi.org/10.3390/a15090311

Academic Editor: Francisco Saldanha

da Gama

Received: 5 August 2022

Accepted: 26 August 2022

Published: 30 August 2022

Publisher’s Note: MDPI stays neutral

with regard to jurisdictional claims in

published maps and institutional affil-

iations.

Licensee MDPI, Basel, Switzerland.

This article is an open access article

distributed under the terms and

conditions of the Creative Commons

Attribution (CC BY) license (https://

creativecommons.org/licenses/by/

4.0/).

algorithms

Article

Accelerating the Sinkhorn Algorithm for Sparse

Multi-Marginal Optimal Transport via Fast Fourier Transforms

Fatima Antarou Ba * , Michael Quellmalz

Institute of Mathematics, Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany

*Correspondence: [email protected]

Abstract:

We consider the numerical solution of the discrete multi-marginal optimal transport (MOT)

by means of the Sinkhorn algorithm. In general, the Sinkhorn algorithm suffers from the curse of

dimensionality with respect to the number of marginals. If the MOT cost function decouples according

to a tree or circle, its complexity is linear in the number of marginal measures. In this case, we speed

up the convolution with the radial kernel required in the Sinkhorn algorithm via non-uniform fast

Fourier methods. Each step of the proposed accelerated Sinkhorn algorithm with a tree-structured

cost function has a complexity of

O(KN)

instead of the classical

O(KN2)

for straightforward matrix–

vector operations, where

is the number of marginals and each marginal measure is supported on,

at most,

points. In the case of a circle-structured cost function, the complexity improves from

O(KN3)to O(KN2). This is confirmed through numerical experiments.

Keywords:

multi-marginal optimal transport; Sinkhorn algorithm; fast Fourier transform; image

processing; optimal transport; Wasserstein barycenter

MSC: 49Q22; 65T50; 65D18; 49M20

1. Introduction

The optimal transport (OT) problem is an optimization problem that deals with the

search for an optimal map (plan) that moves masses between two or more measures at

low cost [

]. OT appears in a wide range of applications such as image and signal

processing

[3–7]

, economics [

], finance [

], and physics [

]. The OT problem

was first introduced in 1781 by Monge. His objective was to find a map between two

probability measures

µ1

µ2

that transports

µ1

µ2

with minimal cost, where the

cost function describes the cost of transporting a mass between two points in

. However,

such maps do not always exist, so that Kantorovich [

] relaxed the problem in 1942 by

looking for a transport plan with two prescribed marginals

µ1

and

µ2

that minimizes a

certain cost functional.

Several authors have generalized the formulation to multi-marginal optimal transport

(MOT) [

–

], where more than two marginal measures are given. For the given probability

measures

µk

Ωk⊂Rd

. . .

, an optimal transport plan

is defined as a solution

of the MOT problem

min

π∈Π(µ1,...,µk)ZΩ1×···×ΩKc(x1, . . . , xK)dπ(x1, . . . , xK)

where

Π(µ1

· · ·

µK)

is the convex set of all joint probability measures

whose marginals

are µk, and c:Ω1× · · · × ΩK→Ris the cost function.

Since the numerical computation of a transport plan is difficult in general, a reg-

ularization term such as the entropy [

], Kullback–Leibler divergence [

], general

-divergence [

], or

-regularization [

] can be added to make the problem strictly

convex. Different approaches such as the Sinkhorn algorithm [

], stochastic gradient

Algorithms 2022,15, 311. https://doi.org/10.3390/a15090311 https://www.mdpi.com/journal/algorithms

Algorithms 2022,15, 311 2 of 19

descent [

], the Gauss–Seidel method [

], or proximal splitting [

] have been used to

iteratively determine a minimizing sequence of the MOT problem.

However, the problem suffers from the curse of dimensionality as the complexity

grows exponentially with the number

of marginal measures. One way to circumvent

this lies in incorporating additional sparsity assumptions into the model. Polynomial-time

algorithms to solve certain sparse MOT problems have been studied in [

]. We will

assume that the cost function decouples according to a graph, where the nodes correspond

to the marginals and the cost function is the sum of functions that depend only on two

variables which correspond to two marginals connected by an edge of the graph. For

example, the circle-structured Euclidean cost function reads as

c(x1, . . . , xK) =



x1−x2



2+· · · +



xK−1−xK



2+



xK−x1



2,x1, . . . , xK∈Rd.

MOT problems with graph-structured cost functions with a constant tree-width can

be solved with the Sinkhorn algorithm in polynomial time [

]. In [

], polynomial-time

algorithms were presented for the MOT problem and its regularized counterpart for the

cases of a graph structure and a set-optimization structure, as well as a low-rank and

sparsely structured cost function. Another sparsity assumption lies on the transport plan

to be thinly spread, which is, e.g., the case for the L2-regularized problem [21,22].

1.1. Our Contributions

In the present paper, we study the discrete, entropy-regularized

MOT

problem with

a tree-structured [

] or a circle-structured cost function, where all measures are sup-

ported on a finite number of points (atoms) in

. Then, the computational time of the

Sinkhorn algorithm [

–

], which iteratively determines a sequence converging to the

solution, depends linearly on the number

of input measures. If the numbers of atoms is

large, however, the Sinkhorn algorithm still requires considerable computational time and

memory, which mainly comes from computing a discrete convolution, i.e., a matrix–vector

product with a kernel matrix.

This is significantly improved by Fourier-based fast summation methods [

]. The

key idea is the approximation of the kernel function using a Fourier series, which enables

the application of the non-uniform fast Fourier transform (NFFT). Although such fast

summation methods are frequently used in different applications such as electrostatic

particle interaction [

], tomography [

], image segmentation [

], and, very recently, also

OT with two marginals [

], they have not been utilized for MOT so far. Furthermore, a

method for accelerating the Sinkhorn algorithm for Wasserstein barycenters via computing

the convolution with a different kernel, namely the heat kernel, was discussed in [37].

Our main contribution is the combination of the fast summation method with the spar-

sity of the tree- or circle-structured cost function in the MOT problem for accelerating the

Sinkhorn algorithm. Each iteration step has a complexity of

O(KN)

for a tree- and

O(KN2)

for a circle-structured cost function, compared to

O(KN2)

and

O(KN3)

, respectively, with

the straightforward matrix–vector operations, where

is an upper bound of the number

of atoms for each of the

marginal measures. Our numerical tests with both tree- and

circle-structured cost functions confirm a considerable acceleration, while the accuracy

stays almost the same. A different acceleration of the Sinkhorn algorithm via low-rank

approximation for tree-structured cost yields the same asymptotic complexity [

]. We note

that MOT problems with tree-structured cost functions are used for the computation of

Wasserstein barycenters [

], and with circle-structured cost for computing Euler flows [

1.2. Outline of the Paper

Section 2introduces the notation. In Section 3, we focus on the discrete MOT problem

with squared Euclidean norm cost functions and the numerical solution of the correspond-

ing entropy-regularized problem, using the Sinkhorn algorithm. We investigate sparsely

structured cost functions that decouple according to a tree or circle in Section 4. Then,

Algorithms 2022,15, 311 3 of 19

the complexity of the Sinkhorn algorithm depends only linearly on

. Section 5describes

a fast summation method for further accelerating the Sinkhorn algorithm. Finally, in

Section 6

, we verify the practical performance by applying it to generalized Euler flows

and for finding generalized Wasserstein barycenters. We compare the computational time

of the proposed Sinkhorn algorithm based on the NFFT, with the algorithm based on direct

matrix multiplication.

2. Notation

Let

K∈N

and

n= (n1

. . .

nk)∈NK

. We set

[K]:={1, · · · ,K}

and consider

finite sets

Ωk=nxk

ik:ik∈[nk]o⊂Rd,k∈[K],

consisting of points

ik∈Rd

, which are called atoms. We set

Ω:=Ω1× · · · × ΩK

. Addi-

tionally, we define the index set

I:=i= (i1, . . . , iK):ik∈[nk],k∈[K]

and the set of K-dimensional matrices (tensors)

Ri:=Ri1× · · · × RiK,i∈I.

Let P(Ωk)denote the set of probability measures on Ωk. In this paper, we consider K

discrete probability measures, also called marginal measures,µk∈P(Ωk), given by

µk=

∑

ik=1

µk

ikδxk

,k∈[K],

where the probabilities satisfy

µk

ik≥0,

∑

ik=1

µk

ik=1 for all ik∈[nk],k∈[K],

and, for all A⊂Ωk, the Dirac measure is given by

δxk

(A):=





1 if xk

ik∈A,

0 otherwise.

For G,H∈Rn, we denote their component-wise (Hadamard) product by

GH:=(GiHi)i∈I∈Rn,

and similarly, their component-wise division by



, as well as the Frobenius inner product

hG,HiF:=∑

i∈I

GiHi∈R.

The tensor product (Kronecker product) of

v∈Rm

is denoted by

u⊗v∈Rm×m

Analogues can be defined for tensors of different size.

3. Multi-Marginal Optimal Transport

In the following, we consider the discrete multi-marginal optimal transport (MOT)

between

marginal measures

µk∈P(Ωk)

k∈[K]

. We define the set of admissible transport

plans by

Π(µ1, . . . , µk):=nΠ∈Rn

≥0:Pk(Π) = µkfor all k∈[K]o, (1)

Algorithms 2022,15, 311 4 of 19

where the k-th marginal projection is defined as

Pk(Π):=∑

`∈[K]\{k}

∑

i`∈[n`]

Πi1,··· ,iK∈Rnk. (2)

For a cost function

c:Ω→R≥0

and the samples

xi=x1

i1, . . . , xK

iK

i∈I

, we define

the respective cost matrix

C:= [Ci]i∈I= [c(xi)]i∈I= [c(x1

i1,· · · ,xK

iK)](i1,··· ,iK)∈I∈Rn

≥0. (3)

The discrete MOT problem reads

min

Π∈Π(µ1,...,µk)hΠ,CiF, (4)

whose solution Π∈Rn

≥0is called optimal plan.

Entropy Regularization

As the MOT problem

(4)

is numerically unfeasible, we consider for

η>

0 the entropy-

regularized multi-marginal optimal transport (MOTη) problem

min

Π∈Π(µ1,...,µK)hΠ,CiF+ηhΠ, log Π−1niF=min

Π∈Π(µ1,...,µk)∑

i∈I

ΠiCi+η∑

i∈I

Πilog Πi−1, (5)

which is a convex optimization problem. It is possible to numerically deduce the optimal

transport plan

(5)

from the solution of the corresponding Lagrangian dual problem.

The following theorem is a special case of [3] for a constant entropy function.

Theorem 1. The Lagrangian dual formulation of the discrete MOTηproblem (5)states

sup

φk∈Rnk

≥0,k∈[K]

Sφ1,· · · ,φK:=sup

φk∈Rnk

≥0,k∈[K]

η∑

k∈[K]

∑

j∈[nk]

µk

jlog φk

j−η∑

i∈I

KiΦi, (6)

where the kernel matrix K∈Rnis defined by

Ki:=exp −Ci

η!,i∈I,

and the dual tensor Φ=NK

k=1φkby

Φi:=

∏

k=1

φk

ik,i∈I.

The functional S:Rn

≥0→Ris called the Sinkhorn function.

The solutions of the dual and the primal problem are generally not equal. The solution

(6)

is generally a lower bound to the solution of the primal problem

(5)

. Equality holds if

the cost function

is lower semi-continuous; i.e.,

lim inf c(x)≥c(x0)

x→x0

for every

x0∈Rn

, or Borel measurable and bounded [

]. This is obviously the case for the squared

Euclidean norm cost function

c:Rn→[0, ∞),c(x) = ∑

k1,k2∈[K]

k16=k2



xk1

ik1

−xk2

ik2



that we study here.

Algorithms 2022,15, 311 5 of 19

Proposition 1 ([41]).An optimal plan of the MOTηproblem (5)is given by

Π=Kˆ

Φ,

where ˆ

Φ=Nk∈[K]ˆ

φkand ˆ

φk,k∈[K]are the optimal solutions of dual problem (6).

A sequence converging to the optimal dual vectors

φk

k∈[K]

(6)

can be iteratively

determined by the Sinkhorn algorithm [

] presented in Algorithm 1, where we note

that line 4 is obtained by deriving the Sinkhorn function

with respect to

φk

k∈[K]

The complexity of the algorithm mainly comes from the computation of the marginal

Pk(KΦ)

, where the projection

is defined in

(2)

. In general, the number of operations

depends exponentially on K.

Algorithm 1 Sinkhorn iterations for the MOTηproblem

1: Input:

Initial values

(φk)(0)∈Rnk

k∈[K]

, regularization parameter

η>

0, threshold

δ>0

2: Set r←0

3: do

4: for k=1, . . . , Kdo

5: Compute ˜

Φ(r+1)

k:=N`∈[k−1]φ`(r+1)⊗N`∈[K]\[k−1]φ`(r)

6: Compute dual vectors

φk(r+1):= µkφk(r)!Pk K˜

Φ(r+1)

7: Increment r←r+1

8: end for

9: while

S φ1(r), . . . , φK(r)!− S φ1(r−1), . . . , φK(r−1)!

≥δ

10: return optimal plan ˆ

Π=Kˆ

Φand ˆ

Φ=Nk∈[K]φk(r+1)

4. Sparse Cost Functions

In this section, we take a look at sparsely structured cost functions, for which the

Sinkhorn algorithm becomes much faster and we overcome the curse of dimensionality.

Let

G=(V,E)

be an undirected graph with vertices

and edges

. We say that the

MOTη

problem has the graph

structure if

V= [K]

and the cost matrix

(3)

decouples according to

Ci=∑

{k1,k2}∈E 



xk1

ik1

−xk2

ik2



2,i= (i1, . . . , iK)∈I.

This implies that the kernel K∈Rn

≥0satisfies

Ki=exp −Ci

η!=∏

{k1,k2}∈E

K(k1,k2)

ik1,ik2

,i∈I, (7)

where the kernel matrix K(k1,k2)∈Rnk1×nk2

≥0for {k1,k2} ∈ E is given by

K(k1,k2)

ik1,ik2

:=exp −1

η



xk1

ik1

−xk2

ik2



2!. (8)

We use the indices

k∈ V = [K]

to identify the marginal measures

µk

in the rest of

the paper.

Loading more pages...