Wasserstein metric

The Wasserstein metric (also Vaserstein metric) is a metric between probability measures on a given metric space.

Intuitively, if each distribution is viewed as a heap of “earth” piled up on the metric space, the metric describes the minimal “cost” of turning one heap into the other, where the cost of a transport plan is the amount of earth moved times the distance over which it is moved. Because of this analogy, the metric is known in computer science as the Earth Mover's Distance.

The metric got its name in 1970 from Roland Lwowitsch Dobruschin, who named it after Leonid Vaseršteĭn. Vaseršteĭn introduced the concept in 1969.

Definition

Let $(M, d)$ be a metric space in which every probability measure on $M$ is a Radon measure (such a space is called a Radon space). For $p \geq 1$, let $P_p(M)$ denote the set of all probability measures $\mu$ on $M$ with finite $p$-th moment, that is, for some (equivalently, every) $x_0 \in M$:

$$\int_M d(x, x_0)^p \, \mathrm{d}\mu(x) < \infty.$$

Then, for $p < \infty$, the $p$-th Wasserstein distance between two probability measures $\mu$ and $\nu$ in $P_p(M)$ is defined as

$$W_p(\mu, \nu) := \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{M \times M} d(x, y)^p \, \mathrm{d}\gamma(x, y) \right)^{1/p},$$

where $\Gamma(\mu, \nu)$ denotes the set of all measures on $M \times M$ with marginals $\mu$ and $\nu$ on the first and second factor, respectively. ($\Gamma(\mu, \nu)$ is also called the set of all couplings of $\mu$ and $\nu$.) For $p = \infty$, the Wasserstein distance is defined as

$$W_\infty(\mu, \nu) := \inf_{\gamma \in \Gamma(\mu, \nu)} \sup_{(x, y) \in \mathrm{supp}(\gamma)} d(x, y),$$

where $\mathrm{supp}(\gamma)$ is the support of the measure $\gamma$.
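For finitely supported measures the infimum over couplings in the definition of $W_p$ is a finite linear program. The sketch below assumes $\mu = \sum_i a_i \delta_{x_i}$ and $\nu = \sum_j b_j \delta_{y_j}$ on $\mathbb{R}^n$ with the Euclidean distance, and uses SciPy's general-purpose `linprog` (HiGHS backend) as the solver; the helper name `wasserstein_p_discrete` is hypothetical, and dedicated optimal-transport solvers are usually much faster on large problems.

```python
import numpy as np
from scipy.optimize import linprog


def wasserstein_p_discrete(x, a, y, b, p=1.0):
    """Hypothetical helper: W_p between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j}.

    x: (m, n) or (m,) support points of mu, a: (m,) weights summing to 1.
    y: (k, n) or (k,) support points of nu, b: (k,) weights summing to 1.
    Ground distance: Euclidean norm on R^n.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(a), -1)
    y = np.asarray(y, dtype=float).reshape(len(b), -1)
    m, k = len(a), len(b)

    # Cost matrix c_ij = d(x_i, y_j)^p.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** p

    # The coupling gamma (m x k, flattened row-major) must have row sums a
    # (first marginal mu) and column sums b (second marginal nu).
    row_sums = np.kron(np.eye(m), np.ones((1, k)))
    col_sums = np.kron(np.ones((1, m)), np.eye(k))
    # One column constraint is implied by the others (both marginals sum to 1),
    # so it is dropped to keep the equality system full rank.
    A_eq = np.vstack([row_sums, col_sums[:-1]])
    b_eq = np.concatenate([a, b[:-1]])

    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Clip tiny negative round-off before taking the p-th root.
    return max(res.fun, 0.0) ** (1.0 / p)


# Two point masses in R^2: the result should be ||(0, 0) - (3, 4)||_2 = 5,
# up to solver tolerance, for every p.
print(wasserstein_p_discrete([[0.0, 0.0]], [1.0], [[3.0, 4.0]], [1.0], p=2))
```

The dense constraint matrix has $m \cdot k$ columns, so this is only meant for small illustrative examples.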

Examples

Dirac measure

Let $\mu = \delta_{a_1}$ and $\nu = \delta_{a_2}$ be two Dirac measures with $a_1, a_2 \in \mathbb{R}$. Then the only possible coupling is $\delta_{(a_1, a_2)}$. Taking the absolute value on $\mathbb{R}$ as the distance function, we obtain for every $p \geq 1$:

$$W_p(\mu, \nu) = |a_1 - a_2|.$$

If instead $a_1, a_2 \in \mathbb{R}^n$ and the Euclidean distance is used in place of the absolute value, one obtains:

$$W_p(\mu, \nu) = \|a_1 - a_2\|_2.$$

Normal distribution

Let $\mu = \mathcal{N}(m_1, C_1)$ and $\nu = \mathcal{N}(m_2, C_2)$ be two normal distributions on $\mathbb{R}^n$ with expected values $m_1, m_2 \in \mathbb{R}^n$ and (symmetric positive semidefinite) covariance matrices $C_1, C_2 \in \mathbb{R}^{n \times n}$. With the Euclidean distance as the distance function, the squared 2-Wasserstein distance between $\mu$ and $\nu$ is the squared Euclidean distance between the means plus a trace term in the covariances:

$$W_2(\mu, \nu)^2 = \|m_1 - m_2\|_2^2 + \operatorname{Tr}\left(C_1 + C_2 - 2\left(C_2^{1/2} C_1 C_2^{1/2}\right)^{1/2}\right) = \|m_1 - m_2\|_2^2 + \operatorname{Tr}(C_1) + \operatorname{Tr}(C_2) - 2 \operatorname{Tr}\left((C_1 C_2)^{1/2}\right).$$

If $C_1$ and $C_2$ commute, the trace term equals the squared Frobenius distance $\|C_1^{1/2} - C_2^{1/2}\|_F^2$ between the square roots of the covariance matrices.

For $p = 2$, this result also generalizes the previous example, since a Dirac measure can be viewed as a degenerate normal distribution whose covariance matrix is zero. The trace terms then vanish and only the distance between the expected values remains.
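The trace form of the formula translates directly into a few lines of NumPy. In this sketch the helper names `psd_sqrt` and `gaussian_w2` are hypothetical; the principal square root of a symmetric positive semidefinite matrix is taken via an eigendecomposition, which is a choice made here to avoid complex round-off, not something prescribed by the formula.

```python
import numpy as np


def psd_sqrt(c):
    """Principal square root of a symmetric positive semidefinite matrix."""
    c = np.asarray(c, dtype=float)
    c = 0.5 * (c + c.T)  # remove asymmetric round-off
    w, v = np.linalg.eigh(c)
    w = np.clip(w, 0.0, None)  # tiny negative eigenvalues are round-off
    return (v * np.sqrt(w)) @ v.T  # V diag(sqrt(w)) V^T


def gaussian_w2(m1, c1, m2, c2):
    """W_2 between N(m1, c1) and N(m2, c2) using the trace formula."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    root_c2 = psd_sqrt(c2)
    cross = psd_sqrt(root_c2 @ c1 @ root_c2)  # (C2^{1/2} C1 C2^{1/2})^{1/2}
    trace_term = np.trace(c1 + c2 - 2.0 * cross)
    w2_squared = np.sum((m1 - m2) ** 2) + trace_term
    return np.sqrt(max(w2_squared, 0.0))


# One-dimensional check: W_2^2 = (m1 - m2)^2 + (sigma1 - sigma2)^2 = 1 + 1.
print(gaussian_w2([0.0], [[1.0]], [1.0], [[4.0]]) ** 2)
```

With zero covariance matrices the function reduces to the Euclidean distance between the means, matching the Dirac example.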

Application

The Wasserstein metric is a natural way to compare the probability distributions of two variables $X$ and $Y$ when one variable is derived from the other by small, non-uniform perturbations (random or deterministic).

In computer science, for example, the metric $W_1$ is widely used to compare discrete distributions, such as the color histograms of two digital images.
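For a one-dimensional histogram, for instance a single-channel (grayscale) intensity histogram with equally spaced bins, $W_1$ with distance $|x - y|$ reduces to the $L^1$ distance between the two cumulative distribution functions. That restriction to one channel is an assumption of this sketch; full color histograms need a ground distance in color space and a general transport solver. The hypothetical `histogram_w1` uses the CDF identity and cross-checks it against `scipy.stats.wasserstein_distance`, which accepts weighted samples; the histogram counts and bin spacing are made up for illustration.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def histogram_w1(p_hist, q_hist, bin_width=1.0):
    """W_1 between two 1-D histograms defined on the same equally spaced bins."""
    p = np.asarray(p_hist, dtype=float)
    q = np.asarray(q_hist, dtype=float)
    p = p / p.sum()  # normalize counts to probability masses
    q = q / q.sum()
    # F - G is constant between neighboring bin centers; integrate |F - G|.
    cdf_gap = np.cumsum(p - q)
    return bin_width * np.abs(cdf_gap[:-1]).sum()


# Hypothetical 8-bin intensity histograms (raw counts, not yet normalized).
hist_a = [4, 9, 15, 20, 12, 6, 3, 1]
hist_b = [1, 2, 5, 10, 18, 20, 10, 4]
centers = np.arange(len(hist_a)) * 32.0  # bin centers, assumed 32 units apart

ours = histogram_w1(hist_a, hist_b, bin_width=32.0)
reference = wasserstein_distance(centers, centers, u_weights=hist_a, v_weights=hist_b)
print(ours, reference)  # the two values should agree up to round-off
```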

Properties

Metric structure

It can be shown that $W_p$ satisfies all axioms of a metric on $P_p(M)$. Moreover, convergence with respect to $W_p$ is equivalent to weak convergence of measures together with convergence of the $p$-th moments $\int_M d(x, x_0)^p \, \mathrm{d}\mu(x)$.

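For $1 \leq p \leq q < \infty$ and $\mu, \nu \in P_q(M)$ (note that $P_q(M) \subseteq P_p(M)$):

$$W_p(\mu, \nu) \leq W_q(\mu, \nu).$$

This follows from Hölder's inequality: for every coupling $\gamma \in \Gamma(\mu, \nu)$ one has $\left(\int d^p \, \mathrm{d}\gamma\right)^{1/p} \leq \left(\int d^q \, \mathrm{d}\gamma\right)^{1/q}$, and taking the infimum over $\gamma$ gives the claim.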
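A quick numerical check of this monotonicity, assuming two empirical measures on $\mathbb{R}$ with the same number $N$ of equally weighted atoms. In that special case the monotone (sorted) matching is an optimal coupling for every $p \geq 1$, so $W_p^p = \frac{1}{N}\sum_i |x_{(i)} - y_{(i)}|^p$ with both samples sorted. The sample distributions and the helper name `empirical_wp_1d` are illustrative choices.

```python
import numpy as np


def empirical_wp_1d(xs, ys, p):
    """W_p between two uniform empirical measures on R with equal sample size."""
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("both samples must have the same number of points")
    # Sorted (monotone) matching is optimal on the real line for p >= 1.
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)


rng = np.random.default_rng(seed=0)
xs = rng.normal(loc=0.0, scale=1.0, size=500)
ys = rng.exponential(scale=1.0, size=500)

for p in (1, 2, 4, 8):
    print(p, empirical_wp_1d(xs, ys, p))  # values are non-decreasing in p
```

Here the inequality is just the power-mean inequality applied to the matched distances $|x_{(i)} - y_{(i)}|$.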

Dual representation of $W_1$

If $\mu$ and $\nu$ have bounded support, then (Kantorovich–Rubinstein duality)

$$W_1(\mu, \nu) = \sup\left\{ \int_M f(x) \, \mathrm{d}(\mu - \nu)(x) \;\middle|\; f \colon M \to \mathbb{R} \text{ continuous},\ \mathrm{Lip}(f) \leq 1 \right\},$$

where $\mathrm{Lip}(f)$ denotes the smallest Lipschitz constant of $f$.
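On the real line the supremum can be written down explicitly for finitely supported measures. With sorted support points $x_0 < \dots < x_K$ and $D_k = F(x_k) - G(x_k)$ the difference of the cumulative distribution functions, the function with $f(x_0) = 0$ and $f(x_{k+1}) = f(x_k) - \operatorname{sign}(D_k)(x_{k+1} - x_k)$, extended linearly between the points, is 1-Lipschitz, and summation by parts gives $\int f \, \mathrm{d}(\mu - \nu) = \sum_k |D_k| (x_{k+1} - x_k) = W_1(\mu, \nu)$. The sketch below (hypothetical helper `dual_potential_1d`, illustrative weights) builds this $f$ and compares its value with the primal $W_1$ computed by `scipy.stats.wasserstein_distance`.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def dual_potential_1d(points, mu, nu):
    """Values of a 1-Lipschitz f on sorted points attaining the dual supremum."""
    gaps = np.diff(points)
    cdf_gap = np.cumsum(mu - nu)[:-1]  # D_k = F(x_k) - G(x_k) for k < K
    steps = -np.sign(cdf_gap) * gaps   # |f(x_{k+1}) - f(x_k)| <= x_{k+1} - x_k
    return np.concatenate([[0.0], np.cumsum(steps)])


points = np.array([0.0, 0.5, 2.0, 3.0, 4.5])  # common sorted support
mu = np.array([0.3, 0.1, 0.2, 0.3, 0.1])
nu = np.array([0.1, 0.3, 0.3, 0.0, 0.3])

f = dual_potential_1d(points, mu, nu)
dual_value = np.dot(f, mu - nu)  # integral of f d(mu - nu)
primal_value = wasserstein_distance(points, points, mu, nu)
lipschitz = np.max(np.abs(np.diff(f)) / np.diff(points))
print(dual_value, primal_value, lipschitz)  # dual and primal agree; Lipschitz constant at most 1
```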

This can be compared with the definition of the Radon metric:

$$\rho(\mu, \nu) := \sup\left\{ \int_M f(x) \, \mathrm{d}(\mu - \nu)(x) \;\middle|\; f \colon M \to [-1, 1] \text{ continuous} \right\}.$$

If the metric $d$ is bounded by a constant $C$, then

$$2 W_1(\mu, \nu) \leq C \rho(\mu, \nu).$$

Thus convergence in the Radon metric implies convergence with respect to $W_1$. The converse does not hold in general.
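Both statements can be illustrated with finitely supported measures on $\mathbb{R}$. On a finite set every function is continuous, so the supremum in $\rho$ is attained by $f = \operatorname{sign}(\mu - \nu)$ pointwise and $\rho(\mu, \nu) = \sum_i |\mu_i - \nu_i|$. The pair $\delta_0, \delta_\varepsilon$ shows why the converse fails: $W_1(\delta_0, \delta_\varepsilon) = \varepsilon \to 0$, while $\rho(\delta_0, \delta_\varepsilon) = 2$ for every $\varepsilon \neq 0$. The support points, weights, and the helper name `radon_distance_finite` are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def radon_distance_finite(mu, nu):
    """rho(mu, nu) for weights on a common finite support: sup attained at f = sign(mu - nu)."""
    return np.abs(np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)).sum()


support = np.array([0.0, 0.5, 2.0, 3.0])
mu = np.array([0.4, 0.1, 0.3, 0.2])
nu = np.array([0.1, 0.2, 0.2, 0.5])
C = support.max() - support.min()  # d(x, y) <= C on this support

w1 = wasserstein_distance(support, support, mu, nu)
rho = radon_distance_finite(mu, nu)
print(2 * w1, C * rho)  # 2 W_1 <= C rho

# Converse fails: W_1(delta_0, delta_eps) -> 0, but rho stays 2.
for eps in (1.0, 0.1, 0.001):
    w1_eps = wasserstein_distance([0.0], [eps])
    # Common support {0, eps}: mu = delta_0 has weights (1, 0), nu = delta_eps has (0, 1).
    rho_eps = radon_distance_finite([1.0, 0.0], [0.0, 1.0])
    print(eps, w1_eps, rho_eps)
```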

Separability and completeness

For every $p \geq 1$, the metric space $(P_p(M), W_p)$ is separable and complete if $(M, d)$ is separable and complete.

Literature

Ambrosio, L., Gigli, N. & Savaré, G.: Gradient Flows in Metric Spaces and in the Space of Probability Measures. ETH Zurich, Birkhäuser Verlag, Basel 2005, ISBN 3-7643-2428-7.
Richard Jordan, David Kinderlehrer, Felix Otto: The variational formulation of the Fokker-Planck equation. In: SIAM J. Math. Anal. 29, No. 1, 1998, ISSN 0036-1410, pp. 1-17 (electronic). doi:10.1137/S0036141096303359.

References

Facundo Mémoli: Gromov-Wasserstein Distances and the Metric Approach to Object Matching. In: Foundations of Computational Mathematics. April 2011, pp. 427-430. doi:10.1007/s10208-011-9093-5.
Olkin, I. and Pukelsheim, F.: The distance between two random vectors with given dispersion matrices. In: Linear Algebra Appl. 48, 1982, ISSN 0024-3795, pp. 257-263. doi:10.1016/0024-3795(82)90112-4.
Dowson, D. C. and Landau, B. V.: The Fréchet Distance between Multivariate Normal Distributions. In: J. of Multivariate Analysis. 12, No. 3, 1982, ISSN 0047-259X, pp. 450-455. doi:10.1016/0047-259X(82)90077-X.
Bogachev, V. I., Kolesnikov, A. V.: The Monge-Kantorovich problem: achievements, connections, and perspectives. In: Russian Math. Surveys. 67, September 2012, pp. 785-890. doi:10.1070/RM2012v067n05ABEH004808.