# Lorenz curve

Use of the Lorenz curve to illustrate the distribution of income: in the example shown by the solid curve, the poorest 50% of households have around 27% of total income, and the poorest 80% have around 60% of total income. This also means that the remaining 40% of income goes to the richest 20% of households. The dashed curve shows a more unequal distribution of income: here, the poorest 50% have only around 15% of the income.

The Lorenz curve was developed in 1905 by the American statistician and economist Max Otto Lorenz (1876-1959). It shows statistical distributions graphically and illustrates the extent of disparity (inequality) or relative concentration within the distribution; it is therefore also called the disparity curve. Official statistics use the Lorenz curve to illustrate the distribution of income in a country. These calculations are based on a list of individual incomes or assets, sorted in ascending order from left to right (see also: Pen's Parade).

## Structure and explanation

The Lorenz curve is a function in the unit square of the first quadrant. It shows which share of the total feature sum is attributable to which share of the population of feature carriers. The $x$-axis (abscissa) shows the shares in the totality of the feature carriers (for example: population), and the $y$-axis (ordinate) shows the shares in the total feature sum (for example: income). The data are first sorted in ascending order, starting with the lowest share of the feature sum, and then cumulated ("summed up"). This creates the characteristic "belly" of the Lorenz curve below the diagonal, which reflects the extent of the unequal distribution. Each point on the Lorenz curve stands for a statement such as "the bottom 20% of all households receive 10% of total income" (see: Pareto principle).

A perfectly equal distribution of income would be one in which everyone has the same income. In this case the bottom $N\,\%$ of society would always have $N\,\%$ of the income. This is illustrated by the straight line $y = x$, which is called the line of perfect equality. In contrast, perfect inequality would be a distribution in which one person has all the income and all other people have none. In this case the curve would be $y = 0\,\%$ for all $x < 100\,\%$ and $y = 100\,\%$ at $x = 100\,\%$. This curve is called the line of perfect inequality.

The Gini coefficient is the ratio of the area between the line of perfect equality and the observed Lorenz curve to the total area under the line of perfect equality. The Gini coefficient is therefore a number between 0 and 1; the higher it is, the more unequal the distribution.

## Calculation

### Discrete case

The Lorenz curve is defined as a piecewise linear curve (i.e. a polygonal chain) through the points $(0, 0), (u_1, v_1), (u_2, v_2), \ldots, (u_n, v_n), (1, 1)$. If $x_j$ are the shares in the totality of the feature carriers and $y_j$ the shares in the total feature sum (for these terms, see "Structure and explanation" above), then the coordinates of the points for $i = 1, \ldots, n$ are defined as:

$$u_i = \sum\limits_{j=1}^{i} x_j$$

and

$$v_i = \sum\limits_{j=1}^{i} y_j\,.$$

#### General case

The Lorenz curve can often be represented by a function $L(F)$, with $F$ plotted on the abscissa and $L$ on the ordinate.

For a population of size $n$ with a sequence of values $y_i$, $i = 1, 2, \ldots, n$, indexed in ascending order $(y_i \leq y_{i+1})$, the Lorenz curve is the continuous, piecewise linear function that connects the points $(F_i, L_i)$, $i = 0, 1, \ldots, n$, where $F_0 = 0$, $L_0 = 0$, and for $i = 1, 2, \ldots, n$:

$$F_i = \frac{i}{n}\,,$$
$$S_i = \sum\limits_{j=1}^{i} y_j\,,$$
$$L_i = \frac{S_i}{S_n}\,.$$

Here $S_i$ is the cumulative sum of the $i$ smallest values. It is not the Lorenz asymmetry coefficient, which is a separate summary measure of the shape of the curve.
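The construction of the points $(F_i, L_i)$ from raw values can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `lorenz_points` and the sample incomes are assumptions made for this example.

```python
from itertools import accumulate


def lorenz_points(values):
    """Return the points (F_i, L_i), i = 0..n, for non-negative values."""
    ys = sorted(values)                    # ascending order: y_i <= y_{i+1}
    if not ys:
        raise ValueError("at least one value is required")
    n = len(ys)
    partial_sums = list(accumulate(ys))    # S_i = y_1 + ... + y_i
    total = partial_sums[-1]               # S_n
    if total <= 0:
        raise ValueError("the total feature sum must be positive")
    points = [(0.0, 0.0)]                  # (F_0, L_0)
    for i, s_i in enumerate(partial_sums, start=1):
        points.append((i / n, s_i / total))  # F_i = i/n, L_i = S_i/S_n
    return points


incomes = [40, 10, 30, 20]                 # illustrative data, not from the article
for F_i, L_i in lorenz_points(incomes):
    print(f"F = {F_i:.2f}, L = {L_i:.2f}")
```

Sorting inside the function means the caller does not have to supply the values in ascending order.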

For a discrete probability function $f(y)$, let $y_i$, $i = 1, 2, \ldots, n$, be the points with nonzero probabilities, indexed in ascending order $(y_i < y_{i+1})$. The Lorenz curve is the continuous, piecewise linear function that connects the points $(F_i, L_i)$, $i = 0, 1, \ldots, n$, where $F_0 = 0$, $L_0 = 0$, and for $i = 1, 2, \ldots, n$:

$$F_i = \sum\limits_{j=1}^{i} f(y_j)\,,$$
$$S_i = \sum\limits_{j=1}^{i} f(y_j)\, y_j\,,$$
$$L_i = \frac{S_i}{S_n}\,.$$

For the discrete uniform distribution, i.e. $f(y_i) = \frac{1}{n}$ for all $i = 1, \ldots, n$, one obtains exactly the above formulas for $F_i$ and $L_i$.
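The probability-weighted variant can be sketched in the same way. In the example below, the function name `lorenz_points_pmf` and the dictionary representation of $f(y)$ are assumptions chosen for illustration; the last lines check that equal probabilities $1/n$ reproduce $F_i = i/n$ and $L_i = S_i/S_n$.

```python
from itertools import accumulate


def lorenz_points_pmf(pmf):
    """pmf maps each value y_i to its probability f(y_i); returns (F_i, L_i)."""
    ys = sorted(y for y, p in pmf.items() if p > 0)   # points with nonzero probability
    F = list(accumulate(pmf[y] for y in ys))          # F_i = sum of f(y_j)
    S = list(accumulate(pmf[y] * y for y in ys))      # S_i = sum of f(y_j) * y_j
    return [(0.0, 0.0)] + [(F_i, S_i / S[-1]) for F_i, S_i in zip(F, S)]


# Two-point distribution (illustrative): half the mass at 1, half at 3.
two_point = lorenz_points_pmf({1: 0.5, 3: 0.5})

# Equal probabilities 1/n on four values: same points as for the raw sample.
uniform = lorenz_points_pmf({y: 0.25 for y in (10, 20, 30, 40)})
print(two_point)
print(uniform)
```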

For a probability density function $f(x)$ with the cumulative distribution function $F(x)$, the Lorenz curve $L[F(x)]$ is defined by:

$$L[F(x)] = \frac{\int\limits_{-\infty}^{x} t\, f(t)\, \mathrm{d}t}{\int\limits_{-\infty}^{\infty} t\, f(t)\, \mathrm{d}t}\,.$$

For a cumulative distribution function $F(x)$ with the inverse function $x(F)$, the Lorenz curve $L(F)$ is given by:

$$L(F) = \frac{\int\limits_{0}^{F} x(F_1)\, \mathrm{d}F_1}{\int\limits_{0}^{1} x(F_1)\, \mathrm{d}F_1}\,.$$

The inverse function $x(F)$ may not exist because the cumulative distribution function has jump points (discontinuities) or intervals of constant values. The previous equation remains valid if one defines $x(F_1)$ more generally by the following formula:

$$x(F_1) = \inf\left\{y \colon F(y) \geq F_1\right\}\,.$$
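As a numerical illustration of the quantile-function formula, the following sketch approximates $L(F)$ with a midpoint rule. The exponential distribution is an assumption chosen for the example because its quantile function $x(F) = -\ln(1 - F)/\lambda$ is known in closed form and its Lorenz curve is $L(F) = F + (1 - F)\ln(1 - F)$; the function names are hypothetical.

```python
import math


def lorenz_from_quantile(quantile, F, steps=20_000):
    """Approximate L(F) = int_0^F x(p) dp / int_0^1 x(p) dp (midpoint rule)."""
    def integral(upper):
        h = upper / steps
        # Midpoints avoid evaluating the quantile function at p = 1.
        return h * sum(quantile((k + 0.5) * h) for k in range(steps))
    return integral(F) / integral(1.0)


def exponential_quantile(p, rate=1.0):
    """Inverse CDF of the exponential distribution (assumed example)."""
    return -math.log(1.0 - p) / rate


F = 0.5
approx = lorenz_from_quantile(exponential_quantile, F)
exact = F + (1.0 - F) * math.log(1.0 - F)
print(approx, exact)
```

Because the quantile function is only evaluated at interior midpoints, the same routine also works with a generalized inverse that has jumps.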

#### Gastwirth's definition

Consider a non-negative random variable $X$ with the associated normalized quantile function $Q^*$. Following Joseph Lewis Gastwirth, the mapping

$$
\begin{aligned}
L \colon [0,1] &\rightarrow [0,1] \\
\alpha &\mapsto L(\alpha) = \int\limits_{0}^{\alpha} Q^*(\tilde{\alpha})\, \mathrm{d}\tilde{\alpha}
\end{aligned}
$$

is referred to as the (continuous) Lorenz curve of $X$, or of the distribution of $X$.

## Properties

The Lorenz curve has the following properties:

• It always begins at the origin $(0, 0)$ and ends at the point $(1, 1)$.
• The derivative of the curve is monotonically increasing, which is why the curve itself is convex and lies below the diagonal.
• The Lorenz curve is continuous on the open interval $(0, 1)$, and in the discrete case even piecewise linear.

The Lorenz curve is not defined if the mean of the probability distribution is zero or infinite.

The Lorenz curve for a probability distribution is a continuous function. However, Lorenz curves representing discontinuous functions can be constructed as limits of the Lorenz curves of probability distributions; the line of perfect inequality is an example.

The data of a Lorenz curve can be summarized by the Gini coefficient and the Lorenz asymmetry coefficient.

The Lorenz curve is invariant under positive scaling. If $X$ is a random variable, then for every positive number $c$ the random variable $cX$ has the same Lorenz curve as $X$, where the Lorenz curve of a random variable is understood to be that of the associated distribution.

The Lorenz curve is not invariant under translations, i.e. under a constant shift of the values. If $X$ is a random variable with Lorenz curve $L_X(F)$ and mean $\mu_X$, then the Lorenz curve $L_{X+c}(F)$ of the shifted random variable $X + c$, with a fixed constant $c \neq -\mu_X$, satisfies:

$$F - L_{X+c}(F) = \frac{\mu_X}{\mu_X + c}\,[F - L_X(F)]$$
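Both the scaling invariance and the translation formula can be checked numerically on a small sample. The sketch below evaluates the curves at the vertices $F = i/n$; the helper `lorenz_ordinates`, the sample and the constants are assumptions made for illustration.

```python
from itertools import accumulate


def lorenz_ordinates(values):
    """Return [L_0, ..., L_n] for a sample with positive total."""
    ys = sorted(values)
    sums = list(accumulate(ys))
    return [0.0] + [s / sums[-1] for s in sums]


xs = [1.0, 2.0, 3.0, 10.0]   # illustrative sample
c = 5.0                      # shift; must differ from -mean
n = len(xs)
mu = sum(xs) / n

L_x = lorenz_ordinates(xs)
L_scaled = lorenz_ordinates([3.0 * x for x in xs])   # same curve as L_x
L_shifted = lorenz_ordinates([x + c for x in xs])    # flatter curve for c > 0

# Left and right side of the translation formula at each vertex F = i/n.
gap_shifted = [i / n - L_shifted[i] for i in range(n + 1)]
gap_predicted = [mu / (mu + c) * (i / n - L_x[i]) for i in range(n + 1)]
print(gap_shifted)
print(gap_predicted)
```

A positive shift shrinks the gap to the diagonal by the factor $\mu_X / (\mu_X + c)$, so adding the same amount to every value reduces measured disparity.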

For a cumulative distribution function $F(x)$ with the mean $\mu$ and the (generalized) inverse function $x(F)$, the following holds for each $F$ with $0 < F < 1$:

• If the Lorenz curve is differentiable, the following applies:
$$\frac{\mathrm{d}L(F)}{\mathrm{d}F} = \frac{x(F)}{\mu}\,;$$
• If the Lorenz curve is twice differentiable, then the probability density function $f(x)$ exists at this point and:
$$\frac{\mathrm{d}^2 L(F)}{\mathrm{d}F^2} = \frac{1}{\mu f[x(F)]}\,;$$
• If $L(F)$ is continuously differentiable, then the tangent of $L(F)$ is parallel to the line of perfect equality at the point $F(\mu)$. This is also the point at which the equality discrepancy $F - L(F)$, the vertical distance between the Lorenz curve and the line of perfect equality, is greatest. The size of the discrepancy is equal to half the relative mean absolute deviation:
$$F(\mu) - L[F(\mu)] = \frac{\text{mean absolute deviation}}{2\mu}\,.$$
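The last property can be illustrated on sample data. For a piecewise linear Lorenz curve the largest vertical distance to the diagonal is attained at a vertex, so it suffices to scan the points $(F_i, L_i)$. The sketch below is an illustration under that assumption, with hypothetical function names and sample values.

```python
from itertools import accumulate


def max_discrepancy(values):
    """Largest vertical distance F_i - L_i between diagonal and Lorenz curve."""
    ys = sorted(values)
    n = len(ys)
    sums = list(accumulate(ys))
    return max(i / n - s / sums[-1] for i, s in enumerate(sums, start=1))


def half_relative_mean_abs_deviation(values):
    """Mean absolute deviation from the mean, divided by twice the mean."""
    n = len(values)
    mu = sum(values) / n
    mad = sum(abs(v - mu) for v in values) / n
    return mad / (2.0 * mu)


sample = [1.0, 2.0, 3.0, 10.0]   # illustrative data
print(max_discrepancy(sample), half_relative_mean_abs_deviation(sample))
```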

The Lorenz curve of a random variable $X$ is mirrored at the point $(0.5, 0.5)$ when passing from $X$ to $-X$, that is, with the terms introduced above:

$$L_{-X}(F) = 1 - L_X(1 - F)\,.$$

### Extreme cases

The more evenly the feature sum is distributed among the feature carriers, the closer the Lorenz curve approaches the diagonal. In the extreme case of a perfectly equal distribution (statistically, a one-point distribution), it coincides with the diagonal.

With greater disparity, the curve moves downwards towards the abscissa. In the extreme case of a maximally unequal distribution (one feature carrier holds the entire feature sum), the Lorenz curve runs along the abscissa up to $1 - \tfrac{1}{n}$ and from there to the point $(1, 1)$.

### Continuous and discretely classified data

The exact shape of the Lorenz curve depends on the type of data for the feature. In principle, continuous data (see example image above) must be distinguished from discrete data. In the second case, the Lorenz curve is a polygonal line through the points $(F_j, L_j)$.

## Measurement of the relative concentration (disparity)

The Lorenz curve offers a graphical way of looking at the extent of disparity within a distribution. The more the curve bulges downwards, the greater the disparity (see section Extreme cases). If two Lorenz curves intersect, however, the graphic can no longer be used to determine unambiguously which one shows the greater disparity. Measurement by graphic is also too imprecise. Precise values are provided by the Gini coefficient and the coefficient of variation. The Gini coefficient is directly related to the Lorenz curve: it is twice the area between the Lorenz curve and the diagonal in the unit square.
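For a piecewise linear Lorenz curve, the relation between the Gini coefficient and the area can be computed exactly with the trapezoid rule. The following sketch assumes the curve is given as a list of points $(F_i, L_i)$ starting at $(0, 0)$ and ending at $(1, 1)$; the function name and the sample points are illustrative.

```python
def gini_from_points(points):
    """Gini coefficient = twice the area between diagonal and Lorenz curve."""
    area_under_curve = sum(
        (f1 - f0) * (l0 + l1) / 2.0          # trapezoid between two vertices
        for (f0, l0), (f1, l1) in zip(points, points[1:])
    )
    # The area under the diagonal is 1/2, so 2 * (1/2 - A) = 1 - 2A.
    return 1.0 - 2.0 * area_under_curve


# Lorenz points of the illustrative sample 10, 20, 30, 40.
sample_curve = [(0.0, 0.0), (0.25, 0.1), (0.5, 0.3), (0.75, 0.6), (1.0, 1.0)]
diagonal = [(0.0, 0.0), (1.0, 1.0)]

print(gini_from_points(sample_curve))
print(gini_from_points(diagonal))
```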

## Example table for discretely classified data

A data collection for $n = 5$ classes, labelled by an index $j = 1, \ldots, n$, gave the relative frequencies $f_j$ (the share of the feature carriers of class $j$ in the total number of feature carriers) and the shares $h_j$ of the feature sum allotted to class $j$ shown in the table below. From these we determine

• $F_j$: cumulative $f_j$ (cumulative relative frequency),
• $L_j$: cumulative $h_j$ (cumulative disparity).

| Index $j$ | Relative frequency $f_j$ | Cumulative relative frequency $F_j$ | Disparity $h_j$ | Cumulative disparity $L_j$ |
|---|---|---|---|---|
| 1 | 0.2 | 0.2 | 0.00 | 0.00 |
| 2 | 0.4 | 0.6 | 0.05 | 0.05 |
| 3 | 0.1 | 0.7 | 0.15 | 0.20 |
| 4 | 0.1 | 0.8 | 0.30 | 0.50 |
| 5 | 0.2 | 1.0 | 0.50 | 1.00 |

Explanation:

The Lorenz curve is created by plotting $F_j$ on the abscissa and $L_j$ on the ordinate and connecting the points with a line.
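The cumulative columns of the table can be reproduced with running sums. The sketch below uses the $f_j$ and $h_j$ values from the table; the variable names are chosen for illustration only.

```python
from itertools import accumulate

f = [0.2, 0.4, 0.1, 0.1, 0.2]        # relative frequencies f_j from the table
h = [0.00, 0.05, 0.15, 0.30, 0.50]   # shares of the feature sum h_j from the table

F = list(accumulate(f))              # cumulative relative frequencies F_j
L = list(accumulate(h))              # cumulative disparity L_j

# Points of the Lorenz curve, starting at the origin.
points = [(0.0, 0.0)] + list(zip(F, L))
for F_j, L_j in points:
    print(f"{F_j:.2f}  {L_j:.2f}")
```

Rounding in the output hides the small floating-point errors that arise when decimal fractions are summed.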

The article on Pareto distribution contains another example of a Lorenz curve.

## Rothschild and Stiglitz theorem

Let two distributions $\left(x_1, \ldots, x_n\right)$ and $\left(x_1^*, \ldots, x_n^*\right)$ be given with $\sum x_v = \sum x_v^*$. The Lorenz curve of $\left(x_1, \ldots, x_n\right)$ lies above the Lorenz curve of $\left(x_1^*, \ldots, x_n^*\right)$ if and only if, for every symmetric and quasi-convex function $F$:

$$F\left(x_1, \ldots, x_n\right) \leq F\left(x_1^*, \ldots, x_n^*\right)\,.$$

Conclusion: If two Lorenz curves intersect, it depends on the choice of the symmetric and quasi-convex function $F$ which of the two curves is described as the one with the greater inequality.

## Length

The length $L_L$ of the Lorenz curve can also be used as a measure of disparity (measure of relative concentration). On the domain $\mathbb{D}_{L_L} = \left\{x \in \mathbb{R} \mid 0 \leq x \leq 1\right\}$, its range is $\mathbb{W}_{L_L} = \left\{L \in \mathbb{R} \mid \sqrt{2} \leq L \leq 2\right\}$.

### Discrete case

In the discrete case, the length is obtained from the discrete Lorenz curve by adding up the lengths of its line segments. Here $F(x_j)$ and $g(x_j)$ denote the abscissa and the ordinate of the $j$-th point of the curve, with the convention $F(x_0) = g(x_0) = 0$. The following applies to the length of the discrete Lorenz curve:

$$
\begin{aligned}
L_L &= \sqrt{F(x_1)^2 + g(x_1)^2} + \sqrt{\left[F(x_2) - F(x_1)\right]^2 + \left[g(x_2) - g(x_1)\right]^2} + \ldots + \sqrt{\left[F(x_n) - F(x_{n-1})\right]^2 + \left[g(x_n) - g(x_{n-1})\right]^2} \\
&= \sum\limits_{j=1}^{n} \sqrt{\left[F(x_j) - F(x_{j-1})\right]^2 + \left[g(x_j) - g(x_{j-1})\right]^2}\,.
\end{aligned}
$$

In the case of a perfectly equal distribution, $L_L = \sqrt{1^2 + 1^2} = \sqrt{2}$. In the case of absolute concentration on a single feature carrier, the length approaches $L_L = 1 + 1 = 2$ for large $n$.
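The segment sum can be sketched directly from a list of curve points. The function name `lorenz_length` and the two test curves are assumptions for illustration; the second curve is the extreme case in which one of $n = 1000$ carriers holds the entire feature sum.

```python
import math


def lorenz_length(points):
    """Sum of the Euclidean lengths of the segments between consecutive points."""
    return sum(
        math.hypot(f1 - f0, l1 - l0)
        for (f0, l0), (f1, l1) in zip(points, points[1:])
    )


equal = [(0.0, 0.0), (1.0, 1.0)]                  # curve coincides with the diagonal
n = 1000
one_holder = [(0.0, 0.0), (1 - 1 / n, 0.0), (1.0, 1.0)]

print(lorenz_length(equal))        # lower bound sqrt(2)
print(lorenz_length(one_holder))   # close to the upper bound 2
```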

The length of a continuous, differentiable Lorenz curve [between the points $A(0, 0)$ and $B(1, 1)$] is calculated from the first derivative of the Lorenz curve function $L(F)$ as follows:

$$L_L(a) = \int\limits_{0}^{a} \sqrt{1 + \left[L'(F)\right]^2}\, \mathrm{d}F$$

with $a \in [0, 1]$.
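The arc-length integral can be approximated numerically. As an assumed example, the sketch uses $L(F) = F^2$, the Lorenz curve of a uniform distribution on $[0, 1]$, whose derivative is $L'(F) = 2F$; for this curve the integral has the closed form $\tfrac{\sqrt{5}}{2} + \tfrac{1}{4}\operatorname{arsinh}(2)$. The function name `curve_length` is hypothetical.

```python
import math


def curve_length(dL, a=1.0, steps=10_000):
    """Midpoint-rule approximation of int_0^a sqrt(1 + L'(F)^2) dF."""
    h = a / steps
    return h * sum(
        math.sqrt(1.0 + dL((k + 0.5) * h) ** 2) for k in range(steps)
    )


length = curve_length(lambda F: 2.0 * F)                  # L(F) = F^2
closed_form = math.sqrt(5.0) / 2.0 + math.asinh(2.0) / 4.0
print(length, closed_form)
```

The result lies between $\sqrt{2}$ and $2$, consistent with the range stated for $L_L$.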

## Applications

### Economics

In economics, the Lorenz curve is used to graphically represent the cumulative distribution function of the empirical probability distribution of wealth; it is a graph showing the proportion of the distribution assumed by the bottom $y\,\%$ of the values. It is often used to show the distribution of income, illustrating what share $y\,\%$ of total income is held by the bottom $x\,\%$ of households. The share of households is plotted on the abscissa, the share of income on the ordinate. It can also be used to show the distribution of assets. In this sense, many economists view the Lorenz curve as a measure of social inequality. It was developed in 1905 by Max O. Lorenz to illustrate the inequality of income distribution.

In addition to illustrating income distribution, the Lorenz curve is also used to represent market power or spatial distributions (compare: segregation ).

The Lorenz curve is also used in ABC analysis in logistics, where it illustrates the distribution of goods, arranged by a classification property (e.g. value) and consumption quantity.

The Lorenz curve can also be used in business models, for example in consumer finance to measure the actual percentage $Y\,\%$ of delinquencies (non-payment when due) attributable to the $X\,\%$ of consumers with the worst predicted risk or credit scores.

### Ecology

The concept of the Lorenz curve is helpful for describing the inequality between the numbers of individuals in ecology and is used in research studies on biodiversity by comparing the cumulative proportions of animal species with the cumulative proportions of individuals.

## Concentration and disparity

Disparity (Lorenz curve) and (absolute) concentration (concentration curve) are related measures, but they describe different things. While the Lorenz curve shows which share of the feature sum (ordinate) is attributable to which share of the feature carriers (abscissa), the concentration curve shows which share of the feature sum (ordinate) is attributable to which number of feature carriers (abscissa). This means that the Lorenz curve compares proportions with proportions, whereas the concentration curve compares proportions with absolute numbers (abscissa). High disparity with low concentration, or high concentration with low disparity, can therefore occur at the same time. The following example illustrates this:

Suppose several companies share a market. In the table, the cases of high and low disparity or concentration are played through with (fictitious) absolute numbers $x$:

| | High disparity | Low disparity |
|---|---|---|
| High concentration | $x_1 = 90$, $x_2 = x_3 = 5$ | $x_1 = 34$, $x_2 = x_3 = 33$ |
| Low concentration | $x_1 = \ldots = x_{400} = 0.9$, $x_{401} = \ldots = x_{1000} = 0.01$ | $x_1 = \ldots = x_{400} = 0.9$, $x_{401} = \ldots = x_{1000} = 0.89$ |
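The four cases can be compared numerically. In the sketch below, disparity is measured by the Gini coefficient (twice the area between the Lorenz curve and the diagonal). For concentration, the Herfindahl index (sum of squared market shares) is used as a stand-in; this choice is an assumption made for illustration and is not taken from the text above, as are the function names.

```python
def gini(values):
    """Gini coefficient via the trapezoid rule on the discrete Lorenz curve."""
    ys = sorted(values)
    n, total = len(ys), sum(ys)
    area, cum = 0.0, 0.0
    for y in ys:
        area += (2.0 * cum + y) / (2.0 * total * n)   # trapezoid of width 1/n
        cum += y
    return 1.0 - 2.0 * area


def herfindahl(values):
    """Sum of squared shares; assumed here as a measure of absolute concentration."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


cases = {
    "high concentration, high disparity": [90, 5, 5],
    "high concentration, low disparity": [34, 33, 33],
    "low concentration, high disparity": [0.9] * 400 + [0.01] * 600,
    "low concentration, low disparity": [0.9] * 400 + [0.89] * 600,
}
for name, xs in cases.items():
    print(f"{name}: Gini = {gini(xs):.3f}, Herfindahl = {herfindahl(xs):.4f}")
```

With three companies the Herfindahl index cannot fall below $1/3$, whereas with a thousand companies it is close to zero regardless of how unequal their shares are.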

## Literature

• Joseph Lewis Gastwirth: A General Definition of the Lorenz Curve. In: Econometrica, Vol. 39, No. 6, New York, Nov. 1971, pp. 1037-1039.
• Josef Bleymüller, Günther Gehlert, Herbert Gülicher: Statistics for economists. WiSt study course. 10th edition. Verlag Franz Vahlen, Munich 1996. ISBN 978-3-8006-2081-4 (3-8006-2081-2). 244 pp.