Multidimensional scaling

Multidimensional scaling (also similarity structure analysis, abbreviated MDS) is a family of methods of multivariate statistics. Its formal goal is to arrange objects spatially in such a way that the distances between the objects in the space correspond as precisely as possible to the collected dissimilarities or similarities: the further apart two objects are, the more dissimilar they are, and the closer together they are, the more similar they are. Information about pairs of objects is thus collected in order to determine metric information about the objects from it.

The solution of multidimensional scaling, the so-called configuration, is usually estimated in two or three dimensions, which makes it easier to interpret. In principle, the configuration for $n$ objects can be determined in a space of up to $n-1$ dimensions. In addition to the spatial configuration of the objects, multidimensional scaling provides a number of indicators (e.g. Stress1, S-Stress, ALSCAL, the coefficient of determination, etc.) that assess the quality of the configuration.

Multidimensional scaling goes back to the psychologist Warren S. Torgerson (publications 1952–1968). The most important statistical procedures are metric multidimensional scaling and non-metric multidimensional scaling according to Kruskal.

One application example for multidimensional scaling is property fitting in marketing .

Different procedures of MDS

Among the various MDS methods, a general distinction can be made between those for square matrices and those for rectangular matrices. For data referred to as matrix-conditional, at most the values within one matrix can be compared with one another; for row-conditional data, correspondingly, only the values within one row.

Three model constellations can be distinguished:

  • simple MDS: one matrix and one configuration (a perceptual space common to all subjects is assumed, which is not checked by the model)
  • repeated MDS: more than one matrix but still only one configuration (same hypothesis as with simple MDS, but here it is checked by the model)
  • INDSCAL: more than one matrix and more than one configuration; more precisely, compression and/or stretching factors are assigned to each individual matrix for each dimension and applied to a common configuration. A perceptual space common to all subjects is assumed, whose dimensions, however, are individually weighted as being of different importance; this is checked by the procedure.

The procedures for row-conditional data include:

  • Anchor-point method: one object serves as a reference point for all other objects. The matrix is then square, but asymmetric and therefore row-conditional.
  • Multidimensional unfolding (MDU): not an object, but each subject is interpreted as an anchor point.

Metric multidimensional scaling

The aim of metric multidimensional scaling is to arrange objects with distances $d_{ij}$ in high-dimensional space in a lower, $q$-dimensional space in such a way that the Euclidean distances $\hat d_{ij}$ in this space match the distances $d_{ij}$ as exactly as possible. The resulting configuration can be interpreted easily using the Euclidean metric, since distances between objects correspond to their straight-line distance.

In addition to Euclidean distance measures, the metrics used in factor analyses are also common. The Manhattan metric, among others, is used in discrete models.

If similarity measures $s_{ij}$ between the objects are given as starting values instead of distances, these can be translated into distances by the transformation

$d_{ij} = \sqrt{s_{ii} - 2 s_{ij} + s_{jj}}.$
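
A minimal Python sketch of this transformation, assuming a symmetric similarity matrix with the self-similarities on the diagonal (the function name is ours):

    import numpy as np

    def similarities_to_distances(S):
        # d_ij = sqrt(s_ii - 2*s_ij + s_jj); clip tiny negatives from rounding
        diag = np.diag(S)
        return np.sqrt(np.maximum(diag[:, None] - 2 * S + diag[None, :], 0.0))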

Algorithm

The procedure of metric multidimensional scaling can be described in four steps (a Python sketch follows the list):

  1. Define the matrix $A = (a_{ij})$ with $a_{ij} = -\tfrac{1}{2} d_{ij}^2$.
  2. Define the matrix $B = (b_{ij})$ with $b_{ij} = a_{ij} - \bar a_{i\cdot} - \bar a_{\cdot j} + \bar a_{\cdot\cdot}$, where $\bar a_{i\cdot}$ denotes the mean of row $i$, $\bar a_{\cdot j}$ the mean of column $j$, and $\bar a_{\cdot\cdot}$ the mean of all elements of $A$.
  3. Determine the largest eigenvalues $\lambda_k$ and associated eigenvectors $v_k$ of the matrix $B$ with the property $v_k^\top v_k = \lambda_k$.
  4. The coordinates of the points in the $q$-dimensional target space are then obtained from the eigenvectors corresponding to the $q$ largest eigenvalues: $x_i = (v_{1i}, \dots, v_{qi})$.
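
The four steps translate into a short Python sketch, assuming a symmetric matrix of pairwise distances as input (the function name classical_mds is ours):

    import numpy as np

    def classical_mds(D, q=2):
        # Step 1: A = (a_ij) with a_ij = -1/2 * d_ij^2
        A = -0.5 * D ** 2
        # Step 2: double centering,
        # b_ij = a_ij - mean(row i) - mean(column j) + mean(A)
        B = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
        # Step 3: eigenvalues and eigenvectors of the symmetric matrix B
        eigvals, eigvecs = np.linalg.eigh(B)      # ascending order
        order = np.argsort(eigvals)[::-1]         # largest eigenvalues first
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        # Step 4: coordinates from the q largest eigenvalues; scaling the unit
        # eigenvectors by sqrt(lambda_k) yields v_k^T v_k = lambda_k
        return eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q], 0.0))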

Example

The distances (in km) of the fastest road connections between various cities are given, and the coordinates of the cities are sought.

           Berlin  Frankfurt  Hamburg  Cologne  Munich
Berlin          0        548      289      576     586
Frankfurt     548          0      493      195     392
Hamburg       289        493        0      427     776
Cologne       576        195      427        0     577
Munich        586        392      776      577       0

Metric multidimensional scaling for a configuration in two dimensions, computed with statistical software, yields:

City           X        Y
Berlin      0.8585  −1.1679
Frankfurt  −0.6363   0.6660
Hamburg     1.5036   0.0800
Cologne    −0.0438   1.1760
Munich     −1.6821  −0.7542

[Figure: graphical configuration of the cities]

The configuration found is unique up to rotation and scaling (see the sketch below):

  • Every rotated solution naturally yields the same (Euclidean) distances between the cities, so these solutions are equivalent.
  • Owing to the normalization in the algorithm, uniformly multiplying the distances of all cities from the origin yields the same coordinates for the cities.
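
For illustration, the configuration can be recomputed with the classical_mds sketch from the algorithm section; up to rotation, reflection and scaling of the axes, the result corresponds to the configuration in the table above (the absolute scale differs, since the sketch returns coordinates on the scale of the input distances):

    import numpy as np

    cities = ["Berlin", "Frankfurt", "Hamburg", "Cologne", "Munich"]
    D = np.array([[  0, 548, 289, 576, 586],
                  [548,   0, 493, 195, 392],
                  [289, 493,   0, 427, 776],
                  [576, 195, 427,   0, 577],
                  [586, 392, 776, 577,   0]], dtype=float)

    X = classical_mds(D, q=2)   # coordinates, unique up to rotation/reflection
    for city, (x, y) in zip(cities, X):
        print(f"{city:<10} {x:8.1f} {y:8.1f}")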

Non-metric multidimensional scaling

Non-metric multidimensional scaling extends metric multidimensional scaling in two respects:

  1. no specification of an explicit function for converting (dis)similarities into distances, and
  2. the use of non-Euclidean geometries for finding configurations.

If the dissimilarities $\delta_{ij}$ are related to the distances via $d_{ij} = f(\delta_{ij})$, then this function $f$ must be weakly monotonic: if $\delta_{ij} < \delta_{kl}$ holds, then $f(\delta_{ij}) \leq f(\delta_{kl})$ must hold as well.

Therefore the pairs of dissimilarities are put into a rank order

$\delta_{i_1 j_1} \leq \delta_{i_2 j_2} \leq \dots \leq \delta_{i_m j_m},$

from which the monotonicity condition results:

$d_{i_1 j_1} \leq d_{i_2 j_2} \leq \dots \leq d_{i_m j_m}.$
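
As a small Python illustration of this ranking (the object pairs and dissimilarity values are hypothetical):

    import numpy as np

    pairs = [(0, 1), (0, 2), (1, 2)]      # object pairs (i, j)
    delta = np.array([0.8, 0.3, 0.5])     # dissimilarities delta_ij

    order = np.argsort(delta)             # rank order of the dissimilarities
    ranked = [pairs[k] for k in order]    # [(0, 2), (1, 2), (0, 1)]
    # The monotonicity condition requires the distances d_ij, read in this
    # order, to form a non-decreasing sequence.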

Shepard-Kruskal algorithm

The Shepard-Kruskal algorithm determines the configuration iteratively:

  1. Initialization: Choose the desired dimensionality $q$ and arrange the $n$ objects randomly in the target space. (For $q = 2$ the results can often be displayed clearly.) Calculate the distances $d_{ij}$ between all objects $i$ and $j$.
  2. Iteration step: Estimate the disparities $\hat d_{ij}$ of the objects $i$ and $j$ from their distances $d_{ij}$. The pool-adjacent-violators algorithm (see below) can be used for this.
  3. Termination condition: As soon as one of the chosen termination criteria (see the following section) is met, the iterative process ends with the configuration found, which is optimal (possibly only locally). Otherwise continue with step 4.
  4. Adjustment of the positions to the disparities: Calculate new coordinate values for all objects (see below), e.g. similarly to a gradient method. Determine the distances $d_{ij}$ for the new positions and continue with step 2.

Pool-adjacent-violators algorithm

  • If the monotonicity condition is not violated between two adjacent points, we use the respective distance as the disparity, i.e. $\hat d_{ij} = d_{ij}$.
  • If the monotonicity condition is violated between two or more adjacent points, we use the average of the corresponding distances as their common disparity, i.e. $\hat d_{ij} = \hat d_{kl} = \tfrac{1}{2}(d_{ij} + d_{kl})$ in the case of two points.
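
A compact Python sketch of this pooling; the input is the sequence of distances ordered by increasing dissimilarity, the output the corresponding disparities (the function name is ours):

    import numpy as np

    def pava(d):
        # Blocks of [sum, count]; merge adjacent blocks while their means
        # violate monotonicity (previous mean > next mean).
        blocks = []
        for value in d:
            blocks.append([value, 1])
            while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
                s, c = blocks.pop()
                blocks[-1][0] += s
                blocks[-1][1] += c
        # Expand the block means back into one disparity per distance
        return np.concatenate([np.full(c, s / c) for s, c in blocks])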

Which transformations are permissible when calculating the disparities depends on the scale level of the raw data. The distances in the perceptual space may, however, assume a different scale level. The extent to which raising the scale level is permissible is assessed by the compression quotient $Q$ = (number of similarities) / (number of dimensions × number of objects). With "simple" MDS the raw data are already available in aggregated form, so they mostly represent mean values of the respondents' answers.

Calculation of the new positions

The new position of object $k$ is calculated as

$x_k^{t+1} = x_k^t + \frac{\alpha}{n-1} \sum_{j \neq k} \left(1 - \frac{d_{jk}}{\hat d_{jk}}\right) \left(x_k^t - x_j^t\right).$

Here $x_k^t$ is the position of object $k$ at time $t$, and $\alpha$ is a weighting factor (do not choose $\alpha$ too large, since otherwise the stress value can deteriorate; a common choice is $\alpha = 0.2$).

If two objects are too far apart in relation to their similarity ($d_{jk}/\hat d_{jk}$ is greater than 1, which makes the expression in the first bracket negative), they are pushed towards each other; the direction of the shift is determined by the difference in the second bracket. Two rather dissimilar objects that are too close together are moved away from each other. This usually lowers the stress value, and the iteration continues with step 2.
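
A Python sketch of this update step, assuming strictly positive off-diagonal disparities D_hat (e.g. from the pava sketch above, rearranged into matrix form):

    import numpy as np

    def update_positions(X, D_hat, alpha=0.2):
        # X: (n, q) current coordinates, D_hat: (n, n) disparities
        n = X.shape[0]
        diff = X[:, None, :] - X[None, :, :]     # diff[k, j] = x_k - x_j
        D = np.sqrt((diff ** 2).sum(axis=-1))    # current distances d_kj
        X_new = X.copy()
        for k in range(n):
            mask = np.arange(n) != k
            # (1 - d_kj / dhat_kj) < 0: object k is pulled towards object j;
            # > 0: object k is pushed away from object j
            factors = 1.0 - D[k, mask] / D_hat[k, mask]
            X_new[k] += alpha / (n - 1) * (factors[:, None] * diff[k, mask]).sum(axis=0)
        return X_new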

Example

Following the example above, we can rank the distances and set up the monotonicity condition (B = Berlin, F = Frankfurt, HH = Hamburg, K = Cologne, M = Munich):

Distances: $\delta_{F,K} < \delta_{B,HH} < \delta_{F,M} < \delta_{HH,K} < \delta_{F,HH} < \delta_{B,F} < \delta_{B,K} < \delta_{K,M} < \delta_{B,M} < \delta_{HH,M}$

Monotonicity condition: $d_{F,K} \leq d_{B,HH} \leq d_{F,M} \leq d_{HH,K} \leq d_{F,HH} \leq d_{B,F} \leq d_{B,K} \leq d_{K,M} \leq d_{B,M} \leq d_{HH,M}$

At the beginning a random configuration was chosen:

            Position              Distance to
City          X        Y      Berlin  Frankfurt  Hamburg  Cologne  Munich
Berlin      0.9961  −1.5759   0
Frankfurt  −1.1453   0.7840   3.1866  0
Hamburg    −0.7835   0.9408   3.0824  0.3942     0
Cologne    −0.1025  −0.0208   1.9041  1.3172     1.1783   0
Munich      1.0352  −0.1281   1.4483  2.3635     2.1096   1.1428   0

This results in:

[Figures: monotonicity condition, PAV, solution of the non-metric multidimensional scaling]

The calculated Euclidean distances show that the monotonicity condition is violated in two areas:

  1. from $d_{B,HH} = 3.0824$ down to $d_{F,HH} = 0.3942$ and
  2. from $d_{B,F} = 3.1866$ down to $d_{B,M} = 1.4483$.

The disparities are therefore calculated as the mean values (1.7546 and 1.9447) of the corresponding areas. With these disparities, the point positions can now be shifted. This procedure is iterated and leads to the solution shown.

Termination or quality criteria

The aim of the procedure is an optimal fit of the MDS solution to the raw data, and thus the lowest possible STRESS or energy value, or the highest possible coefficient of determination. These values quantify the difference between disparities and distances. If the values no longer change, or change only slightly, the iteration process is terminated.

STRESS measures

The STRESS is calculated (according to Kruskal) as the root of the sum of the squared deviations of the disparities from the distances, divided by the sum of the squared distances; STRESS is therefore a standardized measure of variance:

$\text{STRESS}_1 = \sqrt{\frac{\sum_{(i,j)} \left(d_{ij} - \hat d_{ij}\right)^2}{\sum_{(i,j)} d_{ij}^2}}$

Reference values for the goodness of fit are:

Goodness of fit   STRESS 1   STRESS 2
poor                 0.2        0.4
sufficient           0.1        0.2
good                 0.05       0.1
excellent            0.025      0.05
perfect              0          0

An alternative STRESS measure is

$\text{STRESS}_2 = \sqrt{\frac{\sum_{(i,j)} \left(d_{ij} - \hat d_{ij}\right)^2}{\sum_{(i,j)} \left(d_{ij} - \bar d\right)^2}}$

with $\bar d$ the mean of all distances.
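
Both measures translate directly into Python (a sketch; D are the distances, D_hat the disparities, both as symmetric matrices):

    import numpy as np

    def stress1(D, D_hat):
        i, j = np.triu_indices_from(D, k=1)      # each pair counted once
        num = ((D[i, j] - D_hat[i, j]) ** 2).sum()
        return np.sqrt(num / (D[i, j] ** 2).sum())

    def stress2(D, D_hat):
        i, j = np.triu_indices_from(D, k=1)
        d = D[i, j]
        num = ((d - D_hat[i, j]) ** 2).sum()
        return np.sqrt(num / ((d - d.mean()) ** 2).sum())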

In principle, there are no exact rules for which STRESS value is still acceptable and which can be described as "good". "In order to have a norm at all, the 'null hypothesis of all null hypotheses' was examined: thousands of random data sets were scaled via MDS and the resulting stress values were recorded" (cf. Borg/Staufenbiel 1989). Kruskal has provided reference values for the STRESS value that can be used as a guide.

Coefficient of determination

In addition to the simple STRESS cost criteria, the coefficient of determination $r^2$ is used as an alternative criterion for the fit of the configuration to the raw data. It is the squared correlation of the distances with the disparities and can be read as the proportion of the variance of the disparities that is accounted for by a linear fit to the distances. In practice, values greater than 0.9 are considered acceptable for the coefficient of determination.
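
As a sketch, the coefficient of determination can be computed from the same matrices as in the STRESS sketch above:

    import numpy as np

    def rsq(D, D_hat):
        # squared Pearson correlation of distances and disparities
        i, j = np.triu_indices_from(D, k=1)
        r = np.corrcoef(D[i, j], D_hat[i, j])[0, 1]
        return r ** 2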

Energy

Weighting the summands in the STRESS formula with weights $w_{ij}$ leads to energy measures of the form

$E = \sum_{(i,j)} w_{ij} \left(d_{ij} - \hat d_{ij}\right)^2.$

Software

MDS can be carried out in statistical programs such as SPSS. In R, the cmdscale function performs an MDS; similarly, Matlab provides MDS via the mdscale function.
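
Beyond the programs named above, MDS is also available, for example, in Python via scikit-learn; a minimal sketch with a precomputed dissimilarity matrix:

    import numpy as np
    from sklearn.manifold import MDS

    D = np.array([[0, 5, 4],
                  [5, 0, 3],
                  [4, 3, 0]], dtype=float)
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    X = mds.fit_transform(D)   # coordinates; mds.stress_ is the final stress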

Literature

  • Thomas A. Runkler: Data mining methods and algorithms for intelligent data analysis . Vieweg + Teubner, 2010, pp. 41–47.
  • WS Torgerson: Theory & Methods of Scaling . Wiley, New York 1958.
  • I. Borg, Th. Staufenbiel: Theories and methods of scaling . Huber, Bern 2007.
  • Backhaus, Erichson, Plinke, Weiber: Multivariate Analysis Methods. Springer, Berlin 2000.
  • R. Mathar: Multidimensional Scaling. Teubner, Stuttgart 1997.
  • I. Borg, P. Groenen: Modern Multidimensional Scaling: Theory and Applications . Springer, New York 2005.

References

  1. J. B. Kruskal: Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. In: Psychometrika, 29(1), 1964, pp. 1–27, doi:10.1007/BF02289565.
  2. Kappelhoff: Multidimensional Scaling – Example for Data Analysis. (PDF) Chair for Empirical Economic and Social Research, 2001.
  3. Wojciech Basalaj: Proximity Visualization of Abstract Data. (PDF; 7.7 MB) 2001; retrieved June 19, 2013.