# Multiset

Multi-set is a term that varies the set concept from set theory . The peculiarity of multi-sets compared to the usual concept of set is that the elements of a multi-set can occur several times. Accordingly, the set operations used for multisets also have a modified meaning.

In computer science , multisets (also called multisets or bags ) represent a useful data structure. For example, the database language SQL handles tables as multisets by default.

## definition

A multiset over a set is a mapping of into the set of natural numbers . The number indicates how often the element occurs in the multi-set . The set of all multisets over can be written as. In the following, however, it is used to save vertical space . ${\ displaystyle M}$ ${\ displaystyle A}$${\ displaystyle A}$ ${\ displaystyle \ mathbb {N} _ {0}}$${\ displaystyle M (x), x \ in A}$${\ displaystyle x}$${\ displaystyle M}$${\ displaystyle A}$${\ displaystyle \ mathbb {N} _ {0} ^ {A}}$${\ displaystyle \ operatorname {MS} (A)}$

### Reduced base quantity

The reduced basic amount (engl. "Support") of a multi amount over the amount of the relevant elements of , in formulas: ${\ displaystyle M}$${\ displaystyle A}$${\ displaystyle \ operatorname {supp} (M)}$${\ displaystyle A}$

${\ displaystyle \ operatorname {supp} (M): = \ left \ {x \ in A \ mid M (x)> 0 \ right \}}$.

### Partial multi-quantity

A multiset is called part (multi) set of a multiset if each element of the reduced basic set of in occurs at least as often as in . Formally: ${\ displaystyle M}$${\ displaystyle N}$${\ displaystyle M}$${\ displaystyle N}$${\ displaystyle M}$

${\ displaystyle M \ subseteq N \ quad \ Longleftrightarrow \ quad \ forall x \ in \ operatorname {supp} (M): M (x) \ leq N (x)}$.

Two multi- sets and are equal if their reduced basic sets are equal and the multiplicities match. They are then also partial multiple sets of one another in both directions. ${\ displaystyle M}$${\ displaystyle N}$

### comment

The above definition with the admission of the (actually irrelevant) 0 value is a generalization of the indicator function for the usual quantities. It enables the provision of a “universe” as a basic set, to which all multisets in question are related, and consequently simplifies handling and comparison.

## Intuition

A multiset is clearly a set in which each element can appear any number of times. In this sense, sets are a special case of multi-sets in which each value only occurs once.

## notation

You note multisets like sets explicitly with curly brackets and write an element in as often as it occurs in the multiset. In order to distinguish multi-amounts from normal amounts, a small one (for bag ) is sometimes added as an index. Instead, some authors use modified brackets: . ${\ displaystyle _ {b}}$${\ displaystyle \ {\! \ vert \; \; \; \ vert \! \}}$

## Semi-abstract example

Let it be the multiset over with , and . Then you write or or . ${\ displaystyle M}$${\ displaystyle \ {a, b, c \}}$${\ displaystyle M (a) = 1}$${\ displaystyle M (b) = 3}$${\ displaystyle M (c) = 0}$${\ displaystyle M = \ lbrace a, b, b, b \ rbrace}$${\ displaystyle M = \ lbrace a, b, b, b \ rbrace _ {b}}$${\ displaystyle M = \ {\! \ vert a, b, b, b \ vert \! \}}$

## Illustrative examples

You take a die and roll the dice 20 times in a row. Then it can be that one

3 times a 1
2 times a 2
4 times a 3
5 times a 4
3 times a 5 and
3 times a 6

has thrown. The basic set is then ; the multiplicity of is 4; so . The multiset lists each throw, ignoring the order: ${\ displaystyle \ {1,2,3,4,5,6 \}}$${\ displaystyle 3}$${\ displaystyle M (3) = 4}$

${\ displaystyle M = \ {1,1,1,2,2,3,3,3,4,4,4,4,4,4,5,5,5,6,6,6 \}.}$

Another example is the prime factorization of 120:

${\ displaystyle 120 = 2 ^ {3} 3 ^ {1} 5 ^ {1}}$

It can be interpreted as a multi-set . ${\ displaystyle \ {2,2,2,3,5 \}}$

## Number of possible multi-amounts

The number of -elemental multisets over a -elemental set is called (analogous to the binomial coefficients ) as . This can be expressed as a binomial coefficient: ${\ displaystyle k}$${\ displaystyle n}$${\ displaystyle A}$${\ displaystyle \ left (\! \! {\ tbinom {n} {k}} \! \! \ right)}$

${\ displaystyle \ left (\! \! {\ dbinom {n} {k}} \! \! \ right) = {\ dbinom {k + n-1} {k}} = {\ dbinom {k + n -1} {n-1}} = {\ frac {n (n + 1) (n + 2) \ cdots (n + k-1)} {k!}},}$

as long as and . If so, the combinatorial quantity is meaningfully defined as . In all other cases it is the same . ${\ displaystyle k \ geq 0}$${\ displaystyle n \ geq 1}$${\ displaystyle k = n = 0}$${\ displaystyle 1}$${\ displaystyle 0}$

This indicates the number of possible outcomes when pulling distinguishable balls from an urn if the order is not observed and each ball drawn is put back into the urn after it has been drawn (see combination with repetition ).

### example

If 10 balls are drawn one after the other from an urn with 5 balls, each ball drawn is put back, so there is

${\ displaystyle \ left (\! \! {\ dbinom {5} {10}} \! \! \ right) = {\ dbinom {10 + 5-1} {10}} = 1001}$

possible combinations if the order of the balls drawn is not observed.

### Variant: multisets with at least one occurrence of each element type

Use to denote the number of possible multisets over a -elementary set , so that each element type occurs at least- times. Then it is easy to see that, as soon as the overall mandatory occurrences are removed from the multiset objects, it is a combinatorial enumeration of the first kind. More precisely: ${\ displaystyle \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) ^ {+}}$${\ displaystyle n}$${\ displaystyle A}$${\ displaystyle x \ in A}$${\ displaystyle 1}$${\ displaystyle n}$${\ displaystyle k}$

${\ displaystyle \ left (\! \! {\ dbinom {n} {k}} \! \! \ right) ^ {+} = \ left (\! \! {\ dbinom {n} {kn}} \ ! \! \ right).}$

According to the os information, closer applies

${\ displaystyle \ left (\! \! {\ dbinom {n} {k}} \! \! \ right) ^ {+} = {\ dbinom {k-1} {n-1}}}$

as long . If so, the combinatorial quantity is meaningfully defined as . In all other cases it is the same . ${\ displaystyle k \ geq n> 0}$${\ displaystyle k = n = 0}$${\ displaystyle 1}$${\ displaystyle 0}$

### Variant: multi-quantities with capacity restrictions

Use or to denote the number of possible multi-sets over a -elementary set , so that each element type occurs between and Mal (or between and Mal). This is called a regular combination. Using combinatorial arguments one obtains the closed form: ${\ displaystyle \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ {${\ displaystyle \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ {\ leq c} ^ {+}}$${\ displaystyle n}$${\ displaystyle A}$${\ displaystyle x \ in A}$${\ displaystyle 0}$${\ displaystyle c-1}$${\ displaystyle 1}$${\ displaystyle c}$

${\ displaystyle \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ {

and analogously for the variant. To derive this combinatorial quantity, consider the sets ${\ displaystyle {} ^ {+}}$

${\ displaystyle S = \ {M ~ {\ text {multiset from}} ~ \ {0,1, \ ldots, n-1 \} \ mid \ Sigma {} _ {x} M (x) = k \} ;}$
${\ displaystyle S_ {c} = \ {M \ in S \ mid \ forall {x:} \, M (x)

and immediately recognizes that and . You can see that , so . A bijection construction also proves that for everyone with . Based on this knowledge and the inclusion-exclusion formula for the cardinality of a finite union, the os closed form can be calculated. The variant can be derived analogously . ${\ displaystyle | S_ {c} | = \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ {${\ displaystyle | S | = \ left (\! \! {\ tbinom {n} {k}} \! \! \ right)}$${\ displaystyle S_ {c} = \ bigcap {} _ {x \ in \ {0,1, \ ldots, n-1 \}} S \ setminus S_ {c} (x)}$${\ displaystyle | S_ {c} | = | S | - | \ bigcup {} _ {x \ in \ {1,2, \ ldots, n \}} S_ {c} (x) |}$${\ displaystyle | \ bigcap {} _ {x \ in I} S_ {c} (x) | = \ left (\! \! {\ tbinom {n} {k-rc}} \! \! \ right) }$${\ displaystyle I \ subseteq \ {0,1, \ ldots, n-1 \}}$${\ displaystyle | I | = r}$${\ displaystyle {} ^ {+}}$

### Combinatorial Identities: Sums

To create a multi-element multiset of element types, the possible values ​​for the number of the last element type are added up using the options for the first element types. In mathematical terms, this means . The sum display of the restricted combinatorial quantity now enables the calculation of its cumulative sum: ${\ displaystyle K}$${\ displaystyle n + 1}$${\ displaystyle n}$${\ displaystyle \ sum _ {k = 0} ^ {K} \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) = \ left (\! \! {\ tbinom {n + 1} {K}} \! \! \ right)}$

${\ displaystyle \ sum _ {k = 0} ^ {K} \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ {

From the sets of multisets in the os section, it can be seen that the total number of multisets over elements and capacity constraints is given by . Accordingly, the complementary sums can be formed as follows ${\ displaystyle n}$${\ displaystyle \ sum _ {k = 0} ^ {\ infty} \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ {

${\ displaystyle \ sum _ {k> K} \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ {

The os results apply analogously to the variant. These sums are important when calculating probabilities. ${\ displaystyle {} ^ {+}}$

### example

The limited combinatorial size occurs when types of objects available, including - (or - ) copies of each type, and you want to calculate how many combinations you can select them (regardless of the order). Example situations : (1.) Card colors, including cards and number of cards. Then is the number of possible hands. (2.) Number of types of molecules, each with particles, and the number of particles that flow into a vessel. Then is the number of possible mixes. ${\ displaystyle n}$${\ displaystyle 0}$${\ displaystyle c-1}$${\ displaystyle 1}$${\ displaystyle c}$${\ displaystyle k}$ ${\ displaystyle n = 4}$${\ displaystyle c-1 = 13}$${\ displaystyle k =}$${\ displaystyle \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ { ${\ displaystyle n =}$${\ displaystyle c-1}$${\ displaystyle k =}$${\ displaystyle \ left (\! \! {\ tbinom {n} {k}} \! \! \ right) _ {

## Operations on multiset

A multiset over multisets over can be combined considering the multiplicities. This accomplishes with ${\ displaystyle A}$${\ displaystyle \ mu \ colon \ operatorname {MS} (\ operatorname {MS} (A)) \ to \ operatorname {MS} (A)}$

${\ displaystyle \ mu (M) (a) = \ sum _ {N \ in supp (M)} M (N) \ cdot N (a).}$

A function can be expanded to a function , whereby ${\ displaystyle f \ colon A \ to B}$${\ displaystyle \ operatorname {MS} (f) \ colon \ operatorname {MS} (A) \ to \ operatorname {MS} (B)}$

${\ displaystyle \ operatorname {MS} (f) (M) (b) = \ sum _ {a \ in f ^ {- 1} (b)} M (a).}$

Together with with ${\ displaystyle \ eta \ colon A \ to \ operatorname {MS} (A)}$

${\ displaystyle \ eta (a) (a ') = {\ begin {cases} 1 & a = a' \\ 0 & {\ text {otherwise}} \ end {cases}}}$

we are dealing with a monad structure .

The functor and can also be traced back to another useful operation. extends a function to a function by ${\ displaystyle \ operatorname {MS}}$${\ displaystyle \ mu}$${\ displaystyle \ operatorname {lift}}$${\ displaystyle f \ colon A \ to \ operatorname {MS} (B)}$${\ displaystyle \ operatorname {lift} (f) \ colon \ operatorname {MS} (A) \ to \ operatorname {MS} (B)}$

${\ displaystyle \ operatorname {lift} (f) (M) (b) = \ sum _ {a \ in supp (M)} M (a) \ cdot f (a) (b).}$

With the help of this operation, and can be set. ${\ displaystyle \ mu = \ operatorname {lift} (\ operatorname {id})}$${\ displaystyle \ operatorname {MS} (f) = \ operatorname {lift} (\ eta \ circ f)}$

### Union, Average and Difference

The (large) union of two multisets over the same basic set can either be directly as ${\ displaystyle A}$

${\ displaystyle (M \ uplus N) (a) = M (a) + N (a),}$

or by means of ${\ displaystyle \ mu}$

${\ displaystyle M \ uplus N = \ mu (\ {M, N \} _ {b})}$

can be specified.

As a small union of two multisets, the smallest multiset becomes

${\ displaystyle (M \ cup N) (a) = \ operatorname {max} (M (a), N (a))}$,

which includes both.

The average of two multi-amounts over the same base amount is application-specific. There are ${\ displaystyle A}$

1. ${\ displaystyle (M \ cap N) (a) = \ operatorname {min} (M (a), N (a))}$, such as
2. ${\ displaystyle (M \ cap N) (a) = M (a) \ cdot N (a).}$

The second definition can be traced back to the above if an additional operation is introduced. Be , then is defined by ${\ displaystyle \ operatorname {lift}}$${\ displaystyle f \ colon A \ times B \ to \ operatorname {MS} (C)}$${\ displaystyle \ operatorname {lift} _ {2} (f) \ colon \ operatorname {MS} (A) \ times \ operatorname {MS} (B) \ to \ operatorname {MS} (C)}$

${\ displaystyle \ operatorname {lift} _ {2} (f) (M, N) = \ operatorname {lift} (a \ mapsto \ operatorname {lift} (b \ mapsto f (a, b)) (N)) (M)}$.

The average in the second sense then results as with ${\ displaystyle M \ cap N = \ operatorname {lift} _ {2} (h) (M, N)}$

${\ displaystyle h (a, a ') = {\ begin {cases} \ {a \} _ {b} & a = a' \\\ emptyset & {\ text {otherwise}}. \ end {cases}}}$

There are also at least two meaningful definitions for the difference between two multisets over the same basic set . ${\ displaystyle A}$

1. ${\ displaystyle (M \ setminus N) (a) = {\ begin {cases} 0 & M (a)
2. ${\ displaystyle (M \ setminus N) (a) = {\ begin {cases} 0 & N (a) \ neq 0 \\ M (a) & {\ text {otherwise}}. \ end {cases}}}$

For both applies and . Which is the "right" one depends on the application. ${\ displaystyle X \ setminus \ emptyset = X}$${\ displaystyle X \ setminus X = \ emptyset}$

Note:
Let be multisets over the prime numbers. With and as multiplied multisets we have: ${\ displaystyle M, N}$${\ displaystyle m: = \ Pi M}$${\ displaystyle n: = \ Pi N}$

• The great union corresponds to the product .${\ displaystyle m \ cdot n}$
• The small union corresponds to the LCM , ie .${\ displaystyle \ operatorname {kgV} (m, n) = \ Pi {M \ cup N}}$
• The first version of the average corresponds to the GCD , ie .${\ displaystyle \ operatorname {ggT} (m, n) = \ Pi {M \ cap N}}$
• The first version corresponds to the difference .${\ displaystyle m / \ operatorname {ggT} (m, n)}$

## Generalizations

If you keep the operations defined in the previous section, you will get related structures by varying the multiplicity set.

• Real multiples in the interval result in probability distributions. The multi-set basic set becomes the set of possible events. The operation converts functions that generate probability distributions of other sets of events on the basis of events that have occurred into functions that can deal with probability distributions as input. Compare also fuzzy sets .${\ displaystyle [0,1]}$${\ displaystyle \ operatorname {lift}}$
• If body elements are allowed for the multiplicities and a scaling is also defined, multisets over A become vectors of a vector space with a base that is indexed by A. embodies the fact that for the definition of a linear mapping it is sufficient to define the images of the basis vectors. Similarly, functions on base index pairs converts to bilinear maps.${\ displaystyle \ operatorname {lift}}$${\ displaystyle \ operatorname {lift} _ {2}}$

## Individual evidence

1. Cristian S. Calude, Gheorghe Păun, Grzegorz Rozenberg, Arto Salomaa, Multiset Processing: Mathematical, Computer Science, and Molecular Computing Points of View Springer Verlag 2001, ISBN 3-540-43063-6 , p. 105
2. John Riordan: Permutation and combinations . In: An Introduction to Combinatorial Analysis ( en ). John Wiley & Sons Inc., 1958, p. 16.