Word (theoretical computer science)

In theoretical computer science , a word is a finite sequence of symbols in an alphabet . In contrast to the natural language meaning of words , which always have an independent meaning, a word in theoretical computer science has no linguistic meaning. It's just another term for a string .

Words or words are the elements of a formal language . They are therefore important for mathematical modeling , for the theory of programming languages , for computability theory and other areas of theoretical computer science.

definition

It should be a given alphabet and a natural number from the set of natural numbers, including zero ( ). A word of length is a finite sequence with for all . ${\ displaystyle \ Sigma}$ ${\ displaystyle n}$ ${\ displaystyle \ mathbb {N} _ {0}}$ ${\ displaystyle \ mathbb {N} _ {0} = \ {0,1,2, \ ldots \}}$ ${\ displaystyle w}$ ${\ displaystyle n}$ ${\ displaystyle (x_ {1}, x_ {2}, x_ {3}, \ ldots, x_ {n})}$ ${\ displaystyle x_ {i} \ in \ Sigma}$ ${\ displaystyle i \ in \ {1, \ ldots, n \}}$

The length of a word is noted as; the number of times the character appears in the word with . A special word is the empty word , which does not consist of any symbol (has a length of 0) and is usually represented with the Greek letter ( epsilon ) (which is also found occasionally). The set of all words that can be formed from an alphabet is the clover ash and positive cover over this alphabet. This is the disjoint union ${\ displaystyle n}$ ${\ displaystyle w}$ ${\ displaystyle | w |}$ ${\ displaystyle x}$ ${\ displaystyle w}$ ${\ displaystyle | w | _ {x}}$ ${\ displaystyle \ varepsilon}$ ${\ displaystyle \ Lambda}$ ${\ displaystyle \ Sigma}$

{\ displaystyle \ Sigma ^ {*}: = \ bigcup _ {n \ in \ mathbb {N} \ cup \ {0 \}} \ Sigma ^ {n}}

.

The non-empty words are then accordingly the 'positive envelope'

{\ displaystyle \ Sigma ^ {+}: = \ Sigma ^ {*} \ setminus \ {\ varepsilon \} = \ bigcup _ {n \ in \ mathbb {N}} \ Sigma ^ {n}}

.

The simplified notation is often used to indicate a word , but this is only possible if the alphabet used allows the symbols used to be clearly assigned. This short spelling can not be used with the alphabet , for example, because the spelling does not clearly indicate whether the word , or is meant. ${\ displaystyle w = x_ {1} x_ {2} x_ {3} \ ldots x_ {n}}$ ${\ displaystyle \ Sigma = \ {a, aa \}}$ ${\ displaystyle w = aaa}$ ${\ displaystyle (a, aa)}$ ${\ displaystyle (aa, a)}$ ${\ displaystyle (a, a, a)}$

Words of length can be understood as follows: ${\ displaystyle n}$

as finite sequences (sequence) - since tuples can be understood as sequences of finite length

{\ displaystyle n}

as elements of the -fold Cartesian product - since tuples can also be understood that way ${\ displaystyle n}$

Examples

Let it be the alphabet of Latin letters and . Then the words and examples of words are over and a word is over . You can see that and is. ${\ displaystyle \ Sigma _ {1}}$ ${\ displaystyle \ Sigma _ {2} = \ lbrace \ diamondsuit, \ heartsuit, \ spadesuit, \ clubsuit \ rbrace}$ ${\ displaystyle w_ {1} = house}$ ${\ displaystyle w_ {2} = xyzzy}$ ${\ displaystyle \ Sigma _ {1}}$ ${\ displaystyle w_ {3} = \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit}$ ${\ displaystyle \ Sigma _ {2}}$ ${\ displaystyle | w_ {1} | = 4}$ ${\ displaystyle | w_ {2} | = | w_ {3} | = 5}$

Operations on words

Concatenation

The concatenation or chain is a combination of two words to form a new word that is created by connecting the two symbol sequences. The concatenation of the two words and above an alphabet is indicated with or and is defined by: ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle \ Sigma}$ ${\ displaystyle xy}$ ${\ displaystyle x \ circ y}$

{\ displaystyle xy = x \ circ y: = (x_ {1}, x_ {2}, x_ {3}, \ ldots, x_ {n}, y_ {1}, y_ {2}, y_ {3}, \ ldots, y_ {k})}

According to the definition of the word and with and for all and . According to the definition above, is a prefix and a suffix of the word created by the concatenation . The length of a concatenated word corresponds to the sum of the lengths of the individual (partial) words. So for every word and : ${\ displaystyle x = (x_ {1}, x_ {2}, x_ {3}, \ ldots, x_ {n})}$ ${\ displaystyle y = (y_ {1}, y_ {2}, y_ {3}, \ ldots, y_ {k})}$ ${\ displaystyle n, k \ in \ mathbb {N} _ {0}}$ ${\ displaystyle x_ {i}, y_ {j} \ in \ Sigma}$ ${\ displaystyle i \ in \ {1, \ ldots, n \}}$ ${\ displaystyle j \ in \ {1, \ ldots, k \}}$ ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle x \ circ y}$ ${\ displaystyle u}$ ${\ displaystyle v}$

{\ displaystyle | u \ circ v | = | u | + | v |}

,

and for the absolute frequency of a character : ${\ displaystyle x}$

{\ displaystyle | u \ circ v | _ {x} = | u | _ {x} + | v | _ {x}}

.

The neutral element of the concatenation is the empty word, since for any word it applies that: ${\ displaystyle w}$

{\ displaystyle w \ circ \ varepsilon = \ varepsilon \ circ w = w}

In addition, since the concatenation is associative , the triple forms a monoid from the set of all words over any alphabet , the combination of the concatenation and the empty word as a neutral element . Associativity means that parentheses can easily be left out: ${\ displaystyle (\ Sigma ^ {*}, \ circ, \ varepsilon)}$ ${\ displaystyle \ Sigma}$

{\ displaystyle (house \ circ \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit) \ circ xyzzy = house \ circ (\ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit \ circ xyzzy) = house \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit xyzzy}

In contrast, the concatenation is not commutative ; H. not for all words and that is true . For example: ${\ displaystyle u}$ ${\ displaystyle v}$ ${\ displaystyle u \ circ v = v \ circ u}$

{\ displaystyle house \ circ \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit = house \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit \ not = \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit house = \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit \ circ house}

power

The -th power of a word is defined as the -fold concatenation of this word with itself. The definition of the power is usually given recursively: ${\ displaystyle n}$ ${\ displaystyle w ^ {n}}$ ${\ displaystyle w}$ ${\ displaystyle (n-1)}$

{\ displaystyle w ^ {0}: = \ varepsilon}

{\ displaystyle w ^ {n + 1}: = w ^ {n} \ circ w}

(for )

{\ displaystyle n \ in \ mathbb {N} _ {0}}

For example:

{\ displaystyle (xyzzy) ^ {0} = \ varepsilon}

{\ displaystyle (\ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit) ^ {1} = (\ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit) ^ {0} \, \ circ \, \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit = \ varepsilon \, \ circ \, \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit = \ heartsuit \ clubsuit \ clubsuit \ heartsuit \ spadesuit}

{\ displaystyle (house) ^ {3} = (house) ^ {2} \, \ circ \, house = ((house) ^ {1} \, \ circ \, house) \, \ circ \, house = ((\ varepsilon \, \ circ \, house) \, \ circ \, house) \, \ circ \, house = house house house}

According to the definition of concatenation, the length of the -th power of any word is equal to the product of and the length of : ${\ displaystyle n}$ ${\ displaystyle w}$ ${\ displaystyle n}$ ${\ displaystyle w}$

{\ displaystyle | w ^ {n} | = n \ cdot | w |}

,

and for the absolute frequency of each character : ${\ displaystyle x}$

{\ displaystyle | w ^ {n} | _ {x} = n \ cdot | w | _ {x}}

reflection

The mirroring or the reverse of a word is the result of writing backwards. So if is, then is the finite sequence with and for all . So the length of a word is equal to the length of its reflection: ${\ displaystyle w ^ {R}}$ ${\ displaystyle w}$ ${\ displaystyle w}$ ${\ displaystyle w = (x_ {1}, x_ {2}, x_ {3}, \ ldots, x_ {n})}$ ${\ displaystyle w ^ {R}}$ ${\ displaystyle (y_ {1}, y_ {2}, y_ {3}, \ ldots, y_ {k})}$ ${\ displaystyle k = n}$ ${\ displaystyle y_ {i} = x_ {n + 1-i}}$ ${\ displaystyle i \ in \ {1, \ ldots, k \}}$

{\ displaystyle | w ^ {R} | = | w |}

For example, the following words apply:

{\ displaystyle \ varepsilon ^ {R} = \ varepsilon}

{\ displaystyle (fig) ^ {R} = bba}

{\ displaystyle (\ heartsuit \ clubsuit \ spadesuit \ heartsuit) ^ {R} = \ heartsuit \ spadesuit \ clubsuit \ heartsuit}

The reverse of a word can also be defined with the help of structural induction over the structure of the word in question. To do this, one defines the reverse of the empty word as the empty word in the beginning of induction. In the induction step, the reverse of a word composed of a partial word and a symbol is defined as the concatenation of the symbol with the reverse of the partial word:

Induction start: ${\ displaystyle w = \ varepsilon \ Rightarrow w ^ {R} = \ varepsilon ^ {R}: = \ varepsilon}$

Induction step: ${\ displaystyle (w = v \ circ a) \ land (v \ in \ Sigma ^ {*}, a \ in \ Sigma) \ Rightarrow w ^ {R} = (v \ circ a) ^ {R}: = a \ circ (v ^ {R})}$

This is how the reverse of a word can be derived step by step:

{\ displaystyle \ varepsilon ^ {R} = \ varepsilon}

{\ displaystyle (abb) ^ {R} = b \ circ (ab) ^ {R} = b \ circ b \ circ (a) ^ {R} = b \ circ b \ circ a \ circ (\ varepsilon) ^ {R} = b \ circ b \ circ a \ circ \ varepsilon = bba}

{\ displaystyle (\ heartsuit \ clubsuit \ spadesuit \ heartsuit) ^ {R} = \ heartsuit \ circ (\ heartsuit \ clubsuit \ spadesuit) ^ {R} = \ heartsuit \ circ \ spadesuit \ circ (\ heartsuit \ clubsuit) ^ {R} = \ heartsuit \ circ \ spadesuit \ circ \ clubsuit \ circ (\ heartsuit) ^ {R} = \ heartsuit \ spadesuit \ clubsuit \ heartsuit}

A word like , which is identical to its reflection, is called a palindrome . Mathematically, this mirror-symmetrical words than the fixed points of reflection ^R viewed. ${\ displaystyle abaaba}$

Prefix, infix and suffix

infix

An infix is an addition within a word. Every finite partial sequence of successive symbols of a word is called an infix or partial word of the word . An infix of a given word is therefore every word for which there is (at least) one , for which applies that on the one hand and on the other hand for each . Accordingly, a word is infix of a word if and only if it holds that there is at least one word and one word from Kleen's envelope over the alphabet of , such that : ${\ displaystyle w}$ ${\ displaystyle w}$ ${\ displaystyle w = (x_ {1}, x_ {2}, x_ {3}, \ ldots, x_ {n})}$ ${\ displaystyle {\ hat {w}} = (y_ {1}, y_ {2}, y_ {3}, \ ldots, y_ {k})}$ ${\ displaystyle i \ in \ mathbb {N} _ {0}}$ ${\ displaystyle k + i \ leq n}$ ${\ displaystyle x_ {j + i} = y_ {j}}$ ${\ displaystyle j \ in \ {1, \ ldots, k \}}$ ${\ displaystyle u}$ ${\ displaystyle w}$ ${\ displaystyle p}$ ${\ displaystyle s}$ ${\ displaystyle w}$ ${\ displaystyle p \ circ u \ circ s = w}$

{\ displaystyle u}

is infix of

{\ displaystyle w: \; \ Longleftrightarrow \ exists p, s \ in \ Sigma ^ {\ ast}: \; p \ circ u \ circ s = w}

If the word with an infix of words , and , but not the words , or the empty word . In many computer languages, the English term substring is used for Infix . ${\ displaystyle {\ hat {w}} = aba}$ ${\ displaystyle {\ hat {w}} \ in \ lbrace a, b \ rbrace ^ {*}}$ ${\ displaystyle babaab}$ ${\ displaystyle abaababb}$ ${\ displaystyle aba}$ ${\ displaystyle abba}$ ${\ displaystyle babbaabbab}$ ${\ displaystyle \ varepsilon}$

Specifically, the empty word is an infix of any word, and each word is an infix of itself. An infix of any word that is not identical to this is called a real infix .

prefix

A prefix is an addition to the beginning of a word. A prefix of a word is therefore every infix , for which applies that and for each . Accordingly, the prefix of the word is if and only if there is at least one from Kleen's envelope above the alphabet from which it was generated, such that : ${\ displaystyle w = (x_ {1}, x_ {2}, x_ {3}, \ ldots, x_ {n})}$ ${\ displaystyle {\ hat {w}} = (y_ {1}, y_ {2}, y_ {3}, \ ldots, y_ {k})}$ ${\ displaystyle k \ leq n}$ ${\ displaystyle x_ {j} = y_ {j}}$ ${\ displaystyle j \ in \ {1, \ ldots, k \}}$ ${\ displaystyle u}$ ${\ displaystyle w}$ ${\ displaystyle s}$ ${\ displaystyle w}$ ${\ displaystyle u \ circ s = w}$

{\ displaystyle u}

is prefix of

{\ displaystyle w: \; \ Longleftrightarrow \ exists s \ in \ Sigma ^ {\ ast}: \; u \ circ s = w}

For prefixes, too, every word is a prefix of itself and the empty word is a prefix of any word. A prefix of a word that is not identical to it is called a real prefix .

example

Be , these are the real prefixes for : ${\ displaystyle w = abaabb}$ ${\ displaystyle w}$

${\ displaystyle \ varepsilon}$
${\ displaystyle a}$
${\ displaystyle from}$
${\ displaystyle aba}$
${\ displaystyle abaa}$
${\ displaystyle abaab}$ .

suffix

A suffix , also called a postfix, is an addition to the end of a word. According to the definition of the infix, a suffix of a word is any partial word for which there is one , for which there is on the one hand and for each on the other . Accordingly, a word is a suffix of a word with if and only if there is at least one such that is: ${\ displaystyle w = (x_ {1}, x_ {2}, x_ {3}, \ ldots, x_ {n})}$ ${\ displaystyle {\ hat {w}} = (y_ {1}, y_ {2}, y_ {3}, \ ldots, y_ {k})}$ ${\ displaystyle i \ in \ mathbb {N} _ {0}}$ ${\ displaystyle k + i = n}$ ${\ displaystyle x_ {j + i} = y_ {j}}$ ${\ displaystyle j \ in \ {1, \ ldots, k \}}$ ${\ displaystyle u}$ ${\ displaystyle w}$ ${\ displaystyle w \ in \ Sigma ^ {\ ast}}$ ${\ displaystyle p \ in \ Sigma ^ {\ ast}}$ ${\ displaystyle p \ circ u = w}$

{\ displaystyle u}

is suffix of

{\ displaystyle w: \; \ Longleftrightarrow \ exists p \ in \ Sigma ^ {\ ast}: \; p \ circ u = w}

As with prefixes and infixes, the same applies to suffixes that the empty word is a suffix of any word and any word is always a suffix of itself. A suffix of a word that is not the same as it is called a real suffix .

example

Be , these are the real suffixes for : ${\ displaystyle w = abaabb}$ ${\ displaystyle w}$

${\ displaystyle baabb}$
${\ displaystyle aabb}$
${\ displaystyle fig}$
${\ displaystyle bb}$
${\ displaystyle b}$
${\ displaystyle \ varepsilon}$ .

literature

John E. Hopcroft , Jeffry D. Ullman : Introduction to Automata Theory, Formal Languages, and Complexity Theory . 3rd corrected edition. Addison-Wesley, Bonn et al. 1994, ISBN 3-89319-744-3 ( International Computer Library ).
Katrin Erk, Lutz Priese: Theoretical Computer Science. A comprehensive introduction . 2nd expanded edition. Springer-Verlag, Berlin et al. 2002, ISBN 3-540-42624-8 , pp. 27-28 ( Springer textbook ).
Gottfried Vossen, Kurt-Ulrich Witt: Basic course in theoretical computer science. An application-oriented introduction - for students in all computer science courses . 4th improved and enlarged edition. Vieweg, Wiesbaden 2006, ISBN 3-8348-0153-4 , p. 15 ( online ).

Notes and individual references

↑ Both plural forms are common, cf. z. B. dtv-Atlas zur Mathematik , Vol. I, ISBN 3-423-03007-0 , p. 245 versus Bauer, Goos: Informatik. Vol. I, ISBN 3-540-06332-3 , p. 28.
↑ Klaus Reinhardt: Priority payment machines and the synchronization of half-track languages , Faculty of Computer Science at the University of Stuttgart; PhD thesis 1994 (PDF; 509 KB)
↑ This is . ${\ displaystyle | w | = \ sum \ limits _ {x \ in \ Sigma} | w | _ {x}}$
↑ Yuri L. Ershov, Eugenii A. Palyutin: Mathematical Logic . Translated from the Russian by Vladimir Shokurov. Revised from the 1979 Russian edition. Mir Publishers, Moscow 1984, p. 16 ( mirtitles.org ).
^ Definition of tuple and its synonyms: Encyclopedia of Mathematics: Tuple
↑ The reflection of a word of length n is a special self-inverse permutation
${\ displaystyle \ sigma _ {n} = {\ begin {pmatrix} 1 & 2 & \ cdots & n \\ n & n-1 & \ cdots & 1 \ end {pmatrix}}}$ .
The reflection of arbitrarily long words is then the union ${\ displaystyle {} ^ {R} = \ bigcup \ {\ sigma _ {n} | n \ in \ mathbb {N} \}}$

Web links

Basic concepts of formal language - section "Word"

[1] Both plural forms are common, cf. z. B. dtv-Atlas zur Mathematik , Vol. I, ISBN 3-423-03007-0 , p. 245 versus Bauer, Goos: Informatik. Vol. I, ISBN 3-540-06332-3 , p. 28.

[KR1994-2] Klaus Reinhardt: Priority payment machines and the synchronization of half-track languages , Faculty of Computer Science at the University of Stuttgart; PhD thesis 1994 (PDF; 509 KB)

[3] This is . ${\ displaystyle | w | = \ sum \ limits _ {x \ in \ Sigma} | w | _ {x}}$

[4] Yuri L. Ershov, Eugenii A. Palyutin: Mathematical Logic . Translated from the Russian by Vladimir Shokurov. Revised from the 1979 Russian edition. Mir Publishers, Moscow 1984, p. 16 ( mirtitles.org ).

[5] Definition of tuple and its synonyms: Encyclopedia of Mathematics: Tuple

[6] The reflection of a word of length n is a special self-inverse permutation
${\ displaystyle \ sigma _ {n} = {\ begin {pmatrix} 1 & 2 & \ cdots & n \\ n & n-1 & \ cdots & 1 \ end {pmatrix}}}$ .
The reflection of arbitrarily long words is then the union ${\ displaystyle {} ^ {R} = \ bigcup \ {\ sigma _ {n} | n \ in \ mathbb {N} \}}$