Pumping lemma

The pumping lemma (also called the loop theorem) describes a property of certain classes of formal languages in theoretical computer science. In many cases, the lemma can be used to prove that a formal language is not regular or not context-free.

The lemma takes its name from the verb to pump, in the sense of inflating. The name reflects the fact that parts of words from languages of certain classes can be repeated (pumped) so that the resulting words are also in the language.

The two principal versions are the pumping lemma for regular languages and the pumping lemma for context-free languages. Pumping lemmas for extensions of the context-free languages can also be found in the literature. However, more powerful language classes in the Chomsky hierarchy, such as the context-sensitive languages and the growing context-sensitive languages, do not admit a pumping lemma.

The lemma and its variants are also referred to as the uvw theorem, uvwxy theorem, loop lemma, iteration lemma, or Bar-Hillel lemma.

Regular languages

Pumping lemma for regular languages

For every regular language $L$ there is a natural number $n$ such that the following holds: every word $z$ in $L$ of length at least $n$ has a decomposition $z = uvw$ with the following three properties:

The two words $u$ and $v$ together have length at most $n$.
The word $v$ is not empty.
For every natural number $i$ (including 0), the word $uv^{i}w$ is in the language $L$; that is, the words $uw$, $uvw$, $uvvw$, $uvvvw$, etc. are all in $L$.

The smallest $n$ that fulfills these properties is called the pumping number of the language $L$.
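The three properties can be explored mechanically on small inputs. The following is a minimal sketch, assuming the language is given as a membership predicate; the names `decompositions`, `pumps` and `is_ab_star` are illustrative and not part of the lemma. Because only finitely many exponents $i$ are tested, a positive result is evidence and not a proof.

```python
from typing import Callable, Iterator, List, Tuple

Split = Tuple[str, str, str]

def decompositions(z: str, n: int) -> Iterator[Split]:
    """Yield every split z = u + v + w with |uv| <= n and v non-empty."""
    for end in range(1, min(n, len(z)) + 1):   # end = |uv|
        for start in range(end):               # start = |u| < |uv|, so v is non-empty
            yield z[:start], z[start:end], z[end:]

def pumps(split: Split, in_language: Callable[[str], bool], max_i: int = 5) -> bool:
    """Check that u v^i w is in the language for i = 0..max_i (a finite test only)."""
    u, v, w = split
    return all(in_language(u + v * i + w) for i in range(max_i + 1))

def is_ab_star(word: str) -> bool:
    """Hypothetical example language: the regular language (ab)*."""
    return word == "ab" * (len(word) // 2)

# All decompositions of "ababab" with |uv| <= 2 that survive pumping.
found: List[Split] = [d for d in decompositions("ababab", 2) if pumps(d, is_ab_star)]
print(found)  # [('', 'ab', 'abab')]
```

Here only the split with $v = ab$ survives, because pumping a single letter destroys the alternation of $a$ and $b$.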

Besides the regular languages, there are also non-regular languages that satisfy this lemma. A necessary and sufficient condition for regularity is provided by the Myhill-Nerode theorem or by Jaffe's pumping lemma.

The pumping lemma contains several alternations between universal and existential quantification. This is evident from the following formal statement of the lemma, in which $\mathcal{L}_{3}$ denotes the set of all regular languages.

$$\begin{aligned} \forall L \in \mathcal{L}_{3}.\, \exists n \in \mathbb{N}.\, \forall z \in L.\, |z| \geq n \implies \exists u, v, w.\ & z = u \circ v \circ w \ \land \\ & |uv| \leq n \ \land \\ & |v| > 0 \ \land \\ & \forall i \in \mathbb{N}_{0}.\, u \circ v^{i} \circ w \in L \end{aligned}$$

Proof

The lemma rests on the fact that for every regular language there is a deterministic finite automaton that accepts it. Over a finite alphabet, a regular language with infinitely many words must contain words with more characters than the automaton has states. To accept such a word, the automaton must pass through a cycle, which can then be traversed as often as desired. The sequence of letters read along the cycle can therefore be repeated any number of times within words of the language.

The idea of the pumping lemma is that part of a word can be repeated any number of times by going around a cycle in the deterministic finite automaton.

The following proof sets the minimum length $n$ from the lemma equal to the number of states of the automaton and shows that, because a cycle exists, every word of at least this length has the required decomposition.

Let $L$ be a regular language. If $L$ is finite, then there is a word of maximum length $k$. Set $n = k + 1$; then for all $z \in L$ the premise $|z| \geq n$ is false, and the implication is vacuously true.

If $L$ is infinite, let $M$ be a deterministic finite automaton that accepts $L$. Since $L$ is regular, such an automaton $M$ always exists. Let $n$ be the number of states of this automaton, and let $z$ be any word in $L$ with at least $n$ characters. Denote by $q_{1} \to \dots \to q_{k}$ the sequence of states that $M$ runs through when reading $z$, beginning with the start state $q_{1}$. Since $z$ is in $L$, $z$ must be accepted by $M$, i.e. $q_{k}$ must be an accepting state. Since the automaton $M$ has exactly $n$ states, a state must repeat at the latest after $n$ characters have been read. That is, there exist $i, j \in \{1, \dots, n + 1\}$ with $i \neq j$ (without loss of generality $i < j$) and $q_{i} = q_{j}$. The automaton $M$ therefore runs through a cycle when reading $z$.

Let $v$ be the part of $z$ that is read while the cycle $q_{i} \to \dots \to q_{j}$ is traversed. Furthermore, let $u$ be the part of $z$ that is read while the preceding state sequence $q_{1} \to \dots \to q_{i}$ is traversed, and let $w$ be the part of $z$ that is read while the subsequent state sequence $q_{j} \to \dots \to q_{k}$ is traversed. With this choice, $z = uvw$.

With this choice of $u$, $v$ and $w$, the statements of the pumping lemma hold:

The length of $uv$ is $j - 1$ and therefore, since $j \leq n + 1$, not greater than $n$.
The word $v$ is not empty, because $i \neq j$ means that at least one character is read during the cycle.
For any $m \geq 0$, when reading the word $uv^{m}w$ the automaton first runs through the state sequence $q_{1} \to \dots \to q_{i}$, then runs $m$ times through the cycle $q_{i} \to \dots \to q_{j}$, and finally runs through the state sequence $q_{j} \to \dots \to q_{k}$. At the end the automaton is in the accepting state $q_{k}$. Thus $uv^{m}w \in L$ for all $m \geq 0$.
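The construction in the proof can be sketched in code: follow the run of the automaton, stop at the first repeated state, and cut the word at the two visits of that state. This is only an illustration under assumptions. The automaton is encoded as a transition dictionary, positions are counted from 0 instead of 1, and the example automaton (words over $\{a, b\}$ with an even number of $a$) is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Delta = Dict[Tuple[str, str], str]   # (state, symbol) -> next state

def pumping_split(delta: Delta, start: str, z: str) -> Optional[Tuple[str, str, str]]:
    """Split z = u + v + w at the first state that the run visits twice."""
    state = start
    first_visit = {state: 0}          # state -> number of characters read at first visit
    for pos, ch in enumerate(z, start=1):
        state = delta[(state, ch)]
        if state in first_visit:      # pigeonhole: happens after at most n characters
            i = first_visit[state]
            return z[:i], z[i:pos], z[pos:]
        first_visit[state] = pos
    return None                       # z is too short to force a repetition

def accepts(delta: Delta, start: str, accepting: Set[str], z: str) -> bool:
    """Run the automaton on z and report whether it ends in an accepting state."""
    state = start
    for ch in z:
        state = delta[(state, ch)]
    return state in accepting

# Hypothetical two-state DFA: accepts words with an even number of 'a'.
DELTA: Delta = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
split = pumping_split(DELTA, "even", "abba")
print(split)  # ('a', 'b', 'ba')
```

For the word $abba$ the run repeats the state `odd` after two characters, so $v = b$ is the part read on the cycle, and every word $a\,b^{m}\,ba$ is again accepted.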

Example

Is the language $L = \{a^{m}b^{m} \mid m \geq 1\}$ regular?

Suppose $L$ is a regular language. Then, by the pumping lemma, there is a number $n$ such that every word $z \in L$ with $|z| \geq n$ can be decomposed as described.

In particular, there is a decomposition $uvw$ with the described properties for the word $a^{n}b^{n} \in L$. Since $uv$ is a prefix of this word and, by property 1, has length at most $n$, $uv$ and hence $v$ consists entirely of the letter $a$. By property 3 (for $i = 2$) the word $uv^{2}w = a^{n + |v|}b^{n}$ must also be in $L$. But since $|v| > 0$ (property 2), this word contains more $a$ than $b$, so it is not in $L$.

So the assumption that $L$ is a regular language leads to a contradiction and is therefore wrong.
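The contradiction can be observed experimentally for small candidate values of $n$. The sketch below, with illustrative helper names, enumerates every decomposition of $a^{n}b^{n}$ allowed by the lemma and reports whether any of them survives pumping. It checks only finitely many $n$ and exponents, so it illustrates the argument without replacing it.

```python
from typing import Dict, Optional, Tuple

def in_L(word: str) -> bool:
    """Membership test for L = { a^m b^m | m >= 1 }."""
    m = len(word) // 2
    return m >= 1 and word == "a" * m + "b" * m

def surviving_split(z: str, n: int, max_i: int = 3) -> Optional[Tuple[str, str, str]]:
    """Return a split z = uvw with |uv| <= n, |v| >= 1 that pumps for i = 0..max_i."""
    for end in range(1, min(n, len(z)) + 1):      # end = |uv|
        for start in range(end):                  # start = |u|
            u, v, w = z[:start], z[start:end], z[end:]
            if all(in_L(u + v * i + w) for i in range(max_i + 1)):
                return u, v, w
    return None

# Try each candidate pumping number n = 1..7 on the word a^n b^n.
results: Dict[int, Optional[Tuple[str, str, str]]] = {
    n: surviving_split("a" * n + "b" * n, n) for n in range(1, 8)
}
print(results)
```

Every entry of `results` is `None`: for each tested $n$, the word $a^{n}b^{n}$ has no admissible decomposition, which is what the proof shows for all $n$.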

A non-regular language that satisfies the conditions of the pumping lemma

The language $L = \{a^{m}b^{n}c^{n} \mid m, n \geq 1\} \cup \{b^{m}c^{n} \mid m, n \geq 0\}$ is not regular. However, $L$ fulfills the properties of the pumping lemma, because every non-empty word $z \in L$ can be decomposed as $z = uvw$ such that $uv^{i}w \in L$ for all $i \geq 0$. One can simply choose $v$ to be the first letter. This letter is either an $a$, in which case the number of leading $a$ is arbitrary; or it is a $b$ or $c$, in which case there are no leading $a$ and the number of leading $b$ or $c$ is arbitrary.
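The claim that pumping the first letter never leaves $L$ can be checked exhaustively for short words. This is a rough sketch; the regular-expression based membership test and the length bound of 7 are choices made for the illustration only.

```python
import re
from itertools import product

def in_L(word: str) -> bool:
    """Membership in { a^m b^n c^n | m, n >= 1 } united with { b^m c^n | m, n >= 0 }."""
    match = re.fullmatch(r"(a*)(b*)(c*)", word)
    if match is None:
        return False
    a, b, c = (len(group) for group in match.groups())
    if a >= 1:
        return b >= 1 and b == c      # first set: equal, positive numbers of b and c
    return True                        # second set: any b^m c^n

# All non-empty words over {a, b, c} up to length 7 that belong to L.
words = ["".join(p) for k in range(1, 8) for p in product("abc", repeat=k)]
members = [z for z in words if in_L(z)]

# Choose u = empty word, v = first letter, w = rest, and pump for i = 0..4.
first_letter_pumps = all(in_L(z[0] * i + z[1:]) for z in members for i in range(5))
print(first_letter_pumps)  # True
```

The interesting case is a word such as $abc$: removing the single $a$ yields $bc$, which lies in the second set of the union.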

Jaffe's pumping lemma

Jeffrey Jaffe developed a generalized pumping lemma that is equivalent to the definition of regular languages. It therefore gives a necessary and sufficient condition for the regularity of a language.

A language $L \subseteq \Sigma^{*}$ is regular if and only if there exists a constant $n > 0, n \in \mathbb{N}$ such that for all $z \in \Sigma^{*}$ with $|z| = n$ there is a decomposition $z = uvw$ with $u, w \in \Sigma^{*}, v \in \Sigma^{+}$ such that for all $i \geq 0$ and all suffixes $x \in \Sigma^{*}$:

$$zx \in L \iff uv^{i}wx \in L.$$

Context-free languages

Pumping lemma for context-free languages

For every context-free language $L$ there is a natural number $n$ such that every word $z$ in $L$ of length at least $n$ has a decomposition $z = uvwxy$ with the following three properties:

The words $v$, $w$ and $x$ together have length at most $n$, i.e. $|vwx| \leq n$.
At least one of the words $v$, $x$ is not empty, i.e. $|vx| \geq 1$.
For every natural number $i$ (including 0), the word $uv^{i}wx^{i}y$ is in the language $L$; that is, the words $uwy$, $uvwxy$, $uvvwxxy$, etc. are all in $L$.

Besides the context-free languages, there are also non-context-free languages that satisfy this pumping lemma, so the converse of the lemma does not hold in general. A generalization of the pumping lemma for context-free languages is Ogden's lemma.

Proof

Let $G$ be a context-free grammar in Chomsky normal form with variable set $N$ that generates the language in question. Let $x$ be a word of this language with $|x| \geq 2^{|N|} = n$. Here $x$ names the whole word, not the factor $x$ of the decomposition.

The idea of the pumping lemma for context-free languages is that part of a word can be repeated any number of times by deriving the same variable multiple times.

Let us now consider a derivation tree $T$ for $x$ with height $h$. Since the grammar is in CNF, $T$ has the form of a binary tree. It follows for the height of $T$ that $h \geq \log_{2}(n) = |N|$. So there is a path $v_{0} \ldots v_{h}$ in $T$ from the root to a leaf that consists of $h + 1 > |N|$ nodes. By the pigeonhole principle there are therefore two nodes $v_{j}, v_{k}$ with $0 \leq j < k \leq h$ on this path whose labels $A_{j}, A_{k}$ are the same variable of $G$.

Consider the subtree $T_{k}$ rooted at $v_{k}$; its leaves form the substring $w$. The subtree $T_{j}$ rooted at $v_{j}$ contains $T_{k}$ as a subtree. So the leaves of $T_{j}$ can be divided into leaves to the left of $T_{k}$ and leaves to the right of $T_{k}$, which gives a division of the leaves of $T_{j}$ of the form $vwx$. The subtree $T_{j}$ in turn divides the entire derivation tree into three parts $u, vwx, y$. So we obtain the substrings $u, y$ that lie in the derivation tree to the left and right of the subtree rooted at $v_{j}$, the substrings $v, x$ that lie in the subtree $T_{j}$ but not in $T_{k}$, and finally the substring $w$ that lies in $T_{k}$. Since $v_{j}$ and $v_{k}$ represent the same variable of the grammar, the part of the derivation along the path from $v_{j}$ to $v_{k}$ can be repeated any number of times. Repeating it produces words of the form $uv^{i}wx^{i}y$ without leaving the language. This proves the pumping lemma for context-free languages.
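The cut at a repeated variable can be made concrete on a small derivation tree. The sketch below rests on several assumptions: trees are encoded as nested tuples `(variable, left, right)` or `(variable, terminal)`, the CNF grammar $S \to AT \mid AB$, $T \to SB$, $A \to a$, $B \to b$ for $\{a^{m}b^{m} \mid m \geq 1\}$ is a hypothetical example, and the function names are illustrative.

```python
from typing import Optional, Tuple

def height(t) -> int:
    """Number of variable nodes on the longest path from t down to a leaf."""
    return 1 if isinstance(t[1], str) else 1 + max(height(t[1]), height(t[2]))

def word(t) -> str:
    """The string of terminals derived by the tree t."""
    return t[1] if isinstance(t[1], str) else word(t[1]) + word(t[2])

def split_by_repeated_variable(tree) -> Optional[Tuple[str, str, str, str, str]]:
    """Return (u, v, w, x, y) cut at two equal variables on a longest path."""
    path, node, start = [], tree, 0
    while True:
        # Record the variable and the span [start, end) of the word it derives.
        path.append((node[0], start, start + len(word(node))))
        if isinstance(node[1], str):
            break
        left, right = node[1], node[2]
        if height(left) >= height(right):
            node = left
        else:
            start += len(word(left))   # skip the terminals derived by the left child
            node = right
    z = word(tree)
    for k in range(len(path) - 1, 0, -1):          # lower occurrence v_k
        for j in range(k - 1, -1, -1):             # nearest upper occurrence v_j
            if path[j][0] == path[k][0]:
                (_, js, je), (_, ks, ke) = path[j], path[k]
                return z[:js], z[js:ks], z[ks:ke], z[ke:je], z[je:]
    return None

# Derivation tree of "aabb" in the hypothetical grammar above.
TREE = ("S", ("A", "a"), ("T", ("S", ("A", "a"), ("B", "b")), ("B", "b")))
parts = split_by_repeated_variable(TREE)
print(parts)  # ('', 'a', 'ab', 'b', '')
```

The variable $S$ occurs twice on the longest path, so $w = ab$ is derived by the lower occurrence, and $v = a$, $x = b$ are the parts contributed between the two occurrences.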

Example

The word $vwx$ contains at most two different letters.

Is the language $L = \{a^{m}b^{m}c^{m} \mid m \geq 1\}$ context-free?

We assume $L$ is context-free. Then let $n$ be the corresponding constant from the pumping lemma.

We consider the word $z = a^{n}b^{n}c^{n}$. There must then be a decomposition $z = uvwxy$ such that $|vx| \geq 1$, $|vwx| \leq n$, and $uv^{i}wx^{i}y \in L$ for all $i \geq 0$. Since $|vwx| \leq n$, the word $vwx$ contains at most two different letters. Therefore, and because $|vx| \geq 1$, the word $uv^{2}wx^{2}y$ does not contain the same number of all three letters and is therefore not in $L$. That is a contradiction; the assumption that $L$ is context-free is therefore wrong.
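As with the regular case, the argument can be tried out by brute force for small $n$. The following sketch enumerates all decompositions $z = uvwxy$ with $|vwx| \leq n$ and $|vx| \geq 1$; the function names and the tested range of $n$ are choices made for the illustration, and only finitely many exponents are checked.

```python
from typing import Callable, Iterator, Optional, Tuple

Split5 = Tuple[str, str, str, str, str]

def cfl_decompositions(z: str, n: int) -> Iterator[Split5]:
    """Yield every split z = u v w x y with |vwx| <= n and |vx| >= 1."""
    length = len(z)
    for p in range(length + 1):                       # p = |u|
        for s in range(p, min(p + n, length) + 1):    # s = |uvwx|
            for q in range(p, s + 1):                 # q = |uv|
                for r in range(q, s + 1):             # r = |uvw|
                    if (q - p) + (s - r) >= 1:        # v and x not both empty
                        yield z[:p], z[p:q], z[q:r], z[r:s], z[s:]

def find_pumpable(z: str, n: int, in_language: Callable[[str], bool],
                  max_i: int = 3) -> Optional[Split5]:
    """Return a decomposition that stays in the language for i = 0..max_i, if any."""
    for u, v, w, x, y in cfl_decompositions(z, n):
        if all(in_language(u + v * i + w + x * i + y) for i in range(max_i + 1)):
            return u, v, w, x, y
    return None

def in_abc(word: str) -> bool:
    """Membership test for { a^m b^m c^m | m >= 1 }."""
    m = len(word) // 3
    return m >= 1 and word == "a" * m + "b" * m + "c" * m

def in_anbn(word: str) -> bool:
    """Membership test for the context-free language { a^m b^m | m >= 1 }."""
    m = len(word) // 2
    return m >= 1 and word == "a" * m + "b" * m

results = {n: find_pumpable("a" * n + "b" * n + "c" * n, n, in_abc) for n in range(1, 6)}
print(results)                               # every value is None
print(find_pumpable("aabb", 2, in_anbn))     # a decomposition exists for a^m b^m
```

For contrast, the context-free language $\{a^{m}b^{m}\}$ does admit a decomposition, in which $v$ is an $a$ and $x$ is a $b$ that are pumped together.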

A non-context-free language that satisfies the pumping lemma

The languages $L_{1} = \{\$^{+}(a^{n}b^{n})^{n} \mid n \in \mathbb{N}\}$ and $L_{2} = \{a^{k}b^{l}c^{m}d^{n} \mid k, l, m, n \in \mathbb{N} : k = 0 \text{ or } l = m = n\}$ are not context-free. However, $L_{1}$ and $L_{2}$ fulfill the properties of the pumping lemma: if a word $z \in L_{2}$ does not contain the letter $a$, this also applies to all words $uv^{i}wx^{i}y$. If, on the other hand, the letter $a$ is contained, there is a decomposition with $u = v = w = \varepsilon$ ($\varepsilon$ denotes the empty word), $x = a$ and a suffix $y$, so that again all words $uv^{i}wx^{i}y$ are contained in $L_{2}$. For $L_{1}$ one can choose $v = \$$ and $x = \varepsilon$, so that $uv^{i}wx^{i}y$ stays in $L_{1}$ (provided at least one $\$$ remains, since $L_{1}$ requires $\$^{+}$).

References

[1] Lecture notes (PDF), Humboldt University of Berlin
[2] Jeffrey Jaffe: A necessary and sufficient pumping lemma for regular languages. doi:10.1145/990524.990528
