# Optimality theory

Optimality theory (hereinafter OT) is a model in theoretical linguistics. Its aim is to describe which linguistic expressions are grammatical in a given language and which are not.

The theory assumes that every linguistic expression can be realized in many different ways. All of these realizations compete with one another, and the grammar of the language successively excludes those that conform to it less well than their competitors. The realization that remains at the end satisfies the grammar better than all other possibilities and is therefore optimal with respect to that grammar.

## The model

Schematic representation of the OT. Legend: GEN = Generator, CAND = Candidates, EVAL = Evaluation, C = Constraints

Grammatical theory assumes that all languages of the world are subject to the same principles. What the theory has to explain in concrete terms is how the differences between languages come about, and how it must be parameterized so that it derives exactly the structures that are grammatical in a given language. Grammaticality refers to the forms that actually occur in a spoken language; an ungrammatical expression in the broader sense is one that either does not occur in the language or would not be understood by its speakers.

In OT, the grammar of a language is defined as an ordered set of constraints (also rendered as restrictions in this article). Constraints are conditions that state exactly which properties an expression should not have. If a realization has one of these "forbidden" properties, it is said to violate the corresponding constraint.

The constraints are universal, that is, they apply to all languages. The grammar of one language differs from that of another in that these constraints are prioritized differently. The order from the most important to the least important constraint is called the ranking. In OT, the constraints are thus the principles on which all languages are based, while the ranking, which is specific to each individual language, plays the role of the parameter setting.

In OT, an expression is called an input, and the set of its possible realizations is called the output or candidate set. For each input there is a set of candidates, from which the one that realizes the input best with respect to the grammar, i.e. optimally, has to be selected.

The selection of the optimal candidate is called evaluation or competition. The process works essentially as follows. The starting point is the input, which, depending on the interpretation of the theory, can be a deep structure, a word, the logical form of a sentence, or something similar. For this input, the set of candidates is generated, i.e. a number of ways in which the input could be realized, for example surface structures, the phonetic form of a word, or a concrete sentence structure. Each candidate is characterized by the constraints it violates. First, all candidates that violate the highest-ranked constraint more often than their best competitor are removed from the competition. Of the remaining candidates, those that do worst on the next lower constraint are then removed, and so on. This continues until only one candidate is left, which is then the optimal candidate and represents a grammatical expression of the language.
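The elimination procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the data layout (a dictionary of violation counts per candidate), the function name `evaluate`, and the toy candidates and constraints are all assumptions made for this example.

```python
from typing import Dict, List

# Assumed layout: violations[candidate][constraint] = number of violations.
Violations = Dict[str, Dict[str, int]]

def evaluate(violations: Violations, ranking: List[str]) -> List[str]:
    """Return the candidates that survive all constraints, applied in ranking order."""
    survivors = list(violations)
    for constraint in ranking:  # highest-ranked constraint first
        fewest = min(violations[c].get(constraint, 0) for c in survivors)
        # Keep only the candidates that do as well as the best survivor.
        survivors = [c for c in survivors
                     if violations[c].get(constraint, 0) == fewest]
        if len(survivors) == 1:
            break
    return survivors

# Invented toy data: three candidates, two constraints A and B.
toy = {
    "cand_a": {"A": 1, "B": 0},
    "cand_b": {"A": 0, "B": 2},
    "cand_c": {"A": 0, "B": 1},
}
print(evaluate(toy, ["A", "B"]))  # ranking A » B
print(evaluate(toy, ["B", "A"]))  # ranking B » A
```

Note that a candidate is only removed relative to its competitors: if every remaining candidate violates a constraint equally often, that constraint removes nobody. The function returns a list because more than one candidate can survive if their violation profiles are identical.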

Where exactly the input comes from depends largely on the problem under consideration. In phonology, which is largely concerned with language production, the input comes, for example, from the mental lexicon, and what is ultimately optimized is the phonetic realization of the lexeme. In other approaches, the input can also be the optimal candidate of a previous evaluation; this is called "local optimization" (see also the section Further remarks). In syntax there is usually no input at all, since the aim is to describe the structure of a language independently of its use. Whether a structure is well-formed in a language then follows solely from the ranking of the constraints.

### Tableaus

Tableaux, tables that illustrate the evaluation process graphically, are an important aid in optimality-theoretic analyses.

The input of the evaluation is in the upper left cell of the tableau. To its right, the constraints are listed from left to right according to their ranking. A notation often used in the literature for the ranking of the constraints is

$$C_1 \gg C_2 \gg \dots \gg C_n,$$

where $C_i \gg C_j$ means that $C_i$ is ranked higher than $C_j$. In a tableau, $C_i$ therefore always stands to the left of $C_j$.

The first column of the tableau contains the individual candidates that GEN generated from the input. If a candidate violates a constraint, each violation is marked individually with an asterisk (\*) in the corresponding cell. If a candidate is suboptimal, that is, if it violates a constraint that another candidate still in the competition does not violate or violates less often, its elimination is marked with an exclamation mark (!) after the \*. This decisive violation is called "fatal". As the following example shows, it can also happen that all candidates violate the same constraint (this is the case for constraint $C_2$). If all remaining candidates violate a constraint equally often, that constraint cannot decide between them, and the violations of the next lower constraint decide. The optimal candidate is marked with a pointing hand (☞). Gray shading is an additional visual aid to highlight the suboptimal candidates.

Tableau $T_1$:

| INPUT | $C_1$ | $C_2$ | $C_3$ | $C_4$ |
|---|---|---|---|---|
| ☞ CAND$_1$ | | \* | \* | |
| CAND$_2$ | | \*\*! | \* | |
| CAND$_3$ | | \* | \* | \*! |
| CAND$_4$ | \*! | \* | | \*\*\* |

Tableau $T_2$:

| INPUT | $C_3$ | $C_2$ | $C_1$ | $C_4$ |
|---|---|---|---|---|
| CAND$_1$ | \*! | \* | | |
| CAND$_2$ | \*! | \*\* | | |
| CAND$_3$ | \*! | \* | | \* |
| ☞ CAND$_4$ | | \* | \* | \*\*\* |

The two tableaux $T_1$ and $T_2$ differ only in the ranking of the constraints $C_1$ and $C_3$. Exchanging these two constraints makes the candidate CAND$_4$ optimal, even though it incurs more violations overall than any of the other candidates.
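Because a higher-ranked constraint always takes absolute priority over all lower-ranked ones, choosing the optimal candidate amounts to comparing violation profiles lexicographically. The following Python sketch illustrates this with the violation counts of the two tableaux; the names `VIOLATIONS`, `profile` and `optimal` are introduced here for illustration only.

```python
# Violation counts as given in the tableaux T1 and T2.
VIOLATIONS = {
    "CAND1": {"C1": 0, "C2": 1, "C3": 1, "C4": 0},
    "CAND2": {"C1": 0, "C2": 2, "C3": 1, "C4": 0},
    "CAND3": {"C1": 0, "C2": 1, "C3": 1, "C4": 1},
    "CAND4": {"C1": 1, "C2": 1, "C3": 0, "C4": 3},
}

def profile(candidate, ranking):
    """Violation counts of a candidate, ordered from highest to lowest constraint."""
    return tuple(VIOLATIONS[candidate][c] for c in ranking)

def optimal(ranking):
    # Python compares tuples lexicographically, which mirrors strict ranking.
    return min(VIOLATIONS, key=lambda cand: profile(cand, ranking))

t1_winner = optimal(["C1", "C2", "C3", "C4"])  # ranking of T1
t2_winner = optimal(["C3", "C2", "C1", "C4"])  # ranking of T2
totals = {cand: sum(v.values()) for cand, v in VIOLATIONS.items()}
print(t1_winner, t2_winner, totals)
```

The `totals` dictionary makes the point of the comparison explicit: the total number of violations plays no role, only the position of the highest-ranked constraint on which two candidates differ. If two candidates had identical profiles, `min` would return only the first of them, so this sketch assumes a unique winner.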

### Types of restrictions

A constraint in the sense of OT is a condition that a candidate either fulfills or does not fulfill. If a candidate does not meet the condition, the corresponding constraint counts as violated. A candidate may also violate a constraint several times (see the example from syntax). There are generally two types of constraints: faithfulness constraints and markedness constraints.

Faithfulness constraints (T) relate directly to the relationship between the input and a candidate. In general, a faithfulness constraint is violated whenever the properties of a candidate differ from those of the input.

Markedness constraints (M), by contrast, specify particular properties that a candidate must have in order to be optimal in a language. For each M there are faithfulness constraints that cancel its effect. This explains why a particular property is found in one language (M » T), while it is ungrammatical in other languages (T » M).

A further type of constraint is used in prosody and in the analysis of tonal languages. Here, so-called alignment constraints determine, for example, in which direction tones are associated with their corresponding segments.

## Examples

### A non-linguistic example

Three men, Hans, Karl and Peter, each want to buy a car. Each has precise ideas: Hans' car should be particularly economical and have a light color, and his budget is €12,000. Karl wants a fast car, does not care about the color, and has around €20,000 available. Peter definitely wants a blue vehicle. Otherwise all that matters to him is that it drives; since his rich uncle is paying for the car and its maintenance, money is no object.

However, the car dealer only has a very limited range on offer:

1. A small car with 45 hp in dark blue for € 8,000,
2. A red 120 hp sports car for € 25,000, and
3. A white station wagon with 90 hp for € 12,000.

The car dealer explains that the following (hypothetical) rule of thumb applies: "The more horsepower a car has, the faster it is and the more expensive it is to maintain." The small car therefore counts as "economical", and the sports car as a "fast" and hence expensive car. The station wagon is conventionally also regarded as a "fast" car and therefore as "not economical". In addition, it would be no problem to reorder a model if two or more customers chose the same vehicle.

The decision about who buys which car resembles a process in optimality theory: each of the three men has precise ideas (input) and three models to choose from (candidates). From the given situation, the following constraints can be postulated for all three customers:

• The color should match the customer's wishes (in short: color)
• The vehicle should not cost more than the customer can spend (price)
• The vehicle should match the customer's wishes regarding economy and speed (PS)

These constraints are ranked differently for each customer. For Hans, PS is the most important, followed by a light color; the question of money comes last for him. He will choose the first car even though it does not match his color wishes, since the other two models are not economical enough. Karl's priorities are similar: for him, too, PS is the most important constraint, in his case with regard to speed. Since his budget is limited, price comes second and color last. He will choose the station wagon, because it also counts as "fast" and the sports car is too expensive. Peter's requirements are ranked as follows: color comes first, and he does not care about the rest. He will buy the first car, because it is exactly what he wanted.

Each of the three buyers has thus bought the car that he considers most suitable, i.e. the one that is optimal under the given circumstances (budget, range on offer and wishes).
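The car purchase can be replayed as a small Python sketch in which each constraint is a function of the customer's wishes (the input) and a car (a candidate). The class names, the reading of "light color" as white, and the 45 hp threshold for "economical" are assumptions made for this illustration; the article itself only fixes the three cars and the three rankings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Car:
    name: str
    color: str
    hp: int
    price: int

@dataclass(frozen=True)
class Wish:
    colors: tuple   # acceptable colors; empty tuple means no preference
    budget: float   # maximum price in euros
    kind: str       # "economical", "fast" or "any"

CARS = [
    Car("small car", "dark blue", 45, 8_000),
    Car("sports car", "red", 120, 25_000),
    Car("station wagon", "white", 90, 12_000),
]

# Each constraint returns 1 if the car violates the customer's wish, else 0.
def color(wish, car):
    return int(bool(wish.colors) and car.color not in wish.colors)

def price(wish, car):
    return int(car.price > wish.budget)

def ps(wish, car):
    # Assumption: the dealer's rule is modelled as "45 hp = economical, else fast".
    label = "economical" if car.hp <= 45 else "fast"
    return int(wish.kind != "any" and label != wish.kind)

def choose(wish, ranking):
    """Pick the car whose violation profile is best under the given ranking."""
    return min(CARS, key=lambda car: tuple(c(wish, car) for c in ranking)).name

hans = choose(Wish(("white",), 12_000, "economical"), [ps, color, price])
karl = choose(Wish((), 20_000, "fast"), [ps, price, color])
peter = choose(Wish(("dark blue",), float("inf"), "any"), [color, ps, price])
print(hans, karl, peter)
```

For Peter, the order of the two constraints below color is arbitrary, since he never violates either of them.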

### Examples from linguistics

Two examples from the linguistic sub-areas of phonology and syntax are listed below.

#### Phonology

German phonology has a phenomenon called final devoicing (German Auslautverhärtung, literally "final hardening"). For example, the word Lied ("song") is pronounced [liːt]. In OT, the pronunciation [liːd] is nevertheless treated as a possible candidate, since it is identical to the underlying form /liːd/. This underlying form becomes apparent in inflected forms of the word, for example in the plural [ˈliː.dɐ], in which the plosive /d/ is no longer at the end of a syllable, is therefore not subject to final devoicing, and is pronounced voiced.

In German, however, a restriction on the pronunciation of syllable-final consonants is more important than identity between underlying form and pronunciation: voiced obstruents are to be avoided in this position. Because the identity (faithfulness) constraint is ranked lower in German than this markedness constraint, speakers of German prefer the pronunciation [liːt]. In English, the faithfulness constraint is more important than the markedness constraint. The verb lead has the same underlying form as the German word Lied. Since English has no final devoicing, it is pronounced [liːd] with a voiced [d].

Based on these assumptions, the following restrictions can be postulated:

• \*[+sth]\$ (markedness constraint)
• ID[±sth] (identity or faithfulness constraint)

The first constraint represents final devoicing. A candidate violates it (the asterisk at the beginning of the constraint marks the prohibited configuration) if a voiced sound, i.e. one with the property [+sth] (from German stimmhaft, "voiced"), appears at the end of a syllable (indicated by the symbol "\$" on the right). The second constraint requires that all sounds agree in voicing between input and output, i.e. that they are identical (ID).

The following two tableaux compare the pronunciation of Lied in German (ranking: \*[+sth]\$ » ID[±sth]) with that of lead in English (ranking: ID[±sth] » \*[+sth]\$).

Tableau $T_3$: German

| Input: /liːd/ | \*[+sth]\$ | ID[±sth] |
|---|---|---|
| ☞ [liːt] | | \* |
| [liːd] | \*! | |

Tableau $T_4$: English

| Input: /liːd/ | ID[±sth] | \*[+sth]\$ |
|---|---|---|
| [liːt] | \*! | |
| ☞ [liːd] | | \* |

(Note: In German, final devoicing affects only plosives and fricatives; this fact was ignored for the sake of simplicity when postulating the constraints.)
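The interaction of the two constraints can be tried out with a small Python sketch. It makes several simplifying assumptions that go beyond the text: GEN produces only two candidates (the faithful one and one with a devoiced final obstruent), a word is treated as a single syllable, and the function names `gen`, `no_voiced_coda` and `ident_voice` are hypothetical stand-ins for GEN, \*[+sth]\$ and ID[±sth].

```python
# Assumed mapping from voiced obstruents to their voiceless counterparts.
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}
VOICED = set(DEVOICE)

def gen(underlying):
    """GEN: the faithful candidate plus one with a devoiced final obstruent."""
    candidates = [underlying]
    if underlying[-1] in VOICED:
        candidates.append(underlying[:-1] + DEVOICE[underlying[-1]])
    return candidates

def no_voiced_coda(underlying, candidate):
    """Markedness *[+sth]$: one violation for a voiced final obstruent."""
    return int(candidate[-1] in VOICED)

def ident_voice(underlying, candidate):
    """Faithfulness ID[±sth]: one violation per segment that differs from the
    input (in this sketch, candidates can only differ in voicing)."""
    return sum(a != b for a, b in zip(underlying, candidate))

def evaluate(underlying, ranking):
    """Pick the candidate with the best violation profile under the ranking."""
    return min(gen(underlying),
               key=lambda cand: tuple(c(underlying, cand) for c in ranking))

german = evaluate("liːd", [no_voiced_coda, ident_voice])   # M » T
english = evaluate("liːd", [ident_voice, no_voiced_coda])  # T » M
print(german, english)
```

The same input and the same two constraints yield different winners, and the only thing that changes between the two calls is the order of the constraints.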

#### Syntax

An example from syntax is the explanation of different Wh-movement patterns in multiple questions across the languages of the world. The issue is the position of Wh-phrases (e.g. interrogative pronouns such as German wer "who", warum "why", wessen "whose", or English why and what; or more complex phrases introduced by such an interrogative pronoun, such as "whose mother" or "which of the many children you mean"). In German, for example, there is always exactly one Wh-phrase at the beginning of a clause:

(1) a. \*When₁ did Fritz read [which book]₂?

b. When₁ did Fritz read t₁ [which book]₂?

c. \*When₁ [which book]₂ did Fritz read t₁ t₂?

(The German examples are rendered in English here; in (1a) neither Wh-phrase has moved, which English word order cannot show.)

In Korean, by contrast, all Wh-phrases remain in situ, that is, in the position where the respective answer to the question word would stand in a declarative sentence:

(2) a. Nŏnŭn muŏsŭl₁ wae₂ sassni? (you what why bought)

b. \*Muŏsŭl₁ nŏnŭn t₁ wae₂ sassni? (what you why bought)

c. \*Muŏsŭl₁ wae₂ nŏnŭn t₁ t₂ sassni? (what why you bought)

Bulgarian, in turn, is a language in which all Wh-elements move to the beginning of the sentence:

(3) a. \*Koj₁ vižda kogo₂? (who sees whom)

b. Koj₁ kogo₂ t₁ vižda t₂? (who whom sees)

(Notes: The asterisk (\*) stands for ungrammaticality; t denotes a trace, i.e. the position from which the co-indexed element was moved. The index shows which element belongs to which trace. The structural representation of the expressions is greatly simplified here.)

The following three constraints are sufficient for the analysis:

• W-Krit: A Wh-phrase must be at the beginning of the sentence.
• Pur-EP: Penalizes the occurrence of more than one element between the beginning of the sentence and the left sentence bracket. (The exact definition is: no multiple specifiers are allowed in the CP.)
• Ökon: Prohibits movement (more precisely: traces, t) in general.

The constraints are ranked as follows:

• German: Pur-EP » W-Krit » Ökon
• Korean: Pur-EP » Ökon » W-Krit
• Bulgarian: W-Krit » Pur-EP » Ökon

Since all of these constraints are markedness constraints, no input is necessary. How the candidates are generated can be disregarded here.

The optimal candidates are determined as follows:

Tableau $T_5$: Multiple questions in German

| Candidates | Pur-EP | W-Krit | Ökon |
|---|---|---|---|
| When₁ did Fritz read [which book]₂? | | \*\*! | |
| ☞ When₁ did Fritz read t₁ [which book]₂? | | \* | \* |
| When₁ [which book]₂ did Fritz read t₁ t₂? | \*! | | \*\* |

Tableau $T_6$: Multiple questions in Korean

| Candidates | Pur-EP | Ökon | W-Krit |
|---|---|---|---|
| ☞ Nŏnŭn muŏsŭl₁ wae₂ sassni? | | | \*\* |
| Muŏsŭl₁ nŏnŭn t₁ wae₂ sassni? | | \*! | \* |
| Muŏsŭl₁ wae₂ nŏnŭn t₁ t₂ sassni? | \*! | \*\* | |

Tableau $T_7$: Multiple questions in Bulgarian

| Candidates | W-Krit | Pur-EP | Ökon |
|---|---|---|---|
| Koj₁ vižda kogo₂ | \*! | | \* |
| ☞ Koj₁ kogo₂ t₁ vižda t₂ | | \* | \*\* |
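Since the three languages differ only in the ranking of the same three constraints, one can ask which outcome each of the six possible rankings predicts. The Python sketch below enumerates them. It assumes, as a simplification not made in the text, that every language evaluates the same three abstract candidates; their violation profiles are taken from the tableaux, while the candidate labels are hypothetical names chosen for this example.

```python
from itertools import permutations

CONSTRAINTS = ("Pur-EP", "W-Krit", "Ökon")

# Violation profiles as in the tableaux; labels are illustrative only.
PROFILES = {
    "all in situ":    {"Pur-EP": 0, "W-Krit": 2, "Ökon": 0},
    "one Wh fronted": {"Pur-EP": 0, "W-Krit": 1, "Ökon": 1},
    "all Wh fronted": {"Pur-EP": 1, "W-Krit": 0, "Ökon": 2},
}

def winner(ranking):
    """Candidate with the lexicographically smallest violation profile."""
    return min(PROFILES,
               key=lambda cand: tuple(PROFILES[cand][c] for c in ranking))

# Map every possible ranking of the three constraints to its optimal candidate.
typology = {ranking: winner(ranking) for ranking in permutations(CONSTRAINTS)}
for ranking, cand in typology.items():
    print(" » ".join(ranking), "->", cand)
```

Under these assumptions the six rankings yield only three distinct patterns, so several rankings describe the same surface behavior.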

## Developing the theory

Optimality theory was developed by Alan Prince and Paul Smolensky in the early 1990s. They first used it to explain language-specific differences in syllable structure. OT was then applied to other phonological problems.

Soon afterwards, work was published that presented algorithms with which OT grammars can be learned. The work "Optima" by Vieri Samek-Lodovici and Alan Prince presents the laws of the theory in a very formal way and derives which properties candidates must have in order to be able to become optimal at all.

Since around 1995, OT has been increasingly used in areas outside of phonology, for example in syntax.

There are now also approaches that do without an input entirely and assume that the number of candidates is kept small by other means, for example by assuming that candidates compete only with certain minimally different candidates (compare also the example from syntax).

### Comparison with generative grammar theories

In contrast to rule-based grammar theories, OT makes fundamentally different assumptions about the nature of constraints and candidates:

| OT | Rule-based theories |
|---|---|
| Constraints are universal. | Some constraints can be language-specific, others universal. |
| Constraints can be violated under certain circumstances. | All constraints are inviolable. |
| Constraints are ranked. | All constraints apply equally. |
| Different candidates compete; both external factors (i.e. the other candidates) and internal factors (i.e. the structural and characteristic properties of the candidate itself) play a role. | The grammaticality of a candidate depends solely on its internal properties. |

The Minimalist Program (MP) shows certain similarities with the approaches of optimality theory. Chomsky (1995) states, among other things, that of two syntactic derivations that can derive the same sentence, the more economical one is to be preferred. Two comparable derivations are therefore in competition with each other. In addition, constraints can also be violated in the MP under certain conditions, provided that other constraints are thereby fulfilled.

## Formal description

In linguistics, a constraint $C$ is generally viewed as a relation between a set of structures $U$ and a subset $U'$ of that set:

$$C \colon U \rightarrow U' \qquad \text{with} \quad U' \subseteq U$$

In addition, for each constraint $C$ there is a so-called stratified hierarchy $C^{\wedge}$, which orders the elements of $U$ according to how often an element violates the constraint. The attribute "stratified" means that several elements can be ranked equally high within an order. Formally, this means

$$\forall a, b, x \in \langle S; O \rangle: \neg (a > b \vee b > a) \rightarrow ((a > x) \leftrightarrow (b > x)) \qquad (a \neq b),$$

where $\langle S; O \rangle$ is an order relation $O$ over a set $S$. This means that two elements that are ranked equally high in the hierarchy behave in the same way with respect to all other elements in the order.

The maximum layer of $C^{\wedge}$ is the set of elements of $U$ that are ordered highest within the order relation $\langle U; C^{\wedge} \rangle$. In terms of OT, this means that these elements violate the constraint less often than all others, although this does not imply that they do not violate the constraint at all. As a consequence, constraints are violable.

The relation $C$ yields the maximum layer of $C^{\wedge}$:

$$C(U) = U' = \left\{ x \mid x \in U \wedge x \in \max\left(\langle U; C^{\wedge} \rangle\right) \right\}.$$

The set $U'$ is also called the favoring set of $C$, that is, the set of all structures that satisfy the constraint best compared to all other structures and are thus "favored" by the constraint.

The second basic assumption of OT is that the constraints themselves are ordered in turn. This order relation is called the ranking ($R$), and the associated set of constraints is called $\Sigma$. Since the constraints are defined as relations, the ranking can also be viewed as a function in which the individual constraints are applied to the set $U$ according to their order:

$$R(U) = C_n \circ \dotsb \circ C_1 (U) = C_n(\dots(C_1(U))\dots)$$

This means that the highest-ranked constraint $C_1$ removes from the set $U$ those elements that violate $C_1$ more often than the elements contained in $\max\left(\langle U; C_1^{\wedge} \rangle\right)$. The result is a set $U'$, which in turn is passed to the next lower ranked constraint $C_2$ (see also composition of functions), and so on. This continues until all constraints have been processed. The resulting set $R(U)$ contains the structures from $U$ that, given the ranking, satisfy the constraints better than all other structures and are therefore called optimal. They behave identically with respect to the constraints in $\Sigma$, i.e. they violate every constraint equally often.
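As a worked illustration of this composition, consider a set $U = \{k_1, k_2, k_3, k_4\}$ of four candidates and four constraints ranked $C_1 \gg C_2 \gg C_3 \gg C_4$. The violation counts are those of tableau $T_1$ from the section on tableaux, written as $(C_1, C_2, C_3, C_4)$: $k_1 = (0,1,1,0)$, $k_2 = (0,2,1,0)$, $k_3 = (0,1,1,1)$, $k_4 = (1,1,0,3)$. The labels $k_i$ are introduced here only for this example. Applying the constraints one after the other gives:

$$
\begin{aligned}
C_1(U) &= \{k_1, k_2, k_3\} \\
C_2(C_1(U)) &= \{k_1, k_3\} \\
C_3(C_2(C_1(U))) &= \{k_1, k_3\} \\
R(U) = C_4(C_3(C_2(C_1(U)))) &= \{k_1\}
\end{aligned}
$$

In the third step nothing is removed, because $k_1$ and $k_3$ violate $C_3$ equally often and therefore both belong to the maximum layer of $C_3^{\wedge}$ over the remaining set.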

When the theory is applied, a limited set of candidates $K$ is usually chosen as the set of structures, and this set depends on the specific input selected. A candidate set is understood to comprise the structures of the candidates together with the violations that they incur for each individual constraint in $\Sigma$.

## Evidence and Criticism

### Evidence

An important argument in favor of OT analyses of linguistic phenomena is that they can be parameterized, i.e. that differences between languages can be derived within one and the same theoretical framework. In OT this is done by changing the ranking of the constraints.

Another positive aspect of OT is its account of so-called repair phenomena (also known as last resort: linguistic structures that are used when all alternatives would lead to violations of important constraints). Examples are the avoidance of hiatus in German by inserting a glottal stop before a syllable-initial vowel, as in [bə.ˈʔaχ.tən], or do-support in English (the obligatory use of the verb do in negated clauses such as John did not kiss Mary or in yes/no questions such as Did John kiss Mary?).

In addition, OT can be used in many areas of linguistics. Current research in syntax, for example, addresses the question of how optimization processes can be embedded in existing syntactic models. The concept of locally optimizing individual derivation steps within Noam Chomsky's Minimalist Program (MP) is regarded as very promising. In semantics and pragmatics, the principle of bidirectional optimality has proven to be a useful implementation.

### Criticism

A major problem with OT is overgeneration. Assuming 600 constraints, all of which can be ranked freely, there are

$$600! \approx 1.265 \cdot 10^{1408}$$

possible grammars. Some authors also assume that more complex constraints can be built from a relatively small number of basic constraints by an operation called "local conjunction". A direct problem with this is that such combinations can potentially be continued indefinitely by recursion, which would result in infinitely many constraints and constraint rankings.
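The order of magnitude of the number of rankings is easy to check, since Python integers have arbitrary precision. The following sketch only counts total orders of 600 freely rankable constraints; the variable names are chosen for this example.

```python
import math

n_constraints = 600
# Number of total orders (rankings) of 600 freely rankable constraints.
n_rankings = math.factorial(n_constraints)

digits = str(n_rankings)
exponent = len(digits) - 1        # power of ten in scientific notation
leading = digits[:4]              # first four significant digits
print(f"{n_constraints}! starts with {leading} and has exponent {exponent}")
```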

A frequently voiced criticism is that, by definition, it is almost impossible to derive true optionality, i.e. to allow two or more candidates with different violation profiles to be optimal for the same input. Various ways of dealing with this problem have since been proposed. For example, it has been suggested that one and the same language can have more than one constraint ranking. An important device is the so-called coupling (tying) of constraints (see above all Müller (2000: Chapter 5) and the analyses presented there). Other approaches question the existence of true optionality and argue that different realizations of linguistic expressions imply subtle differences in meaning. On this view, grammatical expressions that are similar but not identical do not take part in the same competition, for example because they are based on different inputs. A frequently cited case is the relatively free constituent order of German, which such analyses often attribute to minimal differences in meaning, for example with regard to focus.

Another criticism of OT that is frequently raised, especially in syntax, is the lack of degrees of grammaticality. In the languages of the world, a distinction can be made between ungrammatical and unacceptable expressions: several German variants of the question "What do you think she did?", which differ in their syntactic construction, are not equally acceptable to all speakers of German, although all of them should be grammatical. OT cannot make this distinction, since it can only filter out one optimal candidate. There are approaches that circumvent this problem, but they entail radical changes to the basic idea of OT.

The phenomenon of harmonic bounding has both positive and negative consequences for OT. Many candidates are blocked by other candidates, which means that the blocked candidate can never become optimal, no matter how the constraints are ranked. This is the case precisely when there is another candidate that violates no constraint more often than the blocked candidate and violates at least one constraint less often. Under certain circumstances, a structure that is supposed to be grammatical can thus be blocked by an ungrammatical one. An argument against OT that is raised in this context is so-called opacity.
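Because harmonic bounding does not depend on the ranking, it can be checked by comparing violation profiles pairwise. The Python sketch below does this for the four profiles of tableau $T_1$; the function names `bounds` and `harmonically_bounded` are introduced here for illustration.

```python
# Violation counts per candidate, in the order (C1, C2, C3, C4) as in tableau T1.
VIOLATIONS = {
    "CAND1": (0, 1, 1, 0),
    "CAND2": (0, 2, 1, 0),
    "CAND3": (0, 1, 1, 1),
    "CAND4": (1, 1, 0, 3),
}

def bounds(a, b):
    """True if profile a is never worse than profile b and better at least once."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def harmonically_bounded(violations):
    """Map each blocked candidate to the list of candidates that block it."""
    blocked = {}
    for name, prof in violations.items():
        blockers = [other for other, p in violations.items()
                    if other != name and bounds(p, prof)]
        if blockers:
            blocked[name] = blockers
    return blocked

blocked = harmonically_bounded(VIOLATIONS)
print(blocked)
```

In this data set, CAND4 is not bounded even though it has the most violations overall, because it is the only candidate that satisfies $C_3$; this is why it can win under the ranking of tableau $T_2$.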

### Further remarks

Because the theory is relatively young and has developed rapidly, there are still some conceptual inconsistencies between individual analyses. In almost all areas, no agreement has yet been reached on uniform names for constraints, and the nature of the input, especially in syntax, also varies from author to author and from analysis to analysis.

For example, some authors take the logical form of a sentence as the input. Others prefer to assume that the input of a syntactic competition is the finished sentence and that the candidates are its possible phonological realizations, in which various semantically empty elements (e.g. complementizers such as "that", copulas, or pronouns in pro-drop languages) are omitted or required. The concept of local optimization mentioned above takes a single derivation step as its input. The optimal candidate of this optimization step is then the input for a further optimization at a higher syntactic level, and so on. Still others present arguments for dispensing with an input in syntax altogether.

Some linguists regard OT as a meta-theory: a rule-based grammar formulated within a given grammatical theory can be converted into an OT grammar and vice versa. However, a proof that such a conversion is always possible is still outstanding.

## Literature

### German-language

• Martin Businger: Optimalitätstheorie. In: Christa Dürscheid: Syntax. Grundlagen und Theorien. 5th edition. Vandenhoeck & Ruprecht, Göttingen 2010 (UTB, 3319), ISBN 978-3-8252-3319-8, pp. 153-172 (introduction).
• Gereon Müller: Elemente der optimalitätstheoretischen Syntax. Stauffenburg Linguistik, Tübingen 2000, ISBN 3-86057-721-2.

### English-language

• Diana Archangeli, D. Terence Langendoen (Eds.): Optimality Theory: An Overview. 1st edition. Blackwell, Oxford 1997, ISBN 0-631-20225-0.
• Alan Prince, Paul Smolensky: Optimality Theory. Rutgers Center for Cognitive Science, Rutgers, the State University of New Jersey, New Brunswick, NJ 1993.
• René Kager: Optimality Theory. Cambridge University Press, Cambridge, New York 1999, ISBN 0-521-58019-6.