Cohort model

Schematic illustration of the cohort model.

The cohort model is a model of auditory word recognition developed in the 1980s by a group of researchers led by the British psychologist William Marslen-Wilson. It describes how individual words stored in the mental lexicon are accessed during listening: the word that best matches the incoming acoustic information is selected from a multitude of potential candidates.

Overview

Word recognition vs. speech recognition

The cohort model is a model of auditory word recognition, that is, it attempts to explain how the information stored in the mind about individual words is accessed while listening to speech. The central question is how the appropriate entry in the mental lexicon is identified from the acoustic signals that together make up a word. For such models it is initially of no interest how the meanings of individual words combine into the overall meaning of a linguistic utterance. They are concerned only with the words themselves, whose specific meaning is then determined by the context.

Here is an example: an ambiguous word such as the German Bank has several different meanings (for example rock layer, financial institution, bench, ...). Which of these is actually meant in a linguistic utterance follows in most cases from the linguistic and non-linguistic context. In theory, all meanings of a word are stored in the same entry in the mental lexicon, as are all meanings of the word Bank. The intended meaning emerges when it is promoted by the context (in technical jargon: primed). The same applies to morpho-syntactic processes. For example, when understanding the word Banken (plural of Bank "financial institution"), the entry for the lexeme Bank is accessed first, then the one for the plural ending -en. In a further process, which is irrelevant for word recognition itself, the two recognized lexemes are merged and the word is understood in its specific meaning (here: several financial institutions, but not several benches, since that plural would be Bänke). The same is true for complex expressions such as endocentric compounds (e.g. bank employee) or derived words (e.g. worker, formed from the entries for the verb stem work- and the nominalizing suffix -er). Speech recognition, but not word recognition, also includes syntactic analysis, for example the question of whether the word bank functions in a sentence as a subject (the bank is in the city center) or as an object (the ATM is in the bank).

While speech recognition models are generally concerned with finding the concrete meaning of a word or sentence from its semantic, syntactic and linguistic context, word recognition models are limited to finding the entry in the mental lexicon that corresponds to the word to be understood. The existence of such a mental lexicon is taken for granted. In short, word recognition models address the question of how this lexicon is accessed. Which concrete meaning of the entry is ultimately selected by the context is, for such models, initially irrelevant. Word recognition is therefore part of speech recognition, but not to be equated with it.

Background and delimitation

In psycholinguistics, there are roughly two types of models of auditory word recognition: phonological approaches on the one hand and psycholinguistic approaches on the other. The phonological approaches are the older of the two in terms of research history, but both still exist side by side and are adopted by different researchers depending on the question at hand.

Phonological Approaches

Schematic comparison of the two types of approaches to auditory word recognition. In the phonological approaches (left), the acoustic signal (lowest level) is used repeatedly, and the entry in the mental lexicon (upper level) is only accessed after complete abstraction (middle level). In the psycholinguistic models (right), the acoustic signal is recoded once and the word is recognized by accessing the mental lexicon several times.

The phonological approaches describe word recognition as a dynamic process in which the acoustic signal, once received, is used repeatedly during recognition. These are bottom-up approaches: word recognition is based solely on the acoustic signal. In addition, these models assume early abstraction, i.e. the incoming signal is broken down into discrete units, for example distinctive features, very early on. Phonological properties of the language also play an important role in recognition.

Examples are the motor theory, the acoustic invariance theory and the quantal theory of speech perception.

Psycholinguistic Approaches

The psycholinguistic approaches, on the other hand, focus on word segmentation and word recognition. Whereas world knowledge plays no role, or only a subordinate one, in the phonological approaches, the psycholinguistic approaches use it to repair phonetically degraded signals. These models are therefore top-down oriented: when recognizing words, existing knowledge, for example about the structural properties of known words, is brought to bear. The cohort model is one of the first psycholinguistic theories of auditory word recognition. Others are, for example, the TRACE model by McClelland and Elman as well as its further developments, the Shortlist model and the Merge model, some of which make use of the mechanisms of the cohort model.

The scientists working with William Marslen-Wilson developed the cohort model in the early 1980s on the basis of a series of experiments whose results previous models could not explain, or could explain only with additional assumptions.

General functionality

The model

Marslen-Wilson divided auditory word recognition into three macro-levels: access, selection, and integration.

In the model, access is understood as the conversion of the acoustic signal into features or sounds. In the selection phase, the cohort mechanism (see below) is used to select the appropriate entry in the mental lexicon, and in the integration phase this entry is embedded in the surrounding semantic and syntactic context. The model itself makes no claims about how the meaning of larger units such as phrases or sentences is grasped.

Lexical access in the cohort model

Schematic of how the cohort model works: the listener perceives the input sound by sound and iteratively excludes all lexemes that do not match what has been heard.

The basic idea of the model is that the incoming acoustic signal (the so-called input) is broken down serially into phones while listening to spoken language. The listener recognizes the first sound of the word to be understood and opens the set of all entries stored in his mental lexicon that begin with this very sound. This set of lexical entries is called a cohort. In the next step the second sound of the word is analyzed. From the first cohort, all entries whose second sound matches the one recognized in the input are kept; all other lexemes are removed from the cohort. From then on, only those lexemes are available that still match the information recognized so far. This procedure is repeated with the following sounds until the word is unambiguously recognized, which is the case when the cohort contains only one entry. The graphic on the right illustrates this general mechanism using the English word trespass.
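
This reduction procedure can be sketched as a simple filter over a toy lexicon. The following Python sketch is purely illustrative: the representation of words as strings of phoneme symbols and the `recognize` helper are assumptions made for this example, not part of the model's original formulation.

```python
def recognize(input_phonemes, lexicon):
    """Illustrative cohort reduction (a sketch, not Marslen-Wilson's notation).

    `lexicon` maps each word to its phoneme string; the function returns the
    state of the cohort after every incoming phoneme.
    """
    cohort = set(lexicon)                      # every stored entry is a candidate
    history = []
    for i, phoneme in enumerate(input_phonemes):
        # keep only the entries whose i-th sound matches the input heard so far
        cohort = {w for w in cohort
                  if i < len(lexicon[w]) and lexicon[w][i] == phoneme}
        history.append((phoneme, sorted(cohort)))
        if len(cohort) <= 1:                   # unambiguously recognized (or failed)
            break
    return history
```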

The original cohort model was able to explain context effects and the serial nature of auditory word recognition. After frequency effects and the handling of defective input became known as weaknesses of the model, Marslen-Wilson expanded the model in the mid-1980s. In the literature, the terms Cohort I and Cohort II have become established for these two stages of the model.

Experimental basis

William Marslen-Wilson conducted a series of experiments that revealed two important properties of auditory word recognition: on the one hand its serial character, and on the other the influence of contextual information on word recognition. At the same time he was able to demonstrate weaknesses in the existing models of speech perception. The cohort model, which was intended to explain the observed effects, then grew out of the results of these experiments.

Shadowing experiments

The first experiments Marslen-Wilson carried out were so-called shadowing experiments. In these experiments, the experimenter reads out a text that the test subject must repeat as quickly as possible. With an average word length of 500 ms, there was a delay of about 250 ms between the experimenter's words and their repetition by the test subject. This means that the subject could recognize a word and begin repeating it before the experimenter had finished saying it. After deducting the time needed to articulate the perceived words, it is assumed that the pure recognition of a word takes about 200 ms. At normal speech rate, this corresponds to about two to three sounds (phonemes).

In this context one speaks of the so-called uniqueness point and recognition point. The uniqueness point (also discrimination point) is the point from which a word is recognized beyond doubt, i.e. from which no other word is compatible with the phoneme sequence heard so far. This happens at the latest when a new word begins. The recognition point, on the other hand, is the point from which the listener can say with a high degree of certainty which word he is perceiving, i.e. after about 200 ms.
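
Under the simplifying assumption that words are stored as phoneme strings, the uniqueness point can be computed as the first position at which no other lexicon entry is still compatible with the input; a minimal sketch (the function and the toy lexicon format are assumptions made here, not part of the model):

```python
def uniqueness_point(word, lexicon):
    """Return the 1-based position at which `word` diverges from every other
    entry in the toy lexicon, or None if it only becomes unique after its end
    (e.g. when the next word begins)."""
    phonemes = lexicon[word]
    for i in range(1, len(phonemes) + 1):
        prefix = phonemes[:i]
        if not any(lexicon[w][:i] == prefix for w in lexicon if w != word):
            return i
    return None
```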

Intended word    Word read aloud (with error)
president        howident
company          comsiny
tomorrow         tommorane

In further experiments of this kind, the test subject was read a text which, in contrast to the first experiment, contained errors. The position of the error within a word varied: it could be placed at the beginning, in the middle or at the end of the word. The table above shows some examples. These partly incorrect words were embedded in three different texts, one of which was a normal text. The second text was semantically abnormal: its sentences were grammatically correct but made no coherent sense. The third text was semantically and syntactically abnormal, that is, a disconnected sequence of words.

In these experiments, the focus was on the restoration of the incorrect words. Restoration means that the test subject reproduces an incorrect word in its correct form without noticing that it was incorrect. It turned out that the subjects were most likely to restore incorrect words when the errors occurred at the end of the word and in the normal text condition. If, on the other hand, the errors occurred at the beginning of a word or in the abnormal contexts, they were almost always recognized as such and the corresponding words were not restored.

These observations speak, on the one hand, for a serial character of auditory word recognition, which explains why restoration occurs only at a later point in the word. On the other hand, such experiments provide information about the role of context in detecting incorrect words, shown by the fact that the words were restored in normal contexts while the errors were noticed in abnormal contexts.

Word monitoring experiments

A third series of experiments were so-called word monitoring experiments. The test subject hears a series of words over headphones at short intervals and has the task of pressing a key when a predetermined target word occurs. The reaction time was measured, i.e. the time between the onset of the word to be recognized and the key press. The target word could appear in three different contexts: after a semantically related, after a syntactically related, or after an unrelated word. For example, if the word to be recognized is "eagle", a semantically related word would be "bird" (since bird is a hypernym of eagle), a syntactically related word "the" ("the eagle"), and an unrelated word "blue" ("blue" and "eagle" are neither semantically nor syntactically related).

These experiments showed that the response time was shortest after a semantically related preceding word, considerably longer after a syntactically related one, and longest after unrelated preceding words.

Conclusions from the results

In addition to demonstrating the serial character of auditory word recognition, the experiments also illustrated the so-called context effect.

In certain linguistic contexts, certain classes of words are more likely to occur than others. For example, in languages such as German or English it is relatively unlikely that an article will be followed by a verb. In this case one speaks of the syntactic context, which primes certain word classes, i.e. makes them more probable. Priming manifests itself in a shorter recognition time compared with unprimed lexemes.

In addition to the syntactic context, the semantic context also plays an important role in the speed of word recognition. For example, it is relatively unlikely that a technical term from landscaping is used in a speech act that has nuclear physics as its topic. Similar effects can also be seen at the phonological level. For example, if two consecutive words rhyme, the second will be recognized more quickly. If they do not rhyme, however, the recognition of the second word takes longer.

Cohort I

Serial word recognition

The serial character of auditory word recognition is explained in the cohort model as follows: when a word is spoken, it reaches the listener in the form of acoustic waves, with each sound of the word encoded as a specific pattern of overlapping waves. A cognitive module upstream of lexical access determines from these waves which specific sound they represent. Serial means in this context that exactly one sound is transmitted at any point in time; it is not the case that two different sounds travel from the speaker to the listener simultaneously. The word to be understood is therefore processed sound by sound.

After the first sound has been recognized, the mental lexicon, i.e. the module of the cognitive system in which all words learned in the course of life are stored, is accessed for the first time. All entries that begin with the recognized sound are activated. In psycholinguistics, activation means that an entry is accessed and the further information assumed to be contained in it is retrieved. This information can, for example,

  • be semantic in nature, i.e. contain information about the meaning of the word;
  • be of a morpho-syntactic nature, i.e. information about whether it is, for example, a verb or a noun, whether the word needs arguments, what gender it belongs to, etc.;
  • be of a phonological nature, i.e. specify which sounds the word is made up of.
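
A lexical entry of this kind can be pictured as a record bundling the three kinds of information just listed. The field names and example values in the following sketch are illustrative assumptions, not part of the model:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LexicalEntry:
    """Toy representation of an entry in the mental lexicon."""
    form: str                                    # citation form
    phonemes: str                                # phonological information
    category: str                                # morpho-syntactic: word class
    gender: Optional[str] = None                 # morpho-syntactic: gender, if any
    arguments: List[str] = field(default_factory=list)  # required arguments
    senses: List[str] = field(default_factory=list)     # semantic information

# e.g. the ambiguous German noun Bank from the example above
bank = LexicalEntry(form="Bank", phonemes="baŋk", category="noun",
                    gender="feminine",
                    senses=["financial institution", "bench", "rock layer"])
```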

When the following sounds are recognized, all entries that no longer match what has been heard are removed from the cohort; their activation is cancelled and the retrieved information is "forgotten". At a certain point the cohort contains only one entry; this is the point at which the word is recognized beyond doubt, called the uniqueness point in the literature. In this first version of the cohort model, Marslen-Wilson follows an all-or-nothing approach: a lemma is either activated or not activated; there are no gradations or differences in activation.

Marslen-Wilson illustrated this with the English word trespass ([trɛs.pʌs]). The first thing the listener recognizes is the phoneme [t], followed by [r], then [ɛ], and so on. Already with the first sound a cohort is opened: all words listed in the mental lexicon that likewise begin with the sound [t] are activated. If the second phoneme ([r]) is recognized, all words that do not start with the phoneme sequence [tr] are deleted from the cohort. If, for example, the cohort consists of the words tree, trespass, time, train and tress when the [t] is recognized, then time is removed from the cohort when the second sound is recognized, because it does not begin with the sound sequence [tr]. When the third sound ([ɛ]) is recognized, the cohort is reduced to the entries trespass and tress (tree is realized phonetically as [triː], train as [treɪn]). Only when the fifth sound, [p], is recognized does the cohort contain just one word (namely trespass), and the chain of sounds is unambiguously recognized as the word trespass.
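
Fed with the five example words, the `recognize` sketch from above reproduces this step-by-step reduction (the phoneme strings are simplified assumptions for the example):

```python
lexicon = {"tree": "triː", "trespass": "trɛspʌs", "time": "taɪm",
           "train": "treɪn", "tress": "trɛs"}

for phoneme, cohort in recognize("trɛsp", lexicon):
    print(phoneme, cohort)
# t -> all five words;  r -> time is dropped;  ɛ -> only trespass and tress
# remain;  s -> both still fit;  p -> the cohort contains only trespass
```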

Explanation of the context effects

In the cohort model, the faster recognition of context-bound lexemes is explained by the assumption of parallel activation. This assumption states that all members of a cohort are activated equally, that is, the listener mentally accesses all members of the cohort. The activation is withdrawn again when a lexeme no longer fits the given input, i.e. when it is deleted from the cohort because of a diverging phoneme sequence.

Since activating a lexeme always activates all of its information, i.e. morpho-syntactic, phonological and semantic, the listener also accesses related entries in the mental lexicon, that is, other entries that have the same or sufficiently similar properties.

The results of the above experiments can be explained with the cohort model as follows: when the first word is understood, it and all words related to it are activated. When the following word is being recognized, this activation is still available. If one of the indirectly activated lexemes is contained in the cohort for the following word, the second word is recognized more quickly than unrelated, and thus not yet activated, words.
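
A minimal sketch of this priming account, assuming toy activation values and a made-up relation between bird and eagle purely for illustration (neither the numbers nor the functions are part of the model):

```python
related = {"bird": {"eagle", "sparrow"}}     # assumed semantic relations
activation = {}                              # entry -> residual activation

def hear(word):
    """Activate the recognized word and, in parallel, its related entries."""
    activation[word] = activation.get(word, 0.0) + 1.0
    for other in related.get(word, set()):
        activation[other] = activation.get(other, 0.0) + 0.5   # indirect, weaker

def recognition_time(word, base_ms=200.0):
    """Pretend recognition time: residual activation gives a head start."""
    return base_ms - 50.0 * activation.get(word, 0.0)

hear("bird")
print(recognition_time("eagle"))   # 175.0 ms: pre-activated via 'bird'
print(recognition_time("blue"))    # 200.0 ms: no prior activation
```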

Another consequence of the assumption about parallel activation is that the discrepancy between the uniqueness point and the recognition point can be explained directly.

Criticism

Very soon after the cohort model was published, problems emerged that the model could not solve without further assumptions. These include the frequency effect and how to deal with defective input.

The frequency effect means, in the narrower sense, that the listener recognizes a word he uses frequently more quickly than one he uses less often. Defective input refers to the fact that in spoken language a word seldom reaches the listener intact: noise and background sounds in many cases obscure part of the acoustic information transmitted from the speaker to the listener. Even so, in most cases the listener is able to understand what is being said.

This criticism came mainly from James L. McClelland and Jeffrey L. Elman and subsequently led to the development of the TRACE model , an alternative to the cohort model, which, however, draws on its essential key points.

To address these problems, Marslen-Wilson expanded his model after a further series of experiments; the revised version is known in the current literature as Cohort II.

Cohort II

Soon after the criticism of his model appeared, Marslen-Wilson carried out a series of further experiments to examine the effects in question and then modified his model according to the results. He described the experiments, their results and the modifications to the model in an article from 1987 (see literature).

Frequency effect

General setup of a lexical decision experiment.

To test the effectiveness of the frequency effect, Marslen-Wilson had further experiments carried out in the mid-1980s. These were so-called lexical decision experiments with visual target words. In these experiments, the test subjects have to decide, by pressing a button, whether words displayed on a screen are words of their language or not. In addition, the test subjects were presented with various words (so-called distractor words) over headphones. The point in time at which the word appeared on the screen, relative to the word heard, was varied.

Auditory input    Visual input    Recognition time
CAP‣T             SHIP            fast
CAP‣T             GUARD           normal
CAPTAIN‣          SHIP            very fast
CAPTIVE‣          SHIP            slow
CAPTAIN‣          GUARD           slow
CAPTIVE‣          GUARD           very fast

The table above illustrates the results of this experiment. The test subjects heard, for example, the words captain or captive. The corresponding visually presented words for the lexical decision were ship or guard. The point at which the word was presented on the screen could be either immediately before the t or at the end of the acoustically presented word (marked in the table by the character "‣").

The word captain is assumed to be more frequent than captive, meaning that captain occurs more often than captive in the vocabulary of an average native speaker of English. As the table shows, the test subjects recognized the word ship more quickly than the word guard when it was presented at the early point in time (i.e. immediately before the heard /t/). This can be explained by the fact that the word captain is already activated after the first three phonemes (cap) and can therefore prime the word ship, while the other candidate (captive) is not activated, or is activated less strongly, so that no priming effect arose for guard and recognition took longer. If, however, the words to be recognized were presented late, this effect disappeared. Marslen-Wilson concluded that the frequency effect is real but operates only early on; at a later point in time it is overridden by general context effects and loses its effectiveness.

In order to explain the frequency effect with his model, Marslen-Wilson dropped the all-or-nothing assumption in the second version of the model and replaced it with a goodness-of-fit approach. Whereas in the first version the members of a cohort were all activated and deactivated equally, the second version assumes that certain entries in the lexicon have a greater activation potential than others. Frequent entries are accordingly activated more strongly within a cohort than less frequent ones, which is meant to explain the earlier availability of frequent words.
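
A sketch of the goodness-of-fit idea, in which every entry still compatible with the input receives an activation weighted by an assumed frequency count; the counts and the logarithmic weighting are illustrative choices, not part of the model:

```python
import math

def activations(input_phonemes, lexicon, frequency):
    """Toy goodness-of-fit: compatible entries are activated in proportion
    to the logarithm of their assumed frequency."""
    n = len(input_phonemes)
    return {word: math.log(frequency.get(word, 1) + 1)
            for word, phonemes in lexicon.items()
            if phonemes[:n] == input_phonemes}

lexicon   = {"captain": "kæptɪn", "captive": "kæptɪv", "cat": "kæt"}
frequency = {"captain": 5000, "captive": 300, "cat": 20000}   # assumed counts
print(activations("kæpt", lexicon, frequency))
# captain outscores captive after "capt", so its associate (ship) is primed earlier
```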

Defective input

Various experiments have shown that test subjects are able to recognize words even when parts of them have been masked by background noise, for example white noise or another superimposed sound. If, for example, the test subjects hear the word universal embedded in a sentence with the s replaced by noise, and are asked where the error was, most find it difficult to locate the error, or even to notice that there was one at all. If, however, a silent gap is left in place of the s, the test subjects correctly detect the error in almost all cases.

The original version of the cohort model works at the phoneme level: the linguistic units recognized serially are the sounds of the word to be recognized. In order to explain how linguistic utterances are understood despite background noise, this assumption was dropped and the model was changed so that it now works with distinctive features. In theory, sounds can be decomposed into distinctive features; for example, the sound /t/ has the features [-voiced, CORONAL, -sonorant, -nasal], and so on.

In the second version of the model, cohorts are therefore no longer opened after particular sounds have been recognized, but as soon as some number of overlapping phonological features is present. This means that the entries in the mental lexicon are not specified phonemically, i.e. stored as chains of phonemes, but in the form of chains of distinctive feature bundles. If some of these features are masked by background noise, the entries remain in the cohort as long as the remaining features match.
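
A sketch of feature-based matching under noise, assuming a toy feature inventory and the simple rule that masked features impose no constraint while all recovered features must agree:

```python
def segment_compatible(heard, stored):
    """A stored segment stays compatible if every feature recovered from the
    noisy input agrees with the stored value; masked features are simply absent."""
    return all(stored.get(feature) == value for feature, value in heard.items())

def word_compatible(heard_segments, stored_segments):
    """A lexeme stays in the cohort if every segment heard so far is compatible."""
    return all(segment_compatible(h, s)
               for h, s in zip(heard_segments, stored_segments))

# stored /t/ versus a noisy token in which the place feature was masked
stored_t = {"voiced": False, "place": "coronal", "sonorant": False, "nasal": False}
heard    = {"voiced": False, "nasal": False}          # only two features recovered
print(segment_compatible(heard, stored_t))            # True: the entry is kept
```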

Criticism

An important problem of both versions of the cohort model is the delimitation of words: the model provides no mechanism for recognizing the beginning and end of a word in continuous speech. Nevertheless, it offers a very robust mechanism for explaining the recognition of isolated words.

Current developments

Despite its weaknesses, the cohort model is now considered a standard model of auditory word recognition, and its main features have been integrated into many later models.

The successive exclusion of unsuitable elements from a candidate set can also be found in optimality theory, a formal model of the grammar of human languages.

Basic features of the model have also been carried over into computational linguistics. Truncation search in database systems exploits the same basic mechanism of narrowing down a set of relevant results by excluding potential results segment by segment; it can thus be seen as a direct application of the cohort model's idea in computational linguistics.
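
The analogy can be made concrete with a small prefix-filtering routine of the kind used for truncation or type-ahead queries; the candidate list is, of course, an invented example:

```python
def narrow(candidates, query):
    """Narrow a candidate set character by character, as in type-ahead search."""
    remaining = list(candidates)
    for i, ch in enumerate(query):
        remaining = [c for c in remaining if i < len(c) and c[i] == ch]
    return remaining

print(narrow(["trespass", "tress", "tree", "train", "time"], "tre"))
# ['trespass', 'tress', 'tree']
```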

Literature

Primary literature

  • William D. Marslen-Wilson and Alan Welsh: Processing Interactions and Lexical Access during Word Recognition in Continuous Speech. In: Cognitive Psychology Vol. 10, No. 1, 1978, pp. 29-63 (doi:10.1016/0010-0285(78)90018-X)
  • William D. Marslen-Wilson: Spoken Word Recognition. In: H. Bouma and D. G. Bouwhuis (eds.): Attention and Performance X. Lawrence Erlbaum, Hove, 1984
  • William D. Marslen-Wilson: Functional Parallelism in Spoken Word Recognition. In: Cognition Vol. 25, 1987, pp. 71-102 (doi:10.1016/0010-0277(87)90005-9)

Secondary literature

  • Rainer Dietrich: Psycholinguistics. 1st edition, Metzler, Stuttgart, 2002, ISBN 3-476-10342-0
  • Trevor A. Harley: The Psychology of Language. From data to theory. 3rd edition, Psychology Press, Hove, New York, 2008, ISBN 978-1-84169-382-8 , pages 268-273
  • M. Gareth Gaskell and Gerry Altmann (Eds.): The Oxford Handbook of Psycholinguistics. Oxford University Press, Oxford, 2007, ISBN 978-0-19-856897-1 ( Online , as of April 18, 2009)

Individual evidence

  1. compare Harley (2008: 241f.)
  2. William D. Marslen-Wilson and Lorraine Komisarjevsky Tyler: The temporal structure of spoken language understanding. In: Cognition No. 8, 1980, pp. 1-71
  3. D. E. Meyer and R. W. Schvaneveldt: Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. In: Journal of Experimental Psychology No. 90, 1971, pp. 227-234
  4. D. E. Meyer, R. W. Schvaneveldt and M. G. Ruddy: Functions of graphemic and phonemic codes in visual word recognition. In: Memory & Cognition No. 2, 1974, pp. 309-321
  5. James L. McClelland and Jeffrey L. Elman: The TRACE Model of Speech Perception. In: Cognitive Psychology Vol. 18, No. 1, 1986, pp. 1-86 (doi:10.1016/0010-0285(86)90015-0)
  6. see for example Harley (2008), page 273