Instrumental and operant conditioning

from Wikipedia, the free encyclopedia

Instrumental and operant conditioning, also known as learning from success, are paradigms of behaviorist learning psychology. They concern the learning of stimulus-response patterns from originally spontaneous behavior. The frequency of a behavior is lastingly changed by its pleasant (appetitive) or unpleasant (aversive) consequences: desirable behavior is reinforced through reward, and undesirable behavior is suppressed through punishment.

This type of learning is distinguished from classical conditioning, which concerns elicited behavior (the learning organism has no control over the stimulus or its reaction).


Thorndike's model

Research into instrumental conditioning began with the animal experiments of Edward Lee Thorndike, which he carried out as part of his doctoral thesis (1898) at Columbia University. He put chickens, cats and dogs in self-made puzzle boxes of various levels of difficulty and measured the time it took the animals to free themselves. As an incentive, he placed food next to the cage, visible to the animals. After the animal had succeeded and been rewarded with food, he put it back in the cage and again measured the time until the cage was opened (the so-called discrete trial procedure). An average cat initially needed 160 seconds for a simple puzzle box, but became faster and faster and needed only 7 seconds after 24 attempts. Thorndike summarized the results of his experiments in his "law of effect":

"Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur."

- Edward Lee Thorndike: "Law of Effect", doctoral thesis, 1898

Together with Pavlov's experiments on classical conditioning, Thorndike's stimulus-response model laid the foundation for the behaviorism founded by John B. Watson, which was to dominate psychological research for decades.

No one shaped behavioral research more than Burrhus Frederic Skinner, who continued and developed the work of Thorndike and Watson. His cages, the Skinner boxes, allow the target behavior (e.g. pressing a lever) to be performed at any time (the so-called free operant procedure). According to a defined reinforcement schedule, this behavior has certain consequences for the animal.

Difference Between Instrumental and Operant Conditioning

Although the term instrumental conditioning is mostly equated with operant conditioning, this equation is not accurate:

  • In instrumental conditioning, one looks at the strengthening or weakening of instrumental behavior. Behavior is used as an instrument (a means or tool) to bring something about. The living being aims to achieve a certain goal and either succeeds or fails. Depending on the outcome, it will show the same behavior or a different behavior the next time.
  • In operant conditioning, one considers any spontaneous behavior that the living being may show unintentionally or purely by chance and that can be repeated without further conditions (such as the existence of a problem).

Basic concepts

Since behaviorists limit themselves to what is observable, they group all internal states, for example perceptions, emotions and thoughts, into a so-called black box. According to behaviorist theory, environmental stimuli act on this black box in such a way that behavior (a response) is evoked. This response has a consequence. If a behavior is shown in a certain context (stimulus), it may be carried out more frequently in the future under the same circumstances (one can then conclude that the consequence was "pleasant"), or it may become less frequent (the consequence was then "unpleasant"). In the first case one speaks of "reinforcement", in the second case of "punishment".

The classifications "pleasant / unpleasant" or "appetitive / aversive" are not to be understood as subjectively experienced states (as such, they have no place in a behaviorist theory) but as an expression of whether these states are sought or avoided. Thorndike defined them as follows: "A satisfying state means a state that the animal does not avoid, and often even seeks out and maintains. An unpleasant state is one that the animal normally avoids or leaves."

Reinforcement occurs when the consequence of the behavior is the presentation of a pleasant stimulus ("positive reinforcement") or the removal of an unpleasant stimulus ("negative reinforcement"). Correspondingly, punishment occurs when the consequence is the presentation of an unpleasant stimulus ("positive punishment") or the removal of a pleasant stimulus ("negative punishment", omission training or "DRO" = differential reinforcement of other behavior).

Discriminative stimuli (signal stimuli) are stimuli that signal certain behavioral consequences. Example: A rat receives a reward (food) only if a lamp has been lit beforehand.

Contingency scheme

Four cases of operant conditioning: positive reinforcement, negative reinforcement, type I punishment, and type II punishment

In learning theory, contingency (Late Latin contingentia "possibility") is the immediate and regular consequence (Latin consequi "follow, achieve") of a behavior. There are four classic basic forms of contingency in operant conditioning:

  1. Positive reinforcement is the increase in the probability of a behavior occurring when the behavior triggers a pleasant (appetitive) immediate consequence (e.g. recognition, respect, food, money).
  2. Negative reinforcement is the increase in the probability of a behavior occurring when the behavior prevents or ends an unpleasant (aversive) immediate consequence (e.g. removal of noise, glaring light, heat or cold).
  3. Positive punishment is the decrease in the probability of a behavior occurring when the behavior triggers an unpleasant (aversive) immediate consequence (e.g. noise, glaring light, heat or cold, electric shock).
  4. Negative punishment is the decrease in the probability of a behavior occurring when the behavior prevents or ends a pleasant (appetitive) immediate consequence (e.g. removal of food, warmth, a Christmas bonus).

Negative reinforcement and punishment are often confused with one another. The word negative here only stands for the removal of a stimulus.
Contingency scheme of instrumental and operant conditioning

                         Consequence presented       Consequence removed
Pleasant consequence     positive reinforcement      negative punishment
                                                     (withdrawal punishment)
Unpleasant consequence   positive punishment         negative reinforcement
                         (presentation punishment)
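
The four cases of the contingency scheme can be written down as a small lookup, which may help to keep negative reinforcement and punishment apart. The following Python sketch is only an illustration: the names Stimulus, Change, classify_contingency and effect_on_behavior are invented here and do not come from the literature.

```python
from enum import Enum


class Stimulus(Enum):
    PLEASANT = "pleasant (appetitive)"
    UNPLEASANT = "unpleasant (aversive)"


class Change(Enum):
    PRESENTED = "presented"
    REMOVED = "removed"


def classify_contingency(stimulus: Stimulus, change: Change) -> str:
    """Return the name of the contingency for a stimulus/change pair."""
    table = {
        (Stimulus.PLEASANT, Change.PRESENTED): "positive reinforcement",
        (Stimulus.PLEASANT, Change.REMOVED): "negative punishment",
        (Stimulus.UNPLEASANT, Change.PRESENTED): "positive punishment",
        (Stimulus.UNPLEASANT, Change.REMOVED): "negative reinforcement",
    }
    return table[(stimulus, change)]


def effect_on_behavior(contingency: str) -> str:
    """Reinforcement raises the probability of the behavior, punishment lowers it."""
    return "increases" if contingency.endswith("reinforcement") else "decreases"


# Ending an unpleasant stimulus (e.g. switching off noise) is reinforcement.
case = classify_contingency(Stimulus.UNPLEASANT, Change.REMOVED)
print(case, "->", effect_on_behavior(case))
```

"Positive" and "negative" only select the column (stimulus presented or removed), while "reinforcement" and "punishment" describe the direction in which the behavior changes.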

Negative reinforcement is of the greatest clinical importance because it maintains avoidance behavior and thus contributes massively to, for example, phobic and obsessive-compulsive disorders: patients do everything to avoid a condition that is perceived as aversive (tight spaces, unwashed hands), so that they never find out whether they could by now endure the condition or master the situation.

If there is neither positive nor negative reinforcement, the behavior is extinguished. Extinction must not be confused with the withdrawal of a positive reinforcer (negative punishment).

Primary and Secondary Reinforcers

In operant conditioning, reinforcers ensure that the occurrence of a certain reaction (the so-called instrumental or operant reaction) is favored or made more difficult. Reinforcers can differ greatly from case to case: for a child it may be a little chocolate, while for an adult a nod or a pat on the back can be reinforcement enough (social reinforcer). What ultimately functions as a reinforcer, however, is determined by the subject (i.e. the person in whom a certain behavior is to be reinforced).

It is important that reinforcers are contingent (i.e. immediate, recognizable, regular), adequate in terms of motivation, and that they satisfy needs (e.g. hunger, need for activity). With a reinforcer that arrives hours after the desired reaction, it is no longer recognizable what it was awarded for, and it therefore has no effect. A sated rat will not do anything for a food pellet either, because the need-based drive is missing. To prevent satiation in laboratory rats, research uses the findings of classical conditioning: the reinforcing stimulus (here: food) is coupled with an initially neutral stimulus (e.g. a whistle), which turns the whistle into a conditioned stimulus through classical conditioning. The whistle then, like food, also has the effect of a reward (discriminative cue).

There are different types of reinforcers. Two classic types are primary reinforcers and secondary reinforcers.

Primary reinforcers are those reinforcers that work from birth. According to Miller and Dollard, any reduction in an overly intense stimulus acts as primary reinforcement. Primary reinforcers are, for example, food and drink, as they reduce hunger and thirst, but physical contact is also a primary reinforcer.

Secondary reinforcers (see also token system), on the other hand, are learned reinforcers. They are initially neutral stimuli that acquire the quality of a secondary reinforcer through repeated coupling with primary reinforcers. One example of a secondary reinforcer is money, because its function is only learned. At first, money is a completely neutral stimulus, until it is learned that it can be used to satisfy needs.

Token Conditioning

Similar to secondary reinforcers, there is the principle of giving so-called tokens as reinforcement. After enough tokens have been accumulated, they can be exchanged for other things, activities, services, etc. The principle is often used in behavior shaping, when patients' behavior is to be shaped in a certain way in therapeutic facilities.
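
The bookkeeping behind such a token system can be sketched in a few lines of Python. This is a hypothetical illustration only: the class TokenEconomy, its methods and the example price list are invented for this sketch and do not describe any particular therapeutic program.

```python
from typing import Dict


class TokenEconomy:
    """Hypothetical token system: tokens are earned for target behavior and
    later exchanged for backup reinforcers at a fixed price."""

    def __init__(self, prices: Dict[str, int]) -> None:
        self.prices = prices  # backup reinforcer -> cost in tokens
        self.balance = 0

    def reinforce(self, tokens: int = 1) -> None:
        """Hand out tokens immediately after the target behavior."""
        self.balance += tokens

    def exchange(self, item: str) -> bool:
        """Trade tokens for a backup reinforcer; fails if the balance is too low."""
        cost = self.prices[item]
        if self.balance < cost:
            return False
        self.balance -= cost
        return True


economy = TokenEconomy({"extra playtime": 3})  # assumed example price
for _ in range(3):
    economy.reinforce()
print(economy.exchange("extra playtime"), economy.balance)
```

The token is handed out immediately, which keeps the reinforcement contingent on the behavior, while the actual reward can follow later.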

Premack principle

"The opportunity to engage in more probable behavior can reinforce less probable behavior." (David Premack, 1962) Behavior that we like to do and do often has a reinforcing effect on behavior that we like less and do less often. Take, for example, a child with a free afternoon. If the child is completely free to choose between "watching TV", "doing homework" and "tidying up the room", it will likely spend most of its time watching TV, do some homework and leave the room untidy. The behavior with the higher probability of occurrence can now be used as a reinforcer: the child will spend more time on homework if it is allowed to watch TV afterwards, and it will spend more time tidying up if it is allowed to do homework afterwards.

Beyond the Premack principle, animal experiments with rats have shown that behavior that is less likely to occur can also serve as a reinforcer. Assume that a rat sitting in a cage for an hour, without any external constraints, spends 50 minutes licking a water dispenser and ten minutes running in a running wheel. According to the Premack principle, longer running in the wheel can readily be reinforced by licking. However, it also works the other way around. If the rat has to lick for two minutes in order to then run for one minute in the wheel, running will not act as a reinforcer, because under this reinforcement schedule the rat easily reaches its baseline of 10 minutes of running per hour. However, if the rat has to lick for fifteen minutes in order to be allowed to run for one minute, running acts as a reinforcer for licking. Thus, behavior with a lower frequency of occurrence can also serve as a reinforcer.
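
The arithmetic of the rat example can be checked by comparing two ratios: the ratio of licking to running at baseline (50:10) and the ratio demanded by the schedule (2:1 or 15:1). Only when the schedule demands more licking per minute of running than the rat shows on its own is running held below its baseline, and only then can it act as a reinforcer. In the wider literature this comparison is usually discussed under the name response deprivation, a term that does not appear in the text above. The function and parameter names in the following Python sketch are made up for illustration.

```python
def schedule_restricts_reward(
    baseline_instrumental: float,
    baseline_contingent: float,
    required_instrumental: float,
    allowed_contingent: float,
) -> bool:
    """True if the schedule holds the contingent behavior (the reward) below
    its baseline unless the instrumental behavior rises above its own baseline."""
    baseline_ratio = baseline_instrumental / baseline_contingent
    schedule_ratio = required_instrumental / allowed_contingent
    return schedule_ratio > baseline_ratio


# Baseline from the rat example: 50 min licking and 10 min running per hour.
print(schedule_restricts_reward(50, 10, 2, 1))   # False: 2 min licking per 1 min running
print(schedule_restricts_reward(50, 10, 15, 1))  # True: 15 min licking per 1 min running
```

With the 2:1 schedule the rat reaches its 10 minutes of running after only 20 minutes of licking, which is less than it licks anyway. With the 15:1 schedule it would need 150 minutes of licking, so running is restricted and becomes worth working for.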

Continuous reinforcement

Here, every desired reaction is reinforced. This leads to a steep rise in the learning curve. The subject learns quickly, but forgets just as quickly once reinforcement stops. This reinforcement schedule is optimal in the acquisition phase, i.e. when the target behavior is first learned.

To prevent extinction, the coupling must be repeated occasionally. The following reinforcement schedules, which differ in how successful they are, have emerged.

Ratio reinforcement

Ratio schedules are divided into fixed-ratio and variable-ratio schedules. In fixed-ratio schedules, reinforcement is given after a fixed number of desired reactions; in variable-ratio schedules, it is given after a number of desired reactions that varies around an average. Example: Every fifth occurrence (FR-5 schedule) or, on average, every fifth occurrence (VR-5 schedule) of the target behavior is reinforced.

Variable-ratio schedules produce the most reactions, since the reinforcement cannot be foreseen. At the same time, the reactions learned in this way are also the most resistant to extinction.

See also intermittent reinforcement.
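
The difference between an FR-5 and a VR-5 schedule can be made concrete with a small simulation. The following Python sketch is a minimal illustration under stated assumptions: the class names are invented, and the variable-ratio requirement is drawn uniformly from 1 to 2n-1 (which averages n); real experiments may use other distributions.

```python
import random


class FixedRatio:
    """FR-n: reinforce every n-th response."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = 0

    def respond(self) -> bool:
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: reinforce after a varying number of responses that averages n."""

    def __init__(self, n: int, rng: random.Random) -> None:
        self.n = n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: requirements are uniform on 1..2n-1, whose mean is n.
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


fr5 = FixedRatio(5)
print([fr5.respond() for _ in range(10)])  # reinforced on responses 5 and 10

vr5 = VariableRatio(5, random.Random())
print([vr5.respond() for _ in range(10)])  # reinforcement at unpredictable positions
```

Under FR-5 the subject could in principle count down to the next reinforcement. Under VR-5 any response might be the reinforced one, which matches the observation that such schedules are the most resistant to extinction.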

Interval reinforcement

With this method, once a behavior has been reinforced, the next reinforcement is given at the earliest after a constant or variable time interval, as soon as the desired behavior occurs again. Example: No behavior is reinforced for a period of 20 seconds (fixed interval) or an average of 20 seconds (variable interval).

Rate reinforcement

Reinforcement is given when the target behavior is shown at a high frequency or at a low frequency. Reinforcing high frequencies gives the same result as ratio schedules; reinforcing low frequencies gives the same result as interval schedules.
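
An interval schedule can be sketched in the same way as a timer that has to run out before the next response is reinforced. The following Python sketch is illustrative only: the class name and parameters are invented, time is passed in as plain seconds, and variable intervals are assumed to be uniform between 0 and twice the mean.

```python
import random
from typing import Optional


class IntervalSchedule:
    """Reinforce the first response made once the current interval has elapsed."""

    def __init__(self, mean_seconds: float, variable: bool = False,
                 rng: Optional[random.Random] = None) -> None:
        self.mean_seconds = mean_seconds
        self.variable = variable
        self.rng = rng or random.Random()
        self.last_reinforced = 0.0
        self.interval = self._draw()

    def _draw(self) -> float:
        if not self.variable:
            return self.mean_seconds
        # Assumption: variable intervals are uniform on 0..2*mean.
        return self.rng.uniform(0.0, 2.0 * self.mean_seconds)

    def respond(self, now: float) -> bool:
        """Register a response at time `now` (seconds); return True if reinforced."""
        if now - self.last_reinforced >= self.interval:
            self.last_reinforced = now
            self.interval = self._draw()
            return True
        return False


fi20 = IntervalSchedule(20)  # fixed interval of 20 seconds
print([fi20.respond(t) for t in (5, 19, 21, 30, 41, 45)])
# [False, False, True, False, True, False]
```

Responses during the interval have no effect. Only the first response after the interval has elapsed is reinforced, and the timer then starts again from that response.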

New behavior: shaping, chaining and the Skinner box

Even complex sequences of behavior can be promoted through positive or negative reinforcement and methods such as shaping and chaining. Learning with the help of a so-called Skinner box is an elegant method of teaching a test animal new behavior in a vivid and reproducible way.

In shaping (also called approximation), it is not only the complete sequence of the desired behavior that is reinforced, but every approach toward the desired behavior. If a pigeon is supposed to peck at a red dot on a disc, reinforcement is first given as soon as the pigeon moves its head toward the disc; then when it looks at the disc; then when it approaches the disc; then when it pecks at the disc; and finally when it hits the red dot on the disc. This technique is used in particular to teach more complex behaviors. In this way, even unnatural sequences of movements can be conditioned in animals, such as those seen in the circus.
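
The stepwise tightening of the criterion in the pigeon example can be sketched as follows. This is a simplified, hypothetical model: the list STEPS, the function shape and the rule "advance after three reinforcements" are assumptions made for illustration and are not part of any described procedure.

```python
from typing import List

# Hypothetical approximation steps for the pigeon example, ordered from
# the loosest criterion to the final target behavior.
STEPS: List[str] = [
    "moves head toward disc",
    "looks at disc",
    "approaches disc",
    "pecks at disc",
    "pecks red dot",
]


def shape(observed: List[str], successes_to_advance: int = 3) -> int:
    """Walk through observed behaviors (each must be one of STEPS) and return
    the index of the criterion reached. A behavior is reinforced if it meets
    at least the current criterion; after enough reinforcements the criterion
    is tightened by one step."""
    level = 0
    successes = 0
    for behavior in observed:
        if STEPS.index(behavior) >= level:
            successes += 1  # reinforce this approximation
            if successes == successes_to_advance and level < len(STEPS) - 1:
                level += 1  # demand a closer approximation next
                successes = 0
    return level


# Head movements alone are reinforced at first, but never get past step 1.
print(STEPS[shape(["moves head toward disc"] * 6)])  # looks at disc
```

Once the criterion has moved on, earlier approximations are no longer reinforced, so the animal is led step by step toward the target behavior.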


Research into learning through conditioning is strictly limited to observable behavior and does not speculate about constructs that may underlie the behavior. It therefore does not explain how learning through intrinsic motivation (e.g. curiosity) works. Early theoretical models, e.g. Albert Bandura's social-cognitive learning theory, which speculates in particular about learning from models, supplied hypotheses or speculative statements about these behavior patterns, which, however, cannot adequately explain the mechanism of action of the bell-and-pad alarm against enuresis.

There is also criticism from an ethical perspective: the educational consequences of behaviorism in instrumental and operant conditioning are considered problematic in humans when they are applied in a way that is reminiscent of animal training and brainwashing. Operant and classical conditioning should be used in an ethical manner. This requires that the conditioning process is explained to the learners in detail, as far as they are able to understand it, and that they can consciously decide for or against it. It also means that the learners determine the learning objectives themselves. This may not be the case with, for example, children, people with intellectual disabilities and the elderly. Likewise, it is not the case when advertising conditions certain sensations in response to the presentation of certain products.

References

  1. Wilhelm F. Angermeier: Control of the behavior. Learning from success. 2nd, revised edition. Springer, Berlin, Heidelberg, New York 1976, ISBN 978-3-540-07575-2.
  2. Philip G. Zimbardo: Psychology. Springer, 2013, ISBN 978-3-662-22364-2, p. 275.
  3. Martin Wiegand: Processes of Organizational Learning. Springer, 2013, ISBN 978-3-322-89128-0, p. 343.
  4. In the original: "By a satisfying state of affairs is meant one which the animal does nothing to avoid, often doing such things as attain and preserve it. By a discomforting or annoying state of affairs is meant one which the animal commonly avoids and abandons."
  5. Carsten Vollmer: Media-based learning: status and potential in company educational work. 2014, ISBN 978-3-8324-4687-1, p. 10.
  6. Franz Petermann, Andreas Maercker, Wolfgang Lutz, Ulrich Stangier: Clinical Psychology - Basics. Hogrefe Verlag, 2017, ISBN 978-3-8409-2160-5, p. 45.
  7. In the original: "An opportunity to engage in more probable responses will reinforce a less probable response."
  8. Richard J. Gerrig: Psychology. Ed.: Tobias Dörfler, Jeanette Roos. 21st edition. Pearson, Hallbergmoos 2018, ISBN 978-3-86894-323-8.