Reinforcer (psychology)
Reinforcer is a term used in behavioral learning theories to describe an appetitive ("pleasant") stimulus that occurs contingently (recognizably, regularly) as a consequence of a certain behavior and that increases the probability of occurrence or the speed of execution of this behavior (reinforcement). A classification of reinforcers by content has become established: social reinforcers, activity or action reinforcers, material reinforcers, symbolic reinforcers, and covert and informative reinforcers.
Associative approaches
Associative theories are among the historically oldest explanations of instrumental and operant conditioning. They go back to Edward Lee Thorndike and his law of effect.
These theories explain the observed learning effects by the formation of associations between stimuli, or between a stimulus and a behavior, during conditioning. In the simplest case, an instrumental conditioning situation comprises three elements: the (ambient) stimuli S (stimulus), a response R, and a consequence of the response O (outcome; here: the reinforcer). Associationists differ in their assumptions as to between which of these elements associations are formed.
S-R learning
Thorndike assumed an S-R association: the organism links the environmental stimuli of a certain situation with its response. The reinforcer only serves to establish and strengthen this association; it is not itself part of what is learned.
From this Thorndike formulated his law of effect: if behavior in a certain situation has satisfying consequences, the association between situation and response is strengthened. As a result, the probability that the response occurs in that situation increases.
Today this approach has been largely refuted experimentally. So-called reinforcer devaluation experiments show that an association is also formed between the reinforcer, S, and R. Suppose we carry out instrumental reinforcement with rats. Food serves as the reinforcer; the rats have to push a lever in a certain cage (= S) in order to be reinforced. To ensure that the reinforcer is effective, the rats are food-deprived for a while before the experiment. The rats now learn to push the lever and are rewarded with food. After this learning phase we devalue the reinforcer, i.e. we make it less "valuable": we give the rat free access to food, and it eats its fill. If we now put it back into the cage with the lever (identical stimulus), we observe that the rat presses the lever much less often (different response) than at the end of the learning phase. S-R learning cannot explain this effect: if the rat had learned only an association between the cage stimuli and lever pushing, it should have shown the response undiminished. Since devaluing the reinforcer reduced the response frequency, there must also be an association involving the reinforcer.
S-O learning
Another approach emphasizes above all the association between situational stimuli and reinforcers. Two processes are distinguished:
Modern two-process theory
If an organism is placed in a reinforcement situation, it learns an association between stimulus and response through instrumental conditioning, as Thorndike assumed. In addition, it learns through classical conditioning that the stimulus is a reliable predictor of the consequence (S-O association). This stimulus-stimulus association motivates the instrumental response: it is assumed that the S-O association triggers a central emotional state in the organism in the reinforcement situation. If the environmental stimuli (S) announce an appetitive consequence (O, e.g. food), these stimuli trigger a kind of "hope" for food in the organism, and this hope then motivates the performance of the instrumental response.
Empirical evidence comes in particular from so-called transfer-of-control experiments. If the diffuse environmental stimuli trigger an emotional state in the organism and this state motivates the response, then classically conditioning an explicit stimulus should strengthen this motivation and thus lead to a stronger response. To test this, classical conditioning is carried out before the instrumental learning phase, in which an explicit stimulus (e.g. a tone) is paired with food. If this tone is then presented during the instrumental learning phase, the response frequency does in fact increase.
However, it was also found that the concept of a general "central emotional state" cannot be confirmed. If a rat is instrumentally reinforced with food pellets and a stimulus is paired with sugar water in a subsequent classical conditioning, then presenting the sugar-water stimulus during instrumental reinforcement with pellets does not increase the response frequency. Since both stimuli announce consumable outcomes, both should have triggered "hope" in the rat. The result instead indicates a reinforcer-specific association.
R-O learning
A more modern approach includes all three elements, S-(R-O), in the formation of associations. Since an S-O association can only be learned when R is performed, it is assumed that the environmental stimuli S act as a discriminative stimulus and activate the R-O association in the organism. A hierarchical S-(R-O) association, however, has to be demonstrated separately, since transfer-of-control designs do not show that R is directly required to activate the central emotional component that ultimately produces the increased response rate.
The demonstration of the R-O association rests on the following experiment:
First, instrumental conditioning is carried out with a rat in a learning cage. The rat has to move a horizontal lever: if it pushes the lever to the left, it is reinforced with food pellets; if it pushes it to the right, it receives sugar water. After sufficient learning, the rat pushes the lever about equally often in both directions. After this phase, one of the two reinforcers is devalued. This is done by giving the rat free access to food pellets (but not to sugar water!). The rat eats its fill of pellets, whereby food pellets lose their effectiveness as reinforcers (see above).
Now the rat is returned to the instrumental situation. One observes that it hardly pushes the lever to the left (which would yield pellets) any more, but almost exclusively to the right, in order to receive the alternative, non-devalued reinforcer (sugar water).
This result cannot be explained by S-R associations. As shown above, the reinforcer devaluation should then have had no influence on the association between environmental stimuli and response, and both responses should have occurred unchanged.
S-O learning, i.e. the two-process theory, cannot explain the result either. That theory does not allow different associations between particular responses and particular consequences to be learned in the same stimulus situation. If an S-O association were decisive, devaluing one of the two reinforcers should have reduced both responses in that situation. Instead, only the specific response associated with the devalued reinforcer was affected. There must therefore be specific response-reinforcer associations.
Primary and secondary reinforcers
A distinction can be made between primary and secondary reinforcers. While primary reinforcers satisfy physiological needs, e.g. hunger, secondary reinforcers merely announce or promise a primary reinforcer (see token system). A typical secondary reinforcer is money, which by itself originally satisfies no need. Secondary reinforcers arise from primary reinforcers through classical conditioning and gain their significance through contingency with those reinforcers (e.g. money for food). One example is so-called magazine training: an experimental animal that is rewarded with food pellets learns to perceive the sound of a pellet falling into the food magazine as a reward.
Behavioral regulation theories
All associative theories have in common that they regard reinforcers as particular stimuli. Whether a stimulus can act as a reinforcer thus depends on the unique properties of that stimulus: a stimulus either is a reinforcer or it is not.
A newer perspective moves away from the focus on classical stimulus associations. Instead, it emphasizes the restrictions that a reinforcement schedule imposes on behavior.
Consummatory Response Theory
The first theory to abandon the assumption that reinforcers are special stimuli was the consummatory response theory. It holds that reinforcers do not constitute a special kind of stimulus but rather trigger a special response. It has been observed that reinforcers often induce consummatory responses (e.g. ingestion of food or drink). In the behavioral-systems approach, one speaks of behavioral systems that are activated by stimuli (e.g. the feeding system). Reinforcers are typically stimuli presented at the end of such a behavior chain, terminating the behavioral system through the consummatory response they trigger. It is thus not the stimulus properties per se that make a stimulus a reinforcer, but the response it elicits.
For example, it has been observed in animal experiments that saccharin can act as a reinforcer. Saccharin is a sweetener with no biological nutritional value. Even so, it can serve as a reinforcer because it triggers a consummatory response. If it were special stimulus properties that made a reinforcer, saccharin should have no reinforcing effect, since it has no biological value.
Premack principle
In an effort to find a non-circular definition of reinforcer, F. D. Sheffield (1948) had pointed out that behavior-modifying reinforcement or punishment triggers not only a perception but also a behavior. It can be argued that it is not water but drinking, not the toy but playing, that is the actual reinforcer. In conventional terminology, a reinforcer had always been defined as an (appetitive or aversive) stimulus (perception). David Premack now postulated that a behavior A can reinforce another behavior B precisely when A is spontaneously shown more often than B (in the original: "Given two responses of the different likelihood h and l, the opportunity to perform the higher probability response H after the lower probability response L will result in reinforcement of response L.").
To identify a reinforcer, the frequency of behaviors must be determined over a certain period free of any restriction (baseline behavior distribution). This yields a scale indicating how likely the recorded behaviors are to occur spontaneously. The less probable behavior can then be reinforced with the more probable one, i.e. its probability of occurrence can be increased. Premack placed capuchin monkeys in an experimental cage offering three possible behaviors and found that they spontaneously operated a lever most often, opened a door second most often, and operated a plunger least often (baseline recording, scale of behavioral probabilities). In the test phase, one of these behaviors could be performed only after another had been shown. As the Premack principle predicts, the following pattern emerged: the monkeys operated the plunger more often if they could then open the door or operate the lever, and they opened the door more often if they could then operate the lever.
For example, suppose we observe a rat that has free access to water and can run in a running wheel at its own pace. We measure that the rat runs in the wheel for 50 minutes and drinks for 10 minutes; drinking thus occurs less probably than wheel running. If we allow this rat onto the wheel only after it has drunk, the time it spends drinking increases. The reverse does not work: if it is allowed to drink only after it has run in the wheel, it will not drink more than before. For a thirsty rat that prefers drinking to running, the opposite holds: we can increase the likelihood of wheel running by making it a condition for drinking.
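The decision rule behind these examples is simple enough to state as code. The following Python sketch applies the Premack principle to the hypothetical baseline minutes from the rat example above; the names and numbers are illustrative, not taken from any published implementation.

```python
# Baseline observation (hypothetical numbers from the rat example):
# minutes spent on each behavior during one unrestricted hour.
baseline_minutes = {"wheel_running": 50, "drinking": 10}

def can_reinforce(reinforcer: str, target: str) -> bool:
    """Premack's rule: a behavior can reinforce another behavior
    if it is more probable (here: occupies more time) at baseline."""
    return baseline_minutes[reinforcer] > baseline_minutes[target]

print(can_reinforce("wheel_running", "drinking"))  # True  -> running can reinforce drinking
print(can_reinforce("drinking", "wheel_running"))  # False -> the reverse contingency fails
```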
Premack also conducted an experiment with kindergarten children. First, the children were observed without restrictions; during this time, some preferred playing at a pinball machine, others preferred eating candy, and they were divided into two groups according to this preference. Subsequently, candy eating could be reinforced with playing in the pinball group, and playing could be reinforced with candy eating in the candy group. In neither group, however, could the higher-probability behavior be reinforced with the lower-probability behavior.
Response-Deprivation Hypothesis
The response-deprivation (behavioral restriction) hypothesis (Timberlake & Allison, 1974) is a generalization of the Premack principle. Under the Premack principle, only the behavior less probable in the baseline condition could be reinforced, and only by the more probable behavior. According to the response-deprivation hypothesis, however, any behavior can be made a reinforcer by restricting its frequency below its baseline rate; it can then be used to reinforce any other behavior.
Example: in the free baseline condition, a rat spends 10 minutes per hour running in the wheel. Access to this amount of running is then made contingent on the behavior to be reinforced: only if the rat shows the desired behavior is it allowed onto the running wheel as a reward.
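A common formalization of this hypothesis compares the schedule's exchange ratio with the baseline ratio of the two behaviors: the contingent behavior is deprived, and hence becomes a reinforcer, when the schedule demands relatively more of the instrumental behavior than the baseline proportion. The following Python sketch of this condition uses hypothetical baseline numbers and is a simplified reading of Timberlake and Allison's idea, not their original formulation.

```python
def deprives(i_req: float, c_given: float, o_i: float, o_c: float) -> bool:
    """Response-deprivation condition: a schedule that demands i_req units
    of the instrumental behavior per c_given units of the contingent
    behavior deprives the organism of the contingent behavior (and thus
    makes it a reinforcer) if i_req / c_given > o_i / o_c, where o_i and
    o_c are the baseline amounts of the two behaviors."""
    return i_req / c_given > o_i / o_c

# Hypothetical numbers: at baseline the rat grooms 30 min and runs 10 min
# per hour; the schedule demands 5 min of grooming per 1 min of running.
print(deprives(i_req=5, c_given=1, o_i=30, o_c=10))  # True -> running becomes a reinforcer
```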
Conversely, the hypothesis also allows conclusions about which behaviors have been pushed below their baseline rate by restrictions, namely exactly those that act as reinforcers. Example: the observation that drinking functions as a reinforcer in an animal experiment implies that the animals' normal drinking rate had previously been restricted.
Behavioral Bliss Point
This approach also starts from the baseline behavior distribution and defines a reinforcer in terms of the imposed reinforcement schedule. If one observes an organism in an unrestricted situation (baseline), it is assumed to distribute its behavior between two behavioral alternatives in a preferred way. This inherent preferred distribution is called the bliss point. Take, for example, a student who can choose between watching TV and studying. Observing him under free choice, we might find that he watches 60 minutes of television for every 15 minutes of studying. This preferred behavior distribution characterizes the bliss point. It is best represented by plotting both behavioral alternatives in a two-dimensional coordinate system: the x-axis denotes the time spent on behavior x (watching TV), the y-axis the time spent on behavior y (studying). In our case the bliss point lies at x = 60 min, y = 15 min.
Which of the two behaviors is reinforced and which acts as the reinforcer depends solely on the reinforcement schedule now imposed on this behavior distribution. It is assumed that under the restrictions of a reinforcement schedule an organism is always motivated to come as close as possible to its original bliss point: it distributes its behavior over the two alternatives so that the time spent on each lies as close to the bliss point as possible.
If we want to reinforce studying in the above example, using television as the reinforcer, we must construct the restrictions so that the organism cannot obtain its desired 60 minutes of television for 15 minutes of studying. For example, we could stipulate that equal time must be spent on both alternatives: 1 minute of television requires 1 minute of studying, 10 minutes of television require 10 minutes of studying, and so on. In the coordinate system this restriction is a straight line (y = x) rising at 45°; the bliss point lies below and to the right of it. The student will now distribute his behavior so as to come as close as possible to the bliss point. According to Staddon's minimum-deviation model, the resulting distribution corresponds to the foot of the perpendicular dropped from the bliss point onto the schedule line (see the sketch below).
If, conversely, we want to reinforce television with studying, we must design the restrictions so that the organism does not obtain its desired 15 minutes of studying for every 60 minutes of television. In the coordinate system, the schedule line must then run so that the bliss point lies above and to the left of it. For example, we can stipulate that 10 minutes of television must occur per minute of studying, or 60 minutes of television per 6 minutes of studying, and so on. This schedule corresponds to the straight line y = 0.1x, above and to the left of which the bliss point lies. In this way television can be reinforced with studying and its rate increased.
In simplified general terms, a schedule line reinforces behavior x with behavior y if the bliss point lies above and to the left of it, and it reinforces behavior y with behavior x if the bliss point lies below and to the right of it. If the line runs exactly through the bliss point, no reinforcement effect occurs, since the behavior distribution can then simply settle on the bliss point (i.e. the baseline distribution).
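The minimum-deviation prediction can be illustrated with a small computation: if the schedule is a straight line through the origin, the predicted behavior distribution is the orthogonal projection of the bliss point onto that line. The following Python sketch, a simplified reading of the model rather than Staddon's original formulation, applies this to the two schedules above.

```python
def predicted_distribution(m: float, bliss: tuple) -> tuple:
    """Orthogonally project the bliss point onto the schedule line
    y = m * x (through the origin); under a minimum-deviation model
    this projection is the predicted behavior distribution."""
    bx, by = bliss
    t = (bx + m * by) / (1 + m * m)  # x-coordinate of the closest point on the line
    return (t, m * t)

bliss = (60.0, 15.0)  # preferred: 60 min TV (x) per 15 min studying (y)

# Schedule y = x (1 min TV per 1 min studying): studying is reinforced.
print(predicted_distribution(1.0, bliss))  # (37.5, 37.5): studying rises from 15 to 37.5 min

# Schedule y = 0.1x (10 min TV per 1 min studying): TV is reinforced.
print(predicted_distribution(0.1, bliss))  # (~60.9, ~6.1): TV rises slightly above 60 min
```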
Criticism
This model takes a "molar" approach: it assumes that the organism distributes its behavior optimally over a long period. How the distribution comes about at a given moment is not of interest; one observes behavior over a long time and derives the optimum from it. But does an organism really always act this way? Does it actually pursue a "long-term" optimum, or does it decide spontaneously from moment to moment? There are also doubts as to whether the "value" of a behavioral alternative under reinforcement restrictions is the same as under baseline conditions; perhaps, in the example above, television loses some of its appeal if one must first study for a long time. In addition, determining the behavior distribution in the field (i.e. in real life) is very complicated, since there is a variety of alternative behaviors that would all have to be considered. In the example above, the student could simply escape the reinforcement schedule by going to the movies or listening to the radio instead of watching TV and studying for it.
Literature
- Bickel, W. K., Madden, G. J. (1999): A comparison of measures of relative reinforcing efficacy and behavioral economics: cigarettes and money in smokers. Behavioural Pharmacology, 10(6-7), 627-637.
- DeGrandpre, R. J., Bickel, W. K., Hughes, J. R., Layng, M. P., Badger, G. (1993): Unit price as a useful metric in analyzing effects of reinforcer magnitude. Journal of the Experimental Analysis of Behavior, 60(3), 641-661.
- Domjan, M. (2005): The Principles of Learning and Behavior. 5th ed. Wadsworth Publishing.
- Domjan, M. (2004): The Essentials of Learning and Conditioning. 3rd ed. Wadsworth Publishing.
- Madden, G. J., Bickel, W. K., Jacobs, E. A. (2000): Three predictions of the economic concept of unit price in a choice context. Journal of the Experimental Analysis of Behavior, 73(1), 45-64.
- Rescorla, R. A., Solomon, R. L. (1967): Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review, 74(3), 151-182.
- Timberlake, W. (1993): Behavior systems and reinforcement: an integrative approach. Journal of the Experimental Analysis of Behavior, 60(1), 105-128.
- Urcuioli, P. J., DeMarse, T., Lionello-DeNolf, K. M. (2001): Assessing the contributions of S-O and R-O associations to differential-outcome matching through outcome reversals. Journal of Experimental Psychology: Animal Behavior Processes, 27(3), 239-251.
References
- Franz J. Schermer: Learning and Memory. Kohlhammer Verlag, 2013, ISBN 978-3-17-025414-5.
- Werner Herkner: Psychology. Springer-Verlag, 2013, ISBN 978-3-7091-7644-3, p. 162.
- Sheffield, F. D. (1948): Avoidance training and the contiguity principle. Journal of Comparative and Physiological Psychology, 41(3).