Punishment (game theory)

from Wikipedia, the free encyclopedia

In game theory, punishment is used to reduce a player's incentive to cheat other players for personal gain. In this context, punishment serves as an instrument for maintaining and promoting cooperation.

Basic forms

There are two different forms of punishment: the sanction and the norm. A sanction is imposed by the other group members and could, for example, consist of excluding the cheater from the group's future games. Norms are established within groups and govern the behavior of their members. They raise the personal costs of each individual and thereby strengthen cooperation within the group. Higher personal costs mean, for example, a stronger feeling of shame or guilt, or a stronger aversion to the disapproval of other group members. This reduces the incentive, or the courage, to cheat. The difference between a norm and a sanction is that, when a norm is violated, the other group members do not have to do anything to punish the cheater: violating the norm automatically imposes high personal costs on him.

Application

Choice of punishment

A penalty should be easy to understand and clearly defined, so that the players can see the possible consequences from the start of the game. Certainty also plays a crucial role: it must be ensured that cheating is actually penalized and that cooperative behavior within the game is rewarded. Another important property is the level of the penalty, which should be high enough to act as a deterrent. If the penalty is too high, however, mistakes in detecting cheating or fraud become too costly. Setting the level of the penalty is therefore a crucial decision.

Penalties in repeated play

Penalties in repeated play (i.e. ongoing relationships in which the same basic situation recurs) only arise in the course of the game. They impose costs (for example, termination of the game) on a player who does not act in the common interest of the other players.

Infinite repetition

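Consider, for example, a repeated prisoner's dilemma with the following payoffs (K = cooperation; S = cheating; x chooses the row, y chooses the column, and each cell lists the payoffs of x and y in that order):

          y: K      y: S
x: K      (2, 2)    (0, 3)
x: S      (3, 0)    (1, 1)

If players x and y both cooperate (K, K), each receives a payoff of 2 per round. Over infinitely many rounds this is worth

$$2 + 2\delta + 2\delta^2 + \dots = \frac{2}{1-\delta}$$

to each player, where $\delta$ denotes the discount factor of both players.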

If y cooperates in the first round and x cheats, x receives a payoff of 3 and y receives 0 (S, K). In the second round x is punished, because after the bad experience of the first round y cheats as well (tit for tat), and both receive a payoff of 1 (S, S).
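The sequence of rounds can be replayed in a few lines of Python. This is only an illustrative sketch: it assumes that y opens with cooperation and afterwards copies x's previous move, and the function name `play_against_tit_for_tat` is hypothetical.

```python
# Payoffs (x, y) from the matrix above; K = cooperation, S = cheating.
PAYOFFS = {
    ("K", "K"): (2, 2),
    ("K", "S"): (0, 3),
    ("S", "K"): (3, 0),
    ("S", "S"): (1, 1),
}


def play_against_tit_for_tat(x_moves):
    """Return the per-round payoffs (x, y) when y answers x with tit for tat."""
    history = []
    y_move = "K"  # assumption: tit for tat opens with cooperation
    for x_move in x_moves:
        history.append(PAYOFFS[(x_move, y_move)])
        y_move = x_move  # y repeats x's move in the following round
    return history


print(play_against_tit_for_tat(["S", "S", "S"]))  # [(3, 0), (1, 1), (1, 1)]
print(play_against_tit_for_tat(["K", "K", "K"]))  # [(2, 2), (2, 2), (2, 2)]
```

The first call reproduces the case described above: a one-off gain of 3 for x, followed by the punishment payoff of 1.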

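This shows that x has no incentive to cheat as long as the discount factor $\delta$ is greater than or equal to $\tfrac{1}{2}$, because cooperating in every round is then worth at least as much as cheating and being punished in every later round:

$$\frac{2}{1-\delta} \geq 3 + \frac{\delta}{1-\delta}$$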

Since the game is repeated infinitely often, every later period presents the same choice as the first period described. Over the whole game it therefore pays for both players to cooperate.
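The comparison can be checked numerically. The following sketch is not taken from the source: it assumes that x either cooperates in every round or cheats in every round against a tit-for-tat player, writes the discount factor as `delta` (with 0 ≤ delta < 1), and uses hypothetical function names.

```python
def value_cooperate(delta):
    """Discounted value of receiving 2 in every round: 2 + 2*delta + 2*delta**2 + ..."""
    return 2 / (1 - delta)


def value_cheat(delta):
    """Discounted value of receiving 3 once and 1 in every later round."""
    return 3 + delta / (1 - delta)


def cheating_pays(delta):
    """True if permanent cheating is worth strictly more than permanent cooperation."""
    return value_cheat(delta) > value_cooperate(delta)


for delta in (0.25, 0.5, 0.75):
    print(delta, value_cooperate(delta), value_cheat(delta), cheating_pays(delta))
```

Under these assumptions cheating pays only for an impatient player (delta below 1/2). At delta = 1/2 both options are worth exactly 4, and above that value cooperation is worth more.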

Finite repetition

Repeated games whose end is fixed from the outset (e.g. a legislative period) must be considered separately. Cooperation in such games ends as soon as no time is left for punishment. If one of the players cheats, nobody will want to cooperate with him in the future, but after the last round there is no future, so cheating in that round cannot be punished and both players cheat. Since the last round is therefore certain to be determined by cheating, the same holds for the penultimate round, the third from last, and so on. For this reason there is no cooperation from the very start: cheating is the dominant strategy for the entire duration of the game. This principle applies to every game whose end is known, regardless of its length.
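The backward-induction argument can be traced in a short Python sketch. It reuses the prisoner's dilemma payoffs from the section on infinite repetition, ignores discounting, and assumes that play in later rounds does not depend on what happens earlier. The function names are hypothetical.

```python
PAYOFFS = {
    ("K", "K"): (2, 2),
    ("K", "S"): (0, 3),
    ("S", "K"): (3, 0),
    ("S", "S"): (1, 1),
}
ACTIONS = ("K", "S")


def stage_equilibria(continuation):
    """Pure-strategy Nash equilibria of one round plus a fixed continuation value."""
    cont_x, cont_y = continuation

    def total(action_x, action_y):
        pay_x, pay_y = PAYOFFS[(action_x, action_y)]
        return pay_x + cont_x, pay_y + cont_y

    equilibria = []
    for ax in ACTIONS:
        for ay in ACTIONS:
            x_content = all(total(ax, ay)[0] >= total(d, ay)[0] for d in ACTIONS)
            y_content = all(total(ax, ay)[1] >= total(ax, d)[1] for d in ACTIONS)
            if x_content and y_content:
                equilibria.append((ax, ay))
    return equilibria


def backward_induction(rounds):
    """Solve a game of known length from the last round back to the first."""
    continuation = (0, 0)  # nothing follows the last round
    plan = []
    for _ in range(rounds):
        [(ax, ay)] = stage_equilibria(continuation)  # exactly one equilibrium here
        plan.insert(0, (ax, ay))
        pay_x, pay_y = PAYOFFS[(ax, ay)]
        continuation = (pay_x + continuation[0], pay_y + continuation[1])
    return plan, continuation


print(backward_induction(3))  # ([('S', 'S'), ('S', 'S'), ('S', 'S')], (3, 3))
```

Because the continuation value is the same whatever is played today, every round reduces to the one-shot game, whose only equilibrium is mutual cheating.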

It should also be noted that the costs of cheating arise only after the gains have been made. If, because of a critical situation, the present is worth more than the future in which the costs are incurred, it makes sense to cheat today. An example is politics, where the time after an election does not seem to matter as much as the time before it.

There are, however, exceptions to the principle described in the paragraph before last, in which cooperation takes place anyway. One explanation is that, although the number of rounds is finite, the players do not know how many rounds will actually be played, so an incentive to cooperate remains. There are also “nice” people who always cooperate. A further possibility is feigned cooperation whose sole purpose is to get the other player to reciprocate. In this way mutual gains can be realized before each side plans to exploit the other.

Guaranteed punishment

The certainty that deviating from cooperation in a repeated game will be penalized can be increased by a guaranteed penalty. A guaranteed penalty takes effect automatically whenever a player deviates from cooperation, and it punishes the cheater. As a result, non-cooperative behavior aimed at short-term gains does not pay, because the cheating is guaranteed to be discovered and the penalty is usually higher than the short-term gain. For this reason cooperation in the game continues.

Altruistic punishment

So-called altruistic punishment is carried out by the group members among themselves. Research shows that group members are willing to punish members who do not make their contribution to the group and show no prospect of improving in the future. Altruistic punishment promotes cooperation in society and is applied even in cases where high costs arise for the group. In addition, altruistic punishment demonstrably allows the willingness to cooperate, which otherwise declines as groups grow, to be maintained in considerably larger groups.

The human tendency to reward and punish altruistically in this sense is also called strong reciprocity.

Game theory examples

Penalties in repeated play

There is a price agreement between two providers, A and B, which provider A breaks after a while by undercutting the agreed price. At first A earns higher profits than B because it offers the lower price. Once the cheating is discovered, however, A must expect a reaction from B: provider B will lower its prices as well in order to improve its position. In the end A has achieved nothing except lowering the market price and the profits of both itself and provider B.

Guaranteed punishment

Figure: Example of guaranteed punishment

To demonstrate the effectiveness of the guaranteed penalty using the example above, dealer B could issue a “lifetime low-price guarantee”: if a customer finds the purchased product cheaper at a local dealer, he receives a refund from B equal to twice the price difference.

The figure illustrates the following sequence. Suppose both dealers sell a television set for € 300. If dealer A deviated from the price agreement and reduced its price to € 275, a guaranteed punishment would follow. Customers know about the guarantee given by dealer B and are therefore tempted to buy the television from B. They would then inform B that A is selling the set at a lower price and have B pay out the € 50 refund (2 × € 25 price difference). In doing so they unknowingly help to maintain the price cartel between the two dealers. The outcome is equivalent to an immediate price cut by dealer B to € 250.

However, since B does not want to give money away, it would lower its own price to € 275 as soon as it learned of the cheating from the first customer. Either way, dealer A would only punish itself with lower profits and therefore leaves the price at € 300.
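The arithmetic of the guarantee can be written out as a small Python sketch. The function name and the `refund_multiple` parameter are hypothetical. The sketch only assumes, as in the example, that B refunds twice the price difference whenever A is cheaper.

```python
def effective_price_at_b(price_b, price_a, refund_multiple=2):
    """Price a customer of B ends up paying once the guarantee has been claimed."""
    difference = max(price_b - price_a, 0)  # a refund is due only if A is cheaper
    return price_b - refund_multiple * difference


# A undercuts the agreed price of 300 by 25: B's customers effectively pay 250.
print(effective_price_at_b(300, 275))  # 250
# A keeps to the agreement: nothing is refunded.
print(effective_price_at_b(300, 300))  # 300
```

Since 250 is below A's reduced price of 275, undercutting does not win A any customers as long as the guarantee is in force.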

Altruistic punishment

A notable example of this behavior in groups comes from the expedition to the South Pole led by Robert Falcon Scott in 1911–1912. A group of Siberian dogs was trained for transport, and within a few months the dogs developed a remarkable system of cooperation. This system was encouraged and maintained by certain punishments: several dogs would join forces against any dog that did not pull its weight or that pulled too much. The punishment was always the same: the offending dog was killed. In human groups this mechanism does not involve punishment of such severity, but the basic idea is comparable. This natural punishment mechanism fosters the growth of human civilization and means that groups in which it is used are better able to survive in times of crisis, war or disaster.

An example involving human groups is the game described by Binmore. He describes a world that consists exclusively of mothers and daughters and in which love plays no role. Life runs over two periods, youth and old age. In the first period, youth, each daughter bakes two loaves of bread, and at the end of it she becomes a mother, since each gives birth to a daughter. In old age the mothers are too weak to produce anything. Every player is therefore very well off when she is young, because her two loaves provide for her amply. In old age, however, she would live very poorly, since the perishable bread cannot be kept until then.

All players would prefer to have one loaf in their youth and one in their old age. This becomes possible if each daughter gives her mother a loaf of bread in the first period. Such a daughter is called a conformist. The result is the following equilibrium: each daughter gives one of her loaves to her mother only on condition that her mother also gave a loaf to her own mother when she was young. If the mother did not give bread in her youth (a non-conformist), she receives no bread from her daughter in return. This creates a mechanism in which conformists reward other conformists by handing over a loaf and punish non-conformists by withholding it (an altruistic punishment mechanism).

A small example:

Anna (mother), Bea (daughter) and Caro (granddaughter) live in the world described. Suppose that Bea, as a non-conformist, refuses in her youth to give Anna a loaf of bread. In the next period Caro will punish Bea by not giving her any of her loaves either. Knowing this in advance deters Bea from becoming a non-conformist in the first place, since otherwise she would have no bread in old age.
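Bea's choice can be compared in a minimal Python sketch. It is only an illustration under stated assumptions: Anna is taken to have been a conformist herself, Caro is taken to follow the conformist rule, and the function names are hypothetical.

```python
def lifetime_bread(gives_to_mother, receives_from_daughter):
    """Loaves eaten in (youth, old age) by a player who bakes two loaves when young."""
    youth = 2 - (1 if gives_to_mother else 0)
    old_age = 1 if receives_from_daughter else 0
    return youth, old_age


def daughter_gives(mother_gave_bread):
    """Conformist rule: hand over a loaf only if the mother gave one in her own youth."""
    return mother_gave_bread


# Bea conforms: she gives Anna a loaf, so Caro later gives one to Bea.
print(lifetime_bread(True, daughter_gives(True)))    # (1, 1)
# Bea deviates: she keeps both loaves, so Caro withholds hers.
print(lifetime_bread(False, daughter_gives(False)))  # (2, 0)
```

Since every player prefers one loaf in each period to two loaves in youth and none in old age, conforming is the better choice for Bea.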

References

  1. Avinash K. Dixit, Barry J. Nalebuff: Game Theory for Beginners. pp. 97-101.
  2. Robert Boyd, Herbert Gintis, Samuel Bowles, Peter J. Richerson: The Evolution of Altruistic Punishment. p. 215.
  3. Avinash K. Dixit, Susan Skeath: Games of Strategy. pp. 397-399.
  4. Avinash K. Dixit, Barry J. Nalebuff: Game Theory for Beginners. p. 104 ff.
  5. Prajit K. Dutta: Strategies and Games. p. 249.
  6. Avinash K. Dixit, Susan Skeath: Games of Strategy. p. 642.
  7. Joel Watson: Strategy: An Introduction to Game Theory. p. 263 ff.
  8. Avinash K. Dixit, Barry J. Nalebuff: Game Theory for Beginners. p. 97 ff.
  9. Avinash K. Dixit, Susan Skeath: Games of Strategy. p. 348 f.
  10. Avinash K. Dixit, Barry J. Nalebuff: Game Theory for Beginners. pp. 101-104 ff.
  11. Avinash K. Dixit, Susan Skeath: Games of Strategy. p. 461.
  12. Robert Boyd, Herbert Gintis, Samuel Bowles, Peter J. Richerson: The Evolution of Altruistic Punishment. p. 219 f.
  13. Ken G. Binmore: Playing for Real: A Text on Game Theory. p. 343.