Collective picture problem

from Wikipedia, the free encyclopedia
Sports collection pictures

The coupon collector's problem , collector's problem , scrapbooks issue or problem of complete series ( English coupon collector's problem ) deals with the question of how many images of a collection Series are for sale to a scrapbook to complete.

With the classic collector's picture problem, it is assumed that all pictures are randomly mixed and bought face down and that all motifs appear equally often. The latter requirement is not met, for example, in trading card games , also known as “trading card games”, since the occurrence of individual cards varies greatly. It could also be proven that in many collective picture series the pictures are not mixed randomly. Another important role is played by the option of buying and exchanging cards. The classic collective picture problem could not be solved in this general problem.

With the aid of probability theory and Monte Carlo simulations , collecting strategies can be optimized in order to fill the scrapbook as cheaply as possible. A solution was found for an optimized collection strategy based on changed, practice-relevant assumptions.

The scrapbook problem is one of the few math problems that is regularly reported and discussed in the mass media due to the popularity of soccer scrapbooks.

history

Historical ice hockey collection picture

Collective pictures have a long tradition, as early as the 19th century there were collective pictures as product additions e.g. B. with chocolate or cigarettes. One variant was that coupons had to be collected and the collector received a bonus after sending in a complete series. In the 1930s, pictures of movie or sports stars appeared on the inside of the lids of ice cream packs. But there were also other collectibles early on, e.g. B. buttons or figures in cereal boxes. It was not until the 1960s that scrapbooks became increasingly established in which the pictures were sold concealed in bags or packages, i.e. H. without the collector knowing which pictures he is receiving. With the introduction of football collector's albums , promoted by the Panini company , collecting became a mass phenomenon, especially at world and European championships. In the meantime, scrapbooks for standard merchandising items, e.g. B. for movies.

The basic combinatorial considerations go back to Markow . The concrete mathematical treatment began in 1930 with George Pólya , who presented the problem from the manufacturer's point of view: “Each package of our goods contains two different flower images; the full collection includes 72 different flower pictures; Anyone who collects and sends in a full collection will receive a free bonus. The salesperson who advertises this must reasonably ask himself: What is the average number of packs sold per premium? ”This work therefore coined the name as Coupon Collector's Problem . Sometimes it was also referred to as the Dixie Cup Problem (after the ice cream cups, which were named Dixie Cup after the manufacturer). Starting with William Feller's standard work on probability theory, numerous mathematicians have dealt with the problem of collecting pictures. As a result, the collective picture problem has established itself as a classic mathematical problem.

Sometimes the task is also referred to as the “collector's picture paradox”, because a surprising number of pictures are required to complete the album without swapping or buying. For Pólya's flower picture collection you have to collect an average of 174 parcels, i.e. 348 flower pictures (with only 72 different pictures), for the scrapbook for the European Football Championship 2016 you need as many as 4828 pictures (with 680 different pictures) - apparently the ratio of the average required increases Images to the number of different images stronger than linear.

The classic picture collection problem

The classic problem of collecting pictures raises the question of how many pictures a single collector has to buy on average in order to get a complete series of pictures if he foregoes swapping and buying and buys each picture individually. Pólya had already defined the classic task and justified it as follows: “Buyers exchange their pictures or throw them away, the seller can withhold a picture, etc. As you can see, failure to meet the requirements can favor both one and the other party and that's why the calculation of the average number under the aforementioned conditions seems to me to have a certain value, at least as a first orientation ”.

Assumptions

The following assumptions are made in the classic collective picture problem:

A1: The images are mixed well during production, that is, they are randomly distributed among the packages.

A2: All images appear equally often.

A3: There is no duplicate image in a package.

The result space can be defined as the set of possible collective picture sequences:

.
Urn model of the collective picture problem with six pictures

Mathematically, the collecting picture problem corresponds to an urn model with replacement, and the numbers drawn mean the numbers of the collecting pictures. We are looking for the number of draws that are necessary until each number has been drawn at least once. This means that the individually drawn numbers of the collective pictures have a discrete uniform distribution .

parcel

Collective pictures are usually not sold individually, but in packets with p pictures (often also called bags or boosters). The manufacturer guarantees that no image appears more than once within a package.

It has been proven that assumption A3 for the Panini sticker is fulfilled due to the special packaging process with the Fifimatic packaging machine.

The general solution is complicated, but comparing the results shows that the effect of the packets is rather small if the packet size is small in relation to the total number of collector images. For most practical applications, this means that the packet size can be neglected, because the probability that there is at least one double card in a packet is like the birthday paradox

.

Wait until the next new picture

The random variable assigns the number of purchases to each result that have to be made in order to get a new card after the -th different card, different from those previously purchased . This corresponds to the waiting time until the next new image.

Histogram of a simulation (100,000 realizations) of the classic Pokémon problem. You can see the wide spread: In the worst case, 2522 cards had to be bought.

Since a new picture is sure to come with the first purchase, the probability is . When you buy the second picture she is . This random variable is geometrically distributed . In general:

.

The expected value for is:

.

This means in particular that in order to get the last missing image in the collection, one has to buy images on average . But that's just as many pictures as you have to collect at all, and explains why collecting without swapping and buying is so expensive.

Probability distribution

The random variable

,

indicates how many purchases must be made to own all cards. This is the sum of independent geometrically distributed random variables, which has a discrete phase distribution . The probability distribution can be calculated by evaluating the associated Markov chain or recursively.

Expected value

Average number of S images required for a complete series of n images with standard deviation (gray)

Since the individual expected values ​​of exist, the expected value for the new random variable also exists :

.

Then:

is the -th partial sum of the harmonic series . is the factor that indicates how many times more pictures you have to buy compared to the size of the scrapbook. For large n , the approximation is considered by the natural logarithm : .

Variance

Since the waiting times for the next new images are geometrically distributed and independent, the variance also results in the number of images to be collected

.

The figure shows the growing standard deviation with strongly.

Buy more

Most manufacturers offer to buy a certain number of pictures (often fifty), usually at a higher price. In addition, you can specifically buy certain collective pictures, e.g. B. in special collectors' fairs or online shops. The important question is when is it worth buying again?

If pictures are still missing and a certain picture is offered at the price of , the purchase is worthwhile if the average costs are lower than the costs for the next picture in normal collection. If the normal price of a single picture is, it is worth buying if

.

If it is possible to buy more pictures, it is most worthwhile to buy them completely at the end. Otherwise you would have to on average

Buying pictures instead of buying them, and this saving effect can be huge.

The expected number of images to buy is reduced to

This means that the factor is drastically reduced and, as a first approximation, only depends on the ratio of the album size to the number of post-purchase pictures and not directly on the album size. This enables a simplified calculation of the average cost of the scrapbook.

To deceive

Another important extension is swapping duplicates. This traditionally takes place in the circle of friends, but increasingly also in organized file sharing exchanges, e.g. B. on the Internet. In mathematical modeling, fair trading is usually considered, i.e. H. one picture is exchanged for another. It is also assumed that the exchange partners cooperate and collect and exchange fairly together until each collector has filled his album. This corresponds to the case that a collector completes scrapbooks.

No closed solution has yet been found for this case; Using the example of the World Cup album, it was shown by simulation that the effect is considerable. Asymptotically applies to the mean value for a fixed number of exchange partners

For

This shows the effect of swapping: while the first collector has to buy cards on average , each additional collector only needs to buy additional cards on average . With this formula you can roughly estimate how much money you can save by swapping. For example, fair swapping would reduce the costs for two exchange partners to around 60% of the costs of a single collector.

In a youth research project, the factor was examined for the scenario “fair exchange with re- buying” using extensive simulations, whereby the average number of images to be bought in total is all exchange partners. The factor therefore indicates how many times more pictures each collector bought on average than the full album contains pictures. While the factor for a single collector can be approximated well by the natural logarithm of the number of collector's pictures (or double the logarithm for a large number of swap partners) without swapping and buying more, it has been plausibly proven experimentally that with fair swapping with more buying, at fixed replacement percentage, strives when the number of exchanging collectors strives.

Critique of the Classical Assumptions

The assumptions of the classic collector's picture problem are often criticized as unrealistic, and rumors are circulating on the Internet in particular that manufacturers either artificially reduce pictures or do not mix the pictures properly, i.e. H. too many duplicates or some images too rare.

A journalist from Spiegel online examined the acceptance A2 for the scrapbook for the soccer world cup 2014 together with the website stickermanager.com . By looking at the processed database of 8.33 million entries, the statistician Christian Hesse came to the conclusion: "The serious differences cannot be explained by the effects of chance alone" . It was assumed that the online survey is representative. However, this only applied to the German edition; z. For example, with the Swiss edition, which is produced separately and is visually distinguishable, the 2.36 million registered stickers appeared equally frequently.

In an investigation of the manufacturing process, it could be shown that the packaging of the collector's pictures of Topps and Panini creates systematic deviations or patterns that differ significantly from random mixing. For the packaging process of the Panini collective pictures it could be proven that instead of possible different packages (assuming A1 for package size p ) only a maximum of different packages can occur. However, this is not detrimental to the collector.

This means that the assumptions of the classic collective picture problem, especially A1, are not fulfilled in practice.

Optimized collection strategy

So far, there are no analytical results for the combination of swapping and re-buying under the classic assumptions. Simulation has shown that the potential of such optimized strategies is considerable. However, it is assumed that there are no duplicates in a display (i.e. packaging for retail with 50 or 100 packets). However, this is only correct in individual cases. Instead, it has been proven that there are fewer doubles in a display than would be expected with a random mix. In addition, displays are cheaper than buying individual packages.

The recommendation based on the simulation results is the following optimized collection strategy:

  1. Buying a display
  2. Purchase further packages and exchange as many pictures as possible until only the possible post-purchase pictures are missing
  3. Re-purchase (at the end) of the maximum number of images allowed from the manufacturer

The optimized strategy and the associated costs were confirmed with good accuracy in a public experiment by the Göttinger Tageblatt for the scrapbook for the European Football Championship 2016.

If there are D images in a display and d different images appear underneath , then the following applies

However, the classic assumption of collective groups is no longer realistic today, since most of the time people swap spontaneously in file sharing networks or over the Internet. Assuming that the collector T can swap missing pictures after buying the display before buying it, the result is the proportion of swap pictures

The variance V of the number of images to be collected can also be determined using these assumptions

In this way the costs of the collective picture album for the optimized collecting strategy can be estimated under realistic assumptions and the collective picture problem is solved under these assumptions.

The general collective picture problem

In the general collectible picture problem, the probability can be different for each picture . This is the most general case of a discrete probability distribution . The solution can no longer be derived with elementary probability calculation, but only with generating functions . For the mean value results

.

With this you can calculate the average number of cards required for trading card games for a collector without having to buy more .

Examples

cube

Game dice

For educational purposes, the problem of collecting pictures is often introduced and illustrated with the help of a normal dice . In practice this could e.g. B. correspond to collectible figures that are mixed with a product.

You have to throw an average of 14.7 times to get each number at least once, because it counts

Assuming the package size (this corresponds to dice with two dice excluding a double), one obtains

Numbers d. H. you have to roll the dice on average , as two dice (with different numbers) are thrown with one throw. It applies , d. H. one can expect to save at least one image on average through the package.

A simple extension is to play with two distinguishable dice and thus simulate a scrapbook with collective pictures. A corresponding collecting game was developed for use in school lessons, with which various collecting strategies can be tried out. This corresponds to the official DFB scrapbook published by REWE for Euro 2016, whereby the scrapbooks here represent an addition to the product for a purchase value of € 10 each and cannot be bought later. As a single collector you need almost 150 collector's pictures to complete. Exchanging takes on a special meaning, because with a fair exchange each additional collector would only have to collect about 67 pictures, and that would also be the limit value for a large number of exchange partners.

Pokémon

Pokémon trading cards

Suppose there are 150 different designs on the Pokémon cards that you need to buy, and the cards are individually wrapped. If it was a classic scrapbook, an individual collector would have to buy an average of around 839 cards without having to buy more cards.

However, Pokémon is actually a trading card game . H. There are more than 5 different types of cards with different rarities, from normal to extremely rare. The big difference can be seen in two types of cards. B. 30 out of 150 cards occur half as often as in the classic collectible picture game. Then the individual collector would have to buy an average of around 1213 cards. If there are 10 very rare cards that occur ten times less often, then he even needed around 4,372 cards. This means that buying and swapping are even more important than with the classic collector's pictures.

Football collection pictures

The Panini album for the EM 2016 with 680 pictures costs at least € 95.20 with a regular price of 14 cents per picture. This would also be the expected price for the last missing image if it were collected normally. The following applies to the factor: for a single collector without swapping and buying more , this results in an average number of pictures to be bought of around 4828 (i.e. 965.5 packets) and corresponds to around 676 €. But the standard deviation is also large (869 images), i.e. H. in around 95.4 percent of all cases a single collector would have to collect between 3090 and 6566 images. This makes it clear that it is essential to swap and buy in order to reduce costs.

The influence of the packets is slight, with a pack of five it is only about 0.016, i.e. H. with about every 60th package one would have to expect a double picture if the cards were shuffled randomly. This influence is often overestimated or incorrectly assessed. With an exact calculation you have to buy an average of 963.2 packages, i.e. H. the average saving is only 12 images and can therefore be practically neglected, even in view of the large standard deviation.

If you can buy 50 more pictures, you save about 3060 pictures, i. H. then you only need to buy an average of 1768 tickets and 50 additional tickets. This then costs around € 257 at 20 cents per post-purchase image (without taking into account the small packet effect). The standard deviation also drops to about 82 images, i.e. H. in about 95.4 percent of all cases an individual collector would have to collect images between 1604 and 1936.

In the case of an infinitely large collecting community, the factor for each collector would be about 1.88, i.e. H. everyone would have to buy an average of around 1275 images, which corresponds to a price of around € 179. For the 2014 World Cup album, it was estimated that an optimized collection strategy could fill the scrapbook for around € 125. If you take into account the price increase and the additional collector's pictures of the EM album, this strategy results in a price of around € 150. In a public collection experiment, this was confirmed with a cost of € 155. In the borderline case of consequent subsequent purchases with an infinitely large collecting community, this would result in € 98.20 per collector.

The Panini album for the 2018 World Cup hardly differs from the EM album with 682 pictures. The main difference is the higher price of 18 cents per picture, ie the album normally costs at least 122.76 €. All other classic results can be derived by simple conversion. Under realistic assumptions, e.g. For example, d = 450, t = 0.5 and a price of € 75 per display results in average costs of around € 184 for the optimized strategy. The spread is relatively small, 99% of the collectors could fill the album for costs between 176 € and 193 € under these assumptions. If the collector chooses not to swap, he has to pay an average of € 276. If he swaps all doubles, he only needs € 111 on average. Similar results were confirmed by simulation.

Individual evidence

  1. a b c George Pólya: A probability task in customer acquisition . ZAMM - Journal for Applied Mathematics and Mechanics, Volume 10, Issue 1, 1930, pp. 96–97.
  2. NewsTimes: Commemorative Dixie cups can be collectibles . Retrieved April 28, 2016.
  3. Kellog's Memorabilia and Collectibles . Retrieved April 28, 2016.
  4. ^ Andrei Andrejewitsch Markow: Probability calculation . Teubner, Berlin, 1912, pp. 101-108.
  5. ^ A b c d Donald J. Newman, Lawrence Shepp: The double dixie cup problem . American Mathematical Monthly 67 (1960), pp. 58-61.
  6. ^ William Feller: An introduction to probability theory and its applications . Volume 1: Wiley, New York 1950, ISBN 0-471-25708-7 , p. 225.
  7. a b c Norbert Henze : Stochastics for beginners. 10th edition. Springer Spectrum, Wiesbaden 2013, ISBN 978-3-658-03076-6 , pp. 192ff.
  8. a b c d Niklas Braband, Sonja Braband and Malte Braband: The secret of the Fifimatic . Junge Wissenschaft, No. 114, 2017, pp. 26–32.
  9. Wolfgang Stadje: The Collector's problem with Group Drawings . Advances in Applied Probability, Vol. 22, No. 4 (Dec., 1990), pp. 866-882.
  10. Elke Warmuth, Walter Warmuth: Elementary probability calculation: From dealing with chance . Vieweg + Teubner 1998, ISBN 9783519002253 , pp. 128-129.
  11. Solve the problem of collecting pictures systematically with the help of school mathematics (PDF; 358 kB).
  12. Andreas Binzenhöfer, Tobias Hoßfeld: Why Panini football albums are also fun for computer scientists . In: Hans-Georg Weigand (Ed.): Football - a science in itself . Königshausen & Neumann, Würzburg 2006, pp. 181–191.
  13. a b c Niklas Braband and Sonja Braband: Don't worry about collecting pictures anymore! Young Science, Issue No. 110, 2016, pp. 16–24.
  14. ^ A b Philippe Flajolet, Danièle Gardy, Loÿs Thimonier: Birthday paradox, coupon collectors, caching algorithms and self-organizing search. ( GZIP ; 77 kB) In: Discrete Applied Mathematics . Vol. 39, 1992, pp. 207-229.
  15. Holger Dambeck: EM sticker: Math tricks make collecting Panini cheaper . Spiegel Online , May 31, 2012.
  16. ^ A b c Sylvain Sardy and Yvan Velenik: Paninimania: sticker rarity and cost-effective strategy . Swiss Statistical Society, Bulletin no. 66 (2010), pp. 2-6.
  17. ^ Doron Zeilberger : How Many Singles, Doubles, Triples, Etc., Should The Coupon Collector Expect?
  18. Panini World Cup sticker: sample of millions shows massive unequal distribution . Spiegel online , June 19, 2014, accessed June 19, 2014.
  19. a b c Andreas Fuhrmann: Fuhrmanns EM sticker blog . Retrieved August 27, 2016
  20. a b c d Niklas Braband, Sonja Braband, Malte Braband: A Useful Solution of the Coupon Collector's Problem . February 26, 2017, arxiv : 1702.08874 [math] .
  21. Elke Warmuth, Stefan Lange: Make mathematics differently - an initiative for teacher training . Retrieved April 30, 2016.
  22. ^ Holger Dambeck: World Cup album. This is how expensive the craze for collecting pictures is . Spiegel Online , June 30, 2011.
  23. ^ Frank Förster: Every (two) years again: football pictures. In: Hans Humenberger and Martin Bracke (eds.): New materials for a reality-based mathematics lesson 3: ISTRON series (Reality references in mathematics lessons ). Springer, Heidelberg 2016, pp. 58–67.
  24. Wales Online: A maths genius worked out exactly how much it will cost to fill your Panini Euro 2016 album . Retrieved May 4, 2016.
  25. Cross Validated: Expectation of collecting stickers in groups . Retrieved May 4, 2016.
  26. The basics of Panini stickers. In: dw.com. Deutsche Welle , May 12, 2016, accessed May 14, 2016 .
  27. Press factsheet. Panini Verlags GmbH, March 19, 2018, accessed on March 27, 2018 (German).
  28. Laurie Belcher: Football Crazy, Probability Mad: How much does it really cost to complete the World Cup 2018 Sticker album? In: Goodscienceblog. March 12, 2018, accessed March 30, 2018 .
This version was added to the list of articles worth reading on June 14, 2016 .