German tank problem

from Wikipedia, the free encyclopedia
During the Second World War, the production of German tanks such as the Panther was accurately estimated by Allied intelligence services using statistical methods.

In probability theory and statistics, the German tank problem (also known as the taxi problem) is the problem of estimating the maximum of a discrete uniform distribution from a sample drawn without replacement.

The problem is named after its use by Allied forces during World War II to estimate the monthly production rate of German tanks, which exploited German manufacturing practices. Ascending serial numbers were assigned to various tank components (chassis, gearbox, engine, wheels), a small number of which eventually fell into Allied hands. More broadly, the problem also applies to other randomly observed serial numbers (such as taxi numbers or the serial numbers of products sold).

As a mathematical problem, the serial numbers are modeled as an uninterrupted sequence of whole numbers, starting with the serial number 1; German manufacturing practice and labeling conventions in the war environment were more complex and are not dealt with here.

The problem can be addressed using either frequentist inference or Bayesian inference, and the two approaches lead to different results. Estimating the population maximum from a single sample gives divergent results, whereas estimation from multiple samples is a practical estimation question whose answer is simple (especially in the frequentist variant) but not obvious (especially in the Bayesian variant).

Assumptions

The adversary is assumed to have manufactured a series of tanks marked with consecutive integers, starting with serial number 1. It is further assumed that, regardless of a tank's date of manufacture, service history, or serial number, the observed serial numbers are uniformly distributed up to the point in time at which the analysis is carried out.
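The sampling model can be illustrated with a small simulation. The following is a minimal sketch, assuming a hypothetical production of 250 tanks and 5 captured serial numbers per trial (both values are chosen only for illustration). It shows that the largest observed serial number, taken on its own, tends to fall short of the true total, which is why an estimator has to correct it upward.

```python
import random


def simulate_mean_maximum(n_tanks, k, trials, seed=0):
    """Average the largest of k serial numbers drawn without replacement from 1..n_tanks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # rng.sample draws k distinct values, i.e. sampling without replacement.
        sample = rng.sample(range(1, n_tanks + 1), k)
        total += max(sample)
    return total / trials


# Hypothetical scenario: 250 tanks exist, 5 serial numbers are observed per trial.
average_max = simulate_mean_maximum(n_tanks=250, k=5, trials=20000)
theoretical = 5 * (250 + 1) / (5 + 1)  # known expectation of the sample maximum: k(N+1)/(k+1)
print(round(average_max, 1), round(theoretical, 1))
```

The simulated average lies close to k(N+1)/(k+1), well below the true total of 250.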

Calculation

Plots of the estimated population size N as a function of the number of samples k and the largest sampled serial number m, using frequentist analysis (dashed lines) and Bayesian analysis (the solid line shows the expected value, and the shading shows the range within one standard deviation)

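Under the frequentist approach, the estimate of the total number of tanks N, based on the number of samples k and the largest observed serial number m, is

$$N \approx m + \frac{m}{k} - 1,$$

while Bayesian analysis (primarily) provides a probability distribution for the number of tanks,

$$\Pr(N = n) = \begin{cases} 0 & \text{if } n < m, \\ \dfrac{k-1}{k} \, \dfrac{\binom{m-1}{k-1}}{\binom{n}{k}} & \text{if } n \ge m, \end{cases}$$

from which the expected value and the standard deviation of the number of tanks can be determined according to the following formula (the expected value exists for k ≥ 3, the standard deviation for k ≥ 4):

$$N \approx \mu \pm \sigma = \frac{(m-1)(k-1)}{k-2} \pm \sqrt{\frac{(k-1)(m-1)(m-k+1)}{(k-3)(k-2)^2}}.$$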
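The two point estimates can be written as short functions. This is a sketch rather than a reference implementation: it assumes the standard closed forms, namely m + m/k − 1 for the frequentist estimate and (m − 1)(k − 1)/(k − 2) with standard deviation sqrt((k − 1)(m − 1)(m − k + 1)/((k − 3)(k − 2)²)) for the Bayesian posterior. The function names are illustrative.

```python
from math import sqrt


def frequentist_estimate(m, k):
    """Frequentist estimate of the population maximum from sample maximum m and sample size k."""
    return m + m / k - 1


def bayesian_mean_std(m, k):
    """Posterior mean and standard deviation of N; both are finite only for k >= 4."""
    if k < 4:
        raise ValueError("the posterior standard deviation requires at least 4 samples")
    mean = (m - 1) * (k - 1) / (k - 2)
    std = sqrt((k - 1) * (m - 1) * (m - k + 1) / ((k - 3) * (k - 2) ** 2))
    return mean, std


serials = [19, 40, 42, 60]
m, k = max(serials), len(serials)
print(frequentist_estimate(m, k))               # 74.0
mean, std = bayesian_mean_std(m, k)
print(round(mean, 2), round(std, 2))            # 88.5 50.22
```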

Example

Suppose tanks with serial numbers 19, 40, 42 and 60 are captured, so that k = 4. The maximum serial number observed is m = 60. The unknown total number of tanks is denoted by N.

The frequentist formula gives in this case

$$N \approx 60 + \frac{60}{4} - 1 = 74,$$

while Bayesian analysis yields a distribution with the following estimate:

$$N \approx \mu \pm \sigma = 88.5 \pm 50.22.$$

This distribution has a positive skew, which reflects the fact that there must be at least 60 tanks.
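The skew can be made concrete by comparing the mode, median and mean of the posterior. The sketch below assumes the standard posterior mass function Pr(N = n) = (k − 1)/k · C(m − 1, k − 1)/C(n, k) for n ≥ m; the helper names are illustrative.

```python
from math import comb


def posterior_pmf(n, m, k):
    """Posterior probability that the total is n, given sample maximum m and sample size k."""
    if n < m:
        return 0.0
    return (k - 1) / k * comb(m - 1, k - 1) / comb(n, k)


def posterior_median(m, k):
    """Smallest n at which the cumulative posterior probability reaches 0.5."""
    cumulative = 0.0
    n = m
    while True:
        cumulative += posterior_pmf(n, m, k)
        if cumulative >= 0.5:
            return n
        n += 1


m, k = 60, 4
# The mass function decreases in n, so the mode is the smallest admissible value.
mode = max(range(m, m + 200), key=lambda n: posterior_pmf(n, m, k))
median = posterior_median(m, k)
mean = (m - 1) * (k - 1) / (k - 2)
print(mode, median, mean)  # 60 75 88.5
```

Mode, median and mean appear in increasing order, which is the typical signature of a right-skewed distribution with a hard lower bound at m.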

Historical problem

Loading of new "Panther" armored vehicles for transport to the front (1943)

As the war progressed, the Western Allies made sustained efforts to determine the extent of German production and approached the question in two ways: conventional intelligence gathering and statistical estimation. In many cases, statistical analysis substantially outperformed conventional intelligence. In some cases, conventional intelligence was used in conjunction with statistical methods, as was the case with the estimation of Panther tank production just before D-Day.

Allied command had assumed that the Panzerkampfwagen V (Panther) tanks seen in Italy, with their high-velocity, long-barreled 7.5 cm KwK 42 L/70 guns, were unusually heavy tanks that would be encountered only in small numbers in northern France, much as the Tiger I had been in Tunisia. The US Army was confident that the Sherman tank would continue to perform well, as it had against the Panzerkampfwagen III and Panzerkampfwagen IV in North Africa and Sicily. Just before D-Day, there were rumors that large numbers of Panther tanks were in service.

To verify this information, the Allies tried to estimate the number of tanks produced. To do this, they used the serial numbers of captured or destroyed tanks. Gearbox numbers served as the principal data, as they fell into two unbroken sequences. Chassis and engine numbers were also used, but they were more complicated to work with. Various other components were used to cross-check the analysis. Similar analyses were performed on wheels, which were observed to be numbered consecutively (i.e. 1, 2, 3, ..., N). (The lower bound was unknown, but to simplify the discussion this detail is usually omitted, taking the lower bound as 1.)

The analysis of tank wheels yielded an estimate of the number of wheel molds in use. A discussion with British wheel manufacturers then gave an estimate of the number of wheels that could be produced from this many molds, which in turn yielded the number of tanks produced each month. Analysis of the wheels of two tanks (32 wheels each, 64 wheels in total) resulted in an estimate of 270 tanks produced in February 1944, considerably more than previously assumed.

Post-war German records showed that production for February 1944 was 276. The statistical approach proved far more accurate than traditional intelligence methods, and the term "German tank problem" became established as a name for this type of statistical analysis.

This serial number analysis was not used only to estimate production. It also served to understand German production more generally, including the number of factories, the relative importance of factories, the length of the supply chain (based on the delay between production and use), changes in production, and the use of resources such as rubber.

Specific data

According to conventional Allied intelligence estimates, the Germans were producing around 1,400 tanks a month between June 1940 and September 1942. Applying the formula discussed here to the serial numbers of captured tanks gave an estimate of 246 per month. After the war, captured German production figures from Albert Speer's ministry showed the actual number to be 245.

The estimates for some months are given as follows:

Month         Statistical estimate   Intelligence estimate   German records
June 1940     169                    1,000                   122
June 1941     244                    1,550                   271
August 1942   327                    1,550                   342

Similar analyses

The production of V2 rockets was accurately estimated using statistical methods

Similar serial number analyses were used for other military equipment during World War II, most successfully for the V2 rocket.

Factory markings on Soviet military equipment were analyzed during the Korean War and by German intelligence during World War II.

In the 1980s, some Americans gained access to the Israeli Merkava tank production line. The production numbers were secret, but the tanks had serial numbers that allowed production to be estimated.

The formula has also been used in non-military contexts, for example to estimate the number of Commodore 64 computers built; the result (12.5 million) agrees with the lower estimates. The production numbers of iPhones and other cell phones have likewise been estimated with this method from their IMEI numbers.

Countermeasures

To hinder serial number analysis, serial numbers can be omitted altogether, or the usable auxiliary information they carry can be reduced. Alternatively, serial numbers that resist cryptanalysis can be used, most effectively by drawing numbers at random without replacement from a list that is much larger than the number of items produced (compare the one-time pad), or by generating random numbers and checking them against the list of numbers already assigned. With the latter approach, collisions are likely unless the number of possible digits is more than twice the number of digits in the number of objects produced (the serial number may be in any base); see the birthday paradox. A cryptographically secure pseudo-random number generator can be used for this. All of these methods require a lookup table (or breaking the cipher) to recover the production sequence from the serial number, which makes the serial numbers harder to use: for example, a range of serial numbers cannot be recalled as a block; instead, each number must be looked up individually, or a list must be generated.
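The birthday-paradox remark can be checked numerically. The sketch below assumes serial numbers are drawn independently and uniformly from a space of a given size and computes the probability that at least two of them coincide; the production figure of 1,000 items is hypothetical.

```python
def collision_probability(n_items, space_size):
    """Probability that n_items independent uniform draws from space_size values are not all distinct."""
    if n_items > space_size:
        return 1.0  # pigeonhole principle: a repeat is unavoidable
    all_distinct = 1.0
    for i in range(n_items):
        # The (i+1)-th draw must avoid the i values already taken.
        all_distinct *= 1 - i / space_size
    return 1 - all_distinct


# Hypothetical production of 1,000 items (a 4-digit count) with serial numbers of 4, 6 and 9 digits.
for digits in (4, 6, 9):
    print(digits, collision_probability(1000, 10 ** digits))
```

With only as many digits as the item count, a collision is practically certain; with about twice as many it is still far from negligible, and it becomes small only once the serial number space is much larger still.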

Alternatively, consecutive serial numbers can be encrypted with a simple substitution cipher, which allows easy decoding but is also easily broken by a known-plaintext attack: even if numbering starts from an arbitrary point, the plaintext has a pattern (namely, the numbers are consecutive). An example appears in Ken Follett's novel Code to Zero, which describes the encryption of the serial numbers of the Jupiter-C rocket:

H U N T S V I L E X
1 2 3 4 5 6 7 8 9 0

The code word here is Huntsville (with repeated letters omitted), followed by X to obtain a 10-letter key. Rocket number 13 was therefore "HN" and rocket number 24 was "UT".
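The substitution can be sketched in a few lines. The key string follows the table above (letters for the digits 1 to 9, then 0); the function names are illustrative.

```python
KEY = "HUNTSVILEX"     # letters standing for the digits 1, 2, ..., 9, 0
DIGITS = "1234567890"

ENCODE = str.maketrans(DIGITS, KEY)
DECODE = str.maketrans(KEY, DIGITS)


def encode_serial(number):
    """Replace each decimal digit of the serial number with its key letter."""
    return str(number).translate(ENCODE)


def decode_serial(code):
    """Map key letters back to digits and return the serial number."""
    return int(code.translate(DECODE))


print(encode_serial(13), encode_serial(24))  # HN UT
print(decode_serial("UT"))                   # 24
```

Because consecutive numbers differ in a predictable way, an observer who sees a handful of encoded serial numbers in sequence can reconstruct the whole mapping, which is the weakness described above.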

Strong encryption of serial numbers without lengthening them can be achieved with format-preserving encryption. Instead of storing a truly random permutation of the set of all possible serial numbers in a large table, such algorithms derive a pseudo-random permutation from a secret key. Security can then be defined as the requirement that the pseudo-random permutation be indistinguishable from a truly random permutation for an attacker who does not know the key.
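The idea of deriving a keyed permutation on the serial number range can be illustrated with a toy construction. The sketch below is not a standardized format-preserving encryption scheme and should not be used for real security: it assumes a small Feistel network with HMAC-SHA256 as the round function, combined with cycle walking to stay inside the range 0..domain_size − 1. The key, the round count and all function names are illustrative choices.

```python
import hashlib
import hmac


def _round_value(key, round_index, half, half_bits):
    """Keyed round function: HMAC-SHA256 output truncated to half_bits bits."""
    message = bytes([round_index]) + half.to_bytes(8, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def _feistel(key, value, half_bits, rounds=8):
    """Balanced Feistel network: a permutation of the integers below 2**(2 * half_bits)."""
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for i in range(rounds):
        left, right = right, left ^ _round_value(key, i, right, half_bits)
    return (left << half_bits) | right


def encrypt_serial(key, serial, domain_size):
    """Map a serial number in 0..domain_size-1 to another number in the same range."""
    if not 0 <= serial < domain_size:
        raise ValueError("serial number outside the domain")
    # Smallest even bit width that covers the whole domain.
    half_bits = max(1, ((domain_size - 1).bit_length() + 1) // 2)
    value = _feistel(key, serial, half_bits)
    # Cycle walking: re-encrypt until the result falls back inside the domain.
    while value >= domain_size:
        value = _feistel(key, value, half_bits)
    return value


key = b"hypothetical secret key"
print([encrypt_serial(key, s, 1000) for s in range(1, 6)])
```

Since the Feistel network is a permutation of the larger power-of-two range, cycle walking restricts it to a permutation of the serial number range, so every serial number maps to a distinct number of the same length and no lookup table is needed.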
