Mathematical Statistics

Mathematical statistics is the branch of statistics that analyzes the methods and procedures of statistics with mathematical means or, with their help, justifies them in the first place. Together with probability theory, mathematical statistics forms the area of mathematics known as stochastics. The terms inductive statistics, evaluative statistics, and inferential statistics are mostly used synonymously. They denote the part of statistics that is complementary to descriptive statistics.

The mathematical basis of mathematical statistics is probability theory.

The subject of statistics is populations whose members all possess a certain characteristic. The goal is to make statements about how often this characteristic takes each of its possible values within the population. Often the statements are limited to derived quantities, such as the average of the characteristic values of the population's members.

Age pyramid: distribution of gender and age in the German population (2010)

One example is the age distribution, which is often shown graphically as an age pyramid. Here the population could be, for example, the German population. Determining the age distribution of Germans precisely requires an elaborate full survey such as a census, so one looks for methods that allow largely reliable statements based on partial surveys. As in the example of the Politbarometer, only the members of a randomly selected subset of the population, a so-called sample, are examined for the characteristic of interest.

Methodology

If the age distribution in the population were known, formulas from probability theory could be used to calculate probabilities for the age distributions observable within samples. These sample distributions are subject to random fluctuations because the samples are selected at random. Mathematical statistics uses such calculations to reason in the opposite direction, from the sample result to the population: based on the characteristic values actually observed in a sample, it characterizes those frequency distributions within the population that plausibly explain the observed result. Theoretical investigations focus not only on the conclusions themselves but also on assessing how numerically accurate and how certain such statements are.
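The two directions of reasoning can be illustrated with a minimal sketch. It assumes a binary characteristic (for instance "aged over 65", a hypothetical choice) with unknown population share p, and a sample of n people drawn with replacement, so the count k in the sample is binomially distributed. The forward step computes the probability of a sample outcome for a given p. The backward step keeps every p on a grid under which the observed k is not too unlikely. This is test inversion with a doubled-tail rule (similar in spirit to the Clopper-Pearson interval), one of several possible conventions, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): the forward, probability-theory step."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_tail(k: int, n: int, p: float) -> float:
    """Twice the smaller of P(X <= k) and P(X >= k), capped at 1."""
    lower = sum(binom_pmf(j, n, p) for j in range(0, k + 1))
    upper = sum(binom_pmf(j, n, p) for j in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))


def plausible_proportions(k: int, n: int, alpha: float = 0.05, steps: int = 1000) -> list:
    """Backward step: grid values of p under which observing k of n is not too unlikely."""
    grid = [i / steps for i in range(steps + 1)]
    return [p for p in grid if two_sided_tail(k, n, p) >= alpha]


# Forward direction: population share assumed known (hypothetical value 0.22)
print(binom_pmf(22, 100, 0.22))  # probability of exactly 22 in a sample of 100

# Backward direction: which population shares plausibly explain 22 of 100?
pl = plausible_proportions(22, 100)
print(min(pl), max(pl))
```

The `alpha` parameter controls how unlikely an outcome may be before a candidate p is ruled out. Smaller values give a wider set of plausible proportions, which reflects the trade-off between accuracy and certainty mentioned above.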

The frequency distributions of interest to a user are only indirectly the subject of the methods of mathematical statistics. Instead, these methods refer to random variables, in particular those whose probability distribution corresponds to the relative frequencies of the characteristic values. In the age-distribution example, a realization of such a random variable is the age of a randomly selected German. In this way, the values observed in a sample can be understood as realizations of independent and identically distributed random variables. Prior knowledge is represented by a family of probability distributions or by a corresponding family of probability measures; one speaks of a distribution assumption. It can contain statements about the possible characteristic values, for example that they are integers, as well as about the type of distribution, for example "the values are normally distributed".
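The link between relative frequencies in a population and the distribution of a random variable can be sketched as follows. The example assumes a hypothetical toy population of ten ages and draws with replacement. Under that assumption each draw is a random variable whose distribution equals the population's relative frequencies, and the draws are independent and identically distributed. Real surveys usually sample without replacement, so this is an idealization. The function names are illustrative.

```python
import random
from collections import Counter


def relative_frequencies(values: list) -> dict:
    """Relative frequency of each distinct value, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in sorted(counts.items())}


def draw_iid_sample(population: list, n: int, rng: random.Random) -> list:
    """Draw n members with replacement, so the draws are i.i.d. and each one
    follows the population's relative-frequency distribution."""
    return rng.choices(population, k=n)


# Hypothetical toy population: ages of ten people (illustrative only)
population = [23, 35, 35, 47, 47, 47, 62, 62, 71, 80]

rng = random.Random(0)  # fixed seed for reproducibility
sample = draw_iid_sample(population, 10_000, rng)

pop_freq = relative_frequencies(population)
sample_freq = relative_frequencies(sample)
for age, share in pop_freq.items():
    # Population share next to the share observed among the sample's realizations
    print(age, share, round(sample_freq.get(age, 0.0), 3))
```

For a large sample, the observed shares should lie close to the population shares. This is why the observed values can stand in for the unknown frequency distribution.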

The central area of mathematical statistics is estimation theory, within which suitable estimation methods are developed. Starting from the distribution assumption, certain classes of estimators are examined and compared with regard to various quality criteria (such as sufficiency or efficiency). An estimator can be either a point estimate, that is, a single value approximating a desired parameter of the population, or an interval estimate in the form of a so-called confidence interval. Specific assumptions about the population can be checked with suitable statistical tests: starting from a hypothesis, the sample result leads to a 0-1 decision to reject or retain that hypothesis.
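The three tools named above are a point estimate, a confidence interval, and a test with a 0-1 decision. The following minimal sketch shows them for a population mean. It assumes a large i.i.d. sample, so the normal approximation to the sample mean is adequate. The interval is therefore an approximate (large-sample) confidence interval and the test is a large-sample z-test, not an exact method. The data are synthetic, generated from a hypothetical normal distribution, and the function names are illustrative.

```python
import math
import statistics
from statistics import NormalDist


def point_estimate(xs: list) -> float:
    """Point estimate of the population mean: the sample mean."""
    return statistics.mean(xs)


def confidence_interval(xs: list, level: float = 0.95) -> tuple:
    """Approximate confidence interval for the mean (normal approximation)."""
    se = statistics.stdev(xs) / math.sqrt(len(xs))  # estimated standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for level 0.95
    m = statistics.mean(xs)
    return m - z * se, m + z * se


def z_test(xs: list, mu0: float, alpha: float = 0.05) -> tuple:
    """Two-sided large-sample test of H0: mean == mu0.
    Returns the 0-1 decision (True = reject H0) and the p-value."""
    se = statistics.stdev(xs) / math.sqrt(len(xs))
    z = (statistics.mean(xs) - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha, p_value


# Synthetic data from a hypothetical distribution with mean 44 and sd 23
data = NormalDist(mu=44, sigma=23).samples(2000, seed=1)

print(point_estimate(data))
print(confidence_interval(data))
print(z_test(data, 50))  # hypothesis "mean is 50"
```

With 2000 observations the interval is roughly two units wide. A hypothesized mean several standard errors away from the sample mean, such as 50 here, leads to rejection.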

Mathematical statistics also includes the theory of statistical selection procedures as well as the theory of optimal design of experiments and surveys.

Statistical models

A complete formalization of statistical questions in terms of mathematical objects is achieved with the concept of the statistical model, often also referred to as a statistical space. In contrast to the more application-oriented scenario described above, no population needs to be defined:

The possible sample results $x$ are combined into a set $\mathcal{X}$, the sample space. The observable events are formally characterized by a σ-algebra $\mathcal{F}$ defined on the sample space $\mathcal{X}$. The distribution assumption, that is, the set of probability distributions under consideration, corresponds to a family $(P_\vartheta)_{\vartheta \in \Theta}$ of probability measures on the measurable space $(\mathcal{X}, \mathcal{F})$. Formally, a statistical model is the triple $(\mathcal{X}, \mathcal{F}, P_\vartheta \colon \vartheta \in \Theta)$. If $\vartheta$ is a real parameter vector, that is, $\Theta \subseteq \mathbb{R}^d$, one speaks of a parametric model with parameter space $\Theta$. The case $d = 1$ of a single real parameter is called a one-parameter model.
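A minimal sketch of a concrete one-parameter model follows. It assumes n independent yes/no observations: the sample space is $\mathcal{X} = \{0,1\}^n$ and the σ-algebra $\mathcal{F}$ is the power set of $\mathcal{X}$, which is always possible for a finite sample space. The family $P_\vartheta$ consists of n-fold Bernoulli distributions with parameter space $\Theta = [0, 1] \subseteq \mathbb{R}^1$. This choice of model and all function names are illustrative assumptions. The code computes $P_\vartheta(A)$ for an event $A \in \mathcal{F}$ by summing point masses.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Outcome = Tuple[int, ...]


def sample_space(n: int) -> List[Outcome]:
    """The sample space X = {0,1}^n: all 0/1 sequences of length n."""
    return list(product((0, 1), repeat=n))


def in_parameter_space(theta: float) -> bool:
    """Parameter space Theta = [0, 1], a subset of R^1 (so d = 1)."""
    return 0.0 <= theta <= 1.0


def bernoulli_pmf(theta: float) -> Callable[[Outcome], float]:
    """Point masses of P_theta: n independent Bernoulli(theta) observations."""
    if not in_parameter_space(theta):
        raise ValueError("theta must lie in the parameter space [0, 1]")

    def pmf(x: Outcome) -> float:
        k = sum(x)
        return theta**k * (1 - theta) ** (len(x) - k)

    return pmf


def prob(event: Iterable[Outcome], theta: float) -> float:
    """P_theta(A) for an event A, any subset of X (F is the power set here)."""
    pmf = bernoulli_pmf(theta)
    return sum(pmf(x) for x in event)


X = sample_space(3)
A = [x for x in X if sum(x) >= 2]  # event "at least two ones"
print(prob(A, 0.5))  # P_0.5(A)
print(prob(X, 0.3))  # P_theta(X) is 1 for every theta
```

Each value of `theta` selects one probability measure from the family. The pair `sample_space(n)` and its power set play the role of $(\mathcal{X}, \mathcal{F})$.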

A measurable function $S$ from $(\mathcal{X}, \mathcal{F})$ into another measurable space $(\mathcal{S}, \Sigma)$ is called a sample function or statistic. An estimator function, or estimator for short, for a characteristic $\tau(\vartheta) \in \mathcal{S}$ of the parameter is a sample function $T \colon \mathcal{X} \to \mathcal{S}$.
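As an illustration of an estimator $T \colon \mathcal{X} \to \mathcal{S}$, the following sketch assumes the Bernoulli setting $\mathcal{X} = \{0,1\}^n$, $\mathcal{S} = [0,1]$, and $\tau(\vartheta) = \vartheta$. It uses the relative frequency of ones as the estimator. The key point is that $T$ depends only on the observation $x$, never on $\vartheta$. Its expectation under each $P_\vartheta$ is computed exactly by enumerating $\mathcal{X}$, which shows that this particular estimator is unbiased. The setting, the estimator, and the function names are assumptions chosen for illustration.

```python
from itertools import product
from typing import Callable, Tuple

Outcome = Tuple[int, ...]


def T(x: Outcome) -> float:
    """Estimator for tau(theta) = theta: the relative frequency of ones.
    It is a function of the observed data x alone."""
    return sum(x) / len(x)


def expectation(stat: Callable[[Outcome], float], n: int, theta: float) -> float:
    """E_theta[stat(X)] for X = n i.i.d. Bernoulli(theta) observations,
    computed exactly by summing over all outcomes in {0,1}^n."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        k = sum(x)
        total += stat(x) * theta**k * (1 - theta) ** (n - k)
    return total


print(T((1, 0, 1, 1)))  # estimate from one observed sample
for theta in (0.1, 0.5, 0.9):
    # Up to floating-point rounding, the expectation equals theta (unbiasedness)
    print(theta, expectation(T, 5, theta))
```

Any other function of $x$ alone, for example the first observation `x[0]`, is also a statistic. Comparing such candidates by criteria like bias or variance is the kind of question estimation theory studies.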