This is a collection of formulas for the mathematical subfield of stochastics, covering probability theory, combinatorics, random variables and distributions, as well as statistics.
Notation
In stochastics, in addition to the usual mathematical notation and symbols, the following conventions are frequently used:

- Random variables are written with uppercase letters, e.g. $X, Y, Z$.
- Realizations of a random variable are written with the corresponding lowercase letters, e.g. the observations in a sample: $x_1, x_2, \dots, x_n$.
- Lowercase letters are used to denote probability functions and probability densities, e.g. $f(x)$.
- Uppercase letters are used to denote distribution functions, e.g. $F(x)$.
- In particular, the density of the standard normal distribution is denoted by $\varphi$ and its distribution function by $\Phi$.
- Greek letters (e.g. $\mu, \sigma, \vartheta$) are used to denote unknown parameters (population parameters).
- An estimator is often denoted by a circumflex above the corresponding symbol, e.g. $\hat{\vartheta}$ (read: "theta hat").
- The arithmetic mean is denoted by $\bar{x}$ (read: "x bar").
In the following, a probability space $(\Omega, \Sigma, P)$ is always given: the result space $\Omega$ is an arbitrary nonempty set, $\Sigma$ is a σ-algebra of subsets of $\Omega$ that contains $\Omega$, and $P$ is a probability measure on $\Sigma$.
Basics
Axioms:

Every event $A \in \Sigma$ is assigned a probability $P(A)$ such that:

- $P(A) \geq 0$,
- $P(\Omega) = 1$,
- $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$ holds for pairwise disjoint events $A_1, A_2, \dots$
Calculation rules: The axioms imply:

- For $A \subseteq B$: $P(A) \leq P(B)$; in particular $P(A) \leq 1$ for every event.
- For the complementary event: $P(\bar{A}) = 1 - P(A)$.
Laplace experiments

If all elementary events of a finite result space $\Omega$ are equally likely (a Laplace experiment), then for every event $A$:

$$P(A) = \frac{|A|}{|\Omega|} = \frac{\text{number of favorable outcomes}}{\text{number of possible outcomes}}$$
Conditional probability: the probability of $A$ given $B$ (with $P(B) > 0$) is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Independence:

- Two events $A$ and $B$ are independent $\iff P(A \cap B) = P(A) \cdot P(B)$.
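These rules can be checked with plain Python. The sketch below applies the total-probability decomposition and Bayes' theorem to made-up illustrative values (the prevalence, sensitivity, and false-positive rate are assumptions, not taken from this text):

```python
# Hypothetical example: a diagnostic test for a rare condition.
p_a = 0.01              # P(A): prevalence of the condition
p_b_given_a = 0.95      # P(B | A): test positive given the condition
p_b_given_not_a = 0.05  # P(B | not A): false-positive rate

# Total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.161
```

Despite the high sensitivity, the posterior probability is small because the condition is rare.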
Combinatorics

Factorial: the number of possibilities when drawing all $n$ balls out of an urn (without replacement):

$$n! = n \cdot (n-1) \cdots 2 \cdot 1, \quad \text{where } 0! = 1$$
|             | without repetition (of $n$ elements) | with repetition (of $r + s + \dots + t = n$ elements, of which $r, s, \dots, t$ each are indistinguishable) |
|-------------|--------------------------------------|-------------------------------------------------------------------------------------------------------------|
| Permutation | $n!$                                 | $\dfrac{n!}{r!\,s! \cdots t!}$                                                                                |
Binomial coefficient "$n$ over $k$":

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
Number of possibilities when drawing $k$ balls from an urn containing $n$ balls:
|             | without repetition (without replacement) (see hypergeometric distribution) | with repetition (with replacement) (see binomial distribution) |
|-------------|----------------------------------------------------------------------------|----------------------------------------------------------------|
| Variation   | $\dfrac{n!}{(n-k)!}$                                                       | $n^k$                                                          |
| Combination | $\dbinom{n}{k}$                                                            | $\dbinom{n+k-1}{k}$                                            |
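The counting formulas above can be evaluated directly with Python's standard library; the values $n = 5$, $k = 3$ below are arbitrary:

```python
import math

n, k = 5, 3

# Factorial: number of orderings of all n elements
assert math.factorial(n) == 120

# Variation without repetition: n!/(n-k)! ordered draws without replacement
print(math.perm(n, k))          # 60

# Variation with repetition: n^k ordered draws with replacement
print(n ** k)                   # 125

# Combination without repetition: binomial coefficient "n over k"
print(math.comb(n, k))          # 10

# Combination with repetition: C(n+k-1, k)
print(math.comb(n + k - 1, k))  # 35
```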
Discrete random variables

A function $f$ is called the probability function of a discrete random variable if it has the following properties:

1. $f(x_i) \geq 0$ for all $i$,
2. $\sum_i f(x_i) = 1$.

The associated random variable $X$ then satisfies:

$$P(X = x_i) = f(x_i)$$

A random variable $X$ and its distribution are called discrete if a function $f$ with these properties exists; $f$ is called the probability function of $X$.
Continuous random variables

A function $f$ is called the density (function) of a continuous random variable if it has the following properties:

1. $f(x) \geq 0$ for all $x$,
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

A continuous random variable $X$ then satisfies:

$$P(a \leq X \leq b) = \int_a^b f(x)\,dx$$

A random variable $X$ and its distribution are called continuous if a density function $f$ with this property exists; $f$ is called the density (function) of $X$.

For the probability of single points:

- $P(X = x) = 0$ for all $x$.

Expected value and variance are given by

$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx, \qquad \operatorname{Var}(X) = \int_{-\infty}^{\infty} \left(x - E(X)\right)^2 f(x)\,dx$$
Expected value, variance, covariance, correlation
For the expected value $E(X)$, the variance $\operatorname{Var}(X)$, the covariance $\operatorname{Cov}(X, Y)$ and the correlation $\rho(X, Y)$, the following hold:

- $E(aX + b) = a\,E(X) + b$; in general $E(X + Y) = E(X) + E(Y)$
- $\operatorname{Var}(X) = E(X^2) - \left(E(X)\right)^2$ and $\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)$
- For independent random variables $X, Y$: $E(X \cdot Y) = E(X) \cdot E(Y)$
- For independent random variables $X, Y$: $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$
- $\operatorname{Cov}(X, Y) = E(XY) - E(X)\,E(Y)$ and $\rho(X, Y) = \dfrac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$

Chebyshev inequality: for every $k > 0$,

$$P\left(|X - E(X)| \geq k\right) \leq \frac{\operatorname{Var}(X)}{k^2}$$
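The expectation and variance rules can be verified by exact enumeration for a small discrete distribution; the sketch below uses a fair die (a standard example, not from this text) and exact rational arithmetic:

```python
from fractions import Fraction

# Fair die: X uniform on {1, ..., 6}, each value with probability 1/6
vals = range(1, 7)
p = Fraction(1, 6)

ex = sum(p * x for x in vals)        # E(X)
ex2 = sum(p * x * x for x in vals)   # E(X^2)
var = ex2 - ex ** 2                  # Var(X) = E(X^2) - E(X)^2

print(ex)   # 7/2
print(var)  # 35/12

# Shift/scale rules: E(aX + b) = a E(X) + b, Var(aX + b) = a^2 Var(X)
a, b = 2, 3
ex_t = sum(p * (a * x + b) for x in vals)
var_t = sum(p * (a * x + b) ** 2 for x in vals) - ex_t ** 2
assert ex_t == a * ex + b
assert var_t == a * a * var
```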
Binomial distribution

Given is an $n$-step Bernoulli experiment (i.e. the same experiment repeated $n$ times independently, with only two possible outcomes and constant probabilities), with probability of success $p$ and probability of failure $q = 1 - p$. The probability distribution of the random variable $X$ = number of successes is called the binomial distribution $B(n, p)$.

The probability of exactly $k$ successes is:

$$P(X = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \dots, n$$

- Expected value: $E(X) = np$
- Variance: $\operatorname{Var}(X) = npq$
- Standard deviation: $\sigma = \sqrt{npq}$
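A minimal sketch of the binomial probability function (the parameters $n = 10$, $p = 0.5$ are arbitrary):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
print(binom_pmf(5, n, p))  # 0.24609375, the most likely value for p = 0.5

# The probabilities over k = 0, ..., n sum to 1
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))

mu = n * p                # expected value np
sigma2 = n * p * (1 - p)  # variance np(1-p)
```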
σ-rules
(Probabilities of neighborhoods of the expected value in binomial distributions.) Between the radius of a neighborhood around the expected value $\mu$ and the associated probability of that neighborhood, the following assignments hold (if $\sigma > 3$):

| Radius of the neighborhood | Probability of the neighborhood |
|----------------------------|---------------------------------|
| $1\sigma$                  | 0.68                            |
| $2\sigma$                  | 0.955                           |
| $3\sigma$                  | 0.997                           |

| Probability of the neighborhood | Radius of the neighborhood |
|---------------------------------|----------------------------|
| 0.90                            | $1.64\sigma$               |
| 0.95                            | $1.96\sigma$               |
| 0.99                            | $2.58\sigma$               |
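The tabulated σ-rules can be reproduced from the standard normal distribution function, since $P(\mu - k\sigma \leq X \leq \mu + k\sigma) \approx 2\Phi(k) - 1$. A sketch using only `math.erf`:

```python
from math import erf, sqrt

def phi_cdf(x):
    """Distribution function Phi of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# P(mu - k*sigma <= X <= mu + k*sigma) = Phi(k) - Phi(-k) = 2*Phi(k) - 1
for k in (1.0, 2.0, 3.0, 1.64, 1.96, 2.58):
    print(k, round(2 * phi_cdf(k) - 1, 3))
```

The printed values match the table entries up to rounding (e.g. 0.683, 0.954, 0.997 for $k = 1, 2, 3$).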
Standardize a distribution
If the random variable $X$ has a distribution with expected value $\mu$ and standard deviation $\sigma$, then the standardized variable $Z$ is defined by

$$Z = \frac{X - \mu}{\sigma}$$

The standardized variable has expected value 0 and standard deviation 1.
Poisson approximation
Given is a binomial distribution with large sample size ($n \geq 100$) and small probability of success $p$. With $\lambda = np$, the probabilities can then be calculated approximately:

$$P(X = k) \approx \frac{\lambda^k}{k!}\,e^{-\lambda}$$

The relationship can be summarized as: $B(n, p) \approx \operatorname{Po}(np)$ for large $n$ and small $p$.
Poisson distribution
For a random variable $X$ with the Poisson distribution $\operatorname{Po}(\lambda)$:

$$P(X = k) = \frac{\lambda^k}{k!}\,e^{-\lambda}, \quad k = 0, 1, 2, \dots$$

with $E(X) = \operatorname{Var}(X) = \lambda$.
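The Poisson probability function and its use as an approximation of the binomial distribution can be compared numerically; the parameters $n = 200$, $p = 0.02$ below are arbitrary illustrative values:

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = lam^k / k! * e^(-lam)."""
    return lam**k / factorial(k) * exp(-lam)

# Poisson approximation of B(n, p) with lam = n*p
n, p = 200, 0.02
lam = n * p  # 4.0

exact = comb(n, 3) * p**3 * (1 - p)**(n - 3)  # binomial P(X = 3)
approx = poisson_pmf(3, lam)
print(round(exact, 4), round(approx, 4))  # the two values agree closely
```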
Approximation formulas of de Moivre and Laplace

Let $X$ be a binomially distributed random variable with $\mu = np$ and $\sigma = \sqrt{npq}$ (usable approximation for $\sigma^2 = npq > 9$). The probabilities of exactly $k$ and of at most $k$ successes can be calculated approximately by:

$$P(X = k) \approx \frac{1}{\sigma}\,\varphi\!\left(\frac{k - \mu}{\sigma}\right), \qquad P(X \leq k) \approx \Phi\!\left(\frac{k + 0.5 - \mu}{\sigma}\right)$$
Standard normal distribution
The density (function) $\varphi$ (also known as the bell curve) of the standard normal distribution is defined by:

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$$

and the distribution function $\Phi$ by:

$$\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt$$

Approximation formula for a discrete distribution using the continuity correction:

$$P(a \leq X \leq b) \approx \Phi\!\left(\frac{b + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - 0.5 - \mu}{\sigma}\right)$$
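A numerical sketch of the normal approximation with continuity correction, compared against the exact binomial sum (the parameters $n = 100$, $p = 0.5$, $k = 55$ are made-up):

```python
from math import comb, erf, exp, pi, sqrt

def phi_pdf(x):
    """Density of the standard normal distribution (bell curve)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def phi_cdf(x):
    """Distribution function Phi(x)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Continuity-corrected approximation of P(X <= 55) for X ~ B(100, 0.5)
n, p, k = 100, 0.5, 55
mu, sigma = n * p, sqrt(n * p * (1 - p))

approx = phi_cdf((k + 0.5 - mu) / sigma)
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
print(round(approx, 4), round(exact, 4))  # nearly identical
```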
Hypergeometric distribution

In a population of size $N$, two characteristic values occur, with $M$ and $N - M$ elements respectively. A sample of size $n$ is drawn without replacement. The distribution of the random variable $X$ = number of elements with the first characteristic value in the sample is called the hypergeometric distribution.

The probability of exactly $k$ elements with the first characteristic value in the sample is:

$$P(X = k) = \frac{\dbinom{M}{k}\dbinom{N-M}{n-k}}{\dbinom{N}{n}}$$

where $N$ = number of elements, $M$ = number of positive elements, $n$ = number of draws, $k$ = number of successes.

If $p = M/N$ denotes the proportion of the first characteristic value in the population, then:

$$E(X) = np, \qquad \operatorname{Var}(X) = np(1-p)\,\frac{N-n}{N-1}$$
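A sketch of the hypergeometric probability function; the lottery-style parameters ($N = 49$, $M = 6$, $n = 6$) are a standard illustration, not from this text:

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(X = k): k successes in n draws without replacement
    from N elements of which M are 'positive'."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

# Example: drawing 6 of 49 numbers, probability of exactly 3 hits
N, M, n = 49, 6, 6
print(round(hypergeom_pmf(3, N, M, n), 5))

# The probabilities over all k sum to 1 (math.comb returns 0 for k > M)
total = sum(hypergeom_pmf(k, N, M, n) for k in range(n + 1))
print(total)
```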
Geometric distribution

Given is a Bernoulli experiment with probability of success $p$ and $q = 1 - p$. The distribution of the random variable $X$ = number of trials up to the first success is called the geometric distribution. The following hold:

- $P(X = k) = q^{k-1}\,p$ (success exactly on the $k$-th attempt)
- $P(X > k) = q^k$ ($k$ failures in a row, i.e. the first success comes only after the $k$-th attempt)
- $P(X \leq k) = 1 - q^k$ (success at the latest on the $k$-th attempt, i.e. at least one success occurs by the $k$-th attempt)

The expected value is $E(X) = \dfrac{1}{p}$.
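A sketch of these formulas; waiting for the first six when rolling a die ($p = 1/6$) is an assumed example:

```python
def geometric_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) * p: first success on attempt k."""
    return (1 - p) ** (k - 1) * p

p = 1 / 6  # e.g. waiting for the first six when rolling a die
k = 3

# Tail probability: P(X > k) = (1-p)^k
tail = (1 - p) ** k

# Head probability: P(X <= k) as a sum of the first k terms
head = sum(geometric_pmf(i, p) for i in range(1, k + 1))

# Consistency check: P(X <= k) + P(X > k) = 1
print(round(head + tail, 10))  # 1.0

print(1 / p)  # expected value E(X) = 1/p
```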
Further distributions

The numerous other special distributions cannot all be listed here; see the list of univariate probability distributions.
Approximations of distributions
Under certain approximation conditions, distributions can also be approximated by one another in order to simplify calculations. The approximation conditions may differ slightly depending on the textbook.
Discrete distributions (binomial, hypergeometric, Poisson) and continuous distributions (chi-square, Student's t) can be approximated as follows:

| From \ To                                | Binomial distribution                    | Poisson distribution                            | Normal distribution                                     |
|------------------------------------------|------------------------------------------|-------------------------------------------------|---------------------------------------------------------|
| Binomial distribution $B(n, p)$          | —                                        | $n \geq 50$, $p \leq 0.05$: $\lambda = np$      | $np(1-p) \geq 9$: $\mu = np$, $\sigma^2 = np(1-p)$      |
| Hypergeometric distribution $H(N, M, n)$ | $\frac{n}{N} \leq 0.05$: $p = \frac{M}{N}$ | via the binomial conditions with $p = \frac{M}{N}$ | via the binomial conditions with $p = \frac{M}{N}$   |
| Poisson distribution $\operatorname{Po}(\lambda)$ | —                              | —                                               | $\lambda \geq 9$: $\mu = \lambda$, $\sigma^2 = \lambda$ |
| Chi-square distribution $\chi^2(n)$      | —                                        | —                                               | $n > 100$: $\mu = n$, $\sigma^2 = 2n$                   |
| Student's t-distribution $t(n)$          | —                                        | —                                               | $n > 30$: $\mu = 0$, $\sigma^2 \approx 1$               |
In the transition from a discrete distribution to a continuous distribution, a continuity correction also comes into consideration (e.g. when approximating the binomial or Poisson distribution by the normal distribution), in particular $P(X \leq x) \approx \Phi\!\left(\frac{x + 0.5 - \mu}{\sigma}\right)$.
Critical values
The $p$-quantile $x_p$ is the value of a probability distribution with distribution function $F$ for which $F(x_p) = p$. For some commonly used distributions there is a standard notation:

- $z_p$ or $\Phi^{-1}(p)$ for the standard normal distribution
- $t_{n;p}$ for the t-distribution with $n$ degrees of freedom
- $\chi^2_{n;p}$ for the chi-square distribution with $n$ degrees of freedom
- $F_{m,n;p}$ for the F-distribution with $m$ and $n$ degrees of freedom
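Quantiles of the standard normal distribution can be computed directly with the standard library; the probability levels below are the common ones from the σ-rules table:

```python
from statistics import NormalDist

# z_p quantiles of the standard normal distribution: Phi(z_p) = p
z = NormalDist()  # mu = 0, sigma = 1
for p in (0.90, 0.95, 0.975, 0.99):
    print(p, round(z.inv_cdf(p), 3))
```

The standard library covers only the normal distribution; quantiles of the t-, chi-square and F-distributions are provided by third-party packages such as SciPy (`scipy.stats.t.ppf`, `scipy.stats.chi2.ppf`, `scipy.stats.f.ppf`).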
Location measures

- Arithmetic mean: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
- Median: the middle value of the ordered sample
- Mode: the most frequent value
Measures of dispersion
- Empirical variance: $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- Empirical standard deviation: $s = \sqrt{s^2}$
Measures of association

- Empirical covariance: $s_{xy} = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
- Empirical correlation coefficient: $r_{xy} = \dfrac{s_{xy}}{s_x\,s_y}$

Equation of the regression line of a simple linear regression: $\hat{y} = a + bx$ with

- $b = \dfrac{s_{xy}}{s_x^2}$, $a = \bar{y} - b\,\bar{x}$,

where $\bar{x}$ and $\bar{y}$ denote the arithmetic means.
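The regression-line formulas can be sketched directly; the sample data below are made up for illustration:

```python
from statistics import mean

# Made-up sample data (x_i, y_i)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

x_bar, y_bar = mean(xs), mean(ys)
n = len(xs)

# Empirical covariance and variance (denominator n - 1)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)

b = s_xy / s_xx        # slope
a = y_bar - b * x_bar  # intercept; the line passes through (x_bar, y_bar)
print(round(a, 3), round(b, 3))  # 0.11 1.97
```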
Mean values
| Mean                  | Two numbers $a$, $b$             | General ($x_1, \dots, x_n$)                                                                  |
|-----------------------|----------------------------------|----------------------------------------------------------------------------------------------|
| Mode                  | —                                | value with the highest frequency                                                             |
| Median (central value)| $\dfrac{a+b}{2}$                 | if sorted: middle value $x_{\frac{n+1}{2}}$ (odd $n$), else $\frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right)$ |
| Arithmetic mean       | $\dfrac{a+b}{2}$                 | $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$                                                   |
| Geometric mean        | $\sqrt{ab}$                      | $\sqrt[n]{x_1 x_2 \cdots x_n}$                                                               |
| Harmonic mean         | $\dfrac{2ab}{a+b}$               | $\dfrac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$                                                    |
| Quadratic mean        | $\sqrt{\dfrac{a^2 + b^2}{2}}$    | $\sqrt{\dfrac{1}{n}\sum_{i=1}^{n} x_i^2}$                                                    |
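The two-number formulas can be evaluated and cross-checked against the standard library implementations of the general formulas; the values $a = 4$, $b = 9$ are arbitrary:

```python
from math import sqrt
from statistics import geometric_mean, harmonic_mean

a, b = 4.0, 9.0

arithmetic = (a + b) / 2             # (a+b)/2
geometric = sqrt(a * b)              # sqrt(a*b)
harmonic = 2 * a * b / (a + b)       # 2ab/(a+b)
quadratic = sqrt((a**2 + b**2) / 2)  # root mean square

print(arithmetic, geometric, round(harmonic, 4), round(quadratic, 4))

# Cross-check the two-number formulas against the general implementations
assert abs(geometric - geometric_mean([a, b])) < 1e-9
assert abs(harmonic - harmonic_mean([a, b])) < 1e-9
```

Note the classical ordering harmonic ≤ geometric ≤ arithmetic ≤ quadratic, visible in the output.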
Parameters

In general, unknown population or model parameters in statistics are denoted by Greek letters (e.g. $\mu$, $\sigma$, $\pi$).

- The arithmetic mean in the population: $\mu$.
- The variance in the population: $\sigma^2$.
- The proportion of a dichotomous variable in the population: $\pi$.
- The intercept $\alpha$ and the slope $\beta$ in the simple linear regression model.

An estimator for an unknown parameter is often denoted by the capital version of the corresponding symbol from descriptive statistics, e.g. $\bar{X}$ as an estimator for $\mu$; the estimator is a function of the sample variables $X_1, \dots, X_n$.