Descriptive statistics

from Wikipedia, the free encyclopedia

The aim of descriptive (also: descriptive ) statistics is to clearly display and organize empirical data using tables, key figures (also: measures or parameters) and graphics. This is particularly useful when there is a large amount of data, as it cannot be easily surveyed.

Differentiation from other sub-areas of statistics

In addition to descriptive statistics, statistics also include

The aim of exploratory statistics is to find previously unknown structures and relationships in the data and thereby generate new hypotheses. These hypotheses based on sample data can then be examined for their general validity within the framework of the conclusive statistics using methods of probability theory.

From the inductive or inferential statistics (inferential) the descriptive statistics differs in that it no statements on a going beyond the cases studied population does and does not allow testing hypotheses. Descriptive statistics do not use stochastic models (the basis of inductive statistics), so that the statements made are not secured by error probabilities.

The methods of descriptive statistics can therefore be applied to any type of sample, while the methods of inductive statistics have to meet a number of requirements, including sampling. The methods of exploratory statistics are mostly identical to those of descriptive statistics; rather, the aim of the analysis is to determine what distinguishes the two areas.

Methods of descriptive statistics

There are essentially three methods to display the data:

Tables
In tables, data is displayed in a matrix with rows and columns if the data structure allows this. One line usually corresponds to an observation and one column to a data variable. The disadvantage of a table is that even with small data sets, the structure of the data is difficult to grasp. Sometimes rearranging columns or rows can help.
Diagrams
In diagrams and graphics, the data or certain aspects of the same are graphically presented clearly. However, this usually requires a summary of the data, so that information from the data is lost. For example, in a scatter plot of two variables, the relationship between the data can be clearly seen, but the number of observations with the same numerical values ​​is lost ( overplotting ).
parameter
In parameters (also measures or key figures ), an aspect of the data is reduced to a single number (aggregated). In order to describe the data, a large number of different parameters are then calculated to compensate for the loss of information due to the strong summary.
table diagram parameter
Aggregation of data low medium high
Clarity low medium high
Information content high medium low

Parameters (statistical parameters)

Three types of metrics are mainly of interest:

Location dimensions
as a central tendency of a frequency distribution. The skewness and excess of a frequency distribution can be determined from the position of the various values ​​for the central tendency to one another .
Measures of dispersion
for the variability (scatter or dispersion) of a frequency distribution and
Measures of connection
for the connection (also: correlation) of two variables.

The choice of suitable parameters depends on the scale or measurement level of the data and on the robustness of the parameter.

Examples

  • Representation of the average temperature and the temperature fluctuations in a region by means of mean value and dispersion; Information on how often certain temperatures are exceeded ( quantile ); Comparison by region and / or time period using graphs or tables.
  • Comparison of the final grades of two school years in a subject with the respective mean values ​​and scatter.
  • There are five red and four blue balls in an urn . Three balls are drawn from this urn without replacing. If you define the random variable as the number of red balls among the three drawn, it is hypergeometrically distributed with the number of red balls in the urn, the total number of balls in the urn and the number of attempts. All information about the distribution of can be obtained here.

See also

literature

Web links

Individual evidence

  1. ^ Jürgen Bortz: Statistics for human and social scientists . 6th edition, Springer, Heidelberg 2005, ISBN 354021271X , p. 1.