Aggregate function

from Wikipedia, the free encyclopedia

An aggregate function is a function that summarizes certain properties of data .

separations

The functions used in an aggregation can be divided into

  • Distributive functions : sum (SUM), number (COUNT), maximum (MAX), minimum (MIN), top N ...

Formally, a distributive function can be described as follows: The data to be aggregated is divided into partitions and it is the attribute that is to be aggregated. Then there is a function that operates on a set of the same results as and can be represented as:

.

This is the entirety of all attribute values ​​in the database and are those attribute values ​​that are in the partition .

For example, for the function Count (COUNT) :

So you first calculate the cardinality of the individual partitions and then add up the individual results.

  • Algebraic functions : mean (AVG), truncated mean (truncated AVG), standard deviation ...

Algebraically, functions are defined in the same way as before, but you have greater freedom here with regard to the function that works on the individual partitions. If you previously had to use the same function that is used for the entire database, you can select a different function here . An algebraic aggregation function thus has the following representation:

.

Here is a function that operates on a set of result tuples of the function . and are defined as above.

For example, for the function Average (AVG) :

The expressions and denote the first or second element of a 2-tuple.

To put it clearly, a tuple is calculated for each partition from the total sum and the number of tuples in this partition. Then the total average is simply calculated from the total divided by the total number.

  • Holistic functions : median , rank , percentile, most common value ...

Holistic functions are aggregation functions for which neither of the previous two definitions apply.

calculation

Distributive and algebraic aggregate functions can be calculated from one or a fixed set of facts from lower-level classification levels, while with holistic aggregate functions, the population of all facts must be used. Distributive and algebraic functions are therefore more "good-natured"; H. they can be parallelized or executed step-by-step (less storage space required!), whereas this is not possible with holistic functions.

Individual evidence

  1. Evaluate and aggregate data with aggregate functions and GROUP BY. August 23, 2016. Retrieved October 3, 2016 .
  2. Aggregate functions. In: SQL = Structured Query Language (SEQUEL) - Basics of Database Systems I. Wortschatz Lexikon Uni Leipzig, pp. VII-19 ff , accessed on October 3, 2016 .
  3. Prof. Dr.-Ing. Thomas Wiedemann: Querying databases with the Select command - 5. (No longer available online.) In: Basics of Computer Science - Database Technology - SQL and Market Overview. Dresden University of Applied Sciences, Department of Computer Science / Mathematics, pp. 6/22 , archived from the original on October 3, 2016 ; accessed on October 3, 2016 . Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice. @1@ 2Template: Webachiv / IABot / iasp2.informatik.htw-dresden.de