Cardinality (databases)

from Wikipedia, the free encyclopedia

The cardinality of a set is the number of elements in that set. The term is often used in the context of relational databases , as a database table can also be understood as a set of rows. Columns and rows of a database table can each be understood as a set of attribute values. The cardinality of a database table (relation) is the number of rows in this table. The cardinality of a column ( attribute ) in a database table is the number of different attribute values ​​in this column. One also speaks of "low cardinality" when the ratio of the number of different values ​​that occur in a column to the total number of rows in the table is low. For this reason, this relationship is sometimes referred to as the cardinality of a column, in deviation from the actual meaning of the term cardinality .

See also those of the above. Definitions deviating meaning of cardinality of a relationship type in database modeling .

Column cardinality

Categories

The lower the cardinality of the column in a database table, the more duplicates or NULL values ​​there are in this column. There are three categories, the boundaries of which are fluid:

  • High Cardinality: Have columns with very few duplicates or with unique values. Examples of high cardinality columns are those that store user names or user IDs.
  • Low cardinality: Have columns whose values ​​have a large number of duplicates, for example those with status flags or classifications in a few categories, such as information on gender .
  • Normal Cardinality: Have columns that don't fall into either of the above two categories.

Cardinality and selectivity

The query optimizer of relational databases must estimate the selectivity of database queries in order to be able to deduce the size of the intermediate results and thus to be able to find an optimal evaluation plan. Cardinalities play an important role here.

The following is an example of this; Let there be a table KUNDENwith the column INHALTin which there are 100 entries; It is also assumed that there is a uniform distribution for the attribute values . The following query is now made using the query language SQL :

select *
from   KUNDEN
where  INHALT = X;

The following applies to the selectivity sel of this query when the predicate is INHALT = X to be TRUEevaluated: where c corresponds to the number of different attribute values ​​in the column (every c th row qualifies for the result set). The following also applies: INHALT

  • If the cardinality of the column is INHALThigh, the selectivity of the query is very pronounced. For example, if in column INHALTall attribute values are different ( key property ), so there are no duplicates, then , sel thus is a small value. In this case, exactly one line qualifies for the result set if the predicate is to be evaluated.INHALT = XTRUE
  • If the cardinality of the column is INHALTlow, the selectivity of the query is rather weak. For example, if only two different values INHALTare stored in the column , then is . In this case, exactly 50 rows qualify for the result set if the predicate is to be evaluated.INHALT = XTRUE

This example considers special cases; the key property of a column or the equal distribution of the attribute values ​​of a column cannot generally be assumed. Therefore, the query optimizer must use statistical values ​​to estimate the cardinalities in order to be able to assign a selectivity factor to a query.

Cardinality and indices

B-tree indexes are good for indexing columns with high cardinality ; bitmap indexes are suitable for indexing columns with low cardinality . However, the selectivity of a query is always decisive for the efficiency of the index usage - bitmap indices are more suitable than B-tree indices for data with low cardinality, but not particularly efficient for queries with weak selectivity, because queries with weak selectivity can generally not run efficiently using indexes.

Individual evidence

  1. ^ Alfons Kemper, André Eickler: Database systems. An introduction . Oldenbourg, Munich 2004, ISBN 3-486-27392-2 , pp. 254 .
  2. Special-Purpose Rowsets. Microsoft, accessed January 22, 2016 (Microsoft's definition of column cardinality).
  3. Definition of the column cardinality in the online documentation of the Oracle DB. Oracle Corporation, March 6, 2010, accessed January 22, 2016 .
  4. ^ Alfons Kemper, André Eickler: Database systems. An introduction . Oldenbourg, Munich 2004, ISBN 3-486-27392-2 , pp. 255 .
  5. Explanation of bitmap indexes in the online documentation of the Oracle DB. Oracle Corporation, March 6, 2010, accessed January 22, 2016 .