Pivot table

from Wikipedia, the free encyclopedia

Pivot tables ( English pivot [ pɪvət ] from French pivot , in German often pronounced in French [ pi.vo ]: pivot, pivot point, key figure, axis; also pivot table reports or pivot table reports) are a special type of tables that offer the possibility of displaying and evaluating data in a table in various ways without having to change the output data or table (s).

The benefit lies in reducing large amounts of data to manageable sizes.

history

The name Pivot was first used in 1991 in the USA by Brio Technology in the DataPivot product . Although PivotTable is a commonly used term, Microsoft registered PivotTable as a trademark in the United States in 1994 (discontinued or expired in 2020).

Data sources

Data sources for pivot tables can include lists and databases from spreadsheets , but also tables from external database applications, e.g. B. within an Office package . Modern spreadsheets have a special function for this purpose, with which a number of similar data records can be combined in groups and which provides a pivot table as the result. In the common Office packages, an assistant (pivot table assistant or data pilot) helps with the creation of pivot tables.

structure

A pivot table consists of several areas, each of which can contain any fields (column headings) of the original data. Typically, when creating the pivot table, the required fields are selected from a list and dragged into the desired area with the mouse.

structure

The division of the fields into row and column fields determines the structure of the pivot table. If this division or order is changed, more or less data is not displayed, but rather is shown in a different form. Moving a row or column field to another location is also known as pivoting that field. For example, swapping row and column fields (while maintaining their order) results in a transposed table.

By double-clicking on a cell in a pivot table, groups are shown and hidden (" drill-down " and "roll-up") in order to show more or less details. If the cell belongs to a data field, all individual data records from the original data that are included in the calculation of this cell are displayed on a separate worksheet after double-clicking.

Example of a pivot table

The user can freely choose which fields are grouped and which fields are output and can also interactively change them after the pivot table has been created, without having to repeat the entire creation process. When grouping by two or more fields, the pivot table can be displayed as a crosstab , which increases its clarity.

Application example "Grouping by field X" means that all data records from the original table, which have exactly the same content in column X, are combined into a single data record in the result table (aggregation into categories ). The results table therefore contains all the categories that appear in the original table only once. If all the entries in the original table are different, the aggregated table has just as many entries, but if all entries are identical, it has only one entry.

If there are predominantly different values ​​in a column of the data to be analyzed (e.g. measured values ​​with many significant digits), grouping according to this column using a pivot table does not make sense. In this case, you have to define suitable classes (e.g. intervals ) yourself beforehand and add them to the output data as a new column. An exception are columns with dates, which can be grouped in various programs directly in the pivot table according to days, months, quarters, etc.

Field types

A distinction is made between the following areas:

Line fields
A field dragged here means that the output data is grouped according to this field. A line is created in the pivot table for each different field content that occurs in the output data. If two fields are selected as row fields, then all groups belonging to the second field within each group belonging to the first field are displayed in the pivot table. If there are more than two line fields, this division continues for all fields. The order of the line fields is relevant and is sensibly chosen by the user so that the result is as clear as possible.
Column fields
Effect a grouping in the same way as line fields; however, the various contents of a column field are not shown in rows, but in columns. If the user uses row and column fields at the same time, he has created a crosstab .
Data fields
Determine what is shown at the intersection of rows and columns. For each data field, an aggregation function (such as "Sum" or "Number of data records") ensures that exactly one value is entered in each cell of the pivot table, even if there are many data records that are members of the are groups belonging to the cell. If several data fields are selected, the user can decide whether the different data fields should be displayed next to each other in columns or one below the other in rows (exactly as for column and row fields). The same field can also be used multiple times as a data field (meaningfully with a different aggregation function).
Side fields
Allow filtering, i. H. a restriction of the pivot table to those data records of the initial set that have certain values ​​in the selected page fields.

Databases

The principle of pivoting is also the basis for the views in MOLAP databases. These non-relational databases are multidimensional. For data analysis, two dimensions are in the view level, the other dimensions are aggregated in the background as in the pivot view.

Comparison with SQL

Pivot tables make it possible to carry out evaluations in a spreadsheet, which can be achieved in the database query language SQL with a group-by clause. With the usual functions provided by spreadsheets for calculating cell contents, such groupings are only possible to a very limited extent or not at all.

The different types of fields in a pivot table correspond to certain parts of an SQL query:

  • Row and column fields in pivot tables correspond to fields in the group-by clause.
  • Data fields correspond to expressions in the Select part of the SQL command. These expressions necessarily contain aggregation functions such as B. the sum function.
  • Page fields correspond to simple conditions in the Having or Where clause of the SQL command.

Not all of the capabilities offered by SQL or the spreadsheet itself are supported by pivot tables. For example, only a few predefined aggregation functions are available in Microsoft Excel ; Median and 95% quantile are missing for example.

Database query programs are generally more flexible than pivot tables; H. more questions can be answered with SQL. However, the query programs typically offer fewer options for preparing the results in an appealing way and are less convenient to use. Many programs cannot output data sets as a crosstab and special formatting is only possible by hand (i.e. without wizards and controls ) or not at all.

particularities

For performance reasons, pivot tables are generally not updated automatically each time the output data is changed, as is usual with other functions in spreadsheets. The user has to start the update manually if necessary.

The evaluation takes place according to data types. B. the number 3/12 , the text 3/12 and the date 3/12 are evaluated separately. If the database is not maintained consistently, this can lead to errors.

With some Office packages, pivot tables can be displayed graphically as an interactive pivot chart (German: Pivot diagram).

Web links

Individual evidence

  1. Pivot tables in Excel (PDF; 752 kB), page 1: "Introduction" at cc-center.de from April 2003, accessed on June 23, 2014.
  2. Cross tab analysis and reporting method . December 24, 1996 ( google.com [accessed March 21, 2020]).
  3. ^ Trademark Status & Document Retrieval. Retrieved June 10, 2020 .