Select (SQL): Difference between revisions

From Wikipedia, the free encyclopedia
Fjerdingen (talk | contribs)
m Some minor adjustments, mostly towards ISO SQL terminology.
 
(86 intermediate revisions by 48 users not shown)
Line 1: Line 1:
The [[SQL]] '''SELECT''' statement returns a [[result set]] of records from one or more [[Table (database)|tables]].<ref>{{cite web |url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx |title=Transact-SQL Syntax Conventions |accessdate= |author=Microsoft }}</ref><ref>{{cite web |url=http://dev.mysql.com/doc/refman/5.0/en/select.html|title=SQL SELECT Syntax |accessdate= |author=MySQL}}</ref>
{{Short description|SQL statement that returns a result set of rows from one or more tables}}
The [[SQL]] '''SELECT''' statement returns a [[result set]] of rows, from one or more [[Table (database)|tables]].<ref>{{cite web |url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx |title=Transact-SQL Syntax Conventions |author=Microsoft |date=23 May 2023 }}</ref><ref>{{cite web |url=http://dev.mysql.com/doc/refman/5.0/en/select.html|title=SQL SELECT Syntax |author=MySQL}}</ref>


A '''SELECT''' statement retrieves zero or more rows from one or more [[Database Tables|database tables]] or database [[View (database)|views]]. In most applications, <code>SELECT</code> is the most commonly used [[data query language]] (DQL) command. As SQL is a [[declarative programming]] language, <code>SELECT</code> queries specify a result set, but do not specify how to calculate it. The database translates the query into a "[[query plan]]" which may vary between executions, database versions and database software. This functionality is called the "[[query optimizer]]" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.
A SELECT statement retrieves zero or more rows from one or more [[Database Tables|database tables]] or database [[View (database)|views]]. In most applications, <code>SELECT</code> is the most commonly used [[data manipulation language]] (DML) command. As SQL is a [[declarative programming]] language, <code>SELECT</code> queries specify a result set, but do not specify how to calculate it. The database translates the query into a "[[query plan]]" which may vary between executions, database versions and database software. This functionality is called the "[[query optimizer]]" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.


The SELECT statement has many optional clauses:

* <code>[[Where (SQL)|WHERE]]</code> specifies which rows to retrieve.
* <code>SELECT</code> list is the list of [[column (database)|columns]] or SQL expressions to be returned by the query. This is approximately the [[relational algebra]] [[Projection_(relational_algebra)|projection]] operation.
* <code>[[Alias (SQL)|AS]]</code> optionally provides an alias for each column or expression in the <code>SELECT</code> list. This is the relational algebra [[Rename_(relational_algebra)|rename]] operation.
* <code>[[From (SQL)|FROM]]</code> specifies from which table to get the data.<ref>Omitting FROM clause is not standard, but allowed by most major DBMSes.</ref>
* <code>[[Where (SQL)|WHERE]]</code> specifies which rows to retrieve. This is approximately the relational algebra [[Selection_(relational_algebra)|selection]] operation.
* <code>[[Group by (SQL)|GROUP BY]]</code> groups rows sharing a property so that an [[aggregate function]] can be applied to each group.
* <code>[[Having (SQL)|HAVING]]</code> selects among the groups defined by the GROUP BY clause.
* <code>[[Order by (SQL)|ORDER BY]]</code> specifies an order in which to return the rows.
* <code>[[Order by (SQL)|ORDER BY]]</code> specifies how to order the returned rows.

* <code>[[Alias (SQL)|AS]]</code> provides an alias which can be used to temporarily rename tables or columns.
== Overview ==

<code>SELECT</code> is the most common operation in SQL, called "the query". <code>SELECT</code> retrieves data from one or more [[Table (database)|table]]s, or expressions. Standard <code>SELECT</code> statements have no persistent effects on the database. Some non-standard implementations of <code>SELECT</code> can have persistent effects, such as the <code>SELECT INTO</code> syntax provided in some databases.<ref name="ms-sql-select-into">{{ cite book | chapter = Transact-SQL Reference | title = SQL Server Language Reference | series = SQL Server 2005 Books Online | publisher = Microsoft | date = 2007-09-15 | url = http://msdn.microsoft.com/en-us/library/ms188029.aspx | access-date = 2007-06-17 }}</ref>

Queries allow the user to describe desired data, leaving the [[Database management system|database management system (DBMS)]] to carry out [[query plan|planning]], [[query optimizer|optimizing]], and performing the physical operations necessary to produce that result as it chooses.

A query includes a list of columns to include in the final result, normally immediately following the <code>SELECT</code> keyword. An asterisk ("<code>*</code>") can be used to specify that the query should return all columns of all the queried tables. <code>SELECT</code> is the most complex statement in SQL, with optional keywords and clauses that include:

* The <code>[[From (SQL)|FROM]]</code> clause, which indicates the table(s) to retrieve data from. The <code>FROM</code> clause can include optional <code>[[Join (SQL)|JOIN]]</code> subclauses to specify the rules for joining tables.
* The <code>[[Where (SQL)|WHERE]]</code> clause includes a comparison predicate, which restricts the rows returned by the query. The <code>WHERE</code> clause eliminates all rows from the result set where the comparison predicate does not evaluate to True.
* The <code>GROUP BY</code> clause projects rows having common values into a smaller set of rows. <code>GROUP BY</code> is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The <code>WHERE</code> clause is applied before the <code>GROUP BY</code> clause.
* The <code>[[Having (SQL)|HAVING]]</code> clause includes a predicate used to filter rows resulting from the <code>GROUP BY</code> clause. Because it acts on the results of the <code>GROUP BY</code> clause, aggregation functions can be used in the <code>HAVING</code> clause predicate.
* The <code>[[Order by (SQL)|ORDER BY]]</code> clause identifies which column(s) to use to sort the resulting data, and in which direction to sort them (ascending or descending). Without an <code>ORDER BY</code> clause, the order of rows returned by an SQL query is undefined.
* The <code>DISTINCT</code> keyword<ref>
{{cite book
| title = SAS 9.4 SQL Procedure User's Guide
| date=10 July 2013 | url = https://books.google.com/books?id=ESjMAAAAQBAJ
| publisher = SAS Institute
| publication-date = 2013
| page = 248
| isbn = 9781612905686
| access-date = 2015-10-21
| quote = Although the UNIQUE argument is identical to DISTINCT, it is not an ANSI standard.
}}
</ref> eliminates duplicate data.<ref>
{{cite book
| last1 = Leon
| first1 = Alexis
| author-link1 = Alexis Leon
| last2 = Leon
| first2 = Mathews
| year = 1999
| chapter = Eliminating duplicates - SELECT using DISTINCT
| title = SQL: A Complete Reference
| url = https://books.google.com/books?id=dmiPz2MMpfwC
| location = New Delhi
| publisher = Tata McGraw-Hill Education
| publication-date = 2008
| page = 143
| isbn = 9780074637081
| access-date = 2015-10-21
| quote = [...] the keyword DISTINCT [...] eliminates the duplicates from the result set.
}}
</ref>

The following example of a <code>SELECT</code> query returns a list of expensive books. The query retrieves all rows from the ''Book'' table in which the ''price'' column contains a value greater than 100.00. The result is sorted in ascending order by ''title''. The asterisk (*) in the ''select list'' indicates that all columns of the ''Book'' table should be included in the result set.

<syntaxhighlight lang="sql">
SELECT *
FROM Book
WHERE price > 100.00
ORDER BY title;
</syntaxhighlight>

The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.

<syntaxhighlight lang="sql">
SELECT Book.title AS Title,
count(*) AS Authors
FROM Book
JOIN Book_author
ON Book.isbn = Book_author.isbn
GROUP BY Book.title;
</syntaxhighlight>

Example output might resemble the following:

 Title                  Authors
 ---------------------- -------
 SQL Examples and Guide 4
 The Joy of SQL         1
 An Introduction to SQL 2
 Pitfalls of SQL        1

Under the precondition that ''isbn'' is the only common column name of the two tables and that a column named ''title'' only exists in the ''Book'' table, one could re-write the query above in the following form:

<syntaxhighlight lang="sql">
SELECT title,
count(*) AS Authors
FROM Book
NATURAL JOIN Book_author
GROUP BY title;
</syntaxhighlight>

However, many{{quantify|date=October 2015}} vendors either do not support this approach, or require certain column-naming conventions for natural joins to work effectively.

SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the ''select list'' to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional ''sales_tax'' column containing a sales tax figure calculated at 6% of the ''price''.

<syntaxhighlight lang="sql">
SELECT isbn,
title,
price,
price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;
</syntaxhighlight>
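
The <code>HAVING</code> clause described in the list of clauses can be illustrated by extending the earlier grouping query. The following is a minimal sketch that assumes the same ''Book'' and ''Book_author'' tables as the examples above; the threshold of one author is an arbitrary choice made for illustration.

<syntaxhighlight lang="sql">
SELECT Book.title AS Title,
       count(*) AS Authors
  FROM Book
  JOIN Book_author
    ON Book.isbn = Book_author.isbn
 GROUP BY Book.title
-- HAVING filters whole groups after aggregation;
-- a WHERE clause could not refer to count(*) here.
HAVING count(*) > 1;
</syntaxhighlight>

With the example output shown earlier, only the titles with 4 and 2 authors would remain in the result.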

=== Subqueries ===

Queries can be nested so that the results of one query can be used in another query via a [[relational operator]] or aggregation function. A nested query is also known as a ''subquery''. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases (all depending on implementation), the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, the aggregation function <code>AVG</code> receives as input the result of a subquery:

<syntaxhighlight lang="sql">
SELECT isbn,
title,
price
FROM Book
WHERE price < (SELECT AVG(price) FROM Book)
ORDER BY title;
</syntaxhighlight>

A subquery can use values from the outer query, in which case it is known as a [[correlated subquery]].
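
As a hedged sketch of a correlated subquery, the query below assumes the ''Book'' and ''Book_author'' tables used in the earlier examples and lists the books that have more than one author. The aliases ''b'' and ''ba'' are introduced only for this illustration.

<syntaxhighlight lang="sql">
SELECT b.isbn,
       b.title
  FROM Book b
 WHERE (SELECT count(*)
          FROM Book_author ba
         -- the inner query refers to the current row of the outer query
         WHERE ba.isbn = b.isbn) > 1
 ORDER BY b.title;
</syntaxhighlight>

Conceptually the inner query is evaluated once per outer row, although a query optimizer is free to execute it differently.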

Since 1999 the SQL standard allows WITH clauses, i.e. named subqueries often called [[common table expression]]s (named and designed after the IBM DB2 version 2 implementation; Oracle calls these [[subquery factoring]]). CTEs can also be [[recursive]] by referring to themselves; [[Hierarchical and recursive queries in SQL|the resulting mechanism]] allows tree or graph traversals (when represented as relations), and more generally [[fixpoint]] computations.
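
The two statements below sketch a plain and a recursive common table expression. The first assumes the ''Book'' and ''Book_author'' tables from the earlier examples; the names ''author_count'' and ''counter'' are hypothetical. Some DBMSs, such as Microsoft SQL Server and Oracle, do not accept the <code>RECURSIVE</code> keyword, and the table-less <code>SELECT 1</code> is itself not standard, so the second statement may need adjusting for a given product.

<syntaxhighlight lang="sql">
-- Named subquery: number of authors per ISBN, then joined to Book
WITH author_count AS (
    SELECT isbn, count(*) AS authors
      FROM Book_author
     GROUP BY isbn
)
SELECT b.title, ac.authors
  FROM Book b
  JOIN author_count ac ON ac.isbn = b.isbn
 ORDER BY b.title;

-- Recursive CTE: refers to itself to produce the numbers 1 to 5
WITH RECURSIVE counter (n) AS (
    SELECT 1                                  -- anchor row
    UNION ALL
    SELECT n + 1 FROM counter WHERE n < 5     -- recursive step
)
SELECT n FROM counter;
</syntaxhighlight>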

=== Derived table ===

A derived table is a subquery in a FROM clause. Essentially, the derived table is a subquery that can be selected from or joined to. Derived table functionality allows the user to reference the subquery as a table. The derived table also is referred to as an ''inline view'' or a ''select in from list''.

In the following example, the SQL statement joins the initial ''Book'' table to the derived table ''sales''. This derived table captures associated book sales information, using the ISBN to join to the ''Book'' table. As a result, the derived table provides the result set with additional columns (the number of items sold and the company that sold the books):

<syntaxhighlight lang="sql">
SELECT b.isbn, b.title, b.price, sales.items_sold, sales.company_nm
FROM Book b
JOIN (SELECT SUM(Items_Sold) Items_Sold, Company_Nm, ISBN
FROM Book_Sales
GROUP BY Company_Nm, ISBN) sales
ON sales.isbn = b.isbn
</syntaxhighlight>


==Examples==
== Examples ==
{| class="wikitable" style="float: right; clear:right; margin: 1em" border="1"
{| class="wikitable" style="float: right; clear:right; margin: 1em" border="1"
!Table "T"
!Table "T"
Line 84: Line 218:
|-
|-
| 1 || a
| 1 || a
|}
|-
|align="center"|does not exist
||{{code|2=sql|1=SELECT 1+1, 3*2;}}
|align="center"|
{| cellpadding="2" rules="all" style="border: 1px solid gray; text-align: center;"
! `1+1` !! `3*2`
|-
| 2 || 6
|}
|}
|}
|}
Given a table T, the ''query'' {{code|2=sql|1=SELECT * FROM T}} will result in all the elements of all the rows of the table being shown.


With the same table, the query {{code|2=sql|1=SELECT C1 FROM T}} will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a ''[[Projection (relational algebra)|projection]]'' in [[Relational algebra]], except that in the general case, the result may contain duplicate rows. This is also known as a Vertical Partition in some database terms, restricting query output to view only specified fields or columns.
With the same table, the query {{code|2=sql|1=SELECT C1 FROM T}} will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a ''[[Projection (relational algebra)|projection]]'' in [[relational algebra]], except that in the general case, the result may contain duplicate rows. This is also known as a Vertical Partition in some database terms, restricting query output to view only specified fields or columns.


With the same table, the query {{code|2=sql|1=SELECT * FROM T WHERE C1 = 1}} will result in all the elements of all the rows where the value of column C1 is '1' being shown&nbsp;— in [[Relational algebra]] terms, a ''[[Selection (relational algebra)|selection]]'' will be performed, because of the WHERE clause. This is also known as a Horizontal Partition, restricting rows output by a query according to specified conditions.
With the same table, the query {{code|2=sql|1=SELECT * FROM T WHERE C1 = 1}} will result in all the elements of all the rows where the value of column C1 is '1' being shown{{snd}} in [[relational algebra]] terms, a ''[[Selection (relational algebra)|selection]]'' will be performed, because of the WHERE clause. This is also known as a Horizontal Partition, restricting rows output by a query according to specified conditions.


With more than one table, the result set will be every combination of rows. So if two tables are T1 and T2, {{code|2=sql|1=SELECT * FROM T1, T2}} will result in every combination of a T1 row with a T2 row. E.g., if T1 has 3 rows and T2 has 5 rows, then 15 rows will result.
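
The comma-separated form is equivalent to an explicit <code>CROSS JOIN</code>. As a brief sketch using the same placeholder tables T1 and T2 (the column names ''id'' and ''t1_id'' in the second query are hypothetical and introduced only for illustration):

<syntaxhighlight lang="sql">
-- Same result as SELECT * FROM T1, T2: every T1 row paired with every T2 row
SELECT * FROM T1 CROSS JOIN T2;

-- Adding a condition keeps only the matching combinations
SELECT * FROM T1, T2 WHERE T1.id = T2.t1_id;
</syntaxhighlight>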

Although it is not part of the standard, most DBMSs allow a SELECT clause without a table, by pretending that an imaginary table with one row is used. This is mainly used to perform calculations where a table is not needed.


The SELECT clause specifies a list of properties (columns) by name, or the wildcard character (“*”) to mean “all properties”.


== Limiting result rows ==

Often it is convenient to indicate a maximum number of rows that are returned. This can be used for testing or to prevent consuming excessive resources if the query returns more information than expected. The approach to do this often varies per vendor.


In [[International Organization for Standardization|ISO]] [[SQL:2003]], result sets may be limited by using
* [[Cursor (databases)|cursors]], or
* By introducing ''SQL window function'' to the SELECT-statement
* by adding a [[SQL window function]] to the SELECT-statement


ISO [[SQL:2008]] introduced the <code>FETCH FIRST</code> clause.


According to PostgreSQL v.9 documentation, an '''SQL Window function''' ''performs a calculation across a set of table rows that are somehow related to the current row'', in a way similar to aggregate functions.
According to PostgreSQL v.9 documentation, an SQL window function "performs a calculation across a set of table rows that are somehow related to the current row", in a way similar to aggregate functions.<ref>[https://www.postgresql.org/docs/9.1/static/tutorial-window.html PostgreSQL 9.1.24 Documentation - Chapter 3. Advanced Features]</ref>
The name recalls signal processing [[window function |window functions]]. A window function call always contains an '''OVER''' clause.
<ref>[https://www.postgresql.org/docs/9.1/static/tutorial-window.html PostgreSQL 9.1.24 Documentation - Chapter 3. Advanced Features]</ref>
The name recalls signal processing [[window function | window functions]]. A window function call always contains an '''OVER''' clause.


=== ROW_NUMBER() window function ===

<code>ROW_NUMBER() OVER</code> may be used for a ''simple limit'' on the returned rows, e.g. to return no more than ten rows:


<source lang="sql" highlight="3">
<syntaxhighlight lang="sql" highlight="3">
SELECT * FROM
SELECT * FROM
( SELECT
( SELECT
Line 121: Line 263:
FROM tablename
FROM tablename
) AS foo
) AS foo
WHERE row_number <= 11
WHERE row_number <= 10


</syntaxhighlight>
</source>


ROW_NUMBER can be [[Nondeterministic algorithm|non-deterministic]]: if ''sort_key'' is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where ''sort_key'' is the same. When ''sort_key'' is unique, each row will always get a unique row number.
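
One common way to make the numbering repeatable is to append a unique column to the window ordering as a tie-breaker. The sketch below assumes that ''tablename'' has a unique column called ''id''; that column, the alias ''t'' and the alias ''rn'' are hypothetical and not part of the original example.

<syntaxhighlight lang="sql">
SELECT *
  FROM (
        SELECT t.*,
               -- id breaks ties between rows that share the same sort_key
               ROW_NUMBER() OVER (ORDER BY sort_key ASC, id ASC) AS rn
          FROM tablename t
       ) AS foo
 WHERE rn <= 10;
</syntaxhighlight>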


=== RANK() window function ===

The <code>RANK() OVER</code> window function acts like ROW_NUMBER, but may return more or fewer than ''n'' rows in case of tie conditions, e.g. to return the top-10 youngest persons:


<source lang="sql" highlight="3">
<syntaxhighlight lang="sql" highlight="3">
SELECT * FROM (
SELECT * FROM (
SELECT
SELECT
Line 139: Line 280:
age
age
FROM person
FROM person
)AS foo
) AS foo
WHERE ranking <= 10
WHERE ranking <= 10
</syntaxhighlight>
</source>


The above code could return more than ten rows, e.g. if there are two people of the same age, it could return eleven rows.


=== FETCH FIRST clause ===

Since ISO [[SQL:2008]], result limits can be specified using the <code>FETCH FIRST</code> clause, as in the following example.


<syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
<code>SELECT * FROM T '''FETCH FIRST 10 ROWS ONLY'''</code>
FETCH FIRST 10 ROWS ONLY</syntaxhighlight>


This clause currently is supported by CA DATACOM/DB 11, IBM DB2, SAP SQL Anywhere, PostgreSQL, EffiProz, H2, HSQLDB version 2.0, Oracle 12c and [[Mimer SQL]].
Line 155: Line 296:
Microsoft SQL Server 2012 and higher [https://docs.microsoft.com/en-us/sql/t-sql/queries/select-order-by-clause-transact-sql?view=sql-server-2017#using-offset-and-fetch-to-limit-the-rows-returned supports <code>FETCH FIRST</code>], but it is considered part of the <code>ORDER BY</code> clause. The <code>ORDER BY</code>, <code>OFFSET</code>, and <code>FETCH FIRST</code> clauses are all required for this usage.


<syntaxhighlight lang="tsql" highlight="2">SELECT * FROM T
<code>SELECT * FROM T '''ORDER BY acolumn DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY'''</code>
ORDER BY acolumn DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY</syntaxhighlight>


=== Non-standard syntax ===

Some DBMSs offer non-standard syntax either instead of or in addition to SQL standard syntax. Below, variants of the ''simple limit'' query for different DBMSs are listed:
{|class="wikitable"
{|class="wikitable"
|-
|-
| <source lang="sql" enclose="div">SET ROWCOUNT 10
| <syntaxhighlight lang="tsql" highlight="1">SET ROWCOUNT 10
SELECT * FROM T</source>
SELECT * FROM T</syntaxhighlight>
| [[Microsoft SQL Server|MS SQL Server]] (This also works on Microsoft SQL Server 6.5 while the '''Select top 10 * from T''' does not)
| [[Microsoft SQL Server|MS SQL Server]] (This also works on Microsoft SQL Server 6.5 while the '''Select top 10 * from T''' does not)
|-
|-
| <code>SELECT * FROM T '''LIMIT 10 OFFSET 20'''</code>
| <syntaxhighlight lang="postgres" highlight="2">SELECT * FROM T
LIMIT 10 OFFSET 20</syntaxhighlight>
| [[Netezza]], [[MySQL]], [[SQL Anywhere|SAP SQL Anywhere]], [[PostgreSQL]] (also supports the standard, since version 8.4), [[SQLite]], [[HSQLDB]], [[H2 (DBMS)|H2]], [[Vertica]], [[Polyhedra DBMS|Polyhedra]], [[Couchbase Server]]
| [[Netezza]], [[MySQL]], [[MariaDB]] (also supports the standard version, since version 10.6), [[SQL Anywhere|SAP SQL Anywhere]], [[PostgreSQL]] (also supports the standard, since version 8.4), [[SQLite]], [[HSQLDB]], [[H2 (DBMS)|H2]], [[Vertica]], [[Polyhedra DBMS|Polyhedra]], [[Couchbase Server]], [[Snowflake Computing]], [[Virtuoso Universal Server|OpenLink Virtuoso]]
|-
|-
| <code>SELECT * from T '''WHERE ROWNUM <= 10'''</code>
| <syntaxhighlight lang="sql" highlight="2">SELECT * from T
WHERE ROWNUM <= 10</syntaxhighlight>
| [[Oracle database|Oracle]]
| [[Oracle database|Oracle]]
|-
|-
Line 184: Line 327:
| [[Microsoft SQL Server|MS SQL Server]], [[Adaptive Server Enterprise|SAP ASE]], [[Microsoft Access|MS Access]], [[SAP IQ]], [[Teradata]]
| [[Microsoft SQL Server|MS SQL Server]], [[Adaptive Server Enterprise|SAP ASE]], [[Microsoft Access|MS Access]], [[SAP IQ]], [[Teradata]]
|-
|-
| <code>SELECT * FROM T '''SAMPLE 10''' </code>
| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
SAMPLE 10</syntaxhighlight>
| [[Teradata]]
| [[Teradata]]
|-
| <code>SELECT '''TOP 20, 10''' * FROM T</code>
| [[Virtuoso Universal Server|OpenLink Virtuoso]] (skips 20, delivers next 10)<ref name="docs_9.19">{{Cite web |title=9.19.10. The TOP SELECT Option |author=OpenLink Software |work=docs.openlinksw.com |access-date=1 October 2019 |url= http://docs.openlinksw.com/virtuoso/topselectoption/ |language=en-US}}</ref>

|-
|-
| <code>SELECT '''TOP 10 START AT 20''' * FROM T</code>
| <code>SELECT '''TOP 10 START AT 20''' * FROM T</code>
Line 193: Line 341:
| [[Firebird (database server)|Firebird]]
| [[Firebird (database server)|Firebird]]
|-
|-
| <code>SELECT * FROM T '''ROWS 20 TO 30'''</code>
| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
ROWS 20 TO 30</syntaxhighlight>
| [[Firebird (database server)|Firebird]] (since version 2.1)
| [[Firebird (database server)|Firebird]] (since version 2.1)
|-
|-
| <source lang="sql" enclose="div">SELECT * FROM T
| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY</source>
WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY</syntaxhighlight>
| [[IBM DB2|DB2]]
| [[IBM Db2]]
|-
|-
| <source lang="sql" enclose="div">SELECT * FROM T
| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY</source>
WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY</syntaxhighlight>
| [[IBM DB2|DB2]] (new rows are filtered after comparing with key column of table T)
| [[IBM Db2]] (new rows are filtered after comparing with key column of table T)
|}
|}


== Rows Pagination ==
=== Rows Pagination ===
'''Rows Pagination'''<ref>Ing. Óscar Bonilla, MBA</ref> is an approach used to limit and display only a part of the total data of a query in the database. Instead of showing hundreds or thousands of rows at the same time, the client requests only one page from the server (a limited set of rows, for example only 10 rows), and the user navigates by requesting the next page, and then the next one, and so on. It is very useful, especially in web systems, where there is no dedicated connection between the client and the server, so the client does not have to wait to read and display all the rows from the server.


'''Data in Pagination approach:'''
==== Data in Pagination approach ====
* {rows} = Number of raws in a page
* <code>{rows}</code> = Number of rows in a page
* {page_number} = Number of the current page
* <code>{page_number}</code> = Number of the current page
* {begin_base_0} = Number of the row - 1 where the page starts = (page_number-1) * rows
* <code>{begin_base_0}</code> = Number of the row - 1 where the page starts = (page_number-1) * rows

* {sorting_cols} = It is very important to sort the rows with a set of columns of the table whose values are unique, so that each time the same query is executed, the rows always appear in the same order. This is achieved by placing any column or columns in the "order by" and adding the field or fields of the primary key or any other unique index at the end of this list of fields
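
As a worked illustration of these quantities, with values chosen only for the example: for <code>{rows}</code> = 10 and <code>{page_number}</code> = 3, <code>{begin_base_0}</code> = (3 - 1) * 10 = 20, so the page covers rows 21 to 30 of the sorted result. Assuming a hypothetical table ''T'' with a unique column ''id'', and a DBMS that supports the standard <code>OFFSET</code> and <code>FETCH</code> syntax, the corresponding query might look like this:

<syntaxhighlight lang="sql">
SELECT *
  FROM T
 ORDER BY id               -- unique column, so the row order is repeatable
OFFSET 20 ROWS             -- {begin_base_0} = (3 - 1) * 10
 FETCH NEXT 10 ROWS ONLY;  -- {rows}
</syntaxhighlight>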
==== Simplest method (but very inefficient) ====
<br>
# Select all rows from the database
'''Simplest method (but very inefficient):'''<br>
# Read all rows but send to display only when the row_number of the rows read is between <code>{begin_base_0 + 1}</code> and <code>{begin_base_0 + rows}</code>
1) Select all rows from the database. Remember that {sorting_cols} must have unique values.<br>
<syntaxhighlight lang="sql">Select *
2) Read all rows but send to display only when the row_number of the rows read is between {begin_base_0 + 1} and {begin_base_0 + rows}
<pre>Select *
from {table}
from {table}
order by {sorting_cols}</pre>
order by {unique_key}</syntaxhighlight>

<br>
'''Other simple method (a little more efficient than read all rows):'''<br>
==== Other simple method (a little more efficient than read all rows) ====
1) Select all the rows from the beginning of the table to the last row to display ({begin_base_0 + rows}). Remember that {sorting_cols} must have unique values.<br>
# Select all the rows from the beginning of the table to the last row to display (<code>{begin_base_0 + rows}</code>)
2) Read the {begin_base_0 + rows} rows but send to display only when the row_number of the rows read is greater than {begin_base_0}
# Read the <code>{begin_base_0 + rows}</code> rows but send to display only when the row_number of the rows read is greater than <code>{begin_base_0}</code>
{|class="wikitable"
{|class="wikitable"
|-
|-
| '''SQL'''
! SQL
| '''Dialect'''
! Dialect
|-
|-
|
|
<pre>select *
<syntaxhighlight lang="postgresql">select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
FETCH FIRST {begin_base_0 + rows} ROWS ONLY</pre>
FETCH FIRST {begin_base_0 + rows} ROWS ONLY</syntaxhighlight>
| SQL ANSI 2008<br>Postgresql<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12
| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL
|-
|-
|
|
<pre>Select *
<syntaxhighlight lang="mysql">Select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
LIMIT {begin_base_0 + rows}</pre>
LIMIT {begin_base_0 + rows}</syntaxhighlight>
| MySQL<br>SQLite
| MySQL<br>SQLite
|-
|-
|
|
<pre>Select TOP {begin_base_0 + rows} *
<syntaxhighlight lang="tsql">Select TOP {begin_base_0 + rows} *
from {table}
from {table}
order by {sorting_cols}</pre>
order by {unique_key}</syntaxhighlight>
| SQL Server 2005
| SQL Server 2005
|-
|-
|
|
<syntaxhighlight lang="mysql">Select *
<pre>SET ROWCOUNT {begin_base_0 + rows}
from {table}
order by {unique_key}
ROWS LIMIT {begin_base_0 + rows}</syntaxhighlight>
| Sybase, ASE 16 SP2
|-
|
<syntaxhighlight lang="tsql">SET ROWCOUNT {begin_base_0 + rows}
Select *
Select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
SET ROWCOUNT 0</pre>
SET ROWCOUNT 0</syntaxhighlight>
| Sybase, SQL Server 2000
| Sybase, SQL Server 2000
|-
|-
|
|
<pre>Select *
<syntaxhighlight lang="sql">Select *
FROM (
FROM (
SELECT *
SELECT *
FROM {table}
FROM {table}
ORDER BY {sorting_cols}
ORDER BY {unique_key}
) a
) a
where rownum <= {begin_base_0 + rows}</pre>
where rownum <= {begin_base_0 + rows}</syntaxhighlight>
| Oracle 11
| Oracle 11
|}
|}
<br>
<br>

'''Method with positioning:'''<br>
==== Method with positioning ====
1) Select only <rows> rows starting from the next row to display ({begin_base_0 + 1}). Remember that {sorting_cols} must have unique values.<br>
# Select only <code>{rows}</code> rows starting from the next row to display (<code>{begin_base_0 + 1}</code>)
2) Read and send to display all the rows read from the database
# Read and send to display all the rows read from the database
{|class="wikitable"
{|class="wikitable"
|-
|-
| '''SQL'''
! SQL
| '''Dialect'''
! Dialect
|-
|-
|
|
<pre>Select *
<syntaxhighlight lang="postgres">Select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
OFFSET {begin_base_0} ROWS
OFFSET {begin_base_0} ROWS
FETCH NEXT {rows} ROWS ONLY</pre>
FETCH NEXT {rows} ROWS ONLY</syntaxhighlight>
| SQL ANSI 2008<br>Postgresql<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12
| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL
|-
|-
|
|
<pre>Select *
<syntaxhighlight lang="postgres">Select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
LIMIT {rows} OFFSET {begin_base_0}</pre>
LIMIT {rows} OFFSET {begin_base_0}</syntaxhighlight>
| MySQL<br>MariaDB<br>Postgresql<br>SQLite
| MySQL<br>MariaDB<br>PostgreSQL<br>SQLite
|-
|-
|
|
<pre>Select *
<syntaxhighlight lang="mysql">Select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
LIMIT {begin_base_0}, {rows}</pre>
LIMIT {begin_base_0}, {rows}</syntaxhighlight>
| MySQL<br>MariaDB<br>SQLite
| MySQL<br>MariaDB<br>SQLite
|-
|-
|
|
<syntaxhighlight lang="mysql">Select *
<pre>select TOP {begin_base_0 + rows}
from {table}
order by {unique_key}
ROWS LIMIT {rows} OFFSET {begin_base_0}</syntaxhighlight>
| Sybase, ASE 16 SP2
|-
|
<syntaxhighlight lang="tsql">Select TOP {begin_base_0 + rows}
*, _offset=identity(10)
*, _offset=identity(10)
into #temp
into #temp
from {table}
from {table}
ORDER BY {sorting_cols}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp</pre>
DROP TABLE #temp</syntaxhighlight>
| Sybase 12.5.3:
| Sybase 12.5.3:
|-
|-
|
|
<pre>SET ROWCOUNT {begin_base_0 + rows}
<syntaxhighlight lang="tsql">SET ROWCOUNT {begin_base_0 + rows}
select *, _offset=identity(10)
select *, _offset=identity(10)
into #temp
into #temp
from {table}
from {table}
ORDER BY {sorting_cols}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp
DROP TABLE #temp
SET ROWCOUNT 0</pre>
SET ROWCOUNT 0</syntaxhighlight>
| Sybase 12.5.2:
| Sybase 12.5.2:
|-
|-
|
|
<pre>select TOP {rows} *
<syntaxhighlight lang="tsql">select TOP {rows} *
from (
from (
select *, ROW_NUMBER() over (order by {sorting_cols}) as _offset
select *, ROW_NUMBER() over (order by {unique_key}) as _offset
from {table}
from {table}
) a
) xx
where _offset > {begin_base_0}</pre>
where _offset > {begin_base_0}</syntaxhighlight>
<br>
<br>
| SQL Server 2005
| SQL Server 2005
|-
|-
|
|
<pre>SET ROWCOUNT {begin_base_0 + rows}
<syntaxhighlight lang="tsql">SET ROWCOUNT {begin_base_0 + rows}
select *, _offset=identity(int,1,1)
select *, _offset=identity(int,1,1)
into #temp
into #temp
from {table}
from {table}
ORDER BY {sorting_cols}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp
DROP TABLE #temp
SET ROWCOUNT 0</pre>
SET ROWCOUNT 0</syntaxhighlight>
| SQL Server 2000
| SQL Server 2000
|-
|-
|
|
<pre>SELECT * FROM (
<syntaxhighlight lang="sql">SELECT * FROM (
SELECT rownum-1 as _offset, a.*
SELECT rownum-1 as _offset, a.*
FROM(
FROM(
SELECT *
SELECT *
FROM {table}
FROM {table}
ORDER BY {sorting_cols}
ORDER BY {unique_key}
) a
) a
WHERE rownum <= {begin_base_0 + rows}
WHERE rownum <= {begin_base_0 + rows}
)
)
WHERE _offset >= {begin_base_0}</pre>
WHERE _offset >= {begin_base_0}</syntaxhighlight>
| Oracle 11
| Oracle 11
|}
|}
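
The positioning method can be tried out with a short script. The following is a minimal sketch, assuming Python's built-in <code>sqlite3</code> module and a hypothetical <code>item</code> table whose unique key is <code>id</code>; the helper name <code>fetch_page</code> is illustrative and not part of any dialect listed above.

```python
import sqlite3


def fetch_page(conn, begin_base_0, rows):
    """Return up to `rows` rows, skipping the first `begin_base_0` rows."""
    # Ordering by a unique key keeps page boundaries stable between calls.
    cursor = conn.execute(
        "SELECT id, name FROM item ORDER BY id LIMIT ? OFFSET ?",
        (rows, begin_base_0),
    )
    return cursor.fetchall()


# Hypothetical sample data: 25 rows with ids 1..25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 26)],
)

first_page = fetch_page(conn, begin_base_0=0, rows=10)
third_page = fetch_page(conn, begin_base_0=20, rows=10)  # only 5 rows remain
```

The database still has to step over the skipped rows on every call, which is why the filter method is preferred for very large tables.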
<br>
<br>

'''Method with filter (it is more sophisticated but necessary for very big dataset):'''<br>
==== Method with filter (it is more sophisticated but necessary for very big dataset) ====
1) Select only the <rows> rows with filter:<br>
# Select only the <code>{rows}</code> rows with filter:
1.1) First Page: select only the first {rows} rows, depending on the type of database. Remember that {sorting_cols} must have unique values; a very big dataset requires further considerations<br>
1.2) Next Page: select only the first {rows} rows, depending on the type of database, where the {sorting_cols} is greater than {last_val} (the value of the {sorting_cols} of the last row in the current page)<br>
## First Page: select only the first <code>{rows}</code> rows, depending on the type of database
1.3) Previous Page: sort the data in the reverse order, select only the first {rows} rows, where the {sorting_cols} is less than {first_val} (the value of the {sorting_cols} of the first row in the current page), and sort the result in the correct order<br>
## Next Page: select only the first <code>{rows}</code> rows, depending on the type of database, where the <code>{unique_key}</code> is greater than <code>{last_val}</code> (the value of the <code>{unique_key}</code> of the last row in the current page)
## Previous Page: sort the data in the reverse order, select only the first <code>{rows}</code> rows, where the <code>{unique_key}</code> is less than <code>{first_val}</code> (the value of the <code>{unique_key}</code> of the first row in the current page), and sort the result in the correct order
2) Read and send to display all the rows read from the database
# Read and send to display all the rows read from the database


{|class="wikitable"
{|class="wikitable"
|-
|-
| '''First Page'''
! First Page
| '''Next Page'''
! Next Page
| '''Previous Page'''
! Previous Page
| '''Dialect'''
! Dialect
|-
|-
|
|
<pre>select *
<syntaxhighlight lang="postgresql">select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
FETCH FIRST {rows} ROWS ONLY</pre>
FETCH FIRST {rows} ROWS ONLY</syntaxhighlight>
|
|
<pre>select *
<syntaxhighlight lang="postgresql">select *
from {table}
from {table}
where {sorting_cols} > {last_val}
where {unique_key} > {last_val}
order by {sorting_cols}
order by {unique_key}
FETCH FIRST {rows} ROWS ONLY</pre>
FETCH FIRST {rows} ROWS ONLY</syntaxhighlight>
|
|
<small>select *
<syntaxhighlight lang="postgresql">select *
from (
from (
Select *
select *
from {table}
from {table}
where {sorting_cols} < {first_val}
where {unique_key} < {first_val}
order by {sorting_cols} DESC
order by {unique_key} DESC
FETCH FIRST {rows} ROWS ONLY
FETCH FIRST {rows} ROWS ONLY
) a
) a
order by {sorting_cols}</small>
order by {unique_key}</syntaxhighlight>
| SQL ANSI 2008<br>Postgresql<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12
| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL
|-
|-
|
|
<pre>select *
<syntaxhighlight lang="mysql">select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
LIMIT {rows}</pre>
LIMIT {rows}</syntaxhighlight>
|
|
<pre>select *
<syntaxhighlight lang="mysql">select *
from {table}
from {table}
where {sorting_cols} > {last_val}
where {unique_key} > {last_val}
order by {sorting_cols}
order by {unique_key}
LIMIT {rows}</pre>
LIMIT {rows}</syntaxhighlight>
|
|
<small>select *
<syntaxhighlight lang="mysql">select *
from (
from (
select *
select *
from {table}
from {table}
where {sorting_cols} < {first_val}
where {unique_key} < {first_val}
order by {sorting_cols} DESC
order by {unique_key} DESC
LIMIT {rows}
LIMIT {rows}
) a
) a
order by {sorting_cols}</small>
order by {unique_key}</syntaxhighlight>
| MySQL<br>SQLite
| MySQL<br>SQLite
|-
|-
|
|
<pre>select TOP {rows} *
<syntaxhighlight lang="tsql">select TOP {rows} *
from {table}
from {table}
order by {sorting_cols}</pre>
order by {unique_key}</syntaxhighlight>
|
|
<pre>select TOP {rows} *
<syntaxhighlight lang="tsql">select TOP {rows} *
from {table}
from {table}
where {sorting_cols} > {last_val}
where {unique_key} > {last_val}
order by {sorting_cols}</pre>
order by {unique_key}</syntaxhighlight>
|
|
<small>select *
<syntaxhighlight lang="tsql">select *
from (
from (
select TOP {rows} *
select TOP {rows} *
from {table}
from {table}
where {sorting_cols} < {first_val}
where {unique_key} < {first_val}
order by {sorting_cols} DESC
order by {unique_key} DESC
) a
) a
order by {sorting_cols}</small>
order by {unique_key}</syntaxhighlight>
| SQL Server 2005
| SQL Server 2005
|-
|-
|
|
<pre>SET ROWCOUNT {rows}
<syntaxhighlight lang="tsql">SET ROWCOUNT {rows}
select *
select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
SET ROWCOUNT 0</pre>
SET ROWCOUNT 0</syntaxhighlight>
|
|
<pre>SET ROWCOUNT {rows}
<syntaxhighlight lang="tsql">SET ROWCOUNT {rows}
select *
select *
from {table}
from {table}
where {sorting_cols} > {last_val}
where {unique_key} > {last_val}
order by {sorting_cols}
order by {unique_key}
SET ROWCOUNT 0</pre>
SET ROWCOUNT 0</syntaxhighlight>
|
|
<small>SET ROWCOUNT {rows}
<syntaxhighlight lang="tsql">SET ROWCOUNT {rows}
select *
select *
from (
from (
select *
select *
from {table}
from {table}
where {sorting_cols} < {first_val}
where {unique_key} < {first_val}
order by {sorting_cols} DESC
order by {unique_key} DESC
) a
) a
order by {sorting_cols}
order by {unique_key}
SET ROWCOUNT 0</small>
SET ROWCOUNT 0</syntaxhighlight>
| Sybase, SQL Server 2000
| Sybase, SQL Server 2000
|-
|-
|
|
<pre>select *
<syntaxhighlight lang="sql">select *
from (
from (
select *
select *
from {table}
from {table}
order by {sorting_cols}
order by {unique_key}
) a
) a
where rownum <= {rows}</pre>
where rownum <= {rows}</syntaxhighlight>
|
|
<pre>select *
<syntaxhighlight lang="sql">select *
from (
from (
select *
select *
from {table}
from {table}
where {sorting_cols} > {last_val}
where {unique_key} > {last_val}
order by {sorting_cols}
order by {unique_key}
) a
) a
where rownum <= {rows}</pre>
where rownum <= {rows}</syntaxhighlight>
|
|
<small>select *
<syntaxhighlight lang="sql">select *
from (
from (
select *
select *
Line 484: Line 648:
select *
select *
from {table}
from {table}
where {sorting_cols} < {first_val}
where {unique_key} < {first_val}
order by {sorting_cols} DESC
order by {unique_key} DESC
) a1
) a1
where rownum <= {rows}
where rownum <= {rows}
) a2
) a2
order by {sorting_cols}</small>
order by {unique_key}</syntaxhighlight>
| Oracle 11
| Oracle 11
|}
|}
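
The three queries of the filter method (first, next and previous page) can be sketched as follows. This is an illustrative example only, assuming Python's built-in <code>sqlite3</code> module, the <code>LIMIT</code> dialect row of the table above, and a hypothetical <code>item</code> table with the unique key <code>id</code>.

```python
import sqlite3


def first_page(conn, rows):
    return conn.execute(
        "SELECT id FROM item ORDER BY id LIMIT ?", (rows,)
    ).fetchall()


def next_page(conn, last_val, rows):
    # last_val is the key of the last row on the current page.
    return conn.execute(
        "SELECT id FROM item WHERE id > ? ORDER BY id LIMIT ?",
        (last_val, rows),
    ).fetchall()


def previous_page(conn, first_val, rows):
    # Read backwards from the first row of the current page,
    # then restore ascending order in the outer query.
    return conn.execute(
        "SELECT id FROM ("
        "  SELECT id FROM item WHERE id < ? ORDER BY id DESC LIMIT ?"
        ") a ORDER BY id",
        (first_val, rows),
    ).fetchall()


# Hypothetical sample data: keys 10, 20, ..., 120.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)",
                 [(i,) for i in range(10, 130, 10)])

page_1 = first_page(conn, 5)
page_2 = next_page(conn, last_val=page_1[-1][0], rows=5)
back_to_1 = previous_page(conn, first_val=page_2[0][0], rows=5)
```

Each query starts from a key value instead of a row count, so the cost of a page does not grow with its position in the result.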
<br>

=== Considerations with very big data sets ===

A "very big data set" here means paging a table with hundreds of thousands or millions of rows.<br>

'''Unique Values'''

If {sorting_cols} does not have unique values, rows can be lost between pages. For example, suppose that 5 rows share the same values in the sorting columns and that, with a page size of 10, the first 2 of them are the last 2 rows of a page. The other 3 should appear as the first 3 rows of the next page, but because the filter "{sorting_cols} > {last_val}" excludes them, these 3 rows are lost. If the values of the sorting columns are unique, the first row that fulfills the condition "{sorting_cols} > {last_val}" is necessarily the next row, and no rows are ever lost.<br>
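
The lost-row effect can be reproduced on a small scale. The sketch below assumes Python's built-in <code>sqlite3</code> module and a hypothetical <code>student</code> table in which the sort column <code>grade</code> repeats; it compares a filter on <code>grade</code> alone with a filter that adds the unique <code>id</code> as a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, grade TEXT)")
conn.executemany(
    "INSERT INTO student (id, grade) VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "B"), (6, "C")],
)
ROWS = 3  # page size

# Paging on the non-unique column alone.
naive_page_1 = conn.execute(
    "SELECT grade FROM student ORDER BY grade LIMIT ?", (ROWS,)
).fetchall()
last_grade = naive_page_1[-1][0]
naive_page_2 = conn.execute(
    "SELECT grade FROM student WHERE grade > ? ORDER BY grade LIMIT ?",
    (last_grade, ROWS),
).fetchall()

# Paging on (grade, id): the unique id breaks ties between equal grades.
fixed_page_1 = conn.execute(
    "SELECT grade, id FROM student ORDER BY grade, id LIMIT ?", (ROWS,)
).fetchall()
last_grade, last_id = fixed_page_1[-1]
fixed_page_2 = conn.execute(
    "SELECT grade, id FROM student "
    "WHERE grade > ? OR (grade = ? AND id > ?) "
    "ORDER BY grade, id LIMIT ?",
    (last_grade, last_grade, last_id, ROWS),
).fetchall()

naive_total = len(naive_page_1) + len(naive_page_2)  # rows with grade B are skipped
fixed_total = len(fixed_page_1) + len(fixed_page_2)  # every row is visited
```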

'''Related Index'''

It might seem that "{sorting_cols}" should always correspond to the primary key, an alternate key or a unique index of the table. That choice is correct, but it does not suit every real-life case.<br>

What is mandatory, because of the large amount of data, is that an index is always associated with the {sorting_cols}. Each time the query runs, the database would otherwise have to sort millions of rows by these columns; if an index already exists, it reads the index instead and spends no time sorting the data.<br>

Depending on the DBMS optimizer, it may also be possible to append the columns of the primary key to the columns of an existing index.<br>
* For example, in Oracle 10 there was a rule that the optimizer only uses an index when all the columns of the index are used; in that case it is mandatory to have an index with all the columns of {sorting_cols}.
* Most current optimizers are smart enough to use an index when the sort starts with the same columns as that index. In this case the {sorting_cols} can start with the columns of an index, followed by the column or columns of the unique index. However, this method only works when the first index is fine-grained enough that each set of repeated values is small, so that ordering the unique values within those sets does not take long. Otherwise, an index on all the columns of {sorting_cols} is again necessary.

'''Complex "Greater than (>)" condition'''

For example, suppose an "Employees" table has a primary key on "Num_Employee" and an index on "Last_name, First_name". The data can be displayed ordered by "Num_Employee" or by "Last_name, First_name".
<br>
In the first case, the condition "where {sorting_cols} > {last_val}" and the ordering "order by {sorting_cols}" are easily implemented as: <br>
<pre>where Num_Employee > {Num_Employee_of_last_row} -- it is the value of the Num_Employee of the last row of the current page
...
order by Num_Employee</pre>
The second case is more complex. The data should be shown sorted by "Last_name, First_name", but since this index does not give unique values, the primary key column is added, so the {sorting_cols} becomes "Last_name, First_name, Num_Employee". This gives the ordering directly: "order by Last_name, First_name, Num_Employee". The filter condition is harder to write. <br>
The simplest approach is to concatenate the values, placing a separator between each value:
<pre>-- Do not use this method: it causes a table scan
where Last_name || ',' || First_name || ',' || Num_Employee >
{Last_name_of_last_row} || ',' || {First_name_of_last_row} || ',' || {Num_Employee_of_last_row}
...
order by Last_name, First_name, Num_Employee</pre>
However, concatenating the values prevents the index from being used, so the filter is evaluated against every row (a table scan), which takes a great amount of time.
To let the optimizer use the index, each column must be compared individually, as follows: <br>
<pre>where
(
Col1 > {Col1_of_last_row}
or Col1 = {Col1_of_last_row} and Col2 > {Col2_of_last_row}
or Col1 = {Col1_of_last_row} and Col2 = {Col2_of_last_row} and Col3 > {Col3_of_last_row}
...
or Col1 = {Col1_of_last_row} and Col2 = {Col2_of_last_row} and Col3 = {Col3_of_last_row}...and Col_n > {Col_n_of_last_row}
)
...
order by Col1, Col2, Col3 ..., Col_n</pre>
The optimal way to write the complex filter of the second case is therefore:<br>
<pre>-- Use this method so that the index can be used
where (
Last_name > {Last_name_of_last_row}
or Last_name = {Last_name_of_last_row} and First_name > {First_name_of_last_row}
or Last_name = {Last_name_of_last_row} and First_name = {First_name_of_last_row} and Num_Employee > {Num_Employee_of_last_row}
)
...
order by Last_name, First_name, Num_Employee</pre>
In one test on SQLite, this method completed the query in 0.34 seconds on a table of 4 million rows.<br>
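
The column-by-column filter can be exercised end to end. The following is a sketch under stated assumptions: it uses Python's built-in <code>sqlite3</code> module, a small hypothetical "Employees" table with the columns named above, and an illustrative helper <code>next_page</code>; the index name and the sample rows are invented for the example.

```python
import sqlite3

ORDER = "ORDER BY Last_name, First_name, Num_Employee"
COLUMNS = "Last_name, First_name, Num_Employee"


def next_page(conn, last_row, rows):
    """Return the page after `last_row`, or the first page if it is None."""
    if last_row is None:
        sql = f"SELECT {COLUMNS} FROM Employees {ORDER} LIMIT :rows"
        return conn.execute(sql, {"rows": rows}).fetchall()
    last, first, num = last_row
    # Each column is compared on its own so that an index can be used.
    sql = f"""
        SELECT {COLUMNS} FROM Employees
        WHERE (
              Last_name > :last
           OR (Last_name = :last AND First_name > :first)
           OR (Last_name = :last AND First_name = :first
               AND Num_Employee > :num)
        )
        {ORDER} LIMIT :rows
    """
    params = {"last": last, "first": first, "num": num, "rows": rows}
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "Num_Employee INTEGER PRIMARY KEY, Last_name TEXT, First_name TEXT)"
)
conn.execute(
    "CREATE INDEX idx_name ON Employees (Last_name, First_name, Num_Employee)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Smith", "John"), (2, "Smith", "John"), (3, "Adams", "Zoe"),
     (4, "Smith", "Anna"), (5, "Brown", "Bob"), (6, "Smith", "John"),
     (7, "Adams", "Amy")],
)

# Walk through every page, feeding the last row of each page into the next call.
pages = []
last_row = None
while True:
    page = next_page(conn, last_row, rows=3)
    if not page:
        break
    pages.append(page)
    last_row = page[-1]
```

Three employees share the name "John Smith", yet no row is skipped or repeated, because "Num_Employee" makes every sort key unique.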


== Hierarchical query ==
== Hierarchical query ==
Line 556: Line 663:


For example,
For example,
{{sxhl|2=tsql|

sum(population) OVER( PARTITION BY city )
sum(population) OVER( PARTITION BY city )
}}

calculates the sum of the populations of all rows having the same ''city'' value as the current row.
calculates the sum of the populations of all rows having the same ''city'' value as the current row.


Partitions are specified using the '''OVER''' clause which modifies the aggregate. Syntax:
Partitions are specified using the '''OVER''' clause which modifies the aggregate. Syntax:
{{sxhl|2=bnf|1=

<OVER_CLAUSE> :: =
<OVER_CLAUSE> :: =
OVER ( [ PARTITION BY <expr>, ... ]
OVER ( [ PARTITION BY <expr>, ... ]
[ ORDER BY <expression> ] )
[ ORDER BY <expression> ] )
}}

The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
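
The partitioned sum shown above can be run as a complete query. This is a minimal sketch, assuming Python's built-in <code>sqlite3</code> module linked against SQLite 3.25 or later (the first release with window functions) and a hypothetical <code>district</code> table invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE district (name TEXT, city TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO district VALUES (?, ?, ?)",
    [("Frogner", "Oslo", 60), ("Grunerlokka", "Oslo", 50),
     ("Fana", "Bergen", 40)],
)

# Every row keeps its own population and also receives the total of all
# rows that share its city value.
rows = conn.execute(
    "SELECT name, city, population, "
    "       sum(population) OVER (PARTITION BY city) AS city_total "
    "FROM district ORDER BY name"
).fetchall()
```

Unlike <code>GROUP BY</code>, the window function does not collapse the rows: the result still contains one row per district.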


==Query evaluation ANSI==
== Query evaluation ANSI ==

The processing of a SELECT statement according to ANSI SQL would be the following:<ref>Inside Microsoft SQL Server 2005: T-SQL Querying by Itzik Ben-Gan, Lubor Kollar, and Dejan Sarka</ref>
The processing of a SELECT statement according to ANSI SQL would be the following:<ref>Inside Microsoft SQL Server 2005: T-SQL Querying by Itzik Ben-Gan, Lubor Kollar, and Dejan Sarka</ref>


{{ordered list
{{ordered list
|1=<source lang="sql">
|1=<syntaxhighlight lang="postgresql">
select g.*
select g.*
from users u inner join groups g on g.Userid = u.Userid
from users u inner join groups g on g.Userid = u.Userid
where u.LastName = 'Smith'
where u.LastName = 'Smith'
and u.FirstName = 'John'
and u.FirstName = 'John'
</syntaxhighlight>
</source>


|2= the FROM clause is evaluated: a cross join (Cartesian product) is produced for the first two tables in the FROM clause, resulting in a virtual table, Vtable1
|2= the FROM clause is evaluated: a cross join (Cartesian product) is produced for the first two tables in the FROM clause, resulting in a virtual table, Vtable1
|3= the ON clause is evaluated for Vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2
|3= the ON clause is evaluated for Vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2
|4= If an outer join is specified, records which were dropped from Vtable2 are added into Vtable3, for instance if the above query were:
|4= If an outer join is specified, records which were dropped from Vtable2 are added into Vtable3, for instance if the above query were:
<source lang="sql">
<syntaxhighlight lang="postgresql">
select u.*
select u.*
from users u left join groups g on g.Userid = u.Userid
from users u left join groups g on g.Userid = u.Userid
where u.LastName = 'Smith'
where u.LastName = 'Smith'
and u.FirstName = 'John' </source>
and u.FirstName = 'John' </syntaxhighlight>
all users who did not belong to any groups would be added back into Vtable3
all users who did not belong to any groups would be added back into Vtable3
|5= the WHERE clause is evaluated; in this case only group information for user John Smith would be added to Vtable4
|5= the WHERE clause is evaluated; in this case only group information for user John Smith would be added to Vtable4
|6= the GROUP BY is evaluated; if the above query were:
|6= the GROUP BY is evaluated; if the above query were:
<source lang="sql">
<syntaxhighlight lang="postgresql">
select g.GroupName, count(g.*) as NumberOfMembers
select g.GroupName, count(g.*) as NumberOfMembers
from users u inner join groups g on g.Userid = u.Userid
from users u inner join groups g on g.Userid = u.Userid
group by GroupName
group by GroupName
</syntaxhighlight>
</source>
Vtable5 would consist of members returned from Vtable4 arranged by the grouping, in this case the GroupName
Vtable5 would consist of members returned from Vtable4 arranged by the grouping, in this case the GroupName
|7= the HAVING clause is evaluated; groups for which the HAVING clause is true are inserted into Vtable6. For example:
|7= the HAVING clause is evaluated; groups for which the HAVING clause is true are inserted into Vtable6. For example:
<source lang="sql">
<syntaxhighlight lang="postgresql">
select g.GroupName, count(g.*) as NumberOfMembers
select g.GroupName, count(g.*) as NumberOfMembers
from users u inner join groups g on g.Userid = u.Userid
from users u inner join groups g on g.Userid = u.Userid
group by GroupName
group by GroupName
having count(g.*) > 5
having count(g.*) > 5
</syntaxhighlight>
</source>
|8= the SELECT list is evaluated and returned as Vtable7
|8= the SELECT list is evaluated and returned as Vtable7
|9= the DISTINCT clause is evaluated; duplicate rows are removed and returned as Vtable8
|9= the DISTINCT clause is evaluated; duplicate rows are removed and returned as Vtable8
Line 610: Line 716:
}}
}}
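
The relative order of the WHERE, GROUP BY and HAVING steps in the list above can be observed in a single query. The sketch below assumes Python's built-in <code>sqlite3</code> module and a hypothetical <code>membership</code> table; it is an illustration of the logical order only, since a real optimizer is free to execute the query differently as long as the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (GroupName TEXT, Userid INTEGER)")
conn.executemany(
    "INSERT INTO membership VALUES (?, ?)",
    [("admins", 1), ("admins", 2), ("admins", 3),
     ("staff", 1), ("staff", 2),
     ("guests", 4)],
)

# WHERE removes user 3 before any group is formed, so "admins" is counted
# as 2 members, not 3. HAVING then discards "guests", whose count is 1.
result = conn.execute(
    "SELECT GroupName, count(*) AS NumberOfMembers "
    "FROM membership "
    "WHERE Userid <> 3 "
    "GROUP BY GroupName "
    "HAVING count(*) >= 2 "
    "ORDER BY GroupName"
).fetchall()
```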


==Window function support by RDBMS vendors==
== Window function support by RDBMS vendors ==
The implementation of window function features by vendors of relational databases and SQL engines differs widely. Most databases support at least some flavour of window functions, but on closer inspection most vendors implement only a subset of the standard. The RANGE clause is one example: only Oracle, DB2, Spark/Hive, and Google BigQuery fully implement this feature. More recently, vendors have added new extensions to the standard, e.g. array aggregation functions. These are particularly useful when running SQL against a distributed file system (Hadoop, Spark, Google BigQuery), which offers weaker data co-locality guarantees than a distributed relational database (MPP). Rather than evenly distributing the data across all nodes, SQL engines running queries against a distributed filesystem can achieve data co-locality guarantees by nesting data, thus avoiding potentially expensive joins that involve heavy shuffling across the network. User-defined aggregate functions that can be used in window functions are another powerful feature.

The implementation of window function features by vendors of relational databases and SQL engines differs widely. Apart from MySQL, most databases support at least some flavour of window functions, but on closer inspection most vendors implement only a subset of the standard. The RANGE clause is one example: only Oracle, DB2, Spark/Hive, and Google BigQuery fully implement this feature. More recently, vendors have added new extensions to the standard, e.g. array aggregation functions. These are particularly useful when running SQL against a distributed file system (Hadoop, Spark, Google BigQuery), which offers weaker data co-locality guarantees than a distributed relational database (MPP). Rather than evenly distributing the data across all nodes, SQL engines running queries against a distributed filesystem can achieve data co-locality guarantees by nesting data, thus avoiding potentially expensive joins that involve heavy shuffling across the network. User-defined aggregate functions that can be used in window functions are another powerful feature.


==Generating data in T-SQL==
== Generating data in T-SQL ==
Method to generate data based on the union all
Method to generate data based on the union all
<source lang="sql">
<syntaxhighlight lang="tsql">
select 1 a, 1 b union all
select 1 a, 1 b union all
select 1, 2 union all
select 1, 2 union all
Line 622: Line 727:
select 2, 1 union all
select 2, 1 union all
select 5, 1
select 5, 1
</syntaxhighlight>
</source>


SQL Server 2008 supports the "row constructor" specified in the SQL3 ("SQL:1999") standard
SQL Server 2008 supports the "row constructor" feature, specified in the [[SQL:1999]] standard
<source lang="sql">
<syntaxhighlight lang="tsql">
select *
select *
from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)
from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)
</syntaxhighlight>
</source>
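
The two ways of generating rows can be compared directly. The sketch below is not T-SQL: it assumes Python's built-in <code>sqlite3</code> module, and because SQLite does not accept the <code>as x(a, b)</code> column list on a derived table, the column names are given through a common table expression instead. The rows are the five listed in the <code>values</code> example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rows generated with UNION ALL; the first SELECT names the columns.
union_rows = conn.execute(
    "SELECT 1 AS a, 1 AS b UNION ALL "
    "SELECT 1, 2 UNION ALL "
    "SELECT 1, 3 UNION ALL "
    "SELECT 2, 1 UNION ALL "
    "SELECT 5, 1 "
    "ORDER BY a, b"
).fetchall()

# The same rows generated with a row constructor (VALUES).
values_rows = conn.execute(
    "WITH x(a, b) AS (VALUES (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) "
    "SELECT a, b FROM x ORDER BY a, b"
).fetchall()
```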


==References==
== References ==
{{Reflist}}
{{Reflist}}


==Sources==
== Sources ==
* Horizontal & Vertical Partitioning, Microsoft SQL Server 2000 Books Online.
* Horizontal & Vertical Partitioning, Microsoft SQL Server 2000 Books Online.


==External links==
== External links ==
* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/courses/SS2008/NEDM/RDDM.Chapter.06.Windows_and_Query_Functions_in_SQL.pdf Windowed Tables and Window function in SQL], Stefan Deßloch
* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/courses/SS2008/NEDM/RDDM.Chapter.06.Windows_and_Query_Functions_in_SQL.pdf Windowed Tables and Window function in SQL], Stefan Deßloch
* [http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/statements_10002.htm Oracle SELECT Syntax.]
* [https://download.oracle.com/docs/cd/B14117_01/server.101/b10759/statements_10002.htm Oracle SELECT syntax]
* [http://www.firebirdsql.org/rlsnotesh/rlsnotes210.html#rnfb20x-dml-select-syntax Firebird SELECT Syntax.]
* [https://www.firebirdsql.org/rlsnotesh/rlsnotes210.html#rnfb20x-dml-select-syntax Firebird SELECT syntax]
* [http://dev.mysql.com/doc/refman/5.1/en/select.html Mysql SELECT Syntax.]
* [https://dev.mysql.com/doc/refman/8.0/en/select.html MySQL SELECT syntax]
* [http://www.postgresql.org/docs/current/static/sql-select.html Postgres SELECT Syntax.]
* [https://www.postgresql.org/docs/current/static/sql-select.html PostgreSQL SELECT syntax]
* [http://www.sqlite.org/lang_select.html SQLite SELECT Syntax.]
* [https://www.sqlite.org/lang_select.html SQLite SELECT syntax]


{{SQL}}
{{SQL}}

Latest revision as of 19:30, 20 March 2024

The SQL SELECT statement returns a result set of rows, from one or more tables.[1][2]

A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used data manipulation language (DML) command. As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan" which may vary between executions, database versions and database software. This functionality is called the "query optimizer" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.

The SELECT statement has many optional clauses:

  • SELECT list is the list of columns or SQL expressions to be returned by the query. This is approximately the relational algebra projection operation.
  • AS optionally provides an alias for each column or expression in the SELECT list. This is the relational algebra rename operation.
  • FROM specifies from which table to get the data.[3]
  • WHERE specifies which rows to retrieve. This is approximately the relational algebra selection operation.
  • GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group.
  • HAVING selects among the groups defined by the GROUP BY clause.
  • ORDER BY specifies how to order the returned rows.

Overview

SELECT is the most common operation in SQL, called "the query". SELECT retrieves data from one or more tables, or expressions. Standard SELECT statements have no persistent effects on the database. Some non-standard implementations of SELECT can have persistent effects, such as the SELECT INTO syntax provided in some databases.[4]

Queries allow the user to describe desired data, leaving the database management system (DBMS) to carry out planning, optimizing, and performing the physical operations necessary to produce that result as it chooses.

A query includes a list of columns to include in the final result, normally immediately following the SELECT keyword. An asterisk ("*") can be used to specify that the query should return all columns of all the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include:

  • The FROM clause, which indicates the table(s) to retrieve data from. The FROM clause can include optional JOIN subclauses to specify the rules for joining tables.
  • The WHERE clause includes a comparison predicate, which restricts the rows returned by the query. The WHERE clause eliminates all rows from the result set where the comparison predicate does not evaluate to True.
  • The GROUP BY clause projects rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.
  • The HAVING clause includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate.
  • The ORDER BY clause identifies which column[s] to use to sort the resulting data, and in which direction to sort them (ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.
  • The DISTINCT keyword[5] eliminates duplicate data.[6]

The following example of a SELECT query returns a list of expensive books. The query retrieves all rows from the Book table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates that all columns of the Book table should be included in the result set.

SELECT *
 FROM  Book
 WHERE price > 100.00
 ORDER BY title;

The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.

    SELECT Book.title AS Title,
           count(*) AS Authors
     FROM  Book
     JOIN  Book_author
       ON  Book.isbn = Book_author.isbn
 GROUP BY Book.title;

Example output might resemble the following:

Title                  Authors
---------------------- -------
SQL Examples and Guide 4
The Joy of SQL         1
An Introduction to SQL 2
Pitfalls of SQL        1

Under the precondition that isbn is the only common column name of the two tables and that a column named title only exists in the Book table, one could re-write the query above in the following form:

SELECT title,
       count(*) AS Authors
 FROM  Book
 NATURAL JOIN Book_author
 GROUP BY title;

However, many vendors either do not support this approach, or require certain column-naming conventions for natural joins to work effectively.

SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the select list to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional sales_tax column containing a sales tax figure calculated at 6% of the price.

SELECT isbn,
       title,
       price,
       price * 0.06 AS sales_tax
 FROM  Book
 WHERE price > 100.00
 ORDER BY title;

Subqueries

Queries can be nested so that the results of one query can be used in another query via a relational operator or aggregation function. A nested query is also known as a subquery. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases (all depending on implementation), the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, the aggregation function AVG receives as input the result of a subquery:

SELECT isbn,
       title,
       price
 FROM  Book
 WHERE price < (SELECT AVG(price) FROM Book)
 ORDER BY title;

A subquery can use values from the outer query, in which case it is known as a correlated subquery.
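
A correlated subquery can be illustrated with a variation of the previous example. This is a sketch only, assuming Python's built-in sqlite3 module; the genre column and the sample rows are hypothetical additions to the Book table used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, "
    "genre TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?, ?)",
    [("111", "Advanced SQL", "databases", 120.0),
     ("222", "SQL Basics", "databases", 40.0),
     ("333", "Web Design", "web", 30.0),
     ("444", "Web in a Nutshell", "web", 10.0)],
)

# The inner query refers to b.genre from the outer query, so it is
# evaluated once per outer row: each book is compared with the average
# price of its own genre rather than with the overall average.
above_genre_average = conn.execute(
    "SELECT title FROM Book AS b "
    "WHERE price > (SELECT AVG(price) FROM Book AS g "
    "               WHERE g.genre = b.genre) "
    "ORDER BY title"
).fetchall()
```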

Since 1999 the SQL standard allows WITH clauses, i.e. named subqueries often called common table expressions (named and designed after the IBM DB2 version 2 implementation; Oracle calls these subquery factoring). CTEs can also be recursive by referring to themselves; the resulting mechanism allows tree or graph traversals (when represented as relations), and more generally fixpoint computations.
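
A tree traversal with a recursive common table expression might look as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical employee table in which manager_id points at the parent row; neither the table nor the helper name comes from the article.

```python
import sqlite3


def subordinates(conn, manager_id):
    """Return the ids of all direct and indirect reports of `manager_id`."""
    sql = """
        WITH RECURSIVE sub(id) AS (
            -- Anchor member: the direct reports.
            SELECT id FROM employee WHERE manager_id = ?
            UNION ALL
            -- Recursive member: reports of rows found so far.
            SELECT e.id FROM employee AS e JOIN sub ON e.manager_id = sub.id
        )
        SELECT id FROM sub ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (manager_id,))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 4), (6, None)],
)

reports_of_1 = subordinates(conn, 1)
```

The recursion stops when the recursive member produces no new rows, which is the fixpoint mentioned above.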

Derived table

A derived table is a subquery in a FROM clause. Essentially, the derived table is a subquery that can be selected from or joined to. Derived table functionality allows the user to reference the subquery as a table. The derived table also is referred to as an inline view or a select in from list.

In the following example, the SQL statement involves a join from the initial Books table to the derived table "Sales". This derived table captures associated book sales information using the ISBN to join to the Books table. As a result, the derived table provides the result set with additional columns (the number of items sold and the company that sold the books):

SELECT b.isbn, b.title, b.price, sales.items_sold, sales.company_nm
FROM Book b
  JOIN (SELECT SUM(Items_Sold) Items_Sold, Company_Nm, ISBN
        FROM Book_Sales
        GROUP BY Company_Nm, ISBN) sales
  ON sales.isbn = b.isbn

Examples

Table "T"       Query                               Result
--------------  ----------------------------------  --------------
C1  C2          SELECT * FROM T;                    C1  C2
1   a                                               1   a
2   b                                               2   b

C1  C2          SELECT C1 FROM T;                   C1
1   a                                               1
2   b                                               2

C1  C2          SELECT * FROM T WHERE C1 = 1;       C1  C2
1   a                                               1   a
2   b

C1  C2          SELECT * FROM T ORDER BY C1 DESC;   C1  C2
1   a                                               2   b
2   b                                               1   a

does not exist  SELECT 1+1, 3*2;                    `1+1`  `3*2`
                                                    2      6

Given a table T, the query SELECT * FROM T will result in all the elements of all the rows of the table being shown.

With the same table, the query SELECT C1 FROM T will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a projection in relational algebra, except that in the general case, the result may contain duplicate rows. This is also known as a Vertical Partition in some database terms, restricting query output to view only specified fields or columns.

With the same table, the query SELECT * FROM T WHERE C1 = 1 will result in all the elements of all the rows where the value of column C1 is '1' being shown – in relational algebra terms, a selection will be performed, because of the WHERE clause. This is also known as a Horizontal Partition, restricting rows output by a query according to specified conditions.

With more than one table, the result set will be every combination of rows. So if the two tables are T1 and T2, SELECT * FROM T1, T2 will result in every combination of T1 rows with T2 rows. E.g., if T1 has 3 rows and T2 has 5 rows, then 15 rows will result.
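The row count of such a Cartesian product can be checked with a short sketch in Python's sqlite3 module. The single-column tables and their values are assumptions chosen to match the 3-row and 5-row case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (x INTEGER);
    CREATE TABLE T2 (y INTEGER);
    INSERT INTO T1 VALUES (1), (2), (3);
    INSERT INTO T2 VALUES (10), (20), (30), (40), (50);
""")

# Listing two tables without a join condition pairs every T1 row with every T2 row.
pairs = conn.execute("SELECT * FROM T1, T2").fetchall()

print(len(pairs))
```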

Although not in the standard, most DBMSs allow a SELECT clause to be used without a table, by pretending that an imaginary table with one row is used. This is mainly used to perform calculations where a table is not needed.

The SELECT clause specifies a list of properties (columns) by name, or the wildcard character (“*”) to mean “all properties”.

Limiting result rows

Often it is convenient to indicate a maximum number of rows that are returned. This can be used for testing or to prevent consuming excessive resources if the query returns more information than expected. The approach to do this often varies per vendor.

In ISO SQL:2003, result sets may be limited by using cursors, or by adding an SQL window function to the SELECT statement.

ISO SQL:2008 introduced the FETCH FIRST clause.

According to PostgreSQL v.9 documentation, an SQL window function "performs a calculation across a set of table rows that are somehow related to the current row", in a way similar to aggregate functions.[7] The name recalls signal processing window functions. A window function call always contains an OVER clause.

ROW_NUMBER() window function

ROW_NUMBER() OVER can be used to number the returned rows so that an enclosing query can filter on that number, e.g. to return no more than ten rows:

SELECT * FROM
( SELECT
    ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_number,
    columns
  FROM tablename
) AS foo
WHERE row_number <= 10

ROW_NUMBER can be non-deterministic: if sort_key is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where sort_key is the same. When sort_key is unique, each row will always get a unique row number.
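The ROW_NUMBER pattern can be exercised with the sketch below, which uses Python's sqlite3 module. It assumes an SQLite library of version 3.25 or later (the first with window functions); the payload column, the sample rows and the limit of three rows are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (sort_key INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(k, f"row{k}") for k in range(1, 21)],
)

# sort_key is unique here, so the numbering is deterministic.
first_three = conn.execute("""
    SELECT sort_key, payload FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS rn,
               sort_key, payload
        FROM tablename
    ) AS foo
    WHERE rn <= 3
    ORDER BY rn
""").fetchall()

print(first_three)
```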

RANK() window function

The RANK() OVER window function acts like ROW_NUMBER, but may return more or less than n rows in case of tie conditions, e.g. to return the top-10 youngest persons:

SELECT * FROM (
  SELECT
    RANK() OVER (ORDER BY age ASC) AS ranking,
    person_id,
    person_name,
    age
  FROM person
) AS foo
WHERE ranking <= 10

The above code could return more than ten rows, e.g. if there are two people of the same age, it could return eleven rows.
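The effect of ties can be seen in a small sketch using Python's sqlite3 module (SQLite 3.25 or later is assumed for window function support). The four sample persons and the cut-off of 2 are assumptions; two persons share the second-lowest age, so three rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, age INTEGER);
    INSERT INTO person VALUES
        (1, 'Ann', 20),
        (2, 'Bob', 21),
        (3, 'Cid', 21),
        (4, 'Dee', 30);
""")

# Bob and Cid have the same age, so both receive rank 2.
youngest = conn.execute("""
    SELECT ranking, person_name FROM (
        SELECT RANK() OVER (ORDER BY age ASC) AS ranking,
               person_id, person_name, age
        FROM person
    ) AS foo
    WHERE ranking <= 2
    ORDER BY ranking, person_id
""").fetchall()

print(youngest)
```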

FETCH FIRST clause

Since ISO SQL:2008, result limits can be specified using the FETCH FIRST clause, as in the following example.

SELECT * FROM T 
FETCH FIRST 10 ROWS ONLY

This clause currently is supported by CA DATACOM/DB 11, IBM DB2, SAP SQL Anywhere, PostgreSQL, EffiProz, H2, HSQLDB version 2.0, Oracle 12c and Mimer SQL.

Microsoft SQL Server 2012 and higher supports FETCH FIRST, but it is considered part of the ORDER BY clause. The ORDER BY, OFFSET, and FETCH FIRST clauses are all required for this usage.

SELECT * FROM T 
ORDER BY acolumn DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY

Non-standard syntax

Some DBMSs offer non-standard syntax either instead of or in addition to SQL standard syntax. Below, variants of the simple limit query for different DBMSes are listed:

MS SQL Server (this also works on Microsoft SQL Server 6.5, while SELECT TOP 10 * FROM T does not):

SET ROWCOUNT 10
SELECT * FROM T

Netezza, MySQL, MariaDB (also supports the standard version, since version 10.6), SAP SQL Anywhere, PostgreSQL (also supports the standard, since version 8.4), SQLite, HSQLDB, H2, Vertica, Polyhedra, Couchbase Server, Snowflake Computing, OpenLink Virtuoso:

SELECT * FROM T
LIMIT 10 OFFSET 20

Oracle:

SELECT * FROM T
WHERE ROWNUM <= 10

Ingres:

SELECT FIRST 10 * FROM T

Informix:

SELECT FIRST 10 * FROM T ORDER BY a

Informix (row numbers are filtered after ORDER BY is evaluated; the SKIP clause was introduced in a v10.00.xC4 fixpack):

SELECT SKIP 20 FIRST 10 * FROM T ORDER BY c, d

MS SQL Server, SAP ASE, MS Access, SAP IQ, Teradata:

SELECT TOP 10 * FROM T

Teradata:

SELECT * FROM T
SAMPLE 10

OpenLink Virtuoso (skips 20, delivers next 10):[8]

SELECT TOP 20, 10 * FROM T

SAP SQL Anywhere (also supports the standard, since version 9.0.1):

SELECT TOP 10 START AT 20 * FROM T

Firebird:

SELECT FIRST 10 SKIP 20 * FROM T

Firebird (since version 2.1):

SELECT * FROM T
ROWS 20 TO 30

IBM Db2:

SELECT * FROM T
WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY

IBM Db2 (new rows are filtered after comparing with the key column of table T):

SELECT * FROM T
WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY

Rows Pagination

Rows pagination[9] is an approach used to limit and display only a part of the total data of a query in the database. Instead of showing hundreds or thousands of rows at the same time, the client requests only one page from the server (a limited set of rows, for example only 10 rows), and the user navigates by requesting the next page, and then the next one, and so on. It is especially useful in web systems, where there is no dedicated connection between the client and the server, so the client does not have to wait to read and display all the rows from the server.

Data in Pagination approach

  • {rows} = Number of rows in a page
  • {page_number} = Number of the current page
  • {begin_base_0} = Zero-based index of the row where the page starts = ({page_number} - 1) * {rows}
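The arithmetic behind these placeholders can be written out as a short Python sketch. The function name page_bounds is hypothetical; it returns {begin_base_0} together with the one-based numbers of the first and last row of the page.

```python
def page_bounds(page_number: int, rows: int) -> tuple:
    """Return (begin_base_0, first_row, last_row) for a one-based page number."""
    if page_number < 1 or rows < 1:
        raise ValueError("page_number and rows must be positive")
    begin_base_0 = (page_number - 1) * rows
    first_row = begin_base_0 + 1       # one-based number of the first row shown
    last_row = begin_base_0 + rows     # one-based number of the last row shown
    return begin_base_0, first_row, last_row


print(page_bounds(3, 10))
```

For page 3 with 10 rows per page, 20 rows are skipped and rows 21 to 30 are displayed.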

Simplest method (but very inefficient)

  1. Select all rows from the database
  2. Read all rows but send to display only when the row_number of the rows read is between {begin_base_0 + 1} and {begin_base_0 + rows}
Select * 
from {table} 
order by {unique_key}

Other simple method (a little more efficient than reading all rows)

  1. Select all the rows from the beginning of the table to the last row to display ({begin_base_0 + rows})
  2. Read the {begin_base_0 + rows} rows but send to display only when the row_number of the rows read is greater than {begin_base_0}
SQL by dialect:

SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12, Mimer SQL:

select *
from {table}
order by {unique_key}
FETCH FIRST {begin_base_0 + rows} ROWS ONLY

MySQL, SQLite:

Select *
from {table}
order by {unique_key}
LIMIT {begin_base_0 + rows}

SQL Server 2005:

Select TOP {begin_base_0 + rows} *
from {table}
order by {unique_key}

Sybase, ASE 16 SP2:

Select *
from {table}
order by {unique_key}
ROWS LIMIT {begin_base_0 + rows}

Sybase, SQL Server 2000:

SET ROWCOUNT {begin_base_0 + rows}
Select *
from {table}
order by {unique_key}
SET ROWCOUNT 0

Oracle 11:

Select *
    FROM (
        SELECT *
        FROM {table}
        ORDER BY {unique_key}
    ) a
where rownum <= {begin_base_0 + rows}


Method with positioning

  1. Select only {rows} rows starting from the next row to display ({begin_base_0 + 1})
  2. Read and send to display all the rows read from the database
SQL by dialect:

SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12, Mimer SQL:

Select *
from {table}
order by {unique_key}
OFFSET {begin_base_0} ROWS
FETCH NEXT {rows} ROWS ONLY

MySQL, MariaDB, PostgreSQL, SQLite:

Select *
from {table}
order by {unique_key}
LIMIT {rows} OFFSET {begin_base_0}

MySQL, MariaDB, SQLite:

Select *
from {table}
order by {unique_key}
LIMIT {begin_base_0}, {rows}

Sybase, ASE 16 SP2:

Select *
from {table}
order by {unique_key}
ROWS LIMIT {rows} OFFSET {begin_base_0}

Sybase 12.5.3:

Select TOP {begin_base_0 + rows}
       *,  _offset=identity(10)
into #temp
from {table}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp

Sybase 12.5.2:

SET ROWCOUNT {begin_base_0 + rows}
select *,  _offset=identity(10)
into #temp
from {table}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp
SET ROWCOUNT 0

SQL Server 2005:

select TOP {rows} *
from (
      select *, ROW_NUMBER() over (order by {unique_key}) as _offset
      from {table}
) xx
where _offset > {begin_base_0}
order by _offset

SQL Server 2000:

SET ROWCOUNT {begin_base_0 + rows}
select *,  _offset=identity(int,1,1)
into #temp
from {table}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp
SET ROWCOUNT 0

Oracle 11:

SELECT * FROM (
    SELECT rownum-1 as _offset, a.*
    FROM(
        SELECT *
        FROM {table}
        ORDER BY {unique_key}
    ) a
    WHERE rownum <= {begin_base_0 + rows}
)
WHERE _offset >= {begin_base_0}
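As a runnable sketch of the positioning method, the following Python script uses the LIMIT ... OFFSET form with SQLite through the sqlite3 module. The item table, its columns, the sample data and the function name fetch_page are all assumptions made for illustration.

```python
import sqlite3


def fetch_page(conn, page_number, rows):
    """Fetch one page by skipping begin_base_0 rows and reading {rows} rows."""
    begin_base_0 = (page_number - 1) * rows
    return conn.execute(
        "SELECT id, name FROM item ORDER BY id LIMIT ? OFFSET ?",
        (rows, begin_base_0),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 26)],
)

page_1 = fetch_page(conn, 1, 10)
page_3 = fetch_page(conn, 3, 10)   # the last, partially filled page
print([row[0] for row in page_3])
```

The database still has to step over the skipped rows, so the cost of a page tends to grow with its offset.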


Method with filter (more sophisticated, but necessary for very large datasets)

  1. Select only {rows} rows with a filter:
    1. First Page: select only the first {rows} rows, depending on the type of database
    2. Next Page: select only the first {rows} rows, depending on the type of database, where the {unique_key} is greater than {last_val} (the value of the {unique_key} of the last row in the current page)
    3. Previous Page: sort the data in the reverse order, select only the first {rows} rows, where the {unique_key} is less than {first_val} (the value of the {unique_key} of the first row in the current page), and sort the result in the correct order
  2. Read and send to display all the rows read from the database
SQL for the first, next and previous page, by dialect:

SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12, Mimer SQL:

First page:

select *
from {table}
order by {unique_key}
FETCH FIRST {rows} ROWS ONLY

Next page:

select *
from {table}
where {unique_key} > {last_val}
order by {unique_key}
FETCH FIRST {rows} ROWS ONLY

Previous page:

select *
 from (
   select *
   from {table}
   where {unique_key} < {first_val}
   order by {unique_key} DESC
   FETCH FIRST {rows} ROWS ONLY
 ) a
 order by {unique_key}

MySQL, SQLite:

First page:

select *
from {table}
order by {unique_key}
LIMIT {rows}

Next page:

select *
from {table}
where {unique_key} > {last_val}
order by {unique_key}
LIMIT {rows}

Previous page:

select *
 from (
   select *
   from {table}
   where {unique_key} < {first_val}
   order by {unique_key} DESC
   LIMIT {rows}
 ) a
 order by {unique_key}

SQL Server 2005:

First page:

select TOP {rows} *
from {table}
order by {unique_key}

Next page:

select TOP {rows} *
from {table}
where {unique_key} > {last_val}
order by {unique_key}

Previous page:

select *
 from (
   select TOP {rows} *
   from {table}
   where {unique_key} < {first_val}
   order by {unique_key} DESC
 ) a
 order by {unique_key}

Sybase, SQL Server 2000:

First page:

SET ROWCOUNT {rows}
select *
from {table}
order by {unique_key}
SET ROWCOUNT 0

Next page:

SET ROWCOUNT {rows}
select *
from {table}
where {unique_key} > {last_val}
order by {unique_key}
SET ROWCOUNT 0

Previous page:

SET ROWCOUNT {rows}
 select *
 from (
   select *
   from {table}
   where {unique_key} < {first_val}
   order by {unique_key} DESC
 ) a
 order by {unique_key}
 SET ROWCOUNT 0

Oracle 11:

First page:

select *
from (
    select *
    from {table}
    order by {unique_key}
  ) a
where rownum <= {rows}

Next page:

select *
from (
  select *
  from {table}
  where {unique_key} > {last_val}
  order by {unique_key}
) a
where rownum <= {rows}

Previous page:

select *
 from (
   select *
   from (
     select *
     from {table}
     where {unique_key} < {first_val}
     order by {unique_key} DESC
   ) a1
   where rownum <= {rows}
 ) a2
 order by {unique_key}
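The filter method can be sketched with SQLite's LIMIT syntax, driven from Python's sqlite3 module. The item table, the gapped id values and the function names first_page, next_page and previous_page are hypothetical; the queries follow the MySQL/SQLite variants listed above.

```python
import sqlite3


def first_page(conn, rows):
    return conn.execute(
        "SELECT id, name FROM item ORDER BY id LIMIT ?", (rows,)
    ).fetchall()


def next_page(conn, last_val, rows):
    # Filter on the key of the last row shown instead of skipping rows.
    return conn.execute(
        "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
        (last_val, rows),
    ).fetchall()


def previous_page(conn, first_val, rows):
    # Read backwards from the first row shown, then restore ascending order.
    return conn.execute(
        """
        SELECT id, name FROM (
            SELECT id, name FROM item WHERE id < ? ORDER BY id DESC LIMIT ?
        ) AS a
        ORDER BY id
        """,
        (first_val, rows),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(i, f"item{i}") for i in range(10, 101, 10)],  # ids 10, 20, ..., 100
)

page_1 = first_page(conn, 4)
page_2 = next_page(conn, page_1[-1][0], 4)
back_to_1 = previous_page(conn, page_2[0][0], 4)
print([row[0] for row in page_2])
```

Because each page starts from a key value, an index on the key lets the database seek directly to the page instead of stepping over all preceding rows.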

Hierarchical query

Some databases provide specialised syntax for hierarchical data.

A window function in SQL:2003 is an aggregate function applied to a partition of the result set.

For example,

 sum(population) OVER( PARTITION BY city )

calculates the sum of the populations of all rows having the same city value as the current row.

Partitions are specified using the OVER clause which modifies the aggregate. Syntax:

<OVER_CLAUSE> ::=
    OVER ( [ PARTITION BY <expr>, ... ]
           [ ORDER BY <expression> ] )

The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
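The partitioned sum described above can be tried with the following sketch, which uses Python's sqlite3 module (SQLite 3.25 or later is assumed). The census table, its district column and the figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE census (district TEXT, city TEXT, population INTEGER);
    INSERT INTO census VALUES
        ('A', 'Oslo', 100),
        ('B', 'Oslo', 200),
        ('C', 'Rome', 50);
""")

# Unlike GROUP BY, the window function keeps every input row and attaches
# the total of the row's own partition (its city) to it.
totals = conn.execute("""
    SELECT district, city,
           SUM(population) OVER (PARTITION BY city) AS city_total
    FROM census
    ORDER BY district
""").fetchall()

print(totals)
```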

Query evaluation ANSI

The processing of a SELECT statement according to ANSI SQL would be the following,[10] using this query as an example:

    select g.*
    from users u inner join groups g on g.Userid = u.Userid
    where u.LastName = 'Smith'
    and u.FirstName = 'John'

  1. The FROM clause is evaluated: a cross join or Cartesian product is produced for the first two tables in the FROM clause, resulting in a virtual table, Vtable1.
  2. The ON clause is evaluated for Vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2.
  3. If an outer join is specified, records which were dropped from Vtable2 are added into Vtable3. For instance, if the above query were:
    select u.*
    from users u left join groups g on g.Userid = u.Userid
    where u.LastName = 'Smith'
    and u.FirstName = 'John'
    
    all users who did not belong to any groups would be added back into Vtable3.
  4. The WHERE clause is evaluated; in this case only group information for user John Smith would be added to Vtable4.
  5. The GROUP BY is evaluated; if the above query were:
    select g.GroupName, count(*) as NumberOfMembers
    from users u inner join groups g on g.Userid = u.Userid
    group by GroupName
    
    Vtable5 would consist of members returned from Vtable4 arranged by the grouping, in this case the GroupName.
  6. The HAVING clause is evaluated; groups for which the HAVING clause is true are inserted into Vtable6. For example:
    select g.GroupName, count(*) as NumberOfMembers
    from users u inner join groups g on g.Userid = u.Userid
    group by GroupName
    having count(*) > 5
    
  7. The SELECT list is evaluated and returned as Vtable7.
  8. The DISTINCT clause is evaluated; duplicate rows are removed and returned as Vtable8.
  9. The ORDER BY clause is evaluated, ordering the rows and returning VCursor9. This is a cursor and not a table because ANSI defines a cursor as an ordered set of rows (not relational).
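The logical order of WHERE, GROUP BY and HAVING in the steps above can be observed with a small sketch in Python's sqlite3 module. The membership table and its rows are hypothetical and much simpler than the users and groups tables of the example; the point is only that rows are filtered before grouping and groups are filtered afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE membership (group_name TEXT, user_name TEXT, active INTEGER);
    INSERT INTO membership VALUES
        ('admins', 'ann', 1), ('admins', 'bob', 1), ('admins', 'cid', 0),
        ('staff',  'dee', 1), ('staff',  'eve', 0), ('staff',  'fay', 0);
""")

# Without WHERE, both groups have three members and pass the HAVING test.
unfiltered = conn.execute("""
    SELECT group_name, COUNT(*) AS n
    FROM membership
    GROUP BY group_name
    HAVING COUNT(*) >= 2
    ORDER BY group_name
""").fetchall()

# WHERE removes inactive rows first, so the counts seen by HAVING are smaller.
filtered = conn.execute("""
    SELECT group_name, COUNT(*) AS n
    FROM membership
    WHERE active = 1
    GROUP BY group_name
    HAVING COUNT(*) >= 2
    ORDER BY group_name
""").fetchall()

print(unfiltered, filtered)
```

This shows the logical order only; a query optimizer is free to choose a different physical execution plan as long as the result is the same.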

Window function support by RDBMS vendors

The implementation of window function features by vendors of relational databases and SQL engines differs widely. Most databases support at least some flavour of window functions, but on closer inspection most vendors implement only a subset of the standard. The RANGE clause is one example: only Oracle, DB2, Spark/Hive, and Google BigQuery fully implement it. More recently, vendors have added new extensions to the standard, e.g. array aggregation functions. These are particularly useful when running SQL against a distributed file system (Hadoop, Spark, Google BigQuery), where data co-locality guarantees are weaker than on a distributed relational database (MPP). Rather than evenly distributing the data across all nodes, SQL engines running queries against a distributed filesystem can achieve data co-locality guarantees by nesting data, thus avoiding potentially expensive joins involving heavy shuffling across the network. User-defined aggregate functions that can be used in window functions are another powerful feature.

Generating data in T-SQL

Data can be generated with UNION ALL:

select 1 a, 1 b union all
select 1, 2 union all
select 1, 3 union all
select 2, 1 union all
select 5, 1

SQL Server 2008 supports the "row constructor" feature, specified in the SQL:1999 standard, which gives the same rows more compactly:

select *
from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)
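That the two approaches yield the same rows can be checked outside T-SQL with a sketch in Python's sqlite3 module. As a substitution made for SQLite, the column names of the VALUES list are supplied through a WITH clause rather than through the `as x(a, b)` alias form shown above; the ORDER BY is added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rows generated with UNION ALL, one SELECT per row.
union_rows = conn.execute("""
    SELECT a, b FROM (
        SELECT 1 AS a, 1 AS b UNION ALL
        SELECT 1, 2 UNION ALL
        SELECT 1, 3 UNION ALL
        SELECT 2, 1 UNION ALL
        SELECT 5, 1
    )
    ORDER BY a, b
""").fetchall()

# The same rows generated with a VALUES list (row constructor).
values_rows = conn.execute("""
    WITH x(a, b) AS (VALUES (1, 1), (1, 2), (1, 3), (2, 1), (5, 1))
    SELECT a, b FROM x
    ORDER BY a, b
""").fetchall()

print(union_rows == values_rows)
```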

References

  1. ^ Microsoft (23 May 2023). "Transact-SQL Syntax Conventions".
  2. ^ MySQL. "SQL SELECT Syntax".
  3. ^ Omitting FROM clause is not standard, but allowed by most major DBMSes.
  4. ^ "Transact-SQL Reference". SQL Server Language Reference. SQL Server 2005 Books Online. Microsoft. 2007-09-15. Retrieved 2007-06-17.
  5. ^ SAS 9.4 SQL Procedure User's Guide. SAS Institute (published 2013). 10 July 2013. p. 248. ISBN 9781612905686. Retrieved 2015-10-21. Although the UNIQUE argument is identical to DISTINCT, it is not an ANSI standard.
  6. ^ Leon, Alexis; Leon, Mathews (1999). "Eliminating duplicates - SELECT using DISTINCT". SQL: A Complete Reference. New Delhi: Tata McGraw-Hill Education (published 2008). p. 143. ISBN 9780074637081. Retrieved 2015-10-21. [...] the keyword DISTINCT [...] eliminates the duplicates from the result set.
  7. ^ PostgreSQL 9.1.24 Documentation - Chapter 3. Advanced Features
  8. ^ OpenLink Software. "9.19.10. The TOP SELECT Option". docs.openlinksw.com. Retrieved 1 October 2019.
  9. ^ Ing. Óscar Bonilla, MBA
  10. ^ Inside Microsoft SQL Server 2005: T-SQL Querying by Itzik Ben-Gan, Lubor Kollar, and Dejan Sarka

Sources

  • Horizontal & Vertical Partitioning, Microsoft SQL Server 2000 Books Online.

External links