Continuous Query Language
The Continuous Query Language (CQL) is a declarative query language for data streams in data stream management systems . It is an extension of SQL . The CQL was developed until January 2006 as part of the STREAM project at Stanford University .
Components
The main extension of CQL compared to SQL is that in addition to relations (database tables), data streams also exist as data types.
Data streams can be understood as potentially infinite sequences of time-value pairs.
CQL also defines the following operators , which are explained in the following sections:
- the window operator
[...]
- the insert stream operator
ISTREAM
- the delete stream operator
DSTREAM
- the relation stream operator
RSTREAM
- the sampling operator
SAMPLE
for the selection from a data stream.
The conventional operators from SQL can be used for the transformation between relations.
The conversion of flows into other flows takes place via the detour of relations.
Examples
Weather data is measured continuously and arrives as a data stream in an application. The following CQL expression uses a window operator to return the average temperature over the last 24 hours:
SELECT AVG(Temperatur) FROM Wetter [Range 1 Day]
Since this is a continuous request, it is converted back into a data stream by default via istream. So the full query is:
SELECT ISTREAM(AVG(Temperatur) FROM Wetter [Range 1 Day])
In another database, lightning bolts recognized by a lightning detection system are entered with their coordinates. The following request generates a data stream from this:
SELECT ISTREAM(position) FROM Blitz
In combination with other data, for example, a data stream of lightning strikes in selected areas can be generated and output as a current lightning warning.
The operators in detail
window
A window is defined to convert currents into relations . The syntax for this is
S [W]
where is S
a stream and W
a window. The window can be specified by a period or by a number of tuples:
-
[Rows 100]
returns, for example, the last 100 tuples -
[Range 5 Seconds]
returns for example the tuples of the last 5 seconds
If no window is specified, it is [∞]
assumed that the relation contains all tuples from the data stream up to the current point in time. In addition, a restriction can be specified by a WHERE
clause and, in the case of windows based on the number of tuples, a grouping PARTITION BY
.
In the following example, a stream “order” with the attributes “customer” and “payment type” is given. The following request delivers the last 5 orders of each customer who paid by credit card.
SELECT * FROM Bestellungen B
[PARTITION BY B.Kunde ROWS 5]
WHERE B.Zahlungsart = 'Kreditkarte'
In connection with the operator RSTREAM
, the keywords NOW
and are also SLIDE
possible (see there).
Insert stream
The operator ISTREAM
is used to create a data stream from a relation in which a copy is sent to the data stream every time a tuple is inserted into the relation - but only if the inserted tuples are not duplicates. The following request provides the number of orders within the last hour with each new order:
SELECT ISTREAM(COUNT(*)) FROM Bestellungen B [Range 1 Hour]
Delete stream
The operator DSTREAM
creates a data stream from a relation in which every time a tuple is removed from a relation, this tuple is sent to the data stream. The following request delivers the deleted customers as a data stream:
DSTREAM (SELECT * FROM Kunden)
Relation stream
The operator RSTREAM
converts an entire relation into a data stream. This means that all tuples that are currently contained in the relation are sent to the data stream.
The key word NOW
for the length of a window provides RSTREAM
the current value in connection with the operator . In the following example two streams connection establishment and connection end are given, each of which supplies the beginning and the end of a communication connection. The following query continuously delivers all completed connections that lasted a maximum of 5 minutes.
SELECT RSTREAM(A.*)
FROM Verbindungsaufbau [Range 5 Minutes] A, Verbindungsende [Now] E
WHERE A.id = E.id
The keyword SLIDE
can be used to specify how often a window should be updated or when the window should be moved. The following query, for example, provides the moving monthly average of the total order amount from a stream of orders on a daily basis:
SELECT RSTREAM( AVG(B.Summe) ) FROM Bestellungen B [Range 1 Month Slide 1 Day]
Samples
In order to get a random selection from a stream, there is the operator SAMPLE
to which a percentage is passed. For example, the request SELECT * FROM T SAMPLE(10)
from the stream T
only outputs each element with a probability of 10%, so that on average only every tenth is included in the result stream .
rating
A major advantage of the CQL is that it is based directly on the - only partially declarative - SQL and the queries can be formulated declaratively . However, this requires efficient procedures in the DSMS for reformulating the request into a request plan.
Systems competing with STREAM either directly use an imperative model for formulating queries (e.g. Aurora / Borealis ) or have a syntax comparable to CQL ( TelegraphCQ ) or, in principle, also enable the execution of CQL queries ( PIPES ).
The processing of queries via XML data streams is a separate research area in which query languages based on XPath instead of SQL are developed.
See also
literature
- Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani , Jennifer Widom: Models and Issues in Data Stream Systems . PODS 2002, Madison, Wisconsin.
- Arvind Arasu, Shivnath Babu, Jennifer Widom: The CQL Continuous Query Language: Semantic Foundations and Query Execution . Stanford, 2003 (Technical Report)
- Rajeev Motwani, Jennifer Widom, Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Gurmeet Manku, Chris Olston, Justin Rosenstein and Rohit Varma: Query Processing, Resource Management, and Approximation in a Data Stream Management System . Stanford, 2002 (CIDR 2003)
Web links
- Stanford Stream Data Manager (STREAM; English)
- CQL introduction from STREAM (English)
- CQL specification for STREAM (English)
- Stream Query Repository - a collection of queries in CQL