Continuous Query Language

from Wikipedia, the free encyclopedia

The Continuous Query Language (CQL) is a declarative query language for data streams in data stream management systems . It is an extension of SQL . The CQL was developed until January 2006 as part of the STREAM project at Stanford University .

Types of conversions between flows and relations in CQL

Components

The main extension of CQL compared to SQL is that in addition to relations (database tables), data streams also exist as data types.

Data streams can be understood as potentially infinite sequences of time-value pairs.

CQL also defines the following operators , which are explained in the following sections:

  • the window operator [...]
  • the insert stream operator ISTREAM
  • the delete stream operator DSTREAM
  • the relation stream operator RSTREAM
  • the sampling operator SAMPLEfor the selection from a data stream.

The conventional operators from SQL can be used for the transformation between relations.

The conversion of flows into other flows takes place via the detour of relations.

Examples

Weather data is measured continuously and arrives as a data stream in an application. The following CQL expression uses a window operator to return the average temperature over the last 24 hours:

SELECT AVG(Temperatur) FROM Wetter [Range 1 Day]

Since this is a continuous request, it is converted back into a data stream by default via istream. So the full query is:

SELECT ISTREAM(AVG(Temperatur) FROM Wetter [Range 1 Day])

In another database, lightning bolts recognized by a lightning detection system are entered with their coordinates. The following request generates a data stream from this:

SELECT ISTREAM(position) FROM Blitz

In combination with other data, for example, a data stream of lightning strikes in selected areas can be generated and output as a current lightning warning.

The operators in detail

window

A window is defined to convert currents into relations . The syntax for this is S [W]where is Sa stream and Wa window. The window can be specified by a period or by a number of tuples:

[Rows 100] returns, for example, the last 100 tuples
[Range 5 Seconds] returns for example the tuples of the last 5 seconds

If no window is specified, it is [∞]assumed that the relation contains all tuples from the data stream up to the current point in time. In addition, a restriction can be specified by a WHEREclause and, in the case of windows based on the number of tuples, a grouping PARTITION BY.

In the following example, a stream “order” with the attributes “customer” and “payment type” is given. The following request delivers the last 5 orders of each customer who paid by credit card.

SELECT * FROM Bestellungen B
[PARTITION BY B.Kunde ROWS 5]
WHERE B.Zahlungsart = 'Kreditkarte'

In connection with the operator RSTREAM, the keywords NOWand are also SLIDEpossible (see there).

Insert stream

The operator ISTREAMis used to create a data stream from a relation in which a copy is sent to the data stream every time a tuple is inserted into the relation - but only if the inserted tuples are not duplicates. The following request provides the number of orders within the last hour with each new order:

SELECT ISTREAM(COUNT(*)) FROM Bestellungen B [Range 1 Hour]

Delete stream

The operator DSTREAMcreates a data stream from a relation in which every time a tuple is removed from a relation, this tuple is sent to the data stream. The following request delivers the deleted customers as a data stream:

DSTREAM (SELECT * FROM Kunden)

Relation stream

The operator RSTREAMconverts an entire relation into a data stream. This means that all tuples that are currently contained in the relation are sent to the data stream.

The key word NOWfor the length of a window provides RSTREAMthe current value in connection with the operator . In the following example two streams connection establishment and connection end are given, each of which supplies the beginning and the end of a communication connection. The following query continuously delivers all completed connections that lasted a maximum of 5 minutes.

SELECT RSTREAM(A.*)
FROM Verbindungsaufbau [Range 5 Minutes] A, Verbindungsende [Now] E
WHERE A.id = E.id

The keyword SLIDEcan be used to specify how often a window should be updated or when the window should be moved. The following query, for example, provides the moving monthly average of the total order amount from a stream of orders on a daily basis:

SELECT RSTREAM( AVG(B.Summe) ) FROM Bestellungen B [Range 1 Month Slide 1 Day]

Samples

In order to get a random selection from a stream, there is the operator SAMPLEto which a percentage is passed. For example, the request SELECT * FROM T SAMPLE(10)from the stream Tonly outputs each element with a probability of 10%, so that on average only every tenth is included in the result stream .

rating

A major advantage of the CQL is that it is based directly on the - only partially declarative - SQL and the queries can be formulated declaratively . However, this requires efficient procedures in the DSMS for reformulating the request into a request plan.

Systems competing with STREAM either directly use an imperative model for formulating queries (e.g. Aurora / Borealis ) or have a syntax comparable to CQL ( TelegraphCQ ) or, in principle, also enable the execution of CQL queries ( PIPES ).

The processing of queries via XML data streams is a separate research area in which query languages ​​based on XPath instead of SQL are developed.

See also

literature

Web links

This version was added to the list of articles worth reading on June 7, 2005 .