
Language Integrated Query

From Wikipedia, the free encyclopedia

Revision as of 07:57, 8 December 2007

Language Integrated Query (LINQ, pronounced "link") is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages using a syntax reminiscent of SQL. Many of the concepts that LINQ introduced were originally trialled in Microsoft's Cω research project. LINQ was released as a part of .NET Framework 3.5 on November 19, 2007.

LINQ defines a set of query operators that can be used to query, project and filter data in arrays, enumerable classes, XML, relational databases, and third-party data sources. While it allows any data source to be queried, it requires that the data be encapsulated as objects. So, if the data source does not natively store data as objects, the data must be mapped to the object domain. Queries written using the query operators are either executed by the LINQ query processing engine or, via an extension mechanism, handed over to LINQ providers, which either implement a separate query processing engine or translate the query to a different format to be executed on a separate data store (such as on a database server as SQL queries). The results of a query are returned as a collection of in-memory objects that can be enumerated.

Architecture

Standard Query Operators

The set of query operators defined by LINQ is exposed to the user as the Standard Query Operator API. The query operators supported by the API are:[1]

Select / SelectMany
The Select operator performs a projection on the collection to select either all the data members that make up each object or a subset of them. The SelectMany operator performs a one-to-many projection: if the objects in the collection contain another collection as a data member, SelectMany can be used to select the entire sub-collection. The user supplies a function, as a delegate, which selects the data members. The delegate is invoked on every object to project out the unneeded data members. Selection creates objects of a different type, which has either the same data members as the original class or fewer. The class must already be defined for the code to compile.
Where
The Where operator lets the user define a predicate that is evaluated for each object in the collection; objects that do not satisfy the predicate are filtered out. The predicate is supplied to the operator as a delegate.
Join / GroupJoin
The Join operator performs an inner join on two collections, based on matching keys for objects in each collection. It takes two functions as delegates, one for each collection, that it executes on each object in the collection to extract the key from the object. It also takes another delegate with which the user specifies which data elements, from the two matched elements, should be used to create the resultant object. The GroupJoin operator is used to perform a group join. As with the Select operator, the results of a join are instances of a different class, with all the data members of both source types or a subset of them.
Take / TakeWhile
The Take operator selects the first n objects from a collection, while the TakeWhile operator, which takes a predicate, selects objects from the start of the collection for as long as they match the predicate and stops at the first object that does not.
Skip / SkipWhile
The Skip and SkipWhile operators are complements of Take and TakeWhile: Skip skips the first n objects of a collection, and SkipWhile skips objects from the start for as long as they match a predicate; both return the remaining objects.
OfType
The OfType operator is used to select the elements of a certain type T.
Concat
The Concat operator concatenates two collections.
OrderBy / ThenBy
The OrderBy operator is used to specify the primary sort ordering of the elements in a collection according to some key. The default ordering is ascending; to reverse the order, the OrderByDescending operator is used. ThenBy and ThenByDescending specify subsequent orderings of the elements. The function to extract the key value from the object is specified by the user as a delegate.
Reverse
The Reverse operator reverses a collection.
GroupBy
The GroupBy operator takes a delegate that extracts a key value and returns a collection of IGrouping<Key, Values> objects, for each distinct key value. The IGrouping objects can then be used to enumerate all the objects for a particular key value.
Distinct
The Distinct operator removes duplicate elements from a collection. Elements are compared with the default equality comparer for their type unless a custom comparer is supplied.
Union / Intersect / Except
These operators are used to perform a union, intersection and difference operation on two sequences, respectively.
EqualAll
The EqualAll operator (named SequenceEqual in the released .NET Framework 3.5) checks whether two collections contain equal elements in the same order.
First / FirstOrDefault / Last / LastOrDefault
These operators optionally take a predicate. The First operator returns the first element for which the predicate yields true, or throws an exception if nothing matches. The FirstOrDefault operator is like First, except that it returns the default value of the element type (null for reference types) when nothing matches. The Last operator retrieves the last element that matches the predicate, or throws an exception if nothing matches. The LastOrDefault operator returns the default value of the element type when nothing matches.
Single
The Single operator takes a predicate and returns the one element that matches it. An exception is thrown if no element, or more than one element, matches the predicate.
ElementAt
The ElementAt operator retrieves the element at a given index in the collection.
Any / All / Contains
The Any operator checks whether any element in the collection matches the predicate. It does not select the element, but returns true if there is a match. The All operator checks whether all elements match the predicate. The Contains operator checks whether the collection contains a given value.
Count
The Count operator counts the number of elements in the given collection.
Sum / Min / Max / Average / Aggregate
Sum, Min, Max and Average take a function that retrieves a value, typically numeric, from each element in the collection and use it to find the sum, minimum, maximum or average of those values, respectively. Aggregate takes a function that combines a running result with each element in turn to produce a single accumulated value.
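
As a rough illustration of a few of the operators listed above, the following sketch applies them to an in-memory array. It assumes the operators as shipped in the System.Linq namespace and a runtime in which string.Join accepts a sequence; the class name OperatorSketch and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorSketch
{
    // Hypothetical sample data.
    public static readonly int[] Numbers = { 1, 2, 3, 4, 5, 6 };

    // Where filters, Select projects: squares of the even numbers.
    public static List<int> EvenSquares()
    {
        return Numbers.Where(n => n % 2 == 0).Select(n => n * n).ToList();
    }

    // TakeWhile stops at the first element that fails the predicate.
    public static List<int> LeadingSmall()
    {
        return Numbers.TakeWhile(n => n < 4).ToList();
    }

    // SkipWhile returns everything from the first failing element onwards.
    public static List<int> AfterSmall()
    {
        return Numbers.SkipWhile(n => n < 4).ToList();
    }

    // FirstOrDefault yields default(int), which is 0, when nothing matches.
    public static int FirstAboveTen()
    {
        return Numbers.FirstOrDefault(n => n > 10);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", EvenSquares()));   // 4, 16, 36
        Console.WriteLine(string.Join(", ", LeadingSmall()));  // 1, 2, 3
        Console.WriteLine(string.Join(", ", AfterSmall()));    // 4, 5, 6
        Console.WriteLine(FirstAboveTen());                    // 0
    }
}
```

Calling First instead of FirstOrDefault in FirstAboveTen would throw an InvalidOperationException, since no element is greater than 10.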

The Standard Query Operator API also specifies operators that convert a collection into another type:[1]

  • AsEnumerable (ToSequence in pre-release builds): converts the collection to IEnumerable<T> type.
  • AsQueryable (ToQueryable in pre-release builds): converts the collection to IQueryable<T> type.
  • ToArray: converts the collection to an array.
  • ToList: converts the collection to IList<T> type.
  • ToDictionary: converts the collection to IDictionary<K, T> type, indexed by the key K.
  • ToLookup: converts the collection to ILookup<K, T> type, indexed by the key K.
  • Cast: converts each element in the collection to a different type.
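
The conversion operators might be used as in the sketch below, which assumes the System.Linq implementation; the class name ConversionSketch and the word list are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConversionSketch
{
    // Hypothetical sample data.
    public static readonly string[] Words = { "apple", "avocado", "banana" };

    // ToArray copies the filtered sequence into a new array.
    public static string[] LongWords()
    {
        return Words.Where(w => w.Length > 5).ToArray();
    }

    // ToDictionary needs a unique key per element; here the word itself.
    public static Dictionary<string, int> LengthByWord()
    {
        return Words.ToDictionary(w => w, w => w.Length);
    }

    // ToLookup allows several elements per key; here the first letter.
    public static ILookup<char, string> ByInitial()
    {
        return Words.ToLookup(w => w[0]);
    }

    // Cast converts each element of an untyped sequence to the given type.
    public static List<int> Unboxed()
    {
        object[] boxed = { 1, 2, 3 };
        return boxed.Cast<int>().ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", LongWords()));  // avocado, banana
        Console.WriteLine(LengthByWord()["banana"]);        // 6
        Console.WriteLine(ByInitial()['a'].Count());        // 2
        Console.WriteLine(Unboxed().Sum());                 // 6
    }
}
```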

The query operators are defined as generic extension methods on the IEnumerable<T> interface, with a concrete implementation provided in the Sequence class (named Enumerable in the released framework). As a result, any class that implements IEnumerable<T> has access to these methods and is queryable. LINQ also defines a set of generic Func delegates, which define the types of delegates handled by the LINQ query methods. Any function wrapped in a Func delegate can be used by LINQ. Most of these methods return an IEnumerable<T>, so the output of one can be used as the input to another, making queries composable. The operators are, however, lazily evaluated: the source collection is enumerated only when a result is retrieved. Enumeration pauses as soon as a matching element is found and the delegates have been evaluated on it. When the next object in the resultant collection is retrieved, enumeration of the source collection continues from the element after the one already evaluated. However, operators such as GroupBy and OrderBy, as well as Sum, Min, Max, Average and Aggregate, require data from all elements in the collection and therefore enumerate the whole source before producing a result. LINQ does not feature a query optimizer, and the query operators are evaluated in the order in which they are invoked. The LINQ methods can also be compiled against .NET Framework 2.0.[1]
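
The lazy evaluation described above can be observed by counting how often a predicate runs. The sketch below is an illustration under the assumption of the System.Linq implementation; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Returns { calls after building, calls after First, calls after Count,
    //           value returned by First, value returned by Count }.
    public static int[] CountPredicateCalls()
    {
        int calls = 0;
        var source = new List<int> { 1, 2, 3, 4, 5 };

        // Building the query does not run the predicate.
        IEnumerable<int> query = source.Where(n => { calls++; return n > 1; });
        int afterBuild = calls;

        // First stops enumerating as soon as one element matches,
        // so only the elements 1 and 2 are tested.
        int first = query.First();
        int afterFirst = calls;

        // Count starts a fresh enumeration and must test all five elements.
        int total = query.Count();
        int afterCount = calls;

        return new[] { afterBuild, afterFirst, afterCount, first, total };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CountPredicateCalls()));  // 0, 2, 7, 2, 4
    }
}
```

Because each enumeration re-runs the query, results that are needed more than once are often copied with ToList or ToArray.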

Language Extensions

While LINQ is primarily implemented as a library for .NET Framework 2.0, it also defines a set of language extensions that languages can optionally implement to make queries a first-class language construct and provide syntactic sugar for writing queries. These language extensions were initially implemented in C# 3.0 and VB 9.0, with other languages such as F# and Nemerle having announced preliminary support. The language extensions include:[2]

  • Query syntax: A language is free to choose a query syntax that it recognizes natively. The compiler must translate these language keywords into the appropriate LINQ method calls. The language can implement operator reordering and other optimizations at the keyword level.
  • Implicitly typed variables: These allow variables to be declared without specifying their types. C# 3.0 implements them with the var keyword, and VB 9.0 with a Dim declaration that omits the type. Such variables are still strongly typed; the compiler uses type inference to determine their types. This allows queries to be written and their results stored without declaring the types of the intermediate variables.
  • Anonymous types: Anonymous types allow classes that contain only data member declarations to be inferred and defined by the compiler. This helps with the Select and Join operators, whose results are of a different type than the original objects. The compiler uses type inference to determine what fields the classes will have and defines them automatically, along with accessors and mutators for them.
  • Object initializers: Object initializers allow an object to be created and initialized in a single expression. This allows the creation of delegates that extract fields from an object, create a new object and assign the extracted data to the fields of the new object in a single statement, as is required for the Select and Join operators.
  • Lambda expressions: Lambda expressions are used to create delegates inline with other code. This allows the predicates and extraction functions to be written inline with the queries.

For example, in the query to select all the objects in a collection with SomeProperty less than twice SomeValue,

 var results =  from   c in SomeCollection
                let x = SomeValue * 2
                where c.SomeProperty < x
                select new {c.SomeProperty, c.OtherProperty};

 foreach (var result in results)
         Console.WriteLine(result);

the types of the variables result, c and results are all inferred by the compiler: assuming SomeCollection is IEnumerable<SomeClass>, c will be SomeClass, results will be IEnumerable<SomeOtherClass> and result will be SomeOtherClass, where SomeOtherClass is a compiler-generated class with only the SomeProperty and OtherProperty properties, whose values are set from the corresponding properties of the source objects. The operators are then translated into method calls as:

 // SomeOtherClass stands for the compiler-generated anonymous type,
 // which cannot be named in source code.
 IEnumerable<SomeOtherClass> results =
      SomeCollection.Where
      (
           c => c.SomeProperty < (SomeValue * 2)
      )
      .Select
      (
           c => new {c.SomeProperty, c.OtherProperty}
      );
 foreach(SomeOtherClass result in results)
      Console.WriteLine( result.ToString() );
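
A compilable variant of the example above might look like the following sketch. The definition of SomeClass, the sample data and the value of SomeValue are assumptions made for illustration, and var is used for the results because the anonymous type has no name that can be written in source code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type for the example.
public class SomeClass
{
    public int SomeProperty { get; set; }
    public string OtherProperty { get; set; }
}

public static class QuerySyntaxSketch
{
    // Hypothetical sample data and threshold.
    public static readonly List<SomeClass> SomeCollection = new List<SomeClass>
    {
        new SomeClass { SomeProperty = 3,  OtherProperty = "a" },
        new SomeClass { SomeProperty = 12, OtherProperty = "b" },
        new SomeClass { SomeProperty = 7,  OtherProperty = "c" }
    };

    public const int SomeValue = 5;

    // Query syntax: the compiler translates the keywords into method calls.
    public static List<string> ViaQuerySyntax()
    {
        var results = from c in SomeCollection
                      let x = SomeValue * 2
                      where c.SomeProperty < x
                      select new { c.SomeProperty, c.OtherProperty };

        return results.Select(r => r.SomeProperty + ":" + r.OtherProperty).ToList();
    }

    // The same query written directly as extension method calls.
    public static List<string> ViaMethodCalls()
    {
        var results = SomeCollection
            .Where(c => c.SomeProperty < SomeValue * 2)
            .Select(c => new { c.SomeProperty, c.OtherProperty });

        return results.Select(r => r.SomeProperty + ":" + r.OtherProperty).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ViaQuerySyntax()));  // 3:a, 7:c
        Console.WriteLine(string.Join(", ", ViaMethodCalls()));  // 3:a, 7:c
    }
}
```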

LINQ Providers

LINQ also defines another interface, IQueryable<T>, which exposes the same Standard Query Operators as IEnumerable<T>. However, the concrete implementation of the interface, instead of evaluating the query, converts the query expression, with all its operators and predicates, into an expression tree.[3] The expression tree preserves the high-level structure of the query and can be examined at runtime. The type of the source collection determines which implementation will run: if the collection type implements IEnumerable<T>, the local LINQ query execution engine is used, and if it implements the IQueryable<T> interface, the expression tree-based implementation is invoked. An extension method is also defined that wraps an IEnumerable<T> collection inside an IQueryable<T> collection, to force the latter implementation.
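
The difference between a compiled delegate and an expression tree can be sketched as follows, assuming the System.Linq.Expressions API of the released framework; the class and member names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class ExpressionTreeSketch
{
    // Assigned to a delegate type, the lambda is compiled to executable code.
    public static readonly Func<int, bool> AsDelegate = n => n < 10;

    // Assigned to Expression<...>, the same lambda becomes a data structure
    // that can be inspected at runtime.
    public static readonly Expression<Func<int, bool>> AsTree = n => n < 10;

    // The body of the lambda is a binary "less than" node.
    public static ExpressionType BodyKind()
    {
        return AsTree.Body.NodeType;
    }

    // A tree can still be turned into a delegate and run locally.
    public static bool Evaluate(int value)
    {
        return AsTree.Compile()(value);
    }

    // AsQueryable wraps an in-memory sequence so that the
    // expression tree-based Where overload is selected.
    public static int CountSmall()
    {
        IQueryable<int> query = new[] { 5, 15, 8 }.AsQueryable().Where(AsTree);
        return query.Count();
    }

    public static void Main()
    {
        Console.WriteLine(AsDelegate(3));   // True
        Console.WriteLine(BodyKind());      // LessThan
        Console.WriteLine(Evaluate(12));    // False
        Console.WriteLine(CountSmall());    // 2
    }
}
```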

The expression trees are at the core of the LINQ extensibility mechanism, by which LINQ can be adapted for any data source. The expression trees are handed over to LINQ Providers, which are data source-specific implementations that adapt the LINQ queries to be used with the data source. The LINQ Providers analyze the expression trees representing the query ("query trees") and generate a DynamicMethod (a method generated at runtime) by using the reflection APIs to emit CIL code. These methods are executed when the query is run.[3] LINQ comes with LINQ Providers for in-memory object collections, SQL Server databases, ADO.NET datasets and XML documents. These different providers define the different flavors of LINQ:

LINQ to Objects
The LINQ to Objects provider is used for querying in-memory collections, using the local query execution engine of LINQ. The code generated by this provider refers to the implementations of the standard query operators defined in the Sequence class and allows IQueryable<T> collections to be queried locally. LINQ to Objects is not dynamic; that is, once a result set has been created and used, changes to the source collection are not automatically reflected in the result set.[4]
LINQ to XML
The LINQ to XML provider converts an XML document to a collection of XElement objects, which are then queried against using the local execution engine that is provided as a part of the implementation of the standard query operator.[5]
LINQ to SQL
The LINQ to SQL provider allows LINQ to be used to query SQL Server databases. Since SQL Server data resides on a remote server, and because SQL Server already includes a querying engine, LINQ to SQL does not use the query engine of LINQ. Instead, it converts a LINQ query to a SQL query, which is then sent to SQL Server for processing.[6] However, since SQL Server stores the data as relational data and LINQ works with data encapsulated in objects, the two representations must be mapped to one another. For this reason, LINQ to SQL also defines a mapping framework. The mapping is done by defining classes that correspond to the tables in the database and contain all or a subset of the columns in the table as data members.[7] The correspondence, along with other relational model attributes such as primary keys, is specified using LINQ to SQL-defined attributes. For example,
[Table(Name="Customers")]
public class Customer
{
     [Column(IsPrimaryKey = true)]
     public int CustID;

     [Column]
     public string CustName;
}
This class definition maps to a table named Customers, and the two data members correspond to two columns. The classes must be defined before LINQ to SQL can be used. Visual Studio 2008 includes a mapping designer which can be used to create the mapping between the data schemas in the object and relational domains. It can automatically create the corresponding classes from a database schema, and it allows manual editing to create a different view that uses only a subset of the tables or of the columns in a table.[7]
The mapping is implemented by the DataContext, which takes a connection string to the server and can be used to generate a Table<T>, where T is the type to which the database table will be mapped. The Table<T> encapsulates the data in the table and implements the IQueryable<T> interface, so that an expression tree is created, which the LINQ to SQL provider handles. The provider converts the query into T-SQL and retrieves the result set from the database server. Since the processing happens at the database server, local methods, which are not defined as a part of the lambda expressions representing the predicates, cannot be used. However, the query can use stored procedures on the server. Any changes to the result set are tracked and can be submitted back to the database server.[7]
LINQ to DataSets
The LINQ to SQL provider works only with Microsoft SQL Server databases; to support any generic database, LINQ also includes LINQ to DataSets, which uses ADO.NET to handle the communication with the database. Once the data is in ADO.NET DataSets, LINQ to DataSets executes queries against these datasets.[8]
Other providers
The LINQ providers can be implemented by third parties as well. Several database server specific providers are available from the database vendors.

Windows Search (LINQ to System Search)[9]

Google (Linq to Google)[10]
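
Of the providers listed above, LINQ to XML is the simplest to try without external infrastructure. The following is a minimal sketch that assumes the System.Xml.Linq API of the released framework; the XML document, the class name XmlSketch and the method NamesInCity are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlSketch
{
    // Hypothetical sample document.
    private const string Xml =
        "<customers>" +
        "<customer id='1' city='Oslo'>Ann</customer>" +
        "<customer id='2' city='Rome'>Bob</customer>" +
        "<customer id='3' city='Oslo'>Cid</customer>" +
        "</customers>";

    // Parses the document into XElement objects and queries them
    // with the ordinary in-memory query operators.
    public static List<string> NamesInCity(string city)
    {
        XElement root = XElement.Parse(Xml);
        return root.Elements("customer")
                   .Where(e => (string)e.Attribute("city") == city)
                   .Select(e => e.Value)
                   .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", NamesInCity("Oslo")));  // Ann, Cid
    }
}
```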

PLINQ

Microsoft, as a part of the Parallel FX Library, is developing PLINQ, or Parallel LINQ, a parallel execution engine for LINQ queries. It defines the IParallelEnumerable<T> interface. If the source collection implements this interface, the parallel execution engine is invoked. The PLINQ engine executes a query in a distributed manner on a multi-core or multi-processor system.[11]
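
The paragraph above describes a pre-release design. In the version of PLINQ that later shipped with .NET Framework 4, a query opts in to parallel execution through the AsParallel extension method, which returns a ParallelQuery<T> rather than the IParallelEnumerable<T> mentioned here. The sketch below uses that released API; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class ParallelSketch
{
    // Sums the squares of 1..n, letting PLINQ partition the work
    // across the available cores.
    public static long SumOfSquares(int n)
    {
        return Enumerable.Range(1, n)
                         .AsParallel()
                         .Select(i => (long)i * i)
                         .Sum();
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(100));  // 338350
    }
}
```

The result here does not depend on the order in which elements are processed; queries that do depend on order would also need AsOrdered.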

References

  1. ^ a b c "Standard Query Operators". Microsoft. Retrieved 2007-11-30.
  2. ^ "LINQ Framework". Retrieved 2007-11-30.
  3. ^ a b "Anders Hejlsberg - LINQ". Retrieved 2007-11-30.
  4. ^ "LINQ to Objects Overview". Retrieved 2007-11-30.
  5. ^ ".NET Language-Integrated Query for XML Data". Retrieved 2007-11-30.
  6. ^ "LINQ to SQL". Retrieved 2007-11-30.
  7. ^ a b c "LINQ to SQL: .NET Language-Integrated Query for Relational Data". Retrieved 2007-11-30.
  8. ^ "LINQ to DataSets". Retrieved 2007-11-30.
  9. ^ "System Search to LINQ".
  10. ^ "Glinq".
  11. ^ "Programming in the Age of Concurrency: Concurrent Programming with PFX". Retrieved 2007-10-16.

External links