Web of Ideas

Developers blog

Introduction to LINQ for C#

04 May 2012, 13:56

What Is Linq?

Well, it stands for Language INtegrated Query, and in simple terms it’s SQL for code. It lets you query any data within your code using an SQL-like syntax. Simples.

But there’s a lot more to it than that, as I discovered yesterday when I was looking at ‘expression trees’. I won’t go into expression trees here (that’s for another blog), but in the process I discovered what Linq is all about and how it fits nicely into the C# language.

How is Linq implemented in C# and dotNET?

To enable SQL-like syntax in the C# compiler, a lot had to change. A lot of new features were added to C# 3.0, which are mostly useful in their own right, but clearly designed primarily to support Linq.

Linq only became properly available in the dotNET framework version 3.5, which includes the new assembly required (System.Core, which contains the System.Linq namespace). The compiler support comes from C# version 3.0, which shipped alongside it.

Take a look at this simple Linq statement:

var robotMonsters =
    from m in Monsters
    where m.IsRobot == true
    select new { m.Name, m.HomePlanet, m.FavouriteColour };

It uses the following features of C# that were added in version 3.0:

Query Keywords:
from, where and select. These are language keywords, known as ‘query keywords’, and built into C#.
There are other query keywords - group, into, orderby, join, let, in, on, equals, by, ascending and descending.

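Note that these aren’t ‘proper’ keywords. They are contextual keywords: they only have special meaning inside a query expression, so you can still use them as ordinary identifiers elsewhere. The compiler converts them into a different construct (see below).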
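
To give a feel for a few of the other query keywords, here is a minimal sketch. The Monster class, the sample data and the QueryKeywordDemo name are all made up for illustration; only the keywords themselves are real C#.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Monster class, assumed here for illustration only.
public class Monster
{
    public string Name { get; set; }
    public string HomePlanet { get; set; }
    public bool IsRobot { get; set; }
}

public static class QueryKeywordDemo
{
    // Made-up sample data.
    public static List<Monster> SampleMonsters()
    {
        return new List<Monster>
        {
            new Monster { Name = "Zog", HomePlanet = "Mars", IsRobot = true },
            new Monster { Name = "Blip", HomePlanet = "Mars", IsRobot = true },
            new Monster { Name = "Gloop", HomePlanet = "Venus", IsRobot = false },
            new Monster { Name = "Clank", HomePlanet = "Io", IsRobot = true }
        };
    }

    // Counts the robots on each planet, using group/by/into, let and orderby.
    public static List<string> RobotCountsByPlanet(IEnumerable<Monster> monsters)
    {
        var query =
            from m in monsters
            where m.IsRobot
            group m by m.HomePlanet into planet
            let count = planet.Count()
            orderby planet.Key ascending
            select planet.Key + ": " + count;

        return query.ToList();
    }

    public static void Main()
    {
        foreach (string line in RobotCountsByPlanet(SampleMonsters()))
        {
            Console.WriteLine(line); // prints "Io: 1" then "Mars: 2"
        }
    }
}
```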

Type inference:
The var keyword for the result. When you use var, you are telling the compiler to figure out the type because you can’t be bothered.
By the way, I’m not a fan of var where it isn’t needed, as it makes code more difficult to read – having the type explicitly in the code helps you understand what is going on. There’s too much lazy use of var, in my opinion!
However, in this Linq query, var is essential because the result of the query is an anonymous type (see below) so there is no explicit type to use.

Anonymous types:
The result in the query above is a collection of a new type that contains Name, HomePlanet and FavouriteColour members. This type is not explicitly declared anywhere – the compiler creates it on the fly. From the
new { m.Name, m.HomePlanet, m.FavouriteColour }
syntax, it knows the type of all the members, so it is able to do this. This is why we have to use the var keyword for the robotMonsters result, as there is not an explicit type. The compiler will create a new anonymous type, and will figure out that the robotMonsters result is a collection of this type.
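
As a small sketch of what the compiler does with an anonymous type (the AnonymousTypeDemo name and the values are my own, purely for illustration), the following shows that the members are properly typed, and that two anonymous objects with the same member names and types in the same order share one generated type:

```csharp
using System;

public static class AnonymousTypeDemo
{
    public static string Describe()
    {
        // The compiler generates a nameless class with Name and HomePlanet
        // properties, so var is the only way to declare these variables.
        var monster = new { Name = "Zog", HomePlanet = "Mars" };
        var other = new { Name = "Blip", HomePlanet = "Io" };

        // Same member names, types and order means the same generated type.
        bool sameType = monster.GetType() == other.GetType();

        // The members are real, strongly typed properties.
        return monster.Name + " from " + monster.HomePlanet + ", same type: " + sameType;
    }

    public static void Main()
    {
        Console.WriteLine(Describe()); // prints "Zog from Mars, same type: True"
    }
}
```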

I mentioned above that query keywords are not proper keywords, because the compiler pre-processes them into a different construct. The compiler will pre-process the above Linq query into something like this:

var robotMonsters =
    Monsters
        .Where(m => m.IsRobot == true)
        .Select(m => new {m.Name, m.HomePlanet, m.FavouriteColour});
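
If you want to convince yourself that the two forms really are the same thing, a quick sketch like this will do it. The Monster class and the sample data are assumptions of mine, as the original Monsters collection isn’t shown anywhere.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Monster class, assumed here for illustration only.
public class Monster
{
    public string Name { get; set; }
    public string HomePlanet { get; set; }
    public string FavouriteColour { get; set; }
    public bool IsRobot { get; set; }
}

public static class TranslationDemo
{
    // Made-up sample data.
    public static List<Monster> SampleMonsters()
    {
        return new List<Monster>
        {
            new Monster { Name = "Zog", HomePlanet = "Mars", FavouriteColour = "Red", IsRobot = true },
            new Monster { Name = "Gloop", HomePlanet = "Venus", FavouriteColour = "Green", IsRobot = false },
            new Monster { Name = "Clank", HomePlanet = "Io", FavouriteColour = "Grey", IsRobot = true }
        };
    }

    public static bool BothFormsAgree(List<Monster> monsters)
    {
        // Query syntax, as written in the blog.
        var querySyntax =
            from m in monsters
            where m.IsRobot == true
            select new { m.Name, m.HomePlanet, m.FavouriteColour };

        // Method syntax, which is what the query syntax is translated into.
        var methodSyntax =
            monsters
                .Where(m => m.IsRobot == true)
                .Select(m => new { m.Name, m.HomePlanet, m.FavouriteColour });

        // Anonymous types compare member by member, so this checks the contents.
        return querySyntax.SequenceEqual(methodSyntax);
    }

    public static void Main()
    {
        Console.WriteLine(BothFormsAgree(SampleMonsters())); // prints "True"
    }
}
```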

Which takes us nicely onto some other C# 3.0 features:

Extension methods:
These give us the ability to extend existing classes without having access to the code or having to derive from them. For example, to extend string to remove rude words, you would create a static method (in a static utility class) whose first parameter is marked with this, like so:

public static string RemoveRudeWords(this string s)
{
    // remove rude words from s and return it
}

Then you could use this syntax:

string s = "dash, darn and blast";
s = s.RemoveRudeWords();

In reality, it’s just smoke and mirrors – behind the scenes the compiler turns it into an ordinary call to your static helper method; but it’s nicer to read and more ‘OO’.

Linq uses extension methods to add functionality to collection objects in C#. For example, it adds Where and Select as extension methods to IEnumerable<T>, and therefore to all the built-in classes that implement it, so the above query works. These extensions (along with all the other Linq stuff not built in to the compiler directly) are in the System.Linq namespace.
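
Here is a runnable sketch of the rude words example. The word list and the “****” replacement are my own invention, chosen to fit the sample sentence; the point is only to show that the extension call and the plain static call are the same thing.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical word list, assumed for illustration only.
    private static readonly string[] RudeWords = { "dash", "darn", "blast" };

    // 'this' on the first parameter is what makes it an extension method.
    public static string RemoveRudeWords(this string s)
    {
        foreach (string word in RudeWords)
        {
            s = s.Replace(word, "****");
        }
        return s;
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string s = "dash, darn and blast";

        // Extension syntax...
        string viaExtension = s.RemoveRudeWords();

        // ...is just a nicer way of writing this static call.
        string viaStatic = StringExtensions.RemoveRudeWords(s);

        Console.WriteLine(viaExtension);              // prints "****, **** and ****"
        Console.WriteLine(viaExtension == viaStatic); // prints "True"
    }
}
```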

Lambda expressions:
These are a progression of anonymous methods from C# 2.0. It’s basically a concise syntax for declaring a method. In the example, the method we are declaring for the Select would look something like this if explicitly coded (treat it as pseudocode: C# doesn’t actually allow var as a return type, and an anonymous type has no name you could write there instead):

public static var GetRobotMonster(Monster m)
{
    return new { m.Name, m.HomePlanet, m.FavouriteColour };
}

and, if we could code this method, we would pass it in as the delegate to the Select, like this:

.Select(GetRobotMonster)

Lambda expressions, like anonymous methods, enable you to create a method ‘on the fly’, but they strip out all unnecessary syntax.

m => new {m.Name, m.HomePlanet, m.FavouriteColour}

m is the parameter into the method – the compiler can work out its type, so no need to specify it. The compiler can also work out the return type from the context, so no need to specify it. The return type from the Select (and Where) extension methods is generic (IEnumerable<T>), so for the Select, T becomes the anonymous type.
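
To see the progression in code that actually compiles, here is a sketch using the Where predicate, since that one returns a plain bool and so can be written as a named method too. The Monster class, the sample data and the LambdaDemo name are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Monster class, assumed here for illustration only.
public class Monster
{
    public string Name { get; set; }
    public bool IsRobot { get; set; }
}

public static class LambdaDemo
{
    // Made-up sample data.
    public static List<Monster> SampleMonsters()
    {
        return new List<Monster>
        {
            new Monster { Name = "Zog", IsRobot = true },
            new Monster { Name = "Gloop", IsRobot = false },
            new Monster { Name = "Clank", IsRobot = true }
        };
    }

    // An ordinary named method that fits Func<Monster, bool>.
    public static bool IsRobotMonster(Monster m)
    {
        return m.IsRobot;
    }

    public static int[] CountRobots(List<Monster> monsters)
    {
        // 1. Pass the named method as the delegate.
        int named = monsters.Where(IsRobotMonster).Count();

        // 2. C# 2.0 anonymous method: no name, but still wordy.
        int anonymous = monsters.Where(delegate(Monster m) { return m.IsRobot; }).Count();

        // 3. C# 3.0 lambda: parameter type and return type are inferred.
        int lambda = monsters.Where(m => m.IsRobot).Count();

        return new[] { named, anonymous, lambda };
    }

    public static void Main()
    {
        // All three forms count the same two robots.
        Console.WriteLine(string.Join(", ", CountRobots(SampleMonsters()).Select(c => c.ToString()).ToArray()));
    }
}
```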

What’s the point of Linq?

Linq doesn’t do anything you couldn’t already do. If you want to query some data in C#, you could write this code:

List<Monster> results = new List<Monster>();
foreach (Monster m in Monsters)
{
  if (m.IsRobot)
  {
     results.Add(m);
  }
}

This is known as imperative programming – you specify exactly how to do it (i.e. you loop through all monsters, check if they are robots and, if they are, add them to the results).

Linq is an example of declarative programming. You specify only what you want (get all Monsters that are robots), and you don’t care how it is done.
Heads up – common interview question, so remember the difference between imperative and declarative programming!
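
Putting the two styles side by side makes the difference fairly obvious. This is only a sketch: the Monster class, the sample data and the method names are mine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Monster class, assumed here for illustration only.
public class Monster
{
    public string Name { get; set; }
    public bool IsRobot { get; set; }
}

public static class ParadigmDemo
{
    // Made-up sample data.
    public static List<Monster> SampleMonsters()
    {
        return new List<Monster>
        {
            new Monster { Name = "Zog", IsRobot = true },
            new Monster { Name = "Gloop", IsRobot = false },
            new Monster { Name = "Clank", IsRobot = true }
        };
    }

    // Imperative: spell out the loop, the test and the accumulation.
    public static List<Monster> RobotsImperative(List<Monster> monsters)
    {
        List<Monster> results = new List<Monster>();
        foreach (Monster m in monsters)
        {
            if (m.IsRobot)
            {
                results.Add(m);
            }
        }
        return results;
    }

    // Declarative: say what you want and let Linq do the looping.
    public static List<Monster> RobotsDeclarative(List<Monster> monsters)
    {
        return (from m in monsters where m.IsRobot select m).ToList();
    }

    public static void Main()
    {
        List<Monster> monsters = SampleMonsters();

        // Both return the same Monster objects in the same order.
        Console.WriteLine(RobotsImperative(monsters).SequenceEqual(RobotsDeclarative(monsters))); // prints "True"
    }
}
```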

It is something of a paradigm shift to work with Linq, especially for programmers like me who have years of imperative programming under their belt. However, it is cleaner, and it’s easier to see what’s going on in other people’s code (as we all know, your own code in 2 weeks’ time is effectively other people’s code). So, Linq is more productive, if used appropriately. It’s probably not more technically efficient, as a generic ‘Where’ algorithm has delegate calls and iterator overhead that a hand-coded loop avoids; but that isn’t normally a huge concern unless you’re right in the inner loop of a graphics algorithm for Xbox.

The main point of Linq, from Microsoft’s perspective anyway, is to have a consistent query language for all kinds of different data sources – relational databases (via ADO.NET), XML, in-program collections, etc. Any new classes you develop can become ‘Linq compliant’ by implementing a few methods. More interestingly, you can make any existing class ‘Linq compliant’ by implementing a few extension methods, even if you don’t have access to the original code.
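
As a rough sketch of that last point, here is a class that knows nothing about Linq (MonsterCage, which I have invented for the purpose) being made queryable with a single extension method. Note that there is no using System.Linq here: the compiler only needs to find a suitable Where method for the where clause to work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical existing class we pretend we cannot modify.
// It does not implement IEnumerable<T>.
public sealed class MonsterCage
{
    private readonly string[] names;

    public MonsterCage(params string[] names)
    {
        this.names = names;
    }

    public int Count { get { return names.Length; } }

    public string NameAt(int index)
    {
        return names[index];
    }
}

public static class MonsterCageLinq
{
    // The where clause of a query is translated into a call to this method.
    public static IEnumerable<string> Where(this MonsterCage cage, Func<string, bool> predicate)
    {
        for (int i = 0; i < cage.Count; i++)
        {
            string name = cage.NameAt(i);
            if (predicate(name))
            {
                yield return name;
            }
        }
    }
}

public static class CompliantDemo
{
    public static List<string> LongNames(MonsterCage cage)
    {
        // Query syntax now works directly on MonsterCage.
        var query = from n in cage where n.Length > 5 select n;
        return new List<string>(query);
    }

    public static void Main()
    {
        MonsterCage cage = new MonsterCage("Zog", "Clanker", "Blip", "Gloopy");
        foreach (string name in LongNames(cage))
        {
            Console.WriteLine(name); // prints "Clanker" then "Gloopy"
        }
    }
}
```

A query that selects something other than the range variable itself would also need a matching Select method, and so on for the other clauses.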

Microsoft would clearly like Linq to be the query language of choice for programmers – i.e. replace XPath, SQL and other methods of querying data in code. I think they’ve got a pretty good shot at it – it makes sense to have a consistent way of querying, and Linq is well thought out and flexible enough to be the contender.

That’s it for now. Off to do some imperative cooking of my dinner.

Imperative: Chop onions, garlic, chilli, vegetables. Fry for 10 minutes. Add curry paste. Fry another 2 minutes. Add tinned tomatoes. Cook for 20 minutes. Add coriander leaves. Serve.

Declarative: Phone for takeaway.
