Thursday, January 28, 2010

LINQ Support for IronPython

This is a collection of thoughts and a summary of a mailing list conversation about LINQ support in IronPython. The IronPython team (which seems to be just Dino & Dave at this point) seemed receptive to the idea; it's just a matter of resources. Therefore, please vote on the linked issues so that the team can prioritize them accordingly.

There are two key parts to LINQ support: extension methods and expression trees. Each is useful on its own, but both are required to really take advantage of LINQ.

Extension Methods

An extension method is a way of adding a method to a class without editing the class, or of providing a default implementation for every class that implements an interface. LINQ is almost entirely composed of extension methods on the IEnumerable<T> and IQueryable<T> interfaces, so supporting them in IronPython is critical to supporting LINQ and many other interfaces. In this case I'm only talking about IronPython being able to consume extension methods, not create them.

In C#, an extension method is made available by a using directive for the namespace containing a static class that contains the method. For example, the Enumerable class is in the namespace System.Linq and contains about half of LINQ's extension methods. To use this class from C# requires only:

using System.Linq;

The Python equivalent of this would be:

from System.Linq import *

Now, this style is frowned upon in Python circles because it pollutes the namespace unnecessarily. A more Pythonic equivalent would be:

from System.Linq import Enumerable

When doing an import, IronPython would have to check if Enumerable contains any static methods marked with ExtensionAttribute and add them to the list of possible methods to resolve for the applicable type. I actually tried to implement this at one point, but haven't had the time to finish it up.
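To make the lookup step above a little more concrete, here is a minimal pure-Python sketch of the idea. It is only a model, not IronPython's implementation: the extension decorator stands in for ExtensionAttribute, the Enumerable class here is a toy replacement for System.Linq.Enumerable, and collect_extensions and Extended are hypothetical names invented for this example.

```python
# Pure-Python model of extension-method discovery. Every name below is
# hypothetical; none of this is IronPython's actual binder code.

def extension(func):
    """Stand-in for .NET's ExtensionAttribute: tag a static method."""
    func.is_extension = True
    return func


class Enumerable(object):
    """Toy stand-in for System.Linq.Enumerable."""

    @staticmethod
    @extension
    def where(source, predicate):
        return [item for item in source if predicate(item)]

    @staticmethod
    @extension
    def select(source, selector):
        return [selector(item) for item in source]

    @staticmethod
    def helper():
        # Not tagged, so it must not be picked up as an extension method.
        return "not an extension method"


def collect_extensions(holder):
    """Return {name: function} for every tagged static method on holder."""
    found = {}
    for name in dir(holder):
        member = getattr(holder, name, None)
        if getattr(member, "is_extension", False):
            found[name] = member
    return found


class Extended(object):
    """Wraps a target and resolves unknown attributes against extensions."""

    def __init__(self, target, extensions):
        self._target = target
        self._extensions = extensions

    def __getattr__(self, name):
        # Real members of the target win over extension methods, as in C#.
        if hasattr(self._target, name):
            return getattr(self._target, name)
        if name in self._extensions:
            method = self._extensions[name]
            # Bind the wrapped object as the first ("this") argument.
            return lambda *args: method(self._target, *args)
        raise AttributeError(name)


# Roughly what "from System.Linq import Enumerable" would have to trigger.
extensions = collect_extensions(Enumerable)
numbers = Extended([1, 2, 3, 4], extensions)
evens = numbers.where(lambda n: n % 2 == 0)
```

The wrapper class is only there to keep the sketch self-contained; a real implementation would presumably hook into method resolution for the applicable type rather than wrap each object.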

The issue for this is CodePlex #17250 - Support for LINQ extension methods.

Expression Trees

An expression tree is an abstract, language-independent representation of a piece of code that can be analyzed and transformed more easily than the raw code it was generated from. From the expression tree, the LINQ provider (such as LINQ to SQL) determines how to convert it into a query in its target language (such as SQL). The DLR actually uses "expression" trees as well; its version is a superset of the LINQ classes that supports statements as well as expressions, and the trees are compiled into IL code and then executed.

In C# (and VB), lambda expressions are convertible to expression trees. Python also has lambda expressions, and these should also be convertible to expression trees – in particular, expression trees that exactly match what would be created by the C# compiler for an equivalent lambda, which is what every existing LINQ provider would expect.
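As a rough illustration of what a provider does with such a tree, the sketch below uses CPython's standard ast module as a stand-in for the LINQ expression classes and turns the source of a Python lambda into a SQL-style WHERE clause. This is an analogy only: it assumes CPython 3.8 or later, it works from source text rather than a compiled lambda, it handles just a handful of node types, and to_sql and lambda_to_where are hypothetical names.

```python
import ast

# Map Python AST operator nodes to SQL operators (illustrative subset).
COMPARE_OPS = {
    ast.Eq: "=", ast.NotEq: "<>",
    ast.Lt: "<", ast.LtE: "<=",
    ast.Gt: ">", ast.GtE: ">=",
}
BOOL_OPS = {ast.And: "AND", ast.Or: "OR"}


def to_sql(node, param):
    """Translate a small subset of expression nodes into a SQL fragment."""
    if isinstance(node, ast.BoolOp):
        joiner = " %s " % BOOL_OPS[type(node.op)]
        return "(" + joiner.join(to_sql(v, param) for v in node.values) + ")"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = to_sql(node.left, param)
        right = to_sql(node.comparators[0], param)
        return "%s %s %s" % (left, COMPARE_OPS[type(node.ops[0])], right)
    if (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == param):
        # "c.age" on the lambda parameter becomes the column name "age".
        return node.attr
    if isinstance(node, ast.Constant):  # ast.Constant needs Python 3.8+
        value = node.value
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return repr(value)
    raise ValueError("unsupported expression: " + ast.dump(node))


def lambda_to_where(source):
    """Parse the source of a one-argument lambda and translate its body."""
    func = ast.parse(source, mode="eval").body
    if not isinstance(func, ast.Lambda):
        raise ValueError("expected a lambda expression")
    param = func.args.args[0].arg
    return to_sql(func.body, param)


where_clause = lambda_to_where("lambda c: c.age >= 18 and c.city == 'Oslo'")
# where_clause is "(age >= 18 AND city = 'Oslo')"
```

The point is that the translator never runs the lambda; it only walks its structure. That is why existing LINQ providers would need IronPython to hand them the same tree shape the C# compiler produces.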

The issue for this is CodePlex #26044 - lambda should be convertible to Expression<...>.