Doing Simple Things with ExpressionVisitor

LINQ is one of the more powerful technologies in .NET. In particular, it introduced expression trees, which give you a limited ability to inspect and rewrite code from code. Basically, code manipulating code: good stuff.

For example, here is a very simple expression tree that adds 73 to a given number:

Expression<Func<int, int>> someExpr = x => x + 73;
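
To make the shape of that tree concrete, here is a sketch that builds a similar one by hand with the Expression factory methods. The names HandBuiltTree and Build are made up for illustration, and the tree is only roughly what the compiler generates for the lambda above:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper class, used only for this illustration.
static class HandBuiltTree
{
    // Builds the equivalent of: x => x + 73
    public static Expression<Func<int, int>> Build()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        BinaryExpression body = Expression.Add(x, Expression.Constant(73));
        return Expression.Lambda<Func<int, int>>(body, x);
    }

    public static void Main()
    {
        Expression<Func<int, int>> expr = Build();

        // ToString() shows the tree in a readable form
        Console.WriteLine(expr);              // x => (x + 73)

        // Compile() turns the tree into a delegate that can be invoked
        Console.WriteLine(expr.Compile()(1)); // 74
    }
}
```

The lambda's Body is a BinaryExpression whose Left is the parameter and whose Right is the constant, which is the structure the casts in the following snippets navigate.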

You can actually address the individual elements inside of the expression tree:

var nodeType = someExpr.Body.NodeType;
var otherValue = ((ConstantExpression)((BinaryExpression)someExpr.Body).Right).Value;
 
// writes "Add" and "73" to the console
Console.WriteLine(nodeType);
Console.WriteLine(otherValue);

(Let’s ignore those little ugly casts for now.)

Here is another expression tree that takes the average of three numbers:

Expression<Func<int, int, int, double>> someExpr = (x, y, z) => (x + y + z) / 3.0;

And here’s how we’d list the names of the parameters in the numerator:

Console.WriteLine(((ParameterExpression)((BinaryExpression)((BinaryExpression)((UnaryExpression)((BinaryExpression)someExpr.Body).Left).Operand).Left).Left).Name);
Console.WriteLine(((ParameterExpression)((BinaryExpression)((BinaryExpression)((UnaryExpression)((BinaryExpression)someExpr.Body).Left).Operand).Left).Right).Name);
Console.WriteLine(((ParameterExpression)((BinaryExpression)((UnaryExpression)((BinaryExpression)someExpr.Body).Left).Operand).Right).Name);

Of course, inspecting trees like this is a technique that doesn’t scale well.

Luckily, in .NET 4.0 (or in .NET 3.5, courtesy of The Wayward Weblog), the System.Linq.Expressions.ExpressionVisitor class makes examining expression trees a bit less painful:

using System;
using System.Linq.Expressions;
 
class MyExpressionVisitor : ExpressionVisitor
{
    protected override Expression VisitParameter(ParameterExpression node)
    {
        Console.WriteLine(node.Name);
        return base.VisitParameter(node);
    }
}
 
class Program
{
    public static void Main(string[] args)
    {
        Expression<Func<int, int, int, double>> someExpr = (x, y, z) => (x + y + z) / 3.0;
        var myVisitor = new MyExpressionVisitor();
        myVisitor.Visit(someExpr);
    }
}

ExpressionVisitor exposes a virtual Visit method for each kind of expression tree node (VisitBinary, VisitConstant, VisitParameter, and so on), so we could actually write the whole expression back out to the console by overriding the methods that we care about:

class MyExpressionVisitor : ExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression node)
    {
        Console.Write("(");
 
        this.Visit(node.Left);
 
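        switch (node.NodeType)
        {
            case ExpressionType.Add:
                Console.Write(" + ");
                break;
 
            case ExpressionType.Divide:
                Console.Write(" / ");
                break;
 
            default:
                // any operator not handled above: fall back to the node type name
                Console.Write(" " + node.NodeType + " ");
                break;
        }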
 
        this.Visit(node.Right);
 
        Console.Write(")");
 
        return node;
    }
 
    protected override Expression VisitConstant(ConstantExpression node)
    {
        Console.Write(node.Value);
        return node;
    }
 
    protected override Expression VisitParameter(ParameterExpression node)
    {
        Console.Write(node.Name);
        return node;
    }
}

The output differs slightly from the original source:

(((x + y) + z) / 3)xyz

Removing the extra parentheses actually turns out to be quite tricky because more context is required to determine which ones can be removed and which ones can’t. But for most purposes, that won’t be a problem. The extra xyz, however…we’ll need to get rid of that:

class Program
{
    public static void Main(string[] args)
    {
        Expression<Func<int, int, int, double>> someExpr = (x, y, z) => (x + y + z) / 3.0;
        var myVisitor = new MyExpressionVisitor();
 
        // visit the expression's Body instead
        myVisitor.Visit(someExpr.Body);
    }
}

By walking just the Body of the lambda, we ignore the Parameters that we don’t need to have listed twice:

(((x + y) + z) / 3)

Much better.
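
As for the extra parentheses, the missing context is mostly operator precedence. The following is a minimal sketch, not a complete pretty-printer: MinimalParensPrinter and its helper methods are hypothetical names, only the four basic arithmetic operators are handled, and Convert nodes are treated as transparent.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

// Hypothetical visitor: emits parentheses only where precedence requires them.
class MinimalParensPrinter : ExpressionVisitor
{
    private readonly StringBuilder _sb = new StringBuilder();

    public static string Print(Expression e)
    {
        var printer = new MinimalParensPrinter();
        printer.Visit(e);
        return printer._sb.ToString();
    }

    // Higher number binds tighter. Anything that is not a known operator
    // (parameters, constants) is treated as a leaf that never needs parentheses.
    private static int Precedence(Expression e)
    {
        switch (e.NodeType)
        {
            case ExpressionType.Add:
            case ExpressionType.Subtract:
                return 1;
            case ExpressionType.Multiply:
            case ExpressionType.Divide:
                return 2;
            case ExpressionType.Convert:
                // conversions are not printed, so look through them
                return Precedence(((UnaryExpression)e).Operand);
            default:
                return 3;
        }
    }

    private static string Symbol(ExpressionType t)
    {
        switch (t)
        {
            case ExpressionType.Add: return " + ";
            case ExpressionType.Subtract: return " - ";
            case ExpressionType.Multiply: return " * ";
            case ExpressionType.Divide: return " / ";
            default: throw new NotSupportedException(t.ToString());
        }
    }

    private void Emit(Expression operand, bool parens)
    {
        if (parens) _sb.Append('(');
        Visit(operand);
        if (parens) _sb.Append(')');
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        string symbol = Symbol(node.NodeType); // throws early for unsupported operators
        int mine = Precedence(node);

        // Left operand: parenthesize only if it binds more loosely.
        Emit(node.Left, Precedence(node.Left) < mine);
        _sb.Append(symbol);
        // Right operand: also parenthesize on equal precedence,
        // because a - (b - c) is not the same as a - b - c.
        Emit(node.Right, Precedence(node.Right) <= mine);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        _sb.Append(node.Value);
        return node;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _sb.Append(node.Name);
        return node;
    }
}
```

Calling MinimalParensPrinter.Print(someExpr.Body) on the averaging expression should give (x + y + z) / 3. The rule is deliberately conservative: x + (y + z) keeps its parentheses even though dropping them would be harmless for integers.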

You can use this technique to regenerate C# from an expression tree, translate it to SQL, or emit any number of other languages. Visitors like ExpressionVisitor sit at the heart of most code that interprets a LINQ expression tree, and doing anything fancy under the surface of LINQ requires a good understanding of this class. It’s also a nice illustration of the visitor pattern, which has applications even beyond LINQ.
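
Everything so far has only inspected a tree, but the same class also handles rewriting: each Visit method returns an Expression, and returning a different node produces a new tree. Here is a small sketch of that idea. AddToSubtractVisitor and RewriteDemo are made-up names, and the rewrite assumes plain numeric operands (an Add node backed by a custom operator method would need more care).

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical example visitor: rewrites every addition into a subtraction.
class AddToSubtractVisitor : ExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression node)
    {
        if (node.NodeType == ExpressionType.Add)
        {
            // Visit the children first so nested additions are rewritten too.
            Expression left = Visit(node.Left);
            Expression right = Visit(node.Right);
            return Expression.Subtract(left, right);
        }

        // Every other operator keeps the default behavior.
        return base.VisitBinary(node);
    }
}

static class RewriteDemo
{
    public static Expression<Func<int, int>> Rewrite(Expression<Func<int, int>> source)
    {
        // Expression trees are immutable, so Visit returns a new tree
        // and leaves the original untouched.
        return (Expression<Func<int, int>>)new AddToSubtractVisitor().Visit(source);
    }

    public static void Main()
    {
        Expression<Func<int, int>> original = x => x + 73;
        Expression<Func<int, int>> rewritten = Rewrite(original);

        Console.WriteLine(original);                 // x => (x + 73)
        Console.WriteLine(rewritten);                // x => (x - 73)
        Console.WriteLine(rewritten.Compile()(100)); // 27
    }
}
```

Because the visitor is handed the whole lambda here, the base class rebuilds the lambda around the new body, which is why the result can be cast back to the same delegate type and compiled.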