Doing Simple Things with ExpressionVisitor

LINQ is one of the more powerful technologies in .NET. Particularly, it introduced expression trees to the language, giving you the limited ability to inspect and rewrite code from code. Basically, code manipulating code–good stuff.

For example, here is a very simple expression tree that adds 73 to a given number:

1
Expression<Func<int, int>> someExpr = x => x + 73;

You can actually address the individual elements inside of the expression tree:

1
2
3
4
5
6
var nodeType = someExpr.Body.NodeType;
var otherValue = ((ConstantExpression)((BinaryExpression)someExpr.Body).Right).Value;
 
// writes "Add" and "73" to the console
Console.WriteLine(nodeType);
Console.WriteLine(otherValue);

(Let’s ignore those little ugly casts for now.)

Here is another expression tree that takes the average of three numbers:

1
Expression<Func<int, int, int, double>> someExpr = (x, y, z) => (x + y + z) / 3.0;

And here’s how we’d list the names of the parameters in the numerator:

1
2
3
Console.WriteLine(((ParameterExpression)((BinaryExpression)((BinaryExpression)((UnaryExpression)((BinaryExpression)someExpr.Body).Left).Operand).Left).Left).Name);
Console.WriteLine(((ParameterExpression)((BinaryExpression)((BinaryExpression)((UnaryExpression)((BinaryExpression)someExpr.Body).Left).Operand).Left).Right).Name);
Console.WriteLine(((ParameterExpression)((BinaryExpression)((UnaryExpression)((BinaryExpression)someExpr.Body).Left).Operand).Right).Name);

Of course, inspecting trees like this is a technique that doesn’t scale well.

Luckily, in .NET 4.0 (or in .NET 3.5, courtesy of The Wayward Weblog), the System.Linq.Expressions.ExpressionVisitor class makes examining expression trees a bit less painful:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
using System;
using System.Linq.Expressions;
 
class MyExpressionVisitor : ExpressionVisitor
{
    protected override Expression VisitParameter(ParameterExpression node)
    {
        Console.WriteLine(node.Name);
        return base.VisitParameter(node);
    }
}
 
class Program
{
    public static void Main(string[] args)
    {
        Expression<Func<int, int, int, double>> someExpr = (x, y, z) => (x + y + z) / 3.0;
        var myVisitor = new MyExpressionVisitor();
        myVisitor.Visit(someExpr);
    }
}

ExpressionVisitor contains a method for every possible expression tree node, so we could actually write the whole expression back out to the console by overriding the methods that we care about:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
class MyExpressionVisitor : ExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression node)
    {
        Console.Write("(");
 
        this.Visit(node.Left);
 
        switch (node.NodeType)
        {
            case ExpressionType.Add:
                Console.Write(" + ");
                break;
 
            case ExpressionType.Divide:
                Console.Write(" / ");
                break;
        }
 
        this.Visit(node.Right);
 
        Console.Write(")");
 
        return node;
    }
 
    protected override Expression VisitConstant(ConstantExpression node)
    {
        Console.Write(node.Value);
        return node;
    }
 
    protected override Expression VisitParameter(ParameterExpression node)
    {
        Console.Write(node.Name);
        return node;
    }
}

It actually writes out a slightly different output:

(((x + y) + z) / 3)xyz

Removing the extra parentheses actually turns out to be quite tricky because more context is required to determine which ones can be removed and which ones can’t. But for most purposes, that won’t be a problem. The extra xyz, however…we’ll need to get rid of that:

1
2
3
4
5
6
7
8
9
10
11
class Program
{
    public static void Main(string[] args)
    {
        Expression<Func<int, int, int, double>> someExpr = (x, y, z) => (x + y + z) / 3.0;
        var myVisitor = new MyExpressionVisitor();
 
        // visit the expression's Body instead
        myVisitor.Visit(someExpr.Body);
    }
}

By walking just the Body of the lambda, we ignore the Parameters that we don’t need to have listed twice:

(((x + y) + z) / 3)

Much better.

You can use this technique to regenerate C# from LINQ, SQL from LINQ, or a lot of other different languages. Instances of ExpressionVisitor are at the heart of every interpretation of a LINQ tree, and doing anything fancy under the surface of LINQ requires a good understanding of this class. It’s also a nice illustration of the visitor pattern, which has applications even beyond LINQ.

Building Custom Abstract Data Models in .NET

One of the easiest things to do in .NET is take either a list of vanilla C# objects or the contents of a System.Data.DataTable and show it in a grid through some kind of data binding. It was possible in the WinForms days with the System.Windows.Forms.DataGridView or a vendor-supplied grid (Syncfusion, Infragistics, etc.). It is still very much possible in WPF with the WPF Toolkit grid and any number of (and ChartFX gets special kudos for being able to “figure out” which columns/series are important and graph them by default).

But what if DataTables aren’t your thing, but vanilla C# objects aren’t “dynamic” enough for you? What if you need custom on-the-fly columns but don’t/can’t deal with the headaches of ADO.NET?

(You can skip straight to the Quick How-To if you’d like.)

Using Reflection to build a dynamic grid (don’t do this)

Let’s step back for a bit and consider the vanilla C# object case. Let’s say I wanted to render a list of these objects without using the automagic abilities that seem to be provided in every grid:

1
2
3
4
5
public class Order
{
    public string Symbol { get; }
    public long Quantity { get; }
}

A first stab at this might be to use the Reflection API:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
public class OrderGrid : UserControl
{
    private void InitializeCustomProperties()
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();
 
        // start the awful vendor-specific binding between the
        // reflection pieces and the grid
        foreach (PropertyInfo pi in properties)
        {
            myGrid.Columns.Add(pi.Name);
        }
    }
 
    private void _myGrid_CellValueRequired(object sender, MyGridCellEventArgs e)
    {
        string columnName = myGrid.Columns[e.ColumnIndex];
        PropertyInfo[] properties = typeof(Order).GetProperties();
        PropertyInfo pi = Array.Find(properties, p => p.Name == columnName);
        if (pi == null) return;
 
        var orders = (List<Order>)myGrid.ItemsSource;
        Order order = orders[e.RowIndex];
 
        // now read the actual property value
        e.Value = pi.GetValue(order);
        e.Handled = true;
    }
}

Seems simple and harmless enough, right? Now let’s layer on a slight air of dynamic data—something that seemingly necessitates a reflection-based solution:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
public class Order
{
    public string Symbol { get; }
    public long Quantity { get; }
    public IList ExtendedAttributes { get; }
}
 
public class OrderGrid : UserControl
{
    private void InitializeCustomProperties()
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();
 
        // start the awful vendor-specific binding between the
        // reflection pieces and the grid
        foreach (PropertyInfo pi in properties)
        {
            if (pi.Name != "ExtendedAttributes")
            {
                myGrid.Columns.Add(pi.Name);
            }
        }
    }
 
    private void _myGrid_CellValueRequired(object sender, MyGridCellEventArgs e)
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();
 
        // only the first n - 1 columns represent reflection properties;
        // the last few columns come from ExtendedAttributes
        if (e.ColumnIndex < (properties.Length - 1))
        {
            string columnName = myGrid.Columns[e.ColumnIndex];
            PropertyInfo pi = Array.Find(properties, p => p.Name == columnName);
            if (pi == null) return;
 
            var orders = (List<Order>)myGrid.ItemsSource;
            Order order = orders[e.RowIndex];
 
            // now read the actual property value
            e.Value = pi.GetValue(order);
            e.Handled = true;
        }
        else
        {
            e.Value = (e.ColumnIndex - (properties.Length - 1));
            e.Handled = true;
        }
    }
}

It’s starting to get a little nastier now—as the Order class grows and its “ExtendedAttributes” properties get more complex, we end up doing more and more in the view to render this complex object properly—there is no metadata and no reflecting over “ExtendedAttributes”. We don’t know name or type information at the very least, and we certainly can’t decorate the items of the list with an Attribute like we could with a “real” C# property. There is no assurance that every Order’s ExtendedAttributes properties will behave the same way—we’re naïvely using index positioning to uniquely identify a property; what if my Order objects are coming from all over the place?

And how do DataTables always render so well? How does every grid under the sun know how to properly pull schema information from a DataTable using specific properties like DataTable.Columns and the appropriate accessors on every individual row? Does it really just boil down to this:

1
2
3
4
5
6
7
8
if (source.GetType() == typeof(DataTable))
{
    // DataTable specific code
}
else
{
    // a pile of ugly Reflection code 
}

And then what about the Windows Forms Designer? There are all sorts of properties that don’t show up, or show up with slightly different names, or slightly different characteristics than what one would expect through only reflection. Is there more switch/case hackery at work here?

The looks-like-a-DataTable-but-isn’t Approach (better)

It turns out that this problem is already solved, partially because of the original Windows Forms Designer, and partially because of DataTables. A rich API exists for allowing objects to describe themselves, and for lists of objects to publish information about their contents, even if the list is empty. System.Data.DataTable implements several interfaces, but the most important one for purposes of our discussion is
System.ComponentModel.IListSource, which contains an interesting method:

  • GetList: Returns an IList that can be bound to a data source from an object that does not implement an IList itself.

On DataTable, this returns an instance of DataView, which according to the documentation, “Represents a databindable, customized view of a DataTable for sorting, filtering, searching, editing, and navigation.” It turns out that a DataView is what is typically rendered by a grid. We can think of it as a a IList of DataRow objects (it’s actually a list of DataRowView, but that’s not terribly important here), which structurally, is much more similar to a list of boring C# objects.

There are a few other important interfaces that DataView implements other than IList (and ICollection and IEnumerable): ITypedList to provide schema information on the constituent rows (especially when the list is empty, because then we can’t ask the rows themselves) and IBindingList, which allows views to control sorting and filtering.

It turns out that if you implement a few of these key interfaces, you get all this magic for free, too.

The Quick How-To

  1. (Optional) Implement System.ComponentModel.IListSource on your “main” data class—you’d only need to do this if you didn’t want your main class to directly represent your “list” of data.
  2. Create a list class. It should provide implementation for a few interfaces:
    • System.ComponentModel.ITypedList.
      Your implementation of ITypedList.GetItemProperties(PropertyDescriptor[]) should return a PropertyDescriptorCollection that contains the “properties” of the rows of your data set. This interface is required when your list is empty so that a way of providing object metadata exists without resorting to querying the contents of the list.
    • System.Collections.IList.
      This is a hard requirement. Dealing with generic types generically is usually not straightforward and can sometimes require reflection; even to this day, a lot of code in WPF looks for (and makes use of) this interface.
    • System.Collections.Generics.IList<T>.
      I haven’t seen too much in the way of generic components requiring an implementation of this interface, but it’s a good idea to implement regardless—it’ll be easier to work with LINQ and other APIs that expect generic collections.
    • System.Collections.Specialized.INotifyCollectionChanged (optional).
      This is the same interface that classes like ObservableCollection<T> use to propagate their contents changes in WPF. (If you need Windows Forms support, you’ll have to instead implement…)
    • System.ComponentModel.IBindingList (optional).
      Implement this interface only if you need backwards compatibility with Windows Forms. It’s a bit of a hybrid interface, incorporating aspects of both ICollection and ICollectionView, but as a result of that, you’ll effectively limit clients to one view on your collection.
  3. (Optionally) Implement System.ComponentModel.ICustomTypeDescriptor on your “row” data class. I usually implement this by having each of my “rows” retain a reference to the “list”, and merely calling ITypedList.GetItemProperties(PropertyDescriptor[]) on the list. Implementing something once takes less time than implementing something twice.

    I like to implement ICustomTypeDescriptor on the row because it provides a context-insensitive way of retrieving properties of the row. If someone created a System.Collections.Generic.List<T> containing my “rows”, this list is still eligible for rendering in a grid without any additional work (except, of course, when the list is empty, because then there would be no way of providing schema information).

  4. Create a subclass of System.ComponentModel.PropertyDescriptor (read this if you don’t know what this class is for) for each property that you want your rows to have. You will want/need to implement a few important properties and methods:
    • public Type PropertyType { get; } (override required)
      The type of the property. If you want the type of values contained in the column to be completely wide open, then return typeof(object).
    • public TypeConverter Converter { get; } (override optionally)
      The converter that is used in conjunction with the property. For real properties, this would be defined through the usage of a [System.ComponentModel.TypeConverterAttribute] attribute on the property’s type or the property itself.
    • public Type ComponentType { get; } (override required)
      The type that defines the property. This is essentially your “row” class.
    • public object GetValue(object component); (override required)
      This is probably the most important method; this is the method that tells the system how to fetch a value from your row for the “property”.
    • public void SetValue(object component, object value); (override required)
      You can override this to set the value of this property on your row; otherwise, you can throw NotSupportedException if you want.
    • public bool IsReadOnly { get; } (override required)
      Return true or false, depending on whether or not the property should be publicly modifiable.
    • public bool CanResetValue(object component); (override required)
      When you’re using the Windows Forms Designer, this is the method that determines whether or not you can “reset” the value of the property to some default value. Other visualizers could make use of it too, but I haven’t seen too many that take advantage of it.
    • public void ResetValue(object component); (override required)
      This actually implements the resetting of a value as mentioned above. Either implement it or throw NotSupportedException.
    • public bool ShouldSerializeValue(object component); (override required)
      The Windows Forms Designer uses this method to determine whether or not a property’s value should be serialized. If a property was set to its default value, this method would return false because there would be no reason to actually persist the value of the property (since reapplying it wouldn’t result in a change).

The Quick Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
/// <summary>
/// The class that serves as the "entry point" for your data model.
/// </summary>
/// <remarks>
/// It's not always necessary to create a class like this, but it can be
/// helpful from a design standpoint. Here, like the DataTable, I expose
/// a collection of "columns" that client APIs can use for their own
/// purposes.
/// </remarks>
public class MyDataTable : IListSource
{
    private readonly MyDataRowCollection _dataRows;
    private readonly PropertyDescriptorCollection _properties;
 
    public MyDataTable(...)
    {
        _dataRows = new MyDataRowCollection(rows, columns);
        _properties = columns;
    }
 
    public MyDataRowCollection Rows { get { return _dataRows; } }
 
    public PropertyDescriptorCollection Columns { get { return _properties; } }
 
    /// <summary>
    /// Returns the underlying collection of elements for this source.
    /// </summary>
    /// <remarks>
    /// It is somewhat rare to see this implemented any other way than through
    /// an explicit implementation because you will generally want to
    /// expose a specific collection type to expose more advanced behavior.
 
    /// (We do the same here through the <see cref="Rows"/> property.
    /// </remarks>
    System.Collections.IList IListSource.GetList()
    {
        return _dataRows;
    }
 
    bool IListSource.ContainsListCollection
    {
        get { return false; }
    }
}
 
/// <summary>
/// The class that directly represents a collection of rows.
/// </summary>
/// <remarks>
/// You can either implement IList<T>/IList manually, or inherit from another
/// collection implementation. However, your particular use case will probably
/// only need or want to expose a subset of the functionality of those interfaces,
/// so it may be beneficial to implement the interfaces directly and hide some
/// collection methods through explicit implementation.
/// </remarks>
public class MyDataRowCollection
    : ITypedList, ReadOnlyCollection<MyDataRow>,
        IList<MyDataRow>, IList
{
    private readonly MyDataTable _parent;
 
    public MyDataRowCollection(MyDataTable parent, IList<MyDataRow> rows)
        : base(rows)
    {
        _parent = parent;
    }
 
    public MyDataTable ParentTable { get { return _parent; } }
 
    PropertyDescriptorCollection ITypedList.GetItemProperties(
            PropertyDescriptor[] listAccessors)
    {
        return _parent.Columns;
    }
 
    string ITypedList.GetListName(PropertyDescriptor[] listAccessors)
    {
        return null;
    }
}
 
/// <summary>
/// The class that represents the data in a "row".
/// </summary>
public class MyDataRow : ICustomTypeDescriptor
{
    private readonly MyDataRowCollection _parentCollection;
    private readonly object[] _rows;
 
    internal MyDataRow(MyDataRowCollection parentCollection)
    {
        _parentCollection = parentCollection;
    }
 
    public object this[string name]
    {
        get { return _rows[ColumnIndexFromName(name)]; }
        set { _rows[ColumnIndexFromName(name)] = value; }
    }
 
    private int ColumnIndexFromName(string name)
    {
        PropertyDescriptor pd = _parentCollection.ParentTable.Columns[name];
        if (pd != null)
        {
            return _parentCollection.ParentTable.Columns.IndexOf(pd);
        }
        return -1;
    }
 
    AttributeCollection ICustomTypeDescriptor.GetAttributes()
    {
        return AttributeCollection.Empty;
    }
 
    string ICustomTypeDescriptor.GetClassName()
    {
        return typeof(MyDataRow).FullName;
    }
 
    string ICustomTypeDescriptor.GetComponentName()
    {
        return typeof(MyDataRow).Name;
    }
 
    TypeConverter ICustomTypeDescriptor.GetConverter()
    {
        return null;
    }
 
    EventDescriptor ICustomTypeDescriptor.GetDefaultEvent()
    {
        return null;
    }
 
    PropertyDescriptor ICustomTypeDescriptor.GetDefaultProperty()
    {
        return null;
    }
 
    object ICustomTypeDescriptor.GetEditor(Type editorBaseType)
    {
        return null;
    }
 
    EventDescriptorCollection ICustomTypeDescriptor.GetEvents(Attribute[] attributes)
    {
        return EventDescriptorCollection.Empty;
    }
 
    EventDescriptorCollection ICustomTypeDescriptor.GetEvents()
    {
        return EventDescriptorCollection.Empty;
    }
 
    PropertyDescriptorCollection ICustomTypeDescriptor.GetProperties(Attribute[] attributes)
    {
        return ((ITypedList)_parentCollection).GetItemProperties(null);
    }
 
    PropertyDescriptorCollection ICustomTypeDescriptor.GetProperties()
    {
        return ((ITypedList)_parentCollection).GetItemProperties(null);
    }
 
    object ICustomTypeDescriptor.GetPropertyOwner(PropertyDescriptor pd)
    {
        return this;
    }
}
 
/// <summary>
/// The class that represents information about a "column".
/// </summary>
public class MyDataColumn : PropertyDescriptor
{
    private readonly Type _propertyType;
 
    public MyDataColumn(string name, Type type, Attribute[] attrs) : base(name, attrs)
    {
        _propertyType = type;
    }
 
    public override object GetValue(object component)
    {
        return ((MyDataRow)component)[this.Name];
    }
 
    public override void SetValue(object component, object value)
    {
        ((MyDataRow)component)[this.Name] = value;
    }
 
    public override bool IsReadOnly
    {
        get { return false; }
    }
 
    public override Type PropertyType
    {
        get { return _propertyType; }
    }
 
    public override Type ComponentType
    {
        get { return typeof(MyDataRow); }
    }
 
    public override void ResetValue(object component)
    {
        throw new NotSupportedException();
    }
 
    public override bool CanResetValue(object component)
    {
        return false;
    }
 
    public override bool ShouldSerializeValue(object component)
    {
        return false;
    }
}

Now Use It

There is a lot that got glossed over here, but it’s a very deep subject. It amazes me how few .NET developers know about and make use of this layer (either by consuming objects in this fashion or by publishing objects that conform to this API), especially because there is so much buy-in from third-party vendors and projects, and especially because this is one of the few components of the binding API that has not had to change much at all between Windows Forms and WPF.

If you are working on an infrastructure project of any kind that purports to expose data to a .NET GUI (Windows Forms or WPF), absolutely make use of this in order to keep both your view and your data model as flexible and independent as possible; you’ll be happy you did. —DKT

Of Magic and Properties in C# (and why Reflection isn’t the answer)

Reflection can be used to discover information about classes at runtime. And of course, you can use the Reflection API to cause all kinds of magic. You could iterate through a bunch of objects and output all of their property values. Or you could increment all the numeric properties by one. Or find all the properties tagged as [Bold] and draw them on the screen in a grid in a funny way. Or write each property-change value to a log file.

Considering the following class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
public class Order : DependencyObject, INotifyPropertyChanged
{
    public static readonly DependencyProperty SymbolProperty =
        DependencyProperty.Register(
            "Symbol", typeof(string), typeof(Order),
            new PropertyMetadata(null));
 
    private double _quantity;
    private OrderSide _side;
 
    public Order()
    {
        _quantity = 100.0;
    }
 
    public string Symbol
    {
        get { return (string)GetValue(SymbolProperty); }
        set { SetValue(SymbolProperty, value); }
    }
 
    [DefaultValue(100.0)]
    public double Quantity
    {
        get { return _quantity; }<br>
        set
        {
            if (_quantity != value)
            {
                _quantity = value;
                OnPropertyChanged(new PropertyChangedEventArgs("Quantity"));
            }
        }
    }
 
    [TypeConverter(typeof(OrderSideConverter))]
    public OrderSide Side
    {
        get { return _side; }
        set
        {
            if (_side != value)
            {
                _side = value;
                if (SideChanged != null)
                {
                    SideChanged(this, EventArgs.Empty);
                }
            }
        }
    }
 
    public event EventHandler SideChanged;
 
    public event PropertyChangedEventHandler PropertyChanged;
 
    protected virtual void OnPropertyChanged(PropertyChangedEventArgs e)
    {
        if (PropertyChanged != null)
        {
            PropertyChanged(this, e);
        }
    }
}

We could easily figure out how to abstractly interact with a class like this…right?

What, really, is a property?

I suppose you could say it’s just a list of optional attributes, optional modifiers, a type, a property name, and a getter, setter, or both. But much as we argue to our Java counterparts that a property is more than a simple wrapping of a get() and set() method, a full-fledged property is a bit more than its individual curly braces and access modifiers. One of the biggest aspects of a real property is change notification, which is never actually part of the property declaration*. And judging from the pathological Order class that started the post, there are a lot of ways to skin that notifying cat. And I haven’t even mentioned the “fake” properties in things like a System.Data.DataTable, where the Reflection API is of no real use at all.

At a minimum, really, a property is a field that is:

  • observable
  • gettable, and optionally settable (fields that are settable and not gettable are weird and we won’t talk about them any more)
  • otherwise describable through custom attributes that provide additional metadata

More than just a “property”, and definitely more than what you get back when you reflect over a type and pull out all of its System.Reflection.PropertyInfos. And, again, reflecting over a DataTable tells you nothing about schema information at all.

If only there were a way of describing a property…

…oh wait, there is. There’s the System.ComponentModel.PropertyDescriptor class:

Although it may seem redundant to have two classes to describe properties, this isn’t quite the same information as can be retrieved through reflection. There is less emphasis on being an object model for compiled code and more emphasis on providing tools for abstractly working with properties—for example, there is no concept of access levels (if you’re holding an instance of PropertyDescriptor, it might as well be public). Some concepts exposed as Attributes become first-level concepts in this API (BrowsableAttribute, DisplayNameAttribute, TypeConverterAttribute become IsBrowsable, DisplayName, and Converter, for example). There is also a linkage to the event model that provides change notifications through AddValueChanged and RemoveValueChanged.

You could probably picture being able to build a generic grid based on nothing more than a list of objects and a list of PropertyDescriptors about them; in fact, that’s exactly what any grid worth its .NET salt does—from interacting with POCOs to DataTables. And a good portion of this API was driven by the requirements of the original Windows Forms Designer, one of the most “abstract” views of all.

To get a list of PropertyDescriptors, call one of the overloads of TypeDescriptor.GetProperties. You always get back an appropriate PropertyDescriptor for the “type” of the property—dependency properties give you back instances of DependencyPropertyDescriptor, and internal subclasses of PropertyDescriptors are returned for other properties (and they’ll properly take into account implementations of the INotifyPropertyChanged interface). In fact, one of the common methods for listening for changes in dependency properties is really just taking advantage of the fact that the PropertyDescriptor API provides a way for arbitrary clients to listen for changes in property values whereas DependencyProperty does not.

This is also the appropriate API that you should use when working with object models abstractly. Reflection isn’t going to always provide you all of the information you need, and you have a lot more to self-assemble. Reflection is also the slowest of all possibilities. Although much of the work done through PropertyDescriptors uses reflection under the hood, it doesn’t necessarily need to: interacting with DependencyProperty through this API requires no reflection, and calling DependencyPropertyDescriptor.GetValue(object) is a heck of a lot faster than calling the corresponding PropertyInfo.GetValue(object).

PropertyDescriptor is also a facet of the appropriate API when building an abstract data model—a model where the properties are not known at compile-time and instead drive controls at run-time. Building views that work off of this abstract data model have been done to death and there is almost never a reason to build your own any more**. Instead, next time I’ll cover the road less-travelled, your very own abstract data model. —DKT

*It’s actually a bit of a shame that the property syntax doesn’t support a built-in event that is every much as part of the property declaration as the getter and setter. Part of what makes a PropertyDescriptor necessary is exactly that the property definition isn’t the one-stop shop for everything property-related.

**Off the top of my head, System.Windows.Forms.DataGridView, Syncfusion, Infragistics support binding in Windows Forms, the WPF Toolkit grid for WPF, and ChartFX does an admirable WPF-y job for your charting needs.

Dealing with Large Data Sets

Large data sets are increasingly becoming a reality of front-ends, GUIs and web apps alike. Users are getting more comfortable with tons of data, and are generating a lot of it too. Large data sets are as a much a reality as multi-threading—they’re both incredibly annoying to deal with, and they’re both not ignorable with today’s computers (and users).

Strategies

There are four basic strategies to dealing with large data sets: Ignoring the problem, Filtering, Paging, and Data Virtualization.

Ignoring the problem

Don’t do this unless your users never deal with more than ten things at a time. (Incidentally, I have never met anyone with an e-mail account with less than ten e-mails, or anyone with an iPod with ten songs—have you?)

Instant F. Why’d you bother writing the app in the first place? Your users are furious because they’re either spending large quantities of time staring at progress bars, or worse, a frozen screen.

Filtering

Filtering sometimes avoids the need for data virtualization, but this is usually nothing more than a band-aid for a broken arm—watch as your first user types in a query that somehow manages to return 90% of your data set and bring your app to its knees. In order for filtering to be effective, your user must be able and willing to provide enough criteria to narrow down the data set into something that’s not a “large data set”. From the user’s point of view, there is a very high up-front cost in using the UI—more often than not, “filtering” means twenty text boxes/dropdowns/checkboxes, all using names for fields that no one really understands. (“Hey, Bob—what do you suppose the “Type” dropdown is for? It’s got ‘PCX’, ‘AVW’, and ‘BOO’ in it.”) Those cryptic fields are not self-describing because the only thing that could help—the data itself—is locked behind a dizzying array of options.

This’ll get you anything from a C+ to a F, depending on the data set. Everything will be fine until That User types in “T” for a search query. (By the way, did you know “T” is the ticker symbol for AT&T?)

Paging

Implementing paging shows that you care about your users’ time. The GUI doesn’t feel overloaded or clunky and it’s a familiar concept to the user: everyone knows what “Page 5 of 752 (81–100, 15,035 items total)” means. It’s not a coincidence that most (all?) webmail apps show things in pages of 20 or 50 or so e-mails; the browser would choke over trying to render a <table> with thousands of rows and would provide for a very nasty user experience.

It’s not perfect, though:

  • Changing data. If you’re dealing with a data set where rows are frequently added/removed while the user is looking at it, paging is becomes a mediocre solution. You can then either keep the page boundaries stable (which requires at least the server to remember the full data set at the time of initial query) or ignore the problem (which would cause rows to randomly disappear or appear twice as the user Next Pages through your data set).
  • Slower user experience. Most operations are a request-reply back to the server—sorting, grouping, filtering, etc. Most clicks make the user wait.
  • No table scanning. There is something to be said for quickly scrolling through thousands of items, scanning for something when a search isn’t giving me back what I want. (Was it “Bob’s Restaurant”? Oh! I see it there on Row 452; I was actually looking for “Bill’s Burgers and Fries”.)
  • n-dimeinsional Data. Paging is very linear. If you have more than one dimension of data, paging is not very useful. I’m referring to every data set that looks better as a pretty graph.

This can get you as high as an A– (it definitely works for all the search engines) down to a C– (can you imagine what working with iTunes would be like if they made you “Click here for the next 20 results”?).

Data Virtualization

Make the user think you loaded everything. When they click the sort headers, they don’t know that you’re actually sucking out objects back from disk. When they’re whipping through the scrollbar, they have no idea that you’re grabbing loading the next 100 rows into memory.

The only downside? It’s obnoxiously difficult on most development platforms to get right.

What is virtualization?

Any developer who has worked with a grid control knows how important virtualization is for performance. The basic idea behind virtualization is very simple: if the user can’t see it, then the computer doesn’t need it. You can present the user with a giant scrollable expanse of cells; the grid will handle creating/dropping/recycling grid cells as necessary. Any grid control worth anything supports virtualization; even the basic, built-in ListBox and ListView in WPF (and Silverlight 3.0) support row-based virtualization. But in most simple binding situations, you generally need a collection of all of the data that you want to render for your grid loaded in memory.

For applications that don’t delve into large collections of data, there isn’t much point. The added complexity isn’t worth it. But that being said, a lot of commonly-used types of applications benefit from data virtualization:

  • Music library applications—think iTunes with tens of thousands of files of music, movies, etc.
  • Desktop mail applications—the inbox with thousands of e-mails
  • Google Maps—possibly the only web app that I can think of that shows off what data virtualization can truly look like

It would be really neat if Google could take the same approach with Mail as they do in Maps—a virtual table that doesn’t actually download all the mail, but instead lets you pan through the inbox using a fake scrollbar as if everything was already downloaded, but they don’t because it’s a pain in the neck and paging usually works just as well. (Or maybe they just haven’t thought of it yet. If you see that feature within the next few months or years, you can thank me for giving them the idea.)

A+, if you pull off the illusion flawlessly.

How do I implement it?

Most grid controls allow you to hook into their virtualization so that you could conceivably provide your own data virtualization to go along with the control virtualization native to the control, but there isn’t much out of the box in .NET to help you with the data side. There is no built-in “virtual” ICollection<T>, although there is nothing to stop someone from implementing one. Indeed, Beatriz Costa wrote about some data virtualization techniques that can be used in WPF and Silverlight, but they aren’t perfect.

Although .NET is a great platform for quickly building apps, Cocoa is a great platform for quickly building apps that can handle lots and lots of data. Cocoa on the Mac (and, as of iPhone OS 3.0, Cocoa Touch) supports data virtualization out-of-the-box using Core Data. Core Data provides a mechanism for defining a data model, and then it takes care of persistence of objects for you. On Cocoa Touch, NSFetchedResultsController acts as an intermediary between Core Data and your controls, essentially handling loading and unloading of data as the UI requires it. Given the memory constraints on the iPhone, Apple really had to provide a solid solution for data virtualization—keeping a list of even a few hundred moderately complicated objects could cause your app to run out of memory and crash.

Core Data uses SQLite under the hood*, and it is a very good solution for rolling your own data virtualization scheme. SQLite is an embedded database engine. There are no servers to install, no processes to run—it’s just a library that allows SQL access to a file on the file system. It has the semantics of a database and the semantics of a file, depending on which one is more convenient. System.Data.SQLite is an excellent ADO.NET provider that you can use in .NET (unfortunately, because it uses native code, it’s off-limits in Silverlight):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
using System.Data.SQLite;
 
public static class SQLiteTest
{
    public static void Main()
    {
        // colors.db is just a file that will be created in the
        // current directory if it doesn't exist
        using (var conn = new SQLiteConnection("Data Source=colors.db"))
        {
            conn.Open();
            using (IDbCommand cmd = conn.CreateCommand())
            {
                cmd.CommandText = "CREATE TABLE Colors(id INTEGER PRIMARY KEY, name TEXT)";
                cmd.ExecuteNonQuery();
 
                cmd.CommandText = "INSERT INTO Colors(name) VALUES(\"Red\")";
                cmd.ExecuteNonQuery();
 
                cmd.CommandText = "INSERT INTO Colors(name) VALUES(\"Green\")";
                cmd.ExecuteNonQuery();
 
                cmd.CommandText = "INSERT INTO Colors(name) VALUES(\"Blue\")";
                cmd.ExecuteNonQuery();
            }
            using (IDbCommand cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT name FROM Colors";
                using (IDataReader reader = cmd.ExecuteReader())
                {
                    System.Console.WriteLine(reader.Read(0));
                }
            }
        }
    }
}

Because the database is local, queries are fast from start to finish—no network latency issues here. And because it’s just a file, you can throw it away when you’re done. SQLite makes an excellent backbone to any data virtualization scheme because of its hybrid semantics (simplified random access like a database, convenience like a flat file).

*I’m purposely leaving out Core Data’s binary and XML serialization because they require the entire object graph to be loaded into memory, and if you’re going to do that, then what’s the point?

But large data sets aren’t that important…

Every mature application that I have ever been a part of has had to tackle the issue of larger data sets at some point. In my personal experience, thousands of entities (hundreds if you’re talking about images) is enough to be considered a “large data set” in that your application begins to visibly suffer.

I’d love to recommend data virtualization as the route to go for handling large data sets in .NET, but there really isn’t enough in the way of frameworks that provides an out-of-the-box solution. Implement paging unless you really need data virtualization. If you’re working on iPhone apps, drop everything and learn Core Data if you haven’t already. You’d be surprised at how much you don’t need to worry about. But regardless of what platform you’re developing a front-end on, you should always at least ask yourself how your GUI might respond to thousands of x. because one day it’s going to have to. —DKT