Uniqueness of Simple Demographics in the U.S. Population [Link]

Sometimes, math and statistics make me sad:

“It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person.”

It’s quite terrifying to know that a birthdate, a ZIP code, and a gender could potentially serve as a unique identifier for someone, given the vast and overreaching “anonymized” databases that exist out there.

http://privacy.cs.cmu.edu/dataprivacy/papers/LIDAP-WP4abstract.html

http://arstechnica.com/tech-policy/news/2009/09/your-secrets-live-online-in-databases-of-ruin.ars

—DKT

Building Custom Abstract Data Models in .NET

One of the easiest things to do in .NET is take either a list of vanilla C# objects or the contents of a System.Data.DataTable and show it in a grid through some kind of data binding. It was possible in the WinForms days with the System.Windows.Forms.DataGridView or a vendor-supplied grid (Syncfusion, Infragistics, etc.). It is still very much possible in WPF with the WPF Toolkit grid and any number of vendor-supplied controls (ChartFX gets special kudos for being able to “figure out” which columns/series are important and graph them by default).

But what if DataTables aren’t your thing, but vanilla C# objects aren’t “dynamic” enough for you? What if you need custom on-the-fly columns but don’t/can’t deal with the headaches of ADO.NET?

(You can skip straight to the Quick How-To if you’d like.)

Using Reflection to build a dynamic grid (don’t do this)

Let’s step back for a bit and consider the vanilla C# object case. Let’s say I wanted to render a list of these objects without using the automagic abilities that seem to be provided in every grid:

public class Order
{
    public string Symbol { get; }
    public long Quantity { get; }
}

A first stab at this might be to use the Reflection API:

public class OrderGrid : UserControl
{
    private void InitializeCustomProperties()
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();
 
        // start the awful vendor-specific binding between the
        // reflection pieces and the grid
        foreach (PropertyInfo pi in properties)
        {
            myGrid.Columns.Add(pi.Name);
        }
    }
 
    private void _myGrid_CellValueRequired(object sender, MyGridCellEventArgs e)
    {
        string columnName = myGrid.Columns[e.ColumnIndex];
        PropertyInfo[] properties = typeof(Order).GetProperties();
        PropertyInfo pi = Array.Find(properties, p => p.Name == columnName);
        if (pi == null) return;
 
        var orders = (List<Order>)myGrid.ItemsSource;
        Order order = orders[e.RowIndex];
 
        // now read the actual property value
        e.Value = pi.GetValue(order);
        e.Handled = true;
    }
}

Seems simple and harmless enough, right? Now let’s layer on a slight air of dynamic data—something that seemingly necessitates a reflection-based solution:

public class Order
{
    public string Symbol { get; }
    public long Quantity { get; }
    public IList ExtendedAttributes { get; }
}
 
public class OrderGrid : UserControl
{
    private void InitializeCustomProperties()
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();
 
        // start the awful vendor-specific binding between the
        // reflection pieces and the grid
        foreach (PropertyInfo pi in properties)
        {
            if (pi.Name != "ExtendedAttributes")
            {
                myGrid.Columns.Add(pi.Name);
            }
        }
    }
 
    private void _myGrid_CellValueRequired(object sender, MyGridCellEventArgs e)
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();
 
        // only the first n - 1 columns represent reflection properties;
        // the last few columns come from ExtendedAttributes
        if (e.ColumnIndex < (properties.Length - 1))
        {
            string columnName = myGrid.Columns[e.ColumnIndex];
            PropertyInfo pi = Array.Find(properties, p => p.Name == columnName);
            if (pi == null) return;
 
            var orders = (List<Order>)myGrid.ItemsSource;
            Order order = orders[e.RowIndex];
 
            // now read the actual property value
            e.Value = pi.GetValue(order);
            e.Handled = true;
        }
        else
        {
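            // the remaining columns index into ExtendedAttributes by position
            var orders = (List<Order>)myGrid.ItemsSource;
            Order order = orders[e.RowIndex];
            e.Value = order.ExtendedAttributes[e.ColumnIndex - (properties.Length - 1)];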
            e.Handled = true;
        }
    }
}

It’s starting to get a little nastier now—as the Order class grows and its “ExtendedAttributes” properties get more complex, we end up doing more and more in the view to render this complex object properly—there is no metadata and no reflecting over “ExtendedAttributes”. We don’t know name or type information at the very least, and we certainly can’t decorate the items of the list with an Attribute like we could with a “real” C# property. There is no assurance that every Order’s ExtendedAttributes properties will behave the same way—we’re naïvely using index positioning to uniquely identify a property; what if my Order objects are coming from all over the place?

And how do DataTables always render so well? How does every grid under the sun know how to properly pull schema information from a DataTable using specific properties like DataTable.Columns and the appropriate accessors on every individual row? Does it really just boil down to this:

if (source.GetType() == typeof(DataTable))
{
    // DataTable specific code
}
else
{
    // a pile of ugly Reflection code 
}

And then what about the Windows Forms Designer? There are all sorts of properties that don’t show up, or show up with slightly different names, or slightly different characteristics than what one would expect through only reflection. Is there more switch/case hackery at work here?

The looks-like-a-DataTable-but-isn’t Approach (better)

It turns out that this problem is already solved, partially because of the original Windows Forms Designer, and partially because of DataTables. A rich API exists for allowing objects to describe themselves, and for lists of objects to publish information about their contents, even if the list is empty. System.Data.DataTable implements several interfaces, but the most important one for purposes of our discussion is System.ComponentModel.IListSource, which contains an interesting method:

  • GetList: Returns an IList that can be bound to a data source from an object that does not implement an IList itself.

On DataTable, this returns an instance of DataView, which according to the documentation, “Represents a databindable, customized view of a DataTable for sorting, filtering, searching, editing, and navigation.” It turns out that a DataView is what is typically rendered by a grid. We can think of it as an IList of DataRow objects (it’s actually a list of DataRowView, but that’s not terribly important here), which structurally is much more similar to a list of boring C# objects.

There are a few other important interfaces that DataView implements other than IList (and ICollection and IEnumerable): ITypedList to provide schema information on the constituent rows (especially when the list is empty, because then we can’t ask the rows themselves) and IBindingList, which allows views to control sorting and searching and to listen for list changes (filtering comes from the related IBindingListView, which DataView also implements).

It turns out that if you implement a few of these key interfaces, you get all this magic for free, too.
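To see how little a consumer has to know, here is a minimal sketch of what a grid might do with any IListSource. The SchemaDemo class and DescribeColumns method are names made up for this illustration; the interface calls themselves (IListSource.GetList and ITypedList.GetItemProperties) are the real ones discussed above.

```csharp
using System;
using System.Collections;
using System.ComponentModel;
using System.Data;

public static class SchemaDemo
{
    // Returns the column names a grid could discover for any IListSource
    // without looking at a single row. (Hypothetical helper.)
    public static string[] DescribeColumns(IListSource source)
    {
        IList list = source.GetList();      // for a DataTable, this is a DataView
        ITypedList typed = list as ITypedList;
        if (typed == null)
        {
            return new string[0];           // no schema published by the list
        }

        PropertyDescriptorCollection props = typed.GetItemProperties(null);
        string[] names = new string[props.Count];
        for (int i = 0; i < props.Count; i++)
        {
            names[i] = props[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        DataTable table = new DataTable("Orders");
        table.Columns.Add("Symbol", typeof(string));
        table.Columns.Add("Quantity", typeof(long));

        // The table has no rows, yet the schema is still available.
        Console.WriteLine(string.Join(", ", DescribeColumns(table)));
    }
}
```

Nothing in DescribeColumns mentions DataTable; any class that implements the same two interfaces gets the same treatment.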

The Quick How-To

  1. (Optional) Implement System.ComponentModel.IListSource on your “main” data class—you’d only need to do this if you didn’t want your main class to directly represent your “list” of data.
  2. Create a list class. It should provide implementation for a few interfaces:
    • System.ComponentModel.ITypedList.
      Your implementation of ITypedList.GetItemProperties(PropertyDescriptor[]) should return a PropertyDescriptorCollection that contains the “properties” of the rows of your data set. This interface is required when your list is empty so that a way of providing object metadata exists without resorting to querying the contents of the list.
    • System.Collections.IList.
      This is a hard requirement. Dealing with generic types generically is usually not straightforward and can sometimes require reflection; even to this day, a lot of code in WPF looks for (and makes use of) this interface.
    • System.Collections.Generic.IList<T>.
      I haven’t seen too much in the way of generic components requiring an implementation of this interface, but it’s a good idea to implement regardless—it’ll be easier to work with LINQ and other APIs that expect generic collections.
    • System.Collections.Specialized.INotifyCollectionChanged (optional).
      This is the same interface that classes like ObservableCollection<T> use to propagate their contents changes in WPF. (If you need Windows Forms support, you’ll have to instead implement…)
    • System.ComponentModel.IBindingList (optional).
      Implement this interface only if you need backwards compatibility with Windows Forms. It’s a bit of a hybrid interface, incorporating aspects of both ICollection and ICollectionView, but as a result of that, you’ll effectively limit clients to one view on your collection.
  3. (Optional) Implement System.ComponentModel.ICustomTypeDescriptor on your “row” data class. I usually implement this by having each of my “rows” retain a reference to the “list”, and merely calling ITypedList.GetItemProperties(PropertyDescriptor[]) on the list. Implementing something once takes less time than implementing something twice.

    I like to implement ICustomTypeDescriptor on the row because it provides a context-insensitive way of retrieving properties of the row. If someone created a System.Collections.Generic.List<T> containing my “rows”, this list is still eligible for rendering in a grid without any additional work (except, of course, when the list is empty, because then there would be no way of providing schema information).

  4. Create a subclass of System.ComponentModel.PropertyDescriptor (read this if you don’t know what this class is for) for each property that you want your rows to have. You will want/need to implement a few important properties and methods:
    • public Type PropertyType { get; } (override required)
      The type of the property. If you want the type of values contained in the column to be completely wide open, then return typeof(object).
    • public TypeConverter Converter { get; } (override optionally)
      The converter that is used in conjunction with the property. For real properties, this would be defined through the usage of a [System.ComponentModel.TypeConverterAttribute] attribute on the property’s type or the property itself.
    • public Type ComponentType { get; } (override required)
      The type that defines the property. This is essentially your “row” class.
    • public object GetValue(object component); (override required)
      This is probably the most important method; this is the method that tells the system how to fetch a value from your row for the “property”.
    • public void SetValue(object component, object value); (override required)
      You can override this to set the value of this property on your row; otherwise, you can throw NotSupportedException if you want.
    • public bool IsReadOnly { get; } (override required)
      Return true or false, depending on whether or not the property should be publicly modifiable.
    • public bool CanResetValue(object component); (override required)
      When you’re using the Windows Forms Designer, this is the method that determines whether or not you can “reset” the value of the property to some default value. Other visualizers could make use of it too, but I haven’t seen too many that take advantage of it.
    • public void ResetValue(object component); (override required)
      This actually implements the resetting of a value as mentioned above. Either implement it or throw NotSupportedException.
    • public bool ShouldSerializeValue(object component); (override required)
      The Windows Forms Designer uses this method to determine whether or not a property’s value should be serialized. If a property was set to its default value, this method would return false because there would be no reason to actually persist the value of the property (since reapplying it wouldn’t result in a change).

The Quick Example

/// <summary>
/// The class that serves as the "entry point" for your data model.
/// </summary>
/// <remarks>
/// It's not always necessary to create a class like this, but it can be
/// helpful from a design standpoint. Here, like the DataTable, I expose
/// a collection of "columns" that client APIs can use for their own
/// purposes.
/// </remarks>
public class MyDataTable : IListSource
{
    private readonly MyDataRowCollection _dataRows;
    private readonly PropertyDescriptorCollection _properties;
 
    public MyDataTable(...)
    {
        // MyDataRowCollection expects its parent table first, then the rows
        _dataRows = new MyDataRowCollection(this, rows);
        _properties = columns;
    }
 
    public MyDataRowCollection Rows { get { return _dataRows; } }
 
    public PropertyDescriptorCollection Columns { get { return _properties; } }
 
    /// <summary>
    /// Returns the underlying collection of elements for this source.
    /// </summary>
    /// <remarks>
    /// It is somewhat rare to see this implemented any other way than through
    /// an explicit implementation because you will generally want to
    /// expose a specific collection type to expose more advanced behavior.
    ///
    /// (We do the same here through the <see cref="Rows"/> property.)
    /// </remarks>
    System.Collections.IList IListSource.GetList()
    {
        return _dataRows;
    }
 
    bool IListSource.ContainsListCollection
    {
        get { return false; }
    }
}
 
/// <summary>
/// The class that directly represents a collection of rows.
/// </summary>
/// <remarks>
/// You can either implement IList<T>/IList manually, or inherit from another
/// collection implementation. However, your particular use case will probably
/// only need or want to expose a subset of the functionality of those interfaces,
/// so it may be beneficial to implement the interfaces directly and hide some
/// collection methods through explicit implementation.
/// </remarks>
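// The base class must be listed first; ReadOnlyCollection<T> already
// implements IList<T> and IList, so they need not be restated.
public class MyDataRowCollection
    : ReadOnlyCollection<MyDataRow>, ITypedList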
{
    private readonly MyDataTable _parent;
 
    public MyDataRowCollection(MyDataTable parent, IList<MyDataRow> rows)
        : base(rows)
    {
        _parent = parent;
    }
 
    public MyDataTable ParentTable { get { return _parent; } }
 
    PropertyDescriptorCollection ITypedList.GetItemProperties(
            PropertyDescriptor[] listAccessors)
    {
        return _parent.Columns;
    }
 
    string ITypedList.GetListName(PropertyDescriptor[] listAccessors)
    {
        return null;
    }
}
 
/// <summary>
/// The class that represents the data in a "row".
/// </summary>
public class MyDataRow : ICustomTypeDescriptor
{
    private readonly MyDataRowCollection _parentCollection;
    private readonly object[] _values;

    internal MyDataRow(MyDataRowCollection parentCollection)
    {
        _parentCollection = parentCollection;
        // one slot per column defined on the parent table
        _values = new object[parentCollection.ParentTable.Columns.Count];
    }

    public object this[string name]
    {
        get { return _values[ColumnIndexFromName(name)]; }
        set { _values[ColumnIndexFromName(name)] = value; }
    }
 
    private int ColumnIndexFromName(string name)
    {
        PropertyDescriptor pd = _parentCollection.ParentTable.Columns[name];
        if (pd != null)
        {
            return _parentCollection.ParentTable.Columns.IndexOf(pd);
        }
        return -1;
    }
 
    AttributeCollection ICustomTypeDescriptor.GetAttributes()
    {
        return AttributeCollection.Empty;
    }
 
    string ICustomTypeDescriptor.GetClassName()
    {
        return typeof(MyDataRow).FullName;
    }
 
    string ICustomTypeDescriptor.GetComponentName()
    {
        return typeof(MyDataRow).Name;
    }
 
    TypeConverter ICustomTypeDescriptor.GetConverter()
    {
        return null;
    }
 
    EventDescriptor ICustomTypeDescriptor.GetDefaultEvent()
    {
        return null;
    }
 
    PropertyDescriptor ICustomTypeDescriptor.GetDefaultProperty()
    {
        return null;
    }
 
    object ICustomTypeDescriptor.GetEditor(Type editorBaseType)
    {
        return null;
    }
 
    EventDescriptorCollection ICustomTypeDescriptor.GetEvents(Attribute[] attributes)
    {
        return EventDescriptorCollection.Empty;
    }
 
    EventDescriptorCollection ICustomTypeDescriptor.GetEvents()
    {
        return EventDescriptorCollection.Empty;
    }
 
    PropertyDescriptorCollection ICustomTypeDescriptor.GetProperties(Attribute[] attributes)
    {
        return ((ITypedList)_parentCollection).GetItemProperties(null);
    }
 
    PropertyDescriptorCollection ICustomTypeDescriptor.GetProperties()
    {
        return ((ITypedList)_parentCollection).GetItemProperties(null);
    }
 
    object ICustomTypeDescriptor.GetPropertyOwner(PropertyDescriptor pd)
    {
        return this;
    }
}
 
/// <summary>
/// The class that represents information about a "column".
/// </summary>
public class MyDataColumn : PropertyDescriptor
{
    private readonly Type _propertyType;
 
    public MyDataColumn(string name, Type type, Attribute[] attrs) : base(name, attrs)
    {
        _propertyType = type;
    }
 
    public override object GetValue(object component)
    {
        return ((MyDataRow)component)[this.Name];
    }
 
    public override void SetValue(object component, object value)
    {
        ((MyDataRow)component)[this.Name] = value;
    }
 
    public override bool IsReadOnly
    {
        get { return false; }
    }
 
    public override Type PropertyType
    {
        get { return _propertyType; }
    }
 
    public override Type ComponentType
    {
        get { return typeof(MyDataRow); }
    }
 
    public override void ResetValue(object component)
    {
        throw new NotSupportedException();
    }
 
    public override bool CanResetValue(object component)
    {
        return false;
    }
 
    public override bool ShouldSerializeValue(object component)
    {
        return false;
    }
}

Now Use It

There is a lot that got glossed over here, but it’s a very deep subject. It amazes me how few .NET developers know about and make use of this layer (either by consuming objects in this fashion or by publishing objects that conform to this API), especially because there is so much buy-in from third-party vendors and projects, and especially because this is one of the few components of the binding API that has not had to change much at all between Windows Forms and WPF.
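For the consuming side, here is a compact sketch of what a grid (or any other client) sees. BagRow and BagProperty are hypothetical names invented for this example, and as a shortcut the row derives from System.ComponentModel.CustomTypeDescriptor instead of implementing every member of ICustomTypeDescriptor by hand as the example above does. The consumer only ever calls TypeDescriptor.GetProperties and PropertyDescriptor.GetValue.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical row whose "properties" live in a dictionary.
public class BagRow : CustomTypeDescriptor
{
    private readonly Dictionary<string, object> _values =
        new Dictionary<string, object>();

    public object this[string name]
    {
        get { object v; return _values.TryGetValue(name, out v) ? v : null; }
        set { _values[name] = value; }
    }

    // Publish one descriptor per key currently stored in the row.
    public override PropertyDescriptorCollection GetProperties()
    {
        var props = new List<PropertyDescriptor>();
        foreach (string key in _values.Keys)
        {
            props.Add(new BagProperty(key));
        }
        return new PropertyDescriptorCollection(props.ToArray());
    }

    public override PropertyDescriptorCollection GetProperties(Attribute[] attributes)
    {
        return GetProperties();
    }

    public override object GetPropertyOwner(PropertyDescriptor pd)
    {
        return this;
    }
}

// Hypothetical descriptor that reads and writes through the row's indexer.
public class BagProperty : PropertyDescriptor
{
    public BagProperty(string name) : base(name, null) { }

    public override Type ComponentType { get { return typeof(BagRow); } }
    public override Type PropertyType { get { return typeof(object); } }
    public override bool IsReadOnly { get { return false; } }

    public override object GetValue(object component)
    {
        return ((BagRow)component)[Name];
    }

    public override void SetValue(object component, object value)
    {
        ((BagRow)component)[Name] = value;
    }

    public override bool CanResetValue(object component) { return false; }
    public override void ResetValue(object component) { throw new NotSupportedException(); }
    public override bool ShouldSerializeValue(object component) { return false; }
}

public static class BagDemo
{
    public static void Main()
    {
        var row = new BagRow();
        row["Symbol"] = "MSFT";
        row["Quantity"] = 100L;

        // The consumer never mentions BagRow's indexer or its dictionary.
        foreach (PropertyDescriptor pd in TypeDescriptor.GetProperties(row))
        {
            Console.WriteLine("{0} = {1}", pd.Name, pd.GetValue(row));
        }
    }
}
```

The order in which the columns come back follows the dictionary's enumeration order, which is not something to rely on; a real model would keep an explicit column list the way MyDataTable does.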

If you are working on an infrastructure project of any kind that purports to expose data to a .NET GUI (Windows Forms or WPF), absolutely make use of this in order to keep both your view and your data model as flexible and independent as possible; you’ll be happy you did. —DKT

Of Magic and Properties in C# (and why Reflection isn’t the answer)

Reflection can be used to discover information about classes at runtime. And of course, you can use the Reflection API to cause all kinds of magic. You could iterate through a bunch of objects and output all of their property values. Or you could increment all the numeric properties by one. Or find all the properties tagged as [Bold] and draw them on the screen in a grid in a funny way. Or write each property-change value to a log file.

Consider the following class:

public class Order : DependencyObject, INotifyPropertyChanged
{
    public static readonly DependencyProperty SymbolProperty =
        DependencyProperty.Register(
            "Symbol", typeof(string), typeof(Order),
            new PropertyMetadata(null));
 
    private double _quantity;
    private OrderSide _side;
 
    public Order()
    {
        _quantity = 100.0;
    }
 
    public string Symbol
    {
        get { return (string)GetValue(SymbolProperty); }
        set { SetValue(SymbolProperty, value); }
    }
 
    [DefaultValue(100.0)]
    public double Quantity
    {
        get { return _quantity; }
        set
        {
            if (_quantity != value)
            {
                _quantity = value;
                OnPropertyChanged(new PropertyChangedEventArgs("Quantity"));
            }
        }
    }
 
    [TypeConverter(typeof(OrderSideConverter))]
    public OrderSide Side
    {
        get { return _side; }
        set
        {
            if (_side != value)
            {
                _side = value;
                if (SideChanged != null)
                {
                    SideChanged(this, EventArgs.Empty);
                }
            }
        }
    }
 
    public event EventHandler SideChanged;
 
    public event PropertyChangedEventHandler PropertyChanged;
 
    protected virtual void OnPropertyChanged(PropertyChangedEventArgs e)
    {
        if (PropertyChanged != null)
        {
            PropertyChanged(this, e);
        }
    }
}

We could easily figure out how to abstractly interact with a class like this…right?

What, really, is a property?

I suppose you could say it’s just a list of optional attributes, optional modifiers, a type, a property name, and a getter, setter, or both. But much as we argue to our Java counterparts that a property is more than a simple wrapping of a get() and set() method, a full-fledged property is a bit more than its individual curly braces and access modifiers. One of the biggest aspects of a real property is change notification, which is never actually part of the property declaration*. And judging from the pathological Order class that started the post, there are a lot of ways to skin that notifying cat. And I haven’t even mentioned the “fake” properties in things like a System.Data.DataTable, where the Reflection API is of no real use at all.

At a minimum, really, a property is a field that is:

  • observable
  • gettable, and optionally settable (fields that are settable and not gettable are weird and we won’t talk about them any more)
  • otherwise describable through custom attributes that provide additional metadata

More than just a “property”, and definitely more than what you get back when you reflect over a type and pull out all of its System.Reflection.PropertyInfos. And, again, reflecting over a DataTable tells you nothing about schema information at all.

If only there were a way of describing a property…

…oh wait, there is. There’s the System.ComponentModel.PropertyDescriptor class.

Although it may seem redundant to have two classes to describe properties, this isn’t quite the same information as can be retrieved through reflection. There is less emphasis on being an object model for compiled code and more emphasis on providing tools for abstractly working with properties—for example, there is no concept of access levels (if you’re holding an instance of PropertyDescriptor, it might as well be public). Some concepts exposed as Attributes become first-level concepts in this API (BrowsableAttribute, DisplayNameAttribute, TypeConverterAttribute become IsBrowsable, DisplayName, and Converter, for example). There is also a linkage to the event model that provides change notifications through AddValueChanged and RemoveValueChanged.
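As a small illustration of attributes becoming first-level concepts, the sketch below reads DisplayName, IsBrowsable, and Converter straight off the descriptors. The Trade class and its members are made up for this example.

```csharp
using System;
using System.ComponentModel;

// Hypothetical class decorated with the attributes mentioned above.
public class Trade
{
    [DisplayName("Ticker Symbol")]
    public string Symbol { get; set; }

    [Browsable(false)]
    public long InternalId { get; set; }

    public long Quantity { get; set; }
}

public static class DescriptorDemo
{
    public static void Main()
    {
        // No attribute lookups here; the descriptor has already
        // folded the attributes into ordinary properties.
        foreach (PropertyDescriptor pd in TypeDescriptor.GetProperties(typeof(Trade)))
        {
            Console.WriteLine(
                "{0}: display name '{1}', browsable {2}, converter {3}",
                pd.Name, pd.DisplayName, pd.IsBrowsable,
                pd.Converter.GetType().Name);
        }
    }
}
```

A property without a [DisplayName] attribute simply reports its own name as its display name, so a view can always ask for DisplayName without caring whether the attribute was applied.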

You could probably picture being able to build a generic grid based on nothing more than a list of objects and a list of PropertyDescriptors about them; in fact, that’s exactly what any grid worth its .NET salt does—from interacting with POCOs to DataTables. And a good portion of this API was driven by the requirements of the original Windows Forms Designer, one of the most “abstract” views of all.
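A toy version of that generic grid fits in a few lines. This is only a sketch: TinyGrid, RenderRow, and PlainOrder are hypothetical names, and a real grid would cache descriptors and honor IsBrowsable, converters, and so on.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;

// Hypothetical plain object with compiled properties.
public class PlainOrder
{
    public string Symbol { get; set; }
    public long Quantity { get; set; }
}

public static class TinyGrid
{
    // Formats one row using only PropertyDescriptors: no type checks
    // and no direct System.Reflection calls in this code.
    public static string RenderRow(object row)
    {
        var cells = new List<string>();
        foreach (PropertyDescriptor pd in TypeDescriptor.GetProperties(row))
        {
            cells.Add(pd.Name + "=" + pd.GetValue(row));
        }
        return string.Join(", ", cells.ToArray());
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Symbol", typeof(string));
        table.Columns.Add("Quantity", typeof(long));
        table.Rows.Add("MSFT", 100L);

        // A POCO and a DataRowView go through exactly the same code path.
        Console.WriteLine(RenderRow(new PlainOrder { Symbol = "IBM", Quantity = 50 }));
        Console.WriteLine(RenderRow(table.DefaultView[0]));
    }
}
```

The DataRowView case works because DataRowView implements ICustomTypeDescriptor and reports the table's columns as its properties.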

To get a list of PropertyDescriptors, call one of the overloads of TypeDescriptor.GetProperties. You always get back an appropriate PropertyDescriptor for the “type” of the property—dependency properties give you back instances of DependencyPropertyDescriptor, and internal subclasses of PropertyDescriptors are returned for other properties (and they’ll properly take into account implementations of the INotifyPropertyChanged interface). In fact, one of the common methods for listening for changes in dependency properties is really just taking advantage of the fact that the PropertyDescriptor API provides a way for arbitrary clients to listen for changes in property values whereas DependencyProperty does not.
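Here is a minimal sketch of listening for changes through the descriptor rather than through the object's own events. Quote is a hypothetical class that only implements INotifyPropertyChanged; the subscription goes through PropertyDescriptor.AddValueChanged.

```csharp
using System;
using System.ComponentModel;

// Hypothetical class with ordinary INotifyPropertyChanged plumbing.
public class Quote : INotifyPropertyChanged
{
    private double _price;

    public double Price
    {
        get { return _price; }
        set
        {
            if (_price == value) return;
            _price = value;
            PropertyChangedEventHandler handler = PropertyChanged;
            if (handler != null)
            {
                handler(this, new PropertyChangedEventArgs("Price"));
            }
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;
}

public static class ChangeDemo
{
    // Counts how many times Price reports a change across two assignments.
    public static int CountPriceChanges(Quote quote, double first, double second)
    {
        PropertyDescriptor price = TypeDescriptor.GetProperties(quote)["Price"];
        int changes = 0;
        EventHandler onChanged = (sender, e) => changes++;

        price.AddValueChanged(quote, onChanged);
        quote.Price = first;
        quote.Price = second;
        price.RemoveValueChanged(quote, onChanged);

        return changes;
    }

    public static void Main()
    {
        // The second assignment repeats the value, so Quote raises nothing for it.
        Console.WriteLine(CountPriceChanges(new Quote(), 10.5, 10.5));
    }
}
```

The same AddValueChanged call is what you would use on a DependencyPropertyDescriptor; the caller does not need to know which notification mechanism sits behind the property.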

This is also the appropriate API that you should use when working with object models abstractly. Reflection isn’t going to always provide you all of the information you need, and you have a lot more to self-assemble. Reflection is also the slowest of all possibilities. Although much of the work done through PropertyDescriptors uses reflection under the hood, it doesn’t necessarily need to: interacting with DependencyProperty through this API requires no reflection, and calling DependencyPropertyDescriptor.GetValue(object) is a heck of a lot faster than calling the corresponding PropertyInfo.GetValue(object).

PropertyDescriptor is also a facet of the appropriate API when building an abstract data model: a model where the properties are not known at compile-time and instead drive controls at run-time. Building views that work off of this abstract data model has been done to death, and there is almost never a reason to build your own any more**. Instead, next time I’ll cover the road less travelled: your very own abstract data model. —DKT

*It’s actually a bit of a shame that the property syntax doesn’t support a built-in event that is every bit as much a part of the property declaration as the getter and setter. Part of what makes a PropertyDescriptor necessary is exactly that the property definition isn’t the one-stop shop for everything property-related.

**Off the top of my head: System.Windows.Forms.DataGridView, Syncfusion, and Infragistics support binding in Windows Forms; the WPF Toolkit grid does the same for WPF; and ChartFX does an admirable WPF-y job for your charting needs.

Finding all of the bindings on an object in WPF

Disclaimer: Don’t do it. Whatever brought you here, chances are you’re doing something wrong, because you’re most likely about to do something very gnarly that you shouldn’t be doing with your view and a whole pile of code-behind. You could make your viewmodel more intelligent, you could implement IDataErrorInfo or otherwise use an attached behavior and put your errors on the viewmodel where they belong—chances are, there is something that you could be doing that won’t require you to need to identify all of the bindings on an object.
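For reference, the IDataErrorInfo route looks roughly like this. It is only a sketch: IngredientViewModel and its validation rule are invented for illustration, while IDataErrorInfo itself lives in System.ComponentModel.

```csharp
using System;
using System.ComponentModel;

// Hypothetical viewmodel that reports its own validation errors.
public class IngredientViewModel : IDataErrorInfo
{
    public string IngredientName { get; set; }

    // Object-level error; this sketch only reports per-property errors.
    public string Error
    {
        get { return null; }
    }

    // Per-property error; null or an empty string means "no error".
    public string this[string columnName]
    {
        get
        {
            if (columnName == "IngredientName" &&
                string.IsNullOrEmpty(IngredientName))
            {
                return "An ingredient needs a name.";
            }
            return null;
        }
    }

    public static void Main()
    {
        var vm = new IngredientViewModel();
        Console.WriteLine(vm["IngredientName"]);   // the error message
        vm.IngredientName = "Flour";
        Console.WriteLine(vm["IngredientName"] == null);
    }
}
```

On the XAML side, setting ValidatesOnDataErrors=True on the binding asks WPF to consult this indexer, so the error state lives on the viewmodel and no code-behind has to go hunting for bindings.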

That being said, here’s how it’s done.

public static Dictionary<DependencyProperty, BindingBase> GetAllBindings(DependencyObject d)
{
    if (d == null)
    {
        throw new ArgumentNullException("d");
    }
 
    var bindings = new Dictionary<DependencyProperty, BindingBase>();
    var lve = d.GetLocalValueEnumerator();
    while (lve.MoveNext())
    {
        DependencyProperty dp = lve.Current.Property;
        var expr = BindingOperations.GetBindingBase(d, dp);
        if (expr != null)
        {
            bindings.Add(dp, expr);
        }
    }
    return bindings;
}

If you are doing (or are thinking about doing) anything fancy with reflection to determine all properties and values on an object, you may want to consider using DependencyObjects instead. You get everything that DependencyObjects give you for free, and you can do everything without resorting to slow (and often obfuscating) reflection. —DKT
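As a rough sketch of that idea, the same GetLocalValueEnumerator call used above can list every locally set property and its value on a DependencyObject with no reflection at all. OrderModel and LocalValueDump are hypothetical names, and the code assumes a reference to WindowsBase.

```csharp
using System;
using System.Collections.Generic;
using System.Windows;

// Hypothetical model built on dependency properties.
public class OrderModel : DependencyObject
{
    public static readonly DependencyProperty SymbolProperty =
        DependencyProperty.Register("Symbol", typeof(string), typeof(OrderModel));

    public static readonly DependencyProperty QuantityProperty =
        DependencyProperty.Register("Quantity", typeof(double), typeof(OrderModel),
            new PropertyMetadata(100.0));

    public string Symbol
    {
        get { return (string)GetValue(SymbolProperty); }
        set { SetValue(SymbolProperty, value); }
    }

    public double Quantity
    {
        get { return (double)GetValue(QuantityProperty); }
        set { SetValue(QuantityProperty, value); }
    }
}

public static class LocalValueDump
{
    // Collects the name and value of every locally set dependency property.
    public static Dictionary<string, object> GetLocalValues(DependencyObject d)
    {
        var values = new Dictionary<string, object>();
        LocalValueEnumerator lve = d.GetLocalValueEnumerator();
        while (lve.MoveNext())
        {
            LocalValueEntry entry = lve.Current;
            values[entry.Property.Name] = entry.Value;
        }
        return values;
    }

    public static void Main()
    {
        var order = new OrderModel { Symbol = "MSFT" };
        foreach (KeyValuePair<string, object> pair in GetLocalValues(order))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

Note that only locally set values are enumerated: Quantity is still at its default here, so it does not show up until something assigns it.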

Bindable Validation Errors

Wouldn’t it be great if you could do this in XAML:

<TextBox Text="{Binding Path=IngredientName}"
         Validation.Error="{Binding Path=IngredientNameError}"/>

Turns out you can:

<TextBox Text="{Binding Path=IngredientName}"
         vh:ValidationHelper.Error="{Binding Path=IngredientNameError}"/>

using System;
using System.Globalization;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
 
namespace Pelebyte.ValidationUtils
{
  /// <summary>
  /// Provides validation helper methods and utilities.
  /// </summary>
  public static class ValidationHelper
  {
    /// <summary>
    /// The <see cref="DependencyProperty"/> that identifies the
    /// <see cref="P:ValidationHelper.Error"/> attached property.
    /// </summary>
    public static readonly DependencyProperty ErrorProperty =
      DependencyProperty.RegisterAttached(
        "Error", typeof(string), typeof(ValidationHelper),
        new PropertyMetadata(null, OnErrorChanged));
 
    /// <summary>
    /// Gets the custom error applied to the control.
    /// </summary>
    /// <param name="d">
    /// The control to get the current custom error for.
    /// </param>
    /// <returns>
    /// A string that represents the custom error.
    /// </returns>
    public static string GetError(DependencyObject d)
    {
      return (string)d.GetValue(ErrorProperty);
    }
 
    /// <summary>
    /// Sets the custom error applied to the control.
    /// </summary>
    /// <param name="d">
    /// The control to set the current custom error for.
    /// </param>
    /// <param name="value">
    /// A string that represents the custom error. An empty or
    /// <c>null</c> string clears the error on the field.
    /// </param>
    public static void SetError(DependencyObject d, string value)
    {
      d.SetValue(ErrorProperty, value);
    }
 
    /// <summary>
    /// Called when the <see cref="P:ValidationHelper.Error"/>
    /// attached property changes value.
    /// </summary>
    /// <param name="d">
    /// The <see cref="DependencyObject"/> that is having its
    /// <see cref="P:ValidationHelper.Error"/> property changing
    /// values.
    /// </param>
    /// <param name="e">
    /// The <see cref="DependencyPropertyChangedEventArgs"/>
    /// instance containing the event data.
    /// </param>
    private static void OnErrorChanged(
      DependencyObject d, DependencyPropertyChangedEventArgs e)
    {
      BindingExpressionBase expr;

      expr = BindingOperations.GetBindingExpressionBase(
        d, BindingErrorTargetProperty);

      if ((expr == null) && (e.NewValue != null))
      {
        // create a new binding between two properties that
        // we're only going to use so that we have an avenue
        // of our own to attach binding errors
        Binding b = new Binding();
        b.Source = d;
        b.Path = new PropertyPath(BindingErrorSourceProperty);
        b.Mode = BindingMode.OneWayToSource;

        b.ValidationRules.Add(new InternalRule(d));

        expr = BindingOperations.SetBinding(
          d, BindingErrorTargetProperty, b);
      }

      if (expr != null)
      {
        // push the (ignored) target value back to the source so
        // that InternalRule runs and picks up the current error
        expr.UpdateSource();
      }
    }
 
    /// <summary>
    /// The internal implementation of <see cref="ValidationRule"/>
    /// that returns our real "error" whenever we want.
    /// </summary>
    private sealed class InternalRule : ValidationRule
    {
      private readonly DependencyObject _d;
 
      /// <summary>
      /// Initializes a new instance of the
      /// <see cref="InternalRule"/> class specific to a
      /// particular object. The
      /// <see cref="P:ValidationHelper.Error"/> property of the
      /// given object will be used to determine the error on the
      /// object.
      /// </summary>
      /// <param name="d">
      /// The <see cref="DependencyObject"/> to return errors
      /// for.
      /// </param>
      public InternalRule(DependencyObject d)
      {
        _d = d;
      }
 
      public override ValidationResult Validate(
          object value, CultureInfo cultureInfo)
      {
        // completely ignore /value/ and look for the error
        // on the DependencyObject that was given to us in
        // our constructor
        string error = GetError(_d);
 
        if (string.IsNullOrEmpty(error))
        {
          // an empty or null string means no error
          return ValidationResult.ValidResult;
        }
        else
        {
          // anything else means an error
          return new ValidationResult(false, error);
        }
      }
    }
 
    // two private dependency properties that we use internally to
    // set up our useless binding
 
    private static readonly DependencyProperty
      BindingErrorSourceProperty =
        DependencyProperty.RegisterAttached(
          "BindingErrorSource", typeof(object),
          typeof(ValidationHelper),
          new PropertyMetadata(null));
 
    private static readonly DependencyProperty
      BindingErrorTargetProperty =
        DependencyProperty.RegisterAttached(
          "BindingErrorTarget", typeof(object),
          typeof(ValidationHelper),
          new PropertyMetadata(null));
  }
}

Why it works

The System.Windows.Controls.Validation.Errors property is a collection for a reason—it’s a collection of all of the binding errors on the object.

For most controls, it’s not readily apparent that more than one binding on the same control could actually fail:

<TextBox Text="{Binding Path=IngredientName, ValidatesOnDataErrors=True}"/>

But if you had a complex control where more than one property was controlled directly by the user, it’s more obvious why you’d possibly need a collection instead of a single object:

<!-- this slider has two thumbs; the user drags both of
     them around to specify a range -->
<my:DoubleSlider
    MinValue="{Binding Path=Minimum, ValidatesOnDataErrors=True}"
    MaxValue="{Binding Path=Maximum, ValidatesOnDataErrors=True}"/>

If both of these properties had errors, WPF would collect both of them in Validation.Errors.

So the ValidationHelper code above is essentially emulating the following XAML snippet in C# (note that this snippet isn’t actually valid XAML as written; that’s why we’re writing this code in C#):

<TextBox x:Name="MyTextBox"
         Text="{Binding Path=IngredientName, ValidatesOnDataErrors=True}">
  <vh:ValidationHelper.BindingErrorTarget>
    <Binding Source="MyTextBox"
             Path="(vh:ValidationHelper.BindingErrorSource)"
             Mode="OneWayToSource">
      <vh:ValidationHelper+InternalRule (MyTextBox)>
    </Binding>
  </vh:ValidationHelper.BindingErrorTarget>
</TextBox>

(It may help at this point to open another window with my little sketch of how data flows through a WPF data binding.)

Our hidden BindingErrorTarget property participates in validation just like a regular property. So when the value of ValidationHelper.Error is changed:

  1. We force the binding on ValidationHelper.BindingErrorTarget to be re-evaluated from the target back to the source (by calling BindingExpression.UpdateSource()).
  2. Our validation rule InternalRule gets called as part of the normal validation process; our rule will return ValidationHelper.Error instead of validating the incoming value.

It doesn’t actually matter what the values of the BindingErrorTarget and BindingErrorSource properties are; they only exist so that we can key into the binding system.

Why?

IDataErrorInfo and ValidatesOnDataErrors would seemingly make this technique redundant: why go through all this trouble to expose a binding site for errors on the viewmodel when you could just implement IDataErrorInfo?

  • IDataErrorInfo is only consulted when the source property changes value—if you have a data source whose errors can dynamically change independently of the source property, there isn’t a clean way from the viewmodel to force the view to pick up your changes in the error. (If your source implements INotifyPropertyChanged, you can raise PropertyChanged for the relevant property, but if you use DependencyObjects, there is no way to force the binding system to re-evaluate the property from the viewmodel—you’d need the BindingExpression, which then requires your viewmodel to have knowledge of the view).
  • It may be inconvenient or impossible from a design standpoint to have the object containing your error property also implement IDataErrorInfo.
    If your error comes from a different object than your viewmodel, then your implementation of IDataErrorInfo would need to know where to fetch it.
  • If you ever had a situation where you had a ValidationRule that you wanted to bind to, you’ve probably discovered that it’s never going to happen—ValidationRule, not being a DependencyObject, doesn’t support binding—not in code, and certainly not in XAML. An attached behavior or subclass (you never need to subclass in WPF) is really your only recourse for situations like this.
  • You don’t want to rely on .NET 3.0 SP1 or .NET 3.5 SP1. Thankfully, Windows 7 comes out of the box with it, but Vista, and certainly XP, do not. This technique works with every version of WPF.
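The INotifyPropertyChanged workaround mentioned in the first bullet can be sketched roughly as follows. This is a minimal illustration, not code from the original post: the class name IngredientViewModel and the method SetIngredientNameError are hypothetical, and IngredientName is borrowed from the XAML snippets above. The idea is that re-raising PropertyChanged for a property whose value has not changed prompts a binding with ValidatesOnDataErrors=True to read the property again and consult the IDataErrorInfo indexer again.

```csharp
using System;
using System.ComponentModel;

// Hypothetical viewmodel: the error for IngredientName can change
// independently of the IngredientName value itself.
public sealed class IngredientViewModel : INotifyPropertyChanged, IDataErrorInfo
{
    private string _ingredientName = "";
    private string _ingredientNameError; // null or empty means "no error"

    public event PropertyChangedEventHandler PropertyChanged;

    public string IngredientName
    {
        get { return _ingredientName; }
        set
        {
            _ingredientName = value;
            OnPropertyChanged("IngredientName");
        }
    }

    // Called by whatever external logic decides the field is (in)valid.
    public void SetIngredientNameError(string error)
    {
        _ingredientNameError = error;

        // The value did not change, but announcing the property again
        // gives bindings a reason to re-read it and re-query the indexer.
        OnPropertyChanged("IngredientName");
    }

    // IDataErrorInfo: object-level error (unused in this sketch).
    public string Error
    {
        get { return null; }
    }

    // IDataErrorInfo: per-property error lookup.
    public string this[string columnName]
    {
        get { return columnName == "IngredientName" ? _ingredientNameError : null; }
    }

    private void OnPropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
```

This only works because the source is a plain object raising its own notifications; a DependencyObject source has no equivalent hook, which is the gap the attached-property technique fills.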

Remember that any DependencyProperty that is the target of a binding can hold validation errors; also remember that you can add attached properties to any DependencyObject. That means you can add arbitrary validation errors to any DependencyObject through this trick. You can also drive this error generation off of whatever you want—I chose the simplest example and created an attached property specifically to hold an error that will be reported, unchanged, right back through the binding system. You could just as easily create a new attached behavior in which the TextBox.Text property and the TextBox.TextChanged event drive the error; that would give you validation on the text of the TextBox without having to provide an instance of ValidationRule. —DKT


In the interests of not cluttering the picture, I left out a little magic trick:

  • We could attach a converter to the binding where IValueConverter.ConvertBack() returns Binding.DoNothing; this would stop data from ever flowing to the source.
  • Then we could actually drop the BindingErrorSource property and reuse an existing one (like Tag or something), knowing that our binding will never actually change the value or otherwise interfere with it.

And why use two properties when you could get away with just one?
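The converter half of that trick might look something like the sketch below. The class name DoNothingConverter is hypothetical; IValueConverter and Binding.DoNothing are the standard WPF types. Returning Binding.DoNothing from ConvertBack tells the binding engine to leave the source untouched, so the source property (Tag, say) would never be overwritten by our error-only binding.

```csharp
using System;
using System.Globalization;
using System.Windows.Data;

// Hypothetical converter for the error-only binding: it lets values
// flow to the target unchanged but never writes back to the source.
public sealed class DoNothingConverter : IValueConverter
{
    public object Convert(
        object value, Type targetType, object parameter, CultureInfo culture)
    {
        // Source-to-target direction: pass the value through as-is.
        return value;
    }

    public object ConvertBack(
        object value, Type targetType, object parameter, CultureInfo culture)
    {
        // Target-to-source direction: ask the binding engine to do nothing,
        // so the source property keeps whatever value it already had.
        return Binding.DoNothing;
    }
}
```

Assuming it were used with ValidationHelper, it would be assigned to the binding in OnErrorChanged (b.Converter = new DoNothingConverter();) before the binding is set.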

Sep 24: Updated the OnErrorChanged to fix a bug that would cause the error state to never actually clear…whoops! –DKT

Data Flow in a WPF BindingExpression

I was writing up some posts on how ValidationRules work, and of course, I ended up confusing myself, because data binding can get confusing once you start throwing in IValueConverters, ValidationRules, target objects, source objects…yikes.

So I decided to sketch out a simple diagram for my own benefit. And after 15 minutes, I realized it wasn’t so simple:





[Diagram: data flow in a WPF BindingExpression]

Hope it helps make WPF binding less confusing… —DKT

Dealing with Large Data Sets

Large data sets are increasingly becoming a reality of front-ends, GUIs and web apps alike. Users are getting more comfortable with tons of data, and are generating a lot of it too. Large data sets are as much a reality as multi-threading—they’re both incredibly annoying to deal with, and neither can be ignored with today’s computers (and users).

Strategies

There are four basic strategies to dealing with large data sets: Ignoring the problem, Filtering, Paging, and Data Virtualization.

Ignoring the problem

Don’t do this unless your users never deal with more than ten things at a time. (Incidentally, I have never met anyone with an e-mail account with fewer than ten e-mails, or anyone with an iPod with only ten songs—have you?)

Instant F. Why’d you bother writing the app in the first place? Your users are furious because they’re either spending large quantities of time staring at progress bars, or worse, a frozen screen.

Filtering

Filtering sometimes avoids the need for data virtualization, but this is usually nothing more than a band-aid for a broken arm—watch as your first user types in a query that somehow manages to return 90% of your data set and bring your app to its knees. In order for filtering to be effective, your user must be able and willing to provide enough criteria to narrow down the data set into something that’s not a “large data set”. From the user’s point of view, there is a very high up-front cost in using the UI—more often than not, “filtering” means twenty text boxes/dropdowns/checkboxes, all using names for fields that no one really understands. (“Hey, Bob—what do you suppose the “Type” dropdown is for? It’s got ‘PCX’, ‘AVW’, and ‘BOO’ in it.”) Those cryptic fields are not self-describing because the only thing that could help—the data itself—is locked behind a dizzying array of options.

This’ll get you anything from a C+ to an F, depending on the data set. Everything will be fine until That User types in “T” for a search query. (By the way, did you know “T” is the ticker symbol for AT&T?)

Paging

Implementing paging shows that you care about your users’ time. The GUI doesn’t feel overloaded or clunky and it’s a familiar concept to the user: everyone knows what “Page 5 of 752 (81–100, 15,035 items total)” means. It’s not a coincidence that most (all?) webmail apps show things in pages of 20 or 50 or so e-mails; the browser would choke over trying to render a <table> with thousands of rows and would provide for a very nasty user experience.

It’s not perfect, though:

  • Changing data. If you’re dealing with a data set where rows are frequently added/removed while the user is looking at it, paging becomes a mediocre solution. You can then either keep the page boundaries stable (which requires at least the server to remember the full data set at the time of initial query) or ignore the problem (which would cause rows to randomly disappear or appear twice as the user Next Pages through your data set).
  • Slower user experience. Most operations are a request-reply back to the server—sorting, grouping, filtering, etc. Most clicks make the user wait.
  • No table scanning. There is something to be said for quickly scrolling through thousands of items, scanning for something when a search isn’t giving me back what I want. (Was it “Bob’s Restaurant”? Oh! I see it there on Row 452; I was actually looking for “Bill’s Burgers and Fries”.)
  • n-dimensional data. Paging is very linear. If you have more than one dimension of data, paging is not very useful. I’m referring to every data set that looks better as a pretty graph.

This can get you as high as an A– (it definitely works for all the search engines) down to a C– (can you imagine what working with iTunes would be like if they made you “Click here for the next 20 results”?).
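The “Page 5 of 752 (81–100, 15,035 items total)” label quoted earlier falls out of a few lines of integer arithmetic. Here is a small sketch of it; the Paging class and its Describe method are names made up for this illustration, and the page size of 20 is just the one implied by that example.

```csharp
using System;
using System.Globalization;

public static class Paging
{
    // Builds a "Page X of Y (first-last, N items total)" label.
    // Pages are 1-based; pageSize must be positive.
    public static string Describe(int page, int pageSize, int totalItems)
    {
        // Round up so a partial last page still counts as a page.
        int totalPages = (totalItems + pageSize - 1) / pageSize;

        // 1-based positions of the first and last rows on this page.
        int first = (page - 1) * pageSize + 1;
        int last = Math.Min(page * pageSize, totalItems);

        return string.Format(
            CultureInfo.InvariantCulture,
            "Page {0} of {1} ({2}-{3}, {4:N0} items total)",
            page, totalPages, first, last, totalItems);
    }
}
```

Paging.Describe(5, 20, 15035) gives "Page 5 of 752 (81-100, 15,035 items total)". The last page (752) holds only 15 rows, which is why the upper bound is clamped with Math.Min.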

Data Virtualization

Make the user think you loaded everything. When they click the sort headers, they don’t know that you’re actually pulling objects back in from disk. When they’re whipping through the scrollbar, they have no idea that you’re loading the next 100 rows into memory.

The only downside? It’s obnoxiously difficult on most development platforms to get right.

What is virtualization?

Any developer who has worked with a grid control knows how important virtualization is for performance. The basic idea behind virtualization is very simple: if the user can’t see it, then the computer doesn’t need it. You can present the user with a giant scrollable expanse of cells; the grid will handle creating/dropping/recycling grid cells as necessary. Any grid control worth anything supports virtualization; even the basic, built-in ListBox and ListView in WPF (and Silverlight 3.0) support row-based virtualization. But in most simple binding situations, you generally need a collection of all of the data that you want to render for your grid loaded in memory.
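To make “if the user can’t see it, then the computer doesn’t need it” concrete, here is a rough sketch of the calculation a virtualizing grid has to do on every scroll: work out which rows intersect the viewport and only create cells for those. It assumes fixed-height rows, and the RowVirtualization class and VisibleRows method are hypothetical names, not part of any real grid control.

```csharp
using System;

public static class RowVirtualization
{
    // Computes the inclusive range of row indexes that are at least
    // partly visible, assuming every row has the same height.
    public static void VisibleRows(
        double scrollOffset, double viewportHeight, double rowHeight,
        int rowCount, out int first, out int last)
    {
        // The row containing the top edge of the viewport.
        first = Math.Max(0, (int)Math.Floor(scrollOffset / rowHeight));

        // The row containing the bottom edge, clamped to the last row.
        int bottom =
            (int)Math.Ceiling((scrollOffset + viewportHeight) / rowHeight) - 1;
        last = Math.Min(rowCount - 1, bottom);
    }
}
```

With 15,035 rows of 20 pixels each, a 400-pixel viewport scrolled to offset 9,040 needs rows 452 through 471: twenty rows’ worth of cells, no matter how long the list is.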

For applications that don’t delve into large collections of data, there isn’t much point. The added complexity isn’t worth it. But that being said, a lot of commonly-used types of applications benefit from data virtualization:

  • Music library applications—think iTunes with tens of thousands of files of music, movies, etc.
  • Desktop mail applications—the inbox with thousands of e-mails
  • Google Maps—possibly the only web app that I can think of that shows off what data virtualization can truly look like

It would be really neat if Google could take the same approach with Mail as they do in Maps—a virtual table that doesn’t actually download all the mail, but instead lets you pan through the inbox using a fake scrollbar as if everything was already downloaded. They don’t, because it’s a pain in the neck and paging usually works just as well. (Or maybe they just haven’t thought of it yet. If you see that feature within the next few months or years, you can thank me for giving them the idea.)

A+, if you pull off the illusion flawlessly.

How do I implement it?

Most grid controls allow you to hook into their virtualization so that you could conceivably provide your own data virtualization to go along with the control virtualization native to the control, but there isn’t much out of the box in .NET to help you with the data side. There is no built-in “virtual” ICollection<T>, although there is nothing to stop someone from implementing one. Indeed, Beatriz Costa wrote about some data virtualization techniques that can be used in WPF and Silverlight, but they aren’t perfect.
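To give a feel for what a “virtual” collection could look like, here is a deliberately minimal sketch. It is not one of the techniques from the posts mentioned above, and VirtualList<T> is a made-up name: a read-only IList<T> that knows its total count up front and asks a caller-supplied delegate for one page of items the first time any index on that page is touched. A real implementation would also need to evict old pages, fetch asynchronously, and (for WPF controls) implement the non-generic IList; none of that is shown here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch of a read-only list that loads its items lazily,
// one page at a time, through a caller-supplied fetch delegate.
public sealed class VirtualList<T> : IList<T>
{
    private readonly int _count;
    private readonly int _pageSize;
    private readonly Func<int, int, IList<T>> _fetchPage; // (offset, limit)
    private readonly Dictionary<int, IList<T>> _pages =
        new Dictionary<int, IList<T>>();

    public VirtualList(
        int count, int pageSize, Func<int, int, IList<T>> fetchPage)
    {
        _count = count;
        _pageSize = pageSize;
        _fetchPage = fetchPage;
    }

    public int Count { get { return _count; } }
    public bool IsReadOnly { get { return true; } }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= _count)
                throw new ArgumentOutOfRangeException("index");

            int page = index / _pageSize;
            IList<T> items;
            if (!_pages.TryGetValue(page, out items))
            {
                // First touch of this page: load it and remember it.
                // (No eviction here; a real list would cap the cache.)
                items = _fetchPage(page * _pageSize, _pageSize);
                _pages[page] = items;
            }
            return items[index % _pageSize];
        }
        set { throw new NotSupportedException(); }
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Enumerating touches every page, so it loads everything.
        for (int i = 0; i < _count; i++) yield return this[i];
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public void CopyTo(T[] array, int arrayIndex)
    {
        for (int i = 0; i < _count; i++) array[arrayIndex + i] = this[i];
    }

    // Searching and mutation are out of scope for this sketch.
    public int IndexOf(T item) { throw new NotSupportedException(); }
    public bool Contains(T item) { throw new NotSupportedException(); }
    public void Add(T item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public void Insert(int index, T item) { throw new NotSupportedException(); }
    public bool Remove(T item) { throw new NotSupportedException(); }
    public void RemoveAt(int index) { throw new NotSupportedException(); }
}
```

Reading list[452] loads only the page that contains row 452; reading list[453] afterwards is served from memory without another fetch.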

Although .NET is a great platform for quickly building apps, Cocoa is a great platform for quickly building apps that can handle lots and lots of data. Cocoa on the Mac (and, as of iPhone OS 3.0, Cocoa Touch) supports data virtualization out-of-the-box using Core Data. Core Data provides a mechanism for defining a data model, and then it takes care of persistence of objects for you. On Cocoa Touch, NSFetchedResultsController acts as an intermediary between Core Data and your controls, essentially handling loading and unloading of data as the UI requires it. Given the memory constraints on the iPhone, Apple really had to provide a solid solution for data virtualization—keeping a list of even a few hundred moderately complicated objects could cause your app to run out of memory and crash.

Core Data uses SQLite under the hood*, and it is a very good solution for rolling your own data virtualization scheme. SQLite is an embedded database engine. There are no servers to install, no processes to run—it’s just a library that allows SQL access to a file on the file system. It has the semantics of a database and the semantics of a file, depending on which one is more convenient. System.Data.SQLite is an excellent ADO.NET provider that you can use in .NET (unfortunately, because it uses native code, it’s off-limits in Silverlight):

using System;
using System.Data;
using System.Data.SQLite;

public static class SQLiteTest
{
    public static void Main()
    {
        // colors.db is just a file that will be created in the
        // current directory if it doesn't exist
        using (var conn = new SQLiteConnection("Data Source=colors.db"))
        {
            conn.Open();
            using (IDbCommand cmd = conn.CreateCommand())
            {
                // IF NOT EXISTS keeps a second run from failing
                // when colors.db already has the table
                cmd.CommandText = "CREATE TABLE IF NOT EXISTS Colors(id INTEGER PRIMARY KEY, name TEXT)";
                cmd.ExecuteNonQuery();

                // SQL string literals take single quotes
                cmd.CommandText = "INSERT INTO Colors(name) VALUES('Red')";
                cmd.ExecuteNonQuery();

                cmd.CommandText = "INSERT INTO Colors(name) VALUES('Green')";
                cmd.ExecuteNonQuery();

                cmd.CommandText = "INSERT INTO Colors(name) VALUES('Blue')";
                cmd.ExecuteNonQuery();
            }
            using (IDbCommand cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT name FROM Colors";
                using (IDataReader reader = cmd.ExecuteReader())
                {
                    // Read() advances one row at a time and
                    // returns false once the rows run out
                    while (reader.Read())
                    {
                        Console.WriteLine(reader.GetString(0));
                    }
                }
            }
        }
    }
}

Because the database is local, queries are fast from start to finish—no network latency issues here. And because it’s just a file, you can throw it away when you’re done. SQLite makes an excellent backbone to any data virtualization scheme because of its hybrid semantics (simplified random access like a database, convenience like a flat file).
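As a sketch of how that might back a data virtualization scheme, the query below pulls just one window of rows out of the Colors table from the example above, using SQLite’s LIMIT and OFFSET clauses. The ColorPager class and FetchNames method are hypothetical names for this illustration; the calls themselves are ordinary System.Data.SQLite ones.

```csharp
using System.Collections.Generic;
using System.Data.SQLite;

public static class ColorPager
{
    // Returns up to `limit` color names, starting `offset` rows into
    // the table (ordered by id so that windows are stable).
    public static List<string> FetchNames(
        SQLiteConnection conn, int offset, int limit)
    {
        var names = new List<string>();
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText =
                "SELECT name FROM Colors ORDER BY id " +
                "LIMIT @limit OFFSET @offset";
            cmd.Parameters.AddWithValue("@limit", limit);
            cmd.Parameters.AddWithValue("@offset", offset);

            using (var reader = cmd.ExecuteReader())
            {
                // Only the requested window is ever materialized.
                while (reader.Read())
                {
                    names.Add(reader.GetString(0));
                }
            }
        }
        return names;
    }
}
```

A method like this is the kind of thing a virtual list would call as the user scrolls: the UI asks for rows 100 through 199, and only those rows ever leave the database file.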

*I’m purposely leaving out Core Data’s binary and XML serialization because they require the entire object graph to be loaded into memory, and if you’re going to do that, then what’s the point?

But large data sets aren’t that important…

Every mature application that I have ever been a part of has had to tackle the issue of larger data sets at some point. In my personal experience, thousands of entities (hundreds if you’re talking about images) is enough to be considered a “large data set” in that your application begins to visibly suffer.

I’d love to recommend data virtualization as the route to go for handling large data sets in .NET, but there really isn’t enough in the way of frameworks that provides an out-of-the-box solution. Implement paging unless you really need data virtualization. If you’re working on iPhone apps, drop everything and learn Core Data if you haven’t already. You’d be surprised at how much you don’t need to worry about. But regardless of what platform you’re developing a front-end on, you should always at least ask yourself how your GUI might respond to thousands of x, because one day it’s going to have to. —DKT

Mark Buehrle is looking for more records to break

A no-hitter, a perfect game, and 45 consecutive batters retired without reaching base. He went 49 consecutive starts going six or more innings (ended on a boneheaded ejection with two outs in the 6th in Baltimore after hitting a batter—I’m never going to forget what a demented call that was). And he does it with an 87 mph fastball. Who needs a power arm when you can have a crafty lefty who knows what he’s doing?

Baseball wasn’t all that big where I grew up, so I didn’t really follow the sport until I moved to Chicago near where the White Sox play. I decided that if I was going to be stuck in traffic every summer when the Sox were in town, I might as well root them on. Buehrle pitched the first baseball game I ever saw live, and—big surprise—the White Sox won. It’s a shock when he loses, and it’s a joy when he’s on the mound.

Congrats, Mr. Buehrle: you show the rest of us what determination and effort can get you. It truly is incredible… —DKT

A Hello World Order Entry app in WPF

So you’ve decided that Windows Forms is more trouble than it’s worth. You’re all set to start writing a WPF app. You fire up Visual Studio 2008, create a new solution, and see absolutely nothing about this fancy MVVM that seems to have taken the world by storm. No standard template—just a “WPF Application” that, when you start it, gives you the same blank Window that you used to get when you were creating Windows Forms apps. Really?

To be fair, I don’t think even the gang in Redmond realized just how far the community would run with MVVM, and a lot of what we now think of as staples of MVVM design simply hadn’t been thought of when Microsoft was trying to rush VS 2008 out the door. Visual Studio 2010 will provide better out-of-the-box guidance in creating a well-factored WPF app, but that’s not here quite yet.

Until then, give Order Entry a whirl. It’s a simple app with a simple purpose—provide a simple order entry ticket (buy/sell, symbol, price, and quantity). It sets up the basics of defining a view and a viewmodel that exposes the properties of a model in a way more conducive to consumption by the view.

It illustrates:

  1. Binding a view to a viewmodel
  2. Separation of the view (visual elements) from the viewmodel (data, non-visual resources)
  3. A simple ICommand implementation that disables/enables the Submit button
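For readers who haven’t written one before, an ICommand of the kind mentioned in item 3 can be as small as the sketch below. This is not the code from the Order Entry download; DelegateCommand is a hypothetical name, and the delegates stand in for whatever the viewmodel uses to decide whether an order can be submitted. A Button bound to such a command enables or disables itself based on CanExecute.

```csharp
using System;
using System.Windows.Input;

// Hypothetical minimal command: the viewmodel supplies what to do
// and a predicate that says whether doing it is currently allowed.
public sealed class DelegateCommand : ICommand
{
    private readonly Action _execute;
    private readonly Func<bool> _canExecute;

    public DelegateCommand(Action execute, Func<bool> canExecute)
    {
        if (execute == null) throw new ArgumentNullException("execute");
        if (canExecute == null) throw new ArgumentNullException("canExecute");
        _execute = execute;
        _canExecute = canExecute;
    }

    // Controls such as Button listen to this to refresh IsEnabled.
    public event EventHandler CanExecuteChanged;

    public bool CanExecute(object parameter)
    {
        return _canExecute();
    }

    public void Execute(object parameter)
    {
        // Guard against callers that skip the CanExecute check.
        if (_canExecute())
        {
            _execute();
        }
    }

    // The viewmodel calls this whenever an input to the predicate
    // changes (for example, when the quantity is edited).
    public void RaiseCanExecuteChanged()
    {
        EventHandler handler = CanExecuteChanged;
        if (handler != null)
        {
            handler(this, EventArgs.Empty);
        }
    }
}
```

A viewmodel would expose it as a property (say, SubmitCommand) and call RaiseCanExecuteChanged() from its property setters so that the Submit button keeps up with the ticket’s state.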

It’s not perfect because it lacks two things that every real GUI has (which we’ll fix in later posts):

  1. Threading. When you press Submit, all the “work” happens on the GUI thread.
  2. Validation. When you type something in the view that is not physically storable in the viewmodel (like entering in “Abracadabra” as a quantity), the screen changes to indicate an error on the field, but it doesn’t stop the user from submitting (the viewmodel and its ICommand don’t know that the view is in an error state—only the view knows).

Validation and WPF data binding are tricky to get right. We’ll talk about that in later posts too. But for now, here’s version 1 of the Order Entry app to chew on. —DKT