Uniqueness of Simple Demographics in the U.S. Population

Sometimes, math and statistics make me sad:

“It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person.”

It’s quite terrifying to know that something as simple as a birthdate, a ZIP code, and a gender could serve as a unique identifier for someone, given the vast and overreaching “anonymized” databases that exist out there.
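Here is a minimal sketch of the idea in C#, the language used elsewhere on this blog. It groups records by the {ZIP, gender, date of birth} triple and counts the people who are alone in their group, because those are exactly the records that the triple alone re-identifies. The `Person` type, the `Uniqueness.CountUnique` helper, and any sample data you feed it are hypothetical. The sketch makes no attempt to reproduce the study's numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type holding only the three quasi-identifiers.
public sealed class Person
{
    public Person(string zip, char gender, DateTime birthDate)
    {
        Zip = zip;
        Gender = gender;
        BirthDate = birthDate;
    }

    public string Zip { get; }
    public char Gender { get; }
    public DateTime BirthDate { get; }
}

public static class Uniqueness
{
    // Counts people whose (ZIP, gender, date of birth) combination appears
    // exactly once, i.e. people that combination alone singles out.
    public static int CountUnique(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => (p.Zip, p.Gender, p.BirthDate.Date))
            .Count(g => g.Count() == 1);
    }
}
```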


http://privacy.cs.cmu.edu/dataprivacy/papers/LIDAP-WP4abstract.html

http://arstechnica.com/tech-policy/news/2009/09/your-secrets-live-online-in-databases-of-ruin.ars

—DKT

Building Custom Abstract Data Models in .NET

One of the easiest things to do in .NET is to take either a list of vanilla C# objects or the contents of a System.Data.DataTable and show it in a grid through data binding. It was possible in the WinForms days with System.Windows.Forms.DataGridView or a vendor-supplied grid (Syncfusion, Infragistics, etc.). It is still very much possible in WPF with the WPF Toolkit grid and any number of third-party controls (ChartFX gets special kudos for being able to “figure out” which columns/series are important and graph them by default).

But what if DataTables aren’t your thing, yet vanilla C# objects aren’t “dynamic” enough for you? What if you need custom on-the-fly columns but don’t want to (or can’t) deal with the headaches of ADO.NET?

(You can skip straight to the Quick How-To if you’d like.)

Using Reflection to build a dynamic grid (don’t do this)

Let’s step back for a bit and consider the vanilla C# object case. Let’s say I wanted to render a list of these objects without using the automagic abilities that seem to be provided in every grid:

public class Order
{
    public string Symbol { get; }
    public long Quantity { get; }
}

A first stab at this might be to use the Reflection API:

using System;
using System.Collections.Generic;
using System.Reflection;

// UserControl is your UI framework's base class; myGrid, its Columns
// collection, and MyGridCellEventArgs stand in for a vendor grid API.
public class OrderGrid : UserControl
{
    private void InitializeCustomProperties()
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();

        // start the awful vendor-specific binding between the
        // reflection pieces and the grid
        foreach (PropertyInfo pi in properties)
        {
            myGrid.Columns.Add(pi.Name);
        }
    }

    private void _myGrid_CellValueRequired(object sender, MyGridCellEventArgs e)
    {
        string columnName = myGrid.Columns[e.ColumnIndex];
        PropertyInfo[] properties = typeof(Order).GetProperties();
        PropertyInfo pi = Array.Find(properties, p => p.Name == columnName);
        if (pi == null) return;

        var orders = (List<Order>)myGrid.ItemsSource;
        Order order = orders[e.RowIndex];

        // now read the actual property value
        e.Value = pi.GetValue(order);
        e.Handled = true;
    }
}

Seems simple and harmless enough, right? Now let’s layer on a slight air of dynamic data—something that seemingly necessitates a reflection-based solution:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

public class Order
{
    public string Symbol { get; }
    public long Quantity { get; }
    public IList ExtendedAttributes { get; }
}

public class OrderGrid : UserControl
{
    private void InitializeCustomProperties()
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();

        // start the awful vendor-specific binding between the
        // reflection pieces and the grid
        foreach (PropertyInfo pi in properties)
        {
            if (pi.Name != "ExtendedAttributes")
            {
                myGrid.Columns.Add(pi.Name);
            }
        }
    }

    private void _myGrid_CellValueRequired(object sender, MyGridCellEventArgs e)
    {
        PropertyInfo[] properties = typeof(Order).GetProperties();
        var orders = (List<Order>)myGrid.ItemsSource;
        Order order = orders[e.RowIndex];

        // only the first n - 1 columns represent reflection properties;
        // the remaining columns come from ExtendedAttributes
        int reflectedCount = properties.Length - 1;
        if (e.ColumnIndex < reflectedCount)
        {
            string columnName = myGrid.Columns[e.ColumnIndex];
            PropertyInfo pi = Array.Find(properties, p => p.Name == columnName);
            if (pi == null) return;

            // now read the actual property value
            e.Value = pi.GetValue(order);
        }
        else
        {
            // nothing but position tells us which extended attribute this is
            e.Value = order.ExtendedAttributes[e.ColumnIndex - reflectedCount];
        }
        e.Handled = true;
    }
}

It’s starting to get a little nastier now. As the Order class grows and its “ExtendedAttributes” get more complex, we end up doing more and more work in the view just to render this object properly, because there is no metadata to reflect over in “ExtendedAttributes”. At the very least, we don’t know the name or type of each extended attribute, and we certainly can’t decorate the items of the list with an Attribute the way we could with a “real” C# property. Nor is there any assurance that every Order’s ExtendedAttributes will behave the same way: we’re naïvely using index position to identify a property, so what happens if my Order objects are coming from all over the place?

And how do DataTables always render so well? How does every grid under the sun know how to properly pull schema information from a DataTable using specific properties like DataTable.Columns and the appropriate accessors on every individual row? Does it really just boil down to this:

if (source.GetType() == typeof(DataTable))
{
    // DataTable specific code
}
else
{
    // a pile of ugly Reflection code 
}

And then what about the Windows Forms Designer? There are all sorts of properties that don’t show up, or that show up with slightly different names or characteristics than reflection alone would suggest. Is there more switch/case hackery at work here?
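You can see that something other than raw reflection is at work by asking System.ComponentModel.TypeDescriptor for properties instead of calling Type.GetProperties() directly. This small sketch uses a hypothetical `Trade` class and `DescriptorDemo` helper. When you filter the way a property grid does, [Browsable(false)] hides a property, and [DisplayName] changes the name that gets reported. Plain reflection ignores both attributes.

```csharp
using System;
using System.ComponentModel;
using System.Linq;

// Hypothetical class decorated the way a designer-friendly type might be.
public class Trade
{
    [DisplayName("Ticker")]
    public string Symbol { get; set; }

    public long Quantity { get; set; }

    [Browsable(false)]
    public string InternalId { get; set; }
}

public static class DescriptorDemo
{
    // Raw reflection: every public property, under its CLR name.
    public static string[] ReflectedNames() =>
        typeof(Trade).GetProperties().Select(p => p.Name).ToArray();

    // TypeDescriptor, filtered to browsable properties and reported
    // under their display names.
    public static string[] BrowsableDisplayNames() =>
        TypeDescriptor.GetProperties(typeof(Trade), new Attribute[] { BrowsableAttribute.Yes })
            .Cast<PropertyDescriptor>()
            .Select(pd => pd.DisplayName)
            .ToArray();
}
```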

The looks-like-a-DataTable-but-isn’t Approach (better)

It turns out that this problem is already solved, partially because of the original Windows Forms Designer, and partially because of DataTables. A rich API exists for allowing objects to describe themselves, and for lists of objects to publish information about their contents, even if the list is empty. System.Data.DataTable implements several interfaces, but the most important one for purposes of our discussion is System.ComponentModel.IListSource, which contains an interesting method:

  • GetList: Returns an IList that can be bound to a data source from an object that does not implement an IList itself.

On DataTable, this returns an instance of DataView (specifically, the table’s DefaultView), which, according to the documentation, “Represents a databindable, customized view of a DataTable for sorting, filtering, searching, editing, and navigation.” It turns out that a DataView is what a grid typically renders. We can think of it as an IList of DataRow objects (it’s actually a list of DataRowView, but that’s not terribly important here), which, structurally, is much more similar to a list of boring C# objects.
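You can check this yourself with a small sketch. The `DataTableListDemo` helper and its columns are made up for illustration. It asks a DataTable for its bindable list through IListSource, the same way a grid would, and you can then inspect what comes back.

```csharp
using System.Collections;
using System.ComponentModel;
using System.Data;

public static class DataTableListDemo
{
    // Builds a tiny DataTable; the column names are only examples.
    public static DataTable CreateOrders()
    {
        var table = new DataTable("Orders");
        table.Columns.Add("Symbol", typeof(string));
        table.Columns.Add("Quantity", typeof(long));
        table.Rows.Add("MSFT", 100L);
        return table;
    }

    // Goes through IListSource (an explicit interface implementation on
    // DataTable) rather than any DataTable-specific member.
    public static IList GetBindableList(DataTable table)
    {
        return ((IListSource)table).GetList();
    }
}
```

The list that comes back is a DataView, and its items are DataRowView instances.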

DataView implements a few other important interfaces besides IList (and ICollection and IEnumerable). ITypedList provides schema information about the constituent rows, which matters most when the list is empty, because then we can’t ask the rows themselves. IBindingList lets views control sorting and receive change notifications; filtering comes from IBindingListView, an extension of IBindingList that DataView also implements.
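The empty-list case is the interesting one. This sketch, which again uses hypothetical names, builds a DataTable with columns but no rows. It still gets a full schema, including column types, through ITypedList on the table's DataView.

```csharp
using System.ComponentModel;
using System.Data;

public static class TypedListDemo
{
    // Returns the schema a grid would see for a table with no rows at all.
    public static PropertyDescriptorCollection SchemaOfEmptyTable()
    {
        var table = new DataTable();
        table.Columns.Add("Symbol", typeof(string));
        table.Columns.Add("Quantity", typeof(long));

        // DataView implements ITypedList; passing null asks for the
        // properties of the list's own items (the rows).
        ITypedList typed = table.DefaultView;
        return typed.GetItemProperties(null);
    }
}
```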

It turns out that if you implement a few of these key interfaces, you get all this magic for free, too.
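Here is a rough approximation of the schema lookup that a data-bound grid performs. `SchemaDiscovery.GetColumns` is a hypothetical helper. Real grids, such as those built on Windows Forms' ListBindingHelper, handle more cases, including nested data members. The point is the order of preference: unwrap IListSource, then trust ITypedList, and only then fall back to describing the items.

```csharp
using System;
using System.Collections;
using System.ComponentModel;

// Hypothetical helper approximating how a grid discovers its columns.
public static class SchemaDiscovery
{
    public static PropertyDescriptorCollection GetColumns(object dataSource)
    {
        // 1. Unwrap IListSource (a DataTable, or your own "main" class).
        object list = dataSource is IListSource source ? source.GetList() : dataSource;

        // 2. Prefer ITypedList: it answers even when the list is empty.
        if (list is ITypedList typed)
            return typed.GetItemProperties(null);

        // 3. Otherwise describe the first item; TypeDescriptor will use the
        //    item's ICustomTypeDescriptor implementation if it has one.
        if (list is IList items && items.Count > 0 && items[0] != null)
            return TypeDescriptor.GetProperties(items[0]);

        // 4. Empty plain list: fall back to the element type of a strongly
        //    typed indexer (e.g. List<Order>), if there is one.
        var indexer = list?.GetType().GetProperty("Item", new[] { typeof(int) });
        if (indexer != null && indexer.PropertyType != typeof(object))
            return TypeDescriptor.GetProperties(indexer.PropertyType);

        return PropertyDescriptorCollection.Empty;
    }
}
```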

The Quick How-To

  1. (Optional) Implement System.ComponentModel.IListSource on your “main” data class—you’d only need to do this if you didn’t want your main class to directly represent your “list” of data.
  2. Create a list class. It should provide implementation for a few interfaces:
    • System.ComponentModel.ITypedList.
      Your implementation of ITypedList.GetItemProperties(PropertyDescriptor[]) should return a PropertyDescriptorCollection that contains the “properties” of the rows of your data set. This is what lets consumers discover the shape of your rows when the list is empty and there are no items to inspect.
    • System.Collections.IList.
      This is a hard requirement. Dealing with generic types generically is usually not straightforward and can sometimes require reflection; even to this day, a lot of code in WPF looks for (and makes use of) this interface.
    • System.Collections.Generic.IList<T>.
      I haven’t seen too much in the way of generic components requiring an implementation of this interface, but it’s a good idea to implement regardless—it’ll be easier to work with LINQ and other APIs that expect generic collections.
    • System.Collections.Specialized.INotifyCollectionChanged (optional).
      This is the same interface that classes like ObservableCollection<T> use to propagate changes to their contents in WPF. (If you need Windows Forms support, you’ll have to instead implement…)
    • System.ComponentModel.IBindingList (optional).
      Implement this interface only if you need backwards compatibility with Windows Forms. It’s a bit of a hybrid interface, incorporating aspects of both ICollection and ICollectionView, but as a result of that, you’ll effectively limit clients to one view on your collection.
  3. (Optional) Implement System.ComponentModel.ICustomTypeDescriptor on your “row” data class. I usually implement this by having each of my “rows” retain a reference to the “list” and simply delegating to ITypedList.GetItemProperties(PropertyDescriptor[]) on the list. Implementing something once takes less time than implementing something twice.

    I like to implement ICustomTypeDescriptor on the row because it provides a context-insensitive way of retrieving properties of the row. If someone created a System.Collections.Generic.List<T> containing my “rows”, this list is still eligible for rendering in a grid without any additional work (except, of course, when the list is empty, because then there would be no way of providing schema information).

  4. Create a subclass of System.ComponentModel.PropertyDescriptor (see its documentation if you don’t know what this class is for), and create one instance of it for each property that you want your rows to have. You will want/need to implement a few important properties and methods:
    • public Type PropertyType { get; } (override required)
      The type of the property. If you want the type of values contained in the column to be completely wide open, then return typeof(object).
    • public TypeConverter Converter { get; } (override optional)
      The converter used in conjunction with the property. For real properties, this would be defined by a System.ComponentModel.TypeConverterAttribute on the property’s type or on the property itself.
    • public Type ComponentType { get; } (override required)
      The type that defines the property. This is essentially your “row” class.
    • public object GetValue(object component); (override required)
      This is probably the most important method; this is the method that tells the system how to fetch a value from your row for the “property”.
    • public void SetValue(object component, object value); (override required)
      Sets the value of this property on your row. If the property shouldn’t be writable, you can throw NotSupportedException instead.
    • public bool IsReadOnly { get; } (override required)
      Return true or false, depending on whether or not the property should be publicly modifiable.
    • public bool CanResetValue(object component); (override required)
      When you’re using the Windows Forms Designer, this is the method that determines whether or not you can “reset” the value of the property to some default value. Other visualizers could make use of it too, but I haven’t seen too many that take advantage of it.
    • public void ResetValue(object component); (override required)
      This actually implements the resetting of a value as mentioned above. Either implement it or throw NotSupportedException.
    • public bool ShouldSerializeValue(object component); (override required)
      The Windows Forms Designer uses this method to determine whether or not a property’s value should be serialized. If a property was set to its default value, this method would return false because there would be no reason to actually persist the value of the property (since reapplying it wouldn’t result in a change).
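The Quick Example does not cover the optional INotifyCollectionChanged step, so here is a minimal sketch of it. It assumes a mutable row collection, unlike the read-only one the example uses. `NotifyingRowCollection<T>` is a hypothetical name. ObservableCollection<T> already does essentially this, so in practice you would only hand-roll it when your list class needs a different base type.

```csharp
using System.Collections.ObjectModel;
using System.Collections.Specialized;

// Hypothetical mutable row collection that tells WPF about every change.
public class NotifyingRowCollection<T> : Collection<T>, INotifyCollectionChanged
{
    public event NotifyCollectionChangedEventHandler CollectionChanged;

    protected override void InsertItem(int index, T item)
    {
        base.InsertItem(index, item);
        OnCollectionChanged(new NotifyCollectionChangedEventArgs(
            NotifyCollectionChangedAction.Add, item, index));
    }

    protected override void RemoveItem(int index)
    {
        T removed = this[index];
        base.RemoveItem(index);
        OnCollectionChanged(new NotifyCollectionChangedEventArgs(
            NotifyCollectionChangedAction.Remove, removed, index));
    }

    protected override void SetItem(int index, T item)
    {
        T old = this[index];
        base.SetItem(index, item);
        OnCollectionChanged(new NotifyCollectionChangedEventArgs(
            NotifyCollectionChangedAction.Replace, item, old, index));
    }

    protected override void ClearItems()
    {
        base.ClearItems();
        // Reset tells listeners to re-read the whole collection.
        OnCollectionChanged(new NotifyCollectionChangedEventArgs(
            NotifyCollectionChangedAction.Reset));
    }

    protected virtual void OnCollectionChanged(NotifyCollectionChangedEventArgs e)
    {
        CollectionChanged?.Invoke(this, e);
    }
}
```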

The Quick Example

using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;

/// <summary>
/// The class that serves as the "entry point" for your data model.
/// </summary>
/// <remarks>
/// It's not always necessary to create a class like this, but it can be
/// helpful from a design standpoint. Here, like the DataTable, I expose
/// a collection of "columns" that client APIs can use for their own
/// purposes.
/// </remarks>
public class MyDataTable : IListSource
{
    private readonly MyDataRowCollection _dataRows;
    private readonly PropertyDescriptorCollection _properties;

    public MyDataTable(...)
    {
        // rows and columns come from the (elided) constructor parameters
        _properties = columns;
        _dataRows = new MyDataRowCollection(this, rows);
    }

    public MyDataRowCollection Rows { get { return _dataRows; } }

    public PropertyDescriptorCollection Columns { get { return _properties; } }

    /// <summary>
    /// Returns the underlying collection of elements for this source.
    /// </summary>
    /// <remarks>
    /// It is somewhat rare to see this implemented any other way than through
    /// an explicit implementation, because you will generally want to
    /// expose a more specific collection type with more advanced behavior.
    /// (We do the same here through the <see cref="Rows"/> property.)
    /// </remarks>
    System.Collections.IList IListSource.GetList()
    {
        return _dataRows;
    }

    bool IListSource.ContainsListCollection
    {
        get { return false; }
    }
}
 
/// <summary>
/// The class that directly represents a collection of rows.
/// </summary>
/// <remarks>
/// You can either implement IList<T>/IList manually, or inherit from another
/// collection implementation. However, your particular use case will probably
/// only need or want to expose a subset of the functionality of those interfaces,
/// so it may be beneficial to implement the interfaces directly and hide some
/// collection methods through explicit implementation.
/// </remarks>
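public class MyDataRowCollection
    : ReadOnlyCollection<MyDataRow>, ITypedList
{
    // The base class has to precede the interfaces, and ReadOnlyCollection<T>
    // already supplies the IList<MyDataRow> and IList implementations.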
    private readonly MyDataTable _parent;
 
    public MyDataRowCollection(MyDataTable parent, IList<MyDataRow> rows)
        : base(rows)
    {
        _parent = parent;
    }
 
    public MyDataTable ParentTable { get { return _parent; } }
 
    PropertyDescriptorCollection ITypedList.GetItemProperties(
            PropertyDescriptor[] listAccessors)
    {
        return _parent.Columns;
    }
 
    string ITypedList.GetListName(PropertyDescriptor[] listAccessors)
    {
        return null;
    }
}
 
/// <summary>
/// The class that represents the data in a "row".
/// </summary>
public class MyDataRow : ICustomTypeDescriptor
{
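    private readonly MyDataRowCollection _parentCollection;
    private readonly object[] _values;

    internal MyDataRow(MyDataRowCollection parentCollection)
    {
        _parentCollection = parentCollection;
        // one slot per column; without this the indexer would dereference null
        _values = new object[parentCollection.ParentTable.Columns.Count];
    }

    public object this[string name]
    {
        get { return _values[ColumnIndexFromName(name)]; }
        set { _values[ColumnIndexFromName(name)] = value; }
    }

    private int ColumnIndexFromName(string name)
    {
        PropertyDescriptorCollection columns = _parentCollection.ParentTable.Columns;
        PropertyDescriptor pd = columns[name];
        if (pd == null)
        {
            // fail with a meaningful message instead of indexing with -1
            throw new ArgumentException("Unknown column: " + name, nameof(name));
        }
        return columns.IndexOf(pd);
    }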
 
    AttributeCollection ICustomTypeDescriptor.GetAttributes()
    {
        return AttributeCollection.Empty;
    }
 
    string ICustomTypeDescriptor.GetClassName()
    {
        return typeof(MyDataRow).FullName;
    }
 
    string ICustomTypeDescriptor.GetComponentName()
    {
        return typeof(MyDataRow).Name;
    }
 
    TypeConverter ICustomTypeDescriptor.GetConverter()
    {
        return null;
    }
 
    EventDescriptor ICustomTypeDescriptor.GetDefaultEvent()
    {
        return null;
    }
 
    PropertyDescriptor ICustomTypeDescriptor.GetDefaultProperty()
    {
        return null;
    }
 
    object ICustomTypeDescriptor.GetEditor(Type editorBaseType)
    {
        return null;
    }
 
    EventDescriptorCollection ICustomTypeDescriptor.GetEvents(Attribute[] attributes)
    {
        return EventDescriptorCollection.Empty;
    }
 
    EventDescriptorCollection ICustomTypeDescriptor.GetEvents()
    {
        return EventDescriptorCollection.Empty;
    }
 
    PropertyDescriptorCollection ICustomTypeDescriptor.GetProperties(Attribute[] attributes)
    {
        return ((ITypedList)_parentCollection).GetItemProperties(null);
    }
 
    PropertyDescriptorCollection ICustomTypeDescriptor.GetProperties()
    {
        return ((ITypedList)_parentCollection).GetItemProperties(null);
    }
 
    object ICustomTypeDescriptor.GetPropertyOwner(PropertyDescriptor pd)
    {
        return this;
    }
}
 
/// <summary>
/// The class that represents information about a "column".
/// </summary>
public class MyDataColumn : PropertyDescriptor
{
    private readonly Type _propertyType;
 
    public MyDataColumn(string name, Type type, Attribute[] attrs) : base(name, attrs)
    {
        _propertyType = type;
    }
 
    public override object GetValue(object component)
    {
        return ((MyDataRow)component)[this.Name];
    }
 
    public override void SetValue(object component, object value)
    {
        ((MyDataRow)component)[this.Name] = value;
    }
 
    public override bool IsReadOnly
    {
        get { return false; }
    }
 
    public override Type PropertyType
    {
        get { return _propertyType; }
    }
 
    public override Type ComponentType
    {
        get { return typeof(MyDataRow); }
    }
 
    public override void ResetValue(object component)
    {
        throw new NotSupportedException();
    }
 
    public override bool CanResetValue(object component)
    {
        return false;
    }
 
    public override bool ShouldSerializeValue(object component)
    {
        return false;
    }
}
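The Quick Example leaves the table's constructor parameters elided. Here is a compact variant of the same pattern that compiles on its own, under a few assumptions. The table takes its columns as (name, type) pairs, rows store their values by column position, and rows are added through a hypothetical `AddRow` method. All names are hypothetical. The row-level ICustomTypeDescriptor is omitted for brevity, so only the table publishes the schema (through ITypedList).

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;

// A row is just an array of values, one slot per column.
public sealed class SimpleRow
{
    internal readonly object[] Values;
    internal SimpleRow(int columnCount) { Values = new object[columnCount]; }
}

// A "column" that reads and writes one slot of a SimpleRow.
public sealed class SimpleColumn : PropertyDescriptor
{
    private readonly Type _type;
    private readonly int _ordinal;

    public SimpleColumn(string name, Type type, int ordinal)
        : base(name, new Attribute[0])
    {
        _type = type;
        _ordinal = ordinal;
    }

    public override Type ComponentType => typeof(SimpleRow);
    public override Type PropertyType => _type;
    public override bool IsReadOnly => false;

    public override object GetValue(object component) =>
        ((SimpleRow)component).Values[_ordinal];

    public override void SetValue(object component, object value)
    {
        ((SimpleRow)component).Values[_ordinal] = value;
        OnValueChanged(component, EventArgs.Empty); // notify AddValueChanged listeners
    }

    public override bool CanResetValue(object component) => false;
    public override void ResetValue(object component) => throw new NotSupportedException();
    public override bool ShouldSerializeValue(object component) => false;
}

// The row list: ReadOnlyCollection<T> supplies IList<T>/IList, ITypedList the schema.
public sealed class SimpleTable : ReadOnlyCollection<SimpleRow>, ITypedList
{
    private readonly PropertyDescriptorCollection _columns;

    public SimpleTable(params (string Name, Type DataType)[] columns)
        : base(new List<SimpleRow>())
    {
        var descriptors = new PropertyDescriptor[columns.Length];
        for (int i = 0; i < columns.Length; i++)
            descriptors[i] = new SimpleColumn(columns[i].Name, columns[i].DataType, i);
        _columns = new PropertyDescriptorCollection(descriptors, true); // read-only
    }

    public PropertyDescriptorCollection Columns => _columns;

    // Hypothetical convenience method: appends a row, filling it by position.
    public SimpleRow AddRow(params object[] values)
    {
        var row = new SimpleRow(_columns.Count);
        Array.Copy(values, row.Values, Math.Min(values.Length, _columns.Count));
        Items.Add(row); // Items is the wrapped List<SimpleRow>
        return row;
    }

    PropertyDescriptorCollection ITypedList.GetItemProperties(PropertyDescriptor[] listAccessors) => _columns;
    string ITypedList.GetListName(PropertyDescriptor[] listAccessors) => null;
}
```

A grid that honors ITypedList should be able to generate Symbol and Quantity columns for `new SimpleTable(("Symbol", typeof(string)), ("Quantity", typeof(long)))`, even before any rows are added. The Windows Forms DataGridView and the WPF DataGrid should both qualify.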

Now Use It

A lot got glossed over here, but it’s a very deep subject. It amazes me how few .NET developers know about and make use of this layer (either by consuming objects in this fashion or by publishing objects that conform to this API). That is especially surprising given how much buy-in there is from third-party vendors and projects, and given that this is one of the few parts of the binding API that has barely had to change between Windows Forms and WPF.

If you are working on an infrastructure project of any kind that aims to expose data to a .NET GUI (Windows Forms or WPF), absolutely make use of this layer to keep both your view and your data model as flexible and independent as possible; you’ll be happy you did. —DKT