Extending LINQ – Specifying a Property in the Distinct Function

October 29th, 2009

The “Distinct” function in LINQ is definitely one of the top 10 most used functions, but it’s probably one of the top 10 most incomplete as well. This article will show how to extend IEnumerable<T> with an overload that gives you a very easy way to specify which property makes the items in your list unique (or ‘distinct’).

If you are already familiar with the standard query operators in LINQ, then you have no doubt tried to do something like this:

// Make a unique list of customers from the big list!
var uniqueList = bigList.Distinct(item => item.ID);

// Do something with the unique customers…

The problem is that the “Distinct” function doesn’t let you specify a property by which you call your object ‘unique’, so the call above does not even compile. The parameterless version won’t help out of the box either: if you have a collection that has the same customer data in there twice, calling “Distinct” will not remove the duplicate. Note that this statement requires the following VERY IMPORTANT DISCLAIMER:

It’s highly important that you read what I just said above correctly. I said “the same customer *data*”… I did NOT say “the same customer *object*.” Why is that significant? – You will learn the answer to that question further in the article.

How Does LINQ Implement the Distinct Function

Before we can solve the problem explained above by extending LINQ, we have to understand why the method currently does not remove duplicates in all cases. To understand that, we have to understand how it is removing duplicates at all. – Hopefully I haven’t lost you yet :)

To explain in short – the Distinct function will iterate through an IEnumerable<T> one item at a time and perform the following pseudo-code:

  1. Get the hash code of the current item using the GetHashCode() method which *all* objects in the .NET Framework have.
  2. Check to see if this hash code has been seen before (by checking in a private dictionary hidden from our eyes).
  3. If the hash code is not already in the dictionary, then we simply add the current item to the dictionary and return the object.
  4. If that very same hash code has already been found, then we compare the current item (using Equals) with the items already stored under that hash to double check uniqueness, and return the object only if it equals none of them.

Here is what that code *could* look like (but it’s not exact):

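// Every item seen so far, grouped by hash code
var hashedItems = new Dictionary<int, List<T>>();

foreach (T item in this.Items)
{
    int currentHash = item.GetHashCode();

    List<T> seenWithSameHash;

    if (hashedItems.TryGetValue(currentHash, out seenWithSameHash) == false)
    {
        // We’ve never seen this hash before… remember the item and return it!
        hashedItems.Add(currentHash, new List<T> { item });
        yield return item;
    }
    else if (seenWithSameHash.Contains(item) == false)
    {
        // We thought we’ve seen this item, but guess not… remember it and return it!
        // (Contains compares with Equals, so this is the ‘double check’ from step 4.)
        seenWithSameHash.Add(item);
        yield return item;
    }
}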

By the way, did you catch the reason why two objects that represent the same data will not be de-duped? The reason is that step 1 uses the “GetHashCode()” method to do a quick test, and step 4 falls back on “Equals()”. The problem here is that you may not consider the following objects to be different, but LINQ would:

var personOne = new Person(123, "Timothy", "Khouri");
var personTwo = new Person(123, "Timothy", "Khouri");

The reason why those two objects would be considered ‘different’ is that, unless the Person class overrides Equals() and GetHashCode(), both methods fall back to reference identity: two separate instances are never equal (and will almost always have different hash codes), even if the ‘data’ is the same. Now that we understand enough of what LINQ is doing, we can extend it to meet our simple needs.
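To see the difference between “same data” and “same object” in action, here is a minimal sketch. The Person class below is a hypothetical stand-in (the article’s own Person source is not shown here), and the DefaultEqualityDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's Person class.
// It deliberately does NOT override Equals or GetHashCode.
public class Person
{
    public int ID { get; private set; }
    public string FirstName { get; private set; }
    public string LastName { get; private set; }

    public Person(int id, string firstName, string lastName)
    {
        ID = id;
        FirstName = firstName;
        LastName = lastName;
    }
}

public static class DefaultEqualityDemo
{
    // Same data stored in two separate instances.
    public static int CountDistinctCopies()
    {
        var personOne = new Person(123, "Timothy", "Khouri");
        var personTwo = new Person(123, "Timothy", "Khouri");

        return new List<Person> { personOne, personTwo }.Distinct().Count();
    }

    // The very same instance added to the list twice.
    public static int CountDistinctReferences()
    {
        var personOne = new Person(123, "Timothy", "Khouri");

        return new List<Person> { personOne, personOne }.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinctCopies());     // 2: both copies survive
        Console.WriteLine(CountDistinctReferences()); // 1: the repeated reference is removed
    }
}
```

So the built-in Distinct does remove the same customer *object*, just not the same customer *data*.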

Extending IEnumerable<T>

So, how do we add our functionality to everything that LINQ is already extending? Very easily: extend the same interface that LINQ to Objects extends, IEnumerable<T>! As a side note, I recommend putting your code in the “System.Collections.Generic” namespace so that once you add a ‘using’ (or ‘Imports’ in VB) to that namespace, you’ll automatically tap into your extensions too.

// I feel like I’m working for Microsoft when I use this namespace
// will I get paid for this article? – Probably not :(

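namespace System.Collections.Generic
{
    using System.Linq; // lets the finished method call the built-in Distinct overload

    public static class MyIEnumerableExtensions
    {
        // Note the <T> after the method name: without it, T is undefined and this won't compile.
        public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source,
            Func<T, object> uniqueCheckerMethod)
        {
            // Extension code here…

        }
    }
}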

So, now that we have our method stub that extends any list (array or collection) by letting the developer specify how he considers an object to be unique, we need to fill in the contents of our function.

Because the “Distinct” LINQ extension has an overload that accepts an ‘IEqualityComparer<T>’ object, we already have half of the code written for us! So, what we are going to do is create a generic ‘comparer’ class that compares items by whatever key the developer hands us, and use that to do our filtering. Here is what our extension will look like:

public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source,
    Func<T, object> uniqueCheckerMethod)
{
    // Hand the real work to the built-in overload that takes an IEqualityComparer<T>
    return source.Distinct(new GenericComparer<T>(uniqueCheckerMethod));
}

And, our “GenericComparer<T>” class will be as follows:

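public class GenericComparer<T> : IEqualityComparer<T>
{
    public GenericComparer(Func<T, object> uniqueCheckerMethod)
    {
        this._uniqueCheckerMethod = uniqueCheckerMethod;
    }

    private readonly Func<T, object> _uniqueCheckerMethod;

    bool IEqualityComparer<T>.Equals(T x, T y)
    {
        // Two items are ‘equal’ when their keys are equal.
        // The static object.Equals copes with a null key.
        return object.Equals(this._uniqueCheckerMethod(x), this._uniqueCheckerMethod(y));
    }

    int IEqualityComparer<T>.GetHashCode(T obj)
    {
        // Hash the key (not the item) so that items with equal keys hash alike.
        object key = this._uniqueCheckerMethod(obj);

        return key == null ? 0 : key.GetHashCode();
    }
}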
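To check that a comparer like this really treats two people with the same ID as ‘equal’, you can call it directly. The sketch below repeats a version of the comparer so that it compiles on its own; the Person class, the ComparerDemo class and the SameUnderComparer helper are hypothetical names added for illustration. Because the two methods are explicit interface implementations, they are only reachable through a variable typed as IEqualityComparer<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's Person class.
public class Person
{
    public int ID { get; private set; }
    public string FirstName { get; private set; }
    public string LastName { get; private set; }

    public Person(int id, string firstName, string lastName)
    {
        ID = id;
        FirstName = firstName;
        LastName = lastName;
    }
}

// Follows the article's GenericComparer<T>, repeated so this sketch is self-contained.
public class GenericComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _uniqueCheckerMethod;

    public GenericComparer(Func<T, object> uniqueCheckerMethod)
    {
        _uniqueCheckerMethod = uniqueCheckerMethod;
    }

    bool IEqualityComparer<T>.Equals(T x, T y)
    {
        return object.Equals(_uniqueCheckerMethod(x), _uniqueCheckerMethod(y));
    }

    int IEqualityComparer<T>.GetHashCode(T obj)
    {
        object key = _uniqueCheckerMethod(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

public static class ComparerDemo
{
    // True when the comparer sees a and b as the same item:
    // equal keys AND matching hash codes, which is what Distinct relies on.
    public static bool SameUnderComparer(Person a, Person b)
    {
        // The interface-typed variable is required to reach the explicit implementations.
        IEqualityComparer<Person> byId = new GenericComparer<Person>(p => p.ID);

        return byId.Equals(a, b) && byId.GetHashCode(a) == byId.GetHashCode(b);
    }

    public static void Main()
    {
        var first = new Person(123, "Timothy", "Khouri");
        var copy = new Person(123, "Timothy", "Khouri");
        var other = new Person(124, "Bob", "Dole");

        Console.WriteLine(SameUnderComparer(first, copy));  // same ID
        Console.WriteLine(SameUnderComparer(first, other)); // different ID
    }
}
```

Note that the int ID gets boxed into an object on every call. That is the price of the simple Func<T, object> signature; a second type parameter for the key would avoid it.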

Now, I can filter the following list like so:

var people = new List<Person>();

people.Add(new Person(123, "Timothy", "Khouri"));
people.Add(new Person(124, "Bob", "Dole"));
people.Add(new Person(123, "Timothy", "Khouri"));
people.Add(new Person(125, "Gill", "Bates"));

foreach (var person in people.Distinct(p => p.ID))
{
    // Only the 3 unique people will be displayed!

    Console.WriteLine(person.FullName);
}
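The key does not have to be a single property. Since anonymous types in C# compare by the values of their properties, one possible way to de-dupe on a combination of properties is to return an anonymous object from the lambda. The following is only a sketch under a few assumptions: the Person class (including how FullName is built) is a hypothetical stand-in, and the KeyedDistinctExtensions and CompositeKeyDemo class names are made up here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's Person class.
public class Person
{
    public int ID { get; private set; }
    public string FirstName { get; private set; }
    public string LastName { get; private set; }

    // Assumption: FullName is simply "First Last".
    public string FullName { get { return FirstName + " " + LastName; } }

    public Person(int id, string firstName, string lastName)
    {
        ID = id;
        FirstName = firstName;
        LastName = lastName;
    }
}

// Follows the article's comparer, repeated so this sketch is self-contained.
public class GenericComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _uniqueCheckerMethod;

    public GenericComparer(Func<T, object> uniqueCheckerMethod)
    {
        _uniqueCheckerMethod = uniqueCheckerMethod;
    }

    bool IEqualityComparer<T>.Equals(T x, T y)
    {
        return object.Equals(_uniqueCheckerMethod(x), _uniqueCheckerMethod(y));
    }

    int IEqualityComparer<T>.GetHashCode(T obj)
    {
        object key = _uniqueCheckerMethod(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

public static class KeyedDistinctExtensions
{
    public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source,
        Func<T, object> uniqueCheckerMethod)
    {
        return source.Distinct(new GenericComparer<T>(uniqueCheckerMethod));
    }
}

public static class CompositeKeyDemo
{
    // De-dupes on first name + last name, ignoring the ID.
    public static List<string> DistinctNames()
    {
        var people = new List<Person>
        {
            new Person(123, "Timothy", "Khouri"),
            new Person(124, "Bob", "Dole"),
            new Person(999, "Timothy", "Khouri") // different ID, same name
        };

        return people
            .Distinct(p => new { p.FirstName, p.LastName })
            .Select(p => p.FullName)
            .ToList();
    }

    public static void Main()
    {
        foreach (var name in DistinctNames())
        {
            Console.WriteLine(name);
        }
    }
}
```

With a key of p => p.ID the same list would keep all three people, because the third entry has its own ID.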

Download the Source File

While the source for this article is very simple (and is spread throughout the article), you may want to download it and test it out for yourself. This project (compiled for Visual Studio 2008 – .NET 3.5 SP1) has a simple Console app that demonstrates the code above. I encourage you to put some break-points in the code to see how it’s all working. Here’s the code: SingingEels_ExtendingLinqDistinct.zip

Source: singingeels.com
