/// <summary>
/// Compares the content of two enumerables for equality. Order
/// of elements does NOT matter. Elements may exist multiple
/// times. <c>null</c> is both a valid key and value.
///
/// O(n). Fast for elements with well distributed hashcode.
/// </summary>
public static bool ContentEquals<T> (
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer)
{
// Declare a dictionary to count the occurence of the items in the collection
Dictionary<T, int> itemCounts = new Dictionary<T, int> (comparer);
// null references are stored here.
int zero = 0;
// Increase the count for each occurence of the item in the first collection
foreach (T item in first) {
if (item == null) {
zero++;
}
else if (itemCounts.ContainsKey (item)) {
itemCounts[item]++;
}
else {
itemCounts[item] = 1;
}
}
// Decrease the count for each occurence of the item in the second collection
foreach (T item in second) {
if (item == null) {
zero--;
}
else if (itemCounts.ContainsKey (item)) {
itemCounts[item]--;
}
else {
// There was no occurence of this item in the first collection, thus the collections are not equal
return false;
}
}
// The count of each item should be 0 if the contents of the collections are equal
foreach (int value in itemCounts.Values) {
if (value != 0) {
return false;
}
}
if (zero != 0) {
return false;
}
// The collections are equal
return true;
}
Refactorings
No refactoring yet !
Ants
June 22, 2009, June 22, 2009 09:16, permalink
This doesn't quite answer your question, but why just use something as simple as:
public static bool ContentEquals<T> (
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer)
{
return !first.Except(second, comparer).MoveNext();
}
mafutrct
June 22, 2009, June 22, 2009 09:59, permalink
Ants: I think the complexity would increase to O(n*n) that way. Usage of a dictionary seems mandatory.
Ants
June 22, 2009, June 22, 2009 22:30, permalink
Using Reflector on System.Core, it looks like Enumerable.Except() creates an ExceptIterator that uses an internal Set class that implements a hashtable. The structure is very similar to your implementation above.
Of course, your implementation has the benefit of bailing out early since all you are looking for is boolean, while Except() has to build up the entire Set. :-)
mafutrct
June 25, 2009, June 25, 2009 08:45, permalink
That's interesting, the default extensions are really more sophisticated than I imagined. :)
I was just about to run a performance test of your suggestion when I noticed that "This method returns those elements in first that do not appear in second. It does not also return those elements in second that do not appear in first."
I guess this means we'd have to do the Except-comparision twice.... :(
Looking for a workaround, I stumbled upon Enumerable.Distinct, but that method does not work for multiple occurrences of the same item in the list :-\
petar.petrov.myopenid.com
June 26, 2009, June 26, 2009 07:51, permalink
I have some suggestions about your code.
The first thing to do is to constraint T to be a class, because (item == null) will be always false for value types.
After that you can optimize you code by counting the elements in the first sequence and in the second. If the second count becomes greater then the first one (you have two sequences with different length ) - they are different.
Place the cheep check (zero != 0) before iterating on itemCounts.
One last thing I think it will lead to a better performance is to use TryGetValue for the dictionary. Here's the code
:
if (itemCounts.ContainsKey(item))
{
itemCounts[item]++;
}
else
{
itemCounts[item] = 1;
}
which is compiled as
if (itemCounts.ContainsKey(item))
{
itemCounts[item] = itemCounts[item] + 1;
}
else
{
itemCounts[item] = 1;
}
You always perform the check for the item and setting the value(it's a part of the algorithm). However itemCounts[item]++ (itemCounts[item] = itemCounts[item] + 1) is not that innocent. The indexer get will perform FindKey() (the same as ContainsKey) to get the value or throw an exception if the key is not found. In total it FindKey is called twice ! By changing the code to
int count;
itemCounts.TryGetValue(item, out count);
itemCounts[item] = ++count;
This hidden problem is eliminated.
Here's my proposition
public static bool ContentEquals<T>(
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer) where T : class
{
// Declare a dictionary to count the occurence of the items in the collection
Dictionary<T, int> itemCounts = new Dictionary<T, int>(comparer);
// null references are stored here.
int zero = 0;
var totalFirst = 0;
// Increase the count for each occurence of the item in the first collection
foreach (T item in first)
{
if (item == null)
{
zero++;
}
else
{
totalFirst++;
int count;
itemCounts.TryGetValue(item, out count);
itemCounts[item] = ++count;
}
}
var totalSecond = 0;
// Decrease the count for each occurence of the item in the second collection
foreach (T item in second)
{
if (item == null)
{
zero--;
}
else
{
totalSecond++;
if (totalSecond > totalFirst)
{
return false;
}
int count;
if (itemCounts.TryGetValue(item, out count))
{
itemCounts[item] = --count;
}
else
{
return false;
}
}
}
if (zero != 0)
{
return false;
}
// The count of each item should be 0 if the contents of the collections are equal
foreach (int value in itemCounts.Values)
{
if (value != 0)
{
return false;
}
}
// The collections are equal
return true;
}
mafutrct
July 6, 2009, July 06, 2009 13:11, permalink
Petar: I'm not sure about the 'where T : class' part. Doesn't this prevent usage of the function with IEnumerable<int>? Apart from that I really like your suggestions.
This is the code I'm currently using:
/// <summary>
/// Compares the content of two enumerables for equality. Order
/// of elements does NOT matter. Elements may exist multiple
/// times. <c>null</c> is both a valid key and value.
///
/// O(n). (?) Fast for elements with well distributed hashcode.
/// </summary>
public static bool ContentEquals<T> (
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer)
{
// Declare a dictionary to count the occurrence of the items in the collection
Dictionary<T, int> itemCounts = new Dictionary<T, int> (comparer);
// null references are stored here.
int zero = 0;
// Element count
int items = 0;
// Increase the count for each occurrence of the item in the first collection
foreach (T item in first) {
if (item == null) {
zero++;
}
else {
items++;
int count = 0;
itemCounts.TryGetValue (item, out count);
itemCounts[item] = ++count;
}
}
// Decrease the count for each occurrence of the item in the second collection
foreach (T item in second) {
if (item == null) {
zero--;
}
else {
items--;
if (items < 0) {
// Mismatching number of elements
return false;
}
int count = 0;
if (itemCounts.TryGetValue (item, out count)) {
itemCounts[item] = --count;
}
else {
// There was no occurrence of this item in the first collection, thus the collections are not equal
return false;
}
}
}
// Number of elements must match
if (items != 0) {
return false;
}
// Number of nulls has to be equal
if (zero != 0) {
return false;
}
// The count of each item should be 0 if the contents of the collections are equal
foreach (int value in itemCounts.Values) {
if (value != 0) {
return false;
}
}
// The collections are equal
return true;
}
petar.petrov.myopenid.com
July 6, 2009, July 06, 2009 16:43, permalink
Yes, you can't use this on IEnumerable<int> but the idea was to eliminate the against null on value types - it's always false ( value types cannot be null ). You can remove the zero counter and the checks in the loops. Doing so you will eliminate first.Count + second.Count checks and some increments/decrements. I think you can get the best version by splitting the code in two private methods for value and reference types equality check. Of course if you want you can stay with your current version it's quite good.
public static bool ContentEquals<T>(
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer)
{
var isValueType = typeof(T).IsValueType;
if (isValueType)
{
return ValueTypeContentEquals(comparer, first, second);
}
return ReferenceTypeContentEquals(comparer, first, second);
}
private static bool ReferenceTypeContentEquals<T>(IEqualityComparer<T> comparer, IEnumerable<T> first, IEnumerable<T> second)
{
throw new NotImplementedException();
}
private static bool ValueTypeContentEquals<T>(IEqualityComparer<T> comparer, IEnumerable<T> first, IEnumerable<T> second)
{
throw new NotImplementedException();
}
Fabio Maulo
November 17, 2009, November 17, 2009 13:35, permalink
return !first.Except(second).Concat(second.Except(first)).Any();
VS08 grants only a maintainability index of 49, but I found no way to improve this method. I'd be glad if you could point out some ideas.