module Enumerable
def cluster
sort.inject([[]]) do |out, n|
if out.last.include?(n) or out.last.empty?
out.last << n
else
out << [n]
end
out
end
end
end
Refactorings
No refactoring yet !
Wolfbyte
July 18, 2008, July 18, 2008 04:27, permalink
How about this that does it iteratively. It has to go over the initial list 3 times so execution time would be 3n and it doesn't sort the output. It works in 3 steps.
The first phase gets a count of each item in the enumerable and stores it into a hash.
The second phase creates a new array. For each index in the original the hash is looked up to see how many of the items at this index were marked in phase 1 and produces an array of that many items in the new array. It then sets the count to 0 so that the rest of the elements in the original enumerable will be replaced with empty arrays when encountered. (Note that because of the way that map works these operations had to be reversed and a temporary variable introduced)
The third phase removes the empty lists from the new array.
module Enumerable
def cluster
counts = {}
counts.default = 0
self.each { |i| counts[i] += 1 }
self.map { |i| j, counts[i] = counts[i], 0; [i] * j }.reject { |i| i.empty? }
end
end
Joran Jessurun
July 18, 2008, July 18, 2008 08:16, permalink
This is not really different, but I think it is easyer to read.
module Enumerable
def cluster
result = []
sort.each do |value|
if result.last && result.last.last == value
result.last << value
else
result << [value]
end
end
result
end
end
Emmett
July 18, 2008, July 18, 2008 18:03, permalink
One way to think about this is a hash with counts; that's probably going to be most efficient.
Another way to think about it is simply with appends and finds.
# hash with counts
module Enumerable
def cluster
h = Hash.new(0)
each {|x| h[x] += 1}
h.map {|k, count| Array.new(count, k)}
end
end
# appends and finds
module Enumerable
def cluster
a = Array.new
each {|x|
if sub = a.find {|y| y.first == x}
sub << x
else
a << [x]
end
}
a
end
end
Emmett
July 18, 2008, July 18, 2008 18:05, permalink
Oops, I just realized I can write appends-n-finds (which is natural order preserving, btw) much more elegantly
# appends and finds
module Enumerable
def cluster
inject(Array.new) {|a, x|
if sub = a.find {|y| y.first == x}
sub << x
a
else
a << [x]
end
}
end
end
Joran Jessurun
July 18, 2008, July 18, 2008 19:07, permalink
What about this one, it is a variation on with hash and counts, but I build the result directly into the hash.
module Enumerable
def cluster
result = {}
each do |value|
result[value] ||= []
result[value] << value
end
result.values
end
end
Joran Jessurun
July 18, 2008, July 18, 2008 19:17, permalink
Exactly the same solution as above, but with less lines of code. But I don't like it much because it is too cryptic.
module Enumerable
def cluster
result = Hash.new { |hash,key| hash[key] = [] }
each { |value| result[value] << value }
result.values
end
end
Jason Dew
July 18, 2008, July 18, 2008 23:36, permalink
If you don't need the order preserved and a hash is sufficient... If you want to keep the ordering, replace with OrderedHash in ActiveSupport (or Ruby 1.9). cluster_arrayed an implementation that returns an array of arrays.
module Enumerable
def cluster
inject(Hash.new) do |all, one|
all[one] = (all[one] || 0) + 1
all
end
end
def cluster_arrayed
cluster.inject(Array.new) do |all, (key, count)|
all << [key] * count
end
end
end
Jeremy Weiskotten
July 22, 2008, July 22, 2008 02:04, permalink
This is a pretty clean one-liner. It doesn't perform as well, and I would expect degradation on larger arrays.
module Enumerable
def cluster
self.uniq.sort.map { |u| self.select { |v| u == v } }
end
end
Wolfbyte
July 22, 2008, July 22, 2008 03:05, permalink
@Jeremy - Be aware that uniq is not a method of Enumerable. This means that if you call {:a=>'a', :b=>'b'}.cluster you will get a failure. Be sure to define a method only for classes that it will be acceptable for. This means that if you are extending the Enumerable module you can only rely on the Enumerable methods. The best bet to doing this is to build a minimal class that includes Enumerable for testing.
Not really any different from my first go but this is a 2-liner. It's practically golf now though and I think the original is clearer
module Enumerable
def cluster
c=inject(Hash.new(0)){|h,i|h[i]+=1;h}
map{|i|j,c[i]=c[i],0;[i]*j}.reject{|i|i==[]}
end
end
Jeremy Weiskotten
July 22, 2008, July 22, 2008 15:16, permalink
@Wolfbyte, good catch. I suppose my method could be added to Array instead of Enumerable.
%w[app noot app mies zoo mies app zoo zo z].cluster
should result in:
[["app", "app", "app"], ["mies", "mies"], ["noot"], ["z"], ["zo"], ["zoo", "zoo"]]
Can this inject be written more naturally? I am not that good in fold and inject.
Also the result should not have to be sorted, maybe the natural order could be preserved somewhat?
What do you think?