I'm looking for an optimal way to group similar ActiveRecord objects into an array of arrays. The method I have below works but seems highly inefficient since I have to make a full copy of the array, do the grouping, then run uniq to eliminate the duplicates. I know there must be a better way to handle this. I tried Enumerable#group_by but that returns a hash.

Thanks for your help.

grouped = []
# arr holds the ActiveRecord objects, e.g.
# [#<Foo id: 1, status_id: 2, profile_id: 3>, #<Foo id: 2, status_id: 2, profile_id: 3>, #<Foo id: 3, status_id: 1, profile_id: 3>]
arr_copy = arr.dup
arr.each do |a|
  # collect every object that shares a's status_id and profile_id
  grouped << arr_copy.select do |copy|
    (copy.status_id == a.status_id) &&
    (copy.profile_id == a.profile_id)
  end
end
# necessary to eliminate dupes
final_group = grouped.uniq

# end result
# [[#<Foo id: 1, status_id: 2, profile_id: 3>,#<Foo id: 2, status_id: 2, profile_id: 3>],[#<Foo id: 3, status_id: 1, profile_id: 3>]]

Refactorings


Ben Marini

March 29, 2010 15:13

#group_by does what you need; you just have to grab the values of the hash afterward.

require 'rubygems'
require 'active_support'
Foo = Struct.new(:id, :status_id, :profile_id)
arr = [Foo.new(1,2,3),Foo.new(2,2,3),Foo.new(3,1,3)]
res = arr.group_by(&:status_id).values
p res

toro04

March 30, 2010 18:12

Ben, good point about just getting the values back from group_by. I need to group by both status_id and profile_id so I ended up doing the following. I did some benchmarking and was surprised to see that using the group_by method did not perform much better than my first post where I dup the array and then use uniq. Performance drops off significantly when there are lots of objects in the array too. Still not convinced I'm doing this optimally.

require 'rubygems'
require 'active_support'
Foo = Struct.new(:id, :status_id, :profile_id)
arr = [Foo.new(1,2,3),Foo.new(2,2,3),Foo.new(3,1,3),Foo.new(4,2,3)]
res = arr.group_by do |obj|
  arr.select {|f| f.status_id == obj.status_id && f.profile_id == obj.profile_id}
end.values

p res

# [[#<struct Foo id=1, status_id=2, profile_id=3>, #<struct Foo id=2, status_id=2, profile_id=3>, #<struct Foo id=4, status_id=2, profile_id=3>], [#<struct Foo id=3, status_id=1, profile_id=3>]]

Ben Marini

March 31, 2010 00:27

You don't need to do a select inside the group_by. The way #group_by works is it groups based on the return value of the block you pass it. Try it this way and see how it performs:

res = arr.group_by { |f| [f.status_id, f.profile_id] }.values
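To make the composite key visible, here is a small sketch that prints the intermediate hash before taking its values. It reuses the Foo struct from the earlier posts as a stand-in for the ActiveRecord model; the variable name by_key is only for illustration.

```ruby
# Stand-in for the ActiveRecord model, same shape as in the posts above
Foo = Struct.new(:id, :status_id, :profile_id)
arr = [Foo.new(1, 2, 3), Foo.new(2, 2, 3), Foo.new(3, 1, 3), Foo.new(4, 2, 3)]

# group_by calls the block once per element and uses the return value as the hash key
by_key = arr.group_by { |f| [f.status_id, f.profile_id] }

# Each key is a [status_id, profile_id] pair; each value is the array of matching objects
by_key.each do |(status_id, profile_id), foos|
  puts "status=#{status_id} profile=#{profile_id} ids=#{foos.map(&:id).inspect}"
end

# Dropping the keys leaves the array of arrays asked for in the original post
res = by_key.values
```

Because the block runs only once per element, this is a single pass over the array rather than one select per element.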

toro04

March 31, 2010 18:14

Ben, thanks for showing me the correct way to use group_by! Below are the new Benchmark results comparing the two different methods. The benchmark was run with an array of 500 objects. What a major improvement! You made my day.

Thanks again!

                      user     system      total        real
Dup/Select/Uniq  11.380000   0.100000  11.480000 ( 11.517070)
Group_By          0.020000   0.000000   0.020000 (  0.014287)
----------------------------------------- total: 11.500000sec
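For anyone who wants to try a comparison like this themselves, the following is a rough sketch using Ruby's standard Benchmark library. The test data is an assumption (500 plain structs with made-up status_id and profile_id values), not the ActiveRecord objects measured above, so the timings will differ from the ones posted.

```ruby
require 'benchmark'

# Stand-in for the ActiveRecord model
Foo = Struct.new(:id, :status_id, :profile_id)

# Assumed test data: 500 objects spread over a handful of status/profile pairs
arr = (1..500).map { |i| Foo.new(i, i % 5, i % 3) }

# Original approach: one select per element, then uniq to drop the repeated groups
select_uniq = lambda do
  arr.map { |a|
    arr.select { |c| c.status_id == a.status_id && c.profile_id == a.profile_id }
  }.uniq
end

# Refactored approach: a single pass keyed on the attribute pair
group_by_key = lambda do
  arr.group_by { |f| [f.status_id, f.profile_id] }.values
end

# bmbm runs a rehearsal first, then the measured pass
Benchmark.bmbm do |x|
  x.report('Dup/Select/Uniq') { select_uniq.call }
  x.report('Group_By')        { group_by_key.call }
end
```

Both lambdas should return the same groups in the same order, so it is easy to check that the faster version has not changed the result.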
