pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
intersection = []
pools.each do |k,v|
not_k = pools.dup
not_k.delete_if{|nk,nv| nk == k }
inter << (v & not_k.values.flatten)
end
intersection.flatten.uniq
Refactorings
No refactoring yet !
Jeremy Weiskotten
July 31, 2008, July 31, 2008 21:59, permalink
Here are a couple of options, depending on which version of Ruby you're using. Unfortunately Array#count is not available in Ruby 1.8.6 (although you could add it to the class pretty easily if you wanted to).
# Ruby 1.8.6
pools = {1 => [1,2,3], 2 => [3,4,5], 3 => [5,6,7]}
values = pools.values.flatten
intersection = values.select { |v| values.select { |w| w == v }.size > 1 }.uniq
# Ruby 1.8.7+
pools = {1 => [1,2,3], 2 => [3,4,5], 3 => [5,6,7]}
values = pools.values.flatten
intersection = values.select { |v| values.count(v) > 1 }.uniq
Maciej Piechotka
July 31, 2008, July 31, 2008 22:03, permalink
May be something like that
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
rej = Hash.new {0}
pools.values.flatten.sort.reject! {|elem| rej[elem]++ != 1}
AviFlombaum
July 31, 2008, July 31, 2008 22:14, permalink
Nice! However, the issue with your method is on large data sets that select statement requires you to iterate over each element in the array which adds significant time to the task. My method above, while not as pretty and concise, save significant processing time. I guess I should have mentioned that performance is a huge issue on this one.
badcarl
July 31, 2008, July 31, 2008 23:06, permalink
require "rubygems"
require "facets/enumerable/probability"
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
intersection = pools.values.flatten.frequency.delete_if { |k, v| v == 1 }.keys
AviFlombaum
August 1, 2008, August 01, 2008 02:21, permalink
badcarl - thanks! That's awesome and fast - but in the end, I like my method best - requires no gems, is hella fast (uses less mem and runs faster on large sets) and I can pretty it up a bit. Thanks though, you both rock!
badcarl
August 1, 2008, August 01, 2008 04:06, permalink
Here is another way. Keep a set of numbers that have appeared once. Add things to the intersection if they've appeared already.
require "set"
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
intersection, appeared = Set.new, Set.new
pools.values.flatten.each do |value|
appeared.include?(value) ? intersection << value : appeared << value
end
Maciej Piechotka
August 1, 2008, August 01, 2008 07:03, permalink
The sort is unnecessary in my code
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
rej = Hash.new {0}
pools.values.flatten.reject! {|elem| rej[elem]++ != 1}
kmcd
August 1, 2008, August 01, 2008 10:08, permalink
Hi all,
This is refactoring is similar to badcarls set based suggestion. I haven't analysed the complexity of this solution yet. I'll do a performance test & dig into my old big O notes ... later.
class Hash
# Returns duplicate elements of values arrays. Usage:
# {1 => [1,2,3], 2 => [3,4,5], 3 => [5,6,7]}.intersection
# => [3,5]
def intersection
self.values.flatten.duplicates
end
end
class Array
# Returns an array of duplicate elements. Usage:
# [1,2,3,3].duplicates
# => [3]
def duplicates
dups, unique = [], []
self.each { |e| unique.member?(e) ? dups.push(e) : unique.push(e) }
dups
end
end
require 'test/unit'
class TestArrayDuplicates < Test::Unit::TestCase
def test_return_empty_with_all_uniq
assert_equal [1,2,3].duplicates, []
end
def test_return_duplicates_with_duplicate_elements
assert_equal [1,2,3,3].duplicates, [3]
end
end
class TestHashIntersection < Test::Unit::TestCase
# Using setup for clarity.
# assert_equal wont accept an unitialised hash
# ie assert_equal {}, {} gives a syntax error
def setup
@dups_hash = {1 => [1,2], 2 => [2,3], 3 => [3,4]}
@uniq_hash = {1 => [1,2,3], 2 => [4,5], 3 => [6,7]}
end
def test_return_empty_with_all_uniq
assert_equal @uniq_hash.intersection, []
end
def test_return_duplicates_with_duplicate_elements
assert_equal @dups_hash.intersection, [2,3]
end
end
rjspotter
August 6, 2008, August 06, 2008 22:01, permalink
I'm basicly golfing at this point but it was a fun exersize
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
pools.inject([]) {|m,x| m << x[1].select {|y| pools[x[0] + 1].include?(y) unless pools[x[0] + 1].nil?}}.flatten
#or
pools.inject([]) {|m,x| m << (x[1] & pools[x[0] + 1] unless pools[x[0] + 1].nil?)}.compact.flatten
mike
August 11, 2008, August 11, 2008 01:15, permalink
Read this thread and figured Enumerable#nonuniq from facets run on pools.values.flatten would probably be faster... it was!
irb(main):001:0> @pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
=> {1=>[1, 2, 3], 2=>[3, 4, 5], 3=>[5, 6, 7]}
irb(main):002:0>
irb(main):003:0* def method1
irb(main):004:1> intersection = []
irb(main):005:1> @pools.each do |k,v|
irb(main):006:2* not_k = @pools.dup
irb(main):007:2> not_k.delete_if{|nk,nv| nk == k }
irb(main):008:2> intersection << (v & not_k.values.flatten)
irb(main):009:2> end
irb(main):010:1> intersection.flatten.uniq
irb(main):011:1> end
=> nil
irb(main):012:0>
irb(main):013:0* module Enumerable
irb(main):014:1> # from facets
irb(main):015:1* def nonuniq
irb(main):016:2> h1 = {}
irb(main):017:2> h2 = {}
irb(main):018:2> each {|i|
irb(main):019:3* h2[i] = true if h1[i]
irb(main):020:3> h1[i] = true
irb(main):021:3> }
irb(main):022:2> h2.keys
irb(main):023:2> end
irb(main):024:1> end
=> nil
irb(main):025:0>
irb(main):026:0* require 'rubygems'; require 'benchmark'
=> true
irb(main):027:0> Benchmark.bm do |x|
irb(main):028:1* x.report("method1") { 100_000.times { method1 }}
irb(main):029:1> x.report("method2") { 100_000.times { @pools.values.flatten.nonuniq }}
irb(main):030:1> end
user system total real
method1 4.070000 0.440000 4.510000 ( 4.573123)
method2 1.790000 0.250000 2.040000 ( 2.046597)
=> true
AviFlombaum
August 11, 2008, August 11, 2008 02:30, permalink
Can you benchmark off something like this to find all intersections?
{1 => [1..50000], 2=> [20000..40000], 3=> [30000..50000]}
leemic
August 13, 2008, August 13, 2008 02:36, permalink
Can you please clarify your example? This does not make sense how you are finding intersection. Do you mean that the element appears on all the arrays?
For example, you are given the following hash
{1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
This means you have three different arrays
[ 1, 2, 3 ]
[ 3, 4, 5 ]
[ 5, 6, 7 ]
There is no value that exists in all three arrays.
Your answer is [ 3, 5 ]. This means 3 came from the intersection of the first and second arrays. 5 came from the intersection of the second and third arrays.
Let's do another example
{1 => [1,2,3], 2=> [3,4,5], 3=> [1, 5,6] }
The answer should be [ 1, 3, 5 ]. However, '1' came from the intersection between the first and third arrays.
Anyway, if you need the intersection of all three arrays - here is one liner
require 'set'
intersect = pools.values.collect{ |arr| Set.new(arr) }.inject{ |inter, set| inter.nil? ? set : inter & set }
AviFlombaum
August 13, 2008, August 13, 2008 05:43, permalink
I mean the element appears in at least one other array besides its own.
[ 1, 2, 3 ]
[ 3, 4, 5 ]
[ 5, 6, 7 ]
is [3,5], for the reason you pointed out.
In your example: {1 => [1,2,3], 2=> [3,4,5], 3=> [1, 5,6] }
[1,3,5] is correct. I'll test out your method but I was hoping to not require any gems. Does this make sense?
Find the intersection of arrays contained within a hash. So given {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} return [3,5]