s = "The green hat is the twin of the green hat" h = Hash.new(0) s.split.each do |w| h[w.downcase] += 1 end pp h
Refactorings
No refactoring yet !
Tj Holowaychuk
May 2, 2009, May 02, 2009 16:04, permalink
I would probably do something like below. Mind you I would never do that much processing without making it a routine, so I would probably go with Enumerable#frequency or something similar. PS. if you think there is a more ruby way of doing it, why would you use that as an interview task? :P
results = 'The green hat is the twin of the green hat'.downcase.split.inject(Hash.new(0)) do |counts, word|
counts[word] += 1
counts
end
p results
module Enumerable
def frequency
inject Hash.new(0) do |counts, word|
counts[word] += 1
counts
end
end
end
p 'The green hat is the twin of the green hat'.downcase.split.frequency
Tj Holowaychuk
May 2, 2009, May 02, 2009 16:10, permalink
Or more of a functional approach, which personally I think is nice and easy to read but its not a ruby idiom at all, and most rubyists rage when I do stuff like this haha
p 'The green hat is the twin of the green hat'.
downcase.split.
inject(Hash.new 0) { |c, w| c[w] += 1; c }
danielharan
May 3, 2009, May 03, 2009 00:29, permalink
That task occurred often enough in my daily work I decided to write a method for it
pp 'The green hat is the twin of the green hat'.downcase.split.to_freq_hash
# from http://github.com/danielharan/support/blob/14178d993a566d8d2e583c24591f4e381e49d327/lib/support/array_ext.rb
module Support
module ArrayExt
def to_freq_hash
inject(Hash.new {|hash,key| hash[key] = 0}) do |freq,item|
freq[item] += 1
freq
end
end
end
end
Tj Holowaychuk
May 3, 2009, May 03, 2009 01:43, permalink
Hash.new(0) !! lol
Also I find it incredibly redundant/useless to put methods into modules that are intended for a class, and then included in. Adding additional arbitrary modules to the inheritance chain seems pretty pointless (in most cases), sure you cant super back to them but if it does what it says its going to do you should not have to
Tj Holowaychuk
May 3, 2009, May 03, 2009 02:00, permalink
Heres what I ended up going with, extending String
##
# Return hash of word frequencies.
#
# === Examples
#
# 'foo foo bar'.word_frequency
# # => { 'foo' => 2, 'bar' => 1 }
#
def word_frequency
split.inject Hash.new(0) do |frequencies, word|
frequencies[word] += 1
frequencies
end
end
##
# Return frequency of _word_ or 0.
#
# === Examples
#
# 'foo foo bar'.frequency_of_word('foo') # => 2
# 'foo foo bar'.frequency_of_word('bar') # => 1
#
def frequency_of_word word
word_frequency[word]
end
Muke Tever
May 3, 2009, May 03, 2009 13:39, permalink
"something funky with map/reduce/something." -- Got it, boss.
# of course this is expensive
words = "The green hat is the twin of the green hat".downcase.split
hsh = Hash.new(0)
words.uniq.map{|word| hsh[word] = words.grep(word).size}
p hsh
Tj Holowaychuk
May 3, 2009, May 03, 2009 17:03, permalink
Interesting use of grep there! twice as slow though :P
Muke Tever
May 3, 2009, May 03, 2009 23:34, permalink
Twice as slow for this, and much, much slower working with larger strings! Oh my.
Marc-Andre
May 5, 2009, May 05, 2009 02:06, permalink
Daniel: The added advantage of Hash.new(0) is that simply checking a word won't create a useless entry. Say you get a frequency hash of a small text and you then run the words of the bible on that hash... you might not like the memory usage!
TJ: your Enumerable version is much more versatile and can be used to get the frequencies of other stuff than strings. While I'm nitpicking, isn't this a good opportunity to use the new 'returning'?
module Enumerable
def frequency
returning Hash.new(0) do |hash|
each{ |elem| hash[elem] +=1 }
end
end
end
Tj Holowaychuk
May 5, 2009, May 05, 2009 04:14, permalink
returning? what the! haha, 1.9 addition? i still havent even gotten around to installing 1.9 :(
Tj Holowaychuk
May 5, 2009, May 05, 2009 15:56, permalink
im realizing now this method is pretty dumb anyways haha, its basically just group_by, with a pass assigning length instead of actual values, but i guess word_frequency still reads better than split.downcase.group_by { self }
George Entenman
May 8, 2009, May 08, 2009 11:46, permalink
Seems like a one-liner to me.
s = "The green hat is the twin of the green hat"
p s.split.inject(Hash.new(0)) { |h,k| h[k.downcase] += 1; h }
George Entenman
May 8, 2009, May 08, 2009 11:48, permalink
Ah! I see that TJ's solution is pretty much the same thing, in a slightly different order. Nice.
MatÃas Flores
May 14, 2009, May 14, 2009 20:55, permalink
Just a different approach
words = 'The green hat is the twin of the green hat'.downcase.split
p Hash[ words.uniq.map { |w| [w, words.count(w)] } ]
andyjeffries
July 22, 2009, July 22, 2009 20:21, permalink
Tj wrote:
"PS. if you think there is a more ruby way of doing it, why would you use that as an interview task?"
Because it's more interesting to see what people can come up with. I like getting an insight in to their thought processes rather than just "the answer is X". Interviews aren't always about just getting the question right - it's more about getting a feel for how they work.
Anyway, thanks for the answers - very interesting...
This is a coding question I tend to ask people to do in interviews. I don't mind how it's done, it's not difficult, it's just to give them something relatively easy to code up.
However, I've posted my solution below and I'm sure there must be a more Ruby-like way of doing it - something funky with map/reduce/something.
The task is: given a string "The green hat is the twin of the green hat" convert it in to a list of words with counts for each word, e.g. the=3, green=2, hate=2, is=1, of=1, twin=1