5cb7a05b6de4a6655319db9333be429c

This is a coding question I tend to ask people to do in interviews. I don't mind how it's done, it's not difficult, it's just to give them something relatively easy to code up.

However, I've posted my solution below and I'm sure there must be a more Ruby-like way of doing it - something funky with map/reduce/something.

The task is: given a string "The green hat is the twin of the green hat" convert it in to a list of words with counts for each word, e.g. the=3, green=2, hate=2, is=1, of=1, twin=1

s = "The green hat is the twin of the green hat"
h = Hash.new(0)
s.split.each do |w|
  h[w.downcase] += 1
end
pp h

Refactorings

No refactoring yet !

F1e3ab214a976a39cfd713bc93deb10f

Tj Holowaychuk

May 2, 2009, May 02, 2009 16:04, permalink

No rating. Login to rate!

I would probably do something like below. Mind you I would never do that much processing without making it a routine, so I would probably go with Enumerable#frequency or something similar. PS. if you think there is a more ruby way of doing it, why would you use that as an interview task? :P

results = 'The green hat is the twin of the green hat'.downcase.split.inject(Hash.new(0)) do |counts, word|
  counts[word] += 1
  counts
end
p results

module Enumerable
  def frequency
    inject Hash.new(0) do |counts, word|
      counts[word] += 1
      counts
    end
  end
end
p 'The green hat is the twin of the green hat'.downcase.split.frequency
F1e3ab214a976a39cfd713bc93deb10f

Tj Holowaychuk

May 2, 2009, May 02, 2009 16:10, permalink

2 ratings. Login to rate!

Or more of a functional approach, which personally I think is nice and easy to read but its not a ruby idiom at all, and most rubyists rage when I do stuff like this haha

p 'The green hat is the twin of the green hat'.
      downcase.split.
        inject(Hash.new 0) { |c, w| c[w] += 1; c }
880cbab435f00197613c9cc2065b4f5a

danielharan

May 3, 2009, May 03, 2009 00:29, permalink

No rating. Login to rate!

That task occurred often enough in my daily work I decided to write a method for it

pp 'The green hat is the twin of the green hat'.downcase.split.to_freq_hash

# from http://github.com/danielharan/support/blob/14178d993a566d8d2e583c24591f4e381e49d327/lib/support/array_ext.rb
module Support
  module ArrayExt
    def to_freq_hash
      inject(Hash.new {|hash,key| hash[key] = 0}) do |freq,item|
        freq[item] += 1
        freq
      end
    end
  end
end
F1e3ab214a976a39cfd713bc93deb10f

Tj Holowaychuk

May 3, 2009, May 03, 2009 01:43, permalink

No rating. Login to rate!

Hash.new(0) !! lol
Also I find it incredibly redundant/useless to put methods into modules that are intended for a class, and then included in. Adding additional arbitrary modules to the inheritance chain seems pretty pointless (in most cases), sure you cant super back to them but if it does what it says its going to do you should not have to

F1e3ab214a976a39cfd713bc93deb10f

Tj Holowaychuk

May 3, 2009, May 03, 2009 02:00, permalink

1 rating. Login to rate!

Heres what I ended up going with, extending String

##
  # Return hash of word frequencies.
  #
  # === Examples
  #
  #   'foo foo bar'.word_frequency 
  #   # => { 'foo' => 2, 'bar' => 1 }
  #
  
  def word_frequency
    split.inject Hash.new(0) do |frequencies, word|
      frequencies[word] += 1
      frequencies
    end
  end
  
  ##
  # Return frequency of _word_ or 0.
  #
  # === Examples
  #
  #   'foo foo bar'.frequency_of_word('foo') # => 2
  #   'foo foo bar'.frequency_of_word('bar') # => 1
  #
  
  def frequency_of_word word
    word_frequency[word]
  end
F4192eeb4b26e96deab8b5c68926105d

Muke Tever

May 3, 2009, May 03, 2009 13:39, permalink

No rating. Login to rate!

"something funky with map/reduce/something." -- Got it, boss.

# of course this is expensive
words = "The green hat is the twin of the green hat".downcase.split
hsh = Hash.new(0)
words.uniq.map{|word| hsh[word] = words.grep(word).size}
p hsh
F1e3ab214a976a39cfd713bc93deb10f

Tj Holowaychuk

May 3, 2009, May 03, 2009 17:03, permalink

No rating. Login to rate!

Interesting use of grep there! twice as slow though :P

F4192eeb4b26e96deab8b5c68926105d

Muke Tever

May 3, 2009, May 03, 2009 23:34, permalink

No rating. Login to rate!

Twice as slow for this, and much, much slower working with larger strings! Oh my.

7f00244a6387413b37ee449f234ec045

Marc-Andre

May 5, 2009, May 05, 2009 02:06, permalink

1 rating. Login to rate!

Daniel: The added advantage of Hash.new(0) is that simply checking a word won't create a useless entry. Say you get a frequency hash of a small text and you then run the words of the bible on that hash... you might not like the memory usage!

TJ: your Enumerable version is much more versatile and can be used to get the frequencies of other stuff than strings. While I'm nitpicking, isn't this a good opportunity to use the new 'returning'?

module Enumerable
  def frequency
    returning Hash.new(0) do |hash|
      each{ |elem| hash[elem] +=1 }
    end
  end
end
F1e3ab214a976a39cfd713bc93deb10f

Tj Holowaychuk

May 5, 2009, May 05, 2009 04:14, permalink

No rating. Login to rate!

returning? what the! haha, 1.9 addition? i still havent even gotten around to installing 1.9 :(

F1e3ab214a976a39cfd713bc93deb10f

Tj Holowaychuk

May 5, 2009, May 05, 2009 04:18, permalink

No rating. Login to rate!

nvm, rails haha..

F1e3ab214a976a39cfd713bc93deb10f

Tj Holowaychuk

May 5, 2009, May 05, 2009 15:56, permalink

No rating. Login to rate!

im realizing now this method is pretty dumb anyways haha, its basically just group_by, with a pass assigning length instead of actual values, but i guess word_frequency still reads better than split.downcase.group_by { self }

1267296d813fbd42a4c75426e8362867

George Entenman

May 8, 2009, May 08, 2009 11:46, permalink

No rating. Login to rate!

Seems like a one-liner to me.

s = "The green hat is the twin of the green hat"
p s.split.inject(Hash.new(0)) { |h,k| h[k.downcase] += 1; h }
1267296d813fbd42a4c75426e8362867

George Entenman

May 8, 2009, May 08, 2009 11:48, permalink

No rating. Login to rate!

Ah! I see that TJ's solution is pretty much the same thing, in a slightly different order. Nice.

Aa938bf5bf7edb9f43bb662b4983b88d

Matías Flores

May 14, 2009, May 14, 2009 20:55, permalink

No rating. Login to rate!

Just a different approach

words = 'The green hat is the twin of the green hat'.downcase.split
p Hash[ words.uniq.map { |w| [w, words.count(w)] } ]
5cb7a05b6de4a6655319db9333be429c

andyjeffries

July 22, 2009, July 22, 2009 20:21, permalink

No rating. Login to rate!

Tj wrote:
"PS. if you think there is a more ruby way of doing it, why would you use that as an interview task?"

Because it's more interesting to see what people can come up with. I like getting an insight in to their thought processes rather than just "the answer is X". Interviews aren't always about just getting the question right - it's more about getting a feel for how they work.

Anyway, thanks for the answers - very interesting...

Your refactoring





Format Copy from initial code

or Cancel