Forum: Ruby Review of code please

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Mark W. (Guest)
on 2007-01-27 14:40
(Received via mailing list)
Hi all,

I'm hoping for some critique of the code below.
It works but is slow and probably not 'the ruby way'

---------------------------------------------------------
# Anagrams
#
# Purpose: find all anagrams in /usr/share/dict/words
#          for 4 letter words (length 5 because of \n)

# path to the word file
filename = "/usr/share/dict/words"

# check file exists and is readable
if File.file?(filename) && File.readable?(filename) then
  dict_words = []
  all_anagrams = {}

  open(filename).each do |dict_word|
    if dict_word.length == 5 then
      dict_words << dict_word.chomp.strip
    end
  end

  wrds = dict_words

  dict_words.each do |wrd|
    current_wrd = wrd.split(//).sort.join

    anagrams = []

    wrds.each do |w|
      if wrd != w then
        if not current_wrd == w.split(//).sort.join then
          next
        end
        anagrams.push(w)
      end
    end

    if not anagrams.empty? then
      all_anagrams[wrd] = anagrams
    end
  end

  p all_anagrams

end
---------------------------------------------------------

thanks,
Mark W. (Guest)
on 2007-01-27 15:25
(Received via mailing list)
Well!!,

On Sat, 27 Jan 2007 23:38:30 +1100, Mark W. wrote:

>
>       dict_words << dict_word.chomp.strip
>     wrds.each do |w|
>     end
>   end
>
>   p all_anagrams
>
> end
> ---------------------------------------------------------
>
> thanks,

I just found a solution here:
http://www.joeygibson.com/blog/tech/ruby/index.html

that is exponentially faster than mine and finds all anagrams not just 4
letter words! I just need to understand it now.

---------------------------------------------------------
words = IO.readlines("/usr/share/dict/words")

anagrams = Hash.new([])

words.each do |word|
    base = Array.new
    word.chomp!.downcase!

    word.each_byte do |byte|
        base << byte.chr
    end

    base.sort!

    anagrams[base.to_s] |= [word]
end

# Find the anagrams by eliminating those with only one word
anagrams.reject! {|k, v| v.length == 1}

values = anagrams.values.sort do |a, b|
    b.length <=> a.length
end

File.open("anagrams.txt", "w") do |file|
    values.each do |line|
        file.puts(line.join(", "))
    end
end

largest = anagrams.keys.max do |a, b|
    a.length <=> b.length
end

puts "Total: #{anagrams.length}" #
puts "Largest set of anagrams: #{values[0].inspect}" #
print "Longest anagram: #{anagrams[largest].inspect} " #
puts "at #{largest.length} characters each"
Rob B. (Guest)
on 2007-01-27 16:57
(Received via mailing list)
On Jan 27, 2007, at 8:25 AM, Mark W. wrote:
>> # Purpose: find all anagrams in /usr/share/dict/words
>>   open(filename).each do |dict_word|
>>     anagrams = []
>>     if not anagrams.empty? then
>
> I just found a solution here:
> http://www.joeygibson.com/blog/tech/ruby/index.html
>
> that is exponentially faster than mine and finds all anagrams not
> just 4
> letter words! I just need to understand it now.
>
> ---------------------------------------------------------
> words = IO.readlines("/usr/share/dict/words")
Slurps in the entire dictionary as an array of lines (typically much
quicker than reading a line at a time even with buffering).
>
> anagrams = Hash.new([])
Build a new Hash where the default value is an empty array
>
> words.each do |word|
>     base = Array.new
>     word.chomp!.downcase!
>
>     word.each_byte do |byte|
>         base << byte.chr
>     end
Get the individual bytes in the word (after having stripped any line
ending [chomp!] and standardizing the lettercase [downcase!])...
>
>     base.sort!
...and put them in a standard order (by value, the bytes are just
integers).  This is essentially the same as you were doing with
current_wrd (in fact, it might be interesting to see how much this
change affects the overall speed).
>
>     anagrams[base.to_s] |= [word]
construct the string representation of the array of byte values to
use as the hash key, and then use the array union operator to include
the new word.

> end
>
> # Find the anagrams by eliminating those with only one word
> anagrams.reject! {|k, v| v.length == 1}
>
> values = anagrams.values.sort do |a, b|
>     b.length <=> a.length
> end
Largest sets first.
>
> puts "Total: #{anagrams.length}" #
> puts "Largest set of anagrams: #{values[0].inspect}" #
> print "Longest anagram: #{anagrams[largest].inspect} " #
> puts "at #{largest.length} characters each"
> ---------------------------------------------------------
>
> --
> Mark

Does that make more sense to you?  (I nearly ignored your message,
but your second post showed that you were really interested in
understanding, not just trying to get others to do your code review ;-)

-Rob

Rob B.    http://agileconsultingllc.com
removed_email_address@domain.invalid
Erik V. (Guest)
on 2007-01-27 20:31
(Received via mailing list)
Here's a map-reduce-ish [1] solution.

I don't say it's better, or faster, or smaller (memory-wise).
It's just, uh, different...

(Sorry, couldn't resist, again... ;])

gegroet,
Erik V. - http://www.erikveen.dds.nl/

 [1] http://labs.google.com/papers/mapreduce.html

----------------------------------------------------------------

 WITHOUT THE TOTALS

 File.open("anagrams.txt", "w") do |f|
   File.readlines("/usr/share/dict/words").collect do |word|
     word.chomp.downcase
   end.collect do |word|
     [word.scan(/./).sort, word]
   end.inject({}) do |hash, (base, word)|
     (hash[base] ||= []) << word ; hash
   end.values.reject do |set|
     set.length == 1
   end.collect do |set|
     set.sort
   end.sort_by do |set|
     [-set.length, set]
   end.collect do |set|
     set.join("\t")
   end.each do |line|
     f.puts(line)
   end
 end

----------------------------------------------------------------

 WITH THE TOTALS

 sets =
 File.readlines("/usr/share/dict/words").collect do |word|
   word.chomp.downcase
 end.collect do |word|
   [word.scan(/./).sort, word]
 end.inject({}) do |hash, (base, word)|
   (hash[base] ||= []) << word ; hash
 end.values.reject do |set|
   set.length == 1
 end.collect do |set|
   set.sort
 end.sort_by do |set|
   [-set.length, set]
 end

 largest = sets[0]
 longest = sets.inject{|a, b| a[0].length > b[0].length ? a : b}

 File.open("anagrams.txt", "w") do |f|
   sets.each do |set|
     f.puts(set.join("\t"))
   end
 end

 puts "Total: %s"                                        %
[sets.length]
 puts "Largest set of anagrams: %s"                      %
[largest.inspect]
 puts "Longest anagrams: %s at %s characters each"       %
[longest.inspect, longest[0].length]
Mark W. (Guest)
on 2007-01-28 01:20
(Received via mailing list)
Hi Rob,

On Sat, 27 Jan 2007 23:56:42 +0900, Rob B. wrote:

>>> #
>>>
>>>
>>>
>>> thanks,
> Slurps in the entire dictionary as an array of lines (typically much
>>         base << byte.chr
>>     anagrams[base.to_s] |= [word]
>>     b.length <=> a.length
>>     a.length <=> b.length
>
> Does that make more sense to you?  (I nearly ignored your message,
> but your second post showed that you were really interested in
> understanding, not just trying to get others to do your code review ;-)

Excellent, thanks

Although my solution worked I knew it wasn't 'rubyish' and also slow. I
then found the second solution and was proved right!! Erics' solution
looks even more 'adventurous' so I'll have to spend some time with the
Cookbook, Pickaxe, irb and ri to try and get a grasp of it.

I actually benchmarked the different ways to get the words from the dict
soon after finding the second solution and IO.readlines was much
quicker.

I thought this would be a good way to learn Ruby. ie find a simple
exercise, have a go and then get opinions on better ways to attacked
it. Like others here, rubyquiz is out of my reach ATM and I found this
'kata(6)' at the pragmatic programmers.

I've spent way too much time reading books and not enough time coding!

PS - your inline comments are very helpful.

>
> -Rob
>
> Rob B.    http://agileconsultingllc.com
> removed_email_address@domain.invalid


cheers,
Mark W. (Guest)
on 2007-01-28 01:25
(Received via mailing list)
Hi Erik,

On Sun, 28 Jan 2007 03:30:21 +0900, Erik V. wrote:

>  [1] http://labs.google.com/papers/mapreduce.html
>    end.inject({}) do |hash, (base, word)|
>      f.puts(line)
>  end.collect do |word|
>
> [sets.length]
>  puts "Largest set of anagrams: %s"                      %
> [largest.inspect]
>  puts "Longest anagrams: %s at %s characters each"       %
> [longest.inspect, longest[0].length]
>
> ----------------------------------------------------------------

wow! now that's going to take some understanding!
Thanks for the homework over the next few nights ;-)


cheers,
William J. (Guest)
on 2007-01-28 04:11
(Received via mailing list)
On Jan 27, 12:30 pm, "Erik V." <removed_email_address@domain.invalid> wrote:
>  [1]http://labs.google.com/papers/mapreduce.html
>    end.inject({}) do |hash, (base, word)|
>      f.puts(line)
>    end
>  end

Do you know what a one-to-one mapping is?
It's senseless to use "collect"; use "map" instead.

open( 'anagrams.txt', 'w' ) { |file|
  IO.readlines( 'words' ).inject({}){ |hash,word|
  word.strip!.downcase!
  (hash[word.split(//).sort.join] ||=[]) << word; hash }.
  values.reject{|list| list.size < 2}.
  map{|list| list.sort}.sort_by{|list| [ -list.size, list ] }.
  each{|list| file.puts list.join( "\t" ) } }
Erik V. (Guest)
on 2007-01-28 12:37
(Received via mailing list)
> Do you know what a one-to-one mapping is? It's senseless to
> use "collect"; use "map" instead.

Do you know they're exactly the same? It's senseless to use
"map" instead of "collect".

gegroet,
Erik V. - http://www.erikveen.dds.nl/

----------------------------------------------------------------

 $ grep *.c -wie rb_define_method | grep -wie collect -wie map
 array.c:    rb_define_method(rb_cArray, "collect", rb_ary_collect,
0);
 array.c:    rb_define_method(rb_cArray, "collect!",
rb_ary_collect_bang, 0);
 array.c:    rb_define_method(rb_cArray, "map", rb_ary_collect, 0);
 array.c:    rb_define_method(rb_cArray, "map!", rb_ary_collect_bang,
0);
 enum.c:    rb_define_method(rb_mEnumerable,"collect", enum_collect,
0);
 enum.c:    rb_define_method(rb_mEnumerable,"map", enum_collect, 0);
This topic is locked and can not be replied to.