Review of code please


#1

Hi all,

I’m hoping for some critique of the code below.
It works but is slow and probably not ‘the ruby way’


Anagrams

Purpose: find all anagrams in /usr/share/dict/words

for 4 letter words (length 5 because of \n)

path to the word file

filename = “/usr/share/dict/words”

check file exists and is readable

if File.file?(filename) && File.readable?(filename) then
dict_words = []
all_anagrams = {}

open(filename).each do |dict_word|
if dict_word.length == 5 then
dict_words << dict_word.chomp.strip
end
end

wrds = dict_words

dict_words.each do |wrd|
current_wrd = wrd.split(//).sort.join

anagrams = []

wrds.each do |w|
  if wrd != w then
    if not current_wrd == w.split(//).sort.join then
      next
    end
    anagrams.push(w)
  end
end

if not anagrams.empty? then
  all_anagrams[wrd] = anagrams
end

end

p all_anagrams

end

thanks,


#2

Well!!,

On Sat, 27 Jan 2007 23:38:30 +1100, Mark W. wrote:

  dict_words << dict_word.chomp.strip
wrds.each do |w|
end

end

p all_anagrams

end

thanks,

I just found a solution here:
http://www.joeygibson.com/blog/tech/ruby/index.html

that is exponentially faster than mine and finds all anagrams not just 4
letter words! I just need to understand it now.


words = IO.readlines("/usr/share/dict/words")

anagrams = Hash.new([])

words.each do |word|
base = Array.new
word.chomp!.downcase!

word.each_byte do |byte|
    base << byte.chr
end

base.sort!

anagrams[base.to_s] |= [word]

end

Find the anagrams by eliminating those with only one word

anagrams.reject! {|k, v| v.length == 1}

values = anagrams.values.sort do |a, b|
b.length <=> a.length
end

File.open(“anagrams.txt”, “w”) do |file|
values.each do |line|
file.puts(line.join(", "))
end
end

largest = anagrams.keys.max do |a, b|
a.length <=> b.length
end

puts “Total: #{anagrams.length}” #
puts “Largest set of anagrams: #{values[0].inspect}” #
print "Longest anagram: #{anagrams[largest].inspect} " #
puts “at #{largest.length} characters each”


#3

On Jan 27, 2007, at 8:25 AM, Mark W. wrote:

Purpose: find all anagrams in /usr/share/dict/words

open(filename).each do |dict_word|
anagrams = []
if not anagrams.empty? then

I just found a solution here:
http://www.joeygibson.com/blog/tech/ruby/index.html

that is exponentially faster than mine and finds all anagrams not
just 4
letter words! I just need to understand it now.


words = IO.readlines("/usr/share/dict/words")
Slurps in the entire dictionary as an array of lines (typically much
quicker than reading a line at a time even with buffering).

anagrams = Hash.new([])
Build a new Hash where the default value is an empty array

words.each do |word|
base = Array.new
word.chomp!.downcase!

word.each_byte do |byte|
    base << byte.chr
end

Get the individual bytes in the word (after having stripped any line
ending [chomp!] and standardizing the lettercase [downcase!])…

base.sort!

…and put them in a standard order (by value, the bytes are just
integers). This is essentially the same as you were doing with
current_wrd (in fact, it might be interesting to see how much this
change affects the overall speed).

anagrams[base.to_s] |= [word]

construct the string representation of the array of byte values to
use as the hash key, and then use the array union operator to include
the new word.

end

Find the anagrams by eliminating those with only one word

anagrams.reject! {|k, v| v.length == 1}

values = anagrams.values.sort do |a, b|
b.length <=> a.length
end
Largest sets first.

puts “Total: #{anagrams.length}” #
puts “Largest set of anagrams: #{values[0].inspect}” #
print "Longest anagram: #{anagrams[largest].inspect} " #
puts “at #{largest.length} characters each”


Mark

Does that make more sense to you? (I nearly ignored your message,
but your second post showed that you were really interested in
understanding, not just trying to get others to do your code review :wink:

-Rob

Rob B. http://agileconsultingllc.com
removed_email_address@domain.invalid


#4

Here’s a map-reduce-ish [1] solution.

I don’t say it’s better, or faster, or smaller (memory-wise).
It’s just, uh, different…

(Sorry, couldn’t resist, again… ;])

gegroet,
Erik V. - http://www.erikveen.dds.nl/

[1] http://labs.google.com/papers/mapreduce.html


WITHOUT THE TOTALS

File.open(“anagrams.txt”, “w”) do |f|
File.readlines("/usr/share/dict/words").collect do |word|
word.chomp.downcase
end.collect do |word|
[word.scan(/./).sort, word]
end.inject({}) do |hash, (base, word)|
(hash[base] ||= []) << word ; hash
end.values.reject do |set|
set.length == 1
end.collect do |set|
set.sort
end.sort_by do |set|
[-set.length, set]
end.collect do |set|
set.join("\t")
end.each do |line|
f.puts(line)
end
end


WITH THE TOTALS

sets =
File.readlines("/usr/share/dict/words").collect do |word|
word.chomp.downcase
end.collect do |word|
[word.scan(/./).sort, word]
end.inject({}) do |hash, (base, word)|
(hash[base] ||= []) << word ; hash
end.values.reject do |set|
set.length == 1
end.collect do |set|
set.sort
end.sort_by do |set|
[-set.length, set]
end

largest = sets[0]
longest = sets.inject{|a, b| a[0].length > b[0].length ? a : b}

File.open(“anagrams.txt”, “w”) do |f|
sets.each do |set|
f.puts(set.join("\t"))
end
end

puts “Total: %s” %
[sets.length]
puts “Largest set of anagrams: %s” %
[largest.inspect]
puts “Longest anagrams: %s at %s characters each” %
[longest.inspect, longest[0].length]


#5

Hi Rob,

On Sat, 27 Jan 2007 23:56:42 +0900, Rob B. wrote:

thanks,
Slurps in the entire dictionary as an array of lines (typically much
base << byte.chr
anagrams[base.to_s] |= [word]
b.length <=> a.length
a.length <=> b.length

Does that make more sense to you? (I nearly ignored your message,
but your second post showed that you were really interested in
understanding, not just trying to get others to do your code review :wink:

Excellent, thanks

Although my solution worked I knew it wasn’t ‘rubyish’ and also slow. I
then found the second solution and was proved right!! Erics’ solution
looks even more ‘adventurous’ so I’ll have to spend some time with the
Cookbook, Pickaxe, irb and ri to try and get a grasp of it.

I actually benchmarked the different ways to get the words from the dict
soon after finding the second solution and IO.readlines was much
quicker.

I thought this would be a good way to learn Ruby. ie find a simple
exercise, have a go and then get opinions on better ways to attacked
it. Like others here, rubyquiz is out of my reach ATM and I found this
‘kata(6)’ at the pragmatic programmers.

I’ve spent way too much time reading books and not enough time coding!

PS - your inline comments are very helpful.

-Rob

Rob B. http://agileconsultingllc.com
removed_email_address@domain.invalid

cheers,


#6

Hi Erik,

On Sun, 28 Jan 2007 03:30:21 +0900, Erik V. wrote:

[1] http://labs.google.com/papers/mapreduce.html
end.inject({}) do |hash, (base, word)|
f.puts(line)
end.collect do |word|

[sets.length]
puts “Largest set of anagrams: %s” %
[largest.inspect]
puts “Longest anagrams: %s at %s characters each” %
[longest.inspect, longest[0].length]


wow! now that’s going to take some understanding!
Thanks for the homework over the next few nights :wink:

cheers,


#7

On Jan 27, 12:30 pm, “Erik V.” removed_email_address@domain.invalid wrote:

[1]http://labs.google.com/papers/mapreduce.html
end.inject({}) do |hash, (base, word)|
f.puts(line)
end
end

Do you know what a one-to-one mapping is?
It’s senseless to use “collect”; use “map” instead.

open( ‘anagrams.txt’, ‘w’ ) { |file|
IO.readlines( ‘words’ ).inject({}){ |hash,word|
word.strip!.downcase!
(hash[word.split(//).sort.join] ||=[]) << word; hash }.
values.reject{|list| list.size < 2}.
map{|list| list.sort}.sort_by{|list| [ -list.size, list ] }.
each{|list| file.puts list.join( “\t” ) } }


#8

Do you know what a one-to-one mapping is? It’s senseless to
use “collect”; use “map” instead.

Do you know they’re exactly the same? It’s senseless to use
“map” instead of “collect”.

gegroet,
Erik V. - http://www.erikveen.dds.nl/


$ grep *.c -wie rb_define_method | grep -wie collect -wie map
array.c: rb_define_method(rb_cArray, “collect”, rb_ary_collect,
0);
array.c: rb_define_method(rb_cArray, “collect!”,
rb_ary_collect_bang, 0);
array.c: rb_define_method(rb_cArray, “map”, rb_ary_collect, 0);
array.c: rb_define_method(rb_cArray, “map!”, rb_ary_collect_bang,
0);
enum.c: rb_define_method(rb_mEnumerable,“collect”, enum_collect,
0);
enum.c: rb_define_method(rb_mEnumerable,“map”, enum_collect, 0);