Forum: Ruby is this a good way to find anagrams?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
wrong (Guest)
on 2005-12-08 07:55
(Received via mailing list)
it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.


##these are here from when i was first testing
secret_word = "spine"
rack = "spine"

secret_word = secret_word.split(//)
rack = rack.split(//)
test_rack = rack.to_s

##are there enough of the right letters in the rack to spell the word
from the dictionary?
##i think i'm going to use funny spanish names for my methods instead
of descriptive names.
def rodolfo(secret_word, rack)
secret_word.each do |x|
     rack = rack.to_s
     if rack.include? x
         rack = rack.sub(x, '')
         ##p rack, x
     else
         ##puts x, ' rack don\'t work'
         break
     end
end
end


puts "reading dictionary"
dict = IO.readlines('/usr/share/dict/words')

while rack
     puts "enter rack"
     rack = gets.chomp.split(//)

     dict.each do |secret_word|

         if rodolfo(secret_word.chomp.split(//), rack)
             puts secret_word.to_s
         end
     end
end
mrkode (Guest)
on 2005-12-08 09:33
(Received via mailing list)
umm... what's the dictionary for?

also, I think there's an easier way to do this. with a bit of spacing
and
stuff, I wrote up a 10 line solution. ;)
I also did the splitting string into array thing. Then there's two
instance
methods for Array, named "sort" and "==" respectively.

An anagram is when two words contain the exact same characters, right?
so if
the two splitted strings are the same when sorted, they should be
anagrams.
batkins57 (Guest)
on 2005-12-08 10:06
(Received via mailing list)
One thing you could try is inverting your algorithm.  My dict/words
file has 234,000+ entries in it.  Since you're looping through the dict
file and then calling rodolfo on each iteration, you have to pay the
cost of calling rodolfo 234,000 times, even though in general only a
couple of entries in the words file are relevant.

So what you could do is first find all the possible permuations of the
letters in your given word.  Then loop through the word file once.  If
a word matches one of the permutations in your list, remove it from the
list of words to be checked and push it onto a separate list.  This
should be significantly faster, since you calculate the combinations up
front, and then do one scan through the list.

You could also call out to a program like aspell or ispell to check
your words for you.  Those programs will do something at least as fast
as a binary search, which will take much, much  less time than scanning
through the whole thing  - even in the worst case, a 234,000 entry file
will be checked in 18 string comparisons, instead of 234,000.

hth,
Bill
m.fellinger (Guest)
on 2005-12-08 10:14
(Received via mailing list)
Hey travis,

you gave me a good time figuring this one out... after some thought i
came up
with something that takes roughly 1 second to search 9,6k words for a
anagram.
of course this will go up when you add more words (a task you still can
think
about)
all in all, 5 LoC... but i'm sure there are even better ways...

word = "easy".split(//).sort
anagrams = IO.readlines('/usr/share/dict/british-english').partition
{|l|
  l.strip!
  (l.size == word.size && l.split(//).sort == word)
}[0]
anagrams.each{|x| puts x}

##### gives you:
manveru@lambda:~/ruby$ time ruby anagramma.rb
ayes
easy
yeas

real    0m1.097s
user    0m0.936s
sys     0m0.105s


Am Donnerstag, 8. Dezember 2005 06:52 schrieb travis laduke:
batkins57 (Guest)
on 2005-12-08 11:07
(Received via mailing list)
I went ahead and implemented my own suggestions, and just to give you
an idea of how fast you can get this, my version can find all valid
anagrams of a five-letter word ("truck") given a 274,000 word
dictionary in 0.231 seconds....real-time.

:D
martindemello (Guest)
on 2005-12-08 11:44
(Received via mailing list)
removed_email_address@domain.invalid wrote:
>
> So what you could do is first find all the possible permuations of the
> letters in your given word.  Then loop through the word file once.  If
> a word matches one of the permutations in your list, remove it from the
> list of words to be checked and push it onto a separate list.  This
> should be significantly faster, since you calculate the combinations up
> front, and then do one scan through the list.

Even faster - sort the list of permutations, then do an incremental
search against the dictionary: binary search to find the first word in
the
list, then repeat, setting the lower bound of your search to the
position immediately after each match when you do the next one.
O(m!*log n) rather than O(m log m * n), which for small m and large n
can lead to a speedup. It might be even faster to search for the middle
word in your permutation list, then do that recursively with the left
and right sublists, passing in the upper and lower bounds as arguments.
All this assumes an in-memory dictionary with O(1) position indexing, of
course.

martin
chneukirchen (Guest)
on 2005-12-09 12:13
(Received via mailing list)
Michael F. <removed_email_address@domain.invalid> writes:

> anagrams = IO.readlines('/usr/share/dict/british-english').partition {|l|
>   l.strip!
>   (l.size == word.size && l.split(//).sort == word)
> }[0]
> anagrams.each{|x| puts x}

Better use #find_all, no?
ruby.brian (Guest)
on 2005-12-09 14:08
(Received via mailing list)
On 08/12/05, travis laduke <removed_email_address@domain.invalid> wrote:
> it seems to me, with computers these days, this should finish
> instantly, not take like 20 seconds.
> also, please help me make my ruby more ruby-like. i'm new to ruby,
> not that i know any other language.
>
> [snip]

if you want to find lots of anagrams against the same database it
would make sense to create an index. Make a hash that is indexed by
the sorted letters and let it point to a list of words that yield this
letter. After you have done this once, you can lookup anagrams in O(1)

cheers,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/
ruby.brian (Guest)
on 2005-12-09 14:16
(Received via mailing list)
On 09/12/05, Brian Schröder <removed_email_address@domain.invalid> wrote:
> the sorted letters and let it point to a list of words that yield this
> letter. After you have done this once, you can lookup anagrams in O(1)
>
> cheers,
>
> Brian
>

Here comes a short implementation:

hash =
    File.read('/usr/share/dict/words').
    downcase.
    split(/\n/).
    inject(Hash.new { | h, k | h[k] = [] }) { | h, w |
h[w.split(//).sort] << w; h }


ARGV.each do | word |
  puts "Anagrams for #{word}: #{hash[word.split(//).sort].join(' ')}"
end

>>>>>
Anagrams for ruby: ruby bury ruby
Anagrams for duck: duck
Anagrams for type: type
Anagrams for truck: truck
Anagrams for brian: brain brian rabin brain
Anagrams for sort: rots sort tors
Anagrams for index: index nixed

Here are the timings:

The calculation of the index took 3.34749s

Lookup of 7 words took 0.000195s

And lookup of the words scales linearly, so for certain applications
(building an anagram-server e.g.) this may be the best possibility.

Cheers,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/
djberg96 (Guest)
on 2005-12-10 18:54
(Received via mailing list)
removed_email_address@domain.invalid wrote:
> I went ahead and implemented my own suggestions, and just to give you
> an idea of how fast you can get this, my version can find all valid
> anagrams of a five-letter word ("truck") given a 274,000 word
> dictionary in 0.231 seconds....real-time.

I think there's an algorithm for finding anagrams using libgmp, for
which there are Ruby bindings (on the RAA).  I'll bet that's fast. :)

Dan
This topic is locked and can not be replied to.