Forum: Ruby Buiding anamgrams

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
733df2bec21a6aa4eaaa38ff8dac96b6?d=identicon&s=25 andrea (Guest)
on 2008-12-26 10:36
(Received via mailing list)
I'm studying ruby (and I really love it even more than python) and I
want to do a little exercise.

I want to create an efficient module to generate every possible
anagrams of a word, of any given size...

I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatori...
works fine) and I'd like to use enumerators in the best and most
efficient way.

I also looked at callcc which is great for backtracking, could be used
in this case??
thanks for any help

A basic structure like that could be fine?
class Anagrams
  attr :word
  include Enumerable
  include Comparable

  def <=>(other)
    self.word <=> other.word
  end

  def each(&block)
    @word.chars
  end

  def initialize(word)
    @word = word
  end

end
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2008-12-26 11:50
(Received via mailing list)
On 26.12.2008 10:30, andrea wrote:
> I'm studying ruby (and I really love it even more than python) and I
> want to do a little exercise.

Welcome to the pack!  What a nice Christmas occupation. :-)

> I want to create an efficient module to generate every possible
> anagrams of a word, of any given size...

Do you really mean "anagrams" or rather "permutations"?  To decide on
anagrams you likely need some kind of dictionary which is significantly
more difficult than just permuting.

> I already done this in python, I want to see how can it be elegant in
> ruby...
> I need a good permutation algorithm (this
> http://www.cut-the-knot.org/Curriculum/Combinatori...
> works fine) and I'd like to use enumerators in the best and most
> efficient way.

 From 1.8.7 on you can do this

irb(main):009:0> "abc".scan(/./).permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):010:0> "abc".chars.to_a.permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):011:0>

Now you only need a filter that properly decides which of those is a
proper word and use that to decide which words to print and which not.

> I also looked at callcc which is great for backtracking, could be used
> in this case??

I would try to avoid it as it makes code very hard to read - at least
for me. :-)

>   end
>
>   def each(&block)
>     @word.chars
>   end
>
>   def initialize(word)
>     @word = word
>   end
>
> end

What exactly should be the responsibility of this class?  Especially,
why do you call it "anagrams" but store only one word in it?

Cheers

  robert
A745f7d401d0a9bf80a5d5d94c961a05?d=identicon&s=25 unknown (Guest)
on 2008-12-26 12:22
(Received via mailing list)
That's it. In ruby 1.8.6 you cant use this form , only for 1.8.7 and
prior
...
I tested here and the code works perfectly.
About your class:
class Anagrams
>>   end
>>
>>   def initialize(word)
>>     @word = word
>>   end
>>
>> end
i really dont understant what it means. Explain better.
King regards
tiago




On Fri, 26 Dec 2008 19:50:00 +0900, Robert Klemme
D0338c0de4cb3c5c17300396159933d1?d=identicon&s=25 Axel Etzold (Guest)
on 2008-12-27 01:27
(Received via mailing list)
Dear Andrea,

> in this case??
Permutations become quite unwieldy very quickly ... maybe you'd have a
look
at Hidden Markov Models instead. There are several algorithms associated
to these models. I think what you'd want to do is to apply the "forward
algorithm"
to estimate how probable a given word is, given the probability of
letter x2 succeeding
letter x1 in the language you are trying to form anagrams in ( ...
download some novels
from project Gutenberg, split them into pairs of letters and count.....)
.

For the forward algorithm, look at the concrete example here:

http://en.wikipedia.org/wiki/Viterbi_algorithm

There once was a Ruby quiz about Markov chains to help you with Ruby:

http://www.rubyquiz.com/quiz74.html

To test whether a given word exists in language X, you could use raspell
in conjunction with
aspell:

http://blog.evanweaver.com/files/doc/fauna/raspell...

Best regards,

Axel
733df2bec21a6aa4eaaa38ff8dac96b6?d=identicon&s=25 andrea (Guest)
on 2008-12-27 10:01
(Received via mailing list)
Thanks a lot everyone, now I already posted but don't know why it
doesn't appear my message.
Anyway you're right Axel, this is the first "solution" to the problem
but it's really too slow for any reasonable large size of the string.

This is the code, any hint about style or whatever is welcome:
http://pastie.textmate.org/private/rp4ivw36vefuz07twjlw3g


I'll have a look to the Markov Chain (which I'm studying right now at
uni).
5d06917e13b29bcff1c1609492c06873?d=identicon&s=25 Dave Thomas (Guest)
on 2008-12-27 19:47
(Received via mailing list)
On Dec 27, 2008, at 2:59 AM, andrea wrote:

> Thanks a lot everyone, now I already posted but don't know why it
> doesn't appear my message.
> Anyway you're right Axel, this is the first "solution" to the problem
> but it's really too slow for any reasonable large size of the string.
>
> This is the code, any hint about style or whatever is welcome:
> http://pastie.textmate.org/private/rp4ivw36vefuz07twjlw3g

I think a faster approach, particularly to find multiple anagrams, is
to create a signature or hash with the property that each word in a
set of anagrams will have the same hash code. Finding anagrams then
becomes a problem of building lists of words with the hash hash code.

A convenient hash is simply to take all the letters in each work and
sort them, so that "dog" becomes "dgo" and "god" also becomes "dgo".
Because they have the same hash, they are anagrams.

Here's the code from the PickAxe[1] that uses this to find anagrams:

   http://pastie.textmate.org/private/yttznlgzcf9czdkc7vxea

and here are some simple tests:

   http://pastie.textmate.org/private/stv3sxaecc6z8uwxclldba

In fact, I use this as an example of Gem packaging, so you can
actually do

    gem install anagram

and get the whole thing.


Cheers


Dave

[1] http://pragprog.com/titles/ruby3
6e366eb5a71be2bad7f383d42aeb4788?d=identicon&s=25 Justin Collins (Guest)
on 2008-12-28 02:14
(Received via mailing list)
andrea wrote:
> uni).
>

Just for fun/reference, this is what I came up with after reading your
post earlier. Note that it doesn't do the "of any length" requirement,
though.


require 'rubygems'
require 'raspell'

def anagrams word
        checker = Aspell.new
        word.split(//).permutation.select { |w| checker.check w.join
}.map { |w| w.join }
end

puts anagrams "words"


-Justin
733df2bec21a6aa4eaaa38ff8dac96b6?d=identicon&s=25 andrea (Guest)
on 2009-01-02 09:15
(Received via mailing list)
Thanks to everyone for your very nice answers, now I wrote another
version of the program.

http://pastie.textmate.org/private/ryxlzlqle8txaorswiynea

Which implments all 3 methods discussed here, the nicest of course is
the third.

It creates signature only if really needed, otherwise it just loads
the dict with Marshall. (other ways?)
Is correct the use of md5?? I think it is as it gives the same result
of the openssl md5, but it looks a little bit too fast for such a big
file.
dict_md5 = Digest::MD5.hexdigest(open(@dict).read)
conf_md5 = open($CONF).read
# then they must be equal and I have the correct
# signatures for my dictionary
if dict_md5 == conf_md5
  puts "we already have the right signatures" if $DEBUG
  # loading the precomputed signature file
  @signatures = Marshal.load(open($SIGNATURE))
else

It's not bad at all, the only problem is that with a huge dictionary
it takes a long time to load it (around 38seconds for a almost 1-
million word dictionary) and takes a big amount of ram.

Could I maybe load it incrementally? Maybe divide the signs by size
and load just what I need to check?
Is it possible using only one file?


Thanks a lot
This topic is locked and can not be replied to.