What's the most ruby-ish way to write this python code?

dfg59 · May 9, 2007, 9:02pm

All -

I’ve been using ruby for quite some time and I’m only beginning to look
at python. I really like the way ruby flows as compared to python and I
can’t see myself getting pulled away from ruby. However, there is one
thing I like about python, and that is the generators. What is the best
way to translate the following code into ruby? Assume word is some
string and letters is a string of all the characters a…z:

[word[0:i]+c+word[i+1:] for i in range(len(word)) for c in letters]

This is all based on trying to efficiently rewrite
How to Write a Spelling Corrector in ruby. Essentially what this code
does is get the array of words resulting from inserting every character
at every possible position in the given word. I find it pretty succinct,
but I know ruby can do better! I’ve come up with two ways to do this in
ruby, but neither seems to “click” with me:

(0…word.size).inject([]) do |words,i|
letters.split(‘’).each do |c|
words << word[0…i]+c+word[i…-1]
end
words
end

OR

(0…words.size).map do |i|
letters.split(‘’).map do |c|
word[0…i]+c+word[i…-1]
end
end.flatten

Any advice? Currenty, I’m using the first approach and it’s sloooooow
(I’m assuming inject has high overhead).

dfg59 · May 9, 2007, 11:38pm

2007/5/9, Drew O. [email protected]:

end

Any advice? Currenty, I’m using the first approach and it’s sloooooow
(I’m assuming inject has high overhead).

–
Posted via http://www.ruby-forum.com/.

Both ways seem ruby-esque to me.

Note though that you should use word[i+1…-1] (not word[i…-1]) in
order to get the same result as your python code.

The fastest executing algorithm that I could find was:
ar = []
(0…word.size).each do |i|
letters.split(//).each do |c|
ar << word.dup
ar[-1][i] = c
end
end

Regards,
Raf

dfg59 · May 9, 2007, 11:55pm

Thanks for the response.

Note though that you should use word[i+1…-1] (not word[i…-1]) in
order to get the same result as your python code.

I noticed this just after I posted…good catch

The fastest executing algorithm that I could find was:
ar = []
(0…word.size).each do |i|
letters.split(//).each do |c|
ar << word.dup
ar[-1][i] = c
end
end

This was my suspicion as well. At least to my way of think, this seems
decidedly non-ruby-esque. How is it that map has more overhead than
duplicating the word i*c times?

Another question for the performance folks out there: will I see a
performance gain when I access the resulting words from this method if I
store them in a set rather than an array?

dfg59 · May 10, 2007, 10:35am

On 09.05.2007 21:02, Drew O. wrote:

end

Any advice? Currenty, I’m using the first approach and it’s sloooooow
(I’m assuming inject has high overhead).

Since you are splitting letters all the time, I’d start with pulling out
this from the loop. So you could do

LETTERS = “a”…“z”

Then, for efficiency reasons, I would not construct the whole thing in
memory but create an Enumerable class for it

WordGen = Struct.new(:word) do
include Enumerable

def each
word.size.times do |idx|
WordGen::LETTERS.each do |chr|
yield word[0…idx] << chr << word[idx…-1]
end
end
self
end

def to_a
map {|x|x}
end
end

WordGen::LETTERS = “a”…“z”

Note also, that you can use “<<” with substrings without affecting the
original string. “<<” is more efficient than “+”.

Now you can do

wg = WordGen.new “foobar”
wg.each {|w| puts w}
array = wg.to_a

etc.

Kind regards

robert

dfg59 · May 10, 2007, 8:21pm

On 5/10/07, Robert K. [email protected] wrote:

On 09.05.2007 21:02, Drew O. wrote:

Here’s another way to do it:

letters = (‘a’…‘z’).to_a
word = ‘word’

ans = []
(0…word.size).each do |i|
letters.each do |c|
ans << word.sub(/.{#{i}}/, “\0#{c}”)
end
end

print ans.join(“\n”)

or as an enumerator:

class Wordgen
include Enumerable

attr_accessor :word

Letters = (‘a’…‘z’).to_a

def initialize(word)
self.word = word
end

def each
(0…word.size).each do |i|
Letters.each do |c|
yield word.sub(/.{#{i}}/, “\0#{c}”)
end
end
end
end

Wordgen.new(“word”).each do |ans|
p ans
end

–
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

dfg59 · May 10, 2007, 8:13am

Well this was find diversion for the evening

Original python

[word[0:i]+c+word[i+1:] for i in range(len(word)) for c in letters]

2007/5/9, Drew O. [email protected]:

(0…word.size).inject([]) do |words,i|
letters.split(‘’).each do |c|
words << word[0…i]+c+word[i+1…-1]
end
words
end

On Thu, May 10, 2007 at 06:38:12AM +0900, Raf C. wrote:

The fastest executing algorithm that I could find was:
ar = []
(0…word.size).each do |i|
letters.split(//).each do |c|
ar << word.dup
ar[-1][i] = c
end
end

I managed to get a few on my machine that were as fast or faster than
what was given so far. One thing that would work in all of the above
code would be to precompute ‘letters’ into an array before entering any
of the loops. Since that’s a known nconstant we can generate that with:

LETTERS = ('a'..'z').to_a

and then use it in everywhere. Replacing the above in Raf and Drew’s
solution cut some good time off. I did play with using a Range instead
of an Array for LETTERS, but the array turned out to be more efficient.

The two solutions I came up with that were as fast or faster than what
was already give were:

speed up using concat instead of append (fastest I found)

word = “ruby”
ar = []
word.size.times do |x|
ar = ar.concat LETTERS.collect { |l| z = word.dup ; z[x] = l ; z
}
end
speed up with pre computing the result set, this one is very close to
the time of Raf’s using the LETTERS array, but slower than (1)

In this one we generate an array holding the original word and then a
parallel one holding the replacement letter for the first. We then
use integer math to calculate the appropriate index of the letter in
the word in the first array to replace with the letter from the
second array.

word = “ruby”
lsize = LETTERS.size
words = Array.new(word.size * lsize) { word.dup }
replacements = LETTERS * word.size
words.size.times { |i| words[i][i/lsize] = replacements[i] }

Both of these generate duplicates, and so I played with using a Set
instead of an Array, but in my testing using a Set was more expensive
than using an Array and then calling .uniq! on the result.

Of the two I think (1) is the more rubyesque of what I put together.

I’ve attached the benchmark script I was playing with to find out what
was the fastest method for folks to have fun with.

enjoy,

-jeremy

dfg59 · May 10, 2007, 8:30pm

Rick Denatale wrote:

On 5/10/07, Robert K. [email protected] wrote:

On 09.05.2007 21:02, Drew O. wrote:

Here’s another way to do it:

Thanks so much for all the responses, very insightful. If anyone wants
to take a crack at rewriting How to Write a Spelling Corrector in
ruby, go for it. I’ll post my current code later this evening (which is
dog slow) and then I’ll begin updating it based on the feedback I’ve
received.

Thanks again,
Drew

dfg59 · May 10, 2007, 9:22pm

Thanks so much for all the responses, very insightful. If anyone wants
to take a crack at rewritinghttp://norvig.com/spell-correct.htmlin
ruby, go for it.

Have a look at this:

http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/58cf1942dcf95e3a/9dd918190db6e472?lnk=gst&q=norvig&rnum=1#9dd918190db6e472

dfg59 · May 13, 2007, 12:53am

Does anybody know how to unsubscribe from this … too many messages

dfg59 · September 25, 2007, 11:08pm

Have a look at this:

I’m sorry, I just noticed you participated in that thread. So, I
guess, don’t bother