“Eugene K.” [email protected] wrote in message
news:[email protected]…
Here is my solution. Obviously it is not as "well educated" as Steve's,
because it does not use a pronunciation dictionary, but it behaves pretty
well with what it has.
One more limit of a good education: sometimes it is limited to its
education source. Unfortunately I did not have enough patience to test it on
all the new expectations, since even a single example takes too much time
(without Ruby Inline), but I know that it will miss at least part of the new
set:
expectations.rb
$expectations = {
  %w[pro 1 sal] => 'provencal',
  %w[2 thing] => 'toothing',
  %w[r b trash] => 'arbitrage',
  %w[mon 3 l] => 'montreal',
  %w[3 men dose] => 'tremendous',
  %w[mid wind r] => 'midwinter',
  %w[yes tier knight] => 'yesternight',
  %w[mar well s] => 'marvelous',
  %w[vert x] => 'vertex',
  %w[ban l eyes] => 'banalize',
  %w[harm n eyes] => 'harmonize',
  %w[harm o niece east] => 'harmonicist',
  %w[knight in gale] => 'nightingale',
  %w[knee hill east] => 'nihilist',
  %w[mass car pone a] => 'mascarpone',
  %w[cock knee] => 'cockney',
}
################################################
And two more variations of my solution.
First: a compromise with a coarsened hash, balancing decent matching with
decent performance. This one is probably the best I could get.
Second: a full scan. It allows more concise code and slightly better
matching at the cost of performance (but it is still several times faster
than Steve's without inline).
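To make the trade-off between the two variants concrete, here is a minimal, self-contained sketch. It is not the code from either variant: `lev` is a plain-Ruby Levenshtein distance standing in for Text::Levenshtein, and `coarse_key` is a hypothetical stand-in for the two-character double-metaphone prefix (it simply drops vowels and keeps two letters). The word list and the query are made up for illustration.

```ruby
# Plain-Ruby Levenshtein distance (stand-in for Text::Levenshtein.distance).
def lev(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    cur = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      cur << [prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = cur
  end
  prev.last
end

# Hypothetical coarse key: drop vowels, keep the first two letters.
# (The real scripts use the first two characters of a metaphone code.)
def coarse_key(word)
  word.delete('aeiou')[0, 2]
end

# Group the dictionary into buckets by coarse key.
def build_index(words)
  index = Hash.new { |h, k| h[k] = [] }
  words.each { |w| index[coarse_key(w)] << w }
  index
end

# Closest candidate by edit distance, shorter words winning ties.
def best_match(candidates, query)
  candidates.min_by { |w| [lev(query, w), w.length] }
end

words = %w[tremendous trembling tornado midwinter]
index = build_index(words)
query = 'tremendose'

# Variant 1 style: only score the words that share the query's bucket.
bucket = index.fetch(coarse_key(query), [])
puts "bucket scan (#{bucket.size} words): #{best_match(bucket, query)}"

# Variant 2 style: score every word in the dictionary.
puts "full scan (#{words.size} words): #{best_match(words, query)}"
```

With a real dictionary the bucket is a small fraction of the word list, which is where the speed comes from; the price is that a word whose key differs from the query's key is never considered at all.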
Variant 1
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

# Spelled-out sounds for digits and letter names.
subs = {'1'=>'wan','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
        'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
# Used only for the last token of a rebus: letter + 'y' instead of letter + 'ee'.
subsy = {}
%w[b c d g p t v z].each { |l| subsy[l] = l + 'y' }
%w[b c d g p t v z].each { |l| subs[l] = l + 'ee' }
%w[f l m n s x].each { |l| subs[l] = 'e' + l }

# Combined distance: the metaphone difference counts double.
def metadist(str1, str2)
  2 * distance(metaphone(str1), metaphone(str2)) +
    distance(str1, str2)
end

# First two characters of each double-metaphone code: the coarse hash key.
def short_double_metaphone(word)
  m1, m2 = double_metaphone(word)
  [m1[0, 2], m2 ? m2[0, 2] : nil]
end

# Bucket every dictionary word under its coarse key(s).
hash = Hash.new { |h, k| h[k] = [] }
File.open("/usr/share/dict/words") { |f| f.readlines }.each do |w|
  word = w.downcase.delete("^a-z")
  m1, m2 = short_double_metaphone(word)
  hash[m1] << word
  hash[m2] << word if m2
end
# Make sure the expected answers are present in the buckets too.
$expectations.values.each { |word|
  m1, m2 = short_double_metaphone(word)
  hash[m1] << word
  hash[m2] << word if m2
}
hash.each_key { |k| hash[k].uniq! }

inputs = []
if ARGV.empty?
  inputs = $expectations.keys
else
  inputs << ARGV
end

inputs.each { |rebus|
  # All tokens but the last, plus the last token with its 'y' substitution.
  y_ed = rebus[0..-2] << (subsy[rebus[-1]] || rebus[-1])
  word = y_ed.map { |w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/, '')
  m1, m2 = short_double_metaphone(word)
  results = hash[m1]
  results += hash[m2] if m2 && m2 != m1
  res = results.uniq.sort_by { |a| [metadist(word, a), a.length] }.first(5)
  print "'#{rebus.join(' ')}' => #{res[0]}"
  expected = $expectations[rebus]
  print ", expected '#{expected}' is at position #{res.index(expected)}" if expected
  puts
}
################################################
Variant 2
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

# Spelled-out sounds for digits and letter names.
subs = {'1'=>'won','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
        'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
# Used only for the last token of a rebus: letter + 'y' instead of letter + 'ee'.
subsy = {}
%w[b c d g p t v z].each { |l| subsy[l] = l + 'y' }
%w[b c d g p t v z].each { |l| subs[l] = l + 'ee' }
%w[f l m n s x].each { |l| subs[l] = 'e' + l }

# Combined distance: the metaphone difference counts double.
def metadist(str1, str2)
  2 * distance(metaphone(str1), metaphone(str2)) +
    distance(str1, str2)
end

# Full word list: the system dictionary plus the expected answers.
words = (File.open("/usr/share/dict/words") { |f| f.readlines }.map { |word|
  word.downcase.delete("^a-z") } + $expectations.values).uniq

inputs = []
if ARGV.empty?
  inputs = $expectations.keys
else
  inputs << ARGV
end

inputs.each { |rebus|
  # All tokens but the last, plus the last token with its 'y' substitution.
  y_ed = rebus[0..-2] << (subsy[rebus[-1]] || rebus[-1])
  word = y_ed.map { |w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/, '')
  # Score every word in the dictionary and keep the five closest.
  res = words.sort_by { |a| [metadist(word, a), a.length] }.first(5)
  print "'#{rebus.join(' ')}' => #{res[0]}"
  expected = $expectations[rebus]
  print ", expected '#{expected}' is at position #{res.index(expected)}" if expected
  puts
}
################################################
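The expansion step that both variants share (turning rebus tokens into one spelled-out string before any matching happens) can be tried on its own, without the `text` gem or a dictionary. The sketch below copies the substitution tables from Variant 1; the helper names `rebus_tables` and `expand` are mine and do not appear in the scripts, and `%w[part t]` is a made-up input added to show the trailing-letter rule.

```ruby
# Build the two substitution tables used in Variant 1.
# (Helper name is hypothetical; the scripts build these inline.)
def rebus_tables
  subs = {'1'=>'wan','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six',
          '7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
          'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
  subsy = {}
  %w[b c d g p t v z].each { |l| subsy[l] = l + 'y' }
  %w[b c d g p t v z].each { |l| subs[l] = l + 'ee' }
  %w[f l m n s x].each { |l| subs[l] = 'e' + l }
  [subs, subsy]
end

# Expand a rebus (array of tokens) into the string that gets matched.
def expand(rebus, subs, subsy)
  # The last token gets the 'y' form when it is one of b c d g p t v z.
  last = subsy[rebus[-1]] || rebus[-1]
  tokens = rebus[0..-2] + [last]
  tokens.map { |t| subs[t] || t }.join.downcase.gsub(/[^a-z0-9]/, '')
end

subs, subsy = rebus_tables
[%w[pro 1 sal], %w[r b trash], %w[vert x], %w[part t]].each do |rebus|
  puts "#{rebus.join(' ')} -> #{expand(rebus, subs, subsy)}"
end
```

The expanded string is only an approximation of the target word ("prowansal" for "provencal"), which is why the scripts then rank dictionary words by a combined metaphone and Levenshtein distance instead of looking for an exact match.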