Re: Shirt Reader (#140)

jameskelly · September 27, 2007, 5:14am

My solution’s algorithm is fairly simple: Take the words, take their
pronunciations in IPA or a suitable equivalent, and then find the word
with the pronunciation that requires the least amount of changes to turn
it into the concatenated pronunciations of the terms (I am indebted to
Steve D for the knowledge that this is called the “Levenshtein
Distance”).

ruby Rebus…rb (B+YOU+TEA+FULL)
BEAUTIFUL

I had plans for performing this on every combination of the synonyms of
the given terms, and acquired both a thesaurus and pronunciation
dictionary from http://www.dcs.shef.ac.uk/research/ilash/Moby/, but,
unfortunately, they both had quite a few more entries than they should
have. Apparently, “skim milk” is a synonym for “cheese.” Additionally,
according to the Moby Dictionary, “contretemps” is pronounced the way
it’s spelled, and “ember” is not a word, but “embers” is. To use the
thesaurus would be to make a program that already takes a minute or so
to run take a day instead. For those reasons, as well as the fact that
the dictionary contains a lot that it shouldn’t have (what’s a
“Fayme?”), and due to some unintuitive pronunciation differences, my
program doesn’t use a thesaurus, and gets a lot of things wrong.

ruby Rebus.rb (E+SCENT+SHELLS)
CENTRAL’S

(Yes, the unweighted Levensgtein distance really is lower than to
“ESSENTIALS” when using Moby Pronunciation Dictionary.)

ruby Rebus.rb (FAN+TASK+TICK)
FANTASKTIK

Note on my program’s input: only expressions within parentheses are
evaluated (allowing for multiple evaluation and leaving known terms
unchanged in a multi-rebus phrase that must be evaluated), everything
must be cnverted to all caps, and operators must be shown. The latter is
because I’ve implemented a crude interpreation of subtraction in
rebuses.

ruby Rebus.rb (LOVE-L)
OF

As can be seen, my program’s output is regretfully only mediocre.
Changes that would greatly improve output include some mechanism to
differentiate between using a letter’s name and phoneme (i.e.: “aitch”
versus the h-sound) (at present the dictionary just uses the name),
using a more restrictive dictionary and thesaurus, and modifying the
levenshtein_distance method to make changes between similar vowels and
consonants have a lower weight than changes between dissimilar vowels
and consonants.

But alas, the period for this quiz is ending.

Anyway, here’s my code:

#$thesaurus = File.open(“mobythes.aur”) do |f|

h = Hash.new

f.readlines.each do |line|

words=line.chomp.split(‘,’).map{|s|s.upcase}

h[words[0]] = words[1…-1]

end

h

#end

#This only works if the beginning comments are manually removed
$pronunciations = File.open(“cmudict0.3”) do |f|
NUMBER_OF_SYMBOLS = 39

h = Hash.new

lines = f.readlines
lines[0…NUMBER_OF_SYMBOLS].each do |line|
h[lines[0…(line =~ /[A-Z]/)]] = line.split[1…-1]
end
lines[NUMBER_OF_SYMBOLS…-1].each do |line|
words = line.split(/\s+/)
h[words.first] = words[1…-1]
end
h
end

$words = $pronunciations.keys

#See Levenshtein distance - Wikipedia
def levenshtein_distance(a, b)
prev_row = (0…b…length).to_a
cur_row = [0] * b.length
1.upto(a.length) do |i|
cur_row[0] = i
1.upto(b.length) do |j|
cost = (a[i-1] == b[j-1]) ? 0 : 1
cur_row[j] = [prev_row[j]+1,
cur_row[j-1]+1,
prev_row[j-1]+cost].min
end
prev_row = cur_row
cur_row = cur_row.dup
end
prev_row.last
end

#def all_synonym_combinations(word_arr)

if word_arr.size == 1

($thesaurus[word_arr.first] || [word_arr.first]).map{|w|[w]}

else

next_combs = all_synonym_combinations(word_arr[1…-1])

next_combs_length = next_combs.flatten.length

synonyms = ($thesaurus[word_arr[0]] || []) << word_arr[0]

synonyms = synonyms.inject([]){|a,w| a+=[[w]]*next_combs_length}

synonyms.flatten.zip(next_combs*synonyms.length

).map{|a|a.flatten}

end

#end

rebus = ARGV[0].chomp

expressions = rebus.scan(/(?:().+?(?:))/)

expressions.each do |expression|

ops = expression.scan(/[±]/)
terms = expression.gsub(/[()]/,“”).scan(/[^±]+/)
#all_synonym_combinations(terms).each do |terms|
pronunciation = $pronunciations[terms[0]]
terms[1…-1].each_with_index do |term,idx|
if ops[idx] == “+”
pronunciation += $pronunciations[term]
else
#Set difference is a very crude interpretation of subtraction in
rebuses
#(especially considering, in cmudict, a letter’s pronunciation
is that
# of its name, not its main phoneme)
pronunciation -= $pronunciations[term]
end
end
reps = $words.sort_by{|word|
levenshtein_distance($pronunciations[word],
pronunciation)}
rep = terms.index(reps[0]) ? reps[1] : reps[0]
rebus[expression] = rep
#end
end

puts rebus.gsub(“+”," ")

  ____________________________________________________________________________________

Luggage? GPS? Comic books?
Check out fitting gifts for grads at Yahoo! Search
http://search.yahoo.com/search?fr=oni_on_mail&p=graduation+gifts&cs=bz