Numbers Can Be Words (#133)

bbazzarrakk · August 3, 2007, 3:04pm

The three rules of Ruby Q.:

1. Please do not post any solutions or spoiler discussion for this quiz until 48 hours have passed from the time on this message.

2. Support Ruby Q. by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone on Ruby T. follow the discussion. Please reply to the original quiz message, if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Morton G.

When working with hexadecimal numbers it is likely that you’ve noticed some hex numbers are also words. For example, ‘bad’ and ‘face’ are both English words and valid hex numbers (2989 and 64206, respectively, in decimal). I got to thinking that it would be interesting to find out how many and which hex numbers were also valid English words. Of course, almost immediately I started to think of generalizations. What about other bases? What about languages other than English?
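
The decimal values quoted above can be checked with String#to_i, which takes a base between 2 and 36. The following is only a small sketch; the helper names word_value and number_word? are made up for illustration and are not part of the quiz.

```ruby
# Integer value of a word read as a number in the given base.
# String#to_i accepts bases 2..36 and treats a..z as the digits 10..35.
def word_value(word, base = 16)
  word.to_i(base)
end

# True when every character of the word is a digit of the given base.
# (hypothetical helper, assumed for this example only)
def number_word?(word, base = 16)
  digits = (0...base).map { |d| d.to_s(base) }
  !word.empty? && word.each_char.all? { |c| digits.include?(c) }
end

puts word_value("bad")     # 2989
puts word_value("face")    # 64206
puts number_word?("face")  # true
puts number_word?("fig")   # false, "i" and "g" are not hex digits
```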

Your mission is to pick a word list in some language (it will have to be one that uses roman letters) and write Ruby code to filter the list to extract all the words which are valid numbers in a given base. For many bases this isn’t an interesting task: for bases 2-10, the filter comes up empty; for bases 11-13, the filter output is uninteresting (IMO); for bases approaching 36, the filter passes almost everything (also uninteresting IMO). However, for bases in the range from 14 to about 22, the results can be interesting and even surprising, especially if one constrains the filter to accept only words of some length.

I used /usr/share/dict/words for my word list. Participants who don’t have that list on their system or want a different one can go to Kevin’s Word List Page (http://wordlist.sourceforge.net/) as a source of other word lists.

Some points you might want to consider: Do you want to omit short words like ‘a’ and ‘ad’? (I made word length a parameter.) Do you want to allow capitalized words? (I prohibited them.) Do you want to restrict the bases allowed? (I didn’t.)

bbazzarrakk · August 3, 2007, 3:22pm

On Aug 3, 9:01 am, Ruby Q. [email protected] wrote:

When working with hexadecimal numbers it is likely that you’ve noticed some hex
numbers are also words. For example, ‘bad’ and ‘face’ are both English words and
valid hex numbers (2989 and 64206, respectively, in decimal). I got to thinking
that it would be interesting to find out how many and which hex numbers were
also valid English words. Of course, almost immediately I started to think of
generalizations. What about other bases? What about languages other than
English?

I’m not sure why this quiz is being phrased as “numbers that are
words”. Aren’t you just asking for a program that finds words that use
only the first n letters of the alphabet? Or am I missing something
obvious (tends to happen)?

Actually, one interesting variant that would tie it to numbers would
be if you could include digits that look like letters, i.e.:
0 → O
1 → I
2 → Z
5 → S
6 → G
8 → B

In this case, even numbers in base 10 could be words.
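
A rough sketch of this variant, using only the six look-alike pairs listed above. The method name as_decimal and the sample words are assumptions made for illustration.

```ruby
# Look-alike table from the post: digit => letter it resembles.
LOOKALIKES = { "0" => "O", "1" => "I", "2" => "Z",
               "5" => "S", "6" => "G", "8" => "B" }.freeze
LETTER_TO_DIGIT = LOOKALIKES.invert.freeze   # "O" => "0", "I" => "1", ...

# Returns the decimal digit string a word spells, or nil when some
# letter has no look-alike digit. (hypothetical helper)
def as_decimal(word)
  letters = word.upcase.chars
  return nil if letters.empty?
  return nil unless letters.all? { |c| LETTER_TO_DIGIT.key?(c) }
  letters.map { |c| LETTER_TO_DIGIT[c] }.join
end

p as_decimal("bigs")   # "8165"
p as_decimal("boss")   # "8055"
p as_decimal("words")  # nil, "W", "R" and "D" have no look-alike digit
```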

bbazzarrakk · August 3, 2007, 3:38pm

On Aug 3, 2007, at 8:20 AM, Karl von Laudermann wrote:

also valid English words. Of course, almost immediately I started
to think of
generalizations. What about other bases? What about languages
other than
English?

I’m not sure why this quiz is being phrased as “numbers that are
words”. Aren’t you just asking for a program that finds words that use
only the first n letters of the alphabet? Or am I missing something
obvious (tends to happen)?

That’s pretty much the quiz, yes. It’s not too hard to solve, but
the results are pretty interesting.

James Edward G. II

bbazzarrakk · August 3, 2007, 3:52pm

There is no end of numerological[0,1] variations that could be used by
anyone who feels the need for an additional challenge this week.

Regards,

Paul

[0] Numerology - Wikipedia
[1] Kabbalah - Wikipedia

bbazzarrakk · August 5, 2007, 5:47pm

2007/8/3, Ruby Q. [email protected]:

Your mission is to pick a word list in some language (it will have be one
that
uses roman letters) and write Ruby code to filter the list to extract all
the
words which are valid numbers in a given base.

Hi,

I have come up with this one-liner:

----------8<----------
puts File.readlines('/usr/share/dict/words').grep(/\A[a-#{((b=ARGV[0].to_i)-1).to_s(b)}]+\Z/)
---------->8----------

Example:
$ ruby ./rq133_numberscanbewords_rafc.rb 16
a
abed
accede
acceded
ace
aced
ad
add
added
b
baa
baaed
babe
bad
bade
be
bead
beaded
bed
bedded
bee
beef
beefed
c
cab
cabbed
cad
cede
ceded
d
dab
dabbed
dad
dead
deaf
deb
decade
decaf
deed
deeded
deface
defaced
e
ebb
ebbed
efface
effaced
f
fa
facade
face
faced
fad
fade
faded
fed
fee
feed

Regards,
R.

bbazzarrakk · August 5, 2007, 5:57pm

Here are some solutions to this quiz. The first solution deliberately
avoids using regular expressions. Note the use of next to skip over
words that are too short or capitalized and break to stop the
iteration when it gets into territory beyond where numbers of the
given base exist.


WORD_LIST = "/usr/share/dict/words"
WORDS = File.read(WORD_LIST).split

def number_words(base=16, min_letters=3)
  result = []
  WORDS.each do |w|
    next if w.size < min_letters || (?A..?Z).include?(w[0])
    break if w[0] > ?a + (base - 11)
    result << w if w.to_i(base).to_s(base) == w
  end
  result
end

number_words(18, 5) # => ["abaca", "abaff", "accede", "achage", "adage", "added", "adead", "aface", "ahead", "bacaba", "bacach", "bacca", "baccae", "bache", "badge", "baggage", "bagged", "beach", "beached", "beachhead", "beaded", "bebed", "bedad", "bedded", "bedead", "bedeaf", "beech", "beedged", "beefhead", "beefheaded", "beehead", "beeheaded", "begad", "behead", "behedge", "cabbage", "cabbagehead", "cabda", "cache", "cadge", "caeca", "caffa", "caged", "chafe", "chaff", "chebec", "cheecha", "dabba", "dagaba", "dagga", "dahabeah", "deadhead", "debadge", "decad", "decade", "deedeed", "deface", "degged", "dhabb", "echea", "edged", "efface", "egghead", "facade", "faced", "faded", "fadge", "feedhead", "gabgab", "gadbee", "gadded", "gadge", "gaffe", "gagee", "geggee", "hache", "haggada", "hagged", "headache", "headed", "hedge"]
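
The solution above relies on Ruby 1.8, where ?a and w[0] are Integers. The following sketch shows how the same next/break/round-trip idea might look on Ruby 1.9 or later, where both are one-character Strings. It is an adaptation, not the original code: the word list is passed in as a parameter (an assumption, so the example runs without /usr/share/dict/words), and the list is assumed to be sorted so that break is safe.

```ruby
# Non-regex filter adapted for Ruby 1.9+, assuming base is in 2..36 and
# `words` is sorted the way /usr/share/dict/words is.
def number_words(words, base = 16, min_letters = 3)
  return [] if base < 11                   # no letter digits below base 11
  biggest_digit = (base - 1).to_s(base)    # "f" for base 16
  result = []
  words.each do |w|
    # Skip words that are too short or capitalized.
    next if w.size < min_letters || ("A".."Z").cover?(w[0])
    # Sorted list: once the first letter is past the biggest digit,
    # no later word can match.
    break if w[0] > biggest_digit
    # Round trip: parsing and printing in `base` must reproduce the word.
    result << w if w.to_i(base).to_s(base) == w
  end
  result
end

sample = %w[Ada a abed ace bad beef cab face fig gab]  # made-up sample list
p number_words(sample)  # ["abed", "ace", "bad", "beef", "cab", "face"]
```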

The second solution uses #inject rather than #each, but doesn’t seem
to be much if any of an improvement. I found it interesting because
it’s one of few times I’ve ever needed to pass an argument to break
and next.


WORD_LIST = "/usr/share/dict/words"
WORDS = File.read(WORD_LIST).split

def number_words(base=16, min_letters=3)
  WORDS.inject([]) do |result, w|
    next result if w.size < min_letters || (?A..?Z).include?(w[0])
    break result if w[0] > ?a + (base - 11)
    result << w if w.to_i(base).to_s(base) == w
    result
  end
end

number_words(20, 7) # => ["accidia", "accidie", "acidific", "babiche", "bacchiac", "bacchic", "bacchii", "badiaga", "baggage", "beached", "beachhead", "beedged", "beefhead", "beefheaded", "beehead", "beeheaded", "behedge", "bighead", "cabbage", "cabbagehead", "caddice", "caddiced", "caffeic", "cheecha", "cicadid", "dahabeah", "deadhead", "debadge", "debeige", "decadic", "decafid", "decided", "deedeed", "deicide", "diffide", "edifice", "egghead", "feedhead", "giffgaff", "haggada", "haggadic", "headache", "jibhead"]
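
For readers who have not seen next and break take an argument, here is a tiny standalone illustration with made-up numbers instead of a word list. Inside inject, the block's value becomes the next accumulator, so next has to hand the accumulator back and break has to supply the final result.

```ruby
# `next value` supplies the accumulator early for this iteration;
# `break value` ends inject and makes value its return value.
squares_of_odds = (1..10).inject([]) do |acc, n|
  next acc if n.even?    # skip: hand the accumulator back unchanged
  break acc if n > 7     # stop: inject returns acc as its final result
  acc << n * n           # Array#<< returns the array, the new accumulator
end

p squares_of_odds  # [1, 9, 25, 49]
```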

In my third and last solution, I take the obvious route and use
regular expressions. Maybe regular expressions are better after all.


WORD_LIST = "/usr/share/dict/words"
WORDS = File.read(WORD_LIST).split

def number_words(base=16, min_letters=3)
  biggest_digit = (?a + (base - 11))
  regex = /\A[a-#{biggest_digit.chr}]+\z/
  result = []
  WORDS.each do |w|
    next if w.size < min_letters || w =~ /^[A-Z]/
    break if w[0] > biggest_digit
    result << w if w =~ regex
  end
  result
end

The following are all the hex numbers in the word list which have at
least three letters.

number_words # => ["aba", "abac", "abaca", "abaff", "abb", "abed", "acca", "accede", "ace", "adad", "add", "adda", "added", "ade", "adead", "aface", "affa", "baa", "baba", "babe", "bac", "bacaba", "bacca", "baccae", "bad", "bade", "bae", "baff", "bead", "beaded", "bebed", "bed", "bedad", "bedded", "bedead", "bedeaf", "bee", "beef", "cab", "caba", "cabda", "cad", "cade", "caeca", "caffa", "cede", "cee", "dab", "dabb", "dabba", "dace", "dad", "dada", "dade", "dae", "daff", "dead", "deaf", "deb", "decad", "decade", "dee", "deed", "deedeed", "deface", "ebb", "ecad", "edea", "efface", "facade", "face", "faced", "fad", "fade", "faded", "fae", "faff", "fed", "fee", "feed"]

Regards, Morton

bbazzarrakk · August 5, 2007, 7:16pm

"Ruby Q." [email protected] wrote in message:

The three rules of Ruby Q.:

11-13, the filter output is uninteresting (IMO); for bases approaching 36, the filter passes almost everything (also uninteresting IMO). However, for bases in the range from 14 to about 22, the results can be interesting and even surprising, especially if one constrains the filter to accept only words of some length.

Here are my 4 solutions (all use ?, so they will not work in 1.9)

solution #1 - Simple one-liner

p File.read(ARGV[0]).split("\n").reject{|w| w !~ %r"^[a-#{(?a-11+ARGV[1].to_i).chr}]+$"}.sort_by{|w| [w.length,w]} if (?a..?z)===?a-11+ARGV[1].to_i
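
The remark about 1.9 refers to the character literal: from Ruby 1.9 on, ?a is the String "a" rather than the Integer 97, so ?a-11 no longer works. A possible adaptation of solution #1 goes through String#ord and Integer#chr instead. This is a sketch only; the method name base_words and the in-memory word list are assumptions made so the example is self-contained.

```ruby
# Ruby 1.9+ sketch of solution #1: letter arithmetic via ord/chr.
def base_words(words, base)
  last = "a".ord - 11 + base                    # code point of the largest letter digit
  return [] unless ("a".ord.."z".ord).cover?(last)
  pattern = /\A[a-#{last.chr}]+\z/
  words.grep(pattern).sort_by { |w| [w.length, w] }
end

p base_words(%w[face bad fig ace a Bad], 16)  # ["a", "ace", "bad", "face"]
```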

solution #2 - Non-hackery substs, like Olaf

p File.read(ARGV[0]).split("\n").reject{|w| w !~ %r"^[a-#{(?a-11+ARGV[1].to_i).chr}|lO]+$"}.sort_by{|w| [w.length,w]} if (?a..?k)===?a-11+ARGV[1].to_i

solution #3 - c001 hackerz

p File.read(ARGV[0]).split("\n").reject{|w| w !~ %r"^[a-#{(?a-11+ARGV[1].to_i).chr}|lo]+$"i}.map{|w| w.downcase.gsub('o','0').gsub('l','1')}.sort_by{|w| [w.length,w]} if (?a..?k)===?a-11+ARGV[1].to_i

solution #4 - B16 5H0UT1N6 HACKER2

base=ARGV[1].to_i
base_=base+?a-11

raise "Bad base: [#{base}]" if base<1 || base_>?z

sub0=base_ < ?o
sub1=base>1 && base_ < ?l
sub2=base>2 && base_ < ?z
sub5=base>5 && base_ < ?s
sub6=base>6 && base_ < ?g
sub8=base>8 && base_ < ?b

reg="^["
reg<<'O' if sub0
reg<<'I' if sub1
reg<<'Z' if sub2
reg<<'S' if sub5
reg<<'G' if sub6
reg<<'B' if sub8
reg<<"|a-#{base_.chr}" if base>10
reg<<']+$'

result=File.read(ARGV[0]).split("\n").reject{|w| w !~ %r"#{reg}"i}.map{|w| w.upcase}.sort_by{|w| [w.length,w]}
result.map!{|w| w.gsub('O','0')} if sub0
result.map!{|w| w.gsub('I','1')} if sub1
result.map!{|w| w.gsub('Z','2')} if sub2
result.map!{|w| w.gsub('S','5')} if sub5
result.map!{|w| w.gsub('G','6')} if sub6
result.map!{|w| w.gsub('B','8')} if sub8
result.reject!{|w| w !~ /[A-Z]/} # NUM8ER5-0NLY LIKE 61885 ARE N0T READA8LE
p result

bbazzarrakk · August 5, 2007, 8:12pm

On Fri, 03 Aug 2007 13:46:16 +0000, Paul N. wrote:

There is no end of numerological[0,1] variations that could be used by
anyone who feels the need for an additional challenge this week.

Well then, along those lines I have a Hebrew gematria counter. Give it
words on the command line, and it will tell you the gematria of each of
those words, and the total gematria.

I use this to check when converting Hebrew citations of Jewish books into
English for the benefit of those reading English newsgroups.

#!/usr/bin/env ruby
$KCODE = "u"
require "jcode"
require 'generator'
class String
  def u_reverse; split(//).reverse.join; end
end

LETTERVALUES=Hash.new(0).merge(
  Hash['א' => 1, 'ב' => 2, 'ג' => 3, 'ד' => 4, 'ה' => 5,
       'ו' => 6, 'ז' => 7, 'ח' => 8, 'ט' => 9, 'י' => 10, 'כ' => 20,
       'ל' => 30, 'מ' => 40, 'נ' => 50, 'ס' => 60, 'ע' => 70, 'פ' => 80,
       'צ' => 90, 'ק' => 100, 'ר' => 200, 'ש' => 300, 'ת' => 400,
       'ם' => 40, 'ך' => 20, 'ן' => 50, 'ף' => 80, 'ץ' => 90])
gematrias=ARGV.collect do |word|
  word.split(//).inject(0) do |t,l|
    t+LETTERVALUES[l]
  end
end

SyncEnumerator.new(ARGV, gematrias).each do |word,value|
  #reverse the word to print it RTL if all of the characters in it
  #are hebrew letters

  #note that this doesn't find nikudot, but then we don't care
  #anyway because the terminal mangles nikudot -- the result will be
  #so mangled anyway that we don't care whether it's reversed
  word=word.u_reverse if word.split(//)-LETTERVALUES.keys==[]
  printf "%s %d\n", word, value
end

printf "Total %d\n", gematrias.inject {|t,l| t+l}
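
The script above targets Ruby 1.8 ($KCODE, jcode and generator are gone in later versions). As a rough sketch of the same letter-value sum on a current Ruby, assuming 2.4 or later for Enumerable#sum and UTF-8 source files: the method name gematria is made up for illustration, and the RTL reversal for terminal output is left out.

```ruby
# Letter values, with final forms sharing the value of the regular letter.
# Unknown characters count as 0 because of the Hash default.
LETTER_VALUES = Hash.new(0).update(
  "א" => 1, "ב" => 2, "ג" => 3, "ד" => 4, "ה" => 5,
  "ו" => 6, "ז" => 7, "ח" => 8, "ט" => 9, "י" => 10, "כ" => 20,
  "ל" => 30, "מ" => 40, "נ" => 50, "ס" => 60, "ע" => 70, "פ" => 80,
  "צ" => 90, "ק" => 100, "ר" => 200, "ש" => 300, "ת" => 400,
  "ם" => 40, "ך" => 20, "ן" => 50, "ף" => 80, "ץ" => 90
)

# Sum of the letter values of a word. String#each_char yields whole
# characters, so no $KCODE or jcode is needed. (hypothetical helper)
def gematria(word)
  word.each_char.sum { |ch| LETTER_VALUES[ch] }
end

words = ["חי", "שלום"]
words.each { |w| printf "%s %d\n", w, gematria(w) }
printf "Total %d\n", words.sum { |w| gematria(w) }
```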

bbazzarrakk · August 5, 2007, 9:04pm

Here is my solution. I tried to make things easy to follow…

First, I create a regular expression to match all words in a number base. This method basically generates a regex matching single words consisting of letters in the base. Matching is case insensitive:

def get_regexp(base_num)
  # Get number of letters in the base
  num_letters = base_num - 10
  num_letters = 26 if num_letters > 26 # Cap at all letters in alphabet
  return nil if num_letters < 1 # Nothing would match

  # Create a regular expression to match all letters in the base
  # Move back from 'z' until reach last char in the base
  end_c = ("z"[0] - (26 - num_letters)).chr
  regexp_str = "^([a-#{end_c}])+$" # Always starts at 'a'
  Regexp.new(regexp_str, "i")
end

Next we have a “main” method to read file, base, and length parameters from the command line, and find all words. The code uses a boilerplate read_words_from_file method to read the words:

if ARGV.size != 3
  puts "Usage: words_as_numbers.rb word_file number_base minimum_word_length"
else
  word_file = ARGV[0]
  base = ARGV[1].to_i
  word_length = ARGV[2].to_i
  regexp = get_regexp(base)

  # Find all words
  if (regexp != nil)
    for word in read_words_from_file(word_file)
      if word.size >= word_length
        puts word if regexp.match(word)
      end
    end
  end
end

And here is a test run:

words_as_numbers.rb linux.words.txt 16 6
accede
acceded
beaded
bedded
beefed
decade
deeded
deface
facade
facaded

It's interesting that each subsequent base (11, 12, etc.) contains all words in the previous iteration. It would be interesting to analyze the frequency of words found at each iteration, or create a visualization of the process. Anyway, here is a pastie of everything: Parked at Loopia
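
The nesting observation can be checked mechanically: each base adds one letter to the character class, so the matches for base n should always be a subset of those for base n+1. Below is a rough sketch of that check on a made-up sample list; the helper name words_for_base is an assumption, it matches lowercase only, and it is not taken from the pastie.

```ruby
# Words made only of the letter digits of `base` (bases 11..36 assumed).
def words_for_base(words, base)
  last = (base - 1).to_s(base)          # largest letter digit, "f" for base 16
  words.grep(/\A[a-#{last}]+\z/)
end

sample = %w[abed cab face gag hedge jib zoo]   # made-up sample list
previous = []
(11..36).each do |base|
  current = words_for_base(sample, base)
  # Every word found for the previous base must still be present.
  raise "base #{base} lost a word" unless (previous - current).empty?
  # Report only the bases where the count grows.
  puts "base #{base}: #{current.size} words" if current.size != previous.size
  previous = current
end
```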

Thanks,

Justin

bbazzarrakk · August 6, 2007, 12:59am

On Aug 5, 2007, at 10:46 AM, Raf C. wrote:

I have come up with this one-liner:

----------8<----------
puts File.readlines('/usr/share/dict/words').grep(/\A[a-#{((b=ARGV[0].to_i)-1).to_s(b)}]+\Z/)
---------->8----------

I used a one-liner too:

ruby -sne 'print if $_.downcase =~ /\A[\d\s#{("a".."z").to_a.join[0...($size.to_i - 10)]}]+\Z/' -- -size=12 /usr/share/dict/words

James Edward G. II

bbazzarrakk · August 5, 2007, 7:33pm

Just a simple regex, the rest is just option parsing.

#!/usr/bin/env ruby -wKU

require "optparse"

options = {
  :base => 16,
  :min_length => 1,
  :word_file => "/usr/share/dict/words",
  :ignore_case => false
}

ARGV.options do |opts|
  opts.banner = "Usage: #{File.basename($PROGRAM_NAME)} [OPTIONS]"

  opts.separator ""
  opts.separator "Specific Options:"

  opts.on( "-b", "--base BASE", Integer,
           "Specify base (default #{options[:base]})" ) do |base|
    options[:base] = base
  end

  opts.on( "-l", "--min-word-length LENGTH", Integer,
           "Specify minimum length" ) do |length|
    options[:min_length] = length
  end

  opts.on( "-w", "--word-file FILE",
           "Specify word file",
           "(default #{options[:word_file]})" ) do |word_file|
    options[:word_file] = word_file
  end

  opts.on( "-i", "--ignore-case",
           "Ignore case distinctions in word file." ) do |i|
    options[:ignore_case] = true
  end

  opts.separator "Common Options:"

  opts.on( "-h", "--help",
           "Show this message." ) do
    puts opts
    exit
  end

  begin
    opts.parse!
  rescue
    puts opts
    exit
  end
end

last_letter = (options[:base] - 1).to_s(options[:base])
letters = ("a"..last_letter).to_a.join
exit if letters.size.zero?

criteria = Regexp.new("^[#{letters}]{#{options[:min_length]},}$",
                      options[:ignore_case])

open(options[:word_file]).each do |word|
  puts word if word =~ criteria
end

bbazzarrakk · August 6, 2007, 8:49pm

Crude but effective.

Written in about 20 minutes.

###################################################

@words = File.new('/usr/share/dict/words').read.downcase.scan(/[a-z]+/).uniq
@chars = '0123456789abcdefghijklmnopqrstuvwxyz'

def print_matches(base,minsize=0)

  print "Base: " + base.to_s + "\n"

  alphabet = @chars[0,base]

  print "Alphabet: " + alphabet + "\n\nMatching Words:\n\n"

  @words.each do |w|

    if w.length >= minsize
      hexword = true
      w.each_byte { |c|
        if !alphabet.include?(c.chr)
          hexword = false
          break
        end
      }
      p w if hexword
    end

  end

end

print_matches 18,4

#################################################

Output:

Base: 18
Alphabet: 0123456789abcdefgh

Matching Words:

"ababdeh"
"abac"
"abaca"
"abaff"
"abba"
"abed"
"acca"
"accede"
"achage"
"ache"
"adad"
"adage"
"adda"
"added"
"adead"
"aface"
"affa"
"agade"
"agag"
"aged"
"agee"
"agha"
…

Douglas F Shearer

bbazzarrakk · August 6, 2007, 8:24am

My solution.

robert