Randomly generated words from a set of letters

Hi,

I have an interesting problem to solve and thought I’d ask for some
input here. We’re developing an application and one part of it is
building a random combination for this lock…

http://wordlock.com/_pdf/Wordlock-Bikelock-Instructions.pdf

So there are four columns, each contains a specific set of letters.

That PDF doc contains a pretty extensive list of pre-defined English
words, but it is not a list of ALL possible words. So I can think of a
couple options…

  1. Manually add the pre-defined word set to a DB table and randomly pull
    a word out as needed.

  2. Each time I need a word, build a random word from the set of letters
    and compare to a dictionary somewhere to make sure I have a real English
    word.

  3. Build a complete random word set one time (as in 1 but with all
    possible word combinations) and store in the DB to pull from randomly.

So…what would you do? 1 is probably easiest…but 2 is more
interesting for me from a problem solving standpoint.

Just looking for some thoughts.

Thanks,
Matt

unknown wrote:

  1. Manually add the pre-defined word set to a DB table and randomly pull
    a word out as needed.

  2. Each time I need a word, build a random word from the set of letters
    and compare to a dictionary somewhere to make sure I have a real English
    word.

  3. Build a complete random word set one time (as in 1 but with all
    possible word combinations) and store in the DB to pull from randomly.

So…what would you do? 1 is probably easiest…but 2 is more
interesting for me from a problem solving standpoint.

Just looking for some thoughts.

Thanks,
Matt

Not sure what the difference is between 1 and 3 but option number 2 can
be grossly inefficient for little or no benefit.

Option 3 is O(1) complexity and therefore I vote for it…

hth
ilan

On Tue, Oct 20, 2009 at 10:50 AM, [email protected] wrote:

Hi,

I have an interesting problem to solve and thought I’d ask for some input here. We’re developing an application and one part of it is building a random combination for this lock…

http://wordlock.com/_pdf/Wordlock-Bikelock-Instructions.pdf

So there are four columns, each contains a specific set of letters.

I won’t speak to what might make a good design for you :), but:

irb(main):001:0>
File.open(‘/usr/share/dict/words’).read.scan(/^[bdfhlmprst][aeiloruy][iklnrsaenrot][ledkstpdny]$/i)
=> [“Baal”, “Ball”, “Bart”, “Bass”, “Bean”, “Bell”, “Bern”, “Bert”,
“Bess”, “Best”, “Bill”, “Bird”, “Boas”, “Bond”, “Bonn”, “Bork”,
“Born”, “Bose”, “Brad”, “Bran”, “Bray”, “Bret”, “Brie”, “Brit”,
“Burl”, “Burt”, “Byrd”, “Dale”, “Dane”, “Dare”, “Dean”, “Deon”,
“Dial”, “Dion”, “Dirk”, “Dole”, “Donn”, “Duke”, “Dunn”, “Duse”,
“Fern”, “Fiat”, “Finn”, “Fisk”, “Ford”, “Fran”, “Fred”, “Frey”,
“Haas”, “Hale”, “Hall”, “Hals”, “Hank”, “Hans”, “Hart”, “Head”,
“Heep”, “Hell”, “Hess”, “Hill”, “Hiss”, “Holt”, “Hood”, “Horn”,
“Huey”, “Hull”, “Huns”, “Hunt”, “Land”, “Lane”, “Laos”, “Lars”,
“Lean”, “Lent”, “Leon”, “Leos”, “Lily”, “Lind”, “Lois”, “Lord”,
“Lott”, “Luis”, “Luke”, “Lyle”, “Lyly”, “Lynn”, “Lyon”, “Male”,
“Mann”, “Mark”, “Mars”, “Mary”, “Mass”, “Matt”, “Mead”, “Mike”,
“Mill”, “Minn”, “Miss”, “Moll”, “Monk”, “Mons”, “Mont”, “Moon”,
“More”, “Mort”, “Moss”, “Mott”, “Muse”, “Myst”, “Park”, “Pate”,
“Peel”, “Pele”, “Penn”, “Perl”, “Pete”, “Pike”, “Pitt”, “Pole”,
“Polk”, “Post”, “Pres”, “Pyle”, “Rand”, “Reed”, “Reid”, “Rene”,
“Riel”, “Rios”, “Root”, “Rory”, “Rose”, “Ross”, “Russ”, “Ryan”,
“Saks”, “Salk”, “Sand”, “Sean”, “Sian”, “Sony”, “Tass”, “Tate”,
“Tell”, “Tess”, “Tony”, “Tory”, “Tran”, “Trey”, “Troy”, “Tues”,
“Tull”, “Turk”, “Tyre”, “baas”, “bail”, “bait”, “bake”, “bald”,
“bale”, “balk”, “ball”, “band”, “bane”, “bank”, “bans”, “bard”,
“bare”, . . .

On Tue, Oct 20, 2009 at 10:26 AM, Ilan B. [email protected]
wrote:

  1. Build a complete random word set one time (as in 1 but with all

What if you created some sort of a tree data structure, where each node
represented a letter. (each node has at most 26 children) That seems
like it
should be more efficient, because for something like toon , tool , took
,
tooth , they all utilize the same series of paths to go through the too,
and
only diverge after that. So instead of 17 characters required for these
4
words, you only need 8.

what would you do?

If the result always has to be one of the pre-defined words, then the
most efficient approach is to have all the words available and then just
pick one from that set.

If you don’t mind reading this whole list into RAM, then it’s easy:

word = words[rand words.size]

If it’s huge and you want to keep it on disk, then I’d pre-format the
dictionary as a CDB, where the key is a number in the range 0 to
words.size-1.

http://cr.yp.to/cdb.html
http://raa.ruby-lang.org/project/cdb/

This allows any word to be found with at most two disk seeks.

If you have a SQL database already available in your application, just
use that. Again, I’d give each word a sequence number between 0 and N-1
to make it easy to pick one randomly.