Trie data structure

I’m trying to implement a trie data structure for my parsing program
that parses numbers…I’m lost as to what the algorithm would look
like. Also, I don’t fully understand the wikipedia definition at
Trie - Wikipedia when it says “…no node in the tree
stores the key associated with that node; instead, its position in the
tree shows what key it is associated with…”

Any help is appreciated!

On Jun 10, 5:44 pm, Justin To [email protected] wrote:

I’m trying to implement a trie data structure for my parsing program
that parses numbers…I’m lost as to what the algorithm would look
like. Also, I don’t fully understand the wikipedia definition athttp://en.wikipedia.org/wiki/Triewhen it says “…no node in the tree
stores the key associated with that node; instead, its position in the
tree shows what key it is associated with…”

Any help is appreciated!

Posted viahttp://www.ruby-forum.com/.

The basic idea is that each node contains only a partial key, the
final piece of its full key. So if the node you’re adding should have
the full key “hi”:

  • It will contain the partial key “i”.
  • Its parent will have the partial key “h”.
  • Its grandparent will have an empty partial key - the root node.

As the lookup algorithm example on the wiki page describes, looking up
a value by key is a matter of starting at the root node and following
each path that leads you to the next item in the full key. This
differs from a binary tree in that the tree structure is not only a
means to locate the key, but a reflection of the key itself.

Have fun!

Justin To wrote:

Also, I don’t fully understand the wikipedia definition at
Trie - Wikipedia when it says “…no node in the tree
stores the key associated with that node; instead, its position in the
tree shows what key it is associated with…”

This web page has an example of data represented as a trie:

http://tinyurl.com/56t7kh

The example is a set of words with each node corresponding to a
character.

There are various ways of implementing these sorts of data structures,
but hashes provide a powerful basis.

Most of the articles I read say that instead of storing a value at a
node, it stores a reference. How would one accomplish this in Ruby?
Also, I’m trying to make a trie to store numbers from 100,000 to
999,999. Basically, what would the trie look like?

Thanks for the help!

I can’t seem to grasp the algorithm for a trie. Can someone please help
me… I need the trie to store 6-digit numbers. I’ve read all the above
articles, but I still have no clue of how I should begin.

Thanks for the help!

On Wed, Jun 11, 2008 at 8:50 AM, Justin To [email protected] wrote:

Most of the articles I read say that instead of storing a value at a
node, it stores a reference. How would one accomplish this in Ruby?
Also, I’m trying to make a trie to store numbers from 100,000 to
999,999. Basically, what would the trie look like?

Take a look at Ruby Quiz - DictionaryMatcher (#103)

martin

On Wed, Jun 11, 2008 at 1:04 PM, Justin To [email protected] wrote:

I can’t seem to grasp the algorithm for a trie. Can someone please help
me… I need the trie to store 6-digit numbers. I’ve read all the above
articles, but I still have no clue of how I should begin.

here’s a quick example of a trie storing the three digit numbers 123,
145, 207, 451, 455. we take advantage of the fact that all numbers are
known to be the same length, otherwise you also need to store a
“number_ends_here?” flag on each node

{ :root =>
{ 1 =>
{ 2 => { 3 => true } },
{ 4 => { 5 => true } }
},
{ 2 =>
{ 0 => { 7 => true } }
},
{ 4 =>
{ 5 =>
{ 1 => true, 5 => true } }
}
}

you traverse it thus (untested code, but it gives the basic idea):

node = trie[:root]
num_to_check = “123”
num_to_check.split(//).each {|digit|
node = node[digit]
return false unless node
}

return true

martin

class Trie
attr_reader :value, :children
def initialize(value=nil)
@value = value # The corresponding value
@children = [] # Store hashes inside of the
hash
end

def <<(value)
sub_trie = Trie.new(value)
@children << sub_trie
return sub_trie

End of def <<(value)

end

Return true if value exists or nil if value D.N.E.

def child_value?(value)
if @children.empty? # If @children is empty, then there
are no children
return ‘empty’
else
# If one of the children contains the value then return true
@children.each do |child|
if(child.value==value)
return true
else
return nil
end
end
end

End of def child_value?(value)

end

Return the child that added the value if the value D.N.E. else

return nil, the value already exists

DO NOT ALTER OR CALL OUTSIDE OF CLASS–belongs to def

add_number(value)
def add_digit(digit)
if(!child_value?(digit))
child = self<<(digit)
puts “#{digit} added”
return child
else
puts “#{digit}: already exists in the child.”
return nil
end
end

def add_number(number)
current_node = self
number.to_s.each_byte do |byte|
puts “add_number #{byte.chr.to_i}”
puts "Nodeclass: " + current_node.class.to_s
current_node = current_node.add_digit(byte.chr.to_i)
end

end

End of class Trie

end

t = Trie.new
t << 3

t.add_number(234)

Output:

HashTrie3.rb:50:in add_number': undefined methodadd_digit’ for
nil:NilClass (NoMethodError)
from HashTrie3.rb:47:in each_byte' from HashTrie3.rb:47:inadd_number’
from HashTrie3.rb:62
add_number 2
Nodeclass: Trie
2 added
add_number 3
Nodeclass: Trie
3: already exists in the child. # ?
add_number 4 # ?
Nodeclass: NilClass # ?

Somewhere I’m not saving the current node correctly so when it adds 3,
it’s adding it into the same array that the first 3 (t<<3) and the 2
[t.add_number(234)] is in. But then again, I could have messed up
somewhere else. Maybe I’m passing around copies of the current node and
not the ACTUAL one.

Any help is much appreciated!!!

Justin

Justin To wrote:
I can’t seem to grasp the algorithm for a trie. Can someone please help
me… I need the trie to store 6-digit numbers. I’ve read all the above
articles, but I still have no clue of how I should begin.

Thanks for the help!

Maybe also check out the SimpleTrie class at
http://snippets.dzone.com/posts/show/5634

Cheers,
j.k.

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)

if(node.children.empty?)
  return 'empty'                     # Return: 'empty', children[i], 

‘D.N.E.’
else
i=0; while(i<node.children.size)
if(node.children[i].value==value)
puts “#{value} exists in #{node}.”
return node.children[i]
else if(i==node.children.size-1)
return ‘D.N.E.’
end
end
i+=1
end
end

end

def add_digits(digitArray)

node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node)       # Scan the trie for 

the digit

    # Empty case
    if(scan=='empty')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
        node.number_exists = true
      end
      digitArray.delete(digitArray[0])

    else if(scan=='D.N.E.')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
       node.number_exists = true
      end
      digitArray.delete(digitArray[0])
    else if(scan.is_a?(Trie))
      if(digitArray.size==1)
        scan.number_exists = true # ??
      end
      node = scan
      digitArray.delete(digitArray[0])
    else
      puts "**Scan error**"
      digitArray.delete(digitArray[0])
    end
    end
    end

end

#end function
end

def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts “’#{number}’ not a valid Fixnum
end
end

End of class Trie

end

t = Trie.new

t.add_number(434)

#984343525

looking for a (4), (3 and 5)

9843–52-

puts “\n”
puts t.children[0].children[0].children.size

My program can add most numbers fine… but if it comes to something
like 434 or 4343 or I suppose any similar patterns, it doesn’t work.
I’ve tried it with only 1 number, that way all you have to do is look
through def add_digits(digitArray) at the section for the Empty Case.

Any help is much appreciated!

Thanks!

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)

if(node.children.empty?)
  return 'empty'                     # Return: 'empty', children[i], 

‘D.N.E.’
else
i=0; while(i<node.children.size)
if(node.children[i].value==value)
return node.children # Something wrong here ???
else if(i==node.children.size-1)
return ‘D.N.E.’
end
end
i+=1
end
end

end

def add_digits(digitArray)

node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node)       # Scan the trie for 

the digit

    # Empty case
    if(scan=='empty')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
        node.number_exists = true
      end
      digitArray.delete(digitArray[0])

    end

end

#end function
end

#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts “’#{number}’ not a valid Fixnum
end
end

End of class Trie

end

t = Trie.new

t.add_number(434)

#984343525

looking for a (4), (3 and 5)

9843–52-

puts “\n”

Looking for size = 1

puts t.children[0].children[0].children.size

Actually, you can just look at this portion of code

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)

if(node.children.empty?)
  return 'empty'                     # Return: 'empty', children[i], 

‘D.N.E.’
else
i=0; while(i<node.children.size)
if(node.children[i].value==value)
return node.children # Something wrong here ???
else if(i==node.children.size-1)
return ‘D.N.E.’
end
end
i+=1
end
end

end

def add_digits(digitArray)
node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

    # Empty case
    if(scan=='empty')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
        node.number_exists = true
      end
      digitArray.delete_at(0)

    # D.N.E. case
    else if(scan=='D.N.E.')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
       node.number_exists = true
      end
      digitArray.delete_at(0)

    # Value exists case
    else if(scan.is_a?(Trie))
      puts "ok"
      if(digitArray.size==1)
        scan.number_exists = true # ??
      end
      node = scan
      digitArray.delete_at(0)

    # If the number already exists ---- ???
    else
      digitArray.delete_at(0)
    end
    end
    end

end

#end function
end

#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts “’#{number}’ not a valid Fixnum
end
end

End of class Trie

end

t = Trie.new

t.add_number(2223)
t.add_number(2333)

puts “\n”

puts t.children[0].children[0].children.size

Hey Adam, thanks for the tips. I redid a few things, thought it was
working, but turns out it’s still buggy. If you can point anything out
to me, that’d be great! Thanks again!

On 6/12/08, Justin To [email protected] wrote:

There are a couple of issues with this code, here are some hints.

Return true if value exists or nil if value D.N.E.

def child_value?(value)
if @children.empty? # If @children is empty, then there are no children
return ‘empty’

‘empty’ is not nil - it will act like a true, which is the opposite of
what the comments say this function should do.

else
# If one of the children contains the value then return true
@children.each do |child|
if(child.value==value)
return true
else
return nil
end
end

this code is only testing the value of the first child: it has a
return statement in both branches of the ‘if’, so it will exit after
the first test.

end

End of def child_value?(value)

end

Return the child that added the value if the value D.N.E. else

return nil, the value already exists

DO NOT ALTER OR CALL OUTSIDE OF CLASS–belongs to def add_number(value)

you can use the ‘private’ keyword here, which enforces ‘do not call
outside of class’.

def add_number(number)
current_node = self
number.to_s.each_byte do |byte|
puts “add_number #{byte.chr.to_i}”
puts "Nodeclass: " + current_node.class.to_s
current_node = current_node.add_digit(byte.chr.to_i)
end

as your comments say, If the digit already exists in the children,
add_digit will return nil. In that case, you will set current_node to
nil here. You probably want to set it to something else.

end

End of class Trie

end

hope this helps,
-Adam

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end

def each
yield value
@children.each do |child_node|
child_node.each { |e| yield e }
end
end

def output
each { |x| puts x }
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)
if(node.children.empty?)
return ‘empty’ # Return: ‘empty’, children[i],
‘D.N.E.’
else
i=0; while(i<node.children.size)
if(node.children[i].value==value)
return node.children[i] # Something wrong here ???
else if(i==node.children.size-1)
return ‘D.N.E.’
end
end
i+=1
end
end

end

def add_digits(digitArray)
node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

    # Empty case
    if(scan=='empty')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
        node.number_exists = true
      end
      digitArray.delete_at(0)

    # D.N.E. case
    else if(scan=='D.N.E.')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
       node.number_exists = true
      end
      digitArray.delete_at(0)

    # Value exists case
    else if(scan.is_a?(Trie))
      if(digitArray.size==1)
        scan.number_exists = true # ??
      end
      node = scan
      digitArray.delete_at(0)

    # If the number already exists ---- ???
    else
      #set node = child node
      puts "**Scan error**"
      digitArray.delete_at(0)
    end
    end
    end

end

#end function
end

#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts “’#{number}’ not a valid Fixnum
end
end

End of class Trie

end

t = Trie.new

File.open(‘output.txt’, ‘r’).each do |x|

t.add_number(x.chomp.to_i)

end

t.output

I think I’ve finally got it to work. Now, I’m going to try to implement
a method #output to output all of the numbers… now I’m in the crap
T_T. Firstly, the #output I have above displays all of the nodes, but
now I need to say:
(*) Traverse every node and
when you reach a node whose
number_exists = true,
make that list of letters
into a word and store it in
an array.

Ex.

  nil
  /
 1
/

2 => number_exists = true (12)
/
3 => number_exists = true (123)

Final array: storage[12, 123]

Thanks for any help!

Justin

def each
info = [value,number_exists, children]
yield(info)
@children.each do |child_node|
child_node.each { |e| yield e }
end
end

def output
numberCollector = []
tempString= ‘’

each do |x|
  tempString.concat(x[0].to_s)
    if(x[1])
      numberCollector.push(tempString)
      tempString=tempString.chop
    end


end

puts numberCollector

end

Only works for numbers that share (1) common node: e.g. 22, 23, 24, 25

What can I do to output numbers that share (x) common nodes?

Thanks!

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end

def each
info = [value,number_exists, children]
yield(info)
@children.each do |child_node|
child_node.each { |e| yield e }
end
end

def output
numberCollector = []
tempString= ‘’
each do |x|
tempString.concat(x[0].to_s)
if(x[1])
numberCollector.push(tempString)
if(x[2].size!=0)
tempString=’’
else
x[2].each
end
end
end

puts numberCollector

end
#--------------------------------------------------------------------------------------

def child_value?(value, node)
if(node.children.empty?)
return ‘empty’ # Return: ‘empty’, children[i],
‘D.N.E.’
else
i=0; while(i<node.children.size)
if(node.children[i].value==value)
return node.children[i] # Something wrong here ???
else if(i==node.children.size-1)
return ‘D.N.E.’
end
end
i+=1
end
end

end

def add_digits(digitArray)
node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

    # Empty case
    if(scan=='empty')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
        node.number_exists = true
      end
      digitArray.delete_at(0)

    # D.N.E. case
    else if(scan=='D.N.E.')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
       node.number_exists = true
      end
      digitArray.delete_at(0)

    # Value exists case
    else if(scan.is_a?(Trie))
      if(digitArray.size==1)
        scan.number_exists = true # ??
      end
      node = scan
      digitArray.delete_at(0)

    # If the number already exists ---- ???
    else
      #set node = child node
      puts "**Scan error**"
      digitArray.delete_at(0)
    end
    end
    end

end

#end function
end

#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts “’#{number}’ not a valid Fixnum
end
end

End of class Trie

end

t = Trie.new

File.open(‘output.txt’, ‘r’).each do |x|

t.add_number(x.chomp.to_i)

end

t.output

I can display numbers that don’t have similar prefixes… e.g. 123, 234,
345; but I can’t store these examples: 123, 145, 146

My algorithm should be:

  1. Check nodes until node[i].number_exists = true
    • True: store the number up until this point:
      node[i]
      • If there are more nodes after node[i], then
        keep going and start from (1)

Any help is appreciated, thanks!!!

require ‘ruby-prof’

class Trie
attr_reader :value, :children
attr_accessor :number_exists, :prefix
def initialize(value=nil)
@value = value
@children = []
@number_exists = false
@prefix=’’
end

def <<(value)
sub_trie = Trie.new(value)
sub_trie.prefix = prefix+value.to_s
children << sub_trie
return sub_trie
end

def each
info = [value, children, number_exists, prefix]

yield(info)
@children.each do |child_node|
    child_node.each { |e| yield e  }
  end

end

def output(opt=‘all’)

if(opt.is_a?(String))
if(opt.downcase=='all')
    each do |x|

    if(x[3]!='')
        if(x[2])
          puts x[3]
        end
    end
  end

else
  puts "Error: '#{opt}' not a valid selection.\n \
     Valid selections\n \
     - All: outputs every # in the Trie\n \
     - #: (e.g. 1, 123, 43...) checks if the # exists in the Trie"
end
else if(opt.is_a?(Fixnum))

    each do |x|
      if(x[3].to_i==opt)
          if(x[2])
            puts x[3].to_s + ' exists.'
            return true
          end
      end
    end

    puts "#{opt} not found."

end
end

 end

#--------------------------------------------------------------------------------------

def child_value?(value, node)
if(node.children.empty?)
return ‘empty’ # Return: ‘empty’, children[i],
‘D.N.E.’
else
i=0; while(i<node.children.size)
if(node.children[i].value==value)
return node.children[i] # Something wrong here ???
else if(i==node.children.size-1)
return ‘D.N.E.’
end
end
i+=1
end
end
end

def add_digits(digitArray)

node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node)       # Scan the trie for 

the digit

    # Empty case
    if(scan=='empty')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
        node.number_exists = true
      end
      digitArray.delete_at(0)

    # D.N.E. case
    else if(scan=='D.N.E.')
      node = node<<(digitArray[0])
      if(digitArray.size==1)
       node.number_exists = true
      end
      digitArray.delete_at(0)

    # Value exists case
    else if(scan.is_a?(Trie))
        if(digitArray.size==1)
          scan.number_exists = true # ?? FOR OUTPUT.. check this 

area
end

      node = scan
      digitArray.delete_at(0)
      if(digitArray.size==0)
        return true
      end

    else
      puts "**Scan error**"
      digitArray.delete_at(0)
    end
    end
    end

end

#end function
end

#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts “’#{number}’ not a valid Fixnum
end
end

End of class Trie

end

t = Trie.new(nil)

open(‘output.txt’, ‘w’) do |f|

i=0; while i<100000

#-----6 digit number-----
min   = 100000
max  = 999999
f << min + rand(max-min+1) << "\n"

i+=1
end

f.flush
end

RubyProf.start
File.open(‘output.txt’, ‘r’).each do |x|

t.add_number(x.chomp.to_i)

end

result = RubyProf.stop
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, 0)

*Trie data structure for random-generated 6-digit numbers
I think I’ve finally managed to put together a newbie Trie data
structure. I’m kind of lost, though, when it comes to testing it. It’s
suppose to be a very effective data structure, but when I test it out,
it doesn’t seem very good. Compared to a hash and an array, it’s much
slower at adding new numbers and only sometimes is it faster at
retrieving a specific number.

If anybody can go through my newbie code and point out areas that are
ineffective or inefficient, that’d be terrific. I’m positive that my
implementation is very poor, and I need some feedback from more
experienced people about how to fix it.

Thanks a lot!

Justin