kD-Trees implementation

Hello.

I started playing with a Ruby kD-Tree implementation, any thoughts?

It is to be extended to allow for searching / parsing nodes.

class KDTree
  attr_reader :root

  def initialize(points)
    @root = KDNode.new.parse(points)
  end

  def add_node(point)
    @root.add(point)
  end
end

A pretty standard tree node:

class KDNode
  attr_reader :left, :right
  attr_reader :location

  def initialize(location=nil, left=nil, right=nil)
    @location = location
    @left = left
    @right = right
  end

Parse a list of points and build the nodes:

  def parse(points, depth = 0)
    @depth = depth # save depth for this node for removing
    @dim = points[0].length # dimension based on point dimensions
    axis = depth % @dim

    points = points.sort_by{|point| point[axis]} # sorting by axis
    half = points.length / 2 # the algorithm suggests using the median,
    # but in reality we don't need the median value,
    # we just need to cut it in half

    @location = points[half] # here we could store just an index
    @left = KDNode.new.parse(points[0..half-1], depth+1) unless half < 1
    @right = KDNode.new.parse(points[half+1..-1], depth+1) unless half+1 >= points.length
    self
  end

Add a leaf to a node; adding nodes will make the tree unbalanced:

  def add(point, depth = 0)
    @dim = point.length
    axis = depth % @dim

    # smaller values on this axis go left, matching the ordering built by parse
    if point[axis] < @location[axis]
      @left ? @left.add(point, depth + 1) : (@left = KDNode.new(point))
    else
      @right ? @right.add(point, depth + 1) : (@right = KDNode.new(point))
    end
  end

Remove a node from the tree by re-parsing its subtree:

  def remove
    self.parse(@left.to_a + @right.to_a, @depth)
    # no idea if it should be @depth or @depth + 1
  end

List all leaves, including this node:

  def to_a
    # nil.to_a is [], so missing children are handled for free
    @left.to_a + [@location] + @right.to_a
  end
end
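
For reference, a quick usage sketch of the above (the sample points and the expected values in the comments are mine, just to show the shape of the tree):

points = [[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]]
tree = KDTree.new(points)

p tree.root.location      # => [7, 2], the middle of the points sorted by x
p tree.root.left.location # => [5, 4], the left subtree is then split on y

tree.add_node([3, 5])     # unbalances the tree slightly
p tree.root.to_a.length   # => 7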

I'm thinking about storing only indexes in the leaves, and keeping the
point list in the kD-Tree itself.
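
Roughly, that could look like this - just a sketch, the index-based parse signature is my own and not part of the code above:

class KDTree
  attr_reader :points

  def initialize(points)
    @points = points   # the tree owns the point list
    @root = KDNode.new.parse((0...points.length).to_a, points)
  end
end

class KDNode
  # nodes keep an index into tree.points instead of the point itself
  def parse(indexes, points, depth = 0)
    axis = depth % points[0].length
    indexes = indexes.sort_by{|i| points[i][axis]}
    half = indexes.length / 2

    @location = indexes[half]
    @left = KDNode.new.parse(indexes[0..half-1], points, depth+1) unless half < 1
    @right = KDNode.new.parse(indexes[half+1..-1], points, depth+1) unless half+1 >= indexes.length
    self
  end
end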

any suggestions?

Marcin R. wrote:

class KDNode

def parase(points, depth = 0)

points = points.sort_by{|point| point[@axis]}

Is this the standard way to build a kdtree?

Seems awfully wasteful to re-sort the values for
each level. Can't you build @dim copies of the
original array, each one sorted on that dim, then
build the kdtree over the partitions of those?

Clifford H…

Clifford H. wrote:

each level. Can't you build @dim copies of the
original array, each one sorted on that dim, then
build the kdtree over the partitions of those?

Clifford H…

The problem is more complex than you think.
Solutions that keep @dim copies are O(n log n), but
if I have @dim copies sorted by axis, then after the first split I have
to somehow mark the other copies - so the next 2 splits will operate
only on elements that belong to the proper branch.

Maybe I understood it wrong and there's a better solution, but for now
I'm focusing on writing a proper algorithm, then I might think about
how to optimize it.
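
To make that marking step concrete, here is a rough standalone sketch (nested hashes instead of KDNode; `build` and `side` are my own names): sort the indexes once per axis up front, then at every split partition each presorted list by membership instead of re-sorting.

def build(points, by_axis = nil, depth = 0)
  dim = points[0].length
  # one list of point indexes per axis, each sorted once, up front
  by_axis ||= (0...dim).map{|ax| (0...points.length).sort_by{|i| points[i][ax]} }
  axis = depth % dim

  sorted = by_axis[axis]
  half = sorted.length / 2
  median = sorted[half]

  # mark which side of the split every point index landed on
  side = {}
  sorted.each_with_index{|i, pos| side[i] = (pos < half ? :left : :right) }
  side[median] = :median

  # partition every presorted list by membership, keeping its order - O(n) per axis
  left_lists  = by_axis.map{|list| list.select{|i| side[i] == :left } }
  right_lists = by_axis.map{|list| list.select{|i| side[i] == :right } }

  node = { :location => points[median] }
  node[:left]  = build(points, left_lists, depth + 1) unless left_lists[axis].empty?
  node[:right] = build(points, right_lists, depth + 1) unless right_lists[axis].empty?
  node
end

That keeps each level at O(dim * n) after the initial sorts, at the cost of carrying dim index lists down the recursion.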

Marcin R. wrote:

Clifford H. wrote:

Is this the standard way to build a kdtree?
The problem is more complex than you think.

Yes, ok, I see that now, @dim copies don't help much.
I’m still interested to know what better method there
might be, if any.

for now
I'm focusing on writing a proper algorithm, then I might think about
how to optimize it.

Fair enough!

Clifford H. wrote:

how to optimize it.

Fair enough!

There are research papers that describe O(n log n) instead of O(n log² n)
algorithms, but they are usually full of complex math and 20-50 pages
long. Also, the one that I could understand is based on pointers and
interlinked lists, and it's kind of hard to implement in Ruby.

Marcin R. wrote:

There are research papers that describe O(n log n) instead of O(n log² n)
algorithms, but they are usually full of complex math and 20-50 pages
long. Also, the one that I could understand is based on pointers and
interlinked lists, and it's kind of hard to implement in Ruby.

The Wikipedia page for kd-trees references an O(n) median-finding
algorithm in a book by Cormen and others. I have the book, and the
algorithm doesn't look too scary. I might have a go at implementing it.
But in any case, the exact median is only needed if you demand an
exactly-balanced tree. Subsequent insertions will unbalance the tree
anyhow in most cases, so it's probably not necessary.

The algorithm for finding the i'th lowest entry given in the book is as
follows (copyright: fair use provisions allow posting):

The SELECT algorithm determines the ith smallest of an input array of
n > 1 elements by executing the following steps. (If n = 1, then SELECT
merely returns its only input value as the ith smallest.)

  1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5
     elements each and at most one group made up of the remaining
     n mod 5 elements.
  2. Find the median of each of the ⌈n/5⌉ groups by first insertion
     sorting the elements of each group (of which there are at most 5)
     and then picking the median from the sorted list of group elements.
  3. Use SELECT recursively to find the median x of the ⌈n/5⌉ medians
     found in step 2. (If there are an even number of medians, then by
     our convention, x is the lower median.)
  4. Partition the input array around the median-of-medians x using the
     modified version of PARTITION. Let k be one more than the number of
     elements on the low side of the partition, so that x is the kth
     smallest element and there are n-k elements on the high side of the
     partition.
  5. If i = k, then return x. Otherwise, use SELECT recursively to find
     the ith smallest element on the low side if i < k, or the (i - k)th
     smallest element on the high side if i > k.

That’s it, not too scary.
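
A rough Ruby transcription of those steps could look like this (just a sketch: `select_ith` is my own name, it allocates new arrays rather than partitioning in place like PARTITION, and the duplicate handling is mine):

# return the i-th smallest element of array a (i is 1-based), worst-case O(n)
def select_ith(a, i)
  return a.first if a.length == 1

  # 1-2. split into groups of 5 and take each group's (lower) median
  medians = a.each_slice(5).map{|group| group.sort[(group.length - 1) / 2] }

  # 3. median of medians, the lower median by convention
  x = select_ith(medians, (medians.length + 1) / 2)

  # 4. partition around x; k is one more than the number of elements below x
  low   = a.select{|e| e < x }
  equal = a.select{|e| e == x }
  high  = a.select{|e| e > x }
  k = low.length + 1

  # 5. recurse into the side that contains the i-th element
  if i < k
    select_ith(low, i)
  elsif i < k + equal.length
    x
  else
    select_ith(high, i - k - equal.length + 1)
  end
end

select_ith([7, 2, 9, 4, 1], 3) # => 4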

Clifford H…

Clifford H. wrote:

That's it, not too scary.

Clifford H…

I didn't say it was scary; I'm not scared of complex math, but since
finals are coming soon I don't have enough time.

I didn't have this book, and other sources were pretty confusing
(research papers tend to be this way) - after I understood the
algorithm, writing a Ruby implementation was a piece of cake.

Anyway, thanks for posting this piece of the article, I'll try to
implement it ASAP.

About finding the median - my algorithm doesn't find it either; the
problem is to find a fast way to select (with some degree of
approximation) the elements that were assigned to the current branch.

New code - lets you print trees and find the points inside a given
range, as a step towards k-nearest search (at least I hope so ;))

require 'pp'

class KDTree
  attr_reader :root
  attr_reader :points

  def initialize(points, dim)
    @dim = dim
    @root = KDNode.new(dim).parase(points)
  end

  def add_node(point)
    @root.add(point)
  end

  def find(*range)
    # accept either Ranges or [min, max] arrays, one per dimension
    range = range.collect{ |pa|
      pa.kind_of?(Array) ? Range.new(pa.first, pa.last) : pa
    }
    @points = []
    query(range, @root)
    @points
  end

  def query(range, node)
    axis = node.axis
    median = node.location[axis]

    # skip the left subtree if its maximum on this axis (the median) is below the range
    if node.left && (median >= range[axis].begin)
      query(range, node.left)
    end
    # skip the right subtree if its minimum on this axis (the median) is above the range
    if node.right && (median <= range[axis].end)
      query(range, node.right)
    end
    if (0..@dim-1).all?{ |ax| range[ax].include? node.location[ax] }
      @points << node.location
    end
  end

  def print
    @root.print
  end
end

class KDNode
  attr_reader :left, :right
  attr_reader :location
  attr_reader :axis

  def initialize(dim, location=nil, left=nil, right=nil)
    @dim = dim
    @location = location
    @left = left
    @right = right
  end

  def parase(points, depth = 0)
    @axis = depth % @dim

    points = points.sort_by{|point| point[@axis]}
    half = points.length / 2

    @location = points[half]
    @left = KDNode.new(@dim).parase(points[0..half-1], depth+1) unless half < 1
    @right = KDNode.new(@dim).parase(points[half+1..-1], depth+1) unless half+1 >= points.length
    self
  end

  def add(point)
    # smaller values on this node's axis go left, the rest go right,
    # matching the ordering produced by parase
    if point[@axis] < @location[@axis]
      @left ? @left.add(point) : (@left = KDNode.new(@dim).parase([point], @axis + 1))
    else
      @right ? @right.add(point) : (@right = KDNode.new(@dim).parase([point], @axis + 1))
    end
  end

  def remove
    self.parase(@left.to_a + @right.to_a, @axis)
  end

  def to_a
    @left.to_a + [@location] + @right.to_a
  end

  def print(l=0)
    @left.print(l+1) if @left
    puts(" "*l + @location.inspect)
    @right.print(l+1) if @right
  end
end

a = []

10.times do |x|
  10.times do |y|
    10.times do |z|
      a << [x, y, z]
    end
  end
end

tree = KDTree.new(a, 3)
tree.print

puts " -------------- "

# should collect the 5 * 4 * 3 = 60 grid points with x in 0..4, y in 1..4, z in 5..7
tree.find([0,4], [1,4], [5,7])

pp tree.points.sort_by{|po| po[0]}

About finding the median - my algorithm doesn't find it either

But I think it does - you sort N elements and use the N/2’th
element, which is the median. That guarantees a balanced tree.

Well, I guess it's close enough - it doesn't guarantee a balanced tree -
you have to take the N and N+1 cases into consideration.

The r-tree (and priority r-tree) looks like a useful structure
too - the equivalent of a b-tree for spatial data.

Clifford H…

Well, I started implementing it as a brain-teaser and a platform to test
my optimization techniques - so the plan is to implement a full set of
operations, and then start to optimize it as much as I can :wink:

One property of kd-trees is important to me - fast neighbour finding in
k dimensions.
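
As an illustration, a nearest-neighbour query could sit on top of the KDNode above roughly like this (a sketch only; `nearest` and the squared-distance bookkeeping are my own additions, not part of the code posted earlier):

class KDTree
  # return the stored point closest to target (Euclidean distance)
  def nearest(target)
    best = nil
    best_dist = Float::INFINITY

    search = lambda do |node|
      next if node.nil?

      # squared distance from this node's point to the target
      dist = (0...@dim).inject(0.0){|sum, ax| d = node.location[ax] - target[ax]; sum + d * d }
      best, best_dist = node.location, dist if dist < best_dist

      axis = node.axis
      diff = target[axis] - node.location[axis]
      near, far = diff < 0 ? [node.left, node.right] : [node.right, node.left]

      search.call(near)
      # only descend into the far side if the splitting plane is closer than the best match
      search.call(far) if diff * diff < best_dist
    end

    search.call(@root)
    best
  end
end

tree.nearest([3, 3, 3]) # => [3, 3, 3] for the 10x10x10 grid above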

Marcin R. wrote:

Anyway, thanks for posting this piece of the article,

Not a problem - thanks for sharing your kdtree implementation!

About finding the median - my algorithm doesn't find it either

But I think it does - you sort N elements and use the N/2’th
element, which is the median. That guarantees a balanced tree.

The r-tree (and priority r-tree) looks like a useful structure
too - the equivalent of a b-tree for spatial data.

Clifford H…