 # kD-Trees implementation

Hello.

I started playing with ruby kD-Trees implementation, an thoughts?

# to be extended to allow for searching / parasing nodes

class KDTree

def initialize(points)
@root = KDNode.new.parse(points)
end

end
end

# pretty standard tree node

class KDNode

def initialize(location=nil, left=nil, right=nil)
@location = location
@left = left
@right = right
end

# parse list of points and build nodes

def parse(points, depth = 0)
@depth = depth # save depth for this node for removing
@dim = points.length # dimension based on point dimensions
axis = depth % @dim

`````` points = points.sort_by{|point| point[axis]} # sorting by axis
half = points.length / 2 # algoithm suggest using median
# but in reality we don't need median value
# we just need to cut it in half

@location = points[half]  # here we can store just index
@left = KDNode.parse(points[0..half-1], depth+1) unless half < 1
@right = KDNode.parse(points[half+1..-1], depth+1) unless half+1 >=
``````

points.length
end

@dim = point.length
axis = depth % @dim

`````` if @location[axis] < point[axis]
@left ? @left.add(point) : @left = KDNode.new(point)
else
@right ? @right.add(point) : @right = KDNode.new(point)
end
``````

end

# remove node from tree

def remove
self.parse(@left.to_a + @right.to_a, @depth)
# no idea if it should be @depth or @depth + 1
end

# list all leafs including this node

def to_a
@left.to_a + [@location] + @right.to_a
end
end

I’m thinking about storing only indexes in leaves, and storing point
list in kD-Tree.

any suggestions?

Marcin R. wrote:

class KDNode

def parase(points, depth = 0)

points = points.sort_by{|point| point[@axis]}

Is this the standard way to build a kdtree?

Seems awfully wasteful to re-sort the values for
each level. Can’t you build @dim copies of the
original array, each sorted one on that dim, then
build the kdtree over the partitions on that?

Clifford H…

Clifford H. wrote:

each level. Can’t you build @dim copies of the
original array, each sorted one on that dim, then
build the kdtree over the partitions on that?

Clifford H…

Problem is more complex than you think.
Solution that have @dim copies are o(nlogn) but
if i have @dim copies sorted by axis, after first split i have to
somehow mark other copies - so next 2 splits will operate only on
elements that belongs to proper branch.

Maybny i understood it wrong and there’s better solution, but for now
I’m focusing on writing proper algorithm , then i might think how to
optimize it.

Marcin R. wrote:

Clifford H. wrote:

Is this the standard way to build a kdtree?
Problem is more complex than you think.

Yes, ok, I see that now, @dim copies doesn’t help much.
I’m still interested to know what better method there
might be, if any.

for now
I’m focusing on writing proper algorithm , then i might think how to
optimize it.

Fair enough!

Clifford H. wrote:

how to optimize it.

Fair enough!

There are reaserch papers that descripbe O(nlogn) instead of (nlog^2n)
algorithms but They are usually full of complex math and 20-50 pages
long. Also one that i could understand Is based n pointers and
interlinked lists and it’s kinda hard to implement in Ruby

Marcin R. wrote:

There are reaserch papers that descripbe O(nlogn) instead of (nlog^2n)
algorithms but They are usually full of complex math and 20-50 pages
long. Also one that i could understand Is based n pointers and
interlinked lists and it’s kinda hard to implement in Ruby

algorithm
in a book compiled by Cormen and others. I have the book, and the
algorithm
doesn’t look too scary. I might have a go at implementing it. But in any
case, the exact median is only needed if you demand an exactly-balanced
tree.
Subsequent insertions will unbalance the tree anyhow in most cases, so
it’s
probably not necessary.

The algorithm for finding the i’th lowest entry given in the book is as
follows.
(copyright, fair use provisions allow posting):

The SELECT algorithm determines the ith smallest of an input array of n

1 elements by
executing the following steps. (If n = 1, then SELECT merely returns its
only input value as
the ith smallest.)

1. Divide the n elements of the input array into âŒŠn/5âŒ‹ groups of 5
elements each and at
most one group made up of the remaining n mod 5 elements.
2. Find the median of each of the âŒˆn/5âŒ‰ groups by first insertion
sorting the elements of
each group (of which there are at most 5) and then picking the median
from the sorted
list of group elements.
3. Use SELECT recursively to find the median x of the âŒˆn/5âŒ‰ medians
found in step 2.
(If there are an even number of medians, then by our convention, x is
the lower
median.)
4. Partition the input array around the median-of-medians x using the
modified version of
PARTITION. Let k be one more than the number of elements on the low side
of the
partition, so that x is the kth smallest element and there are n-k
elements on the high
side of the partition.
5. If i = k, then return x. Otherwise, use SELECT recursively to find
the ith smallest
element on the low side if i < k, or the (i - k)th smallest element on
the high side if i >
k.

That’s it, not too scary.

Clifford H…

Clifford H. wrote:

tree.
only input value as
found in step 2.
5. If i = k, then return x. Otherwise, use SELECT recursively to find
the ith smallest
element on the low side if i < k, or the (i - k)th smallest element on
the high side if i >
k.

That’s it, not too scary.

Clifford H…

I didn’t say it was scary, I’m not scared of complex math but since
finals are coming soon I don’t have enought time.

I didn’t have this book, and other sources were pretty confusing
(Reserch papers tend to be this way) - after i understood the algorithm
writing Ruby implementation was piece of cake

Anyway thanks for posting this piece of article, I’ll try to implement
it ASAP.

about finding median - my algorithm doesn’t find it either, but problem
is to find a fast way to select (with some degree of aproximation)
elements that were assigned to current branch

new code - lets you print trees, and find k-nearest points (at least i
hope so ;))

require ‘pp’

class KDTree

def initialize(points, dim)
@dim = dim
@root = KDNode.new(dim).parase(points)
end

end

def find(*range)
range.collect{ |pa|
pa = Range.new(pa.first, pa.last) if pa.kind_of? Array
}
@points = []
query(range, @root)
@points
end

def query(range, node)
axis = node.axis
median = node.location[axis]

`````` if node.left && (median >= range[axis].begin)
query(range, node.left);  #/* Skip left if max of left tree
``````

(median) is out of range /
end
if node.right && (median <= range[axis].end)
query(range, node.right); #/
Skip right if min of right tree
(median) is out of range */
end
if (0…@dim-1).all?{|ax|
range[ax].include? node.location[ax]
}
@points << node.location;
end
end

def print
@root.print
end
end

class KDNode

def initialize(dim, location=nil, left=nil, right=nil)
@dim = dim
@location = location
@left = left
@right = right
end

def parase(points, depth = 0)
@axis = depth % @dim

`````` points = points.sort_by{|point| point[@axis]}
half = points.length / 2

@location = points[half]
@left = KDNode.new(@dim).parase(points[0..half-1], depth+1) unless
``````

half < 1
@right = KDNode.new(@dim).parase(points[half+1…-1], depth+1)
unless half+1 >= points.length
self
end

if @location[@axis] < point[@axis]
@left ? @left.add(point) : @left = KDNode.new(point)
else
@right ? @right.add(point) : @right = KDNode.new(point)
end
end

def remove
self.parse(@left.to_a + @right.to_a, @axis)
end

def to_a
@left.to_a + [@location] + @right.to_a
end

def print(l=0)
@left.print(l+1) if @left
puts(" "*l + @location.inspect)
@right.print(l+1) if @right
end
end

a = []

10.times do |x|
10.times do |y|
10.times do |z|
a << [x, y, z]
end
end
end

tree = KDTree.new(a, 3)
tree.print

puts " -------------- "

tree.find([0,4], [1,4], [5,7])

pp tree.points.sort_by{|po| po}

about finding median - my algorithm doesn’t find it either

But I think it does - you sort N elements and use the N/2’th
element, which is the median. That guarantees a balanced tree.

Well, I guess it’s close enought - it doesn’t guarantee balanced tree -
you have to take N+1 cases into consideration.

The r-tree (and priority r-tree) looks like a useful structure
too - the equivalent of a b-tree for spatial data.

Clifford H…

Well I started implementing it as brain-tease, and platform to test my
optimization techniques - so plan is to implement full set of operations

• and then start to optimize it as much as i can One of properties of kd-trees is important to me - fast finding
neighbours in k-dimensions

Marcin R. wrote:

Anyway thanks for posting this piece of article,

Not a problem - thanks for sharing your kdtree implementation!

about finding median - my algorithm doesn’t find it either

But I think it does - you sort N elements and use the N/2’th
element, which is the median. That guarantees a balanced tree.

The r-tree (and priority r-tree) looks like a useful structure
too - the equivalent of a b-tree for spatial data.

Clifford H…

This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.