As was brought up in the discussion, the effectiveness of the A*
algorithm is
very dependent on the estimate function used to gage distance remaining
to the
goal. The quiz recommended the Manhattan Distance function for this
role, but
that’s not very well suited to the kind of work we are doing here.
Daniel
Martin suggested an alternate implementation producing better results.
Let’s have a look at Daniel’s code below.
Daniel first posted a set of unit tests to verify code correctness. I
won’t go
into all of those tests here, but let me show a couple. As always, in
unit
testing it’s a good idea to begin by verifying functionality in some
trivial
case:
# ...
def test_simple_horizontal
map = %q(@..X)
solution = @solver.do_quiz_solution(map)
assert_paths_equal(%q(####), solution)
end
# ...
This test ensures that the solver can walk a straight path, with no
obstacles
between the starting point and the goal.
Daniel’s tests increase in difficulty, of course, and the final test is
the one
Daniel used to show how the Manhattan Distance gets into trouble:
# ...
def test_bad_distance
map = %q(@.*..
..~..
..^.X).sub(/ +/,'')
solution = @solver.do_quiz_solution(map)
assert_paths_equal(
%q(#?#..
.?~#.
..^.#), solution, "Got tripped up by manhattan distance")
end
# ...
Here we see a neat feature of Daniel’s custom assert_paths_equal()
assertion
(not shown). The question marks allow any output in those cells. The
reasoning
is that there are sometimes multiple correct paths. Here the first move
must be
to one of the question marks, but either will produce the same length
path.
The real test here is to ensure that the path goes over the forrest.
The
Manhattan Distance function can cause algorithms to favor the mountain
route.
Let’s move on to the actual code now.
One of the challenges the quiz didn’t spend a lot of time on is that A*
really
needs a priority queue to manage the paths it is examining, always
keeping the
best path so far first in line. Daniel took the easy way out of this
problem
and just resorted the paths after each insert:
# I suppose someone would think I should use a heap here.
# I've found that the built-in sort method is much faster
# than any heap implementation in ruby. As a plus, the logic
# is easier to follow.
class PriorityQueue
def initialize
@list = []
end
def add(priority, item)
# Add @list.length so that sort is always using Fixnum comparisons,
# which should be fast, rather than whatever is comparison on
`item’
@list << [priority, @list.length, item]
@list.sort!
self
end
def <<(pritem)
add(*pritem)
end
def next
@list.shift[2]
end
def empty?
@list.empty?
end
end
The add() method is all of the magic here. When an item is add()ed,
Daniel
actually adds a three element Array to the queue and resorts the entire
queue.
The first element of the Array ensures that items are sorted by
priority. When
priorities match, the second item breaks ties by add order. The item
never
factors into the sort. When the item is retrieved with next(), the two
extra
sort fields are discarded.
I have to side-step a moment here and address Daniel’s first comment.
To be
clear, he is saying that using Ruby’s sort() can be faster than a pure
Ruby
heap, since sort() is implemented in C. If both elements are correctly
implemented in C, the heap should definitely out perform resorting.
Finally,
the Ruby heap can also be faster, as soon as significant input is
involved:
#!/usr/bin/env ruby -w
require "sorted_array" # Daniel's PriorityQueue
require "heap" # pure Ruby Heap, from Ruby Q. #40
require "benchmark"
DATA = Array.new(5_000) { rand(5_000) }.freeze
Benchmark.bmbm(10) do |results|
results.report("sorted_array:") do
queue = PriorityQueue.new
DATA.each { |n| queue.add(n, n) }
queue.next until queue.empty?
end
results.report("ruby_heap:") do
queue = Heap.new
DATA.each { |n| queue.insert(n) }
queue.extract until queue.empty?
end
end
# >> Rehearsal -------------------------------------------------
# >> sorted_array: 33.950000 0.020000 33.970000 ( 33.972859)
# >> ruby_heap: 0.450000 0.000000 0.450000 ( 0.449776)
# >> --------------------------------------- total: 34.420000sec
# >>
# >> user system total real
# >> sorted_array: 33.990000 0.010000 34.000000 ( 34.016562)
# >> ruby_heap: 0.440000 0.000000 0.440000 ( 0.437217)
OK, let’s get back to Daniel’s code:
require 'enumerator'
class Astar
def do_quiz_solution(puzzle)
@terrain = []
instr = ""
puzzle.each {|rowstr|
next if rowstr =~ /^\s*$/
rowstr.gsub!(/[^.@~X*^]/,'')
instr += rowstr
instr += "\n"
row = []
rowstr.scan(/[.@~X*^]/) {|terrain|
case terrain
when /[.@X]/; row << 1
when /[*]/; row << 2
when /\^/; row << 3
when /~/; row << nil
end
}
xind = rowstr.index('X')
aind = rowstr.index('@')
if (aind)
@start = [@terrain.length, aind]
end
if (xind)
@goal = [@terrain.length, xind]
end
@terrain << row
}
if do_find_path
outarr = instr.split(/\n/)
@path.each{|row,col| outarr[row][col]='#'}
return outarr.join("\n")
else
return nil
end
end
# ...
This method manages the quiz process. First, an Array is created to
hold the
costs of each cell and a String to hold the passed map. The each()
iterator
cleans and dissects the input, filling both structures as it goes. It
also
locates the @start and @goal coordinates while it works.
With the setup out of the way, a hand-off is made to do_find_path()
which is the
A* implementation. If a path is found, the indicated coordinates are
marked on
the original map String and returned. Otherwise, nil is returned.
Here’s the heart of the matter:
# ...
def do_find_path
been_there = {}
pqueue = PriorityQueue.new
pqueue << [1,[@start,[],1]]
while !pqueue.empty?
spot,path_so_far,cost_so_far = pqueue.next
next if been_there[spot]
newpath = [path_so_far, spot]
if (spot == @goal)
@path = []
newpath.flatten.each_slice(2) {|i,j| @path << [i,j]}
return @path
end
been_there[spot] = 1
spotsfrom(spot).each {|newspot|
next if been_there[newspot]
tcost = @terrain[newspot[0]][newspot[1]]
newcost = cost_so_far + tcost
pqueue << [newcost + estimate(newspot),
[newspot,newpath,newcost]]
}
end
return nil
end
# ...
This is the A* algorithm in Daniel’s code. It begins by establishing a
Hash to
track where it has been and a PriorityQueue to manage the paths being
considered. From there the code dives into the search loop which
terminates
when the the PriorityQueue is empty?(), indicating that all move options
have
been exhausted.
At each step through the loop, where we are, the path so far, and the
cumlative
cost are pulled out of the PriorityQueue. We skip ahead if we’ve been
on this
spot before. (The first path would have been shorter, since the
PriorityQueue
keeps them sorted.) The new path is built, including our new location,
and we
check to see if the @goal has been reached. When it has, the path is
converted
from to a list of coordinates and returned.
If we’re not at the goal, we need to extend the path one step in every
direction. Here the helper method spotsfrom() generates the choices to
iterate
over. If we’ve been there before we can skip it, otherwise the new cost
is
calculated via the other helper method estimate() and added, with the
new path,
to the PriorityQueue.
Let’s have a look at those helper methods:
# ...
def estimate(spot)
# quiz statement version
# (spot[0] - @goal[0]).abs + (spot[1] - @goal[1]).abs
# my version
[(spot[0] - @goal[0]).abs, (spot[1] - @goal[1]).abs].max
end
def spotsfrom(spot)
retval = []
vertadds = [0,1]
horizadds = [0,1]
if (spot[0] > 0) then vertadds << -1; end
if (spot[1] > 0) then horizadds << -1; end
vertadds.each{|v| horizadds.each{|h|
if (v != 0 or h != 0) then
ns = [spot[0]+v,spot[1]+h]
if (@terrain[ns[0]] and @terrain[ns[0]][ns[1]]) then
retval << ns
end
end
}}
retval
end
end
The estimate() method is Daniel’s improvement to distance calculation.
Since
diagonal moves are allowed to shorten two distances at once, we really
just need
to consider the longer distance, vertically or horizontally, to the
goal.
The other helper builds a list of neighbors by adding combinations of
-1, 0, and
1 as offsets to the current location coordinates. The @terrain Array is
used to
reality check these coordinates as in-bounds and walkable, before they
are added
to the returned Array.
The final solution is just an instance creation and interface method
call away:
if __FILE__ == $0
puts Astar.new.do_quiz_solution(ARGF)
end
My thanks to all the pathfinders who fiddled around with the quiz and to
Daniel
Martin especially for helping to clarify intent.
Ruby Q. for this week: Seek James out at RubyConf and introduce
yourself!