Testing DiGraph (#73)

The first step to solving this quiz is to come up with some graphs that
you can
test. I started by thinking of some trivial patterns I could easily
hand figure
the answers for. A good example is the graph:

A -> B -> C -> D

Finding the longest simple path for that is not hard, it is always all
of the
edges. That’s just a straight line basically and it will include all
the nodes.
Building these kinds of graphs programatically for any size is also
easy:

>> Array.new(3) { |i| [(?A + i).chr, (?A + i + 1).chr] }
=> [["A", "B"], ["B", "C"], ["C", "D"]]

Himadri ChoudHury took a different approach, generating completely
random
graphs. This has the advantage of testing far more cases including
isolated
nodes and circular paths. Here’s the generation code:

# @dg is the generated DiGraph. Store the paths in a hash called @paths
@dg
# @paths is a hash. Each element of the hash is an array that contains
# all the paths from that node
@paths
# @nodes is an array which contains a list of all the nodes
@nodes

# Randomly generate @dg and the corresponding @paths
def generate_dg
    nodes = Array.new
    # 10 nodes total
    10.times do
        nodes << rand(10)
    end
    nodes.uniq!
    @paths = Hash.new
    nodes.each do |n|
        num_paths_from_each_node = rand(3) + 1
        next_nodes = Array.new
        num_paths_from_each_node.times do
            next_nodes << nodes[rand(nodes.length)]
        end
        next_nodes.uniq!
        @paths[n] = next_nodes
    end
    arr = Array.new
    @paths.each do |key,vals|
        @paths[key].each do |val|
            arr << [key,val]
        end
    end
    @dg = DiGraph.new(*arr)
    @nodes = @paths.keys
end

(Note: those bare instance variables nixed in with the comments
reference
variables on the Class object and not the instances themselves. They
read and
discard nil values having no effect.)

The code begins by generating up to ten (uniq!() can reduce the count)
random
nodes as simple Integers. After that, each node is randomly connected
to
between one and three other nodes. A Hash is created during this
process to
represent the connections. This turns out to be very helpful later on.
Finally, the paths are converted into the Array representation DiGraph
expects
and the graph is created. The last line also stores the nodes for easy
access.

Here’s a random graph (viewed from the paths Hash) created from this
code, so
you can see how they come out:

{ 5 => [8, 6],
  0 => [0, 6],
  6 => [1, 7, 6],
  1 => [8],
  7 => [7, 9, 4],
  8 => [6, 1],
  9 => [4, 8, 0],
  4 => [5] }

Now, in order to use these random graphs, we really need code that can
tell us
the right answer for the data. Essentially, this means that we require
our own
implementation of the test methods, to prove that we get the same
answers as the
DiGraph implementation. Here is one of those methods, used to test
max_length_of_simple_path_including_node():

# Depth first search for the longest simple path starting from 'node'
# Simple path means a path that doesn't contain any duplicate edges
# Note: I'm not using the definition of simply connected based on no
# duplicate nodes
def search(node)
    longest_path = 0
    if (@paths[node])
        @paths[node].each_index do |next_idx|
            next_node = @paths[node][next_idx]
            @paths[node].delete_at(next_idx)
            tmp = 1 + search(next_node)
            @paths[node].insert(next_idx,next_node)
            if (longest_path < tmp)
                longest_path = tmp
            end
        end
    end
    return longest_path
end

This method is a simple depth-first search, as the comment says,
counting the
steps taken to reach the farthest node without crossing a single edge
twice.
The implementation is just a recursive search from each node reachable
from the
start node. At each step, the last edge crossed is removed from the
paths Hash,
the recursive search is performed, and then the edge is restored. The
returned
result here is just a count, not the actual path.

The tests using that are pretty trivial:

def test_03_max_length_of_simple_path_including_node
    generate_dg
    @nodes.each do |node|
        longest_path = search(node)
        # ...
        assert_equal( longest_path,
                      @dg.max_length_of_simple_path_including_node(node) 

)
end
end

Unfortunately, this is where we get to execution problems with the quiz.
The
above code passes some tests, due to bugs in the DiGraph implementation,
but it
does not correctly implement the
max_length_of_simple_path_including_node()
method described in the quiz.

Here’s an example of where it gets wrong answers, using my trivial
one-way path
described at the beginning of this quiz:

>> @paths = {"A" => ["B"], "B" => ["C"], "C" => ["D"]}
=> {"A"=>["B"], "B"=>["C"], "C"=>["D"]}
>> search("A")
=> 3
>> search("B")
=> 2
>> search("C")
=> 1
>> search("D")
=> 0

Those are the longest simple paths starting from a given node, but not
the
longest given paths including the given node (which would all be 3).

These errors made it tricky to correctly evaluate the expected results
of the
methods and I apologize for this. I ran into the same issue with my own
tests,
as anyone following my posts to Ruby T. saw in painful detail.

Luckily, it’s easy to get to a real solution of the first method from
here.
First, we can modify Himadri’s search() method to return the path
instead of
just a count:

def search(node, longest_path = [node])
    if (@paths[node])
        @paths[node].each_index do |next_idx|
            next_node = @paths[node][next_idx]
            @paths[node].delete_at(next_idx)
            tmp = search(next_node, longest_path + [next_node])
            @paths[node].insert(next_idx,next_node)
            if (longest_path.size < tmp.size)
                longest_path = tmp
            end
        end
    end
    return longest_path
end

Then, using that method, we can exhaustively search all the long paths
for the
biggest one containing our node:

def max_length_of_simple_path_including_node(n)
  all_paths = @paths.keys.map { |node| search(node) }
  all_paths = all_paths.select { |path| path.include? n }
  all_paths = all_paths.sort_by { |path| -path.size }
  all_paths.empty? ? 0 : all_paths.shift.size - 1
end

Line by line that makes a list of the longest paths starting from each
node,
reduces that list to those including the desired node, sorts the
remaining paths
biggest to smallest, and finally returns an edge count of the largest
path or 0
if there aren’t any paths left.

The other method is left as an exercise for the interested reader.

Many, many thanks to the submitters who braved the waters and came up
with some
basic tests, right or wrong.

Tomorrow, we will programatically generate 1,000 monkeys in the hopes
that they
can recreate the works of Shakespeare…