Forum: Ruby Speed Golf - Remove Early Dups

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
gavin (Guest)
on 2005-12-06 03:15
(Received via mailing list)
SUMMARY
What's the fastest and/or shortest you can turn this:
   in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
"d", "e", "w", "d"]

into this:
   ["i", "l", "u", "v", "f", "e", "w", "d"]
?


DETAILS
The goal is, given a stream of items, to produce the same ordered
stream with no duplicates, where the relative position of the last item
wins. The shortest (and fastest?) for me turns out to be simply:
   in.reverse.uniq.reverse

I have the question mark because the in-place version is sometimes
faster, sometimes slower in my tests:
   out = in.dup; out.reverse!; out.uniq!; out.reverse!

I thought it might be faster to loop through the array in one pass
myself, but that turns out to be about 4x slower:
   seen = {}
   out = in.dup
   out.each_with_index{ |v,i|
     if n=h[v]
       out[n] = nil
     end
     h[v] = i
   }
   out.compact!

The input for me isn't REALLY an array, but rather a series of items
that I receive one at a time from a depth-first traversal of a graph.
If this influences your answer, so be it :)


BACKGROUND
Why is this problem mildly interesting? See
http://phrogz.net/nodes/traversingdirectedgraph.asp

I was reading that page for nostalgia the other day and thought about
porting the calculation library to Ruby for fun, and to see how much
easier solving the problem would be in Ruby.

Here's the code I put together for fun (the cells and formulae in the
below correspond to the graphic on the above url), before adding the
'minimal update chain' optimization.

require 'set'
module ClassContainer
  def initialize( *args )
    self.class[ self.name ] = self
    super
  end

  def self.included( klass )
    klass.class_eval{
      def self.inherited( subklass )
        ( @subclasses ||= [] ) << subklass
      end

      def self.all
        @subclasses ||= []
        @instances ||= {}
        all = @instances.dup
        @subclasses.each{ |sub| all.merge!( sub.all ) }
        all
      end

      def self.each
        all.each_value{ |i| yield i }
      end

      def self.[]( name )
        all[ name ]
      end

      def self.[]=( name, instance )
        ( @instances ||= {} )[ name ] = instance
      end

      def self.sorted
        all.values.sort_by{ |n| n.name.to_s }
      end
    }
  end
end

module GraphNode
  attr_reader :outgoing_edges, :incoming_edges

  def initialize( *args )
    @outgoing_edges = Set.new
    @incoming_edges = Set.new
  end

  def edges
    @outgoing_edges + @incoming_edges
  end

  def add_edge_to( other_node )
    @outgoing_edges  << other_node
    other_node.incoming_edges << self
    self
  end

  def clear_incoming
    @incoming_edges.each{ |n|
      n.outgoing_edges.delete self
    }
    @incoming_edges = Set.new
    self
  end

  def traverse( seen_nodes = Set.new, &block )
    new_seen = seen_nodes.dup << self
    @outgoing_edges.each{ |destination|
      unless seen_nodes.include?( destination )
        yield destination
        destination.traverse( new_seen, &block )
      end
    }
  end
end

class Cell
  include GraphNode
  include ClassContainer

  attr_reader :name
  attr_accessor :value
  def initialize( name, value=0 )
    raise "Duplicate ID" if Cell[ name ]
    @name = name
    @value = value * 1.0

    super
  end

  def value=( new_value )
    # Naive update approach
    return if @value == new_value
    @value = new_value
    update_dependents
    new_value
  end

  def update_dependents
    traverse{ |formula|
      formula.evaluate( true )
    }
  end

  def to_s
    '<%s "%s" value=%.2f>' % [ self.class.name, @name, @value ]
  end
end

class Formula < Cell
  def initialize( *args )
    super
    @formula, @value = @value.to_s, 0.0
    update_dependencies
  end

  def update_dependencies
    clear_incoming
    @formula.scan( /@(\w+)/ ).flatten.each{ |source_name|
      if incoming = Cell[ source_name.to_sym ]
        incoming.add_edge_to( self )
      end
    }
  end

  def evaluate( skip_dependents = false )
    scope = ValueSpace.new
    Cell.each{ |n| scope.set( n.name, n.value ) }
    @value = eval( @formula, scope.get_binding )
    puts "Updated #{self}"
    update_dependents unless skip_dependents
  end

  def to_s
    '<%s "%s" formula="%s" value=%.2f>' % [ self.class.name, @name,
@formula, @value ]
  end
end

class ValueSpace
  def set( name, value )
    instance_variable_set :"@#{name}", value
  end
  def get_binding
    binding
  end
end

if $0 == __FILE__
  Cell.new( :a, 5 )
  Cell.new( :d, 3 )
  Cell.new( :e, 32 )
  Cell.new( :i, 10 )
  Cell.new( :l, 12 )

  Formula.new( :b, "@d + @e + @a" )
  Formula.new( :g, "@h * @i" ).value = 20
  Formula.new( :h, "@g / @i" ).value = 2
  Formula.new( :c, "@a + 3" )
  Formula.new( :f, "@b + @c" )
  Formula.new( :j, "@c + 8" )
  Formula.new( :k, "@l + @j" )
  Formula.new( :m, "@l + 10" )

  # Need to call here to handle circular dependencies
  Formula.each{ |f| f.update_dependencies }

  # Naive approach...
  puts "Naive initial evaluation..."
  Formula.each{ |f| f.evaluate }

  puts "*" * 40
  puts "Initial values..."
  puts Cell.sorted

  puts "*" * 40
  puts "Changing 'a' to 7..."
  Cell[ :a ].value = 7

  puts "*" * 40
  puts "Changing 'i' to 36..."
  Cell[ :i ].value = 36

  puts '-' * 40
  puts "Per cell dependency..."
  Cell.each{ |cell|
    puts "#{cell.name} -> #{cell.outgoing_edges.map{|c| c.name}.join(',
')}"
  }

end

(which outputs)

Naive initial evaluation...
Updated <Formula "c" formula="@a + 3" value=8.00>
Updated <Formula "f" formula="@b + @c" value=8.00>
Updated <Formula "j" formula="@c + 8" value=16.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "f" formula="@b + @c" value=8.00>
Updated <Formula "j" formula="@c + 8" value=16.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "b" formula="@d + @e + @a" value=40.00>
Updated <Formula "f" formula="@b + @c" value=48.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "h" formula="@g / @i" value=2.00>
Updated <Formula "m" formula="@l + 10" value=22.00>
Updated <Formula "h" formula="@g / @i" value=2.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
****************************************
Initial values...
<Cell "a" value=5.00>
<Formula "b" formula="@d + @e + @a" value=40.00>
<Formula "c" formula="@a + 3" value=8.00>
<Cell "d" value=3.00>
<Cell "e" value=32.00>
<Formula "f" formula="@b + @c" value=48.00>
<Formula "g" formula="@h * @i" value=20.00>
<Formula "h" formula="@g / @i" value=2.00>
<Cell "i" value=10.00>
<Formula "j" formula="@c + 8" value=16.00>
<Formula "k" formula="@l + @j" value=28.00>
<Cell "l" value=12.00>
<Formula "m" formula="@l + 10" value=22.00>
****************************************
Changing 'a' to 7...
Updated <Formula "b" formula="@d + @e + @a" value=42.00>
Updated <Formula "f" formula="@b + @c" value=50.00>
Updated <Formula "c" formula="@a + 3" value=10.00>
Updated <Formula "f" formula="@b + @c" value=52.00>
Updated <Formula "j" formula="@c + 8" value=18.00>
Updated <Formula "k" formula="@l + @j" value=30.00>
****************************************
Changing 'i' to 36...
Updated <Formula "h" formula="@g / @i" value=0.56>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "h" formula="@g / @i" value=0.56>
----------------------------------------
Per cell dependency...
c -> f, j
e -> b
f ->
i -> h, g
l -> k, m
j -> k
b -> f
k ->
g -> h
m ->
a -> b, c
h -> g
d -> b
akonsu (Guest)
on 2005-12-06 03:28
(Received via mailing list)
perhaps in.reverse!.uniq!.reverse!   ?
w_a_x_man (Guest)
on 2005-12-06 03:40
(Received via mailing list)
Phrogz wrote:
> DETAILS
> The goal is, given a stream of items, to produce the same ordered
> stream with no duplicates, where the relative position of the last item
> wins. The shortest (and fastest?) for me turns out to be simply:
>    in.reverse.uniq.reverse
>
....
> The input for me isn't REALLY an array, but rather a series of items
> that I receive one at a time from a depth-first traversal of a graph.
> If this influences your answer, so be it :)

In = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
  "d", "e", "w", "d"]

out = []

In.each{ |x| out.delete( x );  out << x }

p out
akonsu (Guest)
on 2005-12-06 03:48
(Received via mailing list)
does this have a quadratic time complexity? doing this by sorting might
be faster... or am i mistaken?

konstantin
akonsu (Guest)
on 2005-12-06 04:08
(Received via mailing list)
i meant in.reverse.uniq!.reverse! (the first reverse is non-destructive)
dbatml (Guest)
on 2005-12-06 05:05
(Received via mailing list)
On Tue, 06 Dec 2005 02:27:44 +0100, ako... 
<removed_email_address@domain.invalid> wrote:

> i meant in.reverse.uniq!.reverse! (the first reverse is non-destructive)

Don't chain bang methods:

>> ["i", "l", "u", "v", "f", "e", "w", "d"].reverse.uniq!.reverse!
NoMethodError: undefined method `reverse!' for nil:NilClass
         from (irb):3
         from :0

They return nil if nothing changed...
gavin (Guest)
on 2005-12-06 05:13
(Received via mailing list)
On Dec 5, 2005, at 6:47 PM, ako... wrote:
> does this have a quadratic time complexity? doing this by sorting
> might
> be faster... or am i mistaken?

Quadratic? No. The suggested algorithms are O(2n) or O(3n).
dblack (Guest)
on 2005-12-06 05:17
(Received via mailing list)
Hi --

On Tue, 6 Dec 2005, Phrogz wrote:

> DETAILS
> The goal is, given a stream of items, to produce the same ordered
> stream with no duplicates, where the relative position of the last item
> wins. The shortest (and fastest?) for me turns out to be simply:
>    in.reverse.uniq.reverse

This entry is not in contention on shortness or speed (don't even bother
to benchmark it -- it's completely off the charts), but I just thought
it
looked really cool :-)

  a.inject([]){|r,*b|r-b+b}


David
__
David A. Black
removed_email_address@domain.invalid

"Ruby for Rails", forthcoming from Manning Publications, April 2006!
gavin (Guest)
on 2005-12-06 05:29
(Received via mailing list)
On Dec 5, 2005, at 6:27 PM, ako... wrote:
> i meant in.reverse.uniq!.reverse! (the first reverse is non-
> destructive)

Array#uniq! can return nil if no changes were made. This will not
occur given the above input, but can for the general case.

Using reverse rather than dup does shave a bit of time off, though,
which puts it as the winner on my machine. So far:

Rehearsal --------------------------------------------------------
William J.          4.230000   0.060000   4.290000 (  4.369104)
Simple Copies          1.180000   0.010000   1.190000 (  1.227496)
Dup and In-Place       1.210000   0.010000   1.220000 (  1.240726)
Reverse and In-Place   1.130000   0.010000   1.140000 (  1.158873)
Hash and Compact       7.220000   0.090000   7.310000 (  7.552673)
---------------------------------------------- total: 15.150000sec

                            user     system      total        real
William J.          4.230000   0.040000   4.270000 (  4.465252)
Simple Copies          1.190000   0.010000   1.200000 (  1.218064)
Dup and In-Place       1.220000   0.010000   1.230000 (  1.268994)
Reverse and In-Place   1.120000   0.010000   1.130000 (  1.152538)
Hash and Compact       7.220000   0.060000   7.280000 (  7.511923)


input = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
"d", "e", "w", "d"]

require 'benchmark'
Benchmark.bmbm( 20 ) do |r|
   n = 100_000

   r.report( 'William J.' ){
     n.times{
       out = []
       input.each{ |x| out.delete( x ); out << x }
     }
   }

   r.report( 'Simple Copies' ){
     n.times{
       input.reverse.uniq.reverse
     }
   }

   r.report( 'Dup and In-Place' ){
     n.times{
       out = input.dup
       out.reverse!
       out.uniq!
       out.reverse!
       out
     }
   }

   r.report( 'Reverse and In-Place' ){
     n.times{
       out = input.reverse
       out.uniq!
       out.reverse!
       out
     }
   }

   r.report( "Hash and Compact" ){
     n.times{
       seen = {}
       out = input.dup
       out.each_with_index{ |val,idx|
         if old_idx = seen[ val ]
           out[ old_idx ] = nil
         end
         seen[ val ] = idx
       }
     }
   }

end
akonsu (Guest)
on 2005-12-06 09:21
(Received via mailing list)
i do not see why not. this is almost exactly a bubble sort. which is in
worst case quadratic.
gavin (Guest)
on 2005-12-06 15:37
(Received via mailing list)
On Dec 6, 2005, at 12:17 AM, ako... wrote:
> i do not see why not. this is almost exactly a bubble sort. which
> is in
> worst case quadratic.

It's not an arbitrary re-sorting of all the elements involved. The
elements are all there, in the right order - you just have to weed
out the few bad ones.
akonsu (Guest)
on 2005-12-06 23:31
(Received via mailing list)
my reasoning was that the code loops over the array and on every
iteration sequentially searches another array of the same length. this
is similar to bubble sort where on every iteration the current element
is moved to the beginning of the array until its place in the order is
found. but this is not really important, so let us not argue over this
; -)
konstantin
This topic is locked and can not be replied to.