SUMMARY What's the fastest and/or shortest you can turn this: in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e", "d", "e", "w", "d"] into this: ["i", "l", "u", "v", "f", "e", "w", "d"] ? DETAILS The goal is, given a stream of items, to produce the same ordered stream with no duplicates, where the relative position of the last item wins. The shortest (and fastest?) for me turns out to be simply: in.reverse.uniq.reverse I have the question mark because the in-place version is sometimes faster, sometimes slower in my tests: out = in.dup; out.reverse!; out.uniq!; out.reverse! I thought it might be faster to loop through the array in one pass myself, but that turns out to be about 4x slower: seen = {} out = in.dup out.each_with_index{ |v,i| if n=h[v] out[n] = nil end h[v] = i } out.compact! The input for me isn't REALLY an array, but rather a series of items that I receive one at a time from a depth-first traversal of a graph. If this influences your answer, so be it :) BACKGROUND Why is this problem mildly interesting? See http://phrogz.net/nodes/traversingdirectedgraph.asp I was reading that page for nostalgia the other day and thought about porting the calculation library to Ruby for fun, and to see how much easier solving the problem would be in Ruby. Here's the code I put together for fun (the cells and formulae in the below correspond to the graphic on the above url), before adding the 'minimal update chain' optimization. require 'set' module ClassContainer def initialize( *args ) self.class[ self.name ] = self super end def self.included( klass ) klass.class_eval{ def self.inherited( subklass ) ( @subclasses ||= [] ) << subklass end def self.all @subclasses ||= [] @instances ||= {} all = @instances.dup @subclasses.each{ |sub| all.merge!( sub.all ) } all end def self.each all.each_value{ |i| yield i } end def self.[]( name ) all[ name ] end def self.[]=( name, instance ) ( @instances ||= {} )[ name ] = instance end def self.sorted all.values.sort_by{ |n| n.name.to_s } end } end end module GraphNode attr_reader :outgoing_edges, :incoming_edges def initialize( *args ) @outgoing_edges = Set.new @incoming_edges = Set.new end def edges @outgoing_edges + @incoming_edges end def add_edge_to( other_node ) @outgoing_edges << other_node other_node.incoming_edges << self self end def clear_incoming @incoming_edges.each{ |n| n.outgoing_edges.delete self } @incoming_edges = Set.new self end def traverse( seen_nodes = Set.new, &block ) new_seen = seen_nodes.dup << self @outgoing_edges.each{ |destination| unless seen_nodes.include?( destination ) yield destination destination.traverse( new_seen, &block ) end } end end class Cell include GraphNode include ClassContainer attr_reader :name attr_accessor :value def initialize( name, value=0 ) raise "Duplicate ID" if Cell[ name ] @name = name @value = value * 1.0 super end def value=( new_value ) # Naive update approach return if @value == new_value @value = new_value update_dependents new_value end def update_dependents traverse{ |formula| formula.evaluate( true ) } end def to_s '<%s "%s" value=%.2f>' % [ self.class.name, @name, @value ] end end class Formula < Cell def initialize( *args ) super @formula, @value = @value.to_s, 0.0 update_dependencies end def update_dependencies clear_incoming @formula.scan( /@(\w+)/ ).flatten.each{ |source_name| if incoming = Cell[ source_name.to_sym ] incoming.add_edge_to( self ) end } end def evaluate( skip_dependents = false ) scope = ValueSpace.new Cell.each{ |n| scope.set( n.name, n.value ) } @value = eval( @formula, scope.get_binding ) puts "Updated #{self}" update_dependents unless skip_dependents end def to_s '<%s "%s" formula="%s" value=%.2f>' % [ self.class.name, @name, @formula, @value ] end end class ValueSpace def set( name, value ) instance_variable_set :"@#{name}", value end def get_binding binding end end if $0 == __FILE__ Cell.new( :a, 5 ) Cell.new( :d, 3 ) Cell.new( :e, 32 ) Cell.new( :i, 10 ) Cell.new( :l, 12 ) Formula.new( :b, "@d + @e + @a" ) Formula.new( :g, "@h * @i" ).value = 20 Formula.new( :h, "@g / @i" ).value = 2 Formula.new( :c, "@a + 3" ) Formula.new( :f, "@b + @c" ) Formula.new( :j, "@c + 8" ) Formula.new( :k, "@l + @j" ) Formula.new( :m, "@l + 10" ) # Need to call here to handle circular dependencies Formula.each{ |f| f.update_dependencies } # Naive approach... puts "Naive initial evaluation..." Formula.each{ |f| f.evaluate } puts "*" * 40 puts "Initial values..." puts Cell.sorted puts "*" * 40 puts "Changing 'a' to 7..." Cell[ :a ].value = 7 puts "*" * 40 puts "Changing 'i' to 36..." Cell[ :i ].value = 36 puts '-' * 40 puts "Per cell dependency..." Cell.each{ |cell| puts "#{cell.name} -> #{cell.outgoing_edges.map{|c| c.name}.join(', ')}" } end (which outputs) Naive initial evaluation... Updated <Formula "c" formula="@a + 3" value=8.00> Updated <Formula "f" formula="@b + @c" value=8.00> Updated <Formula "j" formula="@c + 8" value=16.00> Updated <Formula "k" formula="@l + @j" value=28.00> Updated <Formula "f" formula="@b + @c" value=8.00> Updated <Formula "j" formula="@c + 8" value=16.00> Updated <Formula "k" formula="@l + @j" value=28.00> Updated <Formula "b" formula="@d + @e + @a" value=40.00> Updated <Formula "f" formula="@b + @c" value=48.00> Updated <Formula "k" formula="@l + @j" value=28.00> Updated <Formula "g" formula="@h * @i" value=20.00> Updated <Formula "h" formula="@g / @i" value=2.00> Updated <Formula "m" formula="@l + 10" value=22.00> Updated <Formula "h" formula="@g / @i" value=2.00> Updated <Formula "g" formula="@h * @i" value=20.00> **************************************** Initial values... <Cell "a" value=5.00> <Formula "b" formula="@d + @e + @a" value=40.00> <Formula "c" formula="@a + 3" value=8.00> <Cell "d" value=3.00> <Cell "e" value=32.00> <Formula "f" formula="@b + @c" value=48.00> <Formula "g" formula="@h * @i" value=20.00> <Formula "h" formula="@g / @i" value=2.00> <Cell "i" value=10.00> <Formula "j" formula="@c + 8" value=16.00> <Formula "k" formula="@l + @j" value=28.00> <Cell "l" value=12.00> <Formula "m" formula="@l + 10" value=22.00> **************************************** Changing 'a' to 7... Updated <Formula "b" formula="@d + @e + @a" value=42.00> Updated <Formula "f" formula="@b + @c" value=50.00> Updated <Formula "c" formula="@a + 3" value=10.00> Updated <Formula "f" formula="@b + @c" value=52.00> Updated <Formula "j" formula="@c + 8" value=18.00> Updated <Formula "k" formula="@l + @j" value=30.00> **************************************** Changing 'i' to 36... Updated <Formula "h" formula="@g / @i" value=0.56> Updated <Formula "g" formula="@h * @i" value=20.00> Updated <Formula "g" formula="@h * @i" value=20.00> Updated <Formula "h" formula="@g / @i" value=0.56> ---------------------------------------- Per cell dependency... c -> f, j e -> b f -> i -> h, g l -> k, m j -> k b -> f k -> g -> h m -> a -> b, c h -> g d -> b

on 2005-12-06 03:15

on 2005-12-06 03:40

Phrogz wrote: > DETAILS > The goal is, given a stream of items, to produce the same ordered > stream with no duplicates, where the relative position of the last item > wins. The shortest (and fastest?) for me turns out to be simply: > in.reverse.uniq.reverse > .... > The input for me isn't REALLY an array, but rather a series of items > that I receive one at a time from a depth-first traversal of a graph. > If this influences your answer, so be it :) In = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e", "d", "e", "w", "d"] out = [] In.each{ |x| out.delete( x ); out << x } p out

on 2005-12-06 03:48

does this have a quadratic time complexity? doing this by sorting might be faster... or am i mistaken? konstantin

on 2005-12-06 04:08

i meant in.reverse.uniq!.reverse! (the first reverse is non-destructive)

on 2005-12-06 05:05

On Tue, 06 Dec 2005 02:27:44 +0100, ako... <removed_email_address@domain.invalid> wrote: > i meant in.reverse.uniq!.reverse! (the first reverse is non-destructive) Don't chain bang methods: >> ["i", "l", "u", "v", "f", "e", "w", "d"].reverse.uniq!.reverse! NoMethodError: undefined method `reverse!' for nil:NilClass from (irb):3 from :0 They return nil if nothing changed...

on 2005-12-06 05:13

On Dec 5, 2005, at 6:47 PM, ako... wrote: > does this have a quadratic time complexity? doing this by sorting > might > be faster... or am i mistaken? Quadratic? No. The suggested algorithms are O(2n) or O(3n).

on 2005-12-06 05:17

Hi -- On Tue, 6 Dec 2005, Phrogz wrote: > DETAILS > The goal is, given a stream of items, to produce the same ordered > stream with no duplicates, where the relative position of the last item > wins. The shortest (and fastest?) for me turns out to be simply: > in.reverse.uniq.reverse This entry is not in contention on shortness or speed (don't even bother to benchmark it -- it's completely off the charts), but I just thought it looked really cool :-) a.inject([]){|r,*b|r-b+b} David __ David A. Black removed_email_address@domain.invalid "Ruby for Rails", forthcoming from Manning Publications, April 2006!

on 2005-12-06 05:29

On Dec 5, 2005, at 6:27 PM, ako... wrote: > i meant in.reverse.uniq!.reverse! (the first reverse is non- > destructive) Array#uniq! can return nil if no changes were made. This will not occur given the above input, but can for the general case. Using reverse rather than dup does shave a bit of time off, though, which puts it as the winner on my machine. So far: Rehearsal -------------------------------------------------------- William J. 4.230000 0.060000 4.290000 ( 4.369104) Simple Copies 1.180000 0.010000 1.190000 ( 1.227496) Dup and In-Place 1.210000 0.010000 1.220000 ( 1.240726) Reverse and In-Place 1.130000 0.010000 1.140000 ( 1.158873) Hash and Compact 7.220000 0.090000 7.310000 ( 7.552673) ---------------------------------------------- total: 15.150000sec user system total real William J. 4.230000 0.040000 4.270000 ( 4.465252) Simple Copies 1.190000 0.010000 1.200000 ( 1.218064) Dup and In-Place 1.220000 0.010000 1.230000 ( 1.268994) Reverse and In-Place 1.120000 0.010000 1.130000 ( 1.152538) Hash and Compact 7.220000 0.060000 7.280000 ( 7.511923) input = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e", "d", "e", "w", "d"] require 'benchmark' Benchmark.bmbm( 20 ) do |r| n = 100_000 r.report( 'William J.' ){ n.times{ out = [] input.each{ |x| out.delete( x ); out << x } } } r.report( 'Simple Copies' ){ n.times{ input.reverse.uniq.reverse } } r.report( 'Dup and In-Place' ){ n.times{ out = input.dup out.reverse! out.uniq! out.reverse! out } } r.report( 'Reverse and In-Place' ){ n.times{ out = input.reverse out.uniq! out.reverse! out } } r.report( "Hash and Compact" ){ n.times{ seen = {} out = input.dup out.each_with_index{ |val,idx| if old_idx = seen[ val ] out[ old_idx ] = nil end seen[ val ] = idx } } } end

on 2005-12-06 09:21

i do not see why not. this is almost exactly a bubble sort. which is in worst case quadratic.

on 2005-12-06 15:37

On Dec 6, 2005, at 12:17 AM, ako... wrote: > i do not see why not. this is almost exactly a bubble sort. which > is in > worst case quadratic. It's not an arbitrary re-sorting of all the elements involved. The elements are all there, in the right order - you just have to weed out the few bad ones.

on 2005-12-06 23:31

my reasoning was that the code loops over the array and on every iteration sequentially searches another array of the same length. this is similar to bubble sort where on every iteration the current element is moved to the beginning of the array until its place in the order is found. but this is not really important, so let us not argue over this ; -) konstantin