Need help detecting overlapping ranges

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1…5) (7…11) (22…29) (5…8)

Given the ranges above, I want to end up with the following ranges:

(1…11) (22…29)

Here is the code I’ve come up with so far (ranges is an array of ranges
similar to what I described above in my example):

ranges = @failed.outages
changes = true
while changes
changes = false
outages = ranges.collect { |range| range.to_a }
ranges.clear
while !outages.empty?
outage = outages.shift
outages.each do |n|
unless (outage & n).empty?
outage = (outage + n).uniq.sort
outages.delete(n)
changes = true
end
end
ranges << (outage.first…outage.last)
end
end
return ranges.sort { |a,b| a.first <=> b.first }

This code works, but it is EXTREMELY slow (my current array of ranges
is averaging out to ~24000 range elements). Anyone have an idea of how
to speed it up?


Thanks!
Bryan

Bryan R. wrote:

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1…5) (7…11) (22…29) (5…8)

Given the ranges above, I want to end up with the following ranges:

(1…11) (22…29)

(…)

This code works, but it is EXTREMELY slow (my current array of ranges
is averaging out to ~24000 range elements). Anyone have an idea of how
to speed it up?


Thanks!
Bryan

Sets are usually much faster then large arrays and sets have a divide
method.

require ‘set’
ranges=[(1…5),(7…11),(22…29),(5…8)]
set=Set.new
ranges.each do|range|
range.each do|el|
set<<el
end
end
sets = set.divide { |i,j| (i - j).abs == 1 } #straight from the
documentation!

p sets
#<Set: {#<Set: {27, 22, 28, 23, 29, 24, 25, 26}>, #<Set: {5, 11, 6, 1,
7, 2, 8, 3, 9, 4, 10}>}>

hth,

Siep

2008/8/6 Bryan R. [email protected]:

(1…11) (22…29)
while !outages.empty?
end
return ranges.sort { |a,b| a.first <=> b.first }

This code works, but it is EXTREMELY slow (my current array of ranges
is averaging out to ~24000 range elements). Anyone have an idea of how
to speed it up?

I guess it’s slow because it’s doing many Range to Array conversions.
Here’s a version that does no such conversions:

# Assumption: for every range: range.end >= range.begin
def overlap(ranges)
  ranges = ranges.dup
  new_ranges = []
  until ranges.empty?
    cur = ranges.pop
    overlap = ranges.find { |range|
      cur.member?(range.begin) || cur.member?(range.end) ||
        range.member?(cur.begin) || range.member?(cur.end)
    }
    if overlap
      new_begin = cur.begin < overlap.begin ? cur.begin : 

overlap.begin
new_end = cur.end > overlap.end ? cur.end : overlap.end
join = Range.new(new_begin, new_end)
ranges.map! { |range| range.equal?(overlap) ? join : range }
else
new_ranges << cur
end
end
new_ranges
end

(And in case Gmail messes up the formating:
The domain name pastecode.com is for sale)

So I’ve tried it only with the one example you’ve given and
I didn’t actually benchmark it.

Stefan

On Wed, Aug 6, 2008 at 11:48 AM, Bryan R.
[email protected] wrote:

(1…11) (22…29)
Bit busy at work, so I don’t have time to play with code, but mostly
your algorithm is slow. This one should work better:

  1. sort the array of ranges by startpoint
  2. compare the first two ranges. if they overlap, merge them by
    setting the second range equal to the merged range and the first to
    nil
  3. repeat with the next pair, and so on
  4. compact the array

in action:

1…5, 5…8, 7…11, 22…29
nil, 1…8, 7…11, 22…29
nil, nil, 1…11, 22…29
no change
1…11, 22…29

Might be some corner cases I haven’t taken care of but the basic
algorithm is O(n) and involves no range conversions.

martin

On Aug 6, 2008, at 12:48 PM, Bryan R. wrote:

Given the ranges above, I want to end up with the following ranges:
changes = false
end


Thanks!
Bryan

Posted via http://www.ruby-forum.com/.

this should be quite fast with a few constraints. the main thing is
that you do not have to convert ranges to detect the overlap nor to do
the merge.

cfp:~ > cat a.rb
def collapse_inclusive_integer *ranges
ranges.flatten!
ranges.compact!

ranges.each do |r|
raise ArgumentError, “exclusive range #{ range.inspect }” if
r.exclude_end?
raise ArgumentError, “non-integer range #{ range.inspect }” unless
Integer === r.begin and Integer === r.end
end

overlaps = lambda do |i,j|
a, b = ranges[i], ranges[j]
a.begin <= b.end and b.begin <= a.end
end

merge = lambda do |i,j|
a, b = ranges[i], ranges[j]
values = a.begin, a.end, b.begin, b.end
min, max = values.min, values.max
range = min … max
src, dst = i < j ? [i,j] : [j,i]
ranges[src] = range
ranges.delete_at dst
range
end

loop {
catch(‘start over’){
size = ranges.size
size.times do |i|
size.times do |j|
next if i == j
if overlaps[i,j]
merge[i,j]
throw ‘start over’
end
end
end
return ranges
}
}
end

ranges = 1…5, 7…11, 22…29, 5…8

p ranges
p collapse_inclusive_integer(ranges)

cfp:~ > ruby a.rb
[1…5, 7…11, 22…29, 5…8]
[1…11, 22…29]

a @ http://codeforpeople.com/

Hi –

On Thu, 7 Aug 2008, Bryan R. wrote:

outages = Array.new
break
end
end
end
outages << range
end
return outages
end

I did something similar in trying to implement Martin’s algorithm. I
haven’t tested it beyond eyeballing the results for this one run:

ranges = [(1…5), (7…11), (22…29), (5…8)].sort_by {|r| r.first }
outages = [ranges.shift]

ranges.each do |r|
if outages[-1].include?(r.first)
outages[-1] = Range.new(outages[-1].first, r.last)
else
outages.push®
end
end

p outages

David

Thanks to all who replied! I took the main advice of all of your posts
to be “don’t convert ranges to arrays stupid!!!” and also “don’t do this
recursively stupid!”. :slight_smile:

With that in mind, here’s what I came up with (I didn’t want to just
copy someone’s code… I don’t learn anything that way!) – it seems to
be much faster:

def merge_outages
ranges = @failed.outages.sort { |a,b| a.first <=> b.first }
outages = Array.new
while !ranges.empty?
range = ranges.shift
loop do
if ranges.empty?
break
else
if (range.last + 1) >= ranges.first.first
range = (range.first…ranges.first.last)
ranges.shift
else
break
end
end
end
outages << range
end
return outages
end

What do you guys think of this approach? Am I overlooking something
that you guys suggested?


Thanks!
Bryan

2008/8/6 Martin DeMello [email protected]:

Given the ranges above, I want to end up with the following ranges:

  1. repeat with the next pair, and so on
    Might be some corner cases I haven’t taken care of but the basic
    algorithm is O(n) and involves no range conversions.

It’s O(n), but only if you write a radix sort to sort the ranges :wink:

Stefan

Hi David,

I like the conciseness of your code, but I think it only works for when
the end of one range is the same as the beginning of another. I need to
also handle the times when ranges overlap more than that (i.e. 1…5,
3…6 -> 1…6). Thanks for thinking about this!


Thanks!
Bryan

David A. Black wrote:

Hi –

On Thu, 7 Aug 2008, Bryan R. wrote:

outages = Array.new
break
end
end
end
outages << range
end
return outages
end

I did something similar in trying to implement Martin’s algorithm. I
haven’t tested it beyond eyeballing the results for this one run:

ranges = [(1…5), (7…11), (22…29), (5…8)].sort_by {|r| r.first }
outages = [ranges.shift]

ranges.each do |r|
if outages[-1].include?(r.first)
outages[-1] = Range.new(outages[-1].first, r.last)
else
outages.push®
end
end

p outages

David

On Aug 6, 2008, at 3:47 PM, Bryan R. wrote:

Am I overlooking something
that you guys suggested?

this_will_break_it = 0 … 42

a @ http://codeforpeople.com/

It actually looks like the latest code I posted doesn’t seem to work
either… I’m getting gaps where I shouldn’t be. :frowning:

On Wed, Aug 6, 2008 at 3:09 PM, Stefan L.
[email protected] wrote:

2008/8/6 Martin DeMello [email protected]:

Might be some corner cases I haven’t taken care of but the basic
algorithm is O(n) and involves no range conversions.

It’s O(n), but only if you write a radix sort to sort the ranges :wink:

Doh :slight_smile:

m.

On Aug 6, 2008, at 4:36 PM, Bryan R. wrote:

Can you elaborate at all?

cfp:~ > ruby -e’ p ( 0 … 1 ).to_a ’
[0]

cfp:~ > ruby -e’ p ( 0 … 1 ).to_a ’
[0, 1]

one range included the endpoint, one does not. with integer ranges
you could guess the prior end point, but not with floats

cfp:~ > ruby -e’ p ( 0 … 1.0 ).to_a ’
[0]

cfp:~ > ruby -e’ p ( 0 … 1.0 ).include?(0.999999999999999) ’
true

a @ http://codeforpeople.com/

Actually, never mind what I said about this code in a previous post… I
misunderstood it. I think it works perfectly… still testing!!! :slight_smile:

David A. Black wrote:

Hi –

I did something similar in trying to implement Martin’s algorithm. I
haven’t tested it beyond eyeballing the results for this one run:

ranges = [(1…5), (7…11), (22…29), (5…8)].sort_by {|r| r.first }
outages = [ranges.shift]

ranges.each do |r|
if outages[-1].include?(r.first)
outages[-1] = Range.new(outages[-1].first, r.last)
else
outages.push®
end
end

p outages

David

Hi Ara,

Can you elaborate at all?


Thanks!
Bryan

Hi –

On Thu, 7 Aug 2008, Bryan R. wrote:

Hi David,

I like the conciseness of your code, but I think it only works for when
the end of one range is the same as the beginning of another. I need to
also handle the times when ranges overlap more than that (i.e. 1…5,
3…6 -> 1…6). Thanks for thinking about this!

Here’s the heavy artillery :slight_smile:

def merge_ranges(ranges)
ranges = ranges.sort_by {|r| r.first }
*outages = ranges.shift
ranges.each do |r|
lastr = outages[-1]
if lastr.last >= r.first - 1
outages[-1] = lastr.first…[r.last, lastr.last].max
else
outages.push®
end
end
outages
end

require ‘test/unit’

class RangeTest < Test::Unit::TestCase
def test_one
ranges = [1…5, 20…20, 4…11, 40…45, 39…50]
assert_equal([1…11, 20…20, 39…50], merge_ranges(ranges))
end

def test_two
ranges = [1…5, 7…11, 22…29, 5…8]
assert_equal([1…11, 22…29], merge_ranges(ranges))
end

def test_three
ranges = [1…5, 7…11, 22…29, 6…8]
assert_equal([1…11, 22…29], merge_ranges(ranges))
end
end

David

Crazy! I had just discovered the r.last lastr.last max problem when
you posted this. Thanks! :slight_smile:

David A. Black wrote:

Here’s the heavy artillery :slight_smile:

def merge_ranges(ranges)
ranges = ranges.sort_by {|r| r.first }
*outages = ranges.shift
ranges.each do |r|
lastr = outages[-1]
if lastr.last >= r.first - 1
outages[-1] = lastr.first…[r.last, lastr.last].max
else
outages.push®
end
end
outages
end

On Aug 6, 1:48 pm, Bryan R. [email protected] wrote:

(1…11) (22…29)
Here is a module from our project earlier this year:

# Module Spans and the associated Array "span" method implements an
array of numeric
# intervals with "merge", "complement", "fit", and other convenience
methods
# Copyright (C) 2008  GETCO LLC
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# The full text of the license as of August 06, 2008, is available at
#    http://www.gnu.org/licenses/gpl-3.0.txt
#
class Array

  #  Module +Spans+ and the associated Array span method implements an
array numeric intervals - aka +spans+
  module Spans

    # Reduces the set of spans by combining the overlapping intervals
together to form larger intervals. Adjacent spans are not merged.
    # *IMPORTANT:* The return set is a set of *copies* of objects,
*not the references* to the original objects. The new object have some
of their boundaries changed, but it is random, which objects are
deleted, and which have new boundaries.
    #
    #  >> [ [5,5], [10,20], [3,4], [5,6], [15,25], [3,7] ].spans.merge
    #  => [[3, 7], [10, 25]]
    def merge
      result = deep_clone.position_sort!
      result.each_index do |idx|

        break if idx >= result.size - 1
        curr = result[ idx ]
        succ = result[ idx + 1 ]

        if overlap?( curr, succ )
          curr.send("#{@span_left}=",  [curr.send(@span_left),
succ.send(@span_left) ].min )
          curr.send("#{@span_right}=", [curr.send(@span_right),
succ.send(@span_right)].max )
          result.delete_at( idx + 1)
          redo
        end
      end
      return result.spans( @span_left, @span_right )
    end

    # Provides the spans' complement to fill the specified interval.
The complement spans is a simple array of new arrays, responding to
the +first+ and +last+ methods.
    #
    #  >> [ [5,5], [10,20], [3,4], [5,6], [15,25],
[3,7] ].spans.complement(-1,28)
    #  => [[-1, 3], [7, 10], [25, 28]]
    def complement( left, right )
      complemented = []
      merge.clean!.each do |span|
        break if right <= span.send(@span_left)
        next  if span.send(@span_right)  <  left
        complemented << [ left, span.send(@span_left) ] if left <
span.send(@span_left)
        left = span.send(@span_right)
      end
      complemented <> [ [5,5], [2,4], [3,7] ].spans.fit( 2 )
    #  => [2, 4]
    def fit( requested_size )
      # see in-code comment to fit_all
      find do |a|
        delta = a.send(@span_right) - a.send(@span_left)
        (delta = requested_size) or (delta > requested_size)
      end
    end

    # Finds all spans to fit the specified size. Returns a new, but
filtered, array of the original objects. The array alreasy has +Spans+
mixed in with the original boundary names.
    #
    #  >> [ [5,5], [2,4], [3,7] ].spans.fit_all( 2 )
    #  => [[2, 4], [3, 7]]
    def fit_all( requested_size )
      self.select do |a|
        ( requested_size <= (a.send(@span_right) -
a.send(@span_left)) )
      end.spans( @span_left, @span_right )
    end

    # Removes all empty and invalid spans from the the calling
instance
    def clean!() delete_if{ |s| s.send(@span_left) >=
s.send(@span_right) } end

    protected

    def position_sort()  sort  { |a,b| position( a, b ) } end
    def position_sort!() sort! { |a,b| position( a, b ) } end

    def position( a, b )
      if a.send(@span_left) < b.send(@span_left)
        a.send(@span_right) > b.send(@span_left)  ? 0 : -1 # -1 is
when "a" is at left of "b"
      else
         a.send(@span_left) < b.send(@span_right) ? 0 :  1 # 1 is when
"a" is at right of "b"
      end
    end

    def overlap?( a, b )
      position( a, b ) == 0
    end

    def deep_clone
        klone = self.clone
        klone.clear
        each { |v| klone <
span.send(@span_right) }
      return true
    end

    @span_left  = (  left_name || "first" )
    @span_right = ( right_name || "last"  )
    unless valid?
      remove_instance_variable(:@span_left)
      remove_instance_variable(:@span_right)
      raise ArgumentError, "Can not consider this array for spans -
invalid intervals."
    end
    self.extend Spans
  end
end