Detecting number ranges

I had to write a script this evening to take an unsorted input file of
the
form:

database 1 is on server 3
database 8 is on server 7

and output it in the form:

server 3 handles database 1 through 7
server 7 handles database 8 through 11

My brain was in procedural mode, and I wrote the following ugliness:


def output_range(start_db, end_db, server_n)
puts “server #{server_n} handles database #{start_db} through
#{end_db}”
end

open(“db_list.txt”).readlines.each do |line|
m = line.match(/^database (.) is on server (.)/)
db_n = m[1].to_i
server_n = m[2].to_i

db_servers[db_n] = server_n
end

range_start = 1

db_servers.each_index do |i|
next if i == 1
if db_servers[i] != db_servers[i - 1]
output_range(range_start, i - 1, db_servers[i - 1])
range_start = i
end
end

last_db = db_servers.size - 1
output_range(range_start, last_db, db_servers[last_db])

which in fact is not only horrid to look at, but has a bug (it doesn’t
like
it if the input skips over a database number, i.e. db 233 and 235 are
present but 234 is missing).

I feel like there’s a much nicer way to express this in Ruby, but can’t
think of what it might be…

I’m not sure why data like this:

 database 1 is on server 3
 database 8 is on server 3

should produce:

 server 3 handles databases 1 to 8

since server 3 only hands db’s 1 and 8.

Or, what to do if there is only one db for the server, say:

database 1 is on server 3

Should the output be:

server 3 handles database 1 to 1

Can there be duplicate lines like this:

database 1 is on server 3
database 8 is on server 7
database 1 is on server 3

Also, is the output supposed to be in ascending order by server?

In any case, here is the input I used:

database 8 is on server 7
database 10 is on server 7
database 10 is on server 7
database 5 is on server 9
database 1 is on server 3
database 2 is on server 3
database 133 is on server 3
database 4 is on server 144

and here is the output:

server 3 handles databases 1 to 133
server 7 handles databases 8 to 10
server 9 handles database 5
server 144 handles database 4

require “set”

def output_data(arr)
arr.each do |elmt|
min = elmt[1].min
max = elmt[1].max

if min == max
  puts "server #{elmt[0]} handles database #{min}"
else
  puts "server #{elmt[0]} handles databases #{min} to #{max}"
end

end
end

servers = Hash.new{|hash, key| hash[key] = Set.new}

File.open(“data.txt”) do |file|
file.each_line do |line|
nums = line.scan(/\d+/)
servers[nums[1].to_i].add(nums[0].to_i)
end
end

data = servers.sort
output_data(data)

On 9/26/07, Jay L. [email protected] wrote:

server 7 handles database 8 through 11

I feel like there’s a much nicer way to express this in Ruby, but can’t
think of what it might be…

Does this feel nicer?

$ cat db_ranges.rb && ruby db_ranges.rb

db_ranges.rb

26 September 2007

require ‘enumerator’

file =<<END
database 8 is on server 7
database 10 is on server 7
database 9 is on server 7
database 5 is on server 9
database 1 is on server 3
database 2 is on server 3
database 133 is on server 3
database 4 is on server 144
END

dbs = Hash.new {|h,k| h[k] = []}

file.each do |line|
m = line.match(/^database (.) is on server (.)/)
db_n = m[1].to_i
server_n = m[2].to_i
dbs[server_n] << db_n
end

dbs.each do |server, db|
result = [[db[0]]]
db.sort.each_cons(2) {|x, y| if y == x + 1 then result.last << y;
else result << [y]; end}
result.each {|x| puts “Server #{server} handles databases #{x.first}
to #{x.last}”}
end

Server 144 handles databases 4 to 4
Server 7 handles databases 8 to 10
Server 3 handles databases 1 to 2
Server 3 handles databases 133 to 133
Server 9 handles databases 5 to 5

Kind regards,

Jesus.

On 9/26/07, Jay L. [email protected] wrote:

server 7 handles database 8 through 11

Does this do it?

fp_arr = [“db 11 is on sv 3”,“db 9 is on sv 7”,“db 13 is on sv 3”,“db
12 is on sv 3”,“db 8 is on sv 7”,“db 18 is on sv 2”]

nums,info = [],{}
fp_arr.each {|x| nums << x.scan(/\d+/)}
sv = nums.map{|x| x[1]}.uniq
sv.each {|t| info[t] = nums.select{|x| x[1] == t}.sort.map {|a| a[0]}}
info.keys.sort.each {|j| print “server #{j} handles database
#{info[j][0]} through #{info[j][-1]}\n”}

Server 2 handles database 18 through 18
Server 3 handles database 11 through 13
Server 7 handles database 8 through 9

Harry

On Sep 26, 12:16 am, Jay L. [email protected] wrote:

server 7 handles database 8 through 11
m = line.match(/^database (.) is on server (.)/)
if db_servers[i] != db_servers[i - 1]
it if the input skips over a database number, i.e. db 233 and 235 are
present but 234 is missing).

I’m not in the mood for hash.

ary = DATA.readlines
ary.map{|s| s[/\d+$/] }.uniq.each{|server|
db = ary.grep(/ #{server}$/).map{|s|
s[/\d+/].to_i}.sort
puts “Server #{server} databases #{db[0]} to #{db[-1]}”
}

END
database 8 is on server 7
database 10 is on server 7
database 9 is on server 7
database 5 is on server 9
database 1 is on server 3
database 2 is on server 3
database 133 is on server 3
database 4 is on server 144

On Wed, 26 Sep 2007 17:18:58 +0900, Jesús Gabriel y Galán wrote:

db.sort.each_cons(2) {|x, y| if y == x + 1 then result.last << y;
else result << [y]; end}

Nice! I think this is the magic I was looking for.

On Wed, 26 Sep 2007 16:49:09 +0900, 7stud – wrote:

I’m not sure why data like this:

 database 1 is on server 3
 database 8 is on server 3

should produce:

 server 3 handles databases 1 to 8

since server 3 only hands db’s 1 and 8.

Sorry, I was being a lazy typist. (In my defense, I only said the input
was “in the form!”) Assume that, for the example I gave, the output is
correct and the input has more data than I showed.

Or, what to do if there is only one db for the server, say:

database 1 is on server 3

Should the output be:

server 3 handles database 1 to 1

Yep, exactly.

Can there be duplicate lines like this:

database 1 is on server 3
database 8 is on server 7
database 1 is on server 3

Nope, never. The actual input is akin to a list of symbolic NFS links,
something like:

/prod_dir/db1.file -> /mounts/server3/db1.file

So db1.file can only ever point to one place, and can only ever show up
in
the input once.

For that same reason, the input’s initially unsorted - it’s mostly
sorted, but alphanumerically (db1, db11, db12, … db199, db2, db21,
etc.)

Also, is the output supposed to be in ascending order by server?

Nope, not necessary. Ascending by DB number probably makes it easier to
manually edit or check, though.

and here is the output:

server 3 handles databases 1 to 133
server 7 handles databases 8 to 10
server 9 handles database 5
server 144 handles database 4

Hmm, looks like it assumes the whole range if there are missing
elements.
(I guess I didn’t specify that behavior, did I?) If a database isn’t
listed in the input, it shouldn’t be included in ranges in the output.

So the actual output from the above input should be (in any order):

server 3 handles database 1 to 2
server 3 handles databases 133 to 133
server 7 handles databases 8 to 8
server 7 handles databases 10 to 10
server 9 handles databases 5 to 5
server 144 handles databases 4 to 4

I -think- the “missing database” problem is endemic to the Set approach,
right?

Jay

On Wed, 26 Sep 2007 14:57:04 -0700, William J. wrote:

I’m not in the mood for hash.

ary = DATA.readlines
ary.map{|s| s[/\d+$/] }.uniq.each{|server|
db = ary.grep(/ #{server}$/).map{|s|
s[/\d+/].to_i}.sort
puts “Server #{server} databases #{db[0]} to #{db[-1]}”
}

Definitely wins the “shortest” prize! But, um, no output:

#=> Server databases to

Looks like .each is called only once, with |server| == nil.

Weirder, when I just run:

ary = open(“fake_data”).readlines
ary.map{|s| s[/\d+$/] }.inspect
return

I get:

#=> wjames.rb:4: unexpected return (LocalJumpError)

I would have thought that ary.map{|s| s[/d+$] } should return an array
itself. Although I’m not clear what that line does - doesn’t it look
for a
1+ digit number n, and return the nth character of the line?

On Wed, 26 Sep 2007 17:57:16 +0900, Harry K. wrote:

#server 2 handles database 18 through 18
#server 3 handles database 11 through 13
#server 7 handles database 8 through 9

Hmm, that fails if I add “db 20 is on sv 2”, which should retain “server
2
handles db 18 through 18” but also add “srver 2 handles db 20 through
20”.
I can’t quite follow what it’s doing but I’m going to have to study that
more; map is a powerful tool in general.

On Sep 27, 12:00 pm, Jay L. [email protected] wrote:

Definitely wins the “shortest” prize! But, um, no output:

#=> Server databases to

I just copied, pasted, and ran the code. Output:
Server 7 databases 8 to 10
Server 9 databases 5 to 5
Server 3 databases 1 to 133
Server 144 databases 4 to 4

Change
ary = DATA.readlines
to
ary = DATA.readlines ; p ary
and see what you get.
Your lack of output could be caused by an added space at the end
of each data line.

#=> wjames.rb:4: unexpected return (LocalJumpError)

I would have thought that ary.map{|s| s[/d+$] } should return an array
itself. Although I’m not clear what that line does - doesn’t it look for a
1+ digit number n, and return the nth character of the line?

irb(main):001:0> “is it foo bar again?”[ /.oo \w+/ ]
=> “foo bar”

s[ /\d+$/ ] returns a sequence of digits at the end of the string (or
just before
“\n” in the string).

Extra spaces won’t bother this version:

ary = DATA.readlines.map{|s| s.scan(/\d+/).map{|s| s.to_i}}
p ary
ary.map{|a| a[1] }.uniq.each{|server|
db = ary.select{|a| a[1]==server}.map{|a| a[0]}.sort
puts “Server #{server} databases #{db[0]} to #{db[-1]}”
}

END
database 8 is on server 7
database 10 is on server 7
database 9 is on server 7
database 5 is on server 9
database 1 is on server 3
database 2 is on server 3
database 133 is on server 3
database 4 is on server 144

2007/9/26, Jay L. [email protected]:

server 3 handles database 1 through 7

db_servers.each_index do |i|

Jay L. |
Boston, MA | My character doesn’t like it when they
Faster: jay at jay dot fm | cry or shout or hit.
http://www.jay.fm | - Kristoffer

Or use Set#divide:

require ‘set’

ss = Set[*DATA.readlines.map{ |l| l.scan( /\d+/).map{ |s|
s.to_i}.reverse}]

ss = ss.divide{ |i, j| i[0] == j[0] && (i[1] - j[1]).abs <= 1}

ss = ss.inject( {}){ |h, s| sa = s.to_a.sort; (h[sa.first[0]] ||= []) <<
(1
== sa.size ? sa.first[1] : sa.first[1]…sa.last[1]); h}

ss.keys.sort.each{ |k| puts “Server #{k} handles db #{ss[k].sort_by{ |d|
d.is_a?( Range) ? d.begin : d}.join( ', ')}”}

END
database 8 is on server 7
database 10 is on server 7
database 9 is on server 7
database 5 is on server 9
database 1 is on server 3
database 2 is on server 3
database 133 is on server 3
database 4 is on server 144

=>
Server 3 handles db 1…2, 133
Server 7 handles db 8…10
Server 9 handles db 5
Server 144 handles db 4

Regards,
R.

On 9/28/07, Jay L. [email protected] wrote:

#{info[j][0]} through #{info[j][-1]}\n"}

Server 2 handles database 18 through 18
Server 3 handles database 11 through 13
Server 7 handles database 8 through 9

Hmm, that fails if I add “db 20 is on sv 2”, which should retain “server 2
handles db 18 through 18” but also add “srver 2 handles db 20 through 20”.
I can’t quite follow what it’s doing but I’m going to have to study that
more; map is a powerful tool in general.

In that case, it outputs “server 2 handles database 18 to 20”.
I also thought that is what you wanted. But after seeing your further
explanation, I see that it is not.
Here is the same code with comments I probably should have included
the first time.

fp_arr = [“db 11 is on sv 3”,“db 9 is on sv 7”,“db 13 is on sv 3”,“db
12 is on sv 3”,“db 8 is on sv 7”,“db 18 is on sv 2”,“db 20 is on sv
2”]

nums,info = [],{}
fp_arr.each {|x| nums << x.scan(/\d+/)}
#p nums #[[“11”, “3”], [“9”, “7”], [“13”, “3”], [“12”, “3”]…]

sv = nums.map{|x| x[1]}.uniq #[“3”, “7”, “2”] server list
sv.each {|t| info[t] = nums.select{|x| x[1] == t}.sort.map {|a| a[0]}}
#p info #{“7”=>[“8”, “9”], “2”=>[“18”, “20”], “3”=>[“11”, “12”, “13”]}

info.keys.sort.each do |j|
print “server #{j} handles database #{info[j][0]} to #{info[j][-1]}\n”
end

Server 2 handles database 18 to 20
Server 3 handles database 11 to 13
Server 7 handles database 8 to 9

Harry

From: Jay L. [mailto:[email protected]]

On Wed, 26 Sep 2007 14:57:04 -0700, William J. wrote:

> ary = DATA.readlines

> ary.map{|s| s[/\d+$/] }.uniq.each{|server|

> db = ary.grep(/ #{server}$/).map{|s|

> s[/\d+/].to_i}.sort

> puts “Server #{server} databases #{db[0]} to #{db[-1]}”

> }

Definitely wins the “shortest” prize!

i was impressed by James and Raf’s solutions, so i combined the two,
together using 1.9’s group_by, so

C:\ruby1.9\bin>ruby test.rb
Server Databases
3 1 to 2, 133
7 8 to 10, 20, 30 to 33
9 5
144 3 to 4

C:\ruby1.9\bin>cat test.rb
require ‘set’

module Enumerable
def ranger
to_set.divide {|i,j| (i - j).abs == 1}.
map{|s| s.to_a.sort}.sort_by{|s| s.first}
end
end

ary = DATA.readlines.map{|s| s.scan(/\d+/).map{|s| s.to_i}}

puts “Server\tDatabases”
ary.group_by{|a| a.last}.sort.each do |server,pairs|
db=pairs.map{|a| a.first}.ranger.
map{|a| “#{a.first}#{” to #{a.last}" if a.size>1}“}.join(”, ")
puts “#{server}\t#{db}”
end

END
database 8 is on server 7
database 10 is on server 7
database 9 is on server 7
database 5 is on server 9
database 1 is on server 3
database 2 is on server 3
database 133 is on server 3
database 4 is on server 144
database 3 is on server 144
database 20 is on server 7
database 30 is on server 7
database 31 is on server 7
database 32 is on server 7
database 33 is on server 7
C:\ruby1.9\bin>

kind regards -botp

On Fri, 28 Sep 2007 07:13:39 +0900, Harry K. wrote:

Here is the same code with comments I probably should have included
the first time.

Thanks! For my own educational purposes (and any future newbies reading
this), I renamed a bunch of variables, extracted some temp variables and
came up with the following. It still has one bug (doesn’t handle
non-contiguous ranges like [1, 2, 133]), but I think I could fix that on
my
own…

db_server_pairs = []
server_dbs = {}

.scan will return an array of matches

DATA.readlines.each do |line|
db_server_pairs << line.scan(/\d+/).map{|num| num.to_i}
end

servers = db_server_pairs.map{|pair| pair[1]}.uniq
#=> […, 7, …]

servers.each do |server|
my_pairs = db_server_pairs.select{|pair| pair[1] == server}
#=> [[8, 7], [10, 7], …]

server_dbs[server] = my_pairs.sort.map {|pair| pair[0]}
#=> [8, 9, 10]
end

server_dbs.keys.sort.each do |server|
first = server_dbs[server][0]
last = server_dbs[server][-1]
print “server #{server} handles database #{first} to #{last}\n”
end

END
database 8 is on server 7
database 10 is on server 7
database 9 is on server 7
database 5 is on server 9
database 1 is on server 3
database 2 is on server 3
database 133 is on server 3
database 4 is on server 144

On Thu, 27 Sep 2007 10:27:28 -0700, William J. wrote:

Change
ary = DATA.readlines
to
ary = DATA.readlines ; p ary
and see what you get.
Your lack of output could be caused by an added space at the end
of each data line.

Ah, close - it’s a Windows carriage return which Cygwin doesn’t like.
That’s what happens when I use some Cygwin and some Windows apps
simultaneously. I was getting

[“database 8 is on server 7\r\n”,…\

And while “7\n”[/\d+$/] == “7”, “7\r\n”[/\d+$/] == nil, at least on
Cygwin.

Thanks.

On 30-sep-2007, at 23:40, Jay L. wrote:

Thanks! For my own educational purposes (and any future newbies
reading
this), I renamed a bunch of variables, extracted some temp
variables and
came up with the following. It still has one bug (doesn’t handle
non-contiguous ranges like [1, 2, 133]), but I think I could fix
that on my
own…

Well I used the following snippet once

module Enumerable
def to_ranges
compact.sort.uniq.inject([]) do | result, elem |
result = [elem…elem] if result.length.zero?
if [result[-1].end, result[-1].end.succ].include?(elem)
result[-1] = result[-1].begin…elem
else
result.push elem…elem
end
result
end
end

def member_includes?(item)
  find{|r| r.include?(item)} ? true : false
end

def implant(item)
 (map{|e| e.to_a }.flatten + Array(item)).to_ranges
end

def implant!(item)
  replace(implant(item))
end

end

You can basically make an array and the “collapse” it

On Fri, 28 Sep 2007 17:30:46 +0900, Peña, Botp wrote:

i was impressed by James and Raf’s solutions, so i combined the two, together using 1.9’s group_by, so

That’s beautiful! I love ranger. And there are a lot of other
techiques
I learned from here. Again, I’ve oversimplified it below for future
reference.

I’ve never seen “set.divide” before, and it’s the key to the whole
thing.
Question: Isn’t it possible to give set.divide a block that’ll give
inconsistent or crazy results? I can’t quite think of how, but it’s
assuming that the relationships are commutative…

It’s also really interesting that I was initially using the group_by
from
version 3726 of Rails:

def group_by
inject([]) do |groups, element|
value = yield(element)
if (last_group = groups.last) && last_group.first == value
last_group.last << element
else
groups << [value, [element]]
end
groups
end
end

which resulted in two separate “server 7” groups - yet something later
in
the chain healed that rift.

Finally, the function I broke out as “text_range” felt like an Escher
drawing at first…

def text_range(ary)
“#{ary.first}#{” to #{ary.last}" if ary.size>1}"
end

But I finally realized that the braces must take precedence over the
quotes, so the middle quotes are “inner” quotes, as if it read:

def text_range(ary)
s = “#{ary.first}”
s << " to #{ary.last}" if ary.size > 1
s
end

botp.rb

require ‘set’

Using server 7 as our example

module Enumerable

def ranger
s = to_set
#=> #<Set: {33, 30, 8, 31, 20, 9, 32, 10}>

consecutives = s.divide {|i,j| (i - j).abs == 1}
#=> #<Set: {#<Set: {20}>, #<Set: {8, 9, 10}>,
            #<Set: {33, 30, 31, 32}>}>

inner_sorted = consecutives.map{|group| group.to_a.sort}
# each group is sorted internally now
#=> [[30, 31, 32, 33], [20], [8, 9, 10]]

inner_sorted.sort_by{|s| s.first}
# now groups are in sorted order as well
#=> [[8, 9, 10], [20], [30, 31, 32, 33]]

end

def group_by
inject({}) do |groups, element|
(groups[yield(element)] ||= []) << element
groups
end
end if RUBY_VERSION < ‘1.9’

end

def text_range(ary)

Note the quoted text - and another #{} - inside

the #{… if ary.size>1} construct.

Who knew you could do such a thing?

“#{ary.first}#{” to #{ary.last}" if ary.size>1}"
end

db_server_pairs = DATA.readlines.map{|s| s.scan(/\d+/).map{|s| s.to_i}}
#=> [[8, 7], [10, 7], [9, 7], … ]

pairs_by_server = db_server_pairs.group_by{|pair| pair.last}
#=> [[7, [[8, 7], [10, 7], [9, 7], … ]], 3, … ]

puts “Server\tDatabases”
pairs_by_server.sort.each do |server, pairs|

server => 7

pairs => [[8, 7], [10, 7], [9, 7], …]

my_dbs = pairs.map {|pair| pair.first}
#=> [8, 10, 9, … ]

my_ranges = my_dbs.ranger
#=> [[8, 9, 10], [20], [30, 31, 32, 33]]

range_text = my_ranges.map{|r| text_range®}.join(", ")
puts “#{server}\t#{range_text}”
end

END
database 8 is on server 7
database 10 is on server 7
database 9 is on server 7
database 5 is on server 9
database 1 is on server 3
database 2 is on server 3
database 133 is on server 3
database 4 is on server 144
database 3 is on server 144
database 20 is on server 7
database 30 is on server 7
database 31 is on server 7
database 32 is on server 7
database 33 is on server 7