Partitioning with Set.divide

splattael · April 29, 2007, 2:07pm

Hello all,

I have been playing with partitioning a set recently and I am stuck with
an issue. The whole story is here:

http://www.rubyrailways.com/partitioning-sets-in-ruby/

A quick version for those who would not like to read the article:

Consider this input:

a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34

the goal is to create a partition based on the character in the first
column, i.e.:

<Set: <Set: {“a 9 0 31”, “a 7 83 1 23 3”, “a 53 2 3”, "a 1 3 22 "}>,
<Set: {"b 1 23 ", “b 1 14 4 15 16 2”, “b 8 62 1 23”, “b 4 45 4 16 7”}>,
<Set: {“c 5 16 2 34”, “c 3 42 2 31 4 6”}>}>

Which is exactly what Set.divide does. However, there is one problem: I
would like to know if there are duplicate lines. I.e. divide returns the
same result, no matter that the input is this:

c 5 16 2 34
c 5 16 2 34
c 5 16 2 34

or this:

c 5 16 2 34

What I would need is a modified divide which returns also the count of
the elements in the input set (at least for those elements which are
more than once in the set). Is this doable or do I have to roll some
code to do this for me additionally?

Cheers,
Peter

__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby

splattael · April 29, 2007, 2:57pm

Well, you could first count how many repetitions there are of each
line and then partition the set of pairs [line, count].

e.g.

h = Hash.new {0} # is this how I set a default value?

STDIN.each_line {|l| h[l] += 1}

partitioning = Set.new(h.to_a).divide{|a| a[0][0]}

Aur

splattael · April 29, 2007, 3:02pm

On 4/29/07, SonOfLilit [email protected] wrote:

A quick version for those who would not like to read the article:
a 7 83 1 23 3
Which is exactly what Set.divide does. However, there is one problem: I

http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby

Even better:
h = Hash.new {0} # is this how I set a default value?
STDIN.each_line {|l| h[l] += 1}
partitioning = h.to_set.divide{|a| a[0][0]} # changed to .to_set,
which is great. might need to be .to_set{|k,v| k,v}

splattael · April 29, 2007, 3:42pm

Sorry for the triple post, but I’ve read the article and propose a
completely different approach to the problem you present (also in a
comment there with typos):

h = {}
open(‘input.txt’).each_line{|l| h[l[0…0]] += l[2…-1].split(’
').inject(0) {|c,x| c+=x.to_i; c}}}
p h.map{|k,v| {k => v}} # to turn {“a” => 80, “b” => 60} into [{“a” =>
80}, {b => 60}]

which is a pretty weird data structure IMHO, but I’ll play by your

rules
Let’s look at your code:
#############
input = Set.new open(‘input.txt’).readlines.map{|e| e.chomp}
groups = input.divide {|x,y| x.map[0][0] == y.map[0][0] } # what’s map
for?
#build the array of hashes
p groups.map.inject([]) {|a,g| # what’s map for?
#build the hashes for the number sequences with same letters
a << g.map.inject(Hash.new(0)) {|h,v|
#for every sequence, sum the numbers it contains
h[v[0…0]] += v[2…-1].split(’ ').inject(0) {|c,x|
c+=x.to_i; c}; h
}; a
}
#############
divide() seems redundant in your code. You both 1) divide(); and 2)
implement divide on your own; in the same code. You do four passes on
the lines (and maybe more in the map() calls, I can’t figure those
out), in one of them passing twice on each line’s content, forcing the
data to stay all in memory the whole time.

My code, which isn’t debugged but shows my idea, passes once on the
lines, and in that pass twice on each line’s contents. It doesn’t keep
the data in memory.

BTW I couldn’t understand the use of map() without a block. Mind
explaining?

Aur S.

splattael · April 29, 2007, 3:44pm

Ah, sorry. tryruby.hobix.com helped me find that hash.map() is
identical to hash.to_a

splattael · April 29, 2007, 3:55pm

SonOfLilit wrote:

Ah, sorry. tryruby.hobix.com helped me find that hash.map() is
identical to hash.to_a

Well the problem was that I just came across Set.divide and wanted to
demonstrate it on an example - so I dug out an older problem of mine
where I thought I can show it off - you have seen the result.

I think this was a typical manifestation of the ‘If you have a hammer,
everything looks like a nail’ problem.

I guess if my goal would have been solving the task at hand, rather than
putting ‘divide’ into action, I would come up with a solution similar to
yours…

Thanks for the suggestion!

Cheers,
Peter
__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby

splattael · April 30, 2007, 11:55am

On 29.04.2007 14:04, Peter S. wrote:

more than once in the set). Is this doable or do I have to roll some
code to do this for me additionally?

Basically you need bags. Since a quick check does not reveal any, you
can roll your own pretty easily with a Hash with default value 0. This
is what I’d do: (see script at end). Of course you could save another
line by inlining “key”.

Kind regards

robert

8<-----------------

#!/usr/bin/ruby

require ‘pp’

parts = Hash.new {|h,k| h[k] = Hash.new(0)}

DATA.each do |line|
line.chomp!
key = line[/^\w+/]
parts[key][line] += 1
end

pp parts

END
a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34
c 5 16 2 34