Forum: Ruby histogram of histograms

Charles L. Snyder (Guest)
on 2007-02-07 22:55
(Received via mailing list)
Hi

I have several text files that look like this:

Brazil,  10
Brazil,  13
Brazil,  9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25

I need to iterate through the above file(s) and get the data
summarized in the form:

Canada, 252
China, 91
Chile, 4
Brazil, 32
Bulgaria, 1

I know how to go from a single column list with multiple repeated
values to a 'histogram' type list, ie:

my_hash = countries.inject(Hash.new { 0 }) { |counts, key| counts[key] += 1; counts }
my_hash = my_hash.sort { |a,b| a[1] <=> b[1] }

but I'm unable to figure out how to get the 2-column csv values into a
total by country as shown above.
(I do have another file "countries.txt" which is a unique list of
countries.)

Thanks in advance!

CLS
Martin DeMello (Guest)
on 2007-02-07 23:13
(Received via mailing list)
On 2/8/07, Charles L. Snyder <removed_email_address@domain.invalid> wrote:
>
> I need to iterate through the above file(s) and get the data
> summarized in the form:
>
> Canada, 252
> China, 91
> Chile, 4
> Brazil, 32
> Bulgaria, 1

#------------------------------------------------------------------
countries = <<HERE
Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25
HERE

totals = Hash.new {|h, k| h[k] = 0}

countries.each_line {|line|
  country, n = line.split(/,\s*/)
  totals[country] += n.to_i
}

totals.keys.sort_by {|i| -totals[i]}.each {|c|
  puts "#{c}, #{totals[c]}"
}

#------------------------------------------------------------------

martin
Robert K. (Guest)
on 2007-02-07 23:20
(Received via mailing list)
On 07.02.2007 21:53, Charles L. Snyder wrote:
> Canada, 59
> summarized in the form:
>
> Canada, 252
> China, 91
> Chile, 4
> Brazil, 32
> Bulgaria, 1

I would do that in stream mode, i.e. summarize while reading instead of
reading everything first and summarizing afterwards (see attached).  This
is more efficient, especially since these files look like they could be
large.
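
The attachment is not shown here. As a minimal sketch of the stream-mode idea (not necessarily the attached code): read one line at a time and add to a running total, so the whole file never has to sit in memory. The method name `sum_by_country`, the file name `data.txt` in the comment, and the `StringIO` stand-in for a real file are assumptions made for illustration.

```ruby
require 'stringio'

# Sum the second column per country, reading one line at a time.
# `io` is anything that responds to each_line (File, StringIO, $stdin).
def sum_by_country(io)
  totals = Hash.new(0)
  io.each_line do |line|
    country, n = line.chomp.split(/,\s*/, 2)
    next if country.nil? || n.nil?   # skip blank or malformed lines
    totals[country] += n.to_i
  end
  totals
end

# StringIO stands in for a real file here; with a file on disk (the name
# "data.txt" is hypothetical) you could write
#   File.open("data.txt") { |f| sum_by_country(f) }
sample = StringIO.new("Brazil,  10\nBrazil,  13\nBrazil,  9\nBulgaria, 1\n")
totals = sum_by_country(sample)

# Largest total first, as in the requested output.
totals.sort_by { |_, v| -v }.each { |c, v| puts "#{c}, #{v}" }
```

Because the method only needs `each_line`, the same code works for several files in a row: call it once per file and merge the resulting hashes, or pass the same `totals` hash along if you adapt the signature.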

> I know how to go from a single column list with multiple repeated
> values to a 'histogram' type list, ie:
>
> my_hash = countries.inject(Hash.new { 0 }) { |counts, key| counts[key]
> += 1; counts}

I don't know why you do this.  Do you also need the number of
occurrences?

> my_hash = my_hash.sort { |a,b| a[1] <=> b[1] }
>
> but I'm unable to figure out how to get the 2-column csv values into a
> total by country as shown above.
> (I do have another file "countries.txt" which is a unique list of
> countries.)

You don't need the second file unless you want to report zero counts for
countries not present.
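
If zero counts are wanted, one possible sketch is to walk the list of known names and look each one up in the totals. The names below are hard-coded so the example runs on its own; in practice they would come from countries.txt, and "Denmark" is a made-up entry used only to show a country with no data.

```ruby
# Names that would normally be read from countries.txt (one per line).
# "Denmark" is invented here to demonstrate a zero count.
known = ["Brazil", "Bulgaria", "Denmark"]

# Totals as produced by any of the summing approaches in this thread.
totals = Hash.new(0)
totals["Brazil"] += 32
totals["Bulgaria"] += 1

# Hash.new(0) returns 0 for a missing key, so absent countries report zero.
report = known.map { |c| [c, totals[c]] }
report.each { |c, v| puts "#{c}, #{v}" }
```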

Kind regards

  robert
unknown (Guest)
on 2007-02-07 23:30
(Received via mailing list)
Hi --

On Thu, 8 Feb 2007, Charles L. Snyder wrote:

> Canada, 38
>
> values to a 'histogram' type list, ie:
>
> my_hash = countries.inject(Hash.new { 0 }) { |counts, key| counts[key]
> += 1; counts}
> my_hash = my_hash.sort { |a,b| a[1] <=> b[1] }

my_hash will actually become an array at that point :-)
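
A small sketch of that point, with made-up counts: Hash#sort returns an array of [key, value] pairs, not a hash. The Array#to_h call at the end is only available on Ruby 2.1 and later, so treat it as an optional extra.

```ruby
counts = { "Chile" => 3, "Brazil" => 3, "Bulgaria" => 1 }

# Sorting a Hash yields an Array of two-element arrays.
sorted = counts.sort { |a, b| a[1] <=> b[1] }
puts sorted.class
p sorted.first

# Array#to_h (Ruby 2.1 and later) turns the pairs back into a Hash,
# which keeps the sorted insertion order.
back = sorted.to_h
puts back.class
```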

> but I'm unable to figure out how to get the 2-column csv values into a
> total by country as shown above.
> (I do have another file "countries.txt" which is a unique list of
> countries.)

Here's one way:

require 'scanf'

hash = Hash.new {0}
DATA.scanf("%s%d") {|key,count| hash[key] += count }

hash.sort.each {|k,v| puts "#{k} #{v}" }

__END__
Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
etc.

That has the slight ugliness of including the comma in the key.  You
could do:

   hash[key.chomp(",")] += count

to avoid that, and then add the comma to the printout if you want it
back.
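
On recent Ruby versions scanf is no longer part of the standard library and may need to be installed as a gem. As a hedged alternative, here is a sketch of the same idea using String#scan with a regular expression, which also leaves the comma out of the key. The inline `data` string stands in for the DATA section above.

```ruby
data = "Brazil, 10\nBrazil, 13\nBrazil, 9\nBulgaria, 1\n"

hash = Hash.new(0)
# Each match yields the text before the comma and the digits after it.
data.scan(/^([^,\n]+),\s*(\d+)/) { |key, count| hash[key] += count.to_i }

# Alphabetical by country, with the comma added back in the printout.
hash.sort.each { |k, v| puts "#{k}, #{v}" }
```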


David
William J. (Guest)
on 2007-02-08 00:10
(Received via mailing list)
On Feb 7, 2:53 pm, "Charles L. Snyder" <removed_email_address@domain.invalid> 
wrote:
> Canada, 38
>
> values to a 'histogram' type list, ie:
> Thanks in advance!
>
> CLS


hash = Hash.new(0)
"\
Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25".each_line{|s| s.split(',').inject{|k,v| hash[k] += v.to_i }}  # String#each is gone in Ruby 1.9+; use each_line
p hash