How to improve this code?

Hi guys,
I am new to the Ruby world, coming from Java, and I would like to
"think" in Ruby instead of Java.

I wrote some code to read a CSV file (comma-separated), organize the
values and print the output.

Basically the CSV looks like this:
[email protected],value1
[email protected],value2
[email protected],value3
[email protected],value4
[email protected],value1
[email protected],value2

The output should be two lines:
[email protected],value1,value2,value3,value4
[email protected],value1,value2

My initial thought was to store the values in a Hash, where the
key is the email (column A) and the value is an Array containing the
values (column B).
Going through all the lines, I test whether the email address already
exists in the Hash; if so I update the Array, otherwise I create a new
entry in the Hash.

The code follows:

h = Hash.new
File.open("Sector_brand.csv").each_line do |lines|
  values = lines.split(",")
  email = values[0]
  content = values[1]
  if h.key?(email)
    l = h[email]
    l.push content
    h[email] = l
  else
    l = [content]
    h[email] = l
  end
end

I didn’t put the code to print the Hash. Also I didn’t create the code
above in a class because it is just a test.

Well guys, could anyone look at my code and comment on it? How could the
code above be improved?

Thanks in advance.

Junior
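
For completeness, here is a minimal sketch of the printing step the post leaves out, assuming the Hash ends up as email => Array of values. The sample addresses and the helper name format_rows are made up for illustration; they are not from the original code.

```ruby
# Hypothetical sample of the Hash the post describes: email => Array of values.
# The addresses are made up for illustration.
h = {
  "a@example.com" => ["value1", "value2", "value3", "value4"],
  "b@example.com" => ["value1", "value2"]
}

# Build one comma-separated output line per email, with the key first.
def format_rows(hash)
  hash.map { |email, values| [email, *values].join(",") }
end

puts format_rows(h)
```

Hashes keep insertion order in Ruby 1.9 and later, so the lines come out in the order the emails were first seen.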

Try this:

h = Hash.new do |hash, key|
  hash[key] = []
end

IO.foreach('data.txt') do |line|
  data = line.chomp.split(',')
  h[data[0]] << data[1]
end

p h

--output:--
{"[email protected]"=>["value1", "value2"], "[email protected]"=>["value1",
"value2", "value3", "value4"]}
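
A side note on why the block form is used here rather than Hash.new([]): the sketch below, with made-up keys, shows that a default object passed as an argument is shared by every missing key and is never stored in the hash, while the block runs once per missing key and stores a fresh Array.

```ruby
# Pitfall: the single Array passed to Hash.new is shared by every missing key,
# and << mutates it without ever storing a key in the hash.
shared = Hash.new([])
shared["a"] << 1
shared["b"] << 2
p shared        # no keys were stored
p shared["c"]   # the shared default now holds both values

# Block form: runs once per missing key and stores a fresh Array under it.
fresh = Hash.new { |hash, key| hash[key] = [] }
fresh["a"] << 1
fresh["b"] << 2
p fresh
```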

On Wed, 23 Jan 2008, Jair Rillo J. wrote:

Hi guys,
I am new to the Ruby world, coming from Java, and I would like to
"think" in Ruby instead of Java.

I wrote some code to read a CSV file (comma-separated), organize the
values and print the output.

Probably the most useful first thought in Ruby is… “Nah, I bet
it’s in the standard library somewhere, better check ruby-doc.org”

From standard library module ‘csv’

Open a CSV formatted file for reading or writing.

For reading.

EXAMPLE 1
  CSV.open('csvfile.csv', 'r') do |row|
    p row
  end

EXAMPLE 2
  reader = CSV.open('csvfile.csv', 'r')
  row1 = reader.shift
  row2 = reader.shift
  if row2.empty?
    p 'row2 not find.'
  end
  reader.close

ARGS
  filename: filename to parse.
  col_sep: Column separator. ?, by default. If you want to separate
    fields with semicolon, give ?; here.
  row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
    want to separate records with \r, give ?\r here.

RETURNS
  reader instance. To get parse result, see CSV::Reader#each.

For writing.

EXAMPLE 1
  CSV.open('csvfile.csv', 'w') do |writer|
    writer << ['r1c1', 'r1c2']
    writer << ['r2c1', 'r2c2']
    writer << [nil, nil]
  end

EXAMPLE 2
  writer = CSV.open('csvfile.csv', 'w')
  writer << ['r1c1', 'r1c2'] << ['r2c1', 'r2c2'] << [nil, nil]
  writer.close

ARGS
  filename: filename to generate.
  col_sep: Column separator. ?, by default. If you want to separate
    fields with semicolon, give ?; here.
  row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
    want to separate records with \r, give ?\r here.

RETURNS
  writer instance. See CSV::Writer#<< and CSV::Writer#add_row to know how
  to generate CSV string.

My initial thought was to store the values in a Hash, where the
key is the email (column A) and the value is an Array containing the
values (column B).
Going through all the lines, I test whether the email address already
exists in the Hash; if so I update the Array, otherwise I create a new
entry in the Hash.

My favourite idiom is…
require 'set'
h = Hash.new { |hash, key| hash[key] = Set.new }

then in the loop…
values = lines.chomp.split(",")
email = values.shift
h[email].merge(values)

Ooh… That’s just sooo pretty!

John C.
Tait Electronics
PO Box 1645 Christchurch
New Zealand
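
One consequence of the Set idiom worth seeing in action: repeated values for the same email are collapsed, which an Array would keep. This is a minimal sketch with made-up rows (the addresses and the rows array are assumptions, not data from the thread).

```ruby
require "set"

h = Hash.new { |hash, key| hash[key] = Set.new }

# Made-up rows for illustration; note the repeated a@example.com,value1 pair.
rows = [
  "a@example.com,value1",
  "a@example.com,value2",
  "a@example.com,value1",
  "b@example.com,value1"
]

rows.each do |line|
  values = line.chomp.split(",")
  email = values.shift
  h[email].merge(values)   # Set#merge adds each element, skipping duplicates
end

h.each { |email, set| puts [email, *set].join(",") }
```

Whether dropping duplicates is wanted depends on the data; if every occurrence must be kept, the Array version is the one to use.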

The others have shown you how to create a Hash with a block to provide
a default value. Another way to program this is to say:

h = {}
File.open("Sector_brand.csv").each_line do |lines|
  values = lines.chomp.split(",")
  (h[values[0]] ||= []) << values[1]
end

…or the equivalent using one of the CSV libraries.

Clifford H…
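
To spell out what the ||= idiom does, here is a small sketch that writes the long form next to the one-liner. The pairs array and its addresses are made up for illustration.

```ruby
h = {}

# Hypothetical rows, already split into [email, value] pairs.
pairs = [
  ["a@example.com", "value1"],
  ["a@example.com", "value2"],
  ["b@example.com", "value1"]
]

pairs.each do |email, value|
  # Long form of (h[email] ||= []) << value:
  h[email] = [] unless h[email]   # assign only when the slot is nil or false
  h[email] << value
end

# Same result with the one-line idiom.
g = {}
pairs.each { |email, value| (g[email] ||= []) << value }

p h == g
```

Unlike the Hash.new block, this keeps the hash itself plain: looking up a missing key elsewhere still returns nil and does not create an entry.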

On Jan 23, 2008 9:05 AM, Jair Rillo J. wrote:

My initial thought was to store the values in a Hash, where the

my initial thought was just to output them plainly :)
my stupid example follows,

botp@pc4all:~$ cat test.rb
v0 = nil
File.open("test.txt").each_line do |lines|
  values = lines.chomp.split(",")
  if v0 != values[0]
    puts unless v0.nil?
    v0 = values[0]
    print v0
  end
  print ",", values[1]
end

botp@pc4all:~$ ruby test.rb
[email protected],value1,value2,value3,value4
[email protected],value1,value2
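
The streaming approach above relies on all rows for one email being adjacent in the file. A sketch of the same idea using Enumerable#chunk, which groups consecutive elements sharing a key, might look like this; the sample lines and the helper name group_adjacent are assumptions for illustration.

```ruby
# Made-up input lines; rows for the same email must be adjacent for this to work.
lines = [
  "a@example.com,value1\n",
  "a@example.com,value2\n",
  "b@example.com,value1\n"
]

# Group consecutive rows by their first field and join each group into one line.
def group_adjacent(lines)
  lines
    .map { |line| line.chomp.split(",") }
    .chunk { |row| row.first }
    .map { |email, rows| [email, *rows.map(&:last)].join(",") }
end

puts group_adjacent(lines)
```

If the same email shows up again later in the file, it starts a new output line, so unsorted input needs the Hash approach or a sort first.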

On Jan 22, 7:05 pm, Jair Rillo J. wrote:

[email protected],value3
[email protected],value4
[email protected],value1
[email protected],value2

the output should be in two lines
[email protected],value1,value2,value3,value4
[email protected],value1,value2

awk -F, '{a[$1]=a[$1] FS $2} END{for(k in a)print k a[k]}' file

John C. wrote:

Probably the most useful first thought in Ruby is… “Nah, I bet
it’s in the standard library somewhere, better check ruby-doc.org”

I disagree with that. In my experience if you can avoid the Ruby
Standard library, you will save yourself a lot of headaches because it
is so poorly documented, and your code will probably run faster as well.

From standard library module ‘csv’

In particular, the csv module is so inefficient that James G. wrote his
own module and called it FasterCSV.

Hey guys, so many ways to do it!

I didn’t know about the csv library, nor did I know about the <<
operator.

Thank you very much, guys!

You can try this code

#!/usr/bin/env ruby

require "csv"

hash = Hash.new { |h, key| h[key] = [] }

# Ruby 1.8 csv API: the third argument is the column separator.
CSV.open("file.csv", "r", ",") do |row|
  hash[row[0]] << row[1]
end

Good luck

Stephane

Jair Rillo J. wrote:

Hey guys, so many ways to do it!

I didn’t know about the csv library, nor did I know about the <<
operator.

Yeah, it’s actually a method… Array#<<

It seems to be preferred over Array#push, although both return the
array itself, so you can chain a series of appends together…

foo = [1,2]
=> [1, 2]

foo << 3 << 4
=> [1, 2, 3, 4]

foo.push(5,6)
=> [1, 2, 3, 4, 5, 6]

foo.push(7).push(8)
=> [1, 2, 3, 4, 5, 6, 7, 8]

Regards,
Lee

On Jan 23, 2008, at 12:05 AM, 7stud -- wrote:

From standard library module ‘csv’

In particular, the csv module is so inefficient that James G. wrote his
own module and called it FasterCSV.

Most of the time, it’s better to look for standard libraries or at
least good third party libraries rather than re-inventing the wheel
though. Most of the time, when one re-invents the wheel, one gets it
wrong.

David M.
Maia Mailguard http://www.maiamailguard.com

Hm, I think that we Ruby programmers prefer “<<” to “push” (it’s more
expressive and less typing), so:

h[email] << content


Radosław Bułat

http://radarek.jogger.pl - mój blog

David M. wrote:

In particular, the csv module is so inefficient that James G. wrote his
own module and called it FasterCSV.

Most of the time, it’s better to look for standard libraries or at
least good third party libraries rather than re-inventing the wheel
though. Most of the time, when one re-invents the wheel, one gets it
wrong.

Looking for a standard library module so that you can split a string on
a comma is a ridiculous waste of time. At some point, you actually
have to be able to program something.

On Jan 23, 2008, at 1:21 PM, 7stud -- wrote:

wrong.

Looking for a standard library module so that you can split a string on
a comma is a ridiculous waste of time. At some point, you actually
have to be able to program something.

If you think parsing a CSV file is as simple as splitting on a comma,
you need to think again.

Look up RFC 4180. It’s not a hard format, but it is more than just
"foo,bar".split(',').

It’s enough code that I’d rather use an existing library than to waste
a ridiculous amount of time doing it (correctly) myself.

David M.
Maia Mailguard http://www.maiamailguard.com
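
To make the RFC 4180 point concrete, here is a minimal sketch comparing a plain split with the csv library on a record whose first field contains a quoted comma. The sample record is made up, and the sketch assumes the csv library is available to require.

```ruby
require "csv"

# A made-up record whose first field contains a comma inside quotes.
line = %("Doe, Jane",value1\n)

naive  = line.chomp.split(",")   # splits inside the quoted field
parsed = CSV.parse_line(line)    # honours the quoting rules

p naive
p parsed
```

The split version yields three pieces with stray quote characters, while the parser returns the two intended fields.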

I have used both the standard CSV class and James’ FasterCSV with Ruby
1.8.x; James’ solution is much faster. If I’m not mistaken, FasterCSV
replaced the 1.8 class in Ruby 1.9.
Just run gem install fastercsv, then google for it. You’ll find a lot of
good explanations, and the docs were mostly good enough for me.

On Jan 23, 2008, at 2:41 PM, Thomas W. wrote:

If I’m not mistaken, FasterCSV replaced the 1.8 class in
Ruby 1.9

Correct. In Ruby 1.9, when you require "csv" you are getting the
FasterCSV code under its new name.

James Edward G. II
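
With the csv library from Ruby 1.9 onward, the grouping task from the original post could be sketched roughly as follows. This assumes a two-column file with no header row; the helper name group_csv is made up for illustration.

```ruby
require "csv"

# Group the second column by the first, reading the file row by row.
def group_csv(path)
  groups = Hash.new { |hash, key| hash[key] = [] }
  CSV.foreach(path) do |row|
    email, value = row
    groups[email] << value
  end
  groups
end

# Usage, with the file name from the original post:
#   group_csv("Sector_brand.csv").each do |email, values|
#     puts [email, *values].join(",")
#   end
```

CSV.foreach yields each row as an Array of fields, so no manual chomp or split is needed.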