Forum: Ruby Converting CSV

cristian cristian (currambero)
on 2012-04-18 21:15
Hello all!
I'm trying to learn Ruby and I'm using it for different tasks, and I have
no one to ask for help. So bear with me :)

I receive CSV files that are separated by semicolon and no quotes.
Received format:

Heading1;Heading2;Heading3;
String1;String2;String3

I would like to make a script for reading all these files and converting
them to CSV files separated by commas and I would like all fields to be
quoted. If possible change encoding to UTF8 without BOM.

Desired format:

"Heading1","Heading2","Heading3"
"String1","String2","String3"

My first problem is how to parse and create a new csv.
I read that I could parse a whole csv to an array of arrays like this:

require 'fastercsv'
array_of_arrays = FasterCSV.read("myfile.csv")

But how should I use this array when creating a new file?
I would like to loop through the array.

Any suggestions?

Br
cristian
Jan E. (jacques1)
on 2012-04-18 21:29
Hi,

Why do you even want to parse the CSV? I would simply replace the
semicolons with commas and quote the strings with a regex.
cristian cristian (currambero)
on 2012-04-18 21:43
Jan E. wrote in post #1057239:
> Hi,
>
> Why do you even want to parse the CSV? I would simply replace the
> semicolons with commas and quote the strings with a regex.

Ok? Sounds difficult. Could you give me an example?

Thanks.

Br
cristian
Jan E. (jacques1)
on 2012-04-18 22:02
Well, parsing the file and processing its content is certainly more
difficult than doing search and replace.

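File.open 'myfile.csv', 'r+' do |csv|
  new_csv = csv.read.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  # After read the position is at the end of the file; rewind so the
  # new content replaces the old content instead of being appended.
  csv.rewind
  csv.print new_csv
end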

This quotes everything between the semicolons and then replaces them
with commas.
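
To see what the two substitutions do, here is a small sketch that runs them on the sample data from the first post. The helper name quote_and_comma is made up for this illustration. Note the limits of the approach: the trailing semicolon in the heading row becomes a trailing comma, empty fields stay unquoted, and a quote character that is already inside a field is not escaped.

```ruby
# Hypothetical helper wrapping the two gsub calls from the post above.
def quote_and_comma(text)
  # \0 in the replacement stands for the whole match, i.e. one field.
  text.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
end

sample = "Heading1;Heading2;Heading3;\nString1;String2;String3\n"
puts quote_and_comma(sample)
# "Heading1","Heading2","Heading3",
# "String1","String2","String3"

# Empty fields match nothing, so they are left unquoted:
puts quote_and_comma("a;;b")
# "a",,"b"
```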
Eric C. (eric_c)
on 2012-04-18 22:10
Hi Cristian:

The way to loop through the array is with a Ruby iterator.  The "each"
method iterates over pretty much any collection:

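require 'fastercsv'
# The input is semicolon-separated, so tell the parser about it.
array_of_arrays = FasterCSV.read("myfile.csv", :col_sep => ";")

# Write each row to a new comma-separated file with all fields quoted.
FasterCSV.open("saved_file.csv", "w", :force_quotes => true) do |new_file|
  array_of_arrays.each do |item|
    new_file << item
  end
end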

It seems to me that there must be a better way to do what you're trying
to accomplish.  I don't know why you'd want to mess-around with csv
files in the first place.  I don't understand why the comma delimited
csv file is better than the original.  You could just use the original.

Also, if you're just doing a simple conversion, I'd just use simple ruby
code instead of having to learn everything about the CSV stuff:

File.open("original_file.txt", "r") do |f|
  new_file = File.open("new_file.csv", "w")
  f.each_line do |line|
    fields = line.split(";")
    fields.each { |fd| fd = "\"#{fd}\""
    new_file.write(fields.join(",")
  end
  new_file.close
end

Perhaps if you explain what you're trying to do with these files, I
could give you better advice.  Ruby has many tools to save objects, like
YAML and JSON.  Generally it's better not to mess around with formatting
files yourself.
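
A note on the plain-Ruby snippet above: as posted it does not parse, because the block on the fields.each line is never closed and the write call is missing its closing parenthesis. Two more things would bite after that: assigning to the block variable inside each does not change the array, and every line still carries its newline. Below is a corrected sketch. The helper names quote_line and convert_simple are made up for this example, and it assumes fields never contain a semicolon or a double quote. Also note that split drops trailing empty fields, which conveniently removes the trailing semicolon of the heading row but would also lose genuinely empty last columns.

```ruby
# Hypothetical helper: quote every field of one semicolon-separated line.
def quote_line(line)
  fields = line.chomp.split(";")             # strip the line ending first
  fields.map { |fd| "\"#{fd}\"" }.join(",")  # map returns the quoted copies
end

# Hypothetical helper: convert one file line by line.
# The block form of File.open closes the output file automatically.
def convert_simple(src, dest)
  File.open(dest, "w") do |out|
    File.foreach(src) { |line| out.puts quote_line(line) }
  end
end

puts quote_line("String1;String2;String3\n")
# "String1","String2","String3"
```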
cristian cristian (currambero)
on 2012-04-18 22:22
Jan E. wrote in post #1057244:
> Well, parsing the file and processing its content is certainly more
> difficult than doing search and replace.
>
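> File.open 'myfile.csv', 'r+' do |csv|
>   new_csv = csv.read.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
>   # After read the position is at the end of the file; rewind so the
>   # new content replaces the old content instead of being appended.
>   csv.rewind
>   csv.print new_csv
> end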
>
> This quotes everything between the semicolons and then replaces them
> with commas.


Ok, thanks! Will this save the new file with the name "new_csv"?

Br
cristian
cristian cristian (currambero)
on 2012-04-18 22:26
Eric C. wrote in post #1057247:
> Hi Cristian:
>
> The way to loop through the array is with a Ruby iterator.  The "each"
> method iterates over pretty much any collection:
>
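> require 'fastercsv'
> # The input is semicolon-separated, so tell the parser about it.
> array_of_arrays = FasterCSV.read("myfile.csv", :col_sep => ";")
>
> # Write each row to a new comma-separated file with all fields quoted.
> FasterCSV.open("saved_file.csv", "w", :force_quotes => true) do |new_file|
>   array_of_arrays.each do |item|
>     new_file << item
>   end
> end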
>
> It seems to me that there must be a better way to do what you're trying
> to accomplish.  I don't know why you'd want to mess-around with csv
> files in the first place.  I don't understand why the comma delimited
> csv file is better than the original.  You could just use the original.
>
> Also, if you're just doing a simple conversion, I'd just use simple ruby
> code instead of having to learn everything about the CSV stuff:
>
> File.open("original_file.txt", "r") do |f|
>   new_file = File.open("new_file.csv", "w")
>   f.each_line do |line|
>     fields = line.split(";")
>     fields.each { |fd| fd = "\"#{fd}\""
>     new_file.write(fields.join(",")
>   end
>   new_file.close
> end
>
> Perhaps if you explain what you're trying to do with these files, I
> could give you better advice.  Ruby has many tools to save objects, like
> YAML and JSON.  Generally it's better not to mess around with formatting
> files yourself.

Hi!
The files are to be read by some system that cannot handle csv files
with semicolon. The problem is that the fields might contain commas
also. So the file should be comma separated with quoted fields.

I get these error messages with your last example:

CSVConverter.rb:7: syntax error, unexpected kEND, expecting ')'
CSVConverter.rb:9: syntax error, unexpected kEND, expecting '}'


Br
cristian
Jan E. (jacques1)
on 2012-04-18 22:30
cristian cristian wrote in post #1057250:
> Ok, thanks! Will this save the new file with the name "new_csv"?

No, it will overwrite the original file. If you want to write the CSV to
a new file, change the code to

File.open 'new.csv', 'w' do |csv|
  new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.print new_csv
end
cristian cristian (currambero)
on 2012-04-18 23:15
Jan E. wrote in post #1057256:
> cristian cristian wrote in post #1057250:
>> Ok, thanks! Will this save the new file with the name "new_csv"?
>
> No, it will overwrite the original file. If you want to write the CSV to
> a new file, change the code to
>
> File.open 'new.csv', 'w' do |csv|
>   new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
>   csv.print new_csv
> end

Thanks! Works fine. I have to read a lot about regular expressions to
understand it.

I will play around a little now with an array of files in the directory.

Br
cristian
Robert Klemme (robert_k78)
on 2012-04-18 23:25
(Received via mailing list)
On Wed, Apr 18, 2012 at 10:30 PM, Jan E. <lists@ruby-forum.com> wrote:
> end
If the file is large this can easily break because you need to read
the whole thing into memory.  I'd also rather use the proper tool for
the job instead of cooking something with regexp.  In this case I'd do

require 'csv'

CSV.open("new.csv", "wb", col_sep: ",", force_quotes: true) do |csv_out|
  CSV.foreach("old.csv", col_sep: ";") do |rec|
    csv_out << rec
  end
end
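
To illustrate why the library route is safer for fields that contain commas or quotes, here is a minimal sketch on single lines instead of files. The helper name requote is made up for this example. It assumes, as stated in the first post, that the incoming data has no quote characters; the parser would reject a stray quote inside an unquoted field.

```ruby
require 'csv'

# Hypothetical helper: parse one semicolon-separated line and write it
# back comma-separated with every field quoted.
def requote(line)
  fields = CSV.parse_line(line, col_sep: ";")
  CSV.generate_line(fields, col_sep: ",", force_quotes: true)
end

puts requote("Smith, John;Main St. 5;42\n")
# "Smith, John","Main St. 5","42"

# On output, quote characters inside a field are doubled automatically:
puts CSV.generate_line(['say "hi"', 'a,b'], force_quotes: true)
# "say ""hi""","a,b"
```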

Kind regards

robert
Jan E. (jacques1)
on 2012-04-19 00:19
Robert Klemme wrote in post #1057261:
> If the file is large this can easily break because you need to read
> the whole thing into memory.

I don't expect the CSVs to be *that* big. But sure, if we're talking
about hundreds of millions of entries here, you'll have to read the file
in small portions.
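
Reading in small portions is straightforward here, because the pattern never matches across a line break. A sketch of the same substitution applied one line at a time, so only a single line is held in memory (the helper name convert_streaming is made up for this example):

```ruby
# Hypothetical helper: apply the regex conversion line by line.
def convert_streaming(src, dest)
  File.open(dest, "w") do |out|
    File.foreach(src) do |line|
      # Each line keeps its own line ending, so print (not puts) is enough.
      out.print line.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    end
  end
end
```

Usage would be something like convert_streaming('old.csv', 'new.csv'). The source and destination must be different files.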



> I'd also rather use the proper tool for
> the job instead of cooking something with regexp.

Well, that's probably a question of personal preferences. I don't think
it's necessary to load a complete library for every tiny task that comes
around.

I mean: If I want to do some simple matrix calculations for example, I
don't really need a full 100 MB algebra library.
Robert Klemme (robert_k78)
on 2012-04-19 10:49
(Received via mailing list)
On Thu, Apr 19, 2012 at 12:19 AM, Jan E. <lists@ruby-forum.com> wrote:
> Robert Klemme wrote in post #1057261:
>> I'd also rather use the proper tool for
>> the job instead of cooking something with regexp.
>
> Well, that's probably a question of personal preferences. I don't think
> it's necessary to load a complete library for every tiny task that comes
> around.
>
> I mean: If I want to do some simple matrix calculations for example, I
> don't really need a full 100 MB algebra library.

Of course you can write everything yourself.  For anything other than
trivial applications that is absurd, though.  Plus, even for small
ones, using an existing lib is often quicker than coding it yourself.
As always, it's a matter of tradeoffs.

Btw, your code creates two copies of the input.  You could reduce
memory requirements by using String#gsub! instead of String#gsub.
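
A small sketch of the in-place variant (the helper name convert_in_place! is made up for this example). One thing to watch: gsub! returns nil when nothing was replaced, so the two calls should not be chained the way the gsub calls are. How much memory this really saves depends on the Ruby implementation.

```ruby
# Hypothetical helper: run both substitutions on the string itself.
def convert_in_place!(text)
  text.gsub!(/[^\n\r;]+/, '"\0"')  # modifies text, returns nil if no match
  text.gsub!(';', ',')             # separate statement, so a nil above is harmless
  text
end

data = "String1;String2;String3".dup
convert_in_place!(data)
puts data
# "String1","String2","String3"
```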

Kind regards

robert
cristian cristian (currambero)
on 2012-04-19 14:33
Thank you!
Both examples work great!

How can I use an array of file names from the directory to use it when
creating the new csv's?

I was thinking something like this:

files = Dir.glob("*.csv")

files.each do |filename|
  File.open(filename + "_converted.csv", 'w') do |csv|
    new_csv = File.read(filename).gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    csv.print new_csv
  end
end

Br
cristian
Jan E. (jacques1)
on 2012-04-19 15:04
cristian cristian wrote in post #1057358:
> How can I use an array of file names from the directory to use it when
> creating the new csv's?
>
> I was thinking something like this:
>
> files = Dir.glob("*.csv")
>
> files.each do |filename|
>   File.open(filename + "_converted.csv", 'w') do |csv|
> [...]

You should strip the ".csv" extension from filename. Otherwise, you'll
end up with names like "myfile.csv_converted.csv".

For example:

"#{filename[0...-4]}_converted.csv"

Also, it doesn't really make sense to save the array returned by Dir.glob
in files (unless you want to use it again).

Simply write it as one continuous expression:

Dir.glob("*.csv") do |file|
  ...
end
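
Putting the pieces from this thread together, one possible sketch of the whole loop looks like this. It uses the csv library approach from earlier in the thread instead of the regex. The helper name convert_all is made up, File.basename with a suffix argument is just an alternative to slicing off the last four characters, and skipping files that already end in "_converted.csv" is an added assumption so that a second run does not convert its own output.

```ruby
require 'csv'

# Hypothetical helper: convert every semicolon-separated *.csv in dir.
def convert_all(dir)
  Dir.glob(File.join(dir, "*.csv")) do |path|
    next if path.end_with?("_converted.csv")  # skip output of earlier runs

    # "myfile.csv" becomes "myfile_converted.csv"
    target = File.join(dir, "#{File.basename(path, '.csv')}_converted.csv")

    CSV.open(target, "wb", col_sep: ",", force_quotes: true) do |csv_out|
      CSV.foreach(path, col_sep: ";") { |rec| csv_out << rec }
    end
  end
end
```

Calling convert_all(".") would process the current directory.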