Converting CSV

Hello all!
I’m trying to learn Ruby and I’m using it for different tasks, but I have no one to ask for help. So bear with me :)

I receive CSV files that are separated by semicolons and have no quotes.
Received format:

Heading1;Heading2;Heading3;
String1;String2;String3

I would like to write a script that reads all these files and converts
them to CSV files separated by commas, with all fields quoted. If
possible, it should also change the encoding to UTF-8 without BOM.

Desired format:

"Heading1","Heading2","Heading3"
"String1","String2","String3"

My first problem is how to parse and create a new csv.
I read that I could parse a whole csv to an array of arrays like this:

require 'fastercsv'
array_of_arrays = FasterCSV.read("myfile.csv")

But how should I use this array when creating a new file?
I would like to loop through the array.

Any suggestions?

Br
cristian

Hi,

Why do you even want to parse the CSV? I would simply replace the
semicolons with commas and quote the strings with a regex.

Ok? Sounds difficult. Could you give me an example?

Thanks.

Br
cristian

Well, parsing the file and processing its content is certainly more
difficult than doing search and replace.

File.open 'myfile.csv', 'r+' do |csv|
  new_csv = csv.read.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.rewind  # go back to the start so the file is overwritten, not appended to
  csv.print new_csv
end

This quotes everything between the semicolons and then replaces the
semicolons with commas.
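
To see what the two substitutions do, here is a minimal sketch that runs them on an in-memory string instead of a file. The sample data comes from the question; the variable names are made up for illustration.

```ruby
# Sample input in the received format (semicolon-separated, unquoted).
input = "Heading1;Heading2;Heading3\nString1;String2;String3\n"

# Step 1: wrap every run of characters that is neither a semicolon nor
# a line break in double quotes. In a single-quoted replacement string,
# \0 stands for the whole match.
quoted = input.gsub(/[^\n\r;]+/, '"\0"')

# Step 2: turn the separators into commas.
converted = quoted.gsub(';', ',')

puts converted
# "Heading1","Heading2","Heading3"
# "String1","String2","String3"
```

Two limits of this approach are worth knowing: an empty field matches nothing and therefore stays unquoted (`a;;b` becomes `"a",,"b"`), and a double quote inside a field is not escaped.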

Ok, thanks! Will this save the new file with the name “new_csv”?

Br
cristian

Eric C. wrote in post #1057247:

Hi Christian:

The way to loop through the array is with a Ruby iterator. The method
“each” will iterate through most collections:

require 'fastercsv'
# :col_sep tells the parser that the input is separated by semicolons
array_of_arrays = FasterCSV.read("myfile.csv", :col_sep => ";")

# :force_quotes puts every field of the output in double quotes
FasterCSV.open("saved_file.csv", "w", :force_quotes => true) do |new_file|
  array_of_arrays.each do |item|
    new_file << item
  end
end

It seems to me that there must be a better way to do what you’re trying
to accomplish. I don’t know why you’d want to mess around with CSV
files in the first place. I don’t understand why the comma-delimited
CSV file is better than the original. You could just use the original.

Also, if you’re just doing a simple conversion, I’d just use simple ruby
code instead of having to learn everything about the CSV stuff:

File.open("original_file.txt", "r") do |f|
  new_file = File.open("new_file.csv", "w")
  f.each_line do |line|
    fields = line.split(";")
    fields.each { |fd| fd = "\"#{fd}\""
    new_file.write(fields.join(",")
  end
  new_file.close
end

Perhaps if you explain what you’re trying to do with these files, I
could give you better advice. Ruby has many tools to save objects, like
YAML and JSON. Generally it’s better not to mess around with formatting
files yourself.

Hi!
The files are to be read by a system that cannot handle CSV files
separated by semicolons. The problem is that the fields might also
contain commas, so the file should be comma-separated with quoted fields.

I get these error messages with your last example:

CSVConverter.rb:7: syntax error, unexpected kEND, expecting ')'
CSVConverter.rb:9: syntax error, unexpected kEND, expecting '}'

Br
cristian
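
The two syntax errors reported above come from the `{` block on the `fields.each` line that is never closed and from the missing `)` on the `new_file.write` line. Below is a corrected sketch of the same split-and-join idea. The method name `convert_line_by_line` is made up here, and the sketch assumes that no field contains a double quote. It also uses `map` instead of `each`, because assigning to the block variable inside `each` does not change the array, and it removes the line break before splitting.

```ruby
# Convert a semicolon-separated file to a comma-separated file with
# every field quoted. Assumes fields contain no double quotes.
def convert_line_by_line(source_path, target_path)
  File.open(target_path, "w") do |new_file|
    File.foreach(source_path) do |line|
      # chomp removes the trailing line break so it does not end up
      # inside the last field.
      fields = line.chomp.split(";")
      # map returns a new array holding the quoted fields.
      quoted = fields.map { |fd| "\"#{fd}\"" }
      new_file.puts(quoted.join(","))
    end
  end
end
```

Calling `convert_line_by_line("original_file.txt", "new_file.csv")` would mirror the file names used in the post. Note that `split(";")` drops trailing empty fields, which happens to remove the extra semicolon at the end of the sample header line; `split(";", -1)` would keep them.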

cristian cristian wrote in post #1057250:

Ok, thanks! Will this save the new file with the name “new_csv”?

No, it will overwrite the original file. If you want to write the CSV to
a new file, change the code to

File.open 'new.csv', 'w' do |csv|
  new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.print new_csv
end

Robert K. wrote in post #1057261:

If the file is large this can easily break because you need to read
the whole thing into memory.

I don’t expect the CSVs to be that big. But sure, if we’re talking
about hundreds of millions of entries here, you’ll have to read the file
in small portions.
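
Because the pattern never matches across a line break, the same two substitutions can be applied one line at a time. A minimal sketch of that, with a made-up method name:

```ruby
# Apply the two substitutions line by line, so only a single line is
# held in memory at any time.
def convert_in_portions(source_path, target_path)
  File.open(target_path, "w") do |out|
    File.foreach(source_path) do |line|
      # The line break is excluded by the pattern and passes through
      # unchanged.
      out.print line.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    end
  end
end
```

For example, `convert_in_portions("old.csv", "new.csv")` should give the same result as the version that reads the whole file at once.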

I’d also rather use the proper tool for
the job instead of cooking something with regexp.

Well, that’s probably a question of personal preferences. I don’t think
it’s necessary to load a complete library for every tiny task that comes
around.

I mean: If I want to do some simple matrix calculations for example, I
don’t really need a full 100 MB algebra library.

Thanks! Works fine. I have to read a lot about regular expressions to
understand it.

I will play around a little now with an array of files in the directory.

Br
cristian

On Wed, Apr 18, 2012 at 10:30 PM, Jan E. wrote:

If the file is large this can easily break because you need to read
the whole thing into memory. I’d also rather use the proper tool for
the job instead of cooking something with regexp. In this case I’d do

require 'csv'

CSV.open("new.csv", "wb", col_sep: ",", force_quotes: true) do |csv_out|
  CSV.foreach("old.csv", col_sep: ";") do |rec|
    csv_out << rec
  end
end

Kind regards

robert
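
The original question also asked for UTF-8 output without a BOM, which nobody in the thread addresses. One possible extension of the `CSV` approach is sketched below. The encoding of the incoming files is never stated, so `ISO-8859-1` is purely an assumption, and the method name `convert_with_csv` is made up for illustration.

```ruby
require "csv"

# Read a semicolon-separated file in source_encoding and write it as a
# comma-separated UTF-8 file with every field quoted.
# ASSUMPTION: the default source encoding is a guess, adjust as needed.
def convert_with_csv(source_path, target_path, source_encoding = "ISO-8859-1")
  CSV.open(target_path, "w:UTF-8", col_sep: ",", force_quotes: true) do |csv_out|
    # "X:UTF-8" means: read the file as X and transcode it to UTF-8
    # before it is parsed.
    CSV.foreach(source_path, col_sep: ";", encoding: "#{source_encoding}:UTF-8") do |rec|
      csv_out << rec
    end
  end
end
```

Ruby does not write a BOM unless asked to, so the output should be plain UTF-8. If a field in the unquoted input contains a literal double quote, the parser may reject the line, so this is worth checking against real data.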

On Thu, Apr 19, 2012 at 12:19 AM, Jan E. wrote:

Robert K. wrote in post #1057261:

I’d also rather use the proper tool for
the job instead of cooking something with regexp.

Well, that’s probably a question of personal preferences. I don’t think
it’s necessary to load a complete library for every tiny task that comes
around.

I mean: If I want to do some simple matrix calculations for example, I
don’t really need a full 100 MB algebra library.

Of course you can write everything yourself. For anything other than
trivial applications it’s absurd, though. Plus, even for the small
ones, using a lib which exists vs. coding it yourself is often quicker.
As always, it’s a matter of tradeoffs.

Btw, your code creates two copies of the input. You could reduce
memory requirements by using String#gsub! instead of String#gsub.
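
A small sketch of what the in-place variant might look like, using a sample string instead of file contents:

```ruby
# dup gives a mutable copy in case string literals are frozen.
text = "String1;String2;String3\n".dup

# gsub! changes the receiver itself instead of returning a modified copy.
text.gsub!(/[^\n\r;]+/, '"\0"')
text.gsub!(';', ',')

puts text
# "String1","String2","String3"
```

The two calls are written as separate statements on purpose: `gsub!` returns `nil` when nothing was replaced, so chaining a second call onto the first can fail on input without any match.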

Kind regards

robert

cristian cristian wrote in post #1057358:

How can I use an array of file names from the directory when creating
the new CSV files?

I was thinking something like this:

files = Dir.glob("*.csv")

files.each do |filename|

File.open(filename + "_converted.csv", 'w') do |csv|
[…]

You should strip the “.csv” extension from filename. Otherwise, you’ll
end up with names like “myfile.csv_converted.csv”.

For example:

"#{filename[0...-4]}_converted.csv"

Also, it doesn’t really make sense to save the result of Dir.glob in
files (unless you want to use it again).

Simply write it as one continuous expression:

Dir.glob("*.csv") do |file|

end
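
Putting these suggestions together, a batch conversion could look roughly like the sketch below. The method name `convert_all` is made up, and the check that skips files already ending in `_converted.csv` is an addition that is not part of the advice above. `File.basename` with a suffix argument is used here as an alternative to slicing off the last four characters.

```ruby
# Convert every *.csv file in a directory and write the result next to
# the original as <name>_converted.csv.
def convert_all(directory)
  Dir.glob(File.join(directory, "*.csv")) do |file|
    # Skip output files from an earlier run so they are not converted again.
    next if file.end_with?("_converted.csv")

    # File.basename with a suffix argument strips the ".csv" extension.
    base = File.basename(file, ".csv")
    target = File.join(File.dirname(file), "#{base}_converted.csv")

    converted = File.read(file).gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    File.open(target, "w") { |csv| csv.print converted }
  end
end
```

Calling `convert_all(".")` would process the current directory, turning `myfile.csv` into `myfile_converted.csv` and leaving the original untouched.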

Thank you!
Both examples work great!

How can I use an array of file names from the directory when creating
the new CSV files?

I was thinking something like this:

files = Dir.glob("*.csv")

files.each do |filename|
  File.open(filename + "_converted.csv", 'w') do |csv|
    new_csv = File.read(filename).gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    csv.print new_csv
  end
end

Br
cristian