Ruby CSV unique row insert

addis_a · August 25, 2014, 2:42pm

Hi! I would like to ask if anybody knows how to check whether a row is
unique, i.e. not already present in a CSV file, before inserting it into
that file. I have this code, but it does not work.

file = CSV.new("result.csv", headers: headers, write_headers: true,
               return_headers: true)

CSV.open("new_result_url.csv", "wb", headers: headers, write_headers: true,
         return_headers: true) do |csv|
  CSV.foreach("result.csv", headers: true, return_headers: false) do |row|
    if file.include?(row) == false
      csv << row
    end
  end
end

Thank you very much for your help!

arnthur1981 · August 27, 2014, 8:06am

file = CSV.new("result.csv")

First of all, the line above won’t do what you might think it does: it
parses the string "result.csv" itself as CSV data instead of opening the
file of that name. From the Ruby docs:

new(data, options = Hash.new)
This constructor will wrap either a String or IO object for
reading and/or writing.
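
To make the difference concrete, here is a small sketch. The helper names rows_from_string and rows_from_file are made up for this illustration, and a temporary file stands in for result.csv:

```ruby
require "csv"
require "tempfile"

# CSV.new parses the String it receives; "result.csv" is treated as data.
def rows_from_string
  CSV.new("result.csv").read
end

# CSV.read (like CSV.open and CSV.foreach) takes a path and opens the file.
def rows_from_file
  Tempfile.create(["result", ".csv"]) do |f|
    f.write(%(1,"a"\n2,"b"\n))
    f.flush
    CSV.read(f.path)
  end
end

p rows_from_string  # => [["result.csv"]], the file name comes back as a field
p rows_from_file    # => [["1", "a"], ["2", "b"]]
```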

The main problem with your code is that calling CSV#include? (from
Enumerable) has a side effect: it advances the read position, i.e. it
consumes rows until it finds a match or reaches the end of the file.
Consider the following CSV file:

in.csv:

1,"a"
2,"b"
1,"a"

Running the following code:

CSV.open('in.csv') do |csv|
  puts csv.include?(['1', 'a'])
  puts csv.include?(['1', 'a'])
  puts csv.include?(['1', 'a'])
end

produces:

true
true
false
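
The read position is what makes the third call fail. As a rough sketch, assuming the same three rows held in memory (IN_CSV and include_results are names invented for this example), rewinding the reader with CSV#rewind before each call makes every lookup start from the first row again:

```ruby
require "csv"
require "stringio"

# Same three rows as in.csv, kept in memory so the sketch is self-contained.
IN_CSV = %(1,"a"\n2,"b"\n1,"a"\n)

# Calls include? three times on one reader, optionally rewinding first.
def include_results(rewind_between_calls)
  csv = CSV.new(StringIO.new(IN_CSV))
  Array.new(3) do
    csv.rewind if rewind_between_calls
    csv.include?(["1", "a"])
  end
end

p include_results(false)  # => [true, true, false]
p include_results(true)   # => [true, true, true]
```

Rewinding re-reads the data from the start on every lookup, so this gets slow for large files.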

You could seek to the beginning of the file before each call to
#include?, and to the end of the file before each writing operation…
Or hash the lines that have been written already:

CSV.open("out.csv", "wb+") do |csv_of|
  hash = {}
  CSV.foreach("in.csv") do |row|
    row_hash = row.hash # or just row, but it takes more memory/time
    unless hash[row_hash]
      csv_of << row
    end
    hash[row_hash] = true
  end
end
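
A variant of the same idea, assuming the data fits comfortably in memory: keep the rows themselves in a Set instead of their hash values, so that two different rows can never be mistaken for duplicates by a hash collision. write_unique_rows is a hypothetical helper name, and the temporary directory is only there to make the sketch runnable on its own:

```ruby
require "csv"
require "set"
require "tmpdir"

# Copies rows from in_path to out_path, skipping rows already written.
def write_unique_rows(in_path, out_path)
  seen = Set.new
  CSV.open(out_path, "wb") do |out|
    CSV.foreach(in_path) do |row|
      # Set#add? returns nil when the row is already in the set.
      out << row if seen.add?(row)
    end
  end
end

Dir.mktmpdir do |dir|
  in_path = File.join(dir, "in.csv")
  out_path = File.join(dir, "out.csv")
  File.write(in_path, %(1,"a"\n2,"b"\n1,"a"\n))
  write_unique_rows(in_path, out_path)
  p CSV.read(out_path)  # => [["1", "a"], ["2", "b"]]
end
```

No headers option is passed here, so a header line is treated like any other row and is copied once, the first time it is seen.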

arnthur1981 · August 28, 2014, 11:56am

Thank you very much!