Hi! I would like to ask if anybody knows how to check whether a row is
unique, i.e. not already present in a CSV file, before inserting it into
that file. I have this code, but it does not work.
file = CSV.new("result.csv", headers: headers, write_headers: true,
  return_headers: true)
CSV.open("new_result_url.csv", "wb", headers: headers, write_headers: true,
  return_headers: true) do |csv|
  CSV.foreach("result.csv", headers: true, return_headers: false) do |row|
    if file.include?(row) == false
      csv << row
    end
  end
end
Thank you very much for your help!
file = CSV.new("result.csv")
First of all, the line above won’t do what you might think it does: it
does not open the file result.csv, but treats the string "result.csv"
itself as the CSV data. From the Ruby docs:
new(data, options = Hash.new)
This constructor will wrap either a String or IO object for
reading and/or writing.
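A small sketch of what that means in practice. The sample file name and its contents are made up for illustration; only CSV.new and CSV.read come from the standard library.

```ruby
require "csv"
require "tmpdir"

# CSV.new wraps the String it is given; the text is parsed as CSV data.
parsed = CSV.new("result.csv").read
p parsed # one row with one field: the file name itself

# Reading an actual file needs CSV.open, CSV.foreach, CSV.read, or an IO object.
rows = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.csv") # hypothetical sample file
  File.write(path, "1,a\n2,b\n")
  rows = CSV.read(path)
end
p rows
```

So the `file` object in the question never looks at the contents of result.csv at all.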
The main problem with your code is that a call to CSV#include? (from
Enumerable) has a side effect: it advances the read position, i.e. it
consumes rows from the file until it finds a match or reaches the end.
Consider the following CSV file:
in.csv:
1,"a"
2,"b"
1,"a"
Running the following code:
CSV.open('in.csv') do |csv|
  puts csv.include?(['1','a'])
  puts csv.include?(['1','a'])
  puts csv.include?(['1','a'])
end
It will produce
true
true
false
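To check that it is the read position and not the data that changes the answer, here is a rough sketch that rewinds the reader before every lookup. It assumes CSV#rewind resets the reader to the start of the file, and it recreates the sample file in a temporary directory.

```ruby
require "csv"
require "tmpdir"

results = []
Dir.mktmpdir do |dir|
  path = File.join(dir, "in.csv")
  # Same three rows as the in.csv sample
  File.write(path, %(1,"a"\n2,"b"\n1,"a"\n))

  CSV.open(path) do |csv|
    3.times do
      csv.rewind # back to the first row before each lookup
      results << csv.include?(["1", "a"])
    end
  end
end
p results
```

With the rewind in place every lookup starts from the first row again, at the cost of rescanning the file each time.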
You could seek to the beginning of the file before each call to
#include?, and to the end of the file before each writing operation…
Or hash the lines that have been written already:
CSV.open("out.csv", "wb+") do |csv_of|
  hash = {}
  CSV.foreach("in.csv") do |row|
    row_hash = row.hash # or just row, but it takes more memory/time
    unless hash[row_hash]
      csv_of << row
    end
    hash[row_hash] = true
  end
end
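A variant of the same idea, as a sketch only: it keeps the rows themselves in a Set instead of their hash values, so two different rows that happen to share a hash value are not mistaken for duplicates. The helper name write_unique_rows and the file names are made up for this example.

```ruby
require "csv"
require "set"
require "tmpdir"

# Hypothetical helper: copies rows from in_path to out_path,
# keeping only the first occurrence of each row.
def write_unique_rows(in_path, out_path)
  seen = Set.new
  CSV.open(out_path, "wb") do |out|
    CSV.foreach(in_path) do |row|
      # Set#add? returns nil when the row is already in the set
      out << row if seen.add?(row)
    end
  end
end

Dir.mktmpdir do |dir|
  in_path = File.join(dir, "in.csv")
  out_path = File.join(dir, "out.csv")
  File.write(in_path, %(1,"a"\n2,"b"\n1,"a"\n))
  write_unique_rows(in_path, out_path)
  puts File.read(out_path)
end
```

This trades memory for safety, since every distinct row stays in memory until the copy is finished.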