FasterCSV - Merge CSV

I have 3 CSVs with the same structure, say 10 columns each. There is a
slight variation in 1 column of the data they contain.

csv1 - 20 lines 10 cols
csv2 - 52 lines 10 cols
csv3 - 24 lines 10 cols

How can I merge all 3 csvs into 1 csv using fastercsv so I have

csv4 96 lines 10 cols

Thanks!

Seed

Christian Smith wrote:

How can I merge all 3 csvs into 1 csv using fastercsv so I have

csv4 96 lines 10 cols

Why use fastercsv?
cat csv1 csv2 csv3 >csv4
would meet your requirement.

But if you want to use fastercsv, then open each file in turn, read it
line at a time, and output the line you just read.
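
For example, a minimal sketch of that line-at-a-time approach (the
filenames are taken from the question, and it assumes the files have no
header rows):

require 'rubygems'
require 'fastercsv'

FasterCSV.open('csv4', 'w') do |out|
  %w[csv1 csv2 csv3].each do |path|
    FasterCSV.foreach(path) do |row|
      out << row   # append each parsed row straight to the output
    end
  end
end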

On Jul 2, 2010, at 7:11 AM, Brian C. wrote:

Why use fastercsv?
cat csv1 csv2 csv3 >csv4
would meet your requirement.

except that you’d have headers from csv2 and csv3 (but perhaps your
line counts imply no headers?)

But if you want to use fastercsv, then open each file in turn, read it
line at a time, and output the line you just read.

If the files are small-ish, you can avoid a chicken-and-egg problem of
the headers by reading all the input files (saving the headers from
the first), then writing it all out from memory.

-Rob

Rob B.
[email protected] http://AgileConsultingLLC.com/
[email protected] http://GaslightSoftware.com/

Rob B. wrote:

If the files are small-ish, you can avoid a chicken-and-egg problem of
the headers by reading all the input files (saving the headers from
the first), then writing it all out from memory.

The files aren’t small-ish, but memory isn’t an issue. I would love to
be able to do this. I am able to read the 3 files into an array; it’s
parsing them back into 1 CSV that I am having trouble with. I would
assume this would be a lot faster than a line-by-line read/write
approach.

On Sat, Jul 03, 2010 at 02:15:06AM +0900, Christian Smith wrote:

cat csv1 csv2 csv3 >csv4
If the files are small-ish, you can avoid a chicken-and-egg problem of
the headers by reading all the input files (saving the headers from
the first), then writing it all out from memory.


just cat and grep out the header lines (this assumes the header row
contains some string that never appears in a data row):

cat csv* | grep -v string-portion-unique-to-headers > full.csv

If you want to keep a header row, then:

cat csv* | grep string-portion-unique-to-headers | sort | uniq > full.csv
cat csv* | grep -v string-portion-unique-to-headers >> full.csv

On Jul 2, 2010, at 1:15 PM, Christian Smith wrote:

The files aren’t small-ish, but memory isn’t an issue. I would love to
be able to do this. I am able to read the 3 files into an array; it’s
parsing them back into 1 CSV that I am having trouble with. I would
assume this would be a lot faster than a line-by-line read/write
approach.

OK, let’s read them all in and then write out one file…

headers = nil
all_rows = []
input_files.each do |input_file|
  csv = FasterCSV.table(input_file, :headers => true)
  in_headers, *in_rows = csv.to_a   # the first element of to_a is the header row
  headers ||= in_headers            # remember the headers from the first file only
  all_rows.concat(in_rows)
end
FasterCSV.open(output_file, 'w') do |csv|
  csv << headers
  all_rows.each { |row| csv << row }
end
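
For completeness, here is a sketch of how the surrounding variables
might be defined (input_files and output_file are the names the snippet
above expects; the filenames are placeholders from the original
question):

require 'rubygems'
require 'fastercsv'

input_files = %w[csv1 csv2 csv3]   # the three source files
output_file = 'csv4'               # the merged result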

The full example is at:
combined.csv · GitHub

The details may have to change a bit depending on your circumstances,
but the general idea is sound.

-Rob

Rob B.
[email protected] http://AgileConsultingLLC.com/
[email protected] http://GaslightSoftware.com/

On Fri, Jul 2, 2010 at 3:27 PM, Rob B.
[email protected] wrote:

FasterCSV.open(output_file, 'w') do |csv|
  csv << headers
  all_rows.each { |row| csv << row }
end
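
Or, as a streaming alternative that never holds more than one row in
memory, skip the header rows of every file after the first: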

FasterCSV.open(output_file, 'w') do |ocsv|
  input_files.each_with_index do |input_file, i|
    FasterCSV.foreach(input_file, :headers => true, :return_headers => true) do |row|
      next if i > 0 and row.header_row?   # write the header row only once, from the first file
      ocsv << row                         # data rows pass straight through
    end
  end
end
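
Note the :return_headers => true option: without it, FasterCSV consumes
each file’s header row silently, and header_row? would never see it.
Since each row is written out as soon as it is read, memory use stays
flat however large the input files are.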