Implement "paste" using Ruby


#1

Hi all,

I have a programming question: in the *NIX world, there is a small
utility named “paste”, that can combine several files together by
columns. For example:

file “x.dat”'s content is:
1
2
3

file “y.dat”'s content is:
a
b
c

then “paste x.dat y.dat > z.dat” will generate z.dat as:
1 a
2 b
3 c

If I want to do it in Ruby, and number of files is a variable, and
each file itself can be potentially huge … what would be the most
cost efficient way of implementing this?

Thanks in advance.

Oliver


#2

Oliver wrote:

If I want to do it in Ruby, and number of files is a variable, and
each file itself can be potentially huge … what would be the most
cost efficient way of implementing this?

Assuming the number of files to paste together is reasonable (say under
1000), then I’d simply open all the files up front:

files = ARGV.map { |fname| File.open(fname) }

and then inside a loop use ‘gets’ to pick one line from each, and output
those values together.

HTH,

Brian.


#3

On Mar 3, 2009, at 20:28 , Oliver wrote:

If I want to do it in Ruby, and number of files is a variable, and
each file itself can be potentially huge … what would be the most
cost efficient way of implementing this?

The most cost efficient way is by not reinventing the wheel:

system "paste", *ARGV


#4

On Tue, Mar 3, 2009 at 11:28 PM, Oliver removed_email_address@domain.invalid wrote:



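#!/usr/bin/env ruby -wKU

# Open every file named on the command line up front.
files = ARGV.map { |fname| File.open(fname) }

# Read one line from each file per pass; stop once every file is exhausted.
while (lines = files.map { |file| file.gets }).any? { |line| line }
  # An exhausted file returns nil, which to_s turns into an empty field.
  puts lines.map { |line| line.to_s.chomp }.join("\t")
end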
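
The same loop can also be wrapped in a method that closes its file handles when it finishes. This is only a sketch: the name paste_files and the out: and sep: keyword arguments are made up for illustration and are not part of the script above. It still holds just one line per file in memory at a time, so file size should not matter.

```ruby
# Hypothetical helper: merges the files at +paths+ line by line, writing
# each merged row to +out+ with fields joined by +sep+.
def paste_files(paths, out: $stdout, sep: "\t")
  files = paths.map { |path| File.open(path) }
  loop do
    # One line (or nil at end of file) from each handle.
    lines = files.map(&:gets)
    # Stop once every file has returned nil.
    break if lines.none?
    # nil.to_s is "", so shorter files contribute empty fields.
    out.puts lines.map { |line| line.to_s.chomp }.join(sep)
  end
ensure
  # Close the handles once done, even if reading raised an error.
  files&.each(&:close)
end
```

From a script it could be called as paste_files(ARGV), with output redirected by the shell as usual.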


Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale


#5

On Mar 4, 2009, at 08:22 , Yossef M. wrote:

Either way, this answer is not nearly as helpful as you seem to think
it is.

I disagree. The OP emphasized the most cost efficient way. In terms
of both CPU time and developer time, my method reigns supreme. I
learned this valuable lesson while at Gemstone. They used Unix sort
for certain types of sorts (especially for large data) because you
can’t beat a 20+ year old tool, so why reinvent the wheel?


#6

On Mar 4, 4:12 am, Ryan D. removed_email_address@domain.invalid wrote:

The most cost efficient way is by not reinventing the wheel:

system "paste", *ARGV

To me, the reasoning behind the question seemed to be what to do if
the system has Ruby installed, but not the ‘paste’ utility. Or maybe
it’s simply an exercise in handling memory usage, speed, and overall
efficiency.

Either way, this answer is not nearly as helpful as you seem to think
it is.

But that’s just what I think. Maybe Oliver has different ideas.
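
If the worry really is a machine that has Ruby but no paste, the two approaches in this thread can be combined: try the external tool first and fall back to a Ruby loop only when it cannot be run. What follows is a rough sketch under some assumptions. The method name paste_with_fallback and its out: and command: arguments are invented for illustration. It relies on Kernel#system returning nil when the command could not be executed at all. Note that the external tool writes to the process's standard output, so out: only affects the fallback path.

```ruby
# Hypothetical wrapper: prefer the external tool, fall back to pure Ruby.
def paste_with_fallback(paths, out: $stdout, command: "paste")
  # Kernel#system returns nil when the command could not be executed,
  # true or false when it ran and exited.
  ran = system(command, *paths)
  return ran unless ran.nil?

  # Fallback: one line from each file per pass, joined by tabs.
  files = paths.map { |path| File.open(path) }
  begin
    while (lines = files.map(&:gets)).any?
      out.puts lines.map { |line| line.to_s.chomp }.join("\t")
    end
  ensure
    files.each(&:close)
  end
  true
end
```

Called as paste_with_fallback(ARGV), this uses the system utility where it exists and the slower but portable Ruby loop everywhere else.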