Manipulating CSV files over SSH


#1

All -

I have a nice little script that takes a large CSV document and splits
it into 65,000-line chunks which are readable in Excel. However, I
currently have a .csv file which I’d like to split that is sitting on a
unix box. I have ssh access to the box, however the file is quite big
and I’d like to avoid downloading the whole file. Is there a simple way
to modify my script such that I use the ssh library to access the
document and perform the splitting over the network?

Thanks in advance,
Drew


#2

On 2/3/07, Drew O. removed_email_address@domain.invalid wrote:


The natural way would be to upload the script there and run it on the
remote host, provided that ruby is installed on the host.

If you don’t need CSV escaping (embedded newlines etc.), then using
split(1) should be enough, i.e.

split -l 65000 filename prefix

for alphabetic suffixes (prefixaa, prefixab, ...) and

split -l 65000 -d filename prefix

for numeric ones (prefix00, prefix01, ...). Note that -d is a GNU
extension and is not part of POSIX split(1).
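When fields can contain quoted newlines, a record-aware split is needed instead. A minimal Ruby sketch, assuming a Ruby with the standard `csv` library (the `CSV.foreach`/`CSV.open` API of Ruby 1.9 and later); the method name `split_csv` and the `prefix000.csv` naming are hypothetical, not part of Drew's script. Counting parsed records rather than physical lines keeps a multi-line quoted field inside a single chunk:

```ruby
require 'csv'

# Hypothetical helper (not Drew's script): split the CSV file at +path+ into
# chunks of at most +rows_per_chunk+ records. CSV.foreach yields parsed
# records, so a quoted field containing a newline is never cut in half.
# Returns the names of the files written: "#{prefix}000.csv", "#{prefix}001.csv", ...
def split_csv(path, prefix, rows_per_chunk = 65_000)
  written = []
  out = nil
  count = 0

  CSV.foreach(path) do |row|
    # Start a new chunk file every rows_per_chunk records.
    if (count % rows_per_chunk).zero?
      out&.close
      written << format('%s%03d.csv', prefix, written.size)
      out = CSV.open(written.last, 'w')
    end
    out << row
    count += 1
  end

  written
ensure
  out&.close # close the last (or only) chunk, even on error
end

# Run as: ruby split_csv.rb large.csv chunk_
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  puts split_csv(ARGV[0], ARGV[1])
end
```

Run on the unix box itself, `ruby split_csv.rb large.csv chunk_` would write chunk_000.csv, chunk_001.csv and so on, each holding at most 65,000 records.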


#3

On Sun, Feb 04, 2007 at 07:47:31AM +0900, Drew O. wrote:


ssh user@host -x 'cat large.csv' | your_script
should work pretty well for this.
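For that pipe to work, your_script has to read the CSV from standard input rather than open a named file. A minimal Ruby sketch, assuming plain line-based chunks (no quoted fields with embedded newlines); `split_stream`, the `chunk_` prefix and the numbering scheme are hypothetical. Ruby's `ARGF` reads the files named on the command line, or stdin when none are left, so the same script works either way. Note that the chunks are written wherever the script runs, i.e. on the local machine here:

```ruby
# Hypothetical helper: copy lines from +io+ (stdin, a file, a StringIO, ...)
# into numbered chunk files of at most +lines_per_chunk+ lines each, reading
# one line at a time instead of loading the whole input. Splits on physical
# lines, so it assumes no quoted field contains a newline.
# Returns the names of the files written.
def split_stream(io, prefix, lines_per_chunk = 65_000)
  written = []
  out = nil

  io.each_line.with_index do |line, i|
    if (i % lines_per_chunk).zero?
      out&.close
      written << format('%s%03d.csv', prefix, written.size)
      out = File.open(written.last, 'w')
    end
    out.write(line)
  end

  written
ensure
  out&.close
end

# Usage: ssh user@host -x 'cat large.csv' | ruby split_stream.rb chunk_
# Once the prefix is shifted off ARGV, ARGF falls back to reading stdin.
if __FILE__ == $PROGRAM_NAME && (prefix = ARGV.shift)
  puts split_stream(ARGF, prefix)
end
```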


#4

Drew O. removed_email_address@domain.invalid wrote:

> I have a nice little script that takes a large CSV document and splits
> it into 65,000-line chunks which are readable in Excel. However, I
> currently have a .csv file which I’d like to split that is sitting on a
> unix box.

Where shall the result of that splitting reside? Also on the same unix
box where the original CSV document resides?

Regards
Thomas


#5

On 2/3/07, Logan C. removed_email_address@domain.invalid wrote:

> ssh user@host -x 'cat large.csv' | your_script
> should work pretty well for this.

That is essentially downloading the entire file, which he doesn’t want
to do.

Wow, hot topic. I was going to suggest what Jan S. did: The
easiest thing to do is to upload and run the ruby script on the unix
box.

If the script takes a while to run, you might want to check out
“screen” (but that is a bit off topic - ping me personally if you need
more info on this than you can google).

Cameron


#6

Drew O. wrote:

> Yes, I’d like the split files to be on the unix box as well. Obviously,
> the easy solution is to install rails but I can’t (client paranoid,
> etc). I do have ksh or perl to work with, but I have a working ruby
> script which I’d like to use. Any other ideas?

Any ideas? If I can’t do this with ruby, does anyone here have any ksh
or Perl-fu to share?

-Drew


#7

> Where shall the result of that splitting reside? Also on the same unix
> box where the original CSV document resides?

Yes, I’d like the split files to be on the unix box as well. Obviously,
the easy solution is to install rails but I can’t (client paranoid,
etc). I do have ksh or perl to work with, but I have a working ruby
script which I’d like to use. Any other ideas?

-Drew


#8

One obvious question is “do you need to do this once an hour or 50 times a
second?” If it’s infrequent and you are looking for simplicity, you could
look at sshfs, which mounts the remote server’s file system on the client
via ssh.

On 2/4/07 1:49 PM, “Drew O.” removed_email_address@domain.invalid wrote:





#9

On Sun, 4 Feb 2007, Drew O. wrote:

> > Where shall the result of that splitting reside? Also on the same unix
> > box where the original CSV document resides?
>
> Yes, I’d like the split files to be on the unix box as well. Obviously,
> the easy solution is to install rails but I can’t (client paranoid,
> etc). I do have ksh or perl to work with, but I have a working ruby
> script which I’d like to use. Any other ideas?

installing rails would be a twenty-ton sledgehammer approach.

ruby reads its program from stdin when no script file is given (or when
the script name is '-'). so you can send the script to ruby on stdin via
ssh and let it process the file that is local to the remote host, creating
the output there too.

here’s the code

harp:~ > cat a.rb
input = ARGV.shift
output = "#{ input }.out"

open(output, 'w') do |fd_out|
  open(input) do |fd_in|
    fd_in.each do |line|
      fd_out.puts line.split(',').inspect
    end
  end
end

here is the remote file

harp:~ > ssh fortytwo.merseine.nu cat foo.csv
1,2,3
a,b,c

we spawn ruby on the remote host reading from stdin, giving 'foo.csv' as
an argument and the file 'a.rb' as the script to run

harp:~ > ssh fortytwo.merseine.nu ruby - foo.csv < a.rb

this works as expected: the output is created on the remote host

harp:~ > ssh fortytwo.merseine.nu cat foo.csv.out
["1", "2", "3\n"]
["a", "b", "c\n"]
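the same trick can be driven from a local ruby script, which is handy for running an existing splitter on the remote box without copying it there first. a minimal sketch, assuming OpenSSH's `ssh` client is on the local PATH and `ruby` is on the remote one; `run_remote_script` and its parameters are hypothetical. it mirrors `ssh host ruby - foo.csv < a.rb`: the script text goes down ssh's stdin, and only what the script prints comes back.

```ruby
# Hypothetical helper: run the Ruby source +script+ as `ruby - ARGS...` via
# +launcher+ (e.g. ['ssh', 'user@host']), sending the source on the child's
# stdin, and return what the script printed on stdout.
def run_remote_script(script, args, launcher, ruby: 'ruby')
  cmd = launcher + [ruby, '-', *args]
  IO.popen(cmd, 'r+') do |io|
    io.write(script)
    io.close_write # EOF on stdin tells ruby the whole script has arrived
    io.read
  end
end

# Example, equivalent to `ssh fortytwo.merseine.nu ruby - foo.csv < a.rb`:
#   print run_remote_script(File.read('a.rb'), ['foo.csv'],
#                           ['ssh', 'fortytwo.merseine.nu'])
```

ssh passes the remote command line to the remote shell as a single string, so file names containing spaces or shell metacharacters would need quoting.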

hth.

-a