Forum: Ruby Manipulating CSV files over SSH

Drew Olson (dfg59)
on 2007-02-03 23:47
All -

I have a nice little script that takes a large CSV document and splits
it into 65,000 line chunks which are readable in Excel. However, I
currently have a .csv file which I'd like to split that is sitting on a
unix box. I have ssh access to the box, however the file is quite big
and I'd like to avoid downloading the whole file. Is there a simple way
to modify my script such that I use the ssh library to access the
document and perform the splitting over the network?

Thanks in advance,
Drew
Jan Svitok (Guest)
on 2007-02-04 00:19
(Received via mailing list)
On 2/3/07, Drew Olson <olsonas@gmail.com> wrote:
> Thanks in advance,
> Drew

The natural way would be to upload the script there and run it on the
remote host, provided that ruby is installed on the host.

If you don't need CSV escaping (embedded newlines etc.), then using
split(1) should be enough, i.e.

split -l 65000 filename prefix

for alphanum counter and

split -l 65000 -d filename prefix

for numerical one.
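
If the file does contain quoted fields with embedded newlines, a plain line count will cut records in half. A minimal Ruby sketch of a record-aware split, assuming the standard csv library is available; the name split_csv and the numbered output names (path.000, path.001, ...) are made up for illustration and are not from the original script:

```ruby
require "csv"

# Split the CSV at `path` into files holding at most `rows_per_file`
# records each. Records are counted by the CSV parser, so a quoted
# field containing a newline still counts as one record.
# The output naming scheme (path.000, path.001, ...) is an assumption.
def split_csv(path, rows_per_file = 65_000)
  outputs = []
  out = nil
  count = 0
  CSV.foreach(path) do |row|
    if (count % rows_per_file).zero?
      out&.close                                   # finish the previous chunk
      name = format("%s.%03d", path, outputs.size)
      outputs << name
      out = CSV.open(name, "w")                    # start the next chunk
    end
    out << row
    count += 1
  end
  outputs
ensure
  out&.close
end
```

Calling split_csv("large.csv") would write large.csv.000, large.csv.001 and so on next to the input. A header row is not repeated in each chunk; that would need a small change.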
Logan Capaldo (Guest)
on 2007-02-04 00:19
(Received via mailing list)
On Sun, Feb 04, 2007 at 07:47:31AM +0900, Drew Olson wrote:
> Thanks in advance,
> Drew
>
ssh user@host -x 'cat large.csv' | your_script
should work pretty well for this.
Cameron McBride (Guest)
on 2007-02-04 00:23
(Received via mailing list)
On 2/3/07, Logan Capaldo <logancapaldo@gmail.com> wrote:
> >
> > Thanks in advance,
> > Drew
> >
> ssh user@host -x 'cat large.csv' | your_script
> should work pretty well for this.

That is essentially downloading the entire file, which he doesn't want
to do.

Wow, hot topic.  I was going to suggest what Jan Svitok did:  The
easiest thing to do is to upload and run the ruby script on the unix
box.

If the script takes a while to run, you might want to check out
"screen" (but that is a bit off topic - ping me personally if you need
more info on this than you can google).

Cameron
Thomas Hafner (Guest)
on 2007-02-04 01:05
(Received via mailing list)
Drew Olson <olsonas@gmail.com> wrote:

> I have a nice little script that takes a large CSV document and splits
> it into 65,000 line chunks which are readable in Excel. However, I
> currently have a .csv file which I'd like to split that is sitting on a
> unix box.

Where shall the result of that splitting reside? Also on the same unix
box where the original CSV document resides?

Regards
  Thomas
Drew Olson (dfg59)
on 2007-02-04 01:59
> Where shall the result of that splitting reside? Also on the same unix
> box where the original CSV document resides?
>
> Regards
>   Thomas

Yes, I'd like the split files to be on the unix box as well. Obviously,
the easy solution is to install rails but I can't (client paranoid,
etc). I do have ksh or perl to work with, but I have a working ruby
script which I'd like to use. Any other ideas?

-Drew
Drew Olson (dfg59)
on 2007-02-04 19:49
Drew Olson wrote:
> Yes, I'd like the split files to be on the unix box as well. Obviously,
> the easy solution is to install rails but I can't (client paranoid,
> etc). I do have ksh or perl to work with, but I have a working ruby
> script which i'd like to use. Any other ideas?
>
> -Drew

Any ideas? If I can't do this with ruby, does anyone here have any ksh or
perl-fu to share?

-Drew
unknown (Guest)
on 2007-02-04 20:03
(Received via mailing list)
On Sun, 4 Feb 2007, Drew Olson wrote:

>> Where shall the result of that splitting reside? Also on the same unix
>> box where the original CSV document resides?
>>
>> Regards
>>   Thomas
>
> Yes, I'd like the split files to be on the unix box as well. Obviously,
> the easy solution is to install rails but I can't (client paranoid,
> etc). I do have ksh or perl to work with, but I have a working ruby
> script which i'd like to use. Any other ideas?

installing rails would be a twenty ton sledgehammer approach.

ruby uses stdin if no script is provided.  you need to send the script to
ruby on stdin via ssh and let it process the file that is local to the
remote host, creating the output there too.

here's the code

   harp:~ > cat a.rb
   input = ARGV.shift
   output = "#{ input }.out"

   open(output, 'w') do |fd_out|
     open(input) do |fd_in|
       fd_in.each do |line|
         fd_out.puts line.split(',').inspect
       end
     end
   end

here is the remote file

   harp:~ > ssh fortytwo.merseine.nu cat foo.csv
   1,2,3
   a,b,c

we spawn ruby on the remote host reading from stdin, giving 'foo.csv' as
an argument and the file 'a.rb' as the script to run

   harp:~ > ssh fortytwo.merseine.nu ruby - foo.csv  <  a.rb

this works as expected: the output is created on the remote host

   harp:~ > ssh fortytwo.merseine.nu cat foo.csv.out
   ["1", "2", "3\n"]
   ["a", "b", "c\n"]


hth.


-a
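
The same trick can also be driven from a local Ruby script rather than the shell. A minimal sketch, assuming ssh is on the local PATH and ruby is installed on the remote host; the helper names run_script_on_stdin and remote_ruby_command are hypothetical and not part of any library:

```ruby
require "open3"

# Feed `script_source` to `command` on its standard input and return
# [combined stdout/stderr text, success flag].
def run_script_on_stdin(command, script_source)
  output, status = Open3.capture2e(*command, stdin_data: script_source)
  [output, status.success?]
end

# Hypothetical helper: build the equivalent of
#   ssh HOST ruby - ARGS...  <  script.rb
# as an argument list. The "-" tells ruby to read the script from stdin.
def remote_ruby_command(host, *script_args)
  ["ssh", host, "ruby", "-", *script_args]
end
```

With the files from the example above, run_script_on_stdin(remote_ruby_command("fortytwo.merseine.nu", "foo.csv"), File.read("a.rb")) should behave like the ssh line shown earlier. Only the script text crosses the network; the csv is read and the output written on the remote side.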
Peter Booth (Guest)
on 2007-02-05 17:15
(Received via mailing list)
One obvious question is "do you need to do this once an hour or 50 times a
second?" If it's infrequent and you are looking for simplicity you could look
at sshfs, which mounts a remote file system on the client via ssh.


On 2/4/07 1:49 PM, "Drew Olson" <olsonas@gmail.com> wrote:

>
> -Drew

