Forum: Ruby Join all text files in a folder, with a single line of Ruby code

luisbebop (Guest)
on 2008-10-25 13:45
(Received via mailing list)
I wrote a single line of Ruby code that joins all the text files in a
folder into one big file. I ran some tests, and it works!
Does anyone know a better way, or a more 'Ruby Way', to do that?

File.open('bigfile','w') { |mergedFile| Dir.glob("*.txt").each { |file| File.readlines(file).each { |line| mergedFile << line } } }

Thanks everyone!

www.twitter.com/luisbebop
Brian Candler (candlerb)
on 2008-10-25 13:56
luisbebop wrote:
> I wrote a single line of Ruby code that joins all the text files in a
> folder into one big file. I ran some tests, and it works!
> Does anyone know a better way, or a more 'Ruby Way', to do that?
>
> File.open('bigfile','w') { |mergedFile| Dir.glob("*.txt").each { |file| File.readlines(file).each { |line| mergedFile << line } } }

system("cat *.txt >bigfile")

:-?
David A. Black (Guest)
on 2008-10-25 14:56
(Received via mailing list)
Hi --

On Sat, 25 Oct 2008, luisbebop wrote:

> I wrote a single line of Ruby code that joins all the text files in a
> folder into one big file. I ran some tests, and it works!
> Does anyone know a better way, or a more 'Ruby Way', to do that?
>
> File.open('bigfile','w') { |mergedFile| Dir.glob("*.txt").each { |file| File.readlines(file).each { |line| mergedFile << line } } }

You can use read rather than readlines, and save a loop:

   Dir["*.txt"].each {|f| merged_file.print(File.read(f)) }

or similar.
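
The fragment above assumes merged_file is already open. A minimal sketch of the complete form follows; the method name merge_with_read and its parameters are made up for illustration, not taken from the thread.

```ruby
# Hypothetical helper: concatenate every file matching `pattern` into `out_path`.
# Each input is read whole with File.read, so there is no per-line loop.
def merge_with_read(pattern, out_path)
  File.open(out_path, "w") do |merged_file|
    # Sort explicitly so the merge order does not depend on the filesystem.
    Dir[pattern].sort.each { |f| merged_file.print(File.read(f)) }
  end
end
```

Called as merge_with_read("*.txt", "bigfile"), this should behave like the original one-liner, with one read per file instead of one write per line.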


David
William James (Guest)
on 2008-10-25 15:51
(Received via mailing list)
luisbebop wrote:

> I wrote a single line of Ruby code that joins all the text files in a
> folder into one big file. I ran some tests, and it works!
> Does anyone know a better way, or a more 'Ruby Way', to do that?
>
> File.open('bigfile','w') { |mergedFile| Dir.glob("*.txt").each { |file| File.readlines(file).each { |line| mergedFile << line } } }

"We don't need no stinkin' loops!"

ruby -e"puts ARGF.to_a" *.txt >merged


"We still don't need no stinkin' loops!"

File.open("mrg","w"){|f|f.puts Dir['*.txt'].map{|nm|IO.read nm}}
Brian Candler (candlerb)
on 2008-10-25 16:03
William James wrote:
> "We still don't need no stinkin' loops!"
>
> File.open("mrg","w"){|f|f.puts Dir['*.txt'].map{|nm|IO.read nm}}

Note that 'puts' will add a newline to the end of each file which
doesn't already have one. If you don't want this, use 'print' or 'write'
instead.
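
A small sketch of that difference, using StringIO in place of a real file so it can be run anywhere. The sample strings are invented for the demonstration.

```ruby
require "stringio"

# Two pretend file bodies: one ends with a newline, one does not.
contents = ["ends with newline\n", "no trailing newline"]

with_puts = StringIO.new
with_puts.puts(contents)                   # puts appends "\n" only where one is missing

with_print = StringIO.new
contents.each { |c| with_print.print(c) }  # print writes each string unchanged

p with_puts.string   # "ends with newline\nno trailing newline\n"
p with_print.string  # "ends with newline\nno trailing newline"
```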
Robert Klemme (Guest)
on 2008-10-25 16:42
(Received via mailing list)
On 25.10.2008 13:56, Brian Candler wrote:
> luisbebop wrote:
>> I wrote a single line of Ruby code that joins all the text files in a
>> folder into one big file. I ran some tests, and it works!
>> Does anyone know a better way, or a more 'Ruby Way', to do that?
>>
>> File.open('bigfile','w') { |mergedFile| Dir.glob("*.txt").each { |file| File.readlines(file).each { |line| mergedFile << line } } }
>
> system("cat *.txt >bigfile")

Why not directly invoke "cat" from the shell prompt? :-)

Kind regards

  robert
luisbebop (Guest)
on 2008-10-25 17:09
(Received via mailing list)
Because I'm learning Ruby, and I want to see some snippets that do
tasks like this in a single line of Ruby code.
Doing it directly from the prompt is no fun!
luisbebop (Guest)
on 2008-10-25 17:11
(Received via mailing list)
Without loops, it's very nice!
Robert Klemme (Guest)
on 2008-10-25 17:25
(Received via mailing list)
On 25.10.2008 15:47, William James wrote:
>
> ruby -e"puts ARGF.to_a" *.txt >merged

There's also

ruby -e '$stdout.write(ARGF.read)' *.txt >merged
ruby -e 'File.open("out","w") {|io| io.write(ARGF.read)}' *.txt

> "We still don't need no stinkin' loops!"
>
> File.open("mrg","w"){|f|f.puts Dir['*.txt'].map{|nm|IO.read nm}}

That's vastly inefficient since it reads all the files into memory
before writing a single byte.  This is not necessary.  You can at least
improve to

File.open("mrg","w"){|f|Dir['*.txt'].each{|nm|f.write(File.read(nm))}}

But a proper solution (i.e. one that deals with arbitrarily large files)
would use a fixed buffer size - but that looks ugly on a single line...
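
Spread over several lines, a fixed-buffer version might look like the sketch below. The method name merge_chunked and the 64 KiB CHUNK_SIZE are assumptions chosen for illustration, not something from this thread.

```ruby
CHUNK_SIZE = 64 * 1024  # assumed buffer size; tune as needed

# Hypothetical helper: append each file in `paths` to `out_path`,
# holding at most `chunk_size` bytes in memory at a time.
def merge_chunked(paths, out_path, chunk_size = CHUNK_SIZE)
  File.open(out_path, "wb") do |out|
    paths.each do |path|
      File.open(path, "rb") do |src|
        # IO#read(n) returns nil at end of file, which ends the loop.
        while (chunk = src.read(chunk_size))
          out.write(chunk)
        end
      end
    end
  end
end
```

Memory use then depends on the buffer size rather than on the size of the largest input file.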

Kind regards

  robert
luisbebop (Guest)
on 2008-10-25 18:21
(Received via mailing list)
I get your point. We need one loop, to be more efficient.
Really, I don't need to deal with arbitrarily large files.
Like I said, the main goals here are: use Ruby (without prompt
commands), and one line of code.
Thanks :)
Joel VanderWerf (Guest)
on 2008-10-25 21:40
(Received via mailing list)
William James wrote:
> "We don't need no stinkin' loops!"
>
> ruby -e"puts ARGF.to_a" *.txt >merged

Cheating a bit:

ARGV.replace Dir['*']; print ARGF.read

Not recommended, though, since it reads all the data into memory and
steps on ARGV.
Mohit Sindhwani (Guest)
on 2008-10-26 10:51
(Received via mailing list)
Brian Candler wrote:
> system("cat *.txt >bigfile")
>

and on Windows
system("copy /b *.txt bigfile")
(make sure that the bigfile name doesn't match the pattern, so use
bigfile rather than bigfile.txt)
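
The same caveat applies to the pure-Ruby versions: if the output name matches *.txt, a second run would pull the old output into the new one. A hedged sketch that removes the output path from the input list first (merge_excluding_output and its parameters are hypothetical names):

```ruby
# Hypothetical helper: merge dir/*.txt into dir/out_name, never reading
# the output file itself even when its name matches the pattern.
def merge_excluding_output(dir, out_name)
  out_path = File.join(dir, out_name)
  # Build the input list before opening the output, then drop the output path.
  inputs = Dir.glob(File.join(dir, "*.txt")).sort - [out_path]
  File.open(out_path, "w") do |out|
    inputs.each { |f| out.write(File.read(f)) }
  end
  inputs  # return the files that were actually merged
end
```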

Cheers,
Mohit.
10/26/2008 | 5:49 PM.
Nobuyoshi Nakada (nobu)
on 2008-10-26 15:07
(Received via mailing list)
Hi,

At Sun, 26 Oct 2008 04:08:48 +0900,
Joel VanderWerf wrote in [ruby-talk:318574]:
> > ruby -e"puts ARGF.to_a" *.txt >merged
>
> Cheating a bit:
>
> ARGV.replace Dir['*']; print ARGF.read
>
> Not recommended, though, since it reads all the data into memory and
> steps on ARGV.

ruby -pe 'BEGIN{ARGV.replace Dir["*"]}'
Joel VanderWerf (Guest)
on 2008-10-26 19:49
(Received via mailing list)
Nobuyoshi Nakada wrote:
>> steps on ARGV.
>
> ruby -pe 'BEGIN{ARGV.replace Dir["*"]}'
>

Very nice! But if you are going that far, why not go all the way:

ruby -pe'1' *
luisbebop (Guest)
on 2008-10-27 02:57
(Received via mailing list)
> ruby -pe'1' *

Can you explain ? Sorry, but I didn't understand.

Thanks :)
Joel VanderWerf (Guest)
on 2008-10-27 03:45
(Received via mailing list)
luisbebop wrote:
>> ruby -pe'1' *
>
> Can you explain ? Sorry, but I didn't understand.

If you run this in a shell, the * expands to all files. The -p switch
means "for each line in the files on the command line, store the line
into $_, and print $_." Usually, you want to use -e'some code' to operate
on $_. In this case, the '1' is a no-op, so it just prints the line
without changing it. HTH.
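
Written out as ordinary Ruby, the loop that -p implies looks roughly like the sketch below. It is an approximation for illustration only: the real switch reads through ARGF and uses $_, and emulate_dash_p is a made-up name.

```ruby
# Hypothetical emulation of `ruby -pe '1' file1 file2 ...`.
def emulate_dash_p(paths, out = $stdout)
  paths.each do |path|
    File.foreach(path) do |line|
      # The -e code would run here with the line in $_;
      # for -e '1' it does nothing, so the line is printed unchanged.
      out.print(line)
    end
  end
end
```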
Peña, Botp (Guest)
on 2008-10-27 06:14
(Received via mailing list)
From: Joel VanderWerf [mailto:vjoel@path.berkeley.edu]
# luisbebop wrote:
# >> ruby -pe'1' *
# >
# > Can you explain ? Sorry, but I didn't understand.
# If you run this in a shell, the * expands to all files.
# The -p switch means "for each line in the files on the
# command line, store the line into $_, and print $_."
# Usually, you want to use -e'some code' to operate
# on $_. In this case, the '1' is a no-op, so it just
# prints the line without changing it. HTH.

which also means,

  ruby -pe '' *
Joel VanderWerf (Guest)
on 2008-10-27 06:45
(Received via mailing list)
Peña wrote:
> # prints the line without changing it. HTH.
>
> which also means,
>
>   ruby -pe '' *

Ah, you're right. I tried

ruby -pe'' *

but that failed. With the extra space it works.
Nobuyoshi Nakada (nobu)
on 2008-10-27 07:40
(Received via mailing list)
Hi,

At Mon, 27 Oct 2008 14:43:58 +0900,
Joel VanderWerf wrote in [ruby-talk:318648]:
> Ah, you're right. I tried
>
> ruby -pe'' *
>
> but that failed. With the extra space it works.

I often use -ep to get rid of quotes and "unused literal"
warning.
Peña, Botp (Guest)
on 2008-10-27 08:11
(Received via mailing list)
From: Nobuyoshi Nakada [mailto:nobu@ruby-lang.org]
# I often use -ep to get rid of quotes and "unused literal"
# warning.

i just tried that nobu, but it gives no output

:~$ ruby -ep *.txt
:~$
Nobuyoshi Nakada (nobu)
on 2008-10-27 11:00
(Received via mailing list)
Hi,

At Mon, 27 Oct 2008 16:10:31 +0900,
Peña, Botp <botp@delmonte-phil.com> wrote in [ruby-talk:318654]:
> i just tried that nobu, but it gives no output
>
> :~$ ruby -ep *.txt

You need the -p option.

  ruby -pep *.txt
Peña, Botp (Guest)
on 2008-10-27 11:23
(Received via mailing list)
From: Nobuyoshi Nakada [mailto:nobu@ruby-lang.org]
# At Mon, 27 Oct 2008 16:10:31 +0900,
# Peña, Botp <botp@delmonte-phil.com> wrote in [ruby-talk:318654]:
# > i just tried that nobu, but it gives no output
# >
# > :~$ ruby -ep *.txt
#
# You needs -p option.
#
#   ruby -pep *.txt

you're saying -ep is different from -e -p ?
i'm asking since i do not see it in ruby -h

thanks for the info -botp
Nobuyoshi Nakada (nobu)
on 2008-10-27 13:17
(Received via mailing list)
Hi,

At Mon, 27 Oct 2008 19:22:57 +0900,
Peña, Botp <botp@delmonte-phil.com> wrote in [ruby-talk:318662]:
> # > i just tried that nobu, but it gives no output
> # >
> # > :~$ ruby -ep *.txt
> #
> # You needs -p option.
> #
> #   ruby -pep *.txt
>
> you're saying -ep is different from -e -p ?
> i'm asking since i do not see it ruby -h

Each -e needs a following expression, so the 'p' after it is
Kernel#p. However, -p doesn't take an argument, so the 'e' after it is
-e.
luisbebop (Guest)
on 2008-10-27 14:25
(Received via mailing list)
Thanks for all replies. A lot of interesting solutions!

www.twitter.com/luisbebop
botp (Guest)
on 2008-10-27 14:27
(Received via mailing list)
on Mon, Oct 27, 2008 at 8:17 PM, Nobuyoshi Nakada <nobu@ruby-lang.org>
wrote:
> Each -e needs a following expression, so 'p' after it is
> Kernel#p, however, -p doesn't take arguments so 'e' after it is -e.
>

dumb me. all this time i thought the quotes were required for -e and
that a space should separate it from the expression :))
so now even "ruby -pe0 *.txt" should work!

many thanks for the enlightenment, nobu.
kind regards -botp
John Carter (johncarter)
on 2008-10-28 01:19
(Received via mailing list)
On Sun, 26 Oct 2008, Robert Klemme wrote:

>> File.open("mrg","w"){|f|f.puts Dir['*.txt'].map{|nm|IO.read nm}}
>
> That's vastly inefficient since it reads all the files into memory before
> writing a single byte.  This is not necessary.  You can at least improve to
>
> File.open("mrg","w"){|f|Dir['*.txt'].each{|nm|f.write(File.read(nm))}}

If we're into fast and ugly...

We need a Ruby interface to Linux "splice"...

   splice() moves data between two file descriptors without copying
   between kernel address space and user address space. It transfers up
   to len bytes of data from the file descriptor fd_in to the file
   descriptor fd_out, where one of the descriptors must refer to a pipe.

See "man splice" for more.
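
Short of a splice binding, later Ruby versions (1.9 and up) offer IO.copy_stream, which leaves the copying strategy to the interpreter. The sketch below assumes that is an acceptable stand-in. Whether it avoids user-space copies on a given platform is not something this thread establishes, and merge_with_copy_stream is a hypothetical name.

```ruby
# Hypothetical helper: append each file in `paths` to `out_path`
# without reading whole files into Ruby strings.
def merge_with_copy_stream(paths, out_path)
  File.open(out_path, "wb") do |out|
    # IO.copy_stream accepts a file name or an IO as the source.
    paths.each { |path| IO.copy_stream(path, out) }
  end
end
```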


John Carter                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632
PO Box 1645 Christchurch                Email : john.carter@tait.co.nz
New Zealand