Forum: Ruby Read and re-write file with one open?

Adam Bender (Guest)
on 2009-04-30 06:28
(Received via mailing list)
I would like to write a Ruby script that opens a text file, performs a
gsub on each line, and then overwrites the file with the updated
contents.  Right now I open the file twice: once to read and once to
write.  The reason for this is that if I try to perform both operations
on the same IO object by calling io.rewind and writing from the
beginning, and the substituted word is shorter than what it is
replacing, some portion of the end of the original file remains.  Is
there an idiom for "clearing" the contents of the file before writing?

Thanks,

Adam
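
For reference, the leftover-tail behaviour described in this question
can be reproduced with a few lines.  This is only a sketch: the method
name rewrite_without_truncate and the sample file are made up for
illustration and are not from the original post.

```ruby
require "tmpdir"

# Rewrite a file through a single r+ handle WITHOUT truncating it,
# to show that the tail of the old contents survives.
def rewrite_without_truncate(path, pattern, replacement)
  File.open(path, "r+") do |file|
    updated = file.read.gsub(pattern, replacement)
    file.rewind
    file.write(updated) # overwrites from byte 0 but never shrinks the file
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "demo.txt")
  File.write(path, "hello world\n")            # 12 bytes
  rewrite_without_truncate(path, "world", "you") # writes only 10 bytes
  p File.read(path) # => "hello you\nd\n"
end
```

The last two bytes of the original file ("d" and the newline) are still
there because writing fewer bytes than the file holds does not change
its length.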
Siddick Ebramsha (siddick)
on 2009-04-30 06:56
Instead of using the rewind method, you can use the reopen method to
open the same file in write mode.
Example:
  file = File.open( "filename", "r" )
  file.reopen( "filename", "w" )
Gregory Brown (Guest)
on 2009-04-30 07:10
(Received via mailing list)
On Thu, Apr 30, 2009 at 12:27 AM, Adam Bender <abender@gmail.com> wrote:
> I would like to write a Ruby script that opens a text file, performs a gsub
> on each line, and then overwrites the file with the updated contents.  Right
> now I open the file twice: once to read and once to write.  The reason for
> this is that if I try to perform both operations on the same IO object by
> calling io.rewind and writing from the beginning, and the substituted word
> is shorter than what it is replacing, some portion of the end of the
> original file remains.  Is there an idiom for "clearing" the contents of the
> file before writing?

Yes, but typically this is done by creating a new file and then
renaming it to replace the old one.

Here's an example from my upcoming book "Ruby Best Practices"[0].  It
naively strips comments from source files.
You can modify it to fit your needs.

----------

require "tempfile"
require "fileutils"

temp = Tempfile.new("without_comments")
File.foreach(ARGV[0]) do |line|
  temp << line unless line =~ /^\s*#/
end
temp.close

FileUtils.cp(ARGV[0],"#{ARGV[0]}.bak") # comment out if you don't want backups.
FileUtils.mv(temp.path,ARGV[0])

----------

-greg

PS: sorry for the shameless book plug, but it really might be helpful
for questions like these. :)


[0] http://rubybestpractices.com
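
Adapted to the gsub case from the original question, the same
temp-file-then-move approach might look like the sketch below.  The
helper name gsub_file and its backup: keyword are hypothetical.  Two
details are assumptions added here rather than part of the example
above: the temp file is created in the target's own directory so the
final move stays on one filesystem, and the original permission bits
are copied because Tempfile creates its files readable only by the
owner.

```ruby
require "tempfile"
require "fileutils"

# Hypothetical helper: apply gsub to every line of +path+ by writing a
# temp file and then moving it over the original.
def gsub_file(path, pattern, replacement, backup: true)
  # Same directory as the target, so the move does not cross filesystems.
  temp = Tempfile.new(File.basename(path), File.dirname(path))
  File.foreach(path) { |line| temp << line.gsub(pattern, replacement) }
  temp.close

  FileUtils.cp(path, "#{path}.bak") if backup
  # Carry over the original permission bits to the replacement file.
  File.chmod(File.stat(path).mode & 0o777, temp.path)
  FileUtils.mv(temp.path, path)
end
```

A call such as gsub_file("notes.txt", /colour/, "color") would leave the
untouched original in notes.txt.bak.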
7stud -- (7stud)
on 2009-04-30 07:26
Adam Bender wrote:
> I would like to write a Ruby script that opens a text file, performs a
> gsub on each line, and then overwrites the file with the updated
> contents.  Right now I open the file twice: once to read and once to
> write.  The reason for this is that if I try to perform both operations
> on the same IO object by calling io.rewind and writing from the
> beginning, and the substituted word is shorter than what it is
> replacing, some portion of the end of the original file remains.  Is
> there an idiom for "clearing" the contents of the file before writing?
>

Yes, opening the file in write mode!  However, that exposes you to a
catastrophe.  Suppose you read the contents of the
file into a variable, then open the file for writing, which then erases
the file, but immediately thereafter your program crashes or the power
goes out in your city.  What are you left with?  You will be left with
an empty file, and the variable that contained the contents of the file
will have evaporated into the ether.  In other words, you will lose all
your data!

So the idiom for rewriting a file is:

1) Open the file for *reading*.
2) Open another file for writing with a name like origName-edited.txt
3) Read the original file line by line (saves memory, but is slower)
4) Write each altered line to the file origName-edited.txt
5) Delete the original file.
6) Change the name of the new file (origName-edited.txt) to origName.txt
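
The six steps above can be sketched roughly as follows.  The method
name rewrite_line_by_line is hypothetical, and the "-edited" suffix is
simply appended to the full path here.  Opening the new file with
File::EXCL is an addition not in the list: it makes the open fail with
Errno::EEXIST instead of silently overwriting a file that already has
that name.

```ruby
# Sketch of the numbered steps; the block receives each line and
# returns the altered line.
def rewrite_line_by_line(path)
  edited = "#{path}-edited"
  # CREAT | EXCL: raise Errno::EEXIST rather than clobber an existing file.
  File.open(edited, File::WRONLY | File::CREAT | File::EXCL) do |out| # step 2
    File.foreach(path) do |line|                                      # steps 1 and 3
      out.write(yield(line))                                          # step 4
    end
  end
  File.delete(path)         # step 5
  File.rename(edited, path) # step 6
end
```

On POSIX systems rename(2) replaces an existing target atomically, so
step 5 can be dropped there; that also closes the short window in which
neither file carries the original name.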
Adam Bender (Guest)
on 2009-04-30 09:12
(Received via mailing list)
On Thu, Apr 30, 2009 at 1:26 AM, 7stud -- <bbxx789_05ss@yahoo.com> wrote:

> 3) Read the original file line by line (saves memory, but is slower)
> 4) Write each altered line to the file origName-edited.txt
> 5) Delete the original file.
> 6) Change the name of the new file (origName-edited.txt) to origName.txt


I see the potential for catastrophe with the way I suggested; however,
there is potential for catastrophe here too if there is already an
"origName-edited.txt" file (I know, slim chance, but you never know).
You could get around this by generating new file names until you found
one that didn't exist, or by writing to /tmp, of course.  I think I'll
switch to Greg Brown's suggestion.  Does Tempfile guarantee that it
won't overwrite an existing file?

Adam
7stud -- (7stud)
on 2009-04-30 09:54
Adam Bender wrote:
> On Thu, Apr 30, 2009 at 1:26 AM, 7stud -- <bbxx789_05ss@yahoo.com>
> wrote:
>
>> 3) Read the original file line by line (saves memory, but is slower)
>> 4) Write each altered line to the file origName-edited.txt
>> 5) Delete the original file.
>> 6) Change the name of the new file (origName-edited.txt) to origName.txt
>
>
> I see the potential for catastrophe with the way I suggested, however,
> there is potential for catastrophe here if there is already an
> "origName-edited.txt" file (I know, slim chance, but you never know).
>

Of course, if that were a possibility then you would take extra
measures, like creating a new file name with rand and then checking it
with File.exist?, which is probably what Tempfile does.

> Does Tempfile guarantee that it won't overwrite an
> existing file?

What, the standard library docs aren't clear enough for you?

----
tempfile - manipulates temporary files
----

???!!  lol. pathetic.  But once in a great while you can actually find
some information on a standard library module using google:

http://www.rubytips.org/2008/01/11/using-temporary...
Robert Klemme (Guest)
on 2009-04-30 11:08
(Received via mailing list)
2009/4/30 Gregory Brown <gregory.t.brown@gmail.com>:
> Yes, but typically this is done by creating a new file and then
> renaming it to replace the old one.
>
> Here's an example from my upcoming book "Ruby Best Practices"[0].  It
> naively strips comments from source files.

A variant exploiting Ruby's command line parameters:

11:05:08 Temp$ ruby -e '10.times {|i| puts i}' >| x
11:05:22 Temp$ cat x
0
1
2
3
4
5
6
7
8
9
11:05:23 Temp$ ./x.rb x
11:05:29 Temp$ cat x
<<<0>>>
<<<1>>>
<<<2>>>
<<<3>>>
<<<4>>>
<<<5>>>
<<<6>>>
<<<7>>>
<<<8>>>
<<<9>>>
11:05:34 Temp$ cat x.bak
0
1
2
3
4
5
6
7
8
9
11:05:37 Temp$ cat x.rb
#!/opt/bin/ruby19 -pi.bak

$_.sub! /^/, '<<<'
$_.sub! /$/, '>>>'
11:05:40 Temp$

Kind regards

robert
Gregory Brown (Guest)
on 2009-04-30 16:02
(Received via mailing list)
On Thu, Apr 30, 2009 at 3:11 AM, Adam Bender <abender@gmail.com> wrote:

> I see the potential for catastrophe with the way I suggested, however, there
> is potential for catastrophe here if there is already an
> "origName-edited.txt" file (I know, slim chance, but you never know).  You
> could get around this by generating new file names until you found one that
> didn't exist, or writing to /tmp, of course.  I think I'll switch to Greg
> Brown's suggestion.  Does Tempfile guarantee that it won't overwrite an
> existing file?

Yes, Tempfile avoids file collisions.

-greg
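
A quick way to see this for yourself is to create several temp files
with the same basename and compare their paths.  As far as I know,
current Ruby versions also create the file with File::EXCL and retry
with a new name on a clash, but treat that as an assumption and check
the Tempfile documentation for your version.  The helper name
tempfile_paths is made up for this sketch.

```ruby
require "tempfile"

# Create +count+ temp files that share one basename and return their paths.
def tempfile_paths(basename, count)
  files = Array.new(count) { Tempfile.new(basename) }
  files.map(&:path)
ensure
  files&.each(&:close!) # close and delete the temp files again
end

puts tempfile_paths("without_comments", 3)
```

Each printed path starts with the basename and ends in a different
generated suffix.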
James Dinkel (jdinkel)
on 2009-04-30 16:41
Gregory Brown wrote:
>
> Here's an example from my upcoming book "Ruby Best Practices"[0].  It
> naively strips comments from source files.
> You can modify it to fit your needs.
>
> ----------
>
> require "tempfile"
> require "fileutils"
>
> temp = Tempfile.new("without_comments")
> File.foreach(ARGV[0]) do |line|
>   temp << line unless line =~ /^\s*#/
> end
> temp.close
>
> FileUtils.cp(ARGV[0],"#{ARGV[0]}.bak") # comment out if you don't want backups.
> FileUtils.mv(temp.path,ARGV[0])
>
> ----------
>
> -greg

What about if another program opens the file after it has been read by
the Ruby program but before the Ruby program has copied the temp file?
If the second program makes a change and saves it, those changes will be
lost when the Ruby program copies the temp file over it.  Or if the
second program still has it open and the Ruby program finishes, then
when the second program saves its open file, it will overwrite the Ruby
program's changes.
Gregory Brown (Guest)
on 2009-04-30 16:54
(Received via mailing list)
On Thu, Apr 30, 2009 at 10:41 AM, James Dinkel <jdinkel@gmail.com> wrote:

> What about if another program opens the file after it has been read by
> the Ruby program but before the Ruby program has copied the temp file?
> If the second program makes a change and saves it, those changes will be
> lost when the Ruby program copies the temp file over it.  Or if the
> second program still has it open and the Ruby program finishes, then
> when the second program saves its open file, it will overwrite the Ruby
> program's changes.

Is this related to the OP's concerns?  In this case, you'd need file
locking (see the Ruby API).
But I didn't see any mention of these sorts of issues in Adam's original
post.

If you need this feature, read the API docs for File#flock

-greg
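
A minimal sketch of what that could look like, assuming every program
that touches the file takes the same lock (flock is advisory, so a
program that ignores it is not held back).  The method name locked_gsub
is hypothetical.  The sketch rewrites the file in place instead of
moving a temp file over it, because the lock belongs to the open file
and would not carry over to a replacement file.

```ruby
# Read, modify and write back one file while holding an exclusive lock.
def locked_gsub(path, pattern, replacement)
  File.open(path, "r+") do |file|
    file.flock(File::LOCK_EX) # waits until no other process holds the lock
    updated = file.read.gsub(pattern, replacement)
    file.rewind
    file.write(updated)
    file.truncate(file.pos)   # drop whatever is left of the old contents
  end                         # closing the file releases the lock
end
```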
timr (Guest)
on 2009-05-01 21:18
(Received via mailing list)
On Apr 29, 9:27 pm, Adam Bender <aben...@gmail.com> wrote:
>
> Thanks,
>
> Adam

I think File objects have a truncate method you can use after reading
(f.truncate(0) #the file is now blank) to delete the contents.

file.truncate(integer) => 0

Truncates file to at most integer bytes. The file must be opened for
writing. Not available on all platforms.

   f = File.new("out", "w")
   f.syswrite("1234567890")   #=> 10
   f.truncate(5)              #=> 0
   f.close()                  #=> nil
   File.size("out")           #=> 5
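
Applied to the original question, that might look like the sketch
below: read everything through one handle opened in "r+" mode, rewind,
truncate to zero, then write the new contents.  The method name
gsub_in_place is made up for illustration.  Note that between the
truncate and the write the file on disk is empty, so a crash at that
moment loses the data.

```ruby
# Read, clear and rewrite a file through a single File object.
def gsub_in_place(path, pattern, replacement)
  File.open(path, "r+") do |f|
    updated = f.read.gsub(pattern, replacement)
    f.rewind
    f.truncate(0) # the file is now blank
    f.write(updated)
  end
end
```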
Martin DeMello (Guest)
on 2009-05-01 21:20
(Received via mailing list)
On Thu, Apr 30, 2009 at 1:24 PM, 7stud -- <bbxx789_05ss@yahoo.com>
wrote:
> What the standard library docs aren't clear enough for you:
>
> ----
> tempfile - manipulates temporary files
> ----
>
> ???!!  lol. pathetic.

Don't just scoff, send in a patch!

martin
James Dinkel (jdinkel)
on 2009-05-01 22:33
timr wrote:
> On Apr 29, 9:27 pm, Adam Bender <aben...@gmail.com> wrote:
>
> Truncates file to at most integer bytes. The file must be opened for
> writing. Not available on all platforms.
>
>    f = File.new("out", "w")
>    f.syswrite("1234567890")   #=> 10
>    f.truncate(5)              #=> 0
>    f.close()                  #=> nil
>    File.size("out")           #=> 5

That's pretty close to how I've been modifying files:

  File.open(filename, 'r+') do |file|
    lines = file.readlines

    # modify data in the lines array

    file.pos = 0
    file.write(lines.join) # join explicitly; print would write Array#inspect on Ruby 1.9+
    file.truncate(file.pos)
  end

This opens a file for reading and writing, reads the file into an array,
then you modify the array how you need to.  Then the block returns to
the beginning of the file, writes out your changes over the existing
file, then chops off what's left.  Then, of course, the file is closed
when the block exits.

To get a little more complicated, I wrapped this in a class method:

class File

  def self.change!(filename, create = false)
    ### method to make it easy to open a file and make changes to it.
    ### usage example
    ## File.change!(myfile) do |contents|
    ##   contents.each { |line| line.gsub!(/this/, "that") }
    ## end
    ### I can also use "throw(:nochanges)" anywhere in the block
    ### to prevent the file from being written.
    ### Make sure 'contents' does not get pointed to a new object;
    ### for example, an assignment "contents = 'new data'" will break the method

    # if create is true, create the file if it does not exist
    if create
      File.open(filename, 'w') { |blank| blank.write '' } unless File.exist?(filename)
    end

    # read the file, execute a block, then write the file
    if File.exist?(filename)
      File.open(filename, 'r+') do |file|
        lines = file.readlines
        # do not write the file if it did not change (block must "throw :nochanges")
        catch(:nochanges) do
          yield lines
          file.pos = 0
          file.write(lines.join) # join explicitly; print would write Array#inspect on Ruby 1.9+
          file.truncate(file.pos)
        end
      end
    end
  end
end
James Dinkel (jdinkel)
on 2009-05-01 22:37
Aw, word wrapping broke some of my lines...

these should be single lines:

> ### for example, an assignment "contents = 'new data'" will break the method


> File.open(filename, 'w') { |blank| blank.write '' } unless File.exist?(filename)


> # do not write the file if it did not change (block must "throw :nochanges")
Adam Bender (Guest)
on 2009-05-02 01:49
(Received via mailing list)
On Fri, May 1, 2009 at 4:33 PM, James Dinkel <jdinkel@gmail.com> wrote:

>  end
>
> This opens a file for reading and writing, reads the file into an array,
> then you modify the array how you need to.  Then the block returns to
> the beginning of the file, writes out your changes over the existing
> file, then chops off what's left.  Then, of course, the file is closed
> when the block exits.
>

This is exactly what I wanted.  Thanks everyone.

Adam
7stud -- (7stud)
on 2009-05-02 10:05
Adam Bender wrote:
> On Fri, May 1, 2009 at 4:33 PM, James Dinkel <jdinkel@gmail.com> wrote:
>
>>  end
>>
>> This opens a file for reading and writing, reads the file into an array,
>> then you modify the array how you need to.  Then the block returns to
>> the beginning of the file, writes out your changes over the existing
>> file, then chops off what's left.  Then, of course, the file is closed
>> when the block exits.
>>
>
> This is exactly what I wanted.  Thanks everyone.
>
> Adam

I think you guys are missing the point.   There are lots of ways to
rewrite a file that 'work'. For instance, simply reading a file into an
array, closing the file, then opening the file for writing (which erases
the file), and then writing the altered lines back to the file 'works'.
You can test it yourself and see that it works.  You can rewrite the
file like that 1,000 times and it will 'work'.

However, if your data is important you need to ask yourself the
question: what happens if my program crashes while I am writing the data
back out to the file?

So let's ask that question about the solution you've decided to adopt.
Suppose your program is at the point where it has written half of the
altered data back to the file, and the file contains half altered data
and half original data.  Then your program crashes.  What are you left
with?

The "rewriting a file" issue has been hashed out by many programmers for
decades.  You can either try to come up with your own screwy method, or
you can adopt an accepted idiom.  Within the accepted idiom, modules
like Tempfile were created to deal with the problem of overwriting an
existing file name, and it provides a shortcut.

>truncate: Not available on all platforms.

That should also be a red flag.  Generally, you should strive to write
cross-platform programs.
7stud -- (7stud)
on 2009-05-02 10:29
7stud -- wrote:
> Adam Bender wrote:
>> On Fri, May 1, 2009 at 4:33 PM, James Dinkel <jdinkel@gmail.com> wrote:
>>
>>>  end
>>>
>>> This opens a file for reading and writing, reads the file into an array,
>>> then you modify the array how you need to.  Then the block returns to
>>> the beginning of the file, writes out your changes over the existing
>>> file, then chops off what's left.  Then, of course, the file is closed
>>> when the block exits.
>>>
>>
>> This is exactly what I wanted.  Thanks everyone.
>>
>> Adam
>
> I think you guys are missing the point.   There are lots of ways to
> rewrite a file that 'work'. For instance, simply reading a file into an
> array, closing the file, then opening the file for writing (which erases
> the file), and then writing the altered lines back to the file 'works'.
> You can test it yourself and see that it works.  You can rewrite the
> file like that 1,000 times and it will 'work'.
>
> However, if your data is important you need to ask yourself the
> question: what happens if my program crashes while I am writing the data
> back out to the file?
>
> So let's ask that question about the solution you've decided to adopt.
> Suppose your program is at the point where it has written half of the
> altered data back to the file, and the file contains half altered data
> and half original data.  Then your program crashes.  What are you left
> with?

Hmmm...I guess you are left with a file that you can examine and then
locate the spot where you should begin gsub'ing again.

You may however run into a problem if the program crashes in the middle
of a line.   Input and output to/from files is buffered to make things
more efficient because repeatedly accessing files is relatively slow.
For instance, when you tell your program to write a line to a file, it
doesn't actually do that right away.  Instead, it stores the line in a
buffer.  Then when the buffer fills up, the contents of the buffer get
written to the file in one big chunk.  That cuts down on the number of
file accesses.  The same thing happens when you read from a file.  You
may tell your program to read one line from a file, but your program
will ignore you.  Instead, your program will read a chunk of the file
and store it in a buffer.  Then if you request more lines from the file,
your program will retrieve them from the buffer.  That cuts down on the
number of times your program has to access the file.

As a result, I think even though you may write one line at a time to the
file, it's possible the buffer may get written to the file where the
last thing in the buffer is half a line.  Then if your program crashes
you are going to have a corrupted line in your file.
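
The write buffering described above can be observed with a small
experiment.  This is a sketch only: the method name
buffered_write_sizes is hypothetical, and the size seen before the
flush depends on the Ruby implementation and its buffer size (for a
short line on MRI I would expect 0, but that is not guaranteed).

```ruby
require "tmpdir"

# Write one short line and report the on-disk size before and after flushing.
def buffered_write_sizes(path)
  File.open(path, "w") do |file|
    file.write("one line\n")
    before = File.size(path) # the line is probably still in Ruby's buffer
    file.flush               # hand the buffered data to the operating system
    file.fsync               # ask the OS to write it to the storage device
                             # (raises NotImplementedError where unsupported)
    after = File.size(path)
    [before, after]
  end
end

Dir.mktmpdir do |dir|
  before, after = buffered_write_sizes(File.join(dir, "buffered.txt"))
  puts "before flush: #{before} bytes, after flush: #{after} bytes"
end
```

Setting file.sync = true makes Ruby skip its own buffer for that file,
at the cost of one write call per line.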
Gregory Brown (Guest)
on 2009-05-02 21:37
(Received via mailing list)
On Sat, May 2, 2009 at 4:05 AM, 7stud -- <bbxx789_05ss@yahoo.com> wrote:

> existing file name, and it provides a shortcut.
Thanks for going into the detail here.  I threw the atomic save
solution out there, but didn't give much background, and this
establishes a much better motivation for it.

-greg