Read and re-write file with one open?

adambeazley · April 30, 2009, 6:28am

I would like to write a Ruby script that opens a text file, performs a gsub
on each line, and then overwrites the file with the updated contents. Right
now I open the file twice: once to read and once to write. The reason for
this is that if I try to perform both operations on the same IO object by
calling io.rewind and writing from the beginning, and the substituted word
is shorter than what it is replacing, some portion of the end of the
original file remains. Is there an idiom for “clearing” the contents of the
file before writing?

Thanks,

Adam

adambeazley · April 30, 2009, 6:56am

Instead of using the rewind method, you can use the reopen method to
open the same file in write mode.
Example:
file = File.open("filename", "r")
file.reopen("filename", "w")

adambeazley · April 30, 2009, 7:10am

On Thu, Apr 30, 2009 at 12:27 AM, Adam B. [email protected] wrote:

I would like to write a Ruby script that opens a text file, performs a gsub
on each line, and then overwrites the file with the updated contents. Right
now I open the file twice: once to read and once to write. The reason for
this is that if I try to perform both operations on the same IO object by
calling io.rewind and writing from the beginning, and the substituted word
is shorter than what it is replacing, some portion of the end of the
original file remains. Is there an idiom for “clearing” the contents of the
file before writing?

Yes, but typically this is done by creating a new file and then
renaming it to replace the old one.

Here’s an example from my upcoming book “Ruby Best Practices”[0]. It
naively strips comments from source files.
You can modify it to fit your needs.

require "tempfile"
require "fileutils"

temp = Tempfile.new("without_comments")
File.foreach(ARGV[0]) do |line|
  temp << line unless line =~ /^\s*#/
end
temp.close

FileUtils.cp(ARGV[0], "#{ARGV[0]}.bak") # comment out if you don't want backups.
FileUtils.mv(temp.path, ARGV[0])

-greg

PS: sorry for the shameless book plug, but it really might be helpful
for questions like these.

[0] http://rubybestpractices.com

adambeazley · April 30, 2009, 7:26am

Adam B. wrote:

I would like to write a Ruby script that opens a text file, performs a gsub
on each line, and then overwrites the file with the updated contents. Right
now I open the file twice: once to read and once to write. The reason for
this is that if I try to perform both operations on the same IO object by
calling io.rewind and writing from the beginning, and the substituted word
is shorter than what it is replacing, some portion of the end of the
original file remains. Is there an idiom for “clearing” the contents of the
file before writing?

Yes, opening the file in write mode! However, you are going to expose
yourself to this catastrophe. Suppose you read the contents of the
file into a variable, then open the file for writing, which then erases
the file, but immediately thereafter your program crashes or the power
goes out in your city. What are you left with? You will be left with
an empty file, and the variable that contained the contents of the file
will have evaporated into the ether. In other words, you will lose all
your data!

So the idiom for rewriting a file is:

1. Open the file for reading.
2. Open another file for writing with a name like origName-edited.txt.
3. Read the original file line by line (saves memory, but is slower).
4. Write each altered line to the file origName-edited.txt.
5. Delete the original file.
6. Change the name of the new file (origName-edited.txt) to origName.txt.
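
The steps above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name rewrite_file is hypothetical, and it uses Tempfile for the second file instead of a fixed "-edited" name. It also assumes a POSIX-style rename, where File.rename replaces the destination in one step, so the separate delete is folded into the rename.

```ruby
require "tempfile"

# Hypothetical helper (not from the thread): stream each line of `path`
# through the block into a temporary file, then rename it over the original.
def rewrite_file(path)
  # Create the temp file next to the original so the rename stays on one
  # filesystem.
  temp = Tempfile.new(File.basename(path), File.dirname(path))
  begin
    File.foreach(path) { |line| temp << yield(line) }
    temp.close
    # The original stays untouched until this point; rename swaps it out.
    File.rename(temp.path, path)
  ensure
    # Close and remove the temp file if it is still around (for example
    # when the block raised before the rename happened).
    temp.close!
  end
end
```

A call such as rewrite_file("notes.txt") { |line| line.gsub("foo", "bar") } would then replace the file only after every line has been written. One caveat: Tempfile creates its files with restrictive permissions, so the replaced file may not keep the original file's mode.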

adambeazley · April 30, 2009, 9:54am

Adam B. wrote:

On Thu, Apr 30, 2009 at 1:26 AM, 7stud – [email protected]
wrote:

Read the original file line by line (saves memory, but is slower)

Write each altered line to the file origName-edited.txt

Delete the original file.

Change the name of the new file (origName-edited.txt) to origName.txt

I see the potential for catastrophe with the way I suggested, however, there
is potential for catastrophe here if there is already an
“origName-edited.txt” file (I know, slim chance, but you never know).

Of course, if that was a possibility then you would take extra measures
like creating a new file name with rand, and then checking it with
File.exist?, which is probably what Tempfile does.

Does Tempfile guarantee that it won’t overwrite an
existing file?

What, the standard library docs aren’t clear enough for you:

tempfile - manipulates temporary files

???!! lol. pathetic. But once in a great while you can actually find
some information on a standard library module using Google:

http://www.rubytips.org/2008/01/11/using-temporary-files-in-ruby-tempfilenew/

adambeazley · April 30, 2009, 11:08am

2009/4/30 Gregory B. [email protected]:

Yes, but typically this is done by creating a new file and then
renaming it to replace the old one.

Here’s an example from my upcoming book “Ruby Best Practices”[0]. It
naively strips comments from source files.

A variant exploiting Ruby’s command line parameters:

11:05:08 Temp$ ruby -e '10.times {|i| puts i}' >| x
11:05:22 Temp$ cat x
0
1
2
3
4
5
6
7
8
9
11:05:23 Temp$ ./x.rb x
11:05:29 Temp$ cat x
<<<0>>>
<<<1>>>
<<<2>>>
<<<3>>>
<<<4>>>
<<<5>>>
<<<6>>>
<<<7>>>
<<<8>>>
<<<9>>>
11:05:34 Temp$ cat x.bak
0
1
2
3
4
5
6
7
8
9
11:05:37 Temp$ cat x.rb
#!/opt/bin/ruby19 -pi.bak

$_.sub! /^/, '<<<'
$_.sub! /$/, '>>>'
11:05:40 Temp$

Kind regards

robert

adambeazley · April 30, 2009, 4:02pm

On Thu, Apr 30, 2009 at 3:11 AM, Adam B. [email protected] wrote:

I see the potential for catastrophe with the way I suggested, however, there
is potential for catastrophe here if there is already an
“origName-edited.txt” file (I know, slim chance, but you never know). You
could get around this by generating new file names until you found one that
didn’t exist, or writing to /tmp, of course. I think I’ll switch to Greg
Brown’s suggestion. Does Tempfile guarantee that it won’t overwrite an
existing file?

Yes, Tempfile avoids file collisions.

-greg
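
A small sketch of what “avoids file collisions” means in practice. The second half illustrates the general exclusive-create idea with File::EXCL; as far as I know Tempfile relies on something similar, but treat that as an assumption rather than a description of its internals.

```ruby
require "tempfile"

# Two temp files requested with the same basename get distinct paths.
first  = Tempfile.new("without_comments")
second = Tempfile.new("without_comments")
distinct_paths = first.path != second.path

# The general idea: with File::EXCL, creation fails if the name is already
# taken, instead of silently overwriting the existing file.
collision_detected =
  begin
    File.open(first.path, File::WRONLY | File::CREAT | File::EXCL) { }
    false
  rescue Errno::EEXIST
    true
  end

# Clean up both temp files.
first.close!
second.close!
```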

adambeazley · April 30, 2009, 9:12am

On Thu, Apr 30, 2009 at 1:26 AM, 7stud – [email protected]
wrote:

Read the original file line by line (saves memory, but is slower)

Write each altered line to the file origName-edited.txt

Delete the original file.

Change the name of the new file (origName-edited.txt) to origName.txt

I see the potential for catastrophe with the way I suggested, however, there
is potential for catastrophe here if there is already an
“origName-edited.txt” file (I know, slim chance, but you never know). You
could get around this by generating new file names until you found one that
didn’t exist, or writing to /tmp, of course. I think I’ll switch to Greg
Brown’s suggestion. Does Tempfile guarantee that it won’t overwrite an
existing file?

Adam

adambeazley · April 30, 2009, 4:54pm

On Thu, Apr 30, 2009 at 10:41 AM, James D. [email protected]
wrote:

What about if another program opens the file after it has been read by
the Ruby program but before the Ruby program has copied the temp file?
If the second program makes a change and saves it, those changes will be
lost when the Ruby program copies the temp file over it. Or if the
second program still has it open and the Ruby program finishes, then
when the second program saves its open file, it will overwrite the Ruby
program’s changes.

Is this related to the OP’s concerns? In this case, you’d need file
locking (see the Ruby API).
But I didn’t see any mention of these sorts of issues in Adam’s original
post.

If you need this feature, read the API docs for File#flock

-greg
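
For anyone who does need locking, here is a minimal sketch of File#flock around an in-place rewrite. The helper name locked_gsub is hypothetical. Note that flock is advisory: it only keeps out other programs that also ask for the lock, and because the lock is tied to the open file it fits an in-place rewrite better than the rename approach.

```ruby
# Hypothetical helper: hold an exclusive advisory lock while rewriting a
# file in place. Only cooperating programs that also call flock are kept out.
def locked_gsub(path, pattern, replacement)
  File.open(path, "r+") do |file|
    file.flock(File::LOCK_EX)   # blocks until the lock is granted
    updated = file.read.gsub(pattern, replacement)
    file.rewind
    file.write(updated)
    file.truncate(file.pos)     # drop leftover bytes if the text shrank
  end                           # closing the file releases the lock
end
```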

adambeazley · April 30, 2009, 4:41pm

Gregory B. wrote:

Here’s an example from my upcoming book “Ruby Best Practices”[0]. It
naively strips comments from source files.
You can modify it to fit your needs.

require "tempfile"
require "fileutils"

temp = Tempfile.new("without_comments")
File.foreach(ARGV[0]) do |line|
  temp << line unless line =~ /^\s*#/
end
temp.close

FileUtils.cp(ARGV[0], "#{ARGV[0]}.bak") # comment out if you don't want backups.
FileUtils.mv(temp.path, ARGV[0])

-greg

What about if another program opens the file after it has been read by
the Ruby program but before the Ruby program has copied the temp file?
If the second program makes a change and saves it, those changes will be
lost when the Ruby program copies the temp file over it. Or if the
second program still has it open and the Ruby program finishes, then
when the second program saves its open file, it will overwrite the Ruby
program’s changes.

adambeazley · May 1, 2009, 9:20pm

On Thu, Apr 30, 2009 at 1:24 PM, 7stud – [email protected]
wrote:

What, the standard library docs aren’t clear enough for you:

tempfile - manipulates temporary files

???!! lol. pathetic.

Don’t just scoff, send in a patch!

martin

adambeazley · May 1, 2009, 9:18pm

On Apr 29, 9:27 pm, Adam B. [email protected] wrote:

Thanks,

Adam

I think File objects have a truncate method you can use after reading
(f.truncate(0) #the file is now blank) to delete the contents.

file.truncate(integer) => 0

Truncates file to at most integer bytes. The file must be opened for
writing. Not available on all platforms.

f = File.new("out", "w")
f.syswrite("1234567890") #=> 10
f.truncate(5)            #=> 0
f.close()                #=> nil
File.size("out")         #=> 5

adambeazley · May 1, 2009, 10:33pm

timr wrote:

On Apr 29, 9:27 pm, Adam B. [email protected] wrote:

Truncates file to at most integer bytes. The file must be opened for
writing. Not available on all platforms.

f = File.new("out", "w")
f.syswrite("1234567890") #=> 10
f.truncate(5)            #=> 0
f.close()                #=> nil
File.size("out")         #=> 5

That’s pretty close to how I’ve been modifying files:

File.open(filename, 'r+') do |file|
  lines = file.readlines

  # modify data in the lines array

  file.pos = 0
  # write the joined lines; on Ruby 1.9+ "file.print lines" would write
  # the array's inspect form instead
  file.write lines.join
  file.truncate(file.pos)
end

This opens a file for reading and writing, reads the file into an array,
then you modify the array how you need to. Then the block returns to
the beginning of the file, writes out your changes over the existing
file, then chops off what’s left. Then, of course, the file is closed
when the block exits.
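
To make the “chops off what’s left” step concrete, here is a small sketch that reproduces the problem from the original question and shows what the truncate call changes. The helper name shrink_in_place and the sample text are made up for the illustration.

```ruby
require "tempfile"

# Rewrite `path` in place, replacing "original" with the shorter "new".
# The `truncate` keyword toggles the final File#truncate call.
def shrink_in_place(path, truncate:)
  File.open(path, "r+") do |file|
    text = file.read.gsub("original", "new")
    file.pos = 0
    file.write(text)
    file.truncate(file.pos) if truncate
  end
  File.read(path)
end

# A scratch file to experiment on.
scratch = Tempfile.new("truncate_demo")
scratch.close

File.write(scratch.path, "original text\n")
without_truncate = shrink_in_place(scratch.path, truncate: false)
# => "new text\ntext\n"   (the tail of the old contents is still there)

File.write(scratch.path, "original text\n")
with_truncate = shrink_in_place(scratch.path, truncate: true)
# => "new text\n"

scratch.unlink
```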

To get a little more complicated, I wrapped this in a class method:

class File

  # Method to make it easy to open a file and make changes to it.
  # Usage example:
  #   File.change!(myfile) do |contents|
  #     contents.each { |line| line.gsub!(/this/, "that") }
  #   end
  # I can also use "throw(:nochanges)" anywhere in the block
  # to prevent the file from being written.
  # Make sure 'contents' does not get pointed to a new object;
  # for example, an assignment "contents = 'new data'" will break the method.
  def self.change!(filename, create = false)
    # if create is true, create the file if it does not exist
    if create == true
      File.open(filename, 'w') { |blank| blank.write '' } unless File.exist?(filename)
    end

    # read the file, execute a block, then write the file
    if File.exist?(filename)
      File.open(filename, 'r+') do |file|
        lines = file.readlines
        # do not write the file if it did not change (block must "throw :nochanges")
        catch(:nochanges) do
          yield lines
          file.pos = 0
          file.write lines.join # join first; "file.print lines" writes inspect output on Ruby 1.9+
          file.truncate(file.pos)
        end
      end
    end
  end
end

adambeazley · May 2, 2009, 1:49am

On Fri, May 1, 2009 at 4:33 PM, James D. [email protected] wrote:

end

This opens a file for reading and writing, reads the file into an array,
then you modify the array how you need to. Then the block returns to
the beginning of the file, writes out your changes over the existing
file, then chops off what’s left. Then, of course, the file is closed
when the block exits.

This is exactly what I wanted. Thanks everyone.

Adam

adambeazley · May 1, 2009, 10:37pm

Aw, word wrapping broke some of my lines…

these should be single lines:

for example, an assignment “contents = ‘new data’” will break the method

File.open(filename, 'w') { |blank| blank.write '' } unless File.exist?(filename)

do not write the file if it did not change (block must “throw :nochanges”)

adambeazley · May 2, 2009, 10:05am

Adam B. wrote:

On Fri, May 1, 2009 at 4:33 PM, James D. [email protected] wrote:

end

This opens a file for reading and writing, reads the file into an array,
then you modify the array how you need to. Then the block returns to
the beginning of the file, writes out your changes over the existing
file, then chops off what’s left. Then, of course, the file is closed
when the block exits.

This is exactly what I wanted. Thanks everyone.

Adam

I think you guys are missing the point. There are lots of ways to
rewrite a file that ‘work’. For instance, simply reading a file into an
array, closing the file, then opening the file for writing (which erases
the file), and then writing the altered lines back to the file ‘works’.
You can test it yourself and see that it works. You can rewrite the
file like that 1,000 times and it will ‘work’.

However, if your data is important you need to ask yourself the
question: what happens if my program crashes while I am writing the data
back out to the file?

So let’s ask that question about the solution you’ve decided to adopt.
Suppose your program is at the point where it has written half of the
altered data back to the file, and the file contains half altered data
and half original data. Then your program crashes. What are you left
with?

The “rewriting a file” issue has been hashed out by many programmers for
decades. You can either try to come up with your own screwy method, or
you can adopt an accepted idiom. Within the accepted idiom, modules
like Tempfile were created to deal with the problem of overwriting an
existing file name, and they provide a shortcut.

truncate: Not available on all platforms.

That should also be a red flag. Generally, you should strive to write
cross platform programs.

adambeazley · May 2, 2009, 9:37pm

On Sat, May 2, 2009 at 4:05 AM, 7stud – [email protected] wrote:

existing file name, and it provides a shortcut.

Thanks for going into the detail here. I threw the atomic save
solution out there, but didn’t give much background, and this
establishes a much better motivation for it.

-greg

adambeazley · May 2, 2009, 10:29am

7stud – wrote:

Adam B. wrote:

On Fri, May 1, 2009 at 4:33 PM, James D. [email protected] wrote:

end

This opens a file for reading and writing, reads the file into an array,
then you modify the array how you need to. Then the block returns to
the beginning of the file, writes out your changes over the existing
file, then chops off what’s left. Then, of course, the file is closed
when the block exits.

This is exactly what I wanted. Thanks everyone.

Adam

I think you guys are missing the point. There are lots of ways to
rewrite a file that ‘work’. For instance, simply reading a file into an
array, closing the file, then opening the file for writing (which erases
the file), and then writing the altered lines back to the file ‘works’.
You can test it yourself and see that it works. You can rewrite the
file like that 1,000 times and it will ‘work’.

However, if your data is important you need to ask yourself the
question: what happens if my program crashes while I am writing the data
back out to the file?

So let’s ask that question about the solution you’ve decided to adopt.
Suppose your program is at the point where it has written half of the
altered data back to the file, and the file contains half altered data
and half original data. Then your program crashes. What are you left
with?

Hmmm…I guess you are left with a file that you can examine and then
locate the spot where you should begin gsub’ing again.

You may however run into a problem if the program crashes in the middle
of a line. Input and output to/from files is buffered to make things
more efficient because repeatedly accessing files is relatively slow.
For instance, when you tell your program to write a line to a file, it
doesn’t actually do that. Instead, programs store the line in a buffer.
Then when the buffer fills up, the contents of the buffer get written to
the file in one big chunk. That cuts down on the number of file
accesses. The same thing happens when you read from a file. You may
tell your program to read one line from a file, but your program will
ignore you. Instead, your program will read a chunk of the file and
store it in a buffer. Then if you request more lines from the file,
your program will retrieve them from the buffer. That cuts down on the
number of times your program has to access the file.

As a result, I think even though you may write one line at a time to the
file, it’s possible the buffer may get written to the file where the
last thing in the buffer is half a line. Then if your program crashes
you are going to have a corrupted line in your file.
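
A short sketch of the calls Ruby offers for controlling that buffering. The file name and contents are made up for the example. Flushing narrows the window in which data sits only in memory, but it does not make a multi-line rewrite atomic, so a crash between two writes can still leave a half-updated file.

```ruby
require "tempfile"

# A scratch file to write to.
scratch = Tempfile.new("buffer_demo")
scratch.close

File.open(scratch.path, "w") do |file|
  file.write("first line\n")   # typically held in Ruby's write buffer for now
  file.flush                   # hand the buffered bytes to the operating system
  file.fsync                   # ask the OS to commit them to the storage device
  file.sync = true             # from here on, writes skip Ruby's own buffer
  file.write("second line\n")
end

bytes_written = File.size(scratch.path)
scratch.unlink
```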