Open file, get first line, delete first line close file

indigoi · August 23, 2008, 9:25pm

Hey, I’m trying to open a file, get the first line of the file, delete
that line from the file, and then close the file. Using Ruby 1.8.6.

I’ve tried using

The_File = IO.readlines("public/languages/chinese/practice.txt")

and

The_File = open('public/languages/chinese/practice.txt', 'r+')

but I can’t figure out the correct syntax for deleting a line in a
file and then saving that file.

indigoi · August 24, 2008, 1:36am

That is one option… The reason I need to be able to delete lines is
that I am running a rake task on a server where I cannot easily modify
files, and I can’t predict when the rake task will time out. The
task takes entries from text-based dictionaries and adds them to my DB.
The thing is, if I didn’t delete the lines I’ve already added, every time
I ran the task (it has to be run multiple times due to timeouts) I
would only re-add the same lines. The deletion acts as a placeholder of
sorts. I’ll play around with your suggestion, and I’ll let you know. If
anyone else has an alternate method… I’m all ears.

indigoi · August 24, 2008, 1:27am

Richard S. wrote:

Hey, i’m trying to open a file, get the first line of the file, delete
that line from the file, and then close the file. Using ruby 1.8.6.

I’ve tried using

The_File = IO.readlines("public/languages/chinese/practice.txt")

and

The_File = open('public/languages/chinese/practice.txt', 'r+')

but i can’t figure out the correct syntax for how to delete a line in a
file, and then save that file.

Simplest thing is to get it into memory like you did with readlines
above and write out the altered contents to a new file and then move it.
(I prefer File.readlines - although this is the same method either way).
Large files might require different treatment rather than slurping into
memory like that.
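
A minimal sketch of the read, rewrite, and move approach described above. The method name shift_line_via_tempfile and the ".tmp" suffix are made up for illustration, and the whole file is held in memory, so this only suits files of modest size.

```ruby
require "fileutils"

# Remove the first line of the file at `path` and return it
# (nil if the file is empty). Hypothetical helper, not from the thread.
def shift_line_via_tempfile(path)
  lines = File.readlines(path)   # slurps every line into memory
  first = lines.shift            # take the first line off the array
  tmp = "#{path}.tmp"            # assumed temp-file name next to the original
  File.open(tmp, "w") { |out| out.write(lines.join) }
  FileUtils.mv(tmp, path)        # replace the original only after the write finished
  first
end
```

Writing to a separate file and moving it into place means the original stays intact if the process dies partway through the write.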

That being said, I’d be curious to hear how people use r+ mode.

Daniel

indigoi · August 24, 2008, 1:43am

Richard S. wrote:

That is one option…the reason i need to be able to delete lines, is
that i am running a rake task on a server, that i cannot easily modify
files, and I can’t predict how long the rake task will time out. The
task takes entries from text based dictionaries and adds it to my DB,
the thing is, if i didn’t delete the lines i’ve already added, everytime
i ran the task, (it has to be run multiple times due to timeouts) i
would only re-add the same lines. The deletion acts as a place holder of
sorts. I’ll play around with your suggestion, and i’ll let you know. If
anyone else has an alternate method…i’m all ears

It’s gross, but it works:

active_dictionary = File.readlines("public/languages/chinese/practice.txt")

open('public/languages/chinese/practice.txt', 'w') do |file|
  file.puts active_dictionary[1, active_dictionary.size]
end

Once again, I’m interested in different approaches, or something not
quite so processor intensive.

indigoi · August 24, 2008, 10:19pm

Richard S. wrote:

but i can’t figure out the correct syntax for how to delete a line in a
file, and then save that file.

tail -n +2 the_file > the_new_file

Not all problems are best solved with ruby

indigoi · August 24, 2008, 12:13pm

Richard S. wrote:
…

Its gross but it works

I don’t know if it’s all that gross. I have a feeling this is the
standard way to do it for apps and editors, i.e. write out the entire
altered content after working with the file in memory using whatever
scheme. Different story if you’re a database, I guess.

Here is one scheme some text editors use:
Gap buffer - Wikipedia
although it doesn’t discuss file system/persistence issues.
Presumably when you hit the save button, the system writes
out to a new file (the gap buffer is not playing around with
the old file stream).

If you’re just replacing stuff character for character, then
it seems ok to use the file stream (in r+ mode) or if you’re
appending (or both); but deleting or inserting content seems
problematic - not sure it’s possible let alone standardized.
Anyone want to weigh in here?

Once again, i’m interested in different approaches, or something, not
quite so processor intensive

If the file is really large, you can perhaps just move through the
stream till you get to the point where you want to start
then commence writing from the old stream to the new file stream.
May be ways to optimise it.

Sparse files and fixed line lengths ?

Maybe I’ve said enough wrong things to provoke a reaction
from someone else.

Daniel

indigoi · August 24, 2008, 10:34pm

Erik H. wrote:

Not all problems are best solved with ruby

And sometimes the problems are not with the program but with the data
structure – maybe a flat file isn’t the right way to do things?

And… it’s a lot easier to delete the last line of a file than the
first.

Just some thoughts.

indigoi · August 25, 2008, 2:08pm

On Aug 25, 2008, at 4:07 AM, Erik H. wrote:

Dave B. wrote:

And… it’s a lot easier to delete the last line of a file than the
first.

I don’t think this is actually true, can you explain further?

-Erik

You can just truncate the file size. You don’t have any subsequent
lines (bytes) to move into a new position within the file.

-Rob

Rob B. http://agileconsultingllc.com

indigoi · August 25, 2008, 3:25pm

Rob B. wrote:

On Aug 25, 2008, at 4:07 AM, Erik H. wrote:

Dave B. wrote:

And… it’s a lot easier to delete the last line of a file than the
first.

I don’t think this is actually true, can you explain further?

-Erik

You can just truncate the file size. You don’t have any subsequent
lines (bytes) to move into a new position within the file.

How do you know which line is the last line?

Unless there’s something I don’t know, that involves reading the whole
file, or a combination of seek/read from the end until you find the last
newline, which is essentially what tail +2 does, but starts at the
beginning of the file.

“Moving” data in a file is the worst possible scenario for I/O at all.
You can do both of these operations in a single pass read of the file
without shoving the whole thing into memory at once. It just involves
writing to one file and reading from another, is all.
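
A sketch of that single-pass idea, reading from one file and writing to another one line at a time so that only the current line is held in memory. The method name and the ".tmp" suffix are assumptions for illustration.

```ruby
require "fileutils"

# Copy everything except the first line of `path` into a temp file,
# then move the temp file over the original. Returns the removed line.
def drop_first_line_streaming(path)
  tmp = "#{path}.tmp"            # assumed temp-file name
  first = nil
  File.open(path, "r") do |src|
    File.open(tmp, "w") do |dst|
      first = src.gets           # consume the first line without copying it
      while (line = src.gets)    # gets returns nil at end of file
        dst.write(line)
      end
    end
  end
  FileUtils.mv(tmp, path)
  first
end
```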

indigoi · August 25, 2008, 3:51pm

On Aug 25, 2008, at 9:21 AM, Erik H. wrote:

-Erik
newline, which is essentially what tail +2 does, but starts at the
beginning of the file.

“Moving” data in a file is the worst possible scenario for I/O at all.
You can do both of these operations in a single pass read of the file
without shoving the whole thing into memory at once. It just involves
writing to one file and reading from another, is all.

Well, if you want/need the last line(s) of a file (presumably text or
how would you define a “line”), you can take a look at the File::Tail
gem.

gem install file-tail

I had some Perl code (lifted from some forum or article) that would
cut initial lines out of a log file using sysread/syswrite with a
truncate to reset the end-of-file. I don’t recall if it used a single
file descriptor or two separate ones, but the idea is the same – move
bytes “backward” across the gap that you want to eliminate. I agree
with your “worst possible scenario for I/O” assessment.
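
For comparison, here is a rough Ruby sketch of the in-place technique described above: open the file in r+ mode, copy the bytes after the first line "backward" to the start of the file in chunks, then truncate. This is not the Perl code mentioned, only an illustration of the same idea; the method name and chunk size are assumptions.

```ruby
# Remove the first line of `path` in place, using a single r+ file handle.
# Returns the removed line, or nil if the file is empty.
def shift_line_in_place(path, chunk_size = 64 * 1024)
  File.open(path, "r+b") do |f|
    first = f.gets
    return nil if first.nil?
    read_pos = f.pos             # first byte after the removed line
    write_pos = 0                # where the surviving bytes should land
    loop do
      f.seek(read_pos)
      chunk = f.read(chunk_size) # nil once we are at end of file
      break if chunk.nil?
      f.seek(write_pos)
      f.write(chunk)
      read_pos += chunk.bytesize
      write_pos += chunk.bytesize
    end
    f.flush
    f.truncate(write_pos)        # cut off the stale bytes left at the end
    first
  end
end
```

Unlike the temp-file approaches, this is not safe against interruption: if the process is killed in the middle of the copy, the file is left partly shifted.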

-Rob

Rob B. http://agileconsultingllc.com

indigoi · August 25, 2008, 10:12am

Dave B. wrote:

And… it’s a lot easier to delete the last line of a file than the
first.

I don’t think this is actually true, can you explain further?

-Erik

indigoi · August 25, 2008, 4:00pm

Rob B. wrote:

I had some Perl code (lifted from some forum or article) that would
cut initial lines out of a log file using sysread/syswrite with a
truncate to reset the end-of-file. I don’t recall if it used a single
file descriptor or two separate ones, but the idea is the same – move
bytes “backward” across the gap that you want to eliminate. I agree
with your “worst possible scenario for I/O” assessment.

You are talking about tail -f. This is different.

(And if you ever need to find that again, perldoc -q tail).

-Erik

indigoi · August 25, 2008, 6:18pm

Erik H. wrote:

Dave B. wrote:

And… it’s a lot easier to delete the last line of a file than the
first.

I don’t think this is actually true, can you explain further?

My Ruby isn’t up to coding it, but in principle I’d seek to the end of
the file, then backtrack until I found the appropriate newline. Then I’d
truncate the file.
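
A sketch of that approach in Ruby, assuming "\n" line endings and a reasonably modern Ruby. The method name pop_last_line is made up for illustration, and the backward scan reads one byte at a time for clarity rather than speed.

```ruby
# Remove the last line of `path` in place and return it
# (nil if the file is empty).
def pop_last_line(path)
  File.open(path, "r+b") do |f|
    size = File.size(path)
    return nil if size.zero?
    pos = size - 1
    f.seek(pos)
    pos -= 1 if f.read(1) == "\n" # ignore the newline that ends the last line
    while pos >= 0                # walk backward to the previous newline
      f.seek(pos)
      break if f.read(1) == "\n"
      pos -= 1
    end
    start = pos + 1               # first byte of the last line
    f.seek(start)
    last = f.read
    f.truncate(start)             # nothing after this point needs to move
    last
  end
end
```

Only the tail of the file is touched, which is why removing the last line is cheaper than removing the first.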

indigoi · August 25, 2008, 8:57pm

the thing is, if i didn’t delete the lines i’ve already added, everytime
i ran the task, (it has to be run multiple times due to timeouts) i
would only re-add the same lines. The deletion acts as a place holder of
sorts. I’ll play around with your suggestion, and i’ll let you know. If
anyone else has an alternate method…i’m all ears

I suggest exploring a distributed worker system like Rinda,
Backgroundrb, AP4R, or Sparrow. You can prepare a master list of
dictionary words, and then worker processes can take one at a time and
add them to your database. Having a timeout won’t slow things down,
nor will it cause you to have to re-read your wordlist.

indigoi · August 26, 2008, 12:20am

I’ve set it up with a rake task and a cron job that re-runs every 5 hours
(my timeout window).

This is the code that I’ve already uploaded and is currently
running:

namespace :chinese do
  desc "adds all chinese files to database"
  task :create => :environment do
    active_dictionary = File.readlines("public/languages/chinese/practice.txt")
    count = 0
    for @element in active_dictionary
      count += 1
      process_chinese
      # Rewrite the file with only the lines that have not been processed yet
      open('public/languages/chinese/practice.txt', 'w') do |file|
        file.puts active_dictionary[count, active_dictionary.size]
      end
    end
  end
end

where process_chinese contains all my proprietary code. I played around
with only writing to the file after every ten processed entries, but
that cut my processing time by only a trivial amount, so I just
let it rewrite the file after every line. As we’ve seen here, there
are quite a few ways this can be accomplished. This one ended up
working and didn’t kill my processor (the writes to the DB are
far more expensive than opening and writing to this file).

Thanks for the help!!
Richard

indigoi · August 27, 2008, 1:59am

Erik H. wrote:

Richard S. wrote:
If the file is exceptionally large, you can save a lot of memory (and
processing time, likely), by doing something like this:

require 'fileutils'

File.open("my_file") do |f|
  f.readline                            # skip the first line
  File.open("my_file.tmp", 'w') do |f2|
    f2 << f.read                        # copy the rest
  end
end

FileUtils.mv("my_file.tmp", "my_file")

Just on the “f2 << f.read” part, isn’t this still reading the rest of
the file into ruby?
I was thinking more of reading stuff into a fixed buffer and then
writing it, i.e.

while buf = f.read(32000) # bytes
  f2.write buf # or f2 << buf
end

which will result in a bazillion more calls to IO#read and IO#write on a
large file but doesn’t read the whole thing into memory. I’m not
recommending this or anything - just wanted to clarify.

Daniel

indigoi · August 26, 2008, 12:32am

Richard S. wrote:

i this is the code that i’ve already uploaded, and is currently
running…

If the file is exceptionally large, you can save a lot of memory (and
processing time, likely), by doing something like this:

require 'fileutils'

File.open("my_file") do |f|
  f.readline                            # skip the first line
  File.open("my_file.tmp", 'w') do |f2|
    f2 << f.read                        # copy the rest
  end
end

FileUtils.mv("my_file.tmp", "my_file")

The point here is that almost all the work is done on the file
descriptors instead of in memory. I don’t know if ruby has a sendfile()
implementation, but that would be the most ideal, as it’d instruct the
OS to do the copy.
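
On the question of letting the runtime or OS do the copy: Ruby 1.9 and later (so not the 1.8.6 used in this thread) provide IO.copy_stream, which copies from the source's current position to end of file without building the whole remainder as a Ruby string. Whether it ends up using sendfile() or a similar system call depends on the platform and Ruby build, so treat that part as an assumption. The method name below is made up for illustration.

```ruby
require "fileutils"

# Drop the first line of `path` by copying the rest into a temp file
# with IO.copy_stream, then moving the temp file over the original.
def drop_first_line_copy_stream(path)
  tmp = "#{path}.tmp"            # assumed temp-file name
  first = nil
  File.open(path, "rb") do |src|
    first = src.gets             # advance past the first line
    File.open(tmp, "wb") do |dst|
      IO.copy_stream(src, dst)   # copies from the current position to EOF
    end
  end
  FileUtils.mv(tmp, path)
  first
end
```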

indigoi · July 12, 2016, 7:15pm

Hello Richard, this worked perfectly. Can I get the code for deleting
the first letter of each row in the same file?
