Get first and last line from txt file - how?

mmcolli00 · December 20, 2008, 3:24pm

I have a txt file containing only date/time stamps. I want to grab the first
date/time and the last date/time. For instance, from the queued.txt below I
need 8/09/08 3:00 and 8/24/08 3:00. Do you know how I can pull these out?
Thanks in advance.

queued.txt
8/09/08 3:00
8/10/08 5:00
8/23/08 22:00
8/24/08 3:00

firstDate = ""
lastDate = ""

File.open('queued.txt', 'r') do |f1|
  while (line = f1.gets)
    if f1.lineno == 1 # <- this would only give me 8/09/08 3:00
      firstDate = line
    end
  end
end

mmcolli00 · December 20, 2008, 3:39pm

lines = IO.readlines("queued.txt")
first = lines.first
last = lines.last

puts first
puts last
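IO.readlines loads the whole file into an array. If the file might get large, a streaming variant is one option; the following is a minimal sketch, assuming modern Ruby (2.3+ for `&.`). The helper name `first_and_last` is hypothetical, not from the thread. It still reads every line from disk, but only keeps the current one in memory.

```ruby
# Hypothetical helper (not from the thread): returns [first, last] with line
# endings removed, or [nil, nil] for an empty file. File.foreach yields one
# line at a time, so only the current line is held in memory, although every
# line is still read from disk.
def first_and_last(path)
  first = nil
  last = nil
  File.foreach(path) do |line|
    first ||= line # remember only the very first line
    last = line    # overwritten until the final line
  end
  [first&.chomp, last&.chomp]
end

if File.exist?("queued.txt")
  first_date, last_date = first_and_last("queued.txt")
  puts first_date # 8/09/08 3:00 for the sample file above
  puts last_date  # 8/24/08 3:00 for the sample file above
end
```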

mmcolli00 · December 20, 2008, 4:44pm

I’m just wondering…
Let’s say that we only need to read the last line. Can we do that without
reading the other lines?

Regards,
Yaser S.

mmcolli00 · December 20, 2008, 5:02pm

Yaser S. wrote:

I’m just wondering…
Let’s say that we only need to read the last line. Can we do that without
reading the other lines?

Regards,
Yaser S.

It would work the same? Or do you mean without loading up the entire
file?

lines = IO.readlines("foo.bar")

puts lines.last

mmcolli00 · December 20, 2008, 5:23pm

Yaser S. wrote:

On Sat, Dec 20, 2008 at 6:54 PM, Ch Ba wrote:

It would work the same? Or do you mean without loading up the entire
file?

Yep, that is exactly what I mean.

If you know where the last line starts (that is, the byte offset of the
first character in the last line) then you could use IO#seek to seek to
that offset and then read.

How do you know where the last line starts? When you write the file,
call IO#tell to get the current byte offset before you write the last
line.
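The IO#tell / IO#seek idea can be sketched like this, assuming you control the code that writes the file. The method names `write_lines` and `read_last_line` are hypothetical, and persisting the offset between runs (side file, database, ...) is left to the caller.

```ruby
# Hypothetical writer: writes +lines+ to +path+ and returns the byte offset
# at which the last line starts (nil if +lines+ is empty). IO#tell is called
# just before the last line is written.
def write_lines(path, lines)
  offset = nil
  File.open(path, "w") do |io|
    lines.each_with_index do |line, i|
      offset = io.tell if i == lines.size - 1
      io.puts(line)
    end
  end
  offset
end

# Hypothetical reader: seeks directly to the recorded offset and reads one
# line from there, skipping everything before it.
def read_last_line(path, offset)
  return nil if offset.nil?
  File.open(path) do |io|
    io.seek(offset, IO::SEEK_SET)
    io.gets&.chomp
  end
end
```

The catch is that this only works for files you write yourself; for arbitrary files you still have to search for the last line.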

mmcolli00 · December 20, 2008, 5:55pm

2008/12/20 Yaser S. wrote:

I’m just wondering…
Let’s say that we only need to read the last line. Can we do that without
reading the other lines?

Yes. Position your file pointer at the last byte of the file, then read
backwards, collecting each byte until you find a newline character (or the
first byte of the file). This is the last line.

-Thomas

mmcolli00 · December 20, 2008, 5:11pm

On Sat, Dec 20, 2008 at 6:54 PM, Ch Ba wrote:

It would work the same? Or do you mean without loading up the entire
file?

Yep, that is exactly what I mean.

mmcolli00 · December 21, 2008, 7:25pm

2008/12/21 Robert K. wrote:

first byte of the file). This is the last line.

You have to admit that this approach is rather inefficient.

Really?

I compared your code with my idea:

$ time ruby r.rb input
111111111111111111111111111111
999999999999999999999999999999

real 0m0.053s
user 0m0.000s
sys 0m0.004s

$ time ruby t.rb input
999999999999999999999999999999

real 0m0.043s
user 0m0.004s
sys 0m0.000s

The first result is your code, the second is mine.

I did the tests with a test file with almost 8,000,000 lines.

My q&d code:

f = File.open("input")
pos = 2
f.seek(-pos, File::SEEK_END)
c = f.getc
result = ''
while c.chr != "\n"
  result.insert(0, c.chr)
  pos += 1
  f.seek(-pos, File::SEEK_END)
  c = f.getc
end
f.close

puts result

-Thomas

mmcolli00 · December 21, 2008, 10:05pm

Yaser S. (2008-12-20) wrote:

I’m just wondering…
Let’s say that we only need to read the last line. Can we do that without
reading the other lines?

Yes, of course. It’s exactly the same problem as reading the first line.
The only difference is that there is a standard function for the first
line: gets.

For the last line you need to implement it yourself.

If I had mmap in Ruby, I’d just map the file into memory and do
mapped_file[/^.*\z/].

Regards, simon … l

mmcolli00 · December 21, 2008, 3:55pm

On 20.12.2008 17:46, Thomas P. wrote:

2008/12/20 Yaser S. wrote:

I’m just wondering…
Let’s say that we only need to read the last line. Can we do that without
reading the other lines?

Yes. Position your file pointer at the last byte of the file, then read
backwards, collecting each byte until you find a newline character (or the
first byte of the file). This is the last line.

You have to admit that this approach is rather inefficient. Here’s a
more efficient variant - especially for large files:

$ cat r.rb
#!/usr/bin/env ruby

OFFSET = 512 # > 2 * assumed avg line length

file = ARGV.shift or abort "ERROR: need a file name"

File.open file do |io|
  first = io.gets
  break unless first
  puts first

  limit = io.stat.size
  offset = OFFSET
  lines = []

  while lines.size < 2 && offset <= limit
    io.seek(-offset, IO::SEEK_END)
    lines = io.readlines
    offset += OFFSET
  end # while lines.size < 2

  puts lines.last unless lines.empty?
end

Cheers

robert

mmcolli00 · December 22, 2008, 1:05am

Brian C. (22:56) wrote:

Why re-invent the wheel?

lastline = `tail -1 queued.txt`

Because there’s not always a tool out there to do the job.

New programming languages are always reinventing wheels.

Regards, simon … l

mmcolli00 · December 21, 2008, 11:04pm

Why re-invent the wheel?

lastline = `tail -1 queued.txt`
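If you do shell out, a slightly more defensive sketch is possible. This assumes a Unix-like `tail` that accepts the POSIX spelling `tail -n 1` (equivalent to `tail -1`); the helper name `last_line_via_tail` and its pure-Ruby fallback are hypothetical additions, not from the thread.

```ruby
# Hypothetical helper: returns the last line of +path+ via the external
# `tail` program, or via pure Ruby when `tail` is not installed.
def last_line_via_tail(path)
  # An argument array bypasses the shell, so odd file names need no quoting.
  out = IO.popen(["tail", "-n", "1", path], &:read)
  raise "tail exited with #{$?.exitstatus}" unless $?.success?
  out.chomp
rescue Errno::ENOENT
  # No `tail` binary on this system: read the file once, keep the last line.
  last = nil
  File.foreach(path) { |line| last = line }
  last.to_s.chomp
end
```

Passing an array to IO.popen runs `tail` directly instead of through `/bin/sh`, which is why the file name needs no escaping; the exit status is checked through `$?` once the block has closed the pipe.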

mmcolli00 · December 22, 2008, 1:36am

On Dec 20, 7:16 am, Mmcolli00 Mom wrote:

Aside from the suggestions already made for getting just the last
line, there’s also James G.'s Elif: http://elif.rubyforge.org/

HTH,
Chris

mmcolli00 · December 22, 2008, 2:48am

2008/12/22 Peña, Botp wrote:

Quick reaction: this would surely fail on zero- or one-line files that do not end with a newline, no?

Maybe, but this was only code to illustrate my idea. In a real
implementation it would be necessary to consider these circumstances.
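One way to handle those circumstances while keeping the byte-by-byte idea is sketched below. It is a hypothetical variant, not Thomas's code: it returns nil for an empty file, does not drop a character when the file lacks a trailing newline, and never seeks before the start of the file. CRLF endings are not treated specially.

```ruby
# Hypothetical helper: returns the last line of +path+ without its "\n",
# or nil for an empty file. Walks backwards one byte at a time, but
#   * skips the final byte only if that byte really is "\n",
#   * stops at the start of the file instead of seeking before it,
#   * uses the block form of File.open, so the file is always closed.
# CRLF line endings are not handled: a trailing "\r" stays in the result.
def last_line_backwards(path)
  File.open(path, "rb") do |f|
    pos = f.size
    return nil if pos.zero?

    f.seek(-1, IO::SEEK_END)
    pos -= 1 if f.getbyte == 10 # ignore a single trailing "\n"

    result = "".b
    while pos > 0
      f.seek(pos - 1, IO::SEEK_SET)
      byte = f.getbyte
      break if byte == 10 # reached the newline that ends the previous line
      result.prepend(byte.chr)
      pos -= 1
    end
    result.force_encoding(Encoding.default_external)
  end
end
```

Bytes are collected in a binary string and only re-tagged with the default external encoding at the end, so multi-byte characters are reassembled intact.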

-Thomas

mmcolli00 · December 22, 2008, 1:56am

From: Thomas P.

Really?

…

I did the tests with a test file with almost 8,000,000 lines.

Test with zero- or one-line files first.

My q&d code:

f = File.open("input")
pos = 2
f.seek(-pos, File::SEEK_END)
c = f.getc
result = ''
while c.chr != "\n"

Quick reaction: this would surely fail on zero- or one-line files that do
not end with a newline, no?

  result.insert(0, c.chr)
  pos += 1
  f.seek(-pos, File::SEEK_END)
  c = f.getc
end
f.close

mmcolli00 · December 22, 2008, 11:19am

Why re-invent the wheel?

Because your wheel will not work on e.g. Windows without the “tail”
binary, but the Ruby wheel will work wherever Ruby works. And I honestly
think that everything that can be done in Ruby should be done in Ruby.
The whole modularity of Unix tools has also led to shell scripts, which
are just plain UGLY and a mess to maintain, especially as they grow more
complicated (this is less true of Ruby scripts, because maintaining even
complicated Ruby scripts is a lot easier, IMHO).

Personally, I would rather maintain a collection of Ruby or Python files
than countless shell scripts that rely on various tools, each with its own
syntax rules (awk, sed, grep and so on).

No one will use a wooden wheel to drive in the 24 Hours of Le Mans.

Use the better wheel.

Use Ruby.

mmcolli00 · December 22, 2008, 2:44pm

On Dec 22, 2008, at 4:11 AM, Marc H. wrote:

Why re-invent the wheel?

Because your wheel will not work on e.g. Windows without the “tail”
binary, but the Ruby wheel will work wherever Ruby works.

Definitely have a look at Elif then. It’s a tail-like algorithm in
pure Ruby.

James Edward G. II

mmcolli00 · December 22, 2008, 9:34pm

Marc H. wrote:

Why re-invent the wheel?

Because your wheel will not work on e.g. Windows without the “tail”
binary, but the Ruby wheel will work wherever Ruby works.

Sure. But if this particular poster is running under Linux, Cygwin or
Mac OS X, then the tail solution is (a) dead quick to write, and (b)
already highly optimised. As has been pointed out, the algorithm for
doing tail efficiently is not as easy as it might first appear.

If the OP is not writing this code for his personal use, but in a
library which must be as widely portable as possible, then of course a
pure Ruby solution is going to be beneficial. But in that case, he may
wish to consider releasing the tail algorithm as a standalone library.

The whole modularity of Unix tools has also led to shell scripts, which
are just plain UGLY and a mess to maintain, especially as they grow more
complicated (this is less true of Ruby scripts, because maintaining even
complicated Ruby scripts is a lot easier, IMHO).

I agree: fork/exec, argv, env and stdin/stdout are a fairly lousy API,
but:

Use the better wheel.

The tail wheel in gnu coreutils is a highly polished, aerodynamic and
tested one.

mmcolli00 · December 22, 2008, 5:11pm

On 21.12.2008 19:16, Thomas P. wrote:

sys 0m0.004s

$ time ruby t.rb input
999999999999999999999999999999

real 0m0.043s
user 0m0.004s
sys 0m0.000s

The first result is your code, the second is mine.

Did you make sure that OS disk buffering does not distort this result? I
suggest including both variants in a single script, executing each
variant multiple times in a loop and using Benchmark.bmbm.
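A benchmark along those lines might look like the sketch below. Both method names are hypothetical; the chunked variant follows the idea of r.rb (with the offset clamped to the file size) and the bytewise one follows t.rb (still assuming a trailing newline). Because both run many times against the same freshly written file, this measures warm-cache CPU and syscall cost, not cold-disk reads; bmbm's rehearsal pass is meant to reduce garbage-collection and warm-up skew.

```ruby
require "benchmark"
require "tempfile"

# Variant 1 (hypothetical name), in the spirit of r.rb: read a growing chunk
# from the end with readlines until it holds at least two lines. The offset
# is clamped to the file size, so small files are handled as well.
def last_line_chunked(path, step = 512)
  File.open(path) do |io|
    size = io.size
    return nil if size.zero?
    offset = 0
    loop do
      offset = [offset + step, size].min
      io.seek(-offset, IO::SEEK_END)
      lines = io.readlines
      # With two or more lines the last one is complete; at offset == size
      # we already hold the whole file.
      return lines.last.chomp if lines.size >= 2 || offset == size
    end
  end
end

# Variant 2 (hypothetical name), in the spirit of t.rb: walk backwards one
# byte at a time. Like the original, it assumes the file ends with "\n".
def last_line_bytewise(path)
  File.open(path, "rb") do |io|
    size = io.size
    pos = 2
    result = "".b
    while pos <= size
      io.seek(-pos, IO::SEEK_END)
      c = io.getc
      break if c == "\n"
      result.prepend(c)
      pos += 1
    end
    result
  end
end

Tempfile.create("bench") do |f|
  100_000.times { |i| f.puts(i.to_s * 3) }
  f.flush
  n = 500
  Benchmark.bmbm do |x|
    x.report("chunked")  { n.times { last_line_chunked(f.path) } }
    x.report("bytewise") { n.times { last_line_bytewise(f.path) } }
  end
end
```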

I did the tests with a test file with almost 8,000,000 lines.

My q&d code:

f = File.open("input")
pos = 2

This opens the door to losing a character of the last line under certain
conditions (for instance, if the file does not end with a newline, its
final character is skipped).

puts result

Also, this code is not equivalent to mine, as it does not output the
first line, which you can clearly see from the console output shown
above.

Please also keep in mind that my code tries to do some error checking
that avoids printing the line of a single-line file twice (although that
bit is slightly flawed; I’ll leave the debugging as an exercise for the
reader).

A final remark: using the block form of File.open is always safer.
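To illustrate why, here is a small sketch (all method names are hypothetical). The block form of File.open behaves roughly like an explicit open with the close in an `ensure` clause, so the handle is released even when the block raises.

```ruby
# Roughly what the block form does for you: open, use, and close in an
# ensure clause so the handle is released even if an exception is raised.
def first_line_manual(path)
  f = File.open(path)
  begin
    f.gets
  ensure
    f.close
  end
end

# The same thing with the block form; File.open closes the file when the
# block exits, whether normally, via return/break, or through an exception.
def first_line_block(path)
  File.open(path) { |f| f.gets }
end

# Hypothetical demo helper: true if the handle was closed even though the
# block raised.
def closed_after_error?(path)
  handle = nil
  begin
    File.open(path) do |f|
      handle = f
      raise "boom"
    end
  rescue RuntimeError
    # the demo exception is expected; only the handle's state matters here
  end
  handle.closed?
end
```

With the non-block form and no `ensure`, as in the q&d code, an exception between `File.open` and `f.close` leaves the file open until the object is garbage collected.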

Cheers

robert