Forum: Ruby get first and last line from txt file - how?

Mmcolli00 M. (Guest)
on 2008-12-20 16:24
I have a txt file with date/time stamps only. I want to grab the first
date/time and the last date/time. For instance, I need
8/09/08 3:00 and 8/24/08 3:00 from the queued.txt below. Do you know
how I can pull these out? Thanks in advance.

queued.txt
8/09/08 3:00
8/10/08 5:00
8/23/08 22:00
8/24/08 3:00

firstDate = ""
lastDate = ""

File.open('queued.txt', 'r') do |f1|
  while line = f1.gets
    if f1.lineno == 1 then # <- this would only give me 8/09/08 3:00
      firstDate = line
    end
  end
end
Tim H. (Guest)
on 2008-12-20 16:39
(Received via mailing list)
Mmcolli00 Mom wrote:
>
> firstDate = ""
> lastDate = ""
>
> File.open('queued.txt', 'r') do |f1|
>  while line = f1.gets
>    if f1.lineno ==  1 then #<-this would only give me 8/09/08 3:00
>     @@fistDate = f1
>    end
> end

lines = IO.readlines("queued.txt")
first = lines.first
last = lines.last

puts first
puts last
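
Note that IO.readlines loads the whole file into an array and leaves the trailing newline on each line. If that matters, a minimal sketch along the following lines streams the file and keeps only two lines in memory. The method name first_and_last is made up for illustration; it still reads every line, it just does not store them.

```ruby
# Hypothetical helper (not from the thread): returns [first_line, last_line]
# with the trailing newlines removed, or [nil, nil] for an empty file.
def first_and_last(path)
  first = last = nil
  File.foreach(path) do |line|
    line = line.chomp
    first ||= line   # only set on the first iteration
    last = line      # overwritten on every iteration
  end
  [first, last]
end

# Usage, assuming queued.txt exists in the current directory:
# first, last = first_and_last('queued.txt')
# puts first, last
```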
Yaser S. (Guest)
on 2008-12-20 17:44
(Received via mailing list)
I'm just wondering..
Let's say that we only need to read the last line. Can we do that without
reading the other lines?

Regards,
Yaser S.
Ch B. (Guest)
on 2008-12-20 18:02
Yaser S. wrote:
> I'm just wondering..
> Let's say that we only need to read the last line. Can we do that
> without
> reading the other lines?
>
> Regards,
> Yaser S.


Wouldn't it work the same way? Or do you mean without loading the entire
file into memory?

lines = IO.readlines("foo.bar")

puts lines.last
Yaser S. (Guest)
on 2008-12-20 18:11
(Received via mailing list)
On Sat, Dec 20, 2008 at 6:54 PM, Ch Ba <removed_email_address@domain.invalid> wrote:
>
>  It would work the same? Or do you mean without loading up the entire
> file?

Yep, that is exactly what I mean.
Tim H. (Guest)
on 2008-12-20 18:23
(Received via mailing list)
Yaser S. wrote:
> On Sat, Dec 20, 2008 at 6:54 PM, Ch Ba <removed_email_address@domain.invalid> wrote:
>>  It would work the same? Or do you mean without loading up the entire
>> file?
>
> Yep, that is exactly what I mean.
>

If you know where the last line starts (that is, the byte offset of the
first character in the last line) then you could use IO#seek to seek to
that offset and then read.

How do you know where the last line starts? When you write the file,
call IO#tell to get the current byte offset before you write the last
line.
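
A rough sketch of the tell/seek idea described above. The helper names write_lines and read_line_at are invented for this example, and keeping the offset in a return value is an assumption; a real program would have to store it somewhere it can find it again later.

```ruby
# Hypothetical writer: remembers the byte offset at which the last line starts.
def write_lines(path, lines)
  last_offset = 0
  File.open(path, 'wb') do |f|
    lines.each do |line|
      last_offset = f.tell   # offset where this line is about to be written
      f.puts line
    end
  end
  last_offset
end

# Hypothetical reader: jumps straight to a known offset and reads one line.
def read_line_at(path, offset)
  File.open(path, 'rb') do |f|
    f.seek(offset, IO::SEEK_SET)
    f.gets
  end
end

# offset = write_lines('queued.txt', ['8/09/08 3:00', '8/24/08 3:00'])
# puts read_line_at('queued.txt', offset)
```

This only helps when you control the code that writes the file; for a file produced by something else the offset is not known in advance.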
Thomas P. (Guest)
on 2008-12-20 18:55
(Received via mailing list)
2008/12/20 Yaser S. <removed_email_address@domain.invalid>:
> I'm just wondering..
> Let's say that we only need to read the last line. Can we do that without
> reading the other lines?

Yes. Position your file pointer to the last byte in a file, read and
collect backwards each byte until you find a newline character (or the
first byte of the file). This is the last line.

-Thomas
Robert K. (Guest)
on 2008-12-21 16:55
(Received via mailing list)
On 20.12.2008 17:46, Thomas P. wrote:
> 2008/12/20 Yaser S. <removed_email_address@domain.invalid>:
>> I'm just wondering..
>> Let's say that we only need to read the last line. Can we do that without
>> reading the other lines?
>
> Yes. Position your file pointer to the last byte in a file, read and
> collect backwards each byte until you find a newline character (or the
> first byte of the file). This is the last line.

You have to admit that this approach is rather inefficient.  Here's a
more efficient variant - especially for large files:

$ cat r.rb
#!/usr/bin/env ruby

OFFSET = 512 # > 2 * assumed avg line length

file = ARGV.shift or abort "ERROR: need a file name"

File.open file do |io|
   first = io.gets
   break unless first
   puts first

   limit = io.stat.size
   offset = OFFSET
   lines = []

   while lines.size < 2 && offset <= limit
     io.seek -offset, IO::SEEK_END
     lines = io.readlines
     offset += OFFSET
   end # while lines.size < 2

   puts lines.last unless lines.empty?
end

Cheers

  robert
Thomas P. (Guest)
on 2008-12-21 20:25
(Received via mailing list)
2008/12/21 Robert K. <removed_email_address@domain.invalid>:
>> first byte of the file). This is the last line.
>
> You have to admit that this approach is rather inefficient.

Really?

I did a comparison of your code and my idea:

$ time ruby r.rb input
111111111111111111111111111111
999999999999999999999999999999

real  0m0.053s
user  0m0.000s
sys  0m0.004s

$ time ruby t.rb input
999999999999999999999999999999

real  0m0.043s
user  0m0.004s
sys  0m0.000s

The first result is your code, the second is mine.

I did the tests with a test file with almost 8,000,000 lines.

My q&d code:

f=File.open("input")
pos = 2
f.seek(-pos, File::SEEK_END)
c = f.getc
result = ''
while c.chr != "\n"
  result.insert(0,c.chr)
  pos += 1
  f.seek(-pos, File::SEEK_END)
  c = f.getc
end
f.close

puts result

-Thomas
Simon K. (Guest)
on 2008-12-21 23:05
(Received via mailing list)
* Yaser S. <removed_email_address@domain.invalid> (2008-12-20) wrote:

> I'm just wondering..
> Let's say that we only need to read the last line. Can we do that without
> reading the other lines?

Yes, of course. It's exactly the same problem as reading the first line.
The only difference is that there is a standard function for the first
line: gets.

For the last line you need to implement it yourself.

If I had mmap in Ruby, I'd just map the file into memory and do
mapped_file[/^.*\z/].

Regards,                      simon .... l
Brian C. (Guest)
on 2008-12-22 00:04
Why re-invent the wheel?

lastline = `tail -1 queued.txt`
Simon K. (Guest)
on 2008-12-22 02:05
(Received via mailing list)
* Brian C. <removed_email_address@domain.invalid> (22:56) wrote:

> Why re-invent the wheel?
>
> lastline = `tail -1 queued.txt`

Cause there's not always a tool out there to do the job.

New programming languages are always reinventing wheels.

Regards,             simon .... l
Chris S. (Guest)
on 2008-12-22 02:36
(Received via mailing list)
On Dec 20, 7:16 am, Mmcolli00 Mom <removed_email_address@domain.invalid> wrote:
>
> Posted via http://www.ruby-forum.com/.
Aside from the suggestions already made for getting just the last
line, there's also James G.'s Elif: http://elif.rubyforge.org/

HTH,
Chris
Peña, Botp (Guest)
on 2008-12-22 02:56
(Received via mailing list)
From: Thomas P. [mailto:removed_email_address@domain.invalid]
# Really?
# ....
# I did the tests with a test file with almost 8,000,000 lines.

Test with a zero- or one-line file first.

# My q&d code:
#
# f=File.open("input")
# pos = 2
# f.seek(-pos, File::SEEK_END)
# c = f.getc
# result = ''
# while c.chr != "\n"


Quick reaction: this would be sure to fail on zero- or one-line files
that do not end with a newline, no?


#   result.insert(0,c.chr)
#   pos += 1
#   f.seek(-pos, File::SEEK_END)
#   c = f.getc
# end
# f.close
Thomas P. (Guest)
on 2008-12-22 03:48
(Received via mailing list)
2008/12/22 Peña, Botp <removed_email_address@domain.invalid>:
> quick reaction:  this would sure to fail on zero-or-one-liners that do not end w a newline, no?

Maybe, but this was only code to illustrate my idea. In a real
implementation it would be necessary to consider these circumstances.
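
As a sketch of what considering those circumstances might look like, the byte-by-byte backward scan can be written so that it copes with an empty file, a single line, and a missing final newline. The method name last_line_bytewise is invented here, and the sketch assumes "\n" line endings (it does not treat "\r\n" specially).

```ruby
# Hypothetical helper: walks backwards one byte at a time from the end of the
# file. Returns nil for an empty file, otherwise the last line without "\n".
def last_line_bytewise(path)
  File.open(path, 'rb') do |f|
    pos = f.size
    return nil if pos.zero?

    # Step over a single trailing newline, if there is one.
    f.seek(pos - 1)
    pos -= 1 if f.read(1) == "\n"

    line_end = pos
    # Move left until the byte before pos is a newline or we reach the start.
    while pos > 0
      f.seek(pos - 1)
      break if f.read(1) == "\n"
      pos -= 1
    end

    f.seek(pos)
    f.read(line_end - pos)
  end
end

# puts last_line_bytewise('input')
```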

-Thomas
Robert H. (Guest)
on 2008-12-22 12:19
> Why re-invent the wheel?

Because your wheel will not work on, e.g., Windows without the "tail"
binary, but the Ruby wheel will work wherever Ruby works. And I honestly
think that everything that is possible in Ruby should be done in Ruby as well.
The whole modularity of Unix tools has also led to shell scripts, which
are just plain UGLY and a mess to maintain, especially the more
complicated they grow (which is less true for Ruby scripts, because
maintaining even complicated Ruby scripts is a lot easier, IMHO).

I personally would rather maintain a collection of Ruby or Python files
than cope with countless shell scripts that use various tools with
different syntax rules (awk, sed, grep and so on).

No one will use a wooden wheel to drive in the 24 Hours of Le Mans.

Use the better wheel.

Use Ruby.
James G. (Guest)
on 2008-12-22 15:44
(Received via mailing list)
On Dec 22, 2008, at 4:11 AM, Marc H. wrote:

>> Why re-invent the wheel?
>
> Because your wheel will not work on i.e. Windows without the "tail"
> binary, but the ruby wheel will work wherever ruby works.

Definitely have a look at Elif then.  It's a tail-like algorithm in
pure Ruby.

James Edward G. II
Robert K. (Guest)
on 2008-12-22 18:11
(Received via mailing list)
On 21.12.2008 19:16, Thomas P. wrote:
>
> sys  0m0.004s
>
> $ time ruby t.rb input
> 999999999999999999999999999999
>
> real  0m0.043s
> user  0m0.004s
> sys  0m0.000s
>
> the first result is your code, the second is mine.

Did you make sure that no OS disk buffering distorts this result?  I
suggest including both variants in a single script, executing each
variant multiple times in a loop, and using Benchmark.bmbm.
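
A small harness for that kind of comparison might look roughly like this. The method name compare_variants and the default of 100 runs are assumptions for the sake of the example; Benchmark.bmbm itself is part of Ruby's standard library and runs a rehearsal pass before the measured pass. Since every variant then runs against the same warmed-up state, this reduces the distortion from caching but does not measure cold-disk behaviour.

```ruby
require 'benchmark'

# Hypothetical harness: takes a hash of label => callable and runs each
# callable `runs` times under Benchmark.bmbm (rehearsal pass, then real pass).
def compare_variants(variants, runs = 100)
  Benchmark.bmbm do |bm|
    variants.each do |label, code|
      bm.report(label) { runs.times { code.call } }
    end
  end
end

# Usage, assuming a test file named 'input' exists:
# compare_variants(
#   'readlines' => -> { IO.readlines('input').last },
#   'tail'      => -> { `tail -n 1 input` }
# )
```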

> I did the tests with a test file with almost 8,000,000 lines.
>
> My q&d code:
>
> f=File.open("input")
> pos = 2

This opens the door for character loss of the last line under certain
conditions.

>
> puts result

Also, this code is not equivalent to mine as it does not output the
first line - which you can nicely see from the console output shown
above.

Please also keep in mind that my code tries to do some error checking
which avoids printing the line from a single-line file twice (although
that bit is slightly flawed; I'll leave that debugging task as an
exercise for the reader).
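
One possible take on that exercise, offered as a sketch rather than as the intended solution: clamp the offset to the file size so that files shorter than one chunk are still read, and stop as soon as the tail holds at least two lines or the whole file has been read. The method name last_line_chunked and the default chunk size of 512 are assumptions; the sketch returns only the last line and leaves it to the caller to decide whether to print it when it is also the first.

```ruby
# Hypothetical helper: reads ever larger tails of the file until the tail
# contains a complete last line (or the whole file has been read).
# Returns nil for an empty file; the returned line keeps its newline, if any.
def last_line_chunked(path, chunk = 512)
  File.open(path, 'rb') do |io|
    size = io.size
    return nil if size.zero?

    offset = 0
    loop do
      offset = [offset + chunk, size].min   # never seek before the start
      io.seek(-offset, IO::SEEK_END)
      lines = io.readlines
      # Two or more lines means the last one is complete; so does having
      # reached the beginning of the file.
      return lines.last if lines.size >= 2 || offset == size
    end
  end
end

# puts last_line_chunked('input')
```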

A final remark: using the block form of File.open is always safer.

Cheers

  robert
Brian C. (Guest)
on 2008-12-22 22:34
Marc H. wrote:
>> Why re-invent the wheel?
>
> Because your wheel will not work on i.e. Windows without the "tail"
> binary, but the ruby wheel will work wherever ruby works.

Sure. But if this particular poster is running under Linux, or cygwin,
or MacOS X, then the `tail` solution is (a) dead quick to write, and (b)
already highly optimised. As has been pointed out, the algorithm for
doing tail efficiently is not as easy as it might first appear.

If the OP is not writing this code for his personal use, but in a
library which must be as widely portable as possible, then of course a
pure Ruby solution is going to be beneficial. But in that case, he may
wish to consider releasing the tail algorithm as a standalone library.

> The whole modularity of Unix tools has also led to shell scripts, which
> are just plain UGLY and a mess to maintain, especially the more
> complicated they grow (which is only less true for ruby scripts, because
> maintaining even complicated ruby scripts is a lot easier IMHO)

I agree: fork/exec, argv, env and stdin/stdout are a fairly lousy API,
but:

> Use the better wheel.

The tail wheel in GNU coreutils is a highly polished, aerodynamic and
tested one.