Forum: Ruby get first and last line from txt file - how?

Mmcolli00 Mom (mmcolli00)
on 2008-12-20 15:24
I have a txt file that contains only date/time stamps. I want to grab the
first date/time and the last date/time. For instance, I need
08/09/08 3:00 and 08/24/08 3:00 from the queued.txt below. Do you know
how I can pull these out? Thanks in advance.

queued.txt
8/09/08 3:00
8/10/08 5:00
8/23/08 22:00
8/24/08 3:00

firstDate = ""
lastDate = ""

File.open('queued.txt', 'r') do |f1|
  while line = f1.gets
    if f1.lineno == 1 then #<-this would only give me 8/09/08 3:00
      firstDate = line
    end
  end
end
Tim Hunter (Guest)
on 2008-12-20 15:39
(Received via mailing list)
Mmcolli00 Mom wrote:
>
> firstDate = ""
> lastDate = ""
>
> File.open('queued.txt', 'r') do |f1|
>  while line = f1.gets
>    if f1.lineno ==  1 then #<-this would only give me 8/09/08 3:00
>     @@fistDate = f1
>    end
> end

lines = IO.readlines("queued.txt")
first = lines.first
last = lines.last

puts first
puts last
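
If the file is large and you would rather not hold every line in memory, here is a minimal sketch of the same idea using File.foreach. It still reads the whole file once, but keeps only two lines at a time. The method name first_and_last_line is made up for illustration, and the Tempfile merely stands in for queued.txt.

```ruby
require "tempfile"

# Return [first, last] lines of the file at `path`, without trailing newlines.
# Returns [nil, nil] for an empty file.
# NOTE: `first_and_last_line` is a hypothetical helper name, not from the thread.
def first_and_last_line(path)
  first = last = nil
  File.foreach(path) do |line|
    first ||= line.chomp   # set only once, on the first line
    last = line.chomp      # overwritten on every line
  end
  [first, last]
end

# Usage with a temporary stand-in for queued.txt.
Tempfile.create("queued") do |f|
  f.write("8/09/08 3:00\n8/10/08 5:00\n8/23/08 22:00\n8/24/08 3:00\n")
  f.flush
  first, last = first_and_last_line(f.path)
  puts first  # prints 8/09/08 3:00
  puts last   # prints 8/24/08 3:00
end
```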
Yaser Sulaiman (Guest)
on 2008-12-20 16:44
(Received via mailing list)
I'm just wondering..
Let's say that we only need to read the last line. Can we do that
without reading the other lines?

Regards,
Yaser Sulaiman
Ch Ba (navouri)
on 2008-12-20 17:02
Yaser Sulaiman wrote:
> I'm just wondering..
> Let's say that we only need to read the last line. Can we do that
> without reading the other lines?
>
> Regards,
> Yaser Sulaiman


Wouldn't it work the same way? Or do you mean without loading the entire
file?

lines = IO.readlines("foo.bar")

puts lines.last
Yaser Sulaiman (Guest)
on 2008-12-20 17:11
(Received via mailing list)
On Sat, Dec 20, 2008 at 6:54 PM, Ch Ba <navouri@gmail.com> wrote:
>
>  It would work the same? Or do you mean without loading up the entire
> file?

Yep, that is exactly what I mean.
Tim Hunter (Guest)
on 2008-12-20 17:23
(Received via mailing list)
Yaser Sulaiman wrote:
> On Sat, Dec 20, 2008 at 6:54 PM, Ch Ba <navouri@gmail.com> wrote:
>>  It would work the same? Or do you mean without loading up the entire
>> file?
>
> Yep, that is exactly what I mean.
>

If you know where the last line starts (that is, the byte offset of the
first character in the last line) then you could use IO#seek to seek to
that offset and then read.

How do you know where the last line starts? When you write the file,
call IO#tell to get the current byte offset before you write the last
line.
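
A rough sketch of that idea, assuming you control the code that writes the file. The helper names write_lines_recording_last_offset and read_line_at are invented for this example, and the Tempfile stands in for the real file.

```ruby
require "tempfile"

# Write `lines` to `io` and return the byte offset at which the last line starts.
# NOTE: hypothetical helper, not part of any library.
def write_lines_recording_last_offset(io, lines)
  last_offset = 0
  lines.each do |line|
    last_offset = io.tell   # byte offset before this line is written
    io.puts(line)
  end
  io.flush
  last_offset
end

# Read a single line starting at a known byte offset.
def read_line_at(path, offset)
  File.open(path) do |f|
    f.seek(offset, IO::SEEK_SET)
    f.gets
  end
end

Tempfile.create("queued") do |f|
  offset = write_lines_recording_last_offset(f, ["8/09/08 3:00", "8/24/08 3:00"])
  puts read_line_at(f.path, offset)  # prints 8/24/08 3:00
end
```

The offset has to be stored somewhere the reader can find it, so this only helps when writer and reader cooperate.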
Thomas Preymesser (Guest)
on 2008-12-20 17:55
(Received via mailing list)
2008/12/20 Yaser Sulaiman <yaserbuntu@gmail.com>:
> I'm just wondering..
> Let's say that we only need to read the last line. Can we do that without
> reading the other lines?

Yes. Position your file pointer to the last byte in a file, read and
collect backwards each byte until you find a newline character (or the
first byte of the file). This is the last line.

-Thomas
Robert Klemme (Guest)
on 2008-12-21 15:55
(Received via mailing list)
On 20.12.2008 17:46, Thomas Preymesser wrote:
> 2008/12/20 Yaser Sulaiman <yaserbuntu@gmail.com>:
>> I'm just wondering..
>> Let's say that we only need to read the last line. Can we do that without
>> reading the other lines?
>
> Yes. Position your file pointer to the last byte in a file, read and
> collect backwards each byte until you find a newline character (or the
> first byte of the file). This is the last line.

You have to admit that this approach is rather inefficient.  Here's a
more efficient variant - especially for large files:

$ cat r.rb
#!/bin/env ruby

OFFSET = 512 # > 2 * assumed avg line length

file = ARGV.shift or abort "ERROR: need a file name"

File.open file do |io|
   first = io.gets
   break unless first
   puts first

   limit = io.stat.size
   offset = OFFSET
   lines = []

   while lines.size < 2 && offset <= limit
     io.seek -offset, IO::SEEK_END
     lines = io.readlines
     offset += OFFSET
   end # while lines.size < 2

   puts lines.last unless lines.empty?
end

Cheers

  robert
Thomas Preymesser (Guest)
on 2008-12-21 19:25
(Received via mailing list)
2008/12/21 Robert Klemme <shortcutter@googlemail.com>:
>> first byte of the file). This is the last line.
>
> You have to admit that this approach is rather inefficient.

Really?

I did a comparison of your code and my idea:

$ time ruby r.rb input
111111111111111111111111111111
999999999999999999999999999999

real  0m0.053s
user  0m0.000s
sys  0m0.004s

$ time ruby t.rb input
999999999999999999999999999999

real  0m0.043s
user  0m0.004s
sys  0m0.000s

The first result is your code, the second is mine.

I did the tests with a test file with almost 8,000,000 lines.

My quick-and-dirty code:

f=File.open("input")
pos = 2
f.seek(-pos, File::SEEK_END)
c = f.getc
result = ''
while c.chr != "\n"
  result.insert(0,c.chr)
  pos += 1
  f.seek(-pos, File::SEEK_END)
  c = f.getc
end
f.close

puts result

-Thomas
Simon Krahnke (Guest)
on 2008-12-21 22:05
(Received via mailing list)
* Yaser Sulaiman <yaserbuntu@gmail.com> (2008-12-20) wrote:

> I'm just wondering..
> Let's say that we only need to read the last line. Can we do that without
> reading the other lines?

Yes, of course. It's exactly the same problem as reading the first line.
The only difference is that there is a standard function for the first
line: gets.

For the last line you need to implement it yourself.

If I had mmap in Ruby, I'd just map the file into memory and do
mapped_file[/^.*\z/].

Regards,                  simon .... l
Brian Candler (candlerb)
on 2008-12-21 23:04
Why re-invent the wheel?

lastline = `tail -1 queued.txt`
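
A slightly more defensive sketch of the same call, assuming a POSIX tail is on the PATH. It passes the arguments as an array so the file name is not interpreted by a shell, and uses the `-n 1` spelling. The method name last_line_via_tail is invented for this example.

```ruby
# Return the last line of `path` using the external `tail` program,
# or nil if the file is empty.
# ASSUMPTION: a POSIX `tail` binary is available on the PATH.
def last_line_via_tail(path)
  # Array form avoids the shell; "--" stops option parsing for odd file names.
  out = IO.popen(["tail", "-n", "1", "--", path], &:read)
  out.empty? ? nil : out.chomp
end
```

With the sample file from the original question, last_line_via_tail("queued.txt") should return the final timestamp.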
Simon Krahnke (Guest)
on 2008-12-22 01:05
(Received via mailing list)
* Brian Candler <b.candler@pobox.com> (22:56) wrote:

> Why re-invent the wheel?
>
> lastline = `tail -1 queued.txt`

Because there's not always a tool out there to do the job.

New programming languages are always reinventing wheels.

Regards,         simon .... l
Chris Shea (chrisshea)
on 2008-12-22 01:36
(Received via mailing list)
Aside from the suggestions already made for getting just the last
line, there's also James Gray's Elif: http://elif.rubyforge.org/

HTH,
Chris
Peña, Botp (Guest)
on 2008-12-22 01:56
(Received via mailing list)
From: Thomas Preymesser [mailto:thopre@gmail.com]
# Really?
# ....
# I did the tests with a test file with almost 8,000,000 lines.

Test with a zero- or one-line file first.

# My q&d code:
#
# f=File.open("input")
# pos = 2
# f.seek(-pos, File::SEEK_END)
# c = f.getc
# result = ''
# while c.chr != "\n"


Quick reaction: this is sure to fail on zero- or one-line files that do
not end with a newline, no?


#   result.insert(0,c.chr)
#   pos += 1
#   f.seek(-pos, File::SEEK_END)
#   c = f.getc
# end
# f.close
Thomas Preymesser (Guest)
on 2008-12-22 02:48
(Received via mailing list)
2008/12/22 Peña, Botp <botp@delmonte-phil.com>:
> quick reaction:  this would sure to fail on zero-or-one-liners that do
> not end w a newline, no?

Maybe, but this was only code to illustrate my idea. In a real
implementation it would be necessary to consider these circumstances.
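
For illustration, here is one way such an implementation might look. It is only a sketch: it reads backwards in fixed-size blocks rather than byte by byte, assumes "\n" line endings, and needs Ruby 2.5 or later for String#delete_suffix. The names last_line and chunk_size are made up for this example.

```ruby
# Return the last line of the file at `path` (without its trailing newline),
# or nil if the file is empty. Reads backwards in `chunk_size`-byte blocks.
# NOTE: hypothetical helper; assumes "\n" line endings.
def last_line(path, chunk_size = 512)
  File.open(path, "rb") do |f|
    pos = f.size
    return nil if pos.zero?

    buffer = "".b
    while pos > 0
      step = [chunk_size, pos].min
      pos -= step
      f.seek(pos, IO::SEEK_SET)
      buffer = f.read(step) + buffer
      # Ignore the single newline that terminates the file, if present.
      body = buffer.delete_suffix("\n")
      idx = body.rindex("\n")
      return body[(idx + 1)..-1] if idx
    end
    # No earlier newline was found: the whole file is one line.
    buffer.delete_suffix("\n")
  end
end
```

The result is a binary string, so call force_encoding on it if the file holds non-ASCII text. Empty files, one-line files, and files without a trailing newline are all covered by the early return and the final fallthrough.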

-Thomas
Robert Heiler (shevegen)
on 2008-12-22 11:19
> Why re-invent the wheel?

Because your wheel will not work on, e.g., Windows without the "tail"
binary, but the Ruby wheel will work wherever Ruby works. And I honestly
think that everything that is possible in Ruby should be done in Ruby.
The whole modularity of Unix tools has also led to shell scripts, which
are just plain UGLY and a mess to maintain, especially the more
complicated they grow (which is less true for Ruby scripts, because
maintaining even complicated Ruby scripts is a lot easier IMHO).

I personally would rather maintain a collection of ruby or python files,
than countless shell scripts that use various tools with various
different syntax rules (awk, sed, grep and so on) to cope with.

No one will use a wooden wheel to drive in the 24 Hours of Le Mans.

Use the better wheel.

Use Ruby.
James Gray (bbazzarrakk)
on 2008-12-22 14:44
(Received via mailing list)
On Dec 22, 2008, at 4:11 AM, Marc Heiler wrote:

>> Why re-invent the wheel?
>
> Because your wheel will not work on i.e. Windows without the "tail"
> binary, but the ruby wheel will work wherever ruby works.

Definitely have a look at Elif then.  It's a tail-like algorithm in
pure Ruby.

James Edward Gray II
Robert Klemme (Guest)
on 2008-12-22 17:11
(Received via mailing list)
On 21.12.2008 19:16, Thomas Preymesser wrote:
>
> sys  0m0.004s
>
> $ time ruby t.rb input
> 999999999999999999999999999999
>
> real  0m0.043s
> user  0m0.004s
> sys  0m0.000s
>
> the first result is your code, the second is mine.

Did you make sure that no OS disk buffering distorts this result?  I
suggest including both variants in a single script, executing each
variant multiple times in a loop, and using Benchmark.bmbm.
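
Something along these lines, as a sketch only. The two methods are simplified stand-ins for the variants being compared, not the code from this thread, and the generated file is far smaller than the 8,000,000-line test file. The names last_line_readlines and last_line_seek are made up.

```ruby
require "benchmark"
require "tempfile"

# Stand-in variant 1: load every line, take the last one.
def last_line_readlines(path)
  File.readlines(path).last
end

# Stand-in variant 2: read only a window at the end of the file.
# ASSUMPTION: the last line is shorter than `window` bytes.
def last_line_seek(path, window = 512)
  File.open(path, "rb") do |f|
    back = [window, f.size].min
    f.seek(-back, IO::SEEK_END)
    f.readlines.last
  end
end

Tempfile.create("bench") do |f|
  10_000.times { |i| f.puts("line #{i}") }
  f.flush

  # bmbm runs a rehearsal pass first, which also warms the OS file cache.
  Benchmark.bmbm do |x|
    x.report("readlines") { 100.times { last_line_readlines(f.path) } }
    x.report("seek")      { 100.times { last_line_seek(f.path) } }
  end
end
```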

> I did the tests with a test file with almost 8,000,000 lines.
>
> My q&d code:
>
> f=File.open("input")
> pos = 2

This opens the door to losing the last character of the last line when
the file does not end with a newline.

>
> puts result

Also, this code is not equivalent to mine as it does not output the
first line - which you can nicely see from the console output shown
above.

Please also keep in mind that my code tries to do some error checking,
which avoids printing the line from a single-line file twice (although
that bit is slightly flawed; I'll leave that debugging task as an
exercise for the reader).

A final remark: using the block form of File.open is always safer.

Cheers

  robert
Brian Candler (candlerb)
on 2008-12-22 21:34
Marc Heiler wrote:
>> Why re-invent the wheel?
>
> Because your wheel will not work on i.e. Windows without the "tail"
> binary, but the ruby wheel will work wherever ruby works.

Sure. But if this particular poster is running under Linux, or cygwin,
or MacOS X, then the `tail` solution is (a) dead quick to write, and (b)
already highly optimised. As has been pointed out, the algorithm for
doing tail efficiently is not as easy as it might first appear.

If the OP is not writing this code for his personal use, but in a
library which must be as widely portable as possible, then of course a
pure Ruby solution is going to be beneficial. But in that case, he may
wish to consider releasing the tail algorithm as a standalone library.

> The whole modularity of Unix tools has also led to shell scripts, which
> are just plain UGLY and a mess to maintain, especially the more
> complicated they grow (which is only less true for ruby scripts, because
> maintaining even complicated ruby scripts is a lot easier IMHO)

I agree: fork/exec, argv, env and stdin/stdout are a fairly lousy API,
but:

> Use the better wheel.

The tail wheel in GNU coreutils is a highly polished, aerodynamic and
well-tested one.