Forum: Ruby Parsing a file with look ahead

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Robert James (robertjames)
on 2007-02-22 01:36
(Received via mailing list)
I need to parse a file line by line, and output the results line by
line (the file is too big to fit into memory).  So far, simple enough:
file.each_line.

However, the parser needs the ability to peek ahead to the next line,
in order to parse this line.  What's the right way to do this? Again,
I really don't want to try to slurp the whole file into memory and
split on newlines.

Here's an example:
Line1: Hi
Line2: How
Line3: Are
Line4: you?

I'd like to:
parse('Hi', 'How')
parse('How', 'Are')
parse('Are', 'you?')
parse('you?', false)
# hey, this is practically a unit test!

Any ideas?
Carl Lerche (Guest)
on 2007-02-22 01:38
(Received via mailing list)
The first thing I can think of is to use file.each_line and store each
line in a previous_line variable at the end of the block. Then you have
access to both the line read beforehand and the current one.
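A minimal sketch of the previous_line idea, assuming a hypothetical helper named each_line_with_next and a StringIO standing in for the real file. Because each line is only handled once its successor has been read, one extra call after the loop is needed to give the last line its false look-ahead.

```ruby
require 'stringio'

# Keep the previously read line around. Each new line serves as the
# look-ahead for the previous one, so at most two lines are held at once.
# (each_line_with_next is a hypothetical name used for illustration.)
def each_line_with_next(io)
  previous_line = nil
  io.each_line do |line|
    line = line.chomp
    yield(previous_line, line) unless previous_line.nil?
    previous_line = line
  end
  # The loop never sees a successor for the last line, so flush it here.
  yield(previous_line, false) unless previous_line.nil?
end

calls = []
each_line_with_next(StringIO.new("Hi\nHow\nAre\nyou?\n")) do |line, next_line|
  calls << [line, next_line] # stand-in for parse(line, next_line)
end
p calls
# => [["Hi", "How"], ["How", "Are"], ["Are", "you?"], ["you?", false]]
```

With a real file, pass the block-scoped handle from File.open(filename) instead of the StringIO.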
Florian Frank (Guest)
on 2007-02-22 01:57
(Received via mailing list)
S. Robert James wrote:
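require 'enumerator' # only needed on Ruby 1.8; built in since 1.9

# each_slice(2) does what enum_slice(2).each did on Ruby 1.8 (enum_slice was
# removed in 1.9). It yields non-overlapping pairs: [line1, line2], [line3, line4], ...
File.open(filename) do |file|
  file.each_slice(2) do |first, second|
    p [first, second || false]
  end
end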
Robert James (robertjames)
on 2007-02-22 02:40
(Received via mailing list)
Thanks!  BTW, looking at the Rdoc, it seems each_cons is what I want,
no?
Gregory Brown (Guest)
on 2007-02-22 03:00
(Received via mailing list)
On 2/21/07, S. Robert James <srobertjames@gmail.com> wrote:
> Thanks!  BTW, looking at the Rdoc, it seems each_cons is what I want,
> no?

If you are dealing with paired lines, use enum_slice(2)

If you are dealing with data that depends on both the current and the
previous line, use each_cons, yes.
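To make the difference concrete, here is a quick sketch on the four example lines from the question. They are held in an array for brevity; IO includes Enumerable, so the same two methods can be called directly on an open File and only keep a couple of lines in memory at a time.

```ruby
lines = %w[Hi How Are you?]

# each_slice(2): non-overlapping pairs; an odd line count leaves a
# one-element last slice.
slices = lines.each_slice(2).to_a

# each_cons(2): overlapping windows; n lines give n - 1 windows, so the
# last line never appears as the first element of a window.
windows = lines.each_cons(2).to_a

p slices  # => [["Hi", "How"], ["Are", "you?"]]
p windows # => [["Hi", "How"], ["How", "Are"], ["Are", "you?"]]
```

Neither produces the final parse('you?', false) call from the question on its own; with each_cons that last call has to be made separately.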
Daniel DeLorme (Guest)
on 2007-03-01 02:42
(Received via mailing list)
Gregory Brown wrote:
> On 2/21/07, S. Robert James <srobertjames@gmail.com> wrote:
>> Thanks!  BTW, looking at the Rdoc, it seems each_cons is what I want,
>> no?
>
> If you are dealing with paired lines, use enum_slice(2)
>
> if you are dealing with data dependent on the current and previous
> line, use each_cons, yes.

Except each_cons(2) will only iterate 9 times if you have 10 lines, so the
last line never gets its own turn as the current line.

Maybe something simple like this?

File.open(filename) do |f|
  line = f.gets
  while line
    nextline = f.gets # nil when line is the last line
    # do stuff with line and nextline...
    line = nextline
  end
end

Daniel
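On Ruby 1.9 and later (newer than this thread), an external enumerator offers Enumerator#peek, which expresses the same one-line look-ahead as the gets loop above without a hand-managed nextline variable. A minimal sketch, assuming a hypothetical helper name each_line_with_next and a StringIO in place of the file:

```ruby
require 'stringio'

# Yields each line together with the one after it (false for the last line).
# next/peek read at most one line ahead, so the file is never slurped.
# (each_line_with_next is a hypothetical name used for illustration.)
def each_line_with_next(io)
  lines = io.each_line # an Enumerator when called without a block
  loop do
    line = lines.next.chomp # raises StopIteration at EOF, which ends `loop`
    following =
      begin
        lines.peek.chomp # look at the next line without consuming it
      rescue StopIteration
        false # there is no next line
      end
    yield line, following
  end
end

calls = []
each_line_with_next(StringIO.new("Hi\nHow\nAre\nyou?\n")) do |line, following|
  calls << [line, following] # stand-in for parse(line, following)
end
p calls
# => [["Hi", "How"], ["How", "Are"], ["Are", "you?"], ["you?", false]]
```

Kernel#loop rescues the StopIteration raised by next, which is why no explicit end-of-file check is needed.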
Thomas Hafner (Guest)
on 2007-03-04 03:45
(Received via mailing list)
"S. Robert James" <srobertjames@gmail.com> wrote in
<1172104495.175931.201910@p10g2000cwp.googlegroups.com>:

> I need to parse a file line by line, and output the results line by
> line (too big to fit into memory).  So far, simple enough:
> file.each_line.
>
> However, the parser needs the ability to peek ahead to the next line,
> in order to parse this line.  What's the right way to do this? Again,
> I really don't want to try to slurp the whole file into memory and
> split on newlines.

Sounds to me like it could be solved elegantly with a lazy stream of
input lines. For lazy streams, see for instance the Usenet thread
starting with article <9oib84-sf6.ln1@faun.hafner.nl.eu.org>.

The file will be split into lines, but lazily, so not all of the lines
need to be held in memory at the same time. Old (i.e. already consumed)
lines will soon be garbage collected, because the application no longer
references them. You can have as many lookahead lines as you want (the
tradeoff being more memory, of course).

Regards
  Thomas
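Thomas's own lazy-stream code lives in the Usenet thread he cites and is not reproduced here. As a rough illustration of the idea only, here is a hypothetical LazyLineStream class: each node holds one line and reads its successor the first time #rest is called, then memoizes it. Looking k lines ahead just walks k nodes, and once the caller stops referencing a node, it becomes garbage.

```ruby
require 'stringio'

# Hypothetical lazy stream of lines (not Thomas's implementation).
class LazyLineStream
  attr_reader :line

  # Returns the first node, or nil when the enumerator is exhausted.
  def self.from(enum)
    new(enum.next.chomp, enum)
  rescue StopIteration
    nil
  end

  def initialize(line, enum)
    @line = line
    @enum = enum
  end

  # The following node (nil at end of input), read on first access only.
  # Nodes can only be reached through their predecessor's #rest, so lines
  # are always pulled from the enumerator in order.
  def rest
    @rest = LazyLineStream.from(@enum) unless defined?(@rest)
    @rest
  end

  # The line k positions ahead (k = 0 is this line), or false past the end.
  def lookahead(k)
    node = self
    k.times { node = node && node.rest }
    node ? node.line : false
  end
end

calls = []
# Keep no variable pointing at the head node; otherwise the whole chain of
# already consumed nodes stays reachable and cannot be collected.
node = LazyLineStream.from(StringIO.new("Hi\nHow\nAre\nyou?\n").each_line)
while node
  calls << [node.line, node.lookahead(1), node.lookahead(2)]
  node = node.rest
end
p calls
# => [["Hi", "How", "Are"], ["How", "Are", "you?"], ["Are", "you?", false], ["you?", false, false]]
```

Calling parse(node.line, node.lookahead(1)) inside the loop reproduces the calls from the original question; larger k trades memory for more look-ahead.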