Reading file after a particular line in file

Vandana · May 12, 2010, 7:20pm

Hello All,

  I would like to read a file in Ruby. It is a 2 GB file, but it
contains useless data in the beginning portion of the file.

There is a particular pattern towards the middle of the file after
which the useful data begins. Is there a way to grep for this pattern and
then read every line from there on, ignoring all lines before the line
on which the pattern is found?

Thanks,
Vandana

Vandana · May 13, 2010, 1:21am

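File.open("myfile", "r") do |f|
  # Skip the garbage before the pattern (stops at EOF if it never matches):
  while (line = f.gets) && line !~ /pattern/; end

  # Read your data (the matching line itself has already been consumed):
  while (l = f.gets)
    puts l
  end
end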

Vandana · May 13, 2010, 2:45am

Thank you very much.

Vandana · May 13, 2010, 11:10am

On 05/13/2010 02:40 AM, Vandana wrote:

There’s also the flip flop operator:

File.foreach("myfile") do |line|
  if /pattern/ =~ line ... false
    puts line
  end
end

The trick I am using is that the FF operator starts to return true if
the first expression returns true and stays true until the last
expression returns true - in this case never since you want to read
until the end of the file.

Kind regards

robert
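
To make the flip-flop behaviour concrete, here is a minimal sketch that runs against an in-memory StringIO instead of the real file. The sample data, the START marker and the `kept` array are made up for illustration. Note that, unlike the gets loop earlier in the thread, the matching line itself is included in the result.

```ruby
require "stringio"

# Hypothetical stand-in for the real file: two junk lines, a marker, then data.
input = StringIO.new("junk 1\njunk 2\nSTART\ndata 1\ndata 2\n")

kept = []
input.each_line do |line|
  # Flip-flop: false until /START/ matches, then true for every later line,
  # because the closing condition is the literal `false` and never fires.
  kept << line.chomp if (line =~ /START/)...false
end

puts kept
```

If the marker line should be dropped as well, `kept.drop(1)` or an explicit check inside the block would do it.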

Vandana · May 13, 2010, 4:35pm

Robert K. [email protected] wrote:

expression returns true - in this case never since you want to read
until the end of the file.

Could that trick be used for start and stop tags? Like:

File.foreach("myfile") do |line|
  if /<body/ =~ line ... /<\/body/ =~ line
    puts line
  end
end

If true, that’s clever!

Vandana · May 13, 2010, 4:52pm

Line-oriented solutions assume small lines, and that the pattern has
no beeline. Perhaps that is true, but it is unknown.
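
If the lines may be huge, one alternative is to scan the file in fixed-size chunks and then seek past the marker. The following is only a sketch under assumptions: the marker is a fixed string rather than a regexp, and `offset_after`, `chunk_size` and the sample data are hypothetical names chosen for illustration. It keeps the last few bytes of each chunk so that a marker straddling a chunk boundary is still found.

```ruby
require "stringio"

# Returns the byte offset just past the first occurrence of `marker` in `io`,
# or nil if it never appears. Reads fixed-size chunks, so it does not depend
# on where (or whether) newlines occur.
def offset_after(io, marker, chunk_size = 64 * 1024)
  marker = marker.b
  tail = "".b   # trailing bytes of the previous window, kept for overlap
  base = 0      # offset in the stream of the first byte of `tail`
  while (chunk = io.read(chunk_size))
    window = tail + chunk
    if (i = window.index(marker))
      return base + i + marker.bytesize
    end
    keep = [marker.bytesize - 1, window.bytesize].min
    base += window.bytesize - keep
    tail = window.byteslice(window.bytesize - keep, keep)
  end
  nil
end

# Usage with an in-memory stand-in for the file and a tiny chunk size,
# so that the marker is actually split across chunk boundaries.
io = StringIO.new("garbage garbage MARKER\nuseful 1\nuseful 2\n")
pos = offset_after(io, "MARKER", 8)
rest = nil
if pos
  io.seek(pos)
  rest = io.read
end
```

With a real file, the same calls should work on the object yielded by `File.open("myfile", "rb")`, and after the seek you can go back to reading line by line if the useful part is line-oriented.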

Vandana · May 13, 2010, 4:45pm

2010/5/13 Une Bévue [email protected]:

Could that trick be used for start and stop tags? Like:

File.foreach("myfile") do |line|
  if /<body/ =~ line ... /<\/body/ =~ line
    puts line
  end
end

Yes, but as with everything, you should test it.

kind regards -botp

Vandana · May 13, 2010, 6:35pm

On 13.05.2010 16:34, Une Bévue wrote:

if true, that’s clever !

Yes, that could be done. However, I would not use this for languages
from the SGML family (XML, HTML) because there are no guarantees as to
how many tags you’ll find on a single line of text. There are better
tools to deal with that (REXML, Nokogiri…).

Kind regards

robert
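
A small sketch of why the line-based approach is fragile for markup. The three-line sample document is invented for illustration. The flip-flop selects whole lines, so anything that shares a line with the opening or closing tag comes along too.

```ruby
# Hypothetical input: the tags share their lines with other markup.
lines = [
  "<html><head><title>t</title></head><body><p>one</p>",
  "<p>two</p></body></html>",
  "<!-- trailer -->",
]

# Same flip-flop condition as in the thread, collected instead of printed.
selected = lines.select do |line|
  true if (line =~ /<body/)...(line =~ /<\/body/)
end

puts selected
```

The first two lines are selected in full, including the head section and the closing html tag, which is usually not what you want from "the body".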

Vandana · May 13, 2010, 4:54pm

On Thursday, May 13, 2010, Xavier N. [email protected] wrote:

Line-oriented solutions assume small lines, and that the pattern has
no beeline.

newline (beeline is damn phone autocorrection)

Vandana · May 13, 2010, 8:33pm

Robert K. [email protected] wrote:

Yes, that could be done. However, I would not use this for languages
from the SGML family (XML, HTML) because there are no guarantees as to
how many tags you’ll find on a single line of text. There are better
tools to deal with that (REXML, Nokogiri…).

Right, however REXML doesn’t work with badly balanced tags.
I did some tests of Nokogiri today; it works even better than tidy for
the first step of cleaning up unbalanced tags.

The only question I have about Nokogiri is how to avoid the DOCTYPE,
because it outputs:

even if I’m using #to_xhtml:

then the DOCTYPE is wrong…