How do I quickly search the end of a huge text file?

I am trying to create a Ruby script that will search a Maya ASCII file
for specific text. The problem I’m running into is that it’s running too
slow for the system at work. I know that all the information I need is
in the last 5% of the text file, but I haven’t been able to figure out
a way to either jump to near the end and then start searching through
lines, or, even better, iterate backwards through the file till I find
what I’m looking for… Here is the code I’m currently using, which
works but slowly. Any suggestions for how to speed this up would be
greatly appreciated! :smiley:

require "fileutils"

def FindRenderLayers(root)
  layersFile = []
  # Strip a trailing backslash so Dir.entries gets a plain directory path
  dirLocation = root.gsub(/(\\)$/, '')
  list = Dir.entries(dirLocation)

  list.each do |file|
    if file =~ /\.ma$/
      fileName = root + file
      layersFile.push file
      File.open(fileName) do |file|
        # Reads every line from the top of the file: this is the slow part
        while line = file.gets
          if line =~ /connectAttr ("renderLayerManager.rlmi\[[0-9]\]")/
            if $1 != "defaultRenderLayer"
              editedLine = "-" + $1
              layersFile.push editedLine
            end
          end
        end
      end
    end
  end
  return layersFile
end

root = "C:\\Users\\Brian\\Documents\\Ruby\\"
puts FindRenderLayers(root)

IO::SEEK_END at Ruby-Doc may be the ticket…
http://www.ruby-doc.org/core/classes/IO.html#M002305

Victor G. wrote:

IO::SEEK_END at Ruby-Doc may be the ticket…
http://www.ruby-doc.org/core/classes/IO.html#M002305

Thanks for your input… I actually tried using SEEK_END - couldn’t get
it to work right…

From: Brian G.

I am trying to create a ruby script that will search a maya ascii file
for specific text. The problem I’m running into is that it’s
running too slow for the system at work.

why do you say it is slow? what is your comparison? where is your
benchmark?
how many files do you have? how large are the files?
how much disk space do you have?
how much memory do you have?
how fast is your cpu?
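One way to answer the benchmark question is to time both approaches on the same data. The sketch below is a minimal illustration that assumes a synthetic file built with `Tempfile` and an illustrative `connectAttr` pattern (neither comes from the thread). It uses Ruby's standard `Benchmark.realtime` and `File.read(name, length, offset)` to compare a full line-by-line scan against reading only the last 5% of the bytes.

```ruby
require "benchmark"
require "tempfile"

# Illustrative pattern; the exact Maya line format is an assumption here.
PATTERN = /connectAttr "renderLayerManager\.rlmi\[\d+\]"/

# Synthetic test file: lots of filler, with the interesting line at the very end.
tmp = Tempfile.new(["bench", ".ma"])
100_000.times { |i| tmp.puts "setAttr \".filler[#{i}]\" 0;" }
tmp.puts 'connectAttr "renderLayerManager.rlmi[1]" "layer1.rlid";'
tmp.close
path = tmp.path

full_hits = tail_hits = nil

# Full scan: read every line from the top, as the original script does.
full_time = Benchmark.realtime do
  full_hits = File.foreach(path).grep(PATTERN)
end

# Tail scan: read only the last 5% of the bytes with File.read(name, length, offset).
tail_time = Benchmark.realtime do
  size = File.size(path)
  length = size * 5 / 100
  tail = File.read(path, length, size - length)
  tail_hits = tail.lines.grep(PATTERN)
end

printf("full scan: %.3fs  tail scan: %.3fs\n", full_time, tail_time)
```

The timings will differ from machine to machine; what matters is measuring on the real server with real scene files before and after a change.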

I know that all the information I need is
in the last 5% of the text file - but I haven’t been able to

are you sure of the 5%?
where is your proof?

figure out a way to either jump to near the end, and then
start searching through lines

low level, use IO::SEEK_END

or even better iterating backwards through the file till I find
what I’m looking for…

arggh. but your comparison will be forward. otherwise, you’ll have to
reverse your search/regex pattern. implement a reverse readline/gets.
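A reverse readline/gets can be sketched by reading fixed-size chunks from the end of the file and splitting them on newlines, carrying any incomplete line over to the next (earlier) chunk. This is a minimal sketch, not a library API: `each_line_reverse` and its 64 KB default chunk size are hypothetical choices, and lines are yielded without their trailing newline. Because each individual line is still in forward order, the regex does not need to be reversed.

```ruby
# Hypothetical helper: yield the lines of a file from last to first.
# Reads `chunk_size` bytes at a time, starting at the end of the file.
def each_line_reverse(path, chunk_size = 64 * 1024)
  File.open(path, "rb") do |f|
    pos = f.size
    leftover = nil   # incomplete line carried over from the later chunk
    first = true
    while pos > 0
      read_size = [chunk_size, pos].min
      pos -= read_size
      f.seek(pos)
      pieces = (f.read(read_size) + (leftover || "")).split("\n", -1)
      # A trailing newline at end of file produces one empty piece; skip it.
      pieces.pop if first && pieces.last == ""
      first = false
      # The first piece may have started before this chunk; keep it for the next round.
      leftover = pieces.shift
      pieces.reverse_each { |line| yield line }
    end
    yield leftover if leftover
  end
end
```

Calling `break` inside the block stops reading as soon as a match is found, and the block form of `File.open` still closes the file.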

here is the code I’m currently using - which
works

are you sure it works? see my comment below, inline of your code.

but slowly - any suggestions for how to speed this up would be
greatly appreciated! :smiley:

require "FileUtils"
require "ftools"

def FindRenderLayers (root)
  layersFile = []
  dirLocation = root.gsub(/(\\)$/, '')
  list = Dir.entries(dirLocation)
  list.each do |file|
    if file =~ /\.ma$/
      fileName = root + file
      layersFile.push file
      File.open(fileName) do |file|
        while line = file.gets
          if line =~ /connectAttr ("renderLayerManager.rlmi\[[0-9]\]")/
            if $1 != "defaultRenderLayer"

pls forgive me at this point because i am at a loss

  1. how could $1, which is patterned after
    "renderLayerManager.rlmi\[[0-9]\]", ever be equal to
    "defaultRenderLayer" ??

  2. and besides, why compare again, if you can ask it straight
    from your regex comparison?

              editedLine = "-" + $1
              layersFile.push editedLine
            end
          end
        end
      end
    end
  end
  return layersFile
end

root = "C:\\Users\\Brian\\Documents\\Ruby\\"
puts FindRenderLayers(root)

kind regards -botp
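To illustrate the second point (asking the regex directly instead of comparing `$1` afterwards), here is a minimal sketch. It assumes, without confirmation from the thread, that Maya ASCII files connect each render layer with a line of the form `connectAttr "renderLayerManager.rlmi[1]" "layer1.rlid";`. The negative lookahead `(?!defaultRenderLayer\.)` then rejects the default layer inside the pattern itself, and the capture group returns the layer name. `LAYER_LINE` and `layer_name` are hypothetical names.

```ruby
# Assumed line shape (not confirmed in the thread):
#   connectAttr "renderLayerManager.rlmi[1]" "layer1.rlid";
# (?!defaultRenderLayer\.) makes the regex itself skip the default layer.
LAYER_LINE = /connectAttr "renderLayerManager\.rlmi\[\d+\]" "(?!defaultRenderLayer\.)([^"]+)\.rlid"/

# Return the render layer name on this line, or nil if the line does not match.
def layer_name(line)
  m = LAYER_LINE.match(line)
  m && m[1]
end

lines = [
  'connectAttr "renderLayerManager.rlmi[0]" "defaultRenderLayer.rlid";',
  'connectAttr "renderLayerManager.rlmi[1]" "bgLayer.rlid";',
  'setAttr ".v" no;'
]
p lines.map { |l| layer_name(l) }  # prints [nil, "bgLayer", nil]
```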

Peña, Botp wrote:

From: Brian G.

I am trying to create a ruby script that will search a maya ascii file
for specific text. The problem I’m running into is that it’s
running too slow for the system at work.

why do you say it is slow? what is your comparison? where is your
benchmark?
how many files do you have? how large are the files?
how much disk space do you have?
how much memory do you have?
how fast is your cpu?

It’s slow because the script is going to be integrated into the company’s
online asset management software, and I was told by the IT guys that if
it’s slower than a certain speed it will time out. It currently is too
slow.

As for how many files, it usually ranges between 3 and 5, and the sizes of
the files vary from about 5 MB to 50 MB.

Disk space is not an issue; there’s tons of it. As far as memory goes,
the IT guys said it can’t load the whole file into memory.

The CPU is fairly fast, but again this isn’t the problem, since it will be
running from a server…

I know that all the information I need is
in the last 5% of the text file - but I haven’t been able to

are you sure of the 5%?
where is your proof?

I’ve gone through many files and manually located where the text I’m
looking for appears - they appear no further out than 5% from the end…
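The 5% figure can also be checked mechanically rather than by eye. The sketch below is a hypothetical helper (`match_fraction_from_end` is not from the thread) that streams a file line by line, so it never loads the whole file, and reports how far from the end the first matching line starts, as a fraction of the file size.

```ruby
# Hypothetical helper: fraction of the file (measured from the end) at which
# the first line matching `pattern` begins. Returns nil if nothing matches.
def match_fraction_from_end(path, pattern)
  size = File.size(path)
  File.open(path, "rb") do |f|
    until f.eof?
      start = f.pos             # byte offset where this line begins
      line = f.gets
      return (size - start).fdiv(size) if line =~ pattern
    end
  end
  nil
end
```

Running it over a representative set of scenes and taking the largest result (for example with `.compact.max` on the mapped values) gives the worst case seen so far; how much margin to add on top of that is a judgment call.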

figure out a way to either jump to near the end, and then
start searching through lines

low level, use IO::SEEK_END

I’m not sure how to use SEEK_END properly, and it’s hard to find good
examples…

or even better iterating backwards through the file till I find
what I’m looking for…

arggh. but your comparison will be forward. otherwise, you’ll have to
reverse your search/regex pattern. implement a reverse readline/gets.

That sounds good. How do I do that?

here is the code I’m currently using - which
works

are you sure it works? see my comment below, inline of your code.

but slowly - any suggestions for how to speed this up would be
greatly appreciated! :smiley:

require "FileUtils"
require "ftools"

def FindRenderLayers (root)
  layersFile = []
  dirLocation = root.gsub(/(\\)$/, '')
  list = Dir.entries(dirLocation)
  list.each do |file|
    if file =~ /\.ma$/
      fileName = root + file
      layersFile.push file
      File.open(fileName) do |file|
        while line = file.gets
          if line =~ /connectAttr ("renderLayerManager.rlmi\[[0-9]\]")/
            if $1 != "defaultRenderLayer"

pls forgive me at this point because i am at a loss

  1. how could $1, which is patterned after
    "renderLayerManager.rlmi\[[0-9]\]", ever be equal to
    "defaultRenderLayer" ??

Sorry - yeah, that’s not needed. I had it in a while ago and forgot to erase
it.

  2. and besides, why compare again, if you can ask it straight
    from your regex comparison?

You’re right…

              editedLine = "-" + $1
              layersFile.push editedLine
            end
          end
        end
      end
    end
  end
  return layersFile
end

root = "C:\\Users\\Brian\\Documents\\Ruby\\"
puts FindRenderLayers(root)

kind regards -botp

Here is a usage example:

begin
  file = File.open(ARGV[0])
rescue
  puts "file does not exist or is not a file"
  exit 1   # stop here; otherwise file is nil and seek raises NoMethodError
end

file.seek(-25, IO::SEEK_END)
puts file.readlines

The code will read the rest of the files from that location. Try it on
a file and see.
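To make the example a little more concrete: with `IO::SEEK_END` the offset is counted from the end of the file, so it must be zero or negative, and asking for more bytes than the file holds (for example `seek(-2000, IO::SEEK_END)` on a smaller file) raises `Errno::EINVAL` on typical systems. The hypothetical `last_bytes` helper below clamps the offset to the file size first.

```ruby
# Hypothetical helper: return the last `n` bytes of the file at `path`.
def last_bytes(path, n)
  File.open(path, "rb") do |f|
    n = [n, f.size].min          # never seek before byte 0
    f.seek(-n, IO::SEEK_END)     # offset is measured back from the end
    f.read                       # everything from there to EOF
  end
end
```

For a file containing `line1\nline2\nline3\n`, `last_bytes(path, 8).lines` gives `["2\n", "line3\n"]`: the first element is only a fragment of `line2`, which is why the first line read after a seek usually needs to be discarded.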

Lex W. wrote:


I meant the rest of the lines. Sorry.

From: Brian G.

It’s slow because the script is going to be integrated into the company’s
online asset management software, and I was told by the IT guys that if
it’s slower than a certain speed it will time out. It currently is too slow.

As for how many files, it usually ranges between 3 and 5, and the sizes of
the files vary from about 5 MB to 50 MB.

max of 5 * 50MB = 250MB
not so bad if you have lots of ram

Disk space is not an issue; there’s tons of it. As far as memory goes,
the IT guys said it can’t load the whole file into memory.

The CPU is fairly fast, but again this isn’t the problem, since
it will be running from a server…

>
> # I know that all the information I need is
> # in the last 5% of the text file - but I haven’t been able to
>
> are you sure of the 5% ?
> where is your proof?

I’ve gone through many files and manually located where the text I’m
looking for appears - they appear no further out than 5% from
the end…

ok no problem. we can adjust it anytime :wink:

> # figure out a way to either jump to near the end, and then
> # start search through lines
>
> low level, use IO::SEEK_END

I’m not sure how to use SEEK_END properly, and it’s hard to find
good examples…

the examples are clear enough. try it first for one file, then post the
code you tried here again.

kind regards -botp

Thank you very much!! That’s exactly what I was looking for!

I just added

file.seek(-2000,IO::SEEK_END)

right after the line

fileSize = File.size(fileName)

and it worked perfectly! It’s running about 18x faster, which is a huge
improvement. I think the guys at work will be satisfied with its speed
now!

Thanks again Lex!! :smiley:
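The seek-then-scan fix (jumping 2000 bytes back from the end before the `gets` loop) could be packaged roughly as below. This is a sketch under assumptions: the method names, the 2000-byte default and the regex are illustrative, not the actual script. It steps back one extra byte so that `gets` discards any partial line at the start of the window, and it skips the seek entirely for files smaller than the window, where `seek(-2000, IO::SEEK_END)` would fail.

```ruby
# Illustrative values; widen TAIL_BYTES if layers are ever missed.
TAIL_BYTES = 2000
LAYER_RE = /connectAttr "renderLayerManager\.rlmi\[\d+\]"/

# Hypothetical helper: complete lines that lie within the last `tail_bytes` bytes.
def tail_lines(path, tail_bytes = TAIL_BYTES)
  File.open(path, "rb") do |f|
    if f.size > tail_bytes
      # Land one byte before the window; gets then consumes the rest of the
      # partial line (or just the "\n" ending the previous line).
      f.seek(-(tail_bytes + 1), IO::SEEK_END)
      f.gets
    end
    f.readlines
  end
end

# Hypothetical replacement for the per-file gets loop.
def render_layer_lines(path)
  tail_lines(path).grep(LAYER_RE)
end
```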


Brian G. wrote:

Thank you very much!! That’s exactly what I was looking for!

I just added

file.seek(-2000,IO::SEEK_END)

right after the line

fileSize = File.size(fileName)

50 MB = 50 * 1024 * 1024 = 52,428,800 bytes
5% = 52,428,800 * 0.05 = 2,621,440 bytes
2,621,440 != 2000

perhaps:
fileSize = File.size(fileName)
seeklen = ((0.05 * fileSize) * -1).to_i
file = File.open(fileName)
file.seek(seeklen, IO::SEEK_END)
puts file.readlines
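The percentage version can also be done in integer arithmetic, which avoids float rounding and never asks for an offset larger than the file. A minimal sketch; `tail_offset` is a hypothetical name, and the percentage is passed as a whole number:

```ruby
# Hypothetical helper: negative offset for IO::SEEK_END covering the last
# `percent` percent of a file of `size` bytes (integer math, clamped to size).
def tail_offset(size, percent)
  -[size * percent / 100, size].min
end

p tail_offset(52_428_800, 5)  # prints -2621440 (5% of 50 MB)
```

The result can be passed straight to `file.seek(tail_offset(file.size, 5), IO::SEEK_END)`; as with any mid-file seek, the first line read afterwards is likely to be partial.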

On Fri, Sep 5, 2008 at 7:32 PM, Brian G. wrote:

I just added
file.seek(-2000,IO::SEEK_END)
right after the line
fileSize = File.size(fileName)

if i’m not mistaken, that would be

fileSize = File.size(fileName)
file.seek((-0.05 * fileSize).to_i, IO::SEEK_END)