Info regarding Zlib::GzipReader

J-H_Johansen · June 15, 2007, 5:15pm

Hi,

I’m trying to parse through a gzip’ed proxy access log with
Zlib::GzipReader and I’m having some difficulties.

require 'zlib'

# Open in binary mode, since the file holds compressed data
f = File.open(file, "rb")
gz = Zlib::GzipReader.new(f)
gz.readlines.each do |block|
  puts block
end

This code reads only the first 6 lines of the proxy log before it
reaches (what it believes to be) the end of the file. Those lines
happen to be the info header, which contains:

#Software: …
#Version: …
#Start-date: …
#Date: …
#Fields: …
#Remark: …

The access log contains a wee bit more than that though (980796 lines).
By just using File.open(file) it seems I can read the whole file.

I’m speculating here, but I think the gzip file may have been written
in separate chunks, i.e. the first 6 lines were gzip’ed on their own and
then the rest of the file was gzip’ed and appended to it afterwards.
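
That guess would be consistent with a gzip file made of several
concatenated members, which the gzip format allows. A minimal sketch
that reproduces the symptom, assuming Ruby 2.4 or later (for Zlib.gzip)
and using made-up sample data in place of the real log:

```ruby
require 'zlib'
require 'stringio'

# Made-up sample data: two separately gzip'ed chunks glued together,
# standing in for "header first, rest appended later".
header = Zlib.gzip("#Software: x\n#Version: y\n")
body   = Zlib.gzip("line 1\nline 2\n")
data   = header + body

gz = Zlib::GzipReader.new(StringIO.new(data))
first_member_lines = gz.readlines   # stops at the end of the first member
leftover = gz.unused                # bytes read from the IO but not decoded

puts first_member_lines.length
puts leftover.nil? ? "nothing left over" : "more data follows the first member"
```

If the real log behaves the same way, gz.unused should be non-nil after
the six header lines have been read.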

One way of fixing the problem is to gunzip the file and then gzip the
output into a new file. Problem solved (sort of).

Do any of you know of another way to do this without actually
modifying the access logs?

I’m thinking of something along the lines of breaking up the file into
smaller file handles which in turn can be used by GzipReader, but I
don’t know how this is done.

Does anyone know how this can be done, or whether there is a better way
of doing it?
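
One possible approach, sketched under the assumption that the log really
is a series of concatenated gzip members: start a new GzipReader on the
same handle for each member, and use GzipReader#unused to rewind the
handle to where the next member begins. The helper name
each_line_in_all_members is hypothetical (it is not part of Zlib), and
the sample data is made up.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper, not part of Zlib: yields every line from every
# gzip member in io. io must support read, pos, pos= and eof?.
def each_line_in_all_members(io)
  until io.eof?
    gz = Zlib::GzipReader.new(io)
    gz.each_line { |line| yield line }
    gz.read                  # make sure this member is fully consumed
    unused = gz.unused       # bytes already read that belong to the next member
    gz.finish                # release the reader without closing io
    # Rewind so the next GzipReader starts at the next member's header.
    io.pos -= unused.bytesize if unused
  end
end

# Made-up sample: two members, standing in for header + appended log body.
sample = StringIO.new(Zlib.gzip("#Fields: a b\n") + Zlib.gzip("row 1\nrow 2\n"))
lines = []
each_line_in_all_members(sample) { |line| lines << line }
puts lines.length
```

With a real file this would be called as
File.open(file, "rb") { |f| each_line_in_all_members(f) { |l| puts l } },
which leaves the log itself untouched. Trailing bytes that are not a
gzip member would make GzipReader.new raise an error. On Ruby 2.7 and
later, Zlib::GzipReader.zcat appears to cover the same case and may be
the simpler choice.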

Thanks