Making File.open work on gzipped files

Hello all,

I am trying to write a class for a file parser with a class method for
opening files for reading:

class Parser
  def self.open(*args)
    ios    = File.open(*args)
    parser = self.new(ios)

    if block_given?
      begin
        yield parser
      ensure
        ios.close
      end

      return true
    else
      return parser
    end
  end
end

This works nicely, but I would like it to work on gzipped files too.

I was thinking about checking the file type using a system call,
something like file.match("gzip"), and if that is true then possibly
using popen with "|gzip -f". But I have no idea how to get that working
in this block context.

Cheers,

Martin

This works nicely, but I would like it to work on gzipped files too.

Ruby’s Zlib library should do that nicely for you.

But if you want to do it using the external gzip program, then

ios = IO.popen("gzip -dc '#{filename}'")

should be all you need, plus a bit of checking that filename doesn’t
include a single quote.
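
As a possible way around the quoting problem, IO.popen also accepts the command as an array, in which case no shell is involved and the filename needs no escaping. A minimal sketch, assuming an external gzip program is on the PATH; popen_gunzip is a name made up for this example:

```ruby
# Hypothetical helper: stream a gzipped file through the external gzip program.
# The array form of IO.popen bypasses the shell, so quotes or spaces in the
# filename are passed to gzip untouched.
def popen_gunzip(filename, &block)
  IO.popen(['gzip', '-dc', filename], 'r', &block)
end
```

With a block, the pipe is closed when the block returns and the block's value is returned, e.g. popen_gunzip(path) { |io| io.read }.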

Ruby’s Zlib library should do that nicely for you.

i.e. http://ruby-doc.org/core/classes/Zlib/GzipReader.html
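
For reference, reading a file that is known to be gzipped needs very little code with Zlib::GzipReader.open. This is only a sketch; each_gzip_line is a hypothetical helper name, not something from the thread:

```ruby
require 'zlib'

# Hypothetical helper: yield each decompressed line of a gzipped file.
# GzipReader.open with a block closes the reader when the block returns.
def each_gzip_line(path)
  Zlib::GzipReader.open(path) do |gz|
    gz.each_line { |line| yield line }
  end
end
```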

Thanks Brian,

I did look at Ruby’s Zlib and wondered why there is no method to check
if a file is zipped or not. Perhaps one could fix something by
rescuing the Zlib exception?

Looking at the docs there are a couple of TODOs. Perhaps this is another
one.

Martin
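
An alternative to rescuing the exception would be to look at the first two bytes of the file: the gzip format starts every stream with the magic bytes 0x1f 0x8b. The sketch below assumes that check is good enough for the files at hand; gzip_file? and GZIP_MAGIC are names invented for the example, not part of Zlib:

```ruby
# The two magic bytes that begin every gzip stream, as a binary string.
GZIP_MAGIC = "\x1f\x8b".b

# Hypothetical predicate: true if the file starts with the gzip magic bytes.
# File#read(2) returns nil for an empty file, which simply compares unequal.
def gzip_file?(path)
  File.open(path, 'rb') { |f| f.read(2) == GZIP_MAGIC }
end
```

A matching magic number does not prove the rest of the file is valid gzip data, so GzipReader can still raise later while reading.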

Robert K. wrote:

On 17.08.2010 20:56, Martin H. wrote:

Thanks Brian,

I did look at Ruby’s Zlib and wondered why there is no method to check
if a file is zipped or not. Perhaps one could fix something by
rescuing the Zlib exception?

Exactly, just try to open with GzipReader and if that throws just work
with the regular file which you have opened already.

Remember to rewind it too.

require 'zlib'
=> true

io = File.open("/etc/passwd")
=> #<File:/etc/passwd>

z = Zlib::GzipReader.new(io)
Zlib::GzipFile::Error: not in gzip format
from (irb):3:in `initialize'
from (irb):3:in `new'
from (irb):3
from :0

io.gets
=> "1:119:PulseAudio daemon,:/var/run/pulse:/bin/false\n"

io.rewind
=> 0

io.gets
=> "root:x:0:0:root:/root:/bin/bash\n"

Thanks Brian and Robert. The snippet below appears to be working nicely,
though I am not sure that the file is closed if zipped?

require 'zlib'

class Parser
  def self.open(*args)
    ios = File.open(*args)

    begin
      ios = Zlib::GzipReader.new(ios)
    rescue Zlib::GzipFile::Error
      # Not gzipped: GzipReader has already consumed some bytes, so rewind.
      ios.rewind
    end

    parse = self.new(ios)

    if block_given?
      begin
        yield parse
      ensure
        ios.close
      end

      return true
    else
      return parse
    end
  end
end

On 17.08.2010 20:56, Martin H. wrote:

Thanks Brian,

I did look at Ruby’s Zlib and wondered why there is no method to check
if a file is zipped or not. Perhaps one could fix something by
rescuing the Zlib exception?

Exactly, just try to open with GzipReader and if that throws just work
with the regular file which you have opened already.

Looking at the docs there are a couple of TODOs. Perhaps this is another
one.

Which TODO do you mean?

Cheers

robert

2010/8/18 Martin H.:

      ios = Zlib::GzipReader.new(ios)
    [...]
        ios.close
      end

      return true
    else
      return parse
    end
  end
end

I would apply these changes:

  1. refactor opening code (everything before “if block_given?”) into a
    separate method which returns either IO or GzipReader.

  2. fold parse and ios into one (i.e. the value returned from the other
    method).

See http://ruby-doc.org/core/classes/Zlib/GzipFile.html#method-M007448

  3. In case of block_given? do not return true; return nothing
    explicitly, so that the method returns what the block returned.
    This is more flexible.

That way your code will become simpler.
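
The three suggestions above could look roughly like the sketch below. It is only one possible reading of them: open_io and each_line are names made up for illustration, and the narrow rescue of Zlib::GzipFile::Error is an assumption about which failure should trigger the fallback.

```ruby
require 'zlib'

class Parser
  # With a block: yields the parser, closes the IO, returns the block's value.
  # Without a block: returns the parser and leaves closing to the caller.
  def self.open(*args)
    ios = open_io(*args)
    return new(ios) unless block_given?

    begin
      yield new(ios)
    ensure
      ios.close
    end
  end

  # Hypothetical helper (suggestion 1): returns a GzipReader for gzipped
  # input, otherwise the rewound File itself.
  def self.open_io(*args)
    file = File.open(*args)
    Zlib::GzipReader.new(file)
  rescue Zlib::GzipFile::Error
    file.rewind
    file
  end
  private_class_method :open_io

  def initialize(io)
    @io = io
  end

  # Illustrative reader method so the sketch can be exercised.
  def each_line(&block)
    @io.each_line(&block)
  end
end
```

Since nothing is returned explicitly in the block case, a call such as Parser.open(path) { |parser| count_records(parser) } evaluates to whatever the block computed (count_records being a stand-in for the caller's own code).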

Kind regards

robert

2010/8/18 Martin H.:

1 & 3 done! I don’t quite get 2.

If you have done 1 you will have a single variable left and hence done
2 as well. :-) Sorry for my bad wording.

Kind regards

robert

1 & 3 done! I don’t quite get 2.

Martin

2010/8/18 Brian C.:

Martin H. wrote:

Thanks Brian and Robert. The snippet below appears to be working nicely,
though I am not sure that the file is closed if zipped?

Good question, but the documentation for GzipReader#close gives you the
answer:

“Closes the GzipFile object. This method calls close method of the
associated IO object. Returns the associated IO object.”
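
The documented behaviour is easy to check. A small sketch, where closes_underlying_file? is a hypothetical name and the argument is assumed to be the path of an existing gzipped file:

```ruby
require 'zlib'

# Hypothetical check: does closing the GzipReader also close the File it wraps?
def closes_underlying_file?(gzip_path)
  file = File.open(gzip_path, 'rb')
  gz   = Zlib::GzipReader.new(file)
  gz.close        # per the docs, this also calls close on `file`
  file.closed?
end
```

If the underlying IO should stay open, GzipFile#finish ends the gzip stream without closing it.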

That’s why I suggested my item 1. :-)

Cheers

robert

I must be missing something :o/

#<Parse:0x000001008a5920>
parse.rb:14:in `open': undefined method `close' for
#<Parse:0x000001008a5920 @io=#<File:/etc/passwd>> (NoMethodError)
from parse.rb:38:in `<main>'

Test code below.

Martin

require 'zlib'
require 'pp'

class Parse
  def self.open(*args)
    ios = self.zopen(*args)

    if block_given?
      begin
        yield ios
      ensure
        ios.close
      end
    else
      return ios
    end
  end

  def initialize(io)
    @io = io
  end

  private

  def self.zopen(*args)
    ios = File.open(*args)

    begin
      ios = Zlib::GzipReader.new(ios)
    rescue
      ios.rewind
    end

    self.new(ios)
  end
end

Parse.open("/etc/passwd") do |ios|
  puts ios
end

Of course! And inserting the close method in the right place inside the
class even makes everything work!
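
The thread does not show the added method, so the following is only a guess at its shape: an instance method that forwards close to the wrapped IO. The closed? method is an extra added here for illustration.

```ruby
class Parse
  def initialize(io)
    @io = io
  end

  # Assumed fix: forward close to the wrapped File or GzipReader, so that
  # the ensure clause in Parse.open has something to call.
  def close
    @io.close
  end

  # Illustrative extra: report whether the wrapped IO has been closed.
  def closed?
    @io.closed?
  end
end
```

Because the object yielded to the block is a Parse instance rather than the IO itself, Parse has to provide close on its own.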

Thanks a zillion guys!

Martin

2010/8/18 Martin H.:

I must be missing something :o/

Parse#close does not exist. :-)

Cheers

robert