Getting files out of a .tar.gz archive

Hi,

I have a .tar.gz file containing some xml files. I need to locate
particular files in the archive and process them. I’m looking for a
pure Ruby way to do this without resorting to external system
commands.

I found Archive::Tar::Minitar that allows me to process my files once
I have expanded the .tar.gz file to a .tar file. So I can do this :-

open(@tarfile, "rb") do |f|
  Archive::Tar::Minitar::Reader.open(f).each do |entry|
    fpl = StringIO.new(entry.read) if entry.name =~ /#{@date}#{channel}_pl/
    fpi = StringIO.new(entry.read) if entry.name =~ /#{@date}#{channel}_pi/
  end
end

However, in order to get a tar file, I have to call gunzip to expand
my .tar.gz file. Does anybody know of a way for me to replace the
gunzip call with a Ruby library call of some sort? Or does anyone have
suggestions for an alternative way to do this?

Cheers,

Chris
http://smuby.org

On Mon, Mar 10, 2008 at 9:09 AM, celldee [email protected] wrote:

Does anybody know of a way for me to replace the gunzip call with a
Ruby library call of some sort? Or does anyone have suggestions for an
alternative way to do this?

There are the Zlib::Gzip* classes. I’ve never used them, though.

Todd

Hi Todd,

I’ve tried to use Zlib::GzipReader, but that just gives me a
continuous stream of text or a series of strings that do not resemble
the actual file structure, unless I’ve missed something.

Thanks,

Chris
http://smuby.org
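
The "continuous stream" that comes out of Zlib::GzipReader is the tar archive itself: a 512-byte header block per entry, followed by that entry's content padded to a multiple of 512 bytes. The sketch below decodes the first header by hand to make that visible. It is only an illustration, not how Minitar is implemented; the file name and content are invented, and RubyGems' bundled Gem::Package::TarWriter is assumed here purely to build sample data without extra gems.

```ruby
require 'zlib'
require 'stringio'
require 'rubygems/package' # bundled with Ruby; used only to build sample data

# Build a one-file .tar.gz in memory. The file name and content are invented.
def sample_tgz
  data = "<pl/>\n"
  tar = StringIO.new(String.new)
  Gem::Package::TarWriter.new(tar) do |w|
    w.add_file_simple('20080310bbc1_pl.xml', 0o644, data.bytesize) { |io| io.write(data) }
  end
  Zlib.gzip(tar.string)
end

# Decompress the archive and decode the first tar header by hand.
def first_entry(tgz_bytes)
  raw = Zlib::GzipReader.new(StringIO.new(tgz_bytes)).read
  header = raw.byteslice(0, 512)
  name = header.byteslice(0, 100).delete("\0") # bytes 0-99: file name, NUL padded
  size = header.byteslice(124, 12).to_i(8)     # bytes 124-135: size as octal text
  body = raw.byteslice(512, size)              # content starts in the next block
  [name, size, body]
end

name, size, body = first_entry(sample_tgz)
puts "#{name} (#{size} bytes): #{body.inspect}"
```

A tar reader such as Minitar::Reader does this header walking for you, which is why it can be fed the GzipReader directly.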

Use the docs.

From Minitar’s readme:

tgz = Zlib::GzipReader.new(File.open('test.tgz', 'rb'))
# Warning: tgz and the file will be closed.
Minitar.unpack(tgz, 'x')

For Gzip and the rest of the standard library, see
http://www.ruby-doc.org/stdlib/

Daniel Brumbaugh K.

celldee wrote:

I saw that, but I don’t want to expand the .tar.gz any more than I
have to. The code that I put up earlier is getting what I want out of
the .tar file using Minitar::Reader which I quite like, I’m just
looking to eliminate the gunzip step.

If it is packed with Gzip, you always have to decompress it in order to
read the content.

Tar is the container and Gzip does the compression. The files in a
tar.gz file are first put into the container, and the container is then
compressed. So in order to read anything inside the tar, you have to
decompress the file to get access to the tarball, and then read your
file from inside it.
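
The two layers can be seen in the bytes themselves. This is a small sketch with invented sample data, built with RubyGems' bundled Gem::Package::TarWriter as an assumption of convenience: the outer file starts with the gzip magic number, and only after decompression does the tar "ustar" marker appear at offset 257 of the first header.

```ruby
require 'zlib'
require 'stringio'
require 'rubygems/package' # bundled with Ruby; used only to build sample data

# Tar first, then gzip: the same order in which a .tar.gz is produced.
def build_tgz(name, data)
  tar = StringIO.new(String.new)
  Gem::Package::TarWriter.new(tar) do |w|
    w.add_file_simple(name, 0o644, data.bytesize) { |io| io.write(data) }
  end
  Zlib.gzip(tar.string)
end

# Returns the signature of the outer (gzip) layer and the inner (tar) layer.
def layer_signatures(tgz_bytes)
  outer = tgz_bytes.byteslice(0, 2).unpack('C2') # gzip magic number
  tar_bytes = Zlib::GzipReader.new(StringIO.new(tgz_bytes)).read
  inner = tar_bytes.byteslice(257, 5)            # tar format marker
  [outer, inner]
end

outer, inner = layer_signatures(build_tgz('example.xml', '<x/>'))
puts format('outer: %02x %02x, inner: %s', outer[0], outer[1], inner)
```

Note that the decompression here happens in memory; nothing in this layering requires writing the intermediate .tar file to disk.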

Chris,

Persevere with Zlib::GzipReader, that’ll give you what you want.

Mac

Hi Daniel,

I saw that, but I don’t want to expand the .tar.gz any more than I
have to. The code that I put up earlier is getting what I want out of
the .tar file using Minitar::Reader which I quite like, I’m just
looking to eliminate the gunzip step. Minitar.unpack expands
the .tar.gz and writes the files to disk, which means that I’ll have
to mess around in the filesystem more than I need to.

Thanks for your reply,

Chris
http://smuby.org

On Mar 10, 5:04 pm, “Daniel Brumbaugh K.”

On Mon, Mar 10, 2008 at 7:01 PM, Daniel Brumbaugh K.
[email protected] wrote:

tgz = Zlib::GzipReader.new(File.open('test.tgz', 'rb'))
# Warning: tgz and the file will be closed.

I seem to have failed to remove the former comment. It looks like
Minitar::Reader at no point closes tgz or the file, and therefore tgz
needs to be closed manually afterward, as my code correctly
demonstrated, though the comment did not.

Daniel Brumbaugh K.

On Mon, Mar 10, 2008 at 12:43 PM, celldee [email protected] wrote:

Chris

My apologies for failing to understand the issue. GZip and Minitar
both provide incremental readers, although I have not used them. I
believe the correct combination for what you’re asking is this:

require 'zlib'
require 'archive/tar/minitar'

tgz = Zlib::GzipReader.new(File.open('test.tgz', 'rb'))
# Warning: tgz and the file will be closed.
reader = Archive::Tar::Minitar::Reader.new(tgz)
reader.each_entry do |file|
  # do something with each file, and break if you like
end
reader.close # does this do anything?
tgz.close

Daniel Brumbaugh K.
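
On the question of who closes what: a minimal sketch, using only Zlib from the standard library, of two ways to make sure the gzip handle is released. The helper names and the sample file are invented for the illustration. Closing a Zlib::GzipReader also closes the IO object it wraps.

```ruby
require 'zlib'
require 'tempfile'

# Writes a small gzip file to a temporary path (sample data only).
def make_sample_gz
  tmp = Tempfile.new(['sample', '.gz'])
  tmp.binmode
  tmp.write(Zlib.gzip('hello'))
  tmp.flush
  tmp
end

# Manual style: close in an ensure clause so the handle is released
# even if reading raises.
def read_with_ensure(path)
  gz = Zlib::GzipReader.new(File.open(path, 'rb'))
  data = gz.read
  [data, gz]
ensure
  gz&.close # also closes the underlying File
end

# Block style: GzipReader.open closes the reader when the block ends.
def read_with_block(path)
  data = nil
  handle = nil
  Zlib::GzipReader.open(path) do |gz|
    handle = gz
    data = gz.read
  end
  [data, handle]
end

tmp = make_sample_gz
data, gz = read_with_ensure(tmp.path)
puts "ensure style: #{data.inspect}, closed=#{gz.closed?}"
data, gz = read_with_block(tmp.path)
puts "block style:  #{data.inspect}, closed=#{gz.closed?}"
tmp.close!
```

The block form is usually the less error-prone of the two, since there is no close call to forget.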

You can use Zlib::GzipReader.open with a block. (Since I set Chris the
original problem, and this isn’t the hard part, I’ll show the code that
is used in the app, with a couple of minor mods to make it look like
Chris’s example.)

Zlib::GzipReader.open(@tarfile) { |tgz|
  Archive::Tar::Minitar::Reader.open(tgz).each do |entry|
    # Chris’s code
    fpl = StringIO.new(entry.read) if entry.name =~ /#{@date}#{channel}_pl/
    fpi = StringIO.new(entry.read) if entry.name =~ /#{@date}#{channel}_pi/

    # Or test the verification by using the XML document directly
    # (use these two lines instead of the two above, not as well as)
    # fpl = REXML::Document.new(StringIO.new(entry.read)) if entry.name =~ /#{@date}#{channel}_pl/
    # fpi = REXML::Document.new(StringIO.new(entry.read)) if entry.name =~ /#{@date}#{channel}_pi/
  end
}

Mac
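
For anyone who wants to try the same pattern without installing Minitar, here is a self-contained sketch. It swaps in RubyGems' bundled Gem::Package::TarReader for Minitar::Reader, so it is not the code from the app; the helper names, file names and contents are all invented for the demonstration. Everything happens in memory and nothing is written to disk.

```ruby
require 'zlib'
require 'stringio'
require 'rubygems/package' # bundled with Ruby; stands in for Minitar here

# Returns a hash of entry name => StringIO for each file entry whose
# name matches the pattern.
def extract_matching(tgz_io, pattern)
  found = {}
  Zlib::GzipReader.wrap(tgz_io) do |gz|
    Gem::Package::TarReader.new(gz).each do |entry|
      next unless entry.file? && entry.full_name =~ pattern
      found[entry.full_name] = StringIO.new(entry.read)
    end
  end
  found
end

# Builds a sample .tar.gz in memory from a hash of name => content.
def build_sample_tgz(files)
  tar = StringIO.new(String.new)
  Gem::Package::TarWriter.new(tar) do |w|
    files.each do |name, data|
      w.add_file_simple(name, 0o644, data.bytesize) { |io| io.write(data) }
    end
  end
  Zlib.gzip(tar.string)
end

sample = build_sample_tgz(
  '20080310bbc1_pl.xml' => '<pl/>',
  '20080310bbc1_pi.xml' => '<pi/>',
  'README.txt' => 'ignore me'
)
docs = extract_matching(StringIO.new(sample), /20080310bbc1_p[li]/)
puts docs.keys.sort.inspect
```

Collecting the matches into a hash keeps them available after the block ends, which plain local variables assigned inside the block would not be.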

Thanks Mac, works like a charm. Thanks for your efforts everybody
else.