Minitar#tar? and inclusion in Ruby?

Using Archive::Tar::Minitar, is there any way to quickly check if a file
is a tar file or not?

Also, I would like to recommend Austin’s lib for inclusion in Ruby. We
already have Zlib, so it makes sense to provide for tar.gz files too.
Although I think the naming could be simplified to just Archive::Tar, no
Minitar. Hopefully it will get some additional features eventually and
not be so “mini” :wink: In so doing, perhaps Zlib belongs in the Archive
namespace too?

Thanks,
T.

[email protected] wrote:

Using Archive::Tar::Minitar, is there any way to quickly check if a file
is a tar file or not?

To answer my own question, sadly, “no”:

Magic bytes
None.
However, identification of tar files is not limited to looking at
the file extension. Each tar archive entry header also stores a
checksum on itself. Thus, reading the first 512 bytes of a potential
tar file, creating that checksum and comparing it to the stored
checksum will tell if the file really is in tar format.
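
A rough sketch of the checksum test described above, in Ruby. It assumes the usual tar header layout: a 512-byte header block whose 8-byte checksum field sits at offset 148 and is counted as eight spaces when summing. `TarSniff`, `header_checksum_ok?` and `tar?` are made-up names for illustration, not part of Archive::Tar::Minitar, and this has not been tried against unusual archives.

```ruby
# Hypothetical helper; not part of Archive::Tar::Minitar.
module TarSniff
  BLOCK_SIZE   = 512        # size of one tar header block
  CHKSUM_RANGE = 148...156  # assumed position of the 8-byte checksum field

  # True if the header block's stored checksum matches the computed one.
  def self.header_checksum_ok?(header)
    return false unless header && header.bytesize == BLOCK_SIZE

    # The stored value is octal ASCII, padded with NULs and/or spaces.
    stored = header.byteslice(148, 8).delete("\0 ")
    return false unless stored.match?(/\A[0-7]+\z/)

    # Sum every byte of the header, counting the checksum field as spaces.
    bytes = header.bytes
    CHKSUM_RANGE.each { |i| bytes[i] = 0x20 }
    bytes.sum == stored.to_i(8)
  end

  # Reads only the first header block of the file at +path+.
  def self.tar?(path)
    File.open(path, 'rb') { |f| header_checksum_ok?(f.read(BLOCK_SIZE)) }
  end
end
```

Usage would be something like `TarSniff.tar?('backup.tar')`. Only the first 512 bytes of the file are read, and an all-zero block or a file shorter than 512 bytes is reported as not being a tar file.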

That’s too complex for efficient checking. OTOH, luckily gzip files are easy
to identify. Not tested, but basically:

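class File
  # Is the file a gzip file? Checks for the two-byte gzip magic
  # number (0x1f 0x8b) at the start of the file.
  def self.gzip?(file)
    File.open(file, 'rb') do |f|
      # read(2) returns nil for an empty file, so this yields false.
      f.read(2) == "\x1f\x8b".b
    end
  end
end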

Might be worth adding to Zlib (assuming it’s not already there; I
didn’t see it).

T.

On 7/2/06, [email protected] wrote:

Using Archive::Tar::Minitar, is there any way to quickly check if a file
is a tar file or not?

Also, I would like to recommend Austin’s lib for inclusion in Ruby. We
already have Zlib, so it makes sense to provide for tar.gz files too.
Although I think the naming could be simplified to just Archive::Tar, no
Minitar. Hopefully it will get some additional features eventually and
not be so “mini” :wink: In so doing, perhaps Zlib belongs in the Archive
namespace too?

Actually, Archive::Tar::Minitar will be deprecated hopefully by the
end of the year. Members of TRUG started on a project (RAT) to replace
it with a port of libarchive at a hackathon earlier this year. This
didn’t produce something usable, but work will continue in other
hackathons later this year and something should be usable (hopefully)
by RubyConf.

There’s a lot that Minitar can’t do, and little value in adding it
considering the options that libarchive gives us.

-austin

Austin Z. wrote:

Actually, Archive::Tar::Minitar will be deprecated hopefully by the
end of the year. Members of TRUG started on a project (RAT) to replace
it with a port of libarchive at a hackathon earlier this year. This
didn’t produce something usable, but work will continue in other
hackathons later this year and something should be usable (hopefully)
by RubyConf.

Nice. I look forward to using libarchive; I really hope it pans out. I
need it to pan out!

Thanks Austin,
T.

On 7/3/06, [email protected] wrote:

However, identification of tar files is not limited to looking at
the file extension. Each tar archive entry header also stores a
checksum on itself. Thus, reading the first 512 bytes of a potential
tar file, creating that checksum and comparing it to the stored
checksum will tell if the file really is in tar format.

That’s too complex for efficient checking. OTOH, luckily gzip files are easy
to identify. Not tested, but basically:

Might it be sufficient for your purpose to check whether the first 100 bytes
are of the form ASCII+\0* (one or more ASCII characters followed only by
NUL padding)?
Sorry, I do not have a UTF-8 filesystem handy to see whether the ASCII+
constraint can be easily checked in that case.
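
A minimal sketch of that check in Ruby, assuming “ASCII” means printable ASCII (0x20 to 0x7e) and that the first 100 bytes are the entry name field of the first header. The method names are made up for illustration. It is a much weaker test than the checksum, and a UTF-8 file name would be rejected by it.

```ruby
# Hypothetical helper: does a 100-byte field look like ASCII+\0*,
# i.e. one or more printable ASCII bytes followed only by NUL padding?
def name_field_plausible?(field)
  return false unless field && field.bytesize == 100

  field.b.match?(/\A[\x20-\x7e]+\x00*\z/)
end

# Applies the same check to the first 100 bytes of the file at +path+.
def tar_name_plausible?(path)
  File.open(path, 'rb') { |f| name_field_plausible?(f.read(100)) }
end
```

A file with fewer than 100 bytes, or one whose first 100 bytes contain a newline or any byte after the first NUL, fails the check.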

Cheers
Robert



Two things are infinite: the universe and human stupidity; and as far as
the universe is concerned, I have not yet acquired absolute certainty.

  • Albert Einstein