Test if file is binary?

From: Rebhan, Gilbert

Is there an existing standard for what is considered a binary file,

If you’re on a *nix (non-Windows) box, you can use the OS file
command and just wrap it in Ruby:

irb(main):022:0> def is_bin(f)
irb(main):023:1> %x(file #{f}) !~ /text/
irb(main):024:1> end
=> nil
irb(main):025:0> is_bin "test.rb"
=> false
irb(main):026:0> is_bin "test.txt"
=> false
irb(main):027:0> is_bin "/usr/local/bin/dnscache"
=> true
irb(main):028:0> is_bin "/bin/ps"
=> true
irb(main):029:0> def is_text(f)
irb(main):030:1> %x(file #{f}) =~ /text/
irb(main):031:1> end
=> nil
irb(main):032:0> is_text "test.rb"
=> 27
irb(main):033:0> is_text "test.txt"
=> 16
irb(main):034:0> is_text "/usr/local/bin/dnscache"
=> nil
irb(main):035:0> is_text "/bin/ps"
=> nil

kind regards -botp
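
The irb session above interpolates the file name straight into a shell command, so a name containing spaces or shell metacharacters would break it, and a file whose own name contains "text" would match the regex. A minimal sketch of a more defensive wrapper follows. The helper names are hypothetical, and it assumes a Unix-like system with file(1) on the PATH.

```ruby
# Hypothetical helpers; they assume file(1) is installed and on the PATH.

# Returns the description printed by file(1) for the given path.
def file_description(path)
  # The array form of IO.popen runs the command without a shell, so the
  # path is passed through literally. "-b" (brief) leaves the file name
  # out of the output, so only the description is matched later.
  IO.popen(["file", "-b", "--", path], &:read).to_s.strip
end

# True when file(1) describes the file as some kind of text.
def text_file?(path)
  file_description(path).match?(/text/)
end

# True when file(1) does not describe the file as text.
def binary_file?(path)
  !text_file?(path)
end
```

Both predicates return real booleans rather than a match position or nil. As with the original, anything file(1) does not label as text, including a path it cannot open, is reported as binary.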

On Aug 21, 2007, at 10:21 AM, Rebhan, Gilbert wrote:

always
considered as textfile

??

What’s the heuristic in Subversion?

– fxn

2007/8/21, Rebhan, Gilbert:

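def self.is_binary?(name)
  # Count suspicious bytes (NUL or >= 128) in the first 1024 bytes.
  non_ascii = total = 0
  # read returns nil for an empty file, so to_s turns that into "".
  File.open(name, "rb") { |io| io.read(1024) }.to_s.each_byte do |c|
    total += 1
    non_ascii += 1 if c >= 128 || c == 0
  end
  # An empty file gives 0.0 / 0 (NaN), which compares false: treated as text.
  non_ascii.to_f / total > 0.33
end
end # presumably closes the enclosing class or module, which the quote omits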

Yep. But I’d leave the “is_” out - that’s handled by the “?” already.

Cheers

robert
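
Following the naming suggestion above, a standalone version of the quoted heuristic might look like the sketch below. The module name FileSniff and the keyword arguments are hypothetical. The defaults simply mirror the 1024 bytes and the 0.33 ratio from the quoted method.

```ruby
# Hypothetical module; not part of the original thread.
module FileSniff
  # Returns true when more than `threshold` of the first `sample_size`
  # bytes are NUL or have the high bit set (value >= 128).
  def self.binary?(path, sample_size: 1024, threshold: 0.33)
    # read returns nil for an empty file; to_s turns that into "".
    sample = File.open(path, "rb") { |io| io.read(sample_size) }.to_s
    return false if sample.empty? # nothing to inspect: treat as text

    suspicious = sample.each_byte.count { |b| b.zero? || b >= 128 }
    suspicious.to_f / sample.bytesize > threshold
  end
end
```

A call then reads as FileSniff.binary?("test.rb"). One caveat of this heuristic: UTF-8 text in a script where every character takes several bytes (Cyrillic, for example) consists mostly of bytes >= 128 and would be classified as binary.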

Don’t forget the possibility that a file is encoded in UTF-16 or
UTF-32. To recognize such text you need an extra detection
step in front of the rest.

Wolfgang WoNáDo
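
One way to sketch such a pre-check is to look for a Unicode byte order mark (BOM) at the start of the file. The helper name bom_encoding is hypothetical. The sketch only covers files that actually begin with a BOM, so UTF-16 or UTF-32 text written without one would still need a different test.

```ruby
# BOM byte sequences and the encodings they indicate. UTF-32 comes first
# because the UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
BOMS = [
  ["\x00\x00\xFE\xFF".b, "UTF-32BE"],
  ["\xFF\xFE\x00\x00".b, "UTF-32LE"],
  ["\xFE\xFF".b,         "UTF-16BE"],
  ["\xFF\xFE".b,         "UTF-16LE"],
].freeze

# Hypothetical helper: returns the encoding name indicated by a leading
# BOM, or nil when the file does not start with one of the BOMs above.
def bom_encoding(path)
  # read returns nil for an empty file; to_s turns that into "".
  head = File.open(path, "rb") { |io| io.read(4) }.to_s
  match = BOMS.find { |bom, _name| head.start_with?(bom) }
  match && match.last
end
```

A caller could run this first and treat any non-nil result as text, falling back to the byte-ratio heuristic only when it returns nil.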