Detect whether a file is encoded as UTF-8

Hi,

I want to check the file encoding of files in a directory.
So far I have tried this, found in an older thread in comp.lang.ruby:

class String
  # unpack('U*') raises ArgumentError on malformed UTF-8
  def utf8?
    unpack('U*') rescue return false
    true
  end
end


utf = Array.new
others = Array.new
Dir["Y:/test/**/*.xml"].each do |path|
  open(path) { |f|
    # sort each path by the result of the utf8? check
    f.read.utf8? ? utf << path : others << path
  }
end

I also tried the chardet library (it includes no Ruby documentation), like this:

require 'UniversalDetector'

utf = Array.new
others = Array.new
Dir["Y:/test/**/*.xml"].each do |path|
  open(path) { |f|
    # sort each path by whether the detector reports utf-8
    (UniversalDetector.chardet(f.read) =~ /utf-8/) ? utf << path : others << path
  }
end
puts utf.join(",")
puts others.join(",")

Are there better or simpler ways?

Regards, Gilbert

You could use regular expressions to search your source string for
byte sequences that are not legal in UTF-8.

Basically, you assume the data is UTF-8 and then reject it if it
contains illegal or unknown sequences.
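
A minimal sketch of that idea in Ruby, for illustration only. The names VALID_UTF8 and utf8_bytes? are made up here and do not come from any library. The pattern follows the commonly cited byte ranges for well-formed UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF). It assumes Ruby 1.9 or later, where strings carry an encoding and the /n flag makes the pattern match raw bytes.

```ruby
# Sketch: byte-level UTF-8 check with a regular expression.
# VALID_UTF8 and utf8_bytes? are hypothetical names, not a library API.
VALID_UTF8 = /\A(?:
    [\x00-\x7F]                        # 1 byte: ASCII
  | [\xC2-\xDF][\x80-\xBF]             # 2 bytes, no overlong forms
  | \xE0[\xA0-\xBF][\x80-\xBF]         # 3 bytes, no overlong forms
  | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3 bytes, general case
  | \xED[\x80-\x9F][\x80-\xBF]         # 3 bytes, no surrogates
  | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4 bytes, planes 1-3
  | [\xF1-\xF3][\x80-\xBF]{3}          # 4 bytes, planes 4-15
  | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4 bytes, plane 16
)*\z/nx

# Returns true if every byte of str belongs to a well-formed UTF-8 sequence.
def utf8_bytes?(str)
  # Work on a binary copy so the byte-oriented pattern applies cleanly.
  bytes = str.dup.force_encoding(Encoding::BINARY)
  !!(VALID_UTF8 =~ bytes)
end
```

To use it on files, read them in binary mode, for example utf8_bytes?(File.binread(path)). Note that this only proves the bytes are well-formed UTF-8: a pure ASCII file also passes, so the check cannot tell "really UTF-8" from "ASCII that happens to be valid UTF-8".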

On Aug 29, 2007, at 2:14 PM, Rebhan, Gilbert wrote:

I want to check the file encoding of files in a directory.

Have you tried charguess?

http://raa.ruby-lang.org/project/charguess

– fxn

Xavier N. wrote:

Have you tried charguess?

http://raa.ruby-lang.org/project/charguess

No. How do I install it?

The tarfile contains only these files:

charguess.c
extconf.rb
MANIFEST
sample.rb

Regards, Gilbert