Forum: Ruby nkf #guess1 and #guess2 on html files

unknown (Guest)
on 2006-03-23 09:09
(Received via mailing list)
Maybe I'm not using NKF.guess1 correctly, but it gives me return type 3
(supposed to be UTF-8) for ISO-8859-1 encoded files.

It also gives me 3 for UTF-8 encoded files???

My code is simply:

NKF.guess1(string)

with string=<whole file content>

Also, guess1 sometimes disagrees with guess2???

Where could I find a table mapping the returned values to encodings
???
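Instead of looking for a table, one option is to ask the `NKF` module itself which of its named constants matches the value it returned. The sketch below relies only on standard `Module#constants` / `Module#const_get` reflection; `constant_names_for` is a hypothetical helper name, not part of NKF. Depending on the Ruby version, the guessed value may be an integer (as in this thread) or an `Encoding` object (newer `NKF.guess`), which is another reason to compare against `NKF::UTF8`, `NKF::SJIS` and so on by name rather than against raw numbers.

```ruby
# Hypothetical helper (not part of NKF): list the names of all constants
# defined directly in +mod+ whose value equals +value+.
def constant_names_for(mod, value)
  mod.constants(false).select { |name| mod.const_get(name) == value }
end
```

Hypothetical usage: after `require 'nkf'`, call `constant_names_for(NKF, NKF.guess2(File.read("page.html")))`, where `page.html` is a placeholder path; if `guess2` is not available in your Ruby, `NKF.guess` can be passed the same way.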
YANAGAWA Kazuhisa (Guest)
on 2006-03-23 12:16
(Received via mailing list)
In Message-Id: <1hcnady.1rrszh87n73rfN%pere.noel@laponie.com.invalid>
pere.noel@laponie.com.invalid (Une bévue) writes:

> Maybe I'm not using NKF.guess1 correctly, but it gives me return type 3
> (supposed to be UTF-8) for ISO-8859-1 encoded files.
>
> It also gives me 3 for UTF-8 encoded files???

Unfortunately NKF is just a tool for Japanese, so you can't use it for
general character-code conversion / guessing, I think.
unknown (Guest)
on 2006-03-23 12:53
(Received via mailing list)
YANAGAWA Kazuhisa <kjana@dm4lab.to> wrote:

>
> Unfortunately NKF is just a tool for Japanese, so you can't use it for
> general character-code conversion / guessing, I think.

OK, fine. I just need a tool to discriminate between ISO-8859-1 and
UTF-8 (as a first step) without relying on the meta content-type charset
in the HTML file, which isn't reliable. For example, a Ruby Cocoa site
(<http://www.rubycocoa.com/the-rubification-of-rtw>) declares
ISO-8859-1 encoding in its meta tag, but the page is in fact UTF-8
(according to Firefox, a text editor, and also the HTTP headers...).
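A common heuristic for this two-encoding case, sketched here under the assumption that a file is either UTF-8 or ISO-8859-1 and nothing else: check whether the raw bytes form valid UTF-8 (`String#valid_encoding?`) and fall back to ISO-8859-1 otherwise, since every byte sequence is valid ISO-8859-1. The method names `guess_utf8_or_latin1` and `to_utf8` are hypothetical, and the code needs the `String` encoding support of Ruby 1.9 or later, not the Ruby 1.8 of this thread.

```ruby
# Hypothetical helpers; assumes the input is either UTF-8 or ISO-8859-1.

# Returns Encoding::UTF_8 if the raw bytes are valid UTF-8,
# otherwise Encoding::ISO_8859_1 (any byte sequence is valid Latin-1).
# Pure-ASCII input counts as UTF-8, which is harmless: ASCII is a subset of both.
def guess_utf8_or_latin1(bytes)
  candidate = bytes.b.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? Encoding::UTF_8 : Encoding::ISO_8859_1
end

# Re-labels a binary copy of the bytes with the guessed encoding
# and transcodes the result to UTF-8.
def to_utf8(bytes)
  enc = guess_utf8_or_latin1(bytes)
  bytes.b.force_encoding(enc).encode(Encoding::UTF_8)
end

# Example (page.html is a placeholder path):
# html = to_utf8(File.binread("page.html"))
```

This is a guess, not a proof: Latin-1 text can occasionally form valid UTF-8 by accident (for example the bytes C3 A9 read as "Ã©" in ISO-8859-1 but as "é" in UTF-8), so the charset in the HTTP headers is still worth checking first when it is available.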