#valid_encoding? - unexpected result after #encode

addis_a · July 19, 2014, 11:24am

I was writing a function that would give me a valid UTF-8 string for any
string in any encoding when I noticed the following behavior.

Try running the attached script. It produces the following output.
Is this expected, or am I not getting something here?

Take this string: [144, 14, 87, 195, 27, 83, 94, 242, 36, 66, 53, 36,
115, 0, 131, 24, 183, 163, 204, 221, 134, 204, 67, 22, 206, 222, 10,
233, 30, 33, 180, 49, 64, 182, 195, 151, 224, 228, 22, 6, 157, 37, 70,
108, 242, 159, 146, 179, 117, 131]
UTF8_DoCoMo encoded string is valid: false
Alright then. Convert to utf8, and tell the converter to replace
everything invalid.

#####################################################

Now string is in: UTF-8; Is it valid now: true

#####################################################

Now forcing utf8, just to make sure.

#####################################################

Encoding now: UTF-8; Is it still valid: false

#####################################################

Encoding same as before: true
String same as before: true
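
For readers without the attachment, the steps in this output could be reproduced roughly as follows. This is a reconstruction from the printed messages, not the original script: the variable names and the exact `encode` options (`invalid: :replace, undef: :replace`) are assumptions, and the results may differ between Ruby versions.

```ruby
# Reconstruction of the steps described above, NOT the attached script.
# Variable names and encode options are assumptions.
bytes = [144, 14, 87, 195, 27, 83, 94, 242, 36, 66, 53, 36,
         115, 0, 131, 24, 183, 163, 204, 221, 134, 204, 67, 22, 206, 222, 10,
         233, 30, 33, 180, 49, 64, 182, 195, 151, 224, 228, 22, 6, 157, 37, 70,
         108, 242, 159, 146, 179, 117, 131]

# Build a binary string from the bytes and tag it as UTF8-DoCoMo.
str = bytes.pack("C*").force_encoding(Encoding::UTF8_DoCoMo)
puts "#{str.encoding} encoded string is valid: #{str.valid_encoding?}"

# Convert to UTF-8, asking the converter to replace invalid or undefined bytes.
converted = str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
puts "Now string is in: #{converted.encoding}; Is it valid now: #{converted.valid_encoding?}"

# Re-tag a copy as UTF-8. This changes no bytes; it only makes Ruby look at them again.
forced = converted.dup.force_encoding(Encoding::UTF_8)
puts "Encoding now: #{forced.encoding}; Is it still valid: #{forced.valid_encoding?}"

puts "Encoding same as before: #{forced.encoding == converted.encoding}"
puts "String same as before: #{forced.bytes == converted.bytes}"
```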

I would have expected both calls to #valid_encoding? to return the same
result (false).
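
Until the discrepancy is explained, a possible workaround for the original goal is to re-check the bytes after converting instead of trusting the first `#valid_encoding?` answer. The sketch below assumes Ruby 2.1 or later (for `String#scrub`); `to_valid_utf8` and its `replacement` parameter are hypothetical names, not part of the attached script.

```ruby
# Hypothetical helper: returns a new string that is tagged UTF-8 and
# passes #valid_encoding?, whatever the input encoding was.
def to_valid_utf8(str, replacement = "?")
  converted = str.encode(Encoding::UTF_8,
                         invalid: :replace, undef: :replace, replace: replacement)
  # force_encoding makes Ruby re-examine the bytes; scrub then replaces
  # any byte sequences that are still not valid UTF-8.
  converted.force_encoding(Encoding::UTF_8).scrub(replacement)
rescue EncodingError
  # No usable conversion: treat the raw bytes as UTF-8 and scrub them.
  str.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end
```

Usage would look like `to_valid_utf8(input)`. The trade-off is that bad bytes are silently replaced, so the result is valid UTF-8 but not necessarily a faithful conversion of the input.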

(Is it just the preview, or how can I add line breaks?) [Alright, it’s
just the preview.]