I was writing a function that would give me a valid UTF-8 string for any
string in any encoding when I noticed the following behavior.
Try running the attached script. It produces the following output.
Is this expected, or am I not getting something here?
Take this string: [144, 14, 87, 195, 27, 83, 94, 242, 36, 66, 53, 36,
115, 0, 131, 24, 183, 163, 204, 221, 134, 204, 67, 22, 206, 222, 10,
233, 30, 33, 180, 49, 64, 182, 195, 151, 224, 228, 22, 6, 157, 37, 70,
108, 242, 159, 146, 179, 117, 131]
UTF8_DoCoMo encoded string is valid: false
Alright then. Convert to utf8, and tell the converter to replace
everything invalid.
#####################################################
Now string is in: UTF-8; Is it valid now: true
#####################################################
Now forcing utf8, just to make sure.
#####################################################
Encoding now: UTF-8; Is it still valid: false
#####################################################
Encoding same as before: true
String same as before: true
I would have expected both calls to #valid_encoding? to return the same
result (false).
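For readers without the attachment, the steps behind the output above can be sketched roughly as follows. This is a reconstruction from the printed output, not the attached script itself. The helper name `check_conversion` and the hash keys are made up for illustration, and the `invalid: :replace, undef: :replace` options are an assumption about how the conversion was called.

```ruby
# Reconstruction of the steps shown in the output (not the original script).
BYTES = [144, 14, 87, 195, 27, 83, 94, 242, 36, 66, 53, 36,
         115, 0, 131, 24, 183, 163, 204, 221, 134, 204, 67, 22, 206, 222, 10,
         233, 30, 33, 180, 49, 64, 182, 195, 151, 224, 228, 22, 6, 157, 37, 70,
         108, 242, 159, 146, 179, 117, 131].freeze

# Hypothetical helper: runs the three steps and returns what was observed.
def check_conversion(bytes)
  # Step 1: tag the raw bytes as UTF8-DoCoMo without changing them.
  raw = bytes.pack('C*').force_encoding(Encoding::UTF8_DoCoMo)

  # Step 2: transcode to UTF-8, replacing invalid or undefined sequences.
  # (Assumed options; the original call is not shown.)
  converted = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  valid_after_encode = converted.valid_encoding?

  # Step 3: re-tag a copy as UTF-8. The bytes and the encoding stay the same,
  # so the validity check would be expected to give the same answer.
  forced = converted.dup.force_encoding(Encoding::UTF_8)

  {
    raw_valid: raw.valid_encoding?,
    valid_after_encode: valid_after_encode,
    valid_after_force: forced.valid_encoding?,
    same_encoding: converted.encoding == forced.encoding,
    same_string: converted == forced
  }
end

if __FILE__ == $PROGRAM_NAME
  check_conversion(BYTES).each { |key, value| puts "#{key}: #{value}" }
end
```

If `valid_after_encode` and `valid_after_force` differ while `same_encoding` and `same_string` are both true, that is the inconsistency described above. The result may depend on the Ruby version in use.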