Problems with utf8 + String class


There is a problem I've been trying to solve for a couple of hours now,
and after some fruitless googling and searching around, I haven't come up
with anything substantial, so I thought the forum might help me.

Say I have a string and want to display only the first 10 characters or so:

shortstring = "this is a very long string object"[0...10]

shortstring = "this is a " # which is great

But if I use the same method on a UTF-8 string, I get some weird
characters popping up in there, sometimes yes, sometimes no. From looking
around, it seems that because a character can take two or more bytes (as
opposed to 1 in a single-byte encoding), the [0...10] range sometimes ends
in the middle of a character, and the leftover bytes show up as weird
characters. (The same goes for the String#slice method.)
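
To illustrate what seems to be going on, here is a rough sketch. It assumes Ruby 1.9.3 or later, where String#byteslice exists and can be used to imitate a byte-based cut; the sample string and variable names are made up for the example.

```ruby
# Sketch only: assumes Ruby 1.9.3 or later (String#byteslice does not exist before that).
s = "h\u00e9llo w\u00f6rld"      # "héllo wörld"; é and ö take 2 bytes each in UTF-8

char_count = s.length            # counts characters
byte_count = s.bytesize          # counts bytes, so it is larger here

# Cutting by bytes can stop halfway through a character:
broken = s.byteslice(0, 2)       # "h" plus only the first byte of "é"
broken_ok = broken.valid_encoding?

# Cutting by characters keeps "é" whole:
by_chars = s[0...2]

puts "#{char_count} chars, #{byte_count} bytes"
puts "byte cut valid? #{broken_ok}"
puts "char cut: #{by_chars}"
```

The stray byte left over from the byte cut is the kind of thing that shows up as a weird character when printed.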

The final result I need, and the essence of this message, is this:

“very long string in utf8” to become
“very lon…”

without weird characters.
Any help is much appreciated.


Ruby String methods (at least in Ruby 1.8) treat a string as a sequence of
bytes, one byte per character, which, as you know, is not the case with
Unicode strings. A multibyte character in your string is therefore going to
throw everything off. Such is the nature of Ruby.
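
One possible way to get the truncation asked for above is to split the string into whole characters first and only then cut. The sketch below is not from any library: truncate_chars, its parameters and the "..." suffix are hypothetical names chosen for illustration. It relies on the /u regexp flag to scan UTF-8 characters, and the \u escape in the second usage line needs Ruby 1.9 or later.

```ruby
# Hypothetical helper, for illustration only (name, arguments and suffix are assumptions).
# Returns text unchanged if it has at most `limit` characters; otherwise
# keeps the first `limit` whole characters and appends the suffix.
def truncate_chars(text, limit, suffix = "...")
  chars = text.scan(/./mu)            # array of whole UTF-8 characters, not bytes
  return text if chars.length <= limit
  chars[0, limit].join + suffix
end

puts truncate_chars("very long string in utf8", 8)
puts truncate_chars("h\u00e9llo w\u00f6rld", 4)   # multibyte characters stay intact
```

On Ruby 1.9 and later, where strings carry their encoding, plain text[0, limit] should already count characters, so the scan step mostly matters on older versions.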

As a starting point, I suggest you check out: