JavaScript's encodeURIComponent works differently from CGI.escape or
ERB::Util.u.
Well, the difference is that the JavaScript output is UTF-16 and
the Ruby output is UTF-8 (although the documentation I can find suggests
that the JavaScript should also be producing UTF-8).
For example:
encodeURIComponent('中文') = '%D6%D0%CE%C4'
but
>> CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"
>> ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"
Is there any way to get the same encoded result with ruby code?
There are various libraries for working with string encodings,
including iconv. pack/unpack have some specifiers that are
relevant for Unicode, and Rails itself also has various Unicode
utilities in it.
Thank you for your reply. Maybe that is true. But how can the UTF-16
encodeURIComponent result be the shorter one?
I tried converting the string to UTF-16 before passing it to
CGI.escape(), but I had no luck producing the same result as
encodeURIComponent did. (I got "%FE%FFN-e%87" from "中文".)
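For what it's worth, that odd-looking result can be reproduced byte by byte. The sketch below assumes the converted string started with a big-endian byte-order mark (FE FF) and that the escaping leaves unreserved ASCII bytes alone, roughly as CGI.escape does. `escape_bytes` is a hypothetical helper written for this illustration, not part of CGI or ERB.

```ruby
# Bytes that are left unescaped (unreserved ASCII characters).
UNRESERVED = [*"A".."Z", *"a".."z", *"0".."9", "-", "_", ".", "~"].map(&:ord)

# Hypothetical helper: percent-encode an array of raw bytes,
# passing unreserved ASCII bytes through unchanged.
def escape_bytes(bytes)
  bytes.map { |b| UNRESERVED.include?(b) ? b.chr : format("%%%02X", b) }.join
end

text = "\u4E2D\u6587"  # "中文"

# Prepend U+FEFF so the UTF-16BE bytes start with the BOM FE FF (assumption).
utf16_with_bom = ("\uFEFF" + text).encode("UTF-16BE").bytes

result = escape_bytes(utf16_with_bom)
puts result
```

The bytes are FE FF 4E 2D 65 87. 0x4E, 0x2D and 0x65 happen to be the ASCII characters "N", "-" and "e", so they pass through unescaped, which is where the "N-e" in the middle comes from.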
Because for double-byte characters like these, UTF-16 is shorter than
UTF-8 (two bytes per character instead of three).
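A quick way to check the size difference for the two characters in question, as a minimal sketch using Ruby's built-in String#encode (Ruby 1.9 or later assumed):

```ruby
text = "\u4E2D\u6587"  # "中文"

utf8_size  = text.encode("UTF-8").bytesize     # three bytes per character here
utf16_size = text.encode("UTF-16BE").bytesize  # two bytes per character here (no BOM)

puts "UTF-8: #{utf8_size} bytes, UTF-16: #{utf16_size} bytes"
```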
Can you help me change the url_encode code a bit so that it returns
a UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)
Well, s.unpack("U*") will turn a string into an array of integers
(Unicode code points), which should then be easy to split into bytes. I'd
start from scratch rather than using url_encode, though.
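A rough sketch of that idea, starting from scratch: unpack the code points, split each into a high and a low byte, and percent-encode both. `utf16_percent_encode` is a made-up name for this illustration. It assumes big-endian byte order with no BOM, escapes every byte (unlike CGI.escape), and only handles code points up to U+FFFF, so no surrogate pairs.

```ruby
# Hypothetical helper: percent-encode a UTF-8 string as big-endian UTF-16,
# two escaped bytes per code point. Code points above U+FFFF are rejected
# because they would need surrogate pairs, which this sketch skips.
def utf16_percent_encode(str)
  str.unpack("U*").map { |cp|
    raise ArgumentError, "code point above U+FFFF: #{cp}" if cp > 0xFFFF
    format("%%%02X%%%02X", cp >> 8, cp & 0xFF)  # high byte, then low byte
  }.join
end

text = "\u4E2D\u6587"            # "中文"
code_points = text.unpack("U*")  # [20013, 25991]
encoded = utf16_percent_encode(text)
puts encoded                     # %4E%2D%65%87
```

Note that this gives %4E%2D%65%87, which is the UTF-16 form of the two characters but still not the '%D6%D0%CE%C4' asked for above.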
So it is a way of turning [20013, 25991] into '%D6%D0%CE%C4', right?
Well, 20013 is 0x4E2D, which is the UTF-16 for the first of your
characters. Looking back at what you wrote, I've no idea where D6D0 is
coming from - that's a completely different character according to the
Unicode character palette I have. Not sure what your JavaScript has
been doing.
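One possible explanation, not confirmed anywhere in this thread: the bytes D6 D0 CE C4 are what these two characters look like in GBK (GB2312), a legacy Chinese encoding. That would fit a page or escaping routine working in that encoding rather than in UTF-8 or UTF-16. A minimal sketch to check it, assuming a Ruby with String#encode (1.9 or later) and its built-in GBK transcoder:

```ruby
text = "\u4E2D\u6587"  # "中文"

# Transcode to GBK and percent-encode every resulting byte.
gbk_bytes = text.encode("GBK").bytes
gbk_escaped = gbk_bytes.map { |b| format("%%%02X", b) }.join

puts gbk_escaped  # %D6%D0%CE%C4
```

If that is what happened, the Ruby side would need to transcode to GBK before escaping to get the same string, rather than to UTF-16.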