JavaScript's encodeURIComponent works differently from CGI.escape or
ERB::Util.u.
For example:
encodeURIComponent('中文') = '%D6%D0%CE%C4'
but
>> CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"
>> ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"
Is there any way to get the same encoded result with Ruby code?
on 2009-03-31 15:06
on 2009-03-31 15:41
On Mar 31, 2:06 pm, Nanyang Zhan <rails-mailing-l...@andreas-s.net> wrote:
> JavaScript's encodeURIComponent works differently from CGI.escape or
> ERB::Util.u.

Well, the difference is that the JavaScript output is produced as UTF-16 and the Ruby output as UTF-8 (although the documentation I can find suggests that the JavaScript should also be producing UTF-8).

> For example:
> encodeURIComponent('中文') = '%D6%D0%CE%C4'
> but
> >> CGI.escape("中文")
> => "%E4%B8%AD%E6%96%87"
> >> ERB::Util.u("中文")
> => "%E4%B8%AD%E6%96%87"
>
> Is there any way to get the same encoded result with Ruby code?

There are various libraries for messing around with string encodings, including iconv. pack/unpack have some specifiers that are relevant for Unicode, and Rails itself also has various Unicode utilities in it.

Fred
on 2009-03-31 17:27
Frederick Cheung wrote:
> Well, the difference is that the JavaScript output is produced as UTF-16 and
> the Ruby output as UTF-8 (although the documentation I can find suggests that
> the JavaScript should also be producing UTF-8).

Thank you for your reply. Maybe that is true. But how can the UTF-16 encodeURIComponent result be the shorter one?

> There are various libraries for messing around with string encodings,
> including iconv. pack/unpack have some specifiers that are
> relevant for Unicode, and Rails itself also has various Unicode
> utilities in it.

I tried converting the string to UTF-16 before passing it to CGI.escape(), but I had no luck producing the same result as encodeURIComponent. (I got "%FE%FFN-e%87" from "中文".)

I found a Perl and a Python way to do encodeURIComponent on the net. They are here: http://d.hatena.ne.jp/ruby-U/20081110/1226313786

It is a pity that I know neither Perl nor Python. Can anyone work out the Ruby code for me from them?
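The "%FE%FFN-e%87" result above can be reproduced with a short sketch. It assumes a modern Ruby (1.9 or later), where String#encode does the conversion that iconv did in 2009. percent_encode_bytes is a hypothetical helper, not a library method; it only mimics the byte-by-byte escaping of the old ERB::Util.url_encode.

```ruby
# Hypothetical helper: percent-encode every byte that is not in the
# unreserved set used by the old ERB::Util.url_encode (a-z A-Z 0-9 _ - .).
def percent_encode_bytes(str)
  str.bytes.map do |b|
    ch = b.chr
    ch.match?(/\A[A-Za-z0-9_.-]\z/) ? ch : format("%%%02X", b)
  end.join
end

text = "\u4E2D\u6587"              # the two characters from the question

# Big-endian UTF-16 without a byte order mark.
utf16 = text.encode("UTF-16BE")
utf16_hex = utf16.bytes.map { |b| format("%02X", b) }.join(" ")
utf16_escaped = percent_encode_bytes(utf16)

puts utf16_hex      # 4E 2D 65 87
puts utf16_escaped  # N-e%87
```

Three of the four UTF-16 bytes (0x4E, 0x2D, 0x65) happen to be the ASCII characters "N", "-" and "e", so the escaper leaves them alone. The leading "%FE%FF" in the result quoted above is the byte order mark that a plain "UTF-16" conversion puts in front.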
on 2009-03-31 17:35
On Mar 31, 4:27 pm, Nanyang Zhan <rails-mailing-l...@andreas-s.net> wrote:
> Thank you for your reply. Maybe that is true. But how can the UTF-16
> encodeURIComponent result be the shorter one?

Because for double-byte characters UTF-16 is shorter than UTF-8.

> I found a Perl and a Python way to do encodeURIComponent on the net.
> They are here: http://d.hatena.ne.jp/ruby-U/20081110/1226313786
>
> It is a pity that I know neither Perl nor Python. Can anyone work out
> the Ruby code for me from them?

Those aren't playing with encodings, which is apparently the issue here. Why does it matter anyway?

Fred
on 2009-03-31 17:44
Frederick Cheung wrote:
> Those aren't playing with encodings, which is apparently the issue
> here. Why does it matter anyway?

OK. Here is the source code of the ERB::Util.url_encode(s) method.

    # File erb.rb, line 801
    def url_encode(s)
      s.to_s.gsub(/[^a-zA-Z0-9_\-.]/n){ sprintf("%%%02X", $&.unpack("C")[0]) }
    end

Now it works like this:

> ERB::Util.url_encode("中文")
> => "%E4%B8%AD%E6%96%87"

Can you help me change the url_encode code a bit so that it returns the UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)
on 2009-03-31 17:54
On Mar 31, 4:44 pm, Nanyang Zhan <rails-mailing-l...@andreas-s.net> wrote:
> Now it works like this:
>
> > ERB::Util.url_encode("中文")
> > => "%E4%B8%AD%E6%96%87"
>
> Can you help me change the url_encode code a bit so that it returns the
> UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)

Well, s.unpack("U*") will turn a string into an array of integers (Unicode code points), which should then be easy to split into bytes. I'd start from scratch rather than using url_encode, though.

Fred
on 2009-03-31 18:04
Frederick Cheung wrote:
> Well, s.unpack("U*") will turn a string into an array of integers (Unicode
> code points), which should then be easy to split into bytes. I'd
> start from scratch rather than using url_encode, though.

Thanks, Fred!

>> "中文".unpack("C*")
=> [228, 184, 173, 230, 150, 135]

> ERB::Util.url_encode("中文")
> => "%E4%B8%AD%E6%96%87"

For the first time, I have some idea of what url_encode is doing.

And:

>> "中文".unpack("U*")
=> [20013, 25991]

So it is a matter of turning [20013, 25991] into '%D6%D0%CE%C4', right?
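The two unpack calls above differ only in what they count: "C*" yields the raw bytes of the UTF-8 string, while "U*" yields one integer per Unicode code point. A small sketch, assuming a UTF-8 string literal (the default in Ruby 2.0 and later); the variable names are illustrative only.

```ruby
text = "\u4E2D\u6587"             # the two characters from the question

utf8_bytes  = text.unpack("C*")   # raw bytes of the UTF-8 encoding
code_points = text.unpack("U*")   # Unicode code points

p utf8_bytes    # [228, 184, 173, 230, 150, 135]
p code_points   # [20013, 25991]

# Percent-encoding is just each byte written as two hex digits.
escaped = utf8_bytes.map { |b| format("%%%02X", b) }.join
puts escaped    # %E4%B8%AD%E6%96%87
```

So the percent-encoded string depends entirely on which byte sequence goes in, and therefore on which encoding the string is in when it is escaped.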
on 2009-03-31 18:45
On Mar 31, 5:04 pm, Nanyang Zhan <rails-mailing-l...@andreas-s.net> wrote:
> And:
>
> >> "中文".unpack("U*")
> => [20013, 25991]
>
> So it is a matter of turning [20013, 25991] into '%D6%D0%CE%C4', right?

Well, 20013 is 0x4E2D, which is the UTF-16 for the first of your characters. Looking back at what you wrote, I have no idea where D6D0 is coming from. That's a completely different character according to the Unicode character palette I have. Not sure what your JavaScript has been doing.

Fred
on 2009-03-31 18:56
Frederick Cheung wrote:
> I have no idea where D6D0 is
> coming from.

OK, problem solved. Thank you, Fred. I might never have got it done without your help.

It turns out that %D6%D0%CE%C4 is not a UTF-16 result at all, but the GB2312 encoding of the string.

I converted the string from UTF-8 to GB2312 with iconv, and then url_encode produces the string I need.

Thank you again.
on 2009-05-20 08:58
Nanyang Zhan wrote:
> OK, problem solved. Thank you, Fred. I might never have got it done without
> your help.
>
> It turns out that %D6%D0%CE%C4 is not a UTF-16 result at all, but the GB2312
> encoding of the string.
>
> I converted the string from UTF-8 to GB2312 with iconv, and then url_encode
> produces the string I need.
>
> Thank you again.

Could you share the code you used to solve the problem? Thanks a lot.
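The original poster's code was not posted, so what follows is only a sketch of the approach described: convert the UTF-8 string to GB2312, then percent-encode the resulting bytes. It assumes a modern Ruby, where String#encode takes the place of iconv. Both percent_encode_bytes and encode_uri_component_gb2312 are hypothetical names introduced for illustration.

```ruby
# Hypothetical helper: percent-encode every byte that is not in the
# unreserved set used by the old ERB::Util.url_encode (a-z A-Z 0-9 _ - .).
def percent_encode_bytes(str)
  str.bytes.map do |b|
    ch = b.chr
    ch.match?(/\A[A-Za-z0-9_.-]\z/) ? ch : format("%%%02X", b)
  end.join
end

# Hypothetical wrapper: transcode to GB2312 first, then escape the bytes.
def encode_uri_component_gb2312(text)
  percent_encode_bytes(text.encode("GB2312"))
end

text = "\u4E2D\u6587"             # the two characters from the question

gb2312_escaped = encode_uri_component_gb2312(text)
utf8_escaped   = percent_encode_bytes(text)

puts gb2312_escaped  # %D6%D0%CE%C4
puts utf8_escaped    # %E4%B8%AD%E6%96%87
```

Characters that have no GB2312 representation make String#encode raise Encoding::UndefinedConversionError, so a real implementation would need to decide how to handle them.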