Forum: Ruby on Rails javascript encodeURIComponent equivalent code

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Nanyang Zhan (xain)
on 2009-03-31 15:06
JavaScript's encodeURIComponent works differently from CGI.escape or
ERB::Util.u.
for example:
encodeURIComponent('中文') = '%D6%D0%CE%C4'
but
>> CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"
>> ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"

Is there any way to get the same encoded result with ruby code?
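A quick way to see what the Ruby result actually contains, sketched for a modern Ruby: writing each UTF-8 byte of the string as %XX gives exactly the ERB::Util.url_encode output, so the Ruby side is plain UTF-8 percent-encoding.

```ruby
require "erb"

text = "中文"

# Write each UTF-8 byte of the string as a %XX escape
from_bytes = text.bytes.map { |b| format("%%%02X", b) }.join

p from_bytes                  # => "%E4%B8%AD%E6%96%87"
p ERB::Util.url_encode(text)  # => "%E4%B8%AD%E6%96%87"
```

The two agree here only because every byte of this string is non-ASCII; url_encode leaves ASCII letters, digits and a few punctuation characters unescaped.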
Frederick Cheung (Guest)
on 2009-03-31 15:41
(Received via mailing list)
On Mar 31, 2:06 pm, Nanyang Zhan <rails-mailing-l...@andreas-s.net>
wrote:
> JavaScript's encodeURIComponent works differently from CGI.escape or
> ERB::Util.u.

Well, the difference is that the JavaScript output is UTF-16 and the
Ruby output is UTF-8 (although the documentation I can find suggests
that the JavaScript should also be producing UTF-8).

> for example:
> encodeURIComponent('中文') = '%D6%D0%CE%C4'
> but
> >> CGI.escape("中文")
> => "%E4%B8%AD%E6%96%87"
> >> ERB::Util.u("中文")
> => "%E4%B8%AD%E6%96%87"
>
> Is there any way to get the same encoded result with ruby code?

There are various libraries for messing around with string encodings,
including iconv; pack/unpack also have some specifiers that are
relevant for Unicode, and Rails itself has various Unicode
utilities in it.
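For illustration, here are the pack/unpack specifiers Fred mentions, assuming Ruby 1.9 or later, where `String#encode` does the conversion that iconv did in 1.8: "C*" yields bytes, "U*" yields Unicode code points, and "n*" reads big-endian 16-bit units, which is how UTF-16BE stores these two characters.

```ruby
text = "中文"

# "C*": one integer per byte of the UTF-8 representation
p text.unpack("C*")                     # => [228, 184, 173, 230, 150, 135]

# "U*": one integer per character (Unicode code point)
p text.unpack("U*")                     # => [20013, 25991]

# "n*": big-endian 16-bit units; applied to the UTF-16BE bytes this
# gives back the same code points for characters in the BMP
p text.encode("UTF-16BE").unpack("n*")  # => [20013, 25991]
```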

Fred
Nanyang Zhan (xain)
on 2009-03-31 17:27
Frederick Cheung wrote:
> Well, the difference is that the JavaScript output is UTF-16 and the
> Ruby output is UTF-8 (although the documentation I can find suggests
> that the JavaScript should also be producing UTF-8).

Thank you for your reply. Maybe that is true. But how can the UTF-16
encodeURIComponent result be shorter?

> There are various libraries for messing around with string encodings,
> including iconv; pack/unpack also have some specifiers that are
> relevant for Unicode, and Rails itself has various Unicode
> utilities in it.

I tried converting the string to UTF-16 before passing it to
CGI.escape(), but I had no luck producing the same result as
encodeURIComponent. (I got "%FE%FFN-e%87" from "中文".)
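A plausible explanation for "%FE%FFN-e%87", sketched in modern Ruby with `String#encode` standing in for iconv. The assumption is that iconv's "UTF-16" output began with a big-endian byte order mark (FE FF), followed by 4E 2D 65 87, the UTF-16BE form of 中文. The bytes 0x4E, 0x2D and 0x65 happen to be the ASCII codes for N, - and e, which the escaper leaves alone.

```ruby
require "erb"

# Prepend U+FEFF (byte order mark) and encode as big-endian UTF-16,
# assumed here to match what iconv's "UTF-16" target produced
utf16 = "\uFEFF中文".encode("UTF-16BE")
p utf16.bytes.map { |b| format("%02X", b) }  # => ["FE", "FF", "4E", "2D", "65", "87"]

# Percent-encode the raw bytes: N (0x4E), - (0x2D) and e (0x65)
# are unreserved ASCII characters, so they stay as they are
p ERB::Util.url_encode(utf16.b)              # => "%FE%FFN-e%87"
```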


I found Perl and Python implementations of encodeURIComponent on the
net; they are here:
http://d.hatena.ne.jp/ruby-U/20081110/1226313786

Unfortunately I know neither Perl nor Python. Can anyone work out
the Ruby code for me from them?
Frederick Cheung (Guest)
on 2009-03-31 17:35
(Received via mailing list)
On Mar 31, 4:27 pm, Nanyang Zhan <rails-mailing-l...@andreas-s.net>
wrote:
> Frederick Cheung wrote:
> > Well, the difference is that the JavaScript output is UTF-16 and the
> > Ruby output is UTF-8 (although the documentation I can find suggests
> > that the JavaScript should also be producing UTF-8).
>
> Thank you for your reply. Maybe that is true. But how can the UTF-16
> encodeURIComponent result be shorter?

Because for characters like these (U+0800 to U+FFFF), UTF-16 is shorter than UTF-8: two bytes per character instead of three.
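To make the size difference concrete (a modern Ruby is assumed for `String#encode`): each of these characters takes three bytes in UTF-8 but two in UTF-16, so six %XX escapes versus four.

```ruby
text = "中文"

p text.bytesize                     # => 6 (3 bytes per character in UTF-8)
p text.encode("UTF-16BE").bytesize  # => 4 (2 bytes per character in UTF-16)
```

Four escapes for two characters fits any encoding that uses two bytes per character here, not only UTF-16; legacy Chinese encodings such as GB2312 do too.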

> I found Perl and Python implementations of encodeURIComponent on the
> net; they are here: http://d.hatena.ne.jp/ruby-U/20081110/1226313786
>
> Unfortunately I know neither Perl nor Python. Can anyone work out
> the Ruby code for me from them?
>
Those don't deal with encodings, which is apparently the issue
here. Why does it matter anyway?

Fred
Nanyang Zhan (xain)
on 2009-03-31 17:44
Frederick Cheung wrote:

> Those don't deal with encodings, which is apparently the issue
> here. Why does it matter anyway?

ok.


Here is the source code of the ERB::Util.url_encode(s) method:
# File erb.rb, line 801
    def url_encode(s)
      s.to_s.gsub(/[^a-zA-Z0-9_\-.]/n){ sprintf("%%%02X", $&.unpack("C")[0]) }
    end
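This code dates from Ruby 1.8, where strings were plain byte sequences, so the regexp matched one byte at a time. On Ruby 1.9 and later a match against a UTF-8 string would likely be a whole multibyte character, and `unpack("C")[0]` would then see only its first byte. A minimal sketch of the same byte-wise idea for a modern Ruby, working on a binary copy via `String#b`; the name `url_encode_bytes` is hypothetical:

```ruby
# Hypothetical helper mirroring the url_encode shown above, adapted for
# Ruby 1.9+: work on a binary copy so each regexp match is one byte
def url_encode_bytes(s)
  s.to_s.b.gsub(/[^a-zA-Z0-9_\-.]/n) { |byte| format("%%%02X", byte.ord) }
end

p url_encode_bytes("中文")   # => "%E4%B8%AD%E6%96%87"
p url_encode_bytes("a b/c")  # => "a%20b%2Fc"
```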


Now it works like this:
>> ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

Can you help me change the url_encode code a bit so it returns a
UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)
Frederick Cheung (Guest)
on 2009-03-31 17:54
(Received via mailing list)
On Mar 31, 4:44 pm, Nanyang Zhan <rails-mailing-l...@andreas-s.net>
wrote:
> $&.unpack("C")[0]) }
>     end
>
> Now it works like this:
>
> >> ERB::Util.url_encode("中文")
> => "%E4%B8%AD%E6%96%87"
>
> Can you help me change the url_encode code a bit so it returns a
> UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)

Well, s.unpack("U*") will turn a string into an array of integers
(Unicode code points), which should then be easy to split into bytes.
I'd start from scratch rather than using url_encode, though.
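A minimal sketch of what Fred describes; the helper name `utf16_percent_encode` is made up for illustration. It assumes every character is in the Basic Multilingual Plane (code point U+FFFF or below, so no surrogate pairs) and, for simplicity, percent-encodes every byte, ASCII included, which encodeURIComponent would not do.

```ruby
# Hypothetical helper: split each code point from unpack("U*") into a
# high and a low byte (UTF-16BE order) and percent-encode both.
# Assumes BMP-only input; characters above U+FFFF would need surrogates.
def utf16_percent_encode(str)
  str.unpack("U*").map { |cp| format("%%%02X%%%02X", cp >> 8, cp & 0xFF) }.join
end

p utf16_percent_encode("中文")  # => "%4E%2D%65%87"
```

For 中文 this yields %4E%2D%65%87 rather than the %D6%D0%CE%C4 from the original post, and swapping the byte order (UTF-16LE) would not match either.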


Fred
Nanyang Zhan (xain)
on 2009-03-31 18:04
Frederick Cheung wrote:
> Well, s.unpack("U*") will turn a string into an array of integers
> (Unicode code points), which should then be easy to split into bytes.
> I'd start from scratch rather than using url_encode, though.

Thanks, Fred!

>> "中文".unpack("C*")
=> [228, 184, 173, 230, 150, 135]
>> ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

For the first time, I have some idea of what url_encode is doing.

when:
>> "中文".unpack("U*")
=> [20013, 25991]

So there must be a way to turn [20013, 25991] into '%D6%D0%CE%C4', right?
Frederick Cheung (Guest)
on 2009-03-31 18:45
(Received via mailing list)
On Mar 31, 5:04 pm, Nanyang Zhan <rails-mailing-l...@andreas-s.net>
wrote:
>
> when:
> >> "中文".unpack("U*")
> => [20013, 25991]
>
> So there must be a way to turn [20013, 25991] into '%D6%D0%CE%C4', right?
>
Well, 20013 is 0x4E2D, which is the UTF-16 for the first of your
characters. Looking back at what you wrote, I've no idea where D6D0 is
coming from; that's a completely different character according to the
Unicode character palette I have. Not sure what your JavaScript has
been doing.
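The hex conversion Fred does by hand can be checked quickly in Ruby. The second line confirms that 0xD6D0, read as a code point, falls in the Hangul Syllables block (U+AC00 to U+D7A3), so it is indeed an unrelated character.

```ruby
# Code points of the two characters, in U+XXXX notation
p "中文".unpack("U*").map { |cp| format("U+%04X", cp) }  # => ["U+4E2D", "U+6587"]

# 0xD6D0 as a code point lies in the Hangul Syllables block
p((0xAC00..0xD7A3).cover?(0xD6D0))                      # => true
```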

Fred
Nanyang Zhan (xain)
on 2009-03-31 18:56
Frederick Cheung wrote:
> I'd no idea where D6D0 is
> coming from

OK, problem solved. Thank you, Fred. I might never have got it done
without your help.

It turns out %D6%D0%CE%C4 is not a UTF-16 result at all, but
GB2312-encoded bytes.

I converted the string from UTF-8 to GB2312 with iconv, and then
url_encode produces exactly the string I need.
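For readers on a current Ruby: iconv was dropped from the standard library in Ruby 2.0, and `String#encode` performs the same UTF-8 to GB2312 conversion. Below is a minimal sketch of the approach described above, not the poster's actual code; the helper name `gb2312_url_encode` is hypothetical.

```ruby
require "erb"

# Hypothetical helper: convert UTF-8 to GB2312 (what iconv did in the
# post above), then percent-encode the resulting bytes with url_encode
def gb2312_url_encode(str)
  ERB::Util.url_encode(str.encode("GB2312"))
end

p "中文".encode("GB2312").bytes.map { |b| format("%02X", b) }  # => ["D6", "D0", "CE", "C4"]
p gb2312_url_encode("中文")                                   # => "%D6%D0%CE%C4"
```

`String#encode` raises Encoding::UndefinedConversionError for characters GB2312 cannot represent. Depending on what the receiving page expects, a superset such as GBK or GB18030 may be a safer target.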

Thank you again.
Qi Xiang (qtlove)
on 2009-05-20 08:58
Nanyang Zhan wrote:
> Frederick Cheung wrote:
>> I'd no idea where D6D0 is
>> coming from
>
> OK, problem solved. Thank you, Fred. I might never have got it done
> without your help.
>
> It turns out %D6%D0%CE%C4 is not a UTF-16 result at all, but
> GB2312-encoded bytes.
>
> I converted the string from UTF-8 to GB2312 with iconv, and then
> url_encode produces exactly the string I need.
>
> Thank you again.

Could you share the code you used to solve the problem?
Thanks a lot.